An Update Log System - Shaofan Lai's Blog

Over the past few months, I realized that I have been writing in the “Exp” (experiment) section more than in the post section, so it's hard to track my updates from the homepage. Since the Exp section and the post section use different schemas, I cannot simply concatenate the two tables and sort them by time. It's therefore worth designing a change log system for my current blog.

Target: An Update Log System for the blog

Functionality: Show the most recent updates from different sources (tables/sections/formats). Each displayed update should carry brief information about it, including the title and the URL.

#Design 1: Active Fetching

The basic idea is that, when we query for the recent updates, the system fetches a uniform representation of the updates from each source separately and sorts them by time. The key is to translate each source's format into that one representation, which has to be done once per source.

Since I am using an ORM to define the schemas/tables, we can simply define an abstract class (interface) and have the other tables inherit from it. For example, we can define a Loggable class:

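from abc import ABC, abstractmethod

class Loggable(ABC):
    """Interface: each loggable table returns (timestamp, url) for its latest update."""
    @abstractmethod
    def get_loggable_item(self):
        raise NotImplementedError

class Post(..., Loggable):
    ...

    def get_loggable_item(self):
        return (self.update_date, self.url)

class ExpLog(..., Loggable):
    ...

    def get_loggable_item(self):
        # id is typically an integer column, so format it instead of concatenating with +
        return (self.modified_date, f"{self.url}#id-{self.id}")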
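To make the query side concrete, here is a minimal, self-contained sketch of the fetch-and-sort step. It assumes plain in-memory lists stand in for the ORM tables (the `Post`/`ExpLog` dataclasses below are hypothetical stand-ins, not the blog's actual models, and rely on duck typing instead of the `Loggable` base). It uses `heapq.nlargest` to keep only the `q` most recent items while scanning every element, which is where an O(n log q) query cost comes from.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical in-memory stand-ins for the ORM-backed Post and ExpLog tables.
@dataclass
class Post:
    url: str
    update_date: datetime

    def get_loggable_item(self) -> Tuple[datetime, str]:
        return (self.update_date, self.url)


@dataclass
class ExpLog:
    id: int
    url: str
    modified_date: datetime

    def get_loggable_item(self) -> Tuple[datetime, str]:
        return (self.modified_date, f"{self.url}#id-{self.id}")


def recent_updates(sources: Iterable[Iterable], q: int) -> List[Tuple[datetime, str]]:
    """Scan every element of every source and return the q most recent items.

    heapq.nlargest keeps a heap of at most q items, so scanning n elements
    costs O(n log q) time and O(q) extra space.
    """
    items = (elem.get_loggable_item() for source in sources for elem in source)
    return heapq.nlargest(q, items, key=lambda item: item[0])


if __name__ == "__main__":
    posts = [Post("/post/a", datetime(2020, 1, 5)), Post("/post/b", datetime(2020, 3, 1))]
    exps = [ExpLog(7, "/exp", datetime(2020, 2, 10))]
    print([url for _, url in recent_updates([posts, exps], q=2)])  # ['/post/b', '/exp#id-7']
```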

Advantages[+] and disadvantages[-]:

[+] Needs no extra space or tables to store the update logs.
[+] Few modifications: each table just inherits the interface and updates its ORM definition.
[-] Has to scan every element of every source on each query.
- Quick fix: buffer (cache) the result after the query.
[-] One element (explog/post) cannot have multiple update logs.
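The buffer quick fix for Design 1 could look roughly like the sketch below. It assumes a time-based expiry (`ttl`) so that none of the update logic has to change; the class name `BufferedRecentUpdates` and the `scan` callable (standing in for the full scan over all sources) are hypothetical. The trade-off is that the displayed list may be stale for up to `ttl` seconds.

```python
import heapq
import time
from typing import Callable, List, Optional, Tuple

Item = Tuple[float, str]  # (timestamp, url)


class BufferedRecentUpdates:
    """Cache the top-q result of an expensive full scan for `ttl` seconds."""

    def __init__(self, scan: Callable[[], List[Item]], q: int,
                 ttl: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._scan = scan          # hypothetical: scans every source, returns all items
        self._q = q
        self._ttl = ttl
        self._clock = clock        # injectable so expiry can be tested deterministically
        self._cached: Optional[List[Item]] = None
        self._built_at = 0.0
        self.scans = 0             # number of full scans actually performed

    def recent(self) -> List[Item]:
        now = self._clock()
        if self._cached is None or now - self._built_at >= self._ttl:
            # Buffer is empty or expired: pay for one full scan, then reuse it.
            self._cached = heapq.nlargest(self._q, self._scan(), key=lambda it: it[0])
            self._built_at = now
            self.scans += 1
        return list(self._cached)
```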

#Design 2: Active Logging

Rather than scanning all the sources, we create a new table and append an entry to it whenever we make an update.

from time import time

class UpdateLog(ORM.Model):
    id = ORM.Int(primary_key=True)
    time = ORM.TimeType(index=True)  # indexed so "latest q updates" is a cheap ordered read
    url = ORM.String()
    info = ORM.Text()

    @staticmethod
    def log(time, url, info=''):
        # One row per update, so an element may have any number of log entries.
        entry = UpdateLog(time=time, url=url, info=info)
        db.session.add(entry)
        db.session.commit()

def update_post(...):
    ...
    UpdateLog.log(time(), post.url)
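For a runnable version of the logging table, the sketch below swaps the unspecified ORM for Python's built-in `sqlite3` module; the table and function names (`update_log`, `log_update`, `recent_updates`, `prune_before`) are assumptions. The index on `time` lets the "latest q updates" query read rows in order instead of scanning the whole table, and `prune_before` illustrates deleting outdated logs.

```python
import sqlite3
import time
from typing import List, Optional, Tuple


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS update_log ("
        " id INTEGER PRIMARY KEY,"
        " time REAL NOT NULL,"
        " url TEXT NOT NULL,"
        " info TEXT NOT NULL DEFAULT '')"
    )
    # Index on time so ORDER BY time DESC LIMIT q avoids a full table scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_update_log_time ON update_log(time)")
    return conn


def log_update(conn: sqlite3.Connection, url: str, info: str = "",
               when: Optional[float] = None) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO update_log (time, url, info) VALUES (?, ?, ?)",
            (time.time() if when is None else when, url, info),
        )


def recent_updates(conn: sqlite3.Connection, q: int) -> List[Tuple[float, str, str]]:
    return conn.execute(
        "SELECT time, url, info FROM update_log ORDER BY time DESC LIMIT ?", (q,)
    ).fetchall()


def prune_before(conn: sqlite3.Connection, cutoff: float) -> int:
    """Delete log entries older than `cutoff`; returns how many were removed."""
    with conn:
        cur = conn.execute("DELETE FROM update_log WHERE time < ?", (cutoff,))
    return cur.rowcount
```

Note that `/post/a` can be logged twice, once per edit, which Design 1 cannot express.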

Advantages[+] and disadvantages[-]:

[+] Can query directly without scanning all the sources.
[+] Can delete outdated logs to speed up queries.
[+] One element can have multiple update logs.
[+] Can attach extra information.
[-] Have to modify every piece of update logic (update_post/create_post/update_exp/create_exp...) in the API, which is a lot of work.
- Quick fix: log when committing a change to the database. This can be hard to hook into if the ORM package doesn't provide a callback interface.
[-] Uses extra space, since a new table is needed.
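The commit-time quick fix amounts to an observer hook on the session. Since the blog's ORM is not named, the sketch below uses a hypothetical `Session` class with an `on_commit` registration method and a hypothetical `Change` record; real ORMs differ (SQLAlchemy, for instance, exposes this through its event system, such as the session `after_commit` event), so treat every name here as illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Change:
    kind: str   # hypothetical: e.g. "post" or "exp"
    url: str
    time: float


Hook = Callable[[List[Change]], None]


class Session:
    """Hypothetical session that buffers changes and fires hooks on commit."""

    def __init__(self) -> None:
        self._pending: List[Change] = []
        self._hooks: List[Hook] = []

    def on_commit(self, hook: Hook) -> None:
        self._hooks.append(hook)

    def add(self, change: Change) -> None:
        self._pending.append(change)

    def commit(self) -> None:
        committed, self._pending = self._pending, []
        # ... a real session would write `committed` to the database here ...
        for hook in self._hooks:
            hook(committed)


update_log: List[Tuple[float, str]] = []


def log_changes(changes: List[Change]) -> None:
    """Single place that logs every committed change, so no API handler needs editing."""
    for c in changes:
        update_log.append((c.time, c.url))
```

Registering `log_changes` once with `session.on_commit(log_changes)` replaces the per-handler `UpdateLog.log(...)` calls.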

#Design 3: Mixture

If we don’t consider the case where we need multiple logs for one element, then we can compare these two designs in this table:

| Space/Time (for n elements) | m updates (temporally monotonic) | Query the first q |
|---|---|---|
| Design 1 (Fetching) | O(n)/O(m) | O(q)/O(n log q) |
| Design 2 (Logging) | O(m)/O(m) | O(q)/O(q) |

As the table shows, when n is small we should pick the first design: its storage stays at O(n) no matter how many updates arrive, and an O(n log q) scan over a few elements is cheap. This suits sections with few elements but many updates (e.g., the homepage or the billboard). Otherwise, when a source has many elements, we should avoid scanning them one by one on every query.

Based on that, we can build a mixture of Design 1 and Design 2 to balance query time against storage space: choose the design source by source, then merge the per-source results at query time.
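One way the per-source merge could work is sketched below, assuming each source, whether fetched (Design 1) or logged (Design 2), yields `(timestamp, url)` pairs newest-first. The function names are hypothetical. Because every stream is already sorted, `heapq.merge(..., reverse=True)` combines them lazily, and we stop after `q` items.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Item = Tuple[float, str]  # (timestamp, url)


def fetched_source(elements: Iterable[Item], q: int) -> Iterator[Item]:
    """Design 1 source: scan all elements, keep the q newest (O(n log q))."""
    return iter(heapq.nlargest(q, elements, key=lambda it: it[0]))


def logged_source(log_newest_first: Iterable[Item], q: int) -> Iterator[Item]:
    """Design 2 source: the log is assumed already ordered newest-first (e.g. via an index)."""
    return islice(log_newest_first, q)


def recent_updates(sources: List[Iterator[Item]], q: int) -> List[Item]:
    """Lazily merge newest-first streams and take the global top q."""
    merged = heapq.merge(*sources, key=lambda it: it[0], reverse=True)
    return list(islice(merged, q))
```

Each source only needs to contribute its own top `q`, so the merge step itself costs O(q log s) for s sources regardless of which design backs each one.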

#My choice

Design 2, since I need multiple logs for one element.