Brief Notes (CS)

Started on October 14, 2018


Brief notes/summaries of some random pieces I have read. These takeaways may be incomplete, so please check the original links for a better understanding.

10/14/2018 20:43

Spark on YARN

- Comment out (or uncomment) the YARN settings in $SPARK_HOME/conf/spark-defaults.conf (e.g. the `spark.master yarn` line) to disable (or enable) YARN

10/24/2018 03:27

Web Server, CGI, FastCGI, WSGI and uWSGI

What I read: Plain explanation, Document


  • A web server (Nginx/Apache) is a daemon that runs in the background and bridges incoming requests to the application.
  • The application implements the actual functionality.
  • CGI is the oldest server-app protocol. Classic CGI starts a new process for every incoming request, which causes performance issues. The alternative was to embed the interpreter into the web server (e.g. mod_python).
  • FastCGI improves on CGI by keeping a long-running daemon that handles many requests, instead of spawning a process per request.
  • WSGI is a newer server-app interface designed specifically for Python. The application doesn’t have to care what the server layer uses, e.g. mod_python or FastCGI.
  • uWSGI is a server that implements WSGI and bridges the web server (Nginx/Apache) with the web application.
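To make the WSGI contract concrete, here is a minimal sketch using only the standard library. The names `application` and `serve` are my own choices for illustration; the only thing WSGI itself fixes is the callable signature `(environ, start_response)` returning an iterable of bytes.

```python
from typing import Callable, Dict, Iterable, List, Tuple
from wsgiref.simple_server import make_server

StartResponse = Callable[[str, List[Tuple[str, str]]], object]


def application(environ: Dict[str, object], start_response: StartResponse) -> Iterable[bytes]:
    """A WSGI app is just a callable taking (environ, start_response)."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}".encode("utf-8")
    # Hand the status line and headers to the server before returning the body.
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # an iterable of bytes chunks


def serve(port: int = 8000) -> None:
    """Serve `application` with the stdlib reference server (local testing only)."""
    with make_server("127.0.0.1", port, application) as server:
        server.serve_forever()
```

Calling `serve()` and opening http://127.0.0.1:8000/hi should return `Hello from /hi`. The app never knows which server runs it: in production the same callable could be handed to uWSGI instead (something like `uwsgi --http :9090 --wsgi-file app.py`, since uWSGI looks for a callable named `application` by default), with Nginx in front.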

10/24/2018 03:41

Metaclass in Python

Reading Material: this blog (in Chinese) 


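  • An instance of a class is an object, and an instance of a metaclass is a class.
  • The default metaclass is `type`.
    • MyClass = type(name, bases, attrs)
  • A class is created by the `__new__` method of its metaclass (and then initialized by the metaclass’s `__init__`); in general `__new__` creates an object, while `__init__` only initializes it.
  • A typical use case is an ORM: the metaclass runs every time a new Model subclass is defined, so it can build that class with extra information (e.g. its fields) attached.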
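A minimal sketch of the ORM idea, assuming a made-up mini ORM: `Field`, `ModelMeta`, `Model`, `User` and `insert_sql` are hypothetical names for illustration, not any real library’s API. The metaclass’s `__new__` runs once per class definition, collects the declared fields, and attaches a table name.

```python
from typing import Any, Dict, Tuple


class Field:
    """Hypothetical column declaration used by this sketch."""

    def __init__(self, column_type: str) -> None:
        self.column_type = column_type


class ModelMeta(type):
    """Runs once per class definition, like ORM metaclasses do."""

    def __new__(mcs, name: str, bases: Tuple[type, ...], attrs: Dict[str, Any]):
        # Pull the Field declarations out of the class body.
        fields = {key: value for key, value in attrs.items() if isinstance(value, Field)}
        for key in fields:
            del attrs[key]
        attrs["__fields__"] = fields
        attrs.setdefault("__table__", name.lower())
        return super().__new__(mcs, name, bases, attrs)


class Model(metaclass=ModelMeta):
    def __init__(self, **values: Any) -> None:
        for key in self.__fields__:
            setattr(self, key, values.get(key))

    def insert_sql(self) -> str:
        columns = ", ".join(self.__fields__)
        placeholders = ", ".join("?" for _ in self.__fields__)
        return f"INSERT INTO {self.__table__} ({columns}) VALUES ({placeholders})"


class User(Model):  # ModelMeta.__new__ runs right here, at definition time
    id = Field("INTEGER")
    name = Field("TEXT")
```

`type(User)` is `ModelMeta`, and `User(id=1, name="alice").insert_sql()` gives `INSERT INTO user (id, name) VALUES (?, ?)`. Calling `ModelMeta("Dynamic", (Model,), {"score": Field("REAL")})` directly is the same pattern as `type(name, bases, attrs)` above, just with a custom metaclass.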

10/24/2018 03:41

Inter-Process Communication

Readings: details (in Chinese), overall (in Chinese).


  • Pipe: one direction only (half-duplex); only between related processes (parent-child or siblings); lives in kernel memory; first in first out; data is consumed once it is read; the writer blocks when the buffer is full; the other end must still be open (writing with no reader fails with a "broken pipe" error / SIGPIPE).
  • FIFO (named pipe): identified by a file path, but the content is stored in memory; can connect unrelated processes; no seek (strictly first in first out); the other end must be open (opening one end blocks until the other end is opened).
  • Signal: no need to confirm that the other process is alive; the kernel keeps a signal pending while the receiving process is asleep; the receiver can block a signal for a while; common signals are SIGKILL/SIGTERM; a software simulation of interrupts.
  • Message Queue: a linked list of messages kept in the kernel; it survives until it is removed or the kernel restarts (unlike a Pipe/FIFO, whose data lives only while processes hold it open); first in first out, but messages can also be taken out of order (by type in System V, by priority in POSIX); multiple processes can read/write; two major types (POSIX, System V).
  • Shared memory: the kernel reserves a memory region and maps it into several processes; requires a synchronization mechanism such as a mutex or semaphore.
  • Semaphore: a counter (a positive value is the number of available resources); a process blocks when the counter would drop below 0; +1 and -1 are atomic operations.
    • Mutex vs semaphore: mutual exclusion vs synchronization; binary vs integer counter; does a semaphore guarantee ordering?
  • Socket: works both locally and over the network; AF_INET uses IP:PORT whilst AF_UNIX uses a file path; bidirectional.
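A small POSIX-only sketch (it relies on `os.fork`, so it won’t run on Windows) contrasting two of the mechanisms above. The helper names `pipe_roundtrip` and `socketpair_echo` are hypothetical. The pipe shows one-way parent-child transfer and EOF once every write end is closed; the socket pair shows that a socket is bidirectional.

```python
import os
import socket


def pipe_roundtrip(message: bytes) -> bytes:
    """Child writes into an anonymous pipe; parent reads until EOF (POSIX only)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: the writer
        os.close(read_fd)
        os.write(write_fd, message)  # fine for small messages; a robust version loops on partial writes
        os.close(write_fd)
        os._exit(0)
    # parent: the reader
    os.close(write_fd)  # otherwise the parent's own write end would keep EOF from ever arriving
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # EOF: every write end is closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child
    return b"".join(chunks)


def socketpair_echo(message: bytes) -> bytes:
    """A connected socket pair is bidirectional: each end can send and receive."""
    a, b = socket.socketpair()
    with a, b:
        a.sendall(message)
        data = b.recv(4096)
        b.sendall(data.upper())  # reply on the same connection, other direction
        return a.recv(4096)
```

`pipe_roundtrip(b"hello")` should return `b"hello"`, and `socketpair_echo(b"ping")` should return `b"PING"`. For the single-direction pipe, a reply would need a second pipe.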

10/24/2018 04:32

CSR/SSR React Isomorphic

Reading: blog (in Chinese), blog (in Chinese)


  • CSR: client-side rendering
  • SSR: server-side rendering
  • Problems with CSR:
    • CSR has a long TTFP (Time To First Page)
    • CSR has poor SEO (search engine optimization)
  • Isomorphic: the same JS renders the page on the server first, then the client takes over; reduces TTFP and improves SEO; server-side and client-side routing code differ; use a proxy to handle the cookie problem (when the server has to request other APIs)
  • Isomorphic Redux: the store cannot be a singleton on the server (it serves multiple clients), so a new store is created per request.
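The singleton-store pitfall is language-agnostic, so here is a tiny Python sketch of it (not React/Redux code). `Store`, `user_reducer`, `render_page` and the two handlers are hypothetical stand-ins for a Redux-like store and an SSR request handler. With one module-level store, state from one client’s request leaks into the next client’s page.

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]
Action = Dict[str, Any]


def user_reducer(state: State, action: Action) -> State:
    if action["type"] == "LOGIN":
        return {**state, "user": action["user"]}
    return state


class Store:
    """Tiny Redux-like store (hypothetical; not the real redux API)."""

    def __init__(self, reducer: Callable[[State, Action], State], initial: State) -> None:
        self._reducer = reducer
        self._state = initial

    def dispatch(self, action: Action) -> None:
        self._state = self._reducer(self._state, action)

    def get_state(self) -> State:
        return self._state


def render_page(store: Store, user: Optional[str]) -> str:
    """Hypothetical SSR handler: fill the store for this request, then render."""
    if user is not None:
        store.dispatch({"type": "LOGIN", "user": user})
    return f"<p>user={store.get_state()['user']}</p>"


# Anti-pattern: one module-level store shared by every request.
SHARED_STORE = Store(user_reducer, {"user": None})


def handle_shared(user: Optional[str]) -> str:
    return render_page(SHARED_STORE, user)


def handle_per_request(user: Optional[str]) -> str:
    # A fresh store for each request, so no state leaks between clients.
    return render_page(Store(user_reducer, {"user": None}), user)
```

After `handle_shared("alice")`, an anonymous `handle_shared(None)` still renders `user=alice`, while `handle_per_request(None)` renders `user=None`. Concurrent requests would make the shared version even less predictable.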

10/27/2018 14:49

Duck Typing and Monkey Patching

Readings: Duck Typing, Monkey Patching


  • In Duck Typing, whether an object is suitable for an operation depends on its methods/properties, not its type (class/base class).
    • E.g. a class with `__iter__` and `__next__` can be used as an iterator even though it doesn’t inherit from any iterator base class.
  • With monkey patching, we can replace methods/attributes/functions at runtime.
    • Pros: can be applied locally (e.g. for tests) / extends the original code / fixes can be distributed together with the source code
    • Cons: only applied locally / can be misleading
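A short sketch of both ideas. `Countdown`, `Greeter`, `patched_greet` and `greet_while_patched` are hypothetical names. `Countdown` inherits from nothing, yet `for` loops accept it, and `collections.abc.Iterator` even recognizes it through its `__subclasshook__`, which only checks for the two methods. The patch uses `unittest.mock.patch.object`, which restores the original method when the `with` block exits.

```python
from collections.abc import Iterator
from unittest.mock import patch


class Countdown:
    """Inherits from nothing; it is an iterator only because of its methods."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self) -> "Countdown":
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


class Greeter:
    def greet(self) -> str:
        return "hello"


def patched_greet(self: Greeter) -> str:
    return "patched hello"


def greet_while_patched() -> str:
    # Monkey patch the class at runtime; patch.object undoes it on exit.
    with patch.object(Greeter, "greet", patched_greet):
        return Greeter().greet()
```

`list(Countdown(3))` gives `[3, 2, 1]`, and `isinstance(Countdown(1), Iterator)` is `True`. Assigning `Greeter.greet = patched_greet` by hand would instead change every `Greeter` in the process until someone reverts it, which is where the "can be misleading" con comes from.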

11/06/2018 01:26

Spark Streaming/Flink/Storm

Readings: zhihu1, zhihu2, Storm vs Flink


  • Spark:
    • Micro-batch streaming
    • With Spark on YARN (cluster mode), Spark Streaming has to communicate with the driver for each batch, which causes high latency
  • Storm
    • Native streaming
    • Depends on ZooKeeper to coordinate Nimbus (master) and Supervisors (slaves)
    • Stateless
  • Flink
    • Native streaming
    • Higher throughput and lower latency (than Storm)
    • Stateful (exactly-once semantics)
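A toy latency model of micro-batch vs native streaming, under loud assumptions: times are integer milliseconds, every event has the same fixed processing cost, queueing is ignored, and `per_batch_overhead_ms` is a hypothetical stand-in for the per-batch driver round trip mentioned for Spark on YARN. It is not how any of these engines is implemented, just why waiting for a batch to close adds latency.

```python
from typing import List

# Toy model: integer milliseconds, fixed per-event cost, no queueing.


def native_latencies(arrivals_ms: List[int], cost_ms: int) -> List[int]:
    """Native streaming (Storm/Flink style): each event is processed on arrival."""
    return [cost_ms for _ in arrivals_ms]


def micro_batch_latencies(arrivals_ms: List[int], interval_ms: int,
                          cost_ms: int, per_batch_overhead_ms: int = 0) -> List[int]:
    """Micro-batching (Spark Streaming style): an event waits until its batch
    closes at the next interval boundary, then the whole batch is scheduled."""
    latencies = []
    for t in arrivals_ms:
        batch_close = (t // interval_ms + 1) * interval_ms  # end of this event's window
        latencies.append(batch_close - t + per_batch_overhead_ms + cost_ms)
    return latencies
```

For events at 100, 500 and 900 ms with a 1000 ms batch interval and 50 ms cost, micro-batching gives latencies of 950, 550 and 150 ms, while native streaming gives 50 ms for each. The upside of batching (amortizing the per-batch overhead over many events) is what buys throughput.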

02/14/2019 14:38

Goroutine vs Thread


Thread’s disadvantages:

  • Large stack size per thread (>=1MB)
  • Many registers to save and restore on each context switch
  • Setup and teardown require calls into the OS (e.g. for memory), which is expensive


  • Goroutines exist only in the virtual space of the Go runtime (not in the OS).
  • The Go runtime maintains three C structs:
    • G struct: a single goroutine (stack pointer, base of stack, ID, cache, status)
    • M struct: an OS thread (thread info, pointer to the global queue of runnable goroutines, the currently running goroutine, reference to the scheduler)
    • Sched struct: a global struct (contains the queues of free and waiting goroutines and threads)
  • At startup, the Go runtime starts a few goroutines for the GC, the scheduler and user code.
  • A goroutine takes only 2KB of stack when created. The stack grows by allocating a new one twice the size and copying the old contents over.
  • If a thread blocks because its running goroutine is blocked on a system call, another thread is taken from the scheduler’s waiting queue and used to run other runnable goroutines.
  • Channel communication happens in the runtime’s virtual space, hence the OS doesn’t block the thread.
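A toy model of the "double and copy" stack growth, written in Python with a `bytearray` standing in for stack memory. `GrowableStack` is a hypothetical name; the real runtime grows the stack on checks at function entry, not on byte pushes, and also has to adjust pointers into the old stack, which this sketch skips.

```python
class GrowableStack:
    """Toy goroutine stack: starts at 2KB, doubles and copies when full."""

    INITIAL_SIZE = 2 * 1024  # 2KB, as in the note above

    def __init__(self) -> None:
        self.buf = bytearray(self.INITIAL_SIZE)
        self.used = 0    # bytes currently in use
        self.copies = 0  # how many times the stack was moved

    def push(self, frame: bytes) -> None:
        # Grow until the new frame fits, then write it after the used part.
        while self.used + len(frame) > len(self.buf):
            self._grow()
        self.buf[self.used:self.used + len(frame)] = frame
        self.used += len(frame)

    def _grow(self) -> None:
        new_buf = bytearray(len(self.buf) * 2)      # allocate twice the size
        new_buf[:self.used] = self.buf[:self.used]  # copy only the live part
        self.buf = new_buf
        self.copies += 1
```

Pushing 5000 bytes onto a fresh stack grows it 2KB → 4KB → 8KB, i.e. two copies. Doubling keeps the total copying cost proportional to the final size, which is why growth stays cheap even though every grow is a full copy.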

Go Scheduler

  • Cooperative scheduling: another goroutine is scheduled only when the running one blocks or finishes, e.g. at (since Go 1.14, long-running goroutines can also be preempted asynchronously):
    • Channel send/receive operations (if the operation blocks)
    • Go statements
    • Blocking syscalls (file/network/other IO ...)
    • Being stopped for a GC cycle
  • Considered better than pre-emptive scheduling, which uses periodic timer interrupts (e.g. every 10 ms, i.e. 100 times per second) to suspend the running thread and schedule a new one
  • Since switching happens at well-defined points in the code (e.g. during sleep or a channel wait), only 3 registers (IP, SP, DX) are updated during a context switch.
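A minimal cooperative scheduler sketch in Python, using generators as stand-ins for goroutines. `run`, `worker` and `Task` are hypothetical names, and there are no OS threads, channels or M:N mapping here; each bare `yield` plays the role of a scheduling point such as a blocking channel operation or syscall. Control only changes hands when a task yields or finishes.

```python
from collections import deque
from typing import Deque, Generator, Iterable, List

# A "task" is a generator; each bare `yield` marks a point where it
# voluntarily hands control back to the scheduler.
Task = Generator[None, None, None]


def run(tasks: Iterable[Task]) -> None:
    """Round-robin cooperative scheduler on a single OS thread."""
    ready: Deque[Task] = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)       # run the task until its next yield point
        except StopIteration:
            continue         # task finished: drop it
        ready.append(task)   # task yielded: requeue it at the back


def worker(name: str, steps: int, log: List[str]) -> Task:
    for i in range(steps):
        log.append(f"{name}{i}")
        yield                # hypothetical scheduling point
```

Running `run([worker("a", 2, log), worker("b", 3, log)])` fills `log` with `["a0", "b0", "a1", "b1", "b2"]`. A task that never yields would starve all the others, which is the main weakness of purely cooperative scheduling.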

02/14/2019 15:35

How AOP (Aspect-Oriented Programming) works: Code Weaving

Sources: StackOverflow (Comprehensive), CNblogs (Chinese), StackOverflow (Spring AOP)


  • The key idea is to weave one piece of code (the aspect) into another:
    • Source Code Weaving:
      • How: copy and paste (via some preprocessor)
      • Good for performance; no runtime dependency attached
      • Hard to debug/build separately; large compiled files
      • Overall: outdated
    • Compile-Time Weaving:
      • How: a special compiler
      • No runtime overhead
      • Cannot defer decisions to runtime; needs the source code (not good for 3rd-party libs)
    • Binary Weaving:
      • How: weave during "linking"
      • No need for source code; avoids the overhead of load-time weaving
      • Cannot cancel an aspect once it is woven into the code (possible with an `if()` guard, but inefficient)
    • Load-Time Weaving:
      • How: a weaving agent/library is loaded when VM/container is started. A configuration file will describe which aspect should be woven into which class.
      • Can dynamically decide what/if to weave; same efficiency as compile-time weaving/binary weaving
      • Overhead during application start-up (when class-loading occurs)
    • Proxy-based Load-time weaving:
      • Used by Spring AOP: beans are wrapped in proxies at runtime (JDK dynamic proxies or CGLIB); it reuses AspectJ’s annotations and pointcut language, but not the AspectJ compiler.
      • (???) No special advantage besides framework support
      • Limited to public, non-static methods, with runtime overhead due to the proxy-based approach; does not capture internal method calls (calls within the object bypass the proxy); some pointcuts are not supported (e.g. constructors)
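A Python sketch of the proxy-based approach and its main limitation. `LoggingProxy` and `OrderService` are hypothetical and this is not Spring’s API; it only mimics the idea: callers hold a proxy that wraps public methods with before/after advice, but a method that calls another method on `self` talks to the real object, so the inner call is never advised.

```python
import functools
from typing import Any, List


class LoggingProxy:
    """Minimal proxy-based 'aspect': before/after advice on public methods."""

    def __init__(self, target: Any, log: List[str]) -> None:
        self._target = target
        self._log = log

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found on the proxy itself.
        attr = getattr(self._target, name)
        if name.startswith("_") or not callable(attr):
            return attr  # leave private names and plain attributes alone

        @functools.wraps(attr)
        def advised(*args: Any, **kwargs: Any) -> Any:
            self._log.append(f"before {name}")
            result = attr(*args, **kwargs)
            self._log.append(f"after {name}")
            return result

        return advised


class OrderService:
    def place(self, item: str) -> bool:
        return self.validate(item)  # internal call: goes to self, not the proxy

    def validate(self, item: str) -> bool:
        return bool(item)
```

With `svc = LoggingProxy(OrderService(), log)`, calling `svc.place("book")` leaves `log` as `["before place", "after place"]`: the nested `validate` call is invisible to the aspect, whereas calling `svc.validate("book")` directly would be advised.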

02/15/2019 21:03

TODO: React Context