Introduction:
Brief notes/summaries of random pieces I read. These takeaways may be incomplete; please check the original links for a better understanding.
10/24/2018 03:27
CGI/FastCGI/WSGI/uWSGI
What I read: Plain explanation, Document
Takeaway:
- A web server (Nginx/Apache) is a daemon that runs in the background and bridges incoming requests to the application.
- The application implements the actual functionality.
- CGI is the oldest server-app protocol. Classic CGI starts a new process for every request, which causes performance problems; a common workaround is to embed the interpreter into the web server (e.g. mod_php/mod_python).
- FastCGI improves on CGI by keeping a long-running daemon that serves many requests.
- WSGI is a newer server-app interface designed specifically for Python; the application doesn't need to care what sits on the other side (mod_python, FastCGI, etc.).
- uWSGI is an application server that implements WSGI and bridges the web server (Nginx/Apache) with the web application (minimal sketch below).
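For reference, a minimal WSGI application using only the standard library; any WSGI server (uWSGI included) can host this callable, which is why the app doesn't care what sits on the other side:
```python
# Minimal WSGI app served with the stdlib's wsgiref for local testing;
# in production, uWSGI/Nginx would replace this development server.
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # environ: CGI-style request dict; start_response: callback for status/headers
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello from a WSGI app\n"]

if __name__ == "__main__":
    with make_server("", 8000, app) as server:
        server.serve_forever()
```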
10/24/2018 03:41
Metaclass in Python
Reading Material: this blog (in Chinese)
Takeaway:
- Instantiating a class yields an object; instantiating a metaclass yields a class.
- The default metaclass is `type`: `MyClass = type(name, bases, attrs)`.
- Objects are created by the `__new__` method of their (meta)class, not by `__init__`, which only initializes an already-created instance.
- A good example is an ORM, which creates a new class every time a new Model is defined (see the sketch below).
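A minimal sketch of the ORM idea (the class and field names here are made up, not from a real ORM):
```python
# A metaclass collects the "columns" declared on each Model subclass at class
# creation time, i.e. inside ModelMeta.__new__, not __init__.
class ModelMeta(type):
    def __new__(mcs, name, bases, attrs):
        attrs["_fields"] = [k for k, v in attrs.items()
                            if isinstance(v, str) and not k.startswith("_")]
        return super().__new__(mcs, name, bases, attrs)

class Model(metaclass=ModelMeta):
    pass

class User(Model):        # ModelMeta.__new__ runs here, when the class is defined
    name = "varchar(255)"
    email = "varchar(255)"

print(User._fields)       # ['name', 'email']
print(type(User))         # <class '__main__.ModelMeta'>

# Equivalent dynamic class creation with the default metaclass:
Point = type("Point", (), {"x": 0, "y": 0})
```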
10/24/2018 03:41
Inter-Process Communication
Readings: details (in Chinese), overall (in Chinese).
Takeaway:
- Pipe: one direction at a time; only between parent-child or sibling processes; lives in memory; first in, first out; data is removed from the buffer once read; writes block when the buffer is full; the writer must make sure the other end is still alive (otherwise the pipe breaks).
- FIFO (named pipe): identified by a file path, but the content is still kept in memory; can bridge unrelated processes; doesn't support seek (strictly first in, first out); one end must make sure the other end is alive (otherwise it blocks).
- Signal: doesn't need to confirm the other process is alive; the kernel keeps the signal pending if the process is asleep; the receiver can block a signal for a while; common signals include SIGKILL/SIGTERM; essentially a software simulation of a hardware interrupt.
- Message queue: a linked list of messages kept in the kernel, destroyed only when the kernel restarts (unlike Pipe/FIFO, which live only as long as the processes); first in, first out, but messages can also be fetched selectively (by type); multiple processes can read/write; two major flavors (POSIX, System V).
- Shared memory: the kernel reserves a region of memory and maps it into different processes' address spaces; requires a synchronization mechanism such as a mutex or semaphore (see the sketch after this list).
- Semaphore: a counter (a positive value means the number of available resources); a process blocks when the counter would drop below 0; the +1 and -1 operations are atomic.
- Mutex vs. semaphore: mutual exclusion vs. synchronization; binary vs. counting; (does a semaphore guarantee wake-up order?)
- Socket: works both locally and across the network; AF_INET uses IP:PORT whereas AF_UNIX uses a file path; bidirectional.
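A small sketch of two of these mechanisms via Python's multiprocessing module (a convenience wrapper over the underlying OS primitives):
```python
# Pipe for parent<->child messaging, plus a semaphore guarding shared memory.
from multiprocessing import Pipe, Process, Semaphore, Value

def child(conn, counter, sem):
    conn.send("hello from child")   # goes through the pipe to the parent
    conn.close()
    with sem:                       # semaphore: -1 on acquire, +1 on release
        counter.value += 1          # shared memory mapped into both processes

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    counter = Value("i", 0)         # a shared C int
    sem = Semaphore(1)              # binary semaphore, i.e. mutex-like

    p = Process(target=child, args=(child_conn, counter, sem))
    p.start()
    print(parent_conn.recv())       # blocks until the child writes
    p.join()
    print(counter.value)            # 1
```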
10/24/2018 04:32
CSR/SSR React Isomorphic
Reading: blog (in Chinese), blog (in Chinese)
Takeaway:
- CSR: client-side rendering
- SSR: server-side rendering
- problems:
- CSR has long TTFP (Time To First Page)
- CSR has poor SEO (search engine optimization)
- Isomorphic: render the JS on the server side first; used to reduce TTFP and improve SEO; server side and client side need different routing code; use a proxy to handle the cookie problem (when the server has to request other APIs on behalf of the client).
- Isomorphic Redux: the store cannot be a singleton on the server (since one server serves multiple clients), so create a fresh store per request.
10/27/2018 14:49
Duck Typing and Monkey Patching
Readings: Duck Typing, Monkey Patching
Takeaway:
- In Duck Typing, whether an object is suitable for an operation depends on its methods/properties, not its type (class/base class).
- E.g. a class with `__iter__` and `__next__` can be considered iterable even though it doesn't inherit from `collections.abc.Iterable`.
- Using monkey patching, we can replace methods/attributes/functions at runtime (see the sketch below).
- Pros: can be applied locally (e.g. in tests) / extends the original code without touching it / lets fixes be distributed alongside, rather than inside, the source code.
- Cons: the patch is only applied locally (other copies of the code remain unfixed) / can be misleading to readers of the original code.
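Both ideas in one short sketch (the class and function names are made up for illustration):
```python
import collections.abc

class Countdown:
    """Duck-typed iterable: defines __iter__/__next__ without inheriting Iterable."""
    def __init__(self, start):
        self.current = start
    def __iter__(self):
        return self
    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

print(list(Countdown(3)))                                  # [3, 2, 1]
print(isinstance(Countdown(3), collections.abc.Iterable))  # True, via a subclass hook

# Monkey patching: swap a method on the class at runtime (process-wide).
def noisy_next(self):
    print("tick")
    raise StopIteration

Countdown.__next__ = noisy_next
print(list(Countdown(2)))                                  # prints "tick", then []
```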
11/06/2018 01:26
Spark Stream/Flink/Storm
Readings: zhihu1, zhihu2, Storm vs Flink
Takeaway:
- Spark:
- Micro-batch streaming (a minimal PySpark sketch follows this list)
- With Spark on YARN (cluster mode), Spark Streaming has to go through the driver to schedule every micro-batch, which adds latency
- Storm
- Native streaming
- Depends on ZooKeeper to coordinate Nimbus (master) and Supervisors (slaves)
- Stateless
- Flink
- Native streaming
- Higher throughput than Storm, with low latency
- Stateful, with exactly-once guarantees
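A minimal PySpark Structured Streaming sketch to illustrate the micro-batch model (the host/port and trigger interval are arbitrary example values):
```python
# Word count over a socket source; the driver schedules one small batch job
# per trigger, so end-to-end latency is bounded below by the trigger interval.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("micro-batch-demo").getOrCreate()

lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .trigger(processingTime="5 seconds")
         .start())
query.awaitTermination()
```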
02/14/2019 14:38
Goroutine vs Thread
Source: https://codeburst.io/why-goroutines-are-not-lightweight-threads-7c460c1f155f
Threads' disadvantages:
- Large stack size per thread (>=1MB)
- A lot of registers to store when switching
- Setup and teardown require requesting resources from the OS and handing them back, which is costly
Goroutine
- Goroutines exist only in the virtual space of go runtime (not in OS)
- Go Runtime maintains three C structs:
- G Struct: a single goroutine (stack pointer, base of stack, ID, cache, status)
- M Struct: an OS thread (thread info + a pointer to the global queue of runnable goroutines, the currently running goroutine, and a reference to the scheduler)
- Sched Struct: a single global struct (holds the queues of free/waiting goroutines and of threads)
- At startup, the Go runtime launches a few goroutines for GC, the scheduler, and the user code.
- A goroutine takes only 2KB of stack when created. The stack can be doubled and copied when it needs to grow.
- If a thread is blocked because its running goroutine gets blocked on a system call, another thread is taken from the scheduler's waiting queue and used to run other runnable goroutines.
- Communication using channels happens in the virtual space, so the OS doesn't need to block the thread.
Go Scheduler
- Cooperative scheduling: another goroutine is only scheduled when the running one blocks or finishes, e.g. at:
- Channel send/receive operations (if they would block)
- Go statement.
- Blocking syscalls (file/network/other IO ...)
- Being stopped for a GC cycle.
- Better (cheaper) than pre-emptive scheduling, which uses periodic timer interrupts (e.g. every 10 ms / 100 CPU clocks) to stop the running thread and schedule a new one
- Since the switch is invoked implicitly by the code itself (e.g. during a sleep or a channel wait), only 3 registers (IP, SP, DX) need to be updated during a context switch (a loose Python analogy of cooperative scheduling follows this list).
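The notes above are Go-specific, but cooperative scheduling itself is easy to see with Python's asyncio (a loose analogy only, not how the Go runtime works internally):
```python
# Tasks interleave on a single OS thread and only yield control at explicit
# suspension points (await) -- no timer-based preemption involved.
import asyncio

async def worker(name, delay):
    for i in range(3):
        print(f"{name} step {i}")
        await asyncio.sleep(delay)   # yield point: the event loop picks another task

async def main():
    await asyncio.gather(worker("A", 0.1), worker("B", 0.1))

asyncio.run(main())
```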
02/14/2019 15:35
How AOP (Aspect-oriented-programming) works: Code Weaving
Sources: StackOverflow (Comprehensive), CNblogs (Chinese), StackOverflow (Spring AOP)
Takeaway:
- The key is to weave one piece of code (the aspect) into another:
- Source Code Weaving:
- How: copy and paste (via some preprocessor)
- Good performance; no dependencies attached
- Hard to debug/build separately; large compiled files
- Overall: outdated
- Compile-Time Weaving:
- How: a special compiler
- No runtime overhead
- Cannot defer decisions to runtime; Need to have the source code (not good for 3rd party libs)
- Binary Weaving:
- How: weave during "linking"
- No need for source code; avoids the overhead of load-time weaving
- Cannot cancel an aspect once it is woven into the code (can be worked around with `if()` guards, but that is inefficient)
- Load-Time Weaving:
- How: a weaving agent/library is loaded when the VM/container starts; a configuration file describes which aspect should be woven into which class.
- Can dynamically decide what/if to weave; same efficiency as compile-time weaving/binary weaving
- Overhead during application start-up (when class-loading occurs)
- Proxy-based Load-time weaving:
- Used by Spring AOP; a mixture of CTW/BW/LTW; compiled with AspectJ
- No special advantage in itself (?); mainly framework support
- Limited to public, non-static methods, with runtime overhead due to the proxy-based approach; does not capture internal (self-)method calls, since those never go through the proxy; some special pointcuts (e.g. constructors) are not supported (a rough Python analogy of the proxy idea follows)
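A rough Python analogy of the proxy-based approach (not Spring AOP itself): a proxy object weaves "advice" around the target's public methods, which also shows why internal self-calls escape the aspect:
```python
import functools

class Service:
    def save(self, item):
        print(f"saving {item}")
        self.audit(item)             # internal self-call: bypasses the proxy,
                                     # mirroring the Spring AOP limitation above
    def audit(self, item):
        print(f"auditing {item}")

class LoggingProxy:
    """Weaves a logging aspect around every public method of the wrapped target."""
    def __init__(self, target):
        self._target = target
    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if callable(attr) and not name.startswith("_"):
            @functools.wraps(attr)
            def advised(*args, **kwargs):
                print(f"[aspect] before {name}")   # "before" advice
                result = attr(*args, **kwargs)
                print(f"[aspect] after {name}")    # "after" advice
                return result
            return advised
        return attr

proxy = LoggingProxy(Service())
proxy.save("order-42")
# [aspect] before save / saving order-42 / auditing order-42 / [aspect] after save
# (no "[aspect] before audit": the internal call never went through the proxy)
```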