Introduction:
Brief notes/summaries of random pieces I read. These takeaways may be incomplete; please check the original links for a better understanding.
10/24/2018 03:27
CGI/FastCGI/WSGI/uWSGI
What I read: Plain explanation, Document
Takeaway:
- A web server (Nginx/Apache) is a daemon that runs in the background and bridges incoming requests to the application.
- The application implements the actual functionality.
- CGI is the oldest server-app protocol. Classic CGI starts a new process for every request, which causes performance problems; a common workaround is to embed the interpreter into the web server (e.g. mod_php/mod_python).
- FastCGI improves on CGI by keeping a long-running daemon that serves many requests.
- WSGI is a newer server-app interface designed specifically for Python; the application doesn't need to care what sits on the other side (mod_python, FastCGI, etc.).
- uWSGI is an application server that implements WSGI and bridges the web server (Nginx/Apache) with the web application (minimal sketch below).
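For reference, a minimal WSGI application using only the standard library; any WSGI server (uWSGI included) can host this callable, which is why the app doesn't care what sits on the other side:
```python
# Minimal WSGI app served with the stdlib's wsgiref for local testing;
# in production, uWSGI/Nginx would replace this development server.
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # environ: CGI-style request dict; start_response: callback for status/headers
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello from a WSGI app\n"]

if __name__ == "__main__":
    with make_server("", 8000, app) as server:
        server.serve_forever()
```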
10/24/2018 03:41
Metaclass in Python
Reading Material: this blog (in Chinese)
Takeaway:
- Instantiating a class yields an object; instantiating a metaclass yields a class.
- The default metaclass is `type`: `MyClass = type(name, bases, attrs)`.
- Objects are created by the `__new__` method of their (meta)class, not by `__init__`, which only initializes an already-created instance.
- A good example is an ORM, which creates a new class every time a new Model is defined (see the sketch below).
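A minimal sketch of the ORM idea (the class and field names here are made up, not from a real ORM):
```python
# A metaclass collects the "columns" declared on each Model subclass at class
# creation time, i.e. inside ModelMeta.__new__, not __init__.
class ModelMeta(type):
    def __new__(mcs, name, bases, attrs):
        attrs["_fields"] = [k for k, v in attrs.items()
                            if isinstance(v, str) and not k.startswith("_")]
        return super().__new__(mcs, name, bases, attrs)

class Model(metaclass=ModelMeta):
    pass

class User(Model):        # ModelMeta.__new__ runs here, when the class is defined
    name = "varchar(255)"
    email = "varchar(255)"

print(User._fields)       # ['name', 'email']
print(type(User))         # <class '__main__.ModelMeta'>

# Equivalent dynamic class creation with the default metaclass:
Point = type("Point", (), {"x": 0, "y": 0})
```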
10/24/2018 03:41
Inter-Process Communication
Readings: details (in Chinese), overall (in Chinese).
Takeaway:
- Pipe: one direction at a time; only between parent-child or sibling processes; lives in memory; first in, first out; data is removed from the buffer once read; writes block when the buffer is full; the writer must make sure the other end is still alive (otherwise the pipe breaks).
- FIFO (named pipe): identified by a file path, but the content is still kept in memory; can bridge unrelated processes; doesn't support seek (strictly first in, first out); one end must make sure the other end is alive (otherwise it blocks).
- Signal: doesn't need to confirm the other process is alive; the kernel keeps the signal pending if the process is asleep; the receiver can block a signal for a while; common signals include SIGKILL/SIGTERM; essentially a software simulation of a hardware interrupt.
- Message queue: a linked list of messages kept in the kernel, destroyed only when the kernel restarts (unlike Pipe/FIFO, which live only as long as the processes); first in, first out, but messages can also be fetched selectively (by type); multiple processes can read/write; two major flavors (POSIX, System V).
- Shared memory: the kernel reserves a region of memory and maps it into different processes' address spaces; requires a synchronization mechanism such as a mutex or semaphore (see the sketch after this list).
- Semaphore: a counter (a positive value means the number of available resources); a process blocks when the counter would drop below 0; the +1 and -1 operations are atomic.
- Mutex vs. semaphore: mutual exclusion vs. synchronization; binary vs. counting; (does a semaphore guarantee wake-up order?)
- Socket: works both locally and across the network; AF_INET uses IP:PORT whereas AF_UNIX uses a file path; bidirectional.
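A small sketch of two of these mechanisms via Python's multiprocessing module (a convenience wrapper over the underlying OS primitives):
```python
# Pipe for parent<->child messaging, plus a semaphore guarding shared memory.
from multiprocessing import Pipe, Process, Semaphore, Value

def child(conn, counter, sem):
    conn.send("hello from child")   # goes through the pipe to the parent
    conn.close()
    with sem:                       # semaphore: -1 on acquire, +1 on release
        counter.value += 1          # shared memory mapped into both processes

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    counter = Value("i", 0)         # a shared C int
    sem = Semaphore(1)              # binary semaphore, i.e. mutex-like

    p = Process(target=child, args=(child_conn, counter, sem))
    p.start()
    print(parent_conn.recv())       # blocks until the child writes
    p.join()
    print(counter.value)            # 1
```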
10/24/2018 04:32
CSR/SSR React Isomorphic
Reading: blog (in Chinese), blog (in Chinese)
Takeaway:
- CSR: client-side rendering
- SSR: server-side rendering
- problems:
- CSR has long TTFP (Time To First Page)
- CSR has poor SEO (search engine optimization)
- Isomorphic: render the JS on the server side first; used to reduce TTFP and improve SEO; server side and client side need different routing code; use a proxy to handle the cookie problem (when the server has to request other APIs on behalf of the client).
- Isomorphic Redux: the store cannot be a singleton on the server (since one server serves multiple clients), so create a fresh store per request.
10/27/2018 14:49
Duck Typing and Monkey Patching
Readings: Duck Typing, Monkey Patching
Takeaway:
- In Duck Typing, whether an object is suitable for an operation depends on its methods/properties, not its type (class/base class).
- E.g. a class with `__iter__` and `__next__` can be considered iterable even though it doesn't inherit from `collections.abc.Iterable`.
- Using monkey patching, we can replace methods/attributes/functions at runtime (see the sketch below).
- Pros: can be applied locally (e.g. in tests) / extends the original code without touching it / lets fixes be distributed alongside, rather than inside, the source code.
- Cons: the patch is only applied locally (other copies of the code remain unfixed) / can be misleading to readers of the original code.
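Both ideas in one short sketch (the class and function names are made up for illustration):
```python
import collections.abc

class Countdown:
    """Duck-typed iterable: defines __iter__/__next__ without inheriting Iterable."""
    def __init__(self, start):
        self.current = start
    def __iter__(self):
        return self
    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

print(list(Countdown(3)))                                  # [3, 2, 1]
print(isinstance(Countdown(3), collections.abc.Iterable))  # True, via a subclass hook

# Monkey patching: swap a method on the class at runtime (process-wide).
def noisy_next(self):
    print("tick")
    raise StopIteration

Countdown.__next__ = noisy_next
print(list(Countdown(2)))                                  # prints "tick", then []
```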
11/06/2018 01:26
Spark Stream/Flink/Storm
Readings: zhihu1, zhihu2, Storm vs Flink
Takeaway:
- Spark:
- Micro-batch streaming (a minimal PySpark sketch follows this list)
- With Spark on YARN (cluster mode), Spark Streaming has to go through the driver to schedule every micro-batch, which adds latency
- Storm
- Native streaming
- Depends on ZooKeeper to coordinate Nimbus (master) and Supervisors (slaves)
- Stateless
- Flink
- Native streaming
- Higher throughput than Storm, with low latency
- Stateful, with exactly-once guarantees
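A minimal PySpark Structured Streaming sketch to illustrate the micro-batch model (the host/port and trigger interval are arbitrary example values):
```python
# Word count over a socket source; the driver schedules one small batch job
# per trigger, so end-to-end latency is bounded below by the trigger interval.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("micro-batch-demo").getOrCreate()

lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .trigger(processingTime="5 seconds")
         .start())
query.awaitTermination()
```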
02/14/2019 14:38
Goroutine vs Thread
Source: https://codeburst.io/why-goroutines-are-not-lightweight-threads-7c460c1f155f
Threads' disadvantages:
- Large stack size per thread (>=1MB)
- A lot of registers to store when switching
- Setup and teardown require requesting resources from the OS and handing them back, which is costly
Goroutine
- Goroutines exist only in the virtual space of go runtime (not in OS)
- Go Runtime maintains three C structs:
- G Struct: a single goroutine (stack pointer, base of stack, ID, cache, status)
- M Struct: an OS thread (thread info + a pointer to the global queue of runnable goroutines, the currently running goroutine, and a reference to the scheduler)
- Sched Struct: a single global struct (holds the queues of free/waiting goroutines and of threads)
- At startup, the Go runtime launches a few goroutines for GC, the scheduler, and the user code.
- A goroutine takes only 2KB of stack when created. The stack can be doubled and copied when it needs to grow.
- If a thread is blocked because its running goroutine gets blocked on a system call, another thread is taken from the scheduler's waiting queue and used to run other runnable goroutines.
- Communication using channels happens in the virtual space, so the OS doesn't need to block the thread.
Go Scheduler
- Cooperative scheduling: another goroutine is only scheduled when the running one blocks or finishes, e.g. at:
- Channel send/receive operations (if they would block)
- Go statement.
- Blocking syscalls (file/network/other IO ...)
- Being stopped for a GC cycle.
- Better (cheaper) than pre-emptive scheduling, which uses periodic timer interrupts (e.g. every 10 ms / 100 CPU clocks) to stop the running thread and schedule a new one
- Since the switch is invoked implicitly by the code itself (e.g. during a sleep or a channel wait), only 3 registers (IP, SP, DX) need to be updated during a context switch (a loose Python analogy of cooperative scheduling follows this list).
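The notes above are Go-specific, but cooperative scheduling itself is easy to see with Python's asyncio (a loose analogy only, not how the Go runtime works internally):
```python
# Tasks interleave on a single OS thread and only yield control at explicit
# suspension points (await) -- no timer-based preemption involved.
import asyncio

async def worker(name, delay):
    for i in range(3):
        print(f"{name} step {i}")
        await asyncio.sleep(delay)   # yield point: the event loop picks another task

async def main():
    await asyncio.gather(worker("A", 0.1), worker("B", 0.1))

asyncio.run(main())
```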
02/14/2019 15:35
How AOP (Aspect-oriented-programming) works: Code Weaving
Sources: StackOverflow (Comprehensive), CNblogs (Chinese), StackOverflow (Spring AOP)
Takeaway:
- The key is to weave one piece of code (the aspect) into another:
- Source Code Weaving:
- How: copy and paste (via some preprocessor)
- Good performance; no dependencies attached
- Hard to debug/build separately; large compiled files
- Overall: outdated
- Compile-Time Weaving:
- How: a special compiler
- No runtime overhead
- Cannot defer decisions to runtime; Need to have the source code (not good for 3rd party libs)
- Binary Weaving:
- How: weave during "linking"
- No need for source code; avoids the overhead of load-time weaving
- Cannot cancel an aspect once it is woven into the code (can be worked around with `if()` guards, but that is inefficient)
- Load-Time Weaving:
- How: a weaving agent/library is loaded when the VM/container starts; a configuration file describes which aspect should be woven into which class.
- Can dynamically decide what/if to weave; same efficiency as compile-time weaving/binary weaving
- Overhead during application start-up (when class-loading occurs)
- Proxy-based Load-time weaving:
- Used by Spring AOP; a mixture of CTW/BW/LTW; compiled with AspectJ
- No special advantage in itself (?); mainly framework support
- Limited to public, non-static methods, with runtime overhead due to the proxy-based approach; does not capture internal (self-)method calls, since those never go through the proxy; some special pointcuts (e.g. constructors) are not supported (a rough Python analogy of the proxy idea follows)
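A rough Python analogy of the proxy-based approach (not Spring AOP itself): a proxy object weaves "advice" around the target's public methods, which also shows why internal self-calls escape the aspect:
```python
import functools

class Service:
    def save(self, item):
        print(f"saving {item}")
        self.audit(item)             # internal self-call: bypasses the proxy,
                                     # mirroring the Spring AOP limitation above
    def audit(self, item):
        print(f"auditing {item}")

class LoggingProxy:
    """Weaves a logging aspect around every public method of the wrapped target."""
    def __init__(self, target):
        self._target = target
    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if callable(attr) and not name.startswith("_"):
            @functools.wraps(attr)
            def advised(*args, **kwargs):
                print(f"[aspect] before {name}")   # "before" advice
                result = attr(*args, **kwargs)
                print(f"[aspect] after {name}")    # "after" advice
                return result
            return advised
        return attr

proxy = LoggingProxy(Service())
proxy.save("order-42")
# [aspect] before save / saving order-42 / auditing order-42 / [aspect] after save
# (no "[aspect] before audit": the internal call never went through the proxy)
```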