Dig deep in Python

Posted on June 16, 2018

Python is a widely used language nowadays. You can learn Python in 30 minutes and write scripts to tackle a wide range of tasks, from batch processing to deep learning. However, Python has a lot of mechanisms that most people don't know about even after using it for years (that's me). To dive deep and re-learn Python, I am going to read:

I will share some interesting features that I used to ignore. I assume you are semi-pro in Python and want to dig deeper. (This is just a list of GISTS, NOT DETAILS.)


> importing in a class

# PythonForProgrammers/compose.py
class Compose:
    from utility import f
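
To see what the import above actually does, here is a small sketch of my own. It uses `math.sqrt` as a stand-in because the book's `utility` module is not available here; that substitution is an assumption, not the book's code.

```python
class Compose:
    # The import runs inside the class body, so the imported name
    # becomes a class attribute rather than a module-level name.
    from math import sqrt

# The imported function is reachable through the class namespace.
root = Compose.sqrt(16.0)
is_class_attribute = "sqrt" in vars(Compose)
print(root, is_class_attribute)
```

One caveat: a plain Python function imported this way behaves like any other function stored on a class, so calling it through an instance passes the instance as the first argument.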


> Static field

The naive way of defining a static field (class attribute) in a Python class is dangerous, because the class only stores a reference that all instances share. If you mutate it in place with methods like `append` or `update`, the change is visible through every instance. If you instead assign through an instance with `=`, the static field is left untouched: the reference to the new value is stored on that instance only and shadows the static field there:

In [1]:
class Foo(object):
    v1 = "a"
    v2 = []
    def __repr__(self):
        return ("<v1 = {} | v2 = {}>".format(self.v1, self.v2))
In [2]:

f1 = Foo()
f2 = Foo()
f3 = Foo()
In [3]:

f1.v1 = "b"
f1.v2 = [1,2,3]
f1, f2, f3

(<v1 = b | v2 = [1, 2, 3]>, <v1 = a | v2 = []>, <v1 = a | v2 = []>)
In [5]:

Foo.v1 = "100"
f1, f2, f3

(<v1 = b | v2 = [1, 2, 3]>, <v1 = 100 | v2 = []>, <v1 = 100 | v2 = []>)
In [6]:

f2.v2.append(10)  # in-place change to the shared class-level list
f1, f2, f3

(<v1 = b | v2 = [1, 2, 3]>, <v1 = 100 | v2 = [10]>, <v1 = 100 | v2 = [10]>)
In [7]:

id(Foo.v1), id(f1.v1), id(f2.v1)

(139652076653288, 139652244728440, 139652076653288)

Remember to use the class to refer to the static member, i.e. use `Foo.v1`, not `f1.v1`.
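
For anyone who wants to replay the session as a plain script, here is a condensed sketch of the same experiment. The variable names at the bottom are mine and only exist to make the result easy to inspect.

```python
class Foo:
    v1 = "a"   # class attribute holding an immutable value
    v2 = []    # class attribute holding a mutable list

f1, f2 = Foo(), Foo()

f1.v1 = "b"        # assignment: creates an instance attribute on f1 only
f2.v2.append(10)   # in-place mutation: changes the single list stored on Foo
Foo.v1 = "100"     # rebinding on the class: seen by instances that do not shadow v1

f1_own_attrs = vars(f1)   # {'v1': 'b'}
f2_own_attrs = vars(f2)   # {} because f2 never got its own attributes
print(f1.v1, f2.v1, f1.v2, Foo.v2)
```

Looking at `vars(instance)` is a quick way to tell whether a name lives on the instance or is being looked up on the class.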

> Clean up

I found this part confusing and recommend this post instead for a better understanding of weak references.
When assigning an instance to another variable like `a = Foo(); b = a`, only the reference is copied, so the instance is not actually removed when we delete it with `del a`: we can still access it via `b`.
A way to avoid that is to make `b` a weak reference, which refers to the instance without keeping it alive, so that `del a` really frees it. Without weak references, hazards like “reference cycles” can arise.
A reference cycle means that a reference to object `a` is stored as a member of object `b`, while a reference to `b` is also stored as a member of object `a`. In that case, deleting both names does not free the objects through reference counting, so `__del__()` is not triggered right away; at best it runs later, when the cyclic garbage collector gets to them. This can happen in hierarchical designs like a (display) tree or a doubly linked list.
Also, there is a dead-on-arrival problem: a weakref to a bound method dies immediately after it is stored, because bound methods are created on demand each time they are accessed on an instance and nothing else keeps them alive. To store a bound method properly, keep a weak reference to the instance together with a reference to the underlying function (which lives on the class), and build a new bound method from them later; `weakref.WeakMethod` in the standard library does this.
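
The three situations above can be sketched with the standard `weakref` and `gc` modules. This is my own minimal illustration, not code from the book: the `Node` class is hypothetical, and the immediate-cleanup behaviour relies on CPython's reference counting.

```python
import gc
import weakref

class Node:
    """Hypothetical class used only for this illustration."""
    def __init__(self, name):
        self.name = name
        self.peer = None

    def greet(self):
        return "hi from " + self.name

# 1. A weak reference does not keep its target alive.
a = Node("a")
b = weakref.ref(a)
alive_before = b() is a
del a
alive_after = b() is not None          # False on CPython: the instance is gone

# 2. A reference cycle survives `del` until the cyclic collector runs.
gc.disable()                           # keep the collector out of the way for the demo
x, y = Node("x"), Node("y")
x.peer, y.peer = y, x
probe = weakref.ref(x)
del x, y
cycle_survives_del = probe() is not None   # True: the two objects still hold each other
gc.collect()
cycle_survives_gc = probe() is not None    # False: the collector broke the cycle
gc.enable()

# 3. Dead on arrival: a plain weakref to a bound method is dead at once.
n = Node("n")
dead_on_arrival = weakref.ref(n.greet)() is None   # True on CPython
method_ref = weakref.WeakMethod(n.greet)           # rebuilds the bound method on demand
greeting = method_ref()()
del n
method_gone = method_ref() is None                 # True: the instance was not kept alive
```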

> Unit test

Write the test before the code.
White-box test: the test code has complete access to the internals of the class that's being tested.
Black-box test: the class under test is treated as an impenetrable box.
Use a framework like `unittest`, or something that comes with the package you are using (e.g. the Flask-Testing extension for the Flask framework).
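
To make the white-box/black-box distinction concrete, here is a small sketch with the standard `unittest` module. The `Counter` class and the `run_suite` helper are hypothetical names I made up for the illustration.

```python
import unittest

class Counter:
    """Hypothetical class under test."""
    def __init__(self):
        self._count = 0

    def increment(self):
        self._count += 1
        return self._count

class BlackBoxTest(unittest.TestCase):
    # Only the public behaviour is checked.
    def test_increment_returns_new_value(self):
        counter = Counter()
        self.assertEqual(counter.increment(), 1)
        self.assertEqual(counter.increment(), 2)

class WhiteBoxTest(unittest.TestCase):
    # The test reaches into the internals of the class.
    def test_internal_state_starts_at_zero(self):
        self.assertEqual(Counter()._count, 0)

def run_suite():
    """Run both test cases and return the unittest result object."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(BlackBoxTest))
    suite.addTests(loader.loadTestsFromTestCase(WhiteBoxTest))
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the tests would usually live in their own file and be run with `python -m unittest`; the helper function here only keeps the sketch self-contained.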

> Decorators

Decorators are used to modify functions or inject code into them. `@` is just a little syntactic sugar meaning “pass a function object through another function and assign the result to the original function”, or “applying code to other code” (i.e. macros).

def foo(): pass
foo = staticmethod(foo)

# is equivalent to

@staticmethod
def foo(): pass

  • The `__name__` of the returned function is not the same as the original one; it is the one generated by the decorator, unless the decorator copies it over (e.g. with `functools.wraps`).
  • (DEMO) Decorator without arguments: if we write the decorator as a class, the function to be decorated is passed to the constructor, and the `__call__()` method is called whenever the decorated function is invoked.
  • (DEMO) Decorator with arguments: the arguments are passed to an outer function, which uses them to build and return the inner decorator. The inner decorator (or wrapper) then decorates the function that actually needs to be decorated.
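
Here is a short sketch of both flavours from the list above: a class-based decorator without arguments and a function-based decorator with arguments. `Trace` and `repeat` are hypothetical names of my own; `functools.wraps` and `functools.update_wrapper` are used to keep the original `__name__`.

```python
import functools

class Trace:
    """Class-based decorator without arguments: counts calls."""
    def __init__(self, func):
        # The decorated function arrives in the constructor.
        functools.update_wrapper(self, func)
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        # Invoked every time the decorated function is called.
        self.calls += 1
        return self.func(*args, **kwargs)

def repeat(times):
    """Decorator with arguments: the outer function builds the real decorator."""
    def decorator(func):
        @functools.wraps(func)   # keep func.__name__ on the wrapper
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator

@Trace
def add(a, b):
    return a + b

@repeat(times=3)
def greet(name):
    return "hello " + name

total = add(1, 2)
greetings = greet("python")
print(total, add.calls, greetings, greet.__name__)
```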

> Indexing

When indexing, we are actually calling the `__getitem__` function of that object.

  • By passing a slicing index like `arr[1:10:2]`, we are actually passing a `slice` object created with `slice(1, 10, 2)`.
  • By passing `arr[...]`, we are passing `Ellipsis` to `__getitem__`, which is a built-in constant like `False` and `True`.
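
A quick way to check this is a class whose `__getitem__` simply hands back whatever key it receives. `Probe` is a hypothetical helper I wrote for this purpose.

```python
class Probe:
    """Returns the raw key so we can see what indexing passes in."""
    def __getitem__(self, key):
        return key

p = Probe()
plain = p[3]                # an int is passed through unchanged
sliced = p[1:10:2]          # arrives as slice(1, 10, 2)
dots = p[...]               # arrives as the Ellipsis constant
combined = p[1:3, ...]      # comma-separated keys arrive as one tuple
print(plain, sliced, dots, combined)
```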

> Metaprogramming

Some basic concepts:

  • A class is itself an object and can be edited like one.
  • It's worth re-emphasizing that in the vast majority of cases, you don't need metaclasses.
  • Metaclasses create classes, and classes create instances.
  • Define a class with `type`: `type(name, bases, dict)`, where `dict` is the namespace of the class (fields and methods).
  • Suppose object `a` is an instance of class `A` (i.e. `a = A()`). Then `a.__class__.__class__` is `<class 'type'>` (the metaclass).
  • Using `type`, we can dynamically define classes.
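
The points above can be tried out in a few lines. This is a sketch of my own; `Point`, `describe` and `scale` are hypothetical names.

```python
def describe(self):
    return "{} with x={}".format(type(self).__name__, self.x)

# type(name, bases, dict) builds a class at runtime.
Point = type("Point", (object,), {"x": 0, "describe": describe})

p = Point()
p.x = 3

# The class is an ordinary object, so it can be edited after creation.
Point.scale = lambda self, k: self.x * k

description = p.describe()
scaled = p.scale(2)
metaclass = p.__class__.__class__   # the class of the class
print(description, scaled, metaclass)
```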

The main usage of metaclasses:
A metaclass inherits from `type` and, like `type`, takes a name, a tuple of bases, and a namespace dictionary to initialize the class. The book provides a simple example of defining a custom metaclass and using it on a class:

# Metaprogramming/SimpleMeta1.py
# Two-step metaclass creation in Python 2.x

class SimpleMeta1(type):
    def __init__(cls, name, bases, nmspc):
        super(SimpleMeta1, cls).__init__(name, bases, nmspc)
        cls.uses_metaclass = lambda self : "Yes!"

class Simple1(object):
    __metaclass__ = SimpleMeta1
    def foo(self): pass
    def bar(self): pass

simple = Simple1()
print([m for m in dir(simple) if not m.startswith('__')])
# A new method has been injected by the metaclass:
print(simple.uses_metaclass())

""" Output:
['bar', 'foo', 'uses_metaclass']
Yes!
"""

Also, because `__metaclass__` only needs to be callable, we can define it inline as in the example. However, the `__metaclass__` attribute has no effect in Python 3; there we instead pass the metaclass as a keyword argument next to the base classes:

class Simple1(object, metaclass=SimpleMeta1):
    pass

Also, because the metaclass's methods run every time a class is defined, we can use it to define a final class as in Java:

class final(type):
    def __init__(cls, name, bases, namespace):
        super(final, cls).__init__(name, bases, namespace)
        print("Checking type", name)
        for klass in bases:
            if isinstance(klass, final):
                raise TypeError(str(klass.__name__) + " is final")

class A(metaclass=final): pass
class B(A): pass  # raises TypeError: A is final

When a class inherits from `A`, `final.__init__` is executed for the new class, and since `A` is an instance of `final` (its metaclass), we receive a `TypeError` when trying to do that.
`__new__` vs `__init__`: the first one is applied before the class is created, while the second one is applied after the class is created. `__slots__` is used to declare the fixed fields of a class and avoid the per-instance dict (a dict is very memory consuming). With `__new__` we can define `__slots__`, but not with `__init__`, because by then the class already exists.
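
Here is a minimal sketch of the `__new__` case, assuming Python 3 syntax. The metaclass `Slotted` and the field names are hypothetical; the point is only that the namespace can still be changed inside `__new__`.

```python
class Slotted(type):
    def __new__(mcls, name, bases, namespace):
        # The class object does not exist yet, so __slots__ can still be added.
        namespace = dict(namespace)
        namespace.setdefault("__slots__", ("x", "y"))
        return super().__new__(mcls, name, bases, namespace)

class Point(metaclass=Slotted):
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
has_instance_dict = hasattr(p, "__dict__")   # False: slots replace the dict

try:
    p.z = 3            # not listed in __slots__
    rejected = False
except AttributeError:
    rejected = True

print(Point.__slots__, has_instance_dict, rejected)
```

Doing the same in the metaclass's `__init__` would be too late, since the instance layout is fixed when the class object is created.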

> Comprehensions

Comprehensions are constructs that allow sequences to be built from other sequences.
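
A few list comprehensions as a reminder of the basic form; the sample data is made up by me.

```python
words = ["apple", "kiwi", "banana"]

# Map: one output element per input element.
lengths = [len(w) for w in words]

# Filter: keep only the elements that pass the condition.
even_squares = [n * n for n in range(10) if n % 2 == 0]

# Nested loops read left to right, like nested for statements.
pairs = [(x, y) for x in range(3) for y in range(x)]

print(lengths, even_squares, pairs)
```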


Also, in Python 3 (and 2.7) we can write set and dictionary comprehensions:

a = {i for i in range(10)}
{'{}_plus_one'.format(i):i+1 for i in a}

> for/else

If we want to print whether a positive integer is a prime number in a for loop, we normally have to use a flag together with a break. With the for/else syntax, we can simplify it to:

x = 11
for i in range(2, x):
    if x % i == 0:
        print(x, "can be divided by", i)
        break  # a divisor was found, so the else clause is skipped
else:
    print(x, "is a prime number")

The else clause is executed only if the for loop finishes without hitting a break.

> Global Interpreter Lock

A mutex/lock that only allows one thread at a time to control the interpreter. The GIL is an interpreter-level lock that prevents different threads from entering the same critical section at the same time. It also avoids deadlock problems, since there is only one global lock.

Python chose this design to ensure the safety and performance of single-threaded programs, which fits part of Python's purpose, namely to be an easy-to-use scripting language. Although a lot of complaints have been raised since Python became popular, it isn't easy to remove the GIL without decreasing the performance of single-threaded programs.

In the previous version of the GIL, a thread that was forced to release the lock after a fixed interval was allowed to reacquire it right away, before a waiting thread got a chance to take it. This can starve an I/O-bound thread when it runs alongside a CPU-bound thread. A visualization can be found here. The fix in Python 3.2 makes GIL competition fairer.

Workarounds: multi-process programming (e.g. the `multiprocessing` module) instead of multi-threading; use an interpreter other than CPython; or wait for improvements in future Python 3.x releases.


Restore-point; http://python-3-patterns-idioms-test.readthedocs.io/en/latest/PatternConcept.html#design-principles