Python parallelism cheat sheet

I often get asked “how can I parallelise my Python code?”. I’ve come up with this simple cheat sheet to explain it. I will only cover the most common class of parallel problems here: embarrassingly parallel problems. This blog post is the first in a series I am writing, covering methods of simple parallelism. The following posts cover more convenient methods, as well as some things that should be considered.
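As a rough illustration of what “embarrassingly parallel” means, here is a minimal sketch using the standard library's multiprocessing.Pool. It is not code from the post itself: the function slow_square and the input range are placeholders standing in for any expensive computation whose inputs do not depend on each other.

```python
import multiprocessing


def slow_square(x):
    """Stand-in for an expensive, independent computation (hypothetical)."""
    return x * x


if __name__ == "__main__":
    inputs = range(10)
    # Each input is handled independently, so the work can be split across
    # worker processes with no communication between them.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(slow_square, inputs)
    print(results)
```

The `if __name__ == "__main__"` guard matters here: on platforms that start workers by re-importing the script, it stops each worker from trying to create its own pool.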

Installing rust on older linux systems

At work we use SLES 11, which has quite old versions of OpenSSL and of the installed certificates. I was getting certificate errors trying to install Rust with the rustup tool. I searched for any help at all, and in the end I followed this advice: download a more recent certificate bundle (e.g. from certifi or Mozilla), then set the environment variable SSL_CERT_FILE to point to this new file. This works for both rustup and cargo, meaning I can develop with Rust on my work machine.
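The second step can also be done from a script rather than the shell. The following is only a sketch under assumptions: the bundle path ~/certs/cacert.pem and the helper name env_with_cert_bundle are made up for illustration, and in a shell the equivalent is simply exporting SSL_CERT_FILE before calling rustup or cargo.

```python
import os
import shutil
import subprocess


def env_with_cert_bundle(bundle_path):
    """Return a copy of the current environment with SSL_CERT_FILE set."""
    env = dict(os.environ)
    env["SSL_CERT_FILE"] = bundle_path
    return env


if __name__ == "__main__":
    # Hypothetical location of the downloaded certificate bundle.
    bundle = os.path.expanduser("~/certs/cacert.pem")
    # Only try to run cargo if it is actually installed on this machine.
    if shutil.which("cargo"):
        subprocess.run(
            ["cargo", "--version"],
            env=env_with_cert_bundle(bundle),
            check=False,
        )
```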

Fighting the compiler

I’m learning Rust at the moment, which I’m finding quite an interesting challenge. I agree with a lot of the Rust principles and find it extremely comforting that the compiler has got my back, but it’s bringing me back to my early times learning C and “fighting with the compiler”. How many hours did I spend adding “&” and “*” to variables to pass into functions before I really understood what it meant for a function to take a pointer?

Numpy functions may not do what you think

Numpy has the ability to mask arrays and ignore their values for certain computations, called “masked arrays”. They contain a .mask attribute which is a boolean array, True where the value should be masked and False otherwise. Numpy also comes with a suite of functions which can handle this masking naturally. Typically for a function in the np. namespace, there is a masked-array-aware version under the namespace: np.median => np.
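To make the idea of a mask concrete, here is a small sketch. NumPy's masked-array module is np.ma, and np.ma.median is one of its mask-aware functions; the data values are invented for illustration.

```python
import numpy as np

# Hypothetical data: the last value is an outlier we want to ignore.
values = np.ma.masked_array(
    [1.0, 2.0, 3.0, 100.0],
    mask=[False, False, False, True],
)

print(values.mask)           # boolean array, True where the value is masked
print(np.ma.median(values))  # median of the unmasked values only
print(values.mean())         # masked-array methods also skip masked values
```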

Command line inconsistency

RTFM! Today I brought down our head node at work, because of a misunderstanding of the command line arguments for a Linux program. In fairness, I should have read the man page more carefully for the entry in question! I was using xargs for some nice command line parallelism and process running. The command I ran was: ls | grep action119 | grep exposureCycle | xargs -n 1 -I {} find {} -name 'IMAGE*.

Add timestamps to stdout

I spent some time trying to get timestamps added to C++ printing, e.g. through cout. A naive approach is to write a function get_current_time() and put it before all printing statements, e.g.: cout << get_current_time() << "Message" << endl; This requires changing all logging statements. Then my googling stumbled upon this question, which had an elegant solution incorporating a decorator object. Further down the page, however, I came upon a much nicer solution that transcends languages and programs and can be applied to running shell commands.
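The excerpt does not say which solution was found, so the following is only a sketch of the general idea of adding timestamps from outside the program: run the command from a wrapper and prefix every line it prints. The names add_timestamp and run_with_timestamps and the timestamp format are my own choices, not something taken from the post.

```python
import subprocess
import sys
import time


def add_timestamp(line, now=None):
    """Prefix one line of text with a local timestamp in square brackets."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"[{stamp}] {line}"


def run_with_timestamps(command):
    """Run a command and echo its stdout with a timestamp on every line."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            sys.stdout.write(add_timestamp(line))
    return proc.returncode


if __name__ == "__main__":
    # Demo command: any program that prints to stdout would do.
    run_with_timestamps([sys.executable, "-c", "print('Message')"])
```

Because the timestamps are added outside the program, none of the existing logging statements need to change.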

Python database transactions

pymysql

Defaults to autocommit=False

    connection = pymysql.connect(user='user', db='test')
    cursor = connection.cursor()
    cursor.execute('insert into test (value) values (10)')
    connection.close()

    connection = pymysql.connect(user='user', db='test')
    cursor = connection.cursor()
    cursor.execute('select value from test')
    # => []

To commit changes to the database, #commit() must be called:

    connection = pymysql.connect(user='user', db='test')
    cursor = connection.cursor()
    cursor.execute('insert into test (value) values (10)')
    # Call the commit line
    connection.commit()
    connection.close()

    connection = pymysql.connect(user='user', db='test')
    cursor = connection.
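The same effect can be reproduced without a MySQL server. The sketch below swaps in the standard library's sqlite3 module, which in its default mode also discards an uncommitted insert when the connection is closed; it is a stand-in for illustration, not pymysql itself, and the helper names are hypothetical.

```python
import os
import sqlite3
import tempfile


def count_rows_after_insert(db_path, commit):
    """Insert one row, optionally commit, reconnect, and count the rows."""
    connection = sqlite3.connect(db_path)
    connection.execute("insert into test (value) values (10)")
    if commit:
        connection.commit()
    # Closing without a commit discards the pending insert.
    connection.close()

    connection = sqlite3.connect(db_path)
    (count,) = connection.execute("select count(*) from test").fetchone()
    connection.close()
    return count


def demo():
    """Return the row counts seen without and then with a commit."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "test.db")
        setup = sqlite3.connect(db_path)
        setup.execute("create table test (value integer)")
        setup.commit()
        setup.close()
        without_commit = count_rows_after_insert(db_path, commit=False)
        with_commit = count_rows_after_insert(db_path, commit=True)
    return without_commit, with_commit


if __name__ == "__main__":
    print(demo())
```

The first count is taken after an insert that was never committed, the second after one that was, so only the second insert is visible to the new connection.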

Separate IPython profiles for interactive use

I used to have two simple shell aliases for IPython:

    alias ipy=ipython
    alias pylab='ipython --pylab'

These were separated for a couple of reasons:

The pylab mode of IPython was deprecated, for good reason. It “infects” the global namespace with all matplotlib and numpy functions. It breaks two entries in the famous “Zen of Python”:

    Explicit is better than implicit.
    Namespaces are one honking great idea – let’s do more of those!

git submodules are not so bad

I see a lot of complaints about git submodules: people suggesting alternatives, complaints about merging, or other bits and pieces. Git submodules have their place. Yes, they are not ideal for all situations, but they are ideal for the typical use case I’m about to outline. Example use case: in my work I have a master project which contains multiple submodules. Each submodule is also cloned into a separate development repository sitting nearby.

git rebase --skip is fine

So git rebase is a powerful tool, able to change history itself. This power, however, requires great care to avoid needing to git push --force. Git rebase comes with very user-friendly ways to cancel out of a rebase if something goes wrong or if you become confused: git rebase --abort. This returns your working tree to the state before the rebase was started. One thing that has always made me nervous when using rebase was when I rebased and a conflict occurred, so I only kept changes from the HEAD commit, which caused the following message: