Big Data analytics and optimization in Python

The growing popularity of Python for Big Data analysis is yet another reason to learn some optimization techniques, so you can write software that scales easily without putting unnecessary stress on the hardware. Here is a list of optimization strategies, from string manipulation to loops. The list is not exhaustive, but it is a good starting point for improving your code.

No string pumping in a for loop

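Avoid building a string piece by piece like this:

s = ""
for substr in substrings:
    s += substr

Instead, use s = "".join(substrings). Strings are immutable, so each += may copy everything built so far, which can make the loop quadratic in the total length. join computes the final length once and copies each piece a single time. The same applies when you are generating strings via a function foo: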

s = ""
for x in somelist:
    s += foo(x)

This is much better:

slist = [foo(x) for x in somelist] 
s = "".join(slist)
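You can check the difference on your own machine with a small timeit benchmark. This is a minimal sketch: the function names and the 100,000-element input are illustrative assumptions. CPython can sometimes resize a string in place during +=, so the measured gap depends on the interpreter and its version.

```python
import timeit

def concat_loop(parts):
    # Repeated += may copy the growing string on every iteration
    s = ""
    for p in parts:
        s += p
    return s

def concat_join(parts):
    # join sizes the result once and copies each piece a single time
    return "".join(parts)

parts = [str(i) for i in range(100_000)]  # hypothetical input
assert concat_loop(parts) == concat_join(parts)

for fn in (concat_loop, concat_join):
    t = timeit.timeit(lambda: fn(parts), number=20)
    print(f"{fn.__name__}: {t:.3f} s")
```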

Still in the realm of string manipulation, pumping a string with

out = "<html>" + head + post + query + tail + "</html>"

is correct but hard to read. Instead, use the C-like, sprintf-style % operator:

out = "<html>%s%s%s%s</html>" % (head, post, query, tail)
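For reference, the sketch below checks that concatenation, % formatting, str.format and f-strings (available since Python 3.6) all produce the same string. The values of head, post, query and tail are hypothetical placeholders.

```python
# Hypothetical placeholder values for the four pieces
head, post, query, tail = "<head/>", "<p>post</p>", "?q=1", "<footer/>"

# Plain concatenation: correct, but harder to read
out_plus = "<html>" + head + post + query + tail + "</html>"

# C-like, sprintf-style % formatting
out_percent = "<html>%s%s%s%s</html>" % (head, post, query, tail)

# str.format and f-strings give the same result
out_format = "<html>{}{}{}{}</html>".format(head, post, query, tail)
out_fstring = f"<html>{head}{post}{query}{tail}</html>"

assert out_plus == out_percent == out_format == out_fstring
print(out_fstring)
```

Running it prints `<html><head/><p>post</p>?q=1<footer/></html>`.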

Appending the result of a function

newlist = []
for word in oldlist:
    newlist.append(word.upper())

That's correct, but slow. On very large lists, the interpreted for loop, together with the repeated calls to word.upper and newlist.append on every iteration, becomes a real bottleneck. The built-in map replaces the interpreted loop with a loop implemented in C inside the Python interpreter.

Check it out

# In Python 3, map returns a lazy iterator; list() materializes it
newlist = list(map(str.upper, oldlist))
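The sketch below compares the explicit loop with map and with a list comprehension, another common alternative. The input list and the function names are assumptions for illustration. Timings vary by machine and Python version, so treat the numbers as indicative only.

```python
import timeit

oldlist = ["word%d" % i for i in range(100_000)]  # hypothetical input

def with_loop(words):
    # Interpreted loop plus an append call per element
    newlist = []
    for word in words:
        newlist.append(word.upper())
    return newlist

def with_map(words):
    # The loop runs in C; list() consumes the lazy map iterator
    return list(map(str.upper, words))

def with_comprehension(words):
    # Builds the list directly, without an explicit append call
    return [word.upper() for word in words]

assert with_loop(oldlist) == with_map(oldlist) == with_comprehension(oldlist)

for fn in (with_loop, with_map, with_comprehension):
    t = timeit.timeit(lambda: fn(oldlist), number=20)
    print(f"{fn.__name__}: {t:.3f} s")
```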

Use local variables

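In CPython, looking up a local variable is cheaper than looking up a global name or an attribute. If you cannot use map, bind the methods you call in the loop to local names before the loop starts. Write something like this: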

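def foo(oldlist):
    upper = str.upper          # look up str.upper once
    newlist = []
    append = newlist.append    # look up newlist.append once
    for word in oldlist:
        append(upper(word))    # only local-name lookups inside the loop
    return newlist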
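The same trick applies to global names such as module functions. The following is a minimal sketch with a hypothetical input list and function names. The gain is CPython-specific and tends to be smaller on Python 3.11 and later, where the interpreter specializes global and attribute lookups.

```python
import math
import timeit

values = list(range(200_000))  # hypothetical input

def roots_global(data):
    out = []
    for x in data:
        # Each iteration looks up the global "math", then the
        # attributes "sqrt" and "append"
        out.append(math.sqrt(x))
    return out

def roots_local(data):
    sqrt = math.sqrt      # resolve the global and the attribute once
    out = []
    append = out.append   # resolve the bound method once
    for x in data:
        append(sqrt(x))   # only local-name lookups inside the loop
    return out

assert roots_global(values) == roots_local(values)

for fn in (roots_global, roots_local):
    t = timeit.timeit(lambda: fn(values), number=10)
    print(f"{fn.__name__}: {t:.3f} s")
```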

Fast histogram without plotting

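Computing bin counts through matplotlib also draws a plot, which is wasted work if you only need the numbers:

from matplotlib import pyplot as plt
binned_data = plt.hist(data, bins)[0]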

Now I do this, which computes the same counts with NumPy without drawing anything:

import numpy as np 
binned_data, bin_edges = np.histogram(data, bins)
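Here is a self-contained version of the NumPy approach. Synthetic data stands in for data and bins, which are assumptions here. When bins is an integer, np.histogram spans the data's minimum to its maximum, so every sample falls into some bin and the counts add up to the sample size.

```python
import numpy as np

# Hypothetical sample data: 10,000 draws from a standard normal distribution
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=0.0, scale=1.0, size=10_000)
bins = 20

# Bin counts and bin boundaries; nothing is plotted
binned_data, bin_edges = np.histogram(data, bins)

# Midpoint of each bin, handy for plotting or fitting later
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

print(binned_data.shape, bin_edges.shape, bin_centers.shape)  # (20,) (21,) (20,)
print(binned_data.sum())  # 10000
```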

Enjoy Python. Happy optimization!

Written by

Managing Director @ Chief Software Engineer & Host
