Use range and arange to allocate python and numpy arrays of 10 million
elements. Compare how long it takes to sum up the values and the memory usage of the python processes.
time.time() returns the current time in s (elapsed since some arbitrary date).
Use top or the Mac/Windows task manager to see the mem usage.
solution: exo_data/basic/test_range.py
Level 2: requires some searching
Primes
Write a sieve of Eratosthenes to find the prime numbers between 1 and 1 million.
use a boolean array that indicates whether each element is prime.
use [::k] indexing to set multiples of primes to False.
verify with np.bincount that the obtained numbers follow the distribution.
Divide the interval [0, 1] in 10 bins, each bin's width being the
corresponding probability. Then draw uniform values between 0 and 1,
and select the bin it falls in (use np.cumsum and np.searchsorted).
solution: exo_data/basic/random_dist.py
Level 3: no solution
Redo the "vanishing series" exercice with u[0] becoming arbitrarily
large. Encode large numbers as an array of uint64's.
Compute the 2D coordinates of the vertices of a regular polygon of n sides.
Compute an array of angles and apply sin and cos to it.
solution: exo_data/matrix/regular_polygon.py
Centering vectors
Subtract the mean from the set of line vectors in matrix mat.
n = 50
d = 3
# normal-distributed, size n * d
mat = np.random.randn(n, d)
compute the mean with np.mean, reshape the mean vector to a matrix to subtract it from mat.
solution: exo_data/matrix/centering.py
Swapping lines and columns
Write a function
def swap2(m, i, j):
...
that swaps line i with line j, and column i with column j of the square matrix m. Test it on
mat = np.arange(100).reshape(10, 10)
Be careful that getting a line or column just creates a new view of the matrix data.
solution: exo_data/matrix/swap.py
Level 2
Transpose
Transpose a huge square matrix in-place. Generate the matrix with
n = 10000
mat = np.arange(n * n).reshape(n, n)
You cannot afford to make a temporary copy of the array, so do it by blocks and be careful with the border effects.
solution: exo_data/matrix/transpose.py
Breaking codes
You are a hacker, and you have stolen a set of encrypted keys, such
that key = secret1 * random_data. You have to factor the keys in order
to retrieve the secret data. For this you have a collection of prime
numbers, and you have to check if each key is a multiple of each prime
number.
Write code to find which key can be factorized by any of the primes.
Use a line and a column matrix to broadcast the computation
Use np.any to check if any value in a column is 0
solution: exo_data/matrix/break_key.py
Distance computations
The
code exo_data/matrix/distances.py
generates two matrices X and Y of size n*d and m*d, computes the
pairwise Euclidean distances between rows of X and Y (size n*m), and finds
the nearest neighbor of each row of X in Y.
Rewrite it without any loop, using a matrix multiplication.
Use the fact that for vectors x and y, (x-y|y-y) = (x|x) + (y|y) - 2 *
(x|y). The squared norms (x|x) and (y|y) can be precomputed, the only expensive operation is (x|y), that can be cast as a matrix multiply over over all vector x and y.
Open the file, read the header with normal python i/o, then read the rest using np.fromfile.
solution: exo_data/io/load_off.py
.mat file format
The Matlab .mat
file exo_data/io/000023.mat
contains a set of rectangular matrices. Find them and convert them to a list of numpy float32 arrays.
Load the file with x = scipy.io.loadmat("000023.mat")
Use type() to recursively explore the structure loadmat returns.
solution: exo_data/io/load_mat.py
Parse output log
This is the output of a verbose computational geometry code: exo_data/io/csg_log.gz. It is line-oriented.
A path with name <1212122112> represents a path from root to leaf in a binary tree: from root, go to child 1, then from root's child 1 go to child 2, etc.
The program explores the tree and when a leaf node is reached, it says (a):
An algorithm produces a prediction score that attempts to predict
"positive" or "negative". In order to show how well the algorithm
works, produce a plot that shows the distribution of the
score for ground-truth positives and negatives.
The data for a series of experiments is here: exo_data/graph/data_to_plot.dat
it can be loaded with np.loadtxt("data_to_plot.dat"). Each matrix line is
[score label]
where label is +1 or -1 for ground-truth positives and negatives.
Use np.histogram to make histograms and pyplot.bar to plot them.
solution: exo_data/graph/distribution.py
Level 2
View image matches
An algorithm provides a list
of matching points between the two images:
exo_data/graph/image_matches.dat. Each
line of the matrix
[x1, y1, x2, y2, score]
describes a point match between (x1, y1) on the first image and (x2, y2) on the second one, with the confidence value score.
Draw these two images one above another, and a line between each pair of matches, with a code for the strength of the matches (color or thickness).
The Mandelbrot fractal is defined by the complex series
u[n+1] = u[n] ** 2 + u[0]
The outside of the fractal is the set of u[0] such that abs(u) diverges. It can be shown that if abs(u[n]) becomes larger than 2 at some point, then it diverges.
In the knapsack problem,
we are given a bag and a set of n items. Item
i has weight weights[i] and value values[i]. We want to find a subset
of the items that we can put in the bag so that the total weight is
below maxweight and the total value as large as possible. This problem
is NP hard.
The code exo_data/parallel/knapsack.py
finds an approximate solution by drawing a set of items in a random
order and checking how many of them fit in the knapsack.
Parallelize the solution using multiprocessing, then multithreading.
The code exo_data/parallel/c_knapsack.c
is a C implementation of the fill_knapsack function. Compare its speed
against the python version. Add code to release the global interpreter
lock when appropriate and compare again.
Use a pool of multiprocessing.Pool of multiprocessing.cpu_count()
workers, either directly with multiprocessing.Pool or multiprocessing.dummy.Pool.
To release the GIL, place a Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS
pair at the correct location.
solution: exo_data/parallel/knapsack_parallel.py
Long exercices
OpenGL
This exercise requires to install PyOpenGL. Instructions:
curl -O https://pypi.python.org/packages/source/P/PyOpenGL/PyOpenGL-3.1.0b3.tar.gz
tar xvzf PyOpenGL-3.1.0b3.tar.gz
cd PyOpenGL-3.1.0b3
mkdir -p /Users/matthijs/Desktop/numpy_training/packs/lib/python2.7/site-packages/
export PYTHONPATH=/Users/matthijs/Desktop/numpy_training/packs/lib/python2.7/site-packages/
python setup.py install --prefix=/Users/matthijs/Desktop/numpy_training/packs