Now that we can start doing serious numerical analysis with Numpy arrays, we also reach the stage where we can no longer print out hundreds or thousands of values, so we need to be able to make plots to show the results.
The Matplotlib package can be used to make scientific-grade plots. You can import it with:
import matplotlib.pyplot as plt
If you are using IPython and you want to make interactive plots, you can start up IPython with:
ipython --matplotlib
If you now type a plotting command, an interactive plot will pop up.
If you use the IPython notebook, add a cell containing:
%matplotlib inline
and the plots will appear inside the notebook.
The main plotting function is called plot:
plt.plot([1,2,3,6,4,2,3,4])
In the above example, we only gave a single list, so Matplotlib assumes that the x values are the indices of the list/array.
However, we can instead specify the x values:
plt.plot([3.3, 4.4, 4.5, 6.5], [3., 5., 6., 7.])
Matplotlib can take Numpy arrays, so we can do for example:
import numpy as np
x = np.linspace(0., 10., 50)
y = np.sin(x)
plt.plot(x, y)
The plot function is actually quite complex, and for example can take arguments specifying the type of point, the color of the line, and the width of the line:
plt.plot(x, y, marker='o', color='green', linewidth=2)
The line can be hidden with:
plt.plot(x, y, marker='o', color='green', linewidth=0)
If you are interested, you can specify some of these attributes with a compact format-string syntax, which you can read more about in the Matplotlib documentation:
plt.plot(x, y, 'go') # means green and circles
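As a rough illustration of the format-string syntax, the sketch below draws the same sine curve three times with different strings. The vertical offsets and the variable names lines_a, lines_b and lines_c are only there to keep the curves apart and are not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0., 10., 50)
y = np.sin(x)

# Each format string combines a color letter with a marker and/or a line style
lines_a = plt.plot(x, y, 'go')         # green circles, no connecting line
lines_b = plt.plot(x, y + 1., 'r--')   # red dashed line, no markers
lines_c = plt.plot(x, y + 2., 'bs-')   # blue squares joined by a solid line
```

plt.plot returns a list of line objects, which is why each result is stored in a list-like variable here.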
We start off by loading the data/munich_temperatures_average_with_bad_data.txt file which we encountered in the Numpy lecture:
# The following code reads in the file and removes bad values
import numpy as np
date, temperature = np.loadtxt('data/munich_temperatures_average_with_bad_data.txt', unpack=True)
keep = np.abs(temperature) < 90
date = date[keep]
temperature = temperature[keep]
Now that the data has been read in, plot the temperature against time:
# your solution here
Next, try plotting the data against the fraction of the year (all years on top of each other). Note that you can use the % (modulo) operator to find the fractional part of the dates:
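As a hint, here is how the modulo operator behaves on a small made-up array. This assumes the dates are stored as decimal years; the values in example_dates are invented for illustration and are not taken from the temperature file.

```python
import numpy as np

# Hypothetical decimal-year dates, made up for this illustration
example_dates = np.array([1995.25, 1996.5, 2000.75])

# Taking the remainder after division by 1 keeps only the fractional part
fraction = example_dates % 1
print(fraction)
```

The three fractions are 0.25, 0.5 and 0.75, regardless of which year each date falls in.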
# your solution here
While the plot function can be used to show scatter plots, it is mainly used for line plots, and the scatter function is more often used for scatter plots, because it allows finer control of the markers:
x = np.random.random(100)
y = np.random.random(100)
plt.scatter(x, y)
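To give an idea of what this control over the markers looks like, the sketch below gives every point its own size and color. The arrays sizes and values are random placeholders invented for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.random(100)
y = np.random.random(100)
sizes = 200. * np.random.random(100)   # marker area in points squared, one per point
values = np.random.random(100)         # a third quantity, shown through the color

# s sets per-point sizes, c sets per-point colors, alpha makes markers translucent
points = plt.scatter(x, y, s=sizes, c=values, marker='^', alpha=0.5)
plt.colorbar(points)                   # shows how the values map to colors
```

Per-point sizes and colors like these are not available through plot, which applies one marker style to the whole line.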
Histograms are easy to plot using the hist function:
v = np.random.uniform(0., 10., 100)
h = plt.hist(v) # we do h= to capture the output of the function, but we don't use it
h = plt.hist(v, range=[-5., 15.], bins=100)
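If you do want to use the output, it can be unpacked into the bin counts, the bin edges, and the drawn patches. The following sketch repeats the call above; the names counts, edges and patches are our own choice.

```python
import numpy as np
import matplotlib.pyplot as plt

v = np.random.uniform(0., 10., 100)

# hist returns the counts per bin, the bin edges, and the drawn bar objects
counts, edges, patches = plt.hist(v, range=[-5., 15.], bins=100)

print(len(counts))    # one count per bin
print(len(edges))     # one more edge than there are bins
print(counts.sum())   # total number of values that fell inside the range
```

Since all values of v lie between 0 and 10, every one of them falls inside the requested range and is counted.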
You can also show two-dimensional arrays with the imshow function:
array = np.random.random((64, 64))
plt.imshow(array)
And the colormap can be changed:
plt.imshow(array, cmap=plt.cm.gist_heat)
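A few other imshow options are often useful alongside the colormap. The sketch below is one possible combination, not the only sensible one: it fixes the color scale with vmin and vmax, puts the first row of the array at the bottom with origin, and adds a colorbar.

```python
import numpy as np
import matplotlib.pyplot as plt

array = np.random.random((64, 64))

# The colormap can also be given by name; vmin and vmax fix the color scale
image = plt.imshow(array, cmap='gist_heat', origin='lower', vmin=0., vmax=1.)
plt.colorbar(image)   # shows which color corresponds to which value
```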
You can easily customize plots. For example, the following code adds axis labels, and sets the x and y ranges explicitly:
x = np.random.random(100)
y = np.random.random(100)
plt.scatter(x, y)
plt.xlabel('x values')
plt.ylabel('y values')
plt.xlim(0., 1.)
plt.ylim(0., 1.)
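Other common customizations follow the same pattern. As a small sketch, the code below also adds a title, a legend and a grid; the title and label strings are arbitrary placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.random(100)
y = np.random.random(100)

plt.scatter(x, y, label='random points')   # the label is picked up by the legend
plt.xlabel('x values')
plt.ylabel('y values')
plt.title('Uniform random points')
plt.legend(loc='upper right')
plt.grid(True)
```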
To save a plot to a file, you can do for example:
plt.savefig('my_plot.png')
and you can then view the resulting file like you would view a normal image. On Linux, you can also do:
$ xv my_plot.png
in the terminal.
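savefig picks the output format from the file extension and accepts a few options that affect the result. The sketch below shows two that are commonly used; the file names are placeholders chosen for this example.

```python
import os
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0., 10., 50)
plt.plot(x, np.sin(x))

# dpi sets the resolution; bbox_inches='tight' trims empty margins
plt.savefig('my_plot_hires.png', dpi=300, bbox_inches='tight')

# The extension selects the format, so this writes a vector PDF instead
plt.savefig('my_plot.pdf')

print(os.path.getsize('my_plot_hires.png'))   # size of the PNG file in bytes
```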
One of the nice features of Matplotlib is the ability to make interactive plots. When using IPython, you can do:
%matplotlib qt
to change the backend to be interactive, after which plots that you make will be interactive.
The easiest way to find out more about a function and available options is to use the ? help in IPython:
In [11]: plt.hist?
Definition: plt.hist(x, bins=10, range=None, normed=False, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False, color=None, label=None, stacked=False, hold=None, **kwargs)
Docstring:
Plot a histogram.
Call signature::
hist(x, bins=10, range=None, normed=False, weights=None,
cumulative=False, bottom=None, histtype='bar', align='mid',
orientation='vertical', rwidth=None, log=False,
color=None, label=None, stacked=False,
**kwargs)
Compute and draw the histogram of *x*. The return value is a
tuple (*n*, *bins*, *patches*) or ([*n0*, *n1*, ...], *bins*,
[*patches0*, *patches1*,...]) if the input contains multiple
data.
etc.
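Outside IPython, similar information is available through the standard inspect module. The sketch below prints the signature and the first few docstring lines for whichever Matplotlib version is installed, so the output may differ from the listing above; recent releases, for instance, list density where the listing shows normed.

```python
import inspect
import matplotlib.pyplot as plt

# Print the signature as defined by the installed Matplotlib version
print(inspect.signature(plt.hist))

# Print only the first five lines of the docstring
print('\n'.join(inspect.getdoc(plt.hist).splitlines()[:5]))
```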
But sometimes you don't even know how to make a specific type of plot, in which case you can look at the Matplotlib Gallery for example plots and scripts.
Use Numpy to generate 10000 random values following a Gaussian/Normal distribution, and make a histogram. Try changing the number of bins to properly see the Gaussian. Try overplotting a Gaussian function on top of it using a colored line, and adjust the normalization so that the histogram and the line are aligned.
# your solution here
The central limit theorem states that the arithmetic mean of a large number of independent random samples (from any distribution with finite variance) will be approximately normally distributed. You can easily test this with Numpy and Matplotlib:
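1. Create an array total with 10000 values (set to 0)
2. Generate 10000 random values (for example, uniformly distributed between 0 and 1)
3. Add these values to the total array
4. Repeat steps 2 and 3 until you have added 10 sets of values
5. Divide total by 10 to get the mean of the values you added
6. Make a histogram of the values in total
You can also see how the histogram of total values changes at each step, if you want to see the evolution!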
# your solution here