Accessing Remote Resources

Web pages and data

I mentioned earlier how to access data files on your hard drive, but Python also lets you access remote data, for example on the internet. The easiest way to do this is with the requests module. To start, simply fetch a URL:

In [1]:
import requests

response = requests.get('http://xkcd.com/353/')

The variable response now holds the server's reply. You can access its content as text via the text attribute:

In [2]:
print(response.text[:1000])  # only print the first 1000 characters
<!DOCTYPE html>
<html>
<head>
<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-25700708-7', 'auto');
  ga('send', 'pageview');
</script>
<link rel="stylesheet" type="text/css" href="/s/b0dcca.css" title="Default"/>
<title>xkcd: Python</title>
<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<link rel="shortcut icon" href="/s/919f27.ico" type="image/x-icon"/>
<link rel="icon" href="/s/919f27.ico" type="image/x-icon"/>
<link rel="alternate" type="application/atom+xml" title="Atom 1.0" href="/atom.xml"/>
<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="/rss.xml"/>
<script type="text/javascript" src="/s/b66ed7.js" async></script>
<script type="text/javascript" 
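
Besides the body, every response carries an HTTP status code, which requests exposes as response.status_code (200 means success, 404 means the page was not found). As a minimal sketch, the helper below turns such a code into a readable label using only the standard library; describe_status is a hypothetical name introduced here for illustration:

```python
from http import HTTPStatus


def describe_status(code):
    """Return a label such as '404 Not Found' for an HTTP status code."""
    # HTTPStatus raises ValueError for codes it does not know about.
    return f"{code} {HTTPStatus(code).phrase}"


print(describe_status(200))  # 200 OK
print(describe_status(404))  # 404 Not Found
```

With a real response you could call describe_status(response.status_code) before trusting response.text.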

You can either use this information directly or, in some cases, write it to a file. Let's download one of the full-resolution files for the ice coverage data from Problem Set 9:

In [3]:
r2 = requests.get('http://mpia.de/~robitaille/share/ice_data/20060313.npy')
In [4]:
r2.text[:200]
Out[4]:
'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>404 Not Found</title>\n</head><body>\n<h1>Not Found</h1>\n<p>The requested URL /~robitaille/share/ice_data/20060313.npy was not foun'

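However, a .npy file is not actual text. Instead, it's a binary format, so the text attribute is not the right way to read it. (In the run recorded here the server answered with a 404 Not Found page, which is why the output above is HTML.) The binary data of the response can be accessed via the content attribute: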

In [5]:
r2.content[:200]
Out[5]:
b'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>404 Not Found</title>\n</head><body>\n<h1>Not Found</h1>\n<p>The requested URL /~robitaille/share/ice_data/20060313.npy was not foun'

Note the b prefix at the beginning, which indicates a byte string.
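
The two attributes are related: text is what you get when the bytes in content are decoded with a character encoding. A small sketch of that relationship, using a made-up string in place of a real response and assuming UTF-8:

```python
# A made-up example string; a real response would supply the bytes instead.
original = 'Wettervorhersage für heute'

raw = original.encode('utf-8')   # bytes, comparable to response.content
decoded = raw.decode('utf-8')    # str, comparable to response.text

print(type(raw).__name__, type(decoded).__name__)  # bytes str
print(len(raw), len(decoded))  # the bytes are longer: 'ü' takes two bytes
```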

Now we can open a new file in binary mode and write the downloaded data to it.

In [6]:
with open('20060313.npy', 'wb') as f:
    f.write(r2.content)
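
Before writing the bytes to disk, it can be worth checking that the server sent a NumPy file rather than an error page. Files in the .npy format start with the six bytes \x93NUMPY, so a rough check is possible without NumPy itself. The helper names below are hypothetical, and this is a sketch rather than a full validation of the file:

```python
NPY_MAGIC = b'\x93NUMPY'  # the first six bytes of every .npy file


def looks_like_npy(payload):
    """Return True if the bytes start with the .npy magic string."""
    return payload.startswith(NPY_MAGIC)


def save_if_npy(payload, filename):
    """Write payload to filename only if it looks like a .npy file.

    Returns True if the file was written, False otherwise.
    """
    if not looks_like_npy(payload):
        return False
    with open(filename, 'wb') as f:
        f.write(payload)
    return True


print(looks_like_npy(b'\x93NUMPY\x01\x00'))         # True
print(looks_like_npy(b'<!DOCTYPE HTML PUBLIC ...'))  # False
```

With the response from above, the call would be save_if_npy(r2.content, '20060313.npy').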

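Let's now load and plot the data. In the run recorded here both steps fail, because the saved file contains the HTML of the 404 page rather than NumPy data: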

In [7]:
import numpy as np
data = np.load('20060313.npy')
---------------------------------------------------------------------------
UnpicklingError                           Traceback (most recent call last)
~/miniconda3/envs/dev/lib/python3.6/site-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
    425             try:
--> 426                 return pickle.load(fid, **pickle_kwargs)
    427             except:

UnpicklingError: invalid load key, '<'.

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
<ipython-input-7-dfe7451e09f0> in <module>()
      1 import numpy as np
----> 2 data = np.load('20060313.npy')

~/miniconda3/envs/dev/lib/python3.6/site-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
    427             except:
    428                 raise IOError(
--> 429                     "Failed to interpret file %s as a pickle" % repr(file))
    430     finally:
    431         if own_fid:

OSError: Failed to interpret file '20060313.npy' as a pickle
In [8]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(12,12))
plt.imshow(data, origin='lower')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-8-8f01272a7de4> in <module>()
      2 import matplotlib.pyplot as plt
      3 plt.figure(figsize=(12,12))
----> 4 plt.imshow(data, origin='lower')

NameError: name 'data' is not defined
<matplotlib.figure.Figure at 0x1126bf400>

APIs

Imagine that you want to access some data online. In some cases, you will need to download a web page and search through the HTML to extract what you want. For example:

In [9]:
r = requests.get('http://www.wetteronline.de/wetter/heidelberg')
In [10]:
r.text[:1000]
Out[10]:
'<!DOCTYPE html>\n<html>\n<head>\n \n <title>Wetter Heidelberg - aktuelle Wettervorhersage von WetterOnline</title>\n <meta http-equiv="X-UA-Compatible" content="IE=edge" />\n <meta name="description" content="Das Wetter in Heidelberg - Wettervorhersage für heute, morgen und die kommenden Tage mit Wetterbericht und Regenradar von wetteronline.de" />\n <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n <meta http-equiv="content-language" content="de-DE" />\n \n  <meta property="fb:admins" content="100001020190994" />\n <meta property="fb:admins" content="1060016694" />\n\n <meta property="og:title" content="Wetter Heidelberg - aktuelle Wettervorhersage von WetterOnline">\n <meta property="og:type" content="article">\n  <meta name="viewport" content="width=1160">\n <meta property="og:image" content="https://st.wetteronline.de/dr/1.0.807/images/logo/ogimage_wetteronline_1200x630.png">\n \n <meta name="skype_toolbar" content="skype_toolbar_parser_compatible" />\n<meta name="msapplication-'

This is not ideal because it is messy, and also slow if all you want are a couple of values. Many websites now offer an application programming interface (API), which is essentially a way of accessing data in a machine-readable form. Take http://openweathermap.org/ for example, which has an API: http://openweathermap.org/API. To access the weather for Heidelberg, you can do:

In [11]:
r = requests.get('http://api.openweathermap.org/data/2.5/weather?q=Heidelberg,Germany')
In [12]:
r.text
Out[12]:
'{"cod":401, "message": "Invalid API key. Please see http://openweathermap.org/faq#error401 for more info."}'

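This is much shorter, but still not ideal for reading into Python as-is. (Here the server replied with an error message because the request did not include an API key, but the format is the same.) The format above is called JSON, and Python's standard library includes the json module to read it easily: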

In [13]:
import json
data = json.loads(r.text)
In [14]:
data
Out[14]:
{'cod': 401,
 'message': 'Invalid API key. Please see http://openweathermap.org/faq#error401 for more info.'}
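
The 401 reply above says that the request lacked a valid API key. As a sketch, and assuming (as the OpenWeatherMap documentation describes) that the key is passed in an appid query parameter, the request URL can be assembled with the standard library. The function name and the placeholder key are hypothetical:

```python
from urllib.parse import urlencode

BASE_URL = 'http://api.openweathermap.org/data/2.5/weather'


def build_weather_url(city, api_key):
    """Return the request URL for a city, including the API key."""
    # urlencode escapes special characters such as the comma in the query.
    query = urlencode({'q': city, 'appid': api_key})
    return BASE_URL + '?' + query


# 'YOUR_API_KEY' is a placeholder, not a working key.
print(build_weather_url('Heidelberg,Germany', 'YOUR_API_KEY'))
```

The resulting string can be passed to requests.get. Alternatively, requests.get accepts a params dictionary and builds the query string itself.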

With a successful response, you should be able to do:

In [15]:
data[u'main'][u'temp']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-15-0bb31bcca6c0> in <module>()
----> 1 data[u'main'][u'temp']

KeyError: 'main'

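In the run recorded here this raises a KeyError, because the error reply has no 'main' entry. With a valid API key, the temperature is reported in K!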
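
Because an error reply has no 'main' entry, it is safer to check for it before indexing. The sketch below does that and converts kelvin to degrees Celsius; the function names and the sample dictionaries are made up for illustration and are not real API output:

```python
def kelvin_to_celsius(kelvin):
    """Convert a temperature from kelvin to degrees Celsius."""
    return kelvin - 273.15


def temperature_celsius(data):
    """Return the temperature in Celsius, or None if it is missing."""
    main = data.get('main')
    if main is None or 'temp' not in main:
        return None
    return kelvin_to_celsius(main['temp'])


# Made-up dictionaries shaped like a successful reply and an error reply.
ok_reply = {'main': {'temp': 293.15}}
error_reply = {'cod': 401, 'message': 'Invalid API key.'}

print(temperature_celsius(ok_reply))
print(temperature_celsius(error_reply))  # None
```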

Exercise

Over 2000 tiles of the Arctic ice coverage data are available at URLs of the form:

http://mpia.de/~robitaille/share/ice_data/YYYYMMDD.npy

Write a Python function that takes three arguments (the year, month, and day, as integers) and returns a NumPy array. If the map does not exist, return None instead of raising an error:

In [16]:
# your solution here

Try using the function to make a plot, as shown above:

In [17]:
# your solution here