<p>.py in the sky (http://astrofrog.github.com/)</p>
<h1>Stop writing code that will break on Python 4!</h1>
<p>Thomas Robitaille, 12 January 2016</p>
<p>With the end of support for Python 2 on the horizon
(<a href="https://www.python.org/dev/peps/pep-0373/">in 2020</a>), many package developers
have made their packages compatible with both Python 2 and Python 3 by using
constructs such as:</p>
<div class="highlight"><pre><span class="k">if</span> <span class="n">sys</span><span class="p">.</span><span class="n">version_info</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mi">2</span><span class="o">:</span>
    <span class="err">#</span> <span class="n">Python</span> <span class="mi">2</span> <span class="n">code</span>
<span class="nl">else:</span>
    <span class="err">#</span> <span class="n">Python</span> <span class="mi">3</span> <span class="n">code</span>
</pre></div>
<p>in places where things have changed between Python 2 and 3.</p>
<p>The <a href="https://pythonhosted.org/six/">six</a> package simplifies many of the
differences by providing wrappers that behave the same on Python 2 and 3. For
instance, iterating over the items (key/value pairs) of a dictionary is normally done with:</p>
<div class="highlight"><pre><span class="k">for</span> <span class="n">item</span> <span class="n">in</span> <span class="n">dictionary</span><span class="p">.</span><span class="n">iteritems</span><span class="p">()</span><span class="o">:</span>
    <span class="err">#</span> <span class="n">code</span> <span class="n">here</span>
</pre></div>
<p>in Python 2 and:</p>
<div class="highlight"><pre><span class="k">for</span> <span class="n">item</span> <span class="n">in</span> <span class="n">dictionary</span><span class="p">.</span><span class="n">items</span><span class="p">()</span><span class="o">:</span>
    <span class="err">#</span> <span class="n">code</span> <span class="n">here</span>
</pre></div>
<p>in Python 3. With <a href="https://pythonhosted.org/six/">six</a>, one can simply do:</p>
<div class="highlight"><pre><span class="n">import</span> <span class="n">six</span>
<span class="k">for</span> <span class="n">item</span> <span class="n">in</span> <span class="n">six</span><span class="p">.</span><span class="n">iteritems</span><span class="p">(</span><span class="n">dictionary</span><span class="p">)</span><span class="o">:</span>
    <span class="err">#</span> <span class="n">code</span> <span class="n">here</span>
</pre></div>
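<p>To make the idea of such a wrapper concrete, here is a minimal sketch of a helper in the spirit of <code>six.iteritems</code>. The name <code>iteritems_compat</code> is hypothetical and six's real implementation differs in its details; the point is only that the version check lives in one place, so the calling code is identical on both Pythons. Note that the helper tests for Python 2 and treats everything else as Python 3.</p>

```python
import sys

# Same test that six uses for its PY2 flag
PY2 = sys.version_info[0] == 2


def iteritems_compat(d):
    """Return an iterator over the (key, value) pairs of dictionary d.

    Hypothetical, simplified stand-in for six.iteritems (illustration only).
    """
    if PY2:
        return d.iteritems()  # lazy iterator on Python 2
    return iter(d.items())    # items() returns a lightweight view on Python 3+


for key, value in iteritems_compat({"a": 1, "b": 2}):
    print(key, value)
```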
<p>and this will work seamlessly both with Python 2 and 3. However, there are some
more complex cases where one has to resort to the type of <code>if</code> statement shown at
the top of this post. The <a href="https://pythonhosted.org/six/">six</a> package again
makes this slightly easier by providing <code>PY2</code> and <code>PY3</code> boolean constants:</p>
<div class="highlight"><pre><span class="k">if</span> <span class="n">six</span><span class="p">.</span><span class="n">PY2</span><span class="o">:</span>
    <span class="err">#</span> <span class="n">Python</span> <span class="mi">2</span> <span class="n">code</span>
<span class="nl">else:</span>
    <span class="err">#</span> <span class="n">Python</span> <span class="mi">3</span> <span class="n">code</span>
</pre></div>
<p>So far so good.</p>
<p>This brings me to the main point of this post. We don't
really know yet what Python 4 will look like, but we can be pretty sure that
the transition from Python 3 to Python 4 will be a lot smoother and will likely
<a href="https://opensource.com/life/14/9/why-python-4-wont-be-python-3">not be backward-incompatible in the same way as Python 3 was</a>. If that's
the case, we should be able to use packages developed for Python 2 and 3
seamlessly with Python 4. Right?...</p>
<p>Not quite! By searching on GitHub, I found almost 300,000 matches for the
following kind of syntax:</p>
<div class="highlight"><pre><span class="k">if</span> <span class="n">six</span><span class="p">.</span><span class="n">PY3</span><span class="o">:</span>
    <span class="err">#</span> <span class="n">Python</span> <span class="mi">3</span> <span class="n">code</span>
<span class="nl">else:</span>
    <span class="err">#</span> <span class="n">Python</span> <span class="mi">2</span> <span class="n">code</span>
</pre></div>
<p>See the problem? In <code>six</code>, <code>PY3</code> is defined as:</p>
<div class="highlight"><pre><span class="n">PY3</span> <span class="o">=</span> <span class="n">sys</span><span class="p">.</span><span class="n">version_info</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mi">3</span>
</pre></div>
<p>so once Python 4 is used, both <code>PY2</code> and <code>PY3</code> will be (correctly)
<code>False</code>, and the above <code>if</code> statement will execute the <code>else</code> branch, which contains the
Python 2 code. Oops!</p>
<p>Another example of problematic code is the following:</p>
<div class="highlight"><pre><span class="k">if</span> <span class="n">six</span><span class="p">.</span><span class="n">PY2</span><span class="o">:</span>
    <span class="err">#</span> <span class="n">Python</span> <span class="mi">2</span> <span class="n">code</span>
<span class="n">elif</span> <span class="n">six</span><span class="p">.</span><span class="n">PY3</span><span class="o">:</span>
    <span class="err">#</span> <span class="n">Python</span> <span class="mi">3</span> <span class="n">code</span>
</pre></div>
<p>In this case, no code will get executed on Python 4 at all!</p>
<p>To avoid this, it's critical that we avoid treating Python 3 as a special case
in these kinds of <code>if</code> statements and instead treat Python <strong>2</strong> as the
special case, and default to Python 3 code otherwise:</p>
<div class="highlight"><pre><span class="k">if</span> <span class="n">six</span><span class="p">.</span><span class="n">PY2</span><span class="o">:</span>
    <span class="err">#</span> <span class="n">Python</span> <span class="mi">2</span> <span class="n">code</span>
<span class="nl">else:</span>
    <span class="err">#</span> <span class="n">Python</span> <span class="mi">3</span> <span class="n">code</span>
</pre></div>
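<p>One way to convince yourself of this without a Python 4 interpreter is to simulate the checks. The sketch below uses hypothetical helper functions that take the major version as an argument instead of reading <code>sys.version_info</code>, but otherwise define <code>PY2</code> and <code>PY3</code> the same way six does, and compares the three patterns discussed above:</p>

```python
def py3_first(major):
    """Pattern: check PY3, fall back to Python 2 code otherwise."""
    PY3 = major == 3
    if PY3:
        return "Python 3 code"
    else:
        return "Python 2 code"


def if_elif(major):
    """Pattern: explicit PY2 and PY3 branches with no fallback."""
    PY2, PY3 = major == 2, major == 3
    if PY2:
        return "Python 2 code"
    elif PY3:
        return "Python 3 code"
    # No branch matched: nothing runs and the function returns None


def py2_first(major):
    """Recommended pattern: special-case Python 2, default to Python 3 code."""
    PY2 = major == 2
    if PY2:
        return "Python 2 code"
    else:
        return "Python 3 code"


for major in (2, 3, 4):
    print(major, py3_first(major), if_elif(major), py2_first(major))
```

<p>On a simulated Python 4, the first pattern runs the Python 2 branch, the <code>if</code>/<code>elif</code> pattern runs nothing at all, and only the pattern that special-cases Python 2 picks the Python 3 code.</p>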
<p>It's a small change, but it will save a lot of headaches down the road. So if
you develop a Python package, please check now to make sure that your code will
be Python 4-compatible!</p>
<p><strong>Update:</strong> of course, the same logic applies even if you are not using the six
package. If doing version comparisons using <code>sys.version_info</code>, make sure you
<strong>don't</strong> do:</p>
<div class="highlight"><pre><span class="k">if</span> <span class="n">sys</span><span class="p">.</span><span class="n">version_info</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mi">3</span><span class="o">:</span>
    <span class="err">#</span> <span class="n">Python</span> <span class="mi">3</span> <span class="n">code</span>
<span class="nl">else:</span>
    <span class="err">#</span> <span class="n">Python</span> <span class="mi">2</span> <span class="n">code</span>
</pre></div>
<p>Either swap the branches and check for Python 2 instead, or use
<code>&gt;=</code> (for example, <code>sys.version_info[0] &gt;= 3</code>) so that later major versions also take the Python 3 branch.</p>
<h1>And now for something completely different!</h1>
<p>Thomas Robitaille, 10 November 2015</p>
<p>I am excited to share that at the end of 2015 I will leave my
'traditional' academic position and will start a new chapter in my
professional life! During my time as a researcher, it has become clear
that what I enjoy most is finding new ways to do science, developing robust
and re-usable software, and helping and teaching others to do so. Throughout
my projects, I have constantly tried to promote good research and software
practices (such as <a href="https://github.com/hyperion-rt/paper-galaxy-rt-model">reproducible research</a>),
and create tools that could be used by others and are applicable beyond my
specific research area. In the last few years, I have also been incredibly
lucky to have been involved as one of the co-ordinators and lead developers
of the <a href="http://www.astropy.org">Astropy</a> project. My goal is now to transform
my passion for scientific software and open science into a full-time
job, rather than fitting it in between all the usual responsibilities of a
traditional academic job.</p>
<p>In January, I therefore plan to start working full time as a freelance
consultant! I hope to work on projects relating to scientific software
development, data analysis, as well as open science, and will continue to
provide training workshops for Python and scientific computing. I am
interested in focusing not only on projects in astronomy, but applying these
skills to projects in other fields of science, and eventually maybe even to
projects outside science. This is going to be an exciting transition that I
hope will open many new opportunities!</p>
<p>I will remain involved in the development of many of the
Python packages I have worked on in the last few years. In particular, I will
continue to help with the coordination and development of the <a href="http://www.astropy.org">Astropy</a> project, and I will continue to support many
Python packages that I have developed or co-developed over the years
(such as <a href="http://aplpy.github.io">APLpy</a>,
<a href="http://wcsaxes.readthedocs.org">WCSAxes</a>, <a href="http://www.glueviz.org">Glue</a>,
<a href="http://www.dendrograms.org">astrodendro</a>, and many more).</p>
<p>Until the end of the year, I will be wrapping up existing research
projects. After this, I will no longer invest a significant amount
of time into projects from a research point of view (but will of course be
open to opportunities to contribute to projects as a consultant on matters
relating to software development and data analysis). I will continue
to support the <a href="http://www.hyperion-rt.org/">Hyperion</a> radiative
transfer package and will be happy to help anyone interested in contributing
new features. I would love for more developers to get involved in this
project, so please get in touch if you are interested in helping!</p>
<p>I will save my thoughts about the future of software in astronomy for another time, in
particular about the lack of stable academic career paths for people
interested in <em>research</em> software development (as opposed to telescope and
instrumentation-related software development, which is at least supported by
some institutes and/or projects, although also not enough). For now, suffice it to say that any
change in this respect cannot happen fast enough from my point of view, and I
am very much looking forward to continuing to try and impact the way science
is done, from outside the traditionally followed path.</p>
<p><img alt="monty_python_foot" src="/images/foot.jpg" /></p>
<h1>Python 3 in Science: the great migration has begun!</h1>
<p>Thomas Robitaille, 9 May 2015</p>
<p>Back in 2012, <a href="http://astrofrog.github.io/blog/2013/01/13/what-python-installations-are-scientists-using/">I carried out a survey</a> to find out which Python, NumPy, and
SciPy versions scientists were using for their daily work, in order
to better understand which versions should be supported. The main finding was that a large fraction of people had
reasonably up-to-date Python installations, although virtually no one was
using Python 3 for daily work.</p>
<p>This year, I decided to repeat the experiment: last January
I advertised a survey which asked users to provide information
about their Python installation(s) for research/production work, as well as
more general information about their Python experience, which packages they
used regularly, why they were not using Python 3 if they were still using
Python 2, and so on.</p>
<p>There is a <em>lot</em> to be learned from this data, and there is no way that I can
cover all results in a single blog post, so instead I will focus only on a
few points in this post, and will write several more posts over the next
couple of weeks to highlight various other results.</p>
<p>For this post, I thought it would be fun to take a look specifically at what
Python versions users in the scientific Python community are using, and in
particular, the state of Python 3 adoption. I am making an anonymized
and cleaned-up version of the subset of the data used in this post in <a href="https://github.com/astrofrog/scientific-python-survey-2015">this</a> GitHub repository, and will add to the data over time with future blog posts.</p>
<h2>The survey</h2>
<p>Before I go ahead, just a few details about the survey. To start with, I
asked respondents to provide information about their primary Python
installation for research/production work, and gave them the option to provide
information about a second and third installation they use regularly, also
for research/production work (not for development). The information collected
about the different Python installations was:</p>
<ul>
<li>Their operating system</li>
<li>The full version numbers for Python, NumPy, SciPy, and Matplotlib</li>
<li>Their regularly used installation method/manager (e.g. pip, conda, etc.)</li>
</ul>
<p>In addition, I asked general questions (not specific to a given Python
installation), for example:</p>
<ul>
<li>What scientific Python packages do they use?</li>
<li>How long have they been using Python for?</li>
<li>If they are not using Python 3, why not?</li>
<li>How did they find out about the survey?</li>
<li>Did they take the 2012 survey?</li>
</ul>
<p>In total, there were 786 responses to the survey, far more than I had
anticipated, and more than twice the number of respondents in 2012 (313)! The
results below include 781 responses, because 5 of the responses were partially
invalid or problematic.</p>
<p>Let's now dive in and take a look at some of the results!</p>
<h2>Demographics</h2>
<p>We can start off by taking a quick look at the demographics of surveyed
users, and in particular the field of work:</p>
<p><img alt="field" src="/images/survey_plots/fields.svg" /></p>
<p>Note that respondents could select multiple fields, so the percentages add
up to more than 100%. In addition, I left out several important branches
(such as Biology/Bioinformatics, Computer Science, Mathematics) from the
original survey, but parsed these from the answers provided in the 'Other'
field. I did my best to advertise the survey beyond Astronomy, but there were
some very effective advertising channels for Astronomers (for instance, there
is a Facebook group with over 8,000 professional Astronomers, a significant
fraction of all Astronomers worldwide!) which explains the huge bias towards
Astronomy/Astrophysics. If your field is under-represented here, please leave
a comment below to let me know where I can advertise a similar survey next
time for maximum impact!</p>
<p>Another piece of information we have is how long people have been using Python for:</p>
<p><img alt="experience" src="/images/survey_plots/experience.svg" /></p>
<p>The bins are not all the same width, but this shows that we have a nice mix, ranging from very experienced to very new users. This information will come in handy in the next section :)</p>
<h2>Python versions</h2>
<p>Now let's get to the main point of this post, which is to look at what fraction of users are using different Python versions for their <em>primary</em> installation:</p>
<p><img alt="python versions" src="/images/survey_plots/python.svg" /></p>
<p>There are a few interesting things to notice here. Firstly, <strong>most users are using either Python 2.7 or 3.4</strong>, very few users are
using Python 2.6 or 3.3, and virtually no one uses Python 3.1 or 3.2. This
has clear implications for which Python versions package developers need to
support. Based on this, I would argue that the main versions that need to be supported are Python 2.7 and 3.4, as well as 3.3 (since it is not a negligible fraction of Python 3 users). <strong>Support for Python 2.6 as well as
3.1 and 3.2 can essentially be dropped.</strong></p>
<p>Secondly, over 17% of respondents use Python 3 as their <strong>primary</strong> Python
installation. In fact, if we
look at the exact statistics from the data, we find that 17.4% of users use
Python 3 as their primary installation, and 2.8% use it as a secondary
installation, which means that around 20.2% of users are now actively using
Python 3 on a regular basis. While this is a little low, remember that in the 2012 survey,
virtually no one used Python 3 as their primary installation, which is not too surprising because at the time, not all of the core Scientific Python packages were fully functional in Python 3. Now that all core packages support Python 3 fully, it's nice to see that we've gone from essentially 0% to 20% in only a couple of years.</p>
<p>Let's now take a quick look at operating systems:</p>
<p><img alt="os" src="/images/survey_plots/os.svg" /></p>
<p>The Linux/Mac split is not surprising, but this shows that almost 10% of
Scientific Python users are on Windows, which is not negligible. Thankfully,
services like <a href="http://www.appveyor.com/">AppVeyor</a> now make it easy to set up
continuous integration/testing for packages on Windows, so it's becoming
easier to support this community.</p>
<p>Now for an unexpected (at least for me) result relating to operating systems. The
following plot is normalized by rows (i.e. the sum of each row is 100%) to show, for each operating system, the
distribution of Python versions:</p>
<p><img alt="python vs os" src="/images/survey_plots/os_vs_python.svg" /></p>
<p>Yes, that's right, Windows users are the most up-to-date when it comes to Python
versions – almost 40% of Windows users are using Python 3! Mac users on the
other hand are the most conservative, with almost 90% sticking to Python 2.7.
At this point, I'm not sure what the difference is due to, but I'd be
interested in hearing your thoughts in the comments! There may be information in the
full dataset that can help us answer that – in particular, I have a suspicion
it could be related to installation methods for Python and default Python
versions available on Linux and MacOS X (whereas Windows users always have to
install Python themselves).</p>
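<p>Because some of these plots are normalized by rows and others by columns, it may help to spell out the difference. Here is a minimal sketch using made-up counts (not the survey data) for a small operating system vs Python version table: normalizing by rows answers "what fraction of Windows users use Python 3.4?", while normalizing by columns answers "what fraction of Python 3.4 users are on Windows?". The function names are hypothetical.</p>

```python
def normalize_rows(table):
    """Scale each row so that its entries sum to 100 (percent)."""
    result = {}
    for row, counts in table.items():
        total = sum(counts.values())
        result[row] = {col: 100.0 * n / total for col, n in counts.items()}
    return result


def normalize_columns(table):
    """Scale each column so that its entries sum to 100 (percent)."""
    totals = {}
    for counts in table.values():
        for col, n in counts.items():
            totals[col] = totals.get(col, 0) + n
    return {row: {col: 100.0 * n / totals[col] for col, n in counts.items()}
            for row, counts in table.items()}


# Made-up counts for illustration only (rows: operating system, columns: Python version)
counts = {
    "Linux": {"2.7": 40, "3.4": 10},
    "Windows": {"2.7": 7, "3.4": 3},
}

by_row = normalize_rows(counts)     # share of each OS's users on each version
by_col = normalize_columns(counts)  # share of each version's users on each OS
print(by_row["Windows"])  # {'2.7': 70.0, '3.4': 30.0}
```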
<p>So <strong>why</strong> do some users not use Python 3? Here's the breakdown of the main
reasons, for the 80% or so of users whose primary Python version is 2.6 or
2.7 (note that one user can select several of these answers):</p>
<p><img alt="python3" src="/images/survey_plots/why_not_python3.svg" /></p>
<p>Almost two thirds of users who are still using Python 2 do not have any
motivation to update to Python 3. This is essentially what Jake Vanderplas
<a href="https://jakevdp.github.io/blog/2013/01/03/will-scientists-ever-move-to-python-3/">wrote about</a> two years ago – to quote him in that blog post, <em>I'd do it myself
[switch to Python 3], but I'm too much of a pragmatist: python 2.7 is more
than sufficient for my own research</em>.</p>
<p>Now I can certainly understand this argument, and I always
find it difficult to give concrete features in Python 3 that will benefit
users directly – of course, idealistically, unicode support by default is
great because there's no reason that strings should be limited to the ASCII
alphabet, but for users who don't need this, it's a harder sell. There are
other features that exist in Python 3 and not Python 2, but personally, I
switched to using Python 3 for a very pragmatic reason, which is the
following: Python 3 is the future and we are going to have to switch to it
sooner or later – <strong>the more we put it off, the harder the transition will
be!</strong> To me, that is a good enough motivation to switch as soon as possible
now that the Scientific Python ecosystem supports this.</p>
<p>Whether or not you agree with me that this is a good enough reason, at the very least, there really is no reason we shouldn't be <em>teaching</em>
Python 3 by default. We can still tell new users about Python 2 in case they
encounter it, but only as an aside. So, are new users preferentially using Python 3? (Note: the following plot is normalized by
columns.)</p>
<p><img alt="python vs experience" src="/images/survey_plots/python_vs_experience.svg" /></p>
<p>Hmm, no... Actually, 6% of the newest users (&lt;1 year) are using Python 2.6
(a higher fraction than any other group!) and only 13% are using Python 3, <strong>less</strong>
than any other group. I suspect this is because new users just use whatever
Python versions are available and aren't yet aware of which versions are the
latest. Furthermore, I suspect most Python courses/workshops still use Python 2. But this is all wrong – we should be teaching new users to use Python 3! New users won't thank you if you teach them Python 2 and they have to migrate all their scripts to Python 3 in a few years... I would strongly encourage any
of you involved in teaching Python to switch now to using Python 3, <em>even if
you don't use it yourself</em> (I teach a <a href="http://mpia.de/~robitaille/PY4SCI_SS_2015">Python course</a> which uses Python 3.4 at the University of Heidelberg, and everything has gone very
smoothly!)</p>
<p>One final suggestion I have is that we start holding Python 3 'migration clinics', where we can help users convert their code to be Python 3-compatible, and help them get set up with Python 3, either as a secondary installation, or a primary one. We can also teach users how to write code that is Python 2 and 3-compatible, using e.g. the <code>__future__</code> imports, as well as libraries like <a href="https://pypi.python.org/pypi/six">six</a>.</p>
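<p>As a small illustration of the <code>__future__</code> approach, the following module opts in to Python 3 behaviour for division, <code>print</code>, imports and string literals. On Python 3 these imports are accepted but change nothing, so the same file behaves the same way on Python 2.7 and 3. The <code>from __future__</code> line must come before any other statement in the module (only a docstring and comments may precede it).</p>

```python
from __future__ import absolute_import, division, print_function, unicode_literals

# True division: 1.5 on both versions (plain Python 2 would give 1)
ratio = 3 / 2

# Explicit floor division when an integer result is really wanted
floored = 3 // 2

# A unicode string on both versions, thanks to unicode_literals
text = "abc"

# print is a function on both versions, thanks to print_function
print("ratio =", ratio, "floored =", floored)
```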
<h2>What can developers do?</h2>
<p>I think that as developers, we can do more to encourage Python 3 adoption. In my view, one of
the mistakes of the Python 3 transition was that Python
developers backported many new and very nice features of Python 3 to
Python 2, making Python 3 a harder sell. A couple of random examples of backported features include
<a href="https://docs.python.org/3/library/collections.html#collections.OrderedDict">ordered dictionaries</a> and <a href="https://docs.python.org/3.4/tutorial/datastructures.html#dictionaries">dictionary comprehensions</a>. However, the Python developers have now stated that <a href="https://www.python.org/dev/peps/pep-0404/">there will be no Python 2.8 release</a>. Essentially, no new features are going to be added to Python 2. In fact, after 2020 (which is not <em>so</em> far in the future), Python 2 will no longer be supported.</p>
<p>At some point in the near future, I feel that other package developers should follow this example by having only bugfix releases on Python 2, and releasing new major versions of packages that only support Python 3. To be clear, Python 2 users will still be able to use packages for as long as they like, and no features would be taken away, but in order to get the latest and greatest, they would need to upgrade to Python 3.</p>
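<p>For a package that takes this route, the Python 3-only release could refuse to import on Python 2 with a clear message pointing to the older series. The sketch below uses a hypothetical package name (<code>examplepkg</code>) and hypothetical version numbers; note that it tests for "older than Python 3" rather than "not Python 3", so it would keep working on a future Python 4.</p>

```python
import sys


def check_python_version(version_info=sys.version_info):
    """Raise ImportError on Python 2; accept Python 3 and anything newer."""
    # Test for "older than 3" rather than "not equal to 3", so that a
    # future Python 4 is still accepted.
    if version_info[0] < 3:
        raise ImportError(
            "examplepkg 2.x requires Python 3; "
            "Python 2 users can keep using the examplepkg 1.x bugfix releases"
        )


# In this sketch, the call would sit near the top of examplepkg/__init__.py
check_python_version()
```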
<h2>Take-away points</h2>
<p>We've only scratched the surface of the data from this survey, and already, we can see several interesting things:</p>
<ul>
<li>
<p>It's pretty safe to say that developers can now drop support for Python
2.6, 3.1, and 3.2. Python 3.4 is far more popular than Python 2.6, and so it's much more important to make sure we support the former than the latter.</p>
</li>
<li>
<p>We should not under-estimate the fraction of Windows users in the
Scientific Python community (almost 10%). Supporting these users is now
made easier by online continuous integration services running Windows.</p>
</li>
<li>
<p>Python 3 is catching on, with over 17% of people using it as their primary
installation, and over 20% if we include people who use it as a secondary
installation, compared to essentially 0% a couple of years ago.</p>
</li>
<li>
<p>The main reason for Python 2 users to not switch to Python 3 is the
lack of motivation/killer features. We therefore need to be more
proactive in encouraging people to switch to Python 3 by (a) better
advertising new features in Python 3 not available in Python 2, (b) making
sure that any new users are always directed to the latest Python 3 version,
and (c) releasing, in the near future, new major versions of packages for
Python 3 only, while maintaining long term bugfix support for Python 2
versions.</p>
</li>
</ul>
<p>Let me know if you have any thoughts on these results so far in the comment
section below, and stay tuned for future blog posts with more results!</p>
<p><strong>Update 1 (9 May 2015 at 19:15 UT):</strong> I have now included more fields of
research based on answers supplied in the 'Other' field. I have also reworded
my recommendation for dropping support for Python versions to make it clear
that I don't think we should drop support for 3.3.</p>
<p><strong>Update 2 (9 May 2015 at 11:30 UT):</strong> Updated the last bullet in the
take-away points to clarify that we should also make sure we better advertise
Python 3-only features. For instance, Python 3.5 will have a matrix
multiplication operator which will be very useful especially for Scientific
Python users. Also, updated the paragraph starting <em>Now I can certainly
understand this argument</em> to mention that of course there are other new
features besides default unicode in Python 3.</p>
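<p>To give a flavour of the matrix multiplication operator mentioned in Update 2 (introduced by PEP 465 in Python 3.5), here is a minimal sketch of a hypothetical, pure-Python <code>Matrix</code> class that supports <code>@</code> by defining <code>__matmul__</code>. With NumPy arrays (NumPy 1.10 or later), <code>a @ b</code> can be used directly in place of <code>np.dot(a, b)</code>.</p>

```python
class Matrix:
    """A tiny, illustrative 2D matrix type (not NumPy); needs Python 3.5+."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __matmul__(self, other):
        # Row-by-column product: result[i][j] = sum over k of a[i][k] * b[k][j]
        cols = list(zip(*other.rows))
        return Matrix([[sum(x * y for x, y in zip(row, col)) for col in cols]
                       for row in self.rows])


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[5, 6], [7, 8]])
c = a @ b
print(c.rows)  # [[19, 22], [43, 50]]
```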
<p><strong>Update 3 (10 May 2015 at 8:30 UT):</strong> If you are interested in discussing
the results in this post, there are discussion threads on
<a href="https://news.ycombinator.com/item?id=9517392">Hacker News</a> and
<a href="http://www.reddit.com/r/Python/comments/35ec96/python_3_in_science_the_great_migration_has_begun/">Reddit</a> in addition to the comments section below!</p>
<h1>The Acknowledgment Generator</h1>
<p>Thomas Robitaille, 7 December 2014</p>
<p>This week, the 6th installment of the <a href="http://dotastronomy.com/">.Astronomy</a>
conference series will be taking place in Chicago. I will unfortunately not be
attending this year, but I was nevertheless motivated today to try and finish
up a hack that started as a result of discussions with
<a href="https://twitter.com/nialldeacon">Niall Deacon</a> before and at
<a href="http://dotastronomy.com/events/five/">.Astronomy 5</a> in Boston!</p>
<p>The idea is simple: as I described in a <a href="http://astrofrog.github.io/blog/2013/10/02/acknowledging-tools-services-in-papers/">blog post</a>
last year, we are not doing a good job at acknowledging the tools
that we use for our research, which in turn means that many people who spend
time developing tools for the community are not getting the credit they deserve.
(How to give credit to people for non-traditional work in academia is a
recurring theme of .Astronomy meetings.)</p>
<p>Enter the <a href="http://astrofrog.github.io/acknowledgment-generator/">Acknowledgment Generator</a>,
a new tool that lets even the laziest of us generate, in just a few
clicks, the list of acknowledgments that should go at the end of a paper or on a
website! This includes options to show LaTeX-friendly output and BibTeX
references. At this point, this is a proof of concept, and only very few
examples of codes, facilities, or resources are included...</p>
<p>... and this is where I need <em>your</em> help: we can crowd-source the collection of
the information needed for the database of entries! The website and the
database of entries are kept in a
<a href="https://github.com/astrofrog/acknowledgment-generator">GitHub repository</a>
so anyone can go and make changes to the website, and add or modify entries.
Any contribution helps, and your name will of course be listed as a contributor :)</p>
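<p>The generator itself runs as a website, but the core idea is simple enough to sketch in a few lines of Python: look up the selected entries, join their acknowledgment sentences, optionally escape the characters that would break LaTeX, and collect the BibTeX records. The entry format, entry names, and function names below are hypothetical and are not taken from the repository, and the LaTeX escaping is deliberately simplified (it does not handle backslashes, braces, <code>~</code> or <code>^</code>).</p>

```python
# Characters that commonly break LaTeX, mapped to their escaped forms.
# Simplified: backslashes, braces, ~ and ^ are not handled here.
LATEX_SPECIAL = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}


def latex_escape(text):
    """Escape the LaTeX special characters listed in LATEX_SPECIAL."""
    return "".join(LATEX_SPECIAL.get(ch, ch) for ch in text)


def build_acknowledgments(selected, entries, latex=False):
    """Join the acknowledgment sentences of the selected entries."""
    paragraph = " ".join(entries[key]["text"] for key in selected)
    return latex_escape(paragraph) if latex else paragraph


def build_bibtex(selected, entries):
    """Collect the BibTeX records of the selected entries that have one."""
    return "\n\n".join(entries[key]["bibtex"] for key in selected
                       if entries[key].get("bibtex"))


# Hypothetical database entries (placeholders, not real acknowledgment texts)
entries = {
    "tool_a": {"text": "This research made use of Tool_A.",
               "bibtex": "@misc{tool_a, title = {Tool A}}"},
    "archive_b": {"text": "We thank Archive B & its staff.", "bibtex": ""},
}

print(build_acknowledgments(["tool_a", "archive_b"], entries, latex=True))
print(build_bibtex(["tool_a", "archive_b"], entries))
```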
<p>In fact, no <a href="http://git-scm.com/">git</a> expertise is required to help. Take a
look at the instructions I have written
<a href="https://github.com/astrofrog/acknowledgment-generator#i-want-to-help">here</a>
which describe how you can add new entries with just a few clicks, and also
give details of how you can send me entries if all else fails.
If you have ideas of how to make this website better, you can also comment on
this blog post, or even better, open an issue
<a href="https://github.com/astrofrog/acknowledgment-generator/issues">here</a>!</p>
<p>Finally, I would also very much welcome any help in implementing new features
into the website itself. I am still a JavaScript newbie, so there is a lot of
low-hanging fruit if you are interested in helping out! The full list of issues
is available <a href="https://github.com/astrofrog/acknowledgment-generator/issues">here</a>.</p>
<h1>Are we acknowledging tools and services enough in Astronomy papers?</h1>
<p>Thomas Robitaille, 2 October 2013</p>
<p>A couple of weeks ago, I attended the 5th
<a href="http://dotastronomy.com/">.Astronomy</a> meeting, which took place in Boston. For
anyone not familiar with this series of conferences, the aim is to bring
together researchers, developers, and educators/outreach specialists who
use or are interested in using the web as a tool for their work (I like to
think of it as an astro-hipster conference!).</p>
<p>One of the topics that comes up regularly at .Astronomy meetings is the
question of credit: how do we, as scientists, get credit for work that is not
considered 'traditional', such as (but not limited to) creating or contributing
to open source software, outreach activities, or refereeing?
<a href="http://twitter.com/sarahkendrew">Sarah Kendrew</a> already summarized the
discussions on this topic in <a href="http://sarahaskew.net/2013/10/01/astronomy-5-share-the-love/">her blog</a>, so I won't
repeat them here. However, given that I contribute to a number of open source
projects (such as <a href="http://www.astropy.org">Astropy</a>,
<a href="http://aplpy.github.io">APLpy</a>, and many others), this got me wondering
how often authors actually acknowledge the tools that they use in their papers.</p>
<p>I previously played around with the <a href="http://adsabs.harvard.edu/">NASA/ADS</a>
full-text search, but what I wanted was a way to be able to do this
automatically for any keyword/phrase, and be able to see the evolution of
acknowledgments over time. With the release of the <a href="https://github.com/adsabs/adsabs-dev-api">ADS developer API</a> (which
<a href="http://twitter.com/aaccomazzi">Alberto Accomazzi</a> presented on the Monday at
.Astronomy), I finally had the tool I needed to do this! This was a fun
post-dotastro hack, for which I now present the results below.</p>
<h2>The code</h2>
<p>Since not everyone reading this will be interested in the code I used to do
this, I have placed it in a separate IPython notebook <a href="http://nbviewer.ipython.org/urls/raw.github.com/astrofrog/mining_acknowledgments/master/Mining%2520acknowledgments%2520in%2520ADS.ipynb">that you can access here</a>. Please feel
free to fork and contribute to it!</p>
<h2>The results</h2>
<p>Let's start off by looking at how often <a href="http://adsabs.harvard.edu/">ADS</a>
itself is acknowledged. The suggested acknowledgment phrase includes
<em>Astrophysics Data System</em>, so we will search for that:</p>
<p><img alt="ADS" src="http://astrofrog.github.com/images/mining_ack/ads_final.png" /></p>
<p>This shows that more and more people are acknowledging ADS, but that even now
this represents less than 1% of all papers! Of course, many people, myself
included (until now), use ADS without thinking about acknowledging it, but this
illustrates to what extent we are under-acknowledging what we use to do our
research and write papers.</p>
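<p>Each point in these plots is a fraction: the number of papers matching the phrase in a given year, divided by the total number of papers published that year. As a minimal sketch of that step, assuming you have already retrieved both counts per year (for example with the notebook linked above), the hypothetical helper below turns them into percentages. The counts in the example are invented, not real ADS numbers.</p>

```python
def acknowledgment_fractions(matches, totals):
    """Return {year: percentage of papers that mention a phrase}.

    matches -- {year: number of papers whose full text contains the phrase}
    totals  -- {year: total number of papers published that year}
    Years with no papers in `totals` are skipped to avoid dividing by zero.
    """
    return {
        year: 100.0 * matches.get(year, 0) / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }


# Invented counts, for illustration only (not real ADS numbers).
matches = {2011: 120, 2012: 150}
totals = {2011: 20000, 2012: 21000}
print(acknowledgment_fractions(matches, totals))
```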
<p>Let's move on to common online databases, such as
<a href="http://simbad.u-strasbg.fr/simbad/">SIMBAD</a>,
<a href="http://vizier.u-strasbg.fr/viz-bin/VizieR">VizieR</a>, and
<a href="http://ned.ipac.caltech.edu">NED</a>:</p>
<p><img alt="online databases" src="http://astrofrog.github.com/images/mining_ack/databases_final.png" /></p>
<p>I want you to take a look at the y-axis scale. At most <strong>0.17%</strong> of refereed papers
currently acknowledge using SIMBAD. Now I know there are quite a few theorists
out there, but this is a little on the low side... As was the case for ADS,
it's encouraging to see that the fraction is increasing over time, but if we
extrapolate the increase since the year 2000, it will take another <strong>two
thousand years</strong> before 10% of papers acknowledge the use of SIMBAD (and I'm
sure the real value should be higher).</p>
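<p>The "two thousand years" figure is a straight-line extrapolation: fit a line to the fraction as a function of year, then work out when that line reaches the target. The sketch below does this with an ordinary least-squares fit, which is an assumption (the fit behind the estimate above may differ), and the data points are invented so that they end at the 0.17% quoted above.</p>

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def years_until(target, years, pct):
    """Years after the latest data point until the fitted line reaches target (%)."""
    slope, intercept = linear_fit(years, pct)
    if slope <= 0:
        return float("inf")  # a flat or falling trend never gets there
    return (target - intercept) / slope - max(years)


# Invented percentages rising by 0.005 percentage points per year.
years = [2000, 2004, 2008, 2012]
pct = [0.11, 0.13, 0.15, 0.17]
print(round(years_until(10.0, years, pct)))
```

<p>With these made-up points the line reaches 10% about 1,966 years after 2012. The result depends strongly on which years are included in the fit, so numbers like this are best read as an order-of-magnitude estimate.</p>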
<p>Moving on to programming languages:</p>
<p><img alt="programming languages" src="http://astrofrog.github.com/images/mining_ack/programming_final.png" /></p>
<p>The fractions are even smaller than the online databases above, although in all
fairness there is no requirement to acknowledge programming languages directly,
so I will not complain about this. What is interesting though are the trends.
IDL and Fortran both see a large drop in the fraction of acknowledgments this year,
while mentions of Python have increased sharply, from almost none around
2005 to more than any of the other languages shown here. While this is a poor
metric of which languages people actually use, the
uptake of Python over the last few years is very encouraging!</p>
<p>Finally, let's wrap up with a few common tools:</p>
<p><img alt="tools" src="http://astrofrog.github.com/images/mining_ack/tools_final.png" /></p>
<p>Again, the fractions are far too low compared to the real usage, but the trends
are very instructive. <a href="http://iraf.noao.edu/">IRAF</a> and
<a href="http://starlink.jach.hawaii.edu/starlink">Starlink</a> are now past their peak,
while
<a href="http://hea-www.harvard.edu/RD/ds9/site/Home.html">DS9</a>,
<a href="http://aladin.u-strasbg.fr/">Aladin</a>, and
<a href="http://www.star.bris.ac.uk/~mbt/topcat/">TOPCAT</a> are all on the rise!</p>
<h2>Take-home message</h2>
<p>Most of the services and tools I have shown results for above actually have
standard phrases that you can add to the acknowledgment section of your
latest paper, but it's clear that most papers are not following these
guidelines. This is a severe problem because for some of these projects,
funding may be dependent on the level of use, and for volunteers it may be the
only way they can get credit for their work.</p>
<p>People may ask where acknowledgments should stop - should we also acknowledge
LaTeX, Apple, or the use of Fourier transforms? Of course not. In my view, the
line should be drawn at the point where we think that these acknowledgments
matter and will make a difference to projects in our community. All of the
examples above are ones that should be acknowledged, and it is also crucial
that you think of acknowledging smaller software packages that you use,
especially if the developers have provided a standard phrase. Yes, your
acknowledgment section may become quite long, but this is not about esthetics -
it is something that may make a real difference to some of these projects.</p>
<p>Of course, you don't want to spend hours searching around for all the possible
acknowledgments on the web, but fear not! AstroBetter now hosts an <a href="http://www.astrobetter.com/wiki/tiki-index.php?page=Acknowledgements">acknowledgment wiki</a> on
which you will find many acknowledgments - this list is far from exhaustive, so
please add to it any acknowledgment you are aware of!</p>
<p>In the meantime, what do you think about the low fractions of acknowledgments?
How can we encourage more people in our community to fairly acknowledge the
tools and services we use?</p>Astropy: Google Summer of Code!2013-05-30T23:40:00+02:00Thomas Robitailletag:astrofrog.github.com,2013-05-30:blog/2013/05/30/astropy-google-summer-of-code/<p><img class="right" src="http://astrofrog.github.com/images/astropy_logo.png" title="astropy" alt="astropy"></p>
<p>As one of the co-ordinators of the <a href="http://www.astropy.org">Astropy</a> project, I am very happy to announce that two talented students will be joining the Astropy project as part of this year's <a href="http://www.google-melange.com/gsoc/homepage/google/gsoc2013">Google Summer of Code (GSoC)</a>!</p>
<p>For anyone not familiar with GSoC, it is a great program that allows students around the world to spend the summer contributing to an open source project (the students receive a stipend from Google for their work). Astropy is participating in GSoC as a sub-organization in the <a href="http://www.python.org/psf/">Python Software Foundation</a> organization.</p>
<p>The two students that will be working with us this summer are:</p>
<ul>
<li>
<p>Madhura Parikh, who will be working on the Astroquery affiliated package. Astroquery aims to provide a Python interface to many web services such as IRSA, SIMBAD, VizieR, and many others. Madhura will be refactoring Astroquery to unify the API, with the aim of a first stable release at the end of the summer.</p>
</li>
<li>
<p>Axel Donath, who will be working on significantly extending the capabilities of the Photutils affiliated package. Photutils aims to provide Python tools to perform aperture and PSF photometry, and the long-term goal is to integrate it into the core Astropy package. Axel will focus on developing the source detection and PSF-fitting functionality, which is currently missing.</p>
</li>
</ul>
<p>Competition for GSoC was tough this year, and there were a number of excellent applications to work with Astropy, so I want to thank all the students who applied to work with us!</p>
<p>The official GSoC mentors for Astropy are Tom Aldcroft, Adam Ginsburg, Wolfgang Kerzendorf, Adrian Price-Whelan, Erik Tollerud, and myself. Throughout the summer, Madhura and Axel will be communicating with the Astropy development team through the astropy-dev list, so if you are interested in either of the projects mentioned above, please feel free to get involved in the discussions! Madhura and Axel will also be blogging about their experience in GSoC (<a href="http://ping-vyom.blogspot.in/">Madhura's</a> and <a href="http://adonath.github.io/">Axel's</a> blogs).</p>How to conduct a full code review on GitHub2013-04-10T13:38:00+02:00Thomas Robitailletag:astrofrog.github.com,2013-04-10:blog/2013/04/10/how-to-conduct-a-full-code-review-on-github/<h2>Why we might want to do it</h2>
<p>I think it's fair to say I'm addicted to using
<a href="http://www.github.com">GitHub</a>. I've used it so much in the last couple of
years that I don't understand/remember how we got any serious collaborative
coding done before. In particular, the ability to comment on code
line-by-line, hold conversations, update pull requests, and merge
them with a single click is, in my mind, so much more rewarding and productive
than having to comment on a patch in an email discussion.</p>
<p>However, I occasionally want to do a full review of a package that someone
else has written, and comment on various parts of the code. While it is
possible to leave line-by-line comments on a commit-by-commit basis, GitHub
does not provide an official way to review the latest <em>full</em> version of a file
or package.</p>
<p>I can think of a few ways to conduct a full code review:</p>
<ol>
<li>
<p>Browse through the files, on GitHub or locally, and open new issues
for anything we would like to comment on, copying and pasting the relevant
code. Not ideal if we want to comment on 20-30 chunks of code or more!</p>
</li>
<li>
<p>Browse through the files on GitHub, and if we see a line we want to comment
on, we can go to the <em>Blame</em> tab, and then find the last commit that
modified that line, and comment on it. The issue with this is that we might
want to comment on a chunk of code that was the result of several commits, in
which case this method breaks down.</p>
</li>
<li>
<p>Leverage the <a href="https://help.github.com/articles/using-pull-requests">pull request</a>
interface, with a little git-<em>fu</em>, to conduct a proper full code review.
This is in my opinion the best approach, and in this post, I describe one
way to do this. There may be more elegant ways, so please let me know if you
have any suggestions!</p>
</li>
</ol>
<h2>How to do it</h2>
<p>Ideally, one could simply create an empty branch on GitHub, then set up a pull
request from <code>master</code> (or whatever branch you want to review) onto the empty
branch. However, as far as I can tell, you can't create completely empty
branches on GitHub - instead, we need our empty branch to have at least one
commit, which needs to match the first commit of the branch we want to review
(otherwise GitHub will complain that there is no common history).</p>
<p>So how we proceed depends on whether the first commit contains code that needs
to be reviewed, or if it is unimportant (for example, a lot of repositories
start with the addition of an empty README file).</p>
<h3>If the first commit is unimportant...</h3>
<p>... then the situation is fairly easy. You first need to find out the commit
hash for the first commit in the repository, which you can do with:</p>
<div class="highlight"><pre><span class="err">$</span> <span class="n">git</span> <span class="n">rev</span><span class="o">-</span><span class="n">list</span> <span class="o">--</span><span class="n">all</span> <span class="o">|</span> <span class="n">tail</span> <span class="o">-</span><span class="mi">1</span>
<span class="n">ec2287e5837386c54fbd082021530aa18c0dcf18</span>
</pre></div>
<p>In the example above the hash is <code>ec2287e5837386c54fbd082021530aa18c0dcf18</code>,
but this will be different for you. Now, create an empty branch containing
only that commit:</p>
<div class="highlight"><pre><span class="err">$</span> <span class="n">git</span> <span class="n">branch</span> <span class="n">empty</span> <span class="n">ec2287e5837386c54fbd082021530aa18c0dcf18</span>
</pre></div>
<p>This will create, but not switch to, the empty branch. Next, push your
<code>empty</code> branch to GitHub:</p>
<div class="highlight"><pre><span class="err">$</span> <span class="n">git</span> <span class="n">push</span> <span class="n">origin</span> <span class="n">empty</span>
</pre></div>
<p>Go to your repository on GitHub and click on the 'Pull Request' button at the
top right of the window:</p>
<p><img alt="pull request 1" src="http://astrofrog.github.com/images/code_review/pull_request_1.png" /></p>
<p>Then set it up so that you are pulling the changes from <code>master</code> into
<code>empty</code>, as follows:</p>
<p><img alt="pull request 2" src="http://astrofrog.github.com/images/code_review/pull_request_2.png" /></p>
<p>You can now enter a title and message for the pull request, and invite other
people to comment on the code. If you make changes to <code>master</code>, you can
simply push the changes to GitHub as usual:</p>
<div class="highlight"><pre><span class="err">$</span> <span class="n">git</span> <span class="n">push</span> <span class="n">origin</span> <span class="n">master</span>
</pre></div>
<p>which should cause the new commits to appear in the pull request. Once the
review is complete, you can just close the pull request (without merging), and
keep the empty branch for future reviews (or delete it).</p>
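<p>If you set up reviews often, the steps above can be scripted. The Python sketch below uses hypothetical helper names and only prints the commands unless you pass <code>dry_run=False</code>. One deliberate deviation: it finds the root commit with <code>git rev-list --max-parents=0 HEAD</code>, which lists the commits reachable from <code>HEAD</code> that have no parents, instead of taking the last line of <code>git rev-list --all</code>. In a typical repository with a single root commit both should give the same hash; the sketch assumes <code>master</code> is checked out.</p>

```python
import subprocess


def root_commit(repo="."):
    """Hash of the root (parentless) commit reachable from HEAD."""
    hashes = subprocess.run(
        ["git", "rev-list", "--max-parents=0", "HEAD"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout.split()
    if len(hashes) != 1:
        raise RuntimeError(f"expected one root commit, found {len(hashes)}")
    return hashes[0]


def review_commands(root_hash, branch="empty", remote="origin"):
    """The two commands from the post: create the branch, then push it."""
    return [
        ["git", "branch", branch, root_hash],
        ["git", "push", remote, branch],
    ]


def set_up_review(repo=".", dry_run=True):
    """Print (and, if dry_run is False, run) the commands for `repo`."""
    for cmd in review_commands(root_commit(repo)):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=repo, check=True)


# Example, run from inside a clone with master checked out:
# set_up_review(".", dry_run=True)
```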
<h3>If the first commit is important...</h3>
<p>... this makes things a little more complicated. The approach we'll take here
is to create two new branches - <code>review</code>, containing the code to review, and
<code>empty</code>, containing no files - both of which share a common, empty
first commit (which we will add). In this way, the two branches have a common
history, even though the <code>empty</code> branch has no files. We can then set up
a pull request from <code>review</code> to <code>empty</code>.</p>
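<p>To see why the extra commit makes this work, it can help to model the commit graph directly. GitHub displays a pull request as the changes from the point where the two branches' histories meet (their merge base) to the tip of the source branch, so the two branches need at least one ancestor in common. The toy model below uses made-up commit names and is only a sketch of the idea, not of how Git or GitHub actually store history.</p>

```python
def ancestors(graph, commit):
    """All commits reachable from `commit` by following parents (inclusive)."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def has_common_history(graph, a, b):
    """True if the two commits share at least one ancestor."""
    return bool(ancestors(graph, a) & ancestors(graph, b))


# Hypothetical history: each commit maps to its list of parents.
graph = {
    "m1": [],           # first commit on master (contains code to review)
    "m2": ["m1"],       # tip of master
    "e": [],            # empty commit made on the orphan branch
    "r": ["e", "m2"],   # merge of master into review
}
branches = {"master": "m2", "empty": "e", "review": "r"}

print(has_common_history(graph, branches["master"], branches["empty"]))  # False
print(has_common_history(graph, branches["review"], branches["empty"]))  # True
```

<p>Because the shared ancestor <code>e</code> contains no files, the pull request from <code>review</code> into <code>empty</code> shows every file on <code>master</code>, including those added by its first commit.</p>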
<p><strong>Important disclaimer</strong>: make sure that you make a backup of your repository,
and that there are no unsaved changes! If you follow these instructions, any
files that are not already in the repository <em>will</em> get deleted, as well as
any uncommitted changes! In fact, it might be safest to do this in a clean
clone of your repository, so that if anything goes wrong, you haven't affected
your usual work repository.</p>
<p>With that disclaimer in mind, go to the repository you want to review,
and create an empty branch that we will call <code>review</code>:</p>
<div class="highlight"><pre><span class="err">$</span> <span class="n">git</span> <span class="n">checkout</span> <span class="o">--</span><span class="n">orphan</span> <span class="n">review</span>
</pre></div>
<p>This branch has no history, but the files should still be there and would be
added to the branch if we were to commit. However, you don't want to do this,
so remove all the files in the repository in the current branch by first
unstaging all the files:</p>
<div class="highlight"><pre><span class="err">$</span> <span class="n">git</span> <span class="n">rm</span> <span class="o">-</span><span class="n">r</span> <span class="o">--</span><span class="n">cached</span> <span class="o">.</span>
</pre></div>
<p>then removing them all:</p>
<div class="highlight"><pre><span class="err">$</span> <span class="n">git</span> <span class="n">clean</span> <span class="o">-</span><span class="n">fxd</span>
</pre></div>
<p>Note that any file that was not previously part of the repository will be
deleted for good, not just from this branch!</p>
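<p>Because <code>git clean -fxd</code> cannot be undone, it may be worth previewing what it would delete first. The <code>-n</code> (<code>--dry-run</code>) option makes <code>git clean</code> print a <code>Would remove</code> line for each path instead of deleting anything. The small Python wrapper below is a hypothetical helper that collects those paths so you can check them; note that Git may quote paths containing unusual characters, which this simple parser does not undo.</p>

```python
import subprocess

PREFIX = "Would remove "


def parse_clean_dry_run(output):
    """Extract the paths from the output of `git clean -n`."""
    return [line[len(PREFIX):] for line in output.splitlines()
            if line.startswith(PREFIX)]


def files_git_clean_would_delete(repo="."):
    """Run `git clean -nxd` (dry run: nothing is deleted) and list the paths."""
    result = subprocess.run(
        ["git", "clean", "-nxd"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return parse_clean_dry_run(result.stdout)
```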
<p>You should now have a nice and empty branch:</p>
<div class="highlight"><pre><span class="err">$</span> <span class="n">git</span> <span class="n">log</span>
<span class="nl">fatal:</span> <span class="n">bad</span> <span class="k">default</span> <span class="n">revision</span> <span class="err">'</span><span class="n">HEAD</span><span class="err">'</span>
<span class="err">$</span> <span class="n">git</span> <span class="n">status</span>
<span class="cp"># On branch review</span>
<span class="cp">#</span>
<span class="cp"># Initial commit</span>
<span class="cp">#</span>
<span class="n">nothing</span> <span class="n">to</span> <span class="n">commit</span> <span class="p">(</span><span class="n">create</span><span class="o">/</span><span class="n">copy</span> <span class="n">files</span> <span class="n">and</span> <span class="n">use</span> <span class="s">"git add"</span> <span class="n">to</span> <span class="n">track</span><span class="p">)</span>
</pre></div>
<p>You are now ready to set up the review. You should first add a dummy commit
that contains no files:</p>
<div class="highlight"><pre><span class="err">$</span> <span class="n">git</span> <span class="n">commit</span> <span class="o">--</span><span class="n">allow</span><span class="o">-</span><span class="n">empty</span> <span class="o">-</span><span class="n">m</span> <span class="s">"Start of the review"</span>
</pre></div>
<p>Then create a new branch called <code>empty</code> that will contain only this commit:</p>
<div class="highlight"><pre><span class="err">$</span> <span class="n">git</span> <span class="n">branch</span> <span class="n">empty</span>
</pre></div>
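<p>This will create a branch with the same empty commit, but you will stay on the
<code>review</code> branch. You can now merge the changes from the branch we
actually want to review, say <code>master</code>, into <code>review</code>. Since the two branches
do not share any history yet, Git 2.9 and later refuse this merge unless you pass
<code>--allow-unrelated-histories</code>:</p>
<div class="highlight"><pre><span class="err">$</span> <span class="n">git</span> <span class="n">merge</span> <span class="o">--</span><span class="n">allow</span><span class="o">-</span><span class="n">unrelated</span><span class="o">-</span><span class="n">histories</span> <span class="n">master</span>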
</pre></div>
<p>You will be asked to provide a merge commit message; you can just keep
the default. Next, push your <code>review</code> and <code>empty</code> branches to GitHub:</p>
<div class="highlight"><pre><span class="err">$</span> <span class="n">git</span> <span class="n">push</span> <span class="n">origin</span> <span class="n">review</span>
<span class="err">$</span> <span class="n">git</span> <span class="n">push</span> <span class="n">origin</span> <span class="n">empty</span>
</pre></div>
<p>Go to your repository on GitHub and click on the 'Pull Request' button at the
top right of the window:</p>
<p><img alt="pull request 1" src="http://astrofrog.github.com/images/code_review/pull_request_1.png" /></p>
<p>Then set it up so that you are pulling the changes from <code>review</code> into
<code>empty</code>, as follows:</p>
<p><img alt="pull request 3" src="http://astrofrog.github.com/images/code_review/pull_request_3.png" /></p>
<p>You can now enter a title and message for the pull request, and invite other
people to comment on the code. Make sure that you switch back to your
<code>master</code> (or other) branch to implement the changes, and if you then want to
update the review pull request, you can switch back to <code>review</code> and merge
the latest changes from <code>master</code>:</p>
<div class="highlight"><pre><span class="err">$</span> <span class="n">git</span> <span class="n">checkout</span> <span class="n">review</span>
<span class="err">$</span> <span class="n">git</span> <span class="n">merge</span> <span class="n">master</span>
<span class="err">$</span> <span class="n">git</span> <span class="n">push</span> <span class="n">origin</span> <span class="n">review</span>
</pre></div>
<p>which should cause the new commits to appear in the pull request.</p>
<h2>Epilogue</h2>
<p>As you can see, if the first commit in your repository is unimportant, things
are actually pretty straightforward. I'd love to hear if anyone has a better
way to deal with the case where we want to review all commits, including the
first. Finally, if any GitHub employees are reading this - please make it
easier for people to conduct full reviews! :)</p>What Python installations are scientists using?2013-01-13T10:10:00+01:00Thomas Robitailletag:astrofrog.github.com,2013-01-13:blog/2013/01/13/what-python-installations-are-scientists-using/<p>Back in November 2012, I
<a href="https://twitter.com/astrofrog/status/269743084215103488">asked</a> Python
users in science to fill out a survey to find out what <a href="http://www.python.org">Python</a>, <a href="http://www.numpy.org">Numpy</a>, and
<a href="http://www.scipy.org">Scipy</a> versions they were using, and how they maintain their installation. My motivation for this was to collect quantitative
information to inform discussions amongst developers regarding which versions
to support, because those discussions are usually based only on guessing and
personal experience. In particular, there has been some discussion in the
<a href="http://www.astropy.org">Astropy</a> project regarding whether we should drop
support for Numpy 1.4, but we had no quantitative information about whether
this would affect many users (which motivated this study).</p>
<p>In this post, I'll give an overview of the results, as well as access to the
(anonymized) raw data. First, I should mention that given my area of research
and networks, the only community for which I obtained a significant number of
responses is Astronomers, so the results I present here only include these
(though I also provide the raw data for the remaining users for anyone interested).</p>
<p>Before I show the results, I just want to make it clear that I am not claiming
that the results are a representative sample of all Python users. I advertised the
poll via Twitter, a couple of Python mailing lists, and the Facebook group for
Astronomers. The survey was announced on different days on Twitter and
Facebook, so there may be some useful information about the typical Python
installations of Twitter vs Facebook users buried in the data that I won't
cover here. If anyone is interested in when the announcements were made, to
correlate with response peaks in the data, please let me know!</p>
<p>With that out of the way... let's look at the results!</p>
<h2>Overview</h2>
<p>First, some general stats - there were 313 responses in total, of which 244
were related to Astronomy (where I use the term in the broadest sense,
including solar physics, planetary science, astrophysics, and cosmology). The
responses were recorded between November 17th 2012 and December 2nd 2012 (at
which point the rate of responses had gone down to less than one a day).</p>
<h2>Python Versions</h2>
<p><img alt="python versions" src="http://astrofrog.github.com/images/python_versions.png" /></p>
<p>As shown above, an overwhelming 80% of Astronomers use Python 2.7, and almost
15% use Python 2.6. Almost no-one uses Python 3.x for production work yet,
which is not surprising, given that at the time of the poll, stable Python 3
versions were not yet available for all the crucial packages in a scientific
Python stack (in particular, Matplotlib only made its first Python 3.x compatible
release in December). It will be interesting to see how this fraction changes over the
next year (more on that in future blog posts).</p>
<h2>Numpy Versions</h2>
<p><img alt="numpy versions" src="http://astrofrog.github.com/images/numpy_versions.png" /></p>
<p>In the above plot, <em>dev</em> includes anything that is a developer version more
recent than the 1.6.2 release (which was the latest stable release at the time
of the poll). The distribution is again significantly peaked, with almost 80%
of respondents using Numpy 1.6.x. There is more of a spread in the remaining
versions compared with the Python versions, but the vast majority of people
are using Numpy 1.5.x or more recent.</p>
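<p>The version plots group responses by major and minor version, with anything newer than the latest stable release counted as <em>dev</em>. A sketch of that bucketing, assuming each response is a plain version string, is below; the helper names and sample responses are invented, and the survey scripts may have handled edge cases differently.</p>

```python
import re
from collections import Counter


def version_tuple(version):
    """Leading numeric part of a version string: '1.7.0.dev-3f45' -> (1, 7, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def bucket(version, latest_stable):
    """Map a version to 'X.Y.x', or to 'dev' if it is newer than latest_stable."""
    numbers = version_tuple(version)
    if numbers > version_tuple(latest_stable):
        return "dev"
    if len(numbers) < 2:
        raise ValueError(f"need at least major.minor: {version!r}")
    return f"{numbers[0]}.{numbers[1]}.x"


def percentages(responses, latest_stable):
    """Percentage of responses falling in each bucket, most common first."""
    counts = Counter(bucket(v, latest_stable) for v in responses)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.most_common()}


# Invented responses, for illustration only.
responses = ["1.6.2", "1.6.1", "1.5.1", "1.7.0.dev-3f45", "1.6.2", "1.4.1"]
print(percentages(responses, latest_stable="1.6.2"))
```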
<h2>Scipy Versions</h2>
<p><img alt="scipy versions" src="http://astrofrog.github.com/images/scipy_versions.png" /></p>
<p>In the above plot, <em>dev</em> includes anything that is a developer version more
recent than the stable 0.11 release (which was the latest stable release at
the time of the poll). Unlike the Python and Numpy versions, which are almost
exclusively dominated by two versions, the Scipy versions show a larger
spread, with the most popular version, 0.10.x, representing less than 45% of
users.</p>
<p>I originally thought that Scipy released more often than Numpy, and that this would
explain the difference, but it seems that both projects have been releasing at
a reasonably similar rate (see
<a href="http://sourceforge.net/projects/numpy/files/NumPy/">here</a> and
<a href="http://sourceforge.net/projects/scipy/files/scipy/">here</a>). Perhaps this
is due to package managers, or simply to the fact that Numpy is used
more often than Scipy, so users are more likely to run into bugs
and update to the latest stable version. I have to admit that I would not even
be able to tell without checking what Scipy version I am using, whereas I know
I'm using Numpy 1.6.2 for production work.</p>
<h2>Installation</h2>
<p>We now get to some very interesting statistics - how users install Python and
dependencies. While Python is awesome in many respects, installation is
probably the biggest hurdle that users have to clear to get started.</p>
<p><img alt="installation methods" src="http://astrofrog.github.com/images/install_methods.png" /></p>
<p>I'm not sure if anyone's quantitatively looked at this before, but this was
the first time that I really got a sense for all the different ways that one
can maintain a Python installation, and which methods are the most popular. The options shown above are described below:</p>
<ul>
<li>
<p><em>Linux Manager</em> means Linux package managers (<code>apt-get</code>, <code>yum</code>, etc.).</p>
</li>
<li>
<p><em>Source</em> means an installation from the source code, either by
downloading the source code and running <code>python setup.py install</code>, or by using
<code>pip install</code> or <code>easy_install</code>.</p>
</li>
<li>
<p><em>EPD</em> stands for the
<a href="http://www.enthought.com/products/epd.php">Enthought Python Distribution</a>,
a scientific Python bundle that includes e.g. Numpy, Scipy,
Matplotlib, and many other packages. It is free for users at academic
institutions.</p>
</li>
<li>
<p><a href="http://www.macports.org"><em>MacPorts</em></a> is one of the most widely used package
managers on Mac, and I have provided instructions for getting set up with
Python and MacPorts <a href="http://astrofrog.github.com/macports-python/">here</a>.</p>
</li>
<li>
<p><em>Official Installers</em> refers to the MacOS X disk images, Linux RPMs, and
Windows installers that are provided by some projects (including Python
itself, Numpy, and Scipy).</p>
</li>
<li>
<p><em>Admins</em> means that Python and the packages were installed by System Administrators.</p>
</li>
<li>
<p><a href="http://www.eso.org/sci/software/scisoft/"><em>SciSoft</em></a> and <a href="http://www.stsci.edu/institute/software_hardware/pyraf/stsci_python/current/stsci-python-download"><em>STScI Python</em></a> are two Astronomy-specific software bundles.</p>
</li>
<li>
<p><a href="http://www.activestate.com/activepython"><em>ActivePython</em></a> is similar to
EPD, but binary packages are downloaded on the fly as needed.</p>
</li>
</ul>
<p>Of course, some of these are not orthogonal, because for example
<code>easy_install</code> can be used to install additional packages not in EPD. But
the responses from the survey refer to how the main packages (Python, Numpy,
and Scipy) were installed.</p>
<p>What can we take away from the results?</p>
<ul>
<li>
<p>If we combine the Linux Package Managers and MacPorts (one of the Mac
Package Managers) into a more general <em>Package Managers</em> category, this
amounts to around 40% of users, the single largest group.</p>
</li>
<li>
<p>Only a small fraction of people use the official binary installers, with
many more people installing from source. This was surprising to me, given
how quick and easy it is to install Python, Numpy, Scipy, and Matplotlib using
the official installers. I suspect this is because this installation procedure
is not well documented, and is platform dependent.</p>
</li>
<li>
<p>Astronomy-specific bundles (SciSoft and STScI Python) are not as widely
used, which suggests that more effort should be put into getting packages into
existing package managers than into building new software bundles.</p>
</li>
<li>
<p>A small fraction (around 7%) have no idea how they installed Python and
other packages, so they may run into issues when they try to upgrade in
the future. If you install Python for someone, please explain to them what you
are doing and how they can update packages in the future!</p>
</li>
</ul>
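<p>Combining categories, as in the first point above, simply means summing counts under a coarser label. In the sketch below, only the grouping of Linux package managers and MacPorts comes from the text; the helper names are hypothetical and the counts are invented, not the survey results.</p>

```python
from collections import Counter

# Coarser labels for some methods; anything not listed keeps its own name.
GROUPS = {
    "Linux Manager": "Package Managers",
    "MacPorts": "Package Managers",
}


def regroup(counts, groups=GROUPS):
    """Sum the counts of methods that map to the same coarser category."""
    merged = Counter()
    for method, n in counts.items():
        merged[groups.get(method, method)] += n
    return merged


def shares(counts):
    """Convert counts to percentages of the total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Invented counts, for illustration only.
raw = Counter({"Linux Manager": 30, "MacPorts": 20, "Source": 25,
               "EPD": 15, "Official Installers": 10})
print(shares(regroup(raw)))
```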
<p>I personally feel that we should encourage users to install Python and
whatever dependencies are available from package managers. Of course, in some
cases users don't have root access, but this generally means that they have
sysadmins, so in those cases, the best option is still for the sysadmins to
install the main Python packages via package managers.</p>
<h2>Summary</h2>
<p>To me, one of the most interesting results is that a large number of people
have a reasonably up-to-date installation, with Python 2.7 and Numpy 1.6.x,
and I imagine that the Python 2.7 peak is here to stay, given that the
transition to Python 3 will be slow.</p>
<p>For developers, supporting only Python 2.6 and above seems like a sensible
choice at this stage (a decision we made within Astropy), and given the
imminent release of Numpy 1.7.0, I think that developers can start thinking
about dropping support for Numpy 1.4 in the near future. For Scipy, things are
a little more difficult, given the broad spread of versions, so developers
should make sure they know which versions they are implicitly supporting, and
check which versions users have installed.</p>
<p>In terms of installation method, I think it's very important to ensure that
packages are included in package managers. Even if it is easy to install
packages via <code>pip</code> or <code>easy_install</code> in some cases, putting packages in
package managers makes users more likely to stay up-to-date with the
most recent versions.</p>
<p>There is more information still contained in the data than I covered here (for
example, some of the above points can be correlated - do the people who do not
know how they installed Python correlate with the older versions?). For anyone
who is interested in looking at the data, I've placed the files and the
scripts I used to make the above plots in a GitHub repository
<a href="https://github.com/astrofrog/python-versions-survey">here</a>.</p>
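<p>As an example of the kind of correlation mentioned above, cross-tabulating installation method against Numpy version only needs a counter per row. The helper names, field names, and rows below are hypothetical; the columns in the survey files may be named differently.</p>

```python
from collections import Counter, defaultdict


def crosstab(rows, row_key, col_key):
    """Count responses for every (row value, column value) pair."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[row_key]][row[col_key]] += 1
    return table


def share(table, row_value, col_values):
    """Percentage of the responses in an existing row that fall in col_values."""
    counts = table[row_value]
    total = sum(counts.values())
    return 100.0 * sum(counts[v] for v in col_values) / total


# Hypothetical survey rows (field names are assumptions).
rows = [
    {"install": "Don't know", "numpy": "1.4.x"},
    {"install": "Don't know", "numpy": "1.6.x"},
    {"install": "Source", "numpy": "1.6.x"},
    {"install": "Source", "numpy": "1.6.x"},
]
table = crosstab(rows, "install", "numpy")
for method, counts in sorted(table.items()):
    print(method, dict(counts))
```

<p>Comparing the share of older versions between rows (for example, <code>share(table, "Don't know", ["1.4.x"])</code> against the same call for <code>"Source"</code>) gives a first look at the question, although with only a few hundred responses any difference should be treated with caution.</p>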
<p>If you have any thoughts about the results, or find anything interesting in
the raw data, please leave a comment!</p>