I am excited to share that at the end of 2015 I will leave my 'traditional' academic position and will start a new chapter in my professional life! During my time as a researcher, it has become clear that what I enjoy most is finding new ways to do science, developing robust and re-usable software, and helping and teaching others to do so. Throughout my projects, I have constantly tried to promote good research and software practices (such as reproducible research), and create tools that could be used by others and are applicable beyond my specific research area. In the last few years, I have also been incredibly lucky to have been involved as one of the co-ordinators and lead developers of the Astropy project. My goal is now to transform my passion for scientific software and open science into a full-time job, rather than fitting it in between all the usual responsibilities of a traditional academic job.
Back in 2012, I carried out a survey to find out which Python, NumPy, and SciPy versions scientists were using for their daily work, in order to better understand which versions should be supported. The main finding was that a large fraction of people had reasonably up-to-date Python installations, although virtually no-one was using Python 3 for daily work.
This year, I decided to repeat the experiment: last January I advertised a survey which asked users to provide information about their Python installation(s) for research/production work, as well as more general information about their Python experience, which packages they used regularly, why they were not using Python 3 if they were still on Python 2, and so on.
There is a lot to be learned from this data, and there is no way that I can cover all results in a single blog post, so instead I will focus only on a few points in this post, and will write several more posts over the next couple of weeks to highlight various other results.
For this post, I thought it would be fun to take a look specifically at what Python versions users in the scientific Python community are using, and in particular, the state of Python 3 adoption. I am making an anonymized and cleaned-up version of the subset of the data used in this post available in this GitHub repository, and will add to the data over time with future blog posts.
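To give a flavor of the kind of summary involved, here is a minimal sketch that computes the Python 3 share from survey responses. The column name (`python_version`) and the sample rows are hypothetical illustrations, not the actual schema of the data in the repository:

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the rough shape of the cleaned-up survey data
# (the real column names in the GitHub repository may differ).
sample = """python_version
2.7
2.7
3.4
2.7
3.3
"""

versions = [row["python_version"] for row in csv.DictReader(io.StringIO(sample))]

# Tally responses by major version to measure Python 3 adoption
counts = Counter(v.split(".")[0] for v in versions)
py3_fraction = counts["3"] / len(versions)
print(f"Python 3 share: {py3_fraction:.0%}")  # → Python 3 share: 40%
```

The same tally, grouped instead by full version string, gives the more detailed breakdown discussed in the rest of the post.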
This week, the 6th installment of the .Astronomy conference series will be taking place in Chicago. I will unfortunately not be attending this year, but I was nevertheless motivated today to try and finish up a hack that started as a result of discussions with Niall Deacon before and at .Astronomy 5 in Boston!
The idea is simple: as I described in a blog post last year, we are not doing a good job of acknowledging the tools that we use for our research, which in turn means that many people who spend time developing tools for the community are not getting the credit they deserve. (How to give credit to people for non-traditional work in academia is a recurring theme of .Astronomy meetings.)
A couple of weeks ago, I attended the 5th .Astronomy meeting, which took place in Boston. For anyone not familiar with this series of conferences, the aim is to bring together researchers, developers, and educators/outreach specialists who use or are interested in using the web as a tool for their work (I like to think of it as an astro-hipster conference!).
One of the topics that comes up regularly at .Astronomy meetings is the question of credit: how do we, as scientists, get credit for work that is not considered 'traditional', such as (but not limited to) creating or contributing to open source software, outreach activities, or refereeing? Sarah Kendrew already summarized the discussions on this topic in her blog, so I won't repeat them here. However, given that I contribute to a number of open source projects (such as Astropy, APLpy, and many others), this got me wondering: how often do authors actually acknowledge the tools they use in their papers?
I previously played around with the NASA/ADS full-text search, but what I wanted was a way to do this automatically for any keyword/phrase, and to be able to see the evolution of acknowledgments over time. With the release of the ADS developer API (which Alberto Accomazzi presented on the Monday at .Astronomy), I finally had the tool I needed! This was a fun post-dotastro hack, and I present the results below.
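As a rough illustration of the kind of query involved, here is a minimal sketch against the ADS search API. The endpoint, parameters, and response fields reflect my understanding of the API rather than anything stated above, and the token is a placeholder for a personal ADS API key:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed ADS search endpoint (requires a personal API token)
ADS_URL = "https://api.adsabs.harvard.edu/v1/search/query"

def build_query(phrase, year):
    # Restrict a full-text search to a single publication year;
    # rows=0 because we only need the total count, not the records.
    return {"q": f'full:"{phrase}" year:{year}', "rows": 0}

def count_mentions(phrase, year, token):
    # Number of papers published in `year` whose full text contains `phrase`
    url = ADS_URL + "?" + urlencode(build_query(phrase, year))
    request = Request(url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(request) as response:
        return json.load(response)["response"]["numFound"]

# Example usage (needs a valid token):
# trend = {year: count_mentions("astropy", year, token)
#          for year in range(2012, 2016)}
```

Repeating the count for each year, and normalizing by the total number of papers per year, gives the evolution of acknowledgments over time.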
For anyone not familiar with GSoC, it is a great program that allows students around the world to spend the summer contributing to an open source project (the students receive a stipend from Google for their work). Astropy is participating in GSoC as a sub-organization in the Python Software Foundation organization.
Why we might want to do it
I think it's fair to say I'm addicted to using GitHub. I've used it so much in the last couple of years that I don't understand/remember how we got any serious collaborative coding done before. In particular, the ability to comment on code line-by-line, having conversations, updating the pull requests, and merging them with a single click is in my mind so much more rewarding and productive than having to comment on a patch in an email discussion.
However, I occasionally want to do a full review of a package that someone else has written, and comment on various parts of the code. While it is possible to leave line-by-line comments on a commit-by-commit basis, GitHub does not provide an official way to review the latest full version of a file or package.
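One workaround, sketched below purely as an illustration (the branch names are my own examples, and this is not an official GitHub feature), is to open a pull request against an empty orphan branch: every file then appears as newly added, so the whole package becomes a single reviewable diff. Demonstrated here in a throwaway repository:

```shell
# Set up a throwaway repository with some code on a 'code' branch
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git checkout -q -b code
echo "print('hello')" > analysis.py
git add analysis.py
git commit -qm "Code to be reviewed"

# Create an orphan branch containing no files at all
git checkout -q --orphan review
git rm -rfq .
git commit -q --allow-empty -m "Empty branch for whole-package review"

# On GitHub: push both branches and open a pull request with
# base=review and compare=code; every file shows up as added,
# so line-by-line comments can cover the entire package.
git diff --stat review code   # locally, the full package is one diff
```

The price of this trick is that the pull request should never actually be merged; it exists only as a venue for the review comments.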
Back in November 2012, I asked Python users in science to fill out a survey to find out what Python, NumPy, and SciPy versions they were using, and how they maintain their installation. My motivation for this was to collect quantitative information to inform discussions amongst developers regarding which versions to support, because those discussions are usually based only on guessing and personal experience. In particular, there has been some discussion in the Astropy project regarding whether we should drop support for NumPy 1.4, but we had no quantitative information about whether this would affect many users (which motivated this study).
In this post, I'll give an overview of the results, as well as access to the (anonymized) raw data. First, I should mention that given my area of research and networks, the only community from which I obtained significant data was astronomers, so the results I present here include only those responses (though I also provide the raw data for the remaining users for anyone interested).