In response to comments on my post about MTEST that said, “People should use open source tools instead,” I can only reply, “If some people can’t afford cheap, what makes you think that everyone can afford free?” I’m a big fan of Python, NumPy, SciPy, matplotlib, and everything else, but they have their costs too — costs that scientists with deadlines often can’t afford to pay. If someone wants to give me a grant to go and measure how productive different kinds of scientists are using different combinations of open and closed source software, you know how to reach me…
Come on, Greg, you know this is a gross simplification. I actually believe that you are trolling on purpose here. Yes, there are plenty of problems with Python, SciPy, … But the alternatives have problems too.
For instance, quickly rolling out code written without tests or a distribution mechanism yields unreproducible results and, as a consequence, carries a huge cost. Unless the journals and funding agencies don’t care about reproducibility, which is unfortunately often the case. However, I wonder why we call these practices science.
Not to say that commercial products are not rapidly improving in these regards, but it is unfair to claim that the hidden cost with the various open source tools is always larger than the one with market-leading commercial solutions.
I didn’t say that the hidden cost of open source is always larger than that of “market-leading commercial solutions”. The fact is, I don’t know, and I don’t believe anyone else does either: we simply don’t have data on how much time scientists (and other people) spend wrestling with installation, configuration, poor documentation, bugs, etc., in either category of software versus how much time they spend using it to do things they consider productive.
What happens when you move to a university or dept. that doesn’t have a MATLAB license? I try to avoid depending on software that I won’t have access to in the future.
On my operating systems (Mac OS & Ubuntu), Python is much easier to install than any of the closed-source alternatives (even with a site license).
I have no direct experience with Windows, but none of my students complained about installing Python after I pointed them to distributions like Enthought [which is a commercial product, btw] or Python(x,y).
I agree completely that getting something for $0 is not always preferable, except that my main point has little to do with money. For the purpose of science, I’m far more interested in the “open” part than the “without cost” part. If we can reduce financial barriers to entry (for people who have little money but lots of time to invest), wonderful, but I’m also hoping that more companies like Enthought pop up to provide training, support, packaging, and custom industrial development based around open source scientific tools. As I understand it, they’re doing quite a lot of business, and my limited experience with EPD has been fantastic; I imagine it will only get better.
I’m not debating that MATLAB fills a niche, and fills it pretty well. Nor am I against paying for software, or using it if it’s closed source (I’m considering shelling out for a licensed copy of the Intel Math Kernel Library to link against NumPy). What I am saying is that the more of the code that drives scientific analyses that anyone can independently inspect, the better for the entire scientific enterprise.
David Joyner and William Stein make a pretty good case to their fellow mathematicians as to why they should be uneasy about papers that rely on closed tools, and I think an analogous case can be made for scientific computing of all sorts.
I can only say that I feel very productive on Linux using Python (and all the batteries not included), R, Sage, etc.
I’ve been away from closed-source software for so long that if I had to move to a closed-source tool set, my productivity would drop to almost zero. But that only means you will always be more productive with the tools you are familiar with.
When it comes to reproducibility, I am convinced that my Python or R code is definitely more reproducible than Matlab or Mathematica code, considering that people who don’t (or won’t) have a license for those packages cannot reproduce my work. Reproducibility, however, has a very poor track record in science. Reproducibility has more to do with changing the culture about how we publish results than with the license of the software used.
I work for a government research organisation. I try to use Python-based, free open source tools for my work when I can, but it’s not always the cheapest way overall to get a result. Let me list some of the additional costs:
1. Can’t leverage existing IDL code to write programs faster.
2. Can’t draw on many other people’s experience when I am stuck, as they are almost all Fortran, IDL or Matlab people.
3. Other people can’t reuse my scripts, or debug them while I am on leave.
4. To work with others in my group I need to maintain proficiency in IDL, as well as the Python toolchain.
5. The tools are still quite fragmented. It’s time consuming to set up a Python-based environment that can do the data processing, analysis, plotting and image processing I can do with existing IDL scripts.
6. Not every package is mature. I’ve been burned a few times with third-party Python packages that are just not stable enough to use for jobs that need to be automated.
7. There are multiple competing tools. Some atmospheric scientists prefer CDAT. Others like the Python interface to NCL. Me, I prefer scipy+netcdf4+matplotlib. This fragmentation increases the costs associated with being one of a small number of users in a workgroup or field.
8. If anyone wants to reproduce what I’ve done, they will more often than not use their tool of choice. In this case documenting algorithms and details is the main thing.
So there are real costs – mostly in time and problem-solving energy. I tend to think the benefits outweigh the costs in the long run, but very frequently in science the short term is what matters.
Every time I give up on implementing something in Python and write it in IDL because it’s easier/quicker/more reliable, I feel a little disappointed, and this happens more often than I’d like.