CyberTech Rambler

February 10, 2010

In defense of scientists

Filed under: Uncategorized — ctrambler @ 2:22 am

Who am I to argue with a professor? In the Guardian, Professor Darrel Ince wrote an excellent article on the problem of not releasing the source code of scientific programs for public scrutiny. As someone who works in an academic environment, in a scientific support role in a non-engineering department, I can say I share his view. However, probably because I am on the lowest possible rung of the ladder, i.e., not even on the ladder, I feel that I need to bring you, the readers, the scientists’ viewpoint.

Why didn’t those scientists release their code in open source fashion? The primary reason is that they want to commercialize their product. Luckily, that thinking is shifting now. Take one example I am closely involved in: it took two people, a researcher whom the professor respects a lot and me, someone he trusts on the computing side, to convince him that releasing the software under the General Public License would not decrease its value. The type of work we do means software is simply a tool. What scientists are selling is expertise in the field. Scientific software is not your run-of-the-mill Photoshop wannabe. It crunches data, nothing more. Therefore, as a scientist, at least one who is serious, what you want to know is whether you used the correct model (as embodied in the program) and whether you are asking the correct question. For example, if you ask the computer and it replies that colour A is not brighter than colour B, you cannot conclude that colour A is dimmer than colour B. This is not information you get from journal papers, and once you have learned it, you can use whatever software you like. And looking at the field, any field for that matter, the very definition of scientific research means there are not many people in this world who have that knowledge. Therefore users need support. This means scientists should concentrate on selling that precious commodity, expertise, which is inexhaustible. The software is merely a demonstration of their competency. For that reason, they should disseminate it as widely as possible.

Second, experience shows that putting software out as open source does not really help improve quality. First problem: there are no eyeballs to look at it. The people with the necessary expertise to vet the software are simply not there. Fellow researchers (read: competition)? No. They probably do not know how to program in the language you are using. Even if they do, they have no incentive to do so.

Third, scientists are not judged by the quality of their programs, or by the programs themselves, but by how they bring new information to their field of interest.

Fourth, one key means of quality assurance for academic software is the number of times the software is used to process different data. Scientists rely on the fact that the more times the software is used, the more likely it is that bugs are found and quashed. That is why, instead of saying “more eyeballs make bugs shallow”, I say “more data make bugs less likely”. Under this system, being open source is not a necessary attribute.

Having painted a bad picture on the programming front, I think I need to answer two questions: first, is releasing software as open source important, and second, is there anything we computing professionals can do?

The answer to both is yes.

The reason for answering yes to the first question is that mathematical equations alone are insufficient to describe what the scientists really did. They made assumptions and handled outliers and borderline cases without making those decisions clear in the article. Some things that were accepted by their journals would be rejected by a more computing-based journal as giving insufficient information. A few times, when I had to rewrite someone’s algorithm, I had to infer their decisions on outliers and borderline cases by reading the code. A few times, I came across software constructs that would fail because they relied on the peculiarities of a particular system. Luckily, most of the time this just increases noise rather than invalidating the work.
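
To make the point about borderline cases concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the function name, the valid range of 0 to 100 and the readings are assumptions, not taken from any real paper or program. A paper might only say "readings outside the valid range were discarded"; whether a reading of exactly 100 counts as inside is a decision that lives only in the code.

```python
def mean_of_valid(readings, low=0, high=100, include_bounds=True):
    """Average the readings that fall inside the valid range.

    Hypothetical example: `include_bounds` is exactly the kind of
    decision a paper rarely states but the source code must make.
    """
    if include_bounds:
        kept = [r for r in readings if low <= r <= high]
    else:
        kept = [r for r in readings if low < r < high]
    if not kept:
        raise ValueError("no readings inside the valid range")
    return sum(kept) / len(kept)


readings = [20, 30, 100, 250]  # made-up data; 100 sits on the boundary

# Same equation, same data, two different published numbers.
print(mean_of_valid(readings, include_bounds=True))
print(mean_of_valid(readings, include_bounds=False))
```

Both versions match the same sentence in a paper, yet they average different sets of readings. Without the source, a reader trying to reproduce the result has to guess which one was used.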

The answer to the second question is a definite yes, but we need to weave in advantages that they can see.

I do this by slowly moving them to adopt modern software practices, like reusing pieces of software. Academic/scientific software tends to be a collection of silos, even between parts of the same program. A lot of scientists, when they need a function that already exists in another part of the software, prefer to duplicate the code instead of making sure that the same function can work in both parts. We know this is just storing up problems for the future. However, there is no point in pointing this out to them. The way I sell it is to show them how reusing software makes sense and makes their programs more robust. I show them that sharing a function creates an incentive to improve it, and ensures that the improvement propagates to all users. In the long run, it actually shortens their programming cycle, making them more competitive as they build up a library of tried and tested functions and find that they do not have to recode anything.
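
As a rough illustration of that sales pitch, here is a small Python sketch. The function names and the analysis steps are hypothetical, invented only for this example. Two analysis routines call one shared `normalise` function instead of each carrying its own copy, so a fix for a borderline case (all samples identical) is made once and reaches both.

```python
def normalise(samples):
    """Scale samples to the range 0..1. Shared by both routines below."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # Borderline case: a flat signal. Fixed once here, so every
        # caller is protected from a division by zero.
        return [0.0 for _ in samples]
    return [(s - lo) / (hi - lo) for s in samples]


def peak_position(samples):
    """Index of the largest sample after normalisation."""
    scaled = normalise(samples)
    return scaled.index(max(scaled))


def fraction_above(samples, threshold=0.5):
    """Fraction of normalised samples strictly above the threshold."""
    scaled = normalise(samples)
    return sum(1 for s in scaled if s > threshold) / len(scaled)


print(peak_position([2, 6, 4]))
print(fraction_above([0, 10, 10, 10]))
print(fraction_above([5, 5, 5]))  # flat signal, handled by the shared fix
```

Had each routine kept its own copy of the scaling code, the flat-signal fix would have to be found and repeated in every copy, which is the problem being stored up for the future.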

So, in short, things are changing, for the better.
