CyberTech Rambler

July 11, 2007

Nothing new about Excel Formula

Filed under: Uncategorized — ctrambler @ 11:32 am

Back in 1997 when I started my PhD, the university organized a talk on computational statistics. That were the days when I first discover that Excel has statistical formula. I  thought it was great. Until, that is, university computing service tell us not to use those statistics function. The reason: They don’t know how they are formulated. (Translation: they do not give you correct results).

Fast forward 2001, this time I am not a PhD student anymore, but the most computer-savvy person in another university’s medical school outfit. Everytime someone shows me a chart which looks like it is generated from Excel, I repeated the warning I received, but added that things might had changed in the few years between. I sincerely believed the statistical formula thing had changed back then. It appears, I have misplaced my faith.

This issue surface in the ongoing ODF vs OOXML beauty contest. This time, Rob Weir fires off his salvo attacking OOXML formula, after months of being in the firing line by OOXML supporters saying that ODF lacks a standard for spreadsheet formula. OpenMalaysia blog further demonstrate the current problem with one function, ceiling() and shows that there is no concensus at present on what it actually means in spreadsheet program. So clearly, the problem is not only Microsoft’s, but at least, ODF is closer to a mathematically correct spreadsheet formula than OOXML (Comparing ODF 1.2 vs OOXML’s ISO submission).

My experience with statistics shows that bad statistics is WORSE, far worse,  than no statistics at all. Hey, your brain surgeon is going to rely on CT machine to decide the least damaging path to descend into, and cut out your brain tumour. Rudimentary CT relies on statistics, among other mathematical stuff, to correctly display your brain as an image. More advanced CT scans, especially those that post process and claims to more clearly identify the tumour, rely on statistics even more.

The take home message from Rob Weir’s article is there is more than what meets the eye when we talk about spreadsheet formulas. As demonstrated, it is not easy to implement it correctly. Given this difficulty, it is understandable why ODF choose to postpone the consideration of spreadsheet formula in the standard until it finish work on the aspects covered by the original ODF specification. To do it all together will be too time consuming. It is better to finish up ODF 1.0, let it loose to build experience and expertise in using it, and, at the same time, design a robust formula language.

Is Excel’s or other spreadsheet applications’ formula extremely bad? Most of the time, no. It is just that they made assumption that is not universally true. They are too narrowly defined, perhaps with particular field, say finance, in mind.

One small little events highlighted by Rob Weir that is interesting:

“OOXML devotes 324 pages of the standard to document the formulas and functions.

“The original submission to the ECMA TC45 working group did not have any of this information. Jody Goldberg and Michael Meeks that represented Novell at the TC45 requested the information and it eventually made it into the standards. I consider this a win, and I consider those 324 extra pages a win for everyone (almost half the size of the ODF standard).” – Miguel de Icaza

Well done Novell! What was Microsoft thinking when it submit the original specification to ECMA WITHOUT spreadsheet formula? Certainly without specification of spreadsheet formula, one cannot “100% representing existing Microsoft Office documents”. With this hindsight it is possible that Microsoft knows the formula does not stand up to scrutiny and therefore omitted it. I do not think this is an oversight. I do not think Microsoft did this to make sure it is the only party that use OOXML either.

OOXML team can argue that it needs to “faithfully represents existing Microsoft Office document” so it needs to carry on using these “partially correct” formulas. It made this argument on 1900/1904 date system so why not this? In fact, there is a stronger case here to keep these formulas to allow migration away from these formula because every users of these formula has to perform the conversion, not application vendors only. However, in the long run, we need to get away from these formula for good.  Just ask yourself whether you want to preserve compatibility with old and existing document, or do you want the correct results?

Advertisements

2 Comments »

  1. Hello,

    The *initial* submission did not include it, but the initial submission lacked lots of things. The ECMA process helped make sure that the bits that the members considered to be missing were implemented.

    IBM could have participated had they had an interest in improving OOXML.

    The point that I made on my original post was addressing various statements from those that had a political agenda to keep the discussion on ODF and OOXML technical.

    The ODF proponents could have taken the high road, and could have stayed technically honest, but some of them chose to use the very same FUD techniques that they accuse Microsoft of using against others.

    In my opinion, we should not destroy our reputation by using scare tactics and half-facts. Maybe Mr Rob Weir thinks this is fine, but if he were to use these tactics on an actual free software contribution he would be voted off the island.

    Miguel.

    Comment by Miguel de Icaza — July 12, 2007 @ 5:10 am | Reply

  2. Dear Miguel,

    With reference to

    “The *initial* submission did not include it, but the initial submission lacked lots of things. The ECMA process helped make sure that the bits that the members considered to be missing were implemented.”

    in case you thought I was being sarcastic when I said “Well Done Novell!”, I meant it as a good sign that someone is at least reading the initial submission carefully, and an insult to Microsoft for failing to do it.

    Nobody expect initial submission to be perfect. But missing out spreadsheet formula is a big thing and so blatantly obvious for the submitter to miss, whether you are ODF or OOXML. It must be a conscious decision by somebody in Microsoft not to include it. What I did, was to speculate the reasons why it did. Pure speculation, nothing more.

    On “IBM could have participated had they had an interest in improving OOXML.” :

    IBM, Microsoft (and SUN) could have sit on both Technical Committee. For one reason or another they all choose not to participate in one of the committee. All have to shoulder some blame if some important things did not happen because they did not participated in the committee.

    Rob Weir’s postings is always on the radical side. And yes, I see his agenda. (I should add that I have an agenda of my own) When I read them I always find it necessary to filter out the rhetorics. But underneath it you get quite good technical arguments, something I find (frustratingly) lagging on pro-OOXML blogs such as Brian Jones’. A lot of Jones’ arguments seems to be what Tony Blair likes to say: “Trust me coz I know better”, which is frustrating.

    And for the interest of fair disclosure, I did submit a comment to BSI on OOXML recommending the rejection for fast-track process. Some of the arguments I made came from ODF camp. However, I did make sure I filtered it. My interest in commenting is to show the problems I face as a small, independent person that has the knowledge and ability to be a producer/consumer of OOXML. I see flaws in the specification that makes life difficult for people like me who do not want to use third party tools to handle OOXML but to do things directly on XML using DOM/SAX parser and I highlighted them. A lot of them are rather petty, like complaining my UNIX command line console will actually automatically hide “.rels” directory and I do not like to have to escape [ and ] just to reference a compulsory file in the package’s root dir. Hopefully if enough of us did, someone will take notice and discuss the issues I raised. Whether or not changes will be made will depends on the overall merits of the issues and I do trust National Bodies to be able to be independent in their decision making.

    Back to this particular Weir’s post. The underlying technical message I get is we _ALL_ had been extremely bad at spreadsheet formula. He frame it as OOXML have crap formula, but I see it OOXML being probably too specific to the needs of one particular field, perhaps finance, during its creation, where it is very likely to be safe to make assumption

    In my line of work, I see statistics being abused day in day out and sometimes the stats simply did not support the conclusion the researcher is trying to make. That’s why I made the comment that bad statistics is perhaps worse than no statistics. I accept this is a judgment call.

    Time to correct this spreadsheet formula problems for ALL involved. If I get my way I will say lets create an independent spreadsheet formula committee, cut out all these ODF vs OOXML debate coz they are diversions and simply create tension between the two camps, for the collective good of producers and consumers.

    Comment by ctrambler — July 12, 2007 @ 2:33 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: