CyberTech Rambler

March 29, 2007

Lost in Translation

Filed under: Uncategorized — ctrambler @ 8:25 am

When reading ECMA’s response to National Bodies, one thing that struck me as rubbish was ECMA’s argument that it is possible to translate from EOXML to ODF and that, therefore, there is no contradiction between EOXML and ODF. At the time, my thinking was that if we accept this argument, there can never be any contradiction when it comes to software, as there is always a way to translate bit patterns. However, I did decide that the abbreviated style used for EOXML element/attribute names and predefined values did not matter, as I work with XML every day and can reasonably guess those values. Moreover, the EOXML document package, although different from ODF’s, is just another document package, and most people should be able to figure out its contents without reading the specs anyway. Unfortunately, less than a month after forming those two opinions, I was proven mistaken.

Last Monday, I tried to give an ad hoc demonstration of how XML can be used to get an application’s data in and out of MS Office’s new EOXML format. I did the demo by hand because the person I was targeting is a developer and she can connect the dots. I attempted the demonstration because she had just got a copy of MS Office 2007, and I thought it would be a good idea to persuade her to modify her application to read/write data in EOXML, since this reduces the chance that users cut and paste results wrongly.

Well, I am a pro-ODF person. However, this demo did not set out to discredit EOXML. It was in my interest to see EOXML succeed in this case: if I succeeded and she adopted EOXML, it would actually make it easier for me to translate her data to ODF, using XSLT for example. There were two key targets in the ad hoc demonstration: first, the ease of getting hold of the correct XML file, and second, the ease of modifying that XML file. Let’s face it, in both EOXML and ODF it is not easy to create a document from scratch. Most developers, like me, will simply use an office application to create a dummy document and then modify it appropriately. These two key targets may be specific to academics, but they are likely to apply to the majority of developers.

The demo is simple. Create a dummy spreadsheet and save it to a file. Then, using a simple text editor, change one value in the dummy spreadsheet and reopen it in MS Office. Did I succeed? Well, you guessed it: NO.

After I unzipped an EOXML spreadsheet file, my first problem was that I had to hunt for the XML file containing the spreadsheet data. Let’s call this the content file. I was looking for a filename that was really obvious, but nothing jumped out. Not a good start. Then I decided to follow the EOXML standard by locating the file that explains where the content file is. Let’s call this the TOC (table of contents) file. I knew this file was held in a subdirectory, but again, no directory jumped out. I ended up on a wild goose chase through the document package until I finally located it. (Yes, I know that if I had just read the EOXML standard I would have known the subdirectory and filename of this TOC file.) I opened the TOC file and quickly located the content file.
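For a developer, the hunt can be scripted rather than done by eye. The following is only a minimal sketch, assuming the "TOC file" corresponds to the package relationship parts (`_rels/.rels` at the root, then `xl/_rels/workbook.xml.rels`) and that the package uses the relationship type URIs shown. The helper names `rels_path`, `targets`, `find_worksheets` and `demo_package` are made up for illustration, and the in-memory package is a stripped-down stand-in, not a workbook Excel would open.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
SHEET_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"


def rels_path(part):
    """Return the relationships part for `part` ('' means the package root)."""
    folder, name = posixpath.split(part)
    return posixpath.join(folder, "_rels", name + ".rels")


def targets(pkg, source, rel_type):
    """List the parts that `source` points to with the given relationship type."""
    root = ET.fromstring(pkg.read(rels_path(source)))
    base = posixpath.dirname(source)
    found = []
    for rel in root.iter(REL_NS + "Relationship"):
        if rel.get("Type") == rel_type:
            target = rel.get("Target")
            if target.startswith("/"):
                # Absolute targets are resolved from the package root.
                found.append(target.lstrip("/"))
            else:
                # Relative targets are resolved against the source part's folder.
                found.append(posixpath.normpath(posixpath.join(base, target)))
    return found


def find_worksheets(pkg):
    """Follow root -> workbook -> worksheets instead of guessing filenames."""
    workbook = targets(pkg, "", DOC_REL)[0]
    return targets(pkg, workbook, SHEET_REL)


def demo_package():
    """Build a tiny stand-in package holding only the two relationship parts."""
    rel = ('<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
           '<Relationship Id="rId1" Type="{}" Target="{}"/></Relationships>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("_rels/.rels", rel.format(DOC_REL, "xl/workbook.xml"))
        z.writestr("xl/_rels/workbook.xml.rels", rel.format(SHEET_REL, "worksheets/sheet1.xml"))
    buf.seek(0)
    return zipfile.ZipFile(buf)


sheets = find_worksheets(demo_package())
print(sheets)
```

With a real file, `zipfile.ZipFile("book.xlsx")` (a hypothetical filename) would take the place of `demo_package()`. The point stands either way: the content file is found by following two levels of indirection, not by looking at names.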

The next stage was to edit the content file. I opened it in Visual Studio’s XML editor. The first thing that hit me was that I could not really tell what the single-letter XML tags “r”, “c”, “v” and so on actually mean. This surprised me. Never mind, I thought, let’s gloss over this and proceed to modify the values. That should be easy, since I created the spreadsheet. I changed a number to a string, repackaged the document and opened it in Excel 2007. Puff… it complained that the document contained an error and could not be opened. My error, as it turned out, was that I did not realize that when changing a cell value from a number to a string, I should have changed the cell type as well.
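To make the cell-type trap concrete, here is a minimal sketch of the edit, assuming the usual SpreadsheetML layout where a `c` element is a cell, its `r` attribute is the cell reference, `v` holds the value and the `t` attribute gives the type. It uses the inline-string form (`t="inlineStr"` with an `is`/`t` child) as one way to store text; the other common form, an index into a shared string table, is not shown. The one-cell sheet and the helper name `set_text` are invented for illustration.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
ET.register_namespace("", MAIN)  # keep the default namespace when serializing

# A made-up one-cell sheet: A1 holds the number 42 (no t attribute means numeric).
SHEET = (
    '<worksheet xmlns="%s"><sheetData>'
    '<row r="1"><c r="A1"><v>42</v></c></row>'
    '</sheetData></worksheet>' % MAIN
)


def set_text(sheet_root, ref, text):
    """Turn the cell at `ref` into an inline-string cell holding `text`."""
    for cell in sheet_root.iter("{%s}c" % MAIN):
        if cell.get("r") == ref:
            # Drop the old children (here, the numeric <v>).
            for child in list(cell):
                cell.remove(child)
            # The step missed in the demo: the type must change with the value.
            cell.set("t", "inlineStr")
            inline = ET.SubElement(cell, "{%s}is" % MAIN)
            ET.SubElement(inline, "{%s}t" % MAIN).text = text
            return cell
    raise KeyError(ref)


root = ET.fromstring(SHEET)
cell = set_text(root, "A1", "hello")
result = ET.tostring(root, encoding="unicode")
print(result)
```

Simply typing text between the `v` tags, as in the failed demo, leaves the cell declared as numeric, which is presumably why Excel rejected the file.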

My demonstration had just gone up in smoke. Not surprisingly, she was not convinced. For reading the document, I had just demonstrated that she needs to do the extra work of hunting for the correct content file by reading and understanding a TOC file. The problem here is that, since some other application created the content, one cannot assume the location of the content file unless the document format mandates it. I also demonstrated that the content file is not easy to read (a BIG thank you to the designer of the “r”, “c” and “v” tags). As for writing to EOXML, again the abbreviated tags make it difficult to write the data correctly.

Was I to blame for the failed demonstration? Yes, but only partly. I should have read the specs and practiced the demo first, but that’s it. The need to locate the TOC file is just a small speed bump. The need to read and understand the TOC file just to locate the XML file makes things complicated for casual EOXML users like us. However, the biggest problem is the difficulty of understanding the EOXML tags, attributes and values. This may sound like I am trying to put the blame for my failure on someone else by saying that EOXML should have made it easier for me to demonstrate the beauty of XML-based office document formats, but I am not. After all, how many people bother to read the specs for something before using it? I did not do this when I started using XML, text editors or other technology.

What I learned from this experience is the danger of cryptic abbreviations. They constitute a new vocabulary and become a new language. To me, the content file might as well be written in a foreign language. Maybe a non-English speaker will find the abbreviations easier to use. Even if this is true, why create a new vocabulary and a new language? Reusing existing ones means at least a subset of users can understand the format easily. Yes, a translator can help, but why design a format that needs one when it is not necessary?


