CyberTech Rambler

July 11, 2007

XML representation and application requirement

Filed under: Uncategorized — ctrambler @ 7:20 pm

Let’s face it. You and I are not going to sit down tomorrow and start consuming XML data. XML data is mainly for computers to read, with the additional advantage that we humans can eyeball it if needed. This advantage comes at a price: the more readable XML is for humans, the more likely (though not always) it is that machines will have a harder time reading it. The trick is to trade off the two.

My starting point is that XML is a machine language, not a human language. As with most things in computing, the return on improving human readability follows a curve: initially you gain a lot of readability for each additional unit of work, then the gain shrinks until you finally hit an inflection point where further improvement costs a lot of work for very little readability, at least until you get past that point. Unfortunately, once you are past it, you will often find that the return curve is worse than the one you had before. The trick, to me, is therefore to get as close as you can to the inflection point and not attempt to go beyond it.

The inflection point is rather arbitrary. The more Mr Jones blogs about OOXML, the clearer it is to me that the OOXML team has a different inflection point from a lot of other people. My inflection point is the point where most machines/applications will find it difficult to understand the XML, but Microsoft’s inflection point seems to be the point where the Office team finds it not worthwhile to rework the Microsoft Office applications. It’s difficult to say this is fair enough if you just want to open up OOXML for others to peek at and emulate, and it is certainly not good enough for a good quality standard.

Needless to say, I find Mr Jones’s blog post on why WordprocessingML is the way it is to be confirmation that a lot of decisions are simply too tied to Microsoft’s willingness to modify Microsoft Office.

To me, the starting goal is already wrong, i.e., to follow the original Word document model pretty closely. That is fatal, as it simply means you are prepared to discard a lot of good, desirable features for the sake of following the model closely. He claims they tried other approaches but found this the best. Surprised? Not a bit. The Office team is simply too comfortable with the existing Word document structure, and following that structure closely is the path of least resistance.

MS Office has a formula fix-up issue. That’s Microsoft’s problem, and it is up to Microsoft to fix it. Longer load times? That is up to Microsoft to fix too. To use these as an excuse not to use ISO dates, and to keep the 1900/1904 date bug instead, is exporting Microsoft’s problems for others to deal with. People who start with a clean sheet should not have to deal with legacy problems faced by existing office application vendors.
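To make the date complaint concrete, here is a rough Python sketch of what a clean-sheet implementer has to write just to turn a spreadsheet serial date into an ISO date. It assumes the commonly documented behaviour of the two date systems: the 1900 system counts a 29 February 1900 that never existed (serial 60), and the 1904 system starts counting from 1 January 1904. The function name serial_to_iso and its date1904 parameter are my own invention for illustration, not anything taken from the specification.

```python
from datetime import date, timedelta


def serial_to_iso(serial: int, date1904: bool = False) -> str:
    """Convert a whole-day spreadsheet serial number to an ISO 8601 date.

    Hypothetical helper: it assumes the widely described 1900 and 1904
    date systems and ignores times of day.
    """
    if date1904:
        # 1904 system: serial 0 is 1904-01-01, no phantom leap day.
        return (date(1904, 1, 1) + timedelta(days=serial)).isoformat()

    if serial < 1:
        raise ValueError("the 1900 system starts at serial 1 (1900-01-01)")
    if serial == 60:
        # The 1900 system treats 1900 as a leap year, so serial 60 stands
        # for 1900-02-29, a day that is not on the real calendar.
        raise ValueError("serial 60 is the nonexistent 1900-02-29")

    # Serials after the phantom leap day are off by one, so the effective
    # epoch shifts from 1899-12-31 to 1899-12-30.
    epoch = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return (epoch + timedelta(days=serial)).isoformat()


if __name__ == "__main__":
    print(serial_to_iso(39274))                 # 1900 system
    print(serial_to_iso(39274, date1904=True))  # same number, 1904 system
```

The same stored number means two different days depending on a flag held elsewhere in the file, and every reader has to carry the special case for serial 60. An ISO date string in the file would need none of this.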

The rest of the article goes on to explain why things are the way they are. The only value there is for technically inclined people like me to understand the structure of the Microsoft Office applications. The main point he is trying to get across, how this influences the WordprocessingML structure, is interesting but irrelevant. Who cares if Microsoft Office does not capture the “outline view” and it is simply an artifact? I don’t want to clone Microsoft Office. That would be legally problematic and would not give me a competitive advantage. I do not want to be burdened by old problems from Lotus 1-2-3, WordPerfect, Microsoft Office, StarOffice, etc. I want a clean format where all vendors agree, for the public good, to deal with their own legacy problems and not export them to me.

He claims WordprocessingML is a flat model, flatter than ODF. That is not my experience. As I see it, the use of rPr tags adds a level to the hierarchy, and WordprocessingML is generously peppered with rPr tags. My overall impression is that WordprocessingML is more hierarchical from an XML perspective.
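As a rough illustration of what I mean by rPr adding a level, the Python sketch below measures the nesting depth of one bold word in each format. The two fragments are hand-written by me for this comparison, not taken from real documents, and the style name T1 is made up. The ODF fragment only references its formatting by name, so the style definition that actually carries the bold property lives elsewhere in the file and is not counted here. Treat this as a sketch of my impression, not a settled measurement.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hand-written fragment: one bold run in WordprocessingML.
WORDML = (
    f'<w:p xmlns:w="{W_NS}">'
    "<w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r>"
    "</w:p>"
)

# Hand-written fragment: the same word in ODF, with the formatting
# referenced through a hypothetical automatic style named T1.
ODF = (
    f'<text:p xmlns:text="{TEXT_NS}">'
    '<text:span text:style-name="T1">Hello</text:span>'
    "</text:p>"
)


def max_depth(elem: ET.Element) -> int:
    """Return the number of element levels from elem down to its deepest leaf."""
    return 1 + max((max_depth(child) for child in elem), default=0)


wordml_depth = max_depth(ET.fromstring(WORDML))
odf_depth = max_depth(ET.fromstring(ODF))

if __name__ == "__main__":
    print("WordprocessingML depth:", wordml_depth)
    print("ODF depth:", odf_depth)
```

For these two fragments the WordprocessingML paragraph is four elements deep (p, r, rPr, b) and the ODF one is two (p, span), which is the extra level I keep running into.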

At the end of the article, Jones says Microsoft will provide a translator to any format for its customers. So why the half-baked effort on ODF converters from Microsoft? Sorry, I forgot. Microsoft did not provide one but helped someone else to write one. My apologies. I further apologize if there is “not enough demand” from customers to justify an OOXML to ODF converter in MS Office. Someone obviously forgot to tell me to vote for such a thing a while ago.

