CyberTech Rambler

October 19, 2007

Compound Document Format and OpenDocument Foundation (Updated 20071109)

Filed under: Uncategorized — ctrambler @ 7:50 am

The first time I heard that people at the OpenDocument Foundation were unhappy with ODF was from Stephen McGibbon’s post about Gary Edwards’ disagreement with Sun. Then came Rob Weir’s post saying that the OpenDocument Foundation had moved away from OpenDocument Format. From Weir’s post I sensed some cracks in the OpenDocument Foundation over ODF. While Weir’s post continues his tradition of building up evidence to support his argument, he is known to be very passionate about ODF and is not shy about attacking the opposition, any opposition to ODF. Hence I believe I have to exercise a certain amount of caution when Weir starts attacking someone new. Today, I came across Jason Matusow’s happy rambling about how the OpenDocument Foundation is unhappy with ODF and appears to be supporting a single document format. Matusow views it as an argument that the “one document format” theory does not work. More on this later.

Hmmm… Did the OpenDocument Foundation change direction away from ODF? And what is this Compound Document Format (CDF) thing that seems to be the new love of the OpenDocument Foundation?

First, what is CDF? At CDF HQ: “A Compound Document is the W3C term for a document that combines multiple formats …” Although the title says the ‘F’ in CDF stands for ‘Format’, the rest of the page appears to treat the ‘F’ as Framework. The description of the initiative, together with its links to IBM’s pages on the same subject and a description of CDF use, suggests that the ‘F’ is indeed Framework. This means it is possible to use CDF as the container for either ODF or OOXML content. Since it comes from W3C, and since one of the articles suggests a possible unification of web content, UI display and plain old office documents, it can be described as a superset of which office document formats like ODF and OOXML are subsets. This means its purpose is substantially different from that of ODF or OOXML. Its most important merit is probably this superset ‘feature’ and mentality. Another merit of CDF, picked up by the OpenDocument Foundation, is a better perception of vendor neutrality, especially when compared to ODF or OOXML.
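
To make the “combines multiple formats” idea concrete, here is a minimal Python sketch. It is my own illustration and is not taken from the CDF specification: the sample document (an XHTML page with an inline SVG drawing) and the helper name namespaces_used are assumptions made up for this example. All it does is count how many elements belong to each XML namespace, which shows one host document carrying content from two formats.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical compound document: an XHTML host with an inline SVG fragment.
# The content is invented for illustration only.
COMPOUND_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Quarterly report</title></head>
  <body>
    <p>Sales rose in the third quarter.</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">
      <rect width="80" height="40"/>
    </svg>
  </body>
</html>"""


def namespaces_used(xml_text):
    """Return a dict mapping each XML namespace URI to its element count."""
    counts = Counter()
    for element in ET.fromstring(xml_text).iter():
        tag = element.tag
        # ElementTree writes namespaced tags as "{namespace-uri}localname".
        namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        counts[namespace] += 1
    return dict(counts)


if __name__ == "__main__":
    for namespace, count in namespaces_used(COMPOUND_DOC).items():
        print(namespace, count)
```

The host format (XHTML here) supplies the overall structure, while the embedded format (SVG here) keeps its own vocabulary inside it. Whether an office format such as ODF or OOXML could be embedded in the same way is exactly the open question discussed below.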

CDF does aim to be a unified document format. The key words here are ‘a’ unified document format, not ‘the’ unified document format. Advocates of a single document format can use this ability to push CDF as ‘the’ document format of choice. The distinction matters: it is the OpenDocument Foundation that wants CDF to be the single document format, not W3C or CDF’s working committee.

Can CDF replace ODF? My cursory review of its core specification says it specifies a container, like XML, not the content, e.g. HTML. Using it as a replacement for either ODF or OOXML requires more work, a lot more work, since one now has to specify the content. One way to do so is to put ODF or OOXML as an object in CDF.

Can CDF be the container for the next ODF/OOXML document? Yes. There are merits to seeing this through, some of which are mentioned earlier. Based on the ODF and OOXML participants’ current stances on the document issue, ODF is the more likely to adopt CDF as the container, although the chances of this happening are remote today.

One practical consequence of advocating the “one document” approach is that there will be multiple document formats aiming for this throne. But this does not support the theory of multiple-document-format proponents, e.g. Matusow, that multiple document formats are good.

Back to the question of whether the OpenDocument Foundation is “changing direction”. One can make the argument that it is looking forward to the next generation of document format and is advocating the use of CDF as the foundation for this new generation. Hence, it is too early to say this is a change in direction. To me, a change in direction would mean choosing another existing, working format. They did not.

As a matter of principle, I believe that if the OpenDocument Foundation is going to advocate a different file format from ODF, it should at least have changed the name of the foundation to something more neutral.

[Updated 20071109] Andy Updegrove has the scoop that it might be impossible for CDF to replace ODF. Certainly one influential participant in the W3C standardization process does not think it can. My feeling is that it is going to be an extreme uphill struggle to wrestle CDF into something usable as an editable document format, but not impossible. Is it worthwhile? I don’t know. If its advocates want to try, they are welcome to.

3 Comments »

  1. A very VERY perceptive post by you, ctrambler.

    “… [the Foundation] is looking forward to the next generation of document format and is advocating the use of CDF as the foundation for this new generation.”

    If people are curious about our direction & motivations, I can say before our release of prototype software that, yes, we are changing direction. If one accepts our assertion that OOXML & ODF fail to fulfill customer requirements, one needs to look no further for our reasons.

    It may seem naive, but we believe in a Universal Document Format where any and all applications are citizens with the same rights of access. W3C is the ideal place to host such a thing owing to its credibility of objectives and process…if they’ll have us?

    Comment by Sam Hiser, October 20, 2007 @ 11:16 am

  2. Good work on CDF. We are using it as a container for HTML+ (HTML5, CSS3, SVG/Canvas, JS, jQuery).

    The key to understanding our resigning from OASIS ODF work (after seven years) has to do with our plug-in experiences in Massachusetts and California. There are public expectations for ODF that are clearly beyond the scope of the specification’s design. Way beyond.

    What many of the governments expected was an open file format that was highly structured, application-platform-vendor independent, compatible/convertible with the legacy of MSOffice binary documents, interoperable and Web ready.

    That’s not ODF. Nor is it OOXML.

    Strip it all down and you’ll realize that these governments expected XML formats to be a better HTML than HTML.

    The problems with XML formats begin with XML itself. It was originally designed to be a “language for creating custom languages”. It was not designed for “interoperability” or “Web readiness”, but that’s not to say these attributes were unattainable. I think it’s entirely possible to design an XML format to meet the public expectation stated above. But to do so, one would have to start from a clean slate, and that’s not what happened with either ODF or OOXML.

    ODF and OOXML began life as application specific XML encodings of the OpenOffice/MSOffice binary dump. It was exactly the fact that XML was designed for customized uses that made it so useful to Sun and Microsoft. But this customization comes at the cost of interoperability. The good news is that both ODF and OOXML are open and structured. The bad news is that it will take years of work to strip out the application specific aspects and replace them with neutral elements.

    HTML on the other hand has evolved to the point where it can be used for an extraordinarily wide range of purposes, including office suite documents. And with HTML5-CSS3-JS, we finally have a sound structure separating content, presentation (layout, formatting, relational behaviors) and logic.

    In fact, HTML+ is so good that if it had existed back in the 1998-2001 time frame where ODF and OOXML were first conceived and implemented, there would have been an enormous outcry against these application specific XML formats.

    Here’s another point worthy of consideration. With over 60 billion public documents written in HTML+, and another estimated 30 billion behind the firewall, HTML+ is perhaps the most interoperable format mankind has ever known. Yet, there are few end user facing editors capable of producing interoperable HTML+. One has to wonder how many billions of HTML+ documents would be out there if powerful desktop office suite editors were enabled to produce native HTML+?

    We believe that it is entirely possible to re-purpose both OpenOffice and MSOffice to natively produce (read/write) HTML+ documents, using the application plug-in and developer extension models.

    As for the OpenDocument Foundation? It was scheduled to end in December of 2007 when the OASIS membership expired. This had been planned as far back as June of 2007, when, after seven years of work, i dropped out of the ODF TC.

    Although i personally had been a member of the OASIS Open Office – OpenDocument TC since its 2002 inception, the OpenDocument Foundation had a much shorter life span. It was created in June of 2005, shortly after the May 2005 OASIS approval of ODF. The purpose was pretty straightforward. After the May 2005 OASIS approval for ODF 1.0, the ODF TC was swamped with corporate members. IBM led this rush, and brought with them Oracle, Intel, Novell, Google and Adobe (although Adobe and Google might dispute any connection to IBM).

    An OASIS member representing Computer Associates contacted me, expressing concerns i had shared about this corporate rush, and the changes wrought. His idea was to broaden the participation of open source community and individual experts across ALL OASIS standards efforts by taking advantage of a little-known loophole in the OASIS rules enabling 501c(3) non-profits to sponsor multiple representatives.

    So, in the interests of balancing out the increasingly overwhelming influence of corporate vendors, we formed the OpenDocument Foundation. All in all, we sponsored the membership of 28 participants over the next two years. These Foundation participants formed most of the membership of the ODF subcommittees for OpenFormula and Metadata, including prominent contributors Bruce d’Arcus, Dan Brickland, and David Wheeler. 15 of our sponsored members continued their work on ODF long after OASIS closed off our loophole and effectively ended the Foundation’s rather unique role in the development of ODF.

    Let me also point out that there was a great deal of difference between the 2002 – 2005 TC and the post OASIS-ISO 26300 TC. We had moved from being a group of individuals dedicated to doing something useful for the future of mankind’s information needs, to becoming a vendor centric group dedicated and driven to replacing MSOffice on the desktop and ending Microsoft’s monopoly. Although i personally supported the vendor objectives, i had always hoped that the original initiative would, on its own, accomplish the challenge of creating an open market of document solutions while providing that highly structured format that was legacy compatible, interoperable and Web ready.

    Thanks for your post. For the rest of the Foundation story, try these links:

    An honest misunderstanding? Hardly! Play the tape!

    da Vinci here

    The Search for Interoperability: ODF, OOXML and HTML

    ODF and OOXML: Something New to Ponder

    Hope this helps,
    ~ge~

    Comment by Gary Edwards, May 7, 2009 @ 1:44 am

  3. There is also a lengthy response to Rob Weir’s libelous “Cracks in the Foundation” that can be found in the comment section of Jesper Lund Stocholm’s Hypocrisy 101 post.

    Rob was kind enough to join the fray; even though he was himself the butt of the joke. He made his usual entrance, sliming and slamming everyone daring to disagree with his dictates. But when the going got tough, and he was unable to respond to the facts, he dropped out of sight just as quickly and mysteriously as he had appeared.

    In my comments, i lay out the facts of what happened in Massachusetts, and our rather unique relationship with IBM that had come about because of efforts by Massachusetts CTO Louis Gutierrez to solve the impossible ODF implementation problem.

    The short story is that our da Vinci plug-in did exist, and was demonstrated to a full house of Massachusetts executives and IT personnel at an IBM hosted event. A number of IBM execs attended, including Rob Weir and his boss, Doug Heintzman. Rob sat in the front row, and had his ears pinned back when the da Vinci ODF plug-in for Excel beat the loading of a native .xls version of the same million row document. I also discuss the fact that Massachusetts had asked us to let IBM fully review and inspect the da Vinci source code. Which IBM did.

    The events and facts are quite contrary to the false representations in Rob’s incendiary hit piece, “Cracks in the Foundation”.

    Enjoy,
    ~ge~

    Comment by Gary Edwards, May 7, 2009 @ 2:14 am

