CyberTech Rambler

October 27, 2008

Element vs Attribute

Filed under: Uncategorized — ctrambler @ 1:25 am

In his post, “Old wine in new skins”, Patrick Durusau tries to engage us in an age-old discussion that still bugs XML designers today: when to use an attribute and when to use an element. He framed it as one tenet of the ODF vs OOXML beauty contest. My main objection to OOXML’s syntax style is not “attribute vs element” but the unnecessary pollution of implementation detail, which makes OOXML difficult to read. Nonetheless, I will bite.

Firstly, I would like to complain about his definition of “semantically correct”. To me, semantic correctness in XML means nothing. It is really easy to devise a semantically correct XML syntax, especially if you are free to do whatever you wish.

The point of XML design is not to achieve semantic correctness, but to say what you mean in a logical way, with minimal fuss and the greatest clarity, in a form easily read by both machine and human, and, as far as possible, free from implementation details. Some leakage of implementation details into the XML is unavoidable, but its effect can be minimized. Let’s take the example of me and my dog. It is semantically correct to implement either “I->own->dog” or “dog->owned_by->me”. If you are running a database that reunites missing dogs with their owners based on the name on their dog tag, then the second is more appropriate, since the information you have is the dog’s name and your search will necessarily start by identifying the dog with that name. This means you will be looking up dogs’ names more frequently than owners’. With the second scheme you simply look at the top-level elements, but with the first you have to navigate down one level to fetch the dog’s name, which is an extra, probably unnecessary, operation. Choosing the second scheme over the first is definitely an “implementation leak”, but it is unavoidable. It would indeed be stupid to insist on the first, even though for a human it is the more natural way of representing the owner-dog relationship. It is, however, incorrect to capture the implementation detail as dog->owned_by->pointer_address(0x0002a)->me simply because you used a pointer to match my dog to me.
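To make the two layouts concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The `registry` wrapper, the `owned_by` element name, the sample data and both helper functions are assumptions made up for illustration; they are not taken from any real schema.

```python
import xml.etree.ElementTree as ET

# Layout 1 (hypothetical): I -> own -> dog, dogs nested under their owner.
OWNER_FIRST = """
<registry>
  <owner name="ctrambler">
    <dog name="doggyRambler" />
  </owner>
</registry>
"""

# Layout 2 (hypothetical): dog -> owned_by -> me, dogs are top-level records.
DOG_FIRST = """
<registry>
  <dog name="doggyRambler">
    <owned_by name="ctrambler" />
  </dog>
</registry>
"""


def owner_of_dog_owner_first(xml_text, dog_name):
    """Look up a dog's owner when dogs sit one level below owners."""
    root = ET.fromstring(xml_text)
    for owner in root.findall("owner"):      # walk every owner...
        for dog in owner.findall("dog"):     # ...then descend to its dogs
            if dog.get("name") == dog_name:
                return owner.get("name")
    return None


def owner_of_dog_dog_first(xml_text, dog_name):
    """Look up a dog's owner when dogs are the top-level elements."""
    root = ET.fromstring(xml_text)
    for dog in root.findall("dog"):          # the dog's name is at the top level
        if dog.get("name") == dog_name:
            owned_by = dog.find("owned_by")
            return owned_by.get("name") if owned_by is not None else None
    return None
```

Both functions return the same answer for the same dog; the difference is only how far the search has to descend before it can compare the dog's name.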

Now, which is easier to read?

<owner name="ctrambler">
  <dog name="doggyRambler" />
</owner>

Or

<owner>
  <name value="ctrambler" />
  <dog>
    <name value="doggyRambler" />
  </dog>
</owner>

Both are semantically correct. Both represent the same thing, i.e. “I->own->dog”. I am sure at least 90% of people will say the first one communicates better than the second. Therefore, attribute wins hands down.

More importantly, from a technical point of view, the first one is potentially more cost-effective. Say I want to look up my dog’s name in the database and I am already at the correct “owner” element. Take the simplest case of one dog per owner, bearing in mind that I might store other elements or attributes as well, and compare the potential cost. In the first case: search for the element “dog”, then search for the “name” attribute of the dog, i.e. two searches. In the second case: search for the element “dog”, then search for the element “name”, and finally search for the “value” attribute, i.e. three searches. Note that the second case has more elements than the first, since what would normally be attributes become elements. This matters because searching for an element might not be as efficient as searching for an attribute. XML parsers normally keep attributes together with their element, but keep child elements separately. This is a common design trick for parsers that have to optimize for memory and store data in disk caches.
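The lookup counts can be traced in a short Python sketch using the standard library's `xml.etree.ElementTree`. It only counts navigation steps; it does not measure speed, and the real cost depends on the parser. The constant and function names are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The same data in the two styles discussed above.
ATTRIBUTE_STYLE = '<owner name="ctrambler"><dog name="doggyRambler" /></owner>'
ELEMENT_STYLE = (
    '<owner><name value="ctrambler" />'
    '<dog><name value="doggyRambler" /></dog></owner>'
)


def dog_name_attribute_style(owner):
    """Two lookups: one child element, one attribute."""
    dog = owner.find("dog")       # search 1: child element "dog"
    return dog.get("name")        # search 2: attribute "name"


def dog_name_element_style(owner):
    """Three lookups: two child elements, one attribute."""
    dog = owner.find("dog")       # search 1: child element "dog"
    name = dog.find("name")       # search 2: child element "name"
    return name.get("value")      # search 3: attribute "value"


attr_owner = ET.fromstring(ATTRIBUTE_STYLE)
elem_owner = ET.fromstring(ELEMENT_STYLE)
```

Calling `dog_name_attribute_style(attr_owner)` and `dog_name_element_style(elem_owner)` yields the same dog name; the element style simply needs one more hop to get there.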

I know that a good XML parser, written to parse one specific XML syntax, can be optimized to remove much of the cost associated with element access. However, this is a luxury affordable only to those who write applications that depend extremely heavily on one XML syntax. The rest, including me, have to depend on a general XML parser, where element-to-element navigation is likely to cost more.

Also note that it is well-formed XML to have two “name” child elements, but not two “name” attributes on the same element. To limit the “name” element to one occurrence you need a schema. In short, you need to apply a secondary mechanism to constrain your XML syntax if you use the “name” element. Do you really need this extra complication, or should you use the built-in XML rule to enforce a single “name”?
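A quick way to see this rule in action is to feed both variants to a parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the sample strings and the helper names `parses` and `count_names` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Two "name" child elements: accepted by the parser unless a schema forbids it.
TWO_NAME_ELEMENTS = '<dog><name value="a" /><name value="b" /></dog>'

# Two "name" attributes on one element: rejected by XML itself.
TWO_NAME_ATTRIBUTES = '<dog name="a" name="b" />'


def parses(xml_text):
    """Return True if the text parses, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def count_names(xml_text):
    """Count the direct "name" children of the root element."""
    return len(ET.fromstring(xml_text).findall("name"))
```

With these definitions, `parses(TWO_NAME_ELEMENTS)` succeeds and `count_names(TWO_NAME_ELEMENTS)` finds both children, while `parses(TWO_NAME_ATTRIBUTES)` fails because the parser refuses the duplicate attribute before any schema is involved.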

If this is not enough to convince you that attribute is the better approach here, let’s look at another example:

<myelement name="e1" restriction="restrictionA">
  <myelement name="e2" restriction="restrictionB" />
</myelement>

Or

<myelement name="e1">
  <restriction value="restrictionA"/>
  <myelement name="e2">
    <restriction value="restrictionB"/>
  </myelement>
</myelement>

In which version are you more certain that “restrictionA” applies to the element named “e1”? It is a judgement call, but I believe more people will think the version where “restriction” is an attribute implies this more strongly. We are predisposed to think that an attribute is a property of its element: it definitely applies to that element, and maybe to its children. A child element, on the other hand, simply tells us that it has a relationship with its parent and with its own children, but nothing else. The “restriction” element could equally well mean that the restriction it carries applies to the parent, to itself, to its children, or indeed to any combination of the three, with no preference at all.

So, which is better, element or attribute?

Before I conclude this post, let me assure you that the decision to use an element or an attribute to represent something is normally not so straightforward, especially when the data you want to capture is not a simple piece of data. Take, for example, capturing my name in two parts: first name and family name. In this case, the advantage of using an attribute is somewhat diminished from a design point of view.
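As a rough illustration of the two-part name case, here is a sketch in Python with the standard `xml.etree.ElementTree`. The attribute names (`first_name`, `family_name`, `first`, `family`), the placeholder values "Given" and "Family", and the helper functions are all hypothetical. It shows one possible pair of layouts, not a recommendation.

```python
import xml.etree.ElementTree as ET

# Option A (hypothetical): both parts flattened into attributes of "owner".
AS_ATTRIBUTES = '<owner first_name="Given" family_name="Family" />'

# Option B (hypothetical): the name becomes its own element grouping the parts.
AS_ELEMENT = '<owner><name first="Given" family="Family" /></owner>'


def full_name_from_attributes(owner):
    """Join the two name parts stored directly on the owner element."""
    return f'{owner.get("first_name")} {owner.get("family_name")}'


def full_name_from_element(owner):
    """Join the two name parts stored on a dedicated "name" child."""
    name = owner.find("name")
    return f'{name.get("first")} {name.get("family")}'


flat_owner = ET.fromstring(AS_ATTRIBUTES)
nested_owner = ET.fromstring(AS_ELEMENT)
```

Option A keeps the lookup short but spreads one concept across two unrelated attributes; option B costs an extra hop but keeps the parts of the name together, which is why the choice stops being clear-cut once the data has structure.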

In Durusau’s post, he used a simple boolean attribute as an example; therefore I have chosen to treat “name” as a simple property in my examples.


2 Comments »

  1. Is it true that Patrick Durusau is trying to engage us in this detailed discussion, or does he want to discuss the object model?

    Comment by Rob Weemhoff — October 28, 2008 @ 11:34 am | Reply

  2. @Weemhoff

    I originally thought it was the object model, but after reading the main body of the post, it is the detailed write-up that he is interested in.

    Looking at his background there is nothing to suggest he is a programmer, or a computer science major. Therefore I concluded that he is not engaging us in the object model, at least in the computer-science/programming sense.

    Comment by ctrambler — October 28, 2008 @ 12:40 pm | Reply

