Monday, June 30, 2008

Bim Bam BOM

We hit a strange bug recently: while trying to parse a perfectly healthy XML file, we got an exception saying:

org.xml.sax.SAXParseException: Document root element is missing.

We could, however, open the XML file in the browser without an error, and our XML editor also took it without a problem. Yet our code, which runs an XSL transformation using Xalan, threw the exception above. According to the exception, the problem was not with the transformation itself but with the XML document starting without a root element, even though the first character we could see in the file was an opening angle bracket...

Well, there are things beyond what you see.

To find out the exact stream of bytes that the transformer receives, and doesn't like, we added the simplest debug line, dumping the bytes from the file to the screen as numeric values rather than as chars. There turned out to be two bytes before the opening angle bracket of the XML: FF FE.
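A debug dump of this kind could look roughly like the sketch below. It is not the code we actually used; the class name HexHead and its methods are made up for illustration, and it simply prints the first bytes of a file as two-digit hex values.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, named here for illustration only.
public class HexHead {

    /** Formats the first `length` bytes as space-separated, two-digit uppercase hex. */
    public static String toHex(byte[] bytes, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Java bytes are signed; mask with 0xFF so 0xFF prints as FF, not FFFFFFFF.
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    /** Reads up to maxBytes from the start of the file and returns them as hex. */
    public static String headAsHex(Path file, int maxBytes) throws IOException {
        byte[] buffer = new byte[maxBytes];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (total < maxBytes) {
                int n = in.read(buffer, total, maxBytes - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        return toHex(buffer, total);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HexHead some-file.xml
        System.out.println(headAsHex(Paths.get(args[0]), 16));
    }
}
```

For a file like ours, the output would start with FF FE, followed by the bytes of the opening angle bracket.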

At this point, my friend and colleague Effie Nadiv (famous for his Hebrew site, and a UI authority and legend), shouted out: it's the BOM! Xalan doesn't recognize the BOM correctly!!

And without further ado he presented the file (same file) in text mode and in binary mode:





See the FF FE at the beginning? This is the BOM.

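BOM stands for Byte Order Mark. It is added at the start of UTF-16 documents to denote the order of the two bytes that make up each code unit; FF FE, as in our file, means little-endian. UTF-8 documents may also have a BOM (EF BB BF), but there it is redundant: UTF-8 has no byte-order ambiguity, so the BOM has no meaning beyond marking the file as UTF-8.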
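To make the byte patterns concrete, here is a minimal sketch of a BOM check. The class name BomSniffer is hypothetical, and the sketch only knows the UTF-8 and UTF-16 marks discussed here; it does not try to recognize UTF-32.

```java
// Hypothetical helper, named here for illustration only.
public class BomSniffer {

    /**
     * Returns the charset name implied by a leading BOM,
     * or null if the data does not start with one of the known marks.
     */
    public static String detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return "UTF-16BE"; // big-endian: high byte first
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return "UTF-16LE"; // little-endian: low byte first
        }
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            // Mask with 0xFF to compare the unsigned value of each byte.
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The first four bytes of a file like ours: BOM, then '<' in little-endian UTF-16.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00};
        System.out.println(detect(sample)); // prints UTF-16LE
    }
}
```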

To read more about Unicode, UTF-8, UTF-16 and BOM, you may want to go to:
http://unicode.org/faq/utf_bom.html
http://en.wikipedia.org/wiki/Byte_Order_Mark

But, before you rush to the above, here is a GREAT piece of reading material for understanding, once and for all, the entire encoding and charset thing:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky.
This is a must read!

Specifically, to solve the above problem we changed the doc to UTF-8 without a BOM. Most text editors support conversion between the different Unicode transformation formats, and let the user decide whether to add a BOM to UTF-8 or not (it's probably better not to add one, so just turn that option off).
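We did the conversion in a text editor, but the same thing can be done in code. The sketch below is one possible way, assuming the input really is UTF-16 with a BOM; the class and method names are made up for illustration. It relies on Java's "UTF-16" charset, which uses the BOM to pick the byte order when decoding, and on the UTF-8 encoder, which does not write a BOM.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, named here for illustration only.
public class ToPlainUtf8 {

    /** Re-encodes UTF-16 bytes (with a BOM) as UTF-8 bytes without a BOM. */
    public static byte[] utf16ToUtf8NoBom(byte[] utf16Bytes) {
        // The "UTF-16" decoder consumes the BOM to decide the byte order,
        // so the mark does not end up in the decoded text.
        String text = new String(utf16Bytes, StandardCharsets.UTF_16);
        // Encoding to UTF-8 produces the characters only, with no BOM in front.
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ToPlainUtf8 input.xml output.xml
        byte[] input = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), utf16ToUtf8NoBom(input));
    }
}
```

Note that if the XML declaration in the file names its encoding explicitly (for example encoding="UTF-16"), it would have to be updated by hand after the conversion; this sketch does not touch the document's content.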



=========================================
Added 21/7/08:
------------------------------
Just found this old newsgroup entry on the subject...
http://biglist.com/lists/xsl-list/archives/200208/msg01302.html
=========================================

Thursday, June 26, 2008

Tag, Log, Debug!

Yesterday (25/6), my friend Alex Romanov and I presented our work on "Automated Log Generation and Analysis using Collaborative Tagging" at the IBM Programming Languages and Environments seminar 2008.


This is Alex with the poster describing our work:






More info on this work can be found here: http://ed.finnerty.googlepages.com/taglogdebug.
Here you can find the initial paper.
And finally, a blog that will follow our work was opened here: http://taglogdebug.blogspot.com

Tuesday, June 24, 2008

Eclipse Plugins Tutorial (OOPSLA '07)



I was asked where people can get the materials from the tutorial on "Creating Plug-ins and Applications on Eclipse Platform" that Alex Romanov and I gave in Montreal at OOPSLA '07. It was a while ago, and the materials are already on the web. But here are the links, as it seems Google is not doing a good job of referring people to the googlegroups site we created...
The tutorial deals with all the important things for getting started with Eclipse plug-ins: creating a plug-in project, the components of a plug-in project, GEF, MVC in the Eclipse Plug-in architecture, SWT, Actions, RCP and more.

So here is the companion booklet.
And here are the slides.
Both can be printed or redistributed, as is, for any purpose, referring back to the source.

Hope you find it useful.


Wednesday, June 11, 2008

False Requirements

A giraffe
Suppose you are assigned to design a system that should carry people from place to place and occasionally, let's say once in ten years, would have to transport a giraffe. It's tricky, but you may be able to come up with some strange car with a very high ceiling, working out the balance and stabilization. It may not be so economical, but who is going to be a piker when it comes to carrying giraffes? (One can of course suggest a car without a ceiling at all, which might be a good solution, but one of the other requirements rules out a convertible.)

It turns out that we are often bending and twisting simple systems just to carry the giraffe. And in many cases, when we dig into it, we find out the giraffe was not even in the formal requirements to begin with! It grew somewhere, in someone's imagination, and became an important part of the system. Oh, how much we could have saved without this giraffe, and how much simpler the system could have been...

Let the giraffe travel on its own!

If it was in the original requirements, go back to the system analyst or whoever wrote the requirements and ask: do you really need this giraffe thing? Maybe we can send it in another vehicle?

Giraffes are nice, but don't let them into your system.
Unless you want to run a zoo.