Friday, November 14, 2008

Make sure to have a strict XSD!

I'm going to tell you three things here:

  1. Don't invent new languages!

  2. XMLs are also languages (if they describe flow control)

  3. If you do create a new XML-based language, make sure that it has a strict schema!

-------



Do not invent new languages

We have enough of them. And by a "new language" I mean anything that describes a flow, and that includes all kinds of weird XMLs. Some people call these XMLs "configuration", but when I look into them, they're NOT. Configuration is setting a value for some attribute. If the attribute is complex, e.g. a network topology, you may need XML, since you need a way to express hierarchy. But once you are defining a flow, with conditions, maybe even method calls and loops, you have a new language! Don't underestimate it.


What's BAD about creating your own cute new language?
(I'd even phrase it: "Why DSLs suck!")

Suppose you have a service and you want to let the user configure its flow externally. You don't want to expose your code and let the user change it (that would indeed be a weird idea). Nor do you want to load, at run time, some external lib, DLL or Java class that the user compiles and adds, which again is a reasonable decision. You want to keep it as simple as editing a text file. So you create your own little XML-based language, and it works great. The problem starts when problems start. And it's not a tautology. When something doesn't work correctly in your cute little language, you find it very hard to debug, as you have no proper debugger, not even a proper IDE, let alone reasonable IntelliSense-style auto-complete. Your cute language is orphaned: to support it properly you will find yourself investing huge effort, far beyond what you originally planned.


What's the alternative?

If you want external flow configuration that includes conditions and such, the best way is to use an existing scripting language and call it from your program. Perl, Python, Jython, Groovy of course, and others are good candidates. The advantage is that you get a mature language with all the tooling around it (debugger, IDE, documentation). You can run the script from within your program and read back variable values or the return code. And it is much easier to explain the language to your users: you just point them to the language tutorial.


But if you do need your own new XML-based language

Make sure that the language has a very strict XSD:

  • Use enums for lists of values, so the user cannot enter just anything, only what's legal

  • Use types: never accept a string where an int is expected!

  • Use regular expressions (xs:pattern) to define the allowed patterns of string values

  • Don't use the same attribute for two purposes, as that breaks the schema: if a value can be either an int (a number) or a string (a variable name), use two different attributes, each with its own rule

  • Use different XML elements to enforce sets of attributes: suppose that A-B and C-D are attribute pairs (i.e. if I use attribute A I must also provide B, and if I use C I must provide D); it's better to split them into two different elements, even if they are similar in nature
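To make the rules above concrete, here is a minimal sketch of a strict schema for a hypothetical flow language, checked with the standard `javax.xml.validation` API that ships with the JDK. Everything else is an assumption made up for illustration: the element and attribute names (`flow`, `step`, `action`, `retries`, `retriesVar`, `tcpTarget`, `fileTarget`), the class name `StrictFlowSchema` and its `isValid` helper. The text block requires Java 15 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class StrictFlowSchema {

    // Hypothetical schema: each commented block illustrates one of the rules above.
    static final String XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <!-- Enum: only these actions are legal -->
          <xs:simpleType name="Action">
            <xs:restriction base="xs:string">
              <xs:enumeration value="start"/>
              <xs:enumeration value="stop"/>
              <xs:enumeration value="restart"/>
            </xs:restriction>
          </xs:simpleType>
          <!-- Regexp: variable names are lowercase identifiers -->
          <xs:simpleType name="VarName">
            <xs:restriction base="xs:string">
              <xs:pattern value="[a-z][a-z0-9_]*"/>
            </xs:restriction>
          </xs:simpleType>
          <xs:element name="flow">
            <xs:complexType>
              <xs:choice maxOccurs="unbounded">
                <xs:element name="step">
                  <xs:complexType>
                    <xs:attribute name="action" type="Action" use="required"/>
                    <!-- Types, one attribute per purpose: a literal number... -->
                    <xs:attribute name="retries" type="xs:nonNegativeInteger"/>
                    <!-- ...or a variable name, each with its own rule -->
                    <xs:attribute name="retriesVar" type="VarName"/>
                  </xs:complexType>
                </xs:element>
                <!-- Attribute couples get their own elements: host+port... -->
                <xs:element name="tcpTarget">
                  <xs:complexType>
                    <xs:attribute name="host" type="xs:string" use="required"/>
                    <xs:attribute name="port" type="xs:unsignedShort" use="required"/>
                  </xs:complexType>
                </xs:element>
                <!-- ...and path+maxSizeKb -->
                <xs:element name="fileTarget">
                  <xs:complexType>
                    <xs:attribute name="path" type="xs:string" use="required"/>
                    <xs:attribute name="maxSizeKb" type="xs:positiveInteger" use="required"/>
                  </xs:complexType>
                </xs:element>
              </xs:choice>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        """;

    // Compile the schema once; a broken schema fails fast at class load.
    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** Returns true if the document conforms to the schema; otherwise prints the first error and returns false. */
    static boolean isValid(String xml) {
        try {
            // With no ErrorHandler set, the validator throws on the first schema violation.
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<flow><step action=\"restart\" retries=\"3\"/></flow>"));
        System.out.println(isValid("<flow><step action=\"reboot\"/></flow>"));
    }
}
```

Running `main` accepts the first document and rejects the second, because `reboot` is not in the `Action` enum. In the same way, `retries="three"` fails the integer type, `retriesVar="Max-Retries"` fails the pattern, and a `tcpTarget` without a `port` (or with a stray `path`) is rejected, since each attribute couple lives in its own element.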

A strict schema gives you good language protection, and thus fewer bugs; decent IntelliSense-style auto-complete if you use a decent XML editor; and clear documentation for the user of what's right and what's wrong.
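Telling the user what's right and what's wrong works best when they see every violation at once, with a line number, rather than only the first one. A minimal sketch of that, again with a hypothetical schema and hypothetical names (`FlowErrorReport`, `problems`): it installs an `org.xml.sax.ErrorHandler` on a standard `javax.xml.validation.Validator`, so recoverable schema errors are collected instead of aborting validation. Assumes Java 15 or later for the text block.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class FlowErrorReport {

    // Hypothetical minimal schema: a flow is a list of steps with an enum action and an integer retry count.
    static final String XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="flow">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="step" maxOccurs="unbounded">
                  <xs:complexType>
                    <xs:attribute name="action" use="required">
                      <xs:simpleType>
                        <xs:restriction base="xs:string">
                          <xs:enumeration value="start"/>
                          <xs:enumeration value="stop"/>
                        </xs:restriction>
                      </xs:simpleType>
                    </xs:attribute>
                    <xs:attribute name="retries" type="xs:nonNegativeInteger"/>
                  </xs:complexType>
                </xs:element>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        """;

    /** Validates xml and returns every problem as "line N: message" (or "fatal: ..."); empty if valid. */
    static List<String> problems(String xml) {
        List<String> found = new ArrayList<>();
        try {
            Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }

                // Schema violations are recoverable: record them and keep validating.
                @Override public void error(SAXParseException e) {
                    found.add("line " + e.getLineNumber() + ": " + e.getMessage());
                }

                // Malformed XML cannot be validated any further: abort.
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            found.add("fatal: " + e.getMessage());
        }
        return found;
    }

    public static void main(String[] args) {
        String flow = "<flow>\n"
                + "  <step action=\"reboot\"/>\n"
                + "  <step action=\"start\" retries=\"three\"/>\n"
                + "</flow>";
        problems(flow).forEach(System.out::println);
    }
}
```

For the sample flow in `main`, line 2 is reported for the illegal `reboot` action and line 3 for `retries="three"`. The exact message wording comes from the JDK's validator and may vary between versions.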

Just as any respectable language has its BNF, if you invent a new XML-based language you should base it on a strict schema.

