Wednesday, January 8, 2014

IKON: human oriented JSON look-alike

So, yet another DSL/markup but please take a look:

{ Star
    size = 12
    x = 9
    y = 4
    name "Alpha Centaury"
    planets [
        { Planet size = 100 }
        { Planet size = 120 }
        { Planet size = 10 }
    ]
}
Looks like JSON, so why make up another language? Unlike JSON, this one isn't designed for JavaScript compatibility first; it's designed with a human editor in mind. In the general case IKON (Ivan Kravarščan's object notation) looks like JSON with fewer double quotes, but it can be extended with domain-specific syntax. Need a shorthand syntax for matrices? No problem, plug it in. Even more customization can be achieved by extending IKADN (Ivan Kravarščan's abstract data notation), but more about that later.

(Brief) history

Before getting into the specifics of the notation, allow me a brief history of IKADN and IKON. DSLs (domain-specific languages) have a tendency to crop up in almost every project. A year ago I decided to rewrite Stareater, a strategy game and a rather big project of mine, and since it was a rewrite I knew I'd need not one DSL but three: one for settings and save files, another for assets (statistical data about game objects, such as how much they cost and which image represents them) and yet another for localization. Most developers would simply pick XML, XML and XML. I find XML too heavily oriented toward describing the organization of data instead of the data itself. It's good for some purposes but overkill for my needs. JSON, YAML and plain INI were options I considered, but at some point I figured that what I was really looking for was something that can be adapted to a specific problem. A settings file would work fine with a general markup language such as JSON, but localization files would work better with a text-oriented solution. For instance, there is no need for numbers and arrays in localization data, while a simplified syntax for single-line text is very useful. So I decided to make my own solution that satisfies those specific needs.

The solution I came up with has two parts: IKADN, an abstract notation that can be used for making other markup languages, and IKON, a general-purpose language. The rest are technicalities, but I'll mention that I have been actively using IKON and two IKADN derivatives in the Stareater project for a year and I improve them regularly.

About IKON

As said before, IKON is a general-purpose solution, so much like JSON and YAML it features three classes of data:
  • Scalars (atoms of data such as numbers or texts)
  • Tables (key-value pairs)
  • Arrays (much like tables but with values only, and the values are ordered)
For more information about data types, consult the project's wiki page: https://code.google.com/p/ikon-library/wiki/IKON. Some differences from JSON: the notation doesn't limit the range and precision of numbers (that's up to the parser implementation), array items are not comma separated, table keys are written without double quotes (but they can't contain white space) and there is no separator between key-value pairs. Since it is always known where each value ends, item separators in arrays and tables would be purely cosmetic, so I decided to drop them entirely. Strings (textual atoms) are similar to JSON's "backslash escaped" strings, and I'm not very fond of that. It makes multiline text blocks look like single-line text and makes them hard to edit. I do have an idea for a syntax that would make text blocks look more natural, but I'm still working out how to introduce it without breaking compatibility with the previous version of the notation too badly.

About IKADN

The sexiest part of the solution is an abstract notation that simplifies the implementation (and to some degree the design) of your own notation. The project's wiki page https://code.google.com/p/ikon-library/wiki/IKADN contains more details, but in short the IKADN syntax consists of the following rules:
  • Each data type starts with a specific character (such as "=" for numbers in IKON or "[" for arrays).
  • The notation designer defines what the data itself looks like.
  • White space between data is ignored.
That doesn't look like much, but the IKADN parser and writer considerably simplify the parser and writer of a concrete notation. The official C# implementation (I intend to write a more detailed post about it later), available on NuGet, contains:
  • Logic for reading stream
  • Handling of unexpected end of stream 
  • Deciding which data type to read
  • Helper methods for reading, skipping and substituting characters from the input stream 
  • Helper methods for writing nicely indented IKADN document. 
For comparison, the official implementation of IKADN consists of 243 lines of code, while the IKON implementation that uses it has 328 lines. That's roughly 40% of the job (243 of 571 lines) done by IKADN.
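To make the three IKADN rules above a bit more concrete, here is a minimal, self-contained C# sketch of the "first character selects the data type" idea. It does not use the Ikadn library and is not how the official implementation is written; every name in it (SignDispatchSketch, Readers, ParseNext and the Read* helpers) is made up for illustration, and tables and escaped text are left out to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical sketch, NOT the Ikadn library API.
static class SignDispatchSketch
{
    // Rule 1: each data type starts with a specific sign character.
    static readonly Dictionary<char, Func<TextReader, object>> Readers =
        new Dictionary<char, Func<TextReader, object>>
        {
            { '=', ReadNumber },
            { '"', ReadText },
            { '[', ReadArray },
        };

    // Rule 3: white space between data is ignored.
    static void SkipWhiteSpace(TextReader input)
    {
        while (input.Peek() >= 0 && char.IsWhiteSpace((char)input.Peek()))
            input.Read();
    }

    // Reads the next value, or returns null at the end of the input.
    public static object ParseNext(TextReader input)
    {
        SkipWhiteSpace(input);
        int sign = input.Read();
        if (sign < 0)
            return null;

        Func<TextReader, object> reader;
        if (!Readers.TryGetValue((char)sign, out reader))
            throw new FormatException("No data type starts with '" + (char)sign + "'");

        // Rule 2: what follows the sign is up to the notation designer.
        return reader(input);
    }

    // A number runs until white space, "]" or the end of the input.
    static object ReadNumber(TextReader input)
    {
        SkipWhiteSpace(input);
        var digits = new StringBuilder();
        while (input.Peek() >= 0 && input.Peek() != ']' &&
               !char.IsWhiteSpace((char)input.Peek()))
            digits.Append((char)input.Read());
        return decimal.Parse(digits.ToString(), CultureInfo.InvariantCulture);
    }

    // Text runs until the closing double quote (no escape handling here).
    static object ReadText(TextReader input)
    {
        var text = new StringBuilder();
        int next;
        while ((next = input.Read()) >= 0 && next != '"')
            text.Append((char)next);
        if (next < 0)
            throw new EndOfStreamException("Unterminated text");
        return text.ToString();
    }

    // An array is a sequence of values closed by "]", with no separators.
    static object ReadArray(TextReader input)
    {
        var items = new List<object>();
        while (true)
        {
            SkipWhiteSpace(input);
            if (input.Peek() < 0)
                throw new EndOfStreamException("Unterminated array");
            if (input.Peek() == ']')
            {
                input.Read();
                return items;
            }
            items.Add(ParseNext(input));
        }
    }

    static void Main()
    {
        var input = new StringReader("[ =12 \"Alpha\" [ =1 =2 ] ]");
        var items = (List<object>)ParseNext(input);
        Console.WriteLine(items.Count); // prints 3
    }
}
```

Adding a new data type in this sketch means adding one entry to the dictionary and one reader method, which is the kind of work a concrete notation built on IKADN is left with.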

How to get it

The official project web site is at Google Project Hosting: https://code.google.com/p/ikon-library/. There you can find the source code, wiki pages and the most recent build for .NET. The build is also available through NuGet.

Here is an example of how to use the library. Let's say the IKON document from the top of the post is in input.txt. This is the code that would print the name of the star:


using System;
using System.IO;
using Ikadn.Ikon;
using Ikadn.Ikon.Types;

namespace IKON_example
{
    class Program
    {
        static void Main(string[] args)
        {
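            var reader = new StreamReader("input.txt");
            // Disposing the parser also closes the underlying reader.
            using (var parser = new IkonParser(reader))
            {
                // The top-level object of the document is a table (IkonComposite).
                IkonComposite star = parser.ParseNext().To<IkonComposite>();

                // Look up the "name" key and convert the IKON text to a .NET string.
                Console.WriteLine(star["name"].To<string>());
            }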
        }
    }
}


The parser can read from any TextReader subclass, so if you want to read from a string instead of a file, use StringReader instead of StreamReader. The parser also implements the disposable pattern, so you can use it with a using statement to ensure that the input stream is closed after use.
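As a small sketch of the StringReader variant, the snippet below parses a document held in a string. It uses only the library calls already shown in the example above (IkonParser, ParseNext, To<T>() and the string indexer); the class name, the ReadName helper and the shortened document are made up for illustration.

```csharp
using System;
using System.IO;
using Ikadn.Ikon;
using Ikadn.Ikon.Types;

static class StringInputExample
{
    // Hypothetical helper: parses the first object in the input
    // and returns the value stored under its "name" key.
    public static string ReadName(TextReader input)
    {
        // The using statement disposes the parser, which closes the reader.
        using (var parser = new IkonParser(input))
        {
            IkonComposite star = parser.ParseNext().To<IkonComposite>();
            return star["name"].To<string>();
        }
    }

    static void Main()
    {
        // A shortened version of the document from the top of the post.
        string document = "{ Star size = 12 name \"Alpha Centaury\" }";
        Console.WriteLine(ReadName(new StringReader(document)));
    }
}
```

Because ReadName accepts any TextReader, the same helper works unchanged with a StreamReader over input.txt.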

Notice the To<T>() method. It's a helper method for converting objects and an alternative to the usual C# cast-and-get (cast an object to the target type and then call a getter method). In the first case, an IkadnBaseObject is simply cast to IkonComposite (the table type). The second case is more complex and shows the true power of the method: instead of type casting, the underlying IkonText (the textual type) returns its contents as a .NET string. Which types can be requested depends on the underlying IkadnBaseObject subclass. For example, in the case of IkonNumber (the numeric type), valid conversions include most native .NET numeric types (decimal, int, float, ...). In the case of IkonArray (the array type), one can ask for T[] (a native .NET array) or IEnumerable<T>, provided the array elements can be converted to T.
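Going by the conversions described above, reading the numeric and array values from the same input.txt might look like the sketch below. Treat it as an assumption, not verified behaviour: in particular, that To<IkonComposite[]>() is accepted for an array of tables is my reading of the "T[]" rule, and the class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using Ikadn.Ikon;
using Ikadn.Ikon.Types;

static class ConversionExample
{
    // Hypothetical helper: returns the sizes of all planets of the first star.
    public static int[] ReadPlanetSizes(TextReader input)
    {
        using (var parser = new IkonParser(input))
        {
            IkonComposite star = parser.ParseNext().To<IkonComposite>();

            // IkonNumber converted to a native .NET numeric type.
            int starSize = star["size"].To<int>();
            Console.WriteLine("Star size: " + starSize);

            // IkonArray converted to a native array (assumed to work for tables).
            IkonComposite[] planets = star["planets"].To<IkonComposite[]>();

            var sizes = new int[planets.Length];
            for (int i = 0; i < planets.Length; i++)
                sizes[i] = planets[i]["size"].To<int>();
            return sizes;
        }
    }

    static void Main()
    {
        foreach (int size in ReadPlanetSizes(new StreamReader("input.txt")))
            Console.WriteLine("Planet size: " + size);
    }
}
```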

If you have any questions about the project, feel free to drop a comment. If you have a feature request or a bug to report, you can also file an "issue" on the project site.