It's bearable when expressions are simple but take a look at the image below. That is an expression for the maximum number of starports on a planet. Floor is a function for rounding down, "^" is the power operator, PLANET_VELICINA is the size of a planet and SVEMIRSKI_PROGRAM_LVL is a level of relevant technology.
This, the picture above, is the syntax tree for 3+0.125*level of farming expression. Arrow at the top points to the root of the tree (knowledge of the trees from graph theory assumed). To get the value of the expression, user code (the one using the tree) has to simply ask the root to evaluate itself. Rest is the magic of composite design pattern, root would see that it needs two or more operands and sum them and it will ask it's children to evaluate them self. Since the first child is a constant, it will simply return 3. Second child on the other hand will ask it's children to evaluate themselves and multiply their values. As you can see a nice utilization of recursion and dynamic polymorphism.
Building syntax tree for infix expression complicated thing, at first there are binary operators (assumed reader knows what that mean) which have one set of not-so-easy-to-implement rules, then add to the mix unary operators and you'd get stuff complicated by factor 1.5 and things can be complicated even further with brackets and functions. On top of that you have to decide whether a / b / c should be interpreted as (a / b) / c, a / (b / c) or rejected as invalid. It would be hell to code and test all of it from scratch. Fortunately our fathers and grandfathers solved that for us when they invented compilers. Programmers use compilers for transforming a source code to an executable files but compilers are more general then that, they translate one language to another. In case of C++, source code written by the rules of C++ is translated to a machine code of the target hardware. There is fair amount of computer science theory involved here and it is named the formal language.
Main phases of compilation process
Every programming language is formal language and as such they have vocabulary (set of lexemes), grammar rules (syntax) and semantic rules. Vocabulary usually contains list of keywords (if, for and default for example) and regular expressions for variable names, constant numeric and textual values and other simple stuff though regular expressions are formal languages of their own. Grammar rules define more complicated stuff such as assignment has form of variable followed by '=' character followed by expression and ended with ';'. Semantic rules cover stuff which are hard to cover by the grammar such as checking if the type of a value can be stored in a variable. There are tools for generating source code for compiler's analysis phases, an inception one might say.
Problem of parsing text containing infix expression is basically that, lexical, syntax and semantic analysis. So I goggled a little bit for C# solutions and found that most queries lead to ANTLR, Java tools that can generate C# source code from grammar rules. More about my impressions with the tool in the following post(s).
No comments:
Post a Comment