When generating code for arithmetic expressions, the compiler has to decide how best to translate the expression, both in terms of the number of instructions used and the number of registers needed to evaluate a given subtree (especially if free registers are scarce). The so-called Sethi-Ullman algorithm (also known as Sethi-Ullman numbering) produces code that needs the fewest instructions possible as well as the fewest storage references, under the assumption that at most commutativity and associativity apply to the operators used, and that distributive laws such as <math>a * b + a * c = a * (b + c)</math> do not hold. Note that the algorithm also succeeds if neither commutativity nor associativity holds for the expressions used, in which case arithmetic transformations cannot be applied.
The simple Sethi-Ullman algorithm works as follows (for a load-store architecture):
For an arithmetic expression <math>a = (b + c) * (d + 3)</math>, the abstract syntax tree looks like this:

        =
       / \
      a   *
         / \
        /   \
       +     +
      / \   / \
     b   c d   3
To continue with the algorithm, we only need to examine the arithmetic expression <math>(b + c) * (d + 3)</math>, i.e. we only have to look at the right subtree of the assignment '=':

        *
       / \
      /   \
     +     +
    / \   / \
   b   c d   3
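Now we start traversing the tree (in preorder for now), assigning the number of registers needed to evaluate each subtree (note that the last summand in the expression <math>(b + c) * (d + 3)</math> is a constant). The number of registers needed is shown in parentheses next to each node:

             * (2)
            /     \
       + (2)       + (1)
       /   \       /   \
   b (1)  c (1) d (1)  3 (0)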
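The numbering step can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation. It assumes the usual labeling rule: a variable leaf needs 1 register, a constant leaf needs 0 (it is treated as an immediate operand), and an inner node needs max(l, r) registers if its subtrees differ, or l + 1 if they need the same number. The `Node` class and the `label` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: str                       # operator symbol, variable name, or literal
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    regs: int = 0                    # register need, filled in by label()

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def label(node: Node) -> int:
    """Annotate every node with the number of registers its subtree needs."""
    if node.is_leaf():
        # Assumption: constants are immediates (0 registers),
        # variables must be loaded into a register (1 register).
        node.regs = 0 if node.value.isdigit() else 1
        return node.regs
    l = label(node.left)
    r = label(node.right)
    # Equal needs: one extra register holds the first result while the
    # second subtree is evaluated. Otherwise the larger need dominates.
    node.regs = l + 1 if l == r else max(l, r)
    return node.regs


# (b + c) * (d + 3)
tree = Node("*",
            Node("+", Node("b"), Node("c")),
            Node("+", Node("d"), Node("3")))
label(tree)
print(tree.regs, tree.left.regs, tree.right.regs)  # prints: 2 2 1
```

The children are labeled before their parent, because a node's number depends on the numbers of both of its subtrees.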
From this tree it can be seen that we need 2 registers to compute the left subtree of the '*', but only 1 register to compute the right subtree. Therefore we emit code for the left subtree first, because we might have only 2 registers available to compute the whole expression. If we computed the right subtree first (which needs only 1 register), we would then need a register to hold its result while computing the left subtree (which would still need 2 registers), so 3 registers would be needed concurrently. Computing the left subtree first needs 2 registers, but its result can then be held in 1, and since the right subtree needs only 1 register to compute, the whole expression can be evaluated with only 2 registers.
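The effect of the evaluation order can be checked with a small code-emission sketch. It assumes a hypothetical three-address load-store machine (the mnemonics `LOAD`, `ADD`, `MUL` and the `#3` immediate syntax are made up for illustration), hands out registers in stack order, and records the peak number of registers in use. The `Emitter` class and the two policies are likewise assumptions of this example, not part of the algorithm's original description.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}  # hypothetical mnemonics


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def need(n: Node) -> int:
    """Register need of a subtree (variable = 1, constant = 0)."""
    if n.left is None:
        return 0 if n.value.isdigit() else 1
    l, r = need(n.left), need(n.right)
    return l + 1 if l == r else max(l, r)


class Emitter:
    def __init__(self, left_first: Callable[[Node], bool]):
        self.left_first = left_first   # policy: evaluate the left child first?
        self.code: List[str] = []
        self.in_use = 0                # registers r0 .. r(in_use-1) are live
        self.peak = 0                  # highest number of live registers seen

    def alloc(self) -> str:
        reg = f"r{self.in_use}"
        self.in_use += 1
        self.peak = max(self.peak, self.in_use)
        return reg

    def gen(self, n: Node) -> str:
        """Emit code for n; return the operand (register or immediate)."""
        if n.left is None:
            if n.value.isdigit():
                return "#" + n.value   # immediate operand, no register
            reg = self.alloc()
            self.code.append(f"LOAD {reg}, {n.value}")
            return reg
        if self.left_first(n):
            l = self.gen(n.left)
            r = self.gen(n.right)
            first, second = l, r
        else:
            r = self.gen(n.right)
            l = self.gen(n.left)
            first, second = r, l
        if not first.startswith("#"):
            dest = first               # reuse the earlier operand's register
            if not second.startswith("#"):
                self.in_use -= 1       # release the later operand's register
        elif not second.startswith("#"):
            dest = second
        else:
            dest = self.alloc()        # both operands are immediates
        self.code.append(f"{OPS[n.value]} {dest}, {l}, {r}")
        return dest


def tree() -> Node:
    # (b + c) * (d + 3)
    return Node("*",
                Node("+", Node("b"), Node("c")),
                Node("+", Node("d"), Node("3")))


# Policy 1: evaluate the subtree with the larger register need first.
su = Emitter(lambda n: need(n.left) >= need(n.right))
su.gen(tree())

# Policy 2: always evaluate the right subtree first.
rf = Emitter(lambda n: False)
rf.gen(tree())

print(su.peak, rf.peak)  # prints: 2 3
print("\n".join(su.code))
```

Under these assumptions, evaluating the needier subtree first keeps the peak at 2 registers, while the right-first order reaches 3, matching the argument above.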
In an advanced version of the Sethi-Ullman algorithm, the arithmetic expressions are first transformed, exploiting the algebraic properties of the operators used.
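As a rough illustration of why such transformations can help, the sketch below reassociates a balanced sum into a left-leaning chain and compares the register needs. It is only an example of one possible rewrite, not the procedure the advanced algorithm prescribes. It assumes the simple labeling rule (variable leaf = 1, constant leaf = 0) and an operator that really is associative. Expressions are represented as nested tuples, and `need`, `operands` and `left_chain` are hypothetical helper names.

```python
from typing import List, Union

Expr = Union[str, tuple]  # a leaf name/literal, or (op, left, right)


def need(e: Expr) -> int:
    """Register need of an expression (variable = 1, constant = 0)."""
    if isinstance(e, str):
        return 0 if e.isdigit() else 1
    _, left, right = e
    l, r = need(left), need(right)
    return l + 1 if l == r else max(l, r)


def operands(e: Expr, op: str) -> List[Expr]:
    """Collect the operands of a nested chain of the same operator."""
    if isinstance(e, tuple) and e[0] == op:
        return operands(e[1], op) + operands(e[2], op)
    return [e]


def left_chain(e: Expr, op: str = "+") -> Expr:
    """Rebuild a nested `op` expression as ((x1 op x2) op x3) op ..."""
    items = operands(e, op)
    result = items[0]
    for item in items[1:]:
        result = (op, result, item)
    return result


balanced = ("+", ("+", "a", "b"), ("+", "c", "d"))   # (a + b) + (c + d)
chain = left_chain(balanced)                         # ((a + b) + c) + d

print(need(balanced), need(chain))  # prints: 3 2
```

The balanced form needs a third register to hold `a + b` while `c + d` is computed; in the chain, each new operand is loaded and folded into the running result, so two registers suffice. Such a rewrite is only valid where the operator is associative, which is not the case in general for floating-point addition, for example.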