Write a filter "yucca" (Yetanother Unicode Compiler-Compiler Abomination) which accepts a more yacc-like grammar, like the one my older "tacc" parser takes, e.g.

    keywordargument
        : ID EQ boolean          { $$ = MakeBinary (NodeTag.arg, $1, $3); }
        | ID EQ NUMBER           { $$ = MakeBinary (NodeTag.arg, $1, $3); }
        | ID EQ size_vector      { $$ = MakeBinary (NodeTag.arg, $1, $3); }
        | ID EQ vector           { $$ = MakeBinary (NodeTag.arg, $1, $3); }
        | ID EQ point_2d         { $$ = MakeBinary (NodeTag.arg, $1, $3); }
        | ID EQ stripped_string  { $$ = MakeBinary (NodeTag.arg, $1, $3); }
        ;

I guess all I really need to do is add the $$ and $n symbols, with ${nn} for n >= 10. Maybe also ${name} for named arguments, or ${name:num} for multiple occurrences of the same phrase.

I also need a syntax to handle grouping, as in

    P = ( "," )* ;

Maybe ${item:2[n]}, where :2 selects the second explicit occurrence of item and n indexes the multiple instances of that subphrase. Unfortunately, because the repetition is built as a linked list, n doesn't map to an array element but to following links down the list. The rule above is equivalent to

    P = ;
    P = "," , ;

which suggests numbering the symbols of P = ( "," )* ; like this:

    $$      P, the rule's own value
    $1, $2  the first and second elements on the right-hand side; $2 is the group
    $2_$1   the first element inside the group
    $2_$2   the second element inside the group

$2_$2 could also be referred to as $2_${item}.

In the Emacs filter for .g files, add a conversion of << to guillemets.

Add an alternative syntax for regexps that doesn't use Unicode guillemets, for ease of editing.
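A rough sketch of the $-symbol rewriting for a single action, assuming the filter knows the list of right-hand-side symbols for each alternative. What yucca should actually emit is not decided in these notes, so `result` and `args[i]` are hypothetical stand-ins, and `translate_action` is an illustrative name; the group forms ($2_$1, ${item:2[n]}) are left out here.

```python
from __future__ import annotations

import re

# Hypothetical output names: what yucca should emit for $$ and $n is not
# decided, so these are placeholders for illustration only.
RESULT = "result"
ARG = "args[{}]"

# The $-forms from the notes:
#   $$          the rule's own value
#   $n          single-digit positional reference (1..9)
#   ${nn}       multi-digit positional reference, needed for n >= 10
#   ${name}     first occurrence of a named symbol on the right-hand side
#   ${name:k}   k-th occurrence of that symbol
SYMBOL = re.compile(
    r"\$\$"
    r"|\$(?P<digit>[0-9])"
    r"|\$\{(?P<num>[0-9]+)\}"
    r"|\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::(?P<occ>[0-9]+))?\}"
)


def translate_action(action: str, rhs: list[str]) -> str:
    """Rewrite the $-symbols in one action for the alternative whose
    right-hand-side symbols are listed in rhs."""

    def position(index: int) -> str:
        # Positions are 1-based, as in yacc; args[] is 0-based.
        if not 1 <= index <= len(rhs):
            raise ValueError(f"${index} is out of range for {rhs}")
        return ARG.format(index - 1)

    def replace(m):
        if m.group(0) == "$$":
            return RESULT
        if m.group("digit") is not None:
            return position(int(m.group("digit")))
        if m.group("num") is not None:
            return position(int(m.group("num")))
        name = m.group("name")
        occurrence = int(m.group("occ") or 1)
        hits = [i for i, sym in enumerate(rhs, start=1) if sym == name]
        if not 1 <= occurrence <= len(hits):
            raise ValueError(f"no occurrence {occurrence} of {name!r} in {rhs}")
        return position(hits[occurrence - 1])

    return SYMBOL.sub(replace, action)


if __name__ == "__main__":
    rhs = ["ID", "EQ", "NUMBER"]
    print(translate_action("{ $$ = MakeBinary (NodeTag.arg, $1, $3); }", rhs))
    # prints: { result = MakeBinary (NodeTag.arg, args[0], args[2]); }
```

Treating a bare $n as a single digit is what makes ${nn} necessary: $12 here reads as $1 followed by a literal 2.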
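A sketch of why an index into a repeated group means walking links rather than indexing an array, assuming ( ... )* is expanded right-recursively so that each instance becomes one node pointing at the rest. `GroupNode`, `build` and `group_element` are hypothetical names; taking the instance index n as 0-based and the element number k as 1-based (like $k) are also assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GroupNode:
    """One instance of a repeated group ( ... )*.

    Assumed right-recursive expansion: each node holds one instance's
    elements and links to the node for the remaining instances; None
    stands for the empty alternative P = ;"""
    elements: list
    next: Optional[GroupNode] = None


def build(instances: list[list[Any]]) -> Optional[GroupNode]:
    """Build the chain from the last instance backwards, so each new
    node can link to the chain for the instances after it."""
    head: Optional[GroupNode] = None
    for elems in reversed(instances):
        head = GroupNode(list(elems), head)
    return head


def group_element(head: Optional[GroupNode], n: int, k: int) -> Any:
    """Element k (1-based, like $k) of instance n (0-based) of the group.

    Reaching instance n costs n link traversals, not one array index."""
    node = head
    for _ in range(n):
        if node is None:
            break
        node = node.next
    if node is None:
        raise IndexError(f"group has no instance {n}")
    if not 1 <= k <= len(node.elements):
        raise IndexError(f"instance {n} has no element {k}")
    return node.elements[k - 1]
```

For example, with `chain = build([["a1", "a2"], ["b1", "b2"], ["c1", "c2"]])`, `group_element(chain, 2, 2)` follows two links and returns `"c2"`. A named form such as $2_${item} would first be turned into its position k within the group.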