Frontend
We begin our journey with the user facing language of Juvix. This is
either text in a file, or text written into a REPL session. We then
use Parser Combinators
to turn this unstructured form into a Juvix
form. Finally we end on converting this Structured Juvix form into an
equivalent S-expression one to do further passes.
Parser Combinators
the parser combinator file that we will be inspecting will be found here.
In order to be able to understand this code, I suggest reading about how parser combinators work, for a very brief rundown here are the key features we will be abusing. If you wish to skip the brief rundown skip to the Specific Information Section.
Quick RunDown
matchL :: Parser Types.MatchL
matchL = do
skipLiner J.pipe
match <- P.try matchLogicSN
spaceLiner (P.string "->")
Types.MatchL match <$> P.try expression
Here the do just means we do the first statement, parse for a pipe, then a match. If any of these parsers fail to match (for example if there is no pipe), then we consider the parser to have failed.
Try is another important concept, this says if the parser fails for
any reason, don't consume any text it may have started to parse (say
our pipe passes but our matchLogicSN
doesn't, if that is the case,
then try matchL
would revert the pipe parse!).
topLevel :: Parser Types.TopLevel
topLevel =
P.try (Types.Type <$> typeP)
<|> P.try fun
<|> P.try modT
<|> P.try (Types.ModueOpen <$> moduleOpen)
<|>
means or, so if a parser fails, we move onto the next.
Finally it would be imporper of me to end the section without talking about the infix handler.
expressionGen :: Parser Types.Expression -> Parser Types.Expression
expressionGen p =
Expr.makeExprParser (spaceLiner (expressionGen' p)) tableExp
-- For Expressions
tableExp :: [[Expr.Operator Parser Types.Expression]]
tableExp =
[ [refine],
[infixOp],
[arrowExp]
]
infixOp :: Expr.Operator Parser Types.Expression
infixOp =
Expr.InfixR
( P.try $ do
inf <- spaceLiner infixSymbolDot
pure
(\l r -> Types.Infix (Types.Inf l inf r))
)
Here we outline three key functions. The expressionGen
uses the
tableExp
as the precedence table of infix parsers. So refine
has a
higher precedence than infixOp
. At the end of expressionGen
we get
back a new parser for any expression parser we may hand it.
The infixOp
structure explicitly states what kind of infix it is, in
our case we make it nest on the right, so a -> b -> c
gets turned
into a -> (b -> c)
. The important thing to note is that we only
parse for the inifix symbol itself, we let the parser we hand to
expressionGen
handle the other side.
Every other tool you'll see is an abstraction on top of these base tools. Even the Infix handler is built upon the first two primitives we've outlined.
Specific Information
Now we get into the specifics of our version. We use parser
combinators mostly in the standard way, however you'll often see the
forms skipLiner
, spaceLiner
, and *SN
which are not typical in a
parser combinator system.
spaceLiner
just eats all empty symbols, these are spaces and
newlines after the current parsed expression.
skipLiner
is the same however it is for any Character given to it.
finally *SN
just calls spaceLiner
on whatever parser.
These concepts exist namely due to the wish of eventually making the parser indent sensitive.
The main confusing bit of our layout is the many variants of expression!
do''' :: Parser Types.Expression
do''' = Types.Do <$> do'
app'' :: Parser Types.Expression
app'' = Types.Application <$> P.try application
all'' :: Parser Types.Expression
all'' = P.try do''' <|> app''
-- used to remove do from parsing
expression' :: Parser Types.Expression
expression' = expressionGen app''
-- used to remove both from parsing
expression''' :: Parser Types.Expression
expression''' = expressionGen (fail "")
expression :: Parser Types.Expression
expression = expressionGen all''
We have three main variants, ones with application, ones with do
syntax, and ones with both! These exists because certain
transformations will go into an infinite loop if you're not
careful. This is mainly due to how some forms like do
and infix
generators behave together. In other cases like in adt declarations,
we want to disable application type List a = Cons a (List a)
. It
would be a shame if the a
was applied to List a
!
S-Expression Transformation
The transformation file can be found here.
Here we do a very straightforward algorithm, of when we see a particular Juvix ML ADT form, we then transform it to the equivalent Lisp expression that can be found in the s-expression document in design
Ideally we could generate the particulars of the syntax via some schema and have the ML code use that. This would remove the hardbaked structure of each form from the transformation, however with how many different types exist in the frontend form this would be quite a challenge. We do this technique later on in the compiler, reducing the amount of hard baked forms.
The above issue also extrapolates a bit on why we have chosen this path. ML structured ADT's are wonderful, however they lack flexibility. We used a trees that grow structure in the past, however this would convert one rigid ADT into another. And so we had to convert the massive adt that we built one by one, which ended up with over 500 lines of code per pass! However an issue just as big is that no common interface can be had, thus our passes couldn't as easily be abstracted awway.
Final Words
Overall, this stage of the compiler is not too complicated, it should
be somewhat easy to get a grasp of and see how the data flows. I
suggest going into the EasyPipeline
library and loading up Easy.hs
and playing around. We have just covered up to the SEXP PHASE
section of that page, and so you should be able to understand what
sexp1
and sexp2
examples there dl