Frontend

Documentation on the Juvix Frontend Phase

Frontend

We begin our journey with the user-facing language of Juvix. This is either text in a file or text written into a REPL session. We then use parser combinators to turn this unstructured text into a structured Juvix form. Finally, we convert this structured Juvix form into an equivalent S-expression form, on which further passes operate.

Parser Combinators

The parser combinator file that we will be inspecting can be found here.

In order to understand this code, I suggest reading about how parser combinators work. For a very brief rundown, here are the key features we will be abusing. If you wish to skip the brief rundown, jump ahead to the Specific Information section.

Quick Rundown

  matchL :: Parser Types.MatchL
  matchL = do
    skipLiner J.pipe
    match <- P.try matchLogicSN
    spaceLiner (P.string "->")
    Types.MatchL match <$> P.try expression

Here the do just means we run each statement in order: parse for a pipe, then for a match, and so on. If any of these parsers fails to match (for example, if there is no pipe), then we consider the whole parser to have failed.

try is another important concept: it says that if the parser fails for any reason, don't consume any text it may have started to parse (say our pipe parse succeeds but our matchLogicSN doesn't; in that case, try matchL would revert the pipe parse!).

  topLevel :: Parser Types.TopLevel
  topLevel =
    P.try (Types.Type <$> typeP)
      <|> P.try fun
      <|> P.try modT
      <|> P.try (Types.ModuleOpen <$> moduleOpen)

<|> means or: if a parser fails, we move on to the next one.
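
To make the behaviour of try and <|> concrete, here is a minimal standalone sketch written directly against Megaparsec. It is not taken from the Juvix codebase; the parsers and inputs are invented for illustration.

  -- A standalone sketch (not Juvix code) of try and <|> in Megaparsec.
  import Data.Void (Void)
  import Text.Megaparsec
  import Text.Megaparsec.Char

  type Parser = Parsec Void String

  -- Both branches start with the keyword "let"; without try, a failure in
  -- letFun after it has consumed "let " would abort the whole parse.
  letFun, letVal :: Parser String
  letFun = do
    _ <- string "let "
    _ <- string "fun "
    some letterChar
  letVal = do
    _ <- string "let "
    some letterChar

  -- try backtracks letFun's consumed input, so <|> can still run letVal.
  letForm :: Parser String
  letForm = try letFun <|> letVal

  -- parseTest letForm "let x"     ==> "x"
  -- parseTest letForm "let fun f" ==> "f"

Loading this in GHCi and running the parseTest lines shows the first input falling through to the second branch and the second input succeeding via the first.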

Finally, it would be improper of me to end this section without talking about the infix handler.

  expressionGen :: Parser Types.Expression -> Parser Types.Expression
  expressionGen p =
    Expr.makeExprParser (spaceLiner (expressionGen' p)) tableExp

  -- For Expressions
  tableExp :: [[Expr.Operator Parser Types.Expression]]
  tableExp =
    [ [refine],
      [infixOp],
      [arrowExp]
    ]

  infixOp :: Expr.Operator Parser Types.Expression
  infixOp =
    Expr.InfixR
      ( P.try $ do
          inf <- spaceLiner infixSymbolDot
          pure
            (\l r -> Types.Infix (Types.Inf l inf r))
      )

Here we outline three key definitions. expressionGen uses tableExp as the precedence table of infix parsers, so refine has a higher precedence than infixOp. From expressionGen we get back a new parser for whatever expression parser we hand it.

The infixOp structure explicitly states what kind of infix it is; in our case we make it nest on the right, so a -> b -> c gets turned into a -> (b -> c). The important thing to note is that we only parse the infix symbol itself; we let the parser we hand to expressionGen handle either side.
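
As a concrete, non-Juvix illustration of this right nesting, here is a minimal sketch using makeExprParser from Control.Monad.Combinators.Expr with a single InfixR operator; the Expr type and parser names are invented for the example.

  -- A standalone sketch of InfixR producing right-nested trees.
  import Control.Monad.Combinators.Expr (Operator (..), makeExprParser)
  import Data.Void (Void)
  import Text.Megaparsec
  import Text.Megaparsec.Char

  type Parser = Parsec Void String

  data Expr
    = Var Char
    | Arrow Expr Expr
    deriving (Show)

  -- Terms are single letters; the table holds one right-associative arrow,
  -- mirroring how infixOp only parses the operator itself and leaves both
  -- sides to the term parser handed to makeExprParser.
  expr :: Parser Expr
  expr = makeExprParser term table
    where
      term = Var <$> (letterChar <* space)
      table = [[InfixR (Arrow <$ (string "->" <* space))]]

  -- parseTest expr "a -> b -> c"
  --   ==> Arrow (Var 'a') (Arrow (Var 'b') (Var 'c'))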

Every other tool you'll see is an abstraction on top of these base tools. Even the Infix handler is built upon the first two primitives we've outlined.

Specific Information

Now we get into the specifics of our version. We use parser combinators mostly in the standard way; however, you'll often see the forms skipLiner, spaceLiner, and *SN, which are not typical of a parser combinator system.

spaceLiner just eats all empty symbols (spaces and newlines) after the current parsed expression.

skipLiner is the same, however it works on whatever character is given to it.

Finally, *SN just calls spaceLiner on whatever parser it suffixes.

These concepts exist mainly due to the wish of eventually making the parser indentation sensitive.
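
As a rough, standalone approximation of the idea only, combinators along these lines could be written as follows. These are not the actual Juvix definitions (which are more involved); they are just meant to convey the shape of the convention.

  -- A rough approximation of the spaceLiner/skipLiner/*SN idea; not the
  -- real Juvix definitions.
  import Data.Void (Void)
  import Text.Megaparsec
  import Text.Megaparsec.Char

  type Parser = Parsec Void String

  -- Run a parser, then eat any trailing spaces and newlines.
  spaceLiner :: Parser a -> Parser a
  spaceLiner p = p <* takeWhileP (Just "space or newline") (`elem` " \n")

  -- Parse exactly the given character, then eat trailing spaces and newlines.
  skipLiner :: Char -> Parser ()
  skipLiner c = spaceLiner (() <$ char c)

  -- The *SN convention: a parser suffixed with SN is just the plain parser
  -- wrapped in spaceLiner, e.g. a hypothetical symbolSN.
  symbolSN :: Parser String
  symbolSN = spaceLiner (some letterChar)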

The main confusing bit of our layout is the many variants of expression!

  do''' :: Parser Types.Expression
  do''' = Types.Do <$> do'

  app'' :: Parser Types.Expression
  app'' = Types.Application <$> P.try application

  all'' :: Parser Types.Expression
  all'' = P.try do''' <|> app''

  -- used to remove do from parsing
  expression' :: Parser Types.Expression
  expression' = expressionGen app''

  -- used to remove both from parsing
  expression''' :: Parser Types.Expression
  expression''' = expressionGen (fail "")

  expression :: Parser Types.Expression
  expression = expressionGen all''

We have three main variants: one with both do syntax and application, one with application but no do, and one with neither. These exist because certain transformations will go into an infinite loop if you're not careful; this is mainly due to how some forms like do and the infix generators behave together. In other cases, like ADT declarations, we want to disable application: in type List a = Cons a (List a), it would be a shame if the a was applied to List a!

S-Expression Transformation

The transformation file can be found here.

Here we follow a very straightforward algorithm: when we see a particular Juvix ML ADT form, we transform it into the equivalent Lisp expression, as laid out in the s-expression document in design.
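
To give a flavour of that algorithm, here is a toy sketch with hypothetical, heavily simplified types; the real frontend ADT, the real s-expression type, and the exact Lisp layouts in the design document are all richer than what is shown here.

  -- A toy sketch of the ADT-to-s-expression idea, with made-up types.
  -- A tiny stand-in for a frontend form.
  data Frontend
    = Var String
    | Infix Frontend String Frontend

  -- A tiny stand-in for the s-expression type.
  data Sexp
    = Atom String
    | List [Sexp]
    deriving (Show)

  -- Each ML constructor is matched on and mapped to its Lisp layout.
  toSexp :: Frontend -> Sexp
  toSexp (Var x) = Atom x
  toSexp (Infix l op r) = List [Atom ":infix", Atom op, toSexp l, toSexp r]

  -- toSexp (Infix (Var "a") "+" (Var "b"))
  --   ==> List [Atom ":infix", Atom "+", Atom "a", Atom "b"]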

Ideally we could generate the particulars of the syntax via some schema and have the ML code use that. This would remove the hard-baked structure of each form from the transformation; however, with how many different types exist in the frontend form, this would be quite a challenge. We do use this technique later on in the compiler, reducing the number of hard-baked forms.

The above issue also says a bit about why we have chosen this path. ML-structured ADTs are wonderful; however, they lack flexibility. We used a Trees that Grow structure in the past, but that only converts one rigid ADT into another, and so we had to convert the massive ADT we built one piece at a time, which ended up at over 500 lines of code per pass! An issue just as big is that no common interface can be had, so our passes couldn't as easily be abstracted away.

Final Words

Overall, this stage of the compiler is not too complicated; it should be somewhat easy to get a grasp of and to see how the data flows. I suggest going into the EasyPipeline library, loading up Easy.hs, and playing around. We have just covered up to the SEXP PHASE section of that page, so you should be able to understand what the sexp1 and sexp2 examples there do.