Overview

An overview of the compiler pipeline process.

The document is laid out so that the first section of each part is a small overview, and the sub-pages below it are more detailed explanations.

Overview

The documents in this section give both brief and detailed explanations of many parts of our compiler. The more detailed explanations have some overlap with the design section.

Getting Started

Getting started with the Juvix codebase can be confusing, to say the least. Thus we outline here an order of sub-pages that may be useful:

  1. Pipeline

    • Gives an overview of the pipeline

  2. Frontend

    • Talks about the Parsing and S-expression transformation phase.

  3. Translate

    • Talks about the Passes

  4. Type Theory

  5. Typechecker

  6. Core

  7. Backends

    1. Michelson

  8. Misc

    1. S-expression

1 - Core

Talks about various aspects of the Core code-base

Type Theory

Our Core type theory, called "Quantitative Type Theory" ("QTT"), is derived from the paper I Got Plenty o’ Nuttin’. Modifications and extensions are documented here.

Extensions/Annotations

Core works off a system of annotations/extensions. These currently are crucial for understanding what happens to the data structures in Core.

More can be read about our extensible system here.

HR

HR is the first step into Core from the frontend language. This phase is a lot like IR (read below); however, it includes names rather than De Bruijn indices. Due to this, the phase is rather short-lived.

Characteristics of HR

  • Is an extension of IR with names

  • Short Lived

IR

IR is the representation of Core on which all other extensions are based. In order to facilitate fast evaluation, Core is broken up into terms and values. Values are like terms, but they have no β-redexes by construction of the type: they statically rule out a λ being applied to arguments. This means values are in normal form.
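
To make this distinction concrete, here is a minimal sketch with invented types (the real, extensible definitions live in IR/Types/Base.hs) of how a separate value type rules out β-redexes by construction:

  import Numeric.Natural (Natural)

  -- Terms may apply anything to anything, so a β-redex (λ.b) x is representable.
  data Term
    = TVar Natural        -- de Bruijn index
    | TLam Term           -- λ-abstraction
    | TApp Term Term      -- application: the head may be a λ

  -- Values only allow a neutral (a variable applied to values) in the head of
  -- an application, so a λ can never end up applied to an argument.
  data Value
    = VLam Value
    | VNeutral Neutral

  data Neutral
    = NVar Natural
    | NApp Neutral Value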

Characteristics of IR

  • De Bruijn indices

  • Extensions via Extensible

  • Idea of terms and values

    • Values have no β-redexes by construction

      • Free names can still be present, and could be inlined

      • Reduction is impossible if the free names refer to data-type functions

Extension Layout and mapping

  • Core-Diagram.jpg
    • In this diagram we can see various mappings that happen through the extensions of core.

  • Main representation

Type Checker

Type checking is the first phase of IR. This phase annotates every node with its type and usage. To facilitate fast type checking, every type annotation is a fully evaluated value. Other phases like the unifier use this concept as well.

Erased

After the code passes through IR, we want to erase all the type annotations we've built up. To do this, we send the code through the Erased representation. Unlike HR, Erased is not an extension of IR.

Erased also erases all forms which have 0 usages. Since usages are not first class yet, this is rather simple.

Another key characteristic of Erased is that types are separated from terms, as they are currently assumed to have 0 usages. Thus types are always erased and separated out.

  • This may change with time.

ErasedAnn

  • This is the backend specific form that Michelson uses.

1.1 - HR

Replaces bound and free with just names, has names in the binders

  • Extension of IR

  • replace free and bound by var

  • Contains a map of name symbols

Place for Pretty printing

Substitution

Translates into IR

  • State monad with traversal

1.2 - IR

1.2.1 - Representation

Representation

Global

  • A lot of the global information predates the frontend context. A lot can probably be integrated.

  • In Raw Globals the type is a term, which may not be correct (it is what the user wrote); in regular Globals the types are values, which can be checked for equality more easily.

    • Only safe to eval something after it's been type checked

    • Way to unify?

1.2.2 - Extensions

Extensions

Extensions are an important part of Core IR. They allow certain kinds of transformations.
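
As a rough illustration of the idea (a minimal sketch with invented names, not the actual extensible machinery), an extensible datatype leaves a type-family slot in each constructor, and every phase fills that slot with its own annotation type:

  {-# LANGUAGE TypeFamilies #-}

  import Numeric.Natural (Natural)

  -- Each extension chooses what extra data these slots carry.
  type family XLam ext
  type family XBound ext

  data Term ext
    = Lam (XLam ext) (Term ext)     -- binder plus phase-specific data
    | Bound (XBound ext) Natural    -- de Bruijn variable plus phase-specific data

  -- The plain IR carries no extra information.
  data NoExt

  type instance XLam NoExt = ()
  type instance XBound NoExt = ()

  -- A typechecker-style extension can annotate every node with its type.
  data Typed ty

  type instance XLam (Typed ty) = ty
  type instance XBound (Typed ty) = ty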

Transformation

1.2.3 - Evaluator

Evaluation of the internal representation in Juvix plays an important part in the compilation pipeline: it simplifies the program before passing it to the backend. As the internal representation is closely related to the λ-calculus with de Bruijn indices, the evaluator implements all the functions necessary to evaluate such terms.

In addition to the evaluation functions, the module provides functions for inlining globals (inlineAllGlobals / inlineAllGlobalsElim), turning function definitions into anonymous functions (toLambda), lookup functions for variables and filtering functions for extensions.

For more information on the λ-calculus with de Bruijn indices, see the Wikipedia article. This document describes the way the theory is implemented for Juvix, and expects a basic understanding of the λ-calculus with de Bruijn indices.

Substitution boils down to replacing the variables bound under the current λ. After that, the free variables in the term need to be updated to reflect the removal of an abstraction layer; this is the task of weakening.

Substitution

Substitution lies at the core of evaluation, replacing bound variables with terms. Currently the compiler contains several closely related substitution implementations, both for terms and patterns. All of them serve the same purpose, but differ slightly in their types.

Term substitution

  • substWith: Substitutes any term a with an IR.Elim', but results in a term of type a.

  • substTermWith: Substitutes any term a with an IR.Elim', and results in an IR.Elim'.

  • gsubstWith: Generic substitution of f t with an IR.Elim', resulting in an f t again. This generic function is used for automatic deriving of instances using GHC.Generics.

Pattern substitution

These functions are for substituting patterns; each of them takes:

  • A number indicating the current traversal depth.

  • An IR.PatternMap containing pairs of pattern variables and terms.

All the functions return an Either with a PatternVar in case no substitution has happened, and the resulting term in case of a substitution.

  • patSubst': Substitutes any term a with any of the patterns provided in the map, and results in an a.

  • patSubstTerm': Substitutes any term a with any of the patterns provided in the map, and results in an IR.Term'.

  • gpatSubst': Generic substitution of any term f t with any of the patterns provided in the map, resulting in an f t. This generic function is used for automatic deriving of instances using GHC.Generics.

Weakening

Weakening or shifting of de Bruijn indices is a central part of the evaluation. After performing β-reduction, any free variables need to be decreased to account for removal of the λ-abstraction.

The basic weakening function is weakBy', which takes:

  • A natural number, denoting the amount to shift free variables by.

  • A cutoff value, for which all variables less than or equal are bound, and all variables greater than that are free. Typically this value will be 0, to denote that we want to account for removal of the outermost abstraction.

  • A term a to perform weakening / shifting on.

weakBy' will return a term a again, on which the weakening has been executed. The generic function gweakBy' is similar, but takes an f t as term and returns an f t. It is used for automatic deriving of instances using GHC.Generics.
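
Putting substitution and weakening together, here is a minimal self-contained sketch on a toy term type (the real functions are generic over extensions, substitute Elims, and use a slightly different cutoff convention for weakBy'). β-reducing (λ. body) arg substitutes a weakened arg for index 0 and then shifts the remaining free variables down by one:

  data Term
    = Var Int          -- de Bruijn index
    | Lam Term
    | App Term Term
    deriving (Show)

  -- Shift free variables (indices >= cutoff) by d; d may be negative.
  -- This plays the role that weakBy' / gweakBy' play in the real code.
  shift :: Int -> Int -> Term -> Term
  shift d cutoff t = case t of
    Var j | j >= cutoff -> Var (j + d)
          | otherwise   -> Var j
    Lam body            -> Lam (shift d (cutoff + 1) body)
    App f x             -> App (shift d cutoff f) (shift d cutoff x)

  -- Substitute image for index i (the toy analogue of substWith).
  subst :: Int -> Term -> Term -> Term
  subst i image t = case t of
    Var j | j == i    -> image
          | otherwise -> Var j
    Lam body          -> Lam (subst (i + 1) (shift 1 0 image) body)
    App f x           -> App (subst i image f) (subst i image x)

  -- β-reduce an application of a λ to an argument.
  betaReduce :: Term -> Term -> Term
  betaReduce body arg = shift (-1) 0 (subst 0 (shift 1 0 arg) body)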

1.3 - Original QTT Erasure algorithm

This article explains the QTT Erasure algorithm in a higher-level language for easier digestion

Original QTT Erasure algorithm

Disclaimer: this text only explains the QTT Erasure algorithm; you must already be familiar with how QTT works.

Initially, the algorithm assumes that:

  • any typechecked term $t : \rho > T$ elaborates to a program p¹;
  • $\rho \neq 0$, since you want to make sure that your program actually has something to execute before translating it into runnable code;
  • the translation is into the untyped $\lambda$-calculus.

The whole algorithm traverses the tree of terms elaborating programs – given that dependent types are the zero resource being erased. For a more complex language, we should probably not merge QTT erasure with an actual elaboration phase.

The only terms affected by erasure are binding-dependent terms; in McBride’s calculus, these are $\lambda$-abstraction and application.

The algorithm goes as follows:

  • for $\lambda$-abs with a zeroed resource,
    • you have a variable $0 > x : S$² bound to a body $\rho t : T$;
    • it elaborates to a program p of type $T$, meaning that $x$ won’t be consumed in p – as it was evaluated already – and then $p$ can proceed.
  • for $\lambda$-app with a zeroed resource,
    • you have a function $f$ that takes a variable $0 x : S$;
    • while within your context, you have a zeroed value $s : S$;
    • given $f[s : S/x]$, p elaborates to the body $t : T$ of $f$, meaning that what’s left of $f$ is the resulting function once $s$ is applied.
  • for all remaining terms, elaborate them as is.

The main idea of the paper is to pre-bind zeroed resources, making them unavailable to the elaborated program p. The main trick of the paper is to pair zeroed resources from $\lambda$-abs with zeroed resources from $\lambda$-app. McBride’s paper provides a proof that a zeroed application will always meet a zeroed $\lambda$-abstraction.
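
A minimal sketch of this idea on a toy named syntax (not the compiler's types), where every binder and application carries the usage of the bound variable or argument:

  -- Only two usages matter for the sketch: erased (0) and kept (ω).
  data Usage = Zero | Omega deriving (Eq, Show)

  data Term
    = Var String
    | Lam Usage String Term    -- binder annotated with the variable's usage
    | App Usage Term Term      -- application annotated with the argument's usage
    deriving (Show)

  -- Erase zeroed binders and arguments. QTT guarantees that a variable bound
  -- with usage 0 does not occur in the runtime-relevant part of the body, so
  -- dropping the binder (and the paired argument) is safe.
  erase :: Term -> Term
  erase (Lam Zero _ body) = erase body
  erase (Lam u x body)    = Lam u x (erase body)
  erase (App Zero f _)    = erase f
  erase (App u f x)       = App u (erase f) (erase x)
  erase v@(Var _)         = v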

Resources

  1. I Got Plenty o’ Nuttin’ by Conor McBride

  1. McBride’s paper actually splits the whole semantics into checking and synthesis, following bi-directional typechecking. Implementing the recipe bi-directionally would be ideal since it provides a literal algorithm; here we explain it by assuming both checking and synthesis mean just checking. ↩︎

  2. The 0 prefix means that the term has 0 usage. ↩︎

1.4 - HR IR Conversion

This article goes over some of the ideas behind HR to IR conversion

HR ⟷ IR Conversion

The Pass which can be found here deals with the conversion between Core HR and Core IR.

HR ⟶ IR

This phase simply removes the names from the HR form and constructs De Bruijn indices from the names given in HR.

  • The algorithm for Terms and Elims is rather simple (a minimal sketch follows this list):

    1. When we see a binder (Pi, Lam, Sig, or Let), push the name onto the stack for the form over which it is scoped. Recurse and reconstruct said binder.

    2. For forms without names, recurse and reconstruct.

    3. When a name appears, look up its index from the stack; this is the De Bruijn index we care about. If the index is not there, then we mark it as a Global.
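
A minimal sketch of this name-to-index direction on a toy syntax (hypothetical types; the real pass also handles Pi, Sig, Let, and pattern variables):

  import Data.List (elemIndex)

  data Named
    = NName String
    | NLam String Named            -- binder carries its name
    | NApp Named Named

  data DeBruijn
    = DBound Int                   -- de Bruijn index
    | DGlobal String               -- name not found on the stack
    | DLam DeBruijn
    | DApp DeBruijn DeBruijn

  toDeBruijn :: [String] -> Named -> DeBruijn
  toDeBruijn stack (NLam name body) =
    DLam (toDeBruijn (name : stack) body)             -- 1. push the binder's name, recurse
  toDeBruijn stack (NApp f x) =
    DApp (toDeBruijn stack f) (toDeBruijn stack x)    -- 2. recurse and reconstruct
  toDeBruijn stack (NName name) =                     -- 3. look the name up
    case elemIndex name stack of
      Just index -> DBound index
      Nothing    -> DGlobal name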

HR-Patterns work in a different way. Since patterns are disconnected from terms themselves and do not interact, they work on a system of registering pattern variables when we come across the binder form (=).

Outstanding Questions

  • Since for Terms and Elims we don't account for pattern variables, and just mark everything unknown as Global, how do Patterns affect this? It seems we would mark any PatternVar above this form as a Global and lose that information.

IR ⟶ HR

This conversion adds back names to the IR form. This is mainly used for Pretty Printing.

  • The algorithm for Terms and Elims is once again rather simple

    1. When we see a binder (Pi, Lam, Sig, or Let), generate a new name and push it onto the stack. Recurse and reconstruct said binder.

    2. Forms without names just recurse and reconstruct.

    3. When a name appears, look up the name at that index in the stack; that is the name we care about.

1.5 - Type Theory

The type theory implemented by Juvix

1.5.1 - Foundations

Foundations of Juvix’s type theory

Quantitative Type Theory

Core's type theory, called "Quantitative Type Theory" ("QTT"), is derived from the paper I Got Plenty o’ Nuttin’. Further references for QTT include:

1.5.2 - Inductive data types

Juvix’s inductive data types

Juvix extends QTT with inductive data types.

1.6 - Typechecker

Core IR types and type-checking

1.6.1 - Overview

Overview of Typechecker’s place in Core

Where Typechecker fits into the pipeline

The typechecker is called via typeTerm, which in turn is called by typecheckErase and typecheckEval, ultimately via coreToAnn, which in turn is called by the backend-specific typecheck functions (which will soon be factored out into a common Pipeline.typecheck function), which fits between Pipeline.parse and Pipeline.compile.

I believe that once function typechecking is implemented, typecheckErase and typecheckEval will also need to call typeCheckDeclaration.

Overall structure

IR/Types/Base.hs defines the elements of the syntax of QTT: terms, values, variables, and patterns.

IR/Types/Globals.hs defines the main compound types derived from the Base types: functions and datatypes.

IRAnn/Types.hs defines the typechecker-specific annotations which correspond to the kinds of terms in QTT.

IR/CheckDatatype.hs and IR/CheckTerm.hs enforce the typing rules of QTT.

Interactions with other components

  • The typechecker imports back-end-specific types through Parameterisation.

  • Characteristically of a dependently-typed system, the Typechecker uses types and functions from the Evaluator, for example in:

    • Typechecker's CanTC and Error types

    • typeCheckConstructor, which has to evaluate the terms of the telescope passed to it

    • typeTerm' and typeElim', which must perform reductions on terms to perform type-checking, in accordance with QTT's PRE and POST rules (characteristically of a dependently-typed theory)

Open questions / things I don't understand

1.6.2 - Correspondences

Correspondences between Typechecker code and QTT paper

Correspondences between Typechecker and QTT typing/evaluation rules

In this section we examine where the formal elements of Juvix's implementation of Quantitative Type Theory appear in our implementation.

  • Definitions 1 and 2 address generic rigs (rings without negation) as well as the 0, 1, many rig used by Idris-2. Neither of those has an explicit analogue in our code, since we use one specific rig, and not the one used by Idris-2 – rather, we use the natural numbers plus omega, which we call NatAndw.

  • Definition 3 defines sort ordering as a transitive relation on sorts which allows well-founded induction. This will be our subtyping relation, which is defined by the <: operator in IR/CheckTerm.hs.

  • Definition 4 is of term and elimination, which correspond to our Term and Elim defined in IR/Types/Base.hs.

  • Definition 5 is of contraction, reduction, and computation. Contraction is a single step of beta-reduction (substitution) or nu-reduction (stripping an annotation, which we can do once a term is fully computed), corresponding to "small-step" semantics. Computation is the reflexive-transitive closure thereof, corresponding to "big-step" semantics. We refer to computation as Evaluation, so the implementation of computation in our code is evalTerm (see the Evaluator documentation).

  • Definition 6 is of pre-context and context. The former is what is usually just called "context" in type theories – an association list of names to types. In QTT, the context is a mark-up of a pre-context with usages. In our typechecker, the pre-context is Context, and the mark-up is UContext.

  • Definition 7 is of prejudgment, which is what is usually just called "[type] judgment" in type theories; as QTT is presented bidirectionally, there is both a checking prejudgment (a Term is claimed to have a given type) and a synthesis prejudgment (a given type is inferred for an Elim). Definition 8 is of a judgment as a prejudgment annotated with a usage. We don't explicitly use the term "judgment" in our QTT code, but the type judgments and usage judgments are captured by the Typechecker's specific extension to Annotation.

  • Definition 9 is of the type checking and synthesis rules. (This is extended later by definition 39, which includes erasure and computation. Those two are primarily subjects of other documents.) Below are some correspondences between the typing rules (from definition 9) and the implementations in our code. The explicit implementation of these rules is concentrated within typeTerm' and typeElim', and their helper functions, in CheckTerm.hs.

    • PRE, POST: These rules, which state that a type may be used in place of another if the latter reduces to the former, are implemented in the IRAnn' case of typeElim', and as calls throughout the cases of typeTerm' to evalTC and substApp (which calls evalTC internally).

    • SORT: The IR.Star' case of typeElim'. Stratification appears not to be fully implemented as of this writing – the call to requireUniverseLT is commented out.

    • VAR: The IR.Bound' and (two – one for globals and one for patterns) IR.Free' cases of typeElim'.

    • FUN: IR.Pi' case of typeTerm'

    • LAM: IR.Lam' case of typeTerm'

    • APP: IR.App' case of typeElim'

    • ELIM: The requireSubtype call in the IR.Elim' case of typeTerm'.

    • CUT: The call to typeElim' itself, in the IR.Elim' case of typeTerm'.

  • Definition 17, of subtyping, is implemented by the <: operator in IR/CheckTerm.hs.

  • Definitions 10-16 and 18-36, and all the Lemmas, all relate to reasoning about the typing system, rather than defining the type system, so they do not have direct analogues in our code (except maybe in some comments!). They could, however, if we were to implement Juvix in a dependently-typed language (or to at least some degree if we were to use Liquid Haskell) – such as Juvix itself.

  • Definitions 37 and on refer to Erasure and to computation (which we call Evaluation), each of which we document as a separate module, so we do not explore them here, except insofar as we touched on some aspects when exploring definitions 9 and 39 together above.

1.6.3 - Modules

Documentation of individual Typechecker modules

Definitions by module (beneath Core/)

Types

Types used throughout the typechecker (at least I think that's the intent):

  • PipelineError

    Includes TypecheckerError, which we currently exit with when a user program is incorrectly typed. (I expect the long-term plan is to produce from it a human-readable error message.)

  • PipelineLog

    Used with @"log" (from Capability.Writer) to log a couple of types of events.

  • TermAssignment, AssignWithType

    Maps from variables to formulae of Elementary Affine Logic (EAL) / Elementary Affine Core (EAC). These terms are translated to interaction nets, which Juvix uses as its execution model.

    I didn't find any uses of these types that weren't ultimately within the InteractionNetIR module, so they should probably be moved there.

IR

Aliases for types and functions from IR/Typechecker.

IR/

  • Types

    Aliases for types and functions from IR/Types/Base and IR/Types/Globals, and some simple queries on them. There is one significant pair of mutually recursive functions here: quote :: Value -> Term and neutralQuote :: Neutral -> Elim. (As we shall see below, a Neutral is a particular kind of Value, and may include a Value, hence the mutual recursion.)
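
    A self-contained sketch of the idea behind this pair (toy types; the real quote and neutralQuote are extensible, and neutralQuote returns an Elim rather than a Term):

      data Value   = VLam Value | VNeutral Neutral
      data Neutral = NVar Int | NApp Neutral Value
      data Term    = Lam Term | Var Int | App Term Term

      -- a Neutral is embedded in Value, so quoting a Value may quote a Neutral ...
      quote :: Value -> Term
      quote (VLam body)  = Lam (quote body)
      quote (VNeutral n) = neutralQuote n

      -- ... and a Neutral may contain a Value, hence the mutual recursion
      neutralQuote :: Neutral -> Term
      neutralQuote (NVar i)   = Var i
      neutralQuote (NApp n v) = App (neutralQuote n) (quote v)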

  • Types/

    • Base

      • The definitions, in terms of primitive types, of several internal types. For example, a BoundVar is represented by a Natural, and a PatternVar by an Int. A Name is either a Global or a Pattern; the implementation of GlobalName is defined in the NameSymbol library.

      • GlobalUsage is defined here and restricted to only Zero or Omega, but there is a TODO comment asking whether globals might have other usages.

      • Several of the most pervasive data types are defined here, all as extensible, in the sense of trees-that-grow, meaning that different modules can extend them in their own ways, to use their own specific types of annotations, producing versions with a ' suffix:

        • Term Since Juvix is dependently-typed, a type (PrimTy) is one kind of term.

        • Elim An Elim may be a bound variable, a free variable, an application of an Elim (to a Term), or a usage annotation. An Elim is also a term (it may be embedded in a Term with the Elim constructor). Elim is so-called because McBride calls this type an "elimination" in I Got Plenty o' Nuttin'.

          Elim may be viewed as a term that can be inferred (synthesized).

        • Value Since Juvix is dependently-typed, a type (VPrimTy) is one kind of value. Value's definition strongly resembles that of Term, but there are differences; for example, there are let Terms, but no let Values.

        • Neutral Neutral is to Value as Elim is to Term: A Neutral is a bound variable, a free variable, or an application of a Neutral (to a Value); a Neutral is also a Value (it may be embedded in a Value with the VNeutral constructor).

          A Neutral may be viewed as a "stuck term": one for which typechecking must be deferred until some other term(s) has/have been typechecked. (Perhaps this is the source of the name "neutral" – it's neither good (definitely typechecks) nor bad (definitely doesn't typecheck) yet?)

        • Pattern Patterns contain PVars, which are bound by successful pattern matches, as well as primitive values, terms, and pairs and lists of Patterns.

    • Globals

      • Many Constraint-valued definitions which allow multiple instances of various classes for various types to be declared at once. (For example, RawGlobalAll allows any type which satisfies it to derive Show, Eq, and Data, among others.)

      • Several more pervasive types, built upon those in Base:

        • Pos, representing strict positivity (or otherwise)

        • FunClause', which defines the form of Juvix function definitions. It comprises arguments (a Telescope), a pattern match (a list of Pattern'), a body, a type, and Bool fields indicating whether the clause is a catchall and/or unreachable.

        • A Function' comprises a NonEmpty list of FunClause' values.

        • The widely-used DataArg' includes a name, a usage, and a type.

        • A Datatype' includes a name, DataArg' values (each with a Pos), a universe level, and a list of constructors (DataCon').

        • A type constructor – DataCon' – includes a name, a type, and the function which defines it (whose output type is the type of the constructor).

        • Raw forms of the above. In each case, the Raw form uses Term where the non-Raw (Cooked?) form uses Value.

  • Typechecker

    There's only one definition in this module, but it's a big one: typeCheckDeclaration, which takes a telescope, a list of datatypes, and a list of functions, and calls typeCheckAllCons (see below) on each datatype in the list. If type-checking is successful, it adds the datatype and its constructors to the globals.

    As I write, typeCheckDeclaration is incomplete: it doesn't use one of its parameters, the one comprising a list of function declarations; furthermore, nothing actually calls it yet (except itself, recursively). That is because we simply haven't gotten to writing the typechecking of functions yet (which is also why the backend-specific typecheck functions currently inline everything recursively starting with main).

  • Typechecker/

    Some of the types in Typechecker/ are parameterized over back-ends through the Parameterisation type which each back-end exports.

    • Types

      Contains typechecker-specific extensions of some core types:

      • Annotation': Adds usage and type.

      • BindAnnotation': Adds an Annotation' for both the binder and the term that introduces the binder (Lam or Let).

      • Term': Annotations for each of the constructors of Term (pairs, pi-types, sigma-types, embedded Elim's, and so on).

      • Elim': Annotations for each of the constructors of Elim (bound and free variables and applications of them).

    • Env

      The typechecker's extension of EnvCtx is a GlobalsT, which is a (hash)map of names to Global' values, which in turn may be datatypes, data constructors, functions, or Values (with names and usages). Env therefore also defines a HasGlobals alias for HasReader "globals".

      The interaction with the evaluator, as required by dependent typing, appears in this module via the CanTC type, which requires instances for Eval.HasSubstValue, Eval.HasPatSubstTerm, and Eval.HasWeak.

      This module also defines a Context as a list of the typechecker-specific form of Annotation. There is also a usage context, UContext, which is a list of Usage.

      Pattern bindings and usages – PatBinds and PatUsages – are defined as IntMaps to annotations and usages, respectively, reflecting the use of de Bruijn indices.

      Env also defines the typechecker's State monad: InnerState, which contains a Parameterisation, a PatBinds map, and a Context. (There's a lot of boilerplate related to it.)

      Env also defines signatures: SigDef is a sum of function, constructor, and datatype signatures. Signature maps global names to SigDefs, so it represents all the signatures in global scope. Typechecking functions return Signatures through the TypeCheck type, which uses Capability, with the name "typeSigs".

    • Error

      Defines which errors can occur during typechecking. TypeMismatch, for example, represents a unification error on a variable. Other errors largely follow the possible types of terms – universe mismatches, usage mismatches, non-function types where function types are expected, incompletely-applied data constructors, and unbound variables. Again because dependent typing makes the typechecker depend on the evaluator, the typechecker may also wrap an error from the evaluator (Core/IR/Evaluator/Types.Error).

      The module also provides pretty-printing and an exception wrapper for the main TypecheckError type.

  • CheckDatatype

    Exports two interfaces, which between them allow the type-checking of datatype declarations; hence, both are called by typeCheckDeclaration:

    • checkDataType

      Typechecks all the parameters of a datatype declaration, ultimately by calling typeTerm on each.

    • typeCheckAllCons

      Typechecks all the constructors of a datatype declaration, ultimately by calling typeTerm on each, followed by evalTerm, because of the potential dependency of types on values.

  • CheckTerm

    Exports typeTerm, through which the typechecker is plugged into the Pipeline, via typecheckEval and typecheckErase, and through them from coreToAnn, which is called from the backend-specific typecheck functions with the appropriate Parameterisations. (The backend-specific typecheck functions in the current development branch, as I write, have a lot in common, but Alberto is factoring them out as part of PR 821.)

    Internally, typeTerm is implemented in terms of the mutually-recursive typeTerm' and typeElim' (which follow the structures of the mutually-recursive Term and Elim). These two functions are, in a sense, the core of the typechecker: they're the expression in code of the typing rules of QTT. For example, when typeTerm' is given a lambda with expected type A -> B, it checks whether assuming that the variable has type A results in the body correctly typechecking (with a recursive call to typeTerm') with type B.

    The mutual recursion expresses the bidirectionality of the type checking: annotated terms are checked by typeTerm'; unannotated terms ("eliminations") have their types inferred (synthesized) by typeElim'.
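
    As a toy illustration of that bidirectional structure (a self-contained sketch with invented, simplified types; the real typeTerm' and typeElim' also thread usages, the typechecker's state, and evaluation):

      data Ty = Base String | Arrow Ty Ty deriving (Eq, Show)

      data Term = Lam Term | Elim Elim deriving (Show)
      data Elim = Bound Int | App Elim Term | Ann Term Ty deriving (Show)

      type Ctx = [Ty]   -- de Bruijn: position 0 is the innermost binder

      -- checking direction: a Term is checked against an expected type
      check :: Ctx -> Term -> Ty -> Either String ()
      check ctx (Lam body) (Arrow a b) = check (a : ctx) body b
      check _   (Lam _)    _           = Left "lambda must have a function type"
      check ctx (Elim e)   ty          = do
        ty' <- synth ctx e
        if ty' == ty then Right () else Left "type mismatch"  -- the real code uses <:

      -- synthesis direction: an Elim has its type inferred
      synth :: Ctx -> Elim -> Either String Ty
      synth ctx (Bound i)
        | i >= 0 && i < length ctx = Right (ctx !! i)
        | otherwise                = Left "unbound variable"
      synth ctx (Ann t ty) = ty <$ check ctx t ty
      synth ctx (App f x)  = do
        fTy <- synth ctx f
        case fTy of
          Arrow a b -> b <$ check ctx x a
          _         -> Left "applying a non-function"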

    (Note: as of this writing, there is a bug in the Pair' case of typeTerm': it doesn't evaluate the left term in checking the type of the right, so it wouldn't handle dependent pairs correctly. I will fix that shortly.)

    One of the types that appears as a frequent return value in CheckTerm.hs is Leftovers. This type keeps track of variables which haven't yet been used as often as their Usage annotation allows – and, if it's not omega, requires! Hence if there are any Leftovers with usages other than omega after a term has been type-checked, that causes a LeftoverUsage TypecheckerError to be thrown.

    CheckTerm.hs is also the location of our definition of the subtyping relation, operator <:.

    Many of the return values of functions in CheckTerm.hs are parameterised over Monads with constraints such as HasThrowTC' and HasBound. So they expect to be able to throw TypecheckerErrors, or to be called as part of the execution of a state monad containing (at least) a list of bindings. That's a pattern which strikes me as crying out for algebraic effects – I think they could make the signatures much more readable, with less boilerplate and less code to compose effects. So I feel this module should be able to "read" better when it's written in Juvix itself!

  • TransformExt

    This module defines an interface for transforming one extension of a Term (in the sense of trees-that-grow) into another extension of Term – an applicative functor on term extensions. Such functors are defined in terms of their operations on each variant of Term, such as primitive, pi-type, let-binding, and so on. Such functors compose (and the TransformExt module defines the compose function which composes them). This module also defines a forgetful functor, forgetter, which transforms an extended Term into a Term with no extension (NoExt).

    There are forgetful uses in the Evaluator and Erasure modules, but I haven't found any non-forgetful uses yet.

  • TransformExt/

    • OnlyExts

      OnlyExts is so-called because it defines a Term -transforming functor which discards annotations but preserves extensions.

IRAnn/

  • Types

    The typechecker-specific annotation of terms comprises:

    • A usage (any natural number, or omega)

    • A term of QTT (which might be a type), which is one of:

      • * (a universe, of which there is a natural-number-indexed hierarchy)

      • a primitive type

      • a primitive constant

      • a pi-type (dependent product)

      • a lambda

      • a sigma type (dependent sum)

      • a pair

      • a let-binding

      • the unit type

      • the unit value

      • one of the Elim types:

        • a bound variable

        • a free variable

        • an application of an Elim to a Term

        • a Term annotated with a Usage

      Some of these terms are extensions to core QTT, which, for example, does not appear to me to define explicit pair types.

  • Erasure

    This module just defines two forgetful functors, for Term and Elim, which discard extensions (see TransformExt).

1.6.4 - Formalization

Formalization of QTT in Idris-2

Category-theoretic formalization of QTT: "Quantitative Categories with Families"

-- Following Definition 3.1, of "Categories with Families (CwFs)",
-- in _Syntax and Semantics of Quantitative Type Theory_.
record CwF where
  constructor MkCwF
  -- "C is a category..."
  Underlying : Category
  -- "...with a chosen terminal object T"
  Terminal : Object Underlying
  TerminalCorrect : Universality.IsTerminal {cat=Underlying} Terminal
  -- "For delta in Ob(C), a collection Ty(delta) of semantic types"
  SemanticType : Object Underlying -> Type
  -- "For delta in Ob(c) and S in Ty(delta), a collection Tm(delta, S) of
  -- semantic terms"
  SemanticTerm : (object : Object Underlying) -> SemanticType object -> Type
  -- "For every f : delta -> delta' in C, a function
  -- -{f} : Ty(delta') -> Ty(delta)..."
  MapType : (obj, obj' : Object Underlying) ->
            Morphism Underlying obj obj' ->
            SemanticType obj' -> SemanticType obj
  -- "...and for S in Ty(delta') a function
  -- -{f} : Tm(delta', S) -> Tm(delta, S{f})..."
  -- (Implementer's note:  I don't understand why both MapType and MapTerm
  -- (as I'm calling them) are referred to in the paper as "-{f}".  Is that
  -- a typo, or will they just distinguish between MapType and MapTerm by
  -- context?)
  MapTerm : (obj, obj' : Object Underlying) ->
            (f : Morphism Underlying obj obj') ->
            (S : SemanticType obj') ->
            SemanticTerm obj' S ->
            SemanticTerm obj (MapType obj obj' f S)
  -- "Such that both assignments preserve identity and composition"
  MapTypePreservesIdentity : (obj : Object Underlying) ->
                             (S : SemanticType obj) ->
                             MapType obj obj (Identity Underlying obj) S = S
  MapTypePreservesComposition : (a, b, c : Object Underlying) ->
                                (f : Morphism Underlying a b) ->
                                (g : Morphism Underlying b c) ->
                                MapType a c
                                  ((.*) {cat=Underlying} {a} {b} {c} g f) =
                                (MapType a b f) . (MapType b c g)
  MapTermPreservesIdentity : (obj : Object Underlying) ->
                             (S : SemanticType obj) ->
                             MapTerm obj obj (Identity Underlying obj) S = S
  MapTermPreservesComposition : (a, b, c : Object Underlying) ->
                                (f : Morphism Underlying a b) ->
                                (g : Morphism Underlying b c) ->
                                (S : SemanticType c) ->
                                (t : SemanticTerm c S) ->
                                MapTerm a c
                                  ((.*) {cat=Underlying} {a} {b} {c} g f) S t =
                                MapTerm a b f (MapType b c g S)
                                  (MapTerm b c g S t)
  -- "For each object delta in C and S in Ty(delta) an object delta.S (called
  --  the _comprehension of S_)..."
  Comprehension : (obj : Object Underlying) ->
    SemanticType obj -> Object Underlying
  -- "...such that there is a bijection natural in delta':"
  ComprehensionToMorphism :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj' : Object Underlying) ->
    Morphism Underlying obj' (Comprehension obj S) ->
    Morphism Underlying obj' obj
  ComprehensionToTerm :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj' : Object Underlying) ->
    (f : Morphism Underlying obj' (Comprehension obj S)) ->
    SemanticTerm obj'
      (MapType obj' obj (ComprehensionToMorphism obj S obj' f) S)
  TermToComprehension :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj' : Object Underlying) ->
    (m : Morphism Underlying obj' obj) ->
    SemanticTerm obj' (MapType obj' obj m S) ->
    Morphism Underlying obj' (Comprehension obj S)
  ComprehensionToTermToComprehension_id :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj' : Object Underlying) ->
    (f : Morphism Underlying obj' (Comprehension obj S)) ->
    TermToComprehension obj S obj'
      (ComprehensionToMorphism obj S obj' f)
      (ComprehensionToTerm obj S obj' f) =
    f
  TermToComprehensionToTerm_id_morphism :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj' : Object Underlying) ->
    (m : Morphism Underlying obj' obj) ->
    (term : SemanticTerm obj' (MapType obj' obj m S)) ->
    ComprehensionToMorphism obj S obj' (TermToComprehension obj S obj' m term) =
    m
  TermToComprehensionToTerm_id_term :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj' : Object Underlying) ->
    (m : Morphism Underlying obj' obj) ->
    (term : SemanticTerm obj' (MapType obj' obj m S)) ->
    ComprehensionToTerm obj S obj' (TermToComprehension obj S obj' m term) =
    replace {p=(\m' => SemanticTerm obj' (MapType obj' obj m' S))}
      (sym (TermToComprehensionToTerm_id_morphism obj S obj' m term)) term
  ComprehensionToMorphismIsNatural :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj', obj'' : Object Underlying) ->
    (observerChange : ObserverChange {cat=Underlying} obj' obj'') ->
    -- I'm not sure whether it is necessary to impose naturality on the
    -- observer change in order to ensure it on the comprehension-to-morphism
    -- bijection; that's just a guess.  We could find out whether proofs would
    -- go through without it.
    (isNatural : ObserverChangeIsNatural
      {cat=Underlying} {observerA=obj'} {observerB=obj''} observerChange) ->
    (f : Morphism Underlying obj' (Comprehension obj S)) ->
    ComprehensionToMorphism obj S obj''
      (observerChange (Comprehension obj S) f) =
    observerChange obj (ComprehensionToMorphism obj S obj' f)
  ComprehensionToTermIsNatural :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj', obj'' : Object Underlying) ->
    (observerChange : ObserverChange {cat=Underlying} obj' obj'') ->
    -- I'm not sure whether it is necessary to impose naturality on the
    -- observer change in order to ensure it on the comprehension-to-term
    -- bijection; that's just a guess.  We could find out whether proofs would
    -- go through without it.
    (isNatural : ObserverChangeIsNatural
      {cat=Underlying} {observerA=obj'} {observerB=obj''} observerChange) ->
    (f : Morphism Underlying obj' (Comprehension obj S)) ->
    ComprehensionToTerm obj S obj''
      (observerChange (Comprehension obj S) f) =
    MapTerm obj'' obj'
      (ObserverChangeInducedMorphism
        {cat=Underlying} {observerA=obj'} {observerB=obj''} observerChange)
      (MapType obj' obj (ComprehensionToMorphism obj S obj' f) S)
      (ComprehensionToTerm obj S obj' f)

2 - S-expression

An overview of dealing with S-expressions (WIP)

3 - Translate

Documentation on the Juvix Translation Phase

Translate

This phase is the direct manipulation of the S-expression construct that we left off with in the frontend phase.

overview.png

This section is broken up into sections, each growing in complexity as we go along. These are:

  1. Desugaring

  2. Context Phase

Desugaring

Every important operation in this phase is a direct Syntax to Syntax transformation with no extra context needed.

desugar.png

The function op pipes all desugaring functions that do not require a context.

  -- | @op@ fully desugars the frontend syntax from the original
  -- frontend s-exp representation to a form without modules, conditions,
  -- guards, etc. This pass thus does all transformations that do not
  -- require a context
  op :: [Sexp.T] -> [Sexp.T]
  op syn =
    syn
      >>| Pass.moduleTransform
      >>| Pass.moduleLetTransform
      >>| Pass.condTransform
      >>| Pass.ifTransform
      >>| Pass.multipleTransLet
      |> Pass.multipleTransDefun
      |> Pass.combineSig
      >>| Pass.removePunnedRecords
      >>| Pass.translateDo

The order of these passes is relatively arbitrary. We simply remove modules, then module lets, then conds, then …, and then do. The op function is run directly on the output of the last phase.

Examples of its use are provided in the EasyPipeline module (run desugar1 and desugar2 to test the results for yourself!).

Desugaring S-expressions, Structurally

Structurally here refers to a systematic way of manipulating s-expressions in which each form has an equivalent Haskell ADT.

Before getting into all the functions in op, such as Pass.moduleTransform, we first discuss the underlying tool that allows us to handle these passes in a principled way.

We'll use the let/defun form as an example of this transformation. This code can be found in the Structure module.

  ;; From Design/S-expression Syntax
  (:defun foo (x)
    (:cond
      ((:infix == x 2) (:infix + 3 (:infix + 4 x)))
      (else            (:infix + x 2))))

defun is broken into the name foo, the arguments x (note this can pattern match), and the body cond.

In Structure/Frontend.hs we find a Haskell encoding of this form:

  -- | @Defun@ is the base defun structure
  -- currently it does not have matching
  data Defun = Defun
    { defunName :: Sexp.T,
      defunArgs :: Sexp.T,
      defunBody :: Sexp.T
    }
    deriving (Show)

Note: Notice how we say nothing about the head being the cadr of the structure, the arguments the caddr, and the body of course the cadddr. Instead, we just lay out the logical structures in a record, divorced from any representation.

We provide an API to deal with the actual representation. It would need to change with a lisp-form redesign.

  ----------------------------------------
  -- Defun
  ----------------------------------------

  nameDefun :: NameSymbol.T
  nameDefun = ":defun"

  isDefun :: Sexp.T -> Bool
  isDefun (Sexp.Cons form _) = Sexp.isAtomNamed form nameDefun
  isDefun _ = False

  toDefun :: Sexp.T -> Maybe Defun
  toDefun form
    | isDefun form =
      case form of
        _Defun Sexp.:> sexp1 Sexp.:> sexp2 Sexp.:> sexp3 Sexp.:> Sexp.Nil ->
          Defun sexp1 sexp2 sexp3 |> Just
        _ ->
          Nothing
    | otherwise =
      Nothing

  fromDefun :: Defun -> Sexp.T
  fromDefun (Defun sexp1 sexp2 sexp3) =
    Sexp.list [Sexp.atom nameDefun, sexp1, sexp2, sexp3]

All records in that file have a corresponding interface of name<structure>, is<structure>, to<structure>, and from<structure>. These deal with the small details of cars and cdrs.

This can be tested by opening the Easy file in the Juvix repl and running

  λ> Right defun = Sexp.parse "(:defun foo (x) (:cond ((:infix == x 2) (:infix + 3 (:infix + 4 x))) (else (:infix + x 2))))"
  λ> import qualified Juvix.Sexp.Structure as Structure
  λ> Structure.toDefun defun
  Just (Defun {defunName = "foo", defunArgs = ("x"), defunBody = (":cond" ((":infix" "==" "x" 2) (":infix" "+" 3 (":infix" "+" 4 "x"))) ("else" (":infix" "+" "x" 2)))})

We can see this transformation agrees with our documentation.

What this buys us is that when it comes time to do the passes, we don't have to worry about these details while making sure we don't change the semantics of the passes themselves. This properly decouples our concerns, so we can worry about the abstract meaning of the syntax in the passes while handling the representational details here.

However, all is not fine: notice that we had to write nineteen lines of boilerplate just to get rid of the details of the syntax! This is rather tedious, and if we count them, there are at least 22 structures in that file and over 500 lines of this rather straightforward API. This is especially annoying given that we are essentially just transcribing BNF definitions.

It is due to this tedium early on that the Lisp Haskell generator was created. The code itself isn't the best (due to using string interpolation) and can, with a bit of engineering, be improved (with a proper S-expression DSL in the Lisp code); however, it works fairly well for our purposes.

Here are some choice snippets that cover every case of the generator

  ;; 1
  (generate-haskell "Defun" '("sexp" "sexp" "sexp") ":defun")

  ;; 2
  (generate-haskell "ArgBody" '("sexp" "sexp") nil)

  ;; 3
  (generate-haskell "DeconBody" (repeat 2 "sexp") nil)

  (generate-haskell "Case" '("sexp" "deconBody") "case" :list-star t)


  ;; 4
  (generate-haskell "NotPunned" '("sexp" "sexp") nil)

  (generate-haskell "RecordNoPunned" '("notPunnedGroup") ":record-no-pun"
                    :list-star t
                    :un-grouped t)

  ;; 5
  (generate-haskell "ArgBody" '("sexp" "sexp") nil)
  ;; bodys here, as there are multiple!
  (generate-haskell "LetMatch" '("sexp" "argBodys" "sexp") ":let-match")
  1. The first one correlates to the structure we've already seen, namely the Defun struct. The first argument is the type name we wish to generate; this is of course Defun. The second argument is the list we wish to parse; in this case it says all three arguments are just Sexp.T's, and binds them in order to name, args, and body in the struct itself. The third argument is the name of the structure itself, if it has one. Our lets are translated to the :defun structure, and so we note that here.

  2. The second case correlates to a very similar structure, except that it lacks a name. So the structure we care about is (arg body) with no name attached. The nil in the third argument reflects this change.

  3. The third case correlates to a scenario with two changes. The first is that we can define types for another type to rely on. Here we are saying that Case has the field type DeconBody; we use lower case to reflect the proper variable name this associates with. Further, the :list-star t aspect of this tells us that the last argument, in this case of DeconBody, is the rest of the s-expression and not just the cadr.

      data Case = Case
        { caseOn :: Sexp.T,
          caseImplications :: [DeconBody]
        }
        deriving (Show)
  4. The fourth is the last new concept of the bunch: namely, we have :un-grouped t, which states that the forms this parses are not grouped like

    ((name₁ type₁) (name₂ type₂) … (nameₙ typeₙ)), but rather

    (name₁ type₁ name₂ type₂ … nameₙ typeₙ).

    This means that we have to tell it explicitly that it occurs over 2 positions with that :un-grouped argument.

  5. The fifth case is an interesting one. The key insight is that we say argBodys rather than argBody. This is because our generator is limited. Thus we manually write

      -- these are ungrouped fromArgBodys, where we groupBy2 both ways
      fromArgBodys :: [ArgBody] -> Sexp.T
      fromArgBodys = Sexp.unGroupBy2 . toStarList fromArgBody
    
      -- these are ungrouped toArgBodys, where we groupBy2 both ways
      toArgBodys :: Sexp.T -> Maybe [ArgBody]
      toArgBodys = fromStarList toArgBody . Sexp.groupBy2

    To be able to handle the scenario where the list-star-like form happens not in the last argument, but rather in a list in some other position!

Improvements Upon the Generator

The generator can be improved a lot; as the previous section mentions, it was hacked together using string interpolation, which is not a good structural way to handle this sort of problem. The alternative would, however, take a week of hacking, so it is not a priority to undertake.

However, there are two worthwhile hacks that we should undertake.

  1. Change the Maybe of the to<Name> to an Either.

    This change at the current moment does not matter, namely because Nothing should never be triggered. This is because the parser in the frontend excludes this possibility. However, when the syntax redesign comes in, this will no longer be the case. It would be great if the generated code could instead generate an Either where the Left counts the number of arguments given and states precisely why it can't translate the given S-expression form into the type we want.

  2. list-star like behavior anywhere. Currently we bake it into the last form, but it would be nice if we could have a way to denote this for the middle slots, so we can avoid hacks, like the manual argBodys example given in the last section.

Desugaring Passes

Now that we understand the structural underpinnings of the passes, we can talk about the transformations themselves.

We can observe the passes here.

There are two ways to proceed: if you don't want to go through the entire creation process of a pass, you can skip the next section and go directly to reviewing a pass.

The most important detail to note is that if you have a clear view of the input and output of a pass, the pass should be easy to write and review. If forming this view is hard, then I suggest reading through the next section to see the full life cycle of this process.

Creating a Pass

Instead of just showing an example, let us walk through writing one. Let us define the cond transformation. We can find the form here.

It seems the Cond form is filled with a bunch of ((:infix == x 3) 1) forms, which we can abstractly view as (<pred> <ans>). So let us define these forms as a type. We do this in the Structure folder. In particular we want to put this in Frontend.hs since this shows up in the frontend syntax.

  -- | @PredAns@ is an abstraction over questions and answers
  data PredAns = PredAns {predAnsPredicate :: Sexp.T, predAnsAnswer :: Sexp.T}
    deriving (Show)

  -- | @Cond@ here Cond form takes a list of predicate answers
  newtype Cond = Cond {condEntailments :: [PredAns]} deriving (Show)

To reflect the structure we include the raw forms in the generator file and take the output of these calls into the Structure file.

  (generate-haskell "PredAns" (repeat 2 "sexp") nil)

  (generate-haskell "Cond" '("predAns") ":cond" :list-star t)

Now we can test that we successfully matched the BNF by using the REPL

  λ> Right cond = Sexp.parse "(:cond ((:infix == x 2) (:infix + 3 (:infix + 4 x))) (else (:infix + x 2)))"
  λ> Structure.toCond cond
  Just
    (Cond
       {condEntailments =
          [PredAns { predAnsPredicate = (":infix" "==" "x" 2)
                   , predAnsAnswer = (":infix" "+" 3 (":infix" "+" 4 "x"))}
          ,PredAns { predAnsPredicate = "else"
                   , predAnsAnswer = (":infix" "+" "x" 2)}]})

Now we know the shape of the data we are working with!

To get closer to the core version of match, let us first desugar this to an if. Something like

(:Cond (pred-1 result-1) … (pred-n result-n)) should turn into

(if pred-1 result-1 (if pred-2 result-2 (… (if pred-n result-n))))

So like cond, we also define the records and forms we wish to work on

  -- | @If@ has an pred, then, and else.
  data If = If
    { ifPredicate :: Sexp.T,
      ifConclusion :: Sexp.T,
      ifAlternative :: Sexp.T
    }
    deriving (Show)

  -- | @IfNoElse@ has a pred and a then.
  data IfNoElse = IfNoElse
    { ifNoElsePredicate :: Sexp.T,
      ifNoElseConclusion :: Sexp.T
    }
    deriving (Show)


  -- In the generator
  (generate-haskell "If" (repeat 3 "sexp") "if")
  (generate-haskell "IfNoElse" (repeat 2 "sexp") "if")

Because our generator is limited we make two variants.

Now let us write the form. First, observe that we can view this operation from cond to if as folding if over the list of PredAns, with the innermost form being an if with no else condition.


  condToIf condExpression
      | Just cond <- Structure.toCond condExpression,
        -- we need to setup the if with no else
        Just last <- lastMay (cond ^. entailments) =
        let acc =
              -- let's create it, with the predicate and answer of the
              -- PredAns type
              Structure.IfNoElse (last ^. predicate) (last ^. answer)
                |> Structure.fromIfNoElse
         -- Now let us employ the fold strategy we talked about
         in foldr generation acc (initSafe (cond ^. entailments))
      | otherwise = error "malformed cond"
    where
      -- this is the folded function, see how we just build up the if,
      -- then push it back to an s-expression at the end?
      generation predAns acc =
        Structure.If (predAns ^. predicate) (predAns ^. answer) acc
          |> Structure.fromIf

With our current understanding we'd write something like this, and in fact we can test it as is! Just go to the Easy Pipeline file and include these dependencies and the function above.

  import qualified Juvix.Sexp.Structure as Structure
  import Juvix.Sexp.Structure.Lens
  import Control.Lens hiding ((|>))

Along with the definition, we can now see

  λ> Right cond = Sexp.parse "(:cond ((:infix == x 2) (:infix + 3 (:infix + 4 x))) (else (:infix + x 2)))"

  λ> condToIf cond
  ("if" (":infix" "==" "x" 2) (":infix" "+" 3 (":infix" "+" 4 "x")) ("if" "else" (":infix" "+" "x" 2)))

We just created the bulk of our first pass! There are a few more details that one has to care about in a real pass, but we'll cover those in the reviewing section coming up.

Reviewing a Pass

The following code can be found here.

  -- | @condTransform@ - CondTransform turns the cond form of the frontend
  -- language into a series of ifs
  -- - BNF input form:
  --   + (:Cond (pred-1 result-1) … (pred-n result-n))
  -- - BNF output form:
  --   + (if pred-1 result-1 (if pred-2 result-2 (… (if pred-n result-n))))
  condTransform :: Sexp.T -> Sexp.T
  condTransform xs = Sexp.foldPred xs (== Structure.nameCond) condToIf
    where
      condToIf atom cdr
        | Just cond <- Structure.toCond (Sexp.Atom atom Sexp.:> cdr),
          Just last <- lastMay (cond ^. entailments) =
          let acc =
                Structure.IfNoElse (last ^. predicate) (last ^. answer)
                  |> Structure.fromIfNoElse
           in foldr generation acc (initSafe (cond ^. entailments))
                |> Sexp.addMetaToCar atom
        | otherwise = error "malformed cond"
      --
      generation predAns acc =
        Structure.If (predAns ^. predicate) (predAns ^. answer) acc
          |> Structure.fromIf

We saw most of this form in the creation of a pass section above; however, we will go over the rough strategy briefly. In order to turn the input cond into an if, we can view it as a fold, where we fold the if form over the predicates and answers of the cond structure, with the accumulator being the already-built-up ifs. We can see this in the BNF comment at the start of the function. This transformation gets us closer to the more primitive match representation that is present in Core.

We can see the implementation of this strategy shine through in the body of condToIf and the folded function generation.

The main difference from our creation of a pass section is that we have this Sexp.foldPred call, and that condToIf has two arguments. The importance of this is that Sexp.foldPred searches the entire S-expression structure for the name Structure.nameCond, which is :cond. When it sees this form, it breaks the structure into the atom, :cond, and the cdr, which is the rest of the structure. This means that if you run this on any form that contains the cond form, it will automatically run this transformation for you!

The last key bit of information is Sexp.addMetaToCar atom; this preserves meta information, like line numbers, that was on the :cond atom and that we would like on our newly generated if atom.

Overall, reviewing a pass is rather easy, just keep in mind what the input form is, and what the output form is. As long as the output form is getting closer to the Core representation of matters, we have a successful pass.

To verify this pass, we can easily run it for ourselves and see if the BNF Comment is being truthful.

  λ> Right cond = Sexp.parse "(:defun f (x) (:cond ((:infix == x 2) (:infix + 3 (:infix + 4 x))) (else (:infix + x 2))))"
  λ> cond
  (":defun" "f" ("x")
     (":cond" ((":infix" "==" "x" 2) (":infix" "+" 3 (":infix" "+" 4 "x")))
              ("else"                (":infix" "+" "x" 2))))
  λ> Pass.condTransform cond
  (":defun" "f" ("x")
     ("if" (":infix" "==" "x" 2)
           (":infix" "+" 3 (":infix" "+" 4 "x"))
           ("if" "else"
                 (":infix" "+" "x" 2))))

Although we used the cond pass to talk about how these passes work, we only do so to have a simple example of how these work. All the other passes follow the same format and can be dissected by the same thought process.

Context

The context phase is an important phase: this is where we start to introduce the context, and our operations start to revolve around it. This is also where our pipeline starts to show some inadequacies with how we ingest data. But first we must talk about how we transform our list of S-expressions into a context.

Contextify

We call this phase Contextify, or Contextualization. The code we will first be investigating is split among many files, so we will link them right before we discuss the code rather than all at the start.

4 - Frontend

Documentation on the Juvix Frontend Phase

Frontend

We begin our journey with the user-facing language of Juvix. This is either text in a file or text written into a REPL session. We then use parser combinators to turn this unstructured form into a structured Juvix form. Finally, we convert this structured Juvix form into an equivalent S-expression one for further passes.

Parser Combinators

The parser combinator file that we will be inspecting can be found here.

In order to understand this code, I suggest reading about how parser combinators work; for a very brief rundown, here are the key features we will be abusing. If you wish to skip the brief rundown, skip to the Specific Information section.

Quick RunDown

  matchL :: Parser Types.MatchL
  matchL = do
    skipLiner J.pipe
    match <- P.try matchLogicSN
    spaceLiner (P.string "->")
    Types.MatchL match <$> P.try expression

Here the do just means we run each statement in order: parse for a pipe, then a match. If any of these parsers fail to match (for example, if there is no pipe), then we consider the whole parser to have failed.

Try is another important concept: it says that if the parser fails for any reason, don't consume any text it may have started to parse (say our pipe passes but our matchLogicSN doesn't; if that is the case, then try matchL would revert the pipe parse!).

  topLevel :: Parser Types.TopLevel
  topLevel =
    P.try (Types.Type <$> typeP)
      <|> P.try fun
      <|> P.try modT
      <|> P.try (Types.ModueOpen <$> moduleOpen)

<|> means or, so if a parser fails, we move onto the next.

Finally, it would be improper of me to end the section without talking about the infix handler.

  expressionGen :: Parser Types.Expression -> Parser Types.Expression
  expressionGen p =
    Expr.makeExprParser (spaceLiner (expressionGen' p)) tableExp

  -- For Expressions
  tableExp :: [[Expr.Operator Parser Types.Expression]]
  tableExp =
    [ [refine],
      [infixOp],
      [arrowExp]
    ]

  infixOp :: Expr.Operator Parser Types.Expression
  infixOp =
    Expr.InfixR
      ( P.try $ do
          inf <- spaceLiner infixSymbolDot
          pure
            (\l r -> Types.Infix (Types.Inf l inf r))
      )

Here we outline three key definitions. expressionGen uses tableExp as the precedence table of infix parsers, so refine has a higher precedence than infixOp. From expressionGen we get back a new parser for any expression parser we may hand it.

The infixOp structure explicitly states what kind of infix it is; in our case we make it nest on the right, so a -> b -> c gets turned into a -> (b -> c). The important thing to note is that we only parse for the infix symbol itself; we let the parser we hand to expressionGen handle either side.

Every other tool you'll see is an abstraction on top of these base tools. Even the Infix handler is built upon the first two primitives we've outlined.
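
To make the precedence-table machinery concrete, here is a small, self-contained sketch in the same spirit, using makeExprParser from the parser-combinators/megaparsec ecosystem over a toy arithmetic grammar. None of this is Juvix code; the grammar, names, and types are purely illustrative.

  import Control.Monad.Combinators.Expr (Operator (..), makeExprParser)
  import Data.Void (Void)
  import Text.Megaparsec
  import Text.Megaparsec.Char (space)
  import qualified Text.Megaparsec.Char.Lexer as L

  type P = Parsec Void String

  data E = Num Integer | Add E E | Mul E E deriving (Show)

  -- earlier rows bind tighter, so * binds more strongly than +
  table :: [[Operator P E]]
  table =
    [ [InfixL (Mul <$ L.symbol space "*")],
      [InfixL (Add <$ L.symbol space "+")]
    ]

  expr :: P E
  expr = makeExprParser (Num <$> L.lexeme space L.decimal) table

  -- λ> parseTest expr "1 + 2 * 3"
  -- Add (Num 1) (Mul (Num 2) (Num 3))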

Specific Information

Now we get into the specifics of our version. We use parser combinators mostly in the standard way, however you'll often see the forms skipLiner, spaceLiner, and *SN which are not typical in a parser combinator system.

spaceLiner just eats all empty symbols, that is, spaces and newlines after the currently parsed expression.

skipLiner is the same, however it works on any character given to it.

Finally, *SN just calls spaceLiner on whatever parser it suffixes.

These concepts exist mainly due to the wish of eventually making the parser indentation-sensitive.
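
As a rough sketch of how these helpers plausibly fit together (illustrative only, not the real definitions; the emptySymbols parser, the Char token type, and the use of P.single are assumptions here, building on the Parser type and P module shown above):

  -- assume some @emptySymbols :: Parser ()@ that consumes spaces and newlines
  spaceLiner :: Parser a -> Parser a
  spaceLiner p = p <* emptySymbols

  skipLiner :: Char -> Parser ()
  skipLiner c = spaceLiner (void (P.single c))

  -- an *SN variant is just spaceLiner applied to an existing parser
  expressionSN :: Parser Types.Expression
  expressionSN = spaceLiner expression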

The main confusing bit of our layout is the many variants of expression!

  do''' :: Parser Types.Expression
  do''' = Types.Do <$> do'

  app'' :: Parser Types.Expression
  app'' = Types.Application <$> P.try application

  all'' :: Parser Types.Expression
  all'' = P.try do''' <|> app''

  -- used to remove do from parsing
  expression' :: Parser Types.Expression
  expression' = expressionGen app''

  -- used to remove both from parsing
  expression''' :: Parser Types.Expression
  expression''' = expressionGen (fail "")

  expression :: Parser Types.Expression
  expression = expressionGen all''

We have three main variants: ones with application, ones with do syntax, and ones with both! These exist because certain transformations will go into an infinite loop if you're not careful. This is mainly due to how some forms, like do and the infix generators, behave together. In other cases, like in ADT declarations such as type List a = Cons a (List a), we want to disable application. It would be a shame if the a were applied to List a!

S-Expression Transformation

The transformation file can be found here.

Here we use a very straightforward algorithm: when we see a particular Juvix ML ADT form, we transform it into the equivalent Lisp expression, which can be found in the s-expression document in the design section.
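
To give a feel for the shape of this pass, here is a toy sketch: a hand-rolled ADT and S-expression type, with the lisp form of each constructor hard-coded in one clause of a pattern match. The types and the :let atom below are made up for illustration; the real pass works over the frontend AST and the Sexp library, and the real atom names live in the design document mentioned above.

  -- toy types, for illustration only
  data Expr
    = Var String
    | Num Integer
    | Let String Expr Expr

  data Sexp = Atom String | Int Integer | List [Sexp]

  toSexp :: Expr -> Sexp
  toSexp (Var x) = Atom x
  toSexp (Num i) = Int i
  toSexp (Let name bound body) =
    -- the ":let" shape is hard-coded here: one clause per frontend form
    List [Atom ":let", Atom name, toSexp bound, toSexp body]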

Ideally we could generate the particulars of the syntax via some schema and have the ML code use that. This would remove the hard-coded structure of each form from the transformation; however, with how many different types exist in the frontend form, this would be quite a challenge. We do use this technique later on in the compiler, reducing the number of hard-coded forms.

The above issue also expands a bit on why we have chosen this path. ML-structured ADTs are wonderful, however they lack flexibility. We used a trees-that-grow structure in the past, however this would convert one rigid ADT into another, and so we had to convert the massive ADT that we built piece by piece, which ended up at over 500 lines of code per pass! An issue just as big is that no common interface can be had, thus our passes couldn't be as easily abstracted away.

Final Words

Overall, this stage of the compiler is not too complicated; it should be somewhat easy to get a grasp of and see how the data flows. I suggest going into the EasyPipeline library, loading up Easy.hs, and playing around. We have just covered up to the SEXP PHASE section of that page, so you should be able to understand what the sexp1 and sexp2 examples there do.

5 - Context

This article goes over the Context structure in detail

What is the Context?

The context can be viewed as the working image of the Juvix language. In particular, it is the storage/information system of the language. It is also responsible for the semantics of symbol resolution. Due to this, we can view the context as a fancy nested hashtable.
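
As a toy picture of that "fancy nested hashtable" (illustrative types only; the real structure carries much more information, as listed below):

  import qualified Data.HashMap.Strict as HM

  -- a definition is either a plain entry or a nested record/module,
  -- and every record/module carries a public and a private name-space
  data Definition
    = Def                -- details elided in this sketch
    | Record NameSpace

  data NameSpace = NameSpace
    { public  :: HM.HashMap String Definition
    , private :: HM.HashMap String Definition
    }

  data Context = Context
    { topLevel         :: NameSpace
    , currentNameSpace :: [String]   -- path from the top level to "here"
    }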

What information is stored in the Context?

The context stores a plethora of information.

  • Currently the Context Contains

    1. Definitions

      • These hold the Standard desugared Juvix definitions.

    2. Types

      • These are the data type declarations in the Juvix language.

      • Every definition also has a type attached as well that one can query.

    3. Constructors

      • These are stored like Definitions, but for type constructors.

    4. Records/Modules

      • These are recursive name-spaces.

      • Each record/module holds two name-spaces inside of them, a public and private one.

    5. Precedence

      • This is associated with a definition, so we know how to desugar precedence.

    6. Reverse Module use lookup

      • We get to know what modules import what module, and if they are implicit or explicit.

    7. Current Module/Record

      • This is the current module in scope.

In the long run this information will be added to, with each declared item in the image having a general hashtable that any information can be stored on. This will include documentation and the original source code.

How does Symbol resolution work?

Symbol Resolution is handled entirely by the context. When we ask the context what Foo.Bar.zed resolves to, we do a few things.

First we check if Foo is in the current name-space. Specifically, if Foo is in both the private name-space and the public one, then the private one wins. After this point, we have no means of accessing private name-spaces, so we just access the public Bar and zed inside the Foo module we resolved.

If Foo is not in the locals, then we check the top of the context for Foo. If it's not there, then we are unable to resolve it.

The only other special part of resolving a known symbol is the resolution of names like TopLevel, which bypasses any local module checks and instantly resolves from the top. So Toplevel.Foo.Bar.zed will always look in the top-level name-space.
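
As a compact illustration of these rules (hypothetical names, written with the ~> notation used in the Exhibition section later on):

  Foo.Bar.zed          ~> Foo from the current name-space (private wins over
                          public if both exist), then the public Bar, then zed
  Baz.zed              ~> no local Baz, so look for Baz at the top of the context
  Toplevel.Foo.Bar.zed ~> bypass the local Foo entirely; resolve from the top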

Special Rules for dealing with unresolved symbols

Sometimes we wish to quantify over symbols which we may add or create. For our previous example, what if Foo.Bar.zed is undefined? Operations which operate over hypothetical symbols have a way to know if the symbol is defined or not.

The resolution rules are fairly simple, but we will investigate them with three case studies where the symbols may be undefined: zed, Foo.zed, and TopLevel.Foo.zed.

  1. zed

    • Since zed is not in scope, we only really have two choices: we either mean Toplevel.zed or CurrentNameSpace.zed. Since we find that we typically mean zed in the current module, and not some hypothetical one at the top level, we go with the latter. So the operation deals with an undefined CurrentNameSpace.zed.

  2. Foo.zed

    • Since this case can have a condition where Foo could also be resolved, we break this into another case analysis.

      1. Foo is resolved

        • In this case, we act much like case 1, where we explicitly work on an unresolved CurrentNameSpace.Foo.zed or Toplevel.Foo.zed, depending on which one is in scope (see the scoping section above!).

      2. Foo is not resolved

        • In this case we actually do a different resolution than in the zed case. We assume we are talking about a toplevel version, namely Toplevel.Foo.zed.

  3. Toplevel.Foo.zed

    • This one is rather easy as well, as it falls back to case 2 subsection 2.
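
Putting the three cases together (assuming none of these names are actually defined):

  zed              ~> treated as an undefined CurrentNameSpace.zed
  Foo.zed          ~> CurrentNameSpace.Foo.zed or Toplevel.Foo.zed
                      (when Foo itself resolves, following whichever is in scope)
                   ~> Toplevel.Foo.zed (when Foo does not resolve)
  Toplevel.Foo.zed ~> Toplevel.Foo.zed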

Overall, some of these rules are arbitrary; namely, case 2 subsection 2 may need to conform to be more like case 1. However, this requires more testing and playing with the system to determine.

Both this section and the above are implemented in the modify space impure (ModifySpaceImp) section.

What are the Functions of the Context?

The context hosts a plethora of useful functions that make traversing it easy. Most of these more useful functions are built upon the more primitive operations.

Many operations that create name-spaces are in IO, in order to hold the recursive maps we use to track information.

Primitive Operations

Empty

  empty :: NameSymbol.T -> IO (T term ty sumRep)

empty creates a new context in the name-space you call it with (thus empty "Foo.Bar" makes a context with the nested module Bar in Foo).

The current module is then set to this name.

Lookup

  lookup ::
    NameSymbol.T -> T term ty sumRep -> Maybe (From (Definition term ty sumRep))

  (!?) ::
    T term ty sumRep -> NameSymbol.T -> Maybe (From (Definition term ty sumRep))

lookup looks up the given name-symbol according to the rules in the Symbol Resolution section. !? is just a flipped lookup.
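
A small usage sketch (assuming we already have a name :: NameSymbol.T in hand; how it gets constructed is left out here):

  -- True when the name resolves to something in the context
  isDefined :: NameSymbol.T -> T term ty sumRep -> Bool
  isDefined name context =
    case lookup name context of
      Just _  -> True
      Nothing -> False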

Add

  add ::
    NameSpace.From Symbol ->
    Definition term ty sumRep ->
    T term ty sumRep ->
    T term ty sumRep

add name def context adds name with the definition def to the current name-space of an existing context. The type NameSpace.From can either be Pub or Priv currently, stating how we want to add the definition to the current context.

As can be observed, we are adding a Symbol and not a NameSymbol, and so this primitive can't add a name to a separate module. This should be changed in time.

Remove

  remove ::
    NameSpace.From Symbol ->
    T term ty sumRep ->
    T term ty sumRep

remove name context removes the name from the public or private name-space of the current name-space.

InNameSpace

  inNameSpace :: NameSymbol.T -> T term ty sumRep -> Maybe (T term ty sumRep)

inNameSpace name context is the primitive way in which we change the current name-space. Like lookup, we use the standard symbol resolution rules to determine which module we resolve to. If the module is not found, then we simply return Nothing. Likewise, if the definition we land on is not a record/module, we also return Nothing. If all is successful, then we end up with a new context with a new current name-space.

ModifySpaceImp ∧ type Stage ∧ type Return

  data Stage b
    = -- | 'Final' signifies the last symbol that we
      -- pursue in updating a structure
      Final b
    | -- | 'Continue' signifies that there are parts of
      -- the namespace that we can still continue down
      Continue b

  data Return term ty sumRep
    = -- | 'GoOn' signifies that we should continue
      -- going down records
      GoOn (Record term ty sumRep)
    | -- | 'Abort' signifies that we should cancel
      -- the changes on the map and
      Abort
    | -- | 'UpdateNow' signifies that we should
      -- update the context with the current value
      UpdateNow (Definition term ty sumRep)
    | -- | 'RemoveNow' signifies that we show remove
      -- the definition at this level
      RemoveNow

  modifySpaceImp ::
    Monad m =>
    ( Stage (Maybe (Definition term ty sumRep)) ->
      m (Return term ty sumRep)
    ) ->
    NameSymbol.T ->
    T term ty sumRep ->
    m (Maybe (T term ty sumRep))

modifySpaceImp f sym context is a rather important operation; some forms like Empty even rely on it! modifySpaceImp is responsible for implementing the behavior for known symbols exactly as the Symbol Resolution section lays it out, and for unknown symbols as the Rules for Unresolved Symbols section lays it out.

The Maybe it takes indicates whether the symbol is resolved or not.

We decided to include the Return and Stage types here, as they both talk about something integral to the design of this function. The Return type describes the range of operations we care about; this limits the kinds of operations this modifySpaceImp function can perform. The Stage, on the other hand, tells the operation function f whether we are at the final symbol in the resolution or whether we are still descending into the nested structure to find the proper symbol. Many operations do something special at the very end, and so we often care about which stage we are at.

Finally, our function returns some monad m. This is just to be general enough to accept any kind of function. As the name suggests, there is a non-impure version of this; however, it just runs this more general one in the Identity monad.
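
To see how Stage and Return drive an operation, here is a hedged sketch of how a remove-like operation could be phrased on top of modifySpaceImp. This is illustrative only: in particular, the Record pattern below assumes a constructor on Definition that wraps a record, which is not spelled out above.

  removeSketch ::
    Monad m => NameSymbol.T -> T term ty sumRep -> m (Maybe (T term ty sumRep))
  removeSketch sym context = modifySpaceImp f sym context
    where
      -- while resolving, keep descending through records
      f (Continue (Just (Record record))) = pure (GoOn record)
      -- the path mentions something that is not a record: give up
      f (Continue _)                      = pure Abort
      -- at the final symbol: remove it if it exists
      f (Final (Just _))                  = pure RemoveNow
      f (Final Nothing)                   = pure Abort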

Derived Operations

These operations aren't derived from the primitive operations, however they are quite important for the operation of the context. Some even have special rules one must pay attention to.

SwitchNameSpace

  switchNameSpace ::
    NameSymbol.T -> T term ty sumRep -> IO (Either PathError (T term ty sumRep))

switchNameSpace name context is an important operation. It creates and switches to the name-space even if it doesn't exist! The rules for creation are exactly those laid out in the Symbol Resolution section.

The creation of new name-spaces happens in IO, and so the operation is in IO as well.
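
A small usage sketch (the helper below is hypothetical, not part of the API):

  -- switch into a name-space, creating it if needed, and stay put on a PathError
  enterOrStay :: NameSymbol.T -> T term ty sumRep -> IO (T term ty sumRep)
  enterOrStay name context = do
    switched <- switchNameSpace name context
    case switched of
      Right context' -> pure context'
      Left _pathErr  -> pure context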

6 - Pipeline

Documentation on the Juvix compiler pipeline

Pipeline

This section provides an overview of the compiler pipeline, from a source text file to the specific backend calls to produce the binary.

pipeline.png

The in-code explanation in that image can itself be condensed into what we see in app/main.hs:

  Compile fin fout (Michelson backend) ->
    liftIO (readFile fin)
      >>= Pipeline.parse
      >>= Pipeline.typecheck @Pipeline.BMichelson
      >>= Pipeline.compile @Pipeline.BMichelson
      >>= Pipeline.writeout fout
      >>= liftIO . print

Parsing

The first set of phases to happen is that the text file is turned into a structured document, namely an ML AST.

Transition

This is a long phase of the compiler where we do a few things.

  1. The first is that we desugar the expression using baked in expansions. This happens for function definitions, punned arguments, etc.

  2. After that, we then take the desugared s-expressions and put them in our global "context". We do some extra processing in this stage as well like resolving what modules open each other.

    • A context is like a primitive image. It can access all the information in a nested hashtable of sorts.

  3. We still have a few more passes to run, as phases like symbol resolution and infix resolution can be run now.

  4. Finally we have one last pass, translating what is left of the S-expression AST into a format that Core understands.

Core

Currently in our compiler the Pipeline.typecheck and Pipeline.compile functions are backend-specific, however the general transformations are similar regardless of the backend.

HR

This layer of Core stands for Human Readable, as we have variable names instead of hairy de Bruijn indices. This phase is short-lived, as most processing at this level wants to work on some extension of the Internal Representation; names create issues for the various algorithms.

IR

This is where the main process of type checking happens.

  • Names are reduced to de-bruijn indices.

  • We have many extensions with metadata that fluctuate depending on what pass of core is run.

Eliminations

In this phase of Core, we remove the types from the terms, as they don't serve any computational purpose. By this point we've done most of the processing we need.

In the future I suspect inlining algorithms will either be done here or in IR right above.

Backends

Michelson

LLVM

Arithmetic Circuit

6.1 - Idealized Pipeline

An Idealized version of our Core pipeline

This page covers what an idealized version of our core pipeline should look like for the time being.

The sections are laid out as follows.

  • The Pipeline defines the overall structure of the pipeline with comments between each step

  • Pipeline Function as a BNF with Examples gives a longer-form BNF along with examples and commentary about each function. Further, at the bottom of each section is a link to the references for any literature that may or may not be relevant.

  • The Outstanding section asks open questions about our passes that we don't have good answers to yet.

  • The Exhibition section gives further examples of a few key points when noted elsewhere in this document

  • The References section is where further links go about any related subjects

Pipeline

  patternMatchPipeline :: [(Pats, RHS)] -> Type -> ???
  patternMatchPipeline patternClauses type' =
    terminationChecking patternClauses type'
    |> corePipeline

  -- The @corePipeline@ works over a context and an expression. The
  -- context is preserved throughout the computation as it's needed for
  -- many of these steps
  corePipeline :: Context -> Expression -> ???
  corePipeline form =
    -- Need to lambda lift pattern matching let abstractions, we don't
    -- worry about them here, instead we have the corePipeline go over
    -- those
    lambdaLiftLocalMatches form
    -- Local lets now simply have guards left; we can remove both
    -- the top level guards here, and the let local guards.
    -- Once we realize this, we can also notice that lets can only be
    -- a single form.

    -- TODO ∷ we currently don't have guards going down to this level, we
    --        discharge it as one of our first desugar passes
    |> removeGuardsAndSimplifyLetToSingleCase
    -- Here we turn lets which have arguments into letting a name over
    -- multiple lambdas
    |> localLetsIntoLambdas
    -- Move OUT ∷ Desugar pass/Macro pass once we have that
    |> openExpressionIntoLetOverOpenName
    -- Has to resolve inner modules, so we have to treat namesymbols as a
    -- term for forms in the closure
    -- see exhibition ⒈
    |> hrToIR
    -- This step should be moved to the effect block of typeCheckEvalLoop
    -- when we support first class opens
    |> nameResolution
    -- Here we have to run infixResolution after nameResolution
    |> infixResolution
    -- toANF runs an algorithm that takes the current form, and creates a valid ANF form.
    -- ANF algo should run before ‘interactiveMode‘, so we can run algebraic effects
    |> toANF
    |> branch [pipelineTypedMacro, pipelineUntypedMacro]
    -- either go to interactiveMode or proceed to compilation
    -- eraseZeroUsages looks at the current form, and removes any
    -- values/terms that have zero usages attached to them
    |> eraseZeroUsages
    -- TODO ∷
    -- See Reference Inlining
    -- for some good analysis on when to inline
    -- Currently We inline all globals, which excludes pattern matching functions
    |> inlineAlgorithm
    -- For the backends we need names rather than de bruijn indices to
    -- compile. This step gives us back names
    |> restoreNameFromDeBruijn
    -- TODO ∷
    -- See Reference Fast Curry
    -- Here we want the eval-apply algorithm from the reference
    -- This will change the code to a form that can explicitly talk about currying.
    |> fastCurry
    -- We currently have this form. This computes the capture environment
    |> captureLambdaEnvironment
    -- In Michelson we want to branch off before this
    -- This pass we want to explicitly break a lambda into an
    -- environment-closure and the binder itself
    |> lambdaIntoExplicitBinderAndClosure


  pipelineUntypedMacro form =
    mutual [macroExpand, eval] form
    -- This is a mutually recursive step, due to how haskell works, we
    -- have to put it as 1 step, however if we could, we should be able
    -- to break it up and put labels to show it's explicitly recursive.
    |> mutual [typeCheckEvalLoop, interactiveMode]

  -- Requires research on
  pipelineTypedMacro form =
    -- If macros are typed, then it's recursive with typeCheck
    -- meta programming constructions inside the context, so Witch can use it
    mutual [typeCheck, macroExpand, eval, interactiveMode] form

  -- Interactive mode is a poor man’s REPL
  -- Initially, it displays the results from typechecking and then it
  -- restarts the whole process again
  interactiveMode form =
    -- return the result of the evaluation, and a derivation tree
    -- optionally read some code and return
    showResultAndDerTree form
    input <- readCode
    juvixPipeline input

Pipeline Function as a BNF with examples

Lambda Lift the Local Matches

  • BNF

      input-expression ::=
          let <let-pat-input>+ in <expression>
        | ...
    
      output-expression ::=
          let <let-name-output>+ in <expression>
        | ...
    
      let-pat-input   ::= <name> <pattern>* <guard>+
      let-name-output ::= <name> <name>* <guard>+
    
      guard? ::= \| <expression> = <expression>
               | <expression>
  • Examples

      (** Example 1 *)
    
      (* Input *)
      def example =
        let
          foo (Just x) = 5
          foo Nothing  = 6
        in foo (Just ())
    
      (* Output *)
    
    
      (* generated by the pass *)
      def #g:unique-name (Just x) = 5
          #g:unique-name Nothing  = 6
    
      def example =
        #g:unique-name (Just ())
    
    
      (** Example 2 *)
    
      (* Input *)
      def example =
        let foo x = 5 + x in
        foo 3
    
      (* Output *)
      def example =
        let foo x = 5 + x in
        foo 3
    
      (** Example 3 *)
    
      (* Input *)
      def example =
        let foo {x, y} = y + x in
        foo {x = 3, x = 5}
    
    
    
      (* Output *)
      (* generated by the pass *)
      def #g:unique-name {x, y} = y + x
    
      def example =
        #g:unique-name {x = 3, x = 5}
  • Description

    The transformation takes any let expression with <pattern>*'s that do any form of pattern matching, and turns it into top-level declarations that are uniquely named, for use where the previous let expression was used.

    If the let form has only simple patterns, then we leave it untransformed; this can be seen in Example 2 in the above code.

  • Possible Complications

    • We have to lambda lift extra items which may not be related to this transform; see this example:

        (* Input *)
        def example =
          let
            def bar x = x
            def foo (Just x) = bar 5
                foo Nothing  = bar 6
          in foo (bar (Just ()))
      
      
        (* Output *)
      
        def #g:unique-bar x = x
      
        def #g:unique-foo (Just x) = #g:unique-bar 5
            #g:unique-foo Nothing  = #g:unique-bar 6
      
        def example = #g:unique-foo (#g:unique-bar (Just ()))
  • Literature

    See Lambda Lifting

Remove Guards, and Multiple lets to single let

  • NOTE, Multiple lets of the same function, not nested

  • BNF

      ;; Input
      input-expression ::=
          let <let-name-input>+ in <expression>
        | ...
    
      input-declaration ::=
          def <def-pattern-input>+
        | ...
    
      ;; Output
      output-expression ::=
          let <name> <name>* = <expression> in <expression>
        | ...
    
      output-declaration ::=
          def <def-pattern-output>+
        | ...
    
      ;; Helpers
      def-pattern-input ::= <name> <pattern>* <guard?>
    
      def-pattern-output ::= <name> <pattern>* <expression>
    
      let-name-input ::= <name> <name>* <guard?>
    
      guard? ::= \| <expression> = <expression>
               | <expression>
  • Examples

      (** Example 1 *)
    
      (* Input *)
      def add-3 x =
        let bar y = 3 + y in
        bar x
    
      (* Output *)
      def add-3 x =
        let bar y = 3 + y in
        bar x
    
      (** Example 2 *)
    
      (* Input *)
      def add-3-not-on-0 x =
        let compute y
             | y == 0 = Nothing
             | else   = Just (y + 3)
        in compute x
    
      (* Output *)
      def add-3-not-on-0 x =
        let compute y = if (Just (y + 3)) Nothing (y == 0)
        in compute x
    
      (** Example 3 *)
    
      (* Input *)
      def add-3-not-on-0 x =
        let compute y
             | y == 0 = Nothing
            compute y = Just (y + 3)
        in compute x
    
      (* Output *)
      def add-3-not-on-0 x =
        let compute y = if (Just (y + 3)) Nothing (y == 0)
        in compute x
    
    
      (** Example 4 *)
    
      (* Input *)
      def add-3-unless-1-or-0 x
        | x == 0 = 0
        | x == 1 = 1
        | else   = x + 3
    
      (* Output *)
      def add-3-unless-1-or-0 x =
        if 0
           (if 1
               (x + 3)
               (x == 1))
           (x == 0)
    
      (* If is defined as follows *)
      (* about usages, consult the following table about
         what we can do usage wise
         cond
           first-class-usages? = if' (* comedy *)
           weakening?          = 1
           else                = ω *)
      sig if : {P : ((x : Bool) -0-> *))}
            -0-> t : P True
            -ω-> f : P False
            -ω-> b : Bool
            -1-> P b
      let if {Proof} then-case else-case True  = then-case
          if {Proof} then-case else-case False = else-case
  • Description

    Here we do the majority of the work of removing guards from lets. We can notice that lets and defs can both, at this point, desugar their guard checks into dispatching to the if destructor function given above.

    We can also notice that after we run this, there can be no more lets with multiple patterns as both guards and matches have been handled now.

  • Possible Complications

    • We already handle guards in the desugar passes. Currently the desugarer just sees the guards as Conds and desugars them into cases. This is, however, incorrect, as pattern matching has a fall-through behavior that this does not respect.

      • This can be solved with a proper pattern matching algorithm

    • This algorithm may be better served by the linearization algorithm given by the pattern match simplification. I'm not sure this is always possible in the def case; for the simple lets we currently have, however, this works as a strategy.

    • This could maybe also be served better with a with pattern through some strategy. See the following.

      
        (* Input *)
        sig foo : maybe (maybe int) -1-> int
        def foo (Just (Just x)) | x == 0      = 3
            foo (Just x)        | x == Just 3 = 5
            foo _                             = 0
      
        (* output *)
        sig foo-inner : maybe (maybe int) -1-> bool -1-> bool -1-> int
        def foo-inner (Just (Just x)) true _    = 3
            foo-inner (Just x)        _    true = 5
            foo-inner _               _    _    = 0
      
        sig foo : maybe (maybe int) -1-> int
        def foo m@(Just (Just x)) = foo-inner m (x == 0) false
            foo m@(Just x)        = foo-inner m false    (x == Just 3)
            foo m                 = foo-inner m false    false

Local Lets Into Lambdas

  • BNF

      input-expression ::=
          let <name> <name>* = <expression> in <expression>
        | ...
    
      output-expression ::=
          let <name> = <expression> in <expression>
        | ...
  • Examples

      (** Example 1 *)
    
      (* Input *)
      def foo =
        let bar a b c = a + b + c in
        bar 1 2 3
    
      (* Output *)
      def foo =
        let bar = λ a ⟶ (λ b ⟶ (λ c ⟶ a + b + c)) in
        bar 1 2 3
  • Description

    Here we just take the given let form and transform it into its pure lambda form for the body.

  • Possible Complications

    • The type may need to be explicitly given, thus forcing us to give a signature with our bar.

Open Expression into Let Over Simple Open

  • BNF

      input-expression ::= open <expression> in <expression>
                         | ...
    
      output-expression ::= open <name> in <expression>
                          | ...
  • Examples

      (** Example 1 *)
    
      (* Input *)
      def foo =
        open bar in
        x + y
    
      (* Output *)
      def foo =
        open bar in
        x + y
    
      (** Example 2 *)
    
      (* Input *)
      def foo xs =
        open (fold building-up-a-module xs : mod {}) in
        x + y
    
      (* Output *)
      def foo xs =
        let sig #:gensym 1 : mod {}
            def #:gensym = fold building-up-a-module xs
        open #:gensym in
        x + y
  • Description

    Here we want any non-trivial open converted into a form where we let a name bind to the complicated expression, and then have that name be used once in the open expression.

  • Possible Complications

    • There is an outstanding question of how opens relate to de Bruijn indices.

      • Maybe name resolution can help sort this out?

    • Type resolution of the module is also an interesting question to answer.

  • Literature

HR to IR

  • BNF

      term-input ::= λ <name> ⟶ <expression>
                   | let <usage> <name> ...
                   | sig <usage> <name> ...
                   | pi  <usage> <name> ...
                   | ...
    
      term-output ::= λ ⟶ <expression>
                    | let <usage> = <expression>
                    | sig <usage> ...
                    | pi  <usage> ...
  • Examples

      (** Example 1 *)
    
      (* Input *)
      def foo a b =
        let add-them = a + b in
        λ d ⟶ d + add-them
    
      (* Output *)
      def foo (Global-Pat 0) (Global-pat 1) =
        let (global-pat 0 + global-pat 1) in
        λ ⟶ indicy 0 + indicy 1
    
      (** Example 2 *)
    
      (* Input *)
      def foo (Just a) (List b c d) =
        let add-them = a + b in
        λ d : (Π 0 (x : Nat.t) Nat.t) ⟶ d + add-them + c + d
    
      (* Output *)
      def foo (Just (Global-pat 0))
              (List (Global-pat 1) (Global-pat 2) (Global-pat 3)) =
        let (global-pat 0 + global-pat 1) in
        λ : (Π 1 Nat.t Nat.t) ⟶ indicy 0 + indicy 1 + global-pat 2 + global-pat 3
    
      (** Example 3 *)
    
      (* Input *)
      def foo (Just a) (List b c d) =
        let add-them = a + b in
        λ typ : (Π 0 (typ : ×₀) (Π 1 typ typ)) ⟶
          λ d ⟶
            d + add-them + c + d
    
      (* Output *)
      def foo (Just (Global-pat 0))
              (List (Global-pat 1) (Global-pat 2) (Global-pat 3)) =
        let (global-pat 0 + global-pat 1) in
        λ : (Π 0 ×₀ (Π 1 (indicy 0) (indicy 1))) ⟶
         λ ⟶
           indicy 0 + indicy 2 + global-pat 2 + global-pat 3
  • Literature

    See De Bruijn

  • Description

    Here we simply run a standard De Bruijn algorithm. In our code base this is actually already fully implemented.

  • Possible Complications

    • Bug: see issue #864; namely, we are inconsistent about how we handle recursive lets.

      • Local recursive functions have to be lambda lifted per discussion

    • How do we treat opens in terms of de Bruijn indices?

Name Resolution

  • BNF

      input-expression ::= open <name> in <input-expression>
                         | ...
      ;; remove open
      ;; Note that in more complex cases this may not be possible, so we may
      ;; not have any bnf changes in this pass
      output-expression ::= ...
  • Examples

      (** Example 1 *)
    
      (* Input 1 *)
    
      (* Written by the user, which is part of the env by now *)
      open Prelude
    
      (* Input we see at this level *)
      sig add-list : Int.t -> List.t Int.t -> Int.t
      def
        add-list (global-pat 0) (Cons (global-pat 1) (global-pat 2)) =
          open Michelson in
          (* + is in Michelson *)
          let global-pat 0 + global-pat 1 in
          add-list (indicy 0) (global-pat 2)
        add-list (global-pat 0) Nil =
          global-pat 0
    
      (* Output *)
      sig add-list :
        Prelude.Int.t Prelude.-> Prelude.List.t Prelude.Int.t Prelude.-> Prelude.Int.t
      def
        add-list (global-pat 0) (Prelude.Cons (global-pat 1) (global-pat 2)) =
          open Prelude.Michelson in
          (* + is in Michelson *)
          let global-pat 0 Prelude.Michelson.+ global-pat 1 in
          add-list (indicy 0) (global-pat 2)
        add-list (global-pat 0) Prelude.Nil =
          global-pat 0
  • Description

    Here we run name resolution. For top-level opens this is rather simple, as our current Contextify step notes top-level opens. What needs to be respected is how opens affect the closure, and from that we determine which names reference outer names.

  • Possible Complications

    • To get complex opens working, we need to do some form of type checking, so this pass might have to be integrated with type checking to some degree.

  • Literature

Infix Resolution

  • BNF

      input-expression ::= <input-expression> <op> <input-expression>
                         | ...
    
      output-expression ::= ...
  • Examples

      (** Example 1 *)
    
      (* Input *)
      (* * has precedence left 6  *)
      (* + has precedence 5  *)
      def foo = a * b * d + c
    
      (* Output *)
      def foo = + (* a (* b d))
                  c
  • Description

    Here we have to resolve infix symbols into prefix symbols, given some precedence table. The easiest way to do this is to modify the Shunting Yard algorithm slightly for the constraints of Juvix. Thankfully this is already implemented.

  • Possible Complications

    • One has to be careful about how infixl, infixr and infix all interact with each other.

  • Literature

    See Shunting Yard

To ANF

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Mutual Block

Type Checker

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Witch

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Erase Zero Usages

  • BNF

      ;; No BNF changes
  • Examples

      (** Example 1 *)
      (* Example may be artificial *)
      (* Input *)
      sig linear-comb-pos-nats
        : Π 1 Nat.t
             (Π 1 Nat.t
                  (Π 2 Nat.t
                       (Π 0 (>= (indicy 0) 1)
                            (Π 0 (>= (indicy 2) 2)
                                 (refinement Nat.t (> (indicy 0) 1))))))
      (* global-pat 3 and global-pat 4 are the 0 usage proofs *)
      def pos-linear-comb (global-pat 0)
                          (global-pat 1)
                          (global-pat 2)
                          (global-pat 3)
                          (global-pat 4) =
        let ( * (global-pat 2) (global-pat 1)) : 1 Nat.t in
        (* Let's prove that our result is at least (max (arg 1) (arg 2)),
           using our proofs of (arg 3) and (arg 4) *)
        let times-by--is-greater (indicy 0) (global-pat 3) (global-pat 4) : 0 Nat.t in
        refinement (+ (indicy 1) ( * (global-pat 0) (global-pat 3))) (indicy 0)
      (* Output *)
    
      sig linear-comb-pos-nats
        : Π 1 Nat.t
             (Π 1 Nat.t
                  (Π 2 Nat.t
                       (Π 0 (>= (indicy 0) 1)
                            (Π 0 (>= (indicy 2) 2)
                                 (Refinement Nat.t (> (indicy 0) 1))))))
      (* global-pat 3 and global-pat 4 are the 0 usage proofs *)
      def pos-linear-comb (global-pat 0)
                          (global-pat 1)
                          (global-pat 2)
                          (global-pat 3)
                          (global-pat 4) =
        let ( * (global-pat 2) (global-pat 1)) : 1 Nat.t in
        (+ (indicy 0) ( * (global-pat 0) (global-pat 2)))
  • Description

    In this pass we want to remove any code which has 0 usages attached to it. In particular, we can see in Example 1 that the proof about the number being greater than 1 is fully erased.

    However, it is important to note a few points. We did not erase the passing of extra arguments to linear-comb-pos-nats. This is because these arguments are still passed on the application side, both in existing code and in any incremental compilation passes; the arguments are simply ignored. Alternatively, we could choose a strategy where the 0-usage arguments are noted, and a shim layer transforms the arguments before applying functions that take 0-usage arguments.

  • Possible Complications

    • What to do with 0 usage function arguments. See Description for another method of doing this.

    • What do we do about top level records usage wise? We need the 0 usage arguments, so we have to treat them in an interesting way.

    • We need to preserve signatures which only partially care about usage erasure.

  • Literature

Inline Algorithm

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Restore name from De Bruijn indicies

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Fast Curry

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Capture Lambda Environment

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Lambda Into Binder and Closure

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Outstanding questions

How to Deal with Expansion?

Need eval? How does this reflect in the pipeline?

Types of Macros?

How do we properly type it?

Algebraic effects without Algebraic effects?

We want to structure the typeCheckEvalLoop as follows

  throw CantResolveSymbol ~> nameResolution

  -- typeChecking does
  -- Hole insertion, unifier, Type Checking, nameResolution
  typeCheckEvalLoop =
    doesAthing
      {resolverEffects
          = {
            -- filling holes
            , UnifyImplicits    ~> unifier
            -- Creating holes for the unifier to fill
            , InsertImplicit    ~> implicitInsertion
            -- For evaling terms
            , Evaluate          ~> eval
            -- For first class Name Resolution
            , CantResolveSymbol ~> nameResolution
            }}

Exhibition

def bar =
  let
    mod baz =
      mod biz =
        def foo = 3.
  baz.biz.foo
   ~> 0.biz.foo
   ~> resolve-symbol (resolve-symbol foo biz) 0

⒉ openExpressionIntoSimpleOpenPlusLet

def bar =
  let mod bar =
      def foo = 3.
  let #g:gensym = (bar djfka djfkas dfjak)
  open #g:gensym ~> open with type {foo}
  resolve-symbol foo #g:gensym.

def bar =
  let mod bar x y z =
        def foo = 3.
  let #g:gensym = (bar djfka djfkas dfjak)
  open #g:gensym ~> open with type {foo}
  resolve-symbol foo #g:gensym.


def bar =
  let mod bar x y z =
        def foo = 3.
  open (fold function-generating-a-module xs : mod {…})
  resolve-symbol foo #g:gensym.

def bar =
  let mod bar x y z =
        def foo = 3.
  let def #g:gensym =
        fold function-generating-a-module xs : mod {…}
  open-simple #g:gensym
  resolve-symbol foo #g:gensym.

References

Inlining

De Bruijn Indices

  1. Benjamin C. Pierce: Types and Programming Languages, chapter 6, pages 75-80

Template

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

7 - Backends

Juvix supports backends for generating code for different targets (Michelson, LLVM etc.). Juvix programs are written in terms of primitives, i.e. values and types that are directly available in the target language. This allows us to abstract away from target-specific code by replacing these primitives with ones from other targets, but at the same time gives enough flexibility to execute target-specific operations if necessary.

Connection with pipeline

The backend implements several datatypes and class-instances that connect to the pipeline of Juvix.

Main backend instance

An instance of HasBackend, that implements the main connection to the pipeline:

  • The datatype used for types, values and errors (Ty, Val, Err respectively);

  • stdlib, the Juvix source file(s) that make up the standard library for this backend;

  • Optionally an implementation for the parse function;

  • Optionally an implementation for the typecheck function;

  • An implementation for the compile function.

Parameterisation

We need to inform the Juvix compiler of some basic information about our backend, such as:

  • A function for typechecking primitives: hasType.

  • Lists of builtin values and types: builtinTypes and builtinValues.

  • Functions for creating primitives based on literal integers, floats or strings: stringVal, intVal and floatVal.

Substitution and weakening

For compilation, the backend takes a representation of the input program that is very close to a primitive-extended lambda calculus with multi-argument lambdas. As the primitives of the target language can themselves be functions, for example basic arithmetic operators, we can normalize these terms as well. For this we need to instantiate the following classes for the primitives:

  • HasSubstV: For substitution of terms.

  • HasWeak: For weakening of terms (accounting for removal of a lambda abstraction).

  • HasPatSubstTerm: Substitution of patterns.

Each of these are documented in more detail at the evaluator page.

Additionally, to make applying terms possible during substitution, the CanApply class needs to be instantiated for primitive terms and types.

7.1 - LLVM

Note that this description is merely a design, and does not reflect the current state of the LLVM backend yet. The design and implementation might change in the future.

Introduction to LLVM

LLVM is an intermediate language and collection of compiler tools. The intermediate language can be compiled to several backends including AMD64 and Web Assembly, giving us a lot of flexibility.

Structure of the backend

Primitives

The LLVM backend uses an intermediate representation for primitives that are specific to LLVM: RawPrimVal for values and functions, and PrimTy for types. Each of these have a rather direct relation to constructs available in LLVM.

Application

It is possible that primitives are terms that can be reduced further, i.e. addition of two constants can be replaced by a single constant. To support this, the LLVM backend implements application of terms by instancing the CanApply class for both types and values.

Parameterisation

The parameterisation of the LLVM backend is really easy and follows implementation of the Parameterisation datatype:

  • hasType is a function that checks the types of raw primitive values. Currently the typechecking is very basic: it just checks whether the given type matches the primitive value in arity. For operator primitives, we check whether the types of the arguments are equal. This implementation will definitely have to be refined in the future.

  • builtinTypes and builtinValues just provide a mapping from source code literal names to the actual literals.

  • integerToRawPrimVal creates a literal based on an integer. Currently it doesn't take the size of the literal into account, as integer literals always share the same type. This should be improved in the future.

  • There are currently no implementations for stringVal and floatVal, as the backend does not support these types yet.

Substitution and weakening

For simplicity, the current HasWeak, HasSubstValue and HasSubstTerm instances do not do anything at all. This will likely change in the future.

Connection to the pipeline

The LLVM instances for HasBackend relies on the default for the parse function. Additionally:

  • The typecheck function relies on the general typechecker implementation by calling the typecheck' function, together with the parameterisation and the LLVM representation of a set type.

  • The compile function gets a filename and an annotated term (AnnTermT) representing the program. The annotated term with typed primitives is translated to an annotated term with raw primitives (RawPrimVal). These are then translated to LLVM code.

Compilation

With the input program being represented by the annotated Term, we can now easily translate these to LLVM code. We rely mostly on the llvm-hs-pure package, as it provides a nice monadic interface LLVM.IRBuilder that keeps track of variables for us. Therefore, the implementation is rather straightforward:

  • The whole program is translated to an LLVM module by the compileProgram function. This module contains a hardcoded main function, with the type taken from the outermost annotated term.

  • The term itself is compiled by the compileTerm function, using recursion for any sub terms.

  • For any primitive functions, we check if enough arguments are available, and if so the primitive function is written, with unnamed references to the arguments all taken care of by the IRBuilder monad.
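
To give a flavour of the IRBuilder style this describes, here is a standalone toy example using llvm-hs-pure directly; it is not code from the Juvix backend, just an illustration of the monadic interface:

  {-# LANGUAGE OverloadedStrings #-}

  import LLVM.AST (Module)
  import qualified LLVM.AST.Type as T
  import LLVM.IRBuilder.Instruction (add, ret)
  import LLVM.IRBuilder.Module (buildModule, function)

  -- a module with one function adding two 32-bit integers; the IRBuilder
  -- monad hands out the unnamed references to the arguments for us
  toyModule :: Module
  toyModule =
    buildModule "toy" $
      function "plus" [(T.i32, "a"), (T.i32, "b")] T.i32 $ \[a, b] -> do
        result <- add a b
        ret result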

Each of these functions is called in its own part of the pipeline: the output of parse is the input of typecheck, and the output of typecheck is the input of compile.

References