Backends

1: LLVM

Juvix supports backends for generating code for different targets (Michelson, LLVM etc.). Juvix programs are written in terms of primitives, i.e. values and types that are directly available in the target language. This allows is us to abstract away from target-specific code by replacing these primitives with ones from other targets, but at the same time give enough flexibility to execute target-specific operations if necessary.

Connection with pipeline

The backend implements several datatypes and class-instances that connect to the pipeline of Juvix.

Main backend instance

An instance of HasBackend, that implements the main connection to the pipeline:

The datatype used for types, values and errors (Ty, Val, Err

respectively);

stdlib, the Juvix source file(s) that make up the standard library for

this backend;

Optionally an implementation for the parse function;
Optionally an implementation for the typecheck function;
An implementation for the compile function.

Parameterisation

We need to inform the Juvix compiler of some basic information about our backend, such as:

A function for typechecking primitives: hasType.
Lists of builtin values and types: builtinTypes and builtinValues.
Functions for creating primitives based on a literal integers, floats or strings: stringVal, intVal and floatVal.

Substitution and weakening

For compilation, the backend takes a representation of the input program that is very close to a primitive-extended lambda calculus with multi-argument lambda's. As the primitives of the target language themselves can be functions, for example basic arithmetic operators, we can normalize these terms as well. For this we need to instantiate two classes for the primitives:

HasSubstV: For substitution of term.
HasWeak: For weakening of terms (accounting for removal of a lambda abstraction).
HasPatSubstTerm: Substitution of patterns.

Each of these are documented in more detail at the evaluator page.

Additionally, to make applying terms possible during substitution, the CanApply class needs to be instantiated for primitive terms and types.

1 - LLVM

Note that this description is merely a design, and does not reflect the current state of the LLVM backend yet. The design and implementation might change in the future.

Introduction to LLVM

LLVM is an intermediate language and collection of compiler tools. The intermediate language can be compiled to several backends including AMD64 and Web Assembly, giving us a lot of flexibility.

Structure of the backend

Primitives

The LLVM backend uses an intermediate representation for primitives that are specific to LLVM: RawPrimVal for values and functions, and PrimTy for types. Each of these have a rather direct relation to constructs available in LLVM.

Application

It is possible that primitives are terms that can be reduced further, i.e. addition of two constants can be replaced by a single constant. To support this, the LLVM backend implements application of terms by instancing the CanApply class for both types and values.

Parameterisation

The parameterisation of the LLVM backend is really easy and follows implementation of the Parameterisation datatype:

hasType is a function that checks the types for raw primitive values. Currently the typechecking is very basic, it just checks if the given types matches with the primitive value in arity. For operator primitives, we check if the types of the arguments are equal. This implementation will definitely have to be refined in the future.
builtinTypes and builtinValues just provide a mapping from source code literal names to the actual literals.
integerToRawPrimVal creates a literal based on an integer. Currently it doesn't take the size of the literal into account, as integer literals always share the same type. This should be improved in the future.
There are currently no implementations for stringVal and floatVal, as the backend does not support these types yet.

Substitution and weakening

For simplicity, the current HasWeak, HasSubstValue and HasSubstTerm instances do not do anything at all. This will likely change in the future.

Connection to the pipeline

The LLVM instances for HasBackend relies on the default for the parse function. Additionally:

The typecheck function relies on the general typechecker implementation by calling the typecheck' function, together with the parameterisation and a the LLVM representation of a set type.
The compile function, which gets a filename and an annotated term (AnnTermT) as the program. The annotated term with typed primitives is translated to an annotated term with raw primitives RawPrimVal. These are then translated to LLVM code.

Compilation

With the input program being represented by the annotated Term, we can now easily translate these to LLVM code. We rely mostly on the llvm-hs-pure package, as it provides a nice monadic interface LLVM.IRBuilder that keeps track of variables for us. Therefore, the implementation is rather straightforward:

The whole program, is translated to an LLVM module by the compileProgram function. This module contains a hardcoded main function, with the type taken from the outermost annotated term.
The term itself is compiled by the compileTerm function, using recursion for any sub terms.
For any primitive functions, we check if enough arguments are available, and if so the primitive function is written, with unnamed references to the arguments all taken care of by the IRBuilder monad.

Each of these functions are called in their own part of the pipeline, the output of parse is the input of typecheck and the output of typecheck is the input of compile.

References

LLVM website: https://llvm.org