LLVM

Note that this description is merely a design and does not yet reflect the current state of the LLVM backend. The design and implementation might change in the future.

Introduction to LLVM

LLVM is an intermediate language and a collection of compiler tools. The intermediate language can be compiled to several targets, including AMD64 and WebAssembly, which gives us a lot of flexibility.

Structure of the backend

Primitives

The LLVM backend uses an LLVM-specific intermediate representation for primitives: RawPrimVal for values and functions, and PrimTy for types. Each of these corresponds rather directly to a construct available in LLVM.
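As a rough illustration of that direct relation, here is a minimal sketch. The constructor names (PrimInt, PrimVoid, LitInt, Add, Sub, Mul) and the helpers renderTy and instrName are assumptions made for this example, not the backend's actual definitions.

```haskell
-- Hypothetical sketch: the constructors below are illustrative only.

-- | Primitive types, each mirroring an LLVM type.
data PrimTy
  = PrimInt Int -- ^ integer of the given bit width, e.g. i8 or i32
  | PrimVoid    -- ^ the LLVM void type
  deriving (Eq, Show)

-- | Raw primitive values and functions.
data RawPrimVal
  = LitInt Integer -- ^ integer literal
  | Add            -- ^ corresponds to the LLVM add instruction
  | Sub            -- ^ corresponds to the LLVM sub instruction
  | Mul            -- ^ corresponds to the LLVM mul instruction
  deriving (Eq, Show)

-- | Render a primitive type in LLVM's textual syntax.
renderTy :: PrimTy -> String
renderTy (PrimInt bits) = "i" ++ show bits
renderTy PrimVoid = "void"

-- | The LLVM instruction a primitive function maps to, if it is a function.
instrName :: RawPrimVal -> Maybe String
instrName Add = Just "add"
instrName Sub = Just "sub"
instrName Mul = Just "mul"
instrName (LitInt _) = Nothing
```

In this sketch every constructor has exactly one LLVM counterpart, which is the property the paragraph above describes.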

Application

Primitives may be terms that can be reduced further, e.g. the addition of two constants can be replaced by a single constant. To support this, the LLVM backend implements application of terms by providing instances of the CanApply class for both types and values.
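The idea of reducing an applied primitive can be sketched as follows. The CanApply class here is a simplified stand-in with assumed methods arity and apply, and the Val type is hypothetical; the real class is likely more general.

```haskell
-- Hypothetical, simplified values: not the backend's actual types.
data Val
  = Lit Integer   -- ^ integer constant
  | Add           -- ^ binary addition primitive
  | App Val [Val] -- ^ application that cannot be reduced (yet)
  deriving (Eq, Show)

-- | Simplified stand-in for the CanApply class (assumed shape).
class CanApply a where
  -- | Number of arguments the term still accepts.
  arity :: a -> Int
  -- | Apply a term to arguments, reducing where possible.
  apply :: a -> [a] -> Either String a

instance CanApply Val where
  arity (Lit _) = 0
  arity Add = 2
  arity (App f xs) = arity f - length xs

  apply f args
    | length args > arity f = Left "too many arguments"
  apply f [] = Right f
  -- Constant folding: adding two constants yields a single constant.
  apply Add [Lit x, Lit y] = Right (Lit (x + y))
  -- Supplying more arguments to a partial application.
  apply (App f xs) ys = apply f (xs ++ ys)
  -- Otherwise keep the application as it is.
  apply f xs = Right (App f xs)
```

For example, apply Add [Lit 1, Lit 2] reduces to Right (Lit 3), while apply Add [Lit 1] stays a partial application that can be completed by a later call to apply.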

Parameterisation

The parameterisation of the LLVM backend is straightforward and follows the fields of the Parameterisation datatype:

  • hasType is a function that checks the types of raw primitive values. Currently the typechecking is very basic: it only checks whether the given types match the arity of the primitive value. For operator primitives, we additionally check that the types of the arguments are equal. This implementation will have to be refined in the future.

  • builtinTypes and builtinValues just provide a mapping from source code literal names to the actual literals.

  • integerToRawPrimVal creates a literal based on an integer. Currently it doesn't take the size of the literal into account, as integer literals always share the same type. This should be improved in the future.

  • There are currently no implementations for stringVal and floatVal, as the backend does not support these types yet.
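The fields above might look roughly like the sketch below. The types, the constructor names, the literal names "add" and "sub", and the convention that a type is given as a list of argument types followed by the result type are all assumptions for illustration.

```haskell
-- Hypothetical primitive types and values for this sketch.
data PrimTy = I8 | I32
  deriving (Eq, Show)

data RawPrimVal = LitInt Integer | Add | Sub
  deriving (Eq, Show)

-- | Basic type check. The type is assumed to be a list of argument
-- types followed by the result type.
hasType :: RawPrimVal -> [PrimTy] -> Bool
hasType (LitInt _) tys = length tys == 1 -- a literal has arity 0
hasType Add tys = binaryOp tys
hasType Sub tys = binaryOp tys

-- | Two arguments plus a result; only the argument types are compared.
binaryOp :: [PrimTy] -> Bool
binaryOp [a, b, _result] = a == b
binaryOp _ = False

-- | Mapping from source-code names to primitives (names are assumed).
builtinValues :: [(String, RawPrimVal)]
builtinValues = [("add", Add), ("sub", Sub)]

-- | Build a literal from an integer, ignoring its size for now.
integerToRawPrimVal :: Integer -> Maybe RawPrimVal
integerToRawPrimVal = Just . LitInt
```

Note that binaryOp does not inspect the result type, mirroring the limited check described above; a refined version would presumably also relate the result type to the argument types.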

Substitution and weakening

For simplicity, the current HasWeak, HasSubstValue and HasSubstTerm instances do not do anything at all. This will likely change in the future.
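An instance that does nothing simply returns its input unchanged. The sketch below uses simplified single-method stand-ins (HasWeak with weakBy, and one HasSubst class in place of HasSubstValue and HasSubstTerm); the method names and signatures are assumptions, not the actual class definitions.

```haskell
import Numeric.Natural (Natural)

-- Hypothetical primitive values for this sketch.
data RawPrimVal = LitInt Integer | Add
  deriving (Eq, Show)

-- | Simplified stand-in for HasWeak: shift free variables by a given amount.
class HasWeak a where
  weakBy :: Natural -> a -> a

-- | Simplified stand-in for the substitution classes:
-- @subst i replacement target@ replaces variable @i@ in @target@.
class HasSubst a where
  subst :: Natural -> a -> a -> a

-- These primitives contain no variables, so both operations are the identity.
instance HasWeak RawPrimVal where
  weakBy _ v = v

instance HasSubst RawPrimVal where
  subst _ _ target = target
```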

Connection to the pipeline

The LLVM instance of HasBackend relies on the default implementation of the parse function. Additionally:

  • The typecheck function relies on the general typechecker implementation by calling the typecheck' function with the parameterisation and the LLVM representation of a set type.

  • The compile function takes a filename and an annotated term (AnnTermT) as the program. The annotated term with typed primitives is first translated to an annotated term with raw primitives (RawPrimVal), which is then translated to LLVM code.
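The shape of such an instance can be sketched with a toy class. Everything here is hypothetical: the method signatures, the Typed type, the single-token "language" and the runPipeline helper only illustrate a class default for parse combined with backend-specific typecheck and compile functions.

```haskell
import Data.Char (isDigit)

-- | Hypothetical stand-in for the annotated, typechecked term.
newtype Typed = TypedInt Integer
  deriving (Eq, Show)

-- | Toy version of a backend class (assumed signatures).
class HasBackend b where
  -- | Default parser, shared by backends that do not override it.
  parse :: b -> String -> Either String String
  parse _ src = case words src of
    [tok] -> Right tok
    _ -> Left "expected exactly one token"

  typecheck :: b -> String -> Either String Typed
  compile :: b -> FilePath -> Typed -> Either String String

data LLVM = LLVM

instance HasBackend LLVM where
  -- parse is deliberately not defined here: the class default is used.
  typecheck _ tok
    | not (null tok) && all isDigit tok = Right (TypedInt (read tok))
    | otherwise = Left ("not an integer literal: " ++ tok)

  -- The filename is unused in this sketch; the LLVM text is returned instead.
  compile _ _file (TypedInt n) =
    Right ("define i32 @main() {\n  ret i32 " ++ show n ++ "\n}\n")

-- | Chain the stages: each stage consumes the output of the previous one.
runPipeline :: HasBackend b => b -> FilePath -> String -> Either String String
runPipeline b file src = parse b src >>= typecheck b >>= compile b file
```

Calling runPipeline LLVM "out.ll" "42" in GHCi yields the text of a main function returning 42, and a non-numeric input is rejected by the typecheck stage.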

Compilation

With the input program represented as an annotated term, we can translate it to LLVM code. We rely mostly on the llvm-hs-pure package, whose monadic LLVM.IRBuilder interface keeps track of variables for us. The implementation is therefore rather straightforward:

  • The whole program is translated to an LLVM module by the compileProgram function. This module contains a hardcoded main function, whose type is taken from the outermost annotated term.

  • The term itself is compiled by the compileTerm function, which recurses into any subterms.

  • For any primitive function, we check whether enough arguments are available. If so, the primitive function is emitted, with the unnamed references to its arguments taken care of by the IRBuilder monad.

The parse, typecheck and compile functions are each called in their own stage of the pipeline: the output of parse is the input of typecheck, and the output of typecheck is the input of compile.
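The recursion described above can be sketched without llvm-hs-pure. To stay self-contained, the example below replaces the IRBuilder monad with an explicit counter for unnamed registers and emits textual LLVM IR directly. The Term type is hypothetical, all primitives are assumed to be binary, and the type is fixed to i32 rather than taken from the annotated term.

```haskell
-- | Hypothetical term language for this sketch.
data Term
  = Lit Integer        -- ^ integer literal
  | Prim String [Term] -- ^ LLVM instruction name (e.g. "add") and its arguments
  deriving (Show)

-- | Compile a term. Takes the next free register number and returns the
-- operand holding the result, the updated counter and the emitted instructions.
compileTerm :: Int -> Term -> Either String (String, Int, [String])
compileTerm n (Lit i) = Right (show i, n, [])
compileTerm n (Prim op [a, b]) = do
  (x, n1, is1) <- compileTerm n a  -- recurse into the first subterm
  (y, n2, is2) <- compileTerm n1 b -- then into the second
  let reg = "%" ++ show n2         -- fresh unnamed register
      instr = "  " ++ reg ++ " = " ++ op ++ " i32 " ++ x ++ ", " ++ y
  Right (reg, n2 + 1, is1 ++ is2 ++ [instr])
compileTerm _ (Prim op args) =
  Left (op ++ ": expected 2 arguments, got " ++ show (length args))

-- | Wrap the compiled term in a hardcoded main function.
compileProgram :: Term -> Either String String
compileProgram t = do
  (res, _, instrs) <- compileTerm 0 t
  Right
    ( unlines
        ( ["define i32 @main() {", "entry:"]
            ++ instrs
            ++ ["  ret i32 " ++ res, "}"]
        )
    )
```

For instance, compileProgram (Prim "add" [Lit 1, Prim "mul" [Lit 2, Lit 3]]) emits a mul into %0 followed by an add into %1. The entry block is labelled explicitly because an unlabelled entry block would itself take the first unnamed number. With LLVM.IRBuilder this bookkeeping is handled by the monad.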

Last modified August 3, 2021 : LLVM documentation (#17) (067ac5d)