
Juvix Compiler Documentation

This page gives an overview of all the other documentation pages

Welcome

Welcome to the documentation of the Juvix Compiler. Juvix is a functional compiler inspired by various dependently typed languages (Agda, Idris, F*), MLs (OCaml, Haskell), Lisps (Common Lisp, Scheme), and much more. Thus our compiler has many idioms and ideas that are not typically found within a standard Haskell codebase.

The documentation is broken down into many sections that are linked at the bottom of this page. Each is unique in its topic: one section may cover a general overview of the entire codebase, while another may just talk about the justification for our various idioms.

Getting Started

See the overview sub-page to learn more about the Juvix compiler process.

1 - Terminology

Definitions for the more unique terminology we use.

Shared Context

Since most of us come from a Haskell/ML background, the bulk of that terminology is omitted here. However, since most of us are unfamiliar with the other inspirations Juvix is derived from, we lay out that terminology here.

Image

This is probably the most confusing terminology we use. An image is a living, changing environment that can take in new code input by the system and/or user and change the world the code lives in. There are many practical benefits to this; see the design section of this web-page for more.

Note that image is used for more than just programming languages. One can view the kernel of an operating system as an image (we often do download image files), or, if the operating system is image based, the entire thing. The terminology also shows up in Docker images.

Effects

By effects we typically mean effect systems, and in our implementation we typically use the word effect/effects to mean algebraic effect/effects.

The key idea of effects is that a type doesn't only describe the shape of the final computation, but also an approximation of the various effects evaluating it causes.
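As a small illustration, here is a minimal Haskell sketch using the capability library (covered later in the Idioms section); the "config" tag, the Config type, and getPort are hypothetical names, but they show the point: the type records the effect (reading a configuration), not just the resulting Int.

  {-# LANGUAGE DataKinds #-}
  {-# LANGUAGE FlexibleContexts #-}
  {-# LANGUAGE TypeApplications #-}

  import Capability.Reader (HasReader, ask)

  data Config = Config {port :: Int}

  -- The type approximates the effects evaluation causes: getPort needs
  -- read access to a "config" environment, and nothing else.
  getPort :: HasReader "config" Config m => m Int
  getPort = port <$> ask @"config"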

Effects have a rather rich history, starting in 1987 with the FX-87 programming language. To see the progression of the ideas, see the following slides by Benjamin Pierce:

Types and Programming Languages: The Next Generation (2003), page 36

Desugaring

Desugaring is a word synonymous with macro-expansion or syntax expansion. It is a process by which we transform one syntax form into another by some abstract syntax transform. See the following example.

  ;; See the S-expression section for slightly more information on this
  ;; structure

  ;; Input generic structure

  (:do (%<- name-1 body-1)
       body-2
       body-3
       ...
       (%<- name-n body-n)
       return)

  ;; Output generic structure

  (Prelude.>>= body-1
     (lambda (name-1)
       (Prelude.>> body-2
                   (Prelude.>> body-3
                               ...
                               (Prelude.>>= body-n (lambda (name-n) return))))))

In this example we desugared Haskell do syntax into the base monadic operation form.

Core

Core is a term we use to mean a class of intermediate representation (IR) layers of the Juvix compiler. These layers deal with the type theory, and are what our frontend syntax desugars into.

Currently there are several layers of Core IR: Core.HR, Core.IR, Core.Erased, etc. You can find out more about those layers over in the overview section.

Read Time

Interaction Nets

Interaction Nets are a novel graphical model (graphical as in graph based) of computation. Like all graphs, Interaction Nets are composed of nodes, edges, and ports, the spots on a node where edges connect. Each node has a single principal port. Nodes may be of different types (Cons is a unique node, so is Plus; every operation you can think of is unique) and have different rules. The rules are limited in that they can only fire when two principal ports touch, and each rule describes the reduction of the pair of nodes it matches.

Since the operations are limited to principal ports only, the system is trivially parallelizable. There are many other benefits as well. To see some example pictures, see the Wikipedia page on interaction nets. A rough sketch of one possible representation follows.
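For intuition only, here is a small Haskell sketch of how such a net might be represented; the types and the active-pair search are hypothetical, not the Juvix implementation.

  import Data.List (tails)

  -- Hypothetical node kinds; every operation gets its own kind.
  data Kind = Cons | Plus | Erase deriving (Eq, Show)

  -- Edges are identified by shared labels: two ports carrying the same
  -- label are connected.
  type Edge = Int

  data Node = Node
    { kind :: Kind,
      principal :: Edge, -- every node has exactly one principal port
      auxiliary :: [Edge] -- the remaining ports
    }
    deriving (Show)

  -- An active pair is two nodes whose principal ports share an edge.
  -- Rules fire only on active pairs, so distinct pairs can be reduced
  -- in parallel.
  activePairs :: [Node] -> [(Node, Node)]
  activePairs nodes =
    [ (a, b)
      | (a : rest) <- tails nodes,
        b <- rest,
        principal a == principal b
    ]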

Parameterization

We often use the word Parameterization around the code-base pertaining to the various backends. In particular, it is the abstraction (interface) over the primitives a backend may have.

Such abstractions include, but are not limited to, the following (a sketch follows the list):

  1. The primitive types in the code-base

  2. Applying primitive types for type checking

  3. A list of builtin values the backend may have

  4. Abstractions over various literals in the frontend language

    • Ints, Strings, etc.
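As a rough Haskell sketch (the record and field names below are illustrative assumptions, not the exact definitions in the codebase), a parameterization bundles these pieces into one interface value per backend:

  -- One of these exists per backend, abstracting over its primitives.
  data Parameterisation primTy primVal = Parameterisation
    { -- builtin types and values the backend provides
      builtinTypes :: [(String, primTy)],
      builtinValues :: [(String, primVal)],
      -- applying primitive types during type checking
      hasType :: primVal -> primTy -> Bool,
      -- abstractions over frontend literals (ints, strings, etc.)
      intVal :: Integer -> Maybe primVal,
      stringVal :: String -> Maybe primVal
    }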

Form

Form is a term often used in the Lisp world for any piece of syntax meant to be evaluated. In Juvix we mean it in much the same way.

S-expression

Juvix is interesting in that we have the ability to reason over the syntax abstractly. This form, for the sake of ease of use and simplicity, should be as minimal as possible. Thus we have chosen to use S-expressions as our Form.

An S-expression consists of an atom (think symbol, number, string etc.) or a list containing more S-expressions. They are often written as such

  ;; these are examples of atoms
  1
  a
  hi
  +

  ;; these are examples of lists
  (+ 1 2 3)
  (+ (+ 3 4 5) (+ 1 2 3))

They are quite versatile and offer a consistent interface of manipulation throughout the code-base.

Symbol

A symbol is simply an identifier for some value in the language. For example in

  def foo = 3.

foo is the symbol that when evaluated becomes 3. Some symbols evaluate to themselves.

Symbols are not prefixed; to see more on prefixed names, see the Name Symbol section on this page.

Name Symbol

Name Symbols (or NameSymbol in the code-base) are qualified symbols. So instead of saying foo we could say Prelude.foo. There is a distinction: resolution of non-qualified symbols is a bit more implicit, and doesn't explicitly state which resolution path is taken, unlike with Name Symbols. There are also some forms (like def) that expect the name they take to be a symbol rather than a Name Symbol; these are often arbitrary choices that may change in the future.

Telescope

Recursive Groups

2 - Design

A document on various language design decisions and philosophies for Juvix. It also hosts a variety of sub-documents on various language aspects.

Design Philosophy

The philosophy of Juvix is one of generality and flexibility. The developers of the Juvix compiler should not have many extra tools over the users of the language. The compiler/language developers/designers should be able to choose what is canonical, however the forms of the language should be generic in a way that ordinary users could have recreated if they so wished.

For example, the form of def in the frontend language is not a fixed piece of syntax, but rather a simple macro over a primitive that is most suitable for our core type theory. This means that if one is an application developer for some domain, like web-browsers, they can make their own version of def (def-command) that fits their domain better (registering hooks and functions into the web-browser itself!).

In order to achieve this goal of flexibility, Juvix has made many design decisions that allow traditionally built in tools to be first class in Juvix.

  • Some of the major design decisions

    1. Small core syntax

      • This means our syntax isn't special

    2. Image based system

      • This means Juvix can talk about the environment in which it lives.

  • Some Applications that are traditionally built in or hard to achieve without these systems

    1. Build tools can be made in the language!

    2. Live documentation tools can be made while developing!

    3. Proof assistant with more power to affect the environment for a better interactive experience.

    4. Better proof generation and tools to introspect various aspects of one's code.

Another design goal of Juvix is to allow for safer programs that are mission critical. When one is writing smart contracts that have millions of dollars at stake, a mistake is unacceptable. Thus Juvix has dependent types. Dependent types are wonderful at removing entire classes of errors with proofs about behavior. Further they can make many idioms in a typed language simpler.

However dependent types on their own can be confusing, as the specific incantations one needs to write to be able to effectively prove code is often not intuitive. Juvix thus seeks to make tooling in this area better, offering abstractions that help the language work to help you achieve your goals.

Design Influences

  • Growing a language

    • A wonderful talk by Guy Steele on why a language should allow user-made features to feel built in.

Decisions

2.1 - Extensible

Talking about the Extensible system.

The extensible system works like Trees that Grow. When you write an extensible declaration, it adds an extra field to every constructor, implemented via a type family per constructor. For example, on Either we'd get Left a Annotation, Right b Annotation, and an extension field; there is also a type family for extending the datatype itself with new constructors. If the extension is a type annotation, you can add the primTy and primVal to have the primitive types.

These live in the Extend modules. We also generate a constructor


  extTerm :: p1 -> p2 -> IR.ExtTerm
  extTerm =
    \_primTy _primVal ->
      IR.defaultExtTerm
        { IR.nameLam = "Lam0",
          IR.typeLam = Just [[t|NameSymbol.T|]],
          IR.namePi = "Pi0",
          IR.typePi = Just [[t|NameSymbol.T|]],
          IR.nameSig = "Sig0",
          IR.typeSig = Just [[t|NameSymbol.T|]],
          IR.nameLet = "Let0",
          IR.typeLet = Just [[t|NameSymbol.T|]]
        }

  extElim :: p1 -> p2 -> IR.ExtElim
  extElim =
    \_primTy _primVal ->
      IR.defaultExtElim
        { IR.typeBound = Nothing,
          IR.typeFree = Nothing,
          IR.typeElimX = [("Var", [[t|NameSymbol.T|]])]
        }

  IR.extendTerm "Term" [] [t|T|] extTerm

  -- TODO allow extendTerm to reorder fields?
  pattern Lam x t = Lam0 t x

  • The typeElimX can add multiple constructors.

  • The name<Constructor> is the new constructor name.

  • The type<Constructor> talks about an extra argument.

  • The extendTerm is how we instantiate the alias and extensions system.

  • The pattern synonyms are there to make the argument order nicer.

    • This is needed for more than 1 extra constructor, as it'll generate lefts and rights.

2.2 - Image System

In which we talk about the Image System

What Is the Juvix Image System

The definition of image can be found in the terminology section. As for the image system itself, it is the way we are able to talk about Juvix in Juvix. It allows primitives in Juvix to reason over what is in scope, or to change and add to the system while it's live. It is a very powerful technique that allows Juvix to do a lot with little effort.

Image System Goals

Research Areas

Images are as old as computation, with the Lisp 1.5 compiler being an image which wrote to and read from a magnetic strip (source required). However, there is very little literature on the subject; this section collects the resources we were able to find. Further, since the technique has fallen out of fashion in the past few decades, there are many areas we can explore on this front.

First Class Environments

First class environments are a way of making the environment code lives in first class. This means we could talk about discrete environments and more easily distinguish between startup, compilation time, evaluation time, etc environments. There are many bigger implications as well, like being able to talk about environments over the network, having sandboxed environments, etc.

While the benefits are quite promising, the technique seems to suffer from a lack of research. There is the SICL paper, which talks about adding it to Common Lisp, and the SBCL compiler, which at its bootstrapping phase works on creating its own environment. There is also MIT Scheme, which has first class environments.

With Juvix, we can explore these ideas a lot further, as we have many systems which facilitate this kind of reasoning. Three particular constraints come together for this.

  1. We have an algebraic effects system

    • This is very significant for making progress, as we can see precisely how our effects touch the image. If we can determine that, we can likely create a system that can precisely denote the scope of the image we need to reason over.

    • With that said, we need to be careful here, as free variables are an effect that is discharged by the default handler. If the system fully throws that information away, since it can look up in the image what names are bound to, then what a computation relies upon in the image to function becomes harder to manage.

  2. We have multiple backends we must support in a single program.

    • This means that we already have to tackle the issue of some parts of our language being out of scope.

    • We are likely to include parts of Racket's #lang research, which should help talking about environments on a module level.

  3. We have a first class module system.

    • We might be able to formalize this idea as our module system, or view it as an extension of our module system.

    • After all the image is one big module!

If we are able to synthesize 1, 2, and 3 properly along with our type system, we should have a very ergonomic way of creating eloquent systems that can talk about library versioning, backend support, and evaluation-time processing.

Possible Benefits

  1. Talk formally about the various environments

    • Bootstrapping environment

    • Evaluation Environment

    • Compilation Environment

  2. Better integration with blockchains

    • We could ingest Anoma data into the image, and have multiple different conflicting nets around.

    • We can fully explore them as if they were on our live image, and switch between the environments and see what effect our code would have on different sets of conflicting data!

  3. Give us a formal way to talk about the environment where our type system gets injected

    • The base image will have to not have a type system, as that is rather hairy to do.

    • What this gives us is an environment in which we can talk about everything having a nice type system.

      • We can even push this further for other kinds of type systems ala what Agda does, but as libraries, not baked in(?)

  4. Leads into a possible package/versioning system.

    • This could give us an LTS model like Haskell has, and maybe even better, as we can have multiple versions in our image that don't conflict with each other?

2.3 - Subtyping

This article covers the discussion on subtyping and considerations we’ve had

This document is bound to change once we have a proper writeup of row polymorphism

Subtyping

An argument about the extent of subtyping

Subtyping is an important part of any module or object system. The rules these systems usually care about revolve around the subtyping of records, and of records in relation to functions.

About Subtyping Rules on Records

An example of record subtyping is as follows; we will use a :> b to mean that a is a subtype of b:

  1. ∀x, a … v : Type. {x : Type, a : Type … v : Type} :> {a … v : Type}

  2. ∀x, a … v : Type. ∀ f : {a : Type … v : Type} → Type
    f :> f : {x : Type, a : Type … v : Type} → Type

What 1 tells us is that we can forget a field, and thus a record with more fields is a subtype of the same record with fewer fields.

What 2 tells us is that a function which accepts a record with fields a through v can also accept a record with those fields and extra fields. The intuition is that for function arguments the subtyping relation is reversed, but we still need to forget information to get our relationship.

Implications of Rules and Arguments about Behavior

These implications can be quite worrying, consider the longer-form example below. We will introduce all the forms first then make an argument about what various object systems have done and what we are proposing to do.

  f0 : λX:>{x : int} -> x1 -> x2 -> x1
  f0 rec1 rec2 = rec1 { x = rec1.x + rec2.x}


  f1 : λX:>{x : int} -> X -> X -> X
  f1 rec1 rec2 = rec1 {x = rec1.x + rec2.x}


  f2 : λX:>{x : int} -> X -> X -> X
  f2 rec1 rec2 = {x = rec1.x + rec2.x}


  -- note that the t is opaque
  sig set : int -> {x : t,  type t, add : t -> t -> t}
  mod set x =
    type t = int
    add = (+)
    x
  end

  -- We include the set
  sig bag : int -> {x : t,  type t, add : t -> t -> t}
  mod bag x = include (set x)

  foo a b c =
    let ab = a in
    ...

   foo1 a b c =
     let ab = new bag in
     ...


  f3 : λX:>{x : int} -> X -> X
  f3 rec1 = rec1 {x = rec1.x + rec1.x}
NOTE: f0 uses a shorthand notation; x1 and x2 are subtypes of {x : int}.

The main functions we wish to discuss are f0, f1, and f2. In an object-oriented system all three of these are permissible, since under a subtyping relation they are all fine: we just end up with some subtype of {x : int}.

However this is a major issue, see Subtyping, Subclassing, and Trouble with OOP, written by Oleg for how insidious this can make errors.

Thus on <2020-11-19 Thu> we decided that f2's signature is invalid, as we don't just have a subtype of {x : int}: our return type is in fact {x : int}. Therefore we don't have the issue of the function foo in the article above, which we will now quickly detail.

In our code above, foo and foo1 both return a derivative of the ab value. In an object oriented language the following type would be applicable to both

  λX:>bag X -> X -> X

The issue is that if we send in a set instead, then foo gives us a set, while foo1 would give us a bag. Thus in our type system foo1's signature would be

  λX:>bag X -> X -> bag

However what was more contentious was what to do with f1 vs f0. Should we allow

  f1 {x = 5; y = 3} {x = 1}

if f1 were f0 then obviously yes, as we know we get two separate subtypes and return the first.

An argument for allowing this rests on the claim that this is simply how subtyping relations work. While a fair argument, we have concluded that either way is a valid way of working with subtyping, as we can merely check that the two arguments are the same, then begin doing subtyping.

Another argument is around the flexibility of functors if we disallow the above example. However, this argument is also flawed in that we still have a subtyping relation, so f3 in our code block can be called like

  f3 {x = 3; y = 4}

Thus the argument regarding functors with a single argument can be disregarded. For multiple arguments, the rest of the claim can be answered by a mechanism which allows the user to specify that the two arguments are really subtypes of each other. This explicit system still allows the mixup of bag and set in Oleg's article, however it must be done explicitly and with full intention, with the most serious error still being impossible!

Finally, the last claim for f1's legitimacy rests on the flexibility of a library user against the incorrect choices of the author: f1 should really allow different subtypes; it just needs to specify so! For this we can gain back our expressiveness by adding a coerce function which forgets fields, making the two records the same and thus legitimate arguments to f1

  f1 {x = 5; y = 3} (coerce {x = 1})

  -- or
  f1 (coerce {x = 5; y = 3}) {x = 1}

Further Arguments/Unresolved Issues

As of <2020-11-19 Thu> we still don't have a full answer for the following questions

  1. How do we handle usages

    • We have already had some arguments about ω and 0 usage items and how subtyping works there; however we still need to flesh out the mechanism for other usages and how subtyping will work in relation to usages inside records.

  2. How do we handle refinements

    • What do we do with this information, forget it for the application and allow implicit subtyping of refinements?

      • Currently we plan on translating refinements to Σ's and thus this will not be the default behavior.

2.4 - Trees-That-Grow

This covers why we use the extensible abstraction over trees that grow

Trees that Grow

In the Juvix codebase we have a variety of similar but different abstract syntaxes for Core—for example, adding type annotations or names (plain IR is locally nameless). Rather than duplicating code that needs to work on multiple kinds of syntax, or keep a single abstract syntax and write partial functions, we instead use the idiom described in the Trees that Grow paper by Najd and Peyton Jones.

The basic idea is to have a type family for each constructor and place anything not present in plain IR in there. In addition, there is also an extra "extension" type family for each datatype in the syntax to add extra constructors (for example, names). The first group all default to (), providing no extra information, and the second group to Void, ensuring the extra constructors don't appear unexpectedly.

Functions which don't care about the annotations can simply ignore them and thus be applied to any kind of syntax (as long as the extensions are still Void). Alternatively, if they need to work on the annotations as well, they can be encoded using type classes (see Juvix.Core.IR.Evaluator.subst for an example of this).
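A toy sketch of the idiom follows (simplified names, not the actual Juvix definitions): annotation families default to (), the extension family to Void.

  {-# LANGUAGE TypeFamilies #-}

  import Data.Void (Void)

  type Name = String

  -- Each constructor carries an annotation slot; TermX is the extension
  -- constructor for whole new syntax forms.
  data Term ext
    = Lam Name (Term ext) (XLam ext)
    | App (Term ext) (Term ext) (XApp ext)
    | TermX (XTerm ext)

  type family XLam ext
  type family XApp ext
  type family XTerm ext

  -- Plain syntax: no annotations, and Void ensures no unexpected
  -- extension constructors can appear.
  data Plain
  type instance XLam Plain = ()
  type instance XApp Plain = ()
  type instance XTerm Plain = Void

  -- A typed variant: every node now carries a type annotation.
  data Typed ty
  type instance XLam (Typed ty) = ty
  type instance XApp (Typed ty) = ty
  type instance XTerm (Typed ty) = Void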

2.5 - S-expression Syntax

The concrete syntax of Juvix after it has been turned into a Lisp

This document is bound to change once a syntax redesign happens; the syntax will then be greatly minimized

For this page we give examples of the syntax rather than a hard BNF, unlike what is found in the syntax page

Generation

Top Level

Let

  1. basic

    • input

        let foo x
          | x == 2 = 3 + 4 + x
          | else   = x + 2
    • output

        (:defun foo (x)
          (:cond
            ((:infix == x 2) (:infix + 3 (:infix + 4 x)))
            (else            (:infix + x 2))))

Sig

  1. basic

    • input

        sig foo : int -> int -> int
    • output

        (:defsig foo (:infix -> int (:infix -> int int)))
  2. usage

    • input

        sig foo 0 : int -> int -> int
    • output

        (:defsig foo (:usage 0 (:infix -> int (:infix -> int int))))
    • This form is a bit odd, ideally it'd mimic the Types version like

        (:defsig (foo :usage 0) (:infix -> int (:infix -> int int)))
      • However for now I think it's fine… Though this is a place for improvement for the future

Types

  1. record

    • input

        type foo x y z =
          { y-axis : y
          , x-axis : x
          , z-axis 1 : z }
    • output

        (type foo (x y z)
           (:record-d
             (y-axis (:primitve Builtin.Omega) y)
             (x-axis (:primitve Builtin.Omega) x)
             (z-axis 1 z)))
  2. first sum

    • input

        type foo x y z =
          | Foo {y-axis : y, x-axis : x, z-axis : z}
          | Bar (var1 : x) (var2 : y)
          | Car : hd : x -> cdr : foo y -> foo x y z
    • output

        (type foo (x y z)
          (Foo (:record-d (y-axis (:primitve Builtin.Omega) y)
                          (x-axis (:primitve Builtin.Omega) x)
                          (z-axis (:primitve Builtin.Omega) z)))
          (Bar (:paren (:infix : var1 x))
               (:paren (:infix : var2 y)))
          (Car (:arrow
                (:infix : hd
                        (:infix -> x
                                (:infix : cdr
                                        (:infix -> (foo y) (foo x y z))))))))
  3. Signature

    • input

        type foo x y z : typ -> typ -> typ = Foo a b c
    • output

        (type (foo :type (:infix -> typ (:infix -> typ typ))) (x y z)
           (Foo a b c))

Declaration

  1. basic

    • input

        declare infixl foo 3
    • output

        (declare infixl foo 3)

Module

  1. basic

    • input

        mod foo =
          let bar = 3
          type zar = Boo
        end
    • output

        (:defmodule foo ()
          (:defun bar () 3)
          (type zar () (Boo)))
  2. guards

    • input

        mod foo x
          | x == 3 =
            let foo = 3
            let bar = 5
          | else =
            let foo = 5
        end
    • output

        (:defmodule foo (x)
          (:cond ((:infix == x 3)
                  (:defun foo () 3)
                  (:defun bar () 5))
                 (else
                  (:defun foo () 5))))

Expression

Let

  1. basic

    • input

        let foo y =
          let fi = 3 in
          fi
    • output

        (:defun foo (y)
          (let fi () 3
            fi))
  2. arguments

    • input

        let foo y =
          let fi x = 3 + x in
          fi y
    • output

        (:defun foo (y)
          (let fi (x) (:infix + 3 x)
            (fi y)))

Module

  1. basic

    • input

        let foo =
          mod Bar =
            let bar = 3
            type zar = Boo
          end in
          Bar.bar
    • output

        (:defun foo ()
          (:let-mod Bar ()
             ((:defun bar () 3)
              (type zar () (Boo)))
            Bar.bar))

Tuples

  1. basic

    • input

        let foo = (1,2,3)
    • output

        (:defun foo () (:tuple 1 2 3))

Lists

  1. basic

    • input

        let foo = [1,2,3,4]
    • output

        (:defun foo () (:list 1 2 3 4))

Records

  1. basic

    • input

        let foo = {a, b = 2}
    • output

        (:defun foo () (:record (a) (b 2)))

Do

  1. basic

    • input

        let foo xs =
          a <- xs;
          more-comp;
          pure a
    • output

        (:defun foo (xs)
          (:do (%<- a xs)
               more-comp
               (pure a)))

Lambda

  1. basic

    • input

        let foo y =
          \x -> x + y
    • output

        (:defun foo (y)
          (:lambda (x) (:infix + x y)))

Open

  1. basic

    • input

        let foo y =
          open Prelude in
          x + y
    • output

        (:defun foo (y)
          (:open-in Prelude
             (:infix + x y)))

Parens

  1. basic

    • input

        let foo y = (y + 3) * 9
    • output

        (:defun foo (y)
          (:infix * (:paren (:infix + y 3)) 9))

Block

  1. basic

    • input

        let foo y = 3 * begin y + y end
    • output

        (:defun foo (y)
          (:infix * 3 (:progn (:infix + y y))))

Primitive

  1. basic

    • input

        let add = %Michelson.add
    • output

        (:defun add ()
          (:primitive Michelson.add))

Declaration

  1. basic

    • input

        let foo = declare infixl (+) 3 in 3
    • output

        (:defun foo ()
          (:declaim (infixl + 3) 3))

Cond

  1. basic

    • input

        let add x =
          if | x == 3 = 1
             | x == 4 = 5
             | else   = 0
    • output

        (:defun add (x)
          (:cond ((:infix == x 3) 1)
                 ((:infix == x 4) 5)
                 (else            0)))

Case

  1. basic

    • input

        let foo =
           case x of
           | (A (B {a, b})) -> a + b
           | A (B c) (C d)  -> c + d
    • output

        (:defun foo ()
           (case x
              ((A (B (:record (a) (b)))) (:infix + a b))
              ((A (B c) (C d))           (:infix + c d))))

Notes

Design Principles

Overall the goal of the S-expression incantation is twofold [1]:

  1. An unambiguous syntax that is faithful to the original ML syntax

    • This is a necessity to have limited strife between the ML syntax and the work we need to do on the generic IR-syntax layer.

    • Unambiguous in this case implies bijection, as we have to convert between the two forms.

  2. A fully realized syntax that is readable and looks as if it were a standalone language.

    • This is needed so inspection of the output is obvious. Further this will empower ease of reasoning over the syntax without too many surprises.

On Verbosity

  • Overall a few forms in this representation are quite verbose and thus harder to read than they should be, see

    
      (:arrow
       (:infix : hd
               (:infix -> x
                       (:infix : cdr
                               (:infix -> (foo y) (foo x y z))))))
    
      ;; should be
      [ hd : x -> cdr : (foo y) -> (foo x y z) ]
    
      (:record
       y-axis y
       x-axis x
       z-axis z)
    
      ;; should be
      {y-axis y
       x-axis x
       z-axis z}
  • This is due to lacking reader macros at this level. Overall this can be largely improved in the future for better inspection (we thus will offer another level of inspection that can do away with the infix view!)

On Forms

The syntax is heavily inspired from Common Lisp (CL). There are of course a plethora of changes, however some key concepts have survived through this transition.

Namely, for adding extra property-based information in declarations we opted for an alist concept

  (type (foo :type (:infix -> typ (:infix -> typ typ))) (x y z)
     (Foo a b c))

which is taken directly from CL's defstruct construct.

Even naming conventions like defun are lifted directly from Common Lisp.

With that said, many forms (namely the lets) lack an extra set of parentheses in comparison to CL and Scheme.

  (let fi (x) (:infix + 3 x)
      (fi y))

  (let fi () 3
      fi)

This is because we currently must also represent non-let blocks, along with having a single expression each name is bound to.

Inconsistencies

There are likely inconsistencies between some of the forms. These should be removed as soon as possible. I have tried my best on this front, but likely some forms are inconsistent.

Known

Signatures

Currently we have a minor convention of alists on the name itself to describe extra properties

  (type (foo :type (:infix -> typ (:infix -> typ typ))) (x y z)
     (Foo a b c))

This is great, and we'll be adding it to modules; however we currently do an odd transformation in regard to signatures

  (:defsig foo (:usage 0 (:infix -> int (:infix -> int int))))
  ;; should be
  (:defsig (foo :usage 0) (:infix -> int (:infix -> int int)))

This likely should be changed

Footnotes


1. The whimsical, spell-like nature of S-expressions is heavily derived from SICP and the general imagery of other literature such as the dragon book.

2.5.1 - Core S-Expression Syntax

The concrete syntax of Juvix once we hit Core, namely the named version (HR)

Generation

Term

Star

  • Haskell ADT

      = -- | (sort i) i th ordering of (closed) universe.
        Star Universe
  • ML Syntax

      -- Universe 0
      ×₀
      -- Universe 1
      ×₁
      -- Universe n
      ×ₙ
  • S-expression Syntax

      ;; universe 0
      (:star 0)
      ;; universe 1
      (:star 1)
      ;; universe n
      (:star n)

Prim type

  • Haskell ADT

      | -- | PrimTy primitive type
        PrimTy primTy
  • ML Syntax

      primTy Michelson.Int
  • S-expression Syntax

      (:prim-ty Michelson.Int)

Prim

  • Haskell ADT

      | -- | primitive constant
        Prim primVal
  • ML Syntax

      prim Michelson.Add
  • S-expression Syntax

      (:prim Michelson.add)

Π / Pi

  • Haskell ADT

      | -- | formation rule of the dependent function type PI.
        -- the Usage(π) tracks how many times x is used.
        Pi Usage NameSymbol.T (Term primTy primVal) (Term primTy primVal)
  • ML Syntax

      (Π (0 | typ : ×₀) (Π (1 | _ : typ) typ))
  • S-expression Syntax

      (:pi (typ 0 (:star 0)) (:pi (_ 1 typ) typ))

Lam

  • Haskell ADT

      | -- | LAM Introduction rule of PI.
        -- The abstracted variables usage is tracked with the Usage(π).
        Lam NameSymbol.T (Term primTy primVal)
  • ML Syntax

      -- Note we don't have (λ x : unit -> x) : unit -> unit, as that requires
      -- the type to be given in the Haskell ADT, and is thus handled in Π
      λ x ⟶ ⟨⟩
  • S-expression Syntax

      (:named-lambda x :unit)

Sig

  • Haskell ADT

      | -- | Dependent pair (Σ) type, with each half having its own usage
        Sig Usage NameSymbol.T (Term primTy primVal) (Term primTy primVal)
  • ML Syntax

      (Π ω | x : A) ⟶
        (Σ ω | y : B) ⟶
          C x y
  • S-expression Syntax

      (:pi (x ω A)
         (:sigma (y ω B)
            (C x y)))

Pair

  • Haskell ADT

      | -- | Pair value
        Pair (Term primTy primVal) (Term primTy primVal)
  • ML Syntax

      ⟨f x y, g x y⟩
  • S-expression Syntax

      (:pair (f x y) (g x y))

Let

  • Haskell ADT

      | -- | Let binder.
        -- the local definition is bound to de Bruijn index 0.
        Let Usage NameSymbol.T (Elim primTy primVal) (Term primTy primVal)
  • ML Syntax

      let 2 | a = f x y in
        <a,a>
  • S-expression Syntax

      (:named-let (a 2 (f x y))
        (:pair a a))

Unit type

  • Note

    Should this be treated specially?

  • Haskell ADT

      | -- | Unit type.
        UnitTy
  • ML Syntax

      Unit
  • S-expression Syntax

      :unit-type

Unit

  • Note

    Should this be treated specially?

  • Haskell ADT

      | -- | Unit Value
        Unit
  • ML Syntax

      
  • S-expression Syntax

      :unit

Elim

  • Should we even have Elims with this strategy? We know them by exclusion, so an explicit form for them seems needless. It depends how the code turns out.

Record

  • Haskell ADT

      | -- | Constructor for a record type.
        Record [(Symbol, Term primTy primVal)]
  • ML Syntax

      {name₁ = term₁, name₂ = term₂, …, nameₙ = termₙ}
  • S-expression Syntax

      (:record name₁ term₁ name₂ term₂ … nameₙ termₙ)

RecordTy

  • Haskell ADT

      | -- | Record type. (It's a list not a map because the dependency order
        -- needs to be preserved.)
        RecordTy [(Symbol, Usage, Term primTy primVal)]
  • ML Syntax

      {name₁ usage₁ : term₁, name₂ usage₂ : term₂, …, nameₙ usageₙ : termₙ}
  • S-expression Syntax

      (:record-ty
        (name usage term)
        (name usage term)
        ...
        (nameₙ usageₙ termₙ))

Elim

Var

  • Haskell ADT

      = -- | Named Variables
        Var NameSymbol.T
  • ML Syntax

      λ x ⟶ x
  • S-expression Syntax

      (:named-lambda x x)
    • We treat vars simply: any atom in the list that isn't a special form is a var.

App

  • Haskell ADT

      | -- | elimination rule of PI (APP).
        App (Elim primTy primVal) (Term primTy primVal)
  • ML Syntax

      f a b
  • S-expression Syntax

      (f a b)
    • We simply represent application as the car of the list if it's not a special form

Ann

  • Haskell ADT

      | -- | Annotation with usage.
        Ann Usage (Term primTy primVal) (Term primTy primVal) Universe
  • ML Syntax

      (ω | f x y : C x y : * 0)
  • S-expression Syntax

      (: (f x y) (:usage ω :universe 0)
         (C x y))
    
      ;; Should we adopt it without names?
      (: (f x y) (ω 0)
         (C x y))
    • Here we have a choice. I'm not sure which one I prefer: the first is more explicit, but may be a bit too verbose.

Lookup

  • Haskell ADT

      | -- | Project an element from a record type.
        Lookup (Elim primTy primVal) Symbol
  • ML Syntax

      elim.name
  • S-expression Syntax

      (:lookup elim name)

Value

Instead of giving each variant a section, we will instead note that Values relate to Terms, except instead of an Elim they have a Neutral.

Thus, it's an extension of Term.

Neutral

Neutral is the same as Value, except we remove annotations from the possibilities

Pattern

Constructor

  • Haskell ADT

      = PCon GlobalName [Pattern primTy primVal]
  • ML Syntax

      (Cons a as)
  • S-expression Syntax

      ;; No other form has an unattended name with arguments, so this is
      ;; safe
      (Cons a as)

Pair

  • Haskell ADT

      | PPair (Pattern primTy primVal) (Pattern primTy primVal)
  • ML Syntax

      -- We have syntax for multi pairs
      ⟨x, y, z⟩
      -- This is just an alias for
      ⟨x, ⟨y, z⟩⟩
  • S-expression Syntax

      (:pair x y z)
      ;; This is just an alias for
      (:pair x (:pair y z))

Unit

  • Note

    Should this be treated specially?

  • Haskell ADT

      | PUnit
  • ML Syntax

      
  • S-expression Syntax

      :unit

Var

  • Haskell ADT

      | PVar PatternVar
  • ML Syntax

      (Constructor a b c)
    
      a
    
      z
  • S-expression Syntax

      (Constructor a b c)
    
      a
    
      z

Dot

  • Haskell ADT

      -- Deduction by the unifier, that the term should already be
      -- known
      | PDot (Term primTy primVal)
  • ML Syntax

      -- Taken From
      -- data Double (n : ℕ) : ℕ → Set where
      --   double : Double n (n + n)
    
      -- foo : ∀ m n → Double m n → {!!}
      -- foo m .(m + m) double = {!!}
    
      .(m + m)
  • S-expression Syntax

      (:dot (+ m m))

Primitive

  • Haskell ADT

      | PPrim primVal
  • ML Syntax

      (Prim +)
  • S-expression Syntax

      (:prim +)

Raw Globals

  • This is the target that the context type should hold. However we have to note a few facts about how things are stored.

  • We don't really have ML syntax counterparts for these, so instead we will give the raw structure as the syntax.

FunClause

  • Here the full type has many overlapping elements, thus we should just steal the RawFunClause field

  • Haskell ADT

      data RawFunction' ext primTy primVal = RawFunction
        { rawFunName :: GlobalName,
          rawFunUsage :: GlobalUsage,
          rawFunType :: Term' ext primTy primVal,
          -- The value we want
          rawFunClauses :: NonEmpty (RawFunClause' ext primTy primVal)
        }
        deriving (Generic)
    
      -- The Type that we will emulate
      data RawFunClause' ext primTy primVal = RawFunClause
        { rawClauseTel :: RawTelescope ext primTy primVal,
          rawClausePats :: [Pattern' ext primTy primVal], --TODO [SplitPattern]
          rawClauseBody :: Term' ext primTy primVal,
          rawClauseCatchall :: Bool
        }
        deriving (Generic)
  • Haskell Context ADT

      data Def term ty = D
          -- This mimics rawFunUsage
        { defUsage :: Maybe Usage.T,
          -- This mimics rawFunType
          defMTy :: Maybe ty,
          -- This field will mimic rawFunClause'
          defTerm :: term,
          defPrecedence :: Precedence
        }
        deriving (Show, Read, Generic, Eq, Data)
  • Raw Fun Clause Example

      RawFunClause
        { rawClauseTel = []
        , rawClausePats =
            [ PVar "x" ]
        , rawClauseBody = Elim'
                          ( Var "x" )
        , rawClauseCatchall = False } :| []
    • Currently I can't get a better example sadly.

  • S-expression Syntax

      (:raw-fun-clause () (x)
        x
        :false)

2.6 - Syntax

This is the BNF of Juvix’s syntax

Syntax

  header ::= mod <symbol> where <top-level>*
           | <top-level>*

  top-level ::= <type>
              | <module-open>
              | <type-class>
              | <type-class-instance>
              ; These four last, due to lack of words before
              | <module-signature>
              | <infix-declare>
              | <module>
              | <module-open>
              | <signature>
              | <function>

  ;;;; Declarations ======================================================

  declarations ::= declare <declare-adt>

  declare-adt ::= <infix-declare>

  infix-declare ::= infix <symbol> <number>
                  | infixl <symbol> <number>
                  | infixr <symbol> <number>

  declaration-expression ::= <declarations> in <expression>

  ;;;; Functions ======================================================

  ;; let until indent sensitivity is gone
  function ::= let  <function-body>

  function-body ::= <name> <args>+ = <expression>
                  | <name> <args>+ <cond-logic>*

  ;; Match-logic could just be a name, and so is thus fine
  ;; # is for implicit
  ;; Should we have implicit pattern matching like here?
  args ::= <match-logic>
         | #<match-logic>

  ;; sig only here while indent sensitivity is not here
  ;; if we only have usage in arrows, we should allow a way to have
  ;; usage information of variables as well!
  signature ::= sig <name>         :                  <expression>
              | sig <name>         : <constraint> =\> <expression>
              | sig <name> <usage> :                  <expression>
              | sig <name> <usage> : <constraint> =\> <expression>

  constraint ::= <type-name>
               | ( <constBody>* )

  constBody ::= <type-name> , <type-name>
              | <type-name>
  ;;;; Types ==========================================================

  type ::= <data-declaration>

  ;; Should we let there be a refinement in adt type declaration?
  ;; ? means it may or may not be there
  ;; it makes sense to have the number here, as it can't be confused with an
  ;; expression type
  data-declaration
    ::= type u#<usage>? <non-capital-name> <symbols>+ : <expression> = <adt>
      | type u#<usage>? <non-capital-name> <symbols>+                = <adt>

  adt ::= <sum1> <sum>*
        | <product1>


  sum1 ::= <name>
         | <name> <product>

  ;; | is special in bnf
  sum ::= \| <name>
        | \| <name> <product>

  product1 ::= { <name-type-comma>+ <name-type>* } -\> <type-name>
             | { <name-type-comma>* }              -\> <type-name>
             | { <name-type-comma>+ <name-type>* }
             | { <name-type-comma>* }


  product ::= { <name-type-comma>+ <name-type>* } -\> <type-name>
            | { <name-type-comma>* }              -\> <type-name>
            | { <name-type-comma>+ <name-type>* }
            | { <name-type-comma>* }
            | : <expression>
            | <expression''>*

  name-type      ::= <name> <usage>?  <type-signature>
                   | #<name> <usage>? <type-signature>
  name-type-comma ::= <name-type> <comma>


  ;; { } here are a refinement type!
  type-refine ::= <expression> { <expression> }
                | <expression>
                | <name> : <expression>
                | #<name> : <expression>
                | <name> : <expression> { <expression> }
                | #<name> : <expression> { <expression> }

  ;;;; Arrows =========================================================

  arrow ::= -<usage>-\>

  ;;; Modules ========================================================

  ;; For all intents and purposes, modules are the same as values, just with
  ;; top level expressions, and a minimal amount of sugar

  ;; This doubles as our import
  module-open ::= open <module-name>


  module-open-expression ::= open <module-name> in
                           | <module-name>.( <expression> )

  ;; We are going to make modules a bit more interesting, syntax wise
  ;; imagine modules were functions with capital name to delineate
  ;; thus module signatures have the same signature look as functions

  ;; Interestingly enough module "functors" can take more than just modules
  ;; they can take any value, however for examples, we will encourage the use
  ;; of taking (and thus parameterizing) modules

  ;; let and end are there until indent sensitivity is in
  module ::= mod <name> <args>+ = <top-level>* end
           | mod <name> <args>+ <cond-top>* end


  ;; we use mod here to disambiguate it for parsing speed
  moduleE ::= mod <name> <args>+ = <top-level>* end in <expression>
            | mod <name> <args>+ <cond-top>* end in <expression>

  ;; what should we allow in a module signature?
  ;; begin and end are only here while we aren't indent sensitive



  module-signature ::= <module-signature-helper> Module <sigs-and-types>+ end

  sigs-and-types ::= <sig>
                   | <module-signature>
                   | <type>

  module-signature-helper
    ::= sig <name>           :                 <expression>
      | sig <name>           : <type-name> =\> <expression>
      | sig <name> <usage-f> :                 <expression>
      | sig <name> <usage-f> : <type-name> =\> <expression>


  cond-top ::= \| <expression> = <top-level>*
  ;;;; Types Classes ==================================================

  ;; Need end if we are indent sensitive!
  type-class ::= class <type-name> where
               | class <type-name> =\> <type-name> where

  ;; Need end if we are indent sensitive!
  type-class-instance ::= instance <type-name> where

  ;;;; Expressions ====================================================

  ;; See comments about which to keep and which to maybe remove
  expression'' ::= <match>
                 | <if>
                 | <cond>
                 | <record-access>
                 | <module-lookup>
                 | <let>
                 | <moduleE>
                 | <let-type>
                 | <module-open-expression>
                 | <where>
                 | <string>
                 | <number>
                 | <lambda>
                 | <tuple>
                 | <list>
                 | <parens>
                 | <symbol>
                 ; This is useful for having nested do's or matchs
                 | <block>
                 | <do>
                 | <comments>
                 | <arrow>
                 | <infix>
                 | <record-creation>
                 | <type-refine>
                 ; TODO
                 | <record-update>
                 | <declaration-expression>

  expression ::= <application>
              | <expression''>

  usage ::= <expression>

  usage-f ::= <constant> | ( <expression> )

  record-access ::= <name>.<name>

  module-lookup ::= <module-name>.<name>

  application ::= <name> <expressions>*

  lambda ::= \\ <match-logic>* -\> <expression>

  symbol ::= <name>

  ;; useful for match, and nested do's!
  block ::= begin <expression> end

  do ::= <do-body>*

  do-body ::= <expression> \; <expression>

  list ::= [ <comma-list>* ]

  comma-list ::= <expression> , <expression>


  tuple ::= ( <comma-tuple>* )

  comma-tuple ::= <expression> , <expression>

  parens ::= ( <expression> )

  comments ::= -- <any-text-not-new-line> \n
             | \n-- <any-text-not-new-line> \n
             | <comments-rec>

  comments-rec ::= <multi-comments>
                 | {- <comments-rec> -}

  multi-comments ::= {- <any-text-not-{-> -}

  infix ::= <expression> <infix-symbol> <expression>

  record-creation ::= { <name-set-comma>* }


  name-set-comma ::= <name-set-creation> ,
                   | <name-set-creation>

  name-set-creation ::= <name> = <expression>
                      | <name>


  ;;; Matching ===================================

  match ::= case <expression> of <match-l>*

  match-l ::= \| <match-logic> -\> <expression>

  match-logic ::= <name>@<match-logic'>
                | <match-logic'>

  match-logic' ::= ( <match-logic''> )
                 | <match-logic''>

  match-logic'' ::= <record-match>
                  | <constructor-name> <match-args>+
                  | <constant>

  match-args ::= <name>
               | <match-logic>
               | <constant>

  record-match ::= { <name-set-comma-match>* }

  name-set-comma-match ::= <name-set> ,
                         | <name-set>


  name-set ::= <name> = <match-logic>
             | <name>

  ;; we should remove either if or cond!?
  if   ::= if   <cond-logic>*
  cond ::= cond <cond-logic>*

  constant ::= <string>
             | <number>
             | <char>

  ;;; Bindings ===================================

  ;; Due to trying to be less indent sensitive,
  ;; we only look for the in alternative,
  ;; is that we only have a single binding per let.
  let ::= let <function-body> in <expression>

  type-let ::= let <type> in <expression>

  ;; Does this even make sense to have?
  ;; Juvix is not lazy, how is order determined?
  ;; is it only for pure values???
  where ::= <expression> where <bindings>*

  binding ::= <match-logic> = <expression>


  ;; note it's fine to use |,
  ;; as matches have to be a pattern,
  ;; and thus not some expression

  ;; note in stdlib else and otherwise will both be true

  cond-logic ::= \| <expression> = <expression>

  ;;; Numbers ====================================

  number ::= <digits>*.<digits>*
           | <digits>*<exp>
           | <digits>*.<digits>*<exp>


  digits ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9


  exp ::= e <digits>*
  ;;; Strings ====================================

  ;; Give nicer string syntax?
  string ::= " <escaped-string>+ "

  escaped-string ::= <ascii-no-quotes-no-backslash> <escaped-string>+
                   | \" <escaped-string>+
                   | \ <escaped-string>+

  ;;; Universe ====================================

  ;; for now, set it up to what F* has, expand it later
  universe-expression ::= u#<name>
                       | u#<name> + u#<name>
                       | max u#<name>*

  ;;;; Misc ===========================================================
  ;; ; is comment in bnf
  comma            ::= ,
  semi             ::= \;
  name             ::= <utf8-non-reserved>
  non-capital-name ::= <utf8-no-capital>
  capital-name     ::= <utf8-capital>
  ;; may want m e or Map.t int string?
  type-name   ::= <name> <others-names>+
  infix-symbol ::= <utf8-infix>

  module-name ::= <name> ; enforce capital names?

  constructor-name ::= <capital-name-and-symbols>

  utf8-infix        ::= `<utf-non-reserved>`
                      | <UTF.Symbol>
  utf8-non-reserved ::= <UTF.Alpha>
                      | (<utf8-infix>)
  utf8-no-capital   ::=
  utf8-capital      ::=

3 - Idioms

Reasons for various idioms in the codebase

The document to see the rationale for any of our idioms

Qualified Imports

Throughout the codebase every import except the standard library is imported qualified. This allows us the flexibility of having common naming patterns between modules.

Some of the more obvious ones are

  • T

  • op

  • of<Type>

  • to<Type>

  • etc.

Further, we often import our modules with long names, like NameSymbol for Juvix.Library.NameSymbol.

Consider the following example if we have short qualification names.

  import qualified Juvix.Library.NameSymbol as N


  ... 100+ lines of logic ...


  foo :: N.T -> Symbol -> Symbol
  foo ns symb = N.toSymbol ns <> symb


  ... more lines of logic ...

In complicated logic where we have over a dozen imports, it's quite taxing to see N.toSymbol: what is this N? However, it becomes very clear if we instead see NameSymbol.toSymbol

  import qualified Juvix.Library.NameSymbol as NameSymbol


  ... 100+ lines of logic ...


  foo :: NameSymbol.T -> Symbol -> Symbol
  foo ns symb = NameSymbol.toSymbol ns <> symb


  ... more lines of logic ...

Further, it makes our standard idiom of to<Type> (toSymbol) a bit harder to reason about: what is going to Symbol in this instance? We'd be forced into names like N.nameSymbolToSymbol, essentially removing any standardization of modules and ensuring that we import everything unqualified.

Module.T

Most modules contain a type T within them. For example we have NameSymbol.T. The T represents the quintessential type of the module. The decision is a consequence of having long qualified names for each module imported.

Consider the following scenario where we did not make this choice, and instead had the full name.

  import qualified Juvix.Library.NameSymbol as NameSymbol


  ... 100+ lines of logic ...


  foo :: NameSymbol.NameSymbol -> Symbol -> Symbol
  foo ns symb = NameSymbol.toSymbol ns <> symb


  ... more lines of logic ...

Here we are quite tempted to import the module as unqualified for this one type, as it's obvious what we mean! Further, it might encourage short import names like N which we discuss in the previous section.

The name T over Ty or any other variant comes from OCaml, where modules conventionally name their principal type t. In OCaml this has practical benefits for module functors, which expect the module to have some principled type t to be generic over. This has the side effect that principled functions and types are named consistently throughout modules, which helps us get an intuition about the code without having to ask what the point of any particular module is.

Capability

Capability is our replacement for MTL in the codebase. See the article linked below for a detailed explanation of the benefits over MTL.

More can be read in our sub article below

3.1 - Capability

Further expanding upon why we chose capability

Usage of the Capability Library

  • Historically, Haskell has been written with the mtl library. This library allows monads to stack on top of each other in a linear fashion. This is very nice, as it allows the plumbing of how each layer interacts to be abstracted away. However, these advancements do not come without costs, which are outlined below.

    • Downsides with MTL

      • ReaderT design pattern.

        • This means that as our monad stacks grow and our state monad fails to deal with concurrent programming, we will have to rewrite the backend to use IO over a single Reader Monad.

      • Explicit heavy use of lenses

        • Due to every state being forced into a single reader monad, and not wanting to couple every function with everything in the reader environment, lenses with views into the struct itself must be used instead.

      • Inefficiencies in execution

        • Due to MTL Working as a list of effects rather than a set of effects, the more layers one adds to one's MTL stack, the slower the execution speed is.

        • Even worse, is that the writer monad always leaks space and thus should be avoided

  • Solving These Issues with the Capability library

    • Fundamental ideas

      • Instead of having a list of monads one linearly stacks, we instead have a set of effects. This set may state that a database connection is read-only or that some shared thread resource is readable and writable.

      • The set of effects is, in reality, a single monad with views into the structure itself. So a read-only view mimics the reader monad, while a write-only view mimics the writer monad.

      • This illusion is achieved by using a feature in GHC 8.6 known as deriving via, which allows us to derive the proper interface and effect for each field in our environment (a sketch follows this list).

    • ReaderT design pattern

      • This issue goes away, as the "State" effect could be implemented as either a proper state monad (we currently do this in the source code) or as a reader monad. This choice of how some computation gets computed is separated from the function logic itself.

    • Explicit heavy use of lenses

      • This issue goes away, as the deriving via extension derives the correct interfaces for us in a straightforward way

        • Currently it is a bit tedious, but it's obvious in how tedious it is, thankfully.

    • Inefficiencies in execution

      • GHC thankfully knows how to optimize a single layer monad like Reader or State very well, so performance should be great.
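Below is a minimal sketch of this pattern, loosely following the capability documentation; the Env record, the tags, and the exact deriving-via combinators are assumptions and may differ across library versions.

  {-# LANGUAGE DataKinds #-}
  {-# LANGUAGE DeriveGeneric #-}
  {-# LANGUAGE DerivingVia #-}
  {-# LANGUAGE FlexibleContexts #-}
  {-# LANGUAGE TypeApplications #-}

  import Capability.Reader
  import Capability.State
  import Control.Monad.Reader (ReaderT (..))
  import Data.IORef (IORef)
  import GHC.Generics (Generic)

  -- One environment holds everything; capability derives a view per field.
  data Env = Env
    { counter :: IORef Int,
      name :: String
    }
    deriving (Generic)

  newtype App a = App (ReaderT Env IO a)
    deriving (Functor, Applicative, Monad) via ReaderT Env IO
    -- a state effect implemented by the IORef field of the environment
    deriving
      (HasSource "counter" Int, HasSink "counter" Int, HasState "counter" Int)
      via ReaderIORef (Field "counter" () (MonadReader (ReaderT Env IO)))
    -- a read-only view of another field of the same environment
    deriving
      (HasSource "name" String, HasReader "name" String)
      via Field "name" () (MonadReader (ReaderT Env IO))

  -- Functions state which effects they need, not a stack order.
  greet :: (HasState "counter" Int m, HasReader "name" String m) => m String
  greet = do
    n <- get @"counter"
    put @"counter" (n + 1)
    who <- ask @"name"
    pure (who <> ": " <> show n)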

Further Improvements

Currently the capability library only exposes each field of a record as an effect. Ideally we'd have effects describe certain kinds of computation instead, and logically bundle these fields into some kind of greater effect.

Tomas has brought to my attention the eff library, which may someday replace or aid our usage of capability.

4 - Juvix

The one stop shop for all of Juvix documentation


5 - Overview

An overview of the compiler pipeline process.

The document is laid out so the first section of each part is a small overview, and sub-pages below that are more detailed explanations

Overview

The documents in this section give a brief and detailed explanation of many parts of our compiler. The more detailed explanations have some overlap with the design section.

Getting Started

Getting started with the Juvix codebase can be confusing, to say the least. Thus we outline here the order of sub-pages that may be useful:

  1. Pipeline

    • Gives an overview of the compiler pipeline

  2. Frontend

    • Talks about the Parsing and S-expression transformation phase.

  3. Translate

    • Talks about the Passes

  4. Type Theory

  5. Typechecker

  6. Core

  7. Backends

    1. Michelson

  8. Misc

    1. S-expression

5.1 - Core

Talks about various aspects of the Core code-base

Type Theory

Our Core type theory, called "Quantitative Type Theory" ("QTT"), is derived from the paper I Got Plenty o’ Nuttin’. Modifications and extensions are documented here.

Extensions/Annotations

Core works off a system of annotations/extensions. These are currently crucial for understanding what happens to the data structures in Core.

More can be read about our extensible system here.

HR

HR is the first step into Core from the frontend language. This phase is a lot like IR (read below), however it includes names rather than de Bruijn indices. Due to this, the phase is rather short-lived.

Characteristics of HR

  • Is an extension of IR with names

  • Short Lived

IR

IR is the representation of Core on which all other extensions are based. In order to facilitate fast evaluation, Core is broken up into terms and values. Values are like terms except that they have no β-redexes, by construction of their type: a λ applied to arguments is statically ruled out. This means values are in normal form.

Characteristics of IR

  • de Bruijn indices

  • Extensions via Extensible

  • Idea of terms and values

    • Values have no β-redexes by construction (see the sketch below)

      • Free names can still be present, and could be inlined

      • Reduction is impossible if the free names refer to a data type function
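
To see what "no β-redexes by construction" means, consider the following stripped-down sketch (hypothetical types, far simpler than the real extensible ones in IR/Types/Base.hs). In a Term, a λ can head an application by way of an annotation, but in a Value the head of an application must be a Neutral, and a Neutral can never be a λ:

  type Name = String

  -- terms: the things we type check; β-redexes can occur
  data Term
    = TStar Int        -- universe
    | TLam Term        -- λ-abstraction (de Bruijn style, so no binder name)
    | TElim Elim       -- an embedded elimination

  data Elim
    = EBound Int       -- bound variable as a de Bruijn index
    | EFree Name       -- free (e.g. global) variable
    | EApp Elim Term   -- application
    | EAnn Term        -- an annotated Term: how a λ can head an EApp

  -- values: fully evaluated; a β-redex cannot even be written down
  data Value
    = VStar Int
    | VLam Value
    | VNeutral Neutral -- a "stuck" computation

  data Neutral
    = NBound Int
    | NFree Name
    | NApp Neutral Value -- head is Neutral, never a λ: no β-redex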

Extension Layout and mapping

  • Core-Diagram.jpg
    • In this diagram we can see various mappings that happen through the extensions of core.

  • Main representation

Type Checker

Type checking is the first phase of IR. This phase annotates every node with its type and usage. To facilitate fast type checking, every annotated type is a fully evaluated value. Other phases, like the unifier, use this concept as well.

Erased

After the code gets passed through IR, we then want to erase all the type annotations we've built up. To do this, we send the code through the Erased representation. Unlike HR, Erased is not an extension of IR.

Erased also erases all forms which have 0 usages. Since usages are not first class yet, this is rather simple.

Another key characteristic of Erased is that types are separate from Terms as they are currently assumed to have 0 usages. Thus types are always erased and separated out.

  • This may change with time.

ErasedAnn

  • This is the backend specific form that Michelson uses.

5.1.1 - HR

Replaces bound and free with just names, has names in the binders

  • Extension of IR

  • replace free and bound by var

  • Contains a map of NameSymbols

Place for Pretty printing

Substitution

Translates into IR

  • State monad with traversal

5.1.2 - IR

5.1.2.1 - Representation

Representation

Global

  • A lot of the global information predates the frontend context. A lot can probably be integrated.

  • In Raw Globals, the type is a term, which may not be correct (it is what the user wrote); in regular Globals, the types are values, which can be checked for equality more easily.

    • Only safe to eval something after it's been type checked

    • Way to unify?

5.1.2.2 - Extensions

Extensions

Extensions are an important part of Core IR. They allow certain kinds of transformations.

Transformation

5.1.2.3 - Evaluator

Evaluation of the internal representation in Juvix plays an important part in the compilation pipeline: it simplifies the program before passing it to the backend. As the internal representation is closely related to the λ-calculus with de Bruijn indices, the evaluator implements all the functions necessary for this.

In addition to the evaluation functions, the module provides functions for inlining globals (inlineAllGlobals / inlineAllGlobalsElim), turning function definitions into anonymous functions (toLambda), lookup functions for variables and filtering functions for extensions.

For more information on the λ-calculus with de Bruijn indices, see the Wikipedia article. This document describes the way the theory is implemented for Juvix, and expects a basic understanding of the λ-calculus with de Bruijn indices.

Substitution boils down to replacing the variables bound under the current λ. After this, the free variables in the term need to be updated to reflect the removal of an abstraction layer; this is the task of weakening.

Substitution

Substitution lies at the core of evaluation, replacing bound variables with terms. Currently the compiler contains several closely related substitution implementations, both for terms and patterns. All of them serve the same purpose, but differ slightly in their types.

Term substitution

  • substWith: Substitutes an IR.Elim' into any term a, and results in a term of type a.

  • substTermWith: Substitutes an IR.Elim' into any term a, and results in an IR.Elim'.

  • gsubstWith: Generic substitution of an IR.Elim' into an f t, resulting in an f t again. This generic version is used for automatic derivation of instances using GHC.Generics.
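
As a rough illustration of what the term-substitution functions do, here is a toy substitution over the sketch Term/Elim types from the Core overview above (the real functions are generic over extensions, and weaken the substituted term when passing under a binder):

  -- replace the bound variable with index i by `new`
  subst :: Int -> Elim -> Term -> Term
  subst _ _   (TStar u)   = TStar u
  subst i new (TLam body) = TLam (subst (i + 1) new body) -- NB: should also weaken `new`
  subst i new (TElim e)   = TElim (substElim i new e)

  substElim :: Int -> Elim -> Elim -> Elim
  substElim i new (EBound j)
    | i == j    = new           -- the variable we are substituting for
    | otherwise = EBound j
  substElim _ _   (EFree n)  = EFree n
  substElim i new (EApp f x) = EApp (substElim i new f) (subst i new x)
  substElim i new (EAnn t)   = EAnn (subst i new t)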

Pattern substitution

These functions are for substituting patterns; each of them takes:

  • A number indicating the current traversal depth.

  • An IR.PatternMap containing pairs of pattern variables and terms.

All the functions return an Either, with a PatternVar in case no substitution has happened, and the resulting term in case of a substitution.

  • patSubst': Substitutes any term a with any of the patterns provided in the map, and results in an a.

  • patSubstTerm': Substitutes any term a with any of the patterns provided in the map, and results in an IR.Term'.

  • gpatSubst': Generic substitution of any term f t with any of the patterns provided in the map, resulting in an f t. This generic version is used for automatic derivation of instances using GHC.Generics.

Weakening

Weakening or shifting of de Bruijn indices is a central part of the evaluation. After performing β-reduction, any free variables need to be decreased to account for removal of the λ-abstraction.

The basic weakening function is weakBy', which takes:

  • A natural number, denoting the amount by which to shift free variables.

  • A cutoff value, for which all variables less than or equal are bound, and all variables greater than that are free. Typically this value will be 0, to denote that we want to account for removal of the outermost abstraction.

  • A term a to perform weakening / shifting on.

weakBy' will return a term a again, on which the weakening has been executed. The generic function gweakBy' is similar, but takes an f t as term and returns an f t. It is used for automatic deriving of instances using GHC.Generics.
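
A toy version over the same sketch types may make the cutoff behavior clearer (the real weakBy' takes a Natural and is extension-generic via gweakBy'; this one is not):

  -- shift free variables by b, treating indices <= cutoff as bound
  weakBy' :: Int -> Int -> Term -> Term
  weakBy' _ _      (TStar i)   = TStar i
  weakBy' b cutoff (TLam body) = TLam (weakBy' b (cutoff + 1) body) -- one more binder in scope
  weakBy' b cutoff (TElim e)   = TElim (weakElim b cutoff e)

  weakElim :: Int -> Int -> Elim -> Elim
  weakElim b c (EBound i)
    | i > c     = EBound (i + b) -- free relative to the cutoff: shift it
    | otherwise = EBound i       -- bound: leave untouched
  weakElim _ _ (EFree n)  = EFree n
  weakElim b c (EApp f x) = EApp (weakElim b c f) (weakBy' b c x)
  weakElim b c (EAnn t)   = EAnn (weakBy' b c t)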

5.1.3 - Original QTT Erasure algorithm

This article explains the QTT erasure algorithm in higher-level language, for easier digestion

Original QTT Erasure algorithm

Disclaimer: this text only explains the QTT erasure algorithm; you must already be familiar with how QTT works.

Initially, the algorithm assumes that:

  • any typechecked term $t : \rho > T$ elaborates to a program p1;
  • $\rho \neq 0$, since you want to make sure that your program actually has something to execute before translating it into runnable code;
  • the translation is into the untyped $\lambda$-calculus.

The whole algorithm traverses the tree of terms elaborating programs – given that dependent types are the zero resource being erased. For a more complex language, we should probably not merge QTT erasure with an actual elaboration phase.

The only terms affected by erasure are binding-dependent terms; in McBride’s calculus, these are $\lambda$-abstraction and application.

The algorithm goes as follows:

  • for $\lambda$-abs with a zeroed resource,
    • you have a variable $0 > x : S$ 2 bound to a body $\rho t : T$;
    • it elaborates to a program p of type $T$, meaning that $x$ won’t be consumed in p – as it was evaluated already – and then p can proceed.
  • for $\lambda$-app with a zeroed resource,
    • you have a function $f$ that takes a variable $0 x : S$;
    • while within your context, you have a zeroed value $s : S$;
    • given $f[s : S/x]$, p elaborates to the body $t : T$ of $f$, meaning that what’s left of $f$ is the resulting function once $s$ is applied.
  • for all remaining terms, elaborate them as is.

The main idea of the paper is to pre-bind zeroed resources, making it unavailable to the elaborated program p. The main trick of the paper is to pair zeroed resources from $\lambda$-abs with zeroed resources from $\lambda$-app. McBride’s paper provides a proof that a zeroed application will always meet a zeroed $\lambda$ abstraction.
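
The whole recipe fits in a few lines over a miniature usage-annotated λ-calculus (hypothetical types for illustration; Juvix's real erasure works over the annotated Core terms):

  data Usage = Zero | One | Many

  data QTerm                    -- typechecked, usage-annotated term
    = QVar String
    | QLam Usage String QTerm   -- the binder carries its variable's usage
    | QApp QTerm Usage QTerm    -- the application carries the argument's usage

  data UTerm                    -- plain untyped λ-calculus
    = UVar String
    | ULam String UTerm
    | UApp UTerm UTerm

  erase :: QTerm -> UTerm
  erase (QVar x)           = UVar x
  erase (QLam Zero _ body) = erase body -- zeroed λ-abs: drop the binder
  erase (QLam _ x body)    = ULam x (erase body)
  erase (QApp f Zero _)    = erase f    -- zeroed λ-app: drop the argument
  erase (QApp f _ arg)     = UApp (erase f) (erase arg)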

Resources

  1. I Got Plenty o Nuttin by Conor McBride

  1. McBride’s paper actually splits the whole semantics into checking and synthesis, following bi-directional typechecking. Implementing the recipe bi-directionally would be ideal since it provides a literal algorithm, here we explain it by assuming both checking and synthesis mean just checking. ↩︎

  2. The 0 prefix means that the term has 0 usage. ↩︎

5.1.4 - HR IR Conversion

This article goes over some of the ideas behind HR to IR conversion

HR ⟷ IR Conversion

The Pass which can be found here deals with the conversion between Core HR and Core IR.

HR ⟶ IR

This phase simply removes the names from the HR form and constructs de Bruijn indices from the names given in HR.

  • The algorithm for this for Terms and Elims is rather simple

    1. When we see a binder (Pi, Lam, Sig, or Let), push the name onto the stack for the form it scopes over. Recurse, and reconstruct said binder.

    2. For forms without names, recurse and reconstruct.

    3. When a name appears, look up its index on the stack; this is the de Bruijn index we care about. If the index is not there, then we mark the name as a Global (see the sketch below).
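
The essence of steps 1 and 3 can be sketched as follows (a hypothetical helper; the real pass works over the extensible HR/IR types and also has to deal with pattern variables):

  import Data.List (elemIndex)

  data Var = Bound Int | Global String

  -- at a binder we recurse with (binderName : stack); resolving a name
  -- then amounts to finding its position on the stack
  resolve :: [String] -> String -> Var
  resolve stack name =
    case elemIndex name stack of
      Just i  -> Bound i     -- the stack position is the de Bruijn index
      Nothing -> Global name -- unknown names are marked as Global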

HR patterns work in a different way: since patterns are disconnected from terms themselves and do not interact, they work on a system of registering pattern variables when we come across a binder form.

Outstanding Questions

  • Since for Terms and Elims we don't account for pattern variables, and just mark everything unknown as Global, how do Patterns affect this? It seems we would mark any PatternVar above this form as a Global and lose that information.

IR ⟶ HR

This conversion adds back names to the IR form. This is mainly used for Pretty Printing.

  • The algorithm for Terms and Elims is once again rather simple

    1. When we see a binder (Pi, Lam, Sig, or Let), generate a new name and push it onto the stack. Recurse, and reconstruct said binder.

    2. Forms without names just recurse and reconstruct.

    3. When a name appears, look up the index on the stack; the name at that position is the name we care about.

5.1.5 - Type Theory

The type theory implemented by Juvix

5.1.5.1 - Foundations

Foundations of Juvix’s type theory

Quantitative Type Theory

Core's type theory, called "Quantitative Type Theory" ("QTT"), is derived from the paper I Got Plenty o’ Nuttin’. Further references for QTT include:

5.1.5.2 - Inductive data types

Juvix’s inductive data types

Juvix extends QTT with inductive data types.

5.1.6 - Typechecker

Core IR types and type-checking

5.1.6.1 - Overview

Overview of Typechecker’s place in Core

Where Typechecker fits into the pipeline

The typechecker is called via typeTerm, which in turn is called by typecheckErase and typecheckEval, ultimately via coreToAnn; coreToAnn is called by the backend-specific typecheck functions (which will soon be factored out into a common Pipeline.typecheck function), which fit between Pipeline.parse and Pipeline.compile.

I believe that once function typechecking is implemented, typecheckErase and typecheckEval will also need to call typeCheckDeclaration.

Overall structure

IR/Types/Base.hs defines the elements of the syntax of QTT: terms, values, variables, and patterns.

IR/Types/Globals.hs defines the main compound types derived from the Base types: functions and datatypes.

IRAnn/Types.hs defines the typechecker-specific annotations which correspond to the kinds of terms in QTT.

IR/CheckDatatype.hs and IR/CheckTerm.hs enforce the typing rules of QTT.

Interactions with other components

  • The typechecker imports back-end-specific types through Parameterisation.

  • Characteristically of a dependently-typed system, the Typechecker uses types and functions from the Evaluator, for example in:

    • Typechecker's CanTC and Error types

    • typeCheckConstructor, which has to evaluate the terms of the telescope passed to it

    • typeTerm' and typeElim', which must perform reductions on terms to perform type-checking, in accordance with QTT's PRE and POST rules (characteristically of a dependently-typed theory)

Open questions / things I don't understand

5.1.6.2 - Correspondences

Correspondences between Typechecker code and QTT paper

Correspondences between Typechecker and QTT typing/evaluation rules

In this section we examine where the formal elements of Juvix's implementation of Quantitative Type Theory appear in our code.

  • Definitions 1 and 2 address generic rigs (rings without negation) as well as the 0, 1, many rig used by Idris-2. Neither of those has an explicit analogue in our code, since we use one specific rig, and not the one used by Idris-2 – rather, we use the natural numbers plus omega, which we call NatAndw.

  • Definition 3 defines sort ordering as a transitive relation on sorts which allows well-founded induction. This will be our subtyping relation, which is defined by the <: operator in IR/CheckTerm.hs.

  • Definition 4 is of term and elimination, which correspond to our Term and Elim defined in IR/Types/Base.hs.

  • Definition 5 is of contraction, reduction, and computation. Contraction is a single step of beta-reduction (substitution) or nu-reduction (stripping an annotation, which we can do once a term is fully computed), corresponding to "small-step" semantics. Computation is the reflexive-transitive closure thereof, corresponding to "big-step" semantics. We refer to computation as Evaluation, so the implementation of computation in our code is evalTerm (see the Evaluator documentation).

  • Definition 6 is of pre-context and context. The former is what is usually just called "context" in type theories – an association list of names to types. In QTT, the context is a mark-up of a pre-context with usages. In our typechecker, the pre-context is Context, and the mark-up is UContext.

  • Definition 7 is of prejudgment, which is what is usually just called "[type] judgment" in type theories; as QTT is presented bidirectionally, there is both a checking prejudgment (a Term is checked against a given type) and a synthesis prejudgment (a given type is inferred for an Elim). Definition 8 is of a judgment as a prejudgment annotated with a usage. We don't explicitly use the term "judgment" in our QTT code, but the type judgments and usage judgments are captured by the Typechecker's specific extension to Annotation.

  • Definition 9 is of the type checking and synthesis rules. (This is extended later by definition 39, which includes erasure and computation. Those two are primarily subjects of other documents.) Below are some correspondences between the typing rules (from definition 9) and the implementations in our code. The explicit implementation of these rules is concentrated within typeTerm' and typeElim', and their helper functions, in CheckTerm.hs.

    • PRE, POST: These rules, which state that a type may be used in place of another if the latter reduces to the former, are implemented in the IRAnn' case of typeElim', and as calls throughout the cases of typeTerm' to evalTC and substApp (which calls evalTC internally).

    • SORT: The IR.Star' case of typeElim'. Stratification appears not to be fully implemented as of this writing – the call to requireUniverseLT is commented out.

    • VAR: The IR.Bound' and (two – one for globals and one for patterns) IR.Free' cases of typeElim'.

    • FUN: IR.Pi' case of typeTerm'

    • LAM: IR.Lam' case of typeTerm'

    • APP: IR.App' case of typeElim'

    • ELIM: The requireSubtype call in the IR.Elim' case of typeTerm'.

    • CUT: The call to typeElim' itself, in the IR.Elim' case of typeTerm'.

  • Definition 17, of subtyping, is implemented by the <: operator in IR/CheckTerm.hs.

  • Definitions 10-16 and 18-36, and all the Lemmas, all relate to reasoning about the typing system, rather than defining the type system, so they do not have direct analogues in our code (except maybe in some comments!). They could, however, if we were to implement Juvix in a dependently-typed language (or to at least some degree if we were to use Liquid Haskell) – such as Juvix itself.

  • Definitions 37 and on refer to Erasure and to computation (which we call Evaluation), each of which we document as a separate module, so we do not explore them here, except insofar as we touched on some aspects when exploring definitions 9 and 39 together above.

5.1.6.3 - Modules

Documentation of individual Typechecker modules

Definitions by module (beneath Core/)

Types

Types used throughout the typechecker (at least I think that's the intent):

  • PipelineError

    Includes TypecheckerError, which we currently exit with when a user program is incorrectly typed. (I expect the long-term plan is to produce from it a human-readable error message.)

  • PipelineLog

    Used with @"log" (from Capability.Writer) to log a couple of types of events.

  • TermAssignment, AssignWithType

    Maps from variables to formulae of Elementary Affine Logic (EAL) / Elementary Affine Core (EAC). These terms are translated to interaction nets, which Juvix uses as its execution model.

    I didn't find any uses of these types that weren't ultimately within the InteractionNetIR module, so they should probably be moved there.

IR

Aliases for types and functions from IR/Typechecker.

IR/

  • Types

    Aliases for types and functions from IR/Types/Base and IR/Types/Globals, and some simple queries on them. There is one significant pair of mutually recursive functions here: quote :: Value -> Term and neutralQuote :: Neutral -> Elim. (As we shall see below, a Neutral is a particular kind of Value, and may include a Value, hence the mutual recursion.)

  • Types/

    • Base

      • Definitions, in terms of primitive types, of several internal types. For example, a BoundVar is represented by a Natural, and a PatternVar by an Int. Names are either Globals or Patterns; the implementation of GlobalName is defined in the NameSymbol library.

      • GlobalUsage is defined here and restricted to only Zero or Omega, but there is a TODO comment asking whether globals might have other usages.

      • Several of the most pervasive data types are defined here, all as extensible, in the sense of trees-that-grow, meaning that different modules can extend them in their own ways, to use their own specific types of annotations, producing versions with a ' suffix:

        • Term Since Juvix is dependently-typed, a type (PrimTy) is one kind of term.

        • Elim An Elim may be a bound variable, a free variable, an application of an Elim (to a Term), or a usage annotation. An Elim is also a term (it may be embedded in a Term with the Elim constructor). Elim is so-called because McBride calls this type an "elimination" in I Got Plenty o' Nuttin'.

          Elim may be viewed as a term that can be inferred (synthesized).

        • Value Since Juvix is dependently-typed, a type (VPrimTy) is one kind of value. Value's definition strongly resembles that of Term, but there are differences; for example, there are let Terms, but no let Values.

        • Neutral Neutral is to Value as Elim is to Term: A Neutral is a bound variable, a free variable, or an application of a Neutral (to a Value); a Neutral is also a Value (it may be embedded in a Value with the VNeutral constructor).

          A Neutral may be viewed as a "stuck term": one for which typechecking must be deferred until some other term(s) has/have been typechecked. (Perhaps this is the source of the name "neutral" – it's neither good (definitely typechecks) nor bad (definitely doesn't typecheck) yet?)

        • Pattern Patterns contain PVars, which are bound by successful pattern matches, as well as primitive values, terms, and pairs and lists of Patterns.

    • Globals

      • Many Constraint-valued definitions which allow multiple instances of various classes for various types to be declared at once. (For example, RawGlobalAll allows any type which satisfies it to derive Show, Eq, and Data, among others.)

      • Several more pervasive types, built upon those in Base:

        • Pos, representing strict positivity (or otherwise)

        • FunClause', which defines the form of Juvix function definitions. It comprises arguments (a Telescope), a pattern match (a list of Pattern'), a body, a type, and Bool fields indicating whether the clause is a catchall and/or unreachable.

        • A Function' comprises a NonEmpty list of FunClause's.

        • The widely-used DataArg' includes a name, a usage, and a type.

        • A Datatype' includes a name, DataArg's (each with a Pos), a universe level, and a list of constructors (DataCon').

        • A type constructor – DataCon' – includes a name, a type, and the function which defines it (whose output type is the type of the constructor).

        • Raw forms of the above. In each case, the Raw form uses Term where the non-Raw (Cooked?) form uses Value.

  • Typechecker

    There's only one definition in this module, but it's a big one: typeCheckDeclaration, which takes a telescope, a list of datatypes, and a list of functions, and calls typeCheckAllCons (see below) on each datatype in the list. If type-checking is successful, it adds the datatype and its constructors to the globals.

    As I write, typeCheckDeclaration is incomplete: it doesn't use one of its parameters, the one comprising a list of function declarations; furthermore, nothing actually calls it yet (except itself, recursively). That is because we simply haven't gotten to writing the typechecking of functions yet (which is also why the backend-specific typecheck functions currently inline everything recursively starting with main).

  • Typechecker/

    Some of the types in Typechecker/ are parameterized over back-ends through the Parameterisation type which each back-end exports.

    • Types

      Contains typechecker-specific extensions of some core types:

      • Annotation': Adds usage and type.

      • BindAnnotation': Adds an Annotation' for both the binder and the term that introduces the binder (Lam or Let).

      • Term': Annotations for each of the constructors of Term (pairs, pi-types, sigma-types, Elim's, and so on).

      • Elim': Annotations for each of the constructors of Elim (bound and free variables and applications of them).

    • Env

      The typechecker's extension of EnvCtx is a GlobalsT, which is a (hash)map of names to Global's, which in turn may be datatypes, data constructors, functions, or Values (with names and usages). Env therefore also defines a HasGlobals alias for HasReader "globals".

      The interaction with the evaluator, as required by dependent typing, appears in this module via the CanTC type, which requires instances for Eval.HasSubstValue, Eval.HasPatSubstTerm, and Eval.HasWeak.

      This module also defines a Context as a list of the typechecker-specific form of Annotation. There is also a usage context, UContext, which is a list of Usage.

      Pattern bindings and usages – PatBinds and PatUsages – are defined as IntMaps to annotations and usages, respectively, reflecting the use of de Bruijn indices.

      Env also defines the typechecker's State monad: InnerState, which contains a Parameterisation, a PatBinds map, and a Context. (There's a lot of boilerplate related to it.)

      Env also defines signatures: SigDef is a sum of function, constructor, and datatype signatures. Signature maps global names to SigDefs, so it represents all the signatures in global scope. Typechecking functions return Signatures through the TypeCheck type, which uses Capability, with the name "typeSigs".

    • Error

      Defines which errors can occur during typechecking. TypeMismatch, for example, represents a unification error on a variable. Other errors largely follow the possible types of terms – universe mismatches, usage mismatches, non-function types where function types are expected, incompletely-applied data constructors, and unbound variables. Again because dependent typing makes the typechecker depend on the evaluator, the typechecker may also wrap an error from the evaluator (Core/IR/Evaluator/Types.Error).

      The module also provides pretty-printing and an exception wrapper for the main TypecheckError type.

  • CheckDatatype

    Exports two interfaces, which between them allow the type-checking of datatype declarations; hence, both are called by typeCheckDeclaration:

    • checkDataType

      Typechecks all the parameters of a datatype declaration, ultimately by calling typeTerm on each.

    • typeCheckAllCons

      Typechecks all the constructors of a datatype declaration, ultimately by calling typeTerm on each, followed by evalTerm, because of the potential dependency of types on values.

  • CheckTerm

    Exports typeTerm, through which the typechecker is plugged into the Pipeline, via typecheckEval and typecheckErase, and through them from coreToAnn, which is called from the backend-specific typecheck functions with the appropriate Parameterisations. (The backend-specific typecheck functions in the current development branch, as I write, have a lot in common, but Alberto is factoring them out as part of PR 821.)

    Internally, typeTerm is implemented in terms of the mutually-recursive typeTerm' and typeElim' (which follow the structures of the mutually-recursive Term and Elim). These two functions are, in a sense, the core of the typechecker: they're the expression in code of the typing rules of QTT. For example, when typeTerm' is given a lambda with expected type A -> B, it checks whether assuming that the variable has type A results in the body correctly typechecking (with a recursive call to typeTerm') with type B.

    The mutual recursion expresses the bidirectionality of the type checking: annotated terms are checked by typeTerm' ; unannotated terms ("eliminations") have their types inferred (synthesized) by typeElim'.

    (Note: as of this writing, there is a bug in the Pair' case of typeTerm' : it doesn't evaluate the left term in checking the type of the right, so it wouldn't handle dependent pairs correctly. I will fix that shortly.)

    One of the types that appears as a frequent return value in CheckTerm.hs is Leftovers. This type keeps track of variables which haven't yet been used as often as their Usage annotation allows – and, if it's not omega, requires! Hence if there are any Leftovers with usages other than omega after a term has been type-checked, that causes a LeftoverUsage TypecheckerError to be thrown.

    CheckTerm.hs is also the location of our definition of the subtyping relation, operator <:.

    Many of the return values of functions in CheckTerm.hs are parameterised over Monads with constraints such as HasThrowTC' and HasBound. So they expect to be able to throw TypecheckerErrors, or to be called as part of the execution of a state monad containing (at least) a list of bindings. That's a pattern which strikes me as crying out for algebraic effects – I think they could make the signatures much more readable, with less boilerplate and less code to compose effects. So I feel this module should be able to "read" better when it's written in Juvix itself!

  • TransformExt

    This module defines an interface for transforming one extension of a Term (in the sense of trees-that-grow) into another extension of Term – an applicative functor on term extensions. Such functors are defined in terms of their operations on each variant of Term, such as primitive, pi-type, let-binding, and so on. Such functors compose (and the TransformExt module defines the compose function which composes them). This module also defines a forgetful functor, forgetter, which transforms an extended Term into a Term with no extension (NoExt).

    There are forgetful uses in the Evaluator and Erasure modules, but I haven't found any non-forgetful uses yet.

  • TransformExt/

    • OnlyExts

      OnlyExts is so-called because it defines a Term -transforming functor which discards annotations but preserves extensions.

IRAnn/

  • Types

    The typechecker-specific annotation of terms comprises:

    • A usage (any natural number, or omega)

    • A term of QTT (which might be a type), which is one of:

      • * (a universe, of which there is a natural-number-indexed hierarchy)

      • a primitive type

      • a primitive constant

      • a pi-type (dependent product)

      • a lambda

      • a sigma type (dependent sum)

      • a pair

      • a let-binding

      • the unit type

      • the unit value

      • one of the Elim types:

        • a bound variable

        • a free variable

        • an application of an Elim to a Term

        • a Term annotated with a Usage

      Some of these terms are extensions to core QTT, which, for example, does not appear to me to define explicit pair types.

  • Erasure

    This module just defines two forgetful functors, for Term and Elim, which discard extensions (see TransformExt).

5.1.6.4 - Formalization

Formalization of QTT in Idris-2

Category-theoretic formalization of QTT: "Quantitative Categories with Families"

-- Following Definition 3.1, of "Categories with Families (CwFs)",
-- in _Syntax and Semantics of Quantitative Type Theory_.
record CwF where
  constructor MkCwF
  -- "C is a category..."
  Underlying : Category
  -- "...with a chosen terminal object T"
  Terminal : Object Underlying
  TerminalCorrect : Universality.IsTerminal {cat=Underlying} Terminal
  -- "For delta in Ob(C), a collection Ty(delta) of semantic types"
  SemanticType : Object Underlying -> Type
  -- "For delta in Ob(c) and S in Ty(delta), a collection Tm(delta, S) of
  -- semantic terms"
  SemanticTerm : (object : Object Underlying) -> SemanticType object -> Type
  -- "For every f : delta -> delta' in C, a function
  -- -{f} : Ty(delta') -> Ty(delta)..."
  MapType : (obj, obj' : Object Underlying) ->
            Morphism Underlying obj obj' ->
            SemanticType obj' -> SemanticType obj
  -- "...and for S in Ty(delta') a function
  -- -{f} : Tm(delta', S) -> Tm(delta, S{f})..."
  -- (Implementer's note:  I don't understand why both MapType and MapTerm
  -- (as I'm calling them) are referred to in the paper as "-{f}".  Is that
  -- a typo, or will they just distinguish between MapType and MapTerm by
  -- context?)
  MapTerm : (obj, obj' : Object Underlying) ->
            (f : Morphism Underlying obj obj') ->
            (S : SemanticType obj') ->
            SemanticTerm obj' S ->
            SemanticTerm obj (MapType obj obj' f S)
  -- "Such that both assignments preserve identity and composition"
  MapTypePreservesIdentity : (obj : Object Underlying) ->
                             (S : SemanticType obj) ->
                             MapType obj obj (Identity Underlying obj) S = S
  MapTypePreservesComposition : (a, b, c : Object Underlying) ->
                                (f : Morphism Underlying a b) ->
                                (g : Morphism Underlying b c) ->
                                MapType a c
                                  ((.*) {cat=Underlying} {a} {b} {c} g f) =
                                (MapType a b f) . (MapType b c g)
  MapTermPreservesIdentity : (obj : Object Underlying) ->
                             (S : SemanticType obj) ->
                             MapTerm obj obj (Identity Underlying obj) S = S
  MapTermPreservesComposition : (a, b, c : Object Underlying) ->
                                (f : Morphism Underlying a b) ->
                                (g : Morphism Underlying b c) ->
                                (S : SemanticType c) ->
                                (t : SemanticTerm c S) ->
                                MapTerm a c
                                  ((.*) {cat=Underlying} {a} {b} {c} g f) S t =
                                MapTerm a b f (MapType b c g S)
                                  (MapTerm b c g S t)
  -- "For each object delta in C and S in Ty(delta) an object delta.S (called
  --  the _comprehension of S_)..."
  Comprehension : (obj : Object Underlying) ->
    SemanticType obj -> Object Underlying
  -- "...such that there is a bijection natural in delta':"
  ComprehensionToMorphism :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj' : Object Underlying) ->
    Morphism Underlying obj' (Comprehension obj S) ->
    Morphism Underlying obj' obj
  ComprehensionToTerm :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj' : Object Underlying) ->
    (f : Morphism Underlying obj' (Comprehension obj S)) ->
    SemanticTerm obj'
      (MapType obj' obj (ComprehensionToMorphism obj S obj' f) S)
  TermToComprehension :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj' : Object Underlying) ->
    (m : Morphism Underlying obj' obj) ->
    SemanticTerm obj' (MapType obj' obj m S) ->
    Morphism Underlying obj' (Comprehension obj S)
  ComprehensionToTermToComprehension_id :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj' : Object Underlying) ->
    (f : Morphism Underlying obj' (Comprehension obj S)) ->
    TermToComprehension obj S obj'
      (ComprehensionToMorphism obj S obj' f)
      (ComprehensionToTerm obj S obj' f) =
    f
  TermToComprehensionToTerm_id_morphism :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj' : Object Underlying) ->
    (m : Morphism Underlying obj' obj) ->
    (term : SemanticTerm obj' (MapType obj' obj m S)) ->
    ComprehensionToMorphism obj S obj' (TermToComprehension obj S obj' m term) =
    m
  TermToComprehensionToTerm_id_term :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj' : Object Underlying) ->
    (m : Morphism Underlying obj' obj) ->
    (term : SemanticTerm obj' (MapType obj' obj m S)) ->
    ComprehensionToTerm obj S obj' (TermToComprehension obj S obj' m term) =
    replace {p=(\m' => SemanticTerm obj' (MapType obj' obj m' S))}
      (sym (TermToComprehensionToTerm_id_morphism obj S obj' m term)) term
  ComprehensionToMorphismIsNatural :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj', obj'' : Object Underlying) ->
    (observerChange : ObserverChange {cat=Underlying} obj' obj'') ->
    -- I'm not sure whether it is necessary to impose naturality on the
    -- observer change in order to ensure it on the comprehension-to-morphism
    -- bijection; that's just a guess.  We could find out whether proofs would
    -- go through without it.
    (isNatural : ObserverChangeIsNatural
      {cat=Underlying} {observerA=obj'} {observerB=obj''} observerChange) ->
    (f : Morphism Underlying obj' (Comprehension obj S)) ->
    ComprehensionToMorphism obj S obj''
      (observerChange (Comprehension obj S) f) =
    observerChange obj (ComprehensionToMorphism obj S obj' f)
  ComprehensionToTermIsNatural :
    (obj : Object Underlying) -> (S : SemanticType obj) ->
    (obj', obj'' : Object Underlying) ->
    (observerChange : ObserverChange {cat=Underlying} obj' obj'') ->
    -- I'm not sure whether it is necessary to impose naturality on the
    -- observer change in order to ensure it on the comprehension-to-term
    -- bijection; that's just a guess.  We could find out whether proofs would
    -- go through without it.
    (isNatural : ObserverChangeIsNatural
      {cat=Underlying} {observerA=obj'} {observerB=obj''} observerChange) ->
    (f : Morphism Underlying obj' (Comprehension obj S)) ->
    ComprehensionToTerm obj S obj''
      (observerChange (Comprehension obj S) f) =
    MapTerm obj'' obj'
      (ObserverChangeInducedMorphism
        {cat=Underlying} {observerA=obj'} {observerB=obj''} observerChange)
      (MapType obj' obj (ComprehensionToMorphism obj S obj' f) S)
      (ComprehensionToTerm obj S obj' f)

5.2 - S-expression

An overview of dealing with S-expressions (WIP)

5.3 - Translate

Documentation on the Juvix Translation Phase

Translate

This phase is the direct manipulation of the S-expression construct that we left off with in the frontend phase.

overview.png

This section is broken up into sections, each growing in complexity as we go along. These are:

  1. Desugaring

  2. Context Phase

Desugaring

Every important operation in this phase is a direct Syntax to Syntax transformation with no extra context needed.

desugar.png

The function op pipes together all desugaring functions that do not require a context.

  -- | @op@ fully desugars the frontend syntax from the original
  -- frontend s-exp representation to a form without modules, conditions,
  -- guards, etc. This pass thus does all transformations that do not
  -- require a context
  op :: [Sexp.T] -> [Sexp.T]
  op syn =
    syn
      >>| Pass.moduleTransform
      >>| Pass.moduleLetTransform
      >>| Pass.condTransform
      >>| Pass.ifTransform
      >>| Pass.multipleTransLet
      |> Pass.multipleTransDefun
      |> Pass.combineSig
      >>| Pass.removePunnedRecords
      >>| Pass.translateDo

The order of these passes is relatively arbitrary. We simply remove modules, then module lets, then conds, then …, and then do. The op function is run directly on the output of the last phase.

Examples of its use are provided in the EasyPipeline module (run desugar1 and desugar2 to test the results for yourself!).

Desugaring S-expressions, Structurally

Structurally here refers to a systematic way of manipulating S-expressions in which each form has an equivalent Haskell ADT.

Before getting into all the functions in op, such as Pass.moduleTransform, we first discuss the underlying tool that allows us to handle these passes in a principled way.

We'll use the let/defun form as an example of this transformation. This code can be found in the Structure module.

  ;; From Design/S-expression Syntax
  (:defun foo (x)
    (:cond
      ((:infix == x 2) (:infix + 3 (:infix + 4 x)))
      (else            (:infix + x 2))))

defun is broken into the name foo, the arguments x (note this can pattern match), and the body cond.

In Structure/Frontend.hs we find a Haskell encoding of this form:

  -- | @Defun@ is the base defun structure
  -- currently it does not have matching
  data Defun = Defun
    { defunName :: Sexp.T,
      defunArgs :: Sexp.T,
      defunBody :: Sexp.T
    }
    deriving (Show)

Note: Notice how we say nothing about the head being the cadr of the structure, the arguments the caddr, and the body of course the cadddr. Instead, we just lay out the logical structures in a record, divorced from any representation.

We provide an API to deal with the actual representation. It would need to change with a lisp-form redesign.

  ----------------------------------------
  -- Defun
  ----------------------------------------

  nameDefun :: NameSymbol.T
  nameDefun = ":defun"

  isDefun :: Sexp.T -> Bool
  isDefun (Sexp.Cons form _) = Sexp.isAtomNamed form nameDefun
  isDefun _ = False

  toDefun :: Sexp.T -> Maybe Defun
  toDefun form
    | isDefun form =
      case form of
        _Defun Sexp.:> sexp1 Sexp.:> sexp2 Sexp.:> sexp3 Sexp.:> Sexp.Nil ->
          Defun sexp1 sexp2 sexp3 |> Just
        _ ->
          Nothing
    | otherwise =
      Nothing

  fromDefun :: Defun -> Sexp.T
  fromDefun (Defun sexp1 sexp2 sexp3) =
    Sexp.list [Sexp.atom nameDefun, sexp1, sexp2, sexp3]

All records in that file have a corresponding interface of name<structure>, is<structure>, to<structure>, and from<structure>. These deal with the small details of cars and cdrs.

This can be tested by opening the Easy file in the Juvix repl and running

  λ> Right defun = Sexp.parse "(:defun foo (x) (:cond ((:infix == x 2) (:infix + 3 (:infix + 4 x))) (else (:infix + x 2))))"
  λ> import qualified Juvix.Sexp.Structure as Structure
  λ> Structure.toDefun defun
  Just (Defun {defunName = "foo", defunArgs = ("x"), defunBody = (":cond" ((":infix" "==" "x" 2) (":infix" "+" 3 (":infix" "+" 4 "x"))) ("else" (":infix" "+" "x" 2)))})

We can see this transformation agrees with our documentation.

What this buys us is that when it comes time to write the passes, we don't have to worry about these details, nor about accidentally changing the semantics of the passes themselves. This properly decouples our concerns, so we can think about the abstract meaning of the syntax in the passes while handling the representational details here.

However, all is not fine: notice that we had to write nineteen lines of boilerplate just to hide the details of the syntax! This is rather tedious; counting them, there are at least 22 structures in that file, and over 500 lines of this rather straightforward API. This is especially annoying given that we are essentially just transcribing BNF definitions.

It is due to this tedium that the lisp haskell generator was created early on. The code itself isn't the best (due to using string interpolation) and could be improved with a bit of engineering (a proper S-expression DSL in the lisp code); however, it works fairly well for our purposes.

Here are some choice snippets that cover every case of the generator

  ;; 1
  (generate-haskell "Defun" '("sexp" "sexp" "sexp") ":defun")

  ;; 2
  (generate-haskell "ArgBody" '("sexp" "sexp") nil)

  ;; 3
  (generate-haskell "DeconBody" (repeat 2 "sexp") nil)

  (generate-haskell "Case" '("sexp" "deconBody") "case" :list-star t)


  ;; 4
  (generate-haskell "NotPunned" '("sexp" "sexp") nil)

  (generate-haskell "RecordNoPunned" '("notPunnedGroup") ":record-no-pun"
                    :list-star t
                    :un-grouped t)

  ;; 5
  (generate-haskell "ArgBody" '("sexp" "sexp") nil)
  ;; bodys here, as there are multiple!
  (generate-haskell "LetMatch" '("sexp" "argBodys" "sexp") ":let-match")
  1. The first one corresponds to the structure we've already seen, namely the Defun struct. The first argument is the type name we wish to generate; this is of course Defun. The second argument is the list we wish to parse; in this case it says all three arguments are just Sexp.T's, and binds them in order to name, args, and body in the struct itself. The third argument is the name of the structure itself, if it has one. Our lets are translated to the :defun structure, and so we note that here.

  2. The second case corresponds to a very similar structure, except that it lacks a name. So the structure we care about is (arg body) with no name attached. The nil in the third argument reflects this change.

  3. The third case corresponds to a scenario with two changes. The first is that we can define types for other definitions to rely on. Here we are saying that case has the type DeconBody; we use lower case to reflect the proper variable name this associates with. Further, the :list-star t aspect tells us that the last argument, in this case of DeconBody, is the rest of the s-expression and not just the cadr.

      data Case = Case
        { caseOn :: Sexp.T,
          caseImplications :: [DeconBody]
        }
        deriving (Show)
  4. The fourth is the last new concept of the bunch: here we have :un-grouped t, which states that the forms this parses are not grouped like

    ((name₁ type₁) (name₂ type₂) … (nameₙ typeₙ)), but rather

    (name₁ type₁ name₂ type₂ … nameₙ typeₙ).

    This means that we have to tell it explicitly that it occurs over 2 positions with that :un-grouped argument.

  5. The fifth case is an interesting one. The key insight is that we say argBodys rather than argBody. This is because our generator is limited, and thus we manually write

      -- these are ungrouped fromArgBodys, where we groupBy2 both ways
      fromArgBodys :: [ArgBody] -> Sexp.T
      fromArgBodys = Sexp.unGroupBy2 . toStarList fromArgBody

      -- these are ungrouped toArgBodys, where we groupBy2 both ways
      toArgBodys :: Sexp.T -> Maybe [ArgBody]
      toArgBodys = fromStarList toArgBody . Sexp.groupBy2

    to be able to handle the scenario where the list-star like form happens not in the last argument, but rather in a list in some other position!

Improvements Upon the Generator

The generator can be improved a lot. As the previous section mentions, it was hacked together using string interpolation, which is not a good structural way to handle this sort of problem. The alternative, however, would take a week of hacking, so it is not a priority to undergo.

However, there are two worthwhile improvements that we should undertake.

  1. Change the Maybe of the to<Name> to an Either.

    This change at the current moment does not matter, namely because Nothing should never be triggered; the parser in the frontend excludes this possibility. However, when the syntax redesign comes in, this will no longer be the case. It would be great if the generated code could instead produce an Either, where the Left counts the number of arguments given and states precisely why it can't translate the given S-expression form into the type we want (a sketch follows below).

  2. list-star like behavior anywhere. Currently we bake it into the last form, but it would be nice if we could have a way to denote this for the middle slots, so we can avoid hacks, like the manual argBodys example given in the last section.
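
For the first improvement, the generated code for Defun might look something like this sketch (the error type and its constructors are invented for illustration):

  -- | @StructureError@ says why a form failed to translate
  data StructureError
    = WrongForm NameSymbol.T Sexp.T     -- the head we expected, the form we got
    | WrongArgCount NameSymbol.T Sexp.T -- right head, wrong number of arguments

  toDefunEither :: Sexp.T -> Either StructureError Defun
  toDefunEither form
    | isDefun form =
      case form of
        _defun Sexp.:> sexp1 Sexp.:> sexp2 Sexp.:> sexp3 Sexp.:> Sexp.Nil ->
          Right (Defun sexp1 sexp2 sexp3)
        _ ->
          Left (WrongArgCount nameDefun form)
    | otherwise = Left (WrongForm nameDefun form)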

Desugaring Passes

Now that we understand the structural underpinnings of the pass, we can now talk about the transformation itself.

We can observe the passes here.

There are two ways to proceed: if you don't want to go through the entire creation process of a pass, you can skip the next section and go directly to reviewing a pass.

The most important detail to note is that if you have a clear view of the input and output of a pass, the pass should be easy to write and review. If forming this view is hard, then I suggest reading through the next section to see the full life cycle of this process.

Creating a Pass

Instead of just showing a finished example, let us write one: the cond transformation. We can find the form here.

It seems the Cond form is filled with a bunch of ((:infix == x 3) 1) forms, which we can abstractly view as (<pred> <ans>). So let us define these forms as a type. We do this in the Structure folder; in particular, we want to put this in Frontend.hs, since this form shows up in the frontend syntax.

  -- | @PredAns@ is an abstraction over questions and answers
  data PredAns = PredAns {predAnsPredicate :: Sexp.T, predAnsAnswer :: Sexp.T}
    deriving (Show)

  -- | @Cond@ here Cond form takes a list of predicate answers
  newtype Cond = Cond {condEntailments :: [PredAns]} deriving (Show)

To reflect the structure we include the raw forms in the generator file and take the output of these calls into the Structure file.

  (generate-haskell "PredAns" (repeat 2 "sexp") nil)

  (generate-haskell "Cond" '("predAns") ":cond" :list-star t)

Now we can test that we successfully matched the BNF by using the REPL

  λ> Right cond = Sexp.parse "(:cond ((:infix == x 2) (:infix + 3 (:infix + 4 x))) (else (:infix + x 2)))"
  λ> Structure.toCond cond
  Just
    (Cond
       {condEntailments =
          [PredAns { predAnsPredicate = (":infix" "==" "x" 2)
                   , predAnsAnswer = (":infix" "+" 3 (":infix" "+" 4 "x"))}
          ,PredAns { predAnsPredicate = "else"
                   , predAnsAnswer = (":infix" "+" "x" 2)}]})

Now we know the shape of the data we are working with!

To get closer to the core version of match, let us first desugar this to an if. Something like

(:Cond (pred-1 result-1) … (pred-n result-n)) should turn into

(if pred-1 result-1 (if pred-2 result-2 (… (if pred-n result-n))))

So like cond, we also define the records and forms we wish to work on.

  -- | @If@ has an pred, then, and else.
  data If = If
    { ifPredicate :: Sexp.T,
      ifConclusion :: Sexp.T,
      ifAlternative :: Sexp.T
    }
    deriving (Show)

  -- | @IfNoElse@ has a pred and a then.
  data IfNoElse = IfNoElse
    { ifNoElsePredicate :: Sexp.T,
      ifNoElseConclusion :: Sexp.T
    }
    deriving (Show)


  -- In the generator
  (generate-haskell "If" (repeat 3 "sexp") "if")
  (generate-haskell "IfNoElse" (repeat 2 "sexp") "if")

Because our generator is limited we make two variants.

Now let us write the transformation. First, observe that we can view this operation from cond to if as folding if over the list of PredAns, with the base accumulator being an if with no else condition.


  condToIf condExpression
      | Just cond <- Structure.toCond condExpression,
        -- we need to setup the if with no else
        Just last <- lastMay (cond ^. entailments) =
        let acc =
              -- let's create it, with the predicate and answer of the
              -- PredAns type
              Structure.IfNoElse (last ^. predicate) (last ^. answer)
                |> Structure.fromIfNoElse
         -- Now let us employ the fold strategy we talked about
         in foldr generation acc (initSafe (cond ^. entailments))
      | otherwise = error "malformed cond"
    where
      -- this is the folded function, see how we just build up the if,
      -- then push it back to an s-expression at the end?
      generation predAns acc =
        Structure.If (predAns ^. predicate) (predAns ^. answer) acc
          |> Structure.fromIf

With our current understanding we'd write something like this, and in fact we can test it as is! Just go to the Easy Pipeline file and include these dependencies and the function above.

  import qualified Juvix.Sexp.Structure as Structure
  import Juvix.Sexp.Structure.Lens
  import Control.Lens hiding ((|>))

With the definition in place, we can now see:

  λ> Right cond = Sexp.parse "(:cond ((:infix == x 2) (:infix + 3 (:infix + 4 x))) (else (:infix + x 2)))"

  λ> condToIf cond
  ("if" (":infix" "==" "x" 2) (":infix" "+" 3 (":infix" "+" 4 "x")) ("if" "else" (":infix" "+" "x" 2)))

We just created the bulk of our first pass! There are a few more details that one has to care about in a real pass, but we'll cover those in the reviewing section coming up.

Reviewing a Pass

The following code can be found here.

  -- | @condTransform@ - CondTransform turns the cond form of the fronted
  -- language into a series of ifs
  -- - BNF input form:
  --   + (:Cond (pred-1 result-1) … (pred-n result-n))
  -- - BNF output form:
  --   + (if pred-1 result-1 (if pred-2 result-2 (… (if pred-n result-n))))
  condTransform :: Sexp.T -> Sexp.T
  condTransform xs = Sexp.foldPred xs (== Structure.nameCond) condToIf
    where
      condToIf atom cdr
        | Just cond <- Structure.toCond (Sexp.Atom atom Sexp.:> cdr),
          Just last <- lastMay (cond ^. entailments) =
          let acc =
                Structure.IfNoElse (last ^. predicate) (last ^. answer)
                  |> Structure.fromIfNoElse
           in foldr generation acc (initSafe (cond ^. entailments))
                |> Sexp.addMetaToCar atom
        | otherwise = error "malformed cond"
      --
      generation predAns acc =
        Structure.If (predAns ^. predicate) (predAns ^. answer) acc
          |> Structure.fromIf

We saw most of this form in the creation of a pass section above; however, we will briefly go over the rough strategy. In order to turn the input cond into an if, we can view the transformation as a fold, where we fold the if form over the predicates and answers of the cond structure, with the accumulator being the already built-up ifs. We can see this in the BNF comment at the start of the function. This transformation gets us closer to the more primitive match representation that is present in Core.

We can see the implementation of this strategy shine through in the body of condToIf and the folded function generation.

The main difference from our creation of a pass section is that we have this Sexp.foldPred call, and that condToIf takes two arguments. The importance of this is that Sexp.foldPred searches the entire S-expression structure for the name Structure.nameCond, which is :cond. When it sees this form, it breaks the structure into the atom, :cond, and the cdr, which is the rest of the structure. This means that if you run this on any form that contains the cond form, it will automatically run this transformation for you!

The last key bit of information is Sexp.addMetaToCar atom; this is to preserve meta information, like line numbers, that was on the :cond atom and that we would like on our newly generated if atom.

Overall, reviewing a pass is rather easy: just keep in mind what the input form is and what the output form is. As long as the output form is getting closer to the Core representation of matters, we have a successful pass.

To verify this pass, we can easily run it for ourselves and see if the BNF Comment is being truthful.

  λ> Right cond = Sexp.parse "(:defun f (x) (:cond ((:infix == x 2) (:infix + 3 (:infix + 4 x))) (else (:infix + x 2))))"
  λ> cond
  (":defun" "f" ("x")
     (":cond" ((":infix" "==" "x" 2) (":infix" "+" 3 (":infix" "+" 4 "x")))
              ("else"                (":infix" "+" "x" 2))))
  λ> Pass.condTransform cond
  (":defun" "f" ("x")
     ("if" (":infix" "==" "x" 2)
           (":infix" "+" 3 (":infix" "+" 4 "x"))
           ("if" "else"
                 (":infix" "+" "x" 2))))

Although we used the cond pass to talk about how these passes work, we did so only to have a simple example. All the other passes follow the same format and can be dissected by the same thought process.

Context

The context phase is an important phase: this is where we start to introduce the context, and our operations start to revolve around it. This is also where our pipeline starts to show some inadequacies in how we ingest data. But first, we must talk about how we transform our list of S-expressions into a context.

Contextify

We call this phase Contextify, or Contextualization. The code we will be investigating is split among many files, so we will link them as we go rather than all at the start.

5.4 - Frontend

Documentation on the Juvix Frontend Phase

Frontend

We begin our journey with the user-facing language of Juvix. This is either text in a file, or text written into a REPL session. We then use parser combinators to turn this unstructured text into a structured Juvix form. Finally, we convert this structured Juvix form into an equivalent S-expression form for further passes.

Parser Combinators

The parser combinator file that we will be inspecting can be found here.

In order to understand this code, I suggest reading about how parser combinators work. For a very brief rundown, here are the key features we will be abusing; if you wish to skip the brief rundown, skip to the Specific Information section.

Quick RunDown

  matchL :: Parser Types.MatchL
  matchL = do
    skipLiner J.pipe
    match <- P.try matchLogicSN
    spaceLiner (P.string "->")
    Types.MatchL match <$> P.try expression

Here the do just means we run each parser in sequence: parse for a pipe, then a match, and so on. If any of these parsers fails to match (for example, if there is no pipe), then we consider the whole parser to have failed.

Try is another important concept: it says that if the parser fails for any reason, don't consume any text it may have started to parse (say our pipe parses but our matchLogicSN doesn't; in that case, try matchL would revert the pipe parse!).

  topLevel :: Parser Types.TopLevel
  topLevel =
    P.try (Types.Type <$> typeP)
      <|> P.try fun
      <|> P.try modT
      <|> P.try (Types.ModueOpen <$> moduleOpen)

<|> means or, so if a parser fails, we move onto the next.

Finally, it would be improper of me to end the section without talking about the infix handler.

  expressionGen :: Parser Types.Expression -> Parser Types.Expression
  expressionGen p =
    Expr.makeExprParser (spaceLiner (expressionGen' p)) tableExp

  -- For Expressions
  tableExp :: [[Expr.Operator Parser Types.Expression]]
  tableExp =
    [ [refine],
      [infixOp],
      [arrowExp]
    ]

  infixOp :: Expr.Operator Parser Types.Expression
  infixOp =
    Expr.InfixR
      ( P.try $ do
          inf <- spaceLiner infixSymbolDot
          pure
            (\l r -> Types.Infix (Types.Inf l inf r))
      )

Here we outline three key functions. expressionGen uses tableExp as the precedence table of infix parsers; thus refine has a higher precedence than infixOp. From expressionGen we get back a new parser for any expression parser we may hand it.

The infixOp structure explicitly states what kind of infix it is; in our case we make it nest on the right, so a -> b -> c gets turned into a -> (b -> c). The important thing to note is that we only parse for the infix symbol itself; we let the parser we hand to expressionGen handle either side.
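To make the nesting concrete, this is (schematically) the value the InfixR handler above builds for a -> b -> c, with a, b and c standing for the already-parsed subexpressions:

  -- Schematic only; the operands come from the parser handed to
  -- expressionGen, with constructor order as in the snippet above.
  Types.Infix (Types.Inf a "->" (Types.Infix (Types.Inf b "->" c)))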

Every other tool you'll see is an abstraction on top of these base tools. Even the Infix handler is built upon the first two primitives we've outlined.

Specific Information

Now we get into the specifics of our version. We use parser combinators mostly in the standard way; however, you'll often see the forms skipLiner, spaceLiner, and *SN, which are not typical in a parser combinator system.

spaceLiner just eats all empty symbols, i.e. the spaces and newlines after the currently parsed expression.

skipLiner is the same, however it works for any character given to it.

Finally, *SN just calls spaceLiner on whatever parser it suffixes.

These concepts exist mainly because we eventually wish to make the parser indentation-sensitive.
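To make these conventions concrete, here is a minimal sketch of what they might look like, assuming a Char token stream for simplicity; the real definitions live in the parser library and differ in detail:

  -- Hypothetical sketches of the conventions described above.
  spaceLiner :: Parser a -> Parser a
  spaceLiner p = p <* P.takeWhileP Nothing (\c -> c == ' ' || c == '\n')

  skipLiner :: Char -> Parser ()
  skipLiner c = spaceLiner (() <$ P.char c)

  -- The *SN convention: fooSN is just spaceLiner applied to foo.
  matchLSN :: Parser Types.MatchL
  matchLSN = spaceLiner matchL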

The main confusing bit of our layout is the many variants of expression!

  do''' :: Parser Types.Expression
  do''' = Types.Do <$> do'

  app'' :: Parser Types.Expression
  app'' = Types.Application <$> P.try application

  all'' :: Parser Types.Expression
  all'' = P.try do''' <|> app''

  -- used to remove do from parsing
  expression' :: Parser Types.Expression
  expression' = expressionGen app''

  -- used to remove both from parsing
  expression''' :: Parser Types.Expression
  expression''' = expressionGen (fail "")

  expression :: Parser Types.Expression
  expression = expressionGen all''

We have three main variants: one with application, one with do syntax, and one with both! These exist because certain transformations will go into an infinite loop if you're not careful. This is mainly due to how some forms like do and the infix generators behave together. In other cases, such as ADT declarations, we want to disable application: in type List a = Cons a (List a), it would be a shame if the a were applied to List a!

S-Expression Transformation

The transformation file can be found here.

Here we follow a very straightforward algorithm: when we see a particular Juvix ML ADT form, we transform it into the equivalent Lisp expression, which can be found in the S-expression document in the design section.
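Schematically, the transformation is one hard-baked clause per frontend form; the sketch below uses made-up helper names (atom and list as S-expression builders), not the real file:

  -- A hedged sketch of the transformation style, not the actual code.
  transform :: Types.Expression -> Sexp
  transform expr =
    case expr of
      Types.Infix (Types.Inf l op r) ->
        list [atom ":infix", atom op, transform l, transform r]
      -- ... one clause like the above for every other frontend form
      _ ->
        error "elided in this sketch"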

Ideally we could generate the particulars of the syntax via some schema and have the ML code use that. This would remove the hard-baked structure of each form from the transformation; however, with how many different types exist in the frontend form, this would be quite a challenge. We do use this technique later on in the compiler, reducing the amount of hard-baked forms.

The above issue also expands a bit on why we have chosen this path. ML-structured ADTs are wonderful, however they lack flexibility. We used a Trees that Grow structure in the past, but this would convert one rigid ADT into another. And so we had to convert the massive ADT that we built one constructor at a time, which ended up at over 500 lines of code per pass! An issue just as big is that no common interface can be had, so our passes couldn't be as easily abstracted away.

Final Words

Overall, this stage of the compiler is not too complicated; it should be somewhat easy to grasp and to see how the data flows. I suggest going into the EasyPipeline library, loading up Easy.hs, and playing around. We have just covered up to the SEXP PHASE section of that page, so you should be able to understand what the sexp1 and sexp2 examples there do.

5.5 - Context

This article goes over the Context structure in detail

What is the Context?

The context can be viewed as the working image of the Juvix language. In particular, it is the storage/information system of the language. It is also responsible for the semantics of symbol resolution. Due to this, we can view the context as a fancy nested hashtable.

What information is stored in the Context?

The context stores a plethora of information.

  • Currently the Context contains:

    1. Definitions

      • These hold the standard desugared Juvix definitions.

    2. Types

      • These are the data type declarations in the Juvix language.

      • Every definition also has a type attached as well that one can query.

    3. Constructors

      • These are stored like Definitions, but for type constructors.

    4. Records/Modules

      • These are recursive name-spaces.

      • Each record/module holds two name-spaces inside of them, a public and private one.

    5. Precedence

      • This is associated with a definition, so we know how to desugar precedence.

    6. Reverse Module use lookup

      • We get to know what modules import what module, and if they are implicit or explicit.

    7. Current Module/Record

      • This is the current module in scope.

In the long run this information will be added to, with each declared item in the image having a general hashtable that any information can be stored on. This will include documentation and the original source code.

How does Symbol resolution work?

Symbol Resolution is handled entirely by the context. When we ask the context what Foo.Bar.zed resolves to, we do a few things.

First we check if Foo is in the current name-space. Specifically, if Foo is in both the private name-space and the public one, then the private one wins. After this point, we have no means of accessing private name-spaces, so we just access the public Bar and zed inside the Foo module we resolved.

If Foo is not in the locals, then we check the top of the context for Foo. If it's not there, then we are unable to resolve it.

The only other special part of resolving a known symbol is names like TopLevel, which bypass any local module checks and instantly resolve from the top; so TopLevel.Foo.Bar.zed will always look in the top-level name-space.
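Putting these rules together, the lookup order sketches out roughly as follows; the helper names here are hypothetical, the real logic lives in the context's lookup machinery:

  import Control.Applicative ((<|>))

  -- A sketch only: startsWithTopLevel, dropTopLevel and the lookup*
  -- helpers are assumed names, not the real API.
  resolve :: NameSymbol.T -> T term ty sumRep -> Maybe (From (Definition term ty sumRep))
  resolve sym ctx
    | startsWithTopLevel sym =            -- TopLevel.… skips local checks
        lookupFromTop (dropTopLevel sym) ctx
    | otherwise =
        lookupPrivateLocal sym ctx        -- private wins over public locally
          <|> lookupPublicLocal sym ctx
          <|> lookupFromTop sym ctx       -- finally, the top of the context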

Special Rules for dealing with unresolved symbols

Sometimes we wish to quantify over symbols which we may add or create. For our previous example, what if Foo.Bar.zed is undefined? Operations which operate over hypothetical symbols have a way to know if the symbol is defined or not.

The resolution rules are fairly simple, but we will investigate them with three case studies where the symbols may be undefined: zed, Foo.zed, and TopLevel.Foo.zed.

  1. zed

    • Since zed is not in scope, we only really have two choices: we either mean Toplevel.zed or CurrentNameSpace.zed. Since we find that we typically mean zed in the current module, and not some hypothetical one at the top level, we try the latter. So the operation deals with an undefined CurrentNameSpace.zed.

  2. Foo.zed

    • Since this case has a condition where Foo could also be resolved, we break it into another case analysis.

      1. Foo is resolved

        • In this case, we act much like case 1, where we explicitly work in an unresolved CurrentNameSpace.Foo.zed or a Toplevel.Foo.zed, depending on which one is in scope (see the scoping section above!).

      2. Foo is not resolved

        • In this case we actually do a different resolution than the zed case. We assume we are talking about a top-level version, namely Toplevel.Foo.zed.

  3. Toplevel.Foo.zed

    • This one is rather easy as well, as it falls back to case 2 subsection 2.

Overall, some of these rules are arbitrary; namely, case 2 subsection 2 may need to conform to be more like case 1. However, this requires more testing and playing with the system to determine.

Both this section and the above are implemented in the modify space impure section.

What are the Functions of the Context?

The context hosts a plethora of useful functions that make traversing it easy. Most of these more useful functions are built upon the more primitive operations.

Many operations that create name-spaces are in IO, to hold the recursive maps we use to track information.

Primitive Operations

Empty

  empty :: NameSymbol.T -> IO (T term ty sumRep)

empty creates a new context in the name-space you call it with (thus empty "Foo.Bar" makes a context with the nested module Bar in Foo).

The current module is then set to this name.
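As a usage sketch (Term, Ty and SumRep standing in for whatever the context is parameterised over):

  -- Creates a fresh context whose current module is the nested
  -- module Bar inside Foo.
  newCtx :: IO (Context.T Term Ty SumRep)
  newCtx = Context.empty "Foo.Bar"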

Lookup

  lookup ::
    NameSymbol.T -> T term ty sumRep -> Maybe (From (Definition term ty sumRep))

  (!?) ::
    T term ty sumRep -> NameSymbol.T -> Maybe (From (Definition term ty sumRep))

lookup looks up the given name-symbol according to the rules in the Symbol Resolution section. !? is just flipped lookup.

Add

  add ::
    NameSpace.From Symbol ->
    Definition term ty sumRep ->
    T term ty sumRep ->
    T term ty sumRep

add name def context adds name with the definition def to the current name-space of an existing context. The type NameSpace.From can either be Pub or Priv currently, stating how we want to add the definition to the current context.

As can be observed, we are adding a Symbol and not a NameSymbol, so this primitive can't add a name to a separate module. This should be changed in time.
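A small usage sketch of add together with lookup, assuming "myDef" converts to the appropriate symbol type via OverloadedStrings:

  -- Add a public definition to the current name-space, then observe
  -- that lookup can see it again.
  addAndFind ::
    Definition term ty sumRep ->
    T term ty sumRep ->
    Maybe (From (Definition term ty sumRep))
  addAndFind def ctx =
    let ctx' = add (NameSpace.Pub "myDef") def ctx
     in ctx' !? "myDef"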

Remove

  remove ::
    NameSpace.From Symbol ->
    T term ty sumRep ->
    T term ty sumRep

remove name context removes the name from the public or private name-space of the current name-space.

InNameSpace

  inNameSpace :: NameSymbol.T -> T term ty sumRep -> Maybe (T term ty sumRep)

inNameSpace name context is the primitive way in which we change the current name-space. Like lookup, we use the standard symbol resolution rules to determine which module we resolve to. If the module is not found, then we simply return Nothing. Likewise, if the definition we land on is not a record/module, we also return Nothing. If all is successful, then we end up with a new context with a new current name-space.

ModifySpaceImp ∧ type Stage ∧ type Return

  data Stage b
    = -- | 'Final' signifies the last symbol that we
      -- pursue in updating a structure
      Final b
    | -- | 'Continue' signifies that there are parts of
      -- the namespace that we can still continue down
      Continue b

  data Return term ty sumRep
    = -- | 'GoOn' signifies that we should continue
      -- going down records
      GoOn (Record term ty sumRep)
    | -- | 'Abort' signifies that we should cancel
      -- the changes on the map and abort the operation
      Abort
    | -- | 'UpdateNow' signifies that we should
      -- update the context with the current value
      UpdateNow (Definition term ty sumRep)
    | -- | 'RemoveNow' signifies that we should remove
      -- the definition at this level
      RemoveNow

  modifySpaceImp ::
    Monad m =>
    ( Stage (Maybe (Definition term ty sumRep)) ->
      m (Return term ty sumRep)
    ) ->
    NameSymbol.T ->
    T term ty sumRep ->
    m (Maybe (T term ty sumRep))

modifySpaceImp f sym context is a rather important operation; some forms like Empty even rely on it! modifySpaceImp is responsible for implementing the behavior for known symbols exactly as the Symbol Resolution section lays it out, and for unknown symbols as the Rules for Unresolved Symbols section lays it out.

The Maybe it takes indicates whether the symbol is resolved or not.

We decided to include the Return and Stage types here, as they both talk about something integral to the design of this function. The Return type describes the range of operations we care about; this limits the kinds of operations modifySpaceImp can do. The Stage, on the other hand, tells the operation function f whether we are at the final symbol in the resolution or still going down into nested structures to find the proper symbol. Many operations do something special at the very end, and so we often care about what stage we are at.

Finally, our function returns in some monad m; this is just to be general enough to accept any kind of function. As the name suggests, there is a non-impure version of this, which simply runs this more general one in the Identity monad.
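Given that remark, the pure variant is presumably just a thin wrapper; a sketch (the exact name and module may differ):

  import Data.Functor.Identity (Identity (..))

  -- Run the general monadic version in the Identity monad to get a
  -- pure variant, exactly as described above.
  modifySpace ::
    (Stage (Maybe (Definition term ty sumRep)) -> Return term ty sumRep) ->
    NameSymbol.T ->
    T term ty sumRep ->
    Maybe (T term ty sumRep)
  modifySpace f sym ctx =
    runIdentity (modifySpaceImp (Identity . f) sym ctx)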

Derived Operations

These operations aren't derived purely from the primitive operations; however, they are quite important for the operation of the context. Some even have special rules one must pay attention to.

SwitchNameSpace

  switchNameSpace ::
    NameSymbol.T -> T term ty sumRep -> IO (Either PathError (T term ty sumRep))

switchNameSpace name context is an important operation. It creates and switches to the name-space even if it doesn't exist! The rules for creation are exactly those laid out in the Symbol Resolution section.

The creation of new name-spaces is in IO, and so the operation is in IO as well.
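A usage sketch:

  -- Switch to (creating, if necessary) the Foo.Bar name-space,
  -- falling back to the unchanged context on a PathError.
  inFooBar :: T term ty sumRep -> IO (T term ty sumRep)
  inFooBar ctx = do
    switched <- switchNameSpace "Foo.Bar" ctx
    case switched of
      Right ctx'      -> pure ctx'
      Left _pathError -> pure ctx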

5.6 - Pipeline

Documentation on the Juvix compiler pipeline

Pipeline

This section provides an overview of the compiler pipeline, from a source text file to the specific backend calls to produce the binary.

pipeline.png

The in-code explanation in that image can itself be boxed into what we see in app/main.hs:

  Compile fin fout (Michelson backend) ->
    liftIO (readFile fin)
      >>= Pipeline.parse
      >>= Pipeline.typecheck @Pipeline.BMichelson
      >>= Pipeline.compile @Pipeline.BMichelson
      >>= Pipeline.writeout fout
      >>= liftIO . print

Parsing

The first set of phases to happen is that the text file is turned into a structured document, namely an ML AST.

Transition

This is a long phase of the compiler where we do a few things.

  1. The first is that we desugar the expressions using baked-in expansions. This happens for function definitions, punned arguments, etc.

  2. After that, we then take the desugared s-expressions and put them in our global "context". We do some extra processing in this stage as well like resolving what modules open each other.

    • A context is like a primitive image. It can access all the information in a nested hashtable of sorts.

  3. We still have a few more passes to run, as phases like symbol resolution and infix resolution can be run now.

  4. Finally we have one last pass, translating what is left of the S-expression AST into a format that Core understands.

Core

Currently in our compiler the Pipeline.typecheck and Pipeline.compile functions are backend-specific; however, the general transformations are similar regardless of the backend.

HR

This layer of Core stands for Human Readable, as we have variable names instead of hairy de Bruijn indices. This phase is short-lived, as most processing at this level wants to work on some extension of the Internal Representation, since names create issues for the various algorithms.

IR

This is where the main process of type checking happens.

  • Names are reduced to de Bruijn indices.

  • We have many extensions with metadata that fluctuate depending on what pass of core is run.

Eliminations

In this phase of Core, we remove the types from the terms, as they don't serve any computation purpose. By this point we've done most of the processing we need.

In the future I suspect inlining algorithms will either be done here or in IR right above.

Backends

Michelson

LLVM

Arithmetic Circuit

5.6.1 - Idealized Pipeline

An Idealized version of our Core pipeline

This page covers what an idealized version of our core pipeline should look like for the time being.

The sections are laid out as follows.

  • The Pipeline defines the overall structure of the pipeline, with comments between each step.

  • Pipeline Function as a BNF with Examples gives a longer-form BNF along with examples and commentary about each function. Further, at the bottom of each section is a link to the reference for any literature that may or may not be relevant.

  • The Outstanding section raises open questions about our passes that we don't yet have good answers to.

  • The Exhibition section gives further examples of a few key points noted elsewhere in this document.

  • The References section is where further links about any related subjects go.

Pipeline

  patternMatchPipeline :: [(Pats, RHS)] → Type → ???
  patternMatchPipeline patternClauses type' =
    terminationChecking patternClauses type'
    |> corePipeline

  -- The @corePipeline@ works over a context and an expression. The
  -- context is preserved throughout the computation as it's needed for
  -- many of these steps
  corePipeline :: Context → Expression → ???
  corePipeline form =
    -- Need to lambda lift pattern matching let abstractions, we don't
    -- worry about them here, instead we have the corePipeline go over
    -- those
    lambdaLiftLocalMatches form
    -- Local lets are simply just have guards left, we can remove both
    -- the top level guards here, and the let local guards.
    -- Once we realize this, we can also notice that lets can only be
    -- a single form.

    -- TODO ∷ we currently don't have guards going down to this level, we
    --        discharge it as one of our first desugar passes
    |> removeGuardsAndSimplifyLetToSingleCase
    -- Here we turn lets which have arguments into letting a name over
    -- multiple lambdas
    |> localLetsIntoLambdas
    -- Move OUT ∷ Desugar pass/Macro pass once we have that
    |> openExpressionIntoLetOverOpenName
    -- Has to resolve inner modules, so we have to treat namesymbols as a
    -- term for forms in the closure
    -- see exhibition ⒈
    |> hrToIR
    -- This step should be moved to the effect block of typeCheckEvalLoop
    -- when we support first class opens
    |> nameResolution
    -- Here we have to run infixResolution after nameResolution
    |> infixResolution
    -- toANF runs an algorithm that takes the current form, and creates a valid ANF
    -- ANF algo should run before ‘interactiveMode‘, so we can run algebraic effects
    |> toANF
    |> branch [pipelineTypedMacro, pipelineUntypedMacro]
    -- either go to interactiveMode or proceed to compilation
    -- eraseZeroUsages looks a the current form, and removes any
    -- values/terms that have zero usages attached to them
    |> eraseZeroUsages
    -- TODO ∷
    -- See Reference Inlining
    -- for some good analysis on when to inline
    -- Currently we inline all globals, which excludes pattern matching functions
    |> inlineAlgorithm
    -- For the backends we need names rather than de bruijn indices to
    -- compile. This step gives us back names
    |> restoreNameFromDeBruijn
    -- TODO ∷
    -- See Reference Fast Curry
    -- Here we want the eval-apply algorithm from the reference
    -- This will change the code to a form that can explicitly talk about currying.
    |> fastCurry
    -- We currently have this form. This computes the capture environment
    |> captureLambdaEnvironment
    -- In Michelson we want to branch off before this
    -- This pass we want to explicitly break a lambda into an
    -- environment-closure and the binder itself
    |> lambdaIntoExplicitBinderAndClosure


  pipelineUntypedMacro form =
    mutual [macroExpand, eval] form
    -- This is a mutually recursive step, due to how haskell works, we
    -- have to put it as 1 step, however if we could, we should be able
    -- to break it up and put labels to show it's explicitly recursive.
    |> mutual [typeCheckEvalLoop, interactiveMode]

  -- Requires research on
  pipelineTypedMacro form =
    -- If macros are typed, then it's recursive with typeCheck
    -- meta programming constructions inside the context, so Witch can use it
    mutual [typeCheck, macroExpand, eval, interactiveMode] form

  -- Interactive mode is a poor man’s REPL
  -- Initially, it displays the results from typechecking and then it
  -- restarts the whole process again
  interactiveMode form =
    -- return the result of the evaluation, and a derivation tree
    -- optionally read some code and return
    showResultAndDerTree form
    input <- readCode
    juvixPipeline input

Pipeline Function as a BNF with examples

Lambda Lift the Local Matches

  • BNF

      input-expression ::=
          let <let-pat-input>+ in <expression>
        | ...
    
      output-expression ::=
          let <let-name-output>+ in <expression>
        | ...
    
      let-pat-input   ::= <name> <pattern>* <guard?>
      let-name-output ::= <name> <name>* <guard?>
    
      guard? ::= \| <expression> = <expression>
               | <expression>
  • Examples

      (** Example 1 *)
    
      (* Input *)
      def example =
        let
          foo (Just x) = 5
          foo Nothing  = 6
        in foo (Just ())
    
      (* Output *)
    
    
      (* generated by the pass *)
      def #g:unique-name (Just x) = 5
          #g:unique-name Nothing  = 6
    
      def example =
        #g:unique-name (Just ())
    
    
      (** Example 2 *)
    
      (* Input *)
      def example =
        let foo x = 5 + x in
        foo 3
    
      (* Output *)
      def example =
        let foo x = 5 + x in
        foo 3
    
      (** Example 3 *)
    
      (* Input *)
      def example =
        let foo {x, y} = y + x in
        foo {x = 3, y = 5}
    
    
    
      (* Output *)
      (* generated by the pass *)
      def #g:unique-name {x, y} = y + x
    
      def example =
        #g:unique-name {x = 3, y = 5}
  • Description

    The transformation takes any let expression whose <pattern>*s perform any form of pattern matching and turns it into top-level declarations that are uniquely named for use where the previous let expression was used.

    If the let form has only simple patterns, then we leave it untransformed; this can be seen in Example 2 in the above code.

  • Possible Complications

    • We have to lambda lift extra items which may not be related to this transform; see this example:

        (* Input *)
        def example =
          let
            def bar x = x
            def foo (Just x) = bar 5
                foo Nothing  = bar 6
          in foo (bar (Just ()))
      
      
        (* Output *)
      
        def #g:unique-bar x = x
      
        def #g:unique-foo (Just x) = #g:unique-bar 5
            #g:unique-foo Nothing  = #g:unique-bar 6
      
        def example = #g:unique-foo (#g:unique-bar (Just ()))
  • Literature

    See Lambda Lifting

Remove Guards, and Multiple lets to single let

  • NOTE: this means multiple let clauses for the same function, not nested lets

  • BNF

      ;; Input
      input-expression ::=
          let <let-name-input>+ in <expression>
        | ...
    
      input-declaration ::=
          def <def-pattern-input>+
        | ...
    
      ;; Output
      output-expression ::=
          let <name> <name>* = <expression> in <expression>
        | ...
    
      output-declaration ::=
          def <def-pattern-output>+
        | ...
    
      ;; Helpers
      def-pattern-input ::= <name> <pattern>* <guard?>
    
      def-pattern-output ::= <name> <pattern>* <expression>
    
      let-name-input ::= <name> <name>* <guard?>
    
      guard? ::= \| <expression> = <expression>
               | <expression>
  • Examples

      (** Example 1 *)
    
      (* Input *)
      def add-3 x =
        let bar y = 3 + y in
        bar x
    
      (* Output *)
      def add-3 x =
        let bar y = 3 + y in
        bar x
    
      (** Example 2 *)
    
      (* Input *)
      def add-3-not-on-0 x =
        let compute y
             | y == 0 = Nothing
             | else   = Just (y + 3)
        in compute x
    
      (* Output *)
      def add-3-not-on-0 x =
        let compute y = if (Just (y + 3)) Nothing (y == 0)
        in compute x
    
      (** Example 3 *)
    
      (* Input *)
      def add-3-not-on-0 x =
        let compute y
             | y == 0 = Nothing
            compute y = Just (y + 3)
        in compute x
    
      (* Output *)
      def add-3-not-on-0 x =
        let compute y = if (Just (y + 3)) Nothing (y == 0)
        in compute x
    
    
      (** Example 4 *)
    
      (* Input *)
      def add-3-unless-1-or-0 x
        | x == 0 = 0
        | x == 1 = 1
        | else   = x + 3
    
      (* Output *)
      def add-3-unless-1-or-0 x =
        if 0
           (if 1
               (x + 3)
               (x == 1))
           (x == 0)
    
      (* If is defined as follows *)
      (* about usages, consult the following table about
         what we can do usage wise
         cond
           first-class-usages? = if' (* comedy *)
           weakening?          = 1
           else                = ω *)
      sig if : {P : ((x : Bool) -0-> *))}
            -0-> t : P True
            -ω-> f : P False
            -ω-> b : Bool
            -1-> P b
      let if {Proof} then-case else-case True  = then-case
          if {Proof} then-case else-case False = else-case
  • Description

    Here we do the majority of the work of removing guards from lets. We can notice that lets and defs can both, at this point, desugar their guard checks into dispatching to the if destructor function given above.

    We can also notice that after we run this, there can be no more lets with multiple patterns as both guards and matches have been handled now.

  • Possible Complications

    • We already handle guards in the desugar passes. Currently it just sees the guards as Cond's and desugars them into cases. This is, however, incorrect, as pattern matching has a fall-through behavior that this does not respect.

      • This can be solved with a proper pattern matching algorithm

    • This algorithm may be better served by the linearization algorithm given by the pattern-match simplification. I'm not sure this is always possible in the def case; for the simple lets we currently have, however, it works as a strategy.

    • This could maybe also be served better with a with pattern through some strategy. See the following.

      
        (* Input *)
        sig foo : maybe (maybe int) -1-> int
        def foo (Just (Just x)) | x == 0      = 3
            foo (Just x)        | x == Just 3 = 5
            foo _                             = 0
      
        (* output *)
        sig foo-inner : maybe (maybe int) -1-> bool -1-> bool -1-> int
        def foo-inner (Just (Just x)) true _    = 3
            foo-inner (Just x)        _    true = 5
            foo-inner _               _    _    = 0
      
        sig foo : maybe (maybe int) -1-> int
        def foo m@(Just (Just x)) = foo-inner m (x == 0) false
            foo m@(Just x)        = foo-inner m false    (x == Just 3)
            foo m                 = foo-inner m false    false

Local Lets Into Lambdas

  • BNF

      input-expression ::=
          let <name> <name>* = <expression> in <expression>
        | ...
    
      output-expression ::=
          let <name> = <expression> in <expression>
        | ...
  • Examples

      (** Example 1 *)
    
      (* Input *)
      def foo =
        let bar a b c = a + b + c in
        bar 1 2 3
    
      (* Output *)
      def foo =
        let bar = λ a ⟶ (λ b ⟶ (λ c ⟶ a + b + c)) in
        bar 1 2 3
  • Description

    Here we just take the given let form and transform it into its pure lambda form for the body.

  • Possible Complications

    • The type may need to be explicitly given, thus forcing us to give a signature with our bar.

Open Expression into Let Over Simple Open

  • BNF

      input-expression ::= open <expression> in <expression>
                         | ...
    
      output-expression ::= open <name> in <expression>
                          | ...
  • Examples

      (** Example 1 *)
    
      (* Input *)
      def foo =
        open bar in
        x + y
    
      (* Output *)
      def foo =
        open bar in
        x + y
    
      (** Example 2 *)
    
      (* Input *)
      def foo xs =
        open (fold building-up-a-module xs : mod {}) in
        x + y
    
      (* Output *)
      def foo xs =
        let sig #:gensym 1 : mod {}
            def #:gensym = fold building-up-a-module xs
        open #:gensym in
        x + y
  • Description

    Here we want any non-trivial open converted into a form where we let a name bind to the complicated expression and then have that name be used once in the open expression.

  • Possible Complications

    • There is an outstanding question of how opens relate to de Bruijn indices.

      • Maybe name resolution can help sort this out?

    • Type resolution of the module is also an interesting question to answer as well.

  • Literature

HR to IR

  • BNF

      term-input ::= λ <name> ⟶ <expression>
                   | let <usage> <name> ...
                   | sig <usage> <name> ...
                   | pi  <usage> <name> ...
                   | ...
    
      term-output ::= λ ⟶ <expression>
                    | let <usage> = <expression>
                    | sig <usage> ...
                    | pi  <usage> ...
  • Examples

      (** Example 1 *)
    
      (* Input *)
      def foo a b =
        let add-them = a + b in
        λ d  d + add-them
    
      (* Output *)
      def foo (global-pat 0) (global-pat 1) =
        let (global-pat 0 + global-pat 1) in
        λ ⟶ indicy 0 + indicy 1
    
      (** Example 2 *)
    
      (* Input *)
      def foo (Just a) (List b c d) =
        let add-them = a + b in
        λ d : (Π 0 (x : Nat.t) Nat.t) ⟶ d + add-them + c + d
    
      (* Output *)
      def foo (Just (global-pat 0))
              (List (global-pat 1) (global-pat 2) (global-pat 3)) =
        let (global-pat 0 + global-pat 1) in
        λ : (Π 1 Nat.t Nat.t) ⟶ indicy 0 + indicy 1 + global-pat 2 + global-pat 3
    
      (** Example 3 *)
    
      (* Input *)
      def foo (Just a) (List b c d) =
        let add-them = a + b in
        λ typ : (Π 0 (typ : ×₀) (Π 1 typ typ)) ⟶
          λ d ⟶
            d + add-them + c + d
    
      (* Output *)
      def foo (Just (global-pat 0))
              (List (global-pat 1) (global-pat 2) (global-pat 3)) =
        let (global-pat 0 + global-pat 1) in
        λ : (Π 0 ×₀ (Π 1 (indicy 0) (indicy 1))) ⟶
         λ ⟶
           indicy 0 + indicy 2 + global-pat 2 + global-pat 3
  • Literature

    See De Bruijn

  • Description

    Here we simply run a standard de Bruijn algorithm. In our code base this is actually already fully implemented.

  • Possible Complications

    • Bug: see issue #864; namely, we are inconsistent about how we handle recursive lets

      • Local recursive functions have to be lambda lifted per discussion

    • How do we treat opens in terms of de Bruijn indices?

Name Resolution

  • BNF

      input-expression ::= open <name> in <input-expression>
                         | ...
      ;; remove open
      ;; Note that in more complex cases this may not be possible, so we may
      ;; not have any bnf changes in this pass
      output-expression ::= ...
  • Examples

      (** Example 1 *)
    
      (* Input 1 *)
    
      (* Written by the user, which is part of the env by now *)
      open Prelude
    
      (* Input we see at this level *)
      sig add-list : Int.t -> List.t Int.t -> Int.t
      def
        add-list (global-pat 0) (Cons (global-pat 1) (global-pat 2)) =
          open Michelson in
          (* + is in Michelson *)
          let global-pat 0 + global-pat 1 in
          add-list (indicy 0) (global-pat 2)
        add-list (global-pat 0) Nil =
          global-pat 0
    
      (* Output *)
      sig add-list :
        Prelude.Int.t Prelude.-> Prelude.List.t Prelude.Int.t Prelude.-> Prelude.Int.t
      def
        add-list (global-pat 0) (Prelude.Cons (global-pat 1) (global-pat 2)) =
          open Prelude.Michelson in
          (* + is in Michelson *)
          let global-pat 0 Prelude.Michelson.+ global-pat 1 in
          add-list (indicy 0) (global-pat 2)
        add-list (global-pat 0) Prelude.Nil =
          global-pat 0
  • Description

    Here we run name resolution. For top-level opens this is rather simple, as our current Contextify step notes top-level opens. Thus what needs to be respected is how opens affect the closure, and from that we determine what references outer names.

  • Possible Complications

    • To get complex opens working, we need to do some form of type checking. Thus this pass might have to be integrated with some level of type checking.

  • Literature

Infix Resolution

  • BNF

      input-expression ::= <input-expression> <op> <input-expression>
                         | ...
    
      output-expression ::= ...
  • Examples

      (** Example 1 *)
    
      (* Input *)
      (* * has precedence left 6  *)
      (* + has precedence 5  *)
      def foo = a * b * d + c
    
      (* Output *)
      def foo = + (* (* a b) d)
                  c
  • Description

    Here we have to resolve infix symbols into prefix symbols given some precedence table. The easiest way to do this is to modify the Shunt Yard algorithm slightly for the constraints of Juvix. Thankfully this is already implemented; a small sketch is given after this list.

  • Possible Complications

    • One has to be careful about how infixl, infixr and infix all interact with each other.

  • Literature

    See Shunt Yard
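As promised above, here is a minimal shunting-yard sketch in Haskell over an already-tokenised expression, assuming a fixed precedence table; the real pass works over the S-expression form with a context-derived table:

  data Assoc = AssocL | AssocR deriving (Eq)

  data Tok = TAtom String | TOp String

  -- Hypothetical fixed table; the real one comes from the context.
  prec :: String -> (Int, Assoc)
  prec "*" = (6, AssocL)
  prec "+" = (5, AssocL)
  prec _   = (9, AssocL)

  -- Convert infix tokens to reverse Polish order with an operator stack.
  shunt :: [Tok] -> [String]
  shunt = go [] []
    where
      go out ops [] = reverse out ++ ops
      go out ops (TAtom a : ts) = go (a : out) ops ts
      go out ops (TOp o : ts) =
        let (popped, kept) = span (bindsTighter o) ops
         in go (reverse popped ++ out) (o : kept) ts
      -- Pop o' while it binds tighter than o, or equally tightly
      -- when o is left-associative.
      bindsTighter o o' =
        let (p, a)  = prec o
            (p', _) = prec o'
         in p' > p || (p' == p && a == AssocL)

On the example above, shunt turns a * b * d + c into a b * d * c + in reverse Polish order, i.e. (+ (* (* a b) d) c).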

To ANF

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Mutual Block

Type Checker

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Witch

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Erase Zero Usages

  • BNF

      ;; No BNF changes
  • Examples

      (** Example 1 *)
      (* Example may be artificial *)
      (* Input *)
      sig linear-comb-pos-nats
        : Π 1 Nat.t
             (Π 1 Nat.t
                  (Π 2 Nat.t
                       (Π 0 (>= (indicy 0) 1)
                            (Π 0 (>= (indicy 2) 2)
                                 (refinement Nat.t (> (indicy 0) 1))))))
      (* global-pat 3 and global-pat 4 are the 0 usage proofs *)
      def linear-comb-pos-nats (global-pat 0)
                          (global-pat 1)
                          (global-pat 2)
                          (global-pat 3)
                          (global-pat 4) =
        let ( * (global-pat 2) (global-pat 1)) : 1 Nat.t in
        (* Let's prove that our result is at least (max (arg 1) (arg 2)),
           using our proofs of (arg 3) and (arg 4) *)
        let times-by--is-greater (indicy 0) (global-pat 3) (global-pat 4) : 0 Nat.t in
        refinement (+ (indicy 1) ( * (global-pat 0) (global-pat 2))) (indicy 0)
      (* Output *)
    
      sig linear-comb-pos-nats
        : Π 1 Nat.t
             (Π 1 Nat.t
                  (Π 2 Nat.t
                       (Π 0 (>= (indicy 0) 1)
                            (Π 0 (>= (indicy 2) 2)
                                 (Refinement Nat.t (> (indicy 0) 1))))))
      (* global-pat 3 and global-pat 4 are the 0 usage proofs *)
      def linear-comb-pos-nats (global-pat 0)
                          (global-pat 1)
                          (global-pat 2)
                          (global-pat 3)
                          (global-pat 4) =
        let ( * (global-pat 2) (global-pat 1)) : 1 Nat.t in
        (+ (indicy 0) ( * (global-pat 0) (global-pat 2)))
  • Description

    In this pass we want to remove any code which has 0 usages. In particular, we can see in Example 1 that the proof that the number is greater than 1 is fully erased.

    However, it is important to note a few points. We did not erase the passing of extra arguments to linear-comb-pos-nats. This is because these arguments are still passed on the application side, both in existing code and in any incremental compilation passes; these arguments are simply ignored. Alternatively, we could choose a strategy where the 0-usage arguments are noted, with a shim layer that transforms arguments before application to functions with 0-usage arguments.

  • Possible Complications

    • What to do with 0 usage function arguments. See Description for another method of doing this.

    • What do we do about top-level records usage-wise? We need the 0-usage arguments, so we have to treat them in an interesting way.

    • We need to preserve signatures which only partially care about usage erasure.

  • Literature

Inline Algorithm

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Restore name from De Bruijn indicies

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Fast Curry

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Capture Lambda Environment

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Lambda Into Binder and Closure

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

Outstanding questions

How to Deal with Expansion?

Do we need eval? How does this reflect in the pipeline?

Types of Macros?

How do we properly type it?

Algebraic effects without Algebraic effects?

We want to structure the typeCheckEvalLoop as follows

  throw CantResolveSymbol ~> nameResolution

  -- typeChecking does
  -- Hole insertion, unifier, Type Checking, nameResolution
  typeCheckEvalLoop =
    doesAthing
      {resolverEffects
          = {
            -- filling holes
            , UnifyImplicits    ~> unifier
            -- Creating holes for the unifier to fill
            , InsertImplicit    ~> implicitInsertion
            -- For evaling terms
            , Evaluate          ~> eval
            -- For first class Name Resolution
            , CantResolveSymbol ~> nameResolution
            }}

Exhibition

⒈ hrToIR

def bar =
  let
    mod baz =
      mod biz =
        def foo = 3.
  baz.biz.foo
   ~> 0.biz.foo
   ~> resolve-symbol (resolve-symbol foo biz) 0

⒉ openExpressionIntoSimpleOpenPlusLet

def bar =
  let mod bar =
      def foo = 3.
  let #g:gensym = (bar djfka djfkas dfjak)
  open #g:gensym ~> open with type {foo}
  resolve-symbol foo #g:gensym.

def bar =
  let mod bar x y z =
        def foo = 3.
  let #g:gensym = (bar djfka djfkas dfjak)
  open #g:gensym ~> open with type {foo}
  resolve-symbol foo #g:gensym.


def bar =
  let mod bar x y z =
        def foo = 3.
  open (fold function-generating-a-module xs : mod {…})
  resolve-symbol foo #g:gensym.

def bar =
  let mod bar x y z =
        def foo = 3.
  let def #g:gensym =
        fold function-generating-a-module xs : mod {…}
  open-simple #g:gensym
  resolve-symbol foo #g:gensym.

References

Inlining

De Bruijn Indices

  1. Benjamin C. Pierce: Types and Programming Languages, chapter 6, pages 75-80

Template

  • BNF

  • Examples

  • Description

  • Possible Complications

  • Literature

5.7 - Backends

Juvix supports backends for generating code for different targets (Michelson, LLVM, etc.). Juvix programs are written in terms of primitives, i.e. values and types that are directly available in the target language. This allows us to abstract away from target-specific code by replacing these primitives with ones from other targets, but at the same time gives enough flexibility to execute target-specific operations if necessary.

Connection with pipeline

The backend implements several datatypes and class-instances that connect to the pipeline of Juvix.

Main backend instance

An instance of HasBackend, that implements the main connection to the pipeline:

  • The datatypes used for types, values and errors (Ty, Val, Err respectively);

  • stdlib, the Juvix source file(s) that make up the standard library for this backend;

  • Optionally, an implementation for the parse function;

  • Optionally, an implementation for the typecheck function;

  • An implementation for the compile function.
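Schematically, hooking a backend into the pipeline then looks roughly like this; everything below is illustrative only, as the exact class methods and signatures live in Juvix's pipeline modules:

  -- A purely illustrative shape, not the real class or method names.
  data BMyTarget = BMyTarget

  instance HasBackend BMyTarget where
    type Ty  BMyTarget = MyPrimTy   -- primitive types
    type Val BMyTarget = MyPrimVal  -- primitive values
    type Err BMyTarget = MyErr      -- backend-specific errors

    -- Standard-library sources shipped with this backend.
    stdlibs _ = ["stdlib/MyTarget.ju"]

    -- parse and typecheck have defaults that we could override here.

    compile fout term = emitMyTarget fout term -- assumed emitter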

Parameterisation

We need to inform the Juvix compiler of some basic information about our backend, such as:

  • A function for typechecking primitives: hasType.

  • Lists of builtin values and types: builtinTypes and builtinValues.

  • Functions for creating primitives based on literal integers, floats or strings: stringVal, intVal and floatVal.

Substitution and weakening

For compilation, the backend takes a representation of the input program that is very close to a primitive-extended lambda calculus with multi-argument lambdas. As the primitives of the target language can themselves be functions, for example basic arithmetic operators, we can normalize these terms as well. For this we need to instantiate three classes for the primitives:

  • HasSubstV: For substitution of terms.

  • HasWeak: For weakening of terms (accounting for removal of a lambda abstraction).

  • HasPatSubstTerm: Substitution of patterns.

Each of these is documented in more detail on the evaluator page.

Additionally, to make applying terms possible during substitution, the CanApply class needs to be instantiated for primitive terms and types.

5.7.1 - LLVM

Note that this description is merely a design, and does not reflect the current state of the LLVM backend yet. The design and implementation might change in the future.

Introduction to LLVM

LLVM is an intermediate language and collection of compiler tools. The intermediate language can be compiled to several backends including AMD64 and Web Assembly, giving us a lot of flexibility.

Structure of the backend

Primitives

The LLVM backend uses an intermediate representation for primitives that are specific to LLVM: RawPrimVal for values and functions, and PrimTy for types. Each of these have a rather direct relation to constructs available in LLVM.

Application

It is possible that primitives are terms that can be reduced further, i.e. addition of two constants can be replaced by a single constant. To support this, the LLVM backend implements application of terms by instantiating the CanApply class for both types and values.

Parameterisation

The parameterisation of the LLVM backend is straightforward and follows the implementation of the Parameterisation datatype:

  • hasType is a function that checks the types of raw primitive values. Currently the typechecking is very basic: it just checks whether the given types match the primitive value in arity. For operator primitives, we check if the types of the arguments are equal. This implementation will definitely have to be refined in the future.

  • builtinTypes and builtinValues just provide a mapping from source code literal names to the actual literals.

  • integerToRawPrimVal creates a literal based on an integer. Currently it doesn't take the size of the literal into account, as integer literals always share the same type. This should be improved in the future.

  • There are currently no implementations for stringVal and floatVal, as the backend does not support these types yet.

Substitution and weakening

For simplicity, the current HasWeak, HasSubstValue and HasSubstTerm instances do not do anything at all. This will likely change in the future.

Connection to the pipeline

The LLVM instance of HasBackend relies on the default for the parse function. Additionally:

  • The typecheck function relies on the general typechecker implementation by calling the typecheck' function, together with the parameterisation and the LLVM representation of a set type.

  • The compile function gets a filename and an annotated term (AnnTermT) as the program. The annotated term with typed primitives is translated to an annotated term with raw primitives (RawPrimVal). These are then translated to LLVM code.

Compilation

With the input program represented by the annotated Term, we can now easily translate it to LLVM code. We rely mostly on the llvm-hs-pure package, as it provides a nice monadic interface, LLVM.IRBuilder, that keeps track of variables for us. Therefore, the implementation is rather straightforward:

  • The whole program is translated to an LLVM module by the compileProgram function. This module contains a hardcoded main function, with the type taken from the outermost annotated term.

  • The term itself is compiled by the compileTerm function, using recursion for any sub terms.

  • For any primitive functions, we check if enough arguments are available, and if so the primitive function is written, with unnamed references to the arguments all taken care of by the IRBuilder monad.

Each of these functions is called in its own part of the pipeline: the output of parse is the input of typecheck, and the output of typecheck is the input of compile.
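As a feel for the IRBuilder style described above, here is a standalone sketch (not Juvix's actual compileProgram) that emits a module whose main returns 20 + 22, with the monad managing the unnamed references; it assumes the llvm-hs-pure API (buildModule, function, add, ret, int32):

  {-# LANGUAGE OverloadedStrings #-}

  import LLVM.AST (Module)
  import LLVM.AST.Type (i32)
  import LLVM.IRBuilder.Constant (int32)
  import LLVM.IRBuilder.Instruction (add, ret)
  import LLVM.IRBuilder.Module (buildModule, function)

  -- Builds: define i32 @main() { %1 = add i32 20, 22; ret i32 %1 }
  demo :: Module
  demo = buildModule "demo" $
    function "main" [] i32 $ \_ -> do
      x <- add (int32 20) (int32 22)
      ret x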

References

6 - Tutorials

A place to show off plenty of examples and tutorials about Juvix

A place to see all kinds of tutorials about Juvix

6.1 - Writing Proof with Witch

An Article About using Witch

Writing proofs in Juvix

To popularise software verification, we must make proof writing less of a burden on you, the user. Currently, only specialists and a few others can write proofs for their programs, even though you surely understand the domain and invariants of your projects – otherwise you wouldn’t be able to write any software at all.

Our hypothesis is that programmers are well-equipped to derive and prove properties of the software they write, but they lack the mathematical maturity and vocabulary to carry out a formal proof. We have evidence that students fail to produce well-made proofs due to the lack of mathematical maturity, even though they do understand the subject matter at hand.

We believe that program verification doesn’t have to be like this, and we did something about it. We created an assistant tool for theorem provers that puts together many different strategies for proof automation. Our specification for such an assistant for Juvix uses algebraic effects and handlers as the means of congregation.

The approach to facilitate program verification in Juvix is named Witch, a play on the name of assistant tools colloquially called “wizards”. There is no consensus of what a wizard is or what the exact tasks it is supposed to assist with. Wizards seem to be used mostly for multiple-step and/or configuration features, however. We went for the name “witch” to align it to the idea of assistant tools, while dodging the overloaded and confusing terminology.

We propose an effects and handlers view of such proofs, based on prior work developed on the Andromeda prover.
In Juvix, you program as you would normally and invoke the proof environment as an effect to prove certain properties as you go. Given that the proof environment is just an effect, different proof styles (e.g., Agda-style dependent types, SMT solver, proof search) can be composed under a shared interface, a proof object that can manipulate itself while querying different automated provers.

The reasons we employ algebraic effects and handlers are numerous:

  1. as proofs cannot be fully automated, all approaches that try to automate the process (e.g., proof search, SMT solvers) may be non-deterministic or may never find a solution. Therefore, the system should be able to handle “impure” computations and errors. Algebraic effects and handlers have well-defined semantics and provide a simple interface for performing effects. With them, we avoid indiscriminate effects that are often error-prone, and intricate effects machinery such as monad transformers;

  2. the semantics of effects and handlers accommodate composition of arbitrary effects and the composition of multiple handlers, which means users have the ability to weaken more general strategies into specific ones while maintaining top-level handlers unmodified;

  3. effects and handlers have well-defined semantics, and it is the main feature of many new languages, such as Koka, Eff, Frank and Effekt, and have implementation in all major functional programming languages. This guarantees that our approach is based in research and real-world experiences, and not something we hacked together over the weekend.

The Essence of Witch

As for the syntax, we use operation { params } via handler, which is semantically equivalent to but a syntactic improvement over handler(operation, params), since effect handling is similar to function application, but also carries effect information. The user defines data types and functions as usual, and then uses Witch to prove properties about said definitions. The examples below show it in action:

data ℕ : Type where
  0 : ℕ
  s : ℕ → ℕ

_+_       : ℕ → ℕ → ℕ
0 + n     = n
(s m) + n = s (m + n)

+comm         : ∀ (x y : ℕ) → Witch x + y ≡ y + x
+comm 0 y     = tactic { 0 + y ≡ y + 0 } via Ring
+comm (s x) y = search { x + (s y) ≡ s (x + y) } via Backtracking


++assoc         : ∀ {A : Set} (x y z : [A]) → Witch (x ++ y) ++ z ≡ x ++ (y ++ z)
++assoc ∅ y z   = solve { (∅ ++ y) ++ z ≡ ∅ ++ (y ++ z) } via SMT
++assoc (x ∷ xs) y z rewrite ++assoc xs y z
                = pure refl

The proofs of commutativity of addition on natural numbers and of associativity of list concatenation are shown; they use the three main effects: Solver, Proof Search and Tactics. In the proof assistant literature, there exists no precise definition of the commonly used terms “solver”, “proof search” and “tactics”. All of these terms are used in different communities and mean some variation of “an automatic strategy to construct a term under certain constraints”.

In Witch, however, we use the following definitions of previously known terms:

  • The Solver effect is used for integration with external solvers via IO; we believe it should suffice for the user to write, e.g. SMT, and handlers should implement internal strategies to choose between the different theories supported by solvers. If the black-box approach to solvers presents itself a challenge, a specialisation of the handler is possible, e.g. operation { params } via Z3.QF-UFLIA .

  • The Proof Search effect is used for library-level algorithms; users may choose to implement their own algorithms using a limited set of meta-programming¹ constructs that are handled at top-level².

  • The Tactics effect is used for strategies that simplify the goal by at least one step and may not complete all proof goals. This style is mainly used in the proof assistant Coq. While we do not foresee implementing as many tactics as Coq has, we believe tactics such as eauto, rewrite, and unfold are useful to users of all levels of expertise.

  • Lastly, the Error effect is used for feedback to the user, since any of the strategies may fail. Error has two operations, throw and trace: the former notifies the user that a strategy has failed, while the latter registers³ completed sub-goals during the strategy’s attempt to complete the proof.

Witch for Juvix is currently under development, and we hope to release its first version soon. :)

¹ By meta-programming, we mean “code that manipulates itself”, and not “programming that happens at a higher level of abstraction”. For dependently typed programming languages, the usual term is reflection. However, we prefer not to use reflection, since it has a different meaning in terms of effectful computations.

² The meta-programming constructs are operations of the Typechecker effect whose handlers are not available for the user.

³ Internally, the Typechecker effect should have a tree that stores all currently saved traces.


6.3 - Proof Searching

Notes on proof searching in Juvix, and the argument for implicit refinements over indexed types.

Proof Searching of Juvix

  • Data Structures in Juvix should provide complementary proofs that are relevant to the users.

The Argument for Implicit Refinements Over Indexed Types For Structures

  • The current thought on how to do that is as follows; a sketch of the proposed module layout appears after this list

    1. Provide a base structure with no extra information as the main module

      • So

          module Data.Map where
        
          type Map a b = ...
    2. Provide various libraries that provide proofs over the structure

      • an example would be

          module Data.Map.Proofs.Size

        which would give all proofs about size that can be automatically applied to a piece of code.

    3. Provide a module

        module Data.Map.Proofs

      which gives back all available refinements defined by the library.

      • This gives the programmer a heavyweight way of figuring out which proofs/refinements are needed to allow their code to verify

      • This module is not intended to be left in final code, but rather to be used while developing a module

        • Thus the environment itself should give back what refinements were used in verifying said expression.

  • I believe this is superior to indexed types, as it allows the programmer to add only the properties they care about, rather than being forced to move their list to a vector just because they care about length.

    • Using indexed types instead would explode the number of variants of a particular type.

      • It would also force the programmer to understand that a vector is a list, and thus a uniform UI across these variants would have to be maintained.
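
A sketch of the proposed module layout, written with the mod syntax described in the syntax guide below. Data.Map, its Map type, and the proof modules are illustrative names rather than an existing library:

  -- the base structure, exposing no extra proof information
  mod Data.Map =
    type Map a b = Leaf
                 | Node (Map a b) a b (Map a b)
  end

  -- a companion library of proofs over the structure; size proofs
  -- defined here can be applied automatically to user code
  mod Data.Map.Proofs.Size =
  end

  -- re-exports every refinement the library defines; useful while
  -- developing, not intended to be left in final code
  mod Data.Map.Proofs =
  end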

6.4 - Syntax-Guide

A tutorial on the Current Syntax

This page is subject to change with the syntax redesign.

Syntax Guide

This document is heavily inspired by the Idris syntax guide.

File Organization

A file contains zero or more Top Level Declarations.

For example:

  -- this is a function with a signature!
  sig foo : Natural
  let foo = 3

  -- this is a data type declaration
  type list a = Cons a (list a)
              | Nil

  -- this is a module declaration!
  mod Boo =
    let fi = 10
  end

Comments

Comments are denoted by two dashes, --. The two dashes and all characters up until the end of the line are discarded.

Example:

  -- This is a comment!

Symbols

Symbols are used for any name declared in the Juvix programming language. Symbols are broken into two categories: infix and prefix.

Prefix symbols start with either a letter or an underscore, which can be followed by any alphanumeric characters, underscores, punctuation (?, !), or dashes.


  -- a valid symbol
  hello-there

  -- another valid symbol
  _unused-symbol

  -- another valid symbol. Typically '?' denotes that the symbol is a predicate
  even?

  -- An important action
  foo!

  -- not a valid prefix symbol
  -foo

An infix symbol starts with any character other than a letter or an underscore.

  -- This is a valid infix symbol

  !!

  -- this is also another valid infix symbol

  -<++

  -- this is not a valid infix symbol; it is a comment whose text is a single dash

  ---

  -- here is an arrow infix symbol

  ->

Top Level Declarations

Functions

Functions are started by writing let, followed by either any valid prefix symbol or an infix symbol surrounded by parentheses; this symbol shall be referred to as the function name. Then there are zero or more arguments, with implicit arguments surrounded by curly braces ({}). The argument list ends when an equals sign (=) is placed, which denotes the start of the body.

Example:

  -- this is a valid function
  let f x = x + 3

  -- this is another valid variable/function
  let y = 5

  -- this is a valid infix symbol
  let (+) = plus


  -- a function with an implicit argument
  let foo {prf} x = x

Another important part of a function is the signature.

A signature is denoted first by sig, then the function name. From here, a colon (:) denotes the start of the type of the function name. Subsequently, any valid type can be written.

Example:

  -- a valid signature and function
  sig foo : int -> int
  let foo x = x + 3


  -- an example of a dependent signature
  sig add-nat-int
      :  x : nat
      -> y : int
      -> if | x > y -> nat
            | else  -> int
  let add-nat-int = (+)

Types

Type declarations are very similar to Haskell and Idris ADT and GADT declarations.

Types are declared by writing type followed by the name of the type and its arguments, much like function syntax. Optionally, a type signature can be given at this point by writing a colon (:) followed by the type.

An equals sign (=) denotes the start of the body of the type declaration.

From here a declaration can take a few forms.

  1. Zero or more sums, each of which starts with a pipe (|) and contains a tagged product.

  2. A tagged product, which starts with the new constructor name followed by either the arguments separated by spaces, a colon (:) followed by the arguments separated by arrows, or a base record.

  3. A base record, which is denoted by curly braces ({}). Inside the curly braces, a name is given to every argument; each argument's type starts after a colon (:) and is terminated by a comma (,).


  -- This is a valid type
  -- the a is a type argument
  type list a
    -- Cons is the constructor
    -- Cons takes an item of type a and a List of a
    = Cons a (list a)
    -- Nil is another constructor taking no arguments
    | Nil


  -- this is the same type, but GADT style arrow syntax
  -- is given to the constructor
  type list a : a -> list a
  -- Curly braces can be used here to name the arguments
    = Cons { car : a,
             cdr : list a }
    | Nil

  -- Same type again but using GADT syntax in the constructors
  -- The first product can have a pipe!
  type list a =
    | Cons : a -> list a -> list a
    | Nil  : list a

  -- an example of a base record!
  type coords a = {
    x : a,
    y : a
  }

  -- Same example but we have a trailing comma
  type cords a = {
    x : a,
    y : a,
  }

  -- An example of a dependent type
  type vect : (len : nat) -> (elem : set) -> set =
    | Nil  : vect 0 elem
    | Cons : elem -> vect len elem -> vect (succ len) elem

Modules

Modules are declared similarly to functions, except that mod is used instead of let.

Instead of an expression, the body consists of zero or more top-level declarations followed by end.


  -- example defining a module

  mod Foo =
    sig bar : nat
    let bar = 3

    -- The type is inferred here
    let baz = 5

  -- end ends the module definition
  end

  -- example using a module
  let test = Foo.bar + Foo.baz

Imports

A module can be imported in two ways.

Importing a module unqualified, by opening it, means that every symbol in the module becomes available unqualified.

A module can be opened:

Example:

  -- A valid open
  open Foo

  -- opening the module Baz in the module Bar in the module Foo
  open Foo.Bar.Baz

  -- This is the same statement as above.
  open Foo
  open Bar.Baz


  -- let us define a module
  mod IntMod =
    let t = int

    sig of-nat : nat -> t
    let of-nat x = x

    sig add : t -> t -> t
    let add = (+)
  end

  -- now we shall open it into our scope
  open IntMod

  -- we can now use it unqualified
  sig use-int-mod : nat -> nat -> t
  let use-int-mod x y = add (of-nat x) (of-nat y)

A module can also be aliased with a let:

Example:

  -- a valid module alias
  let F = Foo
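
Assuming the Foo module defined in the Modules section above, the alias can then be used wherever the original name could; test-alias here is an illustrative name:

  -- using the alias
  let test-alias = F.bar + F.baz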

Expressions

Conditionals

If

If expressions have a non-zero number of clauses. Each clause consists of a boolean test, followed by a body term.

Example:

  -- this is a valid if expression!
  if | x == 3 -> 5
     | else   -> 6
  -- ^ test      ^ consequence

  -- this is also a valid if expression
  if | x == 10     -> 25
     | positive? x -> x
     | negative? x -> abs x
     | else        -> 0

The else name is just an alias for True.
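
Since else is only an alias, the first example above can equally be written as:

  -- identical to the first if expression
  if | x == 3 -> 5
     | True   -> 6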

Case

Case expressions have a non-zero number of clauses. Each clause consists of a pattern, followed by a consequence.

A pattern works much like Haskell or Idris, in that one can deconstruct on a record or a constructor. We also allow record punning on matches.

Example:

  type tree a = Branch (tree a) a (tree a)
              | Leaf a
              | Empty


  -- an example with match!
  sig func : tree nat -> nat
  let func foo =
    case foo of
    | Branch left ele right ->
      func left + ele + func right
    | Leaf ele ->
      ele
    | Empty ->
      0

  -- This is the same function!
  let func2 (Branch left ele right) =
    func2 left + ele + func2 right
  let func2 (Leaf ele) =
    ele
  let func2 Empty =
    0

  type coords = {
    x : int,
    y : int
  }

  -- match on record

  sig origin? : coords -> boolean
  let origin? {x, y}
    | x == y && x == 0 = True
    | else             = False

  -- same function as origin
  sig origin2? : coords -> boolean
  let origin2? {x = origX, y = origY}
    | origX == origY && origX == 0 =
      True
    | else = False
Dependent matching
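
As a sketch of what dependent matching might look like, assuming the vect type from the Types section: the length index refines as patterns are matched, so a function over non-empty vectors needs no Nil case. head-vect is an illustrative name, not an existing function:

  -- the (succ len) index rules out the Nil constructor,
  -- so this single clause is exhaustive
  sig head-vect : vect (succ len) elem -> elem
  let head-vect (Cons x _) = x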

Definitions

Definitions within an expression are like their top-level counterparts, except that in, followed by an expression, must be written after the definition.

Let
  let foo =
    let bar = 3 in
    bar + 10
Modules
  let foo =
    mod Bar =
      let foo = 3
      let bat = 10
    end in
    Bar.foo + Bar.bat
Signatures
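A sketch, assuming an inner signature attaches to its let exactly as at the top level, with in following the definition pair:

  let foo =
    sig bar : nat
    let bar = 3 in
    bar + 10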
Types
  let foo =
    type bar = Foo int
             | Bar nat
    in [Foo 3, Bar 10]

Lists

List literals are started by the open bracket character ([). Within, elements are separated by commas (,) before ending with a closing bracket (]).

List literal syntax is just sugar for the Cons and Nil constructors.

Example:

  -- this is a valid list
  [1]

  -- another valid list
  [1,2,3]

  -- the same list without sugar
  Cons 1 (Cons 2 (Cons 3 Nil))

Tuples

Tuples are formatted like lists; however, parentheses (( )) are used instead of brackets.

Example:

  -- this is a tuple
  (1, 2)

  -- this is not a tuple
  (1)

  -- this is a 5 tuple!
  (1,2,3,4,5)

Records

Record literals are started by an open curly brace ({). Within, elements are bound to the corresponding name of the record via the equals sign (=), or punned by the name directly. Elements, like lists, are separated by commas (,) before ending with a closing brace (}).

Example:


  type coords = {
    x : int,
    y : int
  }

  -- a function that constructs a coords record
  sig create-coords : int -> int -> coords
  let create-coords x-dir y-dir = {
    x = x-dir,
    y = y-dir
  }


  -- same function with punning
  sig create-coords : int -> int -> coords
  let create-coords x y = {x, y}
Record updating syntax

Constants

String Literals

Strings are enclosed by double quotes (").

Example:

  let foo =
    "this is a string!"
Integers/Naturals

Numbers are denoted by the digit characters 0123456789.

Examples:

  -- a valid number literal
  let foo = 123


  -- another valid number
  let number-one = 1

Do Notation

Do notation works similarly to how it does in Haskell, with changes to make it indentation-insensitive. Namely, after every binding a semicolon (;) is needed to start the next expression. Further, no do keyword is needed; the semicolon is enough to determine whether an expression is in do syntax.

Thus, as in Haskell, to bind terms one states the name, then a left arrow (<-), then the monadic expression, terminated by a semicolon.

For non-bindings, just the monadic expression followed by a semicolon is needed.

The last expression in do notation does not need a semicolon.

Example:

  let foo my =
    x <- Just 5;
    y <- my;
    pure (x + y)


  let bar =
    Out.print "hello";
    name <- In.prompt "What is your name";
    Out.print ("hello" <> name)

Local opens

Local opens work just like global opens; however, one has to write in followed by a body, like other inner definitions.

Example:

  let foo xs ys zs =
    open List in
    append xs (append ys zs)

There is also a briefer syntax, where the module name is followed by .( ... code here ... )

Example:

  let foo xs ys zs =
    List.(append xs (append ys zs))