Juvix Compiler Documentation
This page gives an overview of all the other documentation pages
Welcome
Welcome to the documentation of the Juvix Compiler. Juvix is a
functional compiler that is inspired by various dependently typed
languages (Agda, Idris, F★), MLs (OCaml, Haskell), Lisps (Common
Lisp, Scheme), and many more. Thus our compiler has many idioms and
ideas that are not typically found within a standard Haskell codebase.
The documentation is broken down into many sections that are linked at
the bottom of this page. Each is unique in its topic: one section
may cover general overviews of the entire codebase while another may
just talk about the justification for our various idioms.
Getting Started
See the overview sub-page to learn more about the Juvix compiler process
1 - Terminology
Definitions for the more unique terminology we use.
Shared Context
Since most of us come from a Haskell/ML background, the bulk of that
terminology is omitted here. However, since most of us are
unfamiliar with the other inspirations Juvix is derived from, we
lay out that terminology here.
Image
This is probably the most confusing terminology we use. An image is a
living, changing environment in which we can take in new code
input by the system and/or user and change the world the code lives
in. There are many practical benefits to this; see the design section
of this web-page for more.
Note that Image is used for more than just programming languages. One
can view the kernel of an operating system as an image (we often do
download image files), or, if the operating system is image based, the
entire thing. The terminology also shows up in Docker images.
Effects
By effects we typically mean effect systems, and in our
implementation we use the word effect/effects to mean algebraic
effect/effects.
The key idea of effects is that a type doesn't only describe the shape
of the final computation, but also an approximation of the various
effects it causes when we evaluate it.
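To make the idea concrete, here is a minimal free-monad sketch in Haskell (purely illustrative, not Juvix's implementation; the Teletype effect and all names are hypothetical) of a type recording which effects a computation may perform:
{-# LANGUAGE GADTs, ExistentialQuantification #-}

-- A hypothetical effect signature: the operations a computation may request.
data Teletype a where
  ReadLine  :: Teletype String
  WriteLine :: String -> Teletype ()

-- A computation is a pure value, or an effect request plus a continuation.
data Eff e a = Pure a | forall x. Impure (e x) (x -> Eff e a)

instance Functor (Eff e) where
  fmap f (Pure a)     = Pure (f a)
  fmap f (Impure e k) = Impure e (fmap f . k)

instance Applicative (Eff e) where
  pure = Pure
  Pure f     <*> x = fmap f x
  Impure e k <*> x = Impure e ((<*> x) . k)

instance Monad (Eff e) where
  Pure a     >>= f = f a
  Impure e k >>= f = Impure e ((>>= f) . k)

send :: e a -> Eff e a
send e = Impure e Pure

-- The type 'Eff Teletype ()' approximates the effects evaluation may
-- cause, not just the shape of the final value.
greet :: Eff Teletype ()
greet = send ReadLine >>= \name -> send (WriteLine ("hello " ++ name))

-- A handler discharges the effect, here into IO.
runTeletypeIO :: Eff Teletype a -> IO a
runTeletypeIO (Pure a)                 = pure a
runTeletypeIO (Impure ReadLine k)      = getLine >>= runTeletypeIO . k
runTeletypeIO (Impure (WriteLine s) k) = putStrLn s >> runTeletypeIO (k ())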
Effects have a rather rich history, starting in 1987 with the FX-87
programming language. To see the progression of the ideas, see the
following slide by Benjamin Pierce:
Types and Programming Languages: The Next Generation (2003), page 36
Desugaring
Desugaring is a word synonymous with macro-expansion, or syntax
expansion. It is a process by which we transform one syntax form into
another by some abstract syntax transform. See the following example.
;; See the S-expression section for slightly more information on this
;; structure
;; Input generic structure
(:do (%<- name-1 body-1)
body-2
body-3
…
(%<- name-n body-n)
return)
;; Output generic structure
(Prelude.>>= body-1
 (lambda (name-1)
  (Prelude.>> body-2
   (Prelude.>> body-3
    (… (Prelude.>>= body-n (lambda (name-n) return)))))))
In this example we desugared Haskell do syntax into the base monadic
operation form.
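For comparison, the same desugaring written out in Haskell itself (an illustrative example, not compiler output):
-- A do block with one binder and an intermediate body...
example :: IO ()
example = do
  line <- getLine        -- (%<- name-1 body-1)
  putStrLn "body-2"      -- body-2
  putStrLn line          -- return

-- ...desugars to the base monadic operators:
exampleDesugared :: IO ()
exampleDesugared =
  getLine >>= \line ->
    putStrLn "body-2" >>
      putStrLn line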
Core
Core is a term we use to mean a class of Intermediate Representation
(IR) layers of the Juvix compiler. These layers deal with
the type theory, and they are what our frontend syntax desugars into.
Currently there are more than three layers of Core IR: Core.HR,
Core.IR, Core.Erased, etc. You can find out more about those layers
over in the overview section.
Read Time
Interaction Nets
Interaction Nets are a novel graphical model (graphical as in graph
based) of computation. Like all graphs, interaction nets are composed
of nodes, edges, and ports, a port being the spot on a node where an
edge connects. Each node has a single principal port. Each node may
be of a different type (Cons is a unique node, so is Plus; every
operation you can think of is unique) and have different rules. The
rules are limited in that they can only fire when two principal
ports touch, and there is a rule describing the reduction of these
nodes.
Since the operations are limited to principal ports only, the system
is trivially parallelizable. There are many other benefits as well. To
see some example pictures, see the Wikipedia page on interaction nets.
Parameterization
We often use the word Parameterization around the code-base in
relation to the various backends. In particular, it is the
abstraction (interface) around the primitives a backend may have.
Such abstractions include, but are not limited to, the following
(a sketch follows the list):
The primitive types in the code-base
Applying primitive types for type checking
A list of builtin values the backend may have
Abstractions over various literals in the frontend language
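As a rough sketch, such an interface could look like the following record (the field names here are illustrative; the actual Parameterisation type in the codebase differs in its details):
-- A hypothetical backend interface over primitives, mirroring the
-- list above; not the exact Juvix API.
data Parameterisation primTy primVal = Parameterisation
  { -- primitive types the backend provides, keyed by name
    builtinTypes  :: [(String, primTy)],
    -- builtin values the backend provides, keyed by name
    builtinValues :: [(String, primVal)],
    -- applying primitive types for type checking: does a primitive
    -- value inhabit a given primitive type?
    hasType       :: primVal -> primTy -> Bool,
    -- abstractions over frontend literals
    intVal        :: Integer -> Maybe primVal,
    stringVal     :: String -> Maybe primVal
  }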
Form
Form is a term often used in the Lisp world. In Juvix we mean it in
much the same way.
S-expression
Juvix is interesting in that we have the ability to reason over the
syntax abstractly. This form, for the sake of ease of use and
simplicity, should be as minimal as possible. Thus we have chosen to
use S-expressions as our Form.
An S-expression consists of an atom (think symbol, number, string
etc.) or a list containing more S-expressions. They are often written
as such
;; these are examples of atoms
1
a
hi
+
;; these are examples of lists
(+ 1 2 3)
(+ (+ 3 4 5) (+ 1 2 3))
They are quite versatile and offer a consistent interface of
manipulation throughout the code-base.
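A minimal sketch of such a representation in Haskell (the real type in the codebase differs in details):
-- An S-expression is an atom or a chain of cons cells ending in Nil.
data Sexp
  = Atom Atom
  | Cons Sexp Sexp
  | Nil
  deriving (Show, Eq)

data Atom
  = ASymbol String
  | ANumber Integer
  deriving (Show, Eq)

-- (+ 1 2) written out as cons cells:
example :: Sexp
example =
  Cons (Atom (ASymbol "+"))
       (Cons (Atom (ANumber 1))
             (Cons (Atom (ANumber 2)) Nil))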
Symbol
A symbol is simply an identifier for some value in the language. For
example, given a definition like let foo = 3, foo is the symbol that
when evaluated becomes 3. Some symbols evaluate to themselves.
Symbols are not prefixed; to see more about prefixed names, see the
Name Symbol section on this page.
Name Symbol
Name Symbols (or NameSymbol as in the code-base) are qualified
symbols. So instead of saying foo we could say Prelude.foo. There
is a distinction, as resolution of unqualified symbols is a bit
more implicit and doesn't explicitly say which resolution path
it takes, unlike Name Symbols. There are also some forms (like def)
that expect the name they take to be a Symbol rather than a Name
Symbol; these are often arbitrary choices that may change in the
future.
Telescope
Recursive Groups
2 - Design
A document on various language design decisions and philosophies for Juvix. Also hosts a variety of sub-documents on various language aspects.
Design Philosophy
The philosophy of Juvix is one of generality and flexibility. It
should not be the case that the Juvix compiler developers have
many extra tools over the users of the language. The compiler/language
developers/designers should be able to choose what is canonical,
but the forms of the language should be generic in a way that
ordinary users could have made themselves if they so wished.
For example, the form of def in the frontend language is not a
fixed piece of syntax, but rather a simple macro over a primitive that
is most suitable for our core type theory. This means that if one is
an application developer for some domain, like web-browsers, they can
make their own version of def (def-command) that fits their
domain better (registering hooks and functions into the web-browser
itself!).
In order to achieve this goal of flexibility, Juvix has made many
design decisions that allow traditionally built-in tools to be first
class in Juvix.
Another design goal of Juvix is to allow for safer programs that are
mission critical. When one is writing smart contracts that have
millions of dollars at stake, a mistake is unacceptable. Thus Juvix
has dependent types. Dependent types are wonderful at removing entire
classes of errors with proofs about behavior. Further, they can make
many idioms in a typed language simpler.
However, dependent types on their own can be confusing, as the specific
incantations one needs to write to effectively prove code
are often not intuitive. Juvix thus seeks to make tooling in this area
better, offering abstractions that let the language work with you to
achieve your goals.
Decisions
2.1 - Extensible
Talking about the Extensible system.
The extensible system works like Trees that Grow. When you write an
extensible declaration, it adds an extra field to every constructor,
via a type family for each constructor. For example, on Either
we'd get Left a Annotation, Right b Annotation, and an extension
field. The type family for each constructor also carries the
extension type. If the extension is a type annotation, you can add
the primty and primval to have the primitive types.
These live in the Extend modules. We also generate constructors:
extTerm =
\_primTy _primVal ->
IR.defaultExtTerm
{ IR.nameLam = "Lam0",
IR.typeLam = Just [[t|NameSymbol.T|]],
IR.namePi = "Pi0",
IR.typePi = Just [[t|NameSymbol.T|]],
IR.nameSig = "Sig0",
IR.typeSig = Just [[t|NameSymbol.T|]],
IR.nameLet = "Let0",
IR.typeLet = Just [[t|NameSymbol.T|]]
}
extElim :: p1 -> p2 -> IR.ExtElim
extElim =
\_primTy _primVal ->
IR.defaultExtElim
{ IR.typeBound = Nothing,
IR.typeFree = Nothing,
IR.typeElimX = [("Var", [[t|NameSymbol.T|]])]
}
IR.extendTerm "Term" [] [t|T|] extTerm
-- TODO allow extendTerm to reorder fields?
pattern Lam x t = Lam0 t x
The typeElimX can add multiple constructors.
The name<Constructor> is the new constructor name.
The type<Constructor> talks about an extra argument.
The extendTerm is how we instantiate the alias and extensions system.
The pattern synonyms are there to make the argument order better.
2.2 - Image System
In which we talk about the Image System
What Is the Juvix Image System
The definition of Image can be found in the terminology section. As
for the image system itself, it is the way we are able to talk about
Juvix in Juvix. It gives us primitives in Juvix to reason
over what is in scope, or to change and add to the system while it's
live. It is a very powerful technique that allows Juvix to do a lot
with little effort.
Image System Goals
Research Areas
Images are as old as computation, with the Lisp 1.5 compiler being an
image which wrote to and read from a magnetic strip (source
required). However, there is very little literature on the
subject. This was a list of resources we were able to collect about
the subject. Further, since the technique has fallen out of fashion in
the past few decades, there are many areas we can explore on this front.
First Class Environments
First class environments are a way of making the environment code
lives in first class. This means we could talk about discrete
environments and more easily distinguish between startup, compilation
time, evaluation time, etc environments. There are many bigger
implications as well, like being able to talk about environments over
the network, having sandboxed environments, etc.
While the benefits are quite promising, the technique seems to suffer
from a lack of research. There is the SICL paper that talks about
adding it to Common Lisp, and the SBCL compiler, which at its
bootstrapping phase works on creating its own environment. There is
also MIT Scheme, which has first-class environments.
With Juvix, we can explore these ideas a lot further, as we have many
systems which facilitate this kind of reasoning. Three particular
constraints come together for this.
We have an algebraic effects system
This is very significant for making progress, as we can see
precisely how our effects touch the image. If we can determine
that, we can likely create a system that can precisely denote the
scope of the image we need to reason over.
With that said, we need to be careful with our thoughts here, as
free variables are an effect that is discharged by the default
handler. Meaning that if the system fully throws this information
away, as it can look up in the image what answers are bound to and
throw away what it relies upon in the image to function, then
this will be harder to manage.
We have multiple backends we must support in a single program.
This means that we already have to tackle the issue of some parts
of our language being out of scope.
We are likely to include parts of Racket's #lang
research,
which should help talking about environments on a module level.
We have a first class module system.
We might be able to formalize this idea as our module system, or
view it as an extension of our module system.
After all the image is one big module!
If we are able to synthesize 1., 2., and 3. properly along with our type
system, we should have a very ergonomic way of creating very eloquent
systems that can talk about library versioning, backend support, and
evaluation time processing.
Possible Benefits
Talk formally about the various environments
Better Integration with Blockchains
We could ingest Anoma data into the image, and have multiple
different conflicting nets around.
We can fully explore them as if they were on our live image, and
switch between the environments and see what effect our code
would have on different sets of conflicting data!
Give us a formal way to talk about the environment where our type
system gets injected
The base image will have to not have an image, as that is rather
hairy to do.
What this gives us is an environment in which we can talk about
everything having a nice type system.
Leads into a possible package/versioning system.
2.3 - Subtyping
This article covers the discussion on subtyping and considerations we’ve had
This document is bound to change once we have a proper writeup of row polymorphism
Subtyping
An argument about the extent of subtyping
Subtyping is an important part of any module or object system. The
rules these systems usually care about revolve around the subtyping
of records, and records in relation to functions.
About Subtyping Rules on Records
An example of record subtyping is as follows; we will use a :> b to
mean that a is a subtype of b.
1. ∀x, a … v : Type. {x : Type, a : Type … v : Type} :> {a … v : Type}
2. ∀x, a … v : Type. ∀ f : {a : Type … v : Type} → Type.
   f :> f : {x : Type, a : Type … v : Type} → Type
What 1 tells us is that we can forget a field; thus a record
with more fields is a subtype of the same record with fewer fields.
What 2 tells us is that a function which accepts a record with fields
a through v can also accept a record with those fields and extra
fields. The intuition is that in functions the subtyping relation is
reversed, but we still need to forget information to get our
relationship.
Implications of Rules and Arguments about Behavior
These implications can be quite worrying; consider the longer-form
example below. We will introduce all the forms first, then make an
argument about what various object systems have done and what we are
proposing to do.
f0 : λX:>{x : int} -> x1 -> x2 -> x1
f0 rec1 rec2 = rec1 { x = rec1.x + rec2.x}
f1 : λX:>{x : int} -> X -> X -> X
f1 rec1 rec2 = rec1 {x = rec1.x + rec2.x}
f2 : λX:>{x : int} -> X -> X -> X
f2 rec1 rec2 = {x = rec1.x + rec2.x}
-- note that the t is opaque
sig set : int -> {x : t, type t, add : t -> t -> t}
mod set x =
type t = int
add = (+)
x
end
-- We include the set
sig bag : int -> {x : t, type t, add : t -> t -> t}
mod bag x = include (set x)
foo a b c =
let ab = a in
...
foo1 a b c =
let ab = new bag in
...
f3 : λX:>{x : int} -> X -> X
f3 rec1 = rec1 {x = rec1.x + rec1.x}
- NOTE: f0 uses a shorthand notation; x1 and x2 are subtypes of
{x : int}.
The main functions we wish to discuss are f0, f1, and f2. In an
object-oriented system all three of these are permissible, because
under a subtyping relation these are all fine; we just end up with
some subtype of {x : int}.
However, this is a major issue; see Subtyping, Subclassing, and Trouble
with OOP, written by Oleg, for how insidious this can make errors.
Thus on <2020-11-19 Thu> we decided that f2's signature is invalid,
as we don't just have a subtype of {x : int}; our return
type is in fact {x : int}. Therefore, we don't have the issue of the
function foo in the article above, which we will now quickly detail.
In our code above, foo and foo1 both return a derivative of the
ab value. In an object-oriented language the following type would be
applicable to both
The issue is that if we send in a set instead, then foo gives us a
set, while foo1 would give us a bag. Thus in our type system
foo1's signature would be
However, what was more contentious was what to do with f1 vs f0. Should
we allow
f1 {x = 5; y = 3} {x = 1}
If f1 were f0 then obviously yes, as we know we get two separate
subtypes and return the first.
An argument for allowing this rests on the answer that "this is how
subtyping relations work." While a fair argument, we have
concluded that either way is a valid way of working with subtyping,
as we can merely check that the two arguments are the same, then begin
doing subtyping.
Another argument is around the flexibility of functors if we disallow
the above example. However, this argument was also flawed in that we
still have a subtyping relation, so f3 in our code block can be called
like
Thus the argument regarding functors with a single argument can be
disregarded. However, for multiple arguments the legitimacy of the rest
of the claim can be discredited by having a mechanism which allows the
user to specify that the two arguments are really subtypes of each
other. This explicit system still allows the mixup of bag and set
in Oleg's article; however, it must be done explicitly and with full
intention, with the most serious error still being impossible!
Finally, the last claim for f1's legitimacy rests on the flexibility
of a library user and the incorrect choices of the author: f1 should
really allow different subtypes, it just needs to specify so! For this
we can gain back our expressiveness by adding a coerce function which
forgets fields to make the two records the same, and thus legitimate
in f1
f1 {x = 5; y = 3} (coerce {x = 1})
-- or
f1 (coerce {x = 5; y = 3}) {x = 1}
Further Arguments/Unresolved Issues
As of <2020-11-19 Thu> we still don't have a full answer to the
following questions
How do we handle usages?
How do we handle refinements?
2.4 - Trees-That-Grow
This covers why we use the extensible abstraction over trees that grow
Trees that Grow
In the Juvix codebase we have a variety of similar but different abstract
syntaxes for Core, for example adding type annotations or names (plain IR is
locally nameless). Rather than duplicating code that needs to work on
multiple kinds of syntax, or keeping a single abstract syntax and writing
partial functions, we instead use the idiom described in the
Trees that Grow paper by Najd and Peyton Jones.
The basic idea is to have a type family for each branch of each constructor
and place anything not present in plain IR in there. In addition, there is
also an extra "extension" type family for each datatype in the syntax to add
extra constructors (for example, names). The first group all default to (),
providing no extra information, and the second group to Void, ensuring they
don't appear unexpectedly.
Functions which don't care about the annotations can simply ignore them and
thus be applied to any kind of syntax (as long as the extensions are still
Void). Alternatively, if they need to work on the annotations as well, they
can be encoded using type classes (see Juvix.Core.IR.Evaluator.subst for an
example of this).
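A minimal sketch of the idiom (illustrative only; the actual Core definitions are richer):
{-# LANGUAGE TypeFamilies, EmptyDataDecls #-}

import Data.Void (Void)

-- 'ext' picks the syntax variant; each constructor carries an
-- annotation slot, and TermX is the extension constructor.
data Term ext
  = Lam (XLam ext) (Term ext)
  | App (XApp ext) (Term ext) (Term ext)
  | TermX (XTerm ext)

type family XLam ext
type family XApp ext
type family XTerm ext

-- The plain variant: annotations default to (), the extension to
-- Void, so it cannot appear unexpectedly.
data Plain
type instance XLam Plain  = ()
type instance XApp Plain  = ()
type instance XTerm Plain = Void

-- A named variant could instead set XLam to a name type, reusing the
-- same Term structure without duplicating it.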
2.5 - S-expression Syntax
The concrete syntax of Juvix after it is turned into a Lisp
This document is bound to change once a syntax redesign
happens. This will be greatly minimized.
For this page we give examples of the syntax rather than a hard BNF,
unlike what is found in the syntax page
Generation
Top Level
Let
basic
input
let foo x
| x == 2 = 3 + 4 + x
| else = x + 2
output
(:defun foo (x)
(:cond
((:infix == x 2) (:infix + 3 (:infix + 4 x)))
(else (:infix + x 2))))
Sig
basic
input
sig foo : int -> int -> int
output
(:defsig foo (:infix -> int (:infix -> int int)))
usage
Types
record
input
type foo x y z =
{ y-axis : y
, x-axis : x
, z-axis 1 : z }
output
(type foo (x y z)
(:record-d
(y-axis (:primitve Builtin.Omega) y)
(x-axis (:primitve Builtin.Omega) x)
(z-axis 1 z)))
first sum
input
type foo x y z =
| Foo {y-axis : y, x-axis : x, z-axis : z}
| Bar (var1 : x) (var2 : y)
| Car : hd : x -> cdr : foo y -> foo x y z
output
(type foo (x y z)
(Foo (:record-d (y-axis (:primitive Builtin.Omega) y)
(x-axis (:primitive Builtin.Omega) x)
(z-axis (:primitive Builtin.Omega) z)))
(Bar (:paren (:infix : var1 x))
(:paren (:infix : var2 y)))
(Car (:arrow
(:infix : hd
(:infix -> x
(:infix : cdr
(:infix -> (foo y) (foo x y z))))))))
Signature
input
type foo x y z : typ -> typ -> typ = Foo a b c
output
(type (foo :type (:infix -> typ (:infix -> typ typ))) (x y z)
(Foo a b c))
Module
basic
input
mod foo =
let bar = 3
type zar = Boo
end
output
(:defmodule foo ()
(:defun bar () 3)
(type zar () (Boo)))
guards
input
mod foo x
| x == 3 =
let foo = 3
let bar = 5
| else =
let foo = 5
end
output
(:defmodule foo (x)
(:cond ((:infix == x 3)
(:defun foo () 3)
(:defun bar () 5))
(else
(:defun foo () 5))))
Expression
Let
basic
input
let foo y =
let fi = 3 in
fi
output
(:defun foo (y)
(let fi () 3
fi))
arguments
input
let foo y =
let fi x = 3 + x in
fi y
output
(:defun foo (y)
(let fi (x) (:infix + 3 x)
(fi y)))
Module
basic
input
let foo =
mod Bar =
let bar = 3
type zar = Boo
end in
Bar.bar
output
(:defun foo ()
(:let-mod Bar ()
((:defun bar () 3)
(type zar () (Boo)))
Bar.bar))
Tuples
basic
input
output
(:defun foo () (:tuple 1 2 3))
Lists
basic
input
output
(:defun foo () (:list 1 2 3 4))
Records
basic
input
output
(:defun foo () (:record (a) (b 2)))
Do
basic
input
let foo xs =
a <- xs;
more-comp;
pure a
output
(:defun foo (xs)
(:do (%<- a xs)
more-comp
(pure a)))
Lambda
basic
input
output
(:defun foo (y)
(:lambda (x) (:infix + x y)))
Open
basic
input
let foo y =
open Prelude in
x + y
output
(:defun foo (y)
(:open-in Prelude
(:infix + x y)))
Parens
basic
input
output
(:defun foo (y)
(:infix * (:paren (:infix + y 3)) 9))
Block
basic
input
let foo y = 3 * begin y + y end
output
(:defun foo (y)
(:infix * 3 (:progn (:infix + y y))))
Primitive
basic
input
output
(:defun add ()
(:primitive Michelson.add))
Declaration
basic
input
let foo = declare infixl (+) 3 in 3
output
(:defun foo ()
(:declaim (infixl + 3) 3))
Cond
basic
input
let add x =
if | x == 3 = 1
| x == 4 = 5
| else = 0
output
(:defun add (x)
(:cond ((:infix == x 3) 1)
((:infix == x 4) 5)
(else 0)))
Case
basic
input
let foo =
case x of
| (A (B {a, b})) -> a + b
| A (B c) (C d) -> c + d
output
(:defun foo ()
(case x
((A (B (:record (a) (b)))) (:infix + a b))
((A (B c) (C d)) (:infix + c d))))
Notes
Design Principles
Overall, the goal of the S-expression incantation is twofold.
An unambiguous syntax that is faithful to the original ML syntax.
This is a necessity to limit strife between the ML syntax
and the work we need to do on the generic IR-syntax layer.
Unambiguous in this case implies bijection, as we have to convert
between the two forms.
A fully realized syntax that is readable and looks as if it were a
standalone language.
On Verbosity
Overall, a few forms in this representation are quite verbose and
thus harder to read than they should be; see
(:arrow
(:infix : hd
(:infix -> x
(:infix : cdr
(:infix -> (foo y) (foo x y z))))))
;; should be
[ hd : x -> cdr : (foo y) -> (foo x y z) ]
(:record
y-axis y
x-axis x
z-axis z)
;; should be
{y-axis y
x-axis x
z-axis z}
This is due to lacking reader macros at this level. Overall this can
be largely improved in the future for better inspection (we thus
will offer another level of inspection that can do away with the
infix view!)
On Forms
The syntax is heavily inspired by Common Lisp (CL). There are of course a
plethora of changes, however some key concepts have survived through
this transition.
Namely, for adding extra property-based information in declarations we
opted for an alist concept
(type (foo :type (:infix -> typ (:infix -> typ typ))) (x y z)
(Foo a b c))
which is taken directly from CL's defstruct
construct.
Even naming conventions like defun
are lifted directly from Common
Lisp.
With that said, many forms (namely the lets) lack an extra set of
parentheses in comparison to CL and Scheme.
(let fi (x) (:infix + 3 x)
(fi y))
(let fi () 3
fi)
This is because we fully abuse the fact that we currently must
represent non-let blocks, along with having a single expression that
each name is bound to.
Inconsistencies
There are likely inconsistencies between some of the forms. These
should be removed as soon as possible. I have tried my best on this
front, but likely some forms are inconsistent.
Known
Signatures
Currently we have a minor point of alists on the name itself to
describe extra properties
(type (foo :type (:infix -> typ (:infix -> typ typ))) (x y z)
(Foo a b c))
This is great and we'll be adding it to Modules; however, we currently
do an odd transformation in regard to signatures
(:defsig foo (:usage 0 (:infix -> int (:infix -> int int))))
;; should be
(:defsig (foo :usage 0) (:infix -> int (:infix -> int int)))
This likely should be changed.
2.5.1 - Core S-Expression Syntax
The concrete syntax of Juvix once we hit Core, namely the named version (HR)
Generation
Term
Star
Haskell ADT
= -- | (sort i) i th ordering of (closed) universe.
Star Universe
ML Syntax
-- Universe 0
×₀
-- Universe 1
×₁
-- Universe n
×ₙ
S-expression Syntax
;; universe 0
(:star 0)
;; universe 1
(:star 1)
;; universe n
(:star n)
Prim type
Haskell ADT
| -- | PrimTy primitive type
PrimTy primTy
ML Syntax
S-expression Syntax
Prim
Haskell ADT
| -- | primitive constant
Prim primVal
ML Syntax
S-expression Syntax
Π / Pi
Haskell ADT
| -- | formation rule of the dependent function type PI.
-- the Usage(π) tracks how many times x is used.
Pi Usage NameSymbol.T (Term primTy primVal) (Term primTy primVal)
ML Syntax
(Π (0 | typ : ×₀) (Π (1 | _ : typ) typ))
S-expression Syntax
(:pi (typ 0 (:star 0)) (:pi (_ 1 typ) typ))
Lam
Haskell ADT
| -- | LAM Introduction rule of PI.
-- The abstracted variables usage is tracked with the Usage(π).
Lam NameSymbol.T (Term primTy primVal)
ML Syntax
-- Note we don't have (λ x : unit -> x) : unit -> unit, as that requires
-- the type to be given in the Haskell ADT, and is thus handled in Π
λ x ⟶ ∅
S-expression Syntax
Sig
Haskell ADT
| -- | Dependent pair (Σ) type, with each half having its own usage
Sig Usage NameSymbol.T (Term primTy primVal) (Term primTy primVal)
ML Syntax
Π (ω | x : A) ⟶
Σ (ω | y : B) ⟶
C x y
S-expression Syntax
(:pi (x ω A)
(:sigma (y ω B)
(C x y)))
Pair
Haskell ADT
| -- | Pair value
Pair (Term primTy primVal) (Term primTy primVal)
ML Syntax
S-expression Syntax
Let
Haskell ADT
| -- | Let binder.
-- the local definition is bound to de Bruijn index 0.
Let Usage NameSymbol.T (Elim primTy primVal) (Term primTy primVal)
ML Syntax
let 2 | a = f x y in
<a,a>
S-expression Syntax
(:named-let (a 2 (f x y))
(:pair a a))
Unit type
- Note
Should this be treated specially?
Haskell ADT
ML Syntax
S-expression Syntax
Unit
- Note
Should this be treated specially?
Haskell ADT
ML Syntax
S-expression Syntax
Record
Haskell ADT
| -- | Constructor for a record type.
Record [(Symbol, Term primTy primVal)]
ML Syntax
{name₁ = term₁, name₂ = term₂, …, nameₙ = termₙ}
S-expression Syntax
(:record name₁ term₁ name₂ term₂ … nameₙ termₙ)
RecordTy
Haskell ADT
| -- | Record type. (It's a list not a map because the dependency order
-- needs to be preserved.)
RecordTy [(Symbol, Usage, Term primTy primVal)]
ML Syntax
{name₁ usage₁ : term₁, name₂ usage₂ : term₂, …, nameₙ usageₙ : termₙ}
S-expression Syntax
(:record-ty
(name₁ usage₁ term₁)
(name₂ usage₂ term₂)
...
(nameₙ usageₙ termₙ))
Elim
Var
Haskell ADT
= -- | Named Variables
Var NameSymbol.T
ML Syntax
S-expression Syntax
App
Haskell ADT
| -- | elimination rule of PI (APP).
App (Elim primTy primVal) (Term primTy primVal)
ML Syntax
S-expression Syntax
Ann
Haskell ADT
| -- | Annotation with usage.
Ann Usage (Term primTy primVal) (Term primTy primVal) Universe
ML Syntax
(ω | f x y : C x y : * 0)
S-expression Syntax
(: (f x y) (:usage ω :universe 0)
(C x y))
;; Should we adopt it without names?
(: (f x y) (ω 0)
(C x y))
Here we have a choice. I'm not sure which one I prefer; the first
is more explicit, but may be a bit too verbose.
Lookup
Haskell ADT
| -- | Project an element from a record type.
Lookup (Elim primTy primVal) Symbol
ML Syntax
S-expression Syntax
Value
Instead of giving each variant a section, we note that
Values relate to Terms, except that instead of an Elim they have a
Neutral. Thus, Value is an extension of Term.
Neutral
Neutral is the same as Value, except we remove annotations from the
possibilities.
Pattern
Constructor
Haskell ADT
= PCon GlobalName [Pattern primTy primVal]
ML Syntax
S-expression Syntax
;; No other form has an unattended name with arguments, so this is
;; safe
(Cons a as)
Pair
Haskell ADT
| PPair (Pattern primTy primVal) (Pattern primTy primVal)
ML Syntax
-- We have syntax for multi pairs
‹x, y, z›
-- This is just an alias for
‹x, ‹y, z››
S-expression Syntax
(:pair x y z)
;; This is just an alias for
(:pair x (:pair y z))
Unit
- Note
Should this be treated specially?
Haskell ADT
ML Syntax
S-expression Syntax
Var
Haskell ADT
ML Syntax
S-expression Syntax
Dot
Haskell ADT
-- Deduction by the unifier, that the term should already be
-- known
| PDot (Term primTy primVal)
ML Syntax
-- Taken From
-- data Double (n : ℕ) : ℕ → Set where
-- double : Double n (n + n)
-- foo : ∀ m n → Double m n → {!!}
-- foo m .(m + m) double = {!!}
.(m + m)
S-expression Syntax
Primitive
Haskell ADT
ML Syntax
S-expression Syntax
Raw Globals
This is the target that the context type should hold. However we
have to note a few facts about how things are stored.
We don't really have ML syntax counterparts for these, so instead we
will give the raw structure as the syntax.
2.6 - Syntax
This is the BNF of Juvix’s syntax
Syntax
header ::= mod <symbol> where <top-level>*
| <top-level>*
top-level ::= <type>
| <module-open>
| <type-class>
| <type-class-instance>
; These four last, due to lack of words before
| <module-signature>
| <infix-declare>
| <module>
| <module-open>
| <signature>
| <function>
;;;; Declarations ======================================================
declarations ::= declare <declare-adt>
declare-adt ::= <infix-declare>
infix-declare ::= infix <symbol> <number>
| infixl <symbol> <number>
| infixr <symbol> <number>
declaration-expression ::= <declarations> in <expression>
;;;; Functions ======================================================
;; let until indent sensitivity is gone
function ::= let <function-body>
function-body ::= <name> <args>+ = <expression>
| <name> <args>+ <cond-logic>*
;; Match-logic could just be a name, and so is thus fine
;; # is for implicit
;; Should we have implicit pattern matching like here?
args ::= <match-logic>
| #<match-logic>
;; sig only here while indent sensitivity is not here
;; if we only have usage in arrows, we should allow a way to have
;; usage information of variables as well!
signature ::= sig <name> : <expression>
| sig <name> : <constraint> =\> <expression>
| sig <name> <usage> : <expression>
| sig <name> <usage> : <constraint> =\> <expression>
constraint ::= <type-name>
| ( <constBody>* )
constBody ::= <type-name> , <type-name>
| <type-name>
;;;; Types ==========================================================
type ::= <data-declaration>
;; Should we let there be a refinement in adt type declaration?
;; ? means it may or may not be there
;; it makes sense to have the number here, as it can't be confused with an
;; expression type
data-declaration
::= type u#<usage>? <non-capital-name> <symbols>+ : <expression> = <adt>
| type u#<usage>? <non-capital-name> <symbols>+ = <adt>
adt ::= <sum1> <sum>*
| <product1>
sum1 ::= <name>
| <name> <product>
;; | is special in bnf
sum ::= \| <name>
| \| <name> <product>
product1 ::= { <name-type-comma>+ <name-type>* } -\> <type-name>
| { <name-type-comma>* } -\> <type-name>
| { <name-type-comma>+ <name-type>* }
| { <name-type-comma>* }
product ::= { <name-type-comma>+ <name-type>* } -\> <type-name>
| { <name-type-comma>* } -\> <type-name>
| { <name-type-comma>+ <name-type>* }
| { <name-type-comma>* }
| : <expression>
| <expression''>*
name-type ::= <name> <usage>? <type-signature>
| #<name> <usage>? <type-signature>
name-type-comma ::= <name-type> <comma>
;; { } here are a refinement type!
type-refine ::= <expression> { <expression> }
| <expression>
| <name> : <expression>
| #<name> : <expression>
| <name> : <expression> { <expression> }
| #<name> : <expression> { <expression> }
;;;; Arrows =========================================================
arrow ::= -<usage>-\>
;;; Modules ========================================================
;; For all intents and purposes, modules are the same as values, just with
;; top level expressions, and a minimal amount of sugar
;; This doubles as our import
module-open ::= open <module-name>
module-open-expression ::= open <module-name> in
| <module-name>.( <expression> )
;; We are going to make modules a bit more interesting, syntax wise
;; imagine modules were functions with capital name to delineate
;; thus module signatures have the same signature look as functions
;; Interestingly enough module "functors" can take more than just modules
;; they can take any value, however for examples, we will encourage the use
;; of taking (and thus parameterizing) modules
;; let and end are there until indent sensitivity is in
module ::= mod <name> <args>+ = <top-level>* end
| mod <name> <args>+ <cond-top>* end
;; we use mod here to disambiguate it for parsing speed
moduleE ::= mod <name> <args>+ = <top-level>* end in <expression>
| mod <name> <args>+ <cond-top>* end in <expression>
;; what should we allow in a module signature?
;; begin and end are only here while we aren't indent sensitive
module-signature ::= <module-signature-helper> Module <sigs-and-types>+ end
sigs-and-types ::= <sig>
| <module-signature>
| <type>
module-signature-helper
::= sig <name> : <expression>
| sig <name> : <type-name> =\> <expression>
| sig <name> <usage-f> : <expression>
| sig <name> <usage-f> : <type-name> =\> <expression>
cond-top ::= \| <expression> = <top-level>*
;;;; Types Classes ==================================================
;; Need end if we are indent sensitive!
type-class ::= class <type-name> where
| class <type-name> =\> <type-name> where
;; Need end if we are indent sensitive!
type-class-instance ::= instance <type-name> where
;;;; Expressions ====================================================
;; See comments about which to keep and which to maybe remove
expression'' ::= <match>
| <if>
| <cond>
| <record-access>
| <module-lookup>
| <let>
| <moduleE>
| <let-type>
| <module-open-expression>
| <where>
| <string>
| <number>
| <lambda>
| <tuple>
| <list>
| <parens>
| <symbol>
; This is useful for having nested do's or matchs
| <block>
| <do>
| <comments>
| <arrow>
| <infix>
| <record-creation>
| <type-refine>
; TODO
| <record-update>
| <declaration-expression>
expression ::= <application>
| <expression''>
usage ::= <expression>
usage-f ::= <constant> | ( <expression> )
record-access ::= <name>.<name>
module-lookup ::= <module-name>.<name>
application ::= <name> <expressions>*
lambda ::= \\ <match-logic>* -\> <expression>
symbol ::= <name>
;; useful for match, and nested do's!
block ::= begin <expression> end
do ::= <do-body>*
do-body ::= <expression> \; <expression>
list ::= [ <comma-list>* ]
comma-list ::= <expression> , <expression>
tuple ::= ( <comma-tuple>* )
comma-tuple ::= <expression> , <expression>
parens ::= ( <expression> )
comments ::= -- <any-text-not-new-line> \n
| \n-- <any-text-not-new-line> \n
| <comments-rec>
comments-rec ::= <multi-comments>
| {- <comments-rec> -}
multi-comments ::= {- <any-text-not-{-> -}
infix ::= <expression> <infix-name> <expression>
record-creation ::= { <name-set-comma>* }
name-set-comma ::= <name-set-creation> ,
| <name-set-creation>
name-set-creation ::= <name> = <expression>
| <name>
;;; Matching ===================================
match ::= case <expression> of <match-l>*
match-l ::= \| <match-logic> -\> <expression>
match-logic ::= <name>@<match-logic'>
| <match-logic'>
match-logic' ::= ( <match-logic''> )
| <match-logic''>
match-logic'' ::= <record-match>
| <constructor-name> <match-args>+
| <constant>
match-args ::= <name>
| <match-logic>
| <constant>
record-match ::= { <name-set-comma-match>* }
name-set-comma-match ::= <name-set> ,
| <name-set>
name-set ::= <name> = <match-logic>
| <name>
;; we should remove either if or cond!?
if ::= if <cond-logic>*
cond ::= cond <cond-logic>*
constant ::= <string>
| <number>
| <char>
;;; Bindings ===================================
;; Due to trying to be less indent sensitive,
;; we only look for the in alternative,
;; is that we only have a single binding per let.
let ::= let <function-body> in <expression>
type-let ::= let <type> in <expression>
;; Does this even make sense to have?
;; Juvix is not lazy, how is order determined?
;; is it only for pure values???
where ::= <expression> where <bindings>*
binding ::= <match-logic> = <expression>
;; note it's fine to use |,
;; as matches have to be a pattern,
;; and thus not some expression
;; note in stdlib else and otherwise will both be true
cond-logic ::= \| <expression> = <expression>
;;; Numbers ====================================
number ::= <digits>*.<digits>*
| <digits>*<exp>
| <digits>*.<digits>*<exp>
digits ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
exp ::= e <digits>*
;;; Strings ====================================
;; Give nicer string syntax?
string ::= " <escaped-string>+ "
escaped-string ::= <ascii-no-quotes-no-backslash> <escaped-string>+
| \" <escaped-string>+
| \ <escaped-string>+
;;; Universe ====================================
;; for now, set it up to what F* has, expand it later
universe-expression ::= u#<name>
| u#<name> + u#<name>
| max u#<name>*
;;;; Misc ===========================================================
;; ; is comment in bnf
comma ::= ,
semi ::= \;
name ::= <utf8-non-reserved>
non-capital-name ::= <utf8-no-capital>
capital-name ::= <utf8-capital>
;; may want m e or Map.t int string?
type-name ::= <name> <other-names>+
infix-symbol ::= <utf8-infix>
module-name ::= <name> ; enforce capital names?
constructor-name ::= <capital-name-and-symbols>
utf8-infix ::= `<utf-non-reserved>`
| <UTF.Symbol>
utf8-non-reserved ::= <UTF.Alpha>
| (<utf8-infix>)
utf8-no-capital ::=
utf8-capital ::=
3 - Idioms
Reasons for various idioms in the codebase
The document to see the rationale for any of our idioms
Qualified Imports
Throughout the codebase every import except the standard library is
imported qualified. This allows us the flexibility of having common
naming patterns between modules.
Some of the more obvious ones are
Further, we often import our modules with long names, like NameSymbol
for Juvix.Library.NameSymbol.
Consider the following example if we have short qualification names.
import qualified Juvix.Library.NameSymbol as N
... 100+ lines of logic ...
foo :: N.T -> Symbol -> Symbol
foo ns symb = N.toSymbol ns <> symb
... more lines of logic ...
In complicated logic where we have over a dozen imports, it's quite
taxing to see N.toSymbol (what is this N?). However, it becomes very
clear if we instead see NameSymbol.toSymbol:
import qualified Juvix.Library.NameSymbol as NameSymbol
... 100+ lines of logic ...
foo :: NameSymbol.T -> Symbol -> Symbol
foo ns symb = NameSymbol.toSymbol ns <> symb
... more lines of logic ...
Further, it makes our standard idiom of to<Type> (toSymbol) a bit
harder to reason about: what is being converted to Symbol in this
instance? We'd be forced to have names like N.NameSymboltoSymbol,
essentially removing any standardization of modules and ensuring that
we import everything unqualified.
Module.T
Most modules contain a type T within them. For example, we have
NameSymbol.T. The T represents the quintessential type of the
module. The decision is a consequence of having long qualified names
for each module imported.
Consider the following scenario where we did not make this choice, and
instead had the full name.
import qualified Juvix.Library.NameSymbol as NameSymbol
... 100+ lines of logic ...
foo :: NameSymbol.NameSymbol -> Symbol -> Symbol
foo ns symb = NameSymbol.toSymbol ns <> symb
... more lines of logic ...
Here we are quite tempted to import the module unqualified for this
one type, as it's obvious what we mean! Further, it might encourage
short import names like N, which we discussed in the previous section.
The name T over Ty or any other variant comes from OCaml, where
they name all their types T. This in OCaml has practical benefits
for their module functors, which expect the module to have some
principled type T to be generic over. This has the side effect that
principled functions and types are named consistently throughout
modules, which helps us get more of an intuition about the code
without having to ask what the point of any particular module is.
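A small sketch of the convention with a hypothetical Stack module:
module Stack (T, push, pop) where

-- T is the quintessential type of the module: Stack.T reads
-- naturally at use sites.
newtype T a = T [a]

push :: a -> T a -> T a
push x (T xs) = T (x : xs)

pop :: T a -> Maybe (a, T a)
pop (T [])       = Nothing
pop (T (x : xs)) = Just (x, T xs)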
Capability
Capability is our replacement for MTL in the codebase. See the article
linked below for a detailed explanation of the benefits over MTL.
More can be read in our sub-article below.
3.1 - Capability
Further expanding upon why we chose capability
Usage of the Capability Library
Historically, Haskell has been written with the mtl library. This
library allows Monads to stack on top of each other in a linear
fashion. This is very nice, as it allows the plumbing of how each
layer interacts to be abstracted away. However, these advancements do
not come without costs, which are outlined below.
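For illustration, a typical linear mtl stack looks like this (a generic sketch using the mtl library, not Juvix code):
import Control.Monad.State  (StateT, evalStateT, modify)
import Control.Monad.Except (ExceptT, runExceptT, throwError)

-- Each transformer adds one layer of effects, stacked linearly:
-- state on top of errors on top of IO.
type Compiler a = StateT Int (ExceptT String IO) a

tick :: Compiler ()
tick = modify (+ 1)

abort :: String -> Compiler a
abort = throwError  -- mtl lifts the error layer through StateT

runCompiler :: Compiler a -> IO (Either String a)
runCompiler m = runExceptT (evalStateT m 0)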
Solving These Issues with the Capability Library
Further Improvements
Currently the capability library only displays each field of a record
as an effect. Ideally we'd have effects explain certain kinds
of computation instead, and logically bundle these fields into some
kind of greater effect.
Tomas has brought to my attention the eff library, which may some
day replace or aid our usage of capability.
4 - Juvix
The one stop shop for all of Juvix documentation
5 - Overview
An overview of the compiler pipeline process.
The document is laid out so that the first section of each part is a
small overview, and the sub-pages below it are more detailed explanations.
Overview
The documents in this section give a brief and detailed explanation of
many parts of our compiler. The more detailed explanations have some
overlap with the design section.
5.1 - Core
Talks about various aspects of the Core code-base
Type Theory
Our Core type theory, called "Quantitative Type Theory" ("QTT"), is derived from the paper I Got Plenty o’ Nuttin’. Modifications and extensions are documented here.
Extensions/Annotations
Core works off a system of annotations/extensions. These currently are
crucial for understanding what happens to the data structures in Core.
More can be read about our extensible system here.
HR
HR is the first step into Core from the frontend language. This phase
is a lot like IR (read below); however, it includes names rather than
de Bruijn indices. Due to this, the phase is rather short-lived.
IR
IR is the representation of Core on which all other extensions are
based. In order to facilitate fast evaluation, Core is broken up
into terms and values. Values are like terms, but by construction of
their type they contain no β-redexes: they statically rule out a λ
being applied to arguments. This means values are in normal form.
Extension Layout and mapping
Main representation
Type Checker
Type checking is the first phase of IR. This phase annotates every
node with its type and usage. To facilitate fast type checking, every
type annotation is a fully evaluated value. Other phases, like the
unifier, use this concept as well.
Erased
After the code gets passed through IR, we then want to erase all the
type annotations we've built up. To do this, we send the code through
the Erased representation. Unlike HR, Erased is not an extension of
IR.
Erased also erases all forms which have 0 usages. Since usages are not
first class yet, this is rather simple.
Another key characteristic of Erased is that types are separate from
Terms as they are currently assumed to have 0 usages. Thus types are
always erased and separated out.
5.1.1 - HR
Replaces bound and free variables with just names; has names in the binders
Place for Pretty printing
Substitution
5.1.2 - IR
5.1.2.1 - Representation
Representation
Global
A lot of the global information predates the frontend context; a lot
can probably be integrated.
In Raw Globals, the type is a term, which may not be correct (it is
what the user wrote); in regular Globals, the types are values, which
can be checked for equality more easily.
5.1.2.2 - Extensions
Extensions
Extensions are an important part of Core IR. They allow certain kinds
of transformations
Transformation
5.1.2.3 - Evaluator
Evaluation of the internal representation in Juvix plays an important part in
the compilation pipeline: it simplifies the program before passing it to the
backend. As the internal representation is closely related to the λ-calculus
with de Bruijn indices, the evaluator implements all the functions necessary
to evaluate such terms.
In addition to the evaluation functions, the module provides functions for
inlining globals (inlineAllGlobals / inlineAllGlobalsElim), turning
function definitions into anonymous functions (toLambda), lookup functions
for variables, and filtering functions for extensions.
For more information on the λ-calculus with de Bruijn indices, see the
Wikipedia article. This document will describe the way the theory is
implemented for Juvix, and expects a basic understanding of the
λ-calculus with de Bruijn indices.
Substitution boils down to replacing the variables bound under the current
λ, after which the free variables in the term need to be updated to reflect
the removal of an abstraction layer; this is the task of weakening.
Substitution
Substitution lies at the core of evaluation, replacing bound variables with
terms. Currently the compiler contains several closely related substitution
implementations, both for terms and patterns. All of them serve the same
purpose, but differ slightly in their types.
Term substitution
substWith: Substitutes any term a with an IR.Elim', but results in a
term of type a.
substTermWith: Substitutes any term a with an IR.Elim', and results
in an IR.Elim'.
gsubstWith: Generic substitution of f t with an IR.Elim', resulting
in an f t again. This generic is used for automatic deriving of
instances using GHC.Generics.
Pattern substitution
The following functions are for substituting patterns; each of them takes:
All the functions return an Either with a PatternVar in case no
substitution has happened, and the resulting term in case of a substitution.
patSubst': Substitutes any term a with any of the patterns provided in
the map, and results in an a.
patSubstTerm': Substitutes any term a with any of the patterns provided
in the map, and results in an IR.Term'.
gpatSubst': Generic substitution of any term f t with any of the patterns
provided in the map, and results in an f t. This generic is used for
automatic deriving of instances using GHC.Generics.
Weakening
Weakening, or shifting of de Bruijn indices, is a central part of evaluation.
After performing β-reduction, any free variables need to be decreased to
account for removal of the λ-abstraction.
The basic weakening function is weakBy', which takes:
A natural number, denoting the amount to shift free variables.
A cutoff value, for which all variables less than or equal are bound, and all
variables greater than that are free. Typically this value will be 0, to
denote that we want to account for removal of the outermost abstraction.
A term a to perform weakening / shifting on.
weakBy' will return a term a again, on which the weakening has been
executed. The generic function gweakBy' is similar, but takes an f t as
its term and returns an f t. It is used for automatic deriving of instances
using GHC.Generics.
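To illustrate the mechanics, here is a self-contained sketch on a toy λ-calculus (not the actual evaluator code):
-- Terms with de Bruijn indices: Var 0 refers to the nearest binder.
data Term = Var Int | Lam Term | App Term Term
  deriving Show

-- Analogue of weakBy': shift free variables (index >= cutoff) by d.
shift :: Int -> Int -> Term -> Term
shift d cutoff (Var i)
  | i >= cutoff = Var (i + d)
  | otherwise   = Var i
shift d cutoff (Lam t)   = Lam (shift d (cutoff + 1) t)
shift d cutoff (App f x) = App (shift d cutoff f) (shift d cutoff x)

-- Substitute s for index j in t.
subst :: Int -> Term -> Term -> Term
subst j s (Var i)
  | i == j    = s
  | otherwise = Var i
subst j s (Lam t)   = Lam (subst (j + 1) (shift 1 0 s) t)
subst j s (App f x) = App (subst j s f) (subst j s x)

-- β-reduction: substitute, then shift down to account for the
-- removed λ-abstraction (the weakening step described above).
beta :: Term -> Term
beta (App (Lam t) x) = shift (-1) 0 (subst 0 (shift 1 0 x) t)
beta t               = t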
5.1.3 - Original QTT Erasure algorithm
This article explains the QTT Erasure algorithm in a higher-level language for an easier digestion
Original QTT Erasure algorithm
Disclaimer This text only explains the QTT Erasure algorithm, you must already be familiar with how QTT works.
Initially, the algorithm assumes that:
- any typechecked term $t : \rho > T$ elaborates to a program p;
- $\rho \neq 0$, since you want to make sure that your program actually has something to execute before translating it into runnable code;
- the translation is into the untyped $\lambda$-calculus.
The whole algorithm traverses the tree of terms elaborating programs – given that dependent types are the zero resource being erased. For a more complex language, we should probably not merge QTT erasure with an actual elaboration phase.
The only terms affected by erasure are binding-dependent terms, in McBride’s calculus, it’s $\lambda$ abstraction and application.
The algorithm goes as follows:
- for $\lambda$-abs with a zeroed resource,
  - you have a variable $0 > x : S$ bound to a body $\rho t : T$;
  - it elaborates to a program p of type $T$, meaning that $x$ won't be consumed in p (as it was evaluated already) and, then, $p$ can proceed.
- for $\lambda$-app with a zeroed resource,
  - you have a function $f$ that takes a variable $0 x : S$;
  - while within your context, you have a zeroed value $s : S$;
  - given $f[s : S/x]$, p elaborates to the body $t : T$ of $f$, meaning that what's left of $f$ is the returning function when $s$ is applied.
- for all remaining terms, elaborate them as is.
The main idea of the paper is to pre-bind zeroed resources, making it unavailable to the elaborated program p
. The main trick of the paper is to pair zeroed resources from $\lambda$-abs with zeroed resources from $\lambda$-app. McBride’s paper provides a proof that a zeroed application will always meet a zeroed $\lambda$ abstraction.
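A sketch of the zero-usage erasure idea on a toy annotated calculus (usages restricted to 0 and ω; illustrative only, not the actual Juvix pass):
data Usage = Zero | Omega
  deriving (Eq, Show)

-- Source terms annotate each binder and application with the usage
-- of the bound variable / argument.
data Term
  = Var String
  | Lam Usage String Term
  | App Usage Term Term
  deriving Show

-- Target: the untyped λ-calculus.
data Erased = EVar String | ELam String Erased | EApp Erased Erased
  deriving Show

-- Zeroed binders and zeroed arguments are dropped; McBride's proof
-- guarantees that dropped abstractions and applications pair up.
erase :: Term -> Erased
erase (Var x)         = EVar x
erase (Lam Zero _ t)  = erase t
erase (Lam Omega x t) = ELam x (erase t)
erase (App Zero f _)  = erase f
erase (App Omega f s) = EApp (erase f) (erase s)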
Resources
- I Got Plenty o Nuttin by Conor McBride
5.1.4 - HR IR Conversion
This article goes over some of the idea of HR to IR conversion
HR ⟷ IR Conversion
The pass, which can be found here, deals with the conversion between
Core HR and Core IR.
HR ⟶ IR
This phase simply removes the names from the HR form and constructs
de Bruijn indices from the names given in HR.
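A minimal sketch of that conversion (illustrative types, not the actual pass):
import Data.List (elemIndex)

-- HR-like terms carry names in binders; IR-like terms use indices,
-- and anything not bound locally is treated as a global.
data HR = HVar String | HLam String HR | HApp HR HR
data IR = Bound Int | Global String | ILam IR | IApp IR IR

toIR :: [String] -> HR -> IR
toIR env (HVar x)   = maybe (Global x) Bound (elemIndex x env)
toIR env (HLam x b) = ILam (toIR (x : env) b)
toIR env (HApp f a) = IApp (toIR env f) (toIR env a)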
HR Patterns work in a different way: since patterns are
disconnected from terms themselves and do not interact, they work on a
system of registering pattern variables when we come across the binder form.
Outstanding Questions
Since for Terms and Elims we don't account for pattern
variables, and just mark everything unknown as Global, how do
Patterns affect this? It seems we would mark any PatternVar
above this form as a Global and lose that information.
IR ⟶ HR
This conversion adds names back to the IR form. It is mainly used
for pretty printing.
5.1.5 - Type Theory
The type theory implemented by Juvix
5.1.5.1 - Foundations
Foundations of Juvix’s type theory
Quantitative Type Theory
Core's type theory, called "Quantitative Type Theory" ("QTT"), is derived from
the paper I Got Plenty o’ Nuttin’.
Further references for QTT include:
5.1.5.2 - Inductive data types
Juvix’s inductive data types
Juvix extends QTT with inductive data types.
5.1.6 - Typechecker
Core IR types and type-checking
5.1.6.1 - Overview
Overview of Typechecker’s place in Core
Where Typechecker fits into the pipeline
The typechecker is called via typeTerm, which in turn is called by
typecheckErase and typecheckEval, ultimately via coreToAnn,
which in turn is called by the backend-specific typecheck functions
(which will soon be factored out into a common Pipeline.typecheck
function), which fit between Pipeline.parse and Pipeline.compile.
I believe that once function typechecking is implemented, typecheckErase
and typecheckEval will also need to call typeCheckDeclaration.
Overall structure
IR/Types/Base.hs defines the elements of the syntax of QTT: terms,
values, variables, and patterns.
IR/Types/Globals.hs defines the main compound types derived from the
Base types: functions and datatypes.
IRAnn/Types.hs defines the typechecker-specific annotations which
correspond to the kinds of terms in QTT.
IR/CheckDatatype.hs and IR/CheckTerm.hs enforce the typing rules of QTT.
Interactions with other components
The typechecker imports back-end-specific types through Parameterisation.
Characteristically of a dependently-typed system, the typechecker uses
types and functions from the Evaluator, for example in:
Typechecker's CanTC and Error types
typeCheckConstructor, which has to evaluate the terms of the telescope
passed to it
typeTerm' and typeElim', which must perform reductions on terms to
perform type-checking, in accordance with QTT's PRE and POST rules
(characteristic of a dependently-typed theory)
Open questions / things I don't understand
5.1.6.2 - Correspondences
Correspondences between Typechecker code and QTT paper
Correspondences between Typechecker and QTT typing/evaluation rules
In this section we examine where the formal elements of
Juvix's implementation of Quantitative Type Theory
appear in our code.
Definitions 1 and 2 address generic rigs (rings without negation) as
well as the 0, 1, many rig used by Idris-2. Neither of those has an
explicit analogue in our code, since we use one specific rig, and not
the one used by Idris-2; rather, we use the natural numbers plus omega,
which we call NatAndw.
Definition 3 defines sort ordering as a transitive relation on sorts
which allows well-founded induction. This will be our subtyping
relation, which is defined by the <: operator in IR/CheckTerm.hs.
Definition 4 is of term and elimination, which correspond to our Term
and Elim defined in IR/Types/Base.hs.
Definition 5 is of contraction, reduction, and computation.
Contraction is a single step of beta-reduction (substitution) or
nu-reduction (stripping an annotation, which we can do once a term
is fully computed), corresponding to "small-step" semantics. Computation
is the reflexive-transitive closure thereof, corresponding to "big-step"
semantics. We refer to computation as Evaluation, so the implementation
of computation in our code is evalTerm (see the Evaluator documentation).
Definition 6 is of pre-context and context. The former is what is
usually just called "context" in type theories: an association list of
names to types. In QTT, the context is a mark-up of a pre-context
with usages. In our typechecker, the pre-context is Context, and the
mark-up is UContext.
Definition 7 is of prejudgment, which is what is usually just called
"[type] judgment" in type theories; as QTT is presented bidirectionally,
there is both a checking prejudgment (a Term is checked against a given
type) and a synthesis prejudgment (a type is inferred for an Elim).
Definition 8 is of a judgment as a prejudgment annotated with a usage.
We don't explicitly use the term "judgment" in our QTT code, but the
type judgments and usage judgments are captured by the typechecker's
specific extension to Annotation.
Definition 9 is of the type checking and synthesis rules. (This is extended
later by definition 39, which includes erasure and computation; those two
are primarily subjects of other documents.) Below are some correspondences
between the typing rules (from definition 9) and the implementations in our
code. The explicit implementation of these rules is concentrated within
typeTerm' and typeElim', and their helper functions, in CheckTerm.hs.
PRE, POST: These rules, which state that a type may be used in
place of another if the latter reduces to the former, are implemented in
the IRAnn' case of typeElim', and as calls throughout the cases of
typeTerm' to evalTC and substApp (which calls evalTC internally).
SORT: The IR.Star' case of typeElim'. Stratification appears not to
be fully implemented as of this writing; the call to requireUniverseLT
is commented out.
VAR: The IR.Bound' and (two, one for globals and one for patterns)
IR.Free' cases of typeElim'.
FUN: The IR.Pi' case of typeTerm'.
LAM: The IR.Lam' case of typeTerm'.
APP: The IR.App' case of typeElim'.
ELIM: The requireSubtype call in the IR.Elim' case of typeTerm'.
CUT: The call to typeElim' itself, in the IR.Elim' case of typeTerm'.
Definition 17, of subtyping, is implemented by the <: operator in
IR/CheckTerm.hs.
Definitions 10-16 and 18-36, and all the Lemmas, all relate to reasoning
about the typing system, rather than defining the type system, so they do
not have direct analogues in our code (except maybe in some comments!). They
could, however, if we were to implement Juvix in a dependently-typed
language (or to at least some degree if we were to use Liquid Haskell), such
as Juvix itself.
Definitions 37 and on refer to Erasure
and to computation (which we call
Evaluation
), each of which we document as a separate module, so we do
not explore them here, except insofar as we have touched on some aspects
when exploring definitions 9 and 39 together above.
5.1.6.3 - Modules
Documentation of individual Typechecker modules
Definitions by module (beneath Core/
)
Types
Types used throughout the typechecker (at least I think that's the intent):
PipelineError
Includes TypecheckerError
, which we currently exit with when a user
program is incorrectly typed. (I expect the long-term plan is to produce
from it a human-readable error message.)
PipelineLog
Used with @"log"
(from Capability.Writer
) to log a couple of types of
events.
TermAssignment
, AssignWithType
Maps from variables to formulae of Elementary Affine Logic (EAL) /
Elementary Affine Core (EAC). These terms are translated to
interaction nets, which Juvix uses as its execution model.
I didn't find any uses of these types that weren't ultimately
within the InteractionNetIR
module, so they should probably be
moved there.
IR
Aliases for types and functions from IR/Typechecker
.
IR/
Types
Aliases for types and functions from IR/Types/Base
and IR/Types/Globals
,
and some simple queries on them. There is one significant pair of
mutually recursive functions here: quote :: Value -> Term
and
neutralQuote :: Neutral -> Elim
. (As we shall see below, a Neutral
is a particular kind of Value
, and may include a Value
, hence the
mutual recursion.)
Types/
Base
The definitions in terms of primitive types of several internal types.
For example, a BoundVar
is represented by a Natural
, and a
PatternVar
by an Int
. Name
s are either Global
s or
Pattern
s; the implementation of GlobalName
is defined in the
NameSymbol
library.
GlobalUsage
is defined here and restricted to only Zero
or
Omega
, but there is a TODO comment asking whether globals might have
other usages.
Several of the most pervasive data types are defined here, all
as extensible
, in the sense of trees-that-grow, meaning that
different modules can extend
them in their own ways, to use their
own specific types of annotations, producing versions with a '
suffix:
Term
Since Juvix is dependently-typed, a type (PrimTy
) is one
kind of term.
Elim
An Elim
may be a bound variable, a free variable, an application
of an Elim
(to a Term
), or a usage annotation. An Elim
is
also a term (it may be embedded in a Term
with the Elim
constructor). Elim
is so-called because McBride calls this
type an "elimination" in I Got Plenty o' Nuttin'.
Elim
may be viewed as a term that can be inferred (synthesized).
Value
Since Juvix is dependently-typed, a type (VPrimTy
) is one
kind of value. Value
's definition strongly resembles that of
Term
, but there are differences; for example, there are let
Term
s, but no let
Value
.
Neutral
Neutral
is to Value
as Elim
is to Term
: A Neutral
is
a bound variable, a free variable, or an application of a Neutral
(to a Value
); a Neutral
is also a Value
(it may be embedded in a
Value
with the VNeutral
constructor).
A Neutral
may be viewed as a "stuck term": one for which
typechecking must be deferred until some other term(s) has/have been
typechecked. (Perhaps this is the source of the name "neutral" –
it's neither good (definitely typechecks) nor bad (definitely doesn't
typecheck) yet?)
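The following toy rendering (hypothetical constructors, far smaller than the real extensible types) shows the shapes described above, together with the quote and neutralQuote pair mentioned under IR/Types; the NApp case is what forces the mutual recursion.
import Numeric.Natural (Natural)

data Value   = VStar Natural | VLam Value | VNeutral Neutral
data Neutral = NBound Natural | NFree String | NApp Neutral Value

data Term = Star Natural | Lam Term | Elim Elim
data Elim = Bound Natural | Free String | App Elim Term

-- A Neutral may contain Values (in NApp), hence the mutual recursion.
quote :: Value -> Term
quote (VStar u)    = Star u
quote (VLam body)  = Lam (quote body)
quote (VNeutral n) = Elim (neutralQuote n)

neutralQuote :: Neutral -> Elim
neutralQuote (NBound i) = Bound i
neutralQuote (NFree n)  = Free n
neutralQuote (NApp f v) = App (neutralQuote f) (quote v)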
Pattern
Patterns contain PVar
s which are bound by successful pattern
matches, as well as primitive values, terms, and pairs and
lists of Pattern
s.
Globals
Many Constraint
-valued definitions which allow multiple instance
s of
various classes for various types to be declared at once.
(For example, RawGlobalAll
allows any type which satisfies it to
derive Show
, Eq
, and Data
, among others.)
Several more pervasive types, built upon those in Base
:
Pos
, representing strict positivity (or otherwise)
FunClause'
, which defines the form of Juvix function definitions.
It comprises arguments (a Telescope
), a pattern match (a list of
Pattern'
), a body, a type, and Bool
fields indicating whether the
clause is a catchall and/or unreachable.
A Function'
comprises a NonEmpty
list of FunClause'
s.
The widely-used DataArg'
includes a name, a usage, and a type.
A Datatype'
includes a name, DataArg'
s (each with a Pos
),
a universe level, and a list of constructors (DataCon'
).
A type constructor – DataCon'
– includes a name, a type,
and the function which defines it (whose output type is the
type of the constructor).
Raw
forms of the above. In each case, the Raw
form uses
Term
where the non-Raw
(Cooked
?) form uses Value
.
Typechecker
There's only one definition in this module, but it's a big one:
typeCheckDeclaration
, which takes a telescope, a list of
datatypes, and a list of functions, and calls typeCheckAllCons
(see
below) on each datatype in the list. If type-checking is successful,
it adds the datatype and its constructors to the globals.
As I write, typeCheckDeclaration
is incomplete: it doesn't use one
of its parameters, the one comprising a list of function declarations;
furthermore, nothing actually calls it yet (except itself, recursively).
That is because we simply haven't gotten to writing the typechecking
of functions yet (which is also why the backend-specific typecheck
functions currently inline everything recursively starting with main
).
Typechecker/
Some of the types in Typechecker/
are parameterized over back-ends
through the Parameterisation
type which each back-end exports.
Types
Contains typechecker-specific extensions of some core types:
Annotation'
: Adds usage and type.
BindAnnotation'
: Adds an Annotation'
for both the binder and the
term that introduces the binder (Lam
or Let
).
Term'
: Annotations for each of the constructors of Term
s (pairs,
pi-types, sigma-types, Elim'
s, and so on).
Elim'
: Annotations for each of the constructors of Elim
s (bound
and free variables and applications of them).
Env
The typechecker's extension of EnvCtx
is a GlobalsT
, which is a
(hash)map of names to Global'
s, which in turn may be datatypes,
data constructors, functions, or Value
s (with names and usages).
Env
therefore also defines a HasGlobals
alias for
HasReader "globals"
.
The interaction with the evaluator, as required by dependent typing,
appears in this module via the CanTC
type, which requires instances
for Eval.HasSubstValue
, Eval.HasPatSubstTerm
, and Eval.HasWeak
.
This module also defines a Context
as a list of the typechecker-specific
form of Annotation
. There is also a usage context, UContext
,
which is a list of Usage
.
Pattern bindings and usages – PatBinds
and PatUsages
– are defined as
IntMap
s to annotations and usages, respectively, reflecting the use
of de Bruijn indices.
Env
also defines the typechecker's State
monad: InnerState
,
which contains a Parameterisation
, a PatBinds
map, and a Context
.
(There's a lot of boilerplate related to it.)
Env
also defines signatures: SigDef
is a sum of function,
constructor, and datatype signatures. Signature
maps global
names to SigDef
s, so it represents all the signatures in global scope.
Typechecking functions return Signature
s through the TypeCheck
type, which uses Capability
, with the name "typeSigs"
.
Error
Defines which errors can occur during typechecking. TypeMismatch
, for
example, represents a unification error on a variable. Other errors
largely follow the possible types of terms – universe mismatches,
usage mismatches, non-function types where function types are expected,
incompletely-applied data constructors, and unbound variables. Again
because dependent typing makes the typechecker depend on the evaluator,
the typechecker may also wrap an error from the evaluator
(Core/IR/Evaluator/Types.Error
).
The module also provides pretty-printing and an exception wrapper
for the main TypecheckError
type.
CheckDatatype
Exports two interfaces, which between them allow the type-checking of
datatype declarations; hence, both are called by typeCheckDeclaration
:
checkDataType
Typechecks all the parameters of a datatype declaration,
ultimately by calling typeTerm
on each.
typeCheckAllCons
Typechecks all the constructors of a datatype declaration,
ultimately by calling typeTerm
on each, followed by evalTerm
,
because of the potential dependency of types on values.
CheckTerm
Exports typeTerm
, through which the typechecker is plugged into the
Pipeline
, via typecheckEval
and typecheckErase
, and through
them from coreToAnn
, which is called from the backend-specific
typecheck
functions with the appropriate Parameterisation
s.
(The backend-specific typecheck
functions in the current development
branch, as I write, have a lot in common, but Alberto is factoring them
out as part of PR 821.)
Internally, typeTerm
is implemented in terms of the mutually-recursive
typeTerm'
and typeElim'
(which follow the structures of the
mutually-recursive Term
and Elim
). These two functions are, in
a sense, the core of the typechecker: they're the expression in code of
the typing rules of QTT. For example, when typeTerm'
is given a
lambda with expected type A -> B
, it checks whether assuming that the
variable has type A
results in the body correctly typechecking (with a
recursive call to typeTerm'
) with type B
.
The mutual recursion expresses the bidirectionality of the type checking:
annotated terms are checked by typeTerm'
; unannotated terms
("eliminations") have their types inferred (synthesized) by typeElim'
.
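A simply-typed toy version of this split (ignoring usages, dependency, and subtyping; all names here are hypothetical) may make the mutual recursion easier to see. The comments mark which QTT rule each case loosely corresponds to.
{-# LANGUAGE LambdaCase #-}

data Ty = Base | Pi Ty Ty deriving (Eq, Show)
data Tm = Lam Tm | Elim El deriving (Show)
data El = Var Int | App El Tm | Ann Tm Ty deriving (Show)

type Ctx = [Ty] -- de Bruijn: index 0 is the innermost binder

-- Terms are checked against a type.
check :: Ctx -> Tm -> Ty -> Either String ()
check ctx (Lam body) (Pi a b) = check (a : ctx) body b        -- LAM
check _   (Lam _)    _        = Left "lambda needs a Pi type"
check ctx (Elim e)   ty       = do                            -- ELIM
  ty' <- synth ctx e -- the real code compares with subtyping (<:)
  if ty' == ty then Right () else Left "type mismatch"

-- Elims have their type synthesized.
synth :: Ctx -> El -> Either String Ty
synth ctx (Var i)                                             -- VAR
  | i >= 0 && i < length ctx = Right (ctx !! i)
  | otherwise                = Left "unbound variable"
synth ctx (App f x) = synth ctx f >>= \case                   -- APP
  Pi a b -> check ctx x a >> Right b
  _      -> Left "applying a non-function"
synth ctx (Ann t a) = check ctx t a >> Right a                -- CUT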
(Note: as of this writing, there is a bug in the Pair'
case of
typeTerm'
: it doesn't evaluate the left term in checking the type
of the right, so it wouldn't handle dependent pairs correctly. I
will fix that shortly.)
One of the types that appears as a frequent return value in CheckTerm.hs
is Leftovers
. This type keeps track of variables which haven't yet
been used as often as their Usage
annotation allows – and, if it's not
omega
, requires! Hence if there are any Leftovers
with usages other
than omega
after a term has been type-checked, that causes a
LeftoverUsage
TypecheckerError
to be thrown.
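In other words (a hypothetical rendering, not the actual Leftovers type), the acceptance rule for a leftover usage is roughly:
import Numeric.Natural (Natural)

data Usage = Exactly Natural | UOmega deriving (Eq, Show)

-- A leftover is acceptable only when the annotation was omega, or
-- when the variable was in fact used up exactly.
acceptableLeftover :: Usage -> Bool
acceptableLeftover UOmega      = True
acceptableLeftover (Exactly n) = n == 0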
CheckTerm.hs
is also the location of our definition of the subtyping
relation, operator <:
.
Many of the return values of functions in CheckTerm.hs
are
parameterised over Monad
s with constraints such as HasThrowTC'
and HasBound
. So they expect to be able to throw TypecheckerError
s,
or to be called as part of the execution of a state monad containing
(at least) a list of bindings. That's a pattern which strikes me as
crying out for algebraic effects – I think they could make the
signatures much more readable, with less boilerplate and less code to
compose effects. So I feel this module should be able to "read" better
when it's written in Juvix itself!
TransformExt
This module defines an interface for transforming one extension of a
Term
(in the sense of trees-that-grow) into another extension of
Term
– an applicative functor on term extensions. Such functors
are defined in terms of their operations on each variant of Term
,
such as primitive, pi-type, let-binding, and so on. Such functors
compose (and the TransformExt
module defines the compose
function
which composes them). This module also defines a forgetful functor,
forgetter
, which transforms an extended Term
into a Term
with
no extension (NoExt
).
There are forgetful uses in the Evaluator
and Erasure
modules,
but I haven't found any non-forgetful uses yet.
TransformExt/
IRAnn/
Types
The typechecker-specific annotation of terms comprises:
A usage (any natural number, or omega
)
A term of QTT (which might be a type), which is one of:
*
(a universe, of which there is a natural-number-indexed hierarchy)
a primitive type
a primitive constant
a pi-type (dependent product)
a lambda
a sigma type (dependent sum)
a pair
a let-binding
the unit type
the unit value
one of the Elim
types:
Some of these terms are extensions to core QTT, which, for example,
does not appear to me to define explicit pair types.
Erasure
This module just defines two forgetful functors, for Term
and Elim
,
which discard extensions (see TransformExt
).
5.1.6.4 - Formalization
Formalization of QTT in Idris-2
Category-theoretic formalization of QTT: "Quantitative Categories with Families"
-- Following Definition 3.1, of "Categories with Families (CwFs)",
-- in _Syntax and Semantics of Quantitative Type Theory_.
record CwF where
constructor MkCwF
-- "C is a category..."
Underlying : Category
-- "...with a chosen terminal object T"
Terminal : Object Underlying
TerminalCorrect : Universality.IsTerminal {cat=Underlying} Terminal
-- "For delta in Ob(C), a collection Ty(delta) of semantic types"
SemanticType : Object Underlying -> Type
-- "For delta in Ob(c) and S in Ty(delta), a collection Tm(delta, S) of
-- semantic terms"
SemanticTerm : (object : Object Underlying) -> SemanticType object -> Type
-- "For every f : delta -> delta' in C, a function
-- -{f} : Ty(delta') -> Ty(delta)..."
MapType : (obj, obj' : Object Underlying) ->
Morphism Underlying obj obj' ->
SemanticType obj' -> SemanticType obj
-- "...and for S in Ty(delta') a function
-- -{f} : Tm(delta', S) -> Tm(delta, S{f})..."
-- (Implementer's note: I don't understand why both MapType and MapTerm
-- (as I'm calling them) are referred to in the paper as "-{f}". Is that
-- a typo, or will they just distinguish between MapType and MapTerm by
-- context?)
MapTerm : (obj, obj' : Object Underlying) ->
(f : Morphism Underlying obj obj') ->
(S : SemanticType obj') ->
SemanticTerm obj' S ->
SemanticTerm obj (MapType obj obj' f S)
-- "Such that both assignments preserve identity and composition"
MapTypePreservesIdentity : (obj : Object Underlying) ->
(S : SemanticType obj) ->
MapType obj obj (Identity Underlying obj) S = S
MapTypePreservesComposition : (a, b, c : Object Underlying) ->
(f : Morphism Underlying a b) ->
(g : Morphism Underlying b c) ->
MapType a c
((.*) {cat=Underlying} {a} {b} {c} g f) =
(MapType a b f) . (MapType b c g)
MapTermPreservesIdentity : (obj : Object Underlying) ->
(S : SemanticType obj) ->
MapTerm obj obj (Identity Underlying obj) S = S
MapTermPreservesComposition : (a, b, c : Object Underlying) ->
(f : Morphism Underlying a b) ->
(g : Morphism Underlying b c) ->
(S : SemanticType c) ->
(t : SemanticTerm c S) ->
MapTerm a c
((.*) {cat=Underlying} {a} {b} {c} g f) S t =
MapTerm a b f (MapType b c g S)
(MapTerm b c g S t)
-- "For each object delta in C and S in Ty(delta) an object delta.S (called
-- the _comprehension of S_)..."
Comprehension : (obj : Object Underlying) ->
SemanticType obj -> Object Underlying
-- "...such that there is a bijection natural in delta':"
ComprehensionToMorphism :
(obj : Object Underlying) -> (S : SemanticType obj) ->
(obj' : Object Underlying) ->
Morphism Underlying obj' (Comprehension obj S) ->
Morphism Underlying obj' obj
ComprehensionToTerm :
(obj : Object Underlying) -> (S : SemanticType obj) ->
(obj' : Object Underlying) ->
(f : Morphism Underlying obj' (Comprehension obj S)) ->
SemanticTerm obj'
(MapType obj' obj (ComprehensionToMorphism obj S obj' f) S)
TermToComprehension :
(obj : Object Underlying) -> (S : SemanticType obj) ->
(obj' : Object Underlying) ->
(m : Morphism Underlying obj' obj) ->
SemanticTerm obj' (MapType obj' obj m S) ->
Morphism Underlying obj' (Comprehension obj S)
ComprehensionToTermToComprehension_id :
(obj : Object Underlying) -> (S : SemanticType obj) ->
(obj' : Object Underlying) ->
(f : Morphism Underlying obj' (Comprehension obj S)) ->
TermToComprehension obj S obj'
(ComprehensionToMorphism obj S obj' f)
(ComprehensionToTerm obj S obj' f) =
f
TermToComprehensionToTerm_id_morphism :
(obj : Object Underlying) -> (S : SemanticType obj) ->
(obj' : Object Underlying) ->
(m : Morphism Underlying obj' obj) ->
(term : SemanticTerm obj' (MapType obj' obj m S)) ->
ComprehensionToMorphism obj S obj' (TermToComprehension obj S obj' m term) =
m
TermToComprehensionToTerm_id_term :
(obj : Object Underlying) -> (S : SemanticType obj) ->
(obj' : Object Underlying) ->
(m : Morphism Underlying obj' obj) ->
(term : SemanticTerm obj' (MapType obj' obj m S)) ->
ComprehensionToTerm obj S obj' (TermToComprehension obj S obj' m term) =
replace {p=(\m' => SemanticTerm obj' (MapType obj' obj m' S))}
(sym (TermToComprehensionToTerm_id_morphism obj S obj' m term)) term
ComprehensionToMorphismIsNatural :
(obj : Object Underlying) -> (S : SemanticType obj) ->
(obj', obj'' : Object Underlying) ->
(observerChange : ObserverChange {cat=Underlying} obj' obj'') ->
-- I'm not sure whether it is necessary to impose naturality on the
-- observer change in order to ensure it on the comprehension-to-morphism
-- bijection; that's just a guess. We could find out whether proofs would
-- go through without it.
(isNatural : ObserverChangeIsNatural
{cat=Underlying} {observerA=obj'} {observerB=obj''} observerChange) ->
(f : Morphism Underlying obj' (Comprehension obj S)) ->
ComprehensionToMorphism obj S obj''
(observerChange (Comprehension obj S) f) =
observerChange obj (ComprehensionToMorphism obj S obj' f)
ComprehensionToTermIsNatural :
(obj : Object Underlying) -> (S : SemanticType obj) ->
(obj', obj'' : Object Underlying) ->
(observerChange : ObserverChange {cat=Underlying} obj' obj'') ->
-- I'm not sure whether it is necessary to impose naturality on the
-- observer change in order to ensure it on the comprehension-to-term
-- bijection; that's just a guess. We could find out whether proofs would
-- go through without it.
(isNatural : ObserverChangeIsNatural
{cat=Underlying} {observerA=obj'} {observerB=obj''} observerChange) ->
(f : Morphism Underlying obj' (Comprehension obj S)) ->
ComprehensionToTerm obj S obj''
(observerChange (Comprehension obj S) f) =
MapTerm obj'' obj'
(ObserverChangeInducedMorphism
{cat=Underlying} {observerA=obj'} {observerB=obj''} observerChange)
(MapType obj' obj (ComprehensionToMorphism obj S obj' f) S)
(ComprehensionToTerm obj S obj' f)
5.2 - S-expression
An overview of dealing with S-expressions (WIP)
5.3 - Translate
Documentation on the Juvix Translation Phase
Translate
This phase is the
direct manipulation of the S-expression construct we left off of in
the frontend phase.

This section is broken up into a few sections, each growing in
complexity as we go along. These are:
Desugaring
Context Phase
Desugaring
Every important operation in this phase is a direct Syntax
to
Syntax
transformation with no extra context needed.

The function op
pipes all desugaring functions that do not require a context.
-- | @op@ fully desugars the frontend syntax from the original
-- frontend s-exp representation to a form without modules, conditions,
-- guards, etc. This pass thus does all transformations that do not
-- require a context
op :: [Sexp.T] -> [Sexp.T]
op syn =
syn
>>| Pass.moduleTransform
>>| Pass.moduleLetTransform
>>| Pass.condTransform
>>| Pass.ifTransform
>>| Pass.multipleTransLet
|> Pass.multipleTransDefun
|> Pass.combineSig
>>| Pass.removePunnedRecords
>>| Pass.translateDo
The order of these passes is relatively arbitrary. We simply
remove modules, then module lets, then conds, then …, and then
do. The op
function is run directly on the output of the last phase.
Examples of its use are provided in the EasyPipeline module (run desugar1
and desugar2
to test the results for yourself!).
Desugaring S-expressions, Structurally
Structurally here refers to a systematic way of manipulating s-expressions in which each form has an equivalent Haskell ADT.
Before getting into all the functions in op
, such as Pass.moduleTransform
, we first discuss the underlying tool that allows us to handle these passes in a principled way.
We'll use the let/defun
form as an example of this transformation.
This code can be found in the Structure module.
;; From Design/S-expression Syntax
(:defun foo (x)
(:cond
((:infix == x 2) (:infix + 3 (:infix + 4 x)))
(else (:infix + x 2))))
defun
is broken into the name foo
, the arguments x
(note this can pattern match), and the body cond
.
In Structure/Frontend.hs we find a Haskell encoding of this form:
-- | @Defun@ is the base defun structure
-- currently it does not have matching
data Defun = Defun
{ defunName :: Sexp.T,
defunArgs :: Sexp.T,
defunBody :: Sexp.T
}
deriving (Show)
Note: Notice how we say nothing about the head being the cadr
of the structure, and the arguments the caddr
, and the body of course the cadddr
. Instead, we just lay out the logical structures in a record, divorced from any representation.
We provide an API to deal with the actual representation. It would need to change with a lisp-form redesign.
----------------------------------------
-- Defun
----------------------------------------
nameDefun :: NameSymbol.T
nameDefun = ":defun"
isDefun :: Sexp.T -> Bool
isDefun (Sexp.Cons form _) = Sexp.isAtomNamed form nameDefun
isDefun _ = False
toDefun :: Sexp.T -> Maybe Defun
toDefun form
| isDefun form =
case form of
_Defun Sexp.:> sexp1 Sexp.:> sexp2 Sexp.:> sexp3 Sexp.:> Sexp.Nil ->
Defun sexp1 sexp2 sexp3 |> Just
_ ->
Nothing
| otherwise =
Nothing
fromDefun :: Defun -> Sexp.T
fromDefun (Defun sexp1 sexp2 sexp3) =
Sexp.list [Sexp.atom nameDefun, sexp1, sexp2, sexp3]
All records in that file have a corresponding interface
of name<structure>
, is<structure>
, to<structure>
, and
from<structure>
. These deal with the small details of cars
and
cdrs
.
This can be tested by opening the Easy file in the Juvix
repl and running
λ> Right defun = Sexp.parse "(:defun foo (x) (:cond ((:infix == x 2) (:infix + 3 (:infix + 4 x))) (else (:infix + x 2))))"
λ> import qualified Juvix.Sexp.Structure as Structure
λ> Structure.toDefun defun
Just (Defun {defunName = "foo", defunArgs = ("x"), defunBody = (":cond" ((":infix" "==" "x" 2) (":infix" "+" 3 (":infix" "+" 4 "x"))) ("else" (":infix" "+" "x" 2)))})
We can see this transformation agrees with our documentation.
What this buys us, is that when it comes time to do the passes, we
don't have to worry about these details, while trying to make sure we
don't change the semantics of the passes themselves. This properly
decouples our concerns so we can worry about the abstraction meaning
of the syntax in the passes while worrying about the details here.
However all is not fine; notice that we had to write nineteen lines of
boilerplate in order to get rid of the details of the syntax! This is
rather tedious: counting them, there are at
least 22 structures in that file, and over 500 lines of this rather
straightforward API. This is especially annoying given that we are
essentially just translating over BNF translations.
It is due to this tedium early on that the lisp haskell generator was
created. The code itself isn't the best (due to using string
interpolation) and can with a bit of engineering be improved (with a
proper S-expression DSL in the lisp code), however it works fairly
well for our purposes.
Here are some choice snippets that cover every case of the generator
;; 1
(generate-haskell "Defun" '("sexp" "sexp" "sexp") ":defun")
;; 2
(generate-haskell "ArgBody" '("sexp" "sexp") nil)
;; 3
(generate-haskell "DeconBody" (repeat 2 "sexp") nil)
(generate-haskell "Case" '("sexp" "deconBody") "case" :list-star t)
;; 4
(generate-haskell "NotPunned" '("sexp" "sexp") nil)
(generate-haskell "RecordNoPunned" '("notPunnedGroup") ":record-no-pun"
:list-star t
:un-grouped t)
;; 5
(generate-haskell "ArgBody" '("sexp" "sexp") nil)
;; bodys here, as there are multiple!
(generate-haskell "LetMatch" '("sexp" "argBodys" "sexp") ":let-match")
The first one correlates to the structure we've already seen, namely
the Defun struct. The first argument is the type name we wish to
generate, which is of course Defun. The second argument is the
list we wish to parse; in this case it says all three arguments are
just Sexp.T's, and binds them in order to name, args, and
body in the struct itself. The third argument is the name of the
structure itself, if it has one. Our lets are translated to the
:defun structure, and so we note that here.
The second case correlates to a very similar structure, except
that it lacks a name. So the structure we care about is (arg
body) with no name attached. The nil in the third argument
reflects this change.
The third case correlates to a scenario with two changes. The first
is that we can define types for another to rely on. Here we
are saying that Case has the type DeconBody; we use lower case
to reflect the proper variable name this associates with. Further,
the :list-star t aspect of this tells us that the last argument,
in this case of DeconBody, is the rest of the s-expression and
not just the cadr.
data Case = Case
{ caseOn :: Sexp.T,
caseImplications :: [DeconBody]
}
deriving (Show)
The fourth is the last new concept of the bunch, namely we have
:un-grouped t
, which states that the forms this parses are not
grouped like
((name₁ type₁) (name₂ type₂) … (nameₙ typeₙ))
, but rather
(name₁ type₁ name₂ type₂ … nameₙ typeₙ)
.
This means that we have to tell it explicitly that it occurs over 2
positions with that :un-grouped
argument.
The fifth case is an interesting one. The key insight is that we
say argBodys
rather than argBody
. This is because our generator
is limited. Thus we manually write
-- these are ungrouped fromArgBodys, where we groupBy2 both ways
fromArgBodys :: [ArgBody] -> Sexp.T
fromArgBodys = Sexp.unGroupBy2 . toStarList fromArgBody
-- these are ungrouped toArgBodys, where we groupBy2 both ways
toArgBodys :: Sexp.T -> Maybe [ArgBody]
toArgBodys = fromStarList toArgBody . Sexp.groupBy2
This lets us handle the scenario where the list-star-like form
occurs not in the last argument, but rather in a list in some
other position!
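For intuition, here is a toy model of the two grouping helpers on plain lists; the real ones work on Sexp.T, so these signatures are illustrative only.
-- groupBy2 pairs up consecutive elements; unGroupBy2 flattens back.
groupBy2 :: [a] -> [(a, a)]
groupBy2 (x : y : rest) = (x, y) : groupBy2 rest
groupBy2 _              = []

unGroupBy2 :: [(a, a)] -> [a]
unGroupBy2 = concatMap (\(x, y) -> [x, y])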
Improvements Upon the Generator
The generator can be improved a lot, as the previous section mentions
it was hacked together using string interpolation which is not a good
structural way to handle this sort of problem. The alternative would
however take a week of hacking, so it is not a priority to undergo.
However there are two worthwhile hacks that we should undergo.
Change the Maybe
of the to<Name>
to an Either
.
This change at the current moment does not matter! Namely because
Nothing
should never be triggered, because the parser in the frontend
excludes this possibility. However, when syntax redesign
comes in, this will no longer be the case. It would be great if the
generated code could instead generate an Either
where the left
counts the number of arguments given and states precisely why it
can't translate the given S-expression form into the type we want.
List-star-like behavior anywhere. Currently we bake it into the
last form, but it would be nice if we could have a way to denote
this for the middle slots, so we can avoid hacks, like the manual
argBodys
example given in the last section.
Desugaring Passes
Now that we understand the structural underpinnings of the pass, we
can now talk about the transformation itself.
We can observe the passes here.
There are two ways to proceed: if you don't want to go through the
entire creation process of a pass, you can skip the next section and go
directly to reviewing a pass.
The most important detail to note is that if you have a clear view of
the input and output of a pass, a pass should be easy to write and
review. If creating this is hard, then I suggest reading through the
next section to see the full life cycle of this process.
Creating a Pass
Instead of just showing an example, let us walk through writing one. Let us
define the cond
transformation. We can find the form here.
It seems the Cond form is filled with a bunch of ((:infix == x 3) 1)
forms which we can abstractly view as (<pred> <ans>)
. So let us
define these forms as a type. We do this in the Structure
folder. In
particular we want to put this in Frontend.hs since this shows up in
the frontend syntax.
-- | @PredAns@ is an abstraction over questions and answers
data PredAns = PredAns {predAnsPredicate :: Sexp.T, predAnsAnswer :: Sexp.T}
deriving (Show)
-- | @Cond@ here Cond form takes a list of predicate answers
newtype Cond = Cond {condEntailments :: [PredAns]} deriving (Show)
To reflect the structure we include the raw forms in the generator
file and take the output of these calls into the Structure file.
(generate-haskell "PredAns" (repeat 2 "sexp") nil)
(generate-haskell "Cond" '("predAns") ":cond" :list-star t)
Now we can test that we successfully matched the BNF by using the REPL
λ> Right cond = Sexp.parse "(:cond ((:infix == x 2) (:infix + 3 (:infix + 4 x))) (else (:infix + x 2)))"
λ> Structure.toCond cond
Just
(Cond
{condEntailments =
[PredAns { predAnsPredicate = (":infix" "==" "x" 2)
, predAnsAnswer = (":infix" "+" 3 (":infix" "+" 4 "x"))}
,PredAns { predAnsPredicate = "else"
, predAnsAnswer = (":infix" "+" "x" 2)}]})
Now we know the shape of the data we are working with!
To get closer to the core version of match, let us first desugar this
to an if. Something like
(:Cond (pred-1 result-1) … (pred-n result-n))
should turn into
(if pred-1 result-1 (if pred-2 result-2 (… (if pred-n result-n))))
So like cond, we also define the records and forms we wish to
work on
-- | @If@ has an pred, then, and else.
data If = If
{ ifPredicate :: Sexp.T,
ifConclusion :: Sexp.T,
ifAlternative :: Sexp.T
}
deriving (Show)
-- | @IfNoElse@ has a pred and a then.
data IfNoElse = IfNoElse
{ ifNoElsePredicate :: Sexp.T,
ifNoElseConclusion :: Sexp.T
}
deriving (Show)
-- In the generator
(generate-haskell "If" (repeat 3 "sexp") "if")
(generate-haskell "IfNoElse" (repeat 2 "sexp") "if")
Because our generator is limited we make two variants.
Now let us write the form, first let us observe that we can view this
operation from cond
to if
as folding if
over the list of
PredAns
, with the result being an if with no else condition.
condToIf condExpression
| Just cond <- Structure.toCond condExpression,
-- we need to setup the if with no else
Just last <- lastMay (cond ^. entailments) =
let acc =
-- let's create it, with the predicate and answer of the
-- PredAns type
Structure.IfNoElse (last ^. predicate) (last ^. answer)
|> Structure.fromIfNoElse
-- Now let us employ the fold strategy we talked about
in foldr generation acc (initSafe (cond ^. entailments))
| otherwise = error "malformed cond"
where
-- this is the folded function, see how we just build up the if,
-- then push it back to an s-expression at the end?
generation predAns acc =
Structure.If (predAns ^. predicate) (predAns ^. answer) acc
|> Structure.fromIf
With our current understanding we'd write something like this, and in
fact we can test it as is! Just go to the Easy Pipeline file and
include these dependencies and the function above.
import qualified Juvix.Sexp.Structure as Structure
import Juvix.Sexp.Structure.Lens
import Control.Lens hiding ((|>))
Along with the definition, and now we can see
λ> Right cond = Sexp.parse "(:cond ((:infix == x 2) (:infix + 3 (:infix + 4 x))) (else (:infix + x 2)))"
λ> condToIf cond
("if" (":infix" "==" "x" 2) (":infix" "+" 3 (":infix" "+" 4 "x")) ("if" "else" (":infix" "+" "x" 2)))
We just created the bulk of our first pass! There are a few more
details that one has to care about in a real pass, but we'll cover
those in the reviewing section coming up.
Reviewing a Pass
The following code can be found here.
-- | @condTransform@ - CondTransform turns the cond form of the fronted
-- language into a series of ifs
-- - BNF input form:
-- + (:Cond (pred-1 result-1) … (pred-n result-n))
-- - BNF output form:
-- + (if pred-1 result-1 (if pred-2 result-2 (… (if pred-n result-n))))
condTransform :: Sexp.T -> Sexp.T
condTransform xs = Sexp.foldPred xs (== Structure.nameCond) condToIf
where
condToIf atom cdr
| Just cond <- Structure.toCond (Sexp.Atom atom Sexp.:> cdr),
Just last <- lastMay (cond ^. entailments) =
let acc =
Structure.IfNoElse (last ^. predicate) (last ^. answer)
|> Structure.fromIfNoElse
in foldr generation acc (initSafe (cond ^. entailments))
|> Sexp.addMetaToCar atom
| otherwise = error "malformed cond"
--
generation predAns acc =
Structure.If (predAns ^. predicate) (predAns ^. answer) acc
|> Structure.fromIf
We saw most of this form in the creation of a pass section
above. However we will go over the rough strategy briefly. In order to
turn the input of cond
to an if
, we can view it as a fold, where
we fold the if
form over the predicate and answers of the cond
structure with the accumulator being the already built up if's. We can
see this by the BNF
comment at the start of the function. This
transformation gets us closer to the more primitive match
representation that is present in core.
We can see the implementation of this strategy shine through in the
body of condToIf
and the folded function generation
.
The main difference from our creation of a pass section is that we
have this Sexp.foldPred
call, and that condToIf
has two
arguments. The importance of this is that Sexp.foldPred
searches the
entire S-expression structure for the name Structure.nameCond
, which
is :cond
. When it sees this function it breaks the structure into
the atom
, :cond
, and the cdr
which is the rest of the
structure. This means that if you run this on any form that contains
the cond
form, then it will automatically run this transformation
for you!
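A toy model of that traversal (simplified and hypothetical; the real Sexp.foldPred differs in its types and also carries meta information) looks roughly like:
data Sexp = Atom String | Cons Sexp Sexp | Nil

-- Wherever the head atom's name satisfies the predicate, hand the
-- name and the cdr of that form to the rewriting function; otherwise
-- keep walking the tree.
foldPred :: Sexp -> (String -> Bool) -> (String -> Sexp -> Sexp) -> Sexp
foldPred (Cons (Atom a) cdr) p f
  | p a = f a (foldPred cdr p f)
foldPred (Cons car cdr) p f =
  Cons (foldPred car p f) (foldPred cdr p f)
foldPred s _ _ = s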
The last key bit of information is Sexp.addMetaToCar atom
, which serves
to preserve meta information like line-numbers that were on the
:cond
atom that we would like on our newly generated if
atom.
Overall, reviewing a pass is rather easy, just keep in mind what the
input form is, and what the output form is. As long as the output form
is getting closer to the Core representation of matters, we have a
successful pass.
To verify this pass, we can easily run it for ourselves and see if the
BNF
Comment is being truthful.
λ> Right cond = Sexp.parse "(:defun f (x) (:cond ((:infix == x 2) (:infix + 3 (:infix + 4 x))) (else (:infix + x 2))))"
λ> cond
(":defun" "f" ("x")
(":cond" ((":infix" "==" "x" 2) (":infix" "+" 3 (":infix" "+" 4 "x")))
("else" (":infix" "+" "x" 2))))
λ> Pass.condTransform cond
(":defun" "f" ("x")
("if" (":infix" "==" "x" 2)
(":infix" "+" 3 (":infix" "+" 4 "x"))
("if" "else"
(":infix" "+" "x" 2))))
Although we used the cond
pass to talk about how these passes work,
we only do so to have a simple example of how these work. All the
other passes follow the same format and can be dissected by the same
thought process.
Context
The context phase is an important phase: this is where we start to
introduce the context, and our operations start to revolve around
it. This is also where our pipeline starts to show some inadequacies
with how we ingest data. But first we must talk about how we transform
our list of S-expressions into a context.
Contextify
We call this phase Contextify
, or Contextualization
. The code we will be investigating is split among many files,
so we will link them just before each piece of code rather than all
at the start.
5.4 - Frontend
Documentation on the Juvix Frontend Phase
Frontend
We begin our journey with the user facing language of Juvix. This is
either text in a file, or text written into a REPL session. We then
use Parser Combinators
to turn this unstructured form into a Juvix
form. Finally we end by converting this structured Juvix form into an
equivalent S-expression one for further passes.
Parser Combinators
The parser combinator file that we will be inspecting can be found
here.
In order to understand this code, I suggest reading about
how parser combinators work. For a very brief rundown, here are the key
features we will be abusing. If you wish to skip the brief rundown,
skip to the Specific Information section.
Quick RunDown
matchL :: Parser Types.MatchL
matchL = do
skipLiner J.pipe
match <- P.try matchLogicSN
spaceLiner (P.string "->")
Types.MatchL match <$> P.try expression
Here the do just means we do the first statement, parse for a pipe,
then a match. If any of these parsers fail to match (for example if
there is no pipe), then we consider the parser to have failed.
Try is another important concept: it says that if the parser fails for
any reason, don't consume any text it may have started to parse (say
our pipe parses but our matchLogicSN doesn't; in that case,
try matchL would revert the pipe parse!).
topLevel :: Parser Types.TopLevel
topLevel =
P.try (Types.Type <$> typeP)
<|> P.try fun
<|> P.try modT
<|> P.try (Types.ModueOpen <$> moduleOpen)
<|>
means or, so if a parser fails, we move onto the next.
Finally it would be improper of me to end the section without talking
about the infix handler.
expressionGen :: Parser Types.Expression -> Parser Types.Expression
expressionGen p =
Expr.makeExprParser (spaceLiner (expressionGen' p)) tableExp
-- For Expressions
tableExp :: [[Expr.Operator Parser Types.Expression]]
tableExp =
[ [refine],
[infixOp],
[arrowExp]
]
infixOp :: Expr.Operator Parser Types.Expression
infixOp =
Expr.InfixR
( P.try $ do
inf <- spaceLiner infixSymbolDot
pure
(\l r -> Types.Infix (Types.Inf l inf r))
)
Here we outline three key functions. The expressionGen
uses the
tableExp
as the precedence table of infix parsers. So refine
has a
higher precedence than infixOp
. At the end of expressionGen
we get
back a new parser for any expression parser we may hand it.
The infixOp
structure explicitly states what kind of infix it is, in
our case we make it nest on the right, so a -> b -> c
gets turned
into a -> (b -> c)
. The important thing to note is that we only
parse for the infix symbol itself; we let the parser we hand to
expressionGen
handle the other side.
Every other tool you'll see is an abstraction on top of these base
tools. Even the Infix handler is built upon the first two primitives
we've outlined.
Specific Information
Now we get into the specifics of our version. We use parser
combinators mostly in the standard way, however you'll often see the
forms skipLiner
, spaceLiner
, and *SN
which are not typical in a
parser combinator system.
spaceLiner just eats all empty symbols (spaces and newlines) after
the current parsed expression.
skipLiner does the same, but first parses the specific character
given to it.
Finally, *SN just calls spaceLiner on whatever parser it wraps.
These concepts exist mainly due to the wish of eventually making the
parser indent sensitive.
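As a rough sketch (assuming a megaparsec-style Parser stack; the real definitions differ in detail), these helpers could be rendered as:
import Data.Text (Text)
import Data.Void (Void)
import qualified Text.Megaparsec as P
import qualified Text.Megaparsec.Char as PC

type Parser = P.Parsec Void Text

-- Run p, then eat any trailing spaces and newlines.
spaceLiner :: Parser a -> Parser a
spaceLiner p = p <* P.many PC.spaceChar

-- Parse exactly the given character, then eat trailing space.
skipLiner :: Char -> Parser ()
skipLiner c = () <$ spaceLiner (PC.char c)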
The main confusing bit of our layout is the many variants of
expression!
do''' :: Parser Types.Expression
do''' = Types.Do <$> do'
app'' :: Parser Types.Expression
app'' = Types.Application <$> P.try application
all'' :: Parser Types.Expression
all'' = P.try do''' <|> app''
-- used to remove do from parsing
expression' :: Parser Types.Expression
expression' = expressionGen app''
-- used to remove both from parsing
expression''' :: Parser Types.Expression
expression''' = expressionGen (fail "")
expression :: Parser Types.Expression
expression = expressionGen all''
We have three main variants, ones with application, ones with do
syntax, and ones with both! These exist because certain
transformations will go into an infinite loop if you're not
careful. This is mainly due to how some forms like do
and infix generators behave together. In other cases, like in ADT
declarations, we want to disable application: in
type List a = Cons a (List a), it would be a shame if the a
was applied to List a!
S-Expression Transformation
The transformation file can be found here.
Here we follow a very straightforward algorithm: when we see a
particular Juvix ML ADT form, we transform it to the equivalent
Lisp expression that can be found in the s-expression document in design.
Ideally we could generate the particulars of the syntax via some
schema and have the ML code use that. This would remove the hardbaked
structure of each form from the transformation, however with how many
different types exist in the frontend form this would be quite a
challenge. We do this technique later on in the compiler, reducing the
amount of hard baked forms.
The above issue also extrapolates a bit on why we have chosen this
path. ML structured ADT's are wonderful, however they lack
flexibility. We used a trees that grow structure in the past, however
this would convert one rigid ADT into another. And so we had to
convert the massive adt that we built one by one, which ended up with
over 500 lines of code per pass! However an issue just as big is that
no common interface can be had, thus our passes couldn't as easily be
abstracted away.
Final Words
Overall, this stage of the compiler is not too complicated, it should
be somewhat easy to get a grasp of and see how the data flows. I
suggest going into the EasyPipeline
library and loading up Easy.hs
and playing around. We have just covered up to the SEXP PHASE
section of that page, and so you should be able to understand what
the sexp1 and sexp2 examples there do.
5.5 - Context
This article goes over the Context structure in detail
What is the Context?
The context can be viewed as the working image of the Juvix
Language. In particular it is the storage/information system of the
language. It is also responsible for having the semantics of symbol
resolution. Due to this, we can view the context as a fancy nested
hashtable.
What information is stored in the Context?
The context stores a plethora of information.
In the long run this information will be added to, with each declared
item in the image having a general hashtable that any information can
be stored on. This will include documentation and the original source
code.
How does Symbol resolution work?
Symbol Resolution is handled entirely by the context. When we ask the
context what Foo.Bar.zed
resolves to, we do a few things.
First we check if Foo
is in the current name-space. Specifically if
Foo
is both in the private name-space and public one, then the
private one wins. After this point, we have no means of accessing
private name-spaces, so we just access the public Bar and zed inside
the Foo
module we resolved.
If Foo
is not in the locals, then we check the top of the context
for Foo
. If it's not there, then we are unable to resolve it.
The only other special part of a known symbol is the resolution of
names like TopLevel
which bypasses any local module checks, and
instantly resolves from the top. So Toplevel.Foo.Bar.zed
will always
look in the toplevel name-space.
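The order of those checks for the first segment of a name can be modelled as follows (hypothetical field names; the real context is a much richer structure):
import Control.Applicative ((<|>))
import qualified Data.Map as Map

data Context def = Context
  { privateLocals :: Map.Map String def
  , publicLocals  :: Map.Map String def
  , topLevel      :: Map.Map String def
  }

-- Private locals win over public locals; the top level is only
-- consulted when the name is not local at all.
resolve :: String -> Context def -> Maybe def
resolve name ctx =
      Map.lookup name (privateLocals ctx)
  <|> Map.lookup name (publicLocals ctx)
  <|> Map.lookup name (topLevel ctx)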
Special Rules for dealing with unresolved symbols
Sometimes we wish to quantify over symbols which we may add or
create. For our previous example, what if Foo.Bar.zed
is undefined?
Operations which operate over hypothetical symbols have a way to know
if the symbol is defined or not.
The resolution rules are fairly simple, but we will investigate them
with three case studies where the symbols may be undefined. zed
,
Foo.zed
, TopLevel.Foo.zed
zed
Since zed
is not in scope, we only really have two choices. We
either mean Toplevel.zed
or CurrentNameSpace.zed
. Since we
find that we typically mean zed
in the current module, and not
some hypothetical one at a top level, we try to do the latter. So
the operation deals with an undefined CurrentNameSpace.zed.
Foo.zed
Toplevel.Foo.zed
Overall some of these rules are arbitrary, namely case 2 subsection 2
may need to conform to be more like case 1, however this requires
testing and playing with the system more to determine.
Both this section and the above are implemented in the
modify space impure section.
What are the Functions of the Context?
The context hosts a plethora of useful functions that make traversing
it easy. Most of these more useful functions are built upon the more
primitive operations.
Many operations that create name-spaces are in IO, to hold the
recursive maps we use to track information.
Primitive Operations
Empty
empty :: NameSymbol.T -> IO (T term ty sumRep)
empty
creates a new context in the name-space you call it with (thus
empty "Foo.Bar"
makes a context with the nested module Bar in Foo).
The current module is then set to this name.
Lookup
lookup ::
NameSymbol.T -> T term ty sumRep -> Maybe (From (Definition term ty sumRep))
(!?) ::
T term ty sumRep -> NameSymbol.T -> Maybe (From (Definition term ty sumRep))
lookup
looks up the given name-symbol according to the rules in the
Symbol resolution section. !?
is just flipped lookup.
Add
add ::
NameSpace.From Symbol ->
Definition term ty sumRep ->
T term ty sumRep ->
T term ty sumRep
add name def context
adds name
with the definition def
to the
current name-space of an existing context. The type NameSpace.From
can either be Pub
or Priv
currently, stating how we want to add
the definition to the current context.
As can be observed, we are adding a Symbol
and not a NameSymbol
,
and so this primitive can't add name
to a separate module. This
should be changed in time.
Remove
remove ::
NameSpace.From Symbol ->
T term ty sumRep ->
T term ty sumRep
remove name context
removes the name
from the public or private
name-space of the current name-space.
InNameSpace
inNameSpace :: NameSymbol.T -> T term ty sumRep -> Maybe (T term ty sumRep)
inNameSpace name context
is the primitive way in which we change the
current name-space. Like lookup, we use the standard symbol resolution
rule to determine which module we resolve to. If the module is not
found, then we simply return Nothing
. Also if the definition we land
on is not a record/module, then likewise we also return Nothing
. If
all is successful, then we end up with a new context with a new
current name-space.
ModifySpaceImp ∧ type Stage ∧ type Return
data Stage b
= -- | 'Final' signifies the last symbol that we
-- pursue in updating a structure
Final b
| -- | 'Continue' signifies that there are parts of
-- the namespace that we can still continue down
Continue b
data Return term ty sumRep
= -- | 'GoOn' signifies that we should continue
-- going down records
GoOn (Record term ty sumRep)
| -- | 'Abort' signifies that we should cancel
-- the changes on the map
Abort
| -- | 'UpdateNow' signifies that we should
-- update the context with the current value
UpdateNow (Definition term ty sumRep)
| -- | 'RemoveNow' signifies that we should remove
-- the definition at this level
RemoveNow
modifySpaceImp ::
Monad m =>
( Stage (Maybe (Definition term ty sumRep)) ->
m (Return term ty sumRep)
) ->
NameSymbol.T ->
T term ty sumRep ->
m (Maybe (T term ty sumRep))
modifySpaceImp f sym context
is a rather important operation. Some
forms like Empty even rely on it!
modifySpaceImp
is responsible for implementing the behavior for
known symbols exactly how Symbol Resolution
lays it out, and for unknown symbols how Rules for Unresolved Symbols
lays it out.
The Maybe
it takes indicates whether the symbol is resolved or
not.
We decided to include the Return
and Stage
type here, as they both
talk about something integral to the design of this function. Namely
the Return
talks about the range of operations we care about. This
limits the kinds of operations this modifySpaceImp
function can
do. The Stage
on the other hand denotes to the operation function
f
whether we are at the final symbol in the resolution or are we
still going down to nested structure to find the proper symbol. Many
operations will do something special at the very end, and so we often
care about what stage we are at.
Finally, our function returns in some monad m; this is just so it is
general enough to accept any kind of function. As the name suggests,
there is a non-impure version of this, which just runs this more
general one in the Identity monad.
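As an illustration of how a derived operation could sit on top of modifySpaceImp, here is a hypothetical sketch of remove-like behavior; the Record pattern on Definition is assumed purely for the sake of the example.
import Data.Functor.Identity (runIdentity)

removeViaModify ::
  NameSymbol.T -> T term ty sumRep -> Maybe (T term ty sumRep)
removeViaModify sym context =
  runIdentity (modifySpaceImp f sym context)
  where
    f (Final _)                    = pure RemoveNow -- delete at the final symbol
    f (Continue (Just (Record r))) = pure (GoOn r)  -- keep descending records
    f (Continue _)                 = pure Abort     -- path is broken; give up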
Derived Operations
These operations aren't derived from the primitive operations; however,
they are quite important for the operation of the context. Some even have
special rules one must pay attention to.
SwitchNameSpace
switchNameSpace ::
NameSymbol.T -> T term ty sumRep -> IO (Either PathError (T term ty sumRep))
switchNamespace name context
is an important operation. This creates
and switches to the name-space even if it doesn't exist! The rules for
creation are exactly those laid out in the Symbol Resolution section.
The creation of new name-spaces are IO, and so the operation is in IO
as well.
5.6 - Pipeline
Documentation on the Juvix compiler pipeline
Pipeline
This section provides an overview of the compiler pipeline, from a
source text file to the specific backend calls to produce the
binary.

The in-code explanation in that image can itself be boxed
into what we see in app/main.hs:
Compile fin fout (Michelson backend) ->
liftIO (readFile fin)
>>= Pipeline.parse
>>= Pipeline.typecheck @Pipeline.BMichelson
>>= Pipeline.compile @Pipeline.BMichelson
>>= Pipeline.writeout fout
>>= liftIO . print
Parsing
The first set of phases to happen is that the text file is turned
into a structured document, namely an ML AST.
Transition
This is a long phase of the compiler where we do a few things.
The first is that we desugar the expression using baked in
expansions. This happens for function definitions, punned
arguments, etc.
After that, we then take the desugared s-expressions and put them
in our global "context". We do some extra processing in this
stage as well like resolving what modules open each other.
We still have a few more passes to run, as phases like symbol
resolution and infix resolution can be run now.
Finally we have one last pass, translating what is left of the
S-expression AST into a format that Core understands.
Core
Currently in our compiler the Pipeline.typecheck
and
Pipeline.compile
functions are backend specific, however the
general transformations are similar regardless of the backend.
HR
This layer of core stands for Human Readable
, as we have variable
names instead of hairy de-bruijn indices.
This phase is short-lived, as most processing at this level wants
to work on some extension of the Internal Representation
as names
create issues for the various algorithms.
IR
This is where the main process of type checking happens.
Eliminations
In this phase of Core, we remove the types from the terms, as they
don't serve any computation purpose. By this point we've done most
of the processing we need.
In the future I suspect inlining algorithms will either be done here
or in IR right above.
Backends
Michelson
LLVM
Arithmetic Circuit
5.6.1 - Idealized Pipeline
An Idealized version of our Core pipeline
This page covers what an idealized version of our core pipeline should
look like for the time being.
The sections are laid out as follows.
The Pipeline defines the overall structure of the pipeline with
comments between each step
Pipeline Function as a BNF with Examples gives a more long form BNF
along with examples and commentary about the function. Further at
the bottom of each section is a link to the reference for any
literature that may be or may not be relevant.
The Outstanding section asks open questions about our passes and
what questions arise that we don't have good answers to yet.
The Exhibition section gives further examples of a few key points
when noted elsewhere in this document
The References section is where further links go about any related
subjects
Pipeline
patternMatchPipeline :: [(Pats,RHS)] ⟶ Type ⟶ ???
patternMatchPipeline patternClauses type' =
terminationChecking patternClauses type'
|> corePipeline
-- The @corePipeline@ works over a context and an expression. The
-- context is preserved throughout the computation as it's needed for
-- many of these steps
corePipeline ∷ Context ⟶ Expression ⟶ ???
corePipeline form =
-- Need to lambda lift pattern matching let abstractions, we don't
-- worry about them here, instead we have the corePipeline go over
-- those
lambdaLiftLocalMatches form
-- Local lets now simply just have guards left; we can remove both
-- the top level guards here, and the let local guards.
-- Once we realize this, we can also notice that lets can only be
-- a single form.
-- TODO ∷ we currently don't have guards going down to this level, we
-- discharge it as one of our first desugar passes
|> removeGuardsAndSimplifyLetToSingleCase
-- Here we turn lets which have arguments into letting a name over
-- multiple lambdas
|> localLetsIntoLambdas
-- Move OUT ∷ Desugar pass/Macro pass once we have that
|> openExpressionIntoLetOverOpenName
-- Has to resolve inner modules, so we have to treat namesymbols as a
-- term for forms in the closure
-- see exhibition ⒈
|> hrToIR
-- This step should be moved to the effect block of typeCheckEvalLoop
-- when we support first class opens
|> nameResolution
-- Here we have to run infixResolution after nameResolution
|> infixResolution
-- toANF runs an algorithm that takes the current form, and creates a valid ANF
-- ANF algo should run before ‘interactiveMode‘, so we can run algebraic effects
|> toANF
|> branch [pipelineTypedMacro, pipelineUntypedMacro]
-- either go to interactiveMode or proceed to compilation
-- eraseZeroUsages looks a the current form, and removes any
-- values/terms that have zero usages attached to them
|> eraseZeroUsages
-- TODO ∷
-- See Reference Inlining
-- for some good analysis on when to inline
-- Currently We inline all globals, which excludes pattern matching functions
|> inlineAlgorithm
-- For the backends we need names rather than de bruijn indices to
-- compile. This step gives us back names
|> restoreNameFromDeBruijn
-- TODO ∷
-- See Reference Fast Curry
-- Here we want the eval-apply algorithm from the reference
-- This will change the code to a form that can explicitly talk about currying.
|> fastCurry
-- We currently have this form. This computes the capture environment
|> captureLambdaEnvironment
-- In Michelson we want to branch off before this
-- This pass we want to explicitly break a lambda into an
-- environment-closure and the binder itself
|> lambdaIntoExplicitBinderAndClosure
pipelineUntypedMacro form =
mutual [macroExpand, eval] form
-- This is a mutually recursive step, due to how haskell works, we
-- have to put it as 1 step, however if we could, we should be able
-- to break it up and put labels to show it's explicitly recursive.
|> mutual [typeCheckEvalLoop, interactiveMode]
-- Requires research on
pipelineTypedMacro form =
-- If macros are typed, then it's recursive with typeCheck
-- meta programming constructions inside the context, which it can use
mutual [typeCheck, macroExpand, eval, interactiveMode] form
-- Interactive mode is a poor man’s REPL
-- Initially, it displays the results from typechecking and then it
-- restarts the whole process again
interactiveMode form =
-- return the result of the evaluation, and a derivation tree
-- optionally read some code and return
showResultAndDerTree form
input <- readCode
juvixPipeline input
Pipeline Function as a BNF with examples
Lambda Lift the Local Matches
BNF
input-expression ::=
let <let-pat-input>+ in <expression>
| ...
output-expression ::=
let <let-name-output>+ in <expression>
| ...
let-pat-input ::= <name> <pattern>* <guard?>
let-name-output ::= <name> <name>* <guard?>
guard? ::= \| <expression> = <expression>
| <expression>
Examples
(** Example 1 *)
(* Input *)
def example =
let
foo (Just x) = 5
foo Nothing = 6
in foo (Just ())
(* Output *)
(* generated by the pass *)
def #g:unique-name (Just x) = 5
#g:unique-name Nothing = 6
def example =
#g:unique-name (Just ())
(** Example 2 *)
(* Input *)
def example =
let foo x = 5 + x in
foo 3
(* Output *)
def example =
let foo x = 5 + x in
foo 3
(** Example 3 *)
(* Input *)
def example =
let foo {x, y} = y + x in
foo {x = 3, x = 5}
(* Output *)
(* generated by the pass *)
def #g:unique-name {x, y} = x + y
def example =
let foo x = 5 + x in
#g:unique-name {x = 3, x = 5}
Description
The transformation takes any let expression whose <pattern>*'s do
any form of pattern matching, and turns it into top level declarations
that are uniquely named, for use where the previous let expression was
used.
If the let form has only simple patterns, then we leave it
un-transformed, as can be seen in Example 2
in the above
code.
Possible Complications
Literature
See Lambda Lifting
Remove Guards, and Multiple lets to single let
NOTE: multiple lets of the same function, not nested ones
BNF
;; Input
input-expression ::=
let <let-name-input>+ in <expression>
| ...
input-declaration ::=
def <def-pattern-input>+
| ...
;; Output
output-expression ::=
let <name> <name>* = <expression> in <expression>
| ...
output-declaration ::=
def <def-pattern-output>+
| ...
;; Helpers
def-pattern-input ::= <name> <pattern>* <guard?>
def-pattern-output ::= <name> <pattern>* <expression>
let-name-input ::= <name> <name>* <guard?>
guard? ::= \| <expression> = <expression>
| <expression>
Examples
(** Example 1 *)
(* Input *)
def add-3 x =
let bar y = 3 + y in
bar x
(* Output *)
def add-3 x =
let bar y = 3 + y in
bar x
(** Example 2 *)
(* Input *)
def add-3-not-on-0 x =
let compute y
| y == 0 = Nothing
| else = Just (y + 3)
in compute x
(* Output *)
def add-3-not-on-0 x =
let compute y = if (Just (y + 3)) Nothing (y == 0)
in compute x
(** Example 3 *)
(* Input *)
def add-3-not-on-0 x =
let compute y
| y == 0 = Nothing
compute y = Just (y + 3)
in compute x
(* Output *)
def add-3-not-on-0 x =
let compute y = if Nothing (Just (y + 3)) (y == 0)
in compute x
(** Example 4 *)
(* Input *)
def add-3-unless-1-or-0 x
| x == 0 = 0
| x == 1 = 1
| else = x + 3
(* Output *)
def add-3-unless-1-or-0 x =
if 0
(if 1
(x + 3)
(x == 1))
(x == 0)
(* If is defined as follows *)
(* About usages, consult the following table for
   what we can do usage-wise:
   cond
     first-class-usages? = if' (* comedy *)
     weakening? = 1
     else = ω *)
sig if : {P : (x : Bool) -0-> *}
-0-> t : P True
-ω-> f : P False
-ω-> b : Bool
-1-> P b
let if {Proof} then-case else-case True = then-case
if {Proof} then-case else-case False = else-case
Description
Here we do the majority of the work of removing guards from lets. We
can notice that lets and defs can both, at this point, desugar their
guard checks into dispatching to the if
destructor function given above.
We can also notice that after we run this pass, there can be no more lets
with multiple patterns, as both guards and matches have been handled by now.
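A minimal Haskell sketch of that dispatch, over a toy expression type
(the names here are assumptions, not the compiler's real AST): guard
clauses fold into nested applications of the if destructor, with the
final else clause as the fallback.
-- Build `if then-case else-case test`, matching the argument order
-- of the sig given above.
data Expr = Var String | App Expr Expr

data Guarded = Guarded { guardTest :: Expr, guardBody :: Expr }

desugarGuards :: [Guarded] -> Expr -> Expr
desugarGuards clauses fallback =
  foldr (\(Guarded t b) rest -> Var "if" `App` b `App` rest `App` t)
        fallback
        clauses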
Possible Complications
We already handle guards in the desugar passes. Currently desugaring just
sees the guards as Cond
's and desugars them into cases
. This
is, however, incorrect, as pattern matching has a fall-through
behavior that this does not respect.
This algorithm may be better served by the linearization algorithm
given by the pattern-match simplification. I'm not sure this is
always possible in the def
case. For the simple lets we
currently have, this works as a strategy, however.
This could perhaps also be better served with a with pattern
through some strategy. See the following.
(* Input *)
sig foo : maybe (maybe int) -1-> int
def foo (Just (Just x)) | x == 0 = 3
foo (Just x) | x == Just 3 = 5
foo _ = 0
(* Output *)
sig foo-inner : maybe (maybe int) -1-> bool -1-> bool -1-> int
def foo-inner (Just (Just x)) true _ = 3
foo-inner (Just x) _ true = 5
foo-inner _ _ _ = 0
sig foo : maybe (maybe int) -1-> int
def foo m@(Just (Just x)) = foo-inner m (x == 0) false
foo m@(Just x) = foo-inner m false (x == Just 3)
foo m = foo-inner m false false
Open Expression into Let Over Simple Open
Name Resolution
BNF
input-expression ::= open <name> in <input-expression>
| ...
;; remove open
;; Note that in more complex cases this may not be possible, so we may
;; not have any bnf changes in this pass
output-expression ::= ...
Examples
(** Example 1 *)
(* Input 1 *)
(* Written by the user, which is part of the env by now *)
open Prelude
(* Input we see at this level *)
sig add-list : Int.t -> List.t Int.t -> Int.t
def
add-list (global-pat 0) (Cons (global-pat 1) (global-pat 2)) =
open Michelson in
(* + is in Michelson *)
let global-pat 0 + global-pat 1 in
add-list (indicy 0) (global-pat 2)
add-list (global-pat 0) Nil =
global-pat 0
(* Output *)
sig add-list :
Prelude.Int.t Prelude.-> Prelude.List.t Prelude.Int.t Prelude.-> Prelude.Int.t
def
add-list (global-pat 0) (Prelude.Cons (global-pat 1) (global-pat 2)) =
open Prelude.Michelson in
(* + is in Michelson *)
let global-pat 0 Prelude.Michelson.+ global-pat 1 in
add-list (indicy 0) (global-pat 2)
add-list (global-pat 0) Prelude.Nil =
global-pat 0
Description
Here we run name resolution. For top-level opens this is rather
simple, as our current Contextify
step notes top-level opens. Thus
what needs to be respected is how opens affect the closure, and
determining from that which names refer to outer names.
Possible Complications
Literature
To ANF
BNF
Examples
Description
Possible Complications
Literature
Mutual Block
Type Checker
BNF
Examples
Description
Possible Complications
Literature
Witch
BNF
Examples
Description
Possible Complications
Literature
Erase Zero Usages
BNF
Examples
(** Example 1 *)
(* Example may be artificial *)
(* Input *)
sig linear-comb-pos-nats
: Π 1 Nat.t
(Π 1 Nat.t
(Π 2 Nat.t
(Π 0 (>= (indicy 0) 1)
(Π 0 (>= (indicy 2) 2)
(Refinement Nat.t (> (indicy 0) 1))))))
(* global-pat 3 and global-pat 4 are the 0 usage proofs *)
def linear-comb-pos-nats (global-pat 0)
(global-pat 1)
(global-pat 2)
(global-pat 3)
(global-pat 4) =
let ( * (global-pat 2) (global-pat 1)) : 1 Nat.t in
(* Let's prove that our result is at least (max (arg 1) (arg 2)),
using our proofs of (arg 3) and (arg 4) *)
let times-by-ℕ⁺-is-greater (indicy 0) (global-pat 3) (global-pat 4) : 0 Nat.t in
refinement (+ (indicy 1) ( * (global-pat 0) (global-pat 2))) (indicy 0)
(* Output *)
sig linear-comb-pos-nats
: Π 1 Nat.t
(Π 1 Nat.t
(Π 2 Nat.t
(Π 0 (>= (indicy 0) 1)
(Π 0 (>= (indicy 2) 2)
(Refinement Nat.t (> (indicy 0) 1))))))
(* global-pat 3 and global-pat 4 are the 0 usage proofs *)
def linear-comb-pos-nats (global-pat 0)
(global-pat 1)
(global-pat 2)
(global-pat 3)
(global-pat 4) =
let ( * (global-pat 2) (global-pat 1)) : 1 Nat.t in
(+ (indicy 0) ( * (global-pat 0) (global-pat 2)))
Description
In this code we want to remove any code which has 0 usages in
it. In particular, we can see in Example 1
that the proof that
the number is greater than 1 is fully erased.
However, it is important to note a few points. We did not erase the
passing
of the extra arguments to linear-comb-pos-nats
. This is
because these arguments are still passed on the application side,
both in existing code and in any incremental-compilation
passes; these arguments are simply ignored. Alternatively, we could
choose a strategy that notes which arguments have 0 usage and adds a
shim layer that drops those arguments before applying such
functions.
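A minimal sketch of the core of such a pass over a toy usage-annotated
term type (hypothetical names, not the compiler's Core representation):
0-usage lets are dropped outright, while 0-usage lambda binders are
kept so application sites still line up, as described above.
data Usage = Zero | One | Omega deriving (Eq)

data Term
  = Var String
  | Lam Usage String Term
  | App Term Term
  | Let Usage String Term Term -- usage, name, bound term, body

eraseZero :: Term -> Term
eraseZero (Let Zero _ _ body) = eraseZero body -- drop the 0-usage binding
eraseZero (Let u n b body)    = Let u n (eraseZero b) (eraseZero body)
eraseZero (Lam u n body)      = Lam u n (eraseZero body) -- binder is kept
eraseZero (App f x)           = App (eraseZero f) (eraseZero x)
eraseZero t                   = t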
Possible Complications
What to do with 0-usage function arguments; see the Description for
another method of handling them.
What do we do about top-level records usage-wise? We need the 0-usage
arguments, so we have to treat them in an interesting way.
We need to preserve signatures which only partially care about
usage erasure.
Literature
Inline Algorithm
BNF
Examples
Description
Possible Complications
Literature
Restore Names from De Bruijn Indices
BNF
Examples
Description
Possible Complications
Literature
Fast Curry
BNF
Examples
Description
Possible Complications
Literature
Capture Lambda Environment
BNF
Examples
Description
Possible Complications
Literature
Lambda Into Binder and Closure
BNF
Examples
Description
Possible Complications
Literature
Outstanding questions
How to Deal with Expansion?
Need eval? How does this reflect in the pipeline
Types of Macros?
How do we properly type it
Algebraic effects without Algebraic effects?
We want to structure the typeCheckEvalLoop
as follows
throw CantResolveSymbol ~> nameResolution
-- typeChecking does
-- Hole insertion, unifier, Type Checking, nameResolution
typeCheckEvalLoop =
doesAthing
{resolverEffects
= {
-- filling holes
, UnifyImplicits ~> unifier
-- Creating holes for the unifier to fill
, InsertImplicit ~> implicitInsertion
-- For evaling terms
, Evaluate ~> eval
-- For first class Name Resolution
, CantResolveSymbol ~> nameResolution
}}
Exhibition
⒈
def bar =
let
mod baz =
mod biz =
def foo = 3.
baz.biz.foo
~> 0.biz.foo
~> resolve-symbol (resolve-symbol foo biz) 0
⒉ openExpressionIntoSimpleOpenPlusLet
def bar =
let mod bar =
def foo = 3.
let #g:gensym = (bar djfka djfkas dfjak)
open #g:gensym ~> open with type {foo}
resolve-symbol foo #g:gensym.
def bar =
let mod bar x y z =
def foo = 3.
let #g:gensym = (bar djfka djfkas dfjak)
open #g:gensym ~> open with type {foo}
resolve-symbol foo #g:gensym.
def bar =
let mod bar x y z =
def foo = 3.
open (fold function-generating-a-module xs : mod {…})
resolve-symbol foo #g:gensym.
def bar =
let mod bar x y z =
def foo = 3.
let def #g:gensym =
fold function-generating-a-module xs : mod {…}
open-simple #g:gensym
resolve-symbol foo #g:gensym.
References
De Bruijn Indices
Benjamin C. Pierce: Types and Programming Languages, chapter 6, pages 75–80
Template
BNF
Examples
Description
Possible Complications
Literature
5.7 - Backends
Juvix supports backends for generating code for different targets (Michelson,
LLVM etc.). Juvix programs are written in terms of primitives, i.e.
values and types that are directly available in the target language. This
allows is us to abstract away from target-specific code by replacing these
primitives with ones from other targets, but at the same time give enough
flexibility to execute target-specific operations if necessary.
Connection with pipeline
The backend implements several datatypes and class-instances that connect to
the pipeline of Juvix.
Main backend instance
An instance of HasBackend
, that implements the main connection to the
pipeline:
… respectively);
… this backend;
Optionally an implementation for the parse
function;
Optionally an implementation for the typecheck
function;
An implementation for the compile
function.
Parameterisation
We need to inform the Juvix compiler of some basic information about our
backend (a rough sketch of the resulting record follows this list), such as:
A function for typechecking primitives: hasType
.
Lists of builtin values and types: builtinTypes
and builtinValues
.
Functions for creating primitives based on literal integers, floats, or
strings: stringVal
, intVal
and floatVal
.
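A rough Haskell sketch of the shape such a parameterisation could
take, using the names listed above; the field types here are guesses,
and the actual Parameterisation record in the codebase may differ:
import Data.Map (Map)
import Data.Text (Text)

-- Hypothetical shape only; field types are assumptions based on the
-- list above, not the codebase's real definitions.
data Parameterisation primTy primVal = Parameterisation
  { hasType       :: primVal -> [primTy] -> Bool
  , builtinTypes  :: Map Text primTy
  , builtinValues :: Map Text primVal
  , stringVal     :: Text -> Maybe primVal
  , intVal        :: Integer -> Maybe primVal
  , floatVal      :: Double -> Maybe primVal
  }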
Substitution and weakening
For compilation, the backend takes a representation of the input program that
is very close to a primitive-extended lambda calculus with multi-argument
lambdas. As the primitives of the target language can themselves be
functions, for example basic arithmetic operators, we can normalize these
terms as well. For this we need to instantiate three classes for the
primitives:
HasSubstV
: For substitution of terms.
HasWeak
: For weakening of terms (accounting for removal of a lambda
abstraction).
HasPatSubstTerm
: For substitution of patterns.
Each of these are documented in more detail at the
evaluator page.
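To illustrate the idea of weakening, here is a standalone de Bruijn
index shift in Haskell; this is only a conceptual illustration, not the
codebase's actual HasWeak class:
-- Shift free variables at or above the cutoff by one, as needed when
-- the enclosing binder context gains or loses a lambda.
data Term = Var Int | Lam Term | App Term Term

weaken :: Int -> Term -> Term
weaken cutoff (Var i)
  | i >= cutoff = Var (i + 1)
  | otherwise   = Var i
weaken cutoff (Lam body) = Lam (weaken (cutoff + 1) body)
weaken cutoff (App f x)  = App (weaken cutoff f) (weaken cutoff x)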
Additionally, to make applying terms possible during substitution, the
CanApply
class needs to be instantiated for primitive terms and types.
5.7.1 - LLVM
Note that this description is merely a design, and does not reflect the
current state of the LLVM backend yet. The design and implementation might
change in the future.
Introduction to LLVM
LLVM is an intermediate language and collection of compiler tools. The
intermediate language can be compiled to several backends, including AMD64 and
WebAssembly, giving us a lot of flexibility.
Structure of the backend
Primitives
The LLVM backend uses an intermediate representation for primitives that are
specific to LLVM: RawPrimVal
for values and functions, and PrimTy
for
types. Each of these has a rather direct relation to constructs available in
LLVM.
Application
It is possible that primitives are terms that can be reduced further, e.g.
addition of two constants can be replaced by a single constant. To support
this, the LLVM backend implements application of terms by instantiating the
CanApply
class for both types and values.
Parameterisation
The parameterisation of the LLVM backend is straightforward and follows the
implementation of the Parameterisation
datatype:
hasType
is a function that checks the types of raw primitive values.
Currently the typechecking is very basic: it just checks whether the given
types match the primitive value in arity. For operator primitives, we
check if the types of the arguments are equal. This implementation will
definitely have to be refined in the future.
builtinTypes
and builtinValues
just provide a mapping from source code
literal names to the actual literals.
integerToRawPrimVal
creates a literal based on an integer. Currently it
doesn't take the size of the literal into account, as integer literals
always share the same type. This should be improved in the future.
There are currently no implementations for stringVal
and floatVal
, as
the backend does not support these types yet.
Substitution and weakening
For simplicity, the current HasWeak
, HasSubstValue
and HasSubstTerm
instances do not do anything at all. This will likely change in the future.
Connection to the pipeline
The LLVM instance for HasBackend
relies on the default for the parse
function. Additionally:
The typecheck
function relies on the general typechecker implementation
by calling the typecheck'
function, together with the parameterisation
and the LLVM representation of a set type.
The compile
function gets a filename and an annotated term
(AnnTermT
) as the program.
The annotated term with typed primitives is translated to an annotated term
with raw primitives RawPrimVal
. These are then translated to LLVM code.
Compilation
With the input program being represented by the annotated Term
, we can now
easily translate these to LLVM code. We rely mostly on the
llvm-hs-pure package,
as it provides a nice monadic interface LLVM.IRBuilder
that keeps track of
variables for us. Therefore, the implementation is rather straightforward:
The whole program is translated to an LLVM module by the compileProgram
function. This module contains a hardcoded main
function, with the type
taken from the outermost annotated term.
The term itself is compiled by the compileTerm
function, using recursion
for any subterms.
For any primitive functions, we check if enough arguments are available,
and if so the primitive function is written, with unnamed references to the
arguments all taken care of by the IRBuilder
monad.
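As a taste of the IRBuilder interface mentioned above, here is a small
standalone llvm-hs-pure sketch (not actual Juvix backend code) that
emits a module containing a two-argument add function; the names and
module layout are illustrative:
{-# LANGUAGE OverloadedStrings #-}
import LLVM.AST (Module)
import LLVM.AST.Type (i32)
import LLVM.IRBuilder.Instruction (add, ret)
import LLVM.IRBuilder.Module (buildModule, function)

-- IRBuilder hands us the parameters as operands and takes care of the
-- unnamed intermediate references for us.
demo :: Module
demo = buildModule "demo" $
  function "plus" [(i32, "x"), (i32, "y")] i32 $ \[x, y] -> do
    r <- add x y
    ret r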
Each of these functions is called in its own part of the pipeline: the
output of parse
is the input of typecheck
and the output of typecheck
is the input of compile
.
6 - Tutorials
A place to show off plenty of examples and tutorials about Juvix
6.1 - Writing Proofs with Witch
An article about using Witch
Writing proofs in Juvix
To popularise software verification, we must make proof writing less of
a burden on you, the user. Currently, only specialists and a few others
can write proofs for their programs, even though you surely understand
the domain and invariants of your projects – otherwise you wouldn’t
be able to write any software at all.
Our hypothesis is that programmers are
well-equipped to derive and prove properties of the software they write,
but they lack the mathematical maturity and vocabulary to carry out a
formal proof. We have evidence that students fail to produce well-made
proofs due to the lack of mathematical maturity, even though they do
understand the subject matter at hand.
We believe that program verification doesn’t have to be like this,
and we did something about it. We created an assistant tool for theorem provers
that puts together many different strategies for proof automation. Our
specification for such an assistant for Juvix uses algebraic effects and
handlers as the means of congregation.
The approach to facilitate program verification in Juvix is named Witch, a play on
the name of assistant tools colloquially called “wizards”. There is no consensus on
what a wizard is or what exact tasks it is supposed to assist with. Wizards
seem to be used mostly for multiple-step and/or configuration features, however. We
went for the name “witch” to align it to the idea of assistant tools,
while dodging the overloaded and confusing terminology.
We propose an effects and handlers view of such proofs,
based on prior work developed on the Andromeda
prover.
In Juvix, you program as you would normally and
invoke the proof environment as an effect to prove certain properties as
you go. Given that the proof environment is just an effect, different proof
styles (e.g., Agda-style dependent types,
SMT solver, proof search) can be composed under a shared interface, a
proof object that can manipulate itself while querying different
automated provers.
The reasons we employ algebraic effects and handlers are numerous:
as proofs cannot be fully automated, all approaches that try to
automate the process (e.g., proof search, SMT solver) may be
non-deterministic or never find a solution. Therefore, the system
should be able to handle “impure” computations and errors. Algebraic
effects and handlers have well-defined semantics and provide a
simple interface for performing effects. With them, we avoid
indiscriminate effects that are often error-prone and intricate
effects machinery such as monad transformers;
the semantics of effects and handlers accommodate composition of
arbitrary effects and the composition of multiple handlers, which
means users have the ability to weaken more general strategies into
specific ones while maintaining top-level handlers unmodified;
effects and handlers have well-defined semantics and are a main
feature of many new languages, such as
Koka,
Eff, Frank and
Effekt, and they have implementations in all major
functional programming languages. This guarantees that our approach is grounded
in research and real-world experience, and not something we hacked together
over the weekend.
The Essence of Witch
As for the syntax, we use operation { params } via handler
, which is
semantically equivalent to but a syntactic improvement over handler(operation, params)
, since effect handling
is similar to function application, but also carries effect information.
The user defines data types and functions as usual, and then uses
Witch to prove properties about said definitions. The examples below
show it in action:
data ℕ : Type where
0 : ℕ
s : ℕ → ℕ
_+_ : ℕ → ℕ → ℕ
0 + n = n
(s m) + n = s (m + n)
+comm : ∀ (x y : ℕ) → Witch x + y ≡ y + x
+comm 0 y = tactic { 0 + y ≡ y + 0 } via Ring
+comm (s x) y = search { x + (s y) ≡ s (x + y) } via Backtracking
++assoc : ∀ {A : Set} (x y z : [A]) → Witch (x ++ y) ++ z ≡ x ++ (y ++ z)
++assoc ∅ y z = solve { (∅ ++ y) ++ z ≡ ∅ ++ (y ++ z) } via SMT
++assoc (x ∷ xs) y z rewrite ++assoc xs y z
= pure refl
The proof of commutativity of addition under
natural numbers and of associativity of list concatenation are shown,
and use the three main effects: Solver, Proof Search and Tactics. In the
proof assistant literature, there exists no precise definition of
commonly used terms “solver”, “proof search” and “tactics”. All these
terms are used in different communities, and mean some variation of
“automatic strategy to construct a term under certain constraints”.
In Witch, however, we use the following definitions of previously known terms:
The Solver effect is used for integration with external solvers via
IO; we believe it should suffice for the user to write, e.g. SMT
, and
handlers should implement internal strategies to choose between the
different theories supported by solvers. If the black-box approach
to solvers proves to be a challenge, a specialisation of the
handler is possible, e.g. operation { params } via Z3.QF-UFLIA
.
The Proof Search effect is used for library-level algorithms; users
may choose to implement their own algorithms using a limited set of
meta-programming¹ constructs that are handled at top-level².
The Tactics effect is used for strategies that simplify the goal at
least one step, and may not complete all proof goals. This style is
mainly used in the proof assistant Coq. While we do not foresee
using as many tactics as are implemented in Coq, we believe
tactics such as eauto
, rewrite
, unfold
are useful to users of
all levels of expertise.
Lastly, the Error effect is used for feedback to the user, since any
of the strategies may fail. Error has two operations, throw
and
trace
: the former notifies the user that a strategy has failed,
while the latter registers³ completed sub-goals during the
strategy’s attempt to complete the proof.
Witch for Juvix is currently under development, and we hope to release its first
version soon. :)
¹ By meta-programming, we mean “code that manipulates itself”, and
not “programming that happens in a higher level of abstraction”. For
dependently typed programming languages, the usual term is
reflection. However, we prefer not to use reflection since it has
a different meaning in terms of effectful computations.
² The meta-programming constructs are operations of the Typechecker
effect whose handlers are not available for the user.
³ Internally, the Typechecker effect should have a tree that stores
all currently saved traces.
6.2 - Multi-Bear Domicile Setup
This is a placeholder page. Replace it with your own content.
6.3 - Another Tutorial
Proof Searching of Juvix
The Argument for Implicit Refinements Over Indexed Types For Structures
The current thought on how to do that is as follows
Provide a base structure with no extra information as the main
module
So
module Data.Map where
type Map a b = ...
Provide various libraries that provide proofs over the structure
Provide a module
which gives back all available refinements defined by the
library.
This gives the programmer a heavyweight way of figuring out
what proofs/refinements are needed in order to allow your code
to verify
This module is not intended to be left in final code, but to be
used while developing a module
I believe this is superior to indexed types, as this would allow the
programmer to only add types they care about, and not be forced to
consider moving their list to a vector if they care about length.
6.4 - Syntax-Guide
A tutorial on the Current Syntax
This page is subject to change with the syntax redesign.
Syntax Guide
This document is heavily inspired by the Idris syntax guide.
File Organization
A file contains zero or more Top Level Declarations
For example
-- this is a function with a signature!
sig foo : Natural
let foo = 3
-- this is a data type declaration
type list a = Cons a (list a)
| Nil
-- this is a module declaration!
mod Boo =
let fi = 10
end
Comments
Comments are denoted by two dashes, --
. The two dashes and all characters up until the end of the line are discarded.
Example:
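-- I am a comment!
-- the dashes and everything after them, up to the end of the line, are discarded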
Symbols
Symbols are used for any name declared in the Juvix programming
language. Symbols are broken into two categories, infix and prefix.
Prefix symbols start with either a letter or an underscore,
which can be followed by any alphanumeric characters, underscores,
punctuation (?, !), or dashes.
-- a valid symbol
hello-there
-- another valid symbol
_unused-symbol
-- another valid symbol. Typically '?' denotes that the symbol is a predicate
even?
-- An important action
foo!
-- not a valid prefix symbol
-foo
An infix symbol starts with any character other than a letter or an underscore.
-- This is a valid infix symbol
!!
-- this is also another valid infix symbol
-<++
-- this is not a valid infix symbol; it is instead a comment containing a dash
---
-- here is an arrow infix symbol
->
Top Level Declarations
Functions
Functions are started by writing let
which is followed by any valid
prefix symbol or an infix symbol surrounded by parentheses which shall
be referred to as the function name. Then, there are zero or more
arguments, with implicit arguments surrounded by curly braces
({}
). The argument list ends when an equal sign (=
) is placed,
which denotes the start of the body.
Example:
-- this is a valid function
let f x = x + 3
-- this is another valid variable/function
let y = 5
-- this is a valid infix symbol
let (+) = plus
-- a function with an implicit argument
let foo {prf} x = x
Another important part of a function is the signature.
A signature is denoted first by sig
, then the function name.
From here colon (:
) denotes the start of the type of the function name.
Subsequently, any valid type can be written.
Example:
-- a valid signature and function
sig foo : int -> int
let foo x = x + 3
-- an example of a dependent signature
sig add-nat-int
: x : nat
-> y : int
-> if | x > y -> nat
| else -> int
let add-nat-int = (+)
Types
Types are very similar to Haskell and Idris ADT
and GADT
declarations.
Types are declared by writing type
followed by the name of the type
and arguments much like function syntax. Optionally a type signature
can be given at this point, by writing colon (:
) followed by the type.
An equals sign (=
) denotes the start of the body of the
type declaration.
From here a declaration can take a few forms.
Zero or more sums, each of which starts with pipe (|
) and
contains a tagged product.
A tagged product which starts with the new constructor name and
either the arguments separated by spaces, a colon (:
) followed
by the arguments separated by arrows, or a base record.
A base record which is denoted by curly braces ({}
). Inside the
curly braces, a name is given to every argument, whose type is
started via colon and terminated by a comma (,
).
-- This is a valid type
-- the a is a type argument
type list a
-- Cons is the constructor
-- Cons takes an item of type a and a List of a
= Cons a (list a)
-- Nil is another constructor taking no arguments
| Nil
-- this is the same type, but GADT style arrow syntax
-- is given to the constructor
type list a : a -> list a
-- Curly braces can be used here to name the arguments
= Cons { car : a,
cdr : list a }
| Nil
-- Same type again but using GADT syntax in the constructors
-- The first product can have a pipe!
type list a =
| Cons : a -> list a -> list a
| Nil : list a
-- an example of a base record!
type coords a = {
x : a,
y : a
}
-- Same example but we have a trailing comma
type cords a = {
x : a,
y : a,
}
-- An example of a dependent type
type vect : (len : nat) -> (elem : set) -> set =
| Nil : vect 0 elem
| Cons : elem -> vect len elem -> vect (succ len) elem
Modules
Modules are denoted similarly to functions except that instead of using
let
, mod
is used.
Instead of an expression, the body consists of zero or more top-level
declarations followed by end
.
-- example defining a module
mod Foo =
sig bar : nat
let bar = 3
-- The type is inferred here
let baz = 5
-- end ends the module definition
end
-- example using a module
let test = Foo.bar + Foo.baz
Imports
A module can be imported in two ways.
Importing a module unqualified via open
-ing it means that every
symbol in the module becomes unqualified.
A module can be open
-ed:
Example:
-- A valid open
open Foo
-- opening the module Baz in the module Bar in the module Foo
open Foo.Bar.Baz
-- This is the same statement as above.
open Foo
open Bar.Baz
-- let us define a module
mod IntMod =
let t = int
sig of-nat : int -> t
let of-nat x = x
sig add : t -> t -> t
let add = (+)
end
-- now we shall open it into our scope
open IntMod
-- we can now use it unqualified
sig use-int-mod : nat -> nat -> t
let use-int-mod x y = add (of-nat x) (of-nat y)
A module can also be aliased with a let
:
Example:
-- a valid module alias
let F = Foo
Expressions
Conditionals
If
If expressions have a non-zero number of clauses. Each clause consists
of a boolean test, followed by a body term.
Example:
-- this is a valid if expression!
if | x == 3 -> 5
| else -> 6
-- ^ test ^ consequence
-- this is also a valid if expression
if | x == 10 -> 25
| positive? x -> x
| negative? x -> abs x
| else -> 0
The else
name is just an alias for True
.
Case
Case expressions have a non-zero number of clauses. Each clause
consists of a pattern, followed by a consequence.
A pattern works much like Haskell or Idris, in that one can
deconstruct on a record or a constructor. We also allow record punning
on matches.
Example:
type tree a = Branch (tree a) a (tree a)
| Leaf a
| Empty
-- an example with match!
sig func : tree nat -> nat
let func foo =
case foo of
| Branch left ele right ->
func left + ele + func right
| Leaf ele ->
ele
| Empty ->
0
-- This is the same function!
let func2 (Branch left ele right) =
func2 left + ele + func2 right
let func2 (Leaf ele) =
ele
let func2 Empty =
0
type coords = {
x : int,
y : int
}
-- match on record
sig origin? : coords -> boolean
let origin? {x, y}
| x == y && x == 0 = True
| else = False
-- same function as origin
sig origin2? : coords -> boolean
let origin2? {x = origX, y = origY}
| origX == origY && origX == 0 =
True
| else = False
Dependent matching
Definitions
Definitions within an expression are like their top level
counterparts, except that in
followed by an expression must be
written after the definition.
Let
let foo =
let bar = 3 in
bar + 10
Modules
let foo =
mod Bar =
let foo = 3
let bat = 10
end in
Bar.foo + Bar.bat
Signatures
Types
let foo =
type bar = Foo int
| Bar nat
in [Foo 3, Bar 10]
Lists
List literals are started by the open bracket character ([
). Within,
elements are separated by commas (,
) before ending with a closing
bracket (]
).
List literal syntax is just sugar for the Cons
and Nil
constructors.
Example:
-- this is a valid list
[1]
-- another valid list
[1,2,3]
-- the same list without sugar
Cons 1 (Cons 2 (Cons 3 Nil))
Tuples
Tuples are formatted like lists, however instead of using brackets,
parentheses ((
and )
) are used.
Example:
-- this is a tuple
(1, 2)
-- this is not a tuple
(1)
-- this is a 5 tuple!
(1,2,3,4,5)
Records
Record literals are started by an open curly brace ({
). Within,
elements are bound to the corresponding name of the record via the
equals sign (=
), or punned by the name directly. Elements, like
lists, are separated by commas (,
) before ending with a closing
brace (}
).
Example:
type coords = {
x : int,
y : int
}
-- a constructor function for coords
sig create-cords : int -> int -> coords
let create-cords x-dir y-dir = {
x = x-dir,
y = y-dir
}
-- same function with punning
sig create-cords : int -> int -> coords
let create-cords x y = {x, y}
Record updating syntax
Constants
String Literals
Strings are enclosed by double quotes ("
)
Example:
let foo =
"this is a string!"
Integers/Naturals
Numbers are denoted by the digits 0123456789.
Examples:
-- a valid number literal
let foo = 123
-- another valid number
let number-one = 1
Do Notation
Do notation works similarly as it does in Haskell with changes to make
it indent insensitive. Namely, this means that after every binding a
semicolon (;
) is needed to start the next expression. Further, no
do
is needed, the semicolon is enough to determine if an expression
is in do syntax or not.
Thus, like Haskell, to bind terms one states the name, then a left
arrow (<-
), then the monadic expression terminated by a semicolon.
For non bindings, just the monadic expression with a semicolon is
needed.
The last expression in do notation does not need a semicolon.
Example:
let foo my =
x <- Just 5;
y <- my;
pure (x + y)
let bar =
Out.print "hello";
name <- In.prompt "What is your name";
Out.print ("hello" <> name)
Local opens
Local opens work just like global opens; however, one has to write in
followed by a body, like other inner definitions.
Example:
let foo xs ys zs =
open List in
append xs (append ys zs)
There is also a briefer syntax where the module name is followed
by .( ... code here ... )
Example:
let foo xs ys zs =
List.(append xs (append ys zs))