Theme, a functional systems programming language

This is a project of Jaap Weel, started originally as a class project at VU, and now transferred to the back burner.

Summary

----------------------------------------------------------------------
           THEME, A Functional Systems Programming Language
----------------------------------------------------------------------

SUMMARY

The objective is to write a compiler for a functional programming
language that is suitable for systems programming. In particular, the
language should be suitable for implementing its own runtime system,
and for re-implementing parts of the MINIX operating system. In order
to achieve this objectives, a memory model is needed that uses no
tags, and it should be possible to write modules that do not require
garbage collection. Also, a facility is needed to give the programmer
control over data structure representation. Otherwise, the language
can be similar to ML.

POSSIBLE EXTENSIONS

I have intentionally narrowed the scope of this project from the
original plan. One very interesting addition to the project would be a
back end that uses typed assembly language to produce verifiably safe
object code. Such verifiably safe object code can conceivably be used
as an alternative to virtual memory based protection for a timesharing
operating system. Another extension would be to design a syntax with
more semicolons and curly braces, which would be less scary to
potential users. Also, there are various pet ideas about macros and
type systems that I may want to incorporate.

Theme Report

Draft Report on the Functional Systems Programming Language Theme

THIS IS A DRAFT. IT IS NOT CONSISTENT, LET ALONE COMPLETE.

0. Introduction
---------------

This document is intended as a concise reference manual for the Theme
programming language, and not as a formal specification of the
semantics, nor as a tutorial. Syntax is given in extended BNF.

1. Lexical conventions
----------------------

Space, newline, horizontal tab, carriage return, line feed, and form
feed are considered as blanks. Blanks are ignored, but they separate
adjacent lexical elements. Comments are introduced by a semicolon and
run until the end of a line, or between #| and |#, as in Lisp. Also,
lines with a pound sign '#' in the first column are ignored, but may
be used by preprocessors. Comment symbols inside string or character
literals do not count as such.

Identifiers and literals follow OCaml syntax, but operators are
defined slightly more liberally:

ident ::=  ( letter | _ ) { letter|0...9|_|' } 

letter ::=  A ... Z | a ... z       

integer-literal ::=
   [-] (0...9) { 0...9|_ }
 | [-] (0x|0X) (0...9|A...F|a...f)  { 0...9|A...F|a...f|_ }
 | [-] (0o|0O) (0...7) { 0...7|_ }
 | [-] (0b|0B) (0...1) { 0...1|_ } 

float-literal ::=   
   [-] (0...9) { 0...9|_ } [. { 0...9|_ }]  [(e|E) [+|-] (0...9) { 0...9|_ }] 

char-literal ::=   ' regular-char '                           
               |  ' escape-sequence '                         

escape-sequence ::=   \ (\ | " | ' | n | t | b | r)              
                  |  \ (0...9) (0...9) (0...9)                   
                  |  \x (0...9|A...F|a...f)  (0...9|A...F|a...f) 

string-literal ::=   " { string-character } " 

string-character ::=   regular-char-str         
                   |  escape-sequence  

operator-char ::= ! | $ | % | & | * | + | - | . |  / | : 
                | < | = | > | ? | @ |  ^ | | | ~ 

operator = { operator-char }



2. Values
---------


2.1 Atomic data
---------------

Unsigned integer types: u8, u16, u32, u64
Signed integer types:   s8, s16, s32, s64

The type int    is a system native integer (32 or 64 bits).
The type float  is an IEEE-754 32-bit floating point number.
The type double is an IEEE-754 64-bit floating point number.
The type char   is a u8 with some printed representation (e.g. ISO 8859-1).



2.2 Compound data
-----------------

There are two built-in forms of compound data: tuples and
arrays. Strings are simply arrays of characters. The maximum length of
these structures is implementation dependent. 

Variant values are constructor with zero or more objects according to
the arity and type of that constructor. (This follows Haskell, not
OCaml!) Despite the fact that tuples may be syntactically represented
as chains of pairs ( (a * b * c) <=> (a * (b * c)) ), they should be
implemented as flat records!

Procedures are also objects, but they may only be used by means of
function application.



3. Names
--------

Operators in parentheses are just like identifiers. Identifiers
surrounded by dots are just like operators. Operators not in
parentheses trigger a few syntactic transformations:

(a * b)       <=> ((*) a b)
(a * b * c)   <=> ((*) a ((*) b c))
(a .cons. b)  <=> (cons a b)
(a b * c * e) <=> ((*) (a b) ((*) c e))
(a b c)       <=> ((a b) c)

This means that all operators are right associative and have equal
precedence, except juxtaposition, which has highest precedence, and is
left-associative.  (The designer has a strong personal dislike of
precedences and associativities that need to be memorized.) These
rules also apply to keywords; e.g. 

(let (x = 3) ...) <=> (let ((=) x 3) ...)

Constructors (but not type constructors!) always start with a capital
letter. Other names always start with a lowercase letter or
underscore.



4. Type expressions
-------------------

typexpr ::=  < ident >
          |  typeconstr
          |  ( typeconstr { typexpr } )



5. Constants
------------

constant ::= integer-literal | float-literal | char-literal 
           | string-literal | constr



6. Patterns
-----------

pattern ::= < value-name >
          | constant
          | ( constr { pattern } )

Pattern variables (value-name) may not be operators.



7. Expressions
--------------

expr ::= constant
       | ( begin { expr } )
       | ( expr : typexpr )
       | ( expr , expr )
       | ( constr {expr} )
       | ( { expr }+ )
       | ( match expr { ( pattern => expr ) } )
       | ( let { let-binding } expr )

let-binding ::= ( pattern = expr )
              | ( ident { pattern } = expr )



8. Type definitions
-------------------

type-definition ::= ( type { typedef } )'
typedef ::= ( type-name = type-description )
type-name ::= typeconstr-name | ( typeconstr-name { <ident> } )
type-description ::= ( data   { typexpr } )
                   | ( struct { ( field : typexpr ) } )


***TODO*** Specify the special "layout" features here.

8.1 Exceptions
--------------

Exceptions are always of type exn. ***TODO***



10. Modules
-----------

Modules in Theme are like OCaml or SML modules, and unlike Modula or
Caml Light modules, in that they are first class objects with
types. 

***TODO*** MODULES HAVE YET TO BE WORKED OUT. IT REALLY IS DESIRABLE
TO INTRODUCE ML MODULES IN FULL GORY GLORY, BUT THIS WILL TAKE SOME
FIGURING OUT ****


11. Prelude
-----------

THIS SECTION WILL AT SOME POINT CONTAIN THE SPECIFICATION FOR A BASIC
LIBRARY CALLED THE "PRELUDE". FOR NOW IT'S JUST SOME SAMPLE CODE

(type
  (either <a> <b>  = (left <a>) (right <b>))
  ((<a> * <b>)     = (<a> , <b>))
  (list <a>        = nil (<a> :: list <a>))
  (maybe <a>       = none (just <a>))
  (bool            = true false)
  (unit            = unit)
  )

(let
  (map _ nil           = nil)
  (map f (x :: xs)     = (f x :: map f xs))
  ((nil ++ ys)         = ys)
  (((x :: xs) ++ ys)   = (x :: (xs ++ ys)))
  (filter _ nil        = nil)
  (filter p (x :: xs)  = (match (p x) 
                          (true => (x :: filter p xs))
                          (false => (filter p xs))))
  (concat xss          = foldr (++) nil xss)
  (concatMap f         = (concat . map f))
  (head (x :: _)       = x)
  (head nil            = error "empty list") 
  (tail (_ :: xs)      = xs)
  (tail nil            = error "empty list") 
  (last (x :: nil)     = x)
  (last (_ :: xs)      = last xs)
  (last nil            = error "empty list")
  (init (x :: nil)     = x)
  (init (x :: xs)      = (x :: init xs))
  (init nil            = error "empty list")
  (null nil            = true)
  (null (_ :: _)       = false)
  (length nil          = 0)
  (length (_ :: l)     = (1 + length l))
  ((nil !! _)          = error "out of bound")
  ((x :: _) !! 0       = x)
  ((_ :: xs) !! n      = (xs !! (n - 1)))
  (foldl f z nil       = nil)
  (foldl f z (x :: xs) = foldl f (f z x) xs)
  (foldl1 f (x :: xs)  = foldl f x xs)
  (foldl1 f nil        = error "empty list")
  ;; And many more...
  )  


 
12. Macros
----------

A hygienic rewrite based macro system similar to the one in Scheme
ought to be added.