Write simple scheme interpreter on ruby

TL;DR: github repo

Every developer has a moment in his life where he wants to write his own programming language. In this article, I want to show you how to do this for a simple lisp compiler.

Why scheme and lisp?

Firstly, lisp is very simple for create and for understanding. Lisp (LISt Processor) is a family of languages which is based on the idea of S-expressions. S-Expression are needed for data representation and may consist of atoms (numbers, symbols, boolean expressions) or are the expression of the form (x . y) where x and y are s-expressions. This expression may be formed as lists ((1 . ( 2 . 3)) this equals (1 2 3)) and trees (((1 . (2 . 3)) . (4 . 5))).

Secondly, after creating interpreter you can better understand the language (the author fully understood the environment idea). Also you can understand the main idea of compilers and interpreters.

Now let’s begin our journey into the world of compilers and interpreters so that we can write a simple scheme interpreter.

Main idea

Our language will contain two parts: a parser which translates the string to AST and eval function. This function will take the AST with envariement value and will return the result of the code.

Schematically, it looks like this:

code(string) => parse function => AST => eval function => result

First step. Parser.

To begin, let’s define what we want to get. For example, we have a string '(+ 1 1 1)'. What should our parser return? What kind of data structure should we receive? I think, that an array would be correct.

Let write simple test code:

program = '(+ 1 1 1)'
lisp = Lisp.new
lisp.parse(program) == [:+, 1, 1, 1]

As you can see, this is a simple code, so I just displayed parse method code:

class Lisp
  def parse(program)
    tokenize(program)
  end

  def tokenize(chars)
    chars
      .gsub(/\s\s+/, ' ')
      .gsub('(', ' ( ')
      .gsub(')', ' ) ')
      .split(' ')
      .map{ |token| atom(token) }
      .compact
  end

  def atom(token)
    if token[/\.\d+/]
      token.to_f
    elsif token == '(' || token == ')'
      nil
    elsif token[/\d+/]
      token.to_i
    else
      token.to_sym
    end
  end
end

As you know, in lisp you can write your code with nested operators, for example - (+ (* 2 2) (- 5 3)). And this code will return 6.

When we use our parser for this code, the result is not quite what we need, so let’s update our test code:

program = '(+ (* (1 2) 3) 4)'
lisp = Lisp.new
lisp.parse(program) == [:+, [:*, [1, 2], 3], 4]

As you might guess, the most obvious way to fix our code is to call parse method in recursion and all array elements from '(' to ')'. We will move to a nested array. The code will be look loke this:

class Lisp
  def parse(program)
    read_from_tokens(tokenize(program))
  end

  def tokenize(chars)
    chars
      .gsub(/\s\s+/, ' ')
      .gsub('(', ' ( ')
      .gsub(')', ' ) ')
      .split(' ')
  end

  def read_from_tokens(tokens)
    return if tokens.empty?

    token = tokens.shift

    if '(' == token
      list = []

      while tokens.first != ')'
        list << read_from_tokens(tokens)
      end
      tokens.shift

      list
    elsif ')' == token
      raise 'unexpected )'
    else
      atom(token)
    end
  end

  def atom(token)
    if token[/\.\d+/]
      token.to_f
    elsif token[/\d+/]
      token.to_i
    else
      token.to_sym
    end
  end
end

We did it! Let’s start to make eval method.

Eval method

As I said earlier, our interpreter consists of two parts: a parser and eval function.

The eval function will take two arguments: an expression, exp, that we want to evaluate, and an environment, env, in which to evaluate it. An environment is a mapping from variable names to their values. By default, eval will use a instance value that includes the names for a bunch of standard things.

let’s implement @env variable with car, cdr and cons functions:

env = {
  :car  => lambda { |*list| list[0] },
  :cdr  => lambda { |*list| list.drop 1 },
  :cons => lambda { |(e, cell), _| [e] + cell },
}

The next step is to make eval function which will look for a match on the first element of the input array

class Lisp
  def initialize(ext = {})
    @env = {
      :car   => lambda { |*list| list[0] },
      :cdr   => lambda { |*list| list.drop 1 },
      :cons  => lambda { |(e, cell), _| [e] + cell },
    }.merge(ext)
  end

  def eval(exp, env = @env)
    env[exp.first].(exp[1..-1])
  end

  # ...
end

Now we have a problem: what will happen when the first element of the array will be not be a symbol (integer for example) and what will be happen when we have nested functions? I think we can add a check to the element type like this:

class Lisp
  def initialize(ext = {})
    @env = {
      :car   => lambda { |*list| list[0] },
      :cdr   => lambda { |*list| list.drop 1 },
      :cons  => lambda { |(e, cell), _| [e] + cell },
    }.merge(ext)
  end

  def eval(exp, env = @env)
    if exp.is_a? Numeric
      exp
    elsif exp.is_a? Symbol
      env[exp]
    else
      code = eval(exp[0], env)
      args = exp[1..-1].map{ |arg| eval(arg, env) }
      code.(*args)
    end
  end

  # ...
end

Some (eg arithmetic), we can easily add to env variable, and some do not. Therefore, we need to extend the checking in eval function. We will add a check for function name. For example, the code below will demonstrate quote and if functions:

def eval(exp, env)
  # ...
  elsif exp[0] == :quote
    exp[1..-1]
  elsif exp[0] == :if
    _, test, conseq, alt = exp
    exp = eval(test, env) ? conseq : alt
    eval(exp, env)
  # ...
end

The next step is to initialize define and lambda functions. In scheme define function syntax is as follows:

(define name
  (expression))

And our code must create a new key-value pair in the env hash:

def eval(exp, env)
  # ...
  elsif exp[0] == :define
    _, var, e = exp
    env[var] = eval(e, env)
  # ...
end

The last function, lambda in scheme will have this syntax:

(lambda (arg1, arg2, ...)
  (block of code))

The first thing that comes to mind is to return the new lambda object with a new value inside env that will serve our code:

def eval(exp, env)
  # ...
  elsif exp[0] == :lambda
    _, params, e = exp
    lambda { |*args| self.eval(e, env.merge(Hash[params.zip(args)])) }
  # ...
end

As you can see, we made basic functionality of the our interpreter. I will leave learning how to do the implement the arithmetic methods and other methods such as true, false, list, etc up to the reader.

REPL

In main REPL have really simple idea repl takes single user inputs, evaluates them, and returns the result to the user. And all this is happens in an infinite loop:

def repl(prompt = 'lisp >> ')
  while true
    print prompt
    program = gets

    p eval(parse(program))
  end
end

As a result, you should get something like this:

lisp >> (define pi 3.14)
3.14
lisp >> (define circle-area (lambda (r) (* pi (* r r))))
#<Proc:0x007fa6140c6cc0@lib/rlisp.rb:86 (lambda)>
lisp >> (circle-area 11)
379.94

Conclusions

Right now, we have a simple scheme interpreter. It is easy to expand since we wrote a simple repl, and considered the basic idea of the interpreter. In this article, we did not consider such important concepts as macros, multithreading, code optimization, work with the system, and much more. These concepts will be discussed in future articles.

Further reading