Making a Programming Language
A step-by-step guide to building your own programming language in TypeScript — covering lexing, parsing, evaluation, variables, math, arrays, and more.
Have you ever looked at a piece of code and wondered: how does the computer actually know what this means? I don't mean how it compiles or optimizes. I mean the very first step. How does text become instructions?
The core ideas are surprisingly simple. There are only a few pieces: a lexer that turns text into tokens, a parser that builds structure out of those tokens, and an evaluator that runs that structure. Everything else (variables, functions, conditionals, even types) is just patterns built on top.
In this series, we'll build a working programming language called Spark using TypeScript. By the end, you'll have an actual interpreter you can run code in, right here in your browser.
Choosing a Syntax
Before writing any code, we need to decide what our language looks like. I wanted something that feels familiar but has its own character, not just another JavaScript clone.
Here's what Spark looks like:
1val name = "Spark"
2val version = 1
3val features = ["lexer", "parser", "eval"]
4
5when (version > 0) {
6 say("ready to go")
7}
8
9func add(a, b) {
10 return a + b
11}
12
13say(add(3, 4))The design choices:
valinstead ofletorconst(borrowed from Kotlin and Swift)funcinstead offnorfunction(explicit and readable)sayinstead ofprint(Spark speaks to you)wheninstead ofif(reads more like natural language)yesandnoinstead oftrueandfalse(plain English)- Curly braces for blocks, no semicolons, newlines separate statements
The Pipeline
Every language interpreter follows the same pipeline:
1source code → lexer → tokens → parser → AST → evaluator → outputAn analogy: imagine explaining a recipe to someone. The lexer is like recognizing individual words. The parser is understanding the grammar ("chop the onions" is a verb+noun pair). The evaluator is actually performing the actions.
Token Types
The lexer and parser communicate through tokens. Each token has a type, a string value, and a source position for error reporting:
1export enum TokenType {
2 Number, String, Identifier,
3 Let, If, Else, Fn, Return, Print,
4 True, False,
5 Plus, Minus, Star, Slash,
6 Eq, EqEq, Bang, BangEq,
7 Lt, Gt, LtEq, GtEq,
8 LParen, RParen, LBrace, RBrace,
9 LBracket, RBracket,
10 Comma, Arrow, DotDot,
11 EOF,
12}
13
14export interface Token {
15 type: TokenType;
16 value: string;
17 line: number;
18 col: number;
19}Every language construct becomes a TokenType enum member. The line and col fields let us report exactly where errors occur.
Error Classes
When something goes wrong, we need to point the programmer to the exact location:
1export class SparkError extends Error {
2 line: number;
3 col: number;
4
5 constructor(message: string, line: number, col: number) {
6 super(message);
7 this.name = "SparkError";
8 this.line = line;
9 this.col = col;
10 }
11
12 format(source: string): string {
13 const lines = source.split("\n");
14 const lineStr = lines[this.line - 1] ?? "";
15 const pointer = " ".repeat(Math.max(0, this.col - 1)) + "^";
16 return `${this.message}\n at ${this.line}:${this.col}\n ${lineStr}\n ${pointer}`;
17 }
18}
19
20export class ParseError extends SparkError {}
21export class RuntimeError extends SparkError {}SparkError stores the position and can render a formatted message with a caret pointing at the problem. ParseError and RuntimeError extend it with no extra logic. They just carry different names so callers can distinguish syntax errors from runtime errors.
Statement Nodes
A Spark program is a list of statements. Each statement node is a discriminated union with a kind field:
1export type Statement =
2 | Program | LetStatement | IfStatement
3 | FunctionDeclaration | ReturnStatement
4 | ExpressionStatement;
5
6export interface Program {
7 kind: "Program";
8 body: Statement[];
9}
10
11export interface LetStatement {
12 kind: "LetStatement";
13 name: string;
14 value: Expression;
15}
16
17export interface IfStatement {
18 kind: "IfStatement";
19 condition: Expression;
20 consequent: Statement[];
21 alternate: Statement[] | null;
22}
23
24export interface FunctionDeclaration {
25 kind: "FunctionDeclaration";
26 name: string;
27 params: string[];
28 body: Statement[];
29}
30
31export interface ReturnStatement {
32 kind: "ReturnStatement";
33 value: Expression | null;
34}
35
36export interface ExpressionStatement {
37 kind: "ExpressionStatement";
38 expression: Expression;
39}Program is the root node containing all top-level statements. IfStatement carries both branches, and FunctionDeclaration stores its parameters and body separately.
Expression Nodes
Expressions produce values. They follow the same discriminated union pattern:
1export type Expression =
2 | BinaryExpression | Identifier
3 | NumberLiteral | StringLiteral | BoolLiteral
4 | ArrayLiteral | IndexExpression
5 | CallExpression | Assignment;
6
7export interface BinaryExpression {
8 kind: "BinaryExpression";
9 left: Expression;
10 operator: string;
11 right: Expression;
12}
13
14export interface Identifier {
15 kind: "Identifier";
16 name: string;
17}
18
19export interface NumberLiteral {
20 kind: "NumberLiteral";
21 value: number;
22}
23
24export interface StringLiteral {
25 kind: "StringLiteral";
26 value: string;
27}
28
29export interface BoolLiteral {
30 kind: "BoolLiteral";
31 value: boolean;
32}
33
34export interface ArrayLiteral {
35 kind: "ArrayLiteral";
36 elements: Expression[];
37}The operator field on BinaryExpression is a string like "+", "-", "==", or ".." for ranges. This keeps the parser simple: it just records the operator text and lets the evaluator decide what each operator means.
Indexing, Calls, and Assignment
After an expression, you can index into it, call it, or assign to it:
1export interface IndexExpression {
2 kind: "IndexExpression";
3 array: Expression;
4 index: Expression;
5}
6
7export interface CallExpression {
8 kind: "CallExpression";
9 callee: Expression;
10 args: Expression[];
11}
12
13export interface Assignment {
14 kind: "Assignment";
15 name: string;
16 value: Expression;
17}These three node types handle arr[0], foo(), and x = 5 respectively. They're parsed as suffixes to an existing expression, which naturally handles chaining like getArr()[0].
Runtime Values
The evaluator doesn't work with AST nodes. It works with runtime values. Every value in Spark is one of these:
1export type RuntimeValue =
2 | { type: "number"; value: number }
3 | { type: "string"; value: string }
4 | { type: "boolean"; value: boolean }
5 | { type: "array"; value: RuntimeValue[] }
6 | { type: "function"; name: string; params: string[];
7 body: Statement[]; closure: Environment }
8 | { type: "native"; name: string;
9 fn: (args: RuntimeValue[]) => RuntimeValue }
10 | { type: "null" };Functions are first-class values. A user-defined function stores its AST (body), parameters (params), and the scope where it was defined (closure). Native functions are TypeScript callbacks. say is implemented this way.
Environments
Variables live in environments, which form a linked list for lexical scoping:
1export interface Environment {
2 variables: Map<string, RuntimeValue>;
3 parent: Environment | null;
4}When you reference a variable, the evaluator checks the current scope, then walks up through parents. A function's closure is the environment at the point of definition. That is what makes closures work.
With the types in place, we're ready to build the first stage of the pipeline. In the next part, we'll write the lexer, the component that turns raw text into a stream of tokens.