
How to Build a Virtual Machine

Terence Parr
University of San Francisco
May 13, 2014

Preliminaries
What is a VM?
A simulated computer that runs a simple instruction
set (bytecodes).

And why do we want one?


We can't execute high-level languages like Java
directly on the computer
Rather than compile to machine code, we generate
bytecodes for a VM; much easier
We get code portability to anywhere with the same VM
But bytecodes are slower than raw machine code
Ultimately, VMs today compile bytecodes to machine
code on the fly

Goal: Simulate a simple computer


Here is a real CPU block diagram and code
Address  Machine code  Assembly code
0600     A4 44         MEMCPY  LDY  CNT+0      ;Set Y = CNT.L
0602     D0 05                 BNE  LOOP       ;If CNT.L > 0, then loop
0604     A5 45                 LDA  CNT+1      ;If CNT.H > 0,
0606     D0 01                 BNE  LOOP       ; then loop
0608     60                    RTS             ;Return
0609     B1 40         LOOP    LDA  (SRC),Y    ;Load A from ((SRC)+Y)
060B     91 42                 STA  (DST),Y    ;Store A to ((DST)+Y)
060D     88                    DEY             ;Decr CNT.L
060E     D0 F9                 BNE  LOOP       ;If CNT.L > 0, then loop
0610     E6 41                 INC  SRC+1      ;Incr SRC += $0100
0612     E6 43                 INC  DST+1      ;Incr DST += $0100
0614     88                    DEY             ;Decr CNT.L
0615     C6 45                 DEC  CNT+1      ;Decr CNT.H
0617     D0 F0                 BNE  LOOP       ;If CNT.H > 0, then loop
0619     60                    RTS             ;Return
061A                           END

Programming our VM
Our bytecodes will be very regular
and higher level than machine
instructions
Each bytecode does a tiny bit of work
Print 1+2:

Stack code
    ICONST 1
    ICONST 2
    IADD
    PRINT

Execution trace
    0000:  iconst 1    stack=[ 1 ]
    0002:  iconst 2    stack=[ 1 2 ]
    0004:  iadd        stack=[ 3 ]
    0005:  print       stack=[ ]
    0006:  halt        stack=[ ]

Our instruction set


iadd                 integer add (pop 2 operands, add, push result)
isub
imul
ilt                  integer less than
ieq
br addr              branch to address
brt addr             branch if true
brf addr
iconst value         push integer constant
load addr            load local
gload addr           load global variable
store addr
gstore addr
print
pop                  toss out the top of stack
call addr, numArgs   call addr, expect numArgs
ret                  return from function, expected result on top of stack
halt
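
The instruction list above could be captured in Java as integer constants plus a table of operand counts. This is only a sketch: the numbering is assumed to follow the order of the list starting at 1, which agrees with the sample encodings in these slides (IADD = 1, ICONST = 9, PRINT = 14) but is not stated for the other opcodes. The class name `Bytecode` and the `NAMES` and `OPERANDS` tables are hypothetical.

```java
// Sketch only: opcode values are ASSUMED to follow the list order, starting at 1.
class Bytecode {
    static final int IADD = 1, ISUB = 2, IMUL = 3, ILT = 4, IEQ = 5;
    static final int BR = 6, BRT = 7, BRF = 8;
    static final int ICONST = 9;
    static final int LOAD = 10, GLOAD = 11, STORE = 12, GSTORE = 13;
    static final int PRINT = 14, POP = 15, CALL = 16, RET = 17, HALT = 18;

    // Mnemonics indexed by opcode (index 0 is unused).
    static final String[] NAMES = {
        null, "iadd", "isub", "imul", "ilt", "ieq", "br", "brt", "brf",
        "iconst", "load", "gload", "store", "gstore",
        "print", "pop", "call", "ret", "halt"
    };

    // Number of operand words that follow each opcode in code memory.
    static final int[] OPERANDS = {
        0,              // unused slot
        0, 0, 0, 0, 0,  // iadd isub imul ilt ieq
        1, 1, 1,        // br brt brf: one address operand
        1,              // iconst: one value operand
        1, 1, 1, 1,     // load gload store gstore: one address operand
        0, 0,           // print pop
        2,              // call: address and numArgs
        0, 0            // ret halt
    };

    static int operandCount(int opcode) { return OPERANDS[opcode]; }

    static String name(int opcode) { return NAMES[opcode]; }
}
```

Keeping the operand counts in one table lets a disassembler or tracer step over any instruction without a case per opcode.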

Instruction format
Code memory is 32-bit word addressable
Bytecodes stored as ints but they are bytes
Data memory is 32-bit word addressable
Addresses are just integer numbers
Operands are 32-bit integers

Instruction layouts (word offsets):
    no operands:    +0 bytecode
    one operand:    +0 bytecode   +1 operand
    two operands:   +0 bytecode   +1 operand1   +2 operand2

Sample bytecodes
Address   Bytecodes   Assembly bytecode
0000      9 1         ICONST 1
0002      9 2         ICONST 2
0004      1           IADD
0005      14          PRINT
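
To see how the word-addressed format plays out, the sample program can be stored in an `int[]` and walked by a small disassembler. This is a sketch under assumptions: the class `Disassembler` and its methods are hypothetical, only the four opcodes needed here are handled, and HALT = 18 is an assumed value (the slides give only IADD = 1, ICONST = 9 and PRINT = 14).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class Disassembler {
    // IADD, ICONST, PRINT values come from the sample table; HALT = 18 is assumed.
    static final int IADD = 1, ICONST = 9, PRINT = 14, HALT = 18;

    static String name(int opcode) {
        switch (opcode) {
            case IADD:   return "iadd";
            case ICONST: return "iconst";
            case PRINT:  return "print";
            case HALT:   return "halt";
            default: throw new IllegalArgumentException("unknown opcode " + opcode);
        }
    }

    // Of the opcodes handled here, only iconst is followed by an operand word.
    static int operandCount(int opcode) { return opcode == ICONST ? 1 : 0; }

    static List<String> disassemble(int[] code) {
        List<String> lines = new ArrayList<>();
        int ip = 0;
        while (ip < code.length) {
            int addr = ip;               // address of the opcode word
            int opcode = code[ip++];
            StringBuilder sb = new StringBuilder(
                String.format(Locale.ROOT, "%04d: %s", addr, name(opcode)));
            for (int i = 0; i < operandCount(opcode); i++) {
                sb.append(' ').append(code[ip++]);   // operands are plain ints
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        int[] code = { ICONST, 1, ICONST, 2, IADD, PRINT, HALT };
        disassemble(code).forEach(System.out::println);
    }
}
```

Because each word is either an opcode or an operand, addresses advance by 2 after `iconst` and by 1 after `iadd`, which is why the listing jumps from 0000 to 0002 to 0004 to 0005.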

Our little stack machine


[Figure: block diagram of the stack machine: code memory, data memory, stack, registers (sp, fp, ip, ...), and a fetch/decode/execute unit]

Fetch: opcode = code[ip]


Decode: switch (opcode) {}
Execute: b = stack[sp--]; a = stack[sp--]; stack[++sp] = a + b (iadd instruction)

CPU: Fetch, decode, execute cycle
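
The fetch, decode, execute cycle can be sketched as a loop around a switch. The following is a minimal illustration, not the deck's actual implementation: the class `TinyVM`, the stack size of 100, and HALT = 18 are assumptions, and PRINT appends to a list rather than writing to the console so the result is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

class TinyVM {
    // IADD, ICONST, PRINT values come from the sample table; HALT = 18 is assumed.
    static final int IADD = 1, ICONST = 9, PRINT = 14, HALT = 18;

    final int[] code;
    final int[] stack = new int[100];   // assumed fixed-size operand stack
    int ip = 0;                         // instruction pointer
    int sp = -1;                        // stack pointer; -1 means empty
    final List<Integer> output = new ArrayList<>();

    TinyVM(int[] code) { this.code = code; }

    void run() {
        while (ip < code.length) {
            int opcode = code[ip++];            // fetch
            switch (opcode) {                   // decode
                case ICONST:                    // execute: push the operand word
                    stack[++sp] = code[ip++];
                    break;
                case IADD: {
                    int b = stack[sp--];        // 2nd operand is on top
                    int a = stack[sp--];        // 1st operand is one below
                    stack[++sp] = a + b;
                    break;
                }
                case PRINT:
                    output.add(stack[sp--]);    // pop and record the value
                    break;
                case HALT:
                    return;
                default:
                    throw new IllegalStateException(
                        "bad opcode " + opcode + " at " + (ip - 1));
            }
        }
    }

    public static void main(String[] args) {
        TinyVM vm = new TinyVM(new int[] { ICONST, 1, ICONST, 2, IADD, PRINT, HALT });
        vm.run();
        System.out.println(vm.output);   // prints [3]
    }
}
```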

Sample implementations
case BR:
    ip = code[ip++];
    break;
case IADD:
    b = stack[sp--];       // 2nd opnd at top of stack
    a = stack[sp--];       // 1st opnd 1 below top
    stack[++sp] = a + b;   // push result
    break;
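
The conditional branches and the comparison can be sketched in the same style. Several things here are assumptions rather than facts from the slides: true and false are represented as the integers 1 and 0, `brf` is taken to mean "branch if false", opcode values other than IADD, ICONST and PRINT follow the list order, and the class `BranchVM` with its `run` method is hypothetical. The test program counts from 0 up to 2 using one global variable.

```java
import java.util.ArrayList;
import java.util.List;

class BranchVM {
    // Values other than IADD = 1, ICONST = 9, PRINT = 14 are assumed.
    static final int IADD = 1, ILT = 4, BR = 6, BRT = 7, BRF = 8, ICONST = 9,
                     GLOAD = 11, GSTORE = 13, PRINT = 14, HALT = 18;

    static List<Integer> run(int[] code, int nglobals) {
        int[] stack = new int[100];
        int[] globals = new int[nglobals];
        List<Integer> out = new ArrayList<>();
        int ip = 0, sp = -1;
        while (true) {
            int opcode = code[ip++];
            switch (opcode) {
                case ICONST: stack[++sp] = code[ip++]; break;
                case IADD: { int b = stack[sp--]; int a = stack[sp--]; stack[++sp] = a + b; break; }
                case ILT:  { int b = stack[sp--]; int a = stack[sp--]; stack[++sp] = a < b ? 1 : 0; break; }
                case BR:   ip = code[ip++]; break;                  // unconditional jump
                case BRT:  { int addr = code[ip++]; if (stack[sp--] != 0) ip = addr; break; }
                case BRF:  { int addr = code[ip++]; if (stack[sp--] == 0) ip = addr; break; }
                case GLOAD:  stack[++sp] = globals[code[ip++]]; break;
                case GSTORE: globals[code[ip++]] = stack[sp--]; break;
                case PRINT:  out.add(stack[sp--]); break;
                case HALT:   return out;
                default: throw new IllegalStateException("bad opcode " + opcode);
            }
        }
    }

    // i = 0; while (i < 3) { print i; i = i + 1; }
    static final int[] COUNT_TO_3 = {
        ICONST, 0,      //  0
        GSTORE, 0,      //  2: i = 0
        GLOAD, 0,       //  4: loop start
        ICONST, 3,      //  6
        ILT,            //  8: i < 3 ?
        BRF, 23,        //  9: exit loop when false
        GLOAD, 0,       // 11
        PRINT,          // 13: print i
        GLOAD, 0,       // 14
        ICONST, 1,      // 16
        IADD,           // 18
        GSTORE, 0,      // 19: i = i + 1
        BR, 4,          // 21: back to loop start
        HALT            // 23
    };
}
```

Note that a conditional branch always consumes its address operand and pops the condition, whether or not the jump is taken; otherwise `ip` and `sp` would drift out of step with the code.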

Implementing functions
Source code          Bytecode

f();                 call f, 0
                     pop

g(int x,y)           load x (fp-4)
{                    load y (fp-3)
  int z;             call f, 2
  z = f(x,y);        store z (fp+1)
  ...                ...

return 9;            iconst 9
                     ret
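
The slides do not show how `call` and `ret` manage the stack, so the following is one possible sketch, not the deck's implementation. It assumes that `call` pushes numArgs, the caller's fp, and the return address, then points fp at the return address; with two arguments that puts them at fp-4 and fp-3, matching the offsets above. It also assumes `load` takes an fp-relative offset, that LOAD = 10, CALL = 16, RET = 17 and HALT = 18, and the class `CallVM` and its `run` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CallVM {
    // Values other than IADD = 1, ICONST = 9, PRINT = 14 are assumed.
    static final int IADD = 1, ICONST = 9, LOAD = 10, PRINT = 14,
                     CALL = 16, RET = 17, HALT = 18;

    static List<Integer> run(int[] code, int startIp) {
        int[] stack = new int[100];
        List<Integer> out = new ArrayList<>();
        int ip = startIp, sp = -1, fp = -1;
        while (true) {
            int opcode = code[ip++];
            switch (opcode) {
                case ICONST: stack[++sp] = code[ip++]; break;
                case IADD: { int b = stack[sp--]; int a = stack[sp--]; stack[++sp] = a + b; break; }
                case LOAD: stack[++sp] = stack[fp + code[ip++]]; break;  // fp-relative offset
                case PRINT: out.add(stack[sp--]); break;
                case CALL: {
                    int addr = code[ip++];
                    int nargs = code[ip++];
                    stack[++sp] = nargs;   // saved so ret can discard the arguments
                    stack[++sp] = fp;      // caller's frame pointer
                    stack[++sp] = ip;      // return address (instruction after call)
                    fp = sp;               // fp now points at the return address
                    ip = addr;
                    break;
                }
                case RET: {
                    int result = stack[sp--];   // result is expected on top of stack
                    sp = fp;                    // drop anything the callee left behind
                    ip = stack[sp--];           // restore return address
                    fp = stack[sp--];           // restore caller's frame pointer
                    int nargs = stack[sp--];
                    sp -= nargs;                // pop the arguments
                    stack[++sp] = result;       // leave the result for the caller
                    break;
                }
                case HALT: return out;
                default: throw new IllegalStateException("bad opcode " + opcode);
            }
        }
    }

    // f(x, y) { return x + y; }  then  print f(3, 4)
    static final int[] PROGRAM = {
        LOAD, -4,       //  0: f: load x (fp-4)
        LOAD, -3,       //  2:    load y (fp-3)
        IADD,           //  4
        RET,            //  5
        ICONST, 3,      //  6: main starts here
        ICONST, 4,      //  8
        CALL, 0, 2,     // 10: call f, 2
        PRINT,          // 13
        HALT            // 14
    };
}
```

Running `CallVM.run(CallVM.PROGRAM, 6)` starts at the main code and yields the single value 7. A caller that ignores the result, as in the `f();` example, would follow the call with `pop`.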
