Commit 774c544

Specify initial incomplete program schema
1 parent 3a8c080 commit 774c544

15 files changed (+589, -0 lines changed)
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+{
+  "label": "ethdebug/format/program",
+  "position": 5,
+  "link": null
+}
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+---
+sidebar_position: 2
+---
+
+# Key concepts
+
+## Programs are associated with a contract's compiled bytecode
+
+This bytecode might either be the call bytecode, executed when a contract
+account with this bytecode receives a message on-chain, or the create bytecode,
+executed as part of deploying the contract associated with the bytecode.
+
+Reflecting this relationship, **ethdebug/format/program** records contain
+a reference to the concrete contract (i.e., not an `abstract contract` or
+`interface`), the environment in which the bytecode will be executed (call or
+create), and the compilation that yielded the contract and bytecode.
+
+## Programs contain instruction listings for debuggers to reference
+
+Programs contain a list of **ethdebug/format/program/instruction** objects,
+where each instruction corresponds to one machine instruction in the
+associated bytecode.
+
+These instructions are ordered sequentially, matching the order of and
+corresponding one-to-one with the encoded binary machine instructions in
+the bytecode. Instructions specify the byte offset at which they appear in the
+bytecode; this offset is equivalent to the program counter on non-EOF EVMs.
+
+By indexing these instructions by their offset, **ethdebug/format**
+programs allow debuggers to look up high-level information at any point
+during machine execution.
+
+## Instructions describe high-level context details
+
+Each instruction object in a program contains crucial information about the
+high-level language state at that point in the bytecode execution.
+Instructions represent these details using the
+**ethdebug/format/program/context** schema, and these details may include:
+
+- Source code ranges associated with the instruction (i.e., "source mappings")
+- Variables known to be in scope following the instruction and where to
+  find those variables' values in the machine state
+- Control flow information such as an instruction being associated with the
+  process of calling from one function to another
+
+This information serves as a compile-time guarantee about the high-level
+state of the world that exists following each instruction.
+
+## Contexts inform high-level language semantics during machine tracing
+
+The context information provided for each instruction serves as a bridge
+between low-level EVM execution and high-level language constructs. Debuggers
+can use these strong compile-time guarantees to piece together a useful and
+consistent model of the high-level language code behind the running machine
+binary.
+
+By following the state of machine execution, a debugger can use context
+information to stay apprised of the changing compile-time facts over the
+course of the trace. Each successively encountered context serves as the
+source of an observed state transition in the debugger's high-level state
+model. This allows the debugger to maintain an ever-changing and coherent
+view of the high-level language runtime.
+
+In essence, the information provided by objects in this schema serves as a
+means of reducing over state transitions, yielding a dynamic and accurate
+representation of the program's high-level state. This enables debugging
+tools to:
+
+1. Map the current execution point back to the original source code
+2. Reconstruct the state of variables at any given point
+3. Provide meaningful stack traces that reference function names and source
+   locations
+4. Offer insights into control flow, such as entering or exiting functions,
+   or iterating through loops
+5. Present data structures (like arrays or mappings) in a way that reflects
+   their high-level representation, rather than their low-level storage
+
+By leveraging these contexts, debugging tools can offer a more intuitive and
+developer-friendly experience when working with EVM bytecode, effectively
+translating between the machine-level execution and the high-level code that
+developers write and understand. This continuous mapping between low-level
+execution and high-level semantics allows developers to debug their smart
+contracts more effectively, working with familiar concepts and structures
+even as they delve into the intricacies of EVM operation.
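The key-concepts text in the hunk above describes two steps: indexing instructions by byte offset, and reducing over the contexts met during a trace. The following is a minimal TypeScript sketch of both steps, not part of the commit. The type names (`ProgramVariable`, `ProgramContext`, `ProgramInstruction`, `HighLevelState`) and the helpers `indexByOffset` and `reduceTrace` are hypothetical and heavily simplified; the real schemas carry far more detail, and this sketch assumes each context simply replaces the list of in-scope variables.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface ProgramVariable {
  identifier: string;
}

interface ProgramContext {
  variables?: ProgramVariable[];
}

interface ProgramInstruction {
  offset: number;
  context?: ProgramContext;
}

// Hypothetical debugger-side model: only the names currently in scope.
interface HighLevelState {
  variablesInScope: string[];
}

// Index instructions by byte offset so one can be looked up per machine step.
function indexByOffset(
  instructions: ProgramInstruction[]
): Map<number, ProgramInstruction> {
  const index = new Map<number, ProgramInstruction>();
  for (const instruction of instructions) {
    index.set(instruction.offset, instruction);
  }
  return index;
}

// Reduce over a trace of program counters. Each context that lists variables
// replaces the model's view of what is in scope (an assumption of this sketch).
function reduceTrace(
  index: Map<number, ProgramInstruction>,
  programCounters: number[],
  initial: HighLevelState
): HighLevelState {
  return programCounters.reduce(
    (state: HighLevelState, pc: number): HighLevelState => {
      const instruction = index.get(pc);
      if (instruction === undefined || instruction.context === undefined) {
        return state; // no compile-time facts recorded for this step
      }
      const variables = instruction.context.variables;
      if (variables === undefined) {
        return state;
      }
      return { variablesInScope: variables.map((v) => v.identifier) };
    },
    initial
  );
}

// Offsets and identifiers mirror the Incrementer example in this commit.
const sampleInstructions: ProgramInstruction[] = [
  { offset: 0, context: { variables: [{ identifier: "storedValue" }] } },
  {
    offset: 1,
    context: {
      variables: [{ identifier: "storedValue" }, { identifier: "localValue" }],
    },
  },
  { offset: 6, context: { variables: [{ identifier: "storedValue" }] } },
];

const offsetIndex = indexByOffset(sampleInstructions);
const afterSload = reduceTrace(offsetIndex, [0, 1], { variablesInScope: [] });
console.log(afterSload.variablesInScope.join(", "));
```

Using a `Map` keyed by offset keeps each lookup constant-time, which matters because a debugger performs one lookup per executed instruction.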
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+{
+  "label": "Program contexts",
+  "position": 6,
+  "link": null
+}
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+---
+sidebar_position: 4
+---
+
+import SchemaViewer from "@site/src/components/SchemaViewer";
+
+# Code contexts
+
+<SchemaViewer
+  schema={{ id: "schema:ethdebug/format/program/context/code" }}
+/>
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+---
+sidebar_position: 3
+---
+
+import SchemaViewer from "@site/src/components/SchemaViewer";
+
+# Schema
+
+<SchemaViewer
+  schema={{ id: "schema:ethdebug/format/program/context" }}
+/>
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+---
+sidebar_position: 5
+---
+
+import SchemaViewer from "@site/src/components/SchemaViewer";
+
+# Variables contexts
+
+<SchemaViewer
+  schema={{ id: "schema:ethdebug/format/program/context/variables" }}
+/>
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+---
+sidebar_position: 5
+---
+
+import SchemaViewer from "@site/src/components/SchemaViewer";
+
+# Instruction schema
+
+<SchemaViewer
+  schema={{ id: "schema:ethdebug/format/program/instruction" }}
+/>
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+---
+sidebar_position: 1
+---
+
+# Overview
+
+:::tip[Summary]
+
+**ethdebug/format/program** is a JSON schema for describing compile-time
+information about EVM bytecode, organized from the perspective of individual
+machine instructions.
+
+In **ethdebug/format**, a program record (or "program") represents one block of
+executable EVM machine code that a compiler generated for a specific contract.
+This could be either the contract's runtime call bytecode or the bytecode
+to create the contract.
+
+A program is structured as a sequence of instruction records ("instructions"),
+where each corresponds to a single EVM instruction in the machine code. Each
+instruction contains information about the high-level language context at that
+point in the bytecode. This allows debuggers to map low-level machine state
+back to high-level language concepts at any point during execution.
+
+Key information that programs contain for a particular instruction might
+include:
+- the source range or source ranges that are "associated" with the
+  instruction
+- the collection of known high-level variables at that point in time,
+  including their types and where to find the bytes with those variables'
+  values
+- signals to indicate that the instruction is part of some control flow
+  operation, such as calling some function from another.
+
+These program records provide debuggers with a powerful reference resource
+to be consulted while observing a running EVM. At each step of EVM machine
+execution, debuggers can find the matching **ethdebug/format** program
+instruction and use its information to maintain a coherent model of the
+high-level world, step-by-step.
+
+:::
+
+This format defines the primary **ethdebug/format/program** schema as well as
+various sub-schemas in the ethdebug/format/program/* namespace.
+
+JSON values adhering to this schema contain comprehensive information about a
+particular EVM bytecode object. This includes contract metadata (e.g., a reference to the source range where the contract is defined) and, importantly, an
+ordered list of **ethdebug/format/program/instruction** objects.
+
+Each instruction object contains essential details for translating low-level
+machine state at the time of the instruction back into high-level language
+concepts. This allows debuggers to provide a meaningful representation of
+program state at any point during execution.
+
+## Reading this schema
+
+The **ethdebug/format/program** schema is a root schema that composes other
+related schemas in the ethdebug/format/program/* namespace.
+
+These schemas (like all schemas in this format) are specified as
+[JSON Schema](https://json-schema.org), draft 2020-12.
+
+Please refer to one or more of the following resources in this section, or
+see the navigation bar for complete contents:
+
+- [Key concepts](/spec/program/concepts)
+
+- [Schema](/spec/program) (**ethdebug/format/program** schema listing)
+
+- [Instruction schema](/spec/program/instruction)
+  (**ethdebug/format/program/instruction** schema listing)
+
+- [Context schema](/spec/program/context)
+  (**ethdebug/format/program/context** schema listing)
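To make the record shape described in the overview above more concrete, here is a rough TypeScript sketch of a program record's top-level fields, with a shallow runtime check. This is not part of the commit. The names `ProgramEnvironment`, `ProgramSketch`, and `isProgramSketch` are hypothetical, the nested values are left as `unknown`, and the check is no substitute for validating against the JSON Schema itself; it only mirrors the top-level `required` fields (`contract`, `environment`, `instructions`) and the `call`/`create` enumeration that appear in `schemas/program.schema.yaml` in this commit.

```typescript
// Hypothetical names; nested structures are deliberately left untyped.
type ProgramEnvironment = "call" | "create";

interface ProgramSketch {
  compilation?: unknown; // optional reference to the compilation
  contract: { name?: string; definition: unknown };
  environment: ProgramEnvironment;
  context?: unknown; // context known before the first instruction
  instructions: unknown[];
}

// Shallow structural check of the required top-level fields only.
function isProgramSketch(value: unknown): value is ProgramSketch {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  const contract = candidate["contract"];
  if (typeof contract !== "object" || contract === null) {
    return false;
  }
  if (!("definition" in contract)) {
    return false; // the schema requires `definition` on the contract
  }
  const environment = candidate["environment"];
  return (
    (environment === "call" || environment === "create") &&
    Array.isArray(candidate["instructions"])
  );
}

const incrementerRecord: unknown = {
  contract: { name: "Incrementer", definition: { source: { id: 0 } } },
  environment: "call",
  instructions: [],
};

console.log(isProgramSketch(incrementerRecord));
```

A type guard like this is only useful as a quick sanity check at a tool boundary; full validation belongs to a JSON Schema validator that resolves the referenced sub-schemas.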
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+---
+sidebar_position: 4
+---
+
+import SchemaViewer from "@site/src/components/SchemaViewer";
+
+# Schema
+
+<SchemaViewer
+  schema={{ id: "schema:ethdebug/format/program" }}
+/>

schemas/program.schema.yaml

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
+$schema: "https://json-schema.org/draft/2020-12/schema"
+$id: "schema:ethdebug/format/program"
+
+title: ethdebug/format/program
+description: |
+  Debugging information about a particular bytecode in a compilation.
+
+type: object
+
+properties:
+  compilation:
+    title: Compilation reference by ID
+    description: |
+      A reference to the compilation as an `{ "id": ... }` object.
+    $ref: "schema:ethdebug/format/materials/reference"
+
+  contract:
+    type: object
+    properties:
+      name:
+        type: string
+
+      definition:
+        $ref: "schema:ethdebug/format/materials/source-range"
+    required:
+      - definition
+
+  environment:
+    title: Bytecode execution environment
+    description: |
+      Whether this bytecode is for contract creation or runtime calls.
+    type: string
+    enum:
+      - call
+      - create
+
+  context:
+    description: |
+      The context known to exist prior to the execution of the first
+      instruction in the bytecode.
+    $ref: "schema:ethdebug/format/program/context"
+
+  instructions:
+    type: array
+    description: |
+      The full array of instructions for the bytecode.
+    items:
+      $ref: "schema:ethdebug/format/program/instruction"
+    additionalItems: false
+
+required:
+  - contract
+  - environment
+  - instructions
+
+examples:
+  - # Incrementing a storage counter
+    #
+    # This example represents the call bytecode for the following pseudo-code:
+    # ```
+    # contract Incrementer;
+    #
+    # storage {
+    #   [0] storedValue: uint256;
+    # };
+    #
+    # code {
+    #   let localValue = storedValue;
+    #   localValue += 1;
+    #   storedValue = localValue;
+    # };
+    # ```
+    contract:
+      name: "Incrementer"
+      definition:
+        source:
+          id: 0
+    environment: call
+    context:
+      variables:
+        - &stored-value
+          identifier: storedValue
+          type:
+            kind: uint
+            bits: 256
+          pointer:
+            location: storage
+            slot: 0
+    instructions:
+      - offset: 0
+        operation:
+          mnemonic: PUSH0
+        context:
+          variables:
+            - *stored-value
+      - offset: 1
+        operation:
+          mnemonic: SLOAD
+        context:
+          variables:
+            - *stored-value
+            - &local-value
+              identifier: localValue
+              type:
+                kind: uint
+                bits: 256
+              pointer:
+                location: stack
+                slot: 0
+      - offset: 2
+        operation:
+          mnemonic: PUSH1
+          arguments: ["0x01"]
+        context:
+          variables:
+            - *stored-value
+            - <<: *local-value
+              pointer:
+                location: stack
+                slot: 1
+
+      - offset: 4
+        operation:
+          mnemonic: ADD
+        context:
+          variables:
+            - *stored-value
+            - *local-value
+      - offset: 5
+        operation:
+          mnemonic: PUSH0
+        context:
+          variables:
+            - *stored-value
+            - <<: *local-value
+              pointer:
+                location: stack
+                slot: 1
+
+      - offset: 6
+        operation:
+          mnemonic: SSTORE
+        context:
+          variables:
+            - *stored-value
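The example above jumps from offset 2 to offset 4 because `PUSH1` occupies two bytes: the opcode plus one immediate byte. The following TypeScript sketch, which is not part of the commit, recomputes the expected offsets from the mnemonic sequence. The helper names `immediateBytes` and `expectedOffsets` are hypothetical, and the sketch assumes plain non-EOF bytecode in which only `PUSH1` through `PUSH32` carry immediate data.

```typescript
// Number of immediate bytes that follow an opcode. PUSHn carries n bytes;
// PUSH0 and every other opcode carry none (assumption: non-EOF bytecode).
function immediateBytes(mnemonic: string): number {
  const match = /^PUSH(\d+)$/.exec(mnemonic);
  return match === null ? 0 : Number(match[1]);
}

// Compute the byte offset of each instruction from the mnemonic sequence.
function expectedOffsets(mnemonics: string[]): number[] {
  const offsets: number[] = [];
  let offset = 0;
  for (const mnemonic of mnemonics) {
    offsets.push(offset);
    offset += 1 + immediateBytes(mnemonic); // one opcode byte plus immediates
  }
  return offsets;
}

// Mnemonics taken from the Incrementer example in the schema file.
const incrementerMnemonics = [
  "PUSH0",
  "SLOAD",
  "PUSH1",
  "ADD",
  "PUSH0",
  "SSTORE",
];

console.log(expectedOffsets(incrementerMnemonics)); // [ 0, 1, 2, 4, 5, 6 ]
```

A check along these lines could help a compiler author confirm that the `offset` values in an emitted program line up with the actual bytecode layout.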
