
Create a SSV Checkpoint Sync #740

@diegomrsantos


1. Problem Statement

Today, SSV clients assume they can always rebuild the entire SSV state by replaying all SSVNetwork contract events from the deployment block to the current head.

For mainnet:

  • SSV contract deployment block: 17,507,487 (June 2023).

Typical flow (e.g. Anchor today):

  1. Connect to an Ethereum execution endpoint (local or remote).
  2. Query SSVNetwork contract events from the deployment block.
  3. Replay all historical events (operator registrations, validator additions/removals, cluster deposits/withdrawals, fee updates, etc.).
  4. Persist the derived state to a local database.
  5. Build in-memory state (operators, clusters, validators, shares, fees).

This implies:

  • Fetching events across millions of blocks.
  • Hundreds or thousands of RPC calls (depending on batching/pagination).
  • An assumption that the execution endpoint can still serve logs/receipts all the way back to mid-2023.

The Go-based go-ssv client has the same logical requirement: it must derive the identical SSV state, even if its storage engine and caching differ.

However, this “replay from 2023 via RPC” approach is becoming unsustainable given Ethereum’s evolution:

  • Ethereum’s weak subjectivity model: Checkpoint sync is now the normal way to sync Ethereum nodes, not an edge case. SSV’s current approach (implicitly trusting an arbitrary long history) doesn’t explicitly acknowledge the weak-subjectivity assumptions it already inherits.
  • Execution clients’ bounded history and pruning defaults: Most operators do not run archive nodes. Modern clients (Erigon, Reth, etc.) prune old receipts/logs by default, meaning a typical full node may not have SSV events from 2023 readily available.

In short, relying on every new SSV node to replay 1.5+ years of events via standard RPC is increasingly at odds with how Ethereum nodes operate and scale.

2. Ethereum Context: Weak Subjectivity & Bounded History

2.1 Weak Subjectivity and Checkpoint Sync

Post-Merge Ethereum introduced weak subjectivity checkpoints:

  • A fresh or long-offline consensus client cannot, by network messages alone, distinguish the canonical chain from a malicious long-range fork.
  • To counter this, consensus clients use a weak-subjectivity checkpoint (usually a recent finalized epoch) provided via config or defaults. The client will only sync to a chain that includes this checkpoint, effectively treating it as a new trusted “genesis” point.
  • The execution layer (EL) then syncs under the hood: it downloads state and blocks up to the head, verifying that execution state and receipts match the header roots provided by the consensus layer.

SSV does not run its own Ethereum consensus; it relies on an existing CL+EL. This means an SSV node inherently trusts whatever subjective checkpoint its Ethereum node uses. Replaying SSV events from 2023 does not remove that trust assumption—it simply obscures it behind a flood of RPC calls. In other words, SSV’s security is already bounded by the trust in the Ethereum node’s chain view (which itself might be anchored by a weak-subjectivity checkpoint).

2.2 Client-Side Pruning and EIP-4444

Execution clients increasingly prune historical data to improve performance:

  • Pruning today: By default, Erigon v3 does not store old receipts/logs (it replays transactions on demand for history). Reth and others are also oriented towards bounded history. A typical “full node” often cannot serve logs from 2023 without special configurations.
  • EIP-4444 (historical data expiry): This Ethereum improvement proposal formalizes that clients stop serving block bodies and receipts older than ~1 year over the p2p network, and may prune such data locally. In the long term, historical data will be fetched via dedicated archive services or distributed networks (e.g. The Graph, Portal Network), not from every peer by default.

Implication for SSV: A new SSV node cannot assume that any random Ethereum endpoint will have the entire SSV event history available. In many cases, replaying from June 2023 will either:

  • Fail outright (if the node’s RPC can’t provide old logs due to pruning), or
  • Impose a requirement to use specialized archive RPC services (which is a centralizing and potentially costly assumption).

Ethereum’s trend is clear: recent state sync with a checkpoint is the norm, and deep history becomes opt-in. SSV should adapt accordingly.

3. How SSV State Sync Works Today

At a high level, an SSV node currently does the following:

  1. Obtain a canonical Ethereum view: The operator provides an Ethereum endpoint. The consensus client (beacon node) likely syncs using a checkpoint, and the execution client (Geth/Erigon/Reth/etc.) syncs accordingly. Once sync is complete, the SSV node has access to the latest finalized and head blocks on Ethereum (via the EL RPC).

  2. Reconstruct SSV state from the SSVNetwork contract: The SSV node’s “event syncer” is programmed with knowledge of the SSV contract’s events. It iteratively:

    • Queries logs and storage from the Ethereum node for all relevant SSVNetwork events and state from the deployment block onward.
    • Processes every event (operator registrations/updates, validator adds/removes, cluster deposits, withdrawals, fee changes, etc.) in chronological order.
    • Builds up an off-chain database that represents the full SSV state (operators, clusters, validators, balances, fee index, status flags, etc.).

3.1 On-chain vs off-chain SSV state today

The SSV contracts expose some point-in-time views, but they do not provide a way to enumerate the full network state at a given block:

  • Operators:
    Operator records are stored on-chain and exposed via views like getOperatorById on SSVNetworkViews. You can query a specific operator ID, but there is no “list all operators” method, so discovering all operators still requires indexing events (e.g. OperatorAdded / OperatorRemoved).

  • Validators:
    Validator existence for a given (owner, pubkey) can be checked on-chain (via a boolean getValidator(owner, publicKey) style view). However, there is no on-chain API to list all validators; tools like the SSV Subgraph / Scanner build that view from the historical event stream.

  • Clusters:
    Cluster state (balance, validator count, fee index, active flag, etc.) is stored on-chain in a Cluster struct and is mirrored in events such as ValidatorAdded, ClusterDeposited, ClusterLiquidated, etc. But functions like registerValidator, deposit, withdraw, getBalance, reactivate, isLiquidatable, and liquidate all expect a cluster snapshot tuple as input, which the docs explicitly say is “obtained using the SSV Subgraph or SSV Scanner tools.”
    In other words, the contracts validate a provided cluster snapshot; they do not offer a view to fetch or iterate over all clusters.

Because of these limitations, the complete SSV state at any block effectively lives in the node’s off-chain database (or an external indexer). Ethereum provides:

  • the canonical event log (to rebuild state by replay), and
  • point lookups for specific operators / validators / clusters via SSVNetworkViews,

but every node still has to index the full event history to recover the global picture.

In the current “replay from deployment” model, each SSV node repeats that same indexing work from scratch.
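To make the cost concrete, here is a minimal sketch in Go (using go-ethereum's ethclient) of the batched replay loop described above. The endpoint URL, batch size, and contract address are illustrative assumptions, and event decoding / state application are elided; Anchor and go-ssv each have their own implementations of this loop.

```go
package main

import (
	"context"
	"log"
	"math/big"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

const deploymentBlock = 17_507_487 // SSVNetwork mainnet deployment block (June 2023)

func main() {
	ctx := context.Background()
	client, err := ethclient.Dial("http://localhost:8545") // operator-provided EL endpoint (illustrative)
	if err != nil {
		log.Fatal(err)
	}

	// SSVNetwork mainnet address (illustrative; confirm against the official SSV docs).
	ssvNetwork := common.HexToAddress("0xDD9BC35aE942eF0cFa76930954a156B3fF30a4E1")

	head, err := client.BlockNumber(ctx)
	if err != nil {
		log.Fatal(err)
	}

	const batch = 10_000 // illustrative batch size; tune to the RPC provider's limits
	for from := uint64(deploymentBlock); from <= head; from += batch {
		to := from + batch - 1
		if to > head {
			to = head
		}
		logs, err := client.FilterLogs(ctx, ethereum.FilterQuery{
			FromBlock: new(big.Int).SetUint64(from),
			ToBlock:   new(big.Int).SetUint64(to),
			Addresses: []common.Address{ssvNetwork},
		})
		if err != nil {
			// Pruned nodes typically fail here for old ranges.
			log.Fatalf("log range %d-%d: %v", from, to, err)
		}
		for _, l := range logs {
			_ = l // decode OperatorAdded, ValidatorAdded, ClusterDeposited, ... and apply to the local DB
		}
	}
}
```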

4. Trust model of SSV snapshots

Introducing SSV snapshots does not change the underlying security assumptions of the protocol, but it does add a small, explicit trust assumption compared to full replay.

When an operator imports a snapshot of the SSV state at block B, they are implicitly assuming that the snapshot provider:

  • Observed all relevant SSVNetwork data on the canonical chain up to B (no missing or fabricated events), and
  • Applied the SSV state transition rules correctly to derive the state at B.

We can keep this assumption as small as possible by making snapshots:

  • Auditable – It is always possible (for clients, infra providers, or anyone who cares) to replay all SSVNetwork events from deployment to B, recompute the SSV state, and check that it matches the published snapshot. Operators are not expected to do this routinely, but the mechanism is there.

  • Verifiable against on-chain state – At block B, we can query the SSVNetwork view methods and compare them with the snapshot (aggregates and/or sampled entries). If these checks fail, the snapshot is rejected. This gives a direct, on-chain sanity check that the snapshot is reasonably correct.

  • Cross-checkable across sources – Snapshot commitments (e.g. a hash of the SSV state at B) can be obtained from multiple independent providers and compared, in a similar spirit to how Ethereum encourages cross-checking weak-subjectivity checkpoints against several independent sources. The concrete cross-checking flow and verification rules are described in the following sections.

5. SSV Checkpoint Sync — Proposal for a Safe Snapshot Sync

5.1 Goal and Overview

Enable a new SSV node to bootstrap its state from a recent snapshot (taken at a block B) rather than replaying from June 2023, provided that:

  • B is a recent finalized block (to ensure the snapshot is on canonical chain and unlikely to be reorged).
  • B is within the Ethereum weak-subjectivity window for this node (meaning the node either was online up to B or has a trusted checkpoint that is at or after B). In practice, this usually means B is no more than a few weeks old at most.
  • The snapshot was generated from a fully-synced SSV node’s off-chain DB (one that has processed all events up to B).
  • The importing node can perform cheap verification to ensure the snapshot is consistent with on-chain state at B (using the Ethereum EL endpoint it’s connected to).
  • The snapshot can be cross-checked from multiple independent sources to mitigate trust in any single provider.

In essence, this is analogous to what Ethereum consensus clients do:

  • They don’t download 2 years of history; they start from a recent checkpoint.
  • They trust-but-verify: a checkpoint is provided, then the node verifies the chain from that point forward and checks the checkpoint is in the chain.
  • Users are encouraged to obtain checkpoint information from multiple sources for safety.

We want SSV nodes to do something similar:

  • Start from a recent known-good state (snapshot).
  • Verify that this state aligns with what the Ethereum contract says at that block.
  • Then proceed normally, processing new events from that block onward.

Crucially, the snapshot must include everything an SSV node needs at block B: all operators, all validators, all clusters and their balances/status, and network parameters. After loading it, the node should be in the same state as if it had replayed all events up to B.

We acknowledge that:

  • On-chain verifiability is partial: We can directly verify things like “operator 123’s fee at block B” or “total validator count at B” by calling the contract. We cannot directly query “give me cluster X composition” from the contract without already knowing it. So our verification will mix direct on-chain calls for what is available, and logical checks for consistency of the rest.
  • Trust is involved: The snapshot is computed off-chain, so the importing node is inherently trusting that computation unless it re-does it entirely. Our goal is to minimize this trust via verification steps and by encouraging multiple independent snapshots to be compared.

5.2 Snapshot Creation (Off-Chain) at Block B

(This portion is background — how the snapshot is produced — and would likely be handled by SSV maintainers or community indexers, not by the syncing node itself.)

To create a snapshot at block B, an existing SSV node (or indexer) that is fully synced through B will:

  • Stop processing at block B (ensuring no events beyond B are applied).
  • Dump its entire internal state to a structured format (e.g., a JSON or binary file with a well-defined schema). This includes:
    • Metadata Header: network identifier (e.g., Mainnet), the SSV contract address, the snapshot block number B, the block hash of B, and possibly the beacon chain finalized epoch corresponding to B.
    • Operators: every operator ID present at B, with fields { owner, fee, validatorCount, whitelistedContract, isPrivate, active } as stored in the contract. Essentially the output of getOperatorById(id) for each id.
    • Validators: a list of all (owner, validatorPublicKey) that are active at B (or have ever been added up to B). For completeness, we might include those that were removed by B as well, but the key part is knowing which validators currently exist in clusters.
    • Clusters: for each cluster (which can be identified by the owner address and the specific combination of operator IDs):
      • The list of operator IDs in the cluster,
      • The cluster’s snapshot tuple { validatorCount, networkFeeIndex, index, active, balance } at B (this is what you’d plug into getBalance or getBurnRate calls as the cluster parameter).
      • The list of validator public keys under that cluster (or a count, but the public keys tie into the Validators list above).
    • Global network state: values like networkValidatorsCount, networkFee, validatorsPerOperatorLimit, and any other global config parameters that the contract exposes via SSVNetworkViews at B.

Essentially, anyone with this snapshot should be able to instantiate an SSV node’s state exactly as it would be at the end of block B, without having to replay everything before B.
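As a sketch only, the snapshot could be described with Go types roughly like the following. All field names and types here are assumptions for discussion; the actual schema (and its serialization format) would be fixed in the SIP.

```go
package snapshot

// Illustrative schema for an SSV state snapshot at block B.
// Field names and types are assumptions, not a finalized format.

type Metadata struct {
	Network         string `json:"network"`         // e.g. "mainnet"
	ContractAddress string `json:"contractAddress"` // SSVNetwork address
	BlockNumber     uint64 `json:"blockNumber"`     // B
	BlockHash       string `json:"blockHash"`       // hash of block B
	FinalizedEpoch  uint64 `json:"finalizedEpoch,omitempty"`
}

type Operator struct {
	ID                  uint64 `json:"id"`
	Owner               string `json:"owner"`
	Fee                 string `json:"fee"` // stringified big integer
	ValidatorCount      uint32 `json:"validatorCount"`
	WhitelistedContract string `json:"whitelistedContract"`
	IsPrivate           bool   `json:"isPrivate"`
	Active              bool   `json:"active"`
}

type Validator struct {
	Owner  string `json:"owner"`
	PubKey string `json:"pubKey"`
	Active bool   `json:"active"`
}

type Cluster struct {
	Owner            string   `json:"owner"`
	OperatorIDs      []uint64 `json:"operatorIds"`
	ValidatorCount   uint32   `json:"validatorCount"`
	NetworkFeeIndex  uint64   `json:"networkFeeIndex"`
	Index            uint64   `json:"index"`
	Active           bool     `json:"active"`
	Balance          string   `json:"balance"`
	ValidatorPubKeys []string `json:"validatorPubKeys"`
}

type GlobalState struct {
	NetworkValidatorsCount     uint32 `json:"networkValidatorsCount"`
	NetworkFee                 string `json:"networkFee"`
	ValidatorsPerOperatorLimit uint32 `json:"validatorsPerOperatorLimit"`
}

type Snapshot struct {
	Metadata   Metadata    `json:"metadata"`
	Operators  []Operator  `json:"operators"`
	Validators []Validator `json:"validators"`
	Clusters   []Cluster   `json:"clusters"`
	Global     GlobalState `json:"global"`
}
```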

(Side note: Since the snapshot is produced off-chain, one could imagine malicious or buggy snapshots. That’s why verification is needed on the importing side, as described next.)

5.3 Verification on Import (Using Ethereum Data at B)

When an SSV node imports a snapshot at block B, it should use its connected execution client (and optionally consensus client for finality info) to verify the snapshot’s correctness before proceeding. The verification involves:

1. Verify the snapshot’s block: Ensure the snapshot’s stated blockNumber and blockHash match the actual Ethereum chain viewed by the node.

  • Use the EL RPC to fetch the block hash of B (or simply trust the CL’s view of finalized blocks).
  • Check that this matches the snapshot metadata. If it doesn’t, the snapshot might be from a different fork or network—abort if so.
  • If possible, also confirm that block B is finalized in the beacon chain (for extra safety that it won’t be reorged).
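A minimal sketch of this block check in Go, assuming the snapshot metadata fields sketched in 5.2; confirming finality against the CL would be a separate call.

```go
package checkpoint

import (
	"context"
	"fmt"
	"math/big"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

// verifyBlock checks that the snapshot's (blockNumber, blockHash) matches the chain as
// seen by the node's own EL endpoint. Finality of B should additionally be confirmed
// via the node's CL (e.g. that B is at or before the finalized checkpoint).
func verifyBlock(ctx context.Context, client *ethclient.Client, snapBlock uint64, snapHash common.Hash) error {
	header, err := client.HeaderByNumber(ctx, new(big.Int).SetUint64(snapBlock))
	if err != nil {
		return fmt.Errorf("fetch header %d: %w", snapBlock, err)
	}
	if header.Hash() != snapHash {
		return fmt.Errorf("block hash mismatch at %d: chain has %s, snapshot claims %s",
			snapBlock, header.Hash(), snapHash)
	}
	return nil
}
```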

2. Global state checks: Call read-only contract methods on SSVNetwork at block B to verify key global values:

  • getNetworkValidatorsCount() – compare it to the total count of active validators in the snapshot.
  • getNetworkFee() – the network fee index at B (should match snapshot).
  • getValidatorsPerOperatorLimit() – verify against snapshot.
  • Any other global config (liquidation thresholds, fee increase limits, etc. as applicable) – verify these too.

All of these are cheap eth_call queries on the EL. Any mismatch means the snapshot does not reflect the actual contract state at B; reject the snapshot.
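Each global getter follows the same pattern: ABI-encode the call and issue eth_call pinned to block B. A sketch for getNetworkFee is below; the ABI fragment and the uint256 return type are assumptions (in practice the published SSVNetworkViews ABI would be used), and getNetworkValidatorsCount / getValidatorsPerOperatorLimit are handled identically.

```go
package checkpoint

import (
	"context"
	"fmt"
	"math/big"
	"strings"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/accounts/abi"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

// Minimal ABI fragment for one global getter; the return type is an assumption here,
// and the real published ABI should be used in an implementation.
const viewsABI = `[{"name":"getNetworkFee","type":"function","stateMutability":"view","inputs":[],"outputs":[{"type":"uint256"}]}]`

// checkNetworkFee compares the on-chain network fee at block B with the snapshot value.
// `views` is the address of the contract exposing the view methods.
func checkNetworkFee(ctx context.Context, client *ethclient.Client, views common.Address,
	blockB uint64, snapFee *big.Int) error {

	parsed, err := abi.JSON(strings.NewReader(viewsABI))
	if err != nil {
		return err
	}
	data, err := parsed.Pack("getNetworkFee")
	if err != nil {
		return err
	}
	out, err := client.CallContract(ctx, ethereum.CallMsg{To: &views, Data: data},
		new(big.Int).SetUint64(blockB)) // eth_call pinned to block B
	if err != nil {
		return err
	}
	vals, err := parsed.Unpack("getNetworkFee", out)
	if err != nil {
		return err
	}
	onchain := vals[0].(*big.Int)
	if onchain.Cmp(snapFee) != 0 {
		return fmt.Errorf("network fee mismatch at block %d: chain %s vs snapshot %s", blockB, onchain, snapFee)
	}
	return nil
}
```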

3. Operator records: For each operator that is relevant to the importing node (and ideally a random sample or all of the operators in the snapshot, if feasible):

  • Call getOperatorById(id) at B via the EL.
  • Verify that the returned { owner, fee, validatorCount, whitelistedContract, isPrivate, active } exactly matches the snapshot’s data for that operator ID.
  • If any operator’s data doesn’t match, something is wrong (snapshot may be from a different state or corrupted) → reject.

Optimization: If checking every operator is too slow (in terms of RPC calls), the node might at least check all operators that it cares about (e.g., the ones it runs or ones in its clusters), plus a random sampling of others to spot-check the snapshot’s integrity.
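A sketch of this spot-check strategy (must-check operators plus a random sample). The OperatorRecord shape and the fetch callback are illustrative assumptions; fetch is expected to wrap getOperatorById via eth_call at block B, following the same pattern as the global checks.

```go
package checkpoint

import (
	"context"
	"fmt"
	"math/rand"
)

// OperatorRecord mirrors the per-operator fields in the snapshot (illustrative shape).
type OperatorRecord struct {
	ID                  uint64
	Owner               string
	Fee                 string
	ValidatorCount      uint32
	WhitelistedContract string
	IsPrivate           bool
	Active              bool
}

// spotCheckOperators verifies every operator in `must` (operators the node runs or shares
// clusters with) plus `sample` randomly chosen others against the chain at block B.
func spotCheckOperators(ctx context.Context, snap map[uint64]OperatorRecord, must []uint64, sample int,
	fetch func(ctx context.Context, id uint64) (OperatorRecord, error)) error {

	ids := append([]uint64{}, must...)
	all := make([]uint64, 0, len(snap))
	for id := range snap {
		all = append(all, id)
	}
	for i := 0; i < sample && len(all) > 0; i++ {
		ids = append(ids, all[rand.Intn(len(all))]) // random spot-check of the snapshot's integrity
	}

	for _, id := range ids {
		onchain, err := fetch(ctx, id)
		if err != nil {
			return fmt.Errorf("operator %d: %w", id, err)
		}
		if onchain != snap[id] {
			return fmt.Errorf("operator %d mismatch: chain %+v vs snapshot %+v", id, onchain, snap[id])
		}
	}
	return nil
}
```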

4. Validator existence: For each validator that the importing node will be responsible for (i.e., belonging to any cluster the node operates or monitors):

  • Call getValidator(owner, pubkey) at B.
  • This returns a boolean indicating if that validator is active in the contract.
  • Ensure that for all validators the node expects (from the snapshot) to be active, the contract indeed returns true.
  • Conversely, if the snapshot lists a validator under a cluster and getValidator returns false, that’s a red flag (snapshot might be including a validator that wasn’t actually active on-chain at B).

We don’t necessarily need to check every single validator in the network (could be thousands), but at least the ones the node is directly dealing with should match. If a snapshot were maliciously adding a fake validator or missing one, this step would catch it for any cluster the node knows about.
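A sketch of this check, assuming a getValidator(owner, publicKey) → bool view as described above. The ABI fragment is an assumption; the real SSVNetworkViews ABI should be used.

```go
package checkpoint

import (
	"context"
	"fmt"
	"math/big"
	"strings"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/accounts/abi"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

// Assumed ABI fragment for the getValidator(owner, publicKey) -> bool style view.
const getValidatorABI = `[{"name":"getValidator","type":"function","stateMutability":"view","inputs":[{"name":"owner","type":"address"},{"name":"publicKey","type":"bytes"}],"outputs":[{"type":"bool"}]}]`

// checkValidatorsActive verifies that every (owner, pubkey) this node is responsible for,
// as listed in the snapshot, is indeed active on-chain at block B.
func checkValidatorsActive(ctx context.Context, client *ethclient.Client, views common.Address,
	blockB uint64, validators map[common.Address][][]byte) error {

	parsed, err := abi.JSON(strings.NewReader(getValidatorABI))
	if err != nil {
		return err
	}
	atB := new(big.Int).SetUint64(blockB)
	for owner, pubkeys := range validators {
		for _, pk := range pubkeys {
			data, err := parsed.Pack("getValidator", owner, pk)
			if err != nil {
				return err
			}
			out, err := client.CallContract(ctx, ethereum.CallMsg{To: &views, Data: data}, atB)
			if err != nil {
				return err
			}
			vals, err := parsed.Unpack("getValidator", out)
			if err != nil {
				return err
			}
			if active, _ := vals[0].(bool); !active {
				return fmt.Errorf("snapshot lists validator %x (owner %s) but contract says inactive at block %d",
					pk, owner, blockB)
			}
		}
	}
	return nil
}
```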

5. Cluster sanity checks: As noted, we cannot directly query “all clusters” on-chain. But we can do some consistency and sanity checks:

  • Internal consistency: Within the snapshot, ensure that:
    • The sum of validators across all clusters equals the networkValidatorsCount (global count) the snapshot claims (and we verified on-chain).
    • Each operator’s validatorCount in the snapshot equals the number of active validator assignments involving that operator (according to the snapshot’s cluster list). In other words, if operator 10 is said to have validatorCount = 5 on-chain, the snapshot should have exactly 5 validator entries spread across clusters that include operator 10.
    • No cluster appears twice or has contradictory data.
    • All cluster balances are non-negative and if a cluster is marked active=false (perhaps meaning removed), its balance should be 0 (as an example of a logical check).
  • Cross-check cluster with on-chain functions: We can take a few sample clusters (especially the ones this node might be part of, or randomly select some) and call getBalance(owner, operatorIds[], clusterTuple) at B using the snapshot’s data to see if the contract accepts it and returns the expected balance. Similarly, isLiquidatable or isLiquidated can be called for a sample cluster using the snapshot’s tuple. If the contract returns an unexpected result (e.g., the snapshot says a cluster is active but the contract thinks it’s liquidated given that tuple), that’s a problem.
  • These checks won’t prove the snapshot is a complete list of clusters, but they will catch obvious errors or inconsistencies in what’s provided.
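The internal-consistency part of these checks is purely local and needs no RPC calls at all. A sketch over illustrative snapshot types:

```go
package checkpoint

import "fmt"

// ClusterRecord is the per-cluster shape assumed for the snapshot (illustrative).
type ClusterRecord struct {
	Owner          string
	OperatorIDs    []uint64
	ValidatorCount uint32
	Active         bool
}

// checkInternalConsistency verifies the local invariants described above: cluster
// validator counts must sum to the global count, and each operator's validatorCount
// must equal the number of validator assignments in clusters that include it.
// Depending on how liquidated/removed clusters are represented, these sums may need
// to be restricted to active clusters.
func checkInternalConsistency(clusters []ClusterRecord, operatorCounts map[uint64]uint32,
	networkValidatorsCount uint32) error {

	var total uint32
	perOperator := make(map[uint64]uint32)
	for _, c := range clusters {
		total += c.ValidatorCount
		for _, id := range c.OperatorIDs {
			perOperator[id] += c.ValidatorCount
		}
	}
	if total != networkValidatorsCount {
		return fmt.Errorf("cluster validator sum %d != networkValidatorsCount %d", total, networkValidatorsCount)
	}
	for id, want := range operatorCounts {
		if got := perOperator[id]; got != want {
			return fmt.Errorf("operator %d: snapshot validatorCount %d but clusters account for %d", id, want, got)
		}
	}
	return nil
}
```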

To reiterate, the verification uses only standard Ethereum calls (eth_call) on the importing node’s own Ethereum endpoint. No custom trust is needed beyond the Ethereum node that the SSV node is already using. If all these checks pass, the snapshot is extremely likely to be a correct representation of on-chain state at B. Any targeted attempt to fake a snapshot would have to fool these checks (which would require either compromising the Ethereum provider or producing a snapshot with subtle self-consistent yet false data, which is hard especially if cross-checked via multiple sources below).

Cross-check the snapshot via independent sources (out-of-band; do this before the on-chain checks above):

(This step isn’t about on-chain verification, but rather about not putting all eggs in one basket. It is strongly recommended to do this before relying on the snapshot.)

Just as Ethereum users are advised to obtain weak-subjectivity checkpoints from multiple independent sources, SSV node operators should cross-verify the snapshot artifact itself:

  • If snapshots are available from multiple parties (say, an “official” SSV Labs snapshot, and one from a community-run indexer, and perhaps one from a friend who runs a node), the operator should compare them.
  • At minimum, compare the snapshot’s declared blockNumber and blockHash. If one source claims the snapshot is at block 12,345,678 with hash 0xABC and another source claims a snapshot at the same height has a different hash, something is wrong.
  • Better, compare a hash of the snapshot’s contents (if provided, e.g. a SHA256 or a Merkle root of the state). If the official source and community source both publish a hash for “snapshot at block 12,345,678”, those should match exactly.
  • Only if independent sources agree should the snapshot be considered for import. If there’s a discrepancy, the operator should hold off (just as one would if two explorers gave conflicting checkpoint hashes).

This cross-check is performed by the human/operator or by off-chain scripts, prior to the SSV client actually importing the file. It guards against trusting a single compromised source for the snapshot.
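A sketch of the content-hash comparison, equivalent to a sha256sum one-liner; the digest format and how providers publish it are assumptions to be settled in the spec.

```go
package checkpoint

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// snapshotDigest returns the SHA-256 of the snapshot file, to be compared against
// digests published by independent providers for the same (blockNumber, blockHash).
func snapshotDigest(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// crossCheck fails unless all independently obtained digests agree with the local one.
func crossCheck(local string, published []string) error {
	for i, d := range published {
		if d != local {
			return fmt.Errorf("source %d publishes digest %s, local snapshot hashes to %s", i, d, local)
		}
	}
	return nil
}
```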

5.4 Catching Up from B to Head

After the snapshot is verified, the SSV node will:

  • Initialize its local database with the snapshot state (operators, clusters, validators, etc. as of block B).
  • Set its internal pointer to “last processed block = B”.
  • From there, resume normal operation: subscribe to new SSVNetwork contract events or query logs from B+1 onward, and process any changes that occurred after B.

Since B was recent (within weeks), fetching logs from B+1 to the current head is manageable and should be supported even by pruned nodes (they typically retain at least a year of log history, or the node was already running and received those events in real time).
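A sketch of this catch-up step, again using go-ethereum. The apply callback stands in for the client's existing event handlers, and the range is left unbatched for brevity since B is recent.

```go
package checkpoint

import (
	"context"
	"math/big"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

// catchUp fetches SSVNetwork logs from B+1 to the current head so the node can resume
// normal event processing after importing a verified snapshot at B. For larger ranges,
// batch the query as in the full replay; live sync afterwards can use a subscription
// or polling, exactly as today.
func catchUp(ctx context.Context, client *ethclient.Client, ssvNetwork common.Address,
	lastProcessed uint64, apply func(types.Log) error) error {

	head, err := client.BlockNumber(ctx)
	if err != nil {
		return err
	}
	logs, err := client.FilterLogs(ctx, ethereum.FilterQuery{
		FromBlock: new(big.Int).SetUint64(lastProcessed + 1),
		ToBlock:   new(big.Int).SetUint64(head),
		Addresses: []common.Address{ssvNetwork},
	})
	if err != nil {
		return err
	}
	for _, l := range logs {
		if err := apply(l); err != nil {
			return err
		}
	}
	return nil
}
```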

At this point, the node is effectively synced. It saved potentially millions of RPC calls and a lot of time, at the cost of trusting an off-chain snapshot that was thoroughly verified.

6. Request for Feedback

We’d like feedback on this approach before proceeding to a formal implementation:

  • Realism of Modes A/B/C: Do these categories reflect how you as an operator run SSV nodes (or plan to)? Are there other scenarios we should consider?
  • Security trade-offs: Do you agree that, in scenarios where an operator doesn’t have a full history locally, using a snapshot within the CL’s weak subjectivity window is a reasonable trade-off? Why or why not?
  • Verification sufficiency: Given the current SSV contracts (where clusters can’t be listed on-chain), is the outlined verification (on-chain calls for global/operator data and logical checks for cluster data, plus multi-source snapshot cross-checks) sufficient for confidence? Are there additional checks or balances you’d like to see?
  • Future improvements: Would contract-level changes (in future versions of SSVNetwork) that allow direct on-chain retrieval or proof of all clusters be worth pursuing to make this even more trustless? Or is the off-chain snapshot + verification model acceptable?

If the consensus is positive, the next step would be to draft a concrete SIP (SSV Improvement Proposal) or spec outlining the snapshot file format, the import procedure, and the exact verification steps, and then implement this in both Anchor and go-ssv clients.
