-
Notifications
You must be signed in to change notification settings - Fork 15
feat(ensindexer): ENSDb Writer Worker #1406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Store ENSNode metadata (ENSIndexer Public Config, Indexing Status) into ENSDb.
🦋 Changeset detectedLatest commit: 57fcd2a The changes in this PR will be included in the next version bump. This PR includes changesets to release 14 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
ba173e1 to
1f362f3
Compare
…ations of ENSIndexer Public Config and Indexing Status to ENSDb.
| await waitForEnsIndexerToBecomeHealthy; | ||
|
|
||
| // 1. Create ENSDb Client | ||
| const ensDbConnection = new EnsDbConnection(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick, but i would name these connection, client, query and mutation given the existing context, it feels cleaner and more straightforward. wdyt?
| // Read stored config and in-memory config. | ||
| // Note: we wrap read operations in pRetry to ensure all of them are | ||
| // completed successfully. | ||
| const [storedConfig, inMemoryConfig] = await pRetry(() => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically it think this should be
await Promise.all([
pRetry(...),
pRetry(...)
])
this way if one of the reads fails, the other isn't unnecessarily re-fetched
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome suggestion, thanks 🚀
| // Upsert ENSIndexerPublicConfig into ENSDb. | ||
| // Note: we wrap write operation in pRetry to ensure it can complete | ||
| // successfully, as there will be no other attempt. | ||
| await pRetry(() => ensDbMutation.upsertEnsIndexerPublicConfig(inMemoryConfig)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in what conditions would this write fail that would be retry-able? perhaps pRetry is unnecessary here, and it would be more exact to require that this promise succeed on its first attempt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't feel 100% ok with this pRetry, perhaps we should the process fail if there's any (very unlikely) issue with the write 👍
|
|
||
| // Check if Indexing Status is in expected status. | ||
| if (omnichainSnapshot.omnichainStatus === OmnichainIndexingStatusIds.Unstarted) { | ||
| throw new Error("Omnichain Status must be different that 'Unstarted'."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
than* unstarted
| // Upsert ENSIndexerPublicConfig into ENSDb. | ||
| await ensDbMutation.upsertIndexingStatus(snapshot); | ||
|
|
||
| logger.info("Indexing Status successfully stored in ENSDb."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- add a once-per-process log at info level for "Indexing Status successfully stored in ENSDb."
- make this a trace-level call with something like "Indexing Status updated in ENSDb."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, there's no need to log the trace for the update event. The lack of error log implies the successful update.
| @@ -0,0 +1,41 @@ | |||
| // This file was copied 1-to-1 from ENSApi. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't seem trivial or possible to deduplicate this into ensnode-sdk, so let's wait until we have an ensnode-internal package or something to dry this up — can you add a TODO here indicating that? like
// TODO: deduplicate with apps/ensapi/src/lib/handlers/drizzle.ts when ensnode nodejs internal package is created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likewise for logger.ts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually because ensindexer isn't using pino-formatted logs consistently, i think it might be best if we just use console.log in these cases. in which case i'd remove the debug logging in the client and let the consumer of the client handle all of the logging. in-client logging is only useful if we can indicate a log level to ignore it by default.
if you decide to keep pino in ensindexer, you'd want to add LogLevelEnvironment to the ENSIndexerEnvironment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'm fine with using native logger from console object.
| type Schema = { [name: string]: unknown }; | ||
|
|
||
| // https://github.com/ponder-sh/ponder/blob/f7f6444ab8d1a870fe6492023941091df7b7cddf/packages/client/src/index.ts#L226C1-L239C3 | ||
| const setDatabaseSchema = <T extends Schema>(schema: T, schemaName: string) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import { setDatabaseSchema } from "@ponder/client"; and use that instead of this
| @@ -0,0 +1,63 @@ | |||
| import type { NodePgDatabase } from "drizzle-orm/node-postgres"; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we doing all of this pooling and connection management logic? it doesn't seem necessary and introduces redirection and complexity. we only need a single connection (created by makeDrizzle) for the writer, and it's writing once a second — we don't need the parallel connections offered by pg.Pool
i'd replace this entire file with:
// in ensdb-writer-worker
const client = new EnsNodeMetadataClient()`class EnsNodeMetadataClient {
db = makeDrizzle(schema)
constructor() {} // or put it in the constructor, whatever typescript requires
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, for ENSIndexer all we need is a single connection. However, for ENSApi, we'll need to switch to connection pool to improved read-side performance 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha! makes sense. that case I'd suggest just doing something like this
return drizzle({ client: new Pool({
connectionString: process.env.DATABASE_URL,
}) })instead of a custom connection class
| import { type EnsNodeMetadata, EnsNodeMetadataKeys } from "./ensnode-metadata"; | ||
|
|
||
| /** | ||
| * ENSDb Mutation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd merge this and the query client into a single client EnsNodeMetadataClient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do 👍
| logger.debug("Fetching ENSIndexer Health status: healthy"); | ||
| } catch { | ||
| const errorMessage = "Health endpoint for ENSIndexer is not available yet."; | ||
| logger.error(errorMessage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think the client should log errors — the consumer should
Remove logs from ENSIndexerClient class
c4eef67 to
ed4c96e
Compare
consolidate ensdb modules; replace pino logger with native console logger; simplify ensdb worker logic
ed4c96e to
dd19ce9
Compare
shrugs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
| ); | ||
| } | ||
|
|
||
| // Run ENSDb Writer Worker in a non-blocking way to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just "Run ENSDb Writer Worker in the background"?
lightwalker-eth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tk-o Thanks for this. Reviewed and shared feedback 👍
| SerializedIndexingStatusResponse, | ||
| } from "@ensnode/ensnode-sdk"; | ||
|
|
||
| export class EnsIndexerClient { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine this should move into the ensnode-sdk package?
It's true that we would only use it ourselves within ENSIndexer, but in theory we should put the clients for each of the ENSNode apps into ensnode-sdk. At least with my current mental model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a lot of file changes so I prefer to make this update in a follow-up PR. This move will require solving additional complexities. We'll need to split request and response types between EnsIndexerClient and ENSNodeClient. I think we should rename ENSNodeClient to EnsApiClient. ENSNode is a term describing a logical group of services. The client requires specific service as its target, hence, EnsApiClient for the EnsApi service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I was able to move the EnsIndexerClient to ENSNode SDK package 👍
| console.log("ENSDb Writer Worker: ENSIndexer is healthy, starting tasks."); | ||
|
|
||
| // 1. Create ENSDb Client | ||
| const ensDbClient = new EnsDbClient(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I think we need to give more of a detailed focus on the case of a completely fresh DATABASE_SCHEMA and how this is initialized by Ponder.
We have the existing code path:
- ENSIndexer fetches ENSRainbow config from ENSRainbow
- ENSIndexer can now build its own config
- ENSIndexer uses its own config to build a Ponder config
- Ponder uses the Ponder config to initialize
DATABASE_SCHEMA. - ENSIndexer now writes metadata into the given
DATABASE_SCHEMA.
Right? Don't we need to wait until step 4 for the tables in DATABASE_SCHEMA to be initialized before we can perform step 5?
Assuming so, this waiting potentially needs to grow a lot if we add logic to wait up to an hour for step 1 above to be complete, waiting for ENSRainbow to become both healthy and ready.
And during all of this time, what is ENSApi supposed to do? From its perspective, it's been configured to read from DATABASE_SCHEMA but ENSIndexer might wait over an hour to create it.
One idea is that maybe we don't want to write at all to DATABASE_SCHEMA and instead we create our own database schema name that is always used by ENSNode instances writing to a particular Postgres database, kind of how Ponder sync works and how it can be shared across multiple Ponder instances at the same time. If we took this path we would need to update the data model so that multiple ENSNode instances could operate in parallel without impacting each others state. There's a meaningful amount of complexity here and we need to be careful not to mess it up. I don't have a strong opinion at the moment on the solution, above is just a quick brainstorm.
One thing that maybe could help here is to fix the situation where ENSIndexer can't initialize Ponder for more than an hour. Maybe we can make it so that ENSRainbow returns its config even before it is ready and as soon as it is healthy (which I imagine should be super fast?). If so then Ponder might be able to initialize the DATABASE_SCHEMA super fast, which would then allow us to start writing some symbolic value into the metadata tables that represents waiting for ENSRainbow to initialize which could then be read and understood by ENSApi during its own initialization phase. If we took this path then we need some strategy for what to do with Ponder's desire to just immediately start indexing as soon as we pass it its config. Is there a way to tell Ponder to initialize the database but don't start indexing yet?
Suggest to take a step back and carefully consider all cases and situations, then to document a plan and share it with the team for review 👍 Please feel welcome to challenge all the ideas I shared above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Short answer is that all we need to to is to wait for ENSIndexer /health endpoint to return 200 OK response.
Here are example logs from ponder application when database schema needs to be initialized:
12:54:08 PM INFO database Using database schema 'public'
12:54:09 PM INFO database Created tables [ensnode_metadata, reverse_name_records, node_resolver_relations, resolver_records, resolver_address_records, resolver_trecords, migrated_nodes, subregistries, registration_lifecycles, registrar_actions, _ensindexer_registrar_action_metadata, subgraph_domains, subgraph_accounts, subgraph_resolvers, subgraph_registrations, subgraph_wrapped_domains, subgraph_transfers, subgraph_new_owners, subgraph_new_resolvers, subgraph_new_ttls, subgraph_wrapped_transfers, subgraph_name_wrapped, subgraph_name_unwrapped, subgraph_fuses_set, subgraph_expiry_extended, subgraph_name_registered, subgraph_name_renewed, subgraph_name_transferred, subgraph_addr_changed, subgraph_multicoin_addr_changed, subgraph_name_changed, subgraph_abi_changed, subgraph_pubkey_changed, subgraph_text_changed, subgraph_contenthash_changed, subgraph_interface_changed, subgraph_authorisation_changed, subgraph_version_changed, name_sales, name_tokens]
12:54:09 PM INFO server Started listening on port 42069
12:54:09 PM INFO server Started returning 200 responses from /health endpoint
Please note how ponder first create tables, and only then starts returning 200 OK response from its /health endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea for ENSApi to use specific DATABASE_SCHEMA. Suggest sticking to that. This way we can keep wrap the whole scope of ENSDb inside a single database schema — it's a very convenient assumption.
ENSApi has to wait for ENSDb to include EnsNodeMetadata. Otherwise, there's literally no use case ENSApi could serve. All ENSApi routes require indexing status information, which come from ENSDb.
ENSDb won't have the EnsNodeMetadata information until ENSIndexer stores it.
ENSIndexer won't store any EnsNodeMetadata information until ENSRainbow is healthy, and ENSIndexer is healthy.
I think we should create a "logical healthcheck" for ENSDb, such that clients connecting to ENSDb could know if the service is ready to be used. For example, we can consider this "logical healthcheck" to ensure that ENSDb includes EnsNodeMetadata for both keys: EnsNodeMetadataKeys.EnsIndexerPublicConfig, EnsNodeMetadataKeys.IndexingStatus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tk-o Overall agreed, but with a few distinctions:
- Suggest to separate the idea of "healthy" vs "ready".
- For an ENSDb client created with a given
DATABASE_URL,DATABASE_SCHEMA, and expected database version value:- It seems we might want to have some extremely simple operation (independent of
DATABASE_SCHEMAwhere we just check if we can connect to theDATABASE_URLat all). Or maybe this should also check thatDATABASE_SCHEMAexists too? Not sure right now. But anyway.. in my mind this is the definition of "healthy". - We can have a different operation that requires that all 3 of the expected keys in the metadata table to be successfully read and deserialized AND for the key for the ENSDb version to match the expected version in the ENSDbClient. This operation could be defined as "ready".
- It seems we might want to have some extremely simple operation (independent of
What do you think?
These ideas might also influence how ENSApi implements both its /health and /ready APIs.
|
@lightwalker-eth I summarized all feedback items that will have to be addressed in follow up PRs. Feel free to comment on that list. PR follow-ups:
|
lightwalker-eth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tk-o Thanks for updates. Reviewed and shared feedback.
| * Possible key value pairs are defined by 'EnsNodeMetadata' type: | ||
| * - `EnsNodeMetadataEnsDbVersion` | ||
| * - `EnsNodeMetadataEnsIndexerPublicConfig` | ||
| * - `EnsNodeMetadataIndexingStatus` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * Possible key value pairs are defined by 'EnsNodeMetadata' type: | |
| * - `EnsNodeMetadataEnsDbVersion` | |
| * - `EnsNodeMetadataEnsIndexerPublicConfig` | |
| * - `EnsNodeMetadataIndexingStatus` | |
| * Possible key value pairs are defined in the `EnsNodeMetadata` type. |
We achieved the goal just through the sentence above. If we go further by duplicating a bunch of ideas here then it becomes more difficult to maintain these comments with accuracy.
| * Key | ||
| * | ||
| * Allowed keys: | ||
| * - `EnsNodeMetadataEnsDbVersion['key']` | ||
| * - `EnsNodeMetadataEnsIndexerPublicConfig['key']` | ||
| * - `EnsNodeMetadataIndexingStatus['key']` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * Key | |
| * | |
| * Allowed keys: | |
| * - `EnsNodeMetadataEnsDbVersion['key']` | |
| * - `EnsNodeMetadataEnsIndexerPublicConfig['key']` | |
| * - `EnsNodeMetadataIndexingStatus['key']` | |
| * Key. Possible keys are defined in the `EnsNodeMetadata` type. |
| * Value | ||
| * | ||
| * Allowed values: | ||
| * - `EnsNodeMetadataEnsDbVersion['value']` | ||
| * - `EnsNodeMetadataEnsIndexerPublicConfig['value']` | ||
| * - `EnsNodeMetadataIndexingStatus['value']` | ||
| * | ||
| * Guaranteed to be a JSON object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * Value | |
| * | |
| * Allowed values: | |
| * - `EnsNodeMetadataEnsDbVersion['value']` | |
| * - `EnsNodeMetadataEnsIndexerPublicConfig['value']` | |
| * - `EnsNodeMetadataIndexingStatus['value']` | |
| * | |
| * Guaranteed to be a JSON object. | |
| * Value. Possible values are defined in the `EnsNodeMetadata` type. | |
| * | |
| * Guaranteed to be a JSON object. |
| * How many times retries should be attempted before | ||
| * {@link waitForEnsIndexerToBecomeHealthy} becomes a rejected promise. | ||
| */ | ||
| export const MAX_ENSINDEXER_HEALTHCHECK_ATTEMPTS = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| export const MAX_ENSINDEXER_HEALTHCHECK_ATTEMPTS = 5; | |
| export const MAX_ENSINDEXER_STARTUP_HEALTHCHECK_ATTEMPTS = 5; |
Is that fair?
Goal: Make it more clear that this is part of the startup process.
|
|
||
| /** | ||
| * Validate if `configB` is compatible with `configA`, such that `configA` is | ||
| * a subset of `configB`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I don't think this is correct.
For example, consider the following case:
Config A has plugins X, Y, and Z.
Config B has plugins X and Y.
Config B is a subset of Config A, but it is not compatible. Both would have different indexing results.
| /** | ||
| * ENSIndexer Client | ||
| * | ||
| * Using this client methods requires first calling `health()` method and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? The client should be stateless.
| /** | ||
| * ENSIndexer Health is unknown if the health check endpoint is unavailable. | ||
| */ | ||
| Unknown: "unknown", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand the goal here? As I understand this would generally be an Error that would be thrown by the related function call on the client? Ex: request timeout or connection failed, etc?
| type SerializedIndexingStatusResponse, | ||
| } from "@ensnode/ensnode-sdk"; | ||
|
|
||
| export const EnsIndexerHealthCheckResults = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand the big idea here? Ideally all endpoints of all of our apps would return the same standard JSON response schema, where only some field such as data on a successful response has a dynamic type.
The /health endpoint of all apps should be the same.
Is there a special issue here where the /health endpoint on ENSIndexer is implemented by Ponder and not by us? If so, this logic should move into a Ponder Client class that we build.
| ); | ||
| } | ||
|
|
||
| if (this.#healthCheckResult !== EnsIndexerHealthCheckResults.Ok) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because a client managed to get a healthy status from ENSIndexer at time T doesn't mean ENSIndexer will remain healthy at time T+N.
Clients should be stateless.
Store ENSNode metadata (ENSIndexer Public Config, Indexing Status) into ENSDb.
Suggested review order
ENSNode Schema
New schema was added for ENSNode metadata at
packages/ensnode-schema/src/schemas/ensnode-metadata.schema.ts.ENSNode SDK
A simple HTTP client was built at
packages/ensnode-sdk/src/ensindexer/client.ts. It's useful for sending HTTP requests over the internal network (akin to in-memory function calls):/healthendpoint — required to respond with HTTP Status OK before any other requests can be made./api/configendpoint — returns in-memoryENSIndexerPublicConfigobject./api/indexing/statusendpoint — returns in-memoryIndexingStatusResponseobject.ENSIndexer
ENSDb module
Includes:
makeDrizzlefactory function known from ENSApi, which creates a Drizzle client instance.apps/ensindexer/src/lib/ensdb/drizzle.tsENSNodeMetadatatype defining data model associated withensnode-metadata.schema.tsapps/ensindexer/src/lib/ensdb/ensnode-metadata.tsEnsDbClientclass for performing ENSDb reads and writes.apps/ensindexer/src/lib/ensdb/ensdb-client.tsENSDb Writer Worker
Initialized at ENSIndexer process startup:
apps/ensindexer/ponder/src/ensdb-writer-worker.tsWaits for ENSIndexer to become healthy and then:
ENSIndexerPublicConfiginto ENSDb.CrossChainIndexingStatusSnapshotinto ENSDb.Related to #1252