This repository was archived by the owner on Jul 13, 2023. It is now read-only.
Hi NYCPlanning! I miss you all! I've been using DBT at my new job and it's amazing! I'm creating this PR just to give you a taste of what things could look like if Pluto adopted DBT.
Things DBT solves in the Pluto context
- `02_build.sh` that manually specifies execution order
- for each table you also get to see the SQL code that generates it!
- you can also use the lineage graph to see table lineage

From a development perspective
1. `.env` using a single `~/.dbt/profiles.yml`! This way, you can switch across different projects and prod or dev environments easily in different repos by specifying the specific target -> `dbt run xxxxxx --target devdb-prod`. This is helpful because you can ensure the environment setup is consistent across team members by only having to maintain 1 file instead of 1 file for each repo/project.
2. DBT makes it easy to create a schema, or create or replace a table / view, when you want to rerun some code. In the pluto repo, there is a lot of code to `DROP TABLE IF EXISTS` or `CREATE TABLE` / `CREATE SCHEMA`, which adds a lot of bulk to the code. DBT abstracts all that away so you can focus on the business logic -> usually stated in a `SELECT` statement.
3. We tried to start doing this in DevDB. It is recommended to use `SELECT` for business logic because it's more declarative and more transparent compared to `INSERT` or `UPDATE`.
4. Testing is also made easy. Especially in the context of Pluto, we always want to make sure e.g. there's no duplicated BBL in certain tables vs another, and you can easily do so by using the `dbt_utils` package out of the box. e.g. a few lines of test configuration would conduct the following tests:
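A minimal sketch of how the checks listed in this item might be declared (this is not the original snippet from the PR). The file name `models/schema.yml` is hypothetical, and the sketch assumes `borough` is a column of `stg_geocode` and is stored as an integer. These particular checks map onto dbt's built-in generic tests (`unique`, `not_null`, `accepted_values`), so this sketch does not depend on `dbt_utils`.

```yaml
# models/schema.yml (hypothetical file name)
version: 2

models:
  - name: stg_geocode
    columns:
      - name: geo_bbl
        tests:
          - unique      # no duplicated BBL
          - not_null
      - name: borough   # assumed to live in stg_geocode
        tests:
          - not_null
          - accepted_values:
              values: [1, 2, 3, 4, 5]
              quote: false   # assumes an integer column; drop this if borough is text
```

With a file like this in place, `dbt test` would run one query per declared test and report any rows that fail.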
- for column `geo_bbl` for table `stg_geocode`, check the field is unique and not null
- for column `borough`, check the field is not null and contains only values in [1, 2, 3, 4, 5]

The `dbt test` command makes it really easy to implement some of the QAQC checks that gave us a lot of headaches.

Not implemented, but might be useful
Good luck! lemme know if you have questions! I'm always on github! say hi to everyone for me thanks!
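P.S. Going back to the point about a single `~/.dbt/profiles.yml`: a rough sketch of what such a file could look like, assuming a Postgres database. The profile name `pluto`, the hosts, users, and database names are all placeholders, not values from this repo; only the target name `devdb-prod` comes from the command above.

```yaml
# ~/.dbt/profiles.yml
pluto:                 # hypothetical profile name; must match `profile:` in dbt_project.yml
  target: dev          # default target when --target is not passed
  outputs:
    dev:
      type: postgres   # assumption: a Postgres database
      host: localhost
      port: 5432
      user: postgres
      password: "<dev password>"
      dbname: pluto_dev
      schema: public
      threads: 4
    devdb-prod:
      type: postgres
      host: "<prod host>"
      port: 5432
      user: "<prod user>"
      password: "<prod password>"
      dbname: "<prod database>"
      schema: public
      threads: 4
```

Each repo would then pick its environment with `--target`, so credentials would live in this one file rather than in a `.env` per project.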