Components ETL pipeline tutorial
Setup
1. Install project dependencies
To complete this tutorial, you must install uv and dg.
First, install duckdb for a local database and tree to visualize project structure:
- Mac
- Windows
- Linux
tree is optional and is only used to produce a nicely formatted representation of the project structure on the command line. You can also use find, ls, dir, or any other directory listing command.
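If none of those commands is convenient, a few lines of Python can produce a similar listing. The following is a minimal sketch, not part of dg or Dagster; the function names render_tree and print_tree are hypothetical, and the sort order may differ slightly from what tree prints.

```python
from pathlib import Path


def render_tree(root: Path, prefix: str = "") -> list[str]:
    """Return tree-style lines for every file and directory under root."""
    # Sort by name so the listing is stable across platforms.
    entries = sorted(root.iterdir(), key=lambda p: p.name)
    lines = []
    for index, entry in enumerate(entries):
        is_last = index == len(entries) - 1
        connector = "└── " if is_last else "├── "
        lines.append(f"{prefix}{connector}{entry.name}")
        if entry.is_dir():
            # Below the last entry use blank padding; otherwise keep the vertical bar.
            extension = "    " if is_last else "│   "
            lines.extend(render_tree(entry, prefix + extension))
    return lines


def print_tree(root: str) -> None:
    """Print the directory name followed by its tree-style listing."""
    path = Path(root)
    print(path.name)
    for line in render_tree(path):
        print(line)


# Example call (assumes the directory exists): print_tree("jaffle_platform")
```

Hidden files and virtual environment folders are not filtered out here, so point it at a package directory rather than the project root.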
2. Scaffold a new project
After installing dependencies, scaffold a components-ready project:
dg scaffold project jaffle-platform
Creating a Dagster project at /.../jaffle-platform.
Scaffolded files for Dagster project at /.../jaffle-platform.
...
The dg scaffold project command builds a project at jaffle-platform and initializes a new Python virtual environment inside it. When you use dg's default environment management behavior, you won't need to worry about activating this virtual environment yourself.
To learn more about the files, directories, and default settings in a project scaffolded with dg scaffold project, see "Creating a project with components".
Ingest data
1. Add the Sling component type to your environment
To ingest data, you must set up Sling. However, if you list the available component types in your environment at this point, the Sling component won't appear, since the basic dagster-components package that was installed when you scaffolded your project doesn't include components for specific integrations (like Sling):
dg list component-type
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ Component Type                                                      ┃ Summary          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ dagster_components.dagster.DefinitionsComponent                     │ Wraps an         │
│                                                                     │ arbitrary set of │
│                                                                     │ Dagster          │
│                                                                     │ definitions.     │
│ dagster_components.dagster.PipesSubprocessScriptCollectionComponent │ Assets that wrap │
│                                                                     │ Python scripts   │
│                                                                     │ executed with    │
│                                                                     │ Dagster's        │
│                                                                     │ PipesSubprocess… │
└─────────────────────────────────────────────────────────────────────┴──────────────────┘
To make the Sling component available in your environment, install the sling extra of dagster-components:
uv add 'dagster-components[sling]'
dg always operates in an isolated environment, but it is able to access the set of component types available in your project environment because it attempts to resolve a project root whenever it is run. If dg finds a pyproject.toml file with a tool.dg.is_project = true setting, then it will expect a uv-managed virtual environment to be present in the same directory. (This can be confirmed by the presence of a uv.lock file.)
When you run commands like dg list component-type, dg obtains the results by identifying the in-scope project environment and querying it. In this case, the project environment was set up as part of the dg scaffold project command.
2. Confirm availability of the Sling component type
To confirm that the dagster_components.dagster_sling.SlingReplicationCollectionComponent component type is now available, run the dg list component-type command again:
dg list component-type
Using /.../jaffle-platform/.venv/bin/dagster-components
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Component Type                                                       ┃ Summary         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dagster_components.dagster.DefinitionsComponent                      │ Wraps an        │
│                                                                      │ arbitrary set   │
│                                                                      │ of Dagster      │
│                                                                      │ definitions.    │
│ dagster_components.dagster.PipesSubprocessScriptCollectionComponent  │ Assets that     │
│                                                                      │ wrap Python     │
│                                                                      │ scripts         │
│                                                                      │ executed with   │
│                                                                      │ Dagster's       │
│                                                                      │ PipesSubproces… │
│ dagster_components.dagster_sling.SlingReplicationCollectionComponent │ Expose one or   │
│                                                                      │ more Sling      │
│                                                                      │ replications to │
│                                                                      │ Dagster as      │
│                                                                      │ assets.         │
└──────────────────────────────────────────────────────────────────────┴─────────────────┘
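If you want to script this check rather than read the table by eye, something like the following could work. This is a hedged sketch: the helper names are hypothetical, it relies only on the dg list component-type command shown in this tutorial, and it assumes the terminal is wide enough that the type name in the first column is not truncated or wrapped.

```python
import subprocess


def component_type_listed(listing: str, type_name: str) -> bool:
    """Return True if type_name is the full first-column cell of a table row."""
    for line in listing.splitlines():
        # Data rows start with the light vertical bar; the header uses a heavy one.
        if not line.startswith("│"):
            continue
        cells = [cell.strip() for cell in line.strip("│").split("│")]
        if cells and cells[0] == type_name:
            return True
    return False


def list_component_types() -> str:
    """Run the listing command from this tutorial and return its standard output."""
    result = subprocess.run(
        ["dg", "list", "component-type"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, component_type_listed(list_component_types(), "dagster_components.dagster_sling.SlingReplicationCollectionComponent") should be true once the sling extra is installed, assuming the output has the table shape shown above.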
3. Create a new instance of the Sling component
Next, create a new instance of this component type:
dg scaffold component 'dagster_components.dagster_sling.SlingReplicationCollectionComponent' ingest_files
Creating a Dagster component instance folder at /.../jaffle-platform/jaffle_platform/defs/ingest_files.
Using /.../jaffle-platform/.venv/bin/dagster-components
This adds a component instance to the project at jaffle_platform/defs/ingest_files:
tree jaffle_platform
jaffle_platform
├── __init__.py
├── definitions.py
├── defs
│   ├── __init__.py
│   └── ingest_files
│       ├── component.yaml
│       └── replication.yaml