Assets
An asset is an object in persistent storage, such as a table, file, or persisted machine learning model. An asset definition is a description, in code, of an asset that should exist and how to produce and update that asset.
Asset definitions
Refer to the Asset definitions documentation for more information.
- @dagster.asset
Create a definition for how to compute an asset.
A software-defined asset is the combination of:
- An asset key, e.g. the name of a table.
- A function, which can be run to compute the contents of the asset.
- A set of upstream assets that are provided as inputs to the function when computing the asset. Unlike an op, whose dependencies are determined by the graph it lives inside, an asset knows about the upstream assets it depends on. The upstream assets are inferred from the arguments to the decorated function. The name of the argument designates the name of the upstream asset.
An asset has an op inside it to represent the function that computes it. The name of the op will be the segments of the asset key, separated by double-underscores.
Parameters:
- name (Optional[str]) – The name of the asset. If not provided, defaults to the name of the decorated function. The asset’s name must be a valid name in dagster (ie only contains letters, numbers, and _) and may not contain python reserved keywords.
- key_prefix (Optional[Union[str, Sequence[str]]]) – If provided, the asset’s key is the concatenation of the key_prefix and the asset’s name, which defaults to the name of the decorated function. Each item in key_prefix must be a valid name in dagster (ie only contains letters, numbers, and _) and may not contain python reserved keywords.
- ins (Optional[Mapping[str, AssetIn]]) – A dictionary that maps input names to information about the input.
- deps (Optional[Sequence[Union[AssetDep, AssetsDefinition, SourceAsset, AssetKey, str]]]) – The assets that are upstream dependencies, but do not correspond to a parameter of the decorated function. If the AssetsDefinition for a multi_asset is provided, dependencies on all assets created by the multi_asset will be created.
- config_schema (Optional[ConfigSchema) – The configuration schema for the asset’s underlying op. If set, Dagster will check that config provided for the op matches this schema and fail if it does not. If not set, Dagster will accept any config provided for the op.
- metadata (Optional[Dict[str, Any]]) – A dict of metadata entries for the asset.
- tags (Optional[Mapping[str, str]]) – Tags for filtering and organizing. These tags are not attached to runs of the asset.
- required_resource_keys (Optional[Set[str]]) – Set of resource handles required by the op.
- io_manager_key (Optional[str]) – The resource key of the IOManager used for storing the output of the op as an asset, and for loading it in downstream ops (default: “io_manager”). Only one of io_manager_key and io_manager_def can be provided.
- io_manager_def (Optional[object]) – beta (Beta) The IOManager used for storing the output of the op as an asset, and for loading it in downstream ops. Only one of io_manager_def and io_manager_key can be provided.
- dagster_type (Optional[DagsterType]) – Allows specifying type validation functions that will be executed on the output of the decorated function after it runs.
- partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the asset.
- op_tags (Optional[Dict[str, Any]]) – A dictionary of tags for the op that computes the asset. Frameworks may expect and require certain metadata to be attached to a op. Values that are not strings will be json encoded and must meet the criteria that json.loads(json.dumps(value)) == value.
- group_name (Optional[str]) – A string name used to organize multiple assets into groups. If not provided, the name “default” is used.
- resource_defs (Optional[Mapping[str, object]]) – beta (Beta) A mapping of resource keys to resources. These resources will be initialized during execution, and can be accessed from the context within the body of the function.
- output_required (bool) – Whether the decorated function will always materialize an asset. Defaults to True. If False, the function can conditionally not yield a result. If no result is yielded, no output will be materialized to storage and downstream assets will not be materialized. Note that for output_required to work at all, you must use yield in your asset logic rather than return. return will not respect this setting and will always produce an asset materialization, even if None is returned.
- automation_condition (AutomationCondition) – A condition describing when Dagster should materialize this asset.
- backfill_policy (BackfillPolicy) – beta (Beta) Configure Dagster to backfill this asset according to its BackfillPolicy.
- retry_policy (Optional[RetryPolicy]) – The retry policy for the op that computes the asset.
- code_version (Optional[str]) – Version of the code that generates this asset. In general, versions should be set only for code that deterministically produces the same output when given the same inputs.
- check_specs (Optional[Sequence[AssetCheckSpec]]) – Specs for asset checks that execute in the decorated function after materializing the asset.
- key (Optional[CoeercibleToAssetKey]) – The key for this asset. If provided, cannot specify key_prefix or name.
- owners (Optional[Sequence[str]]) – A list of strings representing owners of the asset. Each string can be a user’s email address, or a team name prefixed with team:, e.g. team:finops.
- kinds (Optional[Set[str]]) – A list of strings representing the kinds of the asset. These will be made visible in the Dagster UI.
- pool (Optional[str]) – A string that identifies the concurrency pool that governs this asset’s execution.
- non_argument_deps (Optional[Union[Set[AssetKey], Set[str]]]) – deprecated Deprecated, use deps instead. Set of asset keys that are upstream dependencies, but do not pass an input to the asset. Hidden parameter not exposed in the decorator signature, but passed in kwargs.
Examples:
@asset
def my_upstream_asset() -> int:
return 5
@asset
def my_asset(my_upstream_asset: int) -> int:
return my_upstream_asset + 1
should_materialize = True
@asset(output_required=False)
def conditional_asset():
if should_materialize:
yield Output(5) # you must `yield`, not `return`, the result
# Will also only materialize if `should_materialize` is `True`
@asset
def downstream_asset(conditional_asset):
return conditional_asset + 1
class
dagster.MaterializeResultAn object representing a successful materialization of an asset. These can be returned from @asset and @multi_asset decorated functions to pass metadata or specify specific assets were materialized.
Parameters:
- asset_key (Optional[AssetKey]) – Optional in @asset, required in @multi_asset to discern which asset this refers to.
- metadata (Optional[RawMetadataMapping]) – Metadata to record with the corresponding AssetMaterialization event.
- check_results (Optional[Sequence[AssetCheckResult]]) – Check results to record with the corresponding AssetMaterialization event.
- data_version (Optional[DataVersion]) – The data version of the asset that was observed.
- tags (Optional[Mapping[str, str]]) – Tags to record with the corresponding AssetMaterialization event.
class
dagster.AssetSpecSpecifies the core attributes of an asset, except for the function that materializes or observes it.
An asset spec plus any materialization or observation function for the asset constitutes an “asset definition”.
Parameters:
- key (AssetKey) – The unique identifier for this asset.
- deps (Optional[AbstractSet[AssetKey]]) – The asset keys for the upstream assets that materializing this asset depends on.
- description (Optional[str]) – Human-readable description of this asset.
- metadata (Optional[Dict[str, Any]]) – A dict of static metadata for this asset. For example, users can provide information about the database table this asset corresponds to.
- skippable (bool) – Whether this asset can be omitted during materialization, causing downstream dependencies to skip.
- group_name (Optional[str]) – A string name used to organize multiple assets into groups. If not provided, the name “default” is used.
- code_version (Optional[str]) – The version of the code for this specific asset, overriding the code version of the materialization function
- backfill_policy (Optional[BackfillPolicy]) – BackfillPolicy to apply to the specified asset.
- owners (Optional[Sequence[str]]) – A list of strings representing owners of the asset. Each string can be a user’s email address, or a team name prefixed with team:, e.g. team:finops.
- tags (Optional[Mapping[str, str]]) – Tags for filtering and organizing. These tags are not attached to runs of the asset.
- kinds – (Optional[Set[str]]): A list of strings representing the kinds of the asset. These will be made visible in the Dagster UI.
- partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the asset.
- merge_attributes
Returns a new AssetSpec with the specified attributes merged with the current attributes.
Parameters:
- deps (Optional[Iterable[CoercibleToAssetDep]]) – A set of asset dependencies to add to the asset self.
- metadata (Optional[Mapping[str, Any]]) – A set of metadata to add to the asset self. Will overwrite any existing metadata with the same key.
- owners (Optional[Sequence[str]]) – A set of owners to add to the asset self.
- tags (Optional[Mapping[str, str]]) – A set of tags to add to the asset self. Will overwrite any existing tags with the same key.
- kinds (Optional[Set[str]]) – A set of kinds to add to the asset self.
Returns: AssetSpec
- replace_attributes
Returns a new AssetSpec with the specified attributes replaced.
- with_io_manager_key
Returns a copy of this AssetSpec with an extra metadata value that dictates which I/O manager to use to load the contents of this asset in downstream computations.
Parameters: io_manager_key (str) – The I/O manager key. This will be used as the value for the “dagster/io_manager_key” metadata key.Returns: AssetSpec
class
dagster.AssetsDefinitionDefines a set of assets that are produced by the same op or graph.
AssetsDefinitions are typically not instantiated directly, but rather produced using the
@asset
or@multi_asset
decorators.static
from_graphConstructs an AssetsDefinition from a GraphDefinition.
Parameters:
- graph_def (GraphDefinition) – The GraphDefinition that is an asset.
- keys_by_input_name (Optional[Mapping[str, AssetKey]]) – A mapping of the input names of the decorated graph to their corresponding asset keys. If not provided, the input asset keys will be created from the graph input names.
- keys_by_output_name (Optional[Mapping[str, AssetKey]]) – A mapping of the output names of the decorated graph to their corresponding asset keys. If not provided, the output asset keys will be created from the graph output names.
- key_prefix (Optional[Union[str, Sequence[str]]]) – If provided, key_prefix will be prepended to each key in keys_by_output_name. Each item in key_prefix must be a valid name in dagster (ie only contains letters, numbers, and _) and may not contain python reserved keywords.
- internal_asset_deps (Optional[Mapping[str, Set[AssetKey]]]) – By default, it is assumed that all assets produced by the graph depend on all assets that are consumed by that graph. If this default is not correct, you pass in a map of output names to a corrected set of AssetKeys that they depend on. Any AssetKeys in this list must be either used as input to the asset or produced within the graph.
- partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the assets.
- partition_mappings (Optional[Mapping[str, PartitionMapping]]) – Defines how to map partition keys for this asset to partition keys of upstream assets. Each key in the dictionary correponds to one of the input assets, and each value is a PartitionMapping. If no entry is provided for a particular asset dependency, the partition mapping defaults to the default partition mapping for the partitions definition, which is typically maps partition keys to the same partition keys in upstream assets.
- resource_defs (Optional[Mapping[str, ResourceDefinition]]) – beta (Beta) A mapping of resource keys to resource definitions. These resources will be initialized during execution, and can be accessed from the body of ops in the graph during execution.
- group_name (Optional[str]) – A group name for the constructed asset. Assets without a group name are assigned to a group called “default”.
- group_names_by_output_name (Optional[Mapping[str, Optional[str]]]) – Defines a group name to be associated with some or all of the output assets for this node. Keys are names of the outputs, and values are the group name. Cannot be used with the group_name argument.
- descriptions_by_output_name (Optional[Mapping[str, Optional[str]]]) – Defines a description to be associated with each of the output asstes for this graph.
- metadata_by_output_name (Optional[Mapping[str, Optional[RawMetadataMapping]]]) – Defines metadata to be associated with each of the output assets for this node. Keys are names of the outputs, and values are dictionaries of metadata to be associated with the related asset.
- tags_by_output_name (Optional[Mapping[str, Optional[Mapping[str, str]]]]) – Defines tags to be associated with each of the output assets for this node. Keys are the names of outputs, and values are dictionaries of tags to be associated with the related asset.
- freshness_policies_by_output_name (Optional[Mapping[str, Optional[FreshnessPolicy]]]) – Defines a FreshnessPolicy to be associated with some or all of the output assets for this node. Keys are the names of the outputs, and values are the FreshnessPolicies to be attached to the associated asset.
- automation_conditions_by_output_name (Optional[Mapping[str, Optional[AutomationCondition]]]) – Defines an AutomationCondition to be associated with some or all of the output assets for this node. Keys are the names of the outputs, and values are the AutoMaterializePolicies to be attached to the associated asset.
- backfill_policy (Optional[BackfillPolicy]) – Defines this asset’s BackfillPolicy
- owners_by_key (Optional[Mapping[AssetKey, Sequence[str]]]) – Defines owners to be associated with each of the asset keys for this node.
static
from_opConstructs an AssetsDefinition from an OpDefinition.
Parameters:
- op_def (OpDefinition) – The OpDefinition that is an asset.
- keys_by_input_name (Optional[Mapping[str, AssetKey]]) – A mapping of the input names of the decorated op to their corresponding asset keys. If not provided, the input asset keys will be created from the op input names.
- keys_by_output_name (Optional[Mapping[str, AssetKey]]) – A mapping of the output names of the decorated op to their corresponding asset keys. If not provided, the output asset keys will be created from the op output names.
- key_prefix (Optional[Union[str, Sequence[str]]]) – If provided, key_prefix will be prepended to each key in keys_by_output_name. Each item in key_prefix must be a valid name in dagster (ie only contains letters, numbers, and _) and may not contain python reserved keywords.
- internal_asset_deps (Optional[Mapping[str, Set[AssetKey]]]) – By default, it is assumed that all assets produced by the op depend on all assets that are consumed by that op. If this default is not correct, you pass in a map of output names to a corrected set of AssetKeys that they depend on. Any AssetKeys in this list must be either used as input to the asset or produced within the op.
- partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the assets.
- partition_mappings (Optional[Mapping[str, PartitionMapping]]) – Defines how to map partition keys for this asset to partition keys of upstream assets. Each key in the dictionary correponds to one of the input assets, and each value is a PartitionMapping. If no entry is provided for a particular asset dependency, the partition mapping defaults to the default partition mapping for the partitions definition, which is typically maps partition keys to the same partition keys in upstream assets.
- group_name (Optional[str]) – A group name for the constructed asset. Assets without a group name are assigned to a group called “default”.
- group_names_by_output_name (Optional[Mapping[str, Optional[str]]]) – Defines a group name to be associated with some or all of the output assets for this node. Keys are names of the outputs, and values are the group name. Cannot be used with the group_name argument.
- descriptions_by_output_name (Optional[Mapping[str, Optional[str]]]) – Defines a description to be associated with each of the output asstes for this graph.
- metadata_by_output_name (Optional[Mapping[str, Optional[RawMetadataMapping]]]) – Defines metadata to be associated with each of the output assets for this node. Keys are names of the outputs, and values are dictionaries of metadata to be associated with the related asset.
- tags_by_output_name (Optional[Mapping[str, Optional[Mapping[str, str]]]]) – Defines tags to be associated with each othe output assets for this node. Keys are the names of outputs, and values are dictionaries of tags to be associated with the related asset.
- freshness_policies_by_output_name (Optional[Mapping[str, Optional[FreshnessPolicy]]]) – Defines a FreshnessPolicy to be associated with some or all of the output assets for this node. Keys are the names of the outputs, and values are the FreshnessPolicies to be attached to the associated asset.
- automation_conditions_by_output_name (Optional[Mapping[str, Optional[AutomationCondition]]]) – Defines an AutomationCondition to be associated with some or all of the output assets for this node. Keys are the names of the outputs, and values are the AutoMaterializePolicies to be attached to the associated asset.
- backfill_policy (Optional[BackfillPolicy]) – Defines this asset’s BackfillPolicy
- get_asset_spec
Returns a representation of this asset as an
AssetSpec
.If this is a multi-asset, the “key” argument allows selecting which asset to return the spec for.
Parameters: key (Optional[AssetKey]) – If this is a multi-asset, select which asset to return its AssetSpec. If not a multi-asset, this can be left as None.Returns: AssetSpec
- get_partition_mapping
Returns the partition mapping between keys in this AssetsDefinition and a given input asset key (if any).
- to_source_asset
Returns a representation of this asset as a
SourceAsset
.If this is a multi-asset, the “key” argument allows selecting which asset to return a SourceAsset representation of.
Parameters: key (Optional[Union[str, Sequence[str], AssetKey]]]) – If this is a multi-asset, select which asset to return a SourceAsset representation of. If not a multi-asset, this can be left as None.Returns: SourceAsset
- to_source_assets
Returns a SourceAsset for each asset in this definition.
Each produced SourceAsset will have the same key, metadata, io_manager_key, etc. as the corresponding asset
property
asset_depsMaps assets that are produced by this definition to assets that they depend on. The dependencies can be either “internal”, meaning that they refer to other assets that are produced by this definition, or “external”, meaning that they refer to assets that aren’t produced by this definition.
property
can_subsetIf True, indicates that this AssetsDefinition may materialize any subset of its asset keys in a given computation (as opposed to being required to materialize all asset keys).
Type: bool
property
check_specsReturns the asset check specs defined on this AssetsDefinition, i.e. the checks that can be executed while materializing the assets.
Return type: Iterable[AssetsCheckSpec]
property
dependency_keysThe asset keys which are upstream of any asset included in this AssetsDefinition.
Type: Iterable[AssetKey]
property
descriptions_by_keyReturns a mapping from the asset keys in this AssetsDefinition to the descriptions assigned to them. If there is no assigned description for a given AssetKey, it will not be present in this dictionary.
Type: Mapping[AssetKey, str]
property
group_names_by_keyReturns a mapping from the asset keys in this AssetsDefinition to the group names assigned to them. If there is no assigned group name for a given AssetKey, it will not be present in this dictionary.
Type: Mapping[AssetKey, str]
property
keyThe asset key associated with this AssetsDefinition. If this AssetsDefinition has more than one asset key, this will produce an error.
Type: AssetKey
property
keysThe asset keys associated with this AssetsDefinition.
Type: AbstractSet[AssetKey]
property
node_defReturns the OpDefinition or GraphDefinition that is used to materialize the assets in this AssetsDefinition.
Type: NodeDefinition
property
opReturns the OpDefinition that is used to materialize the assets in this AssetsDefinition.
Type: OpDefinition
property
partitions_defThe PartitionsDefinition for this AssetsDefinition (if any).
Type: Optional[PartitionsDefinition]
property
required_resource_keysThe set of keys for resources that must be provided to this AssetsDefinition.
Type: Set[str]
property
resource_defsA mapping from resource name to ResourceDefinition for the resources bound to this AssetsDefinition.
Type: Mapping[str, ResourceDefinition]
class
dagster.AssetKeyObject representing the structure of an asset key. Takes in a sanitized string, list of strings, or tuple of strings.
Example usage:
from dagster import AssetKey
AssetKey("asset1")
AssetKey(["asset1"]) # same as the above
AssetKey(["prefix", "asset1"])
AssetKey(["prefix", "subprefix", "asset1"])Parameters: path (Union[str, Sequence[str]]) – String, list of strings, or tuple of strings. A list of strings represent the hierarchical structure of the asset_key.
- dagster.map_asset_specs
Map a function over a sequence of AssetSpecs or AssetsDefinitions, replacing specs in the sequence or specs in an AssetsDefinitions with the result of the function.
Parameters:
- func (Callable[[AssetSpec], AssetSpec]) – The function to apply to each AssetSpec.
- iterable (Iterable[Union[AssetsDefinition, AssetSpec]]) – The sequence of AssetSpecs or AssetsDefinitions.
Returns: A sequence of AssetSpecs or AssetsDefinitions with the function applied to each spec.
Return type: Sequence[Union[AssetsDefinition, AssetSpec]] Examples:
from dagster import AssetSpec, map_asset_specs
asset_specs = [
AssetSpec(key="my_asset"),
AssetSpec(key="my_asset_2"),
]
mapped_specs = map_asset_specs(lambda spec: spec.replace_attributes(owners=["nelson@hooli.com"]), asset_specs)
Graph-backed asset definitions
Refer to the Graph-backed asset documentation for more information.
- @dagster.graph_asset
Creates a software-defined asset that’s computed using a graph of ops.
This decorator is meant to decorate a function that composes a set of ops or graphs to define the dependencies between them.
Parameters:
-
name (Optional[str]) – The name of the asset. If not provided, defaults to the name of the decorated function. The asset’s name must be a valid name in Dagster (ie only contains letters, numbers, and underscores) and may not contain Python reserved keywords.
-
description (Optional[str]) – A human-readable description of the asset.
-
ins (Optional[Mapping[str, AssetIn]]) – A dictionary that maps input names to information about the input.
-
config (Optional[Union[ConfigMapping], Mapping[str, Any]) –
Describes how the graph underlying the asset is configured at runtime.
If a
ConfigMapping
object is provided, then the graph takes on the config schema of this object. The mapping will be applied at runtime to generate the config for the graph’s constituent nodes.If a dictionary is provided, then it will be used as the default run config for the graph. This means it must conform to the config schema of the underlying nodes. Note that the values provided will be viewable and editable in the Dagster UI, so be careful with secrets.
-
key_prefix (Optional[Union[str, Sequence[str]]]) – If provided, the asset’s key is the concatenation of the key_prefix and the asset’s name, which defaults to the name of the decorated function. Each item in key_prefix must be a valid name in Dagster (ie only contains letters, numbers, and underscores) and may not contain Python reserved keywords.
-
group_name (Optional[str]) – A string name used to organize multiple assets into groups. If not provided, the name “default” is used.
-
partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the asset.
-
metadata (Optional[RawMetadataMapping]) – Dictionary of metadata to be associated with the asset.
-
tags (Optional[Mapping[str, str]]) – Tags for filtering and organizing. These tags are not attached to runs of the asset.
-
owners (Optional[Sequence[str]]) – A list of strings representing owners of the asset. Each string can be a user’s email address, or a team name prefixed with team:, e.g. team:finops.
-
kinds (Optional[Set[str]]) – A list of strings representing the kinds of the asset. These will be made visible in the Dagster UI.
-
automation_condition (Optional[AutomationCondition]) – The AutomationCondition to use for this asset.
-
backfill_policy (Optional[BackfillPolicy]) – The BackfillPolicy to use for this asset.
-
code_version (Optional[str]) – Version of the code that generates this asset. In general, versions should be set only for code that deterministically produces the same output when given the same inputs.
-
key (Optional[CoeercibleToAssetKey]) – The key for this asset. If provided, cannot specify key_prefix or name.
Examples:
@op
def fetch_files_from_slack(context) -> pd.DataFrame:
...
@op
def store_files(files) -> None:
files.to_sql(name="slack_files", con=create_db_connection())
@graph_asset
def slack_files_table():
return store_files(fetch_files_from_slack())-
- @dagster.graph_multi_asset
Create a combined definition of multiple assets that are computed using the same graph of ops, and the same upstream assets.
Each argument to the decorated function references an upstream asset that this asset depends on. The name of the argument designates the name of the upstream asset.
Parameters:
-
name (Optional[str]) – The name of the graph.
-
outs – (Optional[Dict[str, AssetOut]]): The AssetOuts representing the produced assets.
-
ins (Optional[Mapping[str, AssetIn]]) – A dictionary that maps input names to information about the input.
-
partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the assets.
-
backfill_policy (Optional[BackfillPolicy]) – The backfill policy for the asset.
-
group_name (Optional[str]) – A string name used to organize multiple assets into groups. This group name will be applied to all assets produced by this multi_asset.
-
can_subset (bool) – Whether this asset’s computation can emit a subset of the asset keys based on the context.selected_assets argument. Defaults to False.
-
config (Optional[Union[ConfigMapping], Mapping[str, Any]) –
Describes how the graph underlying the asset is configured at runtime.
If a
ConfigMapping
object is provided, then the graph takes on the config schema of this object. The mapping will be applied at runtime to generate the config for the graph’s constituent nodes.If a dictionary is provided, then it will be used as the default run config for the graph. This means it must conform to the config schema of the underlying nodes. Note that the values provided will be viewable and editable in the Dagster UI, so be careful with secrets.
If no value is provided, then the config schema for the graph is the default (derived
-
Multi-asset definitions
Refer to the Multi-asset documentation for more information.
- @dagster.multi_asset
Create a combined definition of multiple assets that are computed using the same op and same upstream assets.
Each argument to the decorated function references an upstream asset that this asset depends on. The name of the argument designates the name of the upstream asset.
You can set I/O managers keys, auto-materialize policies, freshness policies, group names, etc. on an individual asset within the multi-asset by attaching them to the
AssetOut
corresponding to that asset in the outs parameter.Parameters:
- name (Optional[str]) – The name of the op.
- outs – (Optional[Dict[str, AssetOut]]): The AssetOuts representing the assets materialized by this function. AssetOuts detail the output, IO management, and core asset properties. This argument is required except when AssetSpecs are used.
- ins (Optional[Mapping[str, AssetIn]]) – A dictionary that maps input names to information about the input.
- deps (Optional[Sequence[Union[AssetsDefinition, SourceAsset, AssetKey, str]]]) – The assets that are upstream dependencies, but do not correspond to a parameter of the decorated function. If the AssetsDefinition for a multi_asset is provided, dependencies on all assets created by the multi_asset will be created.
- config_schema (Optional[ConfigSchema) – The configuration schema for the asset’s underlying op. If set, Dagster will check that config provided for the op matches this schema and fail if it does not. If not set, Dagster will accept any config provided for the op.
- required_resource_keys (Optional[Set[str]]) – Set of resource handles required by the underlying op.
- internal_asset_deps (Optional[Mapping[str, Set[AssetKey]]]) – By default, it is assumed that all assets produced by a multi_asset depend on all assets that are consumed by that multi asset. If this default is not correct, you pass in a map of output names to a corrected set of AssetKeys that they depend on. Any AssetKeys in this list must be either used as input to the asset or produced within the op.
- partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the assets.
- backfill_policy (Optional[BackfillPolicy]) – The backfill policy for the op that computes the asset.
- op_tags (Optional[Dict[str, Any]]) – A dictionary of tags for the op that computes the asset. Frameworks may expect and require certain metadata to be attached to a op. Values that are not strings will be json encoded and must meet the criteria that json.loads(json.dumps(value)) == value.
- can_subset (bool) – If this asset’s computation can emit a subset of the asset keys based on the context.selected_asset_keys argument. Defaults to False.
- resource_defs (Optional[Mapping[str, object]]) – beta (Beta) A mapping of resource keys to resources. These resources will be initialized during execution, and can be accessed from the context within the body of the function.
- group_name (Optional[str]) – A string name used to organize multiple assets into groups. This group name will be applied to all assets produced by this multi_asset.
- retry_policy (Optional[RetryPolicy]) – The retry policy for the op that computes the asset.
- code_version (Optional[str]) – Version of the code encapsulated by the multi-asset. If set, this is used as a default code version for all defined assets.
- specs (Optional[Sequence[AssetSpec]]) – The specifications for the assets materialized by this function.
- check_specs (Optional[Sequence[AssetCheckSpec]]) – Specs for asset checks that execute in the decorated function after materializing the assets.
- pool (Optional[str]) – A string that identifies the concurrency pool that governs this multi-asset’s execution.
- non_argument_deps (Optional[Union[Set[AssetKey], Set[str]]]) – deprecated Deprecated, use deps instead. Set of asset keys that are upstream dependencies, but do not pass an input to the multi_asset.
Examples:
@multi_asset(
specs=[
AssetSpec("asset1", deps=["asset0"]),
AssetSpec("asset2", deps=["asset0"]),
]
)
def my_function():
asset0_value = load(path="asset0")
asset1_result, asset2_result = do_some_transformation(asset0_value)
write(asset1_result, path="asset1")
write(asset2_result, path="asset2")
# Or use IO managers to handle I/O:
@multi_asset(
outs={
"asset1": AssetOut(),
"asset2": AssetOut(),
}
)
def my_function(asset0):
asset1_value = do_some_transformation(asset0)
asset2_value = do_some_other_transformation(asset0)
return asset1_value, asset2_value
- @dagster.multi_observable_source_asset
- beta
This API is currently in beta, and may have breaking changes in minor version releases, with behavior changes in patch releases.
Defines a set of assets that can be observed together with the same function.
Parameters:
- name (Optional[str]) – The name of the op.
- required_resource_keys (Optional[Set[str]]) – Set of resource handles required by the underlying op.
- partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the assets.
- can_subset (bool) – If this asset’s computation can emit a subset of the asset keys based on the context.selected_assets argument. Defaults to False.
- resource_defs (Optional[Mapping[str, object]]) – beta (Beta) A mapping of resource keys to resources. These resources will be initialized during execution, and can be accessed from the context within the body of the function.
- group_name (Optional[str]) – A string name used to organize multiple assets into groups. This group name will be applied to all assets produced by this multi_asset.
- specs (Optional[Sequence[AssetSpec]]) – The specifications for the assets observed by this function.
- check_specs (Optional[Sequence[AssetCheckSpec]]) – Specs for asset checks that execute in the decorated function after observing the assets.
Examples:
@multi_observable_source_asset(
specs=[AssetSpec("asset1"), AssetSpec("asset2")],
)
def my_function():
yield ObserveResult(asset_key="asset1", metadata={"foo": "bar"})
yield ObserveResult(asset_key="asset2", metadata={"baz": "qux"})
class
dagster.AssetOutDefines one of the assets produced by a
@multi_asset
.Parameters:
- key_prefix (Optional[Union[str, Sequence[str]]]) – If provided, the asset’s key is the concatenation of the key_prefix and the asset’s name. When using
@multi_asset
, the asset name defaults to the key of the “outs” dictionary Only one of the “key_prefix” and “key” arguments should be provided. - key (Optional[Union[str, Sequence[str], AssetKey]]) – The asset’s key. Only one of the “key_prefix” and “key” arguments should be provided.
- dagster_type (Optional[Union[Type, DagsterType]]]) – The type of this output. Should only be set if the correct type can not be inferred directly from the type signature of the decorated function.
- description (Optional[str]) – Human-readable description of the output.
- is_required (bool) – Whether the presence of this field is required. (default: True)
- io_manager_key (Optional[str]) – The resource key of the IO manager used for this output. (default: “io_manager”).
- metadata (Optional[Dict[str, Any]]) – A dict of the metadata for the output. For example, users can provide a file path if the data object will be stored in a filesystem, or provide information of a database table when it is going to load the data into the table.
- group_name (Optional[str]) – A string name used to organize multiple assets into groups. If not provided, the name “default” is used.
- code_version (Optional[str]) – The version of the code that generates this asset.
- freshness_policy (Optional[FreshnessPolicy]) – deprecated (Deprecated) A policy which indicates how up to date this asset is intended to be.
- automation_condition (Optional[AutomationCondition]) – AutomationCondition to apply to the specified asset.
- backfill_policy (Optional[BackfillPolicy]) – BackfillPolicy to apply to the specified asset.
- owners (Optional[Sequence[str]]) – A list of strings representing owners of the asset. Each string can be a user’s email address, or a team name prefixed with team:, e.g. team:finops.
- tags (Optional[Mapping[str, str]]) – Tags for filtering and organizing. These tags are not attached to runs of the asset.
static
from_specBuilds an AssetOut from the passed spec.
Parameters:
- spec (AssetSpec) – The spec to build the AssetOut from.
- dagster_type (Optional[Union[Type, DagsterType]]) – The type of this output. Should only be set if the correct type can not be inferred directly from the type signature of the decorated function.
- is_required (bool) – Whether the presence of this field is required. (default: True)
- io_manager_key (Optional[str]) – The resource key of the IO manager used for this output. (default: “io_manager”).
- backfill_policy (Optional[BackfillPolicy]) – BackfillPolicy to apply to the specified asset.
Returns: The AssetOut built from the spec.Return type: AssetOut
- key_prefix (Optional[Union[str, Sequence[str]]]) – If provided, the asset’s key is the concatenation of the key_prefix and the asset’s name. When using
Source assets
Refer to the External asset dependencies documentation for more information.
class
dagster.SourceAsset- deprecated
This API will be removed in version 2.0.0. Use AssetSpec instead. If using the SourceAsset io_manager_key property, use AssetSpec(...).with_io_manager_key(...)..
A SourceAsset represents an asset that will be loaded by (but not updated by) Dagster.
Parameters:
- key (Union[AssetKey, Sequence[str], str]) – The key of the asset.
- metadata (Mapping[str, MetadataValue]) – Metadata associated with the asset.
- io_manager_key (Optional[str]) – The key for the IOManager that will be used to load the contents of the asset when it’s used as an input to other assets inside a job.
- io_manager_def (Optional[IOManagerDefinition]) – beta (Beta) The definition of the IOManager that will be used to load the contents of the asset when it’s used as an input to other assets inside a job.
- resource_defs (Optional[Mapping[str, ResourceDefinition]]) – beta (Beta) resource definitions that may be required by the
dagster.IOManagerDefinition
provided in the io_manager_def argument. - description (Optional[str]) – The description of the asset.
- partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the asset.
- observe_fn (Optional[SourceAssetObserveFunction])
- op_tags (Optional[Dict[str, Any]]) – A dictionary of tags for the op that computes the asset. Frameworks may expect and require certain metadata to be attached to a op. Values that are not strings will be json encoded and must meet the criteria that json.loads(json.dumps(value)) == value.
- auto_observe_interval_minutes (Optional[float]) – While the asset daemon is turned on, a run of the observation function for this asset will be launched at this interval. observe_fn must be provided.
- freshness_policy (FreshnessPolicy) – deprecated A constraint telling Dagster how often this asset is intended to be updated with respect to its root data.
- tags (Optional[Mapping[str, str]]) – Tags for filtering and organizing. These tags are not attached to runs of the asset.
property
is_observableWhether the asset is observable.
Type: bool
property
opThe OpDefinition associated with the observation function of an observable source asset.
Throws an error if the asset is not observable.
Type: OpDefinition
- @dagster.observable_source_asset
- beta
This API is currently in beta, and may have breaking changes in minor version releases, with behavior changes in patch releases.
Create a SourceAsset with an associated observation function.
The observation function of a source asset is wrapped inside of an op and can be executed as part of a job. Each execution generates an AssetObservation event associated with the source asset. The source asset observation function should return a
DataVersion
, a ~dagster.DataVersionsByPartition, or anObserveResult
.Parameters:
- name (Optional[str]) – The name of the source asset. If not provided, defaults to the name of the decorated function. The asset’s name must be a valid name in dagster (ie only contains letters, numbers, and _) and may not contain python reserved keywords.
- key_prefix (Optional[Union[str, Sequence[str]]]) – If provided, the source asset’s key is the concatenation of the key_prefix and the asset’s name, which defaults to the name of the decorated function. Each item in key_prefix must be a valid name in dagster (ie only contains letters, numbers, and _) and may not contain python reserved keywords.
- metadata (Mapping[str, RawMetadataValue]) – Metadata associated with the asset.
- io_manager_key (Optional[str]) – The key for the IOManager that will be used to load the contents of the source asset when it’s used as an input to other assets inside a job.
- io_manager_def (Optional[IOManagerDefinition]) – beta (Beta) The definition of the IOManager that will be used to load the contents of the source asset when it’s used as an input to other assets inside a job.
- description (Optional[str]) – The description of the asset.
- group_name (Optional[str]) – A string name used to organize multiple assets into groups. If not provided, the name “default” is used.
- required_resource_keys (Optional[Set[str]]) – Set of resource keys required by the observe op.
- resource_defs (Optional[Mapping[str, ResourceDefinition]]) – beta (Beta) resource definitions that may be required by the
dagster.IOManagerDefinition
provided in the io_manager_def argument. - partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the asset.
- op_tags (Optional[Dict[str, Any]]) – A dictionary of tags for the op that computes the asset. Frameworks may expect and require certain metadata to be attached to a op. Values that are not strings will be json encoded and must meet the criteria that json.loads(json.dumps(value)) == value.
- tags (Optional[Mapping[str, str]]) – Tags for filtering and organizing. These tags are not attached to runs of the asset.
- observe_fn (Optional[SourceAssetObserveFunction]) – Observation function for the source asset.
- automation_condition (Optional[AutomationCondition]) – A condition describing when Dagster should materialize this asset.
class
dagster.ObserveResultAn object representing a successful observation of an asset. These can be returned from an @observable_source_asset decorated function to pass metadata.
Parameters:
- asset_key (Optional[AssetKey]) – The asset key. Optional to include.
- metadata (Optional[RawMetadataMapping]) – Metadata to record with the corresponding AssetObservation event.
- check_results (Optional[Sequence[AssetCheckResult]]) – Check results to record with the corresponding AssetObservation event.
- data_version (Optional[DataVersion]) – The data version of the asset that was observed.
- tags (Optional[Mapping[str, str]]) – Tags to record with the corresponding AssetObservation event.
Dependencies
class
dagster.AssetDepSpecifies a dependency on an upstream asset.
Parameters:
- asset (Union[AssetKey, str, AssetSpec, AssetsDefinition, SourceAsset]) – The upstream asset to depend on.
- partition_mapping (Optional[PartitionMapping]) – Defines what partitions to depend on in the upstream asset. If not provided and the upstream asset is partitioned, defaults to the default partition mapping for the partitions definition, which is typically maps partition keys to the same partition keys in upstream assets.
Examples:
upstream_asset = AssetSpec("upstream_asset")
downstream_asset = AssetSpec(
"downstream_asset",
deps=[
AssetDep(
upstream_asset,
partition_mapping=TimeWindowPartitionMapping(start_offset=-1, end_offset=-1)
)
]
)
class
dagster.AssetInDefines an asset dependency.
Parameters:
- key_prefix (Optional[Union[str, Sequence[str]]]) – If provided, the asset’s key is the concatenation of the key_prefix and the input name. Only one of the “key_prefix” and “key” arguments should be provided.
- key (Optional[Union[str, Sequence[str], AssetKey]]) – The asset’s key. Only one of the “key_prefix” and “key” arguments should be provided.
- metadata (Optional[Dict[str, Any]]) – A dict of the metadata for the input. For example, if you only need a subset of columns from an upstream table, you could include that in metadata and the IO manager that loads the upstream table could use the metadata to determine which columns to load.
- partition_mapping (Optional[PartitionMapping]) – Defines what partitions to depend on in the upstream asset. If not provided, defaults to the default partition mapping for the partitions definition, which is typically maps partition keys to the same partition keys in upstream assets.
- dagster_type (DagsterType) – Allows specifying type validation functions that will be executed on the input of the decorated function before it runs.
Asset jobs
Asset jobs enable the automation of asset materializations. Dagster’s asset selection syntax can be used to select assets and assign them to a job.
- dagster.define_asset_job
Creates a definition of a job which will either materialize a selection of assets or observe a selection of source assets. This will only be resolved to a JobDefinition once placed in a code location.
Parameters:
-
name (str) – The name for the job.
-
selection (Union[str, Sequence[str], Sequence[AssetKey], Sequence[Union[AssetsDefinition, SourceAsset]], AssetSelection]) –
The assets that will be materialized or observed when the job is run.
The selected assets must all be included in the assets that are passed to the assets argument of the Definitions object that this job is included on.
The string “my_asset*” selects my_asset and all downstream assets within the code location. A list of strings represents the union of all assets selected by strings within the list.
-
config –
Describes how the Job is parameterized at runtime.
If no value is provided, then the schema for the job’s run config is a standard format based on its ops and resources.
If a dictionary is provided, then it must conform to the standard config schema, and it will be used as the job’s run config for the job whenever the job is executed. The values provided will be viewable and editable in the Dagster UI, so be careful with secrets.
-
tags (Optional[Mapping[str, object]]) – A set of key-value tags that annotate the job and can be used for searching and filtering in the UI. Values that are not already strings will be serialized as JSON. If run_tags is not set, then the content of tags will also be automatically appended to the tags of any runs of this job.
-
run_tags (Optional[Mapping[str, object]]) – A set of key-value tags that will be automatically attached to runs launched by this job. Values that are not already strings will be serialized as JSON. These tag values may be overwritten by tag values provided at invocation time. If run_tags is set, then tags are not automatically appended to the tags of any runs of this job.
-
metadata (Optional[Mapping[str, RawMetadataValue]]) – Arbitrary metadata about the job. Keys are displayed string labels, and values are one of the following: string, float, int, JSON-serializable dict, JSON-serializable list, and one of the data classes returned by a MetadataValue static method.
-
description (Optional[str]) – A description for the Job.
-
executor_def (Optional[ExecutorDefinition]) – How this Job will be executed. Defaults to
multi_or_in_process_executor
, which can be switched between multi-process and in-process modes of execution. The default mode of execution is multi-process. -
op_retry_policy (Optional[RetryPolicy]) – The default retry policy for all ops that compute assets in this job. Only used if retry policy is not defined on the asset definition.
-
partitions_def (Optional[PartitionsDefinition]) – deprecated (Deprecated) Defines the set of partitions for this job. Deprecated because partitioning is inferred from the selected assets, so setting this is redundant.
Returns: The job, which can be placed inside a code location.Return type: UnresolvedAssetJobDefinition Examples:
# A job that targets all assets in the code location:
@asset
def asset1():
...
defs = Definitions(
assets=[asset1],
jobs=[define_asset_job("all_assets")],
)
# A job that targets a single asset
@asset
def asset1():
...
defs = Definitions(
assets=[asset1],
jobs=[define_asset_job("all_assets", selection=[asset1])],
)
# A job that targets all the assets in a group:
defs = Definitions(
assets=assets,
jobs=[define_asset_job("marketing_job", selection=AssetSelection.groups("marketing"))],
)
@observable_source_asset
def source_asset():
...
# A job that observes a source asset:
defs = Definitions(
assets=assets,
jobs=[define_asset_job("observation_job", selection=[source_asset])],
)
# Resources are supplied to the assets, not the job:
@asset(required_resource_keys={"slack_client"})
def asset1():
...
defs = Definitions(
assets=[asset1],
jobs=[define_asset_job("all_assets")],
resources={"slack_client": prod_slack_client},
)-
class
dagster.AssetSelectionAn AssetSelection defines a query over a set of assets and asset checks, normally all that are defined in a code location.
You can use the “|”, “&