roboto.domain.datasets#
Submodules#
Package Contents#
- class roboto.domain.datasets.BeginManifestTransactionRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload to begin a manifest-based transaction
- Parameters:
data (Any)
- origination: str#
- resource_manifest: dict[str, int]#
- class roboto.domain.datasets.BeginManifestTransactionResponse(/, **data)#
Bases:
pydantic.BaseModel
Response to a manifest-based transaction request
- Parameters:
data (Any)
- transaction_id: str#
- upload_mappings: dict[str, str]#
- class roboto.domain.datasets.BeginSingleFileUploadRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload to begin a single file upload
- Parameters:
data (Any)
- file_path: str = None#
- file_size: int = None#
- origination: str | None = None#
- class roboto.domain.datasets.BeginSingleFileUploadResponse(/, **data)#
Bases:
pydantic.BaseModel
Response to a single file upload
- Parameters:
data (Any)
- upload_id: str#
- upload_url: str#
- class roboto.domain.datasets.CreateDatasetIfNotExistsRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload to create a dataset only if no existing dataset matches a query.
Combines a CreateDatasetRequest with a RoboQL match query: if the query matches an existing dataset, that dataset is returned; otherwise a new dataset is created from create_request. Used by Dataset.create_if_not_exists().
- Parameters:
data (Any)
- create_request: CreateDatasetRequest#
- match_roboql_query: str#
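Example
A minimal construction sketch, assuming the two fields above are passed as keyword arguments (standard pydantic behavior); the name and query are illustrative:
>>> from roboto.domain.datasets import (
...     CreateDatasetIfNotExistsRequest,
...     CreateDatasetRequest,
... )
>>> request = CreateDatasetIfNotExistsRequest(
...     create_request=CreateDatasetRequest(name="Vehicle 001 Test Session"),
...     match_roboql_query="dataset.metadata.vehicle_id = 'vehicle_001'",
... )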
- class roboto.domain.datasets.CreateDatasetRequest(**data)#
Bases:
pydantic.BaseModel
Request payload for creating a new dataset.
Used to specify the initial properties of a dataset during creation, including optional metadata, tags, name, and description.
- description: str | None = None#
Optional human-readable description of the dataset.
- metadata: dict[str, Any] = None#
Key-value metadata pairs to associate with the dataset for discovery and search.
- name: str | None = None#
Optional short name for the dataset (max 120 characters).
- tags: list[str] = None#
List of tags for dataset discovery and organization.
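Example
A minimal construction sketch using the fields documented above (the values are illustrative):
>>> from roboto.domain.datasets import CreateDatasetRequest
>>> request = CreateDatasetRequest(
...     name="Highway Test Session",
...     description="Autonomous vehicle highway driving test data",
...     tags=["highway", "autonomous", "test"],
...     metadata={"vehicle_id": "vehicle_001"},
... )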
- class roboto.domain.datasets.CreateDirectoryRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload to create a directory in a dataset
- Parameters:
data (Any)
- create_intermediate_dirs: bool = False#
If True, creates intermediate directories in the path if they don’t exist. If False, requires all parent directories to already exist.
- error_if_exists: bool = False#
- name: str#
- origination: str | None = None#
- parent_path: str | None = None#
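Example
A sketch of a request that creates path/to/new_dir, assuming keyword construction (the names and paths are illustrative):
>>> from roboto.domain.datasets import CreateDirectoryRequest
>>> request = CreateDirectoryRequest(
...     name="new_dir",
...     parent_path="path/to",
...     create_intermediate_dirs=True,
... )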
- class roboto.domain.datasets.Dataset(record, roboto_client=None)#
Represents a dataset within the Roboto platform.
A dataset is a logical container for files organized in a directory structure. Datasets are the primary organizational unit in Roboto, typically containing files from a single robot activity such as a drone flight, autonomous vehicle mission, or sensor data collection session. However, datasets are versatile enough to serve as a general-purpose assembly of files.
Datasets provide functionality for:
File upload and download operations
Metadata and tag management
File organization and directory operations
Topic data access and analysis
AI-powered content summarization
Integration with automated workflows and triggers
Files within a dataset can be processed by actions, visualized in the web interface, and searched using the query system. Datasets inherit access permissions from their organization and can be shared with other users and systems.
The Dataset class serves as the primary interface for dataset operations in the Roboto SDK, providing methods for file management, metadata operations, and content analysis.
- Parameters:
roboto_client (Optional[roboto.http.RobotoClient])
- UPLOAD_REPORTING_BATCH_COUNT: ClassVar[int] = 10#
Number of batches to break a large upload into for the purpose of reporting progress.
- UPLOAD_REPORTING_MIN_BATCH_SIZE: ClassVar[int] = 10#
Minimum number of files that must be uploaded before reporting progress.
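Example
A typical end-to-end flow, sketched from the methods documented below (the dataset name, paths, tags, and patterns are illustrative):
>>> import pathlib
>>> from roboto.domain.datasets import Dataset
>>> dataset = Dataset.create(name="Highway Test Session")
>>> dataset.upload_directory(pathlib.Path("/path/to/local/data"))
>>> dataset.put_tags(["highway", "test"])
>>> for file in dataset.list_files(include_patterns=["**/*.bag"]):
...     print(file.relative_path)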
- classmethod create(description=None, metadata=None, name=None, tags=None, caller_org_id=None, roboto_client=None)#
Create a new dataset in the Roboto platform.
Creates a new dataset with the specified properties and returns a Dataset instance for interacting with it. The dataset will be created in the caller’s organization unless a different organization is specified.
- Parameters:
description (Optional[str]) – Optional human-readable description of the dataset.
metadata (Optional[dict[str, Any]]) – Optional key-value metadata pairs to associate with the dataset.
name (Optional[str]) – Optional short name for the dataset (max 120 characters).
tags (Optional[list[str]]) – Optional list of tags for dataset discovery and organization.
caller_org_id (Optional[str]) – Organization ID to create the dataset in. Required for multi-org users.
roboto_client (Optional[roboto.http.RobotoClient]) – HTTP client for API communication. If None, uses the default client.
- Returns:
Dataset instance representing the newly created dataset.
- Raises:
RobotoInvalidRequestException – Invalid dataset parameters.
RobotoUnauthorizedException – Caller lacks permission to create datasets.
- Return type:
Dataset
Examples
>>> dataset = Dataset.create(
...     name="Highway Test Session",
...     description="Autonomous vehicle highway driving test data",
...     tags=["highway", "autonomous", "test"],
...     metadata={"vehicle_id": "vehicle_001", "test_type": "highway"}
... )
>>> print(dataset.dataset_id)
ds_abc123

>>> # Create minimal dataset
>>> dataset = Dataset.create()
>>> print(f"Created dataset: {dataset.dataset_id}")
- create_directory(name, error_if_exists=False, create_intermediate_dirs=False, parent_path=None, origination=None)#
Create a directory within the dataset.
- Parameters:
name (str) – Name of the directory to create.
error_if_exists (bool) – If True, raises an exception if the directory already exists.
parent_path (Optional[pathlib.Path]) – Path of the parent directory. If None, creates the directory in the root of the dataset.
origination (Optional[str]) – Optional string describing the source or context of the directory creation.
create_intermediate_dirs (bool) – If True, creates intermediate directories in the path if they don’t exist. If False, requires all parent directories to already exist.
- Raises:
RobotoConflictException – If the directory already exists and error_if_exists is True.
RobotoUnauthorizedException – If the caller lacks permission to create the directory.
RobotoInvalidRequestException – If the directory name is invalid or the parent path does not exist (when create_intermediate_dirs is False).
- Returns:
DirectoryRecord of the created directory.
- Return type:
roboto.domain.files.DirectoryRecord
Examples
Create a simple directory:
>>> from roboto.domain import datasets
>>> dataset = datasets.Dataset.from_id(...)
>>> directory = dataset.create_directory("foo")
>>> print(directory.relative_path)
foo
Create a directory with intermediate directories:
>>> directory = dataset.create_directory(
...     name="final",
...     parent_path="path/to/deep",
...     create_intermediate_dirs=True
... )
>>> print(directory.relative_path)
path/to/deep/final
- classmethod create_if_not_exists(match_roboql_query, description=None, metadata=None, name=None, tags=None, caller_org_id=None, roboto_client=None)#
Create a dataset if no existing dataset matches the specified query.
Searches for existing datasets using the provided RoboQL query. If a matching dataset is found, returns that dataset. If no match is found, creates a new dataset with the specified properties and returns it.
- Parameters:
match_roboql_query (str) – RoboQL query string to search for existing datasets. If this query matches any dataset, that dataset will be returned instead of creating a new one.
description (Optional[str]) – Optional human-readable description of the dataset.
metadata (Optional[dict[str, Any]]) – Optional key-value metadata pairs to associate with the dataset.
name (Optional[str]) – Optional short name for the dataset (max 120 characters).
tags (Optional[list[str]]) – Optional list of tags for dataset discovery and organization.
caller_org_id (Optional[str]) – Organization ID to create the dataset in. Required for multi-org users.
roboto_client (Optional[roboto.http.RobotoClient]) – HTTP client for API communication. If None, uses the default client.
- Returns:
Dataset instance representing either the existing matched dataset or the newly created dataset.
- Raises:
RobotoInvalidRequestException – Invalid dataset parameters or malformed RoboQL query.
RobotoUnauthorizedException – Caller lacks permission to create datasets or search existing ones.
- Return type:
Dataset
Examples
Create a dataset only if no dataset with specific metadata exists:
>>> dataset = Dataset.create_if_not_exists(
...     match_roboql_query="dataset.metadata.vehicle_id = 'vehicle_001'",
...     name="Vehicle 001 Test Session",
...     description="Test data for vehicle 001",
...     metadata={"vehicle_id": "vehicle_001", "test_type": "highway"},
...     tags=["vehicle_001", "highway"]
... )
>>> print(dataset.dataset_id)
ds_abc123
Create a dataset only if no dataset with specific tags exists:
>>> dataset = Dataset.create_if_not_exists(
...     match_roboql_query="dataset.tags CONTAINS 'unique_session_id_xyz'",
...     name="Unique Test Session",
...     tags=["unique_session_id_xyz", "test"]
... )
>>> # If a dataset with tag 'unique_session_id_xyz' already exists,
>>> # that dataset is returned instead of creating a new one
- property created: datetime.datetime#
Timestamp when this dataset was created.
Returns the UTC datetime when this dataset was first created in the Roboto platform. This property is immutable.
- Return type:
datetime.datetime
- property created_by: str#
Identifier of the user who created this dataset.
Returns the identifier of the person or service which originally created this dataset in the Roboto platform.
- Return type:
str
- property dataset_id: str#
Unique identifier for this dataset.
Returns the globally unique identifier assigned to this dataset when it was created. This ID is immutable and used to reference the dataset across the Roboto platform. It is always prefixed with ‘ds_’ to distinguish it from other Roboto resource IDs.
- Return type:
str
- delete()#
Delete this dataset from the Roboto platform.
Permanently removes the dataset and all its associated files, metadata, and topics. This operation cannot be undone.
If a dataset’s files are hosted in Roboto managed S3 buckets or customer read/write bring-your-own-buckets, the files in this dataset will be deleted from S3 as well. For files hosted in customer read-only buckets, the files will not be deleted from S3, but the dataset record and all associated metadata will be deleted.
- Raises:
RobotoNotFoundException – Dataset does not exist or has already been deleted.
RobotoUnauthorizedException – Caller lacks permission to delete the dataset.
- Return type:
None
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> dataset.delete()  # Dataset and all its files are now permanently deleted
- delete_files(include_patterns=None, exclude_patterns=None)#
Delete files from this dataset based on pattern matching.
Deletes files that match the specified include patterns while excluding those that match exclude patterns. Uses gitignore-style pattern matching for flexible file selection.
- Parameters:
include_patterns (Optional[list[str]]) – List of gitignore-style patterns for files to include. If None, all files are considered for deletion.
exclude_patterns (Optional[list[str]]) – List of gitignore-style patterns for files to exclude from deletion. Takes precedence over include patterns.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to delete files.
- Return type:
None
Notes
Pattern matching follows gitignore syntax. See https://git-scm.com/docs/gitignore for detailed pattern format documentation.
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> # Delete all PNG files except those in back_camera directory
>>> dataset.delete_files(
...     include_patterns=["**/*.png"],
...     exclude_patterns=["**/back_camera/**"]
... )

>>> # Delete all log files
>>> dataset.delete_files(include_patterns=["**/*.log"])
- property description: str | None#
Human-readable description of this dataset.
Returns the optional description text that provides details about the dataset’s contents, purpose, or context. Can be None if no description was provided.
- Return type:
Optional[str]
- download_files(out_path, include_patterns=None, exclude_patterns=None)#
Download files from this dataset to a local directory.
Downloads files that match the specified patterns to the given local directory. The directory structure from the dataset is preserved in the download location. If the output directory doesn’t exist, it will be created.
- Parameters:
out_path (pathlib.Path) – Local directory path where files should be downloaded.
include_patterns (Optional[list[str]]) – List of gitignore-style patterns for files to include. If None, all files are downloaded.
exclude_patterns (Optional[list[str]]) – List of gitignore-style patterns for files to exclude from download. Takes precedence over include patterns.
- Returns:
List of tuples containing (FileRecord, local_path) for each downloaded file.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to download files.
- Return type:
list[tuple[roboto.domain.files.FileRecord, pathlib.Path]]
Notes
Pattern matching follows gitignore syntax. See https://git-scm.com/docs/gitignore for detailed pattern format documentation.
Examples
>>> import pathlib
>>> dataset = Dataset.from_id("ds_abc123")
>>> downloaded = dataset.download_files(
...     pathlib.Path("/tmp/dataset_download"),
...     include_patterns=["**/*.bag"],
...     exclude_patterns=["**/test/**"]
... )
>>> print(f"Downloaded {len(downloaded)} files")
Downloaded 5 files

>>> # Download all files
>>> all_files = dataset.download_files(pathlib.Path("/tmp/all_files"))
- classmethod from_id(dataset_id, roboto_client=None)#
Create a Dataset instance from a dataset ID.
Retrieves dataset information from the Roboto platform using the provided dataset ID and returns a Dataset instance for interacting with it.
- Parameters:
dataset_id (str) – Unique identifier for the dataset.
roboto_client (Optional[roboto.http.RobotoClient]) – HTTP client for API communication. If None, uses the default client.
- Returns:
Dataset instance representing the requested dataset.
- Raises:
RobotoNotFoundException – Dataset with the given ID does not exist.
RobotoUnauthorizedException – Caller lacks permission to access the dataset.
- Return type:
Dataset
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> print(dataset.name)
'Highway Test Session'
>>> print(len(list(dataset.list_files())))
42
- generate_summary()#
Generate a new AI-powered summary of this dataset.
Creates a new AI-generated summary that analyzes the dataset’s contents, including files, metadata, and topics. If a summary already exists, it will be overwritten. The results are persisted and can be retrieved later with get_summary().
- Returns:
AISummary object containing the generated summary text and creation timestamp.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to generate summaries.
RobotoInvalidRequestException – Dataset is not suitable for summarization.
- Return type:
AISummary
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> summary = dataset.generate_summary()
>>> print(summary.text)
This dataset contains autonomous vehicle sensor data from a highway driving session, including camera images, LiDAR point clouds, and GPS coordinates collected over a 30-minute period.
>>> print(summary.created)
2024-01-15 10:30:00+00:00
- get_file_by_path(relative_path, version_id=None)#
Get a File instance for a file at the specified path in this dataset.
Retrieves a file by its relative path within the dataset. Optionally retrieves a specific version of the file.
- Parameters:
relative_path (Union[str, pathlib.Path]) – Path of the file relative to the dataset root.
version_id (Optional[int]) – Specific version of the file to retrieve. If None, gets the latest version.
- Returns:
File instance representing the file at the specified path.
- Raises:
RobotoNotFoundException – File at the given path does not exist in the dataset.
RobotoUnauthorizedException – Caller lacks permission to access the file.
- Return type:
roboto.domain.files.File
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> file = dataset.get_file_by_path("logs/session1.bag")
>>> print(file.file_id)
file_xyz789

>>> # Get specific version
>>> old_file = dataset.get_file_by_path("data/sensors.csv", version_id=1)
>>> print(old_file.version)
1
- get_summary()#
Get the latest AI-generated summary of this dataset.
Retrieves the most recent AI-generated summary for this dataset. If no summary exists, one will be automatically generated (equivalent to calling generate_summary()).
Once a summary is generated, it persists and is returned by this method until generate_summary() is explicitly called again. The summary does not automatically update when the dataset or its files change.
- Returns:
AISummary object containing the summary text and creation timestamp.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to access summaries.
- Return type:
AISummary
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> summary = dataset.get_summary()
>>> print(summary.text)
This dataset contains autonomous vehicle sensor data from a highway driving session, including camera images, LiDAR point clouds, and GPS coordinates collected over a 30-minute period.

>>> # Summary is cached - subsequent calls return the same summary
>>> cached_summary = dataset.get_summary()
>>> assert summary.created == cached_summary.created
- get_summary_sync(timeout=60, poll_interval=2)#
Poll the summary endpoint until a summary’s status is COMPLETED, or raise an exception if the status is FAILED or the configurable timeout is reached.
This method will call get_summary() repeatedly until the summary reaches a terminal status. If no summary exists when this method is called, one will be generated automatically.
- Parameters:
timeout (float) – The maximum amount of time, in seconds, to wait for the summary to complete. Defaults to 1 minute.
poll_interval (roboto.waiters.Interval) – The amount of time, in seconds, to wait between polling iterations. Defaults to 2 seconds.
- Returns:
An AISummary object containing a full LLM summary of the dataset.
- Raises:
RobotoFailedToGenerateException – If the summary status becomes FAILED.
TimeoutError – If the timeout is reached before the summary completes.
- Return type:
AISummary
Example
>>> from roboto import Dataset
>>> dataset = Dataset.from_id("ds_abc123")
>>> summary = dataset.get_summary_sync(timeout=60)
>>> print(summary.text)
This dataset contains ...
- get_topics(include=None, exclude=None)#
Get all topics associated with files in this dataset, with optional filtering.
Retrieves all topics that were extracted from files in this dataset during ingestion. If multiple files have topics with the same name (e.g., chunked files with the same schema), they are returned as separate topic objects.
Topics can be filtered by name using include/exclude patterns. Topics specified on both the inclusion and exclusion lists will be excluded.
- Parameters:
include (Optional[collections.abc.Sequence[str]]) – If provided, only topics with names in this sequence are yielded.
exclude (Optional[collections.abc.Sequence[str]]) – If provided, topics with names in this sequence are skipped. Takes precedence over include list.
- Yields:
Topic instances associated with files in this dataset, filtered according to the parameters.
- Return type:
collections.abc.Generator[roboto.domain.topics.Topic, None, None]
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> for topic in dataset.get_topics():
...     print(f"Topic: {topic.name}")
Topic: /camera/image
Topic: /imu/data
Topic: /gps/fix

>>> # Only get camera topics
>>> camera_topics = list(dataset.get_topics(include=["/camera/image", "/camera/info"]))
>>> print(f"Found {len(camera_topics)} camera topics")

>>> # Exclude diagnostic topics
>>> data_topics = list(dataset.get_topics(exclude=["/diagnostics"]))
- get_topics_by_file(relative_path)#
Get all topics associated with a specific file in this dataset.
Retrieves all topics that were extracted from the specified file during ingestion. This is a convenience method that combines file lookup and topic retrieval.
- Parameters:
relative_path (Union[str, pathlib.Path]) – Path of the file relative to the dataset root.
- Yields:
Topic instances associated with the specified file.
- Raises:
RobotoNotFoundException – File at the given path does not exist in the dataset.
RobotoUnauthorizedException – Caller lacks permission to access the file or its topics.
- Return type:
collections.abc.Generator[roboto.domain.topics.Topic, None, None]
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> for topic in dataset.get_topics_by_file("logs/session1.bag"):
...     print(f"Topic: {topic.name}")
Topic: /camera/image
Topic: /imu/data
Topic: /gps/fix
- list_directories()#
List the directories in this dataset, yielding a DirectoryRecord for each one.
- Return type:
collections.abc.Generator[roboto.domain.files.DirectoryRecord, None, None]
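Example
A short sketch of iterating the yielded DirectoryRecord objects (the dataset ID is illustrative; relative_path is shown on DirectoryRecord in the create_directory examples above):
>>> dataset = Dataset.from_id("ds_abc123")
>>> for directory in dataset.list_directories():
...     print(directory.relative_path)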
- list_files(include_patterns=None, exclude_patterns=None)#
List files in this dataset with optional pattern-based filtering.
Returns all files in the dataset that match the specified include patterns while excluding those that match exclude patterns. Uses gitignore-style pattern matching for flexible file selection.
- Parameters:
include_patterns (Optional[list[str]]) – List of gitignore-style patterns for files to include. If None, all files are considered.
exclude_patterns (Optional[list[str]]) – List of gitignore-style patterns for files to exclude. Takes precedence over include patterns.
- Yields:
File instances that match the specified patterns.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to list files.
- Return type:
collections.abc.Generator[roboto.domain.files.File, None, None]
Notes
Pattern matching follows gitignore syntax. See https://git-scm.com/docs/gitignore for detailed pattern format documentation.
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> for file in dataset.list_files():
...     print(file.relative_path)
logs/session1.bag
data/sensors.csv
images/camera_001.jpg

>>> # List only image files, excluding back camera
>>> for file in dataset.list_files(
...     include_patterns=["**/*.png", "**/*.jpg"],
...     exclude_patterns=["**/back_camera/**"]
... ):
...     print(file.relative_path)
images/front_camera_001.jpg
images/side_camera_001.jpg
- property metadata: dict[str, Any]#
Custom metadata associated with this dataset.
Returns a copy of the dataset’s metadata dictionary containing arbitrary key-value pairs for storing custom information. Supports nested structures and dot notation for accessing nested fields.
- Return type:
dict[str, Any]
- property modified: datetime.datetime#
Timestamp when this dataset was last modified.
Returns the UTC datetime when this dataset was most recently updated. This includes changes to metadata, tags, description, or other properties.
- Return type:
datetime.datetime
- property modified_by: str#
Identifier of the user or service which last modified this dataset.
Returns the identifier of the person or service which most recently updated this dataset’s metadata, tags, description, or other properties.
- Return type:
str
- property name: str | None#
Human-readable name of this dataset.
Returns the optional display name for this dataset. Can be None if no name was provided during creation. For users whose organizations have their own idiomatic internal dataset IDs, it’s recommended to set the name to the organization’s internal dataset ID, since the Roboto dataset_id property is randomly generated.
- Return type:
Optional[str]
- property org_id: str#
Organization identifier that owns this dataset.
Returns the unique identifier of the organization that owns and has primary access control over this dataset.
- Return type:
str
- put_metadata(metadata)#
Add or update metadata fields for this dataset.
Sets each key-value pair in the provided dictionary as dataset metadata. If a key doesn’t exist, it will be created. If it exists, the value will be overwritten. Keys must be strings and dot notation is supported for nested keys.
- Parameters:
metadata (dict[str, Any]) – Dictionary of metadata key-value pairs to add or update.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to update the dataset.
- Return type:
None
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> dataset.put_metadata({
...     "vehicle_id": "vehicle_001",
...     "test_type": "highway_driving",
...     "weather.condition": "sunny",
...     "weather.temperature": 25
... })
>>> print(dataset.metadata["vehicle_id"])
'vehicle_001'
>>> print(dataset.metadata["weather"]["condition"])
'sunny'
- put_tags(tags)#
Add or update tags for this dataset.
Adds each tag in the provided sequence to the dataset. Tags that already exist are not duplicated.
- Parameters:
tags (roboto.updates.StrSequence) – Sequence of tag strings to set on the dataset.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to update the dataset.
- Return type:
None
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> dataset.put_tags(["highway", "autonomous", "test", "sunny"])
>>> print(dataset.tags)
['highway', 'autonomous', 'test', 'sunny']
- classmethod query(spec=None, roboto_client=None, owner_org_id=None)#
Query datasets using a specification with filters and pagination.
Searches for datasets matching the provided query specification. Results are returned as a generator that automatically handles pagination, yielding Dataset instances as they are retrieved from the API.
- Parameters:
spec (Optional[roboto.query.QuerySpecification]) – Query specification with filters, sorting, and pagination options. If None, returns all accessible datasets.
roboto_client (Optional[roboto.http.RobotoClient]) – HTTP client for API communication. If None, uses the default client.
owner_org_id (Optional[str]) – Organization ID to scope the query. If None, uses caller’s org.
- Yields:
Dataset instances matching the query specification.
- Raises:
ValueError – Query specification references unknown dataset attributes.
RobotoUnauthorizedException – Caller lacks permission to query datasets.
- Return type:
collections.abc.Generator[Dataset, None, None]
Examples
>>> from roboto.query import Comparator, Condition, QuerySpecification
>>> spec = QuerySpecification(
...     condition=Condition(
...         field="name",
...         comparator=Comparator.Contains,
...         value="Roboto"
...     ))
>>> for dataset in Dataset.query(spec):
...     print(f"Found dataset: {dataset.name}")
Found dataset: Roboto Test
Found dataset: Other Roboto Test
- property record: roboto.domain.datasets.record.DatasetRecord#
Underlying data record for this dataset.
Returns the raw DatasetRecord that contains all the dataset’s data fields. This provides access to the complete dataset state as stored in the platform.
- Return type:
roboto.domain.datasets.record.DatasetRecord
- refresh()#
Refresh this dataset instance with the latest data from the platform.
Fetches the current state of the dataset from the Roboto platform and updates this instance’s data. Useful when the dataset may have been modified by other processes or users.
- Returns:
This Dataset instance with refreshed data.
- Raises:
RobotoNotFoundException – Dataset no longer exists.
RobotoUnauthorizedException – Caller lacks permission to access the dataset.
- Return type:
Dataset
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> # Dataset may have been updated by another process
>>> refreshed_dataset = dataset.refresh()
>>> print(f"Current file count: {len(list(refreshed_dataset.list_files()))}")
- remove_metadata(metadata)#
Remove each key in this sequence from dataset metadata if it exists. Keys must be strings. Dot notation is supported for nested keys.
Example
>>> from roboto.domain import datasets
>>> dataset = datasets.Dataset(...)
>>> dataset.remove_metadata(["foo", "baz.qux"])
- Parameters:
metadata (roboto.updates.StrSequence)
- Return type:
None
- remove_tags(tags)#
Remove each tag in this sequence from the dataset if it exists.
- Parameters:
tags (roboto.updates.StrSequence)
- Return type:
None
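Example
A sketch mirroring the put_tags example above (the dataset ID and tags are illustrative):
>>> dataset = Dataset.from_id("ds_abc123")
>>> dataset.remove_tags(["test", "sunny"])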
- rename_directory(old_path, new_path)#
Rename a directory within this dataset, changing its path and the paths of all files it contains.
- Parameters:
old_path (str)
new_path (str)
- Return type:
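Example
A sketch assuming both paths are given relative to the dataset root (the paths are illustrative):
>>> dataset = Dataset.from_id("ds_abc123")
>>> dataset.rename_directory("logs/raw", "logs/raw_2024")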
- property tags: list[str]#
List of tags associated with this dataset.
Returns a copy of the list of string tags that have been applied to this dataset for categorization and filtering purposes.
- Return type:
list[str]
- to_association()#
- Return type:
- to_dict()#
Convert this dataset to a dictionary representation.
Returns the dataset’s data as a JSON-serializable dictionary containing all dataset attributes and metadata.
- Returns:
Dictionary representation of the dataset data.
- Return type:
dict[str, Any]
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> dataset_dict = dataset.to_dict()
>>> print(dataset_dict["name"])
'Highway Test Session'
>>> print(dataset_dict["metadata"])
{'vehicle_id': 'vehicle_001', 'test_type': 'highway'}
- update(conditions=None, description=None, metadata_changeset=None, name=None)#
Update this dataset’s properties.
Updates various properties of the dataset including name, description, and metadata. Only specified parameters are updated; others remain unchanged. Optionally supports conditional updates based on current field values.
- Parameters:
conditions (Optional[list[roboto.updates.UpdateCondition]]) – Optional list of conditions that must be met for the update to proceed.
description (Optional[str]) – New description for the dataset.
metadata_changeset (Optional[roboto.updates.MetadataChangeset]) – Metadata changes to apply (add, update, or remove fields/tags).
name (Optional[str]) – New name for the dataset.
- Returns:
Updated Dataset instance with the new properties.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to update the dataset.
RobotoConditionalUpdateFailedException – Update conditions were not met.
- Return type:
Dataset
Examples
>>> dataset = Dataset.from_id("ds_abc123")
>>> updated_dataset = dataset.update(
...     name="Updated Highway Test Session",
...     description="Updated description with more details"
... )
>>> print(updated_dataset.name)
'Updated Highway Test Session'

>>> # Update with metadata changes
>>> from roboto.updates import MetadataChangeset
>>> changeset = MetadataChangeset(put_fields={"processed": True})
>>> updated_dataset = dataset.update(metadata_changeset=changeset)
- upload_directory(directory_path, include_patterns=None, exclude_patterns=None, delete_after_upload=False, max_batch_size=MAX_FILES_PER_MANIFEST, print_progress=True)#
Uploads all files and directories recursively from the specified directory path. You can use include_patterns and exclude_patterns to control what files and directories are uploaded, and can use delete_after_upload to clean up your local filesystem after the uploads succeed.
Example
>>> import pathlib
>>> from roboto import Dataset
>>> dataset = Dataset(...)
>>> dataset.upload_directory(
...     pathlib.Path("/path/to/directory"),
...     exclude_patterns=[
...         "__pycache__/",
...         "*.pyc",
...         "node_modules/",
...         "**/*.log",
...     ],
... )
Notes
Both include_patterns and exclude_patterns follow the ‘gitignore’ pattern format described in https://git-scm.com/docs/gitignore#_pattern_format.
If both include_patterns and exclude_patterns are provided, files matching exclude_patterns will be excluded even if they match include_patterns.
- Parameters:
directory_path (pathlib.Path)
include_patterns (Optional[list[str]])
exclude_patterns (Optional[list[str]])
delete_after_upload (bool)
max_batch_size (int)
print_progress (bool)
- Return type:
None
- upload_file(file_path, file_destination_path=None, print_progress=True)#
Upload a single file to the dataset. If file_destination_path is not provided, the file will be uploaded to the top-level of the dataset.
Example
>>> import pathlib
>>> from roboto.domain import datasets
>>> dataset = datasets.Dataset(...)
>>> dataset.upload_file(
...     pathlib.Path("/path/to/file.txt"),
...     file_destination_path="foo/bar.txt",
... )
- Parameters:
file_path (pathlib.Path)
file_destination_path (Optional[str])
print_progress (bool)
- Return type:
None
- upload_files(files, file_destination_paths={}, max_batch_size=MAX_FILES_PER_MANIFEST, print_progress=True)#
Upload multiple files to the dataset. If file_destination_paths is not provided, files will be uploaded to the top-level of the dataset.
Example
>>> import pathlib
>>> from roboto.domain import datasets
>>> dataset = datasets.Dataset(...)
>>> dataset.upload_files(
...     [
...         pathlib.Path("/path/to/file.txt"),
...         ...
...     ],
...     file_destination_paths={
...         pathlib.Path("/path/to/file.txt"): "foo/bar.txt",
...     },
... )
- Parameters:
files (collections.abc.Iterable[pathlib.Path])
file_destination_paths (collections.abc.Mapping[pathlib.Path, str])
max_batch_size (int)
print_progress (bool)
- class roboto.domain.datasets.DatasetRecord(/, **data)#
Bases:
pydantic.BaseModel
Wire-transmissible representation of a dataset in the Roboto platform.
DatasetRecord contains all the metadata and properties associated with a dataset, including its identification, timestamps, metadata, tags, and organizational information. This is the data structure used for API communication and persistence.
DatasetRecord instances are typically created by the platform during dataset creation operations and are updated as datasets are modified. The Dataset domain class wraps DatasetRecord to provide a more convenient interface for dataset operations.
The record includes audit information (created/modified timestamps and users), organizational context, and user-defined metadata and tags for discovery and organization purposes.
- Parameters:
data (Any)
- administrator: str = 'Roboto'#
Deprecated field maintained for backwards compatibility. Always defaults to ‘Roboto’.
- created: datetime.datetime#
Timestamp when this dataset was created in the Roboto platform.
- created_by: str#
User ID or service account that created this dataset.
- dataset_id: str#
Unique identifier for this dataset within the Roboto platform.
- description: str | None = None#
Human-readable description of the dataset’s contents and purpose.
- device_id: str | None = None#
Optional identifier of the device that generated this dataset’s data.
- metadata: dict[str, Any] = None#
User-defined key-value pairs for storing additional dataset information.
- modified: datetime.datetime#
Timestamp when this dataset was last modified.
- modified_by: str#
User ID or service account that last modified this dataset.
- name: str | None = None#
A short name for this dataset. This may be an org-specific unique ID that’s more meaningful than the dataset_id, or a short summary of the dataset’s contents. If provided, must be 120 characters or less.
- org_id: str#
Organization ID that owns this dataset.
- roboto_record_version: int = 0#
Internal version number for this record, automatically incremented on updates.
- storage_ctx: dict[str, Any] = None#
Deprecated storage context field maintained for backwards compatibility with SDK versions prior to 0.10.0.
- storage_location: str = 'S3'#
Deprecated storage location field maintained for backwards compatibility. Always defaults to ‘S3’.
- tags: list[str] = None#
List of tags for categorizing and discovering this dataset.
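Example
Records are typically obtained from the Dataset.record property rather than constructed directly; a sketch of reading a few of the fields documented above (the dataset ID is illustrative):
>>> record = Dataset.from_id("ds_abc123").record
>>> print(record.dataset_id)
ds_abc123
>>> print(record.created_by, record.org_id)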
- class roboto.domain.datasets.DeleteDirectoriesRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload for deleting directories within a dataset.
Used to remove entire directory structures and all contained files from a dataset. This is a bulk operation that affects multiple files.
- Parameters:
data (Any)
- directory_paths: list[str]#
List of directory paths to delete from the dataset.
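Example
A minimal construction sketch (the directory paths are illustrative):
>>> from roboto.domain.datasets import DeleteDirectoriesRequest
>>> request = DeleteDirectoriesRequest(directory_paths=["logs/old_runs", "tmp"])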
- class roboto.domain.datasets.QueryDatasetFilesRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload for querying files within a dataset.
Used to retrieve files from a dataset with optional pattern-based filtering and pagination support. Supports gitignore-style patterns for flexible file selection.
- Parameters:
data (Any)
- exclude_patterns: list[str] | None = None#
List of gitignore-style patterns for files to exclude from results.
- include_patterns: list[str] | None = None#
List of gitignore-style patterns for files to include in results.
- page_token: str | None = None#
Token for retrieving the next page of results in paginated queries.
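Example
A sketch that selects .bag files while skipping test directories, assuming keyword construction (the patterns are illustrative):
>>> from roboto.domain.datasets import QueryDatasetFilesRequest
>>> request = QueryDatasetFilesRequest(
...     include_patterns=["**/*.bag"],
...     exclude_patterns=["**/test/**"],
... )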
- class roboto.domain.datasets.QueryDatasetsRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload for querying datasets with filters.
Used to search for datasets based on various criteria such as metadata, tags, and other dataset properties. The filters are applied server-side to efficiently return matching datasets.
- Parameters:
data (Any)
- filters: dict[str, Any] = None#
Dictionary of filter criteria to apply when searching for datasets.
- model_config#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class roboto.domain.datasets.RenameDirectoryRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload for renaming a directory within a dataset.
Used to change the path of a directory and all its contained files within a dataset. This updates the logical organization without moving actual file content.
- Parameters:
data (Any)
- new_path: str#
New path for the directory.
- old_path: str#
Current path of the directory to rename.
- class roboto.domain.datasets.ReportTransactionProgressRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload for reporting file upload transaction progress.
Used to notify the platform about the completion status of individual files within a batch upload transaction. This enables progress tracking and partial completion handling for large file uploads.
- Parameters:
data (Any)
- manifest_items: list[str]#
List of manifest item identifiers that have completed upload.
- class roboto.domain.datasets.UpdateDatasetRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload for updating dataset properties.
Used to modify dataset metadata, description, name, and other properties. Supports conditional updates based on current field values to prevent conflicting concurrent modifications.
- Parameters:
data (Any)
- conditions: list[roboto.updates.UpdateCondition] | None = None#
Optional list of conditions that must be met for the update to proceed.
- description: str | None = None#
New description for the dataset.
- metadata_changeset: roboto.updates.MetadataChangeset | None = None#
Metadata changes to apply (add, update, or remove fields/tags).
- name: str | None = None#
New name for the dataset (max 120 characters).
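Example
A sketch combining a rename with a metadata change, using MetadataChangeset as shown in Dataset.update above (the values are illustrative):
>>> from roboto.domain.datasets import UpdateDatasetRequest
>>> from roboto.updates import MetadataChangeset
>>> request = UpdateDatasetRequest(
...     name="Updated Highway Test Session",
...     metadata_changeset=MetadataChangeset(put_fields={"processed": True}),
... )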