roboto.domain.files#
Package Contents#
- class roboto.domain.files.AbortTransactionsRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload for aborting file upload transactions.
Used to cancel ongoing file upload transactions, typically when uploads fail or are no longer needed. This cleans up any reserved resources and marks associated files as no longer pending.
- Parameters:
data (Any)
- transaction_ids: list[str]#
List of transaction IDs to abort.
- type roboto.domain.files.CredentialProvider = Callable[[], S3Credentials]#
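A CredentialProvider is just a zero-argument callable that returns fresh S3Credentials each time it is invoked. A minimal caching sketch of that contract, with a hypothetical dict shape and refresh margin standing in for the real S3Credentials model:

```python
from datetime import datetime, timedelta, timezone

# Illustrative sketch only: the dict fields and the 5-minute refresh margin
# are assumptions for demonstration, not the SDK's actual implementation.
def make_credential_provider(fetch_credentials):
    cache = {}

    def provider():
        expiry = cache.get("expiration")
        near_expiry = expiry is None or expiry <= datetime.now(timezone.utc) + timedelta(minutes=5)
        if near_expiry:
            # Refresh when nothing is cached or the credentials are near expiry.
            cache.clear()
            cache.update(fetch_credentials())
        return dict(cache)

    return provider
```

Wrapping the fetch in a closure like this keeps repeated calls cheap while still honoring expiration.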
- class roboto.domain.files.DatasetCredentials(/, **data)#
Bases:
pydantic.BaseModel
Handles credentials for dataset file access.
- Parameters:
data (Any)
- access_key_id: str#
- bucket: str#
- expiration: datetime.datetime#
- is_expired()#
- Return type:
bool
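One plausible shape for an expiry check like is_expired(), shown as a stdlib-only sketch rather than the SDK's actual code; the safety margin is an assumption that lets callers refresh slightly before the hard deadline:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sketch: compare the credential expiration against the current
# UTC time, minus a small margin to avoid using credentials that are about
# to lapse mid-request.
def credentials_expired(expiration: datetime, margin: timedelta = timedelta(minutes=1)) -> bool:
    return datetime.now(timezone.utc) >= expiration - margin
```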
- region: str#
- required_prefix: str#
- secret_access_key: str#
- session_token: str#
- to_dict()#
- Return type:
dict[str, Any]
- to_s3_credentials()#
- Return type:
S3Credentials
- class roboto.domain.files.DeleteFileRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload for deleting a file from the platform.
This request is used internally by the platform to delete files and their associated data. The file is identified by its storage URI.
- Parameters:
data (Any)
- uri: str#
Storage URI of the file to delete (e.g., ‘s3://bucket/path/to/file.bag’).
- class roboto.domain.files.DirectoryContentsPage(/, **data)#
Bases:
pydantic.BaseModel
Response containing the contents of a dataset directory page.
Represents a paginated view of files and subdirectories within a dataset directory. Used when browsing dataset contents hierarchically.
- Parameters:
data (Any)
- directories: collections.abc.Sequence[roboto.domain.files.record.DirectoryRecord]#
Subdirectories contained in this directory page.
- files: collections.abc.Sequence[roboto.domain.files.record.FileRecord]#
Files contained in this directory page.
- next_token: str | None = None#
Token for retrieving the next page of results, if any.
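Consuming pages follows the usual token loop: request a page, process its contents, and repeat with next_token until it comes back None. A sketch with a hypothetical fetch_page callable standing in for whatever API call returns a DirectoryContentsPage-shaped response:

```python
# `fetch_page` is an assumption for illustration, not a real SDK function;
# it returns a dict with "files" and an optional "next_token".
def iter_directory_files(fetch_page):
    token = None
    while True:
        page = fetch_page(next_token=token)
        yield from page["files"]
        token = page.get("next_token")
        if token is None:
            break
```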
- class roboto.domain.files.DirectoryRecord(/, **data)#
Bases:
pydantic.BaseModel
Wire-transmissible representation of a directory within a dataset.
DirectoryRecord represents a logical directory structure within a dataset, containing metadata about the directory’s location and contents. Directories are used to organize files hierarchically within datasets.
Directory records are typically returned when browsing dataset contents or when performing directory-based operations like bulk deletion.
- Parameters:
data (Any)
- association_id: str#
- created: datetime.datetime#
- created_by: str#
- description: str | None = None#
- directory_id: str#
- metadata: dict[str, Any] = None#
- modified: datetime.datetime#
- modified_by: str#
- name: str#
Name of the directory (the final component of the path).
- org_id: str#
- origination: str#
- parent_id: str | None = None#
- relative_path: str#
- status: FileStatus#
- storage_type: FileStorageType#
- tags: list[str] = None#
- upload_id: str#
- class roboto.domain.files.FSType#
Bases:
str, enum.Enum
File system type enum
- Directory = 'directory'#
- File = 'file'#
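Because FSType mixes in str, its members behave like plain strings: they compare equal to their wire values and can be constructed directly from API payload strings. A self-contained recreation of the pattern (FSTypeSketch is a stand-in name, not the SDK class):

```python
import enum

# Recreates the str + Enum pattern used by FSType for illustration.
class FSTypeSketch(str, enum.Enum):
    Directory = "directory"
    File = "file"
```

This makes round-tripping through JSON straightforward, e.g. FSTypeSketch("file") recovers the File member.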
- class roboto.domain.files.File(record, roboto_client=None)#
Represents a file within the Roboto platform.
Files are the fundamental data storage unit in Roboto. They can be uploaded to datasets, imported from external sources, or created as outputs from actions. Once in the platform, files can be tagged with metadata, post-processed by actions, added to collections, visualized in the web interface, and searched using the query system.
Files contain structured data that can be ingested into topics for analysis and visualization. Common file formats include ROS bags, MCAP files, ULOG files, CSV files, and many others. Each file has an associated ingestion status that tracks whether its data has been processed and made available for querying.
Files are versioned entities - each modification creates a new version while preserving the history. Files are associated with datasets and inherit access permissions from their parent dataset.
The File class provides methods for downloading, updating metadata, managing tags, accessing topics, and performing other file operations. It serves as the primary interface for file manipulation in the Roboto SDK.
- Parameters:
record (roboto.domain.files.record.FileRecord)
roboto_client (Optional[roboto.http.RobotoClient])
- static construct_s3_obj_arn(bucket, key, partition='aws')#
Construct an S3 object ARN from bucket and key components.
- Parameters:
bucket (str) – S3 bucket name.
key (str) – S3 object key (path within the bucket).
partition (str) – AWS partition name, defaults to “aws”.
- Returns:
Complete S3 object ARN string.
- Return type:
str
Examples
>>> arn = File.construct_s3_obj_arn("my-bucket", "path/to/file.bag")
>>> print(arn)
'arn:aws:s3:::my-bucket/path/to/file.bag'
- static construct_s3_obj_uri(bucket, key, version=None)#
Construct an S3 object URI from bucket, key, and optional version.
- Parameters:
bucket (str) – S3 bucket name.
key (str) – S3 object key (path within the bucket).
version (Optional[str]) – Optional S3 object version ID.
- Returns:
Complete S3 object URI string.
- Return type:
str
Examples
>>> uri = File.construct_s3_obj_uri("my-bucket", "path/to/file.bag")
>>> print(uri)
's3://my-bucket/path/to/file.bag'
>>> versioned_uri = File.construct_s3_obj_uri("my-bucket", "path/to/file.bag", "abc123")
>>> print(versioned_uri)
's3://my-bucket/path/to/file.bag?versionId=abc123'
- property created: datetime.datetime#
Timestamp when this file was created.
Returns the UTC datetime when this file was first uploaded or created in the Roboto platform. This timestamp is immutable.
- Return type:
datetime.datetime
- property created_by: str#
Identifier of the user who created this file.
Returns the user ID or identifier of the person or service that originally uploaded or created this file in the Roboto platform.
- Return type:
str
- property dataset_id: str#
Identifier of the dataset that contains this file.
Returns the unique identifier of the dataset that this file belongs to. Files are always associated with exactly one dataset.
- Return type:
str
- delete()#
Delete this file from the Roboto platform.
Permanently removes the file and all its associated data, including topics and metadata. This operation cannot be undone.
For files that were imported from customer S3 buckets (read-only BYOB integrations), this method does not delete the file content from S3. It only removes the metadata and references within the Roboto platform.
- Raises:
RobotoNotFoundException – File does not exist or has already been deleted.
RobotoUnauthorizedException – Caller lacks permission to delete the file.
- Return type:
None
Examples
>>> file = File.from_id("file_abc123")
>>> file.delete()  # File is now permanently deleted
- property description: str | None#
Human-readable description of this file.
Returns the optional description text that provides details about the file’s contents, purpose, or context. Can be None if no description was provided.
- Return type:
Optional[str]
- download(local_path, credential_provider=None, progress_monitor_factory=NoopProgressMonitorFactory())#
Download this file to a local path.
Downloads the file content from cloud storage to the specified local path. The parent directories are created automatically if they don’t exist.
- Parameters:
local_path (pathlib.Path) – Local filesystem path where the file should be saved.
credential_provider (Optional[roboto.domain.files.file_creds.CredentialProvider]) – Custom credentials for accessing the file storage. If None, uses default credentials for the file’s dataset.
progress_monitor_factory (roboto.domain.files.progress.ProgressMonitorFactory) – Factory for creating progress monitors to track download progress. Defaults to no progress monitoring.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to download the file.
FileNotFoundError – File content is not available in storage.
Examples
>>> import pathlib
>>> file = File.from_id("file_abc123")
>>> local_path = pathlib.Path("/tmp/downloaded_file.bag")
>>> file.download(local_path)
>>> print(f"Downloaded to {local_path}")
>>> # Download with progress monitoring
>>> from roboto.domain.files.progress import TqdmProgressMonitorFactory
>>> progress_factory = TqdmProgressMonitorFactory()
>>> file.download(local_path, progress_monitor_factory=progress_factory)
- property file_id: str#
Unique identifier for this file.
Returns the globally unique identifier assigned to this file when it was created. This ID is immutable and used to reference the file across the Roboto platform.
- Return type:
str
- classmethod from_id(file_id, version_id=None, roboto_client=None)#
Create a File instance from a file ID.
Retrieves file information from the Roboto platform using the provided file ID and optionally a specific version.
- Parameters:
file_id (str) – Unique identifier for the file.
version_id (Optional[int]) – Specific version of the file to retrieve. If None, gets the latest version.
roboto_client (Optional[roboto.http.RobotoClient]) – HTTP client for API communication. If None, uses the default client.
- Returns:
File instance representing the requested file.
- Raises:
RobotoNotFoundException – File with the given ID does not exist.
RobotoUnauthorizedException – Caller lacks permission to access the file.
- Return type:
File
Examples
>>> file = File.from_id("file_abc123")
>>> print(file.relative_path)
'data/sensor_logs.bag'
>>> old_version = File.from_id("file_abc123", version_id=1)
>>> print(old_version.version)
1
- classmethod from_path_and_dataset_id(file_path, dataset_id, version_id=None, roboto_client=None)#
Create a File instance from a file path and dataset ID.
Retrieves file information using the file’s relative path within a specific dataset. This is useful when you know the file’s location within a dataset but not its file ID.
- Parameters:
file_path (Union[str, pathlib.Path]) – Relative path of the file within the dataset.
dataset_id (str) – ID of the dataset containing the file.
version_id (Optional[int]) – Specific version of the file to retrieve. If None, gets the latest version.
roboto_client (Optional[roboto.http.RobotoClient]) – HTTP client for API communication. If None, uses the default client.
- Returns:
File instance representing the requested file.
- Raises:
RobotoNotFoundException – File at the given path does not exist in the dataset.
RobotoUnauthorizedException – Caller lacks permission to access the file or dataset.
- Return type:
File
Examples
>>> file = File.from_path_and_dataset_id("logs/session1.bag", "ds_abc123")
>>> print(file.file_id)
'file_xyz789'
>>> file = File.from_path_and_dataset_id(pathlib.Path("data/sensors.csv"), "ds_abc123")
>>> print(file.relative_path)
'data/sensors.csv'
- static generate_s3_client(credential_provider, tcp_keepalive=True)#
Generate a configured S3 client using Roboto credentials.
Creates an S3 client with refreshable credentials obtained from the provided credential provider. The client is configured with the appropriate region and connection settings.
- Parameters:
credential_provider (roboto.domain.files.file_creds.CredentialProvider) – Function that returns AWS credentials for S3 access.
tcp_keepalive (bool) – Whether to enable TCP keepalive for the S3 connection.
- Returns:
Configured boto3 S3 client instance.
Examples
>>> from roboto.domain.files.file_creds import FileCredentialsHelper
>>> helper = FileCredentialsHelper(roboto_client)
>>> cred_provider = helper.get_dataset_download_creds_provider("ds_123", "bucket")
>>> s3_client = File.generate_s3_client(cred_provider)
- generate_summary()#
Generate a new AI generated summary of this file. If a summary already exists, it will be overwritten. The results of this call are persisted and can be retrieved with get_summary().
- Returns:
An AISummary object containing the summary text and the creation timestamp.
Example
>>> from roboto import File
>>> fl = File.from_id("fl_abc123")
>>> summary = fl.generate_summary()
>>> print(summary.text)
This file contains ...
- Return type:
AISummary
- get_signed_url(override_content_type=None, override_content_disposition=None)#
Generate a signed URL for direct access to this file.
Creates a time-limited URL that allows direct access to the file content without requiring Roboto authentication. Useful for sharing files or integrating with external systems.
- Parameters:
override_content_type (Optional[str]) – Custom MIME type to set in the response headers.
override_content_disposition (Optional[str]) – Custom content disposition header value (e.g., “attachment; filename=myfile.bag”).
- Returns:
Signed URL string that provides temporary access to the file.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to access the file.
- Return type:
str
Examples
>>> file = File.from_id("file_abc123")
>>> url = file.get_signed_url()
>>> print(f"Direct access URL: {url}")
>>> # Force download with custom filename
>>> download_url = file.get_signed_url(
...     override_content_disposition="attachment; filename=data.bag"
... )
- get_summary()#
Get the latest AI generated summary of this file. If no summary exists, one will be generated, equivalent to a call to generate_summary().
After the first summary for a file is generated, it will be persisted and returned by this method until generate_summary() is explicitly called again. This applies even if the file or its topics/metadata change.
- Returns:
An AISummary object containing the summary text and the creation timestamp.
Example
>>> from roboto import File
>>> fl = File.from_id("fl_abc123")
>>> summary = fl.get_summary()
>>> print(summary.text)
This file contains ...
- Return type:
AISummary
- get_summary_sync(timeout=60, poll_interval=2)#
Poll the summary endpoint until a summary’s status is COMPLETED, or raise an exception if the status is FAILED or the configurable timeout is reached.
This method will call get_summary() repeatedly until the summary reaches a terminal status. If no summary exists when this method is called, one will be generated automatically.
- Parameters:
timeout (float) – The maximum amount of time, in seconds, to wait for the summary to complete. Defaults to 1 minute.
poll_interval (roboto.waiters.Interval) – The amount of time, in seconds, to wait between polling iterations. Defaults to 2 seconds.
- Returns:
An AISummary object containing a full LLM summary of the file.
- Raises:
RobotoFailedToGenerateException – If the summary status becomes FAILED.
TimeoutError – If the timeout is reached before the summary completes.
- Return type:
AISummary
Example
>>> from roboto import File
>>> fl = File.from_id("fl_abc123")
>>> summary = fl.get_summary_sync(timeout=60)
>>> print(summary.text)
This file contains ...
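The poll-until-terminal behavior described above can be sketched generically. Here get_status and the status strings are hypothetical stand-ins, not SDK APIs; the loop simply mirrors the timeout and poll-interval semantics documented for get_summary_sync():

```python
import time

# Call `get_status` until it reports a terminal state or the deadline passes.
def poll_until_complete(get_status, timeout=60.0, poll_interval=2.0):
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "COMPLETED":
            return status
        if status == "FAILED":
            raise RuntimeError("summary generation failed")
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError("timed out waiting for summary")
        time.sleep(poll_interval)
```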
- get_topic(topic_name)#
Get a specific topic from this file by name.
Retrieves a topic with the specified name that is associated with this file. Topics contain the structured data extracted from the file during ingestion.
- Parameters:
topic_name (str) – Name of the topic to retrieve (e.g., “/camera/image”, “/imu/data”).
- Returns:
Topic instance for the specified topic name.
- Raises:
RobotoNotFoundException – Topic with the given name does not exist in this file.
RobotoUnauthorizedException – Caller lacks permission to access the topic.
- Return type:
roboto.domain.topics.Topic
Examples
>>> file = File.from_id("file_abc123")
>>> camera_topic = file.get_topic("/camera/image")
>>> print(f"Topic schema: {camera_topic.schema}")
>>> # Access topic data
>>> for record in camera_topic.get_data():
...     print(f"Timestamp: {record['timestamp']}")
- get_topics(include=None, exclude=None)#
Get all topics associated with this file, with optional filtering.
Retrieves all topics that were extracted from this file during ingestion. Topics can be filtered by name using include/exclude patterns.
- Parameters:
include (Optional[collections.abc.Sequence[str]]) – If provided, only topics with names in this sequence are yielded.
exclude (Optional[collections.abc.Sequence[str]]) – If provided, topics with names in this sequence are skipped.
- Yields:
Topic instances associated with this file, filtered according to the parameters.
- Return type:
collections.abc.Generator[roboto.domain.topics.Topic, None, None]
Examples
>>> file = File.from_id("file_abc123")
>>> for topic in file.get_topics():
...     print(f"Topic: {topic.name}")
Topic: /camera/image
Topic: /imu/data
Topic: /gps/fix
>>> # Only get camera topics
>>> camera_topics = list(file.get_topics(include=["/camera/image", "/camera/info"]))
>>> print(f"Found {len(camera_topics)} camera topics")
>>> # Exclude diagnostic topics
>>> data_topics = list(file.get_topics(exclude=["/diagnostics"]))
- classmethod import_batch(requests, roboto_client=None, caller_org_id=None)#
Import files from customer S3 bring-your-own buckets into Roboto datasets.
This is the ingress point for importing data stored in customer-owned S3 buckets that have been registered as read-only bring-your-own bucket (BYOB) integrations with Roboto. Files remain in their original S3 locations while metadata is registered with Roboto for discovery, processing, and analysis.
This method only works with S3 URIs from buckets that have been properly registered as BYOB integrations for your organization. It performs batch operations to efficiently import multiple files in a single API call, reducing overhead and improving performance.
- Parameters:
requests (collections.abc.Sequence[roboto.domain.files.operations.ImportFileRequest]) – Sequence of import requests, each specifying file details and metadata.
roboto_client (Optional[roboto.http.RobotoClient]) – HTTP client for API communication. If None, uses the default client.
caller_org_id (Optional[str]) – Organization ID of the caller. Required for multi-org users.
- Returns:
Sequence of File objects representing the imported files.
- Raises:
RobotoInvalidRequestException – If any URI is not a valid S3 URI, if the batch exceeds 500 items, or if bucket integrations are not properly configured.
RobotoUnauthorizedException – If the caller lacks upload permissions for target datasets or if buckets don’t belong to the caller’s organization.
- Return type:
collections.abc.Sequence[File]
Notes
- Only works with S3 URIs from registered read-only BYOB integrations
- Files are not copied; only metadata is imported into Roboto
- Batch size is limited to 500 items per request
- All S3 buckets must be registered to the caller’s organization
Examples
>>> from roboto.domain.files import ImportFileRequest
>>> requests = [
...     ImportFileRequest(
...         dataset_id="ds_abc123",
...         relative_path="logs/session1.bag",
...         uri="s3://my-bucket/data/session1.bag",
...         size=1024000
...     ),
...     ImportFileRequest(
...         dataset_id="ds_abc123",
...         relative_path="logs/session2.bag",
...         uri="s3://my-bucket/data/session2.bag",
...         size=2048000
...     )
... ]
>>> files = File.import_batch(requests)
>>> print(f"Imported {len(files)} files")
Imported 2 files
- classmethod import_one(dataset_id, relative_path, uri, description=None, tags=None, metadata=None, roboto_client=None)#
Import a single file from an external bucket into a Roboto dataset. This currently only supports AWS S3.
This is a convenience method for importing a single file from customer-owned buckets that have been registered as bring-your-own bucket (BYOB) integrations with Roboto. Unlike import_batch(), this method automatically determines the file size by querying the object store and verifies that the object actually exists before importing, providing additional validation and convenience for single-file operations.
The file remains in its original location while metadata is registered with Roboto for discovery, processing, and analysis. This method currently only works with S3 URIs from buckets that have been properly registered as BYOB integrations for your organization.
- Parameters:
dataset_id (str) – ID of the dataset to import the file into.
relative_path (str) – Path of the file relative to the dataset root (e.g., logs/session1.bag).
uri (str) – URI where the file is located (e.g., s3://my-bucket/path/to/file.bag). Must be from a registered BYOB integration.
description (Optional[str]) – Optional human-readable description of the file.
tags (Optional[list[str]]) – Optional list of tags for file discovery and organization.
metadata (Optional[dict[str, Any]]) – Optional key-value metadata pairs to associate with the file.
roboto_client (Optional[roboto.http.RobotoClient]) – HTTP client for API communication. If None, uses the default client.
- Returns:
File object representing the imported file.
- Raises:
RobotoInvalidRequestException – If the URI is not a valid URI or if the bucket integration is not properly configured.
RobotoNotFoundException – If the specified object does not exist.
RobotoUnauthorizedException – If the caller lacks upload permissions for the target dataset or if the bucket doesn’t belong to the caller’s organization.
- Return type:
File
Notes
- Only works with S3 URIs from registered BYOB integrations
- File size is automatically determined from the object metadata
- The file is not copied; only metadata is imported into Roboto
- For importing multiple files efficiently, use import_batch() instead
Examples
Import a single ROS bag file:
>>> from roboto.domain.files import File
>>> file = File.import_one(
...     dataset_id="ds_abc123",
...     relative_path="logs/session1.bag",
...     uri="s3://my-bucket/data/session1.bag"
... )
>>> print(f"Imported file: {file.relative_path}")
Imported file: logs/session1.bag
Import a file with metadata and tags:
>>> file = File.import_one(
...     dataset_id="ds_abc123",
...     relative_path="sensors/lidar_data.pcd",
...     uri="s3://my-bucket/sensors/lidar_data.pcd",
...     description="LiDAR point cloud from highway test",
...     tags=["lidar", "highway", "test"],
...     metadata={"sensor_type": "Velodyne", "resolution": "high"}
... )
>>> print(f"File size: {file.size} bytes")
- property ingestion_status: roboto.domain.files.record.IngestionStatus#
Current ingestion status of this file.
Returns the status indicating whether this file has been processed and its data extracted into topics. Used to track ingestion pipeline progress.
- Return type:
roboto.domain.files.record.IngestionStatus
- mark_ingested()#
Mark this file as fully ingested and ready for post-processing.
Updates the file’s ingestion status to indicate that all data has been successfully processed and extracted into topics. This enables triggers and other automated workflows that depend on complete ingestion.
- Returns:
Updated File instance with ingestion status set to Ingested.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to update the file.
- Return type:
File
Notes
This method is typically called by ingestion actions after they have successfully processed all data in the file. Once marked as ingested, the file becomes eligible for additional post-processing actions.
Examples
>>> file = File.from_id("file_abc123")
>>> print(file.ingestion_status)
IngestionStatus.NotIngested
>>> updated_file = file.mark_ingested()
>>> print(updated_file.ingestion_status)
IngestionStatus.Ingested
- property metadata: dict[str, Any]#
Custom metadata associated with this file.
Returns the file’s metadata dictionary containing arbitrary key-value pairs for storing custom information. Supports nested structures and dot notation for accessing nested fields.
- Return type:
dict[str, Any]
- property modified: datetime.datetime#
Timestamp when this file was last modified.
Returns the UTC datetime when this file’s metadata, tags, or other properties were most recently updated. The file content itself is immutable, but metadata can be modified.
- Return type:
datetime.datetime
- property modified_by: str#
Identifier of the user who last modified this file.
Returns the user ID or identifier of the person who most recently updated this file’s metadata, tags, or other mutable properties.
- Return type:
str
- property org_id: str#
Organization identifier that owns this file.
Returns the unique identifier of the organization that owns and has primary access control over this file.
- Return type:
str
- put_metadata(metadata)#
Add or update metadata fields for this file.
Adds new metadata fields or updates existing ones. Existing fields not specified in the metadata dict are preserved.
- Parameters:
metadata (dict[str, Any]) – Dictionary of metadata key-value pairs to add or update.
- Returns:
Updated File instance with the new metadata.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to update the file.
- Return type:
File
Examples
>>> file = File.from_id("file_abc123")
>>> updated_file = file.put_metadata({
...     "vehicle_id": "vehicle_001",
...     "session_type": "highway_driving",
...     "weather": "sunny"
... })
>>> print(updated_file.metadata["vehicle_id"])
'vehicle_001'
- put_tags(tags)#
Add or update tags for this file.
Replaces the file’s current tags with the provided list. To add tags while preserving existing ones, retrieve current tags first and combine them.
- Parameters:
tags (list[str]) – List of tag strings to set on the file.
- Returns:
Updated File instance with the new tags.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to update the file.
- Return type:
File
Examples
>>> file = File.from_id("file_abc123")
>>> updated_file = file.put_tags(["sensor-data", "highway", "sunny"])
>>> print(updated_file.tags)
['sensor-data', 'highway', 'sunny']
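Since put_tags() replaces the tag list wholesale, preserving existing tags means merging before the call. An order-preserving merge sketch (merged_tags is a helper introduced here for illustration, not an SDK function):

```python
# Combine existing and new tags, keeping original order and dropping duplicates.
def merged_tags(existing: list[str], new: list[str]) -> list[str]:
    seen = set(existing)
    out = list(existing)
    for tag in new:
        if tag not in seen:
            seen.add(tag)
            out.append(tag)
    return out
```

The merged list would then be passed to put_tags() in place of the raw new tags.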
- classmethod query(spec=None, roboto_client=None, owner_org_id=None)#
Query files using a specification with filters and pagination.
Searches for files matching the provided query specification. Results are returned as a generator that automatically handles pagination, yielding File instances as they are retrieved from the API.
- Parameters:
spec (Optional[roboto.query.QuerySpecification]) – Query specification with filters, sorting, and pagination options. If None, returns all accessible files.
roboto_client (Optional[roboto.http.RobotoClient]) – HTTP client for API communication. If None, uses the default client.
owner_org_id (Optional[str]) – Organization ID to scope the query. If None, uses caller’s org.
- Yields:
File instances matching the query specification.
- Raises:
ValueError – Query specification references unknown file attributes.
RobotoUnauthorizedException – Caller lacks permission to query files.
- Return type:
collections.abc.Generator[File, None, None]
Examples
>>> from roboto.query import Comparator, Condition, QuerySpecification
>>> spec = QuerySpecification(
...     condition=Condition(
...         field="tags",
...         comparator=Comparator.Contains,
...         value="sensor-data"
...     ))
>>> for file in File.query(spec):
...     print(f"Found file: {file.relative_path}")
Found file: logs/sensors_2024_01_01.bag
Found file: logs/sensors_2024_01_02.bag
>>> # Query with metadata filter
>>> spec = QuerySpecification(
...     condition=Condition(
...         field="metadata.vehicle_id",
...         comparator=Comparator.Equals,
...         value="vehicle_001"
...     ))
>>> files = list(File.query(spec))
>>> print(f"Found {len(files)} files for vehicle_001")
- property record: roboto.domain.files.record.FileRecord#
Underlying data record for this file.
Returns the raw FileRecord that contains all the file’s data fields. This provides access to the complete file state as stored in the platform.
- Return type:
roboto.domain.files.record.FileRecord
- refresh()#
Refresh this file instance with the latest data from the platform.
Fetches the current state of the file from the Roboto platform and updates this instance’s data. Useful when the file may have been modified by other processes or users.
- Returns:
This File instance with refreshed data.
- Raises:
RobotoNotFoundException – File no longer exists.
RobotoUnauthorizedException – Caller lacks permission to access the file.
- Return type:
File
Examples
>>> file = File.from_id("file_abc123")
>>> # File may have been updated by another process
>>> refreshed_file = file.refresh()
>>> print(f"Current version: {refreshed_file.version}")
- property relative_path: str#
Path of this file relative to its dataset root.
Returns the file path within the dataset, using forward slashes as separators regardless of the operating system. This path uniquely identifies the file within its dataset.
- Return type:
str
- rename_file(file_id, new_path)#
Rename this file to a new path within its dataset.
Changes the relative path of the file within its dataset. This updates the file’s location identifier but does not move the actual file content.
- Parameters:
file_id (str) – File ID (currently unused, kept for API compatibility).
new_path (str) – New relative path for the file within the dataset.
- Returns:
Updated FileRecord with the new path.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to rename the file.
RobotoInvalidRequestException – New path is invalid or conflicts with existing file.
- Return type:
roboto.domain.files.record.FileRecord
Examples
>>> file = File.from_id("file_abc123")
>>> print(file.relative_path)
'old_logs/session1.bag'
>>> updated_record = file.rename_file("file_abc123", "logs/session1.bag")
>>> print(updated_record.relative_path)
'logs/session1.bag'
- property tags: list[str]#
List of tags associated with this file.
Returns the list of string tags that have been applied to this file for categorization and filtering purposes.
- Return type:
list[str]
- to_association()#
Convert this file to an Association reference.
Creates an Association object that can be used to reference this file in other contexts, such as when creating collections or specifying action inputs.
- Returns:
Association object referencing this file and its current version.
- Return type:
Association
Examples
>>> file = File.from_id("file_abc123")
>>> association = file.to_association()
>>> print(f"Association: {association.association_type}:{association.association_id}")
Association: file:file_abc123
- to_dict()#
Convert this file to a dictionary representation.
Returns the file’s data as a JSON-serializable dictionary containing all file attributes and metadata.
- Returns:
Dictionary representation of the file data.
- Return type:
dict[str, Any]
Examples
>>> file = File.from_id("file_abc123")
>>> file_dict = file.to_dict()
>>> print(file_dict["relative_path"])
'logs/session1.bag'
>>> print(file_dict["metadata"])
{'vehicle_id': 'vehicle_001', 'session_type': 'highway'}
- update(description=NotSet, metadata_changeset=NotSet, ingestion_complete=NotSet)#
Update this file’s properties.
Updates various properties of the file including description, metadata, and ingestion status. Only specified parameters are updated; others remain unchanged.
- Parameters:
description (Optional[Union[str, roboto.sentinels.NotSetType]]) – New description for the file. Use NotSet to leave unchanged.
metadata_changeset (Union[roboto.updates.MetadataChangeset, roboto.sentinels.NotSetType]) – Metadata changes to apply (add, update, or remove fields/tags). Use NotSet to leave metadata unchanged.
ingestion_complete (Union[Literal[True], roboto.sentinels.NotSetType]) – Set to True to mark the file as fully ingested. Use NotSet to leave ingestion status unchanged.
- Returns:
Updated File instance with the new properties.
- Raises:
RobotoUnauthorizedException – Caller lacks permission to update the file.
- Return type:
File
Examples
>>> file = File.from_id("file_abc123")
>>> updated_file = file.update(description="Updated sensor data from highway test")
>>> print(updated_file.description)
'Updated sensor data from highway test'
>>> # Update metadata and mark as ingested
>>> from roboto.updates import MetadataChangeset
>>> changeset = MetadataChangeset(put_fields={"processed": True})
>>> updated_file = file.update(
...     metadata_changeset=changeset,
...     ingestion_complete=True
... )
- property uri: str#
Storage URI for this file’s content.
Returns the storage location URI where the file’s actual content is stored. This is typically an S3 URI or similar cloud storage reference.
- Return type:
str
- property version: int#
Version number of this file.
Returns the version number that increments each time the file’s metadata or properties are updated. The file content itself is immutable, but metadata changes create new versions.
- Return type:
int
- class roboto.domain.files.FileDownloader(roboto_client)#
A utility for downloading Roboto files.
- Parameters:
roboto_client (Optional[roboto.http.roboto_client.RobotoClient])
- download_files(out_path, files)#
Downloads the specified files to the provided local directory.
The files may come from different datasets; the only requirement is that the caller has appropriate file download permissions.
An example use case is downloading files that are results of a search query:
>>> import pathlib
>>> from roboto import RobotoSearch
>>> from roboto.domain.files import FileDownloader
>>>
>>> roboto_search = RobotoSearch()
>>> file_downloader = FileDownloader()
>>>
>>> downloaded = file_downloader.download_files(
...     out_path=pathlib.Path("/dest/path"),
...     files=roboto_search.find_files('tags CONTAINS "CSV"')
... )
>>>
>>> for file, path in downloaded:
...     # Process the file
- Parameters:
out_path (pathlib.Path) – Destination directory for the downloaded files. It is created if it doesn’t exist.
files (collections.abc.Iterable[roboto.domain.files.file.File]) – Files to download.
- Returns:
A list of (File, Path) tuples, relating each provided file to its download path.
- Return type:
list[tuple[roboto.domain.files.file.File, pathlib.Path]]
- class roboto.domain.files.FileRecord(/, **data)#
Bases:
pydantic.BaseModel
Wire-transmissible representation of a file in the Roboto platform.
FileRecord contains all the metadata and properties associated with a file, including its location, status, ingestion state, and user-defined metadata. This is the data structure used for API communication and persistence.
FileRecord instances are typically created by the platform during file import or upload operations, and are updated as files are processed and modified. The File domain class wraps FileRecord to provide a more convenient interface for file operations.
- Parameters:
data (Any)
- association_id: str#
- property bucket: str#
- Return type:
str
- created: datetime.datetime#
- created_by: str = ''#
- description: str | None = None#
- device_id: str | None = None#
- file_id: str#
- ingestion_status: IngestionStatus#
- property key: str#
- Return type:
str
- metadata: dict[str, Any] = None#
- modified: datetime.datetime#
- modified_by: str#
- name: str#
- org_id: str#
- origination: str = ''#
- parent_id: str | None = None#
- relative_path: str#
- size: int#
- status: FileStatus#
- storage_type: FileStorageType#
- tags: list[str] = None#
- upload_id: str = 'NO_ID'#
- uri: str#
- version: int#
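As a wire-transmissible record, a FileRecord round-trips between Python objects and JSON for API communication. The sketch below illustrates that pattern with a minimal stdlib dataclass stand-in covering only a few of the documented fields; the real FileRecord is a pydantic model with many more fields (status, ingestion_status, and so on), and its field names here are taken from the reference above.

```python
import dataclasses
import json

# Hypothetical stand-in for a handful of FileRecord fields.
@dataclasses.dataclass
class FileRecordSketch:
    file_id: str
    relative_path: str
    size: int
    tags: list
    metadata: dict

    def to_wire(self) -> str:
        # Serialize to the JSON shape used for transmission.
        return json.dumps(dataclasses.asdict(self))

    @classmethod
    def from_wire(cls, payload: str) -> "FileRecordSketch":
        # Rebuild the record from its wire representation.
        return cls(**json.loads(payload))

record = FileRecordSketch(
    file_id="file_abc123",
    relative_path="logs/session1.bag",
    size=1024,
    tags=["CSV"],
    metadata={"vehicle_id": "vehicle_001"},
)
round_tripped = FileRecordSketch.from_wire(record.to_wire())
```

The File domain class wraps records like this to provide the convenience methods documented earlier.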
- class roboto.domain.files.FileRecordRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload for upserting a file record.
Used to create or update file metadata records in the platform. This is typically used during file import or metadata update operations.
- Parameters:
data (Any)
- file_id: str#
Unique identifier for the file.
- metadata: dict[str, Any] = None#
Key-value metadata pairs to associate with the file.
- tags: list[str] = None#
List of tags to associate with the file for discovery and organization.
- class roboto.domain.files.FileStatus#
Bases:
str
,enum.Enum
Enumeration of possible file status values in the Roboto platform.
File status tracks the lifecycle state of a file from initial upload through to availability for use. This status is managed automatically by the platform and affects file visibility and accessibility.
The typical file lifecycle is: Reserved → Available → (optionally) Deleted.
- Available = 'available'#
File upload is complete and the file is ready for use.
Files with this status are visible in dataset listings, searchable through the query system, and available for download and processing by actions.
- Deleted = 'deleted'#
File is marked for deletion and is no longer accessible.
Files with this status are not visible in listings and cannot be accessed. This status may be temporary during the deletion process.
- Reserved = 'reserved'#
File upload has been initiated but not yet completed.
Files with this status are not yet available for use and are not visible in dataset listings. This is the initial status when an upload begins.
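The visibility rule described above can be sketched with a minimal stdlib stand-in enum mirroring the documented values; only Available files appear in dataset listings.

```python
import enum

# Stand-in mirroring the documented FileStatus values.
class FileStatus(str, enum.Enum):
    Reserved = "reserved"
    Available = "available"
    Deleted = "deleted"

def is_visible(status: FileStatus) -> bool:
    # Per the docs: Reserved files are not yet listed, Deleted files
    # are no longer accessible; only Available files are visible.
    return status is FileStatus.Available
```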
- class roboto.domain.files.FileStorageType#
Bases:
str
,enum.Enum
Enumeration of file storage types in the Roboto platform.
Storage type indicates how the file was added to the platform and affects access patterns and permissions. This information is used internally for credential management and access control.
- S3Directory = 'directory'#
This node is a virtual directory.
- S3Imported = 'imported'#
File was imported from a read-only customer-managed S3 bucket.
These files remain in the customer’s bucket and are accessed using customer-provided credentials. The customer retains full control over the file storage and access permissions.
- S3Uploaded = 'uploaded'#
File was uploaded to a Roboto-managed or customer read/write bucket.
These files were explicitly uploaded through the Roboto platform to either a Roboto-managed bucket or a customer’s bring-your-own read/write bucket. Access is managed through Roboto’s credential system.
- class roboto.domain.files.FileTag#
Bases:
enum.Enum
Enumeration of system-defined file tag types.
These tags are used internally by the platform for indexing and organizing files. They are automatically applied during file operations and should not be manually modified by users.
- CommonPrefix = 'common_prefix'#
Tag containing the common path prefix for files in a batch operation.
- DatasetId = 'dataset_id'#
Tag containing the ID of the dataset that contains this file.
- OrgId = 'org_id'#
Tag containing the organization ID that owns this file.
- TransactionId = 'transaction_id'#
Tag containing the transaction ID for files uploaded in a batch.
- class roboto.domain.files.ImportFileRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload for importing an existing file into a dataset.
Used to register files that already exist in storage (such as customer S3 buckets) with the Roboto platform. The file content remains in its original location while metadata is stored in Roboto for discovery and processing.
- Parameters:
data (Any)
- dataset_id: str#
ID of the dataset to import the file into.
- description: str | None = None#
Optional human-readable description of the file.
- metadata: dict[str, Any] | None = None#
Optional key-value metadata pairs to associate with the file.
- relative_path: str#
Path of the file relative to the dataset root (e.g., logs/session1.bag).
- size: int | None = None#
Size of the file in bytes. When importing a single file, you can omit the size, as Roboto will look up the size from the object store. When calling import_batch, you must provide the size explicitly.
- tags: list[str] | None = None#
Optional list of tags for file discovery and organization.
- class roboto.domain.files.IngestionStatus#
Bases:
str
,enum.Enum
Enumeration of file ingestion status values in the Roboto platform.
Ingestion status tracks whether a file’s data has been processed and extracted into topics for analysis and visualization. This status determines what platform features are available for the file and whether it can trigger automated workflows.
File ingestion happens as a post-upload processing step. Roboto supports many common robotics log formats (ROS bags, MCAP files, ULOG files, etc.) out-of-the-box. Custom ingestion actions can be written for other formats.
When writing custom ingestion actions, be sure to update the file’s ingestion status to mark it as fully ingested. This enables triggers and other automated workflows that depend on complete ingestion.
Ingested files have first-class visualization support and can be queried through the topic data system.
- Ingested = 'ingested'#
All topics from this file have been fully processed and recorded.
Files with this status have complete topic data available for visualization, analysis, and querying. They are eligible for post-ingestion triggers and automated workflows that depend on complete data extraction.
- NotIngested = 'not_ingested'#
No topics from this file have been processed or recorded.
Files with this status have not undergone data extraction. They cannot be visualized through the topic system and are not eligible for topic-based triggers or analysis workflows.
- PartlyIngested = 'partly_ingested'#
Some but not all topics from this file have been processed.
Files with this status have at least one topic record but ingestion is incomplete. Some visualization and analysis features may be available, but the file is not yet eligible for post-ingestion triggers.
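The trigger-eligibility rule described for these three states can be sketched with a stdlib stand-in enum; a custom ingestion action would move a file from NotIngested through PartlyIngested to Ingested, and only the final state enables post-ingestion triggers.

```python
import enum

# Stand-in mirroring the documented IngestionStatus values.
class IngestionStatus(str, enum.Enum):
    NotIngested = "not_ingested"
    PartlyIngested = "partly_ingested"
    Ingested = "ingested"

def eligible_for_post_ingestion_triggers(status: IngestionStatus) -> bool:
    # Per the docs, only fully ingested files are eligible for
    # post-ingestion triggers and dependent automated workflows.
    return status is IngestionStatus.Ingested
```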
- class roboto.domain.files.QueryFilesRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload for querying files with filters.
Used to search for files based on various criteria such as metadata, tags, ingestion status, and other file properties. The filters are applied server-side to efficiently return matching files.
- Parameters:
data (Any)
- filters: dict[str, Any] = None#
Dictionary of filter criteria to apply when searching for files.
- model_config#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class roboto.domain.files.RenameFileRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload for renaming a file within its dataset.
Changes the relative path of a file within its dataset. This updates the file’s logical location but does not move the actual file content in storage.
- Parameters:
data (Any)
- association_id: str#
ID of the dataset containing the file to rename.
- new_path: str#
New relative path for the file within the dataset.
- class roboto.domain.files.S3Credentials#
Bases:
TypedDict
This interface is driven by botocore.credentials.RefreshableCredentials.
- access_key: str#
- expiry_time: str | None#
- region: str#
- secret_key: str#
- token: str#
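Together with the CredentialProvider alias defined earlier (a zero-argument callable returning S3Credentials), these fields suggest the following stdlib sketch. The values are placeholders, and make_provider is a hypothetical helper for illustration; refreshable-credential machinery in the style of botocore calls the provider again once expiry_time passes.

```python
import datetime
from typing import Callable, Optional, TypedDict

# Stand-in matching the documented S3Credentials fields.
class S3Credentials(TypedDict):
    access_key: str
    secret_key: str
    token: str
    region: str
    expiry_time: Optional[str]

CredentialProvider = Callable[[], S3Credentials]

def make_provider() -> CredentialProvider:
    # Hypothetical factory; a real provider would fetch fresh
    # credentials (e.g., via DatasetCredentials.to_s3_credentials).
    def provider() -> S3Credentials:
        return {
            "access_key": "PLACEHOLDER_KEY_ID",
            "secret_key": "PLACEHOLDER_SECRET",
            "token": "PLACEHOLDER_TOKEN",
            "region": "us-west-2",
            "expiry_time": datetime.datetime(
                2030, 1, 1, tzinfo=datetime.timezone.utc
            ).isoformat(),
        }
    return provider

creds = make_provider()()
```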
- class roboto.domain.files.SignedUrlResponse(/, **data)#
Bases:
pydantic.BaseModel
Response containing a signed URL for direct file access.
Provides a time-limited URL that allows direct access to file content without requiring Roboto authentication. Used for file downloads and integration with external systems.
- Parameters:
data (Any)
- url: str#
Signed URL that provides temporary direct access to the file.
- class roboto.domain.files.UpdateFileRecordRequest(/, **data)#
Bases:
pydantic.BaseModel
Request payload for updating file record properties.
Used to modify file metadata, description, and ingestion status. Only specified fields are updated; others remain unchanged. Uses NotSet sentinel values to distinguish between explicit None values and fields that should not be modified.
- Parameters:
data (Any)
- description: str | roboto.sentinels.NotSetType | None#
New description for the file, or NotSet to leave unchanged.
- ingestion_complete: Literal[True] | roboto.sentinels.NotSetType#
Set to True to mark file as fully ingested, or NotSet to leave unchanged.
- metadata_changeset: roboto.updates.MetadataChangeset | roboto.sentinels.NotSetType#
Metadata changes to apply (add, update, or remove fields/tags), or NotSet to leave unchanged.
- model_config#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
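The NotSet sentinel pattern used by this request can be sketched in plain stdlib Python. NotSetType and build_update below are hypothetical stand-ins, not the roboto.sentinels API; the point is that a distinct sentinel default lets the payload distinguish "leave unchanged" from "explicitly set to None".

```python
import enum
from typing import Union

# Hypothetical sentinel mirroring the role of roboto.sentinels.NotSetType.
class NotSetType(enum.Enum):
    token = 0

NotSet = NotSetType.token

def build_update(description: Union[str, None, NotSetType] = NotSet) -> dict:
    # Only fields the caller explicitly passed end up in the payload;
    # passing None is meaningful (it clears the description).
    payload = {}
    if description is not NotSet:
        payload["description"] = description
    return payload
```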
- roboto.domain.files.is_directory(record)#
- Parameters:
record (FileRecord | DirectoryRecord)
- Return type:
TypeGuard[DirectoryRecord]
- roboto.domain.files.is_file(record)#
- Parameters:
record (FileRecord | DirectoryRecord)
- Return type:
TypeGuard[FileRecord]