roboto.domain.topics.parquet.table_transforms
Module Contents
- roboto.domain.topics.parquet.table_transforms.compute_time_filter_mask(timestamps, start_time=None, end_time=None)
Compute a boolean mask indicating which rows fall within the specified time range. Returns None if no time filtering is needed (both start_time and end_time are None).
- Parameters:
timestamps (pyarrow.Array)
start_time (Optional[int])
end_time (Optional[int])
- Return type:
Optional[pyarrow.BooleanArray]
- roboto.domain.topics.parquet.table_transforms.extract_timestamp_field(schema, timestamp_message_path)
Aggregate the timestamp information for timestamp_message_path, resolved against schema, into a helper object used by time-based data operations.
- Parameters:
schema (pyarrow.Schema)
timestamp_message_path (roboto.domain.topics.record.MessagePathRecord)
- Return type:
- roboto.domain.topics.parquet.table_transforms.extract_timestamps(table, timestamp)
Extract timestamps in nanoseconds since Unix epoch from the table’s timestamp column.
- Parameters:
table (pyarrow.Table)
timestamp (roboto.domain.topics.parquet.timestamp.Timestamp)
- Return type:
pyarrow.Int64Array
- roboto.domain.topics.parquet.table_transforms.resolve_columns(schema, message_paths)
Build a deduplicated list of column names that are safe to pass to read_row_group(columns=...).
Children of list-type columns are replaced by their list ancestor’s column name, because PyArrow’s prefix-based nested column selection does not work through list wrapper nodes in the physical Parquet schema. Selecting the parent list column already returns its full nested structure.
This is important because the message-path-to-representation mappings returned by the server contain only leaf message paths. For a column like points: list<struct<x, y>>, only points.x and points.y appear in the mapping; the parent points record is absent. This function derives the correct parent column name from the child’s path_in_schema.
Children of struct-type columns are preserved because PyArrow can resolve them via dot-separated prefix matching (e.g. "position.x" selects the x child of the position struct).
- Parameters:
schema (pyarrow.Schema)
message_paths (collections.abc.Iterable[roboto.domain.topics.record.MessagePathRecord])
- Return type:
list[str]
- roboto.domain.topics.parquet.table_transforms.should_read_row_group(row_group_metadata, timestamp, start_time=None, end_time=None)
Determine whether a Parquet row group contains data within the requested time range. Used to skip requesting column chunks from row groups that hold no relevant data.
- Parameters:
row_group_metadata (pyarrow.parquet.RowGroupMetaData)
timestamp (roboto.domain.topics.parquet.timestamp.Timestamp)
start_time (Optional[int])
end_time (Optional[int])
- Return type:
bool