roboto.domain.topics.parquet.table_transforms
Module Contents
- roboto.domain.topics.parquet.table_transforms.add_column(table, name, values, position=0)
- Parameters:
table (pyarrow.Table)
name (str)
values (pyarrow.Array)
position (int)
- Return type:
pyarrow.Table
- roboto.domain.topics.parquet.table_transforms.drop_column(table, column_name)
- Parameters:
table (pyarrow.Table)
column_name (str)
- Return type:
pyarrow.Table
- roboto.domain.topics.parquet.table_transforms.enrich_with_logtime_ns(table, log_time_column_name, timestamp)
Add a normalized log_time column, in nanoseconds since the Unix epoch, to simplify time-based filtering and to maintain a consistent interface with other TopicReaders. The column is derived from the topic’s CanonicalDataType.Timestamp-type message path.
- Parameters:
table (pyarrow.Table)
log_time_column_name (str)
timestamp (roboto.domain.topics.parquet.timestamp.Timestamp)
- Return type:
pyarrow.Table
- roboto.domain.topics.parquet.table_transforms.extract_timestamp_field(schema, timestamp_message_path)
Aggregate timestamp info into a helper utility for handling time-based data operations.
- Parameters:
schema (pyarrow.Schema)
timestamp_message_path (roboto.domain.topics.record.MessagePathRecord)
- Return type:
- roboto.domain.topics.parquet.table_transforms.filter_table_by_logtime_ns(table, timestamp_column_name, start_time=None, end_time=None)
Filter table rows to include only data within the specified time range.
- Parameters:
table (pyarrow.Table)
timestamp_column_name (str)
start_time (Optional[int])
end_time (Optional[int])
- Return type:
pyarrow.Table
- roboto.domain.topics.parquet.table_transforms.scale_logtime(table, log_time_column_name, to_unit)
Scale the log_time column to the given time unit. Assumes the log_time values are formatted as nanoseconds since Unix epoch.
- Parameters:
table (pyarrow.Table)
log_time_column_name (str)
to_unit (roboto.time.TimeUnit)
- Return type:
pyarrow.Table
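The arithmetic behind such a rescaling can be sketched in plain Python. The `Unit` enum below is a hypothetical stand-in for roboto.time.TimeUnit, whose members are not listed here, and a plain list stands in for the log_time column. Truncating integer division is an assumption; the library may round or convert differently.

```python
from enum import Enum


class Unit(Enum):
    """Hypothetical stand-in for roboto.time.TimeUnit; values are nanoseconds per unit."""

    NANOSECONDS = 1
    MICROSECONDS = 1_000
    MILLISECONDS = 1_000_000
    SECONDS = 1_000_000_000


def scale_ns(values, to_unit):
    """Convert nanosecond timestamps to `to_unit`, discarding any remainder."""
    return [value // to_unit.value for value in values]


log_time_ns = [1_500_000_000, 2_000_000_000]
in_millis = scale_ns(log_time_ns, Unit.MILLISECONDS)
in_seconds = scale_ns(log_time_ns, Unit.SECONDS)
```

Because the input is assumed to be in nanoseconds, scaling a column that has already been scaled would produce wrong values.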
- roboto.domain.topics.parquet.table_transforms.should_read_row_group(row_group_metadata, timestamp, start_time=None, end_time=None)
Determine whether a Parquet row group contains data within the requested time range. Used to skip requesting column chunks from a row group that cannot contain relevant data.
- Parameters:
row_group_metadata (pyarrow.parquet.RowGroupMetaData)
timestamp (roboto.domain.topics.parquet.timestamp.Timestamp)
start_time (Optional[int])
end_time (Optional[int])
- Return type:
bool
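The row-group check amounts to an interval-overlap test against the minimum and maximum timestamp recorded for the row group. The sketch below shows that test in isolation. `ColumnStats` and `overlaps` are hypothetical names, and the exclusive upper bound is an assumption. In pyarrow, comparable values are exposed through `row_group_metadata.column(i).statistics`, when the file was written with statistics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnStats:
    """Hypothetical stand-in for a timestamp column's min/max statistics."""

    min_ns: int
    max_ns: int


def overlaps(
    stats: ColumnStats,
    start_time: Optional[int] = None,
    end_time: Optional[int] = None,
) -> bool:
    """Return False only when the row group lies entirely outside the range."""
    if start_time is not None and stats.max_ns < start_time:
        return False  # every row is earlier than the requested range
    if end_time is not None and stats.min_ns >= end_time:
        return False  # every row is at or after the (assumed exclusive) end
    return True


stats = ColumnStats(min_ns=100, max_ns=200)
inside = overlaps(stats, start_time=150, end_time=160)
too_late = overlaps(stats, start_time=201)
```

A True result only means the row group may contain matching rows, so row-level filtering is still needed after the read.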