roboto.domain.topics.parquet.table_transforms

Module Contents

roboto.domain.topics.parquet.table_transforms.add_column(table, name, values, position=0)
Parameters:
  • table (pyarrow.Table)

  • name (str)

  • values (pyarrow.Array)

  • position (int)

Return type:

pyarrow.Table

roboto.domain.topics.parquet.table_transforms.drop_column(table, column_name)
Parameters:
  • table (pyarrow.Table)

  • column_name (str)

Return type:

pyarrow.Table

roboto.domain.topics.parquet.table_transforms.enrich_with_logtime_ns(table, log_time_column_name, timestamp)

Add a normalized log_time column, expressed in nanoseconds since the Unix epoch, to simplify time-based filtering and to maintain a consistent interface with other TopicReaders. The column is derived from the topic's message path of type CanonicalDataType.Timestamp.

Parameters:
  • table

  • log_time_column_name

  • timestamp

Return type:

pyarrow.Table

roboto.domain.topics.parquet.table_transforms.extract_timestamp_field(schema, timestamp_message_path)

Aggregate timestamp info into a helper utility for handling time-based data operations.

Parameters:
  • schema

  • timestamp_message_path

Return type:

roboto.domain.topics.parquet.timestamp.Timestamp

roboto.domain.topics.parquet.table_transforms.filter_table_by_logtime_ns(table, timestamp_column_name, start_time=None, end_time=None)

Filter the table's rows, keeping only those that fall within the specified time range.

Parameters:
  • table (pyarrow.Table)

  • timestamp_column_name (str)

  • start_time (Optional[int])

  • end_time (Optional[int])

Return type:

pyarrow.Table
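
The range check described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the helper name in_time_range is hypothetical, and the bound semantics (start inclusive, end exclusive, None meaning unbounded) are assumptions that the reference does not state.

```python
from typing import Optional


def in_time_range(
    log_time_ns: int,
    start_time: Optional[int] = None,
    end_time: Optional[int] = None,
) -> bool:
    """Hypothetical helper: is a nanosecond timestamp inside the range?

    Assumption (not confirmed by the reference): start_time is inclusive,
    end_time is exclusive, and a bound of None leaves that side open.
    """
    if start_time is not None and log_time_ns < start_time:
        return False
    if end_time is not None and log_time_ns >= end_time:
        return False
    return True


# Stand-in for the values of a log_time column, in nanoseconds.
log_times_ns = [1_000, 2_000, 3_000, 4_000]

# Keep only the rows inside [2_000, 4_000).
kept = [t for t in log_times_ns if in_time_range(t, start_time=2_000, end_time=4_000)]
```

With a real pyarrow.Table the same predicate would normally be expressed as a boolean mask over the timestamp column rather than a Python loop.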

roboto.domain.topics.parquet.table_transforms.scale_logtime(table, log_time_column_name, to_unit)

Scale the log_time column to the given time unit. Assumes the log_time values are expressed as nanoseconds since the Unix epoch.

Parameters:
  • table

  • log_time_column_name

  • to_unit

Return type:

pyarrow.Table
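
As a rough illustration of what scaling from nanoseconds involves, the sketch below converts a single value using integer division. The unit strings, the NS_PER_UNIT table, and the scale_ns helper are all hypothetical; the reference does not say what type to_unit takes or how sub-unit remainders are handled, so truncation is an assumption here.

```python
# Hypothetical unit table: nanoseconds per target unit.
NS_PER_UNIT = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
}


def scale_ns(value_ns: int, to_unit: str) -> int:
    """Hypothetical helper: scale nanoseconds since epoch to another unit.

    Assumption: the fractional remainder is truncated (floor division),
    which keeps the result an integer and avoids float rounding on
    19-digit nanosecond values.
    """
    return value_ns // NS_PER_UNIT[to_unit]


log_time_ns = 1_700_000_000_123_456_789
log_time_ms = scale_ns(log_time_ns, "ms")
log_time_s = scale_ns(log_time_ns, "s")
```

Integer arithmetic is used deliberately: a 64-bit float cannot represent every nanosecond timestamp exactly, so dividing as floats could silently shift values.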

roboto.domain.topics.parquet.table_transforms.should_read_row_group(row_group_metadata, timestamp, start_time=None, end_time=None)

Determine whether a Parquet row group contains data within the requested time range. Used to skip requesting column chunks from row groups that contain no relevant data.

Parameters:
  • row_group_metadata

  • timestamp

  • start_time

  • end_time

Return type:

bool
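
A row-group check of this kind usually reduces to an interval-overlap test between the row group's minimum and maximum timestamps and the requested range. The sketch below shows that test in plain Python; it is not the library's implementation. The function name row_group_overlaps and its min_ns and max_ns arguments are hypothetical stand-ins for statistics that would come from the row group metadata, and the bound semantics (start inclusive, end exclusive, None meaning unbounded) are assumptions.

```python
from typing import Optional


def row_group_overlaps(
    min_ns: int,
    max_ns: int,
    start_time: Optional[int] = None,
    end_time: Optional[int] = None,
) -> bool:
    """Hypothetical helper: could this row group hold rows in the range?

    min_ns and max_ns stand in for the timestamp column's min/max
    statistics. Assumed semantics: [start_time, end_time), with None
    leaving that side of the range open.
    """
    # Every row is earlier than the requested start.
    if start_time is not None and max_ns < start_time:
        return False
    # Every row is at or after the requested (exclusive) end.
    if end_time is not None and min_ns >= end_time:
        return False
    return True


# Row groups described by (min, max) timestamps in nanoseconds.
row_groups = [(0, 99), (100, 199), (200, 299)]

# Indices of the row groups worth reading for the range [150, 200).
to_read = [
    i
    for i, (lo, hi) in enumerate(row_groups)
    if row_group_overlaps(lo, hi, start_time=150, end_time=200)
]
```

The test is conservative by design: it only rules out row groups that cannot contain matching rows, so rows inside a selected group still need to be filtered individually.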