roboto.domain.topics.parquet.arrow_to_roboto
Module Contents
- roboto.domain.topics.parquet.arrow_to_roboto.arrow_type_to_canonical_type(arrow_type)
- Parameters:
arrow_type (pyarrow.DataType)
- Return type:
- roboto.domain.topics.parquet.arrow_to_roboto.compute_boolean_statistics(data)
- Parameters:
data (Union[pyarrow.Array, pyarrow.ChunkedArray])
- Return type:
dict[str, Any]
- roboto.domain.topics.parquet.arrow_to_roboto.compute_dictionary_metadata(column_name, data, max_dictionary_size=2048)
- Parameters:
column_name (str)
data (Union[pyarrow.Array, pyarrow.ChunkedArray])
max_dictionary_size (int)
- Return type:
dict[str, Any]
- roboto.domain.topics.parquet.arrow_to_roboto.compute_numeric_statistics(data)
- Parameters:
data (Union[pyarrow.Array, pyarrow.ChunkedArray])
- Return type:
dict[str, Any]
- roboto.domain.topics.parquet.arrow_to_roboto.field_to_message_path_request(field, parquet_file, timestamp)
- Parameters:
field (pyarrow.Field)
parquet_file (roboto.domain.topics.parquet.parquet_parser.ParquetParser)
timestamp (roboto.domain.topics.parquet.timestamp.TimestampInfo)
- Return type:
- roboto.domain.topics.parquet.arrow_to_roboto.generate_metadata_for_field(field, parquet_parser, timestamp)
- Parameters:
field (pyarrow.Field)
parquet_parser (roboto.domain.topics.parquet.parquet_parser.ParquetParser)
timestamp (roboto.domain.topics.parquet.timestamp.TimestampInfo)
- Return type:
dict[str, Any]
- roboto.domain.topics.parquet.arrow_to_roboto.logger
- roboto.domain.topics.parquet.arrow_to_roboto.sanitize_column_name(field)
- Parameters:
field (pyarrow.Field)
- Return type:
str