Apache Parquet Integration¶

Beta Feature

Parquet support is currently in beta and may behave unexpectedly. Please report any issues you encounter.

Rayforce-py provides support for reading Apache Parquet files directly into Rayforce Tables using the load_parquet() function.

Installation¶

Parquet support requires PyArrow. Install it with:

pip install rayforce-py[parquet]

Or install PyArrow directly:

pip install pyarrow>=10.0.0

Quick Start¶

from rayforce.plugins.parquet import load_parquet

# Load a parquet file into a Rayforce Table
table = load_parquet("data.parquet")

# Access columns
print(table.columns())
print(table.at_column("id"))

API Reference¶

`load_parquet(path: str) -> Table`¶

Reads a Parquet file and returns a Rayforce Table.

Parameters:

Parameter	Type	Description
`path`	`str`	Path to the Parquet file

Returns: A Table object containing the data from the Parquet file.

Raises:

ImportError: If PyArrow is not installed
ParquetConversionError: If a column type cannot be converted

Type Mapping¶

Parquet/Arrow types are automatically mapped to Rayforce types:

Arrow Type	Rayforce Type
`bool`	`B8`
`int8`, `uint8`	`U8`
`int16`, `uint16`	`I16`
`int32`, `uint32`	`I32`
`int64`, `uint64`	`I64`
`float32`, `float64`	`F64`
`string`, `large_string`	`String`
`timestamp`, `date64`	`Timestamp`
`date32`	`Date`
`time32`	`Time`
Other types	`String` (fallback)

Performance¶

The Parquet reader uses zero-copy data access where possible for optimal performance. For the following types, data is read directly from Arrow buffers without copying:

I16, I32, I64
F64
B8, U8
Timestamp
String

For unsupported types or when zero-copy is not possible, the reader falls back to converting via Python lists.