Table Joins¶
Rayforce-Py provides a useful interface to the Rayforce table joins.
Join operations work are lazy-loaded, and performed only after .execute() statement.
Inner Join¶
Inner join combines two tables based on matching values in specified columns. Only rows with matching keys in both tables are included in the result.
>>> from rayforce import Table, Vector, Symbol, Time, I64, F64
>>> trades = Table({
"sym": Vector(items=["AAPL", "AAPL", "GOOG", "GOOG"], ray_type=Symbol),
"time": Vector(
items=[
Time("09:00:00.100"),
Time("09:00:00.200"),
Time("09:00:00.150"),
Time("09:00:00.250"),
],
ray_type=Time,
),
"price": Vector(items=[100, 200, 300, 400], ray_type=I64),
})
>>> quotes = Table({
"sym": Vector(items=["AAPL", "GOOG"], ray_type=Symbol),
"bid": Vector(items=[50, 100], ray_type=I64),
"ask": Vector(items=[75, 150], ray_type=I64),
})
>>> result = trades.inner_join(quotes, on="sym").execute()
┌──────┬──────────────┬───────┬─────┬─────┐
│ sym │ time │ price │ bid │ ask │
├──────┼──────────────┼───────┼─────┼─────┤
│ AAPL │ 09:00:00.100 │ 100 │ 50 │ 75 │
│ AAPL │ 09:00:00.100 │ 100 │ 50 │ 75 │
│ GOOG │ 09:00:00.200 │ 200 │ 100 │ 150 │
│ GOOG │ 09:00:00.200 │ 200 │ 100 │ 150 │
├──────┴──────────────┴───────┴─────┴─────┤
│ 4 rows (4 shown) 5 columns (5 shown) │
└─────────────────────────────────────────┘
If you wish to join on multiple columns, provide a list of strings
Left Join¶
Left join returns all rows from the left table and matching rows from the right table. If there's no match, the right table columns will have null values.
>>> from rayforce import Table, Vector, Symbol, Time, I64, F64
>>> trades = Table({
"sym": Vector(items=["AAPL", "MSFT", "GOOG"], ray_type=Symbol),
"time": Vector(
items=[
Time("09:00:00.100"),
Time("09:00:00.200"),
Time("09:00:00.150"),
],
ray_type=Time,
),
"price": Vector(items=[100, 200, 300], ray_type=I64),
})
>>> quotes = Table({
"sym": Vector(items=["AAPL", "GOOG"], ray_type=Symbol),
"bid": Vector(items=[50, 100], ray_type=I64),
"ask": Vector(items=[75, 150], ray_type=I64),
})
>>> result = trades.left_join(quotes, on="sym").execute()
If you wish to join on multiple columns, provide a list of strings:
As-Of Join¶
As-of join performs a value-based join where each row from the left table is matched with the most closest row from the right table that occurred at or before the left table's value, based on matching keys.
>>> from rayforce import Table, Vector, Symbol, Time, I64
>>> trades = Table({
"sym": Vector(items=["AAPL", "AAPL", "GOOGL", "GOOGL"], ray_type=Symbol),
"time": Vector(
items=[
Time("09:00:00.100"), # 100ms
Time("09:00:00.200"), # 200ms
Time("09:00:00.150"), # 150ms
Time("09:00:00.250"), # 250ms
],
ray_type=Time,
),
"price": Vector(items=[100, 200, 300, 400], ray_type=I64),
})
>>> quotes = Table({
"sym": Vector(items=["AAPL", "AAPL", "AAPL", "GOOGL", "GOOGL", "GOOGL"], ray_type=Symbol),
"time": Vector(
items=[
Time("09:00:00.050"), # 50ms - before first AAPL trade
Time("09:00:00.150"), # 150ms - between AAPL trades
Time("09:00:00.250"), # 250ms - after second AAPL trade
Time("09:00:00.100"), # 100ms - before first GOOGL trade
Time("09:00:00.200"), # 200ms - between GOOGL trades
Time("09:00:00.300"), # 300ms - after second GOOGL trade
],
ray_type=Time,
),
"bid": Vector(items=[45, 55, 65, 95, 105, 115], ray_type=I64),
"ask": Vector(items=[70, 80, 90, 120, 130, 140], ray_type=I64),
})
>>> result = trades.asof_join(quotes, on=["sym", "time"]).execute()
Window Join¶
Window join matches records on specified columns and aggregates values from another table within time windows. This is useful for financial data where you want to aggregate quotes that occurred near each trade.
>>> from rayforce import Table, TableColumnInterval, Vector, Symbol, Time, F64
>>> trades = Table({
"sym": Vector(items=["AAPL", "GOOG"], ray_type=Symbol),
"time": Vector(
items=[Time("09:00:00.100"), Time("09:00:00.100")],
ray_type=Time,
),
"price": Vector(items=[150.0, 200.0], ray_type=F64),
})
>>> quotes = Table({
"sym": Vector(items=["AAPL", "AAPL", "AAPL", "GOOG", "GOOG", "GOOG"], ray_type=Symbol),
"time": Vector(
items=[
Time("09:00:00.090"),
Time("09:00:00.095"),
Time("09:00:00.105"),
Time("09:00:00.090"),
Time("09:00:00.095"),
Time("09:00:00.105"),
],
ray_type=Time,
),
"bid": Vector(items=[99.0, 100.0, 101.0, 199.0, 200.0, 201.0], ray_type=F64),
"ask": Vector(items=[109.0, 110.0, 111.0, 209.0, 210.0, 211.0], ray_type=F64),
})
>>> interval = TableColumnInterval(
lower=-10,
upper=10,
table=trades,
column=Column("time"),
)
>>> result = trades.window_join(
on=["sym", "time"],
interval=interval,
join_with=[quotes],
min_bid=Column("bid").min(),
max_ask=Column("ask").max(),
).execute()
┌──────┬──────────────┬────────┬─────────┬─────────┐
│ sym │ time │ price │ min_bid │ max_ask │
├──────┼──────────────┼────────┼─────────┼─────────┤
│ AAPL │ 09:00:00.100 │ 150.00 │ 99.00 │ 111.00 │
│ GOOG │ 09:00:00.100 │ 200.00 │ 199.00 │ 211.00 │
└──────┴──────────────┴────────┴─────────┴─────────┘
For the AAPL trade at 09:00:00.100, the window [090ms, 110ms] captures quotes at 90ms, 95ms, and 105ms, aggregating them to find min_bid=99.0 and max_ask=111.0.
Window Join vs Window Join 1¶
window_join(): Uses open intervals (excludes boundaries)window_join1(): Uses closed intervals (includes boundaries)