Skip to content

Select Queries

Rayforce-Py allows you to build select queries to the tables. You can select specific columns or compute new ones, the power is limitless!

The main approach to build the queries is to utilise chainable API, which reads left-to-right and builds the lazy query - until it's executed.

Selecting Uncomputed Columns

Just provide a strings as a non-keyword arguments to the select() method to fetch them.

>>> result = table.select("id", "name", "age").execute()

If you wish to select all table columns which are available - pass "*", just like with SQL.

>>> result = table.select("*").execute()

Selecting Computed Columns

You can select new computed columns using expressions:

>>> result = table.select(
        "id",
        total=Column("price") * Column("quantity"),
    ).execute()

The computed columns are named using keyword arguments, where the key is the column name and the value is the expression. Expression can be of any type, which can be assigned to the column. You can do math operations with the column, or perform more complex aggregations, which you can read about below.

You need to use Column class to utilise it in computations or aggregations

Using Aggregations

Rayforce-Py provides a number of aggregations you may use when querying computed columns

Operation Description
sum() Calculates the sum of the selected column
mean()/ avg() Calculates the average value of the selected column
median() Calculates the median value of the selected column
min() Calculates the min value of the selected column
max() Calculates the max value of the selected column
count() Calculates the count of the entries for selected column
first() Retrieves the first value for selected column
last() Retrieves the last value for selected column
distinct() Gets the distinct value of the column

For Example:

>>> scores = Table({
        "student_id": Vector([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ray_type=I64),
        "score": Vector([85, 92, 78, 96, 88, 91, 83, 95, 87, 90], ray_type=I64),
        "subject": Vector(
            ["Math", "Math", "Science", "Science", "Math",
            "Science", "Math", "Science", "Math", "Chemistry"],
            ray_type=Symbol,
        ),
    })

>>> stats = scores.select(
        total_students=Column("student_id").count(),      # Count: number of records
        sum_scores=Column("score").sum(),                 # Sum: total of all scores
        avg_score=Column("score").mean(),                 # Mean: average score
        median_score=Column("score").median(),            # Median: middle value
        min_score=Column("score").min(),                  # Min: lowest score
        max_score=Column("score").max(),                  # Max: highest score
        first_student=Column("student_id").first(),       # First: first student ID
        last_subject=Column("subject").last()             # Last: last subject
    ).execute()
┌────────────────┬────────────┬───────────┬──────────────┬───────────┬───────────┬───────────────┬──────────────┐
 total_students  sum_scores  avg_score  median_score  min_score  max_score  first_student  last_subject 
├────────────────┼────────────┼───────────┼──────────────┼───────────┼───────────┼───────────────┼──────────────┤
 10              885         88.50      89.00         78         96         1              Chemistry    
├────────────────┴────────────┴───────────┴──────────────┴───────────┴───────────┴───────────────┴──────────────┤
 1 rows (1 shown) 8 columns (8 shown)                                                                          
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

>>> unique_subjects = scores.select(
        subject=Column("subject").distinct()              # Distinct: unique values
    ).execute()
┌──────────────────────────────────────┐
 subject                              
├──────────────────────────────────────┤
 Math                                 
 Science                              
 Chemistry                            
├──────────────────────────────────────┤
 3 rows (3 shown) 1 columns (1 shown) 
└──────────────────────────────────────┘

Using Filters

Filters are quite powerful in Rayforce-Py. You can implement them using .where() statement addressing the previous .select()

>>> result = table.select("id", "name", "age").where(Column("age") >= 35).execute()

Library supports all standard comparison operators:

  • == - Equal to
  • != - Not equal to
  • > - Greater than
  • >= - Greater than or equal to
  • < - Less than
  • <= - Less than or equal to

IN Operator

Check if a value is in a list:

>>> result = table.where(Column("dept").isin(["eng", "marketing"])).execute()

Just as with computed columns, where statements have to be an expression, addressing a specific computed or uncomputed column.

More complex example

>>> result = table.select("id", "name", "age").where(
        (Column("age") >= 20) & Column("department").isin("IT", "HR", "Marketing")
    ).execute()

Filtered Aggregations

You can apply filters to aggregations using the where() method on columns:

>>> result = (
        table
        .select(
            total=Column("amount").sum(),
            active_total=Column("amount").where(Column("status") == "active").sum(),
            count=Column("amount").count(),
        )
        .by("category")
        .execute()
    )

This computes the total for all rows, but only sums active rows for active_total.