AuthorisedDataFrame
Row-level security for pandas DataFrames.
AuthorisedDataFrame
DataFrame wrapper that only contains rows a user is authorised to see.
Resolves a user's email domain to departments via a mapping dict, then filters the data so only authorised rows are accessible.
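The sketch below shows roughly what the domain-resolution step could look like. It takes the text after the last `@` in the email, lower-cases it, and looks it up in the mapping. An unmapped domain yields `None`, which matches the documented `departments` attribute. The helper name `resolve_departments` is hypothetical and so is the lower-casing; neither is taken from cognito_auth's source.

```python
from typing import Optional


def resolve_departments(
    email: str, domain_mapping: dict[str, list[str]]
) -> Optional[list[str]]:
    """Hypothetical helper: map an email address to its authorised departments.

    Returns None when the domain has no entry in the mapping, mirroring the
    documented behaviour that unmapped users have no access.
    """
    # Treat everything after the last "@" as the domain. Lower-casing is an
    # assumption made for this sketch, not documented library behaviour.
    domain = email.rsplit("@", 1)[-1].lower()
    departments = domain_mapping.get(domain)
    return list(departments) if departments is not None else None


DOMAIN_MAPPING = {
    "cabinetoffice.gov.uk": ["Cabinet Office"],
    "hmrc.gov.uk": ["HMRC"],
}

print(resolve_departments("alice@hmrc.gov.uk", DOMAIN_MAPPING))
print(resolve_departments("bob@example.com", DOMAIN_MAPPING))
```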
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| segments | dict[str, DataFrame] | Pre-segmented data: one DataFrame per auth-column value (e.g. department name). | required |
| user | User | Authenticated User object from cognito_auth. | required |
| domain_mapping | dict[str, list[str]] | Maps email domains to lists of department names, e.g. {"hmrc.gov.uk": ["HMRC"]}. | required |
Attributes:

| Name | Description |
|---|---|
| user | The authenticated user this frame is filtered for. |
| departments | The departments this user can access, or None. |
| has_access | Whether the user has any department mapping. |
| df | The filtered DataFrame. Only contains authorised rows. |
Source code in src/cognito_auth/df.py
Functions
__init__(segments, user, domain_mapping)
prepare(df, auth_column, domain_mapping)
classmethod
Pre-segment a DataFrame for repeated per-user filtering.
Call this once at app startup to segment the data by
auth_column. Then call .for_user(user) on the result
for each request to get a filtered AuthorisedDataFrame.
Example:

```python
# Startup
spending = AuthorisedDataFrame.prepare(df, "department", MAPPING)

# Per request
secure = spending.for_user(user)
secure.df  # filtered
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The full unfiltered DataFrame. | required |
| auth_column | str | Column name to segment on (e.g. "department"). | required |
| domain_mapping | dict[str, list[str]] | Domain-to-departments mapping dict. | required |
Returns:

| Type | Description |
|---|---|
| PreparedDataFrame | A PreparedDataFrame; call .for_user(user) on it to get a filtered AuthorisedDataFrame. |
to_store()
Serialise for Dash dcc.Store -- filtered data + user context.
Returns a dict suitable for writing directly to a dcc.Store
component. Downstream render callbacks can read from the store
without needing access to auth or the raw data.
Returns:

| Type | Description |
|---|---|
| dict[str, Any] | Dict with keys: records, user_name, user_email, departments, has_access. |
PreparedDataFrame
Pre-segmented data ready to be filtered per user.
Created by AuthorisedDataFrame.prepare. Call
for_user to get a filtered AuthorisedDataFrame
for a specific user.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| segments | dict[str, DataFrame] | Pre-segmented data: one DataFrame per auth-column value (e.g. department name). | required |
| domain_mapping | dict[str, list[str]] | Domain-to-departments mapping dict. | required |
Functions
for_user(user)
Create a filtered DataFrame for a specific user.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| user | User | Authenticated User from cognito_auth. | required |
Returns:

| Type | Description |
|---|---|
| AuthorisedDataFrame | An AuthorisedDataFrame containing only the rows the user is authorised to see. |
Installation
AuthorisedDataFrame requires the [df] extra to be installed.
Quick Start
```python
import pandas as pd
from dash import Dash

from cognito_auth.dash import DashAuth
from cognito_auth.df import AuthorisedDataFrame

# Your Dash app
app = Dash(__name__)

auth = DashAuth()
auth.protect_app(app)

DOMAIN_MAPPING = {
    "cabinetoffice.gov.uk": ["Cabinet Office"],
    "homeoffice.gov.uk": ["Home Office"],
    "hmrc.gov.uk": ["HMRC"],
}

# Startup -- prepare once, segment the data by department
df = pd.read_csv("data/spending.csv")
spending = AuthorisedDataFrame.prepare(df, "department", DOMAIN_MAPPING)
```
Store Pattern (small datasets)
For small datasets or pages with a single rendering callback, use to_store()
to push filtered data through a dcc.Store. One callback handles auth and
writes to the Store; downstream callbacks read from it without any auth
awareness:
```python
from dash import Input, Output


@app.callback(
    Output("filtered-data", "data"),
    Input("trigger", "n_intervals"),
)
def filter_and_store(_n):
    user = auth.get_auth_user()
    secure = spending.for_user(user)
    if not secure.has_access:
        return None
    return secure.to_store()
```
to_store() returns a dict with records, user_name, user_email,
departments, and has_access -- ready to write to dcc.Store. Downstream
callbacks read from the Store:
```python
@app.callback(
    Output("table", "data"),
    Input("filtered-data", "data"),
)
def render_table(data):
    if not data or not data["has_access"]:
        return []
    return data["records"]
```
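The sketch below makes the shape of that dict concrete. It builds a payload with the same five keys from plain Python values and checks that the payload survives a JSON round-trip, which is what dcc.Store needs. `build_store_payload` and its arguments are hypothetical. The real to_store() works from the filtered DataFrame and the authenticated User. Deriving `has_access` from `departments is not None` is also an assumption.

```python
import json
from typing import Any, Optional


def build_store_payload(
    records: list[dict[str, Any]],
    user_name: str,
    user_email: str,
    departments: Optional[list[str]],
) -> dict[str, Any]:
    """Hypothetical stand-in for to_store(): same keys, plain-Python inputs."""
    return {
        "records": records,
        "user_name": user_name,
        "user_email": user_email,
        "departments": departments,
        # Assumption: access exists exactly when a department list was resolved.
        "has_access": departments is not None,
    }


payload = build_store_payload(
    records=[{"department": "HMRC", "amount": 120}],
    user_name="Alice",
    user_email="alice@hmrc.gov.uk",
    departments=["HMRC"],
)

# dcc.Store data must be JSON-serialisable, so a round-trip should be lossless.
assert json.loads(json.dumps(payload)) == payload
print(sorted(payload))
```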
Performance consideration
to_store() serialises the entire filtered DataFrame as JSON and sends it
to the browser. Every downstream callback that reads from the Store
receives the full payload back from the browser on each interaction.
For datasets larger than a few hundred rows, or pages with multiple
cascading callbacks, this can cause noticeable latency. Use the
Direct Pattern below instead.
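You can see why the payload size matters by measuring it. The sketch below serialises synthetic rows with json.dumps and reports the encoded size in bytes. The column names and values are made up. The size grows roughly linearly with the row count, and in the Store pattern that whole payload goes to the browser and comes back for each downstream callback.

```python
import json


def payload_bytes(n_rows: int) -> int:
    """Size in bytes of a JSON payload holding n_rows synthetic records."""
    # Synthetic rows with illustrative columns; real data will differ.
    records = [
        {"department": "HMRC", "month": f"2024-{(i % 12) + 1:02d}", "amount": i * 10}
        for i in range(n_rows)
    ]
    return len(json.dumps({"records": records}).encode("utf-8"))


for n in (100, 1_000, 10_000):
    print(f"{n:>6} rows -> {payload_bytes(n):>9,} bytes")
```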
Direct Pattern (larger datasets)
For larger datasets or pages with cascading callbacks (e.g. filter dropdowns
that trigger chart updates), call auth.get_auth_user() and
prepared.for_user(user) directly in each callback. This keeps all data
server-side -- nothing is serialised to the browser.
Since for_user() performs O(k) dict lookups (where k is the number of departments
the user maps to), the per-callback overhead is negligible:
```python
from dash import Input, Output, html

prepared = AuthorisedDataFrame.prepare(df, "department", DOMAIN_MAPPING)


def _get_user_df():
    """Get the auth-filtered DataFrame for the current request user."""
    user = auth.get_auth_user()
    return prepared.for_user(user)


@app.callback(
    Output("filter-dept", "options"),
    Input("url", "pathname"),
)
def update_dept_options(pathname):
    secure = _get_user_df()
    if not secure.has_access:
        return []
    return [
        {"label": d, "value": d}
        for d in sorted(secure.df["department"].dropna().unique())
    ]


@app.callback(
    Output("chart", "children"),
    [Input("filter-dept", "value"), Input("filter-metric", "value")],
)
def update_chart(dept_vals, metric):
    secure = _get_user_df()
    if not secure.has_access:
        return html.Div("No data available for your department.")
    dff = secure.df
    if dept_vals:
        dff = dff[dff["department"].isin(dept_vals)]
    # ... build chart from dff ...
```
A shared _get_user_df() helper keeps the auth + filtering logic in one
place. Each callback gets a fresh, already-filtered DataFrame without any
JSON serialisation round-trip.
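The claim that for_user() adds negligible overhead per callback depends on the data already being split into a dict keyed by department. In plain Python, each request does one dict lookup per authorised department. The remaining work is proportional to the rows actually returned, not to the full dataset. In the toy sketch below, lists of dicts stand in for DataFrames, and `SEGMENTS` and `select_segments` are hypothetical names.

```python
# Toy pre-segmented data: department -> rows (lists stand in for DataFrames).
SEGMENTS: dict[str, list[dict]] = {
    "Cabinet Office": [{"id": 1}, {"id": 2}],
    "Home Office": [{"id": 3}],
    "HMRC": [{"id": 4}, {"id": 5}, {"id": 6}],
}


def select_segments(departments: list[str]) -> list[dict]:
    """Gather rows for the user's k departments: k dict lookups, no full scan."""
    rows: list[dict] = []
    for dept in departments:
        # A department with no rows in the data simply contributes nothing.
        rows.extend(SEGMENTS.get(dept, []))
    return rows


print(select_segments(["HMRC"]))
```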
Choosing a Pattern
| Pattern | Best for | Tradeoff |
|---|---|---|
| Store | Small datasets, single render callback | Full data round-trips through the browser as JSON |
| Direct | Larger datasets, cascading callbacks | auth.get_auth_user() called per callback (negligible cost) |
Both patterns enforce the same security boundary: users only ever see rows for their authorised departments, and admin users see everything.
DataModel: Multiple DataFrames
For apps with multiple data sources, prepare each one at startup:
```python
class DashboardDataModel:
    def __init__(self, spending_df, forecast_df, domain_mapping):
        self.spending = AuthorisedDataFrame.prepare(
            spending_df, "department", domain_mapping
        )
        self.forecasts = AuthorisedDataFrame.prepare(
            forecast_df, "department", domain_mapping
        )
```
Then in a callback, call .for_user() on each:
```python
@app.callback(...)
def update_dashboard(dept_vals, metric):
    user = auth.get_auth_user()
    secure_spending = data_model.spending.for_user(user)
    secure_forecasts = data_model.forecasts.for_user(user)
    if not secure_spending.has_access:
        return html.Div("No data available for your department.")
    # ... build dashboard from secure_spending.df, secure_forecasts.df ...
```
DataFrames can use different column names -- just specify the column in prepare():
```python
# One dataset uses "department", another uses "OrganisationSubmitter"
self.assessments = AuthorisedDataFrame.prepare(
    assessments_df, "department", domain_mapping
)
self.spend = AuthorisedDataFrame.prepare(
    spend_df, "OrganisationSubmitter", domain_mapping
)
```
How It Works
- prepare(): Segments the DataFrame by the auth column using groupby. This happens once at startup.
- for_user(): Resolves the user's email_domain to departments via the mapping, then picks the matching segments via dict lookup, O(1) per department.
- .df: The resulting DataFrame contains only authorised rows. There is no way to access unfiltered data through the wrapper.
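The sketch below applies the same two phases in pandas. It splits the frame once with groupby, then reassembles a user's segments with pd.concat. `segment` and `select` are hypothetical names, and this is not the library's source. The sketch returns an empty frame with the original columns when none of the departments are present, because pd.concat raises a ValueError when given an empty list.

```python
import pandas as pd


def segment(df: pd.DataFrame, auth_column: str) -> dict[str, pd.DataFrame]:
    """Startup: split once into {column value: rows with that value}."""
    return {key: group for key, group in df.groupby(auth_column)}


def select(
    segments: dict[str, pd.DataFrame],
    departments: list[str],
    empty: pd.DataFrame,
) -> pd.DataFrame:
    """Per request: one dict lookup per department, then concatenate."""
    parts = [segments[d] for d in departments if d in segments]
    if not parts:
        # pd.concat([]) raises ValueError, so fall back to an empty frame
        # that keeps the original columns.
        return empty
    return pd.concat(parts)


df = pd.DataFrame(
    {
        "department": ["HMRC", "Home Office", "HMRC", "Cabinet Office"],
        "amount": [100, 200, 300, 400],
    }
)
segments = segment(df, "department")
empty = df.iloc[0:0]

print(select(segments, ["HMRC"], empty))
```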
Admin users (user.is_admin) automatically get access to all departments in the mapping.
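One way to read "all departments in the mapping" is as the de-duplicated union of every value list, in first-seen order. The sketch below assumes that reading and a plain boolean admin flag. `departments_for` is a hypothetical helper, not part of cognito_auth.

```python
from typing import Optional


def departments_for(
    domain: str, is_admin: bool, domain_mapping: dict[str, list[str]]
) -> Optional[list[str]]:
    """Hypothetical: admins get every mapped department, others their own."""
    if is_admin:
        # dict.fromkeys de-duplicates while keeping first-seen order.
        all_depts = (d for depts in domain_mapping.values() for d in depts)
        return list(dict.fromkeys(all_depts))
    return domain_mapping.get(domain)


MAPPING = {
    "cabinetoffice.gov.uk": ["Cabinet Office"],
    "digital.cabinet-office.gov.uk": ["Cabinet Office"],
    "hmrc.gov.uk": ["HMRC"],
}

print(departments_for("hmrc.gov.uk", is_admin=True, domain_mapping=MAPPING))
```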
Domain Mapping
The domain_mapping dict maps email domains to department names that match your data:
```python
DOMAIN_MAPPING = {
    "cabinetoffice.gov.uk": ["Cabinet Office"],
    "digital.cabinet-office.gov.uk": ["Cabinet Office"],
    "homeoffice.gov.uk": ["Home Office"],
    "hmrc.gov.uk": ["HMRC"],
}
```
- Multiple domains can map to the same department
- A single domain can map to multiple departments
- Unmapped domains result in has_access = False and an empty DataFrame
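Segments are looked up by dict key, so mapped department names presumably need to match the values in your data exactly. A startup check can therefore catch typos early. The sketch below reports mapped names that never appear in the auth column. `unmatched_departments` is a hypothetical helper, not part of cognito_auth, and it takes the column as a plain list of values.

```python
def unmatched_departments(
    domain_mapping: dict[str, list[str]], column_values: list[str]
) -> set[str]:
    """Return mapped department names that have no rows in the data."""
    present = set(column_values)
    mapped = {dept for depts in domain_mapping.values() for dept in depts}
    return mapped - present


DOMAIN_MAPPING = {
    "cabinetoffice.gov.uk": ["Cabinet Office"],
    "homeoffice.gov.uk": ["Home Office"],
    "hmrc.gov.uk": ["HM Revenue & Customs"],  # does not match "HMRC" in the data
}
data_departments = ["Cabinet Office", "Home Office", "HMRC", "HMRC"]

print(unmatched_departments(DOMAIN_MAPPING, data_departments))
```

With a DataFrame, you could pass something like df["department"].dropna().tolist() as the column values.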