AuthorisedDataFrame

Row-level security for pandas DataFrames.

AuthorisedDataFrame

DataFrame wrapper that only contains rows a user is authorised to see.

Resolves a user's email domain to departments via a mapping dict, then filters the data so only authorised rows are accessible.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| segments | dict[str, DataFrame] | Pre-segmented data as {department_name: DataFrame}. Create with dict(tuple(df.groupby(auth_column))). | required |
| user | User | Authenticated User object from cognito_auth. | required |
| domain_mapping | dict[str, list[str]] | Maps email domains to lists of department names, e.g. {"cabinetoffice.gov.uk": ["Cabinet Office"]}. | required |
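
As a minimal sketch of the segments argument, the snippet below builds the {department_name: DataFrame} dict from a small made-up DataFrame. The column names and values are illustrative assumptions, not data shipped with the library.

```python
import pandas as pd

# Illustrative data; "department" plays the role of auth_column.
df = pd.DataFrame(
    {
        "department": ["Cabinet Office", "HMRC", "Cabinet Office"],
        "amount": [100, 250, 75],
    }
)

# Iterating a groupby yields (key, sub-frame) pairs; dict() turns
# them into {department_name: DataFrame}.
segments = dict(tuple(df.groupby("department")))

print(sorted(segments))                 # department names found in the data
print(len(segments["Cabinet Office"]))  # number of rows in that segment
```

Note that groupby drops rows whose key is NaN by default, so rows with a missing department end up in no segment.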

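Attributes:

| Name | Type | Description |
| --- | --- | --- |
| user | User | The authenticated user this frame is filtered for. |
| departments | list[str] or None | The departments this user can access, or None if the user's domain is unmapped. |
| has_access | bool | Whether the user has any department mapping. |
| df | DataFrame | The filtered DataFrame. Only contains authorised rows. |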

Source code in src/cognito_auth/df.py
class AuthorisedDataFrame:
    """DataFrame wrapper that only contains rows a user is authorised to see.

    Resolves a user's email domain to departments via a mapping dict,
    then filters the data so only authorised rows are accessible.

    Args:
        segments: Pre-segmented data as ``{department_name: DataFrame}``.
            Create with ``dict(tuple(df.groupby(auth_column)))``.
        user: Authenticated User object from cognito_auth.
        domain_mapping: Maps email domains to lists of department names,
            e.g. ``{"cabinetoffice.gov.uk": ["Cabinet Office"]}``.

    Attributes:
        user: The authenticated user this frame is filtered for.
        departments: The departments this user can access, or None.
        has_access: Whether the user has any department mapping.
        df: The filtered DataFrame. Only contains authorised rows.
    """

    def __init__(
        self,
        segments: dict[str, pd.DataFrame],
        user: User,
        domain_mapping: dict[str, list[str]],
    ) -> None:
        self.user = user
        self.departments = self._resolve(user, domain_mapping)
        self.has_access = self.departments is not None

        if not self.has_access or not self.departments:
            # Get columns from the first segment if available, else empty
            sample = next(iter(segments.values()), pd.DataFrame())
            self.df = pd.DataFrame(columns=sample.columns)
            return

        matching = [segments[d] for d in self.departments if d in segments]
        if not matching:
            sample = next(iter(segments.values()), pd.DataFrame())
            self.df = pd.DataFrame(columns=sample.columns)
        elif len(matching) == 1:
            self.df = matching[0]
        else:
            self.df = pd.concat(matching, ignore_index=True)

    @staticmethod
    def _resolve(user: User, mapping: dict[str, list[str]]) -> list[str] | None:
        """Resolve a user to their authorised departments.

        Admin users (``user.is_admin``) get access to all departments
        in the mapping. Standard users are looked up by email domain.

        Args:
            user: Authenticated User from cognito_auth.
            mapping: Domain-to-departments mapping dict.

        Returns:
            Sorted list of department names, or None if unmapped.
        """
        if user.is_admin:
            return sorted({d for depts in mapping.values() for d in depts})
        return mapping.get(user.email_domain)

    def to_store(self) -> dict[str, Any]:
        """Serialise for Dash ``dcc.Store`` -- filtered data + user context.

        Returns a dict suitable for writing directly to a ``dcc.Store``
        component. Downstream render callbacks can read from the store
        without needing access to auth or the raw data.

        Returns:
            Dict with keys:
                - ``records``: list of row dicts (empty if no access)
                - ``user_name``: user's display name
                - ``user_email``: user's email address
                - ``departments``: list of authorised department names
                - ``has_access``: whether the user has a department mapping
        """
        return {
            "records": self.df.to_dict("records") if self.has_access else [],
            "user_name": self.user.name,
            "user_email": self.user.email,
            "departments": self.departments,
            "has_access": self.has_access,
        }

    @classmethod
    def prepare(
        cls,
        df: pd.DataFrame,
        auth_column: str,
        domain_mapping: dict[str, list[str]],
    ) -> PreparedDataFrame:
        """Pre-segment a DataFrame for repeated per-user filtering.

        Call this once at app startup to segment the data by
        ``auth_column``. Then call ``.for_user(user)`` on the result
        for each request to get a filtered :class:`AuthorisedDataFrame`.

        Example::

            # Startup
            spending = AuthorisedDataFrame.prepare(df, "department", MAPPING)

            # Per request
            secure = spending.for_user(user)
            secure.df  # filtered

        Args:
            df: The full unfiltered DataFrame.
            auth_column: Column name to segment on (e.g. ``"department"``).
            domain_mapping: Domain-to-departments mapping dict.

        Returns:
            A :class:`PreparedDataFrame` ready for ``.for_user()`` calls.
        """
        segments = dict(tuple(df.groupby(auth_column)))
        return PreparedDataFrame(segments, domain_mapping)

Functions

__init__(segments, user, domain_mapping)

prepare(df, auth_column, domain_mapping) classmethod

Pre-segment a DataFrame for repeated per-user filtering.

Call this once at app startup to segment the data by auth_column. Then call .for_user(user) on the result for each request to get a filtered AuthorisedDataFrame.

Example:

# Startup
spending = AuthorisedDataFrame.prepare(df, "department", MAPPING)

# Per request
secure = spending.for_user(user)
secure.df  # filtered

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | The full unfiltered DataFrame. | required |
| auth_column | str | Column name to segment on (e.g. "department"). | required |
| domain_mapping | dict[str, list[str]] | Domain-to-departments mapping dict. | required |

Returns:

| Type | Description |
| --- | --- |
| PreparedDataFrame | A PreparedDataFrame ready for .for_user() calls. |

to_store()

Serialise for Dash dcc.Store -- filtered data + user context.

Returns a dict suitable for writing directly to a dcc.Store component. Downstream render callbacks can read from the store without needing access to auth or the raw data.

Returns:

| Type | Description |
| --- | --- |
| dict[str, Any] | Dict with the keys listed below. |

  • records: list of row dicts (empty if no access)
  • user_name: user's display name
  • user_email: user's email address
  • departments: list of authorised department names, or None if the user's domain is unmapped
  • has_access: whether the user has a department mapping

PreparedDataFrame

Pre-segmented data ready to be filtered per user.

Created by AuthorisedDataFrame.prepare(). Call for_user() to get a filtered AuthorisedDataFrame for a specific user.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| segments | dict[str, DataFrame] | Pre-segmented data as {department_name: DataFrame}. | required |
| domain_mapping | dict[str, list[str]] | Domain-to-departments mapping dict. | required |

Source code in src/cognito_auth/df.py
class PreparedDataFrame:
    """Pre-segmented data ready to be filtered per user.

    Created by :meth:`AuthorisedDataFrame.prepare`. Call
    :meth:`for_user` to get a filtered :class:`AuthorisedDataFrame`
    for a specific user.

    Args:
        segments: Pre-segmented data as ``{department_name: DataFrame}``.
        domain_mapping: Domain-to-departments mapping dict.
    """

    def __init__(
        self,
        segments: dict[str, pd.DataFrame],
        domain_mapping: dict[str, list[str]],
    ) -> None:
        self._segments = segments
        self._domain_mapping = domain_mapping

    def for_user(self, user: User) -> AuthorisedDataFrame:
        """Create a filtered DataFrame for a specific user.

        Args:
            user: Authenticated User from cognito_auth.

        Returns:
            :class:`AuthorisedDataFrame` containing only rows the
            user is authorised to see.
        """
        return AuthorisedDataFrame(self._segments, user, self._domain_mapping)

Functions

for_user(user)

Create a filtered DataFrame for a specific user.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| user | User | Authenticated User from cognito_auth. | required |

Returns:

| Type | Description |
| --- | --- |
| AuthorisedDataFrame | An AuthorisedDataFrame containing only rows the user is authorised to see. |

Installation

AuthorisedDataFrame requires the [df] extra:

pip install cognito-auth[df]

# Or with Dash support:
pip install cognito-auth[dash,df]

Quick Start

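import pandas as pd
from dash import Dash, Input, Output, html
from cognito_auth.dash import DashAuth
from cognito_auth.df import AuthorisedDataFrame

# The Dash app that the callbacks on this page are registered on
app = Dash(__name__)

auth = DashAuth()
auth.protect_app(app)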

DOMAIN_MAPPING = {
    "cabinetoffice.gov.uk": ["Cabinet Office"],
    "homeoffice.gov.uk": ["Home Office"],
    "hmrc.gov.uk": ["HMRC"],
}

# Startup -- prepare once, segment the data by department
df = pd.read_csv("data/spending.csv")
spending = AuthorisedDataFrame.prepare(df, "department", DOMAIN_MAPPING)

Store Pattern (small datasets)

For small datasets or pages with a single rendering callback, use to_store() to push filtered data through a dcc.Store. One callback handles auth and writes to the Store; downstream callbacks read from it without any auth awareness:

@app.callback(
    Output("filtered-data", "data"),
    Input("trigger", "n_intervals"),
)
def filter_and_store(_n):
    user = auth.get_auth_user()
    secure = spending.for_user(user)

    if not secure.has_access:
        return None

    return secure.to_store()

to_store() returns a dict with records, user_name, user_email, departments, and has_access -- ready to write to dcc.Store. Downstream callbacks read from the Store:

@app.callback(
    Output("table", "data"),
    Input("filtered-data", "data"),
)
def render_table(data):
    if not data or not data["has_access"]:
        return []
    return data["records"]
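
To see which inputs the guard in render_table covers, the sketch below calls the same function body without the Dash decorator. The three payloads are made-up examples in the shape that to_store() documents; the names and values are placeholders.

```python
def render_table(data):
    # Same guard as the callback above, minus the Dash decorator.
    if not data or not data["has_access"]:
        return []
    return data["records"]


# Store not yet populated, or filter_and_store returned None.
empty_store = None

# A payload for a user whose domain is unmapped.
denied = {
    "records": [],
    "user_name": "Example User",
    "user_email": "user@example.com",
    "departments": None,
    "has_access": False,
}

# A payload for a user mapped to one department.
allowed = {
    "records": [{"department": "HMRC", "amount": 20}],
    "user_name": "Example User",
    "user_email": "user@hmrc.gov.uk",
    "departments": ["HMRC"],
    "has_access": True,
}

for payload in (empty_store, denied, allowed):
    print(render_table(payload))
```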

Performance consideration

to_store() serialises the entire filtered DataFrame as JSON and sends it to the browser. Every downstream callback that reads from the Store receives the full payload back from the browser on each interaction. For datasets larger than a few hundred rows, or pages with multiple cascading callbacks, this can cause noticeable latency. Use the Direct Pattern below instead.
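
A rough way to judge whether a dataset is small enough for the Store Pattern is to measure the JSON size of a payload shaped like the one to_store() returns. The sketch below uses only the standard library with made-up records; the helper names are hypothetical, and Dash's own encoder may produce a slightly different byte count.

```python
import json


def store_payload_bytes(records: list[dict]) -> int:
    """Approximate JSON size of a to_store()-shaped payload."""
    payload = {
        "records": records,
        "user_name": "Example User",          # placeholder values
        "user_email": "user@example.gov.uk",
        "departments": ["Cabinet Office"],
        "has_access": True,
    }
    return len(json.dumps(payload).encode("utf-8"))


def make_records(n: int) -> list[dict]:
    """Build n illustrative rows."""
    return [{"department": "Cabinet Office", "amount": i} for i in range(n)]


small = store_payload_bytes(make_records(100))
large = store_payload_bytes(make_records(10_000))
print(f"100 rows: {small} bytes, 10,000 rows: {large} bytes")
```

Each downstream callback that takes the Store as an Input receives roughly this many bytes back from the browser per invocation.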

Direct Pattern (larger datasets)

For larger datasets or pages with cascading callbacks (e.g. filter dropdowns that trigger chart updates), call auth.get_auth_user() and prepared.for_user(user) directly in each callback. This keeps all data server-side -- nothing is serialised to the browser.

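for_user() performs k dict lookups, where k is the number of departments the user maps to. A user with one matching department gets the stored segment back directly; a user with several (including admins) also pays for a pd.concat that copies the matching rows, so the cost then grows with the number of rows returned. For typical dashboards the per-callback overhead is small: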

prepared = AuthorisedDataFrame.prepare(df, "department", DOMAIN_MAPPING)


def _get_user_df():
    """Get the auth-filtered DataFrame for the current request user."""
    user = auth.get_auth_user()
    return prepared.for_user(user)


@app.callback(
    Output("filter-dept", "options"),
    Input("url", "pathname"),
)
def update_dept_options(pathname):
    secure = _get_user_df()
    if not secure.has_access:
        return []
    return [
        {"label": d, "value": d}
        for d in sorted(secure.df["department"].dropna().unique())
    ]


@app.callback(
    Output("chart", "children"),
    [Input("filter-dept", "value"), Input("filter-metric", "value")],
)
def update_chart(dept_vals, metric):
    secure = _get_user_df()
    if not secure.has_access:
        return html.Div("No data available for your department.")

    dff = secure.df
    if dept_vals:
        dff = dff[dff["department"].isin(dept_vals)]

    # ... build chart from dff ...

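A shared _get_user_df() helper keeps the auth + filtering logic in one place. Each callback gets an already-filtered DataFrame without any JSON serialisation round-trip. When the user maps to a single department, .df is the stored segment itself rather than a copy, so copy it before modifying it in place.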
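
One consequence of the len(matching) == 1 branch in __init__ is that a single-department user's .df is the same object held in the prepared segments. The sketch below reproduces that branch on made-up data (it does not import cognito_auth) and shows why a callback should copy before mutating in place.

```python
import pandas as pd

df = pd.DataFrame(
    {"department": ["HMRC", "HMRC", "Home Office"], "amount": [1, 2, 3]}
)
segments = dict(tuple(df.groupby("department")))

# A user mapped to exactly one department.
departments = ["HMRC"]
matching = [segments[d] for d in departments if d in segments]

# What __init__ assigns to .df when there is exactly one match.
user_df = matching[0]
print(user_df is segments["HMRC"])  # True: shared, not copied

# Copy first, so in-place changes never reach the shared segment.
safe = user_df.copy()
safe["amount"] = safe["amount"] * 2
print(segments["HMRC"]["amount"].tolist())  # [1, 2]
```

Filtering with a boolean mask, as update_chart does, already returns a new DataFrame, so the copy only matters for in-place edits such as adding or overwriting columns.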

Choosing a Pattern

| Pattern | Best for | Tradeoff |
| --- | --- | --- |
| Store | Small datasets, single render callback | Full data round-trips through the browser as JSON |
| Direct | Larger datasets, cascading callbacks | auth.get_auth_user() called per callback (negligible cost) |

Both patterns enforce the same security boundary: users only ever see rows for their authorised departments, and admin users see rows for every department listed in the mapping.

DataModel: Multiple DataFrames

For apps with multiple data sources, prepare each one at startup:

class DashboardDataModel:
    def __init__(self, spending_df, forecast_df, domain_mapping):
        self.spending = AuthorisedDataFrame.prepare(
            spending_df, "department", domain_mapping
        )
        self.forecasts = AuthorisedDataFrame.prepare(
            forecast_df, "department", domain_mapping
        )

Then in a callback, call .for_user() on each:

@app.callback(...)
def update_dashboard(dept_vals, metric):
    user = auth.get_auth_user()

    secure_spending = data_model.spending.for_user(user)
    secure_forecasts = data_model.forecasts.for_user(user)

    if not secure_spending.has_access:
        return html.Div("No data available for your department.")

    # ... build dashboard from secure_spending.df, secure_forecasts.df ...

DataFrames can use different column names -- just specify the column in prepare():

# One dataset uses "department", another uses "OrganisationSubmitter"
self.assessments = AuthorisedDataFrame.prepare(
    assessments_df, "department", domain_mapping
)
self.spend = AuthorisedDataFrame.prepare(
    spend_df, "OrganisationSubmitter", domain_mapping
)
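
Whatever the column is called, its values have to match the department names in the mapping, otherwise the segment lookup finds nothing for those rows. The helper below is a hypothetical startup check (not part of cognito_auth) that lists values present in the data but absent from the mapping; the DataFrames are made up.

```python
import pandas as pd

MAPPING = {
    "cabinetoffice.gov.uk": ["Cabinet Office"],
    "hmrc.gov.uk": ["HMRC"],
}

assessments_df = pd.DataFrame(
    {"department": ["Cabinet Office", "HMRC"], "score": [3, 4]}
)
spend_df = pd.DataFrame(
    {
        "OrganisationSubmitter": ["Cabinet Office", "HM Revenue & Customs"],
        "amount": [10, 20],
    }
)


def unmapped_values(df: pd.DataFrame, column: str, mapping: dict) -> list[str]:
    """Values in df[column] that no domain in the mapping grants access to."""
    mapped = {d for depts in mapping.values() for d in depts}
    return sorted(set(df[column].dropna().unique()) - mapped)


print(unmapped_values(assessments_df, "department", MAPPING))
print(unmapped_values(spend_df, "OrganisationSubmitter", MAPPING))
```

Rows listed by this check are invisible to every user, admins included, because admin access is derived from the mapping's department names.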

How It Works

  1. prepare(): Segments the DataFrame by the auth column using groupby -- this happens once at startup.
  2. for_user(): Resolves the user's email_domain to departments via the mapping, then picks the matching segments via dict lookup -- O(1) per department. If more than one segment matches, they are concatenated into a single DataFrame.
  3. .df: The resulting DataFrame contains only authorised rows. There is no way to access unfiltered data through the wrapper.

Admin users (user.is_admin) automatically get access to all departments in the mapping.
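
The three steps and the admin rule can be traced end to end with a small stand-in. This sketch does not import cognito_auth: StubUser is a hypothetical replacement for the library's User with only the two fields the resolution logic reads, and resolve and rows_for mirror the source shown earlier on made-up data.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class StubUser:
    """Hypothetical stand-in for cognito_auth's User."""
    email_domain: str
    is_admin: bool = False


MAPPING = {
    "cabinetoffice.gov.uk": ["Cabinet Office"],
    "hmrc.gov.uk": ["HMRC"],
}

df = pd.DataFrame(
    {"department": ["Cabinet Office", "HMRC", "HMRC"], "amount": [10, 20, 30]}
)

# Step 1 (prepare): segment once by the auth column.
segments = dict(tuple(df.groupby("department")))


def resolve(user: StubUser, mapping: dict):
    """Step 2a: admins get every mapped department, others a domain lookup."""
    if user.is_admin:
        return sorted({d for depts in mapping.values() for d in depts})
    return mapping.get(user.email_domain)


def rows_for(user: StubUser) -> pd.DataFrame:
    """Steps 2b and 3: pick matching segments and return only those rows."""
    departments = resolve(user, MAPPING)
    empty = df.iloc[0:0]  # no rows, same columns
    if not departments:
        return empty
    matching = [segments[d] for d in departments if d in segments]
    return pd.concat(matching, ignore_index=True) if matching else empty


print(len(rows_for(StubUser("hmrc.gov.uk"))))             # 2
print(len(rows_for(StubUser("anything", is_admin=True)))) # 3
print(len(rows_for(StubUser("unknown.example.com"))))     # 0
```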

Domain Mapping

The domain_mapping dict maps email domains to department names that match your data:

DOMAIN_MAPPING = {
    "cabinetoffice.gov.uk": ["Cabinet Office"],
    "digital.cabinet-office.gov.uk": ["Cabinet Office"],
    "homeoffice.gov.uk": ["Home Office"],
    "hmrc.gov.uk": ["HMRC"],
}

  • Multiple domains can map to the same department
  • A single domain can map to multiple departments
  • Unmapped domains result in has_access = False and an empty DataFrame
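
The rules above, plus one edge case, can be checked with a plain dict lookup. The sketch below mirrors the standard-user branch of _resolve and the has_access test from __init__; the shared.example.gov.uk and empty.example.gov.uk entries are hypothetical domains added for illustration.

```python
DOMAIN_MAPPING = {
    "cabinetoffice.gov.uk": ["Cabinet Office"],
    "digital.cabinet-office.gov.uk": ["Cabinet Office"],  # second domain, same department
    "shared.example.gov.uk": ["Home Office", "HMRC"],     # hypothetical: one domain, two departments
    "empty.example.gov.uk": [],                           # hypothetical: mapped, but to nothing
}


def lookup(email_domain: str):
    """Standard-user lookup: returns (departments, has_access)."""
    departments = DOMAIN_MAPPING.get(email_domain)
    return departments, departments is not None


for domain in (
    "cabinetoffice.gov.uk",
    "digital.cabinet-office.gov.uk",
    "shared.example.gov.uk",
    "unknown.example.com",
    "empty.example.gov.uk",
):
    print(domain, lookup(domain))
```

A domain mapped to an empty list yields has_access = True with an empty DataFrame, because has_access only checks that the lookup did not return None. Omit the domain from the mapping instead if the intent is to deny access.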