Turning a notebook into production code

At GPXZ we make heavy use of Jupyter notebooks for developing new features and adjusting existing algorithms. We love the rich output, easy inspection and tweaking of variables, and the ability to experiment on data without reloading it every time.

Once we’re happy with the result, here are the steps we take to get that new logic into production.

1. Rerun from scratch

Jupyter lets you execute cells out of order: this is great for quick edits to your code without running everything. But it makes it easy to accidentally get your notebook into an unrepeatable state.

So first ensure your notebook is correctly capturing the desired result:

  • Copy the notebook
  • Clear all outputs
  • Rerun all cells
  • Ensure the notebook completes without error
  • Ensure the results match the old notebook!
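One way to script the copy-and-rerun steps above (a minimal sketch, assuming jupyter and nbconvert are installed; filenames are illustrative):

# rerun_notebook.py

import shutil
import subprocess

# Work on a copy so the original notebook's outputs are preserved for comparison.
shutil.copy("forecast_flowrate.ipynb", "forecast_flowrate_rerun.ipynb")

# Execute every cell top-to-bottom; check=True fails loudly if any cell errors.
subprocess.run(
    ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", "forecast_flowrate_rerun.ipynb"],
    check=True,
)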

2. Extract utility functions

Most data science processes consist of “utility” functions where

  • You don’t care about intermediate state
  • The implementation isn’t core to your algorithm
  • Inputs and outputs are clearly defined and delineated

Our next step is to take these utility functions

# forecast_flowrate.ipynb

df["rainfall"] = float(df.weather_report.str.split("|").str[2]) / 25.4

and put them into their own file. Call it {notebook_name}_utils.py for now: though some utils might be more at home elsewhere in your codebase, and others might get moved back next to the core algorithm.

# forecast_flowrate_utils.py

import pandas as pd

MM_PER_INCH = 25.4

def extract_rainfall(weather_report: pd.Series) -> pd.Series:
    """Extract the rainfall part of a weather report.

    Reports look like "STATION_NAME|windspeed|rainfall" and are in imperial units.
    """
    weather_report_parts = weather_report.str.split("|")
    rainfall_str = weather_report_parts.str[2]
    rainfall_inches = rainfall_str.astype(float)
    rainfall_mm = rainfall_inches * MM_PER_INCH
    return rainfall_mm

With our utility function extracted, it can be better documented and tested.
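For example, a minimal pytest-style test (the sample reports are illustrative, not real data):

# test_forecast_flowrate_utils.py

import pandas as pd
import pandas.testing as pdt

import forecast_flowrate_utils


def test_extract_rainfall():
    reports = pd.Series(["STATION_A|12|1.0", "STATION_B|8|0.0"])
    rainfall = forecast_flowrate_utils.extract_rainfall(reports)
    pdt.assert_series_equal(rainfall, pd.Series([25.4, 0.0]))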

Importing the new module with autoreload will re-import it in the notebook whenever the Python file changes

# forecast_flowrate.ipynb

%load_ext autoreload
%autoreload 2

import forecast_flowrate_utils

and our notebook logic is much easier to read with this computational busywork outsourced

# forecast_flowrate.ipynb

df["rainfall"] = forecast_flowrate_utils.extract_rainfall(df.weather_report)

Depending on your package layout, you may have to hack sys.path to import the utils module

# forecast_flowrate.ipynb

import sys

sys.path.append('forecaster/weather')

3. Extract model inputs and outputs

Our notebook forecast_flowrate.ipynb is going to become a function forecast_flowrate().

# Inputs.
START_AT = datetime.now(timezone.utc)
BOUNDING_BOX = None  # Use whole domain as default.

Having them in one place helps contextualise the model and makes them easy to modify when testing that your notebook generalises to the entire input domain.

Extracting inputs may also help with building a repeatable model, though for data-heavy models this may require more work at the data layer.

Similarly, if the model result is currently only being displayed as a notebook cell output, make sure to either save the result or put a final cell of explicit outputs:

# Outputs.
FORECAST_TIMESERIES = df_result.flowrate.fillna(0).to_numpy()
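If you go the save route, a final cell like this makes the later "does the refactor match?" check easy (a sketch; the filename is illustrative):

np.save("notebook_forecast_timeseries.npy", FORECAST_TIMESERIES)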

4. Hoist constants

The constants in our models are often set based on lots of experimentation and research: this hard work may be lost without documentation of why the final value was chosen!

Move them to just under the imports of your notebook. Any number other than 0 or 1 should be scrutinised as a potential algorithm parameter.

# Standard deviation of DEM gaussian blur, in px.
#
# Smoothing is needed as our DEMs have a lot of single-pixel noise. But at a 30m resolution, excess
# smoothing removes small waterways, resulting in over-estimates for inundation. A value of 2 had
# the best cross-validation results over the domains in our initial launch, though may need tweaking
# if we move to a new DEM source. 
#
# The code for running the cross-validation is in explore/dem-sigma-cv.ipynb
DEM_SMOOTHING_SIGMA = 2

In production the logic of an algorithm is largely fixed: desired outcomes are mostly achieved by modifying these constants, which is easier when they’re all clearly in one place.

It’s also easy in a notebook to hardcode these values during experimentation, even when it’s important for the value to be consistent across the notebook. Moving them to a constant variable enforces consistency.

dem = smooth_dem(load_dem(), sigma=2)
if np.any(np.isnan(dem)):
    dem = smooth_dem(load_backup_dem(), sigma=2.5)
    # Is sigma=2.5 because the backup dem needs more smoothing?
    # Or were we experimenting with higher smoothing and forgot to revert everywhere?
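With the values hoisted (the constant names here are illustrative), the intent behind each value is explicit and stays consistent across the notebook:

DEM_SMOOTHING_SIGMA = 2
BACKUP_DEM_SMOOTHING_SIGMA = 2.5  # The backup DEM is noisier and needs more smoothing.

dem = smooth_dem(load_dem(), sigma=DEM_SMOOTHING_SIGMA)
if np.any(np.isnan(dem)):
    dem = smooth_dem(load_backup_dem(), sigma=BACKUP_DEM_SMOOTHING_SIGMA)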

5. Add logging

One benefit of notebooks is that we can easily display intermediary results. Unlike more traditional development, where code often either runs or doesn’t, intermediary state can be critical for understanding where the quality of a data-heavy simulation is heading.

The easiest way to capture this is to log any parameter that might possibly be important. Having your log messages look like unique_snake_case_name=result makes them easy to search using commandline tools like grep, and makes it possible to parse them for analysis if needed.

dem = load_dem()
logger.info(f"Loaded dem dem_nan_px={np.isinan(dem).sum()}")

A more advanced option is to use structured logging. Plots can be saved as PDFs.
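For instance, with a structured-logging library like structlog (a sketch, not necessarily our exact setup), the key-value pairs become queryable fields rather than formatted strings:

import numpy as np
import structlog

logger = structlog.get_logger()

dem = load_dem()
logger.info("loaded_dem", dem_nan_px=int(np.isnan(dem).sum()))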

And more ugly (but something we do frequently for GPXZ!) is to pass around a meta dict, save as much info as possible, and write it out as a single json file at the end of the run.

meta: dict[str, Any] = {}

dem = load_dem()
meta["dem_nan_px"] = np.isinan(dem).sum()

If you are stuffing a bunch of debugging info into a dict, it’s likely to include stuff that can’t be serialised as json. We use a jsonify function a bit like this to handle that:

import math
from datetime import datetime
from decimal import Decimal
from pathlib import Path

import numpy as np


def jsonify(x):
    # Mapping.
    if isinstance(x, dict):
        return {k: jsonify(v) for k, v in x.items()}

    # Iterable.
    if isinstance(x, tuple | list):
        return [jsonify(v) for v in x]

    # Scalar.
    return _convert_json_value(x)

def _convert_json_value(v):
    """Convert a value to a json-serialisable one."""
    if isinstance(v, np.ndarray):
        return [_convert_json_value(x) for x in v.tolist()]
    try:
        if math.isnan(v) or np.isnan(v):
            return None
    except TypeError:
        pass
    if isinstance(v, np.integer):
        return int(v)
    if isinstance(v, np.floating):
        return float(v)
    if isinstance(v, np.bool_):
        return bool(v)
    if isinstance(v, Path):
        return v.as_posix()
    if isinstance(v, datetime):
        return v.isoformat()
    if isinstance(v, Decimal):
        return float(v)
    return v
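Then, at the end of the run, the whole dict can be written out in one go (the output path is illustrative):

import json
from pathlib import Path

Path("run_meta.json").write_text(json.dumps(jsonify(meta), indent=2))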

6. Abstraction

Ok now we’re ready to actually turn our notebook into a python file!

Your top-level function should call a handful of sub-functions that describe the model at a high level:

DEM_SMOOTHING_SIGMA = 2

def forecast_flow_rate(start_at: datetime, domain: None | BoundingBox = None):
    """Single day ML forecast for flow at a streamgage location."""
    # Inputs.
    historic_flow = load_historic_flow(start_at)
    dem = load_dem(domain)
    precip = load_precipitation_forecast(domain, start_at)

    # Run model.
    forecast_result = run_lstm_flow_simulation(dem, precip, start_at)
    validate_forecast(forecast_result)

    # Save result.
    save_result_to_db(forecast_result)

Then copy each line of notebook code into the relevant sub-function. Teasing out interactions between model steps is an excellent exercise for making sure you captured all your algorithm parameters, and weren’t relying on any weird notebook functionality.

Run unit tests, run model, compare to notebook output, done.
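A sketch of that final comparison, assuming forecast_flow_rate() is tweaked to also return the forecast array and the notebook output was saved to disk as in step 3 (paths and tolerance are illustrative):

import numpy as np

expected = np.load("notebook_forecast_timeseries.npy")  # Saved from the final notebook run.
actual = forecast_flow_rate(start_at=START_AT, domain=BOUNDING_BOX)
np.testing.assert_allclose(actual, expected, rtol=1e-6)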