Lab 8-1: Predicting streamflow with the SWE regression method#

In this lab, we will use SWE measurements from the East River Valley (SNOTEL sites, plus the Kettle Ponds measurements) as well as USGS streamflow measurments from gage number 09112200. We will fit a linear regression to the streamflow-SWE relationship and use it to predict streamflow based on SWE measurements at Kettle Ponds.

import pandas as pd

# info for this package here: https://doi-usgs.github.io/dataRetrieval/reference/readNWISdv.html
import dataretrieval.nwis as nwis

from metloom.pointdata import SnotelPointData
from datetime import datetime
import altair as alt
from scipy.stats import linregress
import numpy as np
import xarray as xr
from metpy.units import units
#pip install dataretrieval
start_date = '1990-01-01'
end_date = '2024-01-10'

Download USGS Streamflow data#

We can download the streamflow gage data directly from a USGS gage.

usgs_site_code = '09112200' # Replace with the desired USGS site number

# 00060 is the parameter code for streamflow,
# this function will return mean daily values of discharge in cubic feet per second
streamflow_df, metadata = nwis.get_dv(sites=usgs_site_code, start=start_date, end=end_date, parameterCd='00060') 
streamflow_df = streamflow_df.rename(columns={'00060_Mean': 'mean_daily_streamflow'})

Streamflow-SWE relationships are usually created for total spring streamflow, so we calculate here the total stream flow betwen April-July. Calculate the total April-July streamflow for each year. Note that the streamflow data came in CFS (cubic feet per second) so to get total streamflow for each day, we multiple the mean daily CFS value by the number of seconds in a day. Because these numbers end up being quite large, we convert from cubic feet to acre-feet. An acre-foot is the amount of water that covers an acre of land in a foot of water; it’s a common unit used in water resources.

seconds_in_a_day = 60*60*24
streamflow_df['daily_total_streamflow'] = streamflow_df['mean_daily_streamflow'] * seconds_in_a_day

streamflow_df['year'] = streamflow_df.index.year
streamflow_df['month'] = streamflow_df.index.month
df_amjj = streamflow_df[streamflow_df.month.isin([4,5,6,7])]
df_amjj = df_amjj.groupby('year')[['daily_total_streamflow']].sum()

# Note that 1 cubic foot is equal to 2.30e-5 acre feet. Let's convert
df_amjj['daily_total_streamflow'] = df_amjj['daily_total_streamflow'] * 2.30e-5
df_amjj = df_amjj.rename(columns={'daily_total_streamflow': 'Seasonal total streamflow (acre-feet)'})
df_amjj
Seasonal total streamflow (acre-feet)
year
1994 160166.33280
1995 312791.24160
1996 236762.95680
1997 288948.81600
1998 153918.57600
1999 200737.00800
2000 143599.04640
2001 138869.51040
2002 70821.42336
2003 157178.18016
2004 125839.44000
2005 193059.65952
2006 178399.09152
2007 120273.29280
2008 254830.77792
2009 234487.21536
2010 165854.29536
2011 275356.16928
2012 66996.65952
2013 112420.27584
2014 226375.46496
2015 164703.11040
2016 150986.26368
2017 242269.48800
2018 80087.53824
2019 249610.99968
2020 112156.77312
2021 79424.01216
2022 149954.11200
2023 233700.28416

Download SNOTEL SWE data#

We have downloaded SNOTEL data in pervious labs, so we don’t discuss this much here.

snotel_point_butte = SnotelPointData("380:CO:SNTL", "Butte")
SNOTEL_VARS = [
    snotel_point_butte.ALLOWED_VARIABLES.SWE,
]
df_butte_longterm = snotel_point_butte.get_daily_data(
    datetime(df_amjj.index.min() - 1, 10, 1), datetime(df_amjj.index.max(), 7, 1),
    SNOTEL_VARS
)

Streamflow-SWE relationships usually use April 1st SWE, so we grab the April 1 SWE from the snotel station, for each year.

df_april1_swe = df_butte_longterm[
    (df_butte_longterm.index.get_level_values(0).month == 4)
    &
    (df_butte_longterm.index.get_level_values(0).day == 1)
]

Combine SWE and Streamflow data#

Join the two dataframes into one dataframe.

df_april1_swe.index = df_april1_swe.index.get_level_values(0).year
df_swe_and_streamflow = df_april1_swe.join(df_amjj)

Now that the tables are joined, we can look at the relationship between SWE and Streamflow across all years of data.

alt.Chart(df_swe_and_streamflow).mark_point().encode(
    alt.X('SWE:Q'),
    alt.Y('Seasonal total streamflow (acre-feet):Q')
)

Perform a linear regression.#

Fit a line to the columns ‘SWE’ ‘Seasonal total streamflow (acre-feet)’

(slope, intercept, r_value, p_value, std_err) = (
    linregress(df_swe_and_streamflow['SWE'], df_swe_and_streamflow['Seasonal total streamflow (acre-feet)'])
)
fit_line_x_values = np.linspace(0, 22, 100)
fit_lin_y_values = intercept + slope * fit_line_x_values
fit_line_df = pd.DataFrame({
    'x': fit_line_x_values,
    'y': fit_lin_y_values,
})
alt.Chart(df_swe_and_streamflow).mark_point().encode(
    alt.X('SWE:Q'),
    alt.Y('Seasonal total streamflow (acre-feet):Q')
) + alt.Chart(fit_line_df).mark_line(color='black').encode(
    alt.X('x:Q').title(''),
    alt.Y('y:Q').title('')
) 

Incorporate Kettle Ponds data#

Now pull in SWE data from the Kettle Ponds and see how well the linear regression model works for the four SWE measurements at Kettle Ponds for Spring 2023.

sos_file = "../data/sos_full_dataset_30min.nc"
sos_dataset = xr.open_dataset(sos_file)

Here, we separate out the Kettle Ponds April 1 SWE measurements from the SOS dataset. We convert to inches because the SNOTEL data comes in inches. The Kettle Ponds SWE data is in millimeters.

kps_swe_values = sos_dataset.sel(time = '20230401 0000')[['SWE_p1_c', 'SWE_p2_c', 'SWE_p3_c', 'SWE_p4_c']].to_array().values
kps_swe_values = (kps_swe_values * units("mm")).to(units("inches")).magnitude

Now, we use the slope and intercept created using Snotel and USGS data above, to estimate streamflow using Kettle Ponds April 1 SWE.

streamflow_predictions_from_kps = kps_swe_values * slope + intercept

streamflow_predictions_from_kps = pd.DataFrame({
    'SWE': kps_swe_values,
    'Seasonal total streamflow (acre-feet)': streamflow_predictions_from_kps,
})

Now, let’s plot:

  1. the SWE-Streamflow relationship above from Snotel and USGS data, all years

  2. the SWE-Streamflow relationship above from Snotel and USGS data, just from 2023

  3. the regression line fit above, and

  4. the SWE-Streamflow relationship from four snow pillow measurements of April 1 SWE at Kettle Ponds.

# 1) the SWE-Streamflow relationship above from Snotel and USGS data, all years
swe_streamflow_snotel = alt.Chart(
        df_swe_and_streamflow.assign(label = 'Measured SWE (SNOTEL) and Streamflow')
    ).mark_point().encode(
        alt.X('SWE:Q'),
        alt.Y('Seasonal total streamflow (acre-feet):Q'),
        alt.Color('label:N')
    )

# 2) the SWE-Streamflow relationship above from Snotel and USGS data, just from 2023
swe_streamflow_snotel_2023 = alt.Chart(
        df_swe_and_streamflow.loc[2023:2023].assign(label = 'Measured SWE (SNOTEL) and Streamflow, 2023')
    ).mark_point(size=200, shape='square').encode(
        alt.X('SWE:Q'),
        alt.Y('Seasonal total streamflow (acre-feet):Q'),
        alt.Color('label:N').scale(range=['purple']).title('')
    )

# 3) the regression line fit above, and
regression_line = alt.Chart(
        fit_line_df.assign(label = 'Regression line')
    ).mark_line(color='black').encode(
        alt.X('x:Q').title(''),
        alt.Y('y:Q').title(''),
        alt.Color('label:N').scale(range=['black']).title('')
    )

# 4) the SWE-Streamflow relationship from four snow pillow measurements of April 1 SWE at Kettle Ponds.
swe_streamflow_kettleponds = alt.Chart(
        streamflow_predictions_from_kps.assign(label = 'Measured SWE at Kettle Ponds, Predicted Streamflow')
    ).mark_point().encode(
        alt.X('SWE:Q').title('April 1 SWE '),
        alt.Y('Seasonal total streamflow (acre-feet):Q'),
        alt.Color('label:N').scale(range=['red']).title('')
    )

(
    swe_streamflow_snotel 
    + swe_streamflow_snotel_2023
    + regression_line
    + swe_streamflow_kettleponds
).resolve_scale(color='independent').configure_legend(labelLimit=300)

We see that total spring streamflow predicted for 2023 using Snotel data is overestimated by the linear regression (the line is above the purple square).

Total spring streamflow predicted with Kettle Ponds SWE ranges from about 200,000 – 260,000 acre-feet, a range that includes the actual streamflow value (the range of streamflow values, y-axis, associated with Kettle Ponds SWE (red dots), includes the actual streamflow value, the y-value associated with the purple box).