Lab 8-1: Predicting streamflow with the SWE regression method

Lab 8-1: Predicting streamflow with the SWE regression method#

In this lab, we will use SWE measurements from the East River Valley (SNOTEL sites, plus the Kettle Ponds measurements) as well as USGS streamflow measurments from gage number 09112200. We will fit a linear regression to the streamflow-SWE relationship and use it to predict streamflow based on SWE measurements at Kettle Ponds.

import pandas as pd

# info for this package here: https://doi-usgs.github.io/dataRetrieval/reference/readNWISdv.html
import dataretrieval.nwis as nwis

from metloom.pointdata import SnotelPointData
from datetime import datetime
import altair as alt
from scipy.stats import linregress
import numpy as np
import xarray as xr
from metpy.units import units

#pip install dataretrieval

start_date = '1990-01-01'
end_date = '2024-01-10'

Download USGS Streamflow data#

We can download the streamflow gage data directly from a USGS gage.

usgs_site_code = '09112200' # Replace with the desired USGS site number

# 00060 is the parameter code for streamflow,
# this function will return mean daily values of discharge in cubic feet per second
streamflow_df, metadata = nwis.get_dv(sites=usgs_site_code, start=start_date, end=end_date, parameterCd='00060') 
streamflow_df = streamflow_df.rename(columns={'00060_Mean': 'mean_daily_streamflow'})

Streamflow-SWE relationships are usually created for total spring streamflow, so we calculate here the total stream flow betwen April-July. Calculate the total April-July streamflow for each year. Note that the streamflow data came in CFS (cubic feet per second) so to get total streamflow for each day, we multiple the mean daily CFS value by the number of seconds in a day. Because these numbers end up being quite large, we convert from cubic feet to acre-feet. An acre-foot is the amount of water that covers an acre of land in a foot of water; it’s a common unit used in water resources.

seconds_in_a_day = 60*60*24
streamflow_df['daily_total_streamflow'] = streamflow_df['mean_daily_streamflow'] * seconds_in_a_day

streamflow_df['year'] = streamflow_df.index.year
streamflow_df['month'] = streamflow_df.index.month
df_amjj = streamflow_df[streamflow_df.month.isin([4,5,6,7])]
df_amjj = df_amjj.groupby('year')[['daily_total_streamflow']].sum()

# Note that 1 cubic foot is equal to 2.30e-5 acre feet. Let's convert
df_amjj['daily_total_streamflow'] = df_amjj['daily_total_streamflow'] * 2.30e-5
df_amjj = df_amjj.rename(columns={'daily_total_streamflow': 'Seasonal total streamflow (acre-feet)'})
df_amjj

	Seasonal total streamflow (acre-feet)
year
1994	160166.33280
1995	312791.24160
1996	236762.95680
1997	288948.81600
1998	153918.57600
1999	200737.00800
2000	143599.04640
2001	138869.51040
2002	70821.42336
2003	157178.18016
2004	125839.44000
2005	193059.65952
2006	178399.09152
2007	120273.29280
2008	254830.77792
2009	234487.21536
2010	165854.29536
2011	275356.16928
2012	66996.65952
2013	112420.27584
2014	226375.46496
2015	164703.11040
2016	150986.26368
2017	242269.48800
2018	80087.53824
2019	249610.99968
2020	112156.77312
2021	79424.01216
2022	149954.11200
2023	233700.28416

Download SNOTEL SWE data#

We have downloaded SNOTEL data in pervious labs, so we don’t discuss this much here.

snotel_point_butte = SnotelPointData("380:CO:SNTL", "Butte")
SNOTEL_VARS = [
    snotel_point_butte.ALLOWED_VARIABLES.SWE,
]
df_butte_longterm = snotel_point_butte.get_daily_data(
    datetime(df_amjj.index.min() - 1, 10, 1), datetime(df_amjj.index.max(), 7, 1),
    SNOTEL_VARS
)

Streamflow-SWE relationships usually use April 1st SWE, so we grab the April 1 SWE from the snotel station, for each year.

df_april1_swe = df_butte_longterm[
    (df_butte_longterm.index.get_level_values(0).month == 4)
    &
    (df_butte_longterm.index.get_level_values(0).day == 1)
]

Combine SWE and Streamflow data#

Join the two dataframes into one dataframe.

df_april1_swe.index = df_april1_swe.index.get_level_values(0).year

df_swe_and_streamflow = df_april1_swe.join(df_amjj)

Now that the tables are joined, we can look at the relationship between SWE and Streamflow across all years of data.

alt.Chart(df_swe_and_streamflow).mark_point().encode(
    alt.X('SWE:Q'),
    alt.Y('Seasonal total streamflow (acre-feet):Q')
)

Perform a linear regression.#

Fit a line to the columns ‘SWE’ ‘Seasonal total streamflow (acre-feet)’

(slope, intercept, r_value, p_value, std_err) = (
    linregress(df_swe_and_streamflow['SWE'], df_swe_and_streamflow['Seasonal total streamflow (acre-feet)'])
)

fit_line_x_values = np.linspace(0, 22, 100)
fit_lin_y_values = intercept + slope * fit_line_x_values
fit_line_df = pd.DataFrame({
    'x': fit_line_x_values,
    'y': fit_lin_y_values,
})

alt.Chart(df_swe_and_streamflow).mark_point().encode(
    alt.X('SWE:Q'),
    alt.Y('Seasonal total streamflow (acre-feet):Q')
) + alt.Chart(fit_line_df).mark_line(color='black').encode(
    alt.X('x:Q').title(''),
    alt.Y('y:Q').title('')
) 

Incorporate Kettle Ponds data#

Now pull in SWE data from the Kettle Ponds and see how well the linear regression model works for the four SWE measurements at Kettle Ponds for Spring 2023.

sos_file = "../data/sos_full_dataset_30min.nc"
sos_dataset = xr.open_dataset(sos_file)

Here, we separate out the Kettle Ponds April 1 SWE measurements from the SOS dataset. We convert to inches because the SNOTEL data comes in inches. The Kettle Ponds SWE data is in millimeters.

kps_swe_values = sos_dataset.sel(time = '20230401 0000')[['SWE_p1_c', 'SWE_p2_c', 'SWE_p3_c', 'SWE_p4_c']].to_array().values
kps_swe_values = (kps_swe_values * units("mm")).to(units("inches")).magnitude

Now, we use the slope and intercept created using Snotel and USGS data above, to estimate streamflow using Kettle Ponds April 1 SWE.

streamflow_predictions_from_kps = kps_swe_values * slope + intercept

streamflow_predictions_from_kps = pd.DataFrame({
    'SWE': kps_swe_values,
    'Seasonal total streamflow (acre-feet)': streamflow_predictions_from_kps,
})

Now, let’s plot:

the SWE-Streamflow relationship above from Snotel and USGS data, all years
the SWE-Streamflow relationship above from Snotel and USGS data, just from 2023
the regression line fit above, and
the SWE-Streamflow relationship from four snow pillow measurements of April 1 SWE at Kettle Ponds.

# 1) the SWE-Streamflow relationship above from Snotel and USGS data, all years
swe_streamflow_snotel = alt.Chart(
        df_swe_and_streamflow.assign(label = 'Measured SWE (SNOTEL) and Streamflow')
    ).mark_point().encode(
        alt.X('SWE:Q'),
        alt.Y('Seasonal total streamflow (acre-feet):Q'),
        alt.Color('label:N')
    )

# 2) the SWE-Streamflow relationship above from Snotel and USGS data, just from 2023
swe_streamflow_snotel_2023 = alt.Chart(
        df_swe_and_streamflow.loc[2023:2023].assign(label = 'Measured SWE (SNOTEL) and Streamflow, 2023')
    ).mark_point(size=200, shape='square').encode(
        alt.X('SWE:Q'),
        alt.Y('Seasonal total streamflow (acre-feet):Q'),
        alt.Color('label:N').scale(range=['purple']).title('')
    )

# 3) the regression line fit above, and
regression_line = alt.Chart(
        fit_line_df.assign(label = 'Regression line')
    ).mark_line(color='black').encode(
        alt.X('x:Q').title(''),
        alt.Y('y:Q').title(''),
        alt.Color('label:N').scale(range=['black']).title('')
    )

# 4) the SWE-Streamflow relationship from four snow pillow measurements of April 1 SWE at Kettle Ponds.
swe_streamflow_kettleponds = alt.Chart(
        streamflow_predictions_from_kps.assign(label = 'Measured SWE at Kettle Ponds, Predicted Streamflow')
    ).mark_point().encode(
        alt.X('SWE:Q').title('April 1 SWE '),
        alt.Y('Seasonal total streamflow (acre-feet):Q'),
        alt.Color('label:N').scale(range=['red']).title('')
    )

(
    swe_streamflow_snotel 
    + swe_streamflow_snotel_2023
    + regression_line
    + swe_streamflow_kettleponds
).resolve_scale(color='independent').configure_legend(labelLimit=300)

We see that total spring streamflow predicted for 2023 using Snotel data is overestimated by the linear regression (the line is above the purple square).

Total spring streamflow predicted with Kettle Ponds SWE ranges from about 200,000 – 260,000 acre-feet, a range that includes the actual streamflow value (the range of streamflow values, y-axis, associated with Kettle Ponds SWE (red dots), includes the actual streamflow value, the y-value associated with the purple box).