Lab 8-1: Predicting streamflow with the SWE regression method#
In this lab, we will use SWE measurements from the East River Valley (SNOTEL sites, plus the Kettle Ponds measurements) as well as USGS streamflow measurments from gage number 09112200. We will fit a linear regression to the streamflow-SWE relationship and use it to predict streamflow based on SWE measurements at Kettle Ponds.
import pandas as pd
# info for this package here: https://doi-usgs.github.io/dataRetrieval/reference/readNWISdv.html
import dataretrieval.nwis as nwis
from metloom.pointdata import SnotelPointData
from datetime import datetime
import altair as alt
from scipy.stats import linregress
import numpy as np
import xarray as xr
from metpy.units import units
#pip install dataretrieval
start_date = '1990-01-01'
end_date = '2024-01-10'
Download USGS Streamflow data#
We can download the streamflow gage data directly from a USGS gage.
usgs_site_code = '09112200' # Replace with the desired USGS site number
# 00060 is the parameter code for streamflow,
# this function will return mean daily values of discharge in cubic feet per second
streamflow_df, metadata = nwis.get_dv(sites=usgs_site_code, start=start_date, end=end_date, parameterCd='00060')
streamflow_df = streamflow_df.rename(columns={'00060_Mean': 'mean_daily_streamflow'})
Streamflow-SWE relationships are usually created for total spring streamflow, so we calculate here the total stream flow betwen April-July. Calculate the total April-July streamflow for each year. Note that the streamflow data came in CFS (cubic feet per second) so to get total streamflow for each day, we multiple the mean daily CFS value by the number of seconds in a day. Because these numbers end up being quite large, we convert from cubic feet to acre-feet. An acre-foot is the amount of water that covers an acre of land in a foot of water; it’s a common unit used in water resources.
seconds_in_a_day = 60*60*24
streamflow_df['daily_total_streamflow'] = streamflow_df['mean_daily_streamflow'] * seconds_in_a_day
streamflow_df['year'] = streamflow_df.index.year
streamflow_df['month'] = streamflow_df.index.month
df_amjj = streamflow_df[streamflow_df.month.isin([4,5,6,7])]
df_amjj = df_amjj.groupby('year')[['daily_total_streamflow']].sum()
# Note that 1 cubic foot is equal to 2.30e-5 acre feet. Let's convert
df_amjj['daily_total_streamflow'] = df_amjj['daily_total_streamflow'] * 2.30e-5
df_amjj = df_amjj.rename(columns={'daily_total_streamflow': 'Seasonal total streamflow (acre-feet)'})
df_amjj
| Seasonal total streamflow (acre-feet) | |
|---|---|
| year | |
| 1994 | 160166.33280 |
| 1995 | 312791.24160 |
| 1996 | 236762.95680 |
| 1997 | 288948.81600 |
| 1998 | 153918.57600 |
| 1999 | 200737.00800 |
| 2000 | 143599.04640 |
| 2001 | 138869.51040 |
| 2002 | 70821.42336 |
| 2003 | 157178.18016 |
| 2004 | 125839.44000 |
| 2005 | 193059.65952 |
| 2006 | 178399.09152 |
| 2007 | 120273.29280 |
| 2008 | 254830.77792 |
| 2009 | 234487.21536 |
| 2010 | 165854.29536 |
| 2011 | 275356.16928 |
| 2012 | 66996.65952 |
| 2013 | 112420.27584 |
| 2014 | 226375.46496 |
| 2015 | 164703.11040 |
| 2016 | 150986.26368 |
| 2017 | 242269.48800 |
| 2018 | 80087.53824 |
| 2019 | 249610.99968 |
| 2020 | 112156.77312 |
| 2021 | 79424.01216 |
| 2022 | 149954.11200 |
| 2023 | 233700.28416 |
Download SNOTEL SWE data#
We have downloaded SNOTEL data in pervious labs, so we don’t discuss this much here.
snotel_point_butte = SnotelPointData("380:CO:SNTL", "Butte")
SNOTEL_VARS = [
snotel_point_butte.ALLOWED_VARIABLES.SWE,
]
df_butte_longterm = snotel_point_butte.get_daily_data(
datetime(df_amjj.index.min() - 1, 10, 1), datetime(df_amjj.index.max(), 7, 1),
SNOTEL_VARS
)
Streamflow-SWE relationships usually use April 1st SWE, so we grab the April 1 SWE from the snotel station, for each year.
df_april1_swe = df_butte_longterm[
(df_butte_longterm.index.get_level_values(0).month == 4)
&
(df_butte_longterm.index.get_level_values(0).day == 1)
]
Combine SWE and Streamflow data#
Join the two dataframes into one dataframe.
df_april1_swe.index = df_april1_swe.index.get_level_values(0).year
df_swe_and_streamflow = df_april1_swe.join(df_amjj)
Now that the tables are joined, we can look at the relationship between SWE and Streamflow across all years of data.
alt.Chart(df_swe_and_streamflow).mark_point().encode(
alt.X('SWE:Q'),
alt.Y('Seasonal total streamflow (acre-feet):Q')
)
Perform a linear regression.#
Fit a line to the columns ‘SWE’ ‘Seasonal total streamflow (acre-feet)’
(slope, intercept, r_value, p_value, std_err) = (
linregress(df_swe_and_streamflow['SWE'], df_swe_and_streamflow['Seasonal total streamflow (acre-feet)'])
)
fit_line_x_values = np.linspace(0, 22, 100)
fit_lin_y_values = intercept + slope * fit_line_x_values
fit_line_df = pd.DataFrame({
'x': fit_line_x_values,
'y': fit_lin_y_values,
})
alt.Chart(df_swe_and_streamflow).mark_point().encode(
alt.X('SWE:Q'),
alt.Y('Seasonal total streamflow (acre-feet):Q')
) + alt.Chart(fit_line_df).mark_line(color='black').encode(
alt.X('x:Q').title(''),
alt.Y('y:Q').title('')
)
Incorporate Kettle Ponds data#
Now pull in SWE data from the Kettle Ponds and see how well the linear regression model works for the four SWE measurements at Kettle Ponds for Spring 2023.
sos_file = "../data/sos_full_dataset_30min.nc"
sos_dataset = xr.open_dataset(sos_file)
Here, we separate out the Kettle Ponds April 1 SWE measurements from the SOS dataset. We convert to inches because the SNOTEL data comes in inches. The Kettle Ponds SWE data is in millimeters.
kps_swe_values = sos_dataset.sel(time = '20230401 0000')[['SWE_p1_c', 'SWE_p2_c', 'SWE_p3_c', 'SWE_p4_c']].to_array().values
kps_swe_values = (kps_swe_values * units("mm")).to(units("inches")).magnitude
Now, we use the slope and intercept created using Snotel and USGS data above, to estimate streamflow using Kettle Ponds April 1 SWE.
streamflow_predictions_from_kps = kps_swe_values * slope + intercept
streamflow_predictions_from_kps = pd.DataFrame({
'SWE': kps_swe_values,
'Seasonal total streamflow (acre-feet)': streamflow_predictions_from_kps,
})
Now, let’s plot:
the SWE-Streamflow relationship above from Snotel and USGS data, all years
the SWE-Streamflow relationship above from Snotel and USGS data, just from 2023
the regression line fit above, and
the SWE-Streamflow relationship from four snow pillow measurements of April 1 SWE at Kettle Ponds.
# 1) the SWE-Streamflow relationship above from Snotel and USGS data, all years
swe_streamflow_snotel = alt.Chart(
df_swe_and_streamflow.assign(label = 'Measured SWE (SNOTEL) and Streamflow')
).mark_point().encode(
alt.X('SWE:Q'),
alt.Y('Seasonal total streamflow (acre-feet):Q'),
alt.Color('label:N')
)
# 2) the SWE-Streamflow relationship above from Snotel and USGS data, just from 2023
swe_streamflow_snotel_2023 = alt.Chart(
df_swe_and_streamflow.loc[2023:2023].assign(label = 'Measured SWE (SNOTEL) and Streamflow, 2023')
).mark_point(size=200, shape='square').encode(
alt.X('SWE:Q'),
alt.Y('Seasonal total streamflow (acre-feet):Q'),
alt.Color('label:N').scale(range=['purple']).title('')
)
# 3) the regression line fit above, and
regression_line = alt.Chart(
fit_line_df.assign(label = 'Regression line')
).mark_line(color='black').encode(
alt.X('x:Q').title(''),
alt.Y('y:Q').title(''),
alt.Color('label:N').scale(range=['black']).title('')
)
# 4) the SWE-Streamflow relationship from four snow pillow measurements of April 1 SWE at Kettle Ponds.
swe_streamflow_kettleponds = alt.Chart(
streamflow_predictions_from_kps.assign(label = 'Measured SWE at Kettle Ponds, Predicted Streamflow')
).mark_point().encode(
alt.X('SWE:Q').title('April 1 SWE '),
alt.Y('Seasonal total streamflow (acre-feet):Q'),
alt.Color('label:N').scale(range=['red']).title('')
)
(
swe_streamflow_snotel
+ swe_streamflow_snotel_2023
+ regression_line
+ swe_streamflow_kettleponds
).resolve_scale(color='independent').configure_legend(labelLimit=300)
We see that total spring streamflow predicted for 2023 using Snotel data is overestimated by the linear regression (the line is above the purple square).
Total spring streamflow predicted with Kettle Ponds SWE ranges from about 200,000 – 260,000 acre-feet, a range that includes the actual streamflow value (the range of streamflow values, y-axis, associated with Kettle Ponds SWE (red dots), includes the actual streamflow value, the y-value associated with the purple box).