{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 2-3: More Hypothesis Testing\n", "---" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# import libraries we'll need\n", "import pandas as pd\n", "import numpy as np\n", "import scipy.stats as stats\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## T-Test for small sample sizes (n<30)\n", "\n", "We have instantaneous monthly observations of dissolved organic carbon (DOC) in two streams over the course of one water year (October-September). Use a two-sample, two-sided t-test to determine:\n", "\n", "1. Using data for all 12 months, with what confidence can we say that the annual mean DOC concentrations are different between the two streams?\n", "2. Compare the two streams again, but this time perform two tests: one for the first 6 months of the water year (October-March), and a second test for the last 6 months (April-September).\n", "3. Can we say that the DOC concentrations of the two streams are different in the first half and/or second half of the water year? 
With what level of confidence could we say that they are different?\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [], "source": [ "wy_month_labels = ['Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep']\n", "wy_month_numbers = np.arange(12)+1" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [], "source": [ "# DOC for the first stream, mg/L\n", "doc_1 = [65.3, 98.4, 113.1, 120.5, 105.3, 100.3, 92.3, 97.5, 88.2, 89.5, 72.1, 61.9]\n", "# DOC for the second stream, mg/L\n", "doc_2 = [62.0, 50.7, 30.9, 52.5, 98.7, 95.8, 99.3, 110.2, 104.9, 96.4, 82.5, 75.5]" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Note that you need to enter the code to calculate the t-test yourself, based on the lecture notes or book" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Chi-Squared Test for a Change in the Standard Deviation\n", "Test for statistical significance of a change in the standard deviation.\n", "Note that the standard deviation does not benefit from the Central Limit Theorem.\n", "Even though it is not strictly true, assume for the moment that the\n", "sample data are derived from a normally distributed population. Use a\n", "single-sample test (with rejection region based on the chi-squared\n", "distribution). Assume that the sample standard deviation from the\n", "1929-1974 data is close to the true population standard deviation of the\n", "earlier data set. Test whether the more recent sample is different from this.\n", "\n", "Use the test statistic $\\chi^2 = \\frac{(n-1)s^2}{\\sigma^2}$ with $n-1$ degrees of freedom, where $n$ is the size of the more recent sample, $s$ is its sample standard deviation, and $\\sigma$ is the assumed population standard deviation."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.10/site-packages/openpyxl/worksheet/_read_only.py:81: UserWarning: Unknown extension is not supported and will be removed\n", " for idx, row in parser.parse():\n" ] }, { "data": { "text/html": [ "
| | date of peak | water year | peak value (cfs) | gage_ht (feet) |\n", "
|---|---|---|---|---|\n", "
| 0 | 1928-10-09 | 1929 | 18800 | 10.55 |\n", "
| 1 | 1930-02-05 | 1930 | 15800 | 10.44 |\n", "
| 2 | 1931-01-28 | 1931 | 35100 | 14.08 |\n", "