{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 1-3: Empirical Probability Distributions\n", "\n", "In the real world, we have a limited number of observations (not infinitely many points along a curve, as in the theoretical examples). How do we know what our data looks like, what kind of distribution it has, and which statistical tests we might want to use?\n", "\n", "One first step can be to create an empirical CDF and PDF from the data. (PDFs are often more intuitive because they resemble histograms, but which one you use to communicate a point depends on your audience and on what other engineers and scientists in your field typically use.) Wikipedia is a good place to start to learn more about [empirical distributions](https://en.wikipedia.org/wiki/Empirical_distribution_function).\n", "\n", "Let's import some packages we'll need, and load a sample dataset." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import scipy.stats as stats\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.10/site-packages/openpyxl/worksheet/_read_only.py:79: UserWarning: Unknown extension is not supported and will be removed\n", " for idx, row in parser.parse():\n" ] }, { "data": { "text/html": [ "
\n", " | date of peak | \n", "water year | \n", "peak value (cfs) | \n", "gage_ht (feet) | \n", "
---|---|---|---|---|
0 | \n", "1928-10-09 | \n", "1929 | \n", "18800 | \n", "10.55 | \n", "
1 | \n", "1930-02-05 | \n", "1930 | \n", "15800 | \n", "10.44 | \n", "
2 | \n", "1931-01-28 | \n", "1931 | \n", "35100 | \n", "14.08 | \n", "