Lab 1-2

Section 1: Creating and Plotting Random Normal Data

This section introduces creating and plotting random normal data using the numpy and matplotlib modules. The random normal data set will have a mean of 100, a standard deviation of 25, and a sample size of 1000.

# First, import the libraries you will need
import numpy as np
import matplotlib.pyplot as plt

# Also, add code to make sure your plots appear in your Jupyter notebook
%matplotlib inline

# Create variables for the mean, standard deviation, and sample size of the data 
# as well as a variable for the number of bins used to plot the data as a 
# histogram later.
mean = 100
sd = 25
size = 1000
nbins = 10

# Create random data using the properties defined above and NumPy (imported as np).
data_normal = np.random.normal(mean, sd, size)
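
Before plotting, it can help to confirm that the generated data actually has roughly the requested properties. The following is a minimal sketch, not part of the original lab: it assumes the mean of 100 and standard deviation of 25 stated in the section introduction, and it uses a seeded `np.random.default_rng` generator (an assumption, chosen only so the check is repeatable) instead of `np.random.normal`.

```python
import numpy as np

# Assumption: a seeded Generator is used only to make this check repeatable.
rng = np.random.default_rng(seed=0)

mean, sd, size = 100, 25, 1000
sample = rng.normal(mean, sd, size)

# Summary statistics of the sample; ddof=1 gives the sample standard deviation.
sample_mean = sample.mean()
sample_sd = sample.std(ddof=1)

print(f"sample mean = {sample_mean:.2f} (target {mean})")
print(f"sample sd   = {sample_sd:.2f} (target {sd})")
```

The sample statistics will not match the targets exactly. With 1000 points they should land close to them, and the gap should shrink as `size` grows.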

Now that the data has been created, plot it as a histogram. Try changing the variables defined above, especially the number of bins and the sample size, and see how the histogram changes.

plt.figure()
plt.hist(data_normal, nbins, ec="black")
plt.title('(Fig. 1) Normal Distribution')
plt.xlabel('Random Number')
plt.ylabel('Number of Occurrences')

[Figure 1: histogram of data_normal, titled "(Fig. 1) Normal Distribution", with x-axis "Random Number" and y-axis "Number of Occurrences"]
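
To see numerically what changing the number of bins does, the bin counts can be computed directly with `np.histogram` and compared without drawing anything. This is an illustrative sketch only: the seed and the particular bin counts (5, 10, 50) are assumptions chosen for the example.

```python
import numpy as np

# Assumption: seeded generator and these bin counts are for illustration only.
rng = np.random.default_rng(seed=1)
sample = rng.normal(100, 25, 1000)

results = {}
for nbins in (5, 10, 50):
    # counts[i] is the number of points falling between edges[i] and edges[i + 1]
    counts, edges = np.histogram(sample, bins=nbins)
    results[nbins] = counts
    bin_width = edges[1] - edges[0]
    print(f"nbins={nbins:3d}  bin width={bin_width:6.2f}  tallest bin={counts.max()}")
```

Every choice of `nbins` accounts for all 1000 points. More bins means narrower bins, so each bar holds fewer points and the histogram looks noisier.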

Section 2: Creating Random Lognormal Data

Next, generate and plot random lognormal data with the same mean, standard deviation, and sample size as above. Refer back to Lecture 2 in class or to Wikipedia’s page on the lognormal distribution: https://en.wikipedia.org/wiki/Log-normal_distribution. Note that the parameters \(\mu\) and \(\sigma\) of the lognormal distribution are the mean and standard deviation of the variable’s natural logarithm, not of the original dataset. If we use “mean” and “sd” for the values we would calculate from the original dataset, then we can calculate \(\mu\) and \(\sigma\) as follows:

The mean of \(\ln(RandomData)\) is \(\mu = \ln\bigg(\frac{mean^2}{\sqrt{mean^2 + sd^2}}\bigg)\) and its standard deviation is \(\sigma = \sqrt{\ln\bigg(\frac{mean^2 + sd^2}{mean^2}\bigg)}\).
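
One way to sanity-check these formulas is a round trip: compute \(\mu\) and \(\sigma\) from the desired mean and sd, then plug them back into the standard lognormal moment formulas, mean \(= e^{\mu + \sigma^2/2}\) and variance \(= (e^{\sigma^2} - 1)\,e^{2\mu + \sigma^2}\). The sketch below assumes a mean of 100 and an sd of 25 and uses only the standard library; the names `mean_back` and `sd_back` are introduced here for illustration.

```python
import math

mean, sd = 100.0, 25.0

# Parameters of ln(RandomData), computed from the formulas above
mu = math.log(mean**2 / math.sqrt(mean**2 + sd**2))
sigma = math.sqrt(math.log((mean**2 + sd**2) / mean**2))

# Plug mu and sigma back into the lognormal moment formulas
mean_back = math.exp(mu + sigma**2 / 2)
sd_back = math.sqrt((math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2))

print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"recovered mean = {mean_back:.4f}, recovered sd = {sd_back:.4f}")
```

If the formulas are right, the recovered values agree with the starting mean and sd up to floating-point rounding.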

mean = 100
sd = 25
size = 1000
nbins = 20

# Find the mean and standard deviation for ln(RandomData)
mu = np.log(mean**2 / np.sqrt(mean**2 + sd**2))
sigma = np.sqrt(np.log((mean**2 + sd**2) / (mean**2)))

# Create random data 
data_lognormal = np.random.lognormal(mu, sigma, size)
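
The distinction between the two sets of parameters can also be checked on sampled data: the raw values should have a mean and sd near 100 and 25, while their natural logarithms should have a mean and sd near \(\mu\) and \(\sigma\). A minimal sketch, assuming those target values and a seeded generator (used only for repeatability):

```python
import numpy as np

# Assumption: seeded generator so the check gives the same numbers each run.
rng = np.random.default_rng(seed=2)

mean, sd, size = 100, 25, 1000
mu = np.log(mean**2 / np.sqrt(mean**2 + sd**2))
sigma = np.sqrt(np.log((mean**2 + sd**2) / mean**2))

sample = rng.lognormal(mu, sigma, size)
log_sample = np.log(sample)  # lognormal values are positive, so the log is defined

print(f"raw data: mean = {sample.mean():.2f}, sd = {sample.std(ddof=1):.2f}")
print(f"log data: mean = {log_sample.mean():.4f}, sd = {log_sample.std(ddof=1):.4f}")
print(f"targets : mu   = {mu:.4f}, sigma = {sigma:.4f}")
```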

Now plot the data. Try changing the variables above to see how the graph changes.

plt.figure()
plt.hist(data_lognormal, nbins, ec="black")
plt.title('(Fig. 2) Lognormal Distribution')
plt.xlabel('Random Number')
plt.ylabel('Number of Occurrences')

[Figure 2: histogram of data_lognormal, titled "(Fig. 2) Lognormal Distribution", with x-axis "Random Number" and y-axis "Number of Occurrences"]

Section 3: Creating Random Uniform Data

Next, generate random uniform data with the same mean, standard deviation, and sample size as above. Refer to Lecture 2 on the uniform distribution or see Wikipedia: https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)

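For a uniform distribution on \([a, b]\) with density \(f(x) = \frac{1}{b-a}\), the mean is \(\mu = \frac{b+a}{2}\) and the standard deviation is \(\sigma = \frac{b-a}{\sqrt{12}}\).

Since \(\sqrt{12} = 2\sqrt{3}\), the half-width of the interval is \(\frac{b-a}{2} = \sigma\sqrt{3}\). So, \(a = \mu-\sigma\sqrt{3}\) and \(b = \mu+\sigma\sqrt{3}\).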
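
The bounds can be verified the same way as any solved equation, by substituting them back into the formulas for the mean and standard deviation. This short sketch assumes a mean of 100 and an sd of 25; `mean_back` and `sd_back` are names introduced here for illustration.

```python
import math

mean, sd = 100.0, 25.0

# Bounds of the uniform distribution from the formulas above
a = mean - sd * math.sqrt(3)
b = mean + sd * math.sqrt(3)

# Substitute the bounds back into the mean and standard deviation formulas
mean_back = (a + b) / 2
sd_back = (b - a) / math.sqrt(12)

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"recovered mean = {mean_back:.3f}, recovered sd = {sd_back:.3f}")
```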

mean = 100
sd = 25
size = 1000
nbins = 10

# Find the bounds for uniform data
a = mean - sd*np.sqrt(3)
b = mean + sd*np.sqrt(3)

# Create random data 
data_uniform = np.random.uniform(a, b, size)
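
As with the other distributions, the uniform sample can be checked against its targets. The sketch below is an illustration under stated assumptions (mean 100, sd 25, a seeded generator for repeatability): it confirms the values stay inside the bounds and counts how many fall into each of 10 equal-width bins, which should be roughly equal for uniform data.

```python
import numpy as np

# Assumption: seeded generator so the check gives the same numbers each run.
rng = np.random.default_rng(seed=3)

mean, sd, size, nbins = 100, 25, 1000, 10
a = mean - sd * np.sqrt(3)
b = mean + sd * np.sqrt(3)

sample = rng.uniform(a, b, size)

# Fix the histogram range to [a, b] so every bin has the same width
counts, edges = np.histogram(sample, bins=nbins, range=(a, b))

print(f"min = {sample.min():.2f}, max = {sample.max():.2f} (bounds {a:.2f} to {b:.2f})")
print(f"mean = {sample.mean():.2f}, sd = {sample.std(ddof=1):.2f}")
print("counts per bin:", counts)
```

Each bin is expected to hold about `size / nbins` points, with random scatter around that value.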