Lab 7-2: Markov Chains - ENSO Phases

Download the data file for this lab, ENSO_to2024.csv, which contains a record of the El Niño Southern Oscillation (ENSO) phase from 1900-2024.

You can read more about ENSO here and here.


Import the Python packages you'll need for this lab:

import pandas as pd
import numpy as np
import scipy.stats as stats
from scipy import sparse
import matplotlib.pyplot as plt
%matplotlib inline

Load the data file

df = pd.read_csv('../data/ENSO_to2022.csv', comment='#')
df.head(3)
   Water Year  ENSO Phase  Unnamed: 2
0        1900           1         NaN
1        1901           2         NaN
2        1902           2         NaN

A. Using the time series of the phase of the El Niño Southern Oscillation (ENSO) from 1900-2022, create a lag-1 Markov model of the ENSO phase.

Observed Phases of ENSO:

  • 1: warm (El Niño)

  • 2: neutral (ENSO neutral)

  • 3: cool (La Niña)

Count transitions between each of the three ENSO phases using scipy.sparse.csr_matrix() and then scipy.sparse.csr_matrix.todense().

# count the transitions from each state to the next

# convert transition counts to matrix form
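As a rough illustration of the counting step, here is a minimal sketch on a short, made-up phase sequence (the `phases` array is hypothetical; in the lab it would come from the ENSO phase column of `df`). It relies on the documented behavior that `csr_matrix`, when built from `(data, (row, col))` triples, sums duplicate entries, so placing a 1 at every (current phase, next phase) pair accumulates into transition counts.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy sequence of phases (1 = warm, 2 = neutral, 3 = cool).
# In the lab, this would be the ENSO phase column read from the CSV.
phases = np.array([1, 2, 2, 3, 3, 2, 1, 1, 2, 3])

# Pair each year's state with the following year's state.
# Subtract 1 so that phases 1..3 map to 0-based indices 0..2.
from_state = phases[:-1] - 1
to_state = phases[1:] - 1

# One entry of value 1 per observed transition; duplicates at the same
# (row, col) position are summed when the sparse matrix is built.
ones = np.ones(len(from_state), dtype=int)
counts_sparse = sparse.csr_matrix((ones, (from_state, to_state)), shape=(3, 3))

# Convert to dense matrix form: rows = current phase, columns = next phase.
counts = counts_sparse.todense()
print(counts)
```

Passing `shape=(3, 3)` explicitly keeps the matrix square even if some phase never appears as a starting or ending state in the sample.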

Normalize the transition matrix to get probabilities. This will create our lag-1 Markov Model.
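One way to sketch the normalization, assuming a 3x3 array of transition counts is already available (the `counts` values below are made up for illustration): divide each row by its row total so that every row becomes a probability distribution over next-year phases.

```python
import numpy as np

# Hypothetical transition counts: rows = current phase, columns = next phase.
# If the counts come from .todense(), np.asarray(...) converts the result
# to a plain ndarray first.
counts = np.asarray([[1, 2, 0],
                     [1, 1, 2],
                     [0, 1, 1]], dtype=float)

# Row totals, kept as a column vector so the division broadcasts row-wise.
row_totals = counts.sum(axis=1, keepdims=True)

# Transition probability matrix: P[i, j] = P(next = j+1 | current = i+1).
P = counts / row_totals
print(P)
```

This sketch assumes every phase occurs at least once as a starting state; a row of all zeros would produce a division by zero and would need separate handling.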

Compute cumulative sums along each row, and check that the final value in every row is 1. (We will use this CDF matrix below in a simulation of ENSO phases.)
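A minimal sketch of the cumulative-sum step, using a made-up transition probability matrix `P` (these values are placeholders, not results from the ENSO record):

```python
import numpy as np

# Hypothetical transition probability matrix; each row sums to 1.
P = np.array([[0.2, 0.5, 0.3],
              [0.1, 0.6, 0.3],
              [0.3, 0.4, 0.3]])

# Cumulative sum across the columns of each row gives a row-wise CDF.
cdf = np.cumsum(P, axis=1)
print(cdf)

# Sanity check: the last entry of every row should be 1 (within rounding).
print(np.allclose(cdf[:, -1], 1.0))
```

Using `np.allclose` rather than an exact equality test allows for small floating-point rounding in the sums.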


B. Using this Markov model and a random number generator, simulate 5,000 years of ENSO data.

# pick the number of years we want to simulate (5000)

# use a uniform random number for 5000 years

# start off in state 2, neutral
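The simulation loop might look something like the sketch below. It assumes a row-wise CDF matrix is already available (the `P` values here are made up), and it uses `np.random.default_rng` with a fixed seed purely so the example is repeatable. For each year, the uniform random number is compared against the CDF row of the previous year's phase to pick the next phase.

```python
import numpy as np

# Hypothetical transition probabilities and their row-wise CDF.
P = np.array([[0.2, 0.5, 0.3],
              [0.1, 0.6, 0.3],
              [0.3, 0.4, 0.3]])
cdf = np.cumsum(P, axis=1)

# Number of years to simulate.
n_years = 5000

# One uniform random number in [0, 1) per simulated year.
rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
u = rng.uniform(size=n_years)

# Start off in state 2 (neutral).
states = np.empty(n_years, dtype=int)
states[0] = 2

for t in range(1, n_years):
    # CDF row for the previous year's phase (phases are 1-based).
    row = cdf[states[t - 1] - 1]
    # Index of the first cumulative probability that exceeds u[t].
    idx = np.searchsorted(row, u[t], side="right")
    # Guard against rounding leaving the last CDF value slightly below 1.
    states[t] = min(idx, 2) + 1

print(states[:20])
```

Removing the seed (or changing it) gives a different realization each run, which is what "refreshing the numbers" amounts to.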

C. Using this randomly generated data, answer the following questions.

  • According to the model, what is the probability that three warm ENSO years would occur in a row?

  • What is the large-sample probability that three cool ENSO years would happen in a row?

(Try refreshing the numbers several times to increase the sample size if the condition never happens.)
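One possible way to estimate these probabilities from a simulated series is to count the fraction of overlapping three-year windows in which all three years share the given phase. This is only one interpretation of "three in a row", and the helper name `frac_three_in_a_row` and the short `demo` series below are hypothetical.

```python
import numpy as np

def frac_three_in_a_row(states, phase):
    """Fraction of overlapping 3-year windows in which every year equals `phase`."""
    states = np.asarray(states)
    hit = states == phase
    # A window starting at year t qualifies if years t, t+1, and t+2 all match.
    triples = hit[:-2] & hit[1:-1] & hit[2:]
    return triples.mean()

# Short made-up series for illustration (1 = warm, 2 = neutral, 3 = cool).
demo = np.array([1, 1, 1, 2, 3, 3, 3, 3, 2, 1])

print(frac_three_in_a_row(demo, 1))  # warm: 1 of the 8 windows
print(frac_three_in_a_row(demo, 3))  # cool: 2 of the 8 windows
```

Applied to the 5,000-year simulated series, the same function gives a sample estimate; averaging over several independent simulations should move it toward the large-sample value.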