1.022 Final Project: Modeling the nation-wide spread of COVID-19
================================================================

We have carefully crafted the final project to provide you with an
opportunity to gain experience with a real-world application of
networks. As a particular case of interest, we will design, evaluate,
and refine an epidemic model to predict the **total number of covid
cases in the US** over time.

.. _collecting-data:

===============================================
1. Collecting and plotting data on covid cases
===============================================

As the first step, we collect historical and current data on the total
number of covid cases across the nation via the *nytimes covid-19 bot*
(the ``nytimes`` helper module imported below). Run the code below to:

- get the daily record of the cumulative counts of covid cases at the
  state level
- process the gathered data to get daily and weekly counts of new cases
  at both the state and national levels

.. jupyter-execute::

   import numpy as np
   import pandas as pd
   import itertools
   import pickle
   from scipy.linalg import norm
   from scipy.linalg import expm
   from scipy.optimize import minimize
   import os
   from timeit import default_timer as timer
   import matplotlib.pyplot as plt
   plt.style.use('seaborn')

   import nytimes

   # some basic national demographics (including the population of each
   # state, to be used in normalizing the covid counts)
   state_geo = pd.read_csv('states_demographics.csv',
                           dtype={'state_fips': str}).set_index('state_id')

   raw_states_df, _ = nytimes.get_nyt_data()
   state_cases = nytimes.convert_state_df_to_ts(
       raw_states_df, quantity='cases').drop(columns=['state']).T
   state_cases.index = pd.to_datetime(state_cases.index)
   state_cases = state_cases[state_geo['state_name']]
   state_cases.columns = state_geo.index.copy()

   # daily new cases from the cumulative counts
   state_cases_daily = state_cases.diff()
   t_last_na = state_cases_daily.isna().any(axis=1).cumsum().idxmax()
   state_cases_daily.dropna(inplace=True)

   # weekly new cases, normalized by state population
   state_cases_weekly = 7*state_cases_daily.resample(
       '7D').mean().divide(state_geo['population'])
   cumul_offset = state_cases.loc[t_last_na].divide(state_geo['population'])
   state_cases_cumulative = state_cases_weekly.cumsum() + cumul_offset

   # national aggregates, normalized by the US population
   nation_cases_daily = state_cases_daily.sum(axis=1)
   nation_cases_weekly = 7*nation_cases_daily.resample('7D').mean()\
       /state_geo['population'].sum()
   nation_cumul_offset = state_cases.loc[t_last_na].sum()\
       /state_geo['population'].sum()
   nation_cases_cumulative = nation_cases_weekly.cumsum()\
       + nation_cumul_offset

.. jupyter-execute::
   :hide-code:

   t_clip = pd.Timestamp('2020-11-25')
   nation_cases_cumulative = nation_cases_cumulative[:t_clip]
   nation_cases_weekly = nation_cases_weekly[:t_clip]
   state_cases_cumulative = state_cases_cumulative[:t_clip]
   state_cases_weekly = state_cases_weekly[:t_clip]

The processed data that we will use in our design consists of the
weekly new covid cases and the cumulative number of cases, gathered in
the DataFrames ``state_cases_weekly`` and ``state_cases_cumulative``
for the state-level data, and in the two Series
``nation_cases_weekly`` and ``nation_cases_cumulative`` for the
country-level data, all starting on March 18 with a 7-day (``7D``)
frequency.
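To see concretely what the ``diff``/``resample`` pipeline above does, here is
a minimal, self-contained sketch on synthetic cumulative counts (the
two-state setup and all numbers are made up purely for illustration):

.. code-block:: python

   import numpy as np
   import pandas as pd

   # synthetic cumulative counts for two hypothetical states over 14 days
   dates = pd.date_range('2020-03-11', periods=14, freq='D')
   cumulative = pd.DataFrame(
       {'AA': np.arange(14)**2, 'BB': 3*np.arange(14)},
       index=dates)
   population = pd.Series({'AA': 1e6, 'BB': 2e6})

   daily = cumulative.diff().dropna()      # new cases per day
   # 7*mean over each 7-day bin == total new cases in that week
   weekly = 7*daily.resample('7D').mean()
   weekly_normalized = weekly.divide(population)  # fraction of population
   print(weekly_normalized)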
Here is a snapshot of the state-level weekly count of new covid cases
(``state_cases_weekly``):

.. csv-table::
   :header: state_id, AL, AZ, AR, CA
   :widths: 20, 20, 20, 20, 20

   2020-03-18, 0.000041, 0.000042, 0.000069, 0.000048
   2020-03-25, 0.000154, 0.000134, 0.000110, 0.000150
   2020-04-01, 0.000244, 0.000176, 0.000143, 0.000227

**Both the weekly and the cumulative number of cases are normalized by
state population for the state-level data, and by the US population for
the national-level data.** From now on and in the rest of this
manuscript, *number of cases* by default refers to the normalized value
(a number between 0 and 1), unless explicitly stated otherwise.

.. _task1.1:

.. topic:: Task 1.1

   Use the collected data to plot: (i) the weekly count of new covid
   cases in the US, and (ii) the cumulative number of cases in the US.

.. jupyter-execute::

   fig, axes = plt.subplots(2, 1)
   iter_axes = itertools.cycle(fig.axes)

   ax = next(iter_axes)
   ax.cla()
   ax.plot(nation_cases_weekly,
           label='national cases weekly', linewidth=4)
   ax.legend()

   ax = next(iter_axes)
   ax.cla()
   ax.plot(nation_cases_cumulative,
           label='national cases cumulative', linewidth=4)
   ax.legend()

.. _basic-SIR:

===============================================
2. Fitting a basic SIR model to the covid data
===============================================

.. _task2.1:

.. topic:: Task 2.1

   - Show that in the basic SIR model, as long as the size of the
     infection is very small (e.g., in the early stages of the spread),
     we can approximate the dynamics of the spread by:

     .. math::

        \begin{align*}
        \dot{S}(t) &= -\beta~I(t)\\
        \dot{I}(t) &= (\beta - \gamma)~I(t)
        \end{align*}

   - Use this simplified dynamics to show that:

     .. math::

        S(t) \approx S(0) - \frac{\beta}{\beta-\gamma}\big(e^{(\beta-\gamma)t}-1\big)~I(0)

Observe that the following three quantities are identical: the
(normalized) cumulative number of cases at time :math:`t`
:math:`\equiv` :math:`1-S(t)` :math:`\equiv` the fraction of the
population that is either infected or recovered/removed by time
:math:`t` (the *IR* population).

The function ``SIR`` implemented below simulates :math:`T` steps of the
SIR model based on the above approximation. Inputs to the ``SIR``
function:

- ``beta``: the average infection/contact rate in the population
- ``gamma``: the average recovery rate for each infected case
- ``IR_init``: the initial size of the infected-or-recovered/removed
  cases (:math:`I(0)+R(0)`)
- ``I_init``: the initial size of the infected cases (:math:`I(0)`)
- ``T``: the number of time steps to run the model. **Each step here
  represents a week** (7D)

The function will then output:

- ``IR_1_to_T``: the cumulative number of covid cases at
  :math:`t=1,\ldots,T` as predicted by the model

.. jupyter-execute::

   def SIR(beta, gamma, IR_init, I_init, T):
       # closed-form early-stage approximation derived in Task 2.1
       IR_1_to_T = (np.exp((beta-gamma)*np.arange(1, T+1))-1)*\
           beta*I_init/(beta-gamma) + IR_init
       return IR_1_to_T
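As a quick sanity check, here is a minimal usage sketch of ``SIR``
(assuming the function above has been defined); the parameter values
below are made-up illustrations, not fitted quantities. In the
early-growth regime the output should grow roughly exponentially:

.. code-block:: python

   # illustrative values only: beta, gamma, and the initial sizes here
   # are assumptions for this sketch
   beta_demo, gamma_demo = 0.8, 0.4
   IR0 = I0 = 1e-4   # 0.01% of the population initially infected
   print(SIR(beta_demo, gamma_demo, IR0, I0, T=5))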
The next step is to tune the parameters of the model, namely
:math:`\beta` and :math:`\gamma`, using the data we prepared earlier.
Let us **choose March 18 as** :math:`t=0`. Use the cumulative number of
cases in the US on March 18 (the top entry of
``nation_cases_cumulative``) as ``IR_init``. We further assume
``I_init`` = ``IR_init`` (i.e., :math:`\small R(0)\approx 0`). This can
be justified by noting that on March 18 the spread was at its very
early stages, hence only a small fraction of infected cases had
recovered by then.

The goal is to choose :math:`\beta` and :math:`\gamma` such that the
number of cases predicted by the model *stays close* to that observed
in the data. The standard approach is to choose a *loss function*
measuring how far the predictions are from the observed data, and then
tune the parameters so that they minimize the loss. Here, we search for
:math:`\beta` and :math:`\gamma` minimizing the sum of the squares of
the prediction errors for the US cumulative number of cases, over a
period of :math:`T=16` weeks (from March 25 to July 08):

.. math::

   {\underset{\beta,\,\gamma}{\rm minimize}} \sum_{t=1}^T \big(IR_{\rm predicted}(t)-IR_{\rm data}(t)\big)^2

Note that the March 18 data is already used in initializing the model
(``IR_init``). Here is the code implementing this loss function:

.. jupyter-execute::

   def SIR_LOSS(params, IR_init, I_init, observed_cumulative_cases):
       beta, gamma = params
       T = len(observed_cumulative_cases)
       predicted_IR = SIR(beta, gamma, IR_init, I_init, T)
       loss = norm(observed_cumulative_cases - predicted_IR)**2
       return loss

Inputs and outputs of the ``SIR_LOSS`` function:

- ``params``: passes the values of :math:`\beta` and :math:`\gamma` to
  the loss function
- ``IR_init``: the initial size of the infected-or-recovered cases
  (:math:`I(0)+R(0)`)
- ``I_init``: the initial size of the infected cases (:math:`I(0)`)
- ``observed_cumulative_cases``: the cumulative number of covid cases
  in the US as observed in the data

The function will then output:

- ``loss``: the sum of squares of the prediction errors in the
  cumulative number of cases in the US as predicted by the model

.. _task2.2:

.. topic:: Task 2.2

   - Use the function ``minimize`` from the *scipy.optimize* package to
     find values of :math:`\beta` and :math:`\gamma` minimizing the loss
     function ``SIR_LOSS`` over a period of :math:`T=16` weeks, from
     March 25 to July 08. (Choose data from ``nation_cases_cumulative``
     for this period to pass as ``observed_cumulative_cases`` to the
     loss function.)
   - Pass the values of :math:`\beta` and :math:`\gamma` you obtain to
     the ``SIR`` function to predict the cumulative number of covid
     cases in the US for 36 weeks starting from March 18.
   - Plot the predictions together with the observed data for both
     (i) the cumulative number of cases and (ii) weekly new cases
     (use ``np.diff()`` to obtain the latter).

.. jupyter-execute::

   n_train = 16
   IR_init = I_init = nation_cases_cumulative[0]
   observed_cumul_cases = nation_cases_cumulative[1:n_train+1].values

   params_init = [0.1, 0.01]
   res = minimize(SIR_LOSS, params_init,
                  args=(IR_init, I_init, observed_cumul_cases))
   beta_SIR, gamma_SIR = res['x']
   print('success: {}, beta: {:.3f}, gamma: {:.3f}'.format(
       res['success'], beta_SIR, gamma_SIR))

.. jupyter-execute::

   # predict over the entire data range and visualize
   T = len(nation_cases_cumulative[1:])
   predicted_cumul_cases = SIR(beta_SIR, gamma_SIR,
                               nation_cases_cumulative[0],
                               nation_cases_cumulative[0], T)
   predicted_cumul_cases = \
       np.append(nation_cases_cumulative[0], predicted_cumul_cases)
   predicted_weekly_cases = np.diff(predicted_cumul_cases)

   fig, axes = plt.subplots(2, 1)
   iter_axes = itertools.cycle(fig.axes)

   ax = next(iter_axes); ax.cla()
   ax.plot(nation_cases_weekly[1:],
           label='{}: weekly'.format('nation'), linewidth=2)
   ax.plot(nation_cases_weekly.index[1:],
           predicted_weekly_cases, label='pred', linewidth=2)
   ax.legend()

   ax = next(iter_axes); ax.cla()
   ax.plot(nation_cases_cumulative,
           label='{}: cumulative'.format('nation'), linewidth=2)
   ax.plot(nation_cases_cumulative.index,
           predicted_cumul_cases, label='pred', linewidth=2)
   ax.legend()
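If you want to sanity-check the optimizer's answer, one optional
approach (not required by the task) is to evaluate ``SIR_LOSS`` on a
coarse grid around the fitted values and confirm that the fitted pair
sits near the grid minimum. A minimal sketch, reusing the objects
defined in the cells above:

.. code-block:: python

   # optional sanity check: scan the loss on a coarse grid around the fit
   betas = np.linspace(0.5*beta_SIR, 1.5*beta_SIR, 41)
   gammas = np.linspace(0.5*gamma_SIR, 1.5*gamma_SIR, 41)
   loss_grid = np.array([[SIR_LOSS([b, g], IR_init, I_init,
                                   observed_cumul_cases)
                          for b in betas] for g in gammas])
   i, j = np.unravel_index(loss_grid.argmin(), loss_grid.shape)
   print('grid minimum near beta={:.3f}, gamma={:.3f}'.format(
       betas[j], gammas[i]))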
===============================================================
3. Refining the design by incorporating SafeGraph mobility data
===============================================================

The performance of the basic SIR design reveals the need to account for
the behavioral dynamics of a population dealing with a pandemic, in
order to make more reliable predictions. In this part, we refine our
initial design by incorporating mobility data provided by SafeGraph, in
order to capture changes in inter-state mobility across the US in the
covid era.

To save you time, we have already processed and trimmed daily US
mobility data starting from January 1st up until now. The end result,
accessible via ``interstate_mobility.pickle``, is a dictionary
providing weekly-averaged inter-state mobility data for the period of
January 1 - November 11 (a total of 46 weeks). Each item in the
dictionary is a pair ``(date, df)``, where:

- ``df.loc[st1, st2]`` counts the number of devices from state ``st1``
  that visited state ``st2``, averaged over the week starting from
  ``date``.
- the diagonal entries of ``df`` count the number of devices not
  leaving their home state, averaged over the same week.

.. jupyter-execute::

   with open('interstate_mobility.pickle', 'rb') as f_handle:
       interstate_mobility_weekly = pickle.load(f_handle)

   # let's look inside
   date = pd.Timestamp('2020-03-18')
   pd.options.display.float_format = '{:.2f}'.format
   display(interstate_mobility_weekly[date].iloc[:4, :6])

.. _task3.1:

.. topic:: Task 3.1

   Calculate and plot the average out-of-state travel by adding up the
   off-diagonal entries in each DataFrame and normalizing the result by
   the number of states. Are there any noticeable fluctuations in
   travel volume in response to covid? Summarize your observations from
   the plot (max. 5 lines).

.. jupyter-execute::

   avg_inflow = {}
   for date, df in interstate_mobility_weekly.items():
       # total off-diagonal travel, averaged over the states
       avg_inflow[date] = (np.sum(df.values) - np.trace(df))/df.shape[0]
   avg_inflow = pd.Series(avg_inflow)

   plt.plot(avg_inflow,
            label='average out of state travel', linewidth=4)
   plt.legend()

A networked SIR model with time-varying inter-state mobility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is an enhanced version of the networked SIS model discussed in the
lecture, capturing the effect of changes in inter-state mobility over
time. The population of any state :math:`j` at time :math:`t` is
composed of (i) those who have travelled to state :math:`j` from other
states, and (ii) those whose home state is :math:`j` and who have not
travelled elsewhere. We use :math:`n_{ij}(t)` to denote the number of
individuals with home state :math:`i` who are in state :math:`j` at
time :math:`t`. The infected and susceptible populations currently in
state :math:`j` are then :math:`\sum_{i=1}^{n}n_{ij}(t)I_i(t)` and
:math:`\sum_{i=1}^{n}n_{ij}(t)S_i(t)`. The normalized sizes become:

.. math::

   \begin{align}
   {\hat I}_j(t) &= \sum_{i=1}^{n}c_{ij}(t)I_i(t)\\
   {\hat S}_j(t) &= \sum_{i=1}^{n}c_{ij}(t)S_i(t),
   \end{align}

where :math:`c_{ij}(t):=\frac{n_{ij}(t)}{\sum_{i=1}^{n}n_{ij}(t)}`.
Observe that :math:`c_{ij}(t)` **normalizes the inflow** to state
:math:`j` to 1. In other words, the matrix :math:`C(t)=[c_{ij}(t)]` is
*column-stochastic*.
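As a concrete illustration of the column-stochastic normalization,
here is a short sketch that mirrors the construction used in the code
later on (it assumes ``interstate_mobility_weekly`` has been loaded as
above):

.. code-block:: python

   # build C(t) for one week by normalizing each column of the mobility
   # matrix, so that the total inflow into every state sums to 1
   df = interstate_mobility_weekly[pd.Timestamp('2020-03-18')]
   C = df.div(df.sum(axis=0), axis=1).values
   print(np.allclose(C.sum(axis=0), 1.0))   # True: columns sum to 1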
The size of new infections in state :math:`j` is
:math:`\beta_j {\hat I}_j(t){\hat S}_j(t)`. During the early stages, we
can use the first-order approximation :math:`\beta_j {\hat I}_j(t)` for
the size of new infections, out of which the share of travellers from
state :math:`i` is :math:`\beta_j {\hat I}_j(t)\, n_{ij}(t)` newly
infected individuals, who take the virus back with them to their home
state :math:`i`. The aggregate size of new cases among **residents of
state** :math:`{\bf i}` thus becomes
:math:`\sum_{j=1}^{n}\beta_j {\hat I}_j(t)\, n_{ij}(t)`, or
equivalently :math:`\sum_{j=1}^{n}\beta_j {\hat I}_j(t)\, \rho_{ij}(t)`
when normalized by their population size
:math:`\sum_{j=1}^{n} n_{ij}(t)`. Note that here
:math:`\rho_{ij}(t):=\frac{n_{ij}(t)}{\sum_{j=1}^{n}n_{ij}(t)}`. That
is, :math:`\rho_{ij}(t)` **normalizes the outflow** from state
:math:`i` to 1. In other words, the matrix
:math:`{\rm P}(t)=[\rho_{ij}(t)]` is *row-stochastic*.

The continuous-time approximate update rule for state :math:`i` then
becomes:

.. math::

   \begin{align*}
   \dot{S}_i(t) &= -\sum_{j=1}^{n}\rho_{ij}(t)\,\beta_j {\hat I}_j(t)\\
   \dot{I}_i(t) &= \sum_{j=1}^{n}\rho_{ij}(t)\,\beta_j {\hat I}_j(t) - \gamma I_i(t)
   \end{align*}

.. _task3.2:

.. topic:: Task 3.2

   - Verify that the dynamics described above can be written in the
     following compact form:

     .. math::

        \begin{align*}
        \dot{S}(t) &= -{\rm P}(t)~{\rm diag}(\beta)~C^T(t)~I(t)\\
        \dot{I}(t) &= \big({\rm P}(t)~{\rm diag}(\beta)~C^T(t)-\gamma~{\rm Id.}\big)~I(t)
        \end{align*}

     where :math:`{\rm P}(t)=[\rho_{ij}(t)]` and
     :math:`C(t)=[c_{ij}(t)]` are the row-stochastic and
     column-stochastic versions of the inter-state mobility matrix,
     respectively. (A numerical sanity check of this identity is
     sketched right after this task.)

   - Can you justify the use of different contact rates, but a common
     recovery rate, for the different states?
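An optional numerical sanity check of the compact form (this does not
replace the algebraic verification asked for in Task 3.2): the random
stochastic matrices, rates, and infection sizes below are made up
purely for the check.

.. code-block:: python

   # compare sum_j rho_ij * beta_j * Ihat_j with (P diag(beta) C^T I)_i
   rng = np.random.default_rng(0)
   n = 4
   M = rng.random((n, n))                      # raw mobility counts
   P_chk = M/M.sum(axis=1, keepdims=True)      # row-stochastic
   C_chk = M/M.sum(axis=0, keepdims=True)      # column-stochastic
   beta_chk, I_chk = rng.random(n), rng.random(n)

   I_hat = C_chk.T.dot(I_chk)                  # I_hat_j = sum_i c_ij I_i
   elementwise = np.array([sum(P_chk[i, j]*beta_chk[j]*I_hat[j]
                               for j in range(n)) for i in range(n)])
   compact = P_chk.dot(np.diag(beta_chk)).dot(C_chk.T).dot(I_chk)
   print(np.allclose(elementwise, compact))    # True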
The solution to the above continuous-time dynamics can be written as:

.. math::

   \begin{align*}
   I(t) &= e^{\int_0^{t}\big({\rm P}(\tau)~{\rm diag}(\beta)~C^T(\tau)-\gamma~{\rm Id.}\big)d\tau}~I(0)\\
   S(t) &= S(0) - \int_0^{t} {\rm P}(\tau)~{\rm diag}(\beta)~C^T(\tau)~I(\tau)~d\tau
   \end{align*}

where :math:`{\rm Id.}` stands for the *identity matrix*. We use the
following discrete-time approximation in our analysis:

.. math::

   \begin{align*}
   I(t+1) &= e^{\sum_{\tau=0}^{t}\big({\rm P}(\tau)~{\rm diag}(\beta)~C^T(\tau)-\gamma~{\rm Id.}\big)}~I(0)\\
   S(t+1) &= S(0) - \sum_{\tau=0}^{t} {\rm P}(\tau)~{\rm diag}(\beta)~C^T(\tau)~I(\tau)
   \end{align*}

The function ``Networked_SIR`` implemented below simulates :math:`T`
steps of our networked SIR model based on the above approximation.
Inputs to the ``Networked_SIR`` function:

- ``beta``: 1D array whose elements are the infection rates of the
  different states
- ``gamma``: the common recovery rate
- ``IR_init``: 1D array representing the initial size of the IR
  population in each state
- ``I_init``: 1D array representing the initial size of the infected
  population in each state
- ``C_matrices``: :math:`T\times n_{\rm states}\times n_{\rm states}`
  array. ``C_matrices[t]`` represents the inter-state mobility matrix
  with inflows normalized to 1 (column sum = 1) at time :math:`t`, for
  :math:`t=0,1,\ldots,T-1`
- ``P_matrices``: :math:`T\times n_{\rm states}\times n_{\rm states}`
  array. ``P_matrices[t]`` represents the inter-state mobility matrix
  with outflows normalized to 1 (row sum = 1) at time :math:`t`, for
  :math:`t=0,1,\ldots,T-1`

The number of time steps :math:`T` is inferred inside the function as
``len(C_matrices)``, and the model makes predictions for
:math:`t=1,\ldots,T`. Each step again represents a week (7D).

The function will then output:

- a tuple (``IR``, ``I``) of two :math:`T\times n_{\rm states}` arrays
  giving the model predictions for the size of the IR and I populations
  in each state for :math:`t=1,\ldots,T`

.. jupyter-execute::

   def Networked_SIR(beta, gamma, IR_init, I_init,
                     C_matrices, P_matrices):
       n_states = len(I_init)
       T = len(C_matrices)
       I = np.zeros((T+1, n_states))
       IR = np.zeros((T+1, n_states))
       I[0], IR[0] = I_init.copy(), IR_init.copy()
       for t in range(T):
           # P(t) diag(beta) C(t)^T
           P_beta_CT_t = P_matrices[t].dot(
               np.diag(beta).dot(C_matrices[t].T))
           # one-week matrix-exponential step for the infected sizes
           I[t+1] = np.dot(expm(
               P_beta_CT_t - gamma*np.eye(n_states)), I[t])
           IR[t+1] = IR[t] + np.dot(P_beta_CT_t, I[t])
       return IR[1:], I[1:]

Here, we search for
:math:`\beta = [\beta_1,\ldots,\beta_{n_{\rm states}}]` and
:math:`\gamma` minimizing the following loss:

.. math::

   {\underset{\beta,\,\gamma}{\rm minimize}} \sum_{t=1}^T \sum_{i=1}^{n_{\rm states}}\frac{\big(\text{cumulative cases from data}(i,t) - \text{predicted cumulative cases}(i,t)\big)^2}{n_{\rm states}\times T}

This is the average of the losses of the individual states' predictions
for the **raw number of cases**, not the normalized counts (can you
guess why we do not use normalized counts here?). Here is the code
implementing this loss function:

.. jupyter-execute::

   def Networked_SIR_LOSS(params, IR_init, I_init,
                          C_matrices, P_matrices, observed_cumul_cases):
       beta, gamma = params[:-1], params[-1]
       n_states = len(I_init)
       T = len(P_matrices)
       predicted_IR, predicted_I = Networked_SIR(beta, gamma,
           IR_init, I_init, C_matrices, P_matrices)
       # convert the normalized counts back to raw counts
       error_in_cumul_cases = (observed_cumul_cases - predicted_IR)\
           *state_geo['population'].values
       # normalize the error by the number of states and T.
       # The division by 1000 is just to keep the error from blowing up!
       loss = norm(error_in_cumul_cases/(n_states*T*1000))**2
       return loss

Inputs and outputs of the ``Networked_SIR_LOSS`` function:

- ``params``: 1D array passing the model parameters :math:`\beta` and
  :math:`\gamma` to the loss function
- ``IR_init``: the initial size of the infected-or-recovered cases for
  each state
- ``I_init``: the initial size of the infected cases for each state
- ``C_matrices``: :math:`T\times n_{\rm states}\times n_{\rm states}`
  array. ``C_matrices[t]`` represents the inter-state mobility matrix
  with inflows normalized to 1 (column sum = 1) at time :math:`t`, for
  :math:`t=0,1,\ldots,T-1`
- ``P_matrices``: :math:`T\times n_{\rm states}\times n_{\rm states}`
  array. ``P_matrices[t]`` represents the inter-state mobility matrix
  with outflows normalized to 1 (row sum = 1) at time :math:`t`, for
  :math:`t=0,1,\ldots,T-1`
- ``observed_cumul_cases``: :math:`T\times n_{\rm states}` array
  passing the observed cumulative count of covid cases in each state
  for :math:`t=1,\ldots,T`. **Note that we still pass the normalized
  values of observed_cumul_cases to the loss function; the conversion
  to raw numbers is implemented inside the function.**

The function will then output:

- ``loss``: the sum of squares of the prediction errors in cumulative
  case counts at the state level, averaged over all the states
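Before launching the full optimization in Task 3.3 below (which has
:math:`n_{\rm states}+1` free parameters and can take a while), a cheap
and optional sanity check is to evaluate the loss once at the initial
guess and confirm that the array shapes line up. A sketch, assuming
``C_matrices``, ``P_matrices``, ``IR_init``, ``I_init``, ``n_train``,
and ``params_init`` have already been built as in the code cells that
follow:

.. code-block:: python

   # optional pre-flight check: one loss evaluation at the initial guess
   loss0 = Networked_SIR_LOSS(
       params_init, IR_init, I_init,
       C_matrices[:n_train], P_matrices[:n_train],
       state_cases_cumulative[1:1+n_train].values)
   print('loss at initial guess: {:.4f}'.format(loss0))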
.. topic:: Task 3.3

   - Use the function ``minimize`` from the *scipy.optimize* package to
     find values of :math:`\beta=[\beta_1,\ldots,\beta_{n}]` and
     :math:`\gamma` minimizing the loss function ``Networked_SIR_LOSS``
     over a period of :math:`T=16` weeks, from March 25 to July 08.
     (Choose data from ``state_cases_cumulative`` for this period to
     pass as ``observed_cumul_cases`` to the loss function.)
   - Make a bar plot of the :math:`\beta` values versus their state ids
     (use ``state_geo.index`` for the state ids).
   - Pass the values obtained for :math:`\beta` and :math:`\gamma` to
     the ``Networked_SIR`` function and use its output to predict the
     cumulative number of covid cases in the US for 36 weeks starting
     from March 18.
   - Plot the predictions together with the observed data for both
     (i) the cumulative number of cases and (ii) weekly new cases
     (use ``np.diff()`` to obtain the latter).

.. jupyter-execute::

   t_init = state_cases_cumulative.index[0]
   datelist = avg_inflow[t_init:].index

   # column-stochastic (inflow-normalized) mobility matrices
   C_matrices = np.array([interstate_mobility_weekly[date].div(
       interstate_mobility_weekly[date].sum(axis=0), axis=1).values
       for date in datelist])
   # row-stochastic (outflow-normalized) mobility matrices
   P_matrices = np.array([interstate_mobility_weekly[date].div(
       interstate_mobility_weekly[date].sum(axis=1), axis=0).values
       for date in datelist])

   IR_init = I_init = state_cases_cumulative.head(1).values[0]
   n_states = len(I_init)
   beta, gamma = beta_SIR*np.ones(n_states), gamma_SIR
   params_init = np.append(beta, gamma)

.. jupyter-execute::

   start_time = timer()
   res = minimize(Networked_SIR_LOSS, params_init,
                  args=(IR_init, I_init,
                        C_matrices[:n_train], P_matrices[:n_train],
                        state_cases_cumulative[1:1+n_train].values))
   elapsed = timer() - start_time
   print('elapsed time: {:.2f} seconds, success: {}'.format(
       elapsed, res['success']))
   beta, gamma = res['x'][:-1], res['x'][-1]

.. jupyter-execute::

   plt.figure(1, figsize=(18, 6))
   plt.bar(state_geo.index, beta, label='beta')
   plt.legend()

.. jupyter-execute::

   predicted_IR, predicted_I = Networked_SIR(beta, gamma,
                                             IR_init, I_init,
                                             C_matrices, P_matrices)
   population_weights = state_geo['population'].div(
       state_geo['population'].sum())
   US_predicted_IR = (predicted_IR*population_weights.values).sum(axis=1)
   predicted_weekly_cases = np.diff(
       np.concatenate([IR_init.reshape(1, -1), predicted_IR]), axis=0)
   US_predicted_weekly_cases = \
       (predicted_weekly_cases*population_weights.values).sum(axis=1)
   n_preds = len(US_predicted_weekly_cases)

   fig, axes = plt.subplots(2, 1)
   iter_axes = itertools.cycle(fig.axes)

   ax = next(iter_axes)
   ax.cla()
   ax.plot(nation_cases_weekly[:n_preds+1],
           label='US weekly cases', linewidth=4)
   ax.plot(nation_cases_weekly.index[1:n_preds+1],
           US_predicted_weekly_cases, label='predicted', linewidth=2)
   ax.legend()

   ax = next(iter_axes)
   ax.cla()
   ax.plot(nation_cases_cumulative[:n_preds+1],
           label='US cumulative cases', linewidth=4)
   ax.plot(nation_cases_cumulative.index[1:n_preds+1],
           US_predicted_IR, label='predicted', linewidth=2)
   ax.legend()

Model-based prediction of the efficacy of local state regulations and social practices
---------------------------------------------------------------------------------------

Local and state-level policies and regulations, such as stay-at-home
orders and working from home, as well as social practices such as
social distancing and public mask-wearing, can significantly affect the
local infection/contact rates. As the last step of this project, we
will learn how to use our model to estimate the impact of improving
infection/contact rates on the size of the infection.

.. topic:: Task 4.1

   Suppose that by means of local regulations and promoting social
   awareness you could reduce the infection rates from
   :math:`\beta=[\beta_1,\ldots,\beta_n]` to :math:`c\,\beta` for some
   :math:`0<c<1`. Use the fitted model to predict how such a reduction
   would affect the size of the infection.
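As a starting point for Task 4.1, here is a minimal sketch of the kind
of experiment the task calls for: rerunning ``Networked_SIR`` with the
fitted parameters but with :math:`\beta` scaled down by a factor
:math:`c`. The particular values of ``c`` below are illustrative
assumptions, and the sketch reuses the fitted ``beta``, ``gamma``, and
the mobility matrices from the cells above:

.. code-block:: python

   # sketch: predicted US cumulative cases under reduced contact rates
   for c in [1.0, 0.8, 0.6]:        # c = 1.0 reproduces the fitted model
       IR_c, _ = Networked_SIR(c*beta, gamma, IR_init, I_init,
                               C_matrices, P_matrices)
       US_IR_c = (IR_c*population_weights.values).sum(axis=1)
       plt.plot(nation_cases_cumulative.index[1:len(US_IR_c)+1],
                US_IR_c, label='c = {:.1f}'.format(c), linewidth=2)
   plt.legend()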