diff --git a/.nojekyll b/.nojekyll index 1f9130b..58766e1 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -d7a0f323 \ No newline at end of file +fa893d75 \ No newline at end of file diff --git a/Class Notebooks/01A_EDA.html b/Class Notebooks/01A_EDA.html index 983bb6d..863c62e 100644 --- a/Class Notebooks/01A_EDA.html +++ b/Class Notebooks/01A_EDA.html @@ -8,7 +8,7 @@ -
Welcome to the first workbook for Module 2 of this course, covering Exploratory Data Analysis, or EDA.
In class, we learned that EDA is the process of examining your data to:
The timeline for completing these workbooks will be given on the training website and communicated to you in class. Unlike the Foundations Module workbooks, these workbooks should be completed as homework after we have discussed the material in class.
This workbook will cover both SQL and R coding concepts, so we need to set up our environment to connect to the proper database and load R packages that extend the base R environment. Typically, throughout these workbooks, we use SQL for the majority of data exploration and creation of the analytic frame, and then read that analytic frame into R for the descriptive analysis and visualization.
Note: If you aren’t concerned with the technical setup of this workbook, please feel free to skip ahead to the next section, WI PROMIS data: ds_wi_dwd.promis.
For this code to work, you need to have an .Renviron file in your user folder (i.e. U:\John.Doe.P00002) containing your username and password, where John.Doe.P00002 is replaced with your username and xxxxxxxxxx is replaced with your password (both still in quotes!). The setup of this code is nearly identical to that required in the Foundations Module workspace - however, DBUSER should now end with .T00111 instead of .T00112.
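As a rough sketch of what this setup looks like in practice, the code below assumes the .Renviron file defines DBUSER and DBPASSWD, and that the connection is made with the DBI and odbc packages against a Redshift DSN; the exact driver, DSN name, and connection arguments used in the ADRF may differ from what is shown here.

# load packages beyond base R (DBI/odbc for the database connection, tidyverse for analysis)
library(DBI)
library(odbc)
library(tidyverse)

# pull the credentials stored in the .Renviron file in your user folder
dbuser <- Sys.getenv("DBUSER")
dbpasswd <- Sys.getenv("DBPASSWD")

# connect to Redshift - "Redshift" is a placeholder DSN; use the one provided in class
con <- dbConnect(odbc::odbc(), dsn = "Redshift", uid = dbuser, pwd = dbpasswd)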
WI PROMIS data: ds_wi_dwd.promis
The primary dataset we will use in this class is the Wisconsin PROMIS data. The PROMIS (Program for Measuring Insured Unemployed Statistics) data, stored on Redshift as ds_wi_dwd.promis, provides information on unemployment insurance claimants in Wisconsin. Specifically, according to the LAUS Extraction Guide, the data includes “initial claims, additional initial claims, and continued claims that were either new or updated.”
Using the data dictionary, identify one or more further variables from the PROMIS data that might be relevant to your group’s analysis. Think through what these variables “should” look like, as well as what issues might arise. Working individually or with your group, examine the distribution of these variables. Document any EDA-related concerns and findings in your team’s project template. Brainstorm about what might be causing these issues and how they could impact your analysis.
ds_wi_dwd.ui_wage
We’re now going to apply these same EDA concepts to a second dataset, Wisconsin’s UI wage records, which are stored on Redshift as ds_wi_dwd.ui_wage.
We will keep the narrative of our exploration far briefer in this section. You are encouraged to read through the following output and think about how it pertains to the discussions that we had above.
As with the PROMIS data, use the data dictionary to identify one or more further variables from the UI wage records that might be relevant to your group’s analysis. Think through what these variables “should” look like, as well as what issues might arise. Working individually or with your group, examine the distribution of these variables. Document any EDA-related concerns and findings in your project template. Brainstorm about what might be causing these issues and how they could impact your analysis.
The workbook provides a structure for you to start your EDA process on the data within the scope of your project. The data coverage and row definition for the two primary datasets in this training are available, allowing you to focus on evaluating the distribution of variables potentially relevant to your analysis. The data coverage is particularly essential for project ideas linking the two datasets, as you will want to select a set of years, quarters, and weeks that are available in both datasets.
As you evaluate variable distributions, you can start by repurposing the code in these sections. There are code snippets for distributions of numeric, time-based, and categorical variables that may be appropriate depending on the type of column you are interested in exploring.
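To make this concrete, here is a hedged sketch of the kind of snippet you might repurpose, with a categorical variable tabulated in SQL and a numeric variable summarized in R; the column names gender and weekly_benefit_amount are placeholders, so substitute variables from the data dictionary that matter for your project.

# distribution of a categorical variable (column name is a placeholder)
qry <- "
SELECT gender, COUNT(*) AS n
FROM ds_wi_dwd.promis
GROUP BY gender
ORDER BY n DESC
"
dbGetQuery(con, qry)

# distribution of a numeric variable, summarized after reading into R
qry <- "
SELECT weekly_benefit_amount
FROM ds_wi_dwd.promis
"
wba <- dbGetQuery(con, qry)
summary(wba$weekly_benefit_amount)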
In doing so, as recommended in the checkpoints, note your findings in your team’s project template. As your project progresses, it will be helpful to look back at these notes, especially in thinking through how to most accurately and best communicate your team’s final product to an external audience. Ultimately, the EDA process is an essential step in the project development lifecycle, as it provides helpful contextual information on the variables you may choose to use (or not use) in your analysis.
For all of these steps, remember not to take notes or discuss exact results outside the ADRF. Instead, create notes or output inside the ADRF, and store them either in your U: drive or in your team’s folder on the P: drive. When discussing results with your team, remember to speak broadly, and instead direct them to look at specific findings within the ADRF. And, as always, feel free to reach out to the Coleridge team if you have any questions as you get used to this workflow!
AR EDA Notebook (link to come)
@@ -946,7 +958,7 @@Our next notebook in Module 2 will build off the EDA concepts discussed in the first one, extending the years, quarters, and weeks as part of the data coverage component to a method rooted in a specific moment in time - cross-section analysis. A cross-section allows us to look at a slice of our data in time so we can evaluate the stock of observations, just at that particular snapshot. Through the remainder of the class notebooks, we will apply each topic to the same focused research topic, all aimed at better understanding unemployment to reemployment pathways for a specific set of claimants receiving benefits after COVID-imposed restrictions were lifted in Wisconsin.
Composing a cross-section enables a broad understanding of volume and, in this context, claimant composition. Especially as a workforce board, it can be immensely useful to understand the common characteristics of those receiving UI benefits, regardless of benefit duration, particularly when evaluating workforce alignment scenarios to identify promising job matches between prospective employees and employers.
Cross-section analyses are limited in providing a deep understanding of experiences over time, though, because they track stocks of observations at certain points in time rather than following observations consistently throughout a time period. A different analysis method is more appropriate for a longitudinal study, one that we will introduce in the next notebook. At a minimum, even for those intending to evaluate claimant experiences longitudinally, cross-sections can provide important context.
Here, we will reintroduce the code required to set up our environment to connect to the proper database and load certain packages. If you aren’t concerned with the technical setup of this workbook, please feel free to skip ahead to the next section, Cross-section.
For this code to work, you need to have an .Renviron file in your user folder (i.e. U:\John.Doe.P00002) containing your username and password.
Even though we will eventually build out a longitudinal study for claimants starting to receive UI benefits after COVID-related restrictions ended in the state, starting with a cross-sectional analysis will help us better understand the dynamics of the entire set of individuals receiving UI benefits at this time. Here, we aim to evaluate this stock of claimants in a variety of ways:
If you think a cross-sectional analysis would be helpful for your group project, identify variables, or combinations of variables, you’d like to look into after developing your cross-section. Working individually or with your group, if you end up developing a cross-section, examine the distribution of these variables. Document any concerns and findings in your team’s project template, and think about how you may want to contextualize these findings within your overall project.
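If it helps to see the shape of such a query, the sketch below pulls a single-week snapshot of claimants from the PROMIS data; the week_ending_date and person_id columns and the example date are assumptions, not the actual variable names.

# cross-section: all claim records for one reference week (names and date are placeholders)
qry <- "
SELECT *
FROM ds_wi_dwd.promis
WHERE week_ending_date = '2022-03-26'
"
cross_section <- dbGetQuery(con, qry)

# number of distinct claimants in the snapshot
n_distinct(cross_section$person_id)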
This workbook applies the concepts of a cross-sectional analysis to Wisconsin’s PROMIS data and discusses some of the considerations and potential of such an investigation. Even if your team’s ultimate plan is to perform a longitudinal analysis, a cross-sectional approach may be useful. If your team deems it appropriate to develop a cross-section, you are encouraged to repurpose as much code as possible in developing your initial snapshot and subsequent descriptive analysis.
As you work through your project, it is recommended that you add your thoughts and findings to your team’s project template in the ADRF.
Tian Lou, & Dave McQuown. (2021, March 8). Data Exploration for Cross-sectional Analysis using Illinois Unemployment Insurance Data. Zenodo. https://doi.org/10.5281/zenodo.4588936
Tian Lou, & Dave McQuown. (2021, March 8). Data Visualization using Illinois Unemployment Insurance Data. Zenodo. https://doi.org/10.5281/zenodo.4589040
Census NAICS codes. https://www.census.gov/naics/
@@ -828,12 +840,12 @@Welcome to the second notebook of Module 2 of this course! Here, we will begin the process of cohort creation for our research topic spanning the entire series of class notebooks, which is focused on better understanding unemployment to reemployment pathways for a specific set of claimants receiving benefits after COVID-imposed restrictions were lifted in Wisconsin.
Previously, we applied a cross-sectional analysis to the PROMIS data, which allowed us to better understand the volume of individuals interacting with the unemployment insurance (UI) system at a specific moment in time. Since cross-sections are restricted to particular snapshots and do not account for shocks, though, they are limited in providing a framework for tracking experiences over time.
A separate method is more appropriate for a longitudinal analysis: cohort analysis. In creating a cohort, we will denote a reference point where each member of our cohort experienced a common event - this could be entry into a program, exit from a program, or any other shared experience across a set of observations. With this setup, we can better understand and compare the experiences of those encountering the same policies and economic shocks at the same time, especially across different subgroups.
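As a loose illustration of what an anchor point can look like in code, the sketch below selects individuals whose benefit year began in the last week of Q1 2022, the shared event used later in this notebook series; person_id and benefit_year_start are hypothetical column names, and the exact dates would come from your own cohort decisions.

# define the cohort by a common anchor point (column names and dates are placeholders)
qry <- "
SELECT DISTINCT person_id, benefit_year_start
FROM ds_wi_dwd.promis
WHERE benefit_year_start BETWEEN '2022-03-20' AND '2022-03-26'
"
cohort <- dbGetQuery(con, qry)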
@@ -307,8 +319,8 @@This notebook is concerned with the first step, as we will walk through the decision rules we will use to define a cohort from the raw microdata aimed at helping us answer our research question. The following notebooks will leverage this initial cohort as we build out the rest of the analysis.
Here, we will reintroduce the code required to set up our environment to connect to the proper database and load certain packages. If you aren’t concerned with the technical setup of this workbook, please feel free to skip ahead to the next section, Defining a Cohort.
For this code to work, you need to have an .Renviron file in your user folder (i.e. U:\John.Doe.P00002) containing your username and password.
Before writing code for creating and exploring our cohort, it’s crucial to think through the decisions from a data literacy standpoint. Again, the key idea here is to define a set of individuals with a consistent “anchor point” in the data so we can follow them longitudinally.
First, we have to think through the underlying set of observations we want to track over time and where they exist. Fundamentally, this ties back to identifying our original population of interest.
Now that we have developed our cohort decisions, we can start building out the cohort. We will do this in two steps:
We will cover this calculation, and others, in the upcoming longitudinal analysis notebook.
This workbook covers the conceptual approach for developing an appropriate cohort aimed at informing a specific research topic. As you work with your group, you should be thinking about the decision rules applied in this notebook and their potential pertinence to your research project. Once you define your cohort, you are encouraged to conduct a basic exploration of key subgroups before progressing with your longitudinal analysis, paying close attention to the subgroup counts.
Given that the data application decisions are not finalized, you can expect to receive an update on the translation of these cohort restrictions to the PROMIS data next class.
Tian Lou, & Dave McQuown. (2021, March 8). Data Exploration for Cohort Analysis using Illinois Unemployment Insurance Data. Zenodo. https://doi.org/10.5281/zenodo.4589024
AR Creating a Cohort Notebook (link to come)
@@ -859,12 +871,12 @@Welcome to our third notebook of this module! In this notebook, we will demonstrate how to leverage the results of record linkage and dimensional data modeling to build out an analytic frame necessary for a longitudinal cohort analysis.
In the last notebook, we learned that the first step of cohort analysis is to define its uniting “anchor point” in time, limiting observations to the initial cross-section. While this first step is essential, it doesn’t allow us to follow these individuals over time - which is, after all, the whole point of cohort analysis!
To harness the true power of a cohort analysis, we need to perform some sort of record linkage. As indicated by its name, record linkage is the process of identifying and linking all records - data points - which pertain to the entities of interest in the analysis. Broadly, record linkage allows us to follow our population of interest over time and across different data sources. Remember that at the end of the previous notebook, we started the record linkage process by joining our cohort cross-section back to the full PROMIS dataset to identify additional observations for our cohort members.
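Conceptually, that linkage step is a join of the cohort back to the full claims table; a minimal sketch, assuming the cohort has been saved as a table (my_cohort here is hypothetical) in the tr_wi_2023 training schema and that person_id is the shared identifier, might look like this.

# pull every PROMIS record for cohort members (table and column names are placeholders)
qry <- "
SELECT p.*
FROM ds_wi_dwd.promis p
JOIN tr_wi_2023.my_cohort c
  ON p.person_id = c.person_id
"
cohort_promis <- dbGetQuery(con, qry)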
@@ -301,8 +313,8 @@Here, we will reintroduce the code required to set up our environment to connect to the proper database and load certain packages. If you aren’t concerned with the technical setup of this workbook, please feel free to skip ahead to the next section, Redefining our Cohort.
For this code to work, you need to have an .Renviron file in your user folder (i.e. U:\John.Doe.P00002) containing your username and password.
Although in the last notebook we took a first pass at constructing the cohort for our analysis, we also left you with the following caveat:
“Given that the data application decisions are not finalized, you can expect to receive an update on the translation of these cohort restrictions to the PROMIS data next class.”
@@ -413,8 +425,8 @@Checkpoint
Given our new knowledge of the PROMIS data, do you need to go back and redefine your team’s cohort? Refer to the updated data dictionary, which has a new column, “Face Validity”, providing additional information for each variable.
Now that we have reassembled our cohort based on our new understanding of the PROMIS data, we can revisit the record linkage process with the eventual aim of constructing our final analytic frame.
When attempting to link records, however, there are many potential issues which could arise. For example:
Before proceeding with the rest of our analysis, we will explore this data frame to ensure we understand its construction and how we can best leverage it moving forward.
First, let’s evaluate the distribution of total observations for each member of our cohort in the data model.
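A sketch of that check is shown below: count the fact-table rows per cohort member and then summarize the counts. The fact table name wi_fact_weekly and the person_key column are assumptions standing in for the actual data model objects, and in practice you would first restrict to your own cohort members.

# rows in the fact table per person (object names are placeholders)
qry <- "
SELECT person_key, COUNT(*) AS n_obs
FROM tr_wi_2023.wi_fact_weekly
GROUP BY person_key
"
obs_per_person <- dbGetQuery(con, qry)

# distribution of observation counts across the cohort
summary(obs_per_person$n_obs)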
Are all of the variables you need for your team’s research project available in the data model? Discuss with your team and note any gaps.
In this notebook, we demonstrated how to apply the newly-created class data model to a longitudinal study with an already-developed cohort. Think through the new questions that this linked data model allows you to explore and how they relate to your team’s project. Refer back to the list of tables in our data model, and begin devising a plan for how each may contribute to your analysis. As you are doing so, take special care to think through the varied grains of the benefit and wage data in the fact table, as well as our new findings about the PROMIS data in general, and how they may impact your work in addressing your research question.
We will further this narrative in the next notebook as we continue to develop our descriptive analysis, shifting our focus to the benefit- and employment-based measures we can create using our new analytic frame to develop findings to inform our guiding research topic.
AR 2022 Record Linkage Notebook, Robert McGough, Nishav Mainali, Benjamin Feder, Josh Edelmann (Link to come)
@@ -899,12 +911,12 @@Welcome to Notebook 4 of Module 2! Up to this point in the course, most of our work with the Wisconsin data has been focused on data preparation and project scoping, culminating in the development of our analytic frame in last week’s notebook. In this notebook, we will bridge the gap between this project scoping work and the actual process of longitudinal analysis by developing the measures that will serve as our primary outcomes of interest.
As you’ve learned, when we are analyzing administrative data not developed for research purposes, it is important to create new measures that will help us answer our policy-relevant questions. When we say “measure”, we usually mean a person-level variable that we can use to compare outcomes for individuals in our cohort. Creating measures at the person level allows us to compare outcomes for different subgroups of individuals based on their characteristics and experiences.
Here, we will demonstrate how to create several measures to describe our cohort members’ UI experience and subsequent workforce outcomes. While your group may choose to generate different measures based on your research question, the code displayed here should provide a good starting place for thinking about how to best create and analyze person-level measures.
As in previous notebooks, we will reintroduce the code required to set up our environment to connect to the proper database and load certain packages. If you aren’t concerned with the technical setup of this workbook, please feel free to skip ahead to the next section, Loading our analytic frame.
For this code to work, you need to have an .Renviron file in your user folder (i.e. U:\John.Doe.P00002) containing your username and password.
We can recreate our analytic frame dataset from the prior notebook by using SQL joins to filter the fact table to only include our cohort members.
qry <- "
@@ -357,8 +369,8 @@ qry Loading our analytic frame
For further details about the analytic frame, please refer back to notebook 03_record_linkage.
The first set of measures we will construct is aimed at capturing aspects of our cohort members’ experience with the UI benefit system. Again, each of these measures is person-level - for each measure, we want to distill the wealth of information available in our analytic frame into a single outcome per individual that we can compare across subgroups of our cohort.
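For instance, one such benefit-side measure might be the total number of weeks each person claimed benefits; a rough sketch, assuming the analytic frame loaded above has a person_key identifier and a claimed_week indicator (both placeholder names), is below.

# one person-level measure: total weeks claimed (column names are placeholders)
weeks_claimed <- analytic_frame %>%
  group_by(person_key) %>%
  summarize(total_weeks_claimed = sum(claimed_week == 1, na.rm = TRUE), .groups = "drop")

head(weeks_claimed)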
Since our analytic frame also includes variables describing employment experiences, we can develop measures focused on our cohort’s past and future employment relative to the benefit year in question.
Conveniently, because our cohort definition identifies individuals who started their benefit year in the last week of Q1 2022, any employment in subsequent rows (remember to aggregate by quarter!) reflects employment post-UI entry. In these examples, we will restrict the employment data to within three quarters of UI program entry.
To do so, we will create a handy reference table below, which will also track the quarter relative to entry.
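A sketch of what that reference table could look like, assuming UI entry in Q1 2022 and a three-quarter follow-up window, is shown here; the relative_quarter value of 0 marks the entry quarter.

# reference table mapping calendar quarters to quarters relative to UI entry
rel_quarters <- tibble(
  year = 2022,
  quarter = 1:4,
  relative_quarter = 0:3
)

rel_quarters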
@@ -759,12 +771,12 @@Hopefully, by this point in the notebook, you have been inspired to apply some of these measures to your own cohort and overall project. You are encouraged to use the base code available in this notebook, and adapt and apply it to your own work. In the realm of unemployment to reemployment trajectories, there is a wealth of potential measures that can be created by linking the PROMIS and UI wage records, and we encourage you to think through the different ways you might be able to create new measures and proxies to help answer your primary research question.
AR Measurement Notebook (link to come)
WI 2023 Record Linkage Notebook, Roy McKenzie, Benjamin Feder, Joshua Edelmann (citation to be added)
@@ -1008,12 +1020,12 @@Welcome to Notebook 5 of Module 2! At this point in our notebook series, we have built out our descriptive analysis and are now thinking about our findings and how to appropriately convey them. For outputs deemed best displayed in an image, we may have started on some initial plots in ggplot2, largely relying on its base functionality. Here, we will show you different ways you can leverage the powerful ggplot2 package to create presentation- and publication-quality data visualizations from our descriptive analysis. We will also discuss different visualization options based on the type of the analysis.
We will cover the following visualizations in this notebook:
As in previous notebooks, we will reintroduce the code required to set up our environment to connect to the proper database and load certain packages. If you aren’t concerned with the technical setup of this workbook, please feel free to skip ahead to the next section, Loading our analytic frame.
For this code to work, you need to have an .Renviron file in your user folder (i.e. U:\John.Doe.P00002) containing your username and password.
As we did in the previous notebook, we can recreate our analytic frame by using SQL joins to filter the fact table to only include our cohort members.
qry <- "
@@ -344,8 +356,8 @@ qry Loading our analytic frame
analytic_frame <- dbGetQuery(con, qry)
This initial section is quite technically-focused. If you’d like, you can skip to the Density plot subsection.
Recall the structure of traditional ggplot2 syntax:
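In outline, a ggplot2 call starts from a data frame, maps variables to aesthetics with aes(), adds one or more geoms, and finishes with optional labels and themes, each layered on with +; the variable names in the sketch below are placeholders.

# generic ggplot2 structure: data + aesthetics + geom(s) + labels/theme
ggplot(data = analytic_frame, aes(x = some_variable)) +
  geom_histogram() +
  labs(
    title = "Descriptive title that highlights the finding",
    x = "Variable of interest",
    y = "Count"
  ) +
  theme_minimal()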
Of your findings, which ones are most suitable to visualization? Why? Are there additional updates you would like to make to any of these plots?
Although this notebook is quite technical and focused on final outputs, it can still be useful as you are producing your descriptive analysis. In particular, this notebook provides a variety of display options, and you should think about the best choice and design for exhibiting your findings. You can start by creating the base plot and thinking about an ideal title, so you can adjust aspects of the graph to highlight your findings for the audience. At a minimum, it will be helpful for the business-oriented members of your team if you reuse the ggsave() code and save preliminary plots early and often, so they can provide their input on the direction of the analysis.
Additionally, we recommend revisiting this notebook as you begin preparing to export your final tables and graphs from the ADRF, so you can apply layering updates to ensure your exports are ready for your final presentation and report. There are many other ggplot2 layer aspects we did not cover in this notebook; thankfully, there are many open-source posts and examples for you to draw from as well.
Kamil Slowikowski (2021). ggrepel: Automatically Position Non-Overlapping Text Labels with ‘ggplot2’. R package version 0.9.1. https://CRAN.R-project.org/package=ggrepel
Pedersen, T. L. (2022, August 24). Make your ggplot2 extension package understand the new linewidth aesthetic [web log]. Retrieved July 28, 2023, from https://www.tidyverse.org/blog/2022/08/ggplot2-3-4-0-size-to-linewidth/.
Tian Lou, & Dave McQuown. (2021, March 8). Data Visualization using Illinois Unemployment Insurance Data. Zenodo. https://doi.org/10.5281/zenodo.4589040
@@ -1120,12 +1132,12 @@Investigating the demand side of the labor market can help us understand the different types of employers within it. The majority of the research on labor market outcomes emphasizes the role of the employee (labor market supply). While this is important, understanding the employer’s role is also critical for developing employment outcomes.
In the previous notebooks, we used descriptive statistics to analyze employment outcomes for our cohort. The goal of this notebook is now to demonstrate how we can leverage descriptive statistics for the purpose of characterizing labor demand and better contextualizing opportunities for employment by job sector. This will allow us to understand the types of employers individuals in our cohort are employed by and their relationship to our outcome measures, as well as recognize in-demand industry trends in Wisconsin.
As in previous notebooks, we will reintroduce the code required to set up our environment to connect to the proper database and load certain packages. If you aren’t concerned with the technical setup of this workbook, please feel free to skip ahead to the next section, Employer-side Analysis.
For this code to work, you need to have an .Renviron file in your user folder (i.e. U:\John.Doe.P00002) containing your username and password.
An individual in our cohort may have multiple employers of focus - their previous one(s) before claiming UI benefits, and subsequent one(s) upon reemployment. Here, we will provide separate examples focusing on these different employers, and their relationship with some of the outcome measures developed in the Measurement workbook.
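As one illustration of the employer-side view, the sketch below tallies cohort members by the industry sector of their post-entry employment; the naics_sector and relative_quarter columns are assumed names, and your analytic frame may store this information differently.

# cohort members by industry sector of post-UI-entry employment (column names are placeholders)
analytic_frame %>%
  filter(relative_quarter > 0, !is.na(naics_sector)) %>%
  distinct(person_key, naics_sector) %>%
  count(naics_sector, sort = TRUE)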
Shifting gears, with demand, we can also look at the quantity of job openings by employer characteristic. There are many sources for tracking job postings, one of which is Opportunity Insights’ job postings data from Lightcast, which was formerly known as Burning Glass Technologies.
We can see that the outlook appears to be relatively positive for those previously in manufacturing, for example, in terms of future job availability.
This notebook is all about potential analyses - if you work through the concepts covered in the previous notebook, your project should be more than good enough. However, if you feel intrigued by the possibility of including either one of these types of analyses, whether it is of employer characteristics or job postings, we encourage you to use it to supplement your analysis.
At the very least, even if you don’t incorporate this work into your project, we hope you are inspired to consider a demand-focused analysis in the future, either on its own or as a supplement to one focusing on potential employees.
Garner, Maryah, Nunez, Allison, Mian, Rukhshan, & Feder, Benjamin. (2022). Characterizing Labor Demand with Descriptive Analysis using Indiana’s Temporary Assistance for Needy Families Data and UI Wage Data. https://doi.org/10.5281/zenodo.7459656
Job postings data from Lightcast, aggregated by Opportunity Insights.
“The Economic Impacts of COVID-19: Evidence from a New Public Database Built Using Private Sector Data”, by Raj Chetty, John Friedman, Nathaniel Hendren, Michael Stepner, and the Opportunity Insights Team. November 2020. Available at: https://opportunityinsights.org/wp-content/uploads/2020/05/tracker_paper.pdf
@@ -985,12 +997,12 @@This workbook provides information on how to prepare research output for disclosure control. It outlines how to prepare different kinds of outputs before submitting an export request and gives an overview of the information needed for disclosure review. Please read through the entire workbook because it will separately discuss different types of outputs that will be flagged in the disclosure review process.
We will apply the Wisconsin export rules to the following files in this workbook:
When exporting results, there are 3 items to be concerned with:
Export file(s): this is the file you wish to export. This file needs to be disclosure-proofed; we will eventually walk through those steps in this notebook, first introducing them to you in the next section
As in previous workbooks, we will reintroduce the code required to set up our environment to connect to the proper database and load certain packages. If you are not concerned with the technical setup of this workbook, please feel free to skip ahead to the next section, Loading our analytic frame.
Since we will be adapting tables and visuals we have created in past notebooks that mostly relied on the same underlying analytic frame, we will recreate it and read it into R first.
qry <- "
@@ -485,8 +497,8 @@ qry Loading our analytic frame
analytic_frame <- dbGetQuery(con, qry)
The first file we will prepare for export is a table containing future claims by employment growth created in the Characterizing Demand notebook. In reality, the output development and disclosure review preparation are done in tandem. However, for simplicity, we will do this in separate steps, as we have already generated the initial output file.
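The heart of disclosure proofing a count-based table is checking every cell against the required minimum size; the sketch below suppresses small cells, assuming a hypothetical threshold of 10 and a table with a person-count column, so be sure to check the actual Wisconsin export rules and your table's structure before reusing it.

# suppress counts below an assumed minimum cell size (threshold and names are placeholders)
min_cell <- 10

export_table_redacted <- export_table %>%
  mutate(n_people = ifelse(n_people < min_cell, NA, n_people))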
Our second file to export is a bar plot showing the exit counts by week for our cohort in 2022. We initially created this bar plot in the Visualization notebook.
We will remind you of how to save this final plot at the end of the notebook.
Our third file to prepare for export will build off of the line plot from the Visualization notebook. The line plot in that notebook depicted average wages over time; here, we are going to pivot slightly and show median wages over time.
We’ll save this figure at the end of the notebook.
For our final export file we will be disclosure-proofing the heatmap from the visualization notebook, which displays counties by their UI claim rate at a specific point in time.
Note that with the redaction rules, the counties with the five highest claim rates are slightly different than those noted prior to applying the disclosure controls.
In this section, we provide examples of different techniques for exporting our presentation-ready plots. We can use ggsave() to save our visuals in png, jpeg, and pdf formats without losing quality, demonstrating saving as each file type on the final plots.
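A brief sketch of that pattern, assuming a finished plot object named final_plot and placeholder file names, is below; dimensions and resolution are illustrative.

# save the same presentation-ready plot in several formats (names and sizes are placeholders)
ggsave("exit_counts_by_week.png", plot = final_plot, width = 8, height = 5, dpi = 300)
ggsave("exit_counts_by_week.jpeg", plot = final_plot, width = 8, height = 5, dpi = 300)
ggsave("exit_counts_by_week.pdf", plot = final_plot, width = 8, height = 5)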
This notebook may appear to be overwhelming, but the majority of the code has been copied from previous notebooks to recreate the final tables and graphs. Focus your attention on the disclosure rules and procedures applied to each output, as this provides useful information and code techniques to apply to a variety of outputs. We recommend saving all output early so your team members can provide a fresh set of eyes on all the final files to ensure that all rules have been appropriately applied.
Additionally, we recommend revisiting this notebook as you begin disclosure proofing your final tables and graphs so you can ensure your exports are ready for your final presentation and report.
VDC 2022 Presentation Preparation Notebook, Joshua Edelmann and Benjamin Feder (citation to be added)
WI 2023 Characterizing Labor Demand Notebook, Roy McKenzie, Benjamin Feder (citation to be added)
WI 2023 Data Visualization Notebook, Corey Sparks, Benjamin Feder, Roy McKenzie, and Joshua Edelmann (citation to be added)
@@ -1323,12 +1335,12 @@This supplemental notebook covers record linkage and creating a linked data model to facilitate longitudinal analyses.
Analyses involving administrative data often require:
This notebook will introduce and demonstrate some helpful techniques for linking administrative data while mediating the above issues. The output of the notebook should provide a flexible and performant framework that meets the needs of most projects and can be easily customized to include additional variables or characteristics.
The linked data assets documented in this notebook have already been completely created and loaded in the tr_wi_2023 schema as tables beginning with a “wi” prefix. This notebook will not create or load duplicative copies of the linked dataset, but rather cover the techniques used to construct and load the model and hopefully serve as a resource to use when building future linked datasets.
Here, we will reintroduce the code required to set up our environment to connect to the proper database and load certain packages. If you aren’t concerned with the technical setup of this workbook, please feel free to skip ahead to the next section, Record linkage and Dimensional Modeling.
For this code to work, you need to have an .Renviron file in your user folder (i.e. U:\John.Doe.P00002) containing your username and password.
Record linkage is an important component of almost any analysis - unless you have a fictitious, perfectly curated dataset with no messiness or missing variables - and it is especially important when linking administrative records. Unlike survey data, which allows for perfectly selected variables with some potential for messiness, administrative data is tailored to administrative purposes, not academic ones. That means that we will not have all of the variables we ideally want, and it also means that the data can be messy (either missing responses or containing variables that we may not quite understand or have at our disposal). While we may not directly address missing responses (more on indirectly addressing this in the inference lecture), we can enrich our data set by pulling in relevant information from other sources.
To facilitate easy and performant analysis of very large record sets (quarterly wages, PROMIS file), we will be formatting the data in a dimensional model. This type of model:
Unlike reference data that is consistent across states (NAICS, SOC), master data refer to the unique collection of persons, employers, or households served by each state. A state can have many different references to the same real-world entity, and mastering is the process of assembling a set that has one member (record) for each unique instance of an entity in the real world.
This master record can merge attributes from multiple sources, resulting in a “golden record” with a higher completeness than is available in individual sources. When multiple references to the same entity have different values, those differences are resolved through a process called survivorship in which decisions are made about which value to keep (most recent, most frequent, highest quality source, etc.).
In our example, due to the messy nature of administrative data, there are individuals whose gender, race, ethnicity, and birth date values change over time, and even within the same case. First, let’s check how many individuals this concerns.
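One way to frame that check is to count people whose demographic values take more than one distinct value across their records; the SQL sketch below assumes placeholder column names (person_id, gender, race, ethnicity, birth_date) rather than the true PROMIS fields.

# individuals with inconsistent demographic values across records (column names are placeholders)
qry <- "
SELECT COUNT(*) AS n_inconsistent
FROM (
  SELECT person_id
  FROM ds_wi_dwd.promis
  GROUP BY person_id
  HAVING COUNT(DISTINCT gender) > 1
      OR COUNT(DISTINCT race) > 1
      OR COUNT(DISTINCT ethnicity) > 1
      OR COUNT(DISTINCT birth_date) > 1
) AS changed
"
dbGetQuery(con, qry)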
@@ -561,8 +573,8 @@The fact table stores the actual observations (facts) of interest. Since this table often contains large numbers of records, it will ideally be comprised of a small number of bytes per row and primarily consist of indexed foreign keys to dimension tables and observation-specific measures. This allows for storage of large records sets with low storage cost and high query performance (extremely helpful for supporting dashboards).
In this example, the fact table is at the grain of one row per person per week. We will create a record for every week between the first and last observations of a person for both the employment and PROMIS data sets, regardless of employment or PROMIS participation in a given week. These “missing” observation weeks are materialized because unemployment and non-participation may be just as interesting for some analyses, and longitudinal analysis benefits from consistent representation across time periods of a consistent grain.
Some of our cohort members have observations for multiple employers in a single quarter. Since our unit of analysis is the person, not the person-employer combination, we need to resolve these one-to-many relationships into a single observation while retaining the information pertinent to analysis. In this example, the primary employer and associated wages were identified and recorded based on the employer with the largest wages in the quarter. In order to minimize loss of potentially relevant information, the total wages and number of employers are also included on each observation.
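One common way to express that logic is with window functions: rank each person's employers within a quarter by wages, keep the top-ranked row as the primary employer, and carry along the quarterly totals; the sketch below uses placeholder column names for the UI wage records.

# primary employer per person-quarter, defined by largest wages (column names are placeholders)
qry <- "
SELECT person_id, year, quarter,
       ui_account AS primary_employer,
       wages      AS primary_wages,
       total_wages,
       n_employers
FROM (
  SELECT person_id, year, quarter, ui_account, wages,
         SUM(wages)   OVER (PARTITION BY person_id, year, quarter) AS total_wages,
         COUNT(*)     OVER (PARTITION BY person_id, year, quarter) AS n_employers,
         ROW_NUMBER() OVER (PARTITION BY person_id, year, quarter ORDER BY wages DESC) AS wage_rank
  FROM ds_wi_dwd.ui_wage
) ranked
WHERE wage_rank = 1
"
primary_employment <- dbGetQuery(con, qry)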
@@ -690,8 +702,8 @@McGough, R., et.al., Spring 2022 Applied Data Analytics Training, Arkansas Work-Based Learning to Workforce Outcomes, Linked Dataset Construction for Longitudinal Analysis
Abowd, et. al., The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators, 2006 (https://lehd.ces.census.gov/doc/technical_paper/tp-2006-01.pdf).
Kimball, R., & Ross, M. (2019). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, Ed. Wiley.
@@ -936,12 +948,12 @@This supplemental notebook provides a demonstration of how we can build employer-level characteristics, at the yearly grain, from the Unemployment Insurance (UI) wage records dataset. Our final output from this notebook is a permanent table with employer-level information aggregated to the calendar year for each employer with at least 5 employees in Wisconsin that appears in its UI wage records.
We will start by loading necessary packages not readily available in the base R setup.
@@ -314,8 +326,8 @@Introduction
For this code to work, you need to have an .Renviron file in your user folder (i.e. U:\John.Doe.P00002) containing your username and password.
We will define each employer as a unique ui_account value in the UI wage records, developing the following measures for each ui_account:
Firm characteristics
Now that we have our aggregations and growth rates calculated, we will combine these into a single yearly aggregation table. We need to create our start and end strings of the query and then we will paste these strings together.
string = "
@@ -876,8 +888,8 @@ string Aggregation to the Calendar Year
# DBI::dbExecute(con, qry)
Feder, Benjamin, Garner, Maryah, Nunez, Allison, & Mian, Rukhshan. (2022, December 19). Creating Supplemental Employment Measures using Indiana’s Unemployment Insurance Wage Records. Zenodo. https://doi.org/10.5281/zenodo.7459730
@@ -1120,12 +1132,12 @@This supplemental notebook focuses on linking the NAICS-employer crosswalk with the data model, particularly the fact table. A similar procedure can be followed for matching the crosswalk with the UI wage records table.
We will start by loading necessary packages not readily available in the base R setup.
@@ -303,8 +315,8 @@Introduction
For this code to work, you need to have an .Renviron file in your user folder (i.e. U:\John.Doe.P00002) containing your username and password.
Before linking the crosswalk with additional employer information available in other tables, it is helpful to identify any potential discrepancies that may affect the quality of the linkage. Here, we will investigate the columns we plan to use in our join to ensure consistency between the sources.
Employer information is available in all three data sources - PROMIS, UI Wage Records, and of course, the NAICS crosswalk. We’ll start with the data we’re already using.
Note that there are some UI account numbers in the crosswalk with more than 6 digits, excluding leading zeroes. Although it would be theoretically possible to simply link on the last six digits of the UI account numbers, there may be different employers with the same last six digits, resulting in inaccurate joins. Therefore, the UI account numbers with more than six digits will not join to any of the employer-level information in the other files. That being said, the employer information in the PROMIS data already contains NAICS codes, so we’re really just focused on joining to the employers in the wage records.
Given the information we’ve learned about the various columns storing employer UI account numbers in the different tables, the recommended process for joining the information in the crosswalk table to the fact (and UI wage records too) is as follows:
Cast ui_account_root_number to an integer (a sketch of this join follows below).
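To illustrate that first step, the sketch below casts the crosswalk's ui_account_root_number to an integer inside the join condition; the crosswalk table name, the naics_code column, and the wage-record column names are assumptions, so adjust them to the actual objects in the schema.

# join the NAICS crosswalk to the UI wage records, casting the account number first
# (table and column names other than ui_account_root_number are placeholders)
qry <- "
SELECT w.*, x.naics_code
FROM ds_wi_dwd.ui_wage w
JOIN ds_wi_dwd.naics_crosswalk x
  ON w.ui_account = CAST(x.ui_account_root_number AS INTEGER)
"
wages_with_naics <- dbGetQuery(con, qry)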