This repository has been archived by the owner on Aug 8, 2023. It is now read-only.
fix use of pluto change files #252
Merged: damonmcc merged 20 commits into `master` from `251-fix-error-when-pluto_corrections-doesnt-exist` on Feb 9, 2023
Commits
81b54cd change function name (damonmcc)
b5d2be6 simplify uses of DigitalOceanClient (damonmcc)
54e0c63 more specific name (damonmcc)
c3ebafa ... (damonmcc)
ac4e8d2 remove badges from page (damonmcc)
ba074a2 readme format (damonmcc)
7f24350 readme (damonmcc)
2b19b36 add example.env (damonmcc)
7a74b0f readme (damonmcc)
87f7d06 add get_all_filenames_in_folder (damonmcc)
04b8287 add and use get_output_folderpath (damonmcc)
defa58b fix getting variety of changes zip files (damonmcc)
466df08 fix full accounting section (damonmcc)
0b31506 fix filenames (damonmcc)
f42c9f5 improve intro text (damonmcc)
a177d99 remove default option to changes from all versions (damonmcc)
0e18951 in display text replace correction with change (damonmcc)
f5be3b4 in code replace correction with change (damonmcc)
67111a7 text details (damonmcc)
05c9e60 missed one (damonmcc)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,19 @@ | ||
# Data Engineering Quality Control and Assurance Application | ||
This web application displays charts and tables to assess the consistency, quality and completeness of a particular build of one of data engineering's data products. | ||
It's written in Python using the [streamlit](https://streamlit.io/) framework. | ||
|
||
Best practice to run the app locally is to use the devcontainer | ||
This web application displays charts and tables to assess the consistency, quality and completeness of a particular build of one of data engineering's data products. | ||
|
||
The deployed app is at https://edm-data-engineering.nycplanningdigital.com/?page=Home | ||
|
||
It's written in Python using the [streamlit](https://streamlit.io/) framework. | ||
|
||
The code that produces the data this application assesses can be found at https://github.com/NYCPlanning/ | ||
|
||
## Dev | ||
|
||
Best practice to run the app locally is to use the dev container (especially via VS Code) | ||
|
||
1. From a dev container terminal, run `./entrypoint.sh` | ||
|
||
2. If in VS Code, a popup should appear with an option to navigate to the site in a browser | ||
|
||
3. If an error of `Access to localhost was denied` appears in the browser, try navigating to `127.0.0.1:5000` rather than `localhost:5000` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
AWS_S3_ENDPOINT= | ||
AWS_SECRET_ACCESS_KEY= | ||
AWS_ACCESS_KEY_ID= | ||
AWS_S3_BUCKET= |
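As a rough illustration of how an application might consume these four variables at startup, here is a minimal sketch. Only the variable names come from the file above (presumably the `example.env` added in this PR). The `S3Settings` dataclass and the `load_s3_settings` helper are hypothetical and are not part of the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from the env file above; everything else is an assumption.
REQUIRED_VARS = (
    "AWS_S3_ENDPOINT",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_S3_BUCKET",
)


@dataclass(frozen=True)
class S3Settings:
    """Hypothetical container for the storage settings."""

    endpoint: str
    secret_access_key: str
    access_key_id: str
    bucket: str


def load_s3_settings(env: Mapping[str, str] = os.environ) -> S3Settings:
    """Read the settings from `env`, failing early if any value is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return S3Settings(
        endpoint=env["AWS_S3_ENDPOINT"],
        secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        access_key_id=env["AWS_ACCESS_KEY_ID"],
        bucket=env["AWS_S3_BUCKET"],
    )
```

Failing early with the full list of missing names tends to be easier to debug than a connection error that surfaces later in a page render.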
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,143 @@ | ||
import streamlit as st | ||
import pandas as pd | ||
import numpy as np | ||
import plotly.express as px | ||
from st_aggrid import AgGrid | ||
from src.constants import COLOR_SCHEME | ||
from abc import ABC | ||
|
||
|
||
class ChangesReport: | ||
def __init__(self, data) -> None: | ||
self.applied_changes = data["pluto_changes_applied"] | ||
self.not_applied_changes = data["pluto_changes_not_applied"] | ||
self.version_dropdown = np.flip( | ||
np.sort(data["pluto_changes_applied"].version.dropna().unique()) | ||
) | ||
|
||
def __call__(self): | ||
st.header("Manual Changes") | ||
|
||
st.markdown( | ||
""" | ||
PLUTO is created using the best available data from a number of city agencies. To further | ||
improve data quality, the Department of City Planning (DCP) applies changes to selected field | ||
values. | ||
|
||
Each change to a field is labeled for a version of PLUTO. | ||
|
||
For programmatic changes, this is the version in which the programmatic change was | ||
implemented. For research and user reported changes, this is the version in which the BBL | ||
change was added to PLUTO_input_research.csv. | ||
|
||
For more information about the structure of the pluto changes report, | ||
see the [Pluto Changelog Readme](https://www1.nyc.gov/assets/planning/download/pdf/data-maps/open-data/pluto_change_file_readme.pdf?r=22v1). | ||
|
||
NOTE: This report is based on the files | ||
`pluto_changes_applied.csv`/`pluto_changes_not_applied.csv` | ||
(or legacy files `pluto_corrections_applied.csv`/`pluto_corrections_not_applied.csv`) | ||
""" | ||
) | ||
|
||
if self.applied_changes is None or self.not_applied_changes is None: | ||
st.info( | ||
"There are no available changes reports for this branch. This is likely due to a problem on the backend with the files on Digital Ocean." | ||
) | ||
return | ||
|
||
version = st.sidebar.selectbox( | ||
"Filter the changes to fields by the PLUTO Version in which they were first introduced", | ||
self.version_dropdown, | ||
) | ||
|
||
AppliedChangesSection(self.applied_changes, version)() | ||
NotAppliedChangesSection(self.not_applied_changes, version)() | ||
|
||
st.info( | ||
""" | ||
See [here](https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page) for a full accounting of the changes made for the latest version | ||
in the PLUTO change file. | ||
""" | ||
) | ||
|
||
|
||
class ChangesSection(ABC): | ||
def __init__(self, changes, version) -> None: | ||
super().__init__() | ||
self.changes = self.filter_by_version(changes, version) | ||
self.version_text = self.version_text(version) | ||
|
||
def filter_by_version(self, df, version): | ||
if version == "All": | ||
return df | ||
else: | ||
return df.loc[df["version"] == version] | ||
|
||
def version_text(self, version): | ||
return "All Versions" if version == "All" else f"Version {version}" | ||
|
||
def display_changes_figures(self, df, title): | ||
figure = self.generate_graph(self.field_change_counts(df), title) | ||
st.plotly_chart(figure) | ||
|
||
self.display_changes_df(df, title) | ||
|
||
def generate_graph(self, changes, title): | ||
return px.bar( | ||
changes, | ||
x="field", | ||
y="size", | ||
text="size", | ||
title=title, | ||
labels={"size": "Count of Records", "field": "Altered Field"}, | ||
color_discrete_sequence=COLOR_SCHEME, | ||
) | ||
|
||
def field_change_counts(self, df): | ||
return df.groupby(["field"]).size().to_frame("size").reset_index() | ||
|
||
def display_changes_df(self, changes, title): | ||
changes = changes.sort_values( | ||
by=["version", "reason", "bbl"], ascending=[False, True, True] | ||
) | ||
|
||
AgGrid(data=changes, key=f"display_changes_df_{title}") | ||
|
||
|
||
class AppliedChangesSection(ChangesSection): | ||
def __call__(self): | ||
st.subheader("Manual Changes Applied", anchor="changes-applied") | ||
|
||
if self.changes.empty: | ||
st.info(f"No Changes introduced in {self.version_text} were applied.") | ||
else: | ||
title_text = ( | ||
f"Applied Manual Changes introduced in {self.version_text} by Field" | ||
) | ||
self.display_changes_figures(self.changes, title_text) | ||
st.markdown( | ||
""" | ||
For each record in the PLUTO Changes table, PLUTO attempts to change a record to the New Value column by matching on the BBL and the | ||
Old Value column. The graph and table below outline the records in the pluto changes table that were successfully applied to PLUTO. | ||
""" | ||
) | ||
|
||
|
||
class NotAppliedChangesSection(ChangesSection): | ||
def __call__(self): | ||
st.subheader("Manual Changes Not Applied", anchor="changes-not-applied") | ||
st.markdown( | ||
""" | ||
For each record in the PLUTO Changes table, PLUTO attempts to change a record by matching on the BBL and the | ||
Old Value column. As the underlying datasources change and improve, PLUTO records may no longer match the old value | ||
specified in the pluto changes table. The graph and table below outline the records in the pluto changes table that failed to be applied for this reason. | ||
""" | ||
) | ||
|
||
if self.changes.empty: | ||
st.info(f"All Changes introduced in {self.version_text} were applied.") | ||
else: | ||
title_text = ( | ||
f"Manual Changes not Applied introduced in {self.version_text} by Field" | ||
) | ||
self.display_changes_figures(self.changes, title_text) |
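The matching rule described in the section text above (a change is applied only when both the BBL and the Old Value match the current record) can be illustrated with a small sketch. This is not how PLUTO itself applies changes. The `apply_changes` function, the record layout, the `old_value`/`new_value` key names, and the sample values are all hypothetical and exist only to show why a change can end up in the not-applied table.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]   # field name -> current value
Change = Dict[str, str]   # keys: bbl, field, old_value, new_value (assumed names)


def apply_changes(
    records: Dict[str, Record], changes: List[Change]
) -> Tuple[List[Change], List[Change]]:
    """Apply each change whose BBL and old value match; collect the rest."""
    applied: List[Change] = []
    not_applied: List[Change] = []
    for change in changes:
        record = records.get(change["bbl"])
        # A change matches only if the record exists and still holds the old value.
        if record is not None and record.get(change["field"]) == change["old_value"]:
            record[change["field"]] = change["new_value"]
            applied.append(change)
        else:
            not_applied.append(change)
    return applied, not_applied


# Made-up sample data: the second change expects an old value the record no longer has.
records = {
    "1000010001": {"yearbuilt": "1900"},
    "1000010002": {"yearbuilt": "1985"},
}
changes = [
    {"bbl": "1000010001", "field": "yearbuilt", "old_value": "1900", "new_value": "1910"},
    {"bbl": "1000010002", "field": "yearbuilt", "old_value": "1980", "new_value": "1982"},
]
applied, not_applied = apply_changes(records, changes)
```

In this sample the first change is applied, while the second lands in `not_applied` because the underlying value has already moved away from the expected old value, which is the situation the "Manual Changes Not Applied" section describes.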
This file was deleted.
Function names are much more detailed...NOICE