Skip to content
This repository has been archived by the owner on Jul 13, 2023. It is now read-only.

Switch to just using PLUTO manual research for corrections #425

Merged
merged 4 commits into from
Mar 8, 2023

Conversation

AmandaDoyle
Copy link
Member

Addresses issue #424

@AmandaDoyle AmandaDoyle added the data update Related to a data product update label Feb 6, 2023
pluto_build/sql/_create.sql Outdated Show resolved Hide resolved
@mbh329
Copy link
Member

mbh329 commented Feb 6, 2023

Otherwise this makes sense to me

Copy link
Member

@damonmcc damonmcc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AmandaDoyle does the csv currently named pluto_input_research.csv have to be edited to ensure it covers everything we've been applying from the archived pluto_corrections files?

@damonmcc
Copy link
Member

damonmcc commented Feb 6, 2023

@AmandaDoyle I'm seeing more uses of the to-be-deprecated pluto_input_research table (example in corr_communitydistricts.sql). but this is a WIP so no worries if those are todos!

and maybe these changes be tested later by building on this branch and comparing to 22v3.1?

@AmandaDoyle AmandaDoyle changed the title WIP: Switch to just using PLUTO manual research for corrections Switch to just using PLUTO manual research for corrections Feb 7, 2023
@AmandaDoyle
Copy link
Member Author

Latest changes do the following

  • get rid of all references of pluto_corrections
  • limits outputs to just pluto_changes_applied and pluto_changes_not_applied

@damonmcc
Copy link
Member

damonmcc commented Feb 7, 2023

just kicked off a build to test the changes

if it works and the output is identical to 22v3.1 from main branch, would you wanna delete the commented out code? should be easy enough to revert changes via git/github if we have to

@damonmcc
Copy link
Member

damonmcc commented Feb 7, 2023

@AmandaDoyle

there's a QAQC app error since an expected pluto_corrections export doesn't exist. I opened an issue for it and can work on it asap

we probably already mentioned this but this error is reminding me that the new research/corrections/change process maybe won't be implemented until 23v1. does that sounds right?

@AmandaDoyle
Copy link
Member Author

@damonmcc thanks for flagging. Working through issue #251 would be good, but if there implications to previous versions of PLUTO (i.e. QAQC reports of 22v2 wouldn't work), perhaps we don't rename the file, or scope out what needs to be changed?

That's right, that the earliest this change would be implemented is for 23v1, so this isn't pressing.

@mbh329
Copy link
Member

mbh329 commented Feb 8, 2023

It seems like the the issue to get the right corrections would need to be fixed in this function function. could we just implement an if/else statement in this function? might be good to talk this through

@damonmcc
Copy link
Member

damonmcc commented Feb 9, 2023

I'll compare 22v3.1 pluto_changes_applied.csv generated by this branch to 22v3.1 pluto_corrections_applied.csv generated on main

can use 22v3 pluto_corrections_applied.csv as a baseline file

@damonmcc
Copy link
Member

damonmcc commented Feb 9, 2023

@AmandaDoyle @mbh329

When comparing 22v3 to 22v3.1 applied changes, I'm seeing only one type of difference in applied changes:

  • for some field, reason = lotarea, Zero lot area changes, the version has changed from 22v3 to 22v3.1

image

When comparing 22v3.1 main branch to 22v3.1 this branch applied changes, I'm seeing many types of differences in applied changes:

  • all Zero lot area changes now have a version of 22v3.1 (maybe because it's a code-based change?)
  • some Normalized owner name changes from pluto_input_research.csv that were never applied are now being applied (e.g. ,ownername,NYC DOT,NYC DEPARTMENT OF TRANSPORTATION,1,Normalized owner name,19v2

a concern: I'm seeing an older version change for the same BBL/field beat a newer one when they're both on pluto_input_research. it's the changes for 1019570140,cd. they're actually the same change but differ in reason and version

@damonmcc
Copy link
Member

damonmcc commented Feb 10, 2023

check to do: compare non-programatic field changes from applied.csv + not_applied.csv and ensure they match those in pluto_unput_research.csv

  • exclude lotarea, ownername

so far seeing

  • 75 more rows in input_research than in combined change files
  • 4 rows in the combined change files indicate (accurately) that unintentional changes are being applied to 2 ownername values. see corrections to CD are being applied to ownername #429. unlikely to be caused by the code changes in this PR

@AmandaDoyle
Copy link
Member Author

@damonmcc and @mbh329

  • It makes sense that all Zero lot area changes now have the latest version. These are programmatic changes and the version of the change comes from version that's actively being build. See code here.
  • Why are some Normalized owner name changes from pluto_input_research.csv that were never applied are now being applied? Can you rationalize this?
  • We should do some cleanup on pluto_input_research.csv so that there are not duplicate corrections. BBL and field should be unique in that table.
  • The reason why input_research may have more table is because applied.csv+ not_applied.csv are limited to BBLs that currently exist in PLUTO. Perhaps we change not_applied.csv to include bbls that are no longer in PLUTO, so that this file can be used to cleanup not_applied.csv, or?
  • corrections to CD are being applied to ownername #429 is separate issue because this issue existed before this change, but this does need to be investigated.

I don't see anything that need to be changed on the code side before merging this in but before merging this in I'd like to check with LS because this will have implications on how we communicate how we make changes and the scope of changes, and that she's prepared for that for the upcoming 23v1 build.

@AmandaDoyle
Copy link
Member Author

@damonmcc and @mbh329
LS is okay merging this in and having these changes be applied to 23v1. I need you to verify that the changes on code and outputs are good to go and pass your review.
Reminder that this is a maintenance item this sprint.

@damonmcc damonmcc self-requested a review March 8, 2023 19:35
@AmandaDoyle AmandaDoyle merged commit 428d94c into main Mar 8, 2023
@AmandaDoyle AmandaDoyle deleted the 424_create_research_table_applied branch March 8, 2023 20:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
data update Related to a data product update
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants