Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Implement cascading deletion for obsolete TrackedEntities[DHIS2-15066] #196

Merged
merged 19 commits into from
Jan 16, 2025
Merged
369 changes: 369 additions & 0 deletions releases/2.42/migration-notes.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ To help you navigate the document, here's a detailed table of contents.
- [Inconsistent data](#inconsistent-data)
- [Tracker](#tracker)
- [Null Organisation Unit](#null-organisation-unit)
- [Null Tracked Entity Type](#null-tracked-entity-type)
---
## Inconsistent-data

Expand Down Expand Up @@ -270,6 +271,374 @@ BEGIN
DROP TABLE temp_event_ids;
DROP TABLE temp_pm_ids;

END;
$$;
```

### Null Tracked Entity Type

The `TrackedEntity`, previously known as `TrackedEntityInstance`, is required to have a specific type, such as person, place, equipment, or area. However, this constraint was not enforced at the database level, leading to inconsistent data. To ensure data integrity moving forward, we need to enforce this requirement by making the `trackedentitytypeid` column in the `trackedentity` table non-nullable.

#### Checking for Null Values

To check for any NULL values in the trackedentitytypeid column, you can use the following SQL script. If the list returned is not empty, it indicates the presence of inconsistent data that the migration process was unable to resolve. The uids returned can be used in subsequent steps of this procedure.

##### For >= 2.41 Instances:

```sql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we shouldn't show entities that we can fix in the migration, so we should filter out any tracked entity that has an enrollment with a program that has a defined trackedEntityType

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we should change this query to not show the entities that will be fixed by the migration

SELECT STRING_AGG(te.uid, ', ') AS uids
FROM trackedentity te
WHERE te.trackedentitytypeid IS NULL
AND NOT EXISTS (
SELECT 1
FROM enrollment e JOIN program p on e.programid = p.programid
WHERE e.trackedentityid = te.trackedentityid and p.trackedentitytypeid IS NOT NULL
);
```
##### For <= 2.40 Instances:

```sql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

SELECT STRING_AGG(te.uid, ', ') AS uids
FROM trackedentityinstance te
WHERE te.trackedentitytypeid IS NULL
AND NOT EXISTS (
SELECT 1
FROM programinstance pi JOIN program p ON pi.programid = p.programid
WHERE pi.trackedentityinstanceid = te.trackedentityinstanceid and p.trackedentitytypeid IS NOT NULL );
```

#### Fixing Null Values
zubaira marked this conversation as resolved.
Show resolved Hide resolved

Starting from version v42, NULL values are no longer allowed in the trackedentitytypeid column. The migration attempted to address the invalid data, but it was unsuccessful. There are two options going forward.
- Change the `NULL` value to a valid trackedentitytypeid. ([Assign trackedentitytyeid to tracked entity](#assign-tracked-entity-type))
- Completely remove invalid trackedentity record. ([Delete trackedentities](#deleting-invalid-tracked-entities))

##### Assign tracked entity type
To assign valid trackedentitytypeid
- Use the following command to list all the trackedentitytypes currently available in the system. This query retrieves the uid and name of all trackedentitytypes in your database. Review the output to identify the most appropriate trackedentitytype.

```sql
SELECT uid, name FROM trackedentitytype;
```

- Replace {REFERENCE_UID} with the UID of the selected trackedentitytype and {UIDS} with the UIDs retrieved from the SQL query above. Then, execute the following command:

```sql
UPDATE trackedentity SET trackedentitytypeid=( SELECT trackedentitytypeid FROM trackedentitytype WHERE uid='{REFERENCE_UID}') WHERE trackedentitytypeid IS NULL AND trackedentityid IN ({UIDS})?
```


##### Deleting invalid tracked entities
The script below can be used to remove all trackedentity records where the trackedentitytypeid column is null and migration was not able to fix it. Use this script with caution, as it may result in permanent data loss.

##### For >= 2.41 Instances:

```plsql
zubaira marked this conversation as resolved.
Show resolved Hide resolved
DO $$
DECLARE
invalid_count INT; -- Variable to store the count of invalid TrackedEntities
deleted_count INT := 0; -- Variable to keep track of deleted TrackedEntity count
BEGIN
CREATE TEMP TABLE te AS (
SELECT trackedentityid
FROM trackedentity
WHERE trackedentitytypeid IS NULL
);

CREATE TEMP TABLE enrollment_ids AS (
SELECT enrollmentid
FROM enrollment
WHERE trackedentityid IN (SELECT trackedentityid FROM te)
);

CREATE TEMP TABLE event_ids AS (
SELECT eventid
FROM event
WHERE enrollmentid IN (SELECT enrollmentid FROM enrollment_ids)
);

CREATE TEMP TABLE te_pm AS (
SELECT id
FROM programmessage
WHERE trackedentityid IN (SELECT trackedentityid FROM te)
);

CREATE TEMP TABLE pi_pm AS (
SELECT id
FROM programmessage
WHERE enrollmentid IN (SELECT enrollmentid FROM enrollment_ids)
);

CREATE TEMP TABLE event_pm AS (
SELECT id
FROM programmessage
WHERE eventid IN (SELECT eventid FROM event_ids)
);

SELECT COUNT(trackedentityid)
INTO invalid_count
FROM trackedentity
WHERE trackedentitytypeid IS NULL AND NOT EXISTS (
SELECT 1
FROM enrollment e JOIN program p on e.programid = p.programid
WHERE e.trackedentityid = te.trackedentityid and p.trackedentitytypeid IS NOT NULL
);

RAISE NOTICE 'Number of invalid TrackedEntities (trackedentitytypeid IS NULL): %', invalid_count;

-- If there are any invalid TrackedEntities, proceed with deletion
IF invalid_count > 0 THEN

DELETE FROM programmessage_deliverychannels
WHERE programmessagedeliverychannelsid IN (SELECT id FROM te_pm);

DELETE FROM programmessage_emailaddresses
WHERE programmessageemailaddressid IN (SELECT id FROM te_pm);

DELETE FROM programmessage_phonenumbers
WHERE programmessagephonenumberid IN (SELECT id FROM te_pm);

DELETE FROM programmessage_deliverychannels
WHERE programmessagedeliverychannelsid IN (SELECT id FROM pi_pm);

DELETE FROM programmessage_emailaddresses
WHERE programmessageemailaddressid IN (SELECT id FROM pi_pm);

DELETE FROM programmessage_phonenumbers
WHERE programmessagephonenumberid IN (SELECT id FROM pi_pm);

DELETE FROM programmessage_deliverychannels
WHERE programmessagedeliverychannelsid IN (SELECT id FROM event_pm);

DELETE FROM programmessage_emailaddresses
WHERE programmessageemailaddressid IN (SELECT id FROM event_pm);

DELETE FROM programmessage_phonenumbers
WHERE programmessagephonenumberid IN (SELECT id FROM event_pm);

DELETE FROM event_notes
WHERE eventid IN (SELECT eventid FROM event_ids);

DELETE FROM enrollment_notes
WHERE enrollmentid IN (SELECT enrollmentid FROM enrollment_ids);

DELETE FROM note
WHERE noteid NOT IN (
SELECT noteid FROM event_notes
UNION ALL
SELECT noteid FROM enrollment_notes
);

DELETE FROM trackedentitydatavalueaudit
WHERE eventid IN (SELECT eventid FROM event_ids);

DELETE FROM programmessage
WHERE eventid IN (SELECT eventid FROM event_ids);

DELETE FROM programmessage
WHERE enrollmentid IN (SELECT enrollmentid FROM enrollment_ids);

DELETE FROM event
WHERE enrollmentid IN (SELECT enrollmentid FROM enrollment_ids);

DELETE FROM programmessage
WHERE trackedentityid IN (SELECT trackedentityid FROM te);

DELETE FROM relationshipitem
WHERE trackedentityid IN (SELECT trackedentityid FROM te);

DELETE FROM trackedentityattributevalue
WHERE trackedentityid IN (SELECT trackedentityid FROM te);

DELETE FROM trackedentityattributevalueaudit
WHERE trackedentityid IN (SELECT trackedentityid FROM te);

DELETE FROM trackedentityprogramowner
WHERE trackedentityid IN (SELECT trackedentityid FROM te);

DELETE FROM programtempowner
WHERE trackedentityid IN (SELECT trackedentityid FROM te);

DELETE FROM programtempownershipaudit
WHERE trackedentityid IN (SELECT trackedentityid FROM te);

DELETE FROM programownershiphistory
WHERE trackedentityid IN (SELECT trackedentityid FROM te);

DELETE FROM enrollment
WHERE trackedentityid IN (SELECT trackedentityid FROM te);

WITH deleted AS (DELETE FROM trackedentity WHERE trackedentitytypeid IS NULL AND NOT EXISTS (
SELECT 1
FROM enrollment e JOIN program p on e.programid = p.programid
WHERE e.trackedentityid = te.trackedentityid and p.trackedentitytypeid IS NOT NULL
)
RETURNING *) SELECT COUNT(*) INTO deleted_count FROM deleted;

RAISE NOTICE 'Total number of TrackedEntities deleted: %', deleted_count;

ELSE
RAISE NOTICE 'No invalid TrackedEntities found for deletion.';
END IF;

DROP TABLE IF EXISTS te, enrollment_ids, event_ids, te_pm, pi_pm, event_pm;

END;
$$;
```

##### For <= 2.40 Instances:

```plsql
DO $$
DECLARE
invalid_count INT; -- Variable to store the count of invalid TrackedEntityInstances
deleted_count INT := 0; -- Variable to keep track of deleted TrackedEntityInstance count
BEGIN
CREATE TEMP TABLE te AS (
SELECT trackedentityinstanceid
FROM trackedentityinstance
WHERE trackedentitytypeid IS NULL
);

CREATE TEMP TABLE pi_ids AS (
SELECT programinstanceid
FROM programinstance
WHERE trackedentityinstanceid IN (SELECT trackedentityinstanceid FROM te)
);

CREATE TEMP TABLE psi_ids AS (
SELECT programstageinstanceid
FROM programstageinstance
WHERE programinstanceid IN (SELECT programinstanceid FROM pi_ids )
);

CREATE TEMP TABLE te_pm AS (
SELECT id
FROM programmessage
WHERE trackedentityinstanceid IN (SELECT trackedentityinstanceid FROM te)
);

CREATE TEMP TABLE pi_pm AS (
SELECT id
FROM programmessage
WHERE programinstanceid IN (SELECT programinstanceid FROM pi_ids )
);

CREATE TEMP TABLE psi_pm AS (
SELECT id
FROM programmessage
WHERE programstageinstanceid IN (SELECT programstageinstanceid FROM psi_ids )
);

SELECT COUNT(trackedentityinstanceid)
INTO invalid_count
FROM trackedentityinstance
WHERE trackedentitytypeid IS NULL AND NOT EXISTS (
SELECT 1
FROM programinstance e JOIN program p on e.programid = p.programid
WHERE e.trackedentityinstanceid = te. trackedentityinstanceid and p.trackedentitytypeid IS NOT NULL
);

RAISE NOTICE 'Number of invalid TrackedEntityInstances (trackedentitytypeid IS NULL): %', invalid_count;

-- If there are any invalid TrackedEntityInstances, proceed with deletion
IF invalid_count > 0 THEN

DELETE FROM programmessage_deliverychannels
WHERE programmessagedeliverychannelsid IN (SELECT id FROM te_pm);

DELETE FROM programmessage_emailaddresses
WHERE programmessageemailaddressid IN (SELECT id FROM te_pm);

DELETE FROM programmessage_phonenumbers
WHERE programmessagephonenumberid IN (SELECT id FROM te_pm);

DELETE FROM programmessage_deliverychannels
WHERE programmessagedeliverychannelsid IN (SELECT id FROM pi_pm);

DELETE FROM programmessage_emailaddresses
WHERE programmessageemailaddressid IN (SELECT id FROM pi_pm);

DELETE FROM programmessage_phonenumbers
WHERE programmessagephonenumberid IN (SELECT id FROM pi_pm);

DELETE FROM programmessage_deliverychannels
WHERE programmessagedeliverychannelsid IN (SELECT id FROM psi_pm );

DELETE FROM programmessage_emailaddresses
WHERE programmessageemailaddressid IN (SELECT id FROM psi_pm );

DELETE FROM programmessage_phonenumbers
WHERE programmessagephonenumberid IN (SELECT id FROM psi_pm );

DELETE FROM programstageinstancecomments
WHERE programstageinstanceid IN (SELECT programstageinstanceid FROM psi_ids );

DELETE FROM programinstancecomments
WHERE programinstanceid IN (SELECT programinstanceid FROM pi_ids );

DELETE FROM trackedentitycomment
WHERE trackedentitycommentid NOT IN (
SELECT trackedentitycommentid FROM programstageinstancecomments
UNION ALL
SELECT trackedentitycommentid FROM programinstancecomments
);

DELETE FROM trackedentitydatavalueaudit
WHERE programstageinstanceid IN (SELECT programstageinstanceid FROM psi_ids );

DELETE FROM programmessage
WHERE programstageinstanceid IN (SELECT programstageinstanceid FROM psi_ids );

DELETE FROM programmessage
WHERE programinstanceid IN (SELECT programinstanceid FROM pi_ids );

DELETE FROM programstageinstance
WHERE programinstanceid IN (SELECT programinstanceid FROM pi_ids );

DELETE FROM programmessage
WHERE trackedentityinstanceid IN (SELECT trackedentityinstanceid FROM te);

DELETE FROM relationshipitem
WHERE trackedentityinstanceid IN (SELECT trackedentityinstanceid FROM te);

DELETE FROM trackedentityattributevalue
WHERE trackedentityinstanceid IN (SELECT trackedentityinstanceid FROM te);

DELETE FROM trackedentityattributevalueaudit
WHERE trackedentityinstanceid IN (SELECT trackedentityinstanceid FROM te);

DELETE FROM trackedentityprogramowner
WHERE trackedentityinstanceid IN (SELECT trackedentityinstanceid FROM te);

DELETE FROM programtempowner
WHERE trackedentityinstanceid IN (SELECT trackedentityinstanceid FROM te);

DELETE FROM programtempownershipaudit
WHERE trackedentityinstanceid IN (SELECT trackedentityinstanceid FROM te);

DELETE FROM programownershiphistory
WHERE trackedentityinstanceid IN (SELECT trackedentityinstanceid FROM te);

DELETE FROM programinstance
WHERE trackedentityinstanceid IN (SELECT trackedentityinstanceid FROM te);

WITH deleted AS (DELETE FROM trackedentityinstanceid WHERE trackedentitytypeid IS NULL AND NOT EXISTS (
SELECT 1
FROM programinstance e JOIN program p on e.programid = p.programid
WHERE e.trackedentityinstanceid = te.trackedentityinstanceid and p.trackedentitytypeid IS NOT NULL
)
RETURNING *) SELECT COUNT(*) INTO deleted_count FROM deleted;

RAISE NOTICE 'Total number of TrackedEntityInstances deleted: %', deleted_count;

ELSE
RAISE NOTICE 'No invalid TrackedEntityInstances found for deletion.';
END IF;

DROP TABLE IF EXISTS te, pi_ids, psi_ids, te_pm, pi_pm, psi_pm;

END;
$$;
```