Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JP-3708: Adding the CRMAG Array to the Optional Results Product #304

Closed

Conversation

kmacdonald-stsci
Copy link
Collaborator

@kmacdonald-stsci kmacdonald-stsci commented Oct 10, 2024

Resolves JP-3708

This PR addresses a crash that occurs when running multiprocessing for ramp fitting with the C extension while choosing to save the optional results product. This is because the CRMAG array was not implemented in the C extension, so the C extension returns a NoneType, so when the data slices attempt to reassemble an exception is thrown because there is no CRMAG to reassemble.

The CRMAG array is a 4-D array, with dimensions (nints, max_num_crs, nrows, ncols). For each integration ramp there is some number of cosmic ray jumps, including 0. The dimension max_num_crs is the maximum number of jumps found over all integrations and all pixels. For ramps that have fewer than the max number of jumps the last elements in that pixel ramp are 0. For example, if the maximum number of jumps found was 5, but a ramp has only two jumps, the first two elements of the CRMAG array for that integration, row, and column are non-zero, while the last three are 0. The non-zero elements of the ramp array are the magnitude of the jump, which is the difference between the two groups where a jump was detected.

The CRMAG is now implemented in the C extension and functions properly with multiprocessing.

Tasks

  • update or add relevant tests
  • update relevant docstrings and / or docs/ page
  • Does this PR change any API used downstream? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
    • run regression tests with this branch installed ("git+https://github.com/<fork>/stcal@<branch>")
news fragment change types...
  • changes/<PR#>.apichange.rst: change to public API
  • changes/<PR#>.bugfix.rst: fixes an issue
  • changes/<PR#>.general.rst: infrastructure or miscellaneous change

@kmacdonald-stsci
Copy link
Collaborator Author

kmacdonald-stsci commented Oct 10, 2024

Copy link

codecov bot commented Oct 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.34%. Comparing base (60bd3b8) to head (38bad7d).
Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #304      +/-   ##
==========================================
+ Coverage   86.21%   86.34%   +0.13%     
==========================================
  Files          47       49       +2     
  Lines        8812     8896      +84     
==========================================
+ Hits         7597     7681      +84     
  Misses       1215     1215              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jemorrison jemorrison self-requested a review October 23, 2024 18:08
crs->tail->flink = new_node;
crs->tail = new_node;
crs->size++;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kmacdonald-stsci I have not used linked lists in c before. I need a quick overview. I googled it and these double linked lists ? I could use a little more detail so I can figure out what it going on.
Is 'head' the pointer for the VERY FIRST CR segment. And 'tail' the location of the very last one. So are the segments beginning and ends stored or just the first and last locations ?
In the cr_list_add on line 1248 why is the tail = new_node. I did not think head and tail would equal each other. Is that just for the first cr found.
Also I am unclear that flink is supposed represent.

Copy link
Collaborator Author

@kmacdonald-stsci kmacdonald-stsci Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linked list I am using for this is a singly linked list. There are a few different ways to implement this. My implementation uses two structures, the linked list container and the nodes being listed, with only the forward link being defined in each node. The forward link is the flink parameter that points to the next node in the list. When flink is NULL, then that node is the tail of the list:

The linked list data structure contains useful meta data for the list. In particular, the head, the tail, and the total number of nodes in each list, which will be used to create the second dimension for the 4-D CRMAG array, which has dimensions (nints, max_num_crs, nrows, ncols). The tail parameter in the linked list is a convenience variable to go directly to the tail and add the next node, rather than have to traverse the list to the tail each time a new node is added to the list.

So the linked list with n CRs detected looks like this:

cr_list->head = cr_node->flink = pointer to the next node
                cr_node->flink = pointer to the next node
                cr_node->flink = pointer to the next node
                cr_node->flink = pointer to the next node
                ...
cr_list->tail = cr_node->flink = NULL
cr->size = n

So, if some ramp looks like this:

data =    [1.,   2.,   13.,      14.,  15.,  16.,  37.,      38.,  39.,  40.,  41.   102.,     103., 104., 105.]
groupdq = [GOOD, GOOD, JUMP_DET, GOOD, GOOD, GOOD, JUMP_DET, GOOD, GOOD, GOOD, GOOD, JUMP_DET, GOOD, GOOD, GOOD]

Then the list of CR magnitudes for this ramp will look like this:

cr_list->head = {crmag = 11., flink = pointer to next node}
                {crmag = 21., flink = pointer to next node}
cr_list->tail = {crmag = 61., flink = NULL}
cr_list->size = 3

So, yes, cr_list->head is the very first CR magnitude encountered in the ramp and is the difference between that group data and the previous group data. The cr_tail->tail is the last known CR magnitude computed, so if another one is found, a new node is allocated and added as the tail. The result is an ordered linked list starting with the first detected CR magnitude and ending with the last detected CR magnitude.

At line

if (0==crs->size) {

This conditional checks to see if a current list is empty. If it's empty, the size will be 0. When a CR is detected, the CR magnitude is computed and a new cr_node is allocated to store that value. The cr_node is then added to the empty list, resulting in a list of size 1, therefore the head and tail are the same and the size is set to 1. This list will look like this:

cr_list->head = cr_list->tail = cr_node->flink = NULL
cr_list->size = 1

The "else" in this conditional means the list is not empty, so put the newly computed CR magnitude at tail of the list and the size is incremented by 1.

Copy link
Collaborator

@jemorrison jemorrison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have gone through the code a few times. To the best of my ability it looks correct to me and frees the necessary memory when it is done. The comments you gave me were essential for me to understand the flow. If you can somehow add some of those details into the code so in the future when we need to look at the code it is easier to understand what is going on that might be helpful.

@kmacdonald-stsci
Copy link
Collaborator Author

I have gone through the code a few times. To the best of my ability it looks correct to me and frees the necessary memory when it is done. The comments you gave me were essential for me to understand the flow. If you can somehow add some of those details into the code so in the future when we need to look at the code it is easier to understand what is going on that might be helpful.

I added comments to the linked lists used to more thoroughly describe the implementations. Please take a look at them to let me know if they help clarify how they are implemented.

@melanieclarke
Copy link
Contributor

I'm testing these changes on some sample data I have on disk, and this command, running without multiprocessing completes in seconds:
strun ramp_fit jw01247667001_02101_00001_mirifulong_jump.fits --save_opt=True --opt_name=opt_serial.fits

This one, with multiprocessing on, I killed after 5 minutes because it didn't look like it was ever going to complete:
strun ramp_fit jw01247667001_02101_00001_mirifulong_jump.fits --save_opt=True --opt_name=opt_parallel.fits --maximum_cores=half
After killing it, it reports 20 leaked semaphores.

This one, with multiprocessing on but optional output off completes in seconds:
strun ramp_fit jw01247667001_02101_00001_mirifulong_jump.fits --maximum_cores=half
After completion, it reports 10 leaked semaphores.

Can you please double check the interaction between multiprocessing and optional output results? I'm wondering if there's a deadlock somewhere.

@kmacdonald-stsci
Copy link
Collaborator Author

Closing to fix multiprocessing bug in the optional results product.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants