Duplicate Values in private_map File for Different Keys When Masking in sosreport #3806
Comments
That is a direct consequence of independent executions of the sos cleaner. It would be difficult to prevent, especially when the cleaners operate on different input data or reports. The simplest fix would be a lock on the default mapping file to prevent concurrent cleaner executions. Another approach, not as easy, that would fix this issue as a side effect is https://github.com/orgs/sosreport/discussions/3476#discussioncomment-8501581. Is there any use case for running the sos cleaner concurrently? Can't you streamline the executions to prevent the issue?
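A minimal sketch of the locking idea, assuming a POSIX system and Python's `fcntl.flock`. The `.lock` sidecar file, the JSON layout, the original-to-masked direction of the map and the `run_cleaner_locked` helper are all hypothetical; this is not how sos reads or writes its mapping today.

```python
import fcntl
import json
import os
from contextlib import contextmanager


@contextmanager
def mapping_lock(mapping_path):
    """Hold an exclusive advisory lock for the duration of the block.

    A separate ".lock" file is used (an assumption of this sketch) so the
    lock does not depend on how the mapping file itself is rewritten.
    """
    lock_file = open(mapping_path + ".lock", "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        yield
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)
        lock_file.close()


def run_cleaner_locked(mapping_path, hostnames):
    """Hypothetical stand-in for one cleaner run: load, extend, save the map."""
    with mapping_lock(mapping_path):
        mapping = {}
        if os.path.exists(mapping_path):
            with open(mapping_path) as fh:
                mapping = json.load(fh)
        for name in hostnames:
            if name not in mapping:
                mapping[name] = f"host{len(mapping)}"
        with open(mapping_path, "w") as fh:
            json.dump(mapping, fh)
        return mapping
```

With the lock held for the whole run, a second cleaner only starts reading the mapping after the first one has written it, so the two runs cannot hand out the same masked name. The cost is that the second run waits for the first.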
Thank you for pointing out that concurrent executions of the sos cleaner seem to cause this issue. Regarding streamlining the process, we have two ways to trigger sosreport: one user-driven and the other systemd-driven. When a systemd-driven sosreport is in progress, we wouldn't want to keep the user waiting for it to finish before they can initiate another report. Streamlining executions is therefore not feasible in these scenarios, as we need to accommodate both types of triggers without introducing delays.
So even waiting for a lock is not a solution, as that would delay execution. The only possible solution I see, then, is https://github.com/orgs/sosreport/discussions/3476#discussioncomment-8501581. @TurboTurtle or @arif-ali, any ideas?
I don't think that rejecting a locking mechanism because it "delays an execution" is a reasonable ask. That's the entire point of a locking mechanism, and it is exactly what would happen today if multiple sos report collections were kicked off by a systemd service (by default, anyway): systemd would block concurrent execution of the same
Assuming the locks, a scenario like:
Do you see that as a reasonable cost for preventing the duplicate values in the mapping? If the cleaner were faster than it is now, or if its execution time were deterministic, then OK, but here? I don't know. Optionally, the user can kill the
Could we use file locking and add each key-value pair incrementally as it is generated, instead of writing the entire set at the end of the sosreport process? Would that effectively resolve the issue?
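A rough sketch of what such a per-record update could look like, assuming a POSIX system, a JSON mapping file and an original-to-masked map. The `add_mapping` helper and the `hostN` naming are hypothetical and only illustrate the idea; they are not taken from the sos code base.

```python
import fcntl
import json
import os


def add_mapping(mapping_path, original):
    """Return the masked name for `original`, adding it if it is missing.

    The file is locked only for this single read-modify-write, so
    concurrent cleaners interleave per record instead of per run.
    """
    # Open read/write and create the file if needed, without truncating it.
    fd = os.open(mapping_path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            raw = fh.read()
            mapping = json.loads(raw) if raw else {}
            if original not in mapping:
                mapping[original] = f"host{len(mapping)}"
                # The whole file is rewritten for every new record.
                fh.seek(0)
                fh.truncate()
                json.dump(mapping, fh)
                fh.flush()
            return mapping[original]
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

Each call holds the lock only briefly, so no run waits for another run to finish. The trade-off is that every new record re-reads and rewrites the full mapping file.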
How to fix this depends heavily on how we decide to fix #3097:
This does not scale well. Performance is hard to guess, but assume there will be hundreds to thousands of records to write to the mapping file. Then:
So, with N new key-value records to write into a mapping file that already holds M records, we (over)write (M+N)*N/2 records into the mapping file, iteratively, even in the "linear" use case where no cleaner runs concurrently. We can go that way, but this will be the cost. For the "values are hashes of keys from some secret seed" approach, we don't need such a lock on every write. The only locks are:
Anything in between (the initial read of the mapping file, generating new key-value pairs from the same seed) can be done concurrently, and it will be deterministic for a given seed. So this request gives another argument for the "seed option" (https://github.com/orgs/sosreport/discussions/3476#discussioncomment-8501581), which has its own gotchas: less user-friendly obfuscated names, a potential risk of guessing the seed via a dictionary attack (this is a serious concern), and running the collector with the cleaner across systems with multiple different seeds.
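To make the "seed option" concrete, here is a minimal sketch in which the masked value is derived from the original value and a secret seed. Using HMAC-SHA256 and truncating the digest to 12 hex characters are assumptions of this sketch, as are the `obfuscate` name and the `host-` prefix; the linked discussion does not prescribe a specific construction.

```python
import hashlib
import hmac


def obfuscate(seed: bytes, original: str, prefix: str = "host") -> str:
    """Derive a masked name from a secret seed and the original value.

    The result depends only on (seed, original), so independent cleaner
    runs sharing the seed agree on it without coordinating through a file.
    """
    digest = hmac.new(seed, original.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncation keeps names short but raises the (small) collision chance.
    return f"{prefix}-{digest[:12]}"


seed = b"example-seed"  # hypothetical; a real seed must stay secret
first_run = obfuscate(seed, "gmail.com")
second_run = obfuscate(seed, "gmail.com")
```

Two runs with the same seed produce the same masked name for the same input, and different inputs get different names with overwhelming probability. The names are less readable than `host0`, and anyone who learns or guesses the seed can test candidate originals against the masked values, which is the concern raised above.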
When two sosreport collections run concurrently with masking enabled, an issue occurs where duplicate entries are created in both the /etc/sos/clean/default_mapping file and the private_map file. This leads to different keys being mapped to the same masked value, causing confusion when trying to reverse map the masked data from the logs.
Specifically, during concurrent sosreport runs, the first report might assign "host0": "gmail.com", while the second report assigns "host0": "outlook.com". This results in duplicated entries in the private_map and default_mapping files, making it difficult to accurately identify the original data corresponding to the masked values.
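The collision can be illustrated with a simplified model, which is not the actual sos code: two runs each start from the same snapshot of the mapping and pick the next free `hostN` name independently. The `assign` helper and the original-to-masked dictionary layout are assumptions made for the illustration.

```python
def assign(snapshot, domains):
    """Assign hostN names starting from a private copy of the mapping."""
    mapping = dict(snapshot)
    for domain in domains:
        mapping.setdefault(domain, f"host{len(mapping)}")
    return mapping


shared = {}                               # mapping file content when both runs start
first = assign(shared, ["gmail.com"])     # run A reads the empty mapping
second = assign(shared, ["outlook.com"])  # run B reads it before A has written

# Merge both results and invert them, as one would for reverse mapping.
merged = {**first, **second}
reverse = {}
for original, masked in merged.items():
    reverse.setdefault(masked, []).append(original)
```

After the merge, `reverse["host0"]` lists both `gmail.com` and `outlook.com`, so a masked `host0` in the logs can no longer be traced back to a single original value.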
Steps to Reproduce:
Expected Behavior:
Each sosreport instance should assign unique masked values to corresponding keys without duplication, ensuring proper reverse mapping from logs.
Impact:
This issue causes confusion and potential data inaccuracies when trying to map masked values back to the original sensitive information, which can be problematic for log analysis and troubleshooting.