
Export an analysis to rerun on a different spyglass instance #1129

Open
lfrank opened this issue Sep 23, 2024 · 2 comments
Labels
enhancement (New feature or request) · infrastructure (Unix, MySQL, etc. settings/issues impacting users)

Comments

@lfrank
Contributor

lfrank commented Sep 23, 2024

It seems possible (perhaps likely?) that different groups will have their own databases, but would like to be able to import a set of analyses / results from another group. This could be something like issue #861 but with a provision to transfer entries to a different database.

There are multiple complexities here, but if this were possible it could be really useful.

@lfrank lfrank changed the title from "Export of selected table entries and associated files for a different database" to "Export of selected table entries and associated files to a different database / spyglass install" Sep 23, 2024
@CBroz1
Member

CBroz1 commented Sep 24, 2024

  • The current export process yields sql files that could be loaded into any database, including lines that drop existing tables to redefine them
  • This repo uses those exports as a basis for a single-analysis container
  • To load the same files into an existing database would require...
    • Either (a) adding flags to the existing dump process to append rather than declare, or (b) editing the sql files with some sed/awk operations to remove DROP statements and change CREATE to CREATE IF NOT EXISTS (see the sketch after this list)
    • An appropriately credentialed user to run a handful of bash commands, which could be integrated into spyglass itself somewhere
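
For concreteness, here's a minimal sketch of option (b). The filenames are hypothetical, and this only softens the table definitions; it does nothing about row-level collisions in the INSERT statements, which is where the questions below come in.

```python
import re
from pathlib import Path

def make_dump_appendable(src_sql: str, dst_sql: str) -> None:
    """Rewrite a mysqldump export so it can be loaded into an existing
    database: strip DROP TABLE statements and soften CREATE TABLE into
    CREATE TABLE IF NOT EXISTS."""
    kept = []
    for line in Path(src_sql).read_text().splitlines():
        if re.match(r"\s*DROP TABLE", line, flags=re.IGNORECASE):
            continue  # never drop the receiving database's tables
        kept.append(
            re.sub(r"CREATE TABLE(?!\s+IF NOT EXISTS)",
                   "CREATE TABLE IF NOT EXISTS", line)
        )
    Path(dst_sql).write_text("\n".join(kept) + "\n")

# hypothetical usage; the rewritten file would then be loaded with e.g.
#   mysql -u <user> -p <database> < export_append.sql
make_dump_appendable("export.sql", "export_append.sql")
```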

Some questions come to mind regarding data integrity ...

  1. What if there are naming collisions in entries?
  • Case 1: LabA has 'subject1' and tries to load LabB's 'subject1', a different subject.
  • Case 2: LabA and LabB both have data from the same 'subject1', run with 'ParamsA', but this paramset was defined differently in each case.
  • How should a load handle conflicts? It could...
    • Simply reject a load with overlapping names
    • Assume collision refers to the same entity (e.g., assume default paramsets have not been changed)
    • Append some value to the loaded case (e.g., 'subject1_imported{DATE}'); a rough sketch of this check-and-rename approach follows this list
    • Pairwise compare every case of collision, including data stored as blobs (time intensive)
  2. What if there are differences in table definitions?
  • Case 3: LabA has kept up with table alters (e.g., adding new fields), but LabB never ran these alters when updating Spyglass
  • Case 4: LabA and LabB do not share the exact same definition of a downstream custom table
  • How should a load handle these cases? It could ...
    • Reject the load
    • Rename the custom tables (e.g., 'CustomTableImported{DATE}')
    • Attempt to suggest changes to the imported file or alter existing tables
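
To illustrate the rename-on-collision option, a rough sketch (the helper and field names are made up; this only catches Case-1-style key collisions, not Case 2 where the same name hides different contents):

```python
import datetime

def insert_renaming_on_collision(table, row, key_field="subject_id"):
    """Insert an imported row into a DataJoint table, appending a date
    suffix to its key when an entry with that name already exists
    (the 'subject1_imported{DATE}' option above)."""
    key = {key_field: row[key_field]}
    if len(table & key):  # restriction is non-empty -> name collision
        stamp = datetime.date.today().isoformat()
        row = {**row, key_field: f"{row[key_field]}_imported{stamp}"}
    table.insert1(row)
```

Even this simple policy cascades: any downstream imported rows that reference the renamed entry would need the same rename to keep their foreign keys consistent.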

Any monitoring of the ingestion process to resolve collisions is going to be a major lift: parsing error messages from SQL, which DataJoint is better equipped for than Spyglass (maybe worth a feature request from them?). A skilled user could manage these decisions working with SQL directly, but I'm not confident in our ability to do it programmatically in Python. A featureful approach might be an effort on par with expanding DataJoint by 30% to handle all possible error codes and revert on failure.
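
As a small illustration of why this turns into error-code handling: even the simplest collision surfaces as an exception at insert time, and every resolution strategy above means catching and branching on it, table by table. (I believe DataJoint raises a DuplicateError on key collisions, but treat the exact class name here as an assumption.)

```python
import datajoint as dj

def try_insert(table, row):
    """Attempt one imported insert, deferring collision handling to the
    caller -- now multiply this by every table and error code involved."""
    try:
        table.insert1(row)
        return "inserted"
    except dj.errors.DuplicateError:
        # key already exists locally; pick a policy (reject, assume same
        # entity, rename, or deep-compare) per the options above
        return "collision"
```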

An alternate approach might look more like a 'replication tool' that exported a spec of paramsets to run, and then applied them to a different database. This would require rerunning all computations, but it would allow datajoint and/or the end-user to handle collisions one by one.
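
To make the 'spec of paramsets to run' idea concrete, one entirely hypothetical shape for such a spec (table names and keys are illustrative only, not a proposed format):

```python
# Hypothetical replication spec: ship parameters and populate targets,
# not table rows, and let the receiving instance recompute everything.
replication_spec = {
    "spyglass_version": "0.5.x",          # environment the analysis ran under
    "nwb_files": ["subject1_day1.nwb"],   # raw files the receiving lab must obtain
    "paramsets": [                        # entries to insert into params tables
        {"table": "ExampleParams", "entry": {"paramset_name": "ParamsA", "threshold": 3.0}},
    ],
    "populate": [                         # computed tables to run, with restrictions
        {"table": "ExampleAnalysisV1", "restriction": {"nwb_file_name": "subject1_day1.nwb"}},
    ],
}
# The receiving side inserts each paramset (surfacing collisions one at a
# time), then calls populate() on each listed table with its restriction.
```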

@CBroz1 CBroz1 added the enhancement and infrastructure labels Sep 24, 2024
@lfrank
Contributor Author

lfrank commented Sep 26, 2024

Great points, and indeed the replication tool might be by far the best way to approach this given all the challenges. Let's discuss when you're back in town.

@CBroz1 CBroz1 changed the title from "Export of selected table entries and associated files to a different database / spyglass install" to "Export an analysis to rerun on a different spyglass instance" Nov 20, 2024