Column label problem in Import Data #28
Re the column labels, I thought, in that situation, Athena would replace the labels with column numbers. I'll look into it.
Re the data every 5 seconds thing, don't use Athena. I don't mean that in a snippish or obnoxious way. I mean that I never wrote Athena with the intent of receiving data at that pace. She's simply not efficient enough for that. To put it another way, we didn't have 5 second scans at my old NSLS beamline, so Athena doesn't know how to cope with that. She and I have co-dependent coping issues 😄
Here's what I suggest instead. Write a little program that converts the data from whatever form it appears in every 5 seconds into a json file of the sort explained in the new version of the user's manual: http://bruceravel.github.io/demeter/documents/Athena/output/project.html
You can rely on Athena's defaults for most of the parameters -- the main chore will be to generate the x and y (and optionally the i0 and signal) lists and package them correctly. In the args dictionary, I'd recommend setting datatype and label explicitly, but you should be able to rely upon Athena's defaults for all the rest. Because of Ifeffit's (*) memory limitations, I would recommend limiting each json file to about 30 scans.
You could also use this converter to do the right thing with all of the columns in each data file, which will obviate your first problem ❗ The advantage here is that you use a custom tool to manage the data volume associated with 5 second scans. Your visitors can then "enjoy" Athena at leisure with the json-style project files that you generate.
(*) I see from your comment that you are using the Larch backend. That's so exciting! I am thrilled that it's working well -- or at all, to be frank! You should still be mindful of Ifeffit's problems because your users will likely be using it when they get home. It's a slow transition....
Your observation of top is interesting. I'll try to do some profiling to see if I can figure out what the hold-up is.
Although, are you sure that Athena per se is the slow-poke and not NFS or how Athena is using NFS? (Weren't you the person who diagnosed an NFS problem a while back...?) |
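The converter suggested in this comment could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not Athena's actual schema: the only details taken from the comment are the x, y, i0 and signal lists, the args dictionary with datatype and label, and the limit of about 30 scans per file. The function names, the "xmu" datatype value, and the top-level layout of the json (a plain list of records) are hypothetical and must be checked against the project-file page of the manual linked above.

```python
import json
from pathlib import Path

# Limit suggested in the thread because of Ifeffit's memory limitations.
MAX_SCANS_PER_FILE = 30


def make_record(label, energy, mu, i0=None, signal=None, datatype="xmu"):
    """Build one scan record: data lists plus an args dictionary.

    ASSUMPTION: the key names follow the wording of the comment; the
    default datatype value "xmu" is illustrative only.
    """
    record = {
        "x": list(energy),
        "y": list(mu),
        "args": {"datatype": datatype, "label": label},
    }
    if i0 is not None:
        record["i0"] = list(i0)
    if signal is not None:
        record["signal"] = list(signal)
    return record


def batch(records, size=MAX_SCANS_PER_FILE):
    """Yield consecutive groups of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def write_projects(records, out_dir, stem="qxas"):
    """Write one json file per batch and return the paths written.

    ASSUMPTION: dumping each batch as a plain list is a placeholder for
    whatever top-level structure the manual actually specifies.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, chunk in enumerate(batch(records), start=1):
        path = out_dir / f"{stem}_{n:03d}.json"
        path.write_text(json.dumps(chunk, indent=1))
        paths.append(path)
    return paths
```

With 65 scans, for example, batch() yields groups of 30, 30 and 5 records, so write_projects() would produce three files.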
Could you email me one of the files that is taking 10 seconds for the column selection dialog to make an appearance? Or post it on gist. Thanks. |
I just emailed you a typical file that takes quite long to load. We are indeed using the Larch backend: the main reason behind this is Ifeffit being unable to read these multi-element detector files properly. It appears to read only the first x rows and ignores the rest of the file, which probably has to do with its memory limitations. Otherwise the Larch backend appears to work quite well, even on a machine that is running multiple Athena sessions by multiple users simultaneously. Once or twice we ran into the following problem:
which we haven't been able to replicate. If I may also make a suggestion: the Demeter installer appears to have a hard dependency on Ifeffit. Perhaps if the installer can verify that a Larch installation is already present, there is no need to force the user to also install Ifeffit? About the JSON file: I assume you recommend this file format because it is processed faster in Athena? Is Larch still used then? The B18 staff already have scripts that deal with summing up the counts in the different elements of the detector. They were even an absolute necessity to get their data read by the Ifeffit backend. I assume they could be modified to output to JSON, but I will need to discuss this with them. |
Oh and I am indeed the NFS guy (although it was actually an access control lists problem) 😄 In this case, although the file is indeed stored on an NFS partition, I doubt that it has a lot of influence here as it is only 1.7 MB. |
Edit the properties of the Athena desktop icon. Change the target from "dathena" to "lathena". There are "l" versions of the Athena, Artemis, and Hephaestus batfiles in It's up to you to install Larch on the machine. It's been a while since I tested these, but I think they work.... |
That's not it at all. The json is simply a different format for the Athena project file. It has the same information content as the conventional project file (which is a serialization of Demeter data structures), but is easier for other applications to write.
The reason I suggest this route is that it bypasses the whole process of repetitively importing data files. No one uses Athena because they love the column selection dialog. Writing qxas data directly to a project file lets you start using Athena for the stuff that is more satisfying.
If you take a step back, you could (and probably should) question the wisdom of presenting your users with data files like the one you sent me. Not only does your user not really want to interact with the column selection dialog, your user doesn't really want to figure out which columns in the file are interesting in the first place. How often does your user change her mind about which columns to select? Not very often, I bet! So why are you making her think about that? Why not write a file that just has energy, signal, and I0 (and time, I suppose)? Isn't that what she wants?
Or, to make the same argument in the opposite way, why do you do dead-time corrections before writing this file? (Do you do deadtime correction? I bet you are because I see the word "xpress" in the file header.) Shouldn't you be writing a file with ICR and OCR values so the user can check your deadtime correction? The answer is: "of course not". For a staff member, or in the rare case where it matters for a user, you Diamond folk have an HDF5 file with all that stuff in it. If you present your user with deadtime corrected data for each element, why not take the next step and give them what they really want?
Basically, I am suggesting that you remove even more of the friction from these measurements. |
I agree that NFS has nothing to do with it. My current suspicion (although I have thus far spent much more time eating lunch than looking into it) is that the slowness is in trying to remove the background from wonky data. I suspect that Athena is assuming that these data are transmission because it sees "I0" and "It" as meaningful column labels. But there is no step in "It", so autobk grinds away for a while before giving up the ghost.
Another option, besides any of the other suggestions I have made in this issue thread, would be to write a filetype plugin that recognizes the file as being from B18 and measured with your 36 element detector. The advantage of using a filetype plugin is that it provides a mechanism for suggesting which columns to choose. If autobk on non-data is, indeed, the problem, a filetype plugin would be a good solution. It'd take me less than 30 minutes to make one. I'll see what I can do. If it fixes the slowness problem, I'll send it to you for a try-out. |
@tschoonj @bruceravel I would definitely suggest that the beamline have a tool that reduces the data to something more sensible than a file with more than 20 columns. We (at my beamline) have produced such large numbers of data channels for years, and have been advocating this position for a very long time. We recently saw a similar question on the ifeffit mailing list about data from SSRL. FWIW, we don't have an Athena plugin, we have a standalone conversion tool.
It's nice that Athena is able to deal so well with multiple columns, but there are limits to what is possible. For example, it cannot deal with doing per-channel deadtime corrections.
The whole premise of using ASCII column files is that they are human readable. Indeed, to import these into Athena, the user has to explicitly select columns -- the files are not fully parsed and digested. With more than 10 or 20 columns, that concept breaks down, and Athena is really hard to use. The file, though ASCII encoded, is essentially binary. An advantage of a beamline-specific tool is that it could use other binary file types (netcdf, hdf5, etc).
We see issues at our beamline with people using Athena with raw fluorescence XAFS files all the time. It's not that Athena gets it wrong or the people are dim, it's that they didn't use the right tool for the job. When we show them the right tool (we call it "deadtime correction", but doing the summing and file simplification is just as important), their lives get much, much simpler: Columns 1 and 2 are energy and mu_fluorescence.
In short, a beamline-specific conversion tool is the right solution to the problem you're facing. |
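As an illustration of the kind of reduction described in this comment, here is a minimal Python sketch that applies a per-channel deadtime factor, sums the detector elements, and normalizes by I0, so that the output has energy and mu_fluorescence in columns 1 and 2. It is not the conversion tool used at any beamline. The function names and the in-memory data layout are hypothetical, and the factor ICR/OCR is only one common simple form of deadtime correction, assumed here for the example.

```python
def deadtime_factor(icr, ocr):
    """Simple correction factor ICR/OCR (ASSUMPTION: one common form).

    Falls back to 1.0 when the output count rate is not positive.
    """
    return icr / ocr if ocr > 0 else 1.0


def reduce_scan(energy, i0, channels, icr=None, ocr=None):
    """Return a list of (energy, mu_fluorescence) rows.

    channels[k][j] is the count in detector element k at energy point j;
    icr and ocr, if given, use the same [element][point] layout.
    """
    rows = []
    for j, e in enumerate(energy):
        total = 0.0
        for k, counts in enumerate(channels):
            factor = 1.0
            if icr is not None and ocr is not None:
                factor = deadtime_factor(icr[k][j], ocr[k][j])
            total += counts[j] * factor
        # Fluorescence signal: summed, corrected counts divided by I0.
        rows.append((e, total / i0[j]))
    return rows


def format_rows(rows):
    """Render the rows as a two-column ASCII file body."""
    lines = ["# energy  mu_fluorescence"]
    lines += [f"{e:.3f}  {mu:.6g}" for e, mu in rows]
    return "\n".join(lines) + "\n"
```

Reading the raw columns (from ASCII, netcdf or hdf5) is left out on purpose, since that part depends entirely on the beamline's own file layout.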
Hi Bruce and Matt, Many thanks for the very useful suggestions. I will discuss with the beamline staff how we can improve their data processing strategies based on your recommendations. @bruceravel Many, many, many thanks for writing the Demeter plugin! I will give it a try tomorrow morning. |
Hi Bruce,
Some B18 staff members here at Diamond have discovered that not all columns of their 36-element detector XAS files are properly represented in the Column selection dialog window. Although all radiobuttons appear to be present, only about half of the corresponding labels are shown, which I assume is due to space constraints.
I am attaching a screenshot that will make this situation clear.
The whole process of opening a file like this also takes quite long: about 10-15 seconds until the Column selection dialog opens. Keeping an eye on `top` reveals that the `larch` part is fast, below one second, and the rest of the time is spent in Athena itself. During the file reading the GUI freezes.
As B18 typically produces a file like this every 5 seconds during an experiment, loading and processing a large number of files becomes really slow for them in post-processing.
Any advice on how we could speed things up here?
We are using the latest version of Demeter installed on CentOS 6 machines.