WaveSurfer File Loading Classes #1040
base: master
Conversation
Hello @easy-electrophysiology! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-11-17 22:30:39 UTC
Hi @easy-electrophysiology! Thanks for adding support for the wavesurfer format. The code looks like a good start to me.
If analog channels are the most important signal type to be used in Neo then we can add a first wavesurfer IO version supporting only those signals.
I think you are right in only specifying a single signal stream, as you only have one source file and homogeneous data.
Events are for representing special time points during the recording session. If you don't have any type of that data, then you don't need to specify it.
If your data files contain additional information for each signal, I would recommend also making these available via Neo using the annotation and array_annotations mechanism. Regarding the test dataset, we can either give you access to the gin repository directly, or you just send the test dataset, including the README file containing the attribution, to me and I will upload it. When using pywavesurfer under the hood, does this load data in a memory-mapping fashion, or is the complete dataset copied into memory when calling loadDataFile?
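For illustration, here is a minimal sketch of the annotation / array_annotation mechanism mentioned above, using a dummy two-channel signal; the metadata keys and channel labels are hypothetical, not taken from WaveSurfer files.

```python
import numpy as np
import quantities as pq
from neo.core import AnalogSignal

# Dummy two-channel signal standing in for one WaveSurfer sweep
sig = AnalogSignal(np.zeros((1000, 2)), units="mV", sampling_rate=10 * pq.kHz)

# Scalar metadata attached to the whole signal
sig.annotate(acquisition_software="wavesurfer", sweep_protocol="hypothetical")

# Per-channel metadata: one entry per column of the 2D signal
sig.array_annotate(channel_names=np.array(["AI0", "AI1"]),
                   channel_ids=np.array([0, 1]))
```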
Hi @JuliaSprenger, thanks a lot for this, apologies for the delay in response.
Hi again, @samuelgarcia. Do you have any strong opinions about how to open h5py files?
Hi, thank you for this.
I will check the code and the signal stream soon.
neo/io/wavesurferio.py
Outdated
""" | ||
neo.io have been split in 2 level API: | ||
* neo.io: this API give neo object | ||
* neo.rawio: this API give raw data as they are in files. | ||
|
||
Developper are encourage to use neo.rawio. | ||
|
||
When this is done the neo.io is done automagically with | ||
this king of following code. | ||
|
||
Author: sgarcia | ||
|
||
""" |
Remove these comments.
neo/io/wavesurferio.py
Outdated
# This is an important choice when there are several channels.
# 'split-all' : 1 AnalogSignal each 1 channel
# 'group-by-same-units' : one 2D AnalogSignal for each group of channel with same units
_prefered_signal_group_mode = "split-all"
Now the default is group-by-same-units.
Requires the PyWaveSurfer module written by Boaz Mohar and Adam Taylor.

To Discuss:
- Wavesurfer also has analog output, and digital input / output channels, but here only analog input is supported. Is this okay?
Yes, as long as it is clear in the doc. Go for the most important part for you first; enhance later on demand.
To Discuss:
- Wavesurfer also has analog output, and digital input / output channels, but here only analog input is supported. Is this okay?
- I believe the signal streams field is configured correctly here; I used AxonRawIO as a guide.
- Each segment (sweep) has its own timestamp, so I believe no events_signals is correct (similar to WinWcpRawIO, not AxonRawIO).
This I don't understand.
class WaveSurferRawIO(BaseRawIO):

    extensions = ['fake']
Use the true extension.
def _parse_header(self):

    pyws_data = ws.loadDataFile(self.filename, format_string="double")
Are you sure this is lazy?
In short, this loads the metadata but does not load the buffer into memory.
This is not actually lazy - apologies, I understand better now how memmap works, e.g. in AxonIO, and the logic behind the RawIO / IO API.
Lazy loading could be supported by re-writing PyWaveSurfer or incorporating that code more extensively into a new RawIO module, but my initial thought was to provide a wrapper around PyWaveSurfer's IO only. I see now that an IO module with only Block readable and the lazy=True argument not allowed, similar to StimfitIO, is the appropriate way to achieve this, rather than the RawIO API.
If you are happy with the approach, I will start again and provide an IO class only, based on StimfitIO. I believe in this case it would also be appropriate to return the data scaled.
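As a rough sketch (not the PR code itself) of what such an IO-only reader could look like, assuming an eager read via pywavesurfer and a StimfitIO-style layout; the class name and block-building are placeholders:

```python
from neo.io.baseio import BaseIO
from neo.core import Block, Segment, AnalogSignal
from pywavesurfer import ws


class WaveSurferEagerIO(BaseIO):  # hypothetical name
    is_readable = True
    is_writable = False
    supported_objects = [Block, Segment, AnalogSignal]
    readable_objects = [Block]
    name = "WaveSurfer (sketch)"
    extensions = ["h5"]
    mode = "file"

    def __init__(self, filename=None):
        BaseIO.__init__(self)
        self.filename = filename

    def read_block(self, lazy=False, **kwargs):
        # The whole file is read at once, so lazy loading cannot be offered
        if lazy:
            raise ValueError("This IO does not support lazy loading")
        pyws_data = ws.loadDataFile(self.filename, format_string="double")
        block = Block(file_origin=self.filename)
        # ... build one Segment per sweep from pyws_data here ...
        return block
```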
for seg_index in range(int(header["NSweepsPerRun"])):

    sweep_id = "sweep_{0:04d}".format(seg_index + 1)  # e.g. "sweep_0050"
    self._raw_signals[seg_index] = pyws_data[sweep_id]["analogScans"].T  # reshape to data x channel for Neo standard
This reshape makes a copy in memory, I guess; you should do the reshape on the fly in _get_analogsignal_chunk.
I think reshaping does not necessarily create a copy, so maybe this would be worth a test instead of reshaping in every get_chunk call...
I think a reshape on an h5py buffer needs to create a np.array and so a copy. Maybe this is already an array, which is also not good.
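A quick throwaway check, assuming the loaded object is already a plain numpy array as discussed above: a transpose is only a view, so it does not duplicate the buffer.

```python
import numpy as np

data = np.zeros((4, 1000), dtype=np.float64)   # channels x samples, dummy data
transposed = data.T                            # samples x channels
print(transposed.base is data)                 # True: .T is a view, not a copy
print(np.shares_memory(data, transposed))      # True: same underlying buffer
```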
for ch_idx, (ch_name, ch_units) in enumerate(zip(ai_channel_names,
                                                 ai_channel_units)):
    ch_id = ch_idx + 1
    dtype = "float64"  # as loaded with "double" argument from PyWaveSurfer
Is there any way to load the data as "raw" (certainly int16)? Because in this case we let PyWaveSurfer do the scaling, but the idea of this rawio layer is to be able to load data in raw mode to save memory.
Just as a note / warning: raw data from wavesurfer (and pywavesurfer) is uncorrected for an NI-card-specific calibration. I think it should never be used. Not sure if this is relevant to the selection of dtype or memory considerations.
See here for the documentation on it, as it might be very confusing to users who would assume linear scaling of A/D values to voltage. I know it tripped us up :)
                            stream_index, channel_indexes):
    if channel_indexes is None:
        channel_indexes = slice(None)
    raw_signals = self._raw_signals[seg_index][slice(i_start, i_stop), channel_indexes]
Make the transpose here.
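A sketch of that suggestion, assuming the arrays are kept in their original channels x samples layout as pywavesurfer returns them: slice the request first, then transpose only that chunk.

```python
def _get_analogsignal_chunk(self, block_index, seg_index, i_start, i_stop,
                            stream_index, channel_indexes):
    if channel_indexes is None:
        channel_indexes = slice(None)
    # stored as (channels, samples): slice the request, then swap axes so
    # the returned chunk is (samples, channels) as Neo expects
    raw_chunk = self._raw_signals[seg_index][channel_indexes, slice(i_start, i_stop)]
    return raw_chunk.T
```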
I have added a new commit, which wraps the PyWaveSurfer module in the io rather than rawio class, using stimfitio as a basis. It is working, though one issue is that it would be useful (for our use at least) to have the header, which in this committed version is also made in the io module. However, the header dtypes (_signal_channel_dtype) are imported from rawio.baserawio, so I can imagine this does not conform well to the expected use of your rawio/io classes. Would it be better to generate the header in the rawio class and then load the data in read_block() of the io class? Many thanks,
Hi Joe. Why did you switch to the io level?
Thanks for this. I thought as much, though the problem is that in its current form pywavesurfer does not support lazy loading, so the entire file must be loaded into memory at once, which I think (?) violates the rawio API. It would be possible to port the PyWaveSurfer module to the rawio class, but one benefit of leveraging their module is that it is maintained by the WaveSurfer team and so will be kept up to date by them. More practically, I am not sure I will have the time to implement this solution for a month or so, but it would be possible. Alternatively, we could liaise with the WaveSurfer team and ask whether they are happy to support arguments for lazy loading. Happy to proceed as you see fit.
Hi,
@easy-electrophysiology Any news on this? Tell us if you need more feedback or support with a request towards pywavesurfer.
@easy-electrophysiology: any news on the lazy implementation possibility?
Hi both, apologies for the delayed reply. Happy to contact pywavesurfer, though having just reviewed their module, it would be great to get your advice on the best way to implement lazy loading. Pywavesurfer loads the file into memory at once with h5py. I believe h5py does support lazy loading, with memory loaded when data is sliced. However, to convert pywavesurfer to support lazy loading, as I understand it the module would have to undergo quite a major re-write such that all data scaling etc. is run on each sweep only when loaded into memory (?). Can you see any easier way based on the existing code?
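For reference, a minimal illustration of the h5py behaviour described here; the file name and dataset path are hypothetical, loosely following the sweep_NNNN / analogScans layout above. Keeping a Dataset handle reads no sample data, and bytes only come off disk when the handle is sliced.

```python
import h5py

with h5py.File("example_wavesurfer_file.h5", "r") as f:    # hypothetical file
    dset = f["sweep_0001/analogScans"]                      # h5py.Dataset: a lazy handle, nothing read yet
    chunk = dset[:, 0:1000]                                  # only this slice is read into memory
```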
Hi, guys. I took a look at this. I looked through the code base of pywavesurfer and I don't think it is too hard to modify it for lazy reading. However, I think that the true challenge is the scaling. Check the following:
(bold mine) If this can be done (save non-linear scaling) maybe we can have a discussion and I can take care of this @samuelgarcia @JuliaSprenger.
Maybe this non-linear scaling can be taken care of in …
For reference, this is the operation that they do to transform int16 to float:

for i in range(0, n_channels):
    scaled_data[i, :] = inverse_channel_scales[i] * np.polyval(np.flipud(scaling_coefficients[i, :]),
                                                               data_as_ADC_counts[i, :])
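Sketched below is how that same correction might be applied to a single chunk of raw ADC counts rather than the whole recording, which is the kind of per-request scaling a lazy reader would need. The function is hypothetical; the variable names follow the snippet above.

```python
import numpy as np

def scale_chunk(adc_chunk, scaling_coefficients, inverse_channel_scales):
    """Apply the per-channel polynomial correction to one (n_channels, n_samples) int16 chunk."""
    scaled = np.empty(adc_chunk.shape, dtype=np.float64)
    for i in range(adc_chunk.shape[0]):
        # np.polyval wants the highest-order coefficient first, hence the flip
        coeffs = np.flipud(scaling_coefficients[i, :])
        scaled[i, :] = inverse_channel_scales[i] * np.polyval(coeffs, adc_chunk[i, :])
    return scaled
```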
Hi @h-mayorquin! Would you be up for looking into the changes required for lazy loading?
Hi Julia, I realize that I never replied to you. I don't have time right now, but if a conversion comes up that requires this I will be able to do it.
Pull request to add support for WaveSurfer (https://wavesurfer.janelia.org) filetypes. Built around and requires the PyWaveSurfer module written by @boazmohar and @adamltaylor.
This pull request is not finished, but I have opened it to discuss. Next, if this all sounds okay, I will write to the neural ensemble list to request upload to the g-node of WaveSurfer filetypes (kindly provided by Boaz and Adam) and write tests for the module.
There were a couple of outstanding questions I had, to ensure the proposed module conforms to the Neo API (thanks BTW for structuring the rawIO API so cleanly and making it easy to use). These are in the wavesurferrawio.py header but also pasted here:
Best,
Joe