WaveSurfer File Loading Classes #1040
base: master
Conversation
Hello @easy-electrophysiology! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-11-17 22:30:39 UTC
Hi @easy-electrophysiology! Thanks for adding support for the wavesurfer format. The code looks like a good start to me.
If analog channels are the most important signal type to be used in Neo then we can add a first wavesurfer IO version supporting only those signals.
I think you are right in only specifying a single signal stream, as you only have one source file and homogeneous data.
Events are for representing special time points during the recording session. If you don't have any type of that data, then you don't need to specify it.
If your data files contain additional information for each signal, I would recommend also making these available via Neo using the annotation and array_annotations mechanism. Regarding the test dataset, we can either give you access to the gin repository directly, or you just send the test dataset, including the README file containing the attribution, to me and I will upload it. When using pywavesurfer under the hood, does this load data in a memory-mapping fashion, or is the complete dataset copied into memory when calling loadDataFile?
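For illustration, here is a minimal sketch of the annotation / array_annotation mechanism mentioned above, using a dummy two-channel signal; the metadata keys and channel labels are hypothetical, not taken from WaveSurfer files.

```python
import numpy as np
import quantities as pq
from neo.core import AnalogSignal

# Dummy two-channel signal standing in for one WaveSurfer sweep
sig = AnalogSignal(np.zeros((1000, 2)), units="mV", sampling_rate=10 * pq.kHz)

# Scalar metadata attached to the whole signal
sig.annotate(acquisition_software="wavesurfer", sweep_protocol="hypothetical")

# Per-channel metadata: one entry per column of the 2D signal
sig.array_annotate(channel_names=np.array(["AI0", "AI1"]),
                   channel_ids=np.array([0, 1]))
```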
Hi @JuliaSprenger, thanks a lot for this, apologies for the delay in response.
Hi again, @samuelgarcia. Do you have any strong opinions about how to open h5py files?
Hi, thank you for this.
I will check the code and the signal stream soon.
neo/io/wavesurferio.py
Outdated
""" | ||
neo.io have been split in 2 level API: | ||
* neo.io: this API give neo object | ||
* neo.rawio: this API give raw data as they are in files. | ||
|
||
Developper are encourage to use neo.rawio. | ||
|
||
When this is done the neo.io is done automagically with | ||
this king of following code. | ||
|
||
Author: sgarcia | ||
|
||
""" |
Remove these comments.
neo/io/wavesurferio.py
Outdated
# This is an important choice when there are several channels.
# 'split-all' : 1 AnalogSignal each 1 channel
# 'group-by-same-units' : one 2D AnalogSignal for each group of channel with same units
_prefered_signal_group_mode = "split-all"
Now the default is group-by-same-units.
Requires the PyWaveSurfer module written by Boaz Mohar and Adam Taylor.

To Discuss:
- Wavesurfer also has analog output, and digital input / output channels, but here only analog input is supported. Is this okay?
Yes, as long as it is clear in the doc. Go for the most important part for you first; enhance later on demand.
To Discuss:
- Wavesurfer also has analog output, and digital input / output channels, but here only analog input is supported. Is this okay?
- I believe the signal streams field is configured correctly here; I used AxonRawIO as a guide.
- Each segment (sweep) has its own timestamp, so I believe no events_signals is correct (similar to WinWcpRawIO, not AxonRawIO).
This I don't understand.
class WaveSurferRawIO(BaseRawIO):

    extensions = ['fake']
Use the true extension.
def _parse_header(self):

    pyws_data = ws.loadDataFile(self.filename, format_string="double")
Are you sure this is lazy?
In short, this loads the metadata but does not load the buffer into memory.
This is not actually lazy - apologies, I understand better now how memmap works, e.g. in AxonIO, and the logic behind the RawIO / IO API.
Lazy loading could be supported by re-writing PyWaveSurfer or incorporating that code more extensively into a new RawIO module, but my initial thought was to provide a wrapper around PyWaveSurfer's IO only. I see now that an IO module with only Block readable and the lazy=True argument not allowed, similar to StimfitIO, is the appropriate way to achieve this, rather than the RawIO API.
If you are happy with the approach, I will start again and provide an IO class only, based on StimfitIO. I believe in this case it would also be appropriate to return the data scaled.
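As a rough sketch (not the PR code itself) of what such an IO-only reader could look like, assuming an eager read via pywavesurfer and a StimfitIO-style layout; the class name and block-building are placeholders:

```python
from neo.io.baseio import BaseIO
from neo.core import Block, Segment, AnalogSignal
from pywavesurfer import ws


class WaveSurferEagerIO(BaseIO):  # hypothetical name
    is_readable = True
    is_writable = False
    supported_objects = [Block, Segment, AnalogSignal]
    readable_objects = [Block]
    name = "WaveSurfer (sketch)"
    extensions = ["h5"]
    mode = "file"

    def __init__(self, filename=None):
        BaseIO.__init__(self)
        self.filename = filename

    def read_block(self, lazy=False, **kwargs):
        # The whole file is read at once, so lazy loading cannot be offered
        if lazy:
            raise ValueError("This IO does not support lazy loading")
        pyws_data = ws.loadDataFile(self.filename, format_string="double")
        block = Block(file_origin=self.filename)
        # ... build one Segment per sweep from pyws_data here ...
        return block
```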
for seg_index in range(int(header["NSweepsPerRun"])):

    sweep_id = "sweep_{0:04d}".format(seg_index + 1)  # e.g. "sweep_0050"
    self._raw_signals[seg_index] = pyws_data[sweep_id]["analogScans"].T  # reshape to data x channel for Neo standard
This reshape makes a copy in memory, I guess; you should do the reshape on the fly in _get_analogsignal_chunk.
I think reshaping does not necessarily create a copy, so maybe this would be worth a test instead of reshaping in every get_chunk call...
I think a reshape on an h5py buffer needs to create a np.array and so a copy. Maybe this is already an array, which is also not good.
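A quick throwaway check, assuming the loaded object is already a plain numpy array as discussed above: a transpose is only a view, so it does not duplicate the buffer.

```python
import numpy as np

data = np.zeros((4, 1000), dtype=np.float64)   # channels x samples, dummy data
transposed = data.T                            # samples x channels
print(transposed.base is data)                 # True: .T is a view, not a copy
print(np.shares_memory(data, transposed))      # True: same underlying buffer
```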
for ch_idx, (ch_name, ch_units) in enumerate(zip(ai_channel_names,
                                                 ai_channel_units)):
    ch_id = ch_idx + 1
    dtype = "float64"  # as loaded with "double" argument from PyWaveSurfer
Is there any way to load the data as "raw" (certainly int16)? Because in this case we let PyWaveSurfer do the scaling, but the idea of this rawio layer is to be able to load data in raw mode to save memory.
Just as a note / warning: raw data from wavesurfer (and pywavesurfer) is uncorrected for an NI-card-specific calibration. I think it should never be used. Not sure if this is relevant to the selection of dtype or memory considerations.
See here for the documentation on it, as it might be very confusing to users who would assume linear scaling of A/D values to voltage. I know it tripped us up :)
                            stream_index, channel_indexes):
    if channel_indexes is None:
        channel_indexes = slice(None)
    raw_signals = self._raw_signals[seg_index][slice(i_start, i_stop), channel_indexes]
Make the transpose here.
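A sketch of that suggestion, assuming the arrays are kept in their original channels x samples layout as pywavesurfer returns them: slice the request first, then transpose only that chunk.

```python
def _get_analogsignal_chunk(self, block_index, seg_index, i_start, i_stop,
                            stream_index, channel_indexes):
    if channel_indexes is None:
        channel_indexes = slice(None)
    # stored as (channels, samples): slice the request, then swap axes so
    # the returned chunk is (samples, channels) as Neo expects
    raw_chunk = self._raw_signals[seg_index][channel_indexes, slice(i_start, i_stop)]
    return raw_chunk.T
```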
I have added a new commit, which wraps the PyWaveSurfer module in the io rather than rawio class, using stimfitio as a basis. It is working, though one issue is that it would be useful (for our use at least) to have the header, which in this committed version is also made in the io module. However, the header dtypes (_signal_channel_dtype) are imported from rawio.baserawio, so I can imagine this does not conform well to the expected use of your rawio/io classes. Would it be better to generate the header in the rawio class and then load the data in read_block() of the io class? Many thanks,
Hi Joe. Why did you switch to the io level?
Thanks for this. I thought as much, though the problem is that in its current form pywavesurfer does not support lazy loading, so the entire file must be loaded into memory at once, which I think (?) violates the rawio API. It would be possible to port the PyWaveSurfer module to the rawio class, but one benefit of leveraging their module is that it is maintained by the WaveSurfer team and so will be kept up to date by them. More practically, I am not sure I will have the time to implement this solution for a month or so, but it would be possible. Alternatively, we could liaise with the WaveSurfer team and ask whether they are happy to support arguments for lazy loading. Happy to proceed as you see fit.
Hi,
@easy-electrophysiology Any news on this? Tell us if you need more feedback or support with a request towards pywavesurfer.
@easy-electrophysiology: any news on the lazy implementation possibility?
Hi both, apologies for the delayed reply. Happy to contact pywavesurfer, though having just reviewed their module, it would be great to get your advice on the best way to implement lazy loading. Pywavesurfer loads the file into memory at once with h5py. I believe h5py does support lazy loading, with memory loaded when data is sliced. However, to convert pywavesurfer to support lazy loading, as I understand it the module would have to undergo quite a major re-write such that all data scaling etc. is run on each sweep only when loaded into memory (?). Can you see any easier way based on the existing code?
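For reference, a minimal illustration of the h5py behaviour described here; the file name and dataset path are hypothetical, loosely following the sweep_NNNN / analogScans layout above. Keeping a Dataset handle reads no sample data, and bytes only come off disk when the handle is sliced.

```python
import h5py

with h5py.File("example_wavesurfer_file.h5", "r") as f:    # hypothetical file
    dset = f["sweep_0001/analogScans"]                      # h5py.Dataset: a lazy handle, nothing read yet
    chunk = dset[:, 0:1000]                                  # only this slice is read into memory
```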
Hi, guys. I took a look at this. I looked through the code base of pywavesurfer and I don't think it is too hard to modify it for lazy reading. However, I think that the true challenge is the scaling. Check the following:
(bold mine) If this can be done (save non-linear scaling) maybe we can have a discussion and I can take care of this @samuelgarcia @JuliaSprenger.
Maybe this non-linear scaling can be taken care of in …
For reference, this is the operation that they do to transform int16 to float:

for i in range(0, n_channels):
    scaled_data[i, :] = inverse_channel_scales[i] * np.polyval(np.flipud(scaling_coefficients[i, :]),
                                                               data_as_ADC_counts[i, :])
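Sketched below is how that same correction might be applied to a single chunk of raw ADC counts rather than the whole recording, which is the kind of per-request scaling a lazy reader would need. The function is hypothetical; the variable names follow the snippet above.

```python
import numpy as np

def scale_chunk(adc_chunk, scaling_coefficients, inverse_channel_scales):
    """Apply the per-channel polynomial correction to one (n_channels, n_samples) int16 chunk."""
    scaled = np.empty(adc_chunk.shape, dtype=np.float64)
    for i in range(adc_chunk.shape[0]):
        # np.polyval wants the highest-order coefficient first, hence the flip
        coeffs = np.flipud(scaling_coefficients[i, :])
        scaled[i, :] = inverse_channel_scales[i] * np.polyval(coeffs, adc_chunk[i, :])
    return scaled
```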
Hi @h-mayorquin! Would you be up for looking into the changes required for lazy loading?
Hi Julia, I realize that I never replied to you. I don't have time right now, but if a conversion comes up that requires this I will be able to do it.
Pull request to add support for WaveSurfer (https://wavesurfer.janelia.org) filetypes. Built around and requires the PyWaveSurfer module written by @boazmohar and @adamltaylor.
This pull request is not finished, but I have opened it to discuss. Next, if this all sounds okay, I will write to the neural ensemble list to request upload to the g-node of WaveSurfer filetypes (kindly provided by Boaz and Adam) and write tests for the module.
There were a couple of outstanding questions I had, to ensure the proposed module conforms to the Neo API (thanks BTW for structuring the rawIO API so cleanly and making it easy to use). These are in the wavesurferrawio.py header but also pasted here:
Best,
Joe