File object (de)serialization is not supported (bson.load, bson.dump) #20

jpaalasm · 2013-01-16T10:59:42Z

When you serialize a document to a file, the serialized BSON string is allocated three times:

Into "buf" (https://github.com/martinkou/bson/blob/master/bson/codec.py#L200)
Into "e_list" (https://github.com/martinkou/bson/blob/master/bson/codec.py#L209)
Into the return variable (https://github.com/martinkou/bson/blob/master/bson/codec.py#L211)

Functions that serialize directly to a file object would not need such allocations.

Are there plans to implement bson.load or bson.dump? The latter is much easier to implement and I can volunteer to do it.

kdikert · 2013-01-17T13:52:36Z

I would also suggest adding a dump() function, which would use a file-like object with a write() method. Compare to the JSON module (http://docs.python.org/2/library/json.html).

I know that this is not optimal, as the BSON format requires a total length field right at the beginning of the data. Because of this, the first bytes cannot be written until the entire length of the data is known. However, I think that the internal code in the codec module would be more robust if it could use a file-like buffer in as many places as possible.
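To illustrate the length-prefix constraint, here is a minimal sketch of what a buffered dump() could look like. It is not the codec module's actual code: `encode_elements` and `dump_buffered` are hypothetical names, and the sketch only handles flat documents with str and int32 values. The body is collected in an io.BytesIO first, so the length is known before anything is written to the target file object.

```python
import io
import struct


def encode_elements(doc, out):
    """Write the elements of a flat dict to `out` (sketch: str and int32 only)."""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"           # element name as a cstring
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"     # string payload + terminator
            out.write(b"\x02" + name + struct.pack("<i", len(data)) + data)
        elif isinstance(value, int) and not isinstance(value, bool):
            out.write(b"\x10" + name + struct.pack("<i", value))
        else:
            raise TypeError("unsupported type in this sketch: %r" % type(value))


def dump_buffered(doc, fp):
    """Hypothetical dump(): buffer the body, then write length + body to `fp`."""
    body = io.BytesIO()
    encode_elements(doc, body)
    payload = body.getvalue() + b"\x00"                # element list + document terminator
    fp.write(struct.pack("<i", len(payload) + 4))      # the length counts its own 4 bytes
    fp.write(payload)


out = io.BytesIO()
dump_buffered({"hello": "world"}, out)
encoded = out.getvalue()
```

The target `fp` only needs a write() method here, but the whole document still lives in memory once before it is written.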

Parkayun · 2015-07-21T06:21:57Z

Great idea :)

eulersIDcrisis · 2022-02-06T00:38:42Z

While it is true that the total length fields are needed to dump out a BSON document, I think it is possible to write out the document to a file (or byte stream, e.g. io.BytesIO()), then seek back and fill in the proper values for the length once it is known. As long as the file/stream is opened in raw/binary mode, the length can be calculated using stm.seek() by diffing the start of the (possibly nested) document/array from the end of that same document/array. The only requirement here is that the file is seekable. I will note in passing that io.BytesIO() is already seekable in this vein.
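A minimal sketch of the seek-back idea, for illustration only. This is not code from ibson or from this library: `write_document` is a hypothetical name, and the sketch supports only int32 values and nested dicts. It uses stm.tell() to record positions and assumes the stream is seekable and opened in binary mode.

```python
import io
import struct


def write_document(doc, stm):
    """Write `doc` to a seekable binary stream, patching in the length afterwards."""
    start = stm.tell()
    stm.write(b"\x00\x00\x00\x00")                     # placeholder for the int32 length
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"           # element name as a cstring
        if isinstance(value, dict):
            stm.write(b"\x03" + name)                  # embedded document
            write_document(value, stm)                 # nested length is patched recursively
        elif isinstance(value, int) and not isinstance(value, bool):
            stm.write(b"\x10" + name + struct.pack("<i", value))
        else:
            raise TypeError("unsupported type in this sketch: %r" % type(value))
    stm.write(b"\x00")                                 # document terminator
    end = stm.tell()
    stm.seek(start)
    stm.write(struct.pack("<i", end - start))          # fill in the real length
    stm.seek(end)                                      # restore position for the caller


stm = io.BytesIO()
write_document({"a": {"b": 1}}, stm)
nested = stm.getvalue()
```

Each document is written once, with no intermediate copy of its body. The cost is one extra seek and a 4-byte rewrite per document or array.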

I have a separate repo that is toying with the idea here: https://github.com/eulersIDcrisis/ibson
It is incomplete, but it illustrates the idea.

Parkayun added the enhancement label Jul 21, 2015
