
File object (de)serialization is not supported (bson.load, bson.dump) #20

Open
jpaalasm opened this issue Jan 16, 2013 · 3 comments

Comments

@jpaalasm

When you serialize a document to a file, the serialized bson-string is allocated three times.

Functions that serialize directly to a file object would not need such allocations.

Are there plans to implement bson.load or bson.dump? The latter is much easier to implement and I can volunteer to do it.
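On the load side, the leading length field actually helps. Every BSON document starts with a little-endian int32 giving its total size, including those four bytes. So a reader can read the prefix, then read exactly the remaining bytes of one document, without slurping the whole file. The sketch below is a minimal illustration under that assumption. `read_document_bytes` is a hypothetical helper, not part of the bson package. Turning the returned bytes into a dict is left to whatever bytes-level decoder already exists.

```python
import io
import struct


def read_document_bytes(fp):
    """Read one raw BSON document from a binary file-like object.

    Hypothetical helper for illustration: returns the document's bytes
    (length prefix included), or None at a clean end of stream.
    """
    prefix = fp.read(4)
    if not prefix:
        return None  # no more documents in the stream
    if len(prefix) < 4:
        raise ValueError("truncated BSON length prefix")
    # Total length is a little-endian int32 that counts the prefix itself.
    (total_len,) = struct.unpack("<i", prefix)
    if total_len < 5:  # smallest document: 4-byte prefix + trailing NUL
        raise ValueError("invalid BSON document length: %d" % total_len)
    body = fp.read(total_len - 4)
    if len(body) != total_len - 4:
        raise ValueError("truncated BSON document")
    return prefix + body


# Usage: two empty documents (b"\x05\x00\x00\x00\x00") back to back.
stream = io.BytesIO(b"\x05\x00\x00\x00\x00" * 2)
docs = []
while True:
    raw = read_document_bytes(stream)
    if raw is None:
        break
    docs.append(raw)
```

Only one document's bytes are held in memory at a time, which is the kind of saving a file-based `load` would be after.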

@kdikert

kdikert commented Jan 17, 2013

I would also suggest adding a dump() function that takes a file-like object with a write() method. Compare with dump() in the json module (http://docs.python.org/2/library/json.html).
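A json.dump-style API could look roughly like the sketch below. It assumes a deliberately simplified encoder that handles only str, int32 and nested dict values. `encode_document` and `dump` are hypothetical names written for this illustration, not the bson package's actual functions.

```python
import io
import struct


def encode_document(doc):
    """Encode a dict whose values are str, int (int32 range) or dict.

    Hypothetical, simplified encoder written for this illustration.
    """
    body = bytearray()
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"  # e_name is a NUL-terminated cstring
        if isinstance(value, str):
            data = value.encode("utf-8")
            # 0x02 string: int32 byte count (incl. NUL), UTF-8 bytes, NUL
            body += b"\x02" + name + struct.pack("<i", len(data) + 1) + data + b"\x00"
        elif isinstance(value, int) and not isinstance(value, bool):
            body += b"\x10" + name + struct.pack("<i", value)  # 0x10 int32
        elif isinstance(value, dict):
            body += b"\x03" + name + encode_document(value)  # 0x03 embedded doc
        else:
            raise TypeError("unsupported type: %r" % type(value))
    # The length prefix covers itself (4 bytes), the elements and the final NUL.
    return struct.pack("<i", 4 + len(body) + 1) + bytes(body) + b"\x00"


def dump(obj, fp):
    """json.dump-style entry point: encode obj, then call fp.write()."""
    fp.write(encode_document(obj))


buf = io.BytesIO()
dump({"hello": "world"}, buf)
```

This gives callers the familiar `dump(obj, fp)` shape. However, each document is still fully built in memory before the single `write()` call, because its length prefix has to come first.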

I know this is not optimal, as the BSON format requires a total length field right at the beginning of the data. Because of this, the first bytes cannot be written until the entire length of the data is known. However, I think the internal code in the codec module would be more robust if it used a file-like buffer in as many places as possible.
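A worked example of that length arithmetic uses the `{"hello": "world"}` document from the BSON specification. The total counts the 4-byte prefix, every element, and the trailing NUL byte. So the prefix `\x16\x00\x00\x00` is known only after all 17 element bytes have been produced:

```latex
\begin{aligned}
\text{element} &= \underbrace{1}_{\text{type byte}} + \underbrace{(5+1)}_{\text{key + NUL}} + \underbrace{4}_{\text{string length}} + \underbrace{(5+1)}_{\text{value + NUL}} = 17 \\
\text{total}   &= \underbrace{4}_{\text{length prefix}} + 17 + \underbrace{1}_{\text{trailing NUL}} = 22 = \mathtt{0x16}
\end{aligned}
```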

@Parkayun
Member

Great idea :)

@eulersIDcrisis

While it is true that the total length fields are needed to dump out a BSON document, I think it is possible to write the document out to a file (or a byte stream, e.g. io.BytesIO()), then seek back and fill in the proper length values once they are known. As long as the file/stream is opened in raw/binary mode, each length can be calculated from stream positions (stm.tell(), or the position returned by stm.seek()) by subtracting the offset where a (possibly nested) document/array starts from the offset where that same document/array ends. The only requirement is that the file is seekable. I will note in passing that io.BytesIO() is already seekable, so it meets this requirement.
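A minimal sketch of the seek-back idea follows. It assumes a seekable binary stream and handles only str, int32 and nested dict values. `write_document` is a hypothetical name, and this is not the code from the linked repository. Each document first writes a 4-byte placeholder and records its start offset with `tell()`. It then writes its elements, recursing for nested documents. Finally it seeks back to patch `end - start` into the placeholder and seeks forward again.

```python
import io
import struct


def write_document(doc, fp):
    """Stream a dict to a seekable binary file, patching lengths afterwards.

    Hypothetical sketch: supports str, int (int32) and nested dict values.
    Returns the number of bytes written for this document.
    """
    start = fp.tell()
    fp.write(b"\x00\x00\x00\x00")  # placeholder for the int32 total length
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"
        if isinstance(value, str):
            data = value.encode("utf-8")
            fp.write(b"\x02" + name + struct.pack("<i", len(data) + 1) + data + b"\x00")
        elif isinstance(value, int) and not isinstance(value, bool):
            fp.write(b"\x10" + name + struct.pack("<i", value))
        elif isinstance(value, dict):
            fp.write(b"\x03" + name)
            write_document(value, fp)  # nested document patches its own length
        else:
            raise TypeError("unsupported type: %r" % type(value))
    fp.write(b"\x00")  # document terminator
    end = fp.tell()
    # Length = end offset minus start offset of this (possibly nested) document.
    fp.seek(start)
    fp.write(struct.pack("<i", end - start))
    fp.seek(end)  # move back to the end so writing can continue
    return end - start


buf = io.BytesIO()
size = write_document({"a": {"b": 1}}, buf)
```

Only small element-sized chunks are allocated, never the whole document. The trade-off is that non-seekable targets such as pipes or sockets (where `fp.seekable()` is false) would still need a buffered fallback.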

I have a separate repo that is toying with the idea here: https://github.com/eulersIDcrisis/ibson
It is incomplete, but it illustrates the idea.
