Switch _ResultIterator to using _Parser #52

spbnick · 2020-09-15T13:57:38Z

This is an attempt to reuse _Parser for feeding input data to programs, which would eventually enable processing JSON streams with jq programs.

Might not be the ideal approach, tell me what you think!

Please see more explanations in individual commit messages.

Based on and requires #51.

spbnick · 2020-09-15T15:09:44Z

Ah, one undesirable side effect of this is that we now raise jq.JSONParseError instead of the ValueError that some programs might expect. We could drop the whole idea of a dedicated exception, of course, but I would personally prefer a line in the release notes warning about the change instead :)

spbnick · 2020-09-16T13:33:04Z

Added a commit fixing the parsing of RS-separated streams (RFC 7464).

Add an implementation of the parse_json() function, which accepts either text or a text iterator and produces an iterable returning parsed values. Add a naive implementation of the parse_json_file() function, which accepts a text file object and produces an iterable returning parsed values. This allows parsing JSON and JSON streams without passing them through a program.
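As a rough illustration of the interface described above (a string or an iterator of string chunks in, an iterable of parsed values out), here is a minimal pure-Python sketch. It is not the PR's implementation, which is built on jq's own parser; it uses the standard library's json.JSONDecoder.raw_decode() instead, and the buffering strategy and helper names are assumptions made for the example.

```python
import json
from typing import Any, Iterable, Iterator, Union

_WS = " \t\r\n"                    # JSON whitespace only
_NUMBER_CHARS = "+-.eE0123456789"  # characters that can extend a number


def _number_may_continue(value: Any, buf: str, end: int) -> bool:
    """True if a decoded number might be the prefix of a longer number."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return end == len(buf) or buf[end] in _NUMBER_CHARS


def parse_json(text_or_iter: Union[str, Iterable[str]]) -> Iterator[Any]:
    """Yield every JSON value found in a string or in an iterable of chunks."""
    chunks = [text_or_iter] if isinstance(text_or_iter, str) else text_or_iter
    decoder = json.JSONDecoder()
    buf = ""
    for chunk in chunks:
        buf = (buf + chunk).lstrip(_WS)
        while buf:
            try:
                value, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # possibly a truncated value: wait for more input
            if _number_may_continue(value, buf, end):
                break  # the number may continue in the next chunk
            yield value
            buf = buf[end:].lstrip(_WS)
    # End of input: whatever is left must decode, otherwise the error propagates.
    while buf:
        value, end = decoder.raw_decode(buf)
        yield value
        buf = buf[end:].lstrip(_WS)
```

A simplification of this sketch is that invalid input is only reported once the input is exhausted, because a decoding error in the middle is indistinguishable from a value that has not fully arrived yet.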

Let Python give us the length of the "bytes" it already knows, instead of doing an strlen(). This improves performance a bit.

In addition to (Unicode) strings, also accept "bytes" (and corresponding iterators) as input to the parser. This allows skipping the decode/encode step when reading raw data from a file or socket, e.g. with os.read(). This introduces a small but measurable performance increase for such cases.
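The idea of accepting both kinds of input can be sketched in plain Python as follows. The helper name iter_utf8_chunks() is hypothetical and does not come from the PR; the sketch only shows that "bytes" chunks can be passed through untouched while strings are encoded once, assuming the parser consumes UTF-8.

```python
from typing import Iterable, Iterator, Union

Chunk = Union[str, bytes]


def iter_utf8_chunks(data: Union[Chunk, Iterable[Chunk]]) -> Iterator[bytes]:
    """Normalize str/bytes (or an iterable of them) to UTF-8 byte chunks."""
    if isinstance(data, (str, bytes)):
        data = [data]
    for chunk in data:
        if isinstance(chunk, bytes):
            # Already raw bytes, e.g. from os.read(): no decode/encode round trip.
            yield chunk
        elif isinstance(chunk, str):
            yield chunk.encode("utf-8")
        else:
            raise TypeError(f"expected str or bytes, got {type(chunk).__name__}")
```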

Add support for returning native jv values (wrapped in _JV class) from the _JSONParser iterator, if the "packed" argument is specified as true. This enables future use of _JSONParser in _ResultIterator, replacing its own copy of parsing code.

Remove the "dumpopts" variable from _ResultIterator.__next__(). It was forgotten when the function was switched to using _jv_to_python().

Switch _ResultIterator to using an instance of _Parser, instead of bytes, as its input. This avoids duplicating the parsing code and enables eventual processing of JSON streams with programs.
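The shape of this change can be illustrated with a small pure-Python analogue. The class below is a hypothetical stand-in, not the Cython _ResultIterator from the PR: it assumes the parser is any iterator of already-parsed values and that the program is a callable returning zero or more results per input.

```python
from typing import Any, Callable, Iterable, Iterator


class ResultIterator:
    """Pull inputs from a parser iterator and yield the program's results."""

    def __init__(self, program: Callable[[Any], Iterable[Any]],
                 parser: Iterable[Any]) -> None:
        self._program = program
        self._parser = iter(parser)
        self._pending: Iterator[Any] = iter(())  # results of the current input

    def __iter__(self) -> "ResultIterator":
        return self

    def __next__(self) -> Any:
        while True:
            try:
                return next(self._pending)
            except StopIteration:
                # Current input is exhausted: fetch the next parsed value.
                # When the parser itself is exhausted, its StopIteration
                # propagates and ends this iterator too.
                value = next(self._parser)
                self._pending = iter(self._program(value))
```

Because the iterator only asks the parser for the next value when it needs one, the input does not have to be available as a single "bytes" object up front, which is what makes stream processing possible in principle.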

Fix the parsing loop to use the proper interface, thus fixing the parsing of RS-separated streams (RFC 7464). Without this, the program would abort on an assertion failure like this:

    python3: src/jv_parse.c:684: jv_parser_set_buf: Assertion `(p->curr_buf == 0 || p->curr_buf_pos == p->curr_buf_length) && "previous buffer not exhausted"' failed.
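For readers unfamiliar with the stream format involved, RFC 7464 prefixes each JSON text with an ASCII Record Separator (0x1E) and ends it with a line feed. The sketch below shows only that framing, using the standard json module; it says nothing about the jq parser loop that the commit actually fixes, and the function name is made up for the example.

```python
import json
from typing import Any, Iterator

RS = "\x1e"  # ASCII Record Separator, as used by RFC 7464


def parse_json_seq(text: str) -> Iterator[Any]:
    """Yield the JSON values of an RS-separated JSON text sequence."""
    for record in text.split(RS):
        record = record.strip(" \t\r\n")
        if record:  # skip the empty piece before the first RS
            yield json.loads(record)
```

For example, the sequence '\x1e{"a": 1}\n\x1e[2, 3]\n' produces the values {"a": 1} and [2, 3]. Unlike what the RFC recommends for robust consumers, this sketch raises on a malformed record instead of skipping it.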

spbnick · 2024-05-30T17:29:15Z

Closing this outdated PR for another attempt.

spbnick force-pushed the switch_resultiterator_to_using_parser branch 2 times, most recently from 95e6a64 to 4ccadd1, September 15, 2020 15:02

spbnick force-pushed the switch_resultiterator_to_using_parser branch 2 times, most recently from 8407b2f to 5f00fd0, September 21, 2020 14:28

spbnick added 8 commits February 10, 2021 14:08

Use PyBytes_AsStringAndSize()

42f317e

Add basic docstrings to _ProgramWithInput

070ef3b

_ResultIterator: Remove unused dumpopts variable

0c11cdd

Switch _ResultIterator to using _Parser

8f212d8

spbnick force-pushed the switch_resultiterator_to_using_parser branch from 5f00fd0 to ce6bddf, February 24, 2021 15:21

spbnick closed this May 30, 2024
