Switch _ResultIterator to using _Parser #52

@spbnick spbnick commented Sep 15, 2020

This is an attempt to reuse the _Parser for feeding programs with input data, which enables eventual processing of JSON streams with JQ programs.

Might not be the ideal approach, tell me what you think!

Please see more explanations in individual commit messages.

Based on and requires #51.

@spbnick spbnick force-pushed the switch_resultiterator_to_using_parser branch 2 times, most recently from 95e6a64 to 4ccadd1 Compare September 15, 2020 15:02

spbnick commented Sep 15, 2020

Ah, one undesirable side effect of this is that we now raise jq.JSONParseError instead of the ValueError that some programs might expect. We could forgo the whole idea of a dedicated exception, of course, but I would personally prefer a line in the release notes warning about that instead :)
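For illustration only, here is one way such a compatibility concern is sometimes handled in Python: a dedicated exception that derives from ValueError, so that callers catching ValueError keep working. Nothing in this PR says jq.JSONParseError is defined this way; the class and helper names below are hypothetical stand-ins, and the standard json module is used in place of the jq parser.

```python
import json


class JSONParseError(ValueError):
    """Hypothetical dedicated parse error that is still a ValueError."""


def parse_or_raise(text):
    # Stand-in for the real parser: the stdlib json module is used here,
    # and its error is re-raised as the dedicated exception type.
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise JSONParseError(str(exc)) from exc


def legacy_caller(text):
    # Older calling code that only knows about ValueError.
    try:
        return parse_or_raise(text)
    except ValueError:
        return None
```

With this layout, code written against ValueError and code written against the dedicated exception can both catch the same failure.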

spbnick commented Sep 16, 2020

Added a commit fixing parsing of RS-separated streams (RFC 7464).
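For readers unfamiliar with the format: in an RFC 7464 JSON text sequence, each JSON text is preceded by an ASCII record separator (RS, 0x1E) and followed by a line feed. The sketch below shows the framing in plain Python with the standard json module. It is not the code from the commit, which goes through jq's parser, and the function name is made up for the example.

```python
import json

RS = "\x1e"  # ASCII record separator that starts each record


def parse_json_seq(text):
    """Yield each JSON value from an RS-separated sequence held in a string."""
    for record in text.split(RS):
        # The chunk before the first RS is empty; skip blank chunks.
        if record.strip():
            # json.loads() tolerates the trailing line feed of each record.
            yield json.loads(record)


sample = '\x1e{"a": 1}\n\x1e[2, 3]\n\x1e"four"\n'
values = list(parse_json_seq(sample))
```

This simple version assumes every record is complete and valid; it makes no attempt to skip truncated records.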

@spbnick spbnick force-pushed the switch_resultiterator_to_using_parser branch 2 times, most recently from 8407b2f to 5f00fd0 Compare September 21, 2020 14:28

Add an implementation of parse_json() function accepting either text or
a text iterator and producing an iterable returning parsed values.

Add a naive implementation of parse_json_file() function accepting a
text file object and producing an iterable returning parsed values.

This allows parsing JSON and JSON streams without passing them through a
program.
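As a rough illustration of the interface described in this commit message (text or an iterator of text chunks in, an iterable of parsed values out), the sketch below does the same with the standard json module. It is not the code added by the commit, which relies on jq's own parser; the function name is hypothetical and the buffering strategy is an assumption chosen for brevity.

```python
import json


def parse_json_sketch(text_or_iter):
    """Yield each JSON value found in a string or an iterable of string chunks."""
    chunks = [text_or_iter] if isinstance(text_or_iter, str) else text_or_iter
    decoder = json.JSONDecoder()
    buf = ""

    def drain(final):
        nonlocal buf
        while True:
            buf = buf.lstrip()  # raw_decode() does not skip leading whitespace
            if not buf:
                return
            try:
                value, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                if final:
                    raise
                return  # the value may be incomplete; wait for more input
            if end == len(buf) and not final:
                return  # e.g. "12" could still grow into "123"
            yield value
            buf = buf[end:]

    for chunk in chunks:
        buf += chunk
        yield from drain(final=False)
    yield from drain(final=True)


values = list(parse_json_sketch(["[1, ", "2] 1", "2 ", '{"a": null}']))
```

A value is only emitted once something follows it in the buffer or the input has ended, which keeps values split across chunk boundaries intact.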
Let Python give us the length of the "bytes" it already knows, instead
of doing an strlen(). This improves performance a bit.

In addition to (Unicode) strings, also accept "bytes" (and corresponding
iterators) as input to the parser. This allows skipping the
decode/encode step when reading raw data from a file or socket, e.g.
with os.read(). This introduces a small but measurable performance
increase for such cases.
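A "bytes" iterator of the kind mentioned above could be produced like this. The generator is only a sketch with a made-up name; how exactly it would be passed to the parser is not shown in this conversation, so that part is left out.

```python
import os


def iter_fd_chunks(fd, size=65536):
    """Yield raw bytes chunks read from a file descriptor until EOF."""
    while True:
        chunk = os.read(fd, size)
        if not chunk:  # os.read() returns b"" at end of file
            return
        yield chunk


# Demonstrate with a pipe so the example does not depend on any file.
read_fd, write_fd = os.pipe()
os.write(write_fd, b'{"a": 1} [2]')
os.close(write_fd)
chunks = list(iter_fd_chunks(read_fd, size=4))
os.close(read_fd)
```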
Add support for returning native jv values (wrapped in _JV class) from
the _JSONParser iterator, if the "packed" argument is specified as true.

This enables future use of _JSONParser in _ResultIterator, replacing its
own copy of parsing code.

Remove the "dumpopts" variable from _ResultIterator.__next__(). It was
forgotten when the function was switched to using _jv_to_python().

Switch the _ResultIterator to using an instance of _Parser instead of
bytes, as the input. This avoids the duplication of the parsing code,
and enables eventual processing of JSON streams with programs.

Fix the parsing loop to use the proper interface, thus fixing parsing
RS-separated streams (RFC 7464). Without this, the program would abort
on assertion failure, like this:

    python3: src/jv_parse.c:684: jv_parser_set_buf: Assertion
    `(p->curr_buf == 0 || p->curr_buf_pos == p->curr_buf_length) &&
    "previous buffer not exhausted"' failed.

@spbnick spbnick force-pushed the switch_resultiterator_to_using_parser branch from 5f00fd0 to ce6bddf Compare February 24, 2021 15:21

spbnick commented May 30, 2024

Closing this outdated PR for another attempt.

@spbnick spbnick closed this May 30, 2024