Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unbuffered Stream Reader #88

Merged
merged 2 commits into from
Jun 20, 2022
Merged

Unbuffered Stream Reader #88

merged 2 commits into from
Jun 20, 2022

Conversation

mbaker3
Copy link
Member

@mbaker3 mbaker3 commented Jun 18, 2022

Adds UnbufferedStreamReader. A stream reader that does not buffer data from the backing BaseStream.
This implementation is less performant than StreamReader but doesn't read ahead on the underlying stream ensuring the BaseStream's position to remains lockstep with the characters read out of UnbufferedStreamReader.

This stream is useful when reading files with mixed encoding or where the BaseStream needs to be advanced the minimum amount required.

What is the current behaviour?

StreamReader fills a buffer of characters when responding to read requests. The characters buffer is filled by repeatedly reading blocks of its BaseStream and converting them into characters until the character buffer is full.

This means that the BaseStream is advanced beyond the point that the StreamReader has been read.

What is the new behaviour?

UnbufferedStreamReader effectively functions the same as StreamReader with a byte and character buffer size set to 1. The UnbufferedStreamReader reads one byte at a time out of BaseStream until it has enough to resolve a full character. That one character is then evaluated by the read method to decide whether to continue reading or return.

Note: The implementation does not currently cover the complete features set of StreamReader. New features will be developed as needed. This effort is captured in #87

Why don't you just calculate actual the BaseStream position based on StreamReader's unused buffer contents?

TL;DR

  • Invalid bytes are replaced with a pre-defined fallback character which may not equal the byte size of the invalid bytes.
  • The reflection relies heavily on Microsoft not changing the way StreamReader is implemented
  • It does not work for variable-byte encodings (Ex: UTF-7)

I went down that path. It's not possible. This StackOverflow post covers the issue in detail. Read the linked post + all comments to understand.

If you really want to dig into this read through StreamReader's source: https://referencesource.microsoft.com/#mscorlib/system/io/streamreader.cs,b5fe1efcec14de32

What issues does this resolve?

None, This class allows developers to continue reading a BaseStream beyond the lifecycle of the UnbufferedStreamReader.

What PRs does this depend on?

  • None

Does this introduce a breaking change?

  • Yes
  • No

@mbaker3 mbaker3 requested a review from jkeon June 18, 2022 01:49
@mbaker3 mbaker3 added the type-feature New feature or request label Jun 18, 2022
@mbaker3 mbaker3 added this to the Anvil v1.0 milestone Jun 18, 2022
A version of StreamReader that doesn't make use of a character buffer. This results in lower performance but the prevents the underlying stream from moving further than needed.
@mbaker3 mbaker3 force-pushed the unbuffered-stream-reader branch from e9abe21 to a65f4ec Compare June 18, 2022 01:51
@jkeon jkeon self-assigned this Jun 18, 2022
Copy link
Member

@jkeon jkeon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, otherwise looks good

/// </summary>
/// <remarks>
/// This reader is not currently at feature parity with <see cref="StreamReader"/>. Features will be developed as needed.
/// See: #87
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we link that SO post in here too? Useful background info.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What SO post, the one about StreamReader position issues?
I don't think it's necessary to teach the user why position can't be calculated on a different class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was the reason this class existed. Normally you'd use StreamReader but sometimes you can't and good background reading on why that might be the case is in that SO post.

But if you don't agree, I'm a-ok with that too.

IO/UnbufferedStreamReader.cs Outdated Show resolved Hide resolved
IO/UnbufferedStreamReader.cs Outdated Show resolved Hide resolved
@mbaker3 mbaker3 requested a review from jkeon June 20, 2022 15:26
@mbaker3
Copy link
Member Author

mbaker3 commented Jun 20, 2022

Feedback addressed

@mbaker3 mbaker3 merged commit 71060c2 into unity-fixes Jun 20, 2022
@mbaker3 mbaker3 deleted the unbuffered-stream-reader branch June 20, 2022 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants