Example request #14

Open
ShaJaPas opened this issue Oct 5, 2024 · 1 comment

Comments

ShaJaPas commented Oct 5, 2024

Hey there. Thanks for the great library.
I want to use this library to parse HTTP/1 and HTTP/2 using a tokio TCP stream.
Could you please provide examples of usage for this scenario?

Wonshtrum (Contributor) commented Dec 9, 2024

Hi! Thanks for your interest and enthusiasm. I'm really sorry I didn't see your issue sooner.
I'll try to give you a simple example and will probably add relevant documentation at some point. But first I would like to clarify 4 points:

  • kawa is actively in development (even if there is not much activity in this repo) and may change drastically.
  • kawa was created specifically for the Sozu reverse proxy, and even if we try to make this crate as generic as possible, kawa is designed and optimized to operate as a streaming middleware, not a sending or receiving endpoint.
  • kawa is quite bare-bones and doesn't offer many user-friendly APIs; most primitives are fairly low-level, verbose, and do not protect you from many footguns.
  • kawa is split between the message representation, the protocol decoders streaming into the representation, and the protocol encoders streaming from the representation. The representation is agnostic and should accommodate HTTP/1.x, HTTP/2.0, and even HTTP/3.0, but only HTTP/1.x encoders and decoders are ready to use.

For most use cases, especially if you develop a server and/or client, the hyper crate might be a better fit.

The Kawa agnostic structure is the star of this crate; the decoders/encoders in the protocol folder serve as examples and default implementations. Kawa makes it easy to create your own parsers and converters, which is a good design for reverse proxies but not if you are looking for an easy-to-use endpoint.
Since the Kawa representation holds a single HTTP message, it is still relatively straightforward to use for HTTP/1.x, where one socket carries one HTTP message at a time. But HTTP/2.0 multiplexes several messages over one socket, so you have to keep track of several Kawa representations and redirect each h2 fragment to the corresponding one, as sketched below. To put it simply, you have to write an h2 client (which is quite the endeavor) that uses Kawa as storage. This is what we are doing in Sozu's experimental h2 branch. When the design is ready we may merge it into kawa, but it will be much harder to use than h1.
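
To make that last point concrete, here is a purely illustrative sketch of the bookkeeping such an h2 endpoint would need. `H2Connection` and `stream_mut` are hypothetical names, `S` is whatever storage type you give kawa (an example follows below), and the h2 framing/HPACK layer that would feed these representations is entirely up to you:

use std::collections::HashMap;

// One agnostic Kawa representation per h2 stream, keyed by the stream identifier.
struct H2Connection<S: kawa::AsBuffer> {
    streams: HashMap<u32, kawa::Kawa<S>>,
}

impl<S: kawa::AsBuffer> H2Connection<S> {
    // Route a decoded h2 fragment (the content of a HEADERS or DATA frame)
    // to the representation of the stream it belongs to.
    fn stream_mut(&mut self, stream_id: u32) -> Option<&mut kawa::Kawa<S>> {
        self.streams.get_mut(&stream_id)
    }
}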

Now if you'd like to try kawa anyway, here is a quick demonstration of how to do that. First, you need to decide how you will store your HTTP messages. Kawa only ever sees contiguous slices of bytes; it is up to you to determine how they are allocated and stored. One additional constraint is that the buffer's size must not change, because kawa uses it as a ring buffer and keeps indexes into it. In Sozu, the underlying buffers come from a pool of preallocated, page-aligned buffers. Let's say you want to use a boxed slice:

struct MyStorage(Box<[u8]>);
impl kawa::AsBuffer for MyStorage {
    fn as_buffer(&self) -> &[u8] {
        &self.0
    }
    fn as_buffer_mut(&mut self) -> &mut [u8] {
        &mut self.0
    }
}

This design offers fewer possibilities than anticipated and may change. It also has a couple of flaws, the main one being that the buffer size caps the combined size of all the headers kawa can safely parse (8 KiB should be enough for most cases).

Then you can create your Kawa parser:

let buffer = kawa::Buffer::new(MyStorage(Box::new([0; 8192])));
let request_stream = kawa::Kawa::new(kawa::Kind::Request, buffer);
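
If you also handle the response direction (as a proxy would), the setup is symmetrical; a minimal sketch, assuming each direction gets its own buffer:

// Hypothetical response-side setup, mirroring the request side above.
let buffer = kawa::Buffer::new(MyStorage(Box::new([0; 8192])));
let response_stream = kawa::Kawa::new(kawa::Kind::Response, buffer);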

Now upon receiving bytes on the corresponding socket, you can do something like:

fn try_read(socket: &mut TcpStream, request_stream: &mut kawa::Kawa<MyStorage>, context: &mut MyRequestContext) {
    if request_stream.storage.is_full() && !request_stream.is_main_phase() {
        panic!("Too many headers, the buffer is full");
    }
    match socket.read(request_stream.storage.space()) {
        Ok(n) => { request_stream.storage.fill(n); },
        Err(e) => { /* handle the I/O error (close the connection, retry, ...) */ },
    }
    kawa::h1::parse(request_stream, context);
    if request_stream.is_error() {
        error!("{:?}", request_stream.parsing_phase);
        panic!("invalid request");
    }
}
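
Since you mentioned tokio: the same logic carries over to an async socket. This is only a sketch; `try_read_async` is a hypothetical name, the error handling is left to you, and it assumes tokio's AsyncReadExt, but the kawa calls are exactly the ones used above:

use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

// Hypothetical async counterpart of try_read.
async fn try_read_async(
    socket: &mut TcpStream,
    request_stream: &mut kawa::Kawa<MyStorage>,
    context: &mut MyRequestContext,
) -> std::io::Result<()> {
    if request_stream.storage.is_full() && !request_stream.is_main_phase() {
        panic!("Too many headers, the buffer is full");
    }
    // read into the writable part of kawa's ring buffer, exactly like the blocking version
    let n = socket.read(request_stream.storage.space()).await?;
    request_stream.storage.fill(n);
    kawa::h1::parse(request_stream, context);
    if request_stream.is_error() {
        panic!("invalid request: {:?}", request_stream.parsing_phase);
    }
    Ok(())
}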

Kawa parsers work with a callback system: you must provide a struct that can react to parsing events. Currently, there is a single event, emitted when the parser has reached the end of all the headers. This is forced by HTTP/1.x design, since much of the important information can only be validated once no more headers can be received. As such, it is not recommended to peek into kawa's representation until this callback is called; in fact, we recommend you only peek into kawa's representation during this callback. Here is a simple example of how to declare a context structure:

use std::str::from_utf8;

struct MyRequestContext {
    was_called: bool,
    closing: bool,
    original_keep_alive: Option<String>,
    method: Option<String>,
    authority: Option<String>,
    path: Option<String>,
    my_cookie: Option<String>,
}
impl kawa::ParserCallbacks<MyStorage> for MyRequestContext {
    fn on_headers(&mut self, request: &mut kawa::Kawa<MyStorage>) {
        self.was_called = true;
        let buf = &mut request.storage.mut_buffer();

        // peek in the status line
        if let kawa::StatusLine::Request {
            method,
            authority,
            path,
            ..
        } = &request.detached.status_line
        {
            // Even at this point you can't be sure `authority` is set since it is not mandatory in HTTP/0.9
            // method and path should always exist though
            // I'm almost sure the values are always utf8, otherwise the parser should have failed earlier
            // so you may use `from_utf8_unchecked`
            self.method = method.data_opt(buf)
                .and_then(|data| from_utf8(data).ok())
                .map(ToOwned::to_owned);
            self.authority = authority.data_opt(buf)
                .and_then(|data| from_utf8(data).ok())
                .map(ToOwned::to_owned);
            self.path = path.data_opt(buf)
                .and_then(|data| from_utf8(data).ok())
                .map(ToOwned::to_owned);
        }

        // peek in cookies
        for cookie in &mut request.detached.jar {
            let key = cookie.key.data(buf);
            if key == b"MyCookie" {
                let val = cookie.val.data(buf);
                // here also the value should be utf8, however, header value parsing may change soon to allow more encodings
                // so you should check
                self.my_cookie = from_utf8(val).ok().map(ToOwned::to_owned);
                // you can elide any cookie
                cookie.elide();
            }
        }

        // peek in blocks
        for block in &mut request.blocks {
            match block {
                // the parsers may have elided some headers so you should check to avoid panicking later
                kawa::Block::Header(header) if !header.is_elided() => {
                    let key = header.key.data(buf);
                    let val = header.val.data(buf);
                    if key == b"connection" {
                        self.original_keep_alive = from_utf8(val).ok().map(ToOwned::to_owned);
                        if self.closing {
                            // you can elide/add/modify any header
                            header.val = kawa::Store::Static(b"close");
                        }
                    }
                }
                _ => {}
            }
        }
    }
}

For a concrete example, you can look at Sozu's callbacks implementation.
I invite you to use kawa::debug_kawa extensively while developing your pipeline.
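
For instance, something along these lines after each parse call (assuming debug_kawa is a function taking a reference to the Kawa) prints its internal state:

// Assumed usage: inspect kawa's internal state after a parsing step.
kawa::debug_kawa(&request_stream);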

Now, as said earlier, kawa is designed as a middleware that avoids unnecessary allocations. Everything you push into it stays there and takes up space; you have to empty it yourself. If you use kawa as an intermediate between two sockets, you can "pipe" its content to the writable socket:

fn try_write(socket: &mut TcpStream, request_stream: &mut kawa::Kawa<MyStorage>) {
    if !request_stream.is_main_phase() {
        // if kawa has not read all the headers, you shouldn't remove anything from its buffers
        return;
    }
    // convert kawa agnostic representation to a specific protocol, here h1
    request_stream.prepare(&mut kawa::h1::BlockConverter);
    let bufs = request_stream.as_io_slice();
    match socket.write_vectored(&bufs) {
        Ok(n) => { request_stream.consume(n); },
        Err(e) => { /* handle the I/O error */ },
    }
    // you can check if kawa thinks the request was completely transferred by calling:
    request_stream.terminated() && request_stream.completed();
}
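
To tie the two halves together, here is a hypothetical driver for the middleware case. It assumes blocking std::net sockets, that `frontend`/`backend` are the client and server connections (both names made up), and that `MyRequestContext` derives `Default`:

use std::net::TcpStream;

// Hypothetical blocking proxy loop built from the try_read/try_write sketches above.
fn proxy_one_request(frontend: &mut TcpStream, backend: &mut TcpStream) {
    let buffer = kawa::Buffer::new(MyStorage(Box::new([0; 8192])));
    let mut request_stream = kawa::Kawa::new(kawa::Kind::Request, buffer);
    let mut context = MyRequestContext::default(); // assuming #[derive(Default)]
    loop {
        // parse whatever the client sent so far...
        try_read(frontend, &mut request_stream, &mut context);
        // ...and pipe everything that is ready to the server
        try_write(backend, &mut request_stream);
        // stop once the whole request has been parsed and written out
        if request_stream.terminated() && request_stream.completed() {
            break;
        }
    }
}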

If kawa is your endpoint, meaning you are the one interested in the HTTP headers and body, then either you gave it a buffer large enough to fit the biggest request you plan to receive, or you will have to extract what you want from it regularly.
I never had to implement this properly, but here is what I would do:

fn after_try_read(request_stream: &mut kawa::Kawa<MyStorage>) {
    if !request_stream.is_main_phase() {
        return;
    }
    let buf = &mut request_stream.storage.mut_buffer();
    while let Some(block) = request_stream.blocks.pop_front() {
        match block {
            kawa::Block::Chunk(chunk) => {
                chunk.data.data(buf); // do something with this &[u8]
            }
            kawa::Block::Header(header) if !header.is_elided() => {
                // note that this block is used for both headers and trailers
            }
            kawa::Block::Flags(flags) => {
                // this flag can be remembered to distinguish between headers (before the body) and trailers (after the body)
                let _ = flags.end_body;
            }
            _ => {}
        }
    }
    // now that you consumed the data, you should let kawa know how much it can free from its buffer
    let parsed_data = request_stream.storage.head - request_stream.storage.start;
    request_stream.storage.consume(parsed_data);
    if request_stream.storage.should_shift() {
        request_stream.storage.shift();
    }
    // you can check kawa thinks the request is complete by calling:
    request_stream.terminated();
}

Another way to do it would be to create your own BlockConverter:

fn after_try_read(request_stream: &mut kawa::Kawa<MyStorage>, endpoint: &mut MyEndpoint) {
    // as a BlockConverter, the endpoint receives each parsed block one by one and may output protocol-specific conversions;
    // as an endpoint, you can simply handle each block here and never push any protocol-specific output
    request_stream.prepare(endpoint);
    // you still have to call consume, as this triggers the buffer clean-up
    request_stream.consume(0);
}

I hope you better understand what I mean by "it may not fit your use-case".
I apologize if the readme gave you a much different impression of this crate.
We are planning to revisit how Sozu works internally, and this should lead to a complete rewrite of kawa. The design choices should remain similar and the primary goal is unchanged: an agnostic representation that allocates and copies as little as possible. We will remain as modular as possible, letting users define their own encoders, decoders, and memory management strategies, but we will also work on a user-friendly client and server with support for widely used HTTP protocols like h1, h2, h3, and maybe native CGI.
