Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inspecting or modifying bodies requires at least one deep copy. Make changes so that deep copy is not required #408

Open
pszabop opened this issue Oct 4, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@pszabop
Copy link

pszabop commented Oct 4, 2024

What is the problem your feature solves, or the need it fulfills?

I am attempting to implement something like the NginX HTTP substition filter, that allows modification of the request or response body. Documented here https://nginx.org/en/docs/http/ngx_http_sub_module.html

Sometimes I just want to inspect the body, other times I will want to inspect and transform the body.

There is an example of doing something similar in your examples directory: Transforming JSON to YAML. However it performs an extra deep copy (beyond the transformation) as it accumulates the Bytes into one Vec.

I believe this is due to the fact that HTTP chunk accumulation has to be eventually be assigned to body in the response_body_filter function signature, in order to send the accumulated chunks out on the wire, therefor it is beyond my ability to reduce the deep copies: It requires an API change.

    fn response_body_filter(
        &self,
        _session: &mut Session,
        body: &mut Option<Bytes>,
        end_of_stream: bool,
        ctx: &mut Self::CTX,
    ) -> Result<Option<std::time::Duration>> {
       if let Some(b) = body {
            // in the example it was accumulated into a Vec<u8>, I just accumulate Bytes
            // this is a deep copy
            ctx.chunks.extend_from_slice(&b); 
            // drop the body
            b.clear();
        }
        
        if end_of_stream {
            // inspect and transform here

            // this forces a deep copy when putting the data into the context
            // might have to do a std::mem::take, I didn't verify for this example
            *body = ctx.accumulated_bytes; 
        }
   }

Describe the solution you'd like

TL;DR - I would like to be able to inspect and sometimes transform request and response bodies with minimal or no deep copies if possible.

One way to do this is to accumulate into Vec<bytes> which involves no copying. Yes, implementing a near zero deep copy inspect / transform would be painful code, but I've seen it done in C before, and I have a Rust prototype in a sandbox that does this.

However, the function signature for response_body_filter would have to change to allow transmission of a Vec instead of just a single <Bytes>. For example:

    fn response_body_filter(
        &self,
        _session: &mut Session,
        body: &mut Option<Vec<Bytes>>,
        end_of_stream: bool,
        ctx: &mut Self::CTX,
    ) -> Result<Option<std::time::Duration>> {
       if let Some(b) = body {
            // not a deep copy
            ctx.chunks.push(b);     
        }
        
        if end_of_stream {
            // inspect and transform here on a Vec<Bytes> with no copies  - painful but doable

            // not a deep copy
            // might have to do a std::mem::take, I didn't verify for this example
            *body = ctx.chunks; 
        }
   }

I didn't run this by rustc so probably there's something obviously wrong to the trained eye, but that's the general idea.

Describe alternatives you've considered

I may be able to limit my accumulation to the low 100s of kbytes or so, so the extra deep copy currently required may not be too painful. The general case of the nginx sub_module, however, doesn't have this limitation. It's also amazing how fast requirements change.

Additional context

https://github.com/cloudflare/pingora/blob/main/pingora-proxy/examples/modify_response.rs
https://nginx.org/en/docs/http/ngx_http_sub_module.html

@pszabop pszabop changed the title Modifying bodies requires at least one deep copy. Make it possible for body to be a Vec<Bytes> or something similar so deep copy not needed Inspecting or modifying bodies requires at least one deep copy. Make changes so that deep copy is not required Oct 4, 2024
@drcaramelsyrup drcaramelsyrup added the enhancement New feature or request label Oct 11, 2024
@SpinoPi
Copy link

SpinoPi commented Oct 12, 2024

Yes, this is a highly useful feature! I'm currently developing a customized gateway but encountering a challenge. The upstream servers require different request paths and body formats, and I'm looking for a solution to handle this efficiently.

@pszabop
Copy link
Author

pszabop commented Oct 23, 2024

I have looked and have yet to find a "vector of Bytes" (or vector of capable version of the memchr (memmem) crate. I've seen code that implements this in C, but if you choose to implement this enhancement someone is going to have to write such a crate to make the enhancement useful. The state machine involved has to do fun stuff when transitioning between Bytes elements.

Either that, or my crate search skills are terrible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants