You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Sometimes I just want to inspect the body, other times I will want to inspect and transform the body.
There is an example of doing something similar in your examples directory: Transforming JSON to YAML. However it performs an extra deep copy (beyond the transformation) as it accumulates the Bytes into one Vec.
I believe this is due to the fact that HTTP chunk accumulation has to be eventually be assigned to body in the response_body_filter function signature, in order to send the accumulated chunks out on the wire, therefor it is beyond my ability to reduce the deep copies: It requires an API change.
fn response_body_filter(
&self,
_session: &mut Session,
body: &mut Option<Bytes>,
end_of_stream: bool,
ctx: &mut Self::CTX,
) -> Result<Option<std::time::Duration>> {
if let Some(b) = body {
// in the example it was accumulated into a Vec<u8>, I just accumulate Bytes
// this is a deep copy
ctx.chunks.extend_from_slice(&b);
// drop the body
b.clear();
}
if end_of_stream {
// inspect and transform here
// this forces a deep copy when putting the data into the context
// might have to do a std::mem::take, I didn't verify for this example
*body = ctx.accumulated_bytes;
}
}
Describe the solution you'd like
TL;DR - I would like to be able to inspect and sometimes transform request and response bodies with minimal or no deep copies if possible.
One way to do this is to accumulate into Vec<bytes> which involves no copying. Yes, implementing a near zero deep copy inspect / transform would be painful code, but I've seen it done in C before, and I have a Rust prototype in a sandbox that does this.
However, the function signature for response_body_filter would have to change to allow transmission of a Vec instead of just a single <Bytes>. For example:
fn response_body_filter(
&self,
_session: &mut Session,
body: &mut Option<Vec<Bytes>>,
end_of_stream: bool,
ctx: &mut Self::CTX,
) -> Result<Option<std::time::Duration>> {
if let Some(b) = body {
// not a deep copy
ctx.chunks.push(b);
}
if end_of_stream {
// inspect and transform here on a Vec<Bytes> with no copies - painful but doable
// not a deep copy
// might have to do a std::mem::take, I didn't verify for this example
*body = ctx.chunks;
}
}
I didn't run this by rustc so probably there's something obviously wrong to the trained eye, but that's the general idea.
Describe alternatives you've considered
I may be able to limit my accumulation to the low 100s of kbytes or so, so the extra deep copy currently required may not be too painful. The general case of the nginx sub_module, however, doesn't have this limitation. It's also amazing how fast requirements change.
The text was updated successfully, but these errors were encountered:
pszabop
changed the title
Modifying bodies requires at least one deep copy. Make it possible for body to be a Vec<Bytes> or something similar so deep copy not needed
Inspecting or modifying bodies requires at least one deep copy. Make changes so that deep copy is not required
Oct 4, 2024
Yes, this is a highly useful feature! I'm currently developing a customized gateway but encountering a challenge. The upstream servers require different request paths and body formats, and I'm looking for a solution to handle this efficiently.
I have looked and have yet to find a "vector of Bytes" (or vector of capable version of the memchr (memmem) crate. I've seen code that implements this in C, but if you choose to implement this enhancement someone is going to have to write such a crate to make the enhancement useful. The state machine involved has to do fun stuff when transitioning between Bytes elements.
Either that, or my crate search skills are terrible.
What is the problem your feature solves, or the need it fulfills?
I am attempting to implement something like the NginX HTTP substition filter, that allows modification of the request or response body. Documented here https://nginx.org/en/docs/http/ngx_http_sub_module.html
Sometimes I just want to inspect the body, other times I will want to inspect and transform the body.
There is an example of doing something similar in your examples directory: Transforming JSON to YAML. However it performs an extra deep copy (beyond the transformation) as it accumulates the Bytes into one Vec.
I believe this is due to the fact that HTTP chunk accumulation has to be eventually be assigned to
body
in theresponse_body_filter
function signature, in order to send the accumulated chunks out on the wire, therefor it is beyond my ability to reduce the deep copies: It requires an API change.Describe the solution you'd like
TL;DR - I would like to be able to inspect and sometimes transform request and response bodies with minimal or no deep copies if possible.
One way to do this is to accumulate into
Vec<bytes>
which involves no copying. Yes, implementing a near zero deep copy inspect / transform would be painful code, but I've seen it done in C before, and I have a Rust prototype in a sandbox that does this.However, the function signature for
response_body_filter
would have to change to allow transmission of a Vec instead of just a single<Bytes>
. For example:I didn't run this by rustc so probably there's something obviously wrong to the trained eye, but that's the general idea.
Describe alternatives you've considered
I may be able to limit my accumulation to the low 100s of kbytes or so, so the extra deep copy currently required may not be too painful. The general case of the nginx
sub_module
, however, doesn't have this limitation. It's also amazing how fast requirements change.Additional context
https://github.com/cloudflare/pingora/blob/main/pingora-proxy/examples/modify_response.rs
https://nginx.org/en/docs/http/ngx_http_sub_module.html
The text was updated successfully, but these errors were encountered: