Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Renaming long arguments #66

Open
dfherr opened this issue Dec 1, 2024 · 6 comments
Open

Renaming long arguments #66

dfherr opened this issue Dec 1, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@dfherr
Copy link

dfherr commented Dec 1, 2024

I'm sorry if this is quite obvious, but I couldn't figure it out from the docs.

I am benchmarking a function that takes Strings as input and want to run it on different strings. One of these strings is supposed to be quite long. That leads to divan printing the whole long string into the terminal effectively making the output unreadable.

How can I avoid this?

I guess I could wrap my input into the function with a custom ToString() method. But I find that quite ugly as it alters the input of my function affecting the benchmark results.

I was looking for ways to name the arguments or change the print output of divan instead of changing the benchmark function itself.

@nvzqz
Copy link
Owner

nvzqz commented Dec 1, 2024

When supplying arguments to a benchmark, you should think of them more like cases rather than like the kinds of arguments provided by property-based testing frameworks. See the args docs for examples of the kind of usage I'm describing.

My recommendation would be to use an enum or short predefined strings to define cases, although perhaps it's better to do something else in your situation. Could you give examples of the kinds of strings you're providing?

@dfherr
Copy link
Author

dfherr commented Dec 1, 2024

i was benchmarking a really simple string reverse. one of the test cases returned from strings() was supposed to be a long string (I copied multiple paragraphs of lorem ipsum for that). The test itself runs fine, the problem is really that the long string gets added to the command line output

Here is my benchmark:

#![feature(iter_collect_into)]

fn main() {
    println!("running benchmarks");
    // Run registered benchmarks.
    divan::main();
}

fn strings() -> impl Iterator<Item = String> {
    [
        String::from("abc"),
        String::from("Two words"),
        String::from("Three words, really"),
        String::from("日本語のテキストを長いテキストにutf8"),
        String::from("我們也來添加一些中文吧"),
    ]
    .into_iter()
}

#[divan::bench(args = strings())]
fn reverse_string_naive(s: &str) -> String {
    s.chars().rev().collect::<String>()
}

#[divan::bench(args = strings())]
fn reverse_string_prefilled(s: &str) -> String {
    let mut reversed = String::with_capacity(s.len());
    s.chars().rev().collect_into(&mut reversed);
    reversed
}

#[divan::bench(args = strings())]
fn reverse_string_manual(input: &str) -> String {
    let bytes = input.len();

    let mut reversed: Vec<u8> = vec![0u8; bytes];

    let mut pos = bytes;

    for char in input.chars() {
        let char_len = char.len_utf8();
        let start = pos - char_len;
        char.encode_utf8(&mut reversed[start..pos]);
        pos = start;
    }

    unsafe { String::from_utf8_unchecked(reversed) }
}

even with the shorter strings you can see misalignment on the outputs (a long multiline string just makes it unreadable):

image

so what I would've liked to do is name the test cases. e.g. the last two one could read Japanese Chinese and then another Long Text.

@dfherr
Copy link
Author

dfherr commented Dec 1, 2024

i've looked through the code and found that my request is currently not possible

divan/src/bench/args.rs

Lines 58 to 145 in 414ace9

pub fn runner<I, B>(
&'static self,
make_args: impl FnOnce() -> I,
arg_to_string: impl Fn(&I::Item) -> String,
_bench_impl: B,
) -> BenchArgsRunner
where
I: IntoIterator,
I::Item: Any + Send + Sync,
B: FnOnce(Bencher, &I::Item) + Copy,
{
let args = self.args.get_or_init(|| {
let args_iter = make_args().into_iter();
// Reuse arguments for names if already a slice of strings.
//
// NOTE: We do this over `I::IntoIter` instead of `I` since it works
// for both slices and `slice::Iter`.
let args_strings: Option<&'static [&str]> =
args_iter.cast_ref::<slice::Iter<&str>>().map(|iter| iter.as_slice());
// Collect arguments into leaked slice.
//
// Leaking the collected `args` simplifies memory management, such
// as when reusing for `names`. We're leaking anyways since this is
// accessed via a global `OnceLock`.
//
// PERF: We could optimize this to reuse arguments when users
// provide slices. However, for slices its `Item` is a reference, so
// `slice::Iter<I::Item>` would never match here. To make this
// optimization, we would need to be able to get the referee type.
let args: &'static [I::Item] = Box::leak(args_iter.collect());
// Collect printable representations of arguments.
//
// PERF: We take multiple opportunities to reuse the provided
// arguments buffer or individual strings' buffers:
// - `&[&str]`
// - `IntoIterator<Item = &str>`
// - `IntoIterator<Item = String>`
// - `IntoIterator<Item = Box<str>>`
// - `IntoIterator<Item = Cow<str>>`
let names: &'static [&str] = 'names: {
// PERF: Reuse arguments strings slice.
if let Some(args) = args_strings {
break 'names args;
}
// PERF: Reuse our args slice allocation.
if let Some(args) = args.cast_ref::<&[&str]>() {
break 'names args;
}
Box::leak(
args.iter()
.map(|arg| -> &str {
// PERF: Reuse strings as-is.
if let Some(arg) = arg.cast_ref::<String>() {
return arg;
}
if let Some(arg) = arg.cast_ref::<Box<str>>() {
return arg;
}
if let Some(arg) = arg.cast_ref::<Cow<str>>() {
return arg;
}
// Default to `arg_to_string`, which will format via
// either `ToString` or `Debug`.
Box::leak(arg_to_string(arg).into_boxed_str())
})
.collect(),
)
};
ErasedArgsSlice {
// We `black_box` arguments to prevent the compiler from
// optimizing the benchmark for the provided values.
args: crate::black_box(args.as_ptr().cast()),
names: names.as_ptr(),
len: args.len(),
arg_type: TypeId::of::<I::Item>(),
}
});
BenchArgsRunner { args, bench: bench::<I::Item, B> }
}
}

the runner implementation will always use String arguments as the arg name. If they aren't strings, I could use the Debug implementation to provide my own formatter (still kinda ugly since it also overrides printing my arg).

I was thinking it would be nice to be able to provide a new config parameter that allows the user to provide an args_name_formatter that takes the arg as input and returns a name as String. If this formatter is provided it gets presedence over the existing logic and how to convert args to strings is up to the user.

I tried hacking around a bit to maybe provide a pull request of the idea, but I just started learning Rust and wasn't able to get a grasp on all the meta programming to get another argument passed into the BenchArgs struct (I would've made it an Option function).

The result would work like this:
#[divan::bench(args = strings(), args_name_formatter=formatter)]

with formatter: Fn(&I::Item) -> String

@nvzqz
Copy link
Owner

nvzqz commented Dec 1, 2024

Yeah Divan always formats args according to their ToString implementation. I'm open to suggestions for how to make this better. I'm not sure if an args_name_formatter option is the right choice.

Also, the misalignment is due to using str.chars().count(). Perhaps something else like unicode-width would be more appropriate.

For getting what you want today, you could create a Case type that stores its name and data:

struct Case {
    name: &'static str,
    text: &'static str,
}

impl std::fmt::Display for Case {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        f.write_str(self.name)
    }
}

const ALL_CASES: &[Case] = &[
    Case { name: "one word", text: "abc" },
    Case { name: "two words", text: "Two words" },
    Case { name: "three words", text: "Three words, really" },
    // ...
];

#[divan::bench(args = ALL_CASES)]
fn reverse_string_naive(case: &Case) -> String {
    case.text.chars().rev().collect::<String>()
}

#[divan::bench(args = ALL_CASES)]
fn reverse_string_prefilled(case: &Case) -> String {
    let s = case.text;
    let mut reversed = String::with_capacity(s.len());
    s.chars().rev().collect_into(&mut reversed);
    reversed
}

#[divan::bench(args = ALL_CASES)]
fn reverse_string_manual(case: &Case) -> String {
    let input = case.text;
    let bytes = input.len();

    let mut reversed: Vec<u8> = vec![0u8; bytes];

    let mut pos = bytes;

    for char in input.chars() {
        let char_len = char.len_utf8();
        let start = pos - char_len;
        char.encode_utf8(&mut reversed[start..pos]);
        pos = start;
    }

    unsafe { String::from_utf8_unchecked(reversed) }
}

@dfherr
Copy link
Author

dfherr commented Dec 2, 2024

There were three options that intuitively made sense to me while keeping them entirely optional:

  • providing the names as an additional option. this should error if args.len() != args_names.len()
  • adding a special case to the args parameter that allows name+arg pairs (basically like your case struct but without being forced to handle it inside the function i'm benchmarking)
  • providing a name formatter that provides the string instead of the currently enforced transformation in the runner

the first one fits the "name" attribute for the whole test case, but I don't like the necessary assertion on the same length. The second one is kinda ugly, because it's a special case within the current control flow and makes the args parameter less pure.

i liked the formatter option. it's the least intrusive, doesn't have the length problem and it integrates seamlessly in the current api, while providing full control on how to style the table output to the user.

well, maybe the fourth option would be to not just provide access to a names formatter, but for the whole output. but i guess that's a) a lot harder to use and b) a lot more effort to implement.

@nvzqz nvzqz changed the title How to name arguments Renaming long arguments Dec 5, 2024
@nvzqz nvzqz added the enhancement New feature or request label Dec 5, 2024
@Bktero
Copy link

Bktero commented Dec 20, 2024

I have faced the same issue today with this code;

fn create_random_vector(size: usize) -> Vec<u32> {
    let mut rng = rand::thread_rng();
    (0..size).map(|_| rng.gen_range(0..100)).collect()
}


#[divan::bench(args = [ create_random_vector(100_000) ])]
fn sum_iter(v: &Vec<u32>) -> u64 {
    v.iter().map(|&x| x as u64).sum()
}

The entire vector is printed in the results, and this is of course a nightmare to read them XD

I used a the technique proposed above, with a wrapper struct:

struct BenchmarkData {
    data: Vec<u32>,
}

const BENCHMARK_DATA_SIZE: usize = 50_000_000;

impl BenchmarkData {
    pub fn new() -> Self {
        Self { data: create_random_vector(BENCHMARK_DATA_SIZE) }
    }
}

impl Display for BenchmarkData {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "[...] len={}", self.data.len())
    }
}

#[divan::bench(args = [ BenchmarkData::new() ])]
fn bench_sum_iter(data: &BenchmarkData) {
    sum_iter(&data.data);
}

I believe this technique is worth documenting :)

If I were to propose an evolution of divan, I would suggest 2 possibilities:

  1. #[divan::bench(args = strings(), args_name_limit=50)] => if the string if too long, it's truncated and an ellipsis is added.
  2. divan::main(args_name_limit=50) => I know this is a Python syntax, but you see the idea. Same as 1, but for all benchmarks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants