Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Channel router #653

Open
dvdsk opened this issue Dec 6, 2024 · 21 comments
Open

Channel router #653

dvdsk opened this issue Dec 6, 2024 · 21 comments
Assignees

Comments

@dvdsk
Copy link
Collaborator

dvdsk commented Dec 6, 2024

@PetrGlad you brought this up somewhere else and I just saw it has actually been requested a few weeks ago: #626 (comment)

So lets discuss that here.

@iluvcapra
Copy link
Contributor

Can I write this? I have opinions on this.

@PetrGlad
Copy link
Collaborator

PetrGlad commented Dec 7, 2024

Yes, this is something I had in mind. We do have a channel number converter (ChannelCountConverter) but its primary function seems to be mono->stereo or just discarding channels with higher numbers. It does not look very useful otherwise.
So here I see 3 options.

1️⃣ The most generic channel router would be to have volume/amplification matrix so every output is a weighed sum of inputs. That is for $n$ input channels, and $m$ output channels, there would be a matrix $A_{m,n}$ with floating point weights (those could even be negative :) This version could handle arbitrary routing, mixing, channel duplication, mono conversion, etc. But this one would be the slowest of the three, although I would not worry about that much as it likely can be optimized. This version can probably needs coefficients to be floating point or at least fractional, because it can sum channels together which may cause clipping so one may want to adjust input weights.

2️⃣ A simpler implementation could use just single weighted input for every output. In that case an output is specified as a tuple of {input_channel_index, weight, output_channel_index}, and the router would have a list of those pairs for every output channel. This version can still handle channel routing (including duplication), and adjusting per-channel volume. One may be to allow adding more than one pair for output and sum the results, this way it will be more flexible without having to keeping the whole matrix.

3️⃣ A simpler still implementation of this would just route channel (equivalent to limiting all coefficients in previous version to 1.0). This only would have to use a list of input channel indexes, one for every output channel. This version is useful for arbitrary channel swaps, duplication, discarding channels but not for mixing. And this would be the fastest.

All of these versions look useful to me, so maybe you can pick whatever solves your problem the best. Maybe there are something else that I have missed.

Even simpler but inconvenient would be to use single channel converter and some kind of splitter, then single channel converter router, and then mixer. But this sounds rather cumbersome even to me :)

@dvdsk
Copy link
Collaborator Author

dvdsk commented Dec 7, 2024

Can I write this? I have opinions on this.

have at it :)

It seems like Petr also has some thoughts so they'll probably be able to help out if you need some input along the way.

@PetrGlad
Copy link
Collaborator

PetrGlad commented Dec 7, 2024

By the way #626 (comment)

@PetrGlad
Copy link
Collaborator

PetrGlad commented Dec 7, 2024

After thinking more my personal preference would be 2️⃣ without limiting how many pairs can be added. Since it would support most if not all use-cases I saw so far. If there will be need for more complex processing 1️⃣ can be implemented separately.

@iluvcapra do you have other ideas?

@iluvcapra
Copy link
Contributor

The other applications to keep in mind for this source would be as a drop-in ambisonic encoder/decoder.

@iluvcapra
Copy link
Contributor

iluvcapra commented Dec 7, 2024

(Deleted the first try at this, it was too broken.)

Some thoughts:

  1. Do we want to address the question of channel formatting at this time? Sources reflect the channel count of their outputs but not necessarily channel formatting or assignment (I notice this was brought up in the user story).
  2. Does channel mixing need to be dynamic? Or is it sufficient that it be static over the life of the router?
  3. Ability to mix multiple inputs would be a must for my immediate use-case, you need both duplication and combining to do an ITU-BS.775 downmix of a 5.1 or 7.1 to a stereo.
  4. 1️⃣ has the benefit of simplicity, I'm not really seeing how this is slower considering in this case next() just has to keep track of where it is in the input, where it is in the output, and then pull that coefficient out of the matrix. The alternative is keeping the coefficients in a list and having to do a lookup.

1️⃣ I think ends up looking something like this:

struct ChannelMixer<I> {
    channel_matrix: Vec<Vec<f32>>, // dims are (input.channels(), self.channels())
    output_pos: u16,
    input: I,
    input_buffer: Vec<Sample>, // dim is (self.input.channels())
    //... yada yada
}

impl<I> Iterator for ChannelMixer where I: Source {  

    // for a gain matrix, this logic is relatively straightforward
    fn apply_gains(&self, for_output_channel: u16) -> Sample {
        self.input_buffer.iter()
            .zip(self.channel_matrix)
            .map( |in_sample, out_gains| { in_sample * out_gains[for_output_channel as usize] } )
            .sum()
    }

    fn next(&mut self) -> Option<Sample> {
        if (self.output_pos == self.channels()) {
            // handling end-of-frame condition omitted here
            self.input_buffer = self.input
                .take(self.input.channels())
                .collect(); 
            self.output_pos = 0;
        }
        
        let retval = apply_gains(self.output_pos);
        self.output_pos = self.output_pos + 1;
        retval
    }
}

And then in the case of 2️⃣...

struct MixCoefficient {
    input_channel: u16,
    output_channel: u16,
    gain: f32,
}

struct ChannelMixer<I> {
    channel_matrix: Vec<MixCoefficient>,
    output_pos: u16,
    input: I,
    input_buffer: Vec<Sample>, // dim is (self.input.channels())
    //... yada yada
}

impl<I> Iterator for ChannelMixer where I: Source {  
    
    // Getting the coefficient in this case is a bit more involved, and there are more corner cases, 
    // like what if there are two rules for the same pair of ins-outs, etc.
    fn get_mix_coefficient(&self, for_input: u16, for_output: u16) -> f32 {
        self.mix_coefficients.iter()
            .find( |mc| { mc.output_channel == for_output && mc.input_channel == for_input } )
            .map( |mc| { *mc.gain })
            .or_else(0.0f32);
    }

    fn apply_gains(for_output_channel: usize) -> Sample {
        self.input_buffer.iter().enumerate()
            .map( | (in_channel_index, in_sample) | { 
                in_sample * get_mix_coefficient(in_channel_index, for_output_channel)
            } )
            .sum()
    }

    fn next(&mut self) -> Option<Sample> {
        if (self.output_pos == self.channels()) {
            // handling end-of-frame condition omitted here
            self.input_buffer = self.input
                .take(self.input.channels())
                .collect(); 
            self.output_pos = 0;
        }
        
        let retval = apply_gains(self.output_pos);
        self.output_pos = self.output_pos + 1;
        retval
    }
}

@iluvcapra
Copy link
Contributor

iluvcapra commented Dec 7, 2024

Or! TBF you can do gain lookups in 2️⃣ this way, with a HashMap:

#[derive(Hash, Eq, PartialEq)]
struct InputOutputPair(u16, u16);

struct ChannelMixer<I> {
    channel_map: HashMap<InputOutputPair, f32>,
    output_pos: u16,
    input: I,
    input_buffer: Vec<Sample>, // dim is (self.input.channels())
    //... yada yada
}

impl<I> Iterator for ChannelMixer where I: Source {  
    
    fn apply_gains(&self, output_channel: usize) -> Sample {
        self.input_buffer.iter().enumerate()
            .map( | (input_channel, in_sample) | { 
                let pair = InputOutputPair(input_channel, output_channel);
                in_sample * self.channel_map[pair].or_else(0.0f32)
            } )
            .sum()
    }

    fn next(&mut self) -> Option<Sample> {
        if (self.output_pos == self.channels()) {
            // handling end-of-frame condition omitted here
            self.input_buffer = self.input
                .take(self.input.channels())
                .collect(); 
            self.output_pos = 0;
        }
        
        let retval = apply_gains(self.output_pos);
        self.output_pos = self.output_pos + 1;
        retval
    }
}

I guess it's just six of one, half a dozen of the other. There's some penalty here for hashing when you retrieve the coefficient, and then you have to do a hash lookup, that's the biggest difference but the lookup is at least approaching O(1).

@PetrGlad
Copy link
Collaborator

PetrGlad commented Dec 8, 2024

// Updated to clarify some assumptions.

Yes, 1️⃣ option is logically the simplest but for larger number of channels the matrix will grow bigger and usually most of the coefficients will be zero. For example for 8 channels it the matrix will have 64 elements. If a complete matrix is used then, the code should read all the channels of a frame and then multiply the matrix by this vector to get the output. That is indeed more straightforward than the other option.

In the 2️⃣ option I think, no complex containers are necessary (would do for a prototype, though). I had in mind plain array of tuples {in, out, volume}. Users can provide a bunch of those but it is required to have at most one for any {in, out} pair chat should be checked only at the construction time. Also can discard any items with zero coefficients. At the construction time the router would sort those by in. Channel samples are coming in ascending order, so the algorithm could do something like

// Data and parameters in self
let mut links = vec![(0, 2, 0.3f32), (0, 2, 0.7), (2, 1, 0.9)];
links.sort_by_key(|link| link.1);        
let input = vec![12u16, 34, 56];
let input = input.iter();        
let n_in_channels = 2;
let n_out_channels = 3;

// Getting next frame
// Process input samples until whole frame (all channels) is read and then produce output.
let mut link_i = 0;
// For output channels that are not supposed to get any sound 
//     "silence" values should still be produced.
let mut output = vec![0f32; n_out_channels];
for (i, sample) in input.take(n_in_channels).enumerate() {
    while let Some(link) = links.get(link_i) {
        if link.0 > i {
            break;
        }
        output[link.1] += (*sample as f32).amplify(link.2);
        link_i += 1;
    }
}
// normally we have to read all the input channels before the output frame is ready.
// I do not know yet how this should look like.
for out in output {
    yield out; // Rust cannot do that yet. We need to keep an output buffer and refill that on each frame. As in your examples.
}

List of those tuples is effectively a sparse encoding of a matrix from 1️⃣, if that helps.

@PetrGlad
Copy link
Collaborator

PetrGlad commented Dec 8, 2024

Regarding dynamic adjustments, there are some examples of how this is done currently (see e.g. Sync implementation). There are 3 approaches: periodic access hook, atomics and a command channel. I have yet to see which one is better here.
I think the whole list of links should be replaced at once. But changes should only take effect when a complete frame is processed (between processing frames), so this will complicate the code a bit.

@PetrGlad
Copy link
Collaborator

PetrGlad commented Dec 9, 2024

Another option for 1️⃣ (full matrix case) is always having matrix of size inputs x outputs and use atomics as the coefficients. Then all the adjustments can be done in real time. Maybe that is an advantage of having bigger matrix.

@iluvcapra
Copy link
Contributor

I do like the matrix but I do also agree it's not a great idea for something like a 256x256 matrix, which is definitely something somebody might want to do, but I envision this source as primarily acting on single multichannel sounds, not hundreds. The widest multichannel sounds you might get are an Atmos 9.1.6 bed or 3-order Ambisonic, and both of these are only 16 and this would be the max. I've laid it out at the moment as a HashMap but a matrix of atomics seems like it might do just fine.

@iluvcapra
Copy link
Contributor

Germaine to this conversation I raise #657, I'm wide open on this and am curious for comments.

@dvdsk
Copy link
Collaborator Author

dvdsk commented Dec 10, 2024

I've laid it out at the moment as a HashMap but a matrix of atomics seems like it might do just fine.

We need a test to quantify the speed difference between atomics and using the periodic callback. That would be checking 16*16 atomics. In #654 we are trying to hammer out the rodio goals but it seems like we all agree rodio should be as performant as possible while easy to use. Periodic callback offers the balance between latency (the period) and efficiency. On (arm) mostly smartphones and now mac's atomics have a different impact.

Maybe we can compare three different versions of this:

  • on periodic callback + vec1 of float pairs
  • on periodic callback + matrix of floats
  • a matrix of atomics

Footnotes

  1. While a hashmap::get is O(1) that 1 is quite heavy given you need to hash the value. The vec might be faster. Though feel free to take the hashmap if thats easier to implement.

@dvdsk
Copy link
Collaborator Author

dvdsk commented Dec 10, 2024

@will3942 from your comment in the ChannelVolume PR it sounds like you might have a use case for this. Any preferences for the API?

@will3942
Copy link
Contributor

@will3942 from your comment in the ChannelVolume PR it sounds like you might have a use case for this. Any preferences for the API?

Thanks for remembering and tagging me in the issue.

For us we're going to be using this for OutputStreams/Devices of up to 128 channels but realistically only with input sources of 2 channels (stereo).

In terms of a wishlist for us it would be:

  • Ability to route a specific input channel to a specific output channel
  • Ability to apply a specific volume to a routing (e.g. Input 0 to Output 32 at Volume 0.8)
  • Ability to convert an input source from stereo to mono and route it to a specific channel (e.g. Input [0,1] to Output 16)

From the APIs/implementations discussed I think 1️⃣ and 2️⃣ would work well for us.

@iluvcapra
Copy link
Contributor

What I've done for the time being is an array of floats that are updated over a mpsc channel, this channel is checked every audio frame but this could be done less frequently easily enough.

Atomic floats are gated by experimental so I decided to sidestep that for now but that's doable if we wanted to.

A hashmap-type solution could be sped up by eliminating hashing, by using a u32 as a key and constructing the key by or-ing the input and (output << 16).

@PetrGlad
Copy link
Collaborator

@dvdsk I believe the loop in my example above is the leanest approach to process sparse matrix ( for variant 2️⃣ ) - it only uses 2 vectors (one for coefficients and one output accumulator).

@dvdsk
Copy link
Collaborator Author

dvdsk commented Dec 12, 2024

@dvdsk I believe the loop in my example above is the leanest approach to process sparse matrix ( for variant 2️⃣ ) - it only uses 2 vectors (one for coefficients and one output accumulator).

sparse matrix seems to be the way to go. An optimal HashMap would be the same as a sparse matrix I think.

@PetrGlad
Copy link
Collaborator

Well, the difference in efficiency may not be significant, but in the simplest version it is just iteration over an array, since we know in advance in which order the channel samples will be processed. A hash map would require at least calculating a hash value for every query.

@PetrGlad
Copy link
Collaborator

PetrGlad commented Jan 15, 2025

By the way, the channel mixer/router can also be used instead of ChannelVolume, ChannelCountConverter, and instead of Amplify. The latter I'd keep because it is very common use case and router is too cumbersome for that. But the former two can be deprecated then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants