arm64: Implement 16bpc cdef_dist_kernel #3293

Merged 2 commits on Nov 29, 2023

barrbrain
Collaborator

No description provided.

@barrbrain barrbrain requested a review from lu-zero November 29, 2023 05:31

codecov bot commented Nov 29, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (92b34fc) 88.22% compared to head (81a65a6) 88.22%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3293   +/-   ##
=======================================
  Coverage   88.22%   88.22%           
=======================================
  Files          87       87           
  Lines       28221    28221           
=======================================
  Hits        24898    24898           
  Misses       3323     3323           

@barrbrain barrbrain merged commit a8ad43f into xiph:master Nov 29, 2023
25 checks passed
@barrbrain barrbrain deleted the cdef-dist-aarch64 branch November 29, 2023 14:16
@barrbrain
Collaborator Author

See #3281 (comment) for a perf trace from before this PR.
Here is the same workload after this PR:

# Samples: 204K of event 'cycles'
# Event count (approx.): 114662234882
#
#       Overhead  Command / Shared Object / Symbol
# ..............  ...............................................................................................................................................................................................................
#
   100.00%        rav1e  
       92.79%        rav1e                
          10.59%        [.] rav1e_satd8x8_hbd_neon
            |          
            |--6.15%--rav1e::api::internal::ContextInner<T>::receive_packet
            |          rav1e::api::internal::ContextInner<T>::compute_block_importances (inlined)
            |          <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each (inlined)
            |          rav1e::api::internal::ContextInner<T>::compute_block_importances::{{closure}} (inlined)
            |          rav1e::api::internal::ContextInner<T>::update_block_importances (inlined)
            |          core::iter::traits::iterator::Iterator::for_each (inlined)
            |          <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
            |          <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold (inlined)
            |          core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold (inlined)
            |          <core::iter::adapters::fuse::Fuse<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
            |          <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |          <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
            |          core::iter::traits::iterator::Iterator::fold (inlined)
            |          <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
            |          core::iter::adapters::map::map_fold::{{closure}} (inlined)
            |          core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold::flatten::{{closure}} (inlined)
            |          <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold::flatten::{{closure}} (inlined)
            |          <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |          <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
            |          core::iter::traits::iterator::Iterator::fold (inlined)
            |          <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
            |          core::iter::adapters::map::map_fold::{{closure}} (inlined)
            |          rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}}::{{closure}} (inlined)
            |          rav1e::asm::aarch64::dist::get_satd (inlined)
            |          satd8x8_hbd_neon (inlined)
            |          
            |--3.50%--rav1e::me::estimate_motion
            |          |          
            |           --3.14%--rav1e::me::sub_pixel_me (inlined)
            |                     rav1e::me::subpel_diamond_search (inlined)
            |                     rav1e::me::get_subpel_mv_rd (inlined)
            |                     rav1e::me::compute_mv_rd (inlined)
            |                     rav1e::asm::aarch64::dist::get_satd (inlined)
            |                     satd8x8_hbd_neon (inlined)
            |          
             --0.66%--core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
                       core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut (inlined)
                       rav1e::encoder::encode_tile_group::{{closure}} (inlined)
                       rav1e::encoder::encode_tile (inlined)

           7.87%        [.] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |          
            |--5.80%--rav1e::me::estimate_motion
            |          rav1e::me::full_pixel_me (inlined)
            |          |          
            |           --5.65%--rav1e::me::full_pixel_me::{{closure}}
            |                     |          
            |                     |--2.97%--rav1e::me::get_best_predictor (inlined)
            |                     |          rav1e::me::get_fullpel_mv_rd
            |                     |          rav1e::me::compute_mv_rd (inlined)
            |                     |          rav1e::asm::aarch64::dist::get_sad (inlined)
            |                     |          rav1e::asm::aarch64::dist::get_sad::{{closure}} (inlined)
            |                     |          rav1e::dist::rust::get_sad (inlined)
            |                     |          core::iter::traits::iterator::Iterator::sum (inlined)
            |                     |          <u32 as core::iter::traits::accum::Sum>::sum (inlined)
            |                     |          <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |                     |          |          
            |                     |           --2.85%--core::iter::traits::iterator::Iterator::fold (inlined)
            |                     |                     |          
            |                     |                      --2.58%--core::iter::adapters::map::map_fold::{{closure}} (inlined)
            |                     |                                |          
            |                     |                                 --2.57%--rav1e::dist::rust::get_sad::{{closure}} (inlined)
            |                     |                                           core::iter::traits::iterator::Iterator::sum (inlined)
            |                     |                                           <u32 as core::iter::traits::accum::Sum>::sum (inlined)
            |                     |                                           <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |                     |                                           core::iter::traits::iterator::Iterator::fold (inlined)
            |                     |                                           |          
            |                     |                                           |--0.55%--<core::iter::adapters::zip::Zip<A,B> as core::iter::traits::iterator::Iterator>::next (inlined)
            |                     |                                           |          <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next (inlined)
            |                     |                                           |          
            |                     |                                            --0.52%--core::iter::adapters::map::map_fold::{{closure}} (inlined)
            |                     |          
            |                      --2.68%--rav1e::me::fullpel_diamond_search (inlined)
            |                                rav1e::me::get_fullpel_mv_rd
            |                                rav1e::me::compute_mv_rd (inlined)
            |                                rav1e::asm::aarch64::dist::get_sad (inlined)
            |                                rav1e::asm::aarch64::dist::get_sad::{{closure}} (inlined)
            |                                rav1e::dist::rust::get_sad (inlined)
            |                                core::iter::traits::iterator::Iterator::sum (inlined)
            |                                <u32 as core::iter::traits::accum::Sum>::sum (inlined)
            |                                <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |                                |          
            |                                 --2.59%--core::iter::traits::iterator::Iterator::fold (inlined)
            |                                           |          
            |                                            --2.34%--core::iter::adapters::map::map_fold::{{closure}} (inlined)
            |                                                      |          
            |                                                       --2.34%--rav1e::dist::rust::get_sad::{{closure}} (inlined)
            |                                                                 core::iter::traits::iterator::Iterator::sum (inlined)
            |                                                                 <u32 as core::iter::traits::accum::Sum>::sum (inlined)
            |                                                                 <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |                                                                 core::iter::traits::iterator::Iterator::fold (inlined)
            |          
             --2.07%--rav1e::api::internal::ContextInner<T>::send_frame
                       rav1e::api::internal::ContextInner<T>::compute_frame_invariants (inlined)
                       rav1e::api::internal::ContextInner<T>::compute_lookahead_motion_vectors (inlined)
                       rav1e::api::lookahead::compute_motion_vectors
                       rayon::iter::ParallelIterator::for_each (inlined)
                       rayon::iter::for_each::for_each (inlined)
                       <rayon::vec::IntoIter<T> as rayon::iter::ParallelIterator>::drive_unindexed (inlined)
                       rayon::iter::plumbing::bridge (inlined)
                       <rayon::vec::IntoIter<T> as rayon::iter::IndexedParallelIterator>::with_producer
                       <rayon::vec::Drain<T> as rayon::iter::IndexedParallelIterator>::with_producer (inlined)
                       <rayon::iter::plumbing::bridge::Callback<C> as rayon::iter::plumbing::ProducerCallback<I>>::callback (inlined)
                       rayon::iter::plumbing::bridge_producer_consumer (inlined)
                       rayon::iter::plumbing::bridge_producer_consumer::helper
                       rayon::iter::plumbing::Producer::fold_with (inlined)
                       <rayon::iter::for_each::ForEachConsumer<F> as rayon::iter::plumbing::Folder<T>>::consume_iter (inlined)
                       core::iter::traits::iterator::Iterator::for_each (inlined)
                       core::iter::traits::iterator::Iterator::fold (inlined)
                       core::iter::traits::iterator::Iterator::for_each::call::{{closure}} (inlined)
                       core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut
                       rav1e::api::lookahead::compute_motion_vectors::{{closure}} (inlined)
                       rav1e::me::estimate_tile_motion
                       rav1e::me::refine_subsampled_sb_motion (inlined)
                       rav1e::me::refine_subsampled_motion_estimate (inlined)
                       rav1e::me::full_search
                       rav1e::me::compute_mv_rd (inlined)
                       rav1e::asm::aarch64::dist::get_sad (inlined)
                       rav1e::asm::aarch64::dist::get_sad::{{closure}} (inlined)
                       rav1e::dist::rust::get_sad (inlined)
                       core::iter::traits::iterator::Iterator::sum (inlined)
                       <u32 as core::iter::traits::accum::Sum>::sum (inlined)
                       <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                       |          
                        --2.05%--core::iter::traits::iterator::Iterator::fold (inlined)
                                  |          
                                   --1.91%--core::iter::adapters::map::map_fold::{{closure}} (inlined)
                                             |          
                                              --1.91%--rav1e::dist::rust::get_sad::{{closure}} (inlined)
                                                        core::iter::traits::iterator::Iterator::sum (inlined)
                                                        <u32 as core::iter::traits::accum::Sum>::sum (inlined)
                                                        <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                                                        core::iter::traits::iterator::Iterator::fold (inlined)

           7.22%        [.] put_8tap_neon
            |          
            |--5.95%--rav1e::me::estimate_motion
            |          rav1e::me::sub_pixel_me (inlined)
            |          rav1e::me::subpel_diamond_search (inlined)
            |          rav1e::me::get_subpel_mv_rd (inlined)
            |          put_8tap_neon
            |          
             --0.62%--put_8tap_neon
                       put_8tap_neon

           5.07%        [.] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |          
             --5.00%--rav1e::rdo::rdo_mode_decision
                       |          
                        --4.82%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                  core::iter::traits::iterator::Iterator::try_fold (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                  rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                  rav1e::rdo::luma_chroma_mode_rdo
                                  rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                                  rav1e::rdo::compute_distortion
                                  rav1e::rdo::sse_wxh (inlined)
                                  rav1e::dist::rust::get_weighted_sse
                                  core::iter::traits::iterator::Iterator::sum (inlined)
                                  <u64 as core::iter::traits::accum::Sum>::sum (inlined)
                                  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                                  core::iter::traits::iterator::Iterator::fold (inlined)
                                  core::iter::adapters::map::map_fold::{{closure}} (inlined)
                                  rav1e::dist::rust::get_weighted_sse::{{closure}} (inlined)
                                  core::iter::traits::iterator::Iterator::sum (inlined)
                                  <u64 as core::iter::traits::accum::Sum>::sum (inlined)
                                  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                                  |          
                                   --4.53%--core::iter::traits::iterator::Iterator::fold (inlined)
                                             |          
                                              --4.26%--core::iter::adapters::map::map_fold::{{closure}} (inlined)
                                                        |          
                                                         --4.25%--rav1e::dist::rust::get_weighted_sse::{{closure}}::{{closure}} (inlined)
                                                                   |          
                                                                    --4.13%--core::iter::traits::iterator::Iterator::sum (inlined)
                                                                              <u32 as core::iter::traits::accum::Sum>::sum (inlined)
                                                                              <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                                                                              core::iter::traits::iterator::Iterator::fold (inlined)
                                                                              |          
                                                                               --3.79%--core::iter::adapters::map::map_fold::{{closure}} (inlined)
                                                                                         |          
                                                                                          --3.67%--rav1e::dist::rust::get_weighted_sse::{{closure}}::{{closure}}::{{closure}} (inlined)
                                                                                                    core::iter::traits::iterator::Iterator::sum (inlined)
                                                                                                    <u32 as core::iter::traits::accum::Sum>::sum (inlined)
                                                                                                    <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                                                                                                    core::iter::traits::iterator::Iterator::fold (inlined)
                                                                                                    |          
                                                                                                    |--1.14%--core::iter::adapters::map::map_fold::{{closure}} (inlined)
                                                                                                    |          |          
                                                                                                    |           --1.13%--rav1e::dist::rust::get_weighted_sse::{{closure}}::{{closure}}::{{closure}}::{{closure}} (inlined)
                                                                                                    |          
                                                                                                     --0.71%--<core::iter::adapters::zip::Zip<A,B> as core::iter::traits::iterator::Iterator>::next (inlined)
                                                                                                               <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next (inlined)

           4.23%        [.] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
            |
            ---rav1e::api::internal::ContextInner<T>::receive_packet
               rav1e::api::internal::ContextInner<T>::compute_block_importances (inlined)
               <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each (inlined)
               rav1e::api::internal::ContextInner<T>::compute_block_importances::{{closure}} (inlined)
               rav1e::api::internal::ContextInner<T>::update_block_importances (inlined)
               core::iter::traits::iterator::Iterator::for_each (inlined)
               <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold (inlined)
               <core::iter::adapters::fuse::Fuse<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::traits::iterator::Iterator::fold (inlined)
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
               core::iter::adapters::map::map_fold::{{closure}} (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               |          
                --4.22%--<core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
                          core::iter::traits::iterator::Iterator::fold (inlined)
                          |          
                           --4.16%--<core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
                                     |          
                                      --4.16%--core::iter::adapters::map::map_fold::{{closure}} (inlined)
                                                |          
                                                 --4.11%--rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}}::{{closure}} (inlined)
                                                           |          
                                                            --1.16%--<v_frame::plane::Plane<T> as rav1e::frame::plane::AsRegion<T>>::region (inlined)
                                                                      |          
                                                                       --0.76%--rav1e::tiling::plane_region::PlaneRegion<T>::new (inlined)
                                                                                 rav1e::tiling::plane_region::PlaneRegion<T>::from_slice (inlined)

           3.96%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct32
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               rav1e::asm::aarch64::transform::forward::daala_fdct32
               |          
                --3.27%--rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32 (inlined)
                          |          
                           --2.22%--rav1e::asm::aarch64::transform::forward::daala_fdst_iv_16_asym (inlined)
                                     |          
                                      --0.50%--rav1e::asm::aarch64::transform::forward::RotateKernel::half_kernel (inlined)

           3.52%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct64
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               rav1e::asm::aarch64::transform::forward::daala_fdct64
               |          
               |--1.94%--rav1e::asm::aarch64::transform::forward::daala_fdst_iv_32_asym (inlined)
               |          
                --0.59%--rav1e::asm::aarch64::transform::forward::daala_fdct64::butterfly_pair (inlined)

           3.49%        [.] rav1e::rdo::compute_distortion
            |          
             --3.46%--rav1e::rdo::rdo_mode_decision
                       |          
                        --3.37%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                  core::iter::traits::iterator::Iterator::try_fold (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                  rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                  rav1e::rdo::luma_chroma_mode_rdo
                                  rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                                  rav1e::rdo::compute_distortion
                                  |          
                                  |--2.01%--rav1e::rdo::cdef_dist_wxh (inlined)
                                  |          |          
                                  |          |--1.07%--rav1e::asm::aarch64::dist::cdef_dist::cdef_dist_kernel (inlined)
                                  |          |          |          
                                  |          |           --0.81%--rav1e::activity::apply_ssim_boost (inlined)
                                  |          |                     |          
                                  |          |                      --0.71%--rav1e::activity::ssim_boost_rsqrt (inlined)
                                  |          |          
                                  |           --0.55%--rav1e::rdo::compute_distortion::{{closure}} (inlined)
                                  |          
                                   --0.98%--rav1e::rdo::sse_wxh (inlined)
                                             |          
                                              --0.77%--rav1e::rdo::compute_distortion::{{closure}} (inlined)
                                                        |          
                                                         --0.54%--rav1e::rdo::distortion_scale (inlined)

           3.16%        [.] rav1e::asm::aarch64::transform::forward::forward_transform_neon
            |          
             --3.11%--rav1e::asm::aarch64::transform::forward::forward_transform_neon
                       |          
                       |--0.65%--<core::iter::adapters::zip::Zip<A,B> as core::iter::traits::iterator::Iterator>::next (inlined)
                       |          <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next (inlined)
                       |          
                        --0.51%--rav1e::asm::aarch64::transform::forward::transpose_8x8_neon (inlined)

           2.99%        [.] rav1e::encoder::encode_block_post_cdef
            |          
             --2.42%--rav1e::encoder::encode_partition_topdown
                       rav1e::rdo::rdo_partition_decision
                       |          
                       |--1.32%--rav1e::rdo::rdo_partition_simple (inlined)
                       |          rav1e::rdo::rdo_mode_decision
                       |          |          
                       |           --1.28%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                       |                     <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                       |                     core::iter::traits::iterator::Iterator::try_fold (inlined)
                       |                     <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                       |                     rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                       |                     rav1e::rdo::luma_chroma_mode_rdo
                       |                     rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                       |                     rav1e::encoder::encode_block_post_cdef
                       |          
                        --1.10%--rav1e::rdo::rdo_partition_none (inlined)
                                  rav1e::rdo::rdo_mode_decision
                                  |          
                                   --1.07%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                             <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                             core::iter::traits::iterator::Iterator::try_fold (inlined)
                                             <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                             rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                             rav1e::rdo::luma_chroma_mode_rdo
                                             rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                                             rav1e::encoder::encode_block_post_cdef

           1.92%        [.] rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
            |
            ---rav1e::encoder::encode_tx_block
               rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
               |          
               |--0.77%--rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeffs (inlined)
               |          
                --0.57%--rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeff_signs (inlined)

           1.91%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               |          
               |--1.29%--rav1e::asm::aarch64::transform::forward::daala_fdct16
               |          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16
               |          |          
               |          |--0.52%--rav1e::asm::aarch64::transform::forward::daala_fdct_ii_8_asym (inlined)
               |          |          
               |           --0.51%--rav1e::asm::aarch64::transform::forward::daala_fdst_iv_8_asym (inlined)
               |          
                --0.61%--rav1e::asm::aarch64::transform::forward::daala_fdct64
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32_asym (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16

           1.87%        [.] core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
            |
            ---rav1e::api::internal::ContextInner<T>::receive_packet
               rav1e::api::internal::ContextInner<T>::compute_block_importances (inlined)
               <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each (inlined)
               rav1e::api::internal::ContextInner<T>::compute_block_importances::{{closure}} (inlined)
               rav1e::api::internal::ContextInner<T>::update_block_importances (inlined)
               core::iter::traits::iterator::Iterator::for_each (inlined)
               <core::iter::adapters::flatten::FlatMap<I,U,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold (inlined)
               <core::iter::adapters::fuse::Fuse<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::traits::iterator::Iterator::fold (inlined)
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
               core::iter::adapters::map::map_fold::{{closure}} (inlined)
               core::iter::adapters::flatten::FlattenCompat<I,U>::iter_fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::fold::flatten::{{closure}} (inlined)
               <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold (inlined)
               core::iter::traits::iterator::Iterator::fold (inlined)
               <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}} (inlined)
               core::iter::adapters::map::map_fold::{{closure}} (inlined)
               core::ops::function::impls::<impl core::ops::function::FnMut<A> for &mut F>::call_mut
               |          
                --1.82%--core::iter::traits::iterator::Iterator::for_each::call::{{closure}} (inlined)
                          rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}} (inlined)
                          |          
                           --0.71%--rav1e::api::internal::ContextInner<T>::update_block_importances::{{closure}}::{{closure}} (inlined)

           1.82%        [.] <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
            |          
             --1.79%--rav1e::encoder::encode_tx_block
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeffs (inlined)
                       <rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol_with_update
                       |          
                       |--0.60%--<rav1e::ec::WriterBase<S> as rav1e::ec::Writer>::symbol (inlined)
                       |          
                        --0.56%--rav1e::context::cdf_context::CDFContextLog::push (inlined)
                                  rav1e::context::cdf_context::CDFContextLogPartition<_>::push (inlined)

           1.77%        [.] rav1e_cdef_dist_kernel_8x8_hbd_neon
            |          
             --1.75%--rav1e::rdo::rdo_mode_decision
                       |          
                        --1.70%--rav1e::rdo::inter_frame_rdo_mode_decision (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each (inlined)
                                  core::iter::traits::iterator::Iterator::try_fold (inlined)
                                  <core::iter::adapters::take::Take<I> as core::iter::traits::iterator::Iterator>::for_each::check::{{closure}} (inlined)
                                  rav1e::rdo::inter_frame_rdo_mode_decision::{{closure}} (inlined)
                                  rav1e::rdo::luma_chroma_mode_rdo
                                  rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
                                  rav1e::rdo::compute_distortion
                                  rav1e::rdo::cdef_dist_wxh (inlined)
                                  rav1e::asm::aarch64::dist::cdef_dist::cdef_dist_kernel (inlined)
                                  cdef_dist_kernel_8x8_hbd_neon (inlined)

           1.76%        [.] rav1e::quantize::QuantizationContext::quantize
            |          
             --1.74%--rav1e::encoder::encode_tx_block
                       rav1e::quantize::QuantizationContext::quantize
                       |          
                        --0.66%--core::iter::traits::iterator::Iterator::max (inlined)
                                  core::iter::traits::iterator::Iterator::max_by (inlined)
                                  core::iter::traits::iterator::Iterator::reduce (inlined)
                                  |          
                                   --0.66%--<core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold (inlined)
                                             core::iter::traits::iterator::Iterator::fold (inlined)

           1.26%        [.] rav1e::encoder::encode_tx_block
            |          
             --1.23%--rav1e::encoder::encode_tx_block
                       |          
                        --0.83%--rav1e::encoder::diff (inlined)

           1.19%        [.] rav1e::quantize::rust::dequantize
            |          
             --1.19%--rav1e::encoder::encode_tx_block
                       rav1e::quantize::rust::dequantize
                       |          
                        --0.94%--<core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next (inlined)
                                  <core::iter::adapters::zip::Zip<A,B> as core::iter::traits::iterator::Iterator>::next (inlined)
                                  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next (inlined)

           1.03%        [.] rav1e::cdef::cdef_filter_superblock
            |          
             --0.97%--rav1e::encoder::encode_frame
                       rav1e::encoder::encode_tile_group (inlined)
                       rav1e::encoder::FrameState<T>::apply_tile_state_mut (inlined)
                       rav1e::encoder::encode_tile_group::{{closure}} (inlined)
                       rav1e::cdef::cdef_filter_tile
                       rav1e::cdef::cdef_filter_superblock

           0.98%        [.] rav1e::lrf::rust::sgrproj_box_ab_r1
            |          
             --0.52%--rav1e::lrf::sgrproj_stripe_filter
                       rav1e::lrf::rust::sgrproj_box_ab_r1
                       |          
                        --0.51%--rav1e::lrf::rust::sgrproj_box_ab_internal (inlined)

           0.97%        [.] rav1e::rdo::luma_chroma_mode_rdo::{{closure}}
            |          
             --0.74%--rav1e::encoder::encode_partition_topdown
                       |          
                        --0.73%--rav1e::rdo::rdo_partition_decision
                                  |          
                                   --0.51%--rav1e::rdo::rdo_partition_simple (inlined)
                                             rav1e::rdo::rdo_mode_decision

           0.95%        [.] prep_neon
            |          
             --0.70%--rav1e::rdo::rdo_partition_decision

           0.93%        [.] rav1e::asm::aarch64::transform::forward::daala_fdct_ii_8
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               |          
                --0.78%--rav1e::asm::aarch64::transform::forward::daala_fdct32
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32 (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16_asym (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_8

           0.91%        [.] rav1e::me::get_fullpel_mv_rd
            |
            ---rav1e::me::estimate_motion
               |          
                --0.90%--rav1e::me::full_pixel_me (inlined)
                          |          
                           --0.86%--rav1e::me::full_pixel_me::{{closure}}

           0.79%        [.] rav1e::asm::aarch64::transform::forward::daala_fdst_iv_16
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               |          
                --0.77%--rav1e::asm::aarch64::transform::forward::daala_fdct64
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32_asym (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdst_iv_16

           0.75%        [.] rav1e::deblock::sse_size14
           0.73%        [.] rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_contexts
            |          
             --0.71%--rav1e::encoder::encode_tx_block
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeffs (inlined)
                       rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_contexts
                       |          
                        --0.68%--rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_ctx (inlined)
                                  |          
                                   --0.54%--rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_ctx_from_stats (inlined)

           0.70%        [.] rav1e::asm::aarch64::transform::forward::daala_fdst_iv_8
            |
            ---rav1e::asm::aarch64::transform::forward::forward_transform_neon
               |          
                --0.70%--rav1e::asm::aarch64::transform::forward::daala_fdct32
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_32 (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdct_ii_16_asym (inlined)
                          rav1e::asm::aarch64::transform::forward::daala_fdst_iv_8

           0.66%        [.] rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_mag
            |          
             --0.64%--rav1e::encoder::encode_tx_block
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::write_coeffs_lv_map
                       rav1e::context::block_unit::<impl rav1e::context::cdf_context::ContextWriter>::encode_coeffs (inlined)
                       rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_contexts
                       rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_map_ctx (inlined)
                       rav1e::context::transform_unit::<impl rav1e::context::cdf_context::ContextWriter>::get_nz_mag

           0.65%        [.] rav1e_avg_16bpc_neon
           0.56%        [.] rav1e::me::full_pixel_me::{{closure}}
            |
            ---rav1e::me::estimate_motion
               rav1e::me::full_pixel_me (inlined)
               rav1e::me::full_pixel_me::{{closure}}
