[BUG]: TestPipeline.EngineFactories fails on arm64 #525

Closed
2 tasks done
dagardner-nv opened this issue Dec 20, 2024 · 3 comments · Fixed by #524
@dagardner-nv
Contributor

Version

25.02

Which installation method(s) does this occur on?

Docker

Describe the bug.

Observed in CI:

2024-12-20T18:30:16.0896967Z [ RUN      ] TestPipeline.EngineFactories
2024-12-20T18:30:16.0898474Z E20241220 18:30:14.152635 281473592543296 service.cpp:40] Must call Service::call_in_destructor to ensure service is cleaned up before being destroyed
2024-12-20T18:30:16.0900601Z unknown file: Failure
2024-12-20T18:30:16.0901830Z C++ exception with description "cpu_set must be a subset of the initial topology to create a fiber pool" thrown in the test body.
2024-12-20T18:30:16.0902949Z 
2024-12-20T18:30:16.0903255Z [  FAILED  ] TestPipeline.EngineFactories (92 ms)

Minimum reproducible example

Enable TestPipeline.EngineFactories

Relevant log output

Full env printout

Other/Misc.

No response

Code of Conduct

  • I agree to follow MRC's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
@dagardner-nv dagardner-nv added the bug Something isn't working label Dec 20, 2024
dagardner-nv added a commit to dagardner-nv/MRC that referenced this issue Dec 20, 2024
@ericevans-nv

ericevans-nv commented Jan 10, 2025

I ran into this issue as well.

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i9-14900K
CPU family: 6
Model: 183
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
Stepping: 1
CPU(s) scaling MHz: 28%
CPU max MHz: 6000.0000
CPU min MHz: 800.0000
BogoMIPS: 6374.40
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 896 KiB (24 instances)
L1i: 1.3 MiB (24 instances)
L2: 32 MiB (12 instances)
L3: 36 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Mitigation; Clear Register File
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Srbds: Not affected
Tsx async abort: Not affected

@willkill07
Contributor

Just wanted to add info -- on a GH200 system I see the following test output in verbose mode:

I20250114 23:08:59.645902 258971845740736 partitions.cpp:219] split cpuset: [cpu_set - count: 72; str: 0-71]
I20250114 23:08:59.645936 258971845740736 partitions.cpp:249] host_partition_id: 0 contains 72 logical cpus (0-71) with 470.9 GiB memory
I20250114 23:08:59.645955 258971845740736 partitions.cpp:279] evaluating engine factory cpu sets for host_partition 0-71
I20250114 23:08:59.645991 258971845740736 engine_factory_cpu_sets.cpp:61] hyper_threading [off]: [cpu_set - count: 72; str: 2,4-11,14,16-26,28-56,58-68,70,72-81]
I20250114 23:08:59.646014 258971845740736 engine_factory_cpu_sets.cpp:75] computing cpu_sets for engine factories
I20250114 23:08:59.646017 258971845740736 engine_factory_cpu_sets.cpp:76] - using dedicated_main_thread: FALSE
I20250114 23:08:59.646020 258971845740736 engine_factory_cpu_sets.cpp:77] - default engine type        : fiber

@willkill07
Contributor

Follow-up: using os_index from hwloc is problematic here, since OS-provided core indices are not guaranteed to be contiguous. logical_index appears to give the expected value across platforms (consistent on AMD and GH200).

diff --git a/cpp/mrc/src/internal/system/engine_factory_cpu_sets.cpp b/cpp/mrc/src/internal/system/engine_factory_cpu_sets.cpp
index 0405af38..15f4da93 100644
--- a/cpp/mrc/src/internal/system/engine_factory_cpu_sets.cpp
+++ b/cpp/mrc/src/internal/system/engine_factory_cpu_sets.cpp
@@ -51,12 +51,18 @@ EngineFactoryCpuSets generate_engine_factory_cpu_sets(const Topology& topology,

     if (options.engine_factories().ignore_hyper_threads())
     {
-        auto core_count = hwloc_get_nbobjs_inside_cpuset_by_type(topology.handle(), &cpu_set.bitmap(), HWLOC_OBJ_CORE);
-        for (int i = 0; i < core_count; i++)
+        struct hwloc_obj* core_obj{nullptr};
+        while (true)
         {
-            auto* core_obj =
-                hwloc_get_obj_inside_cpuset_by_type(topology.handle(), &cpu_set.bitmap(), HWLOC_OBJ_CORE, i);
-            pe_set.on(core_obj->os_index);
+            core_obj = hwloc_get_next_obj_inside_cpuset_by_type(topology.handle(),
+                                                                &cpu_set.bitmap(),
+                                                                HWLOC_OBJ_CORE,
+                                                                core_obj);
+            if (core_obj == nullptr)
+            {
+                break;
+            }
+            pe_set.on(core_obj->logical_index);
         }
         DVLOG(10) << "hyper_threading [off]: " << pe_set;
     }
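
To illustrate why the os_index-based loop can trip the "cpu_set must be a subset of the initial topology" check, here is a minimal sketch that does not use hwloc at all. The Core struct and the helpers make_cores, pe_set_from, cpu_range and is_subset are hypothetical names invented for this example. They only mimic the two hwloc_obj fields discussed above, assuming logical indices are assigned consecutively while OS indices may be sparse.

```cpp
#include <algorithm>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the two hwloc_obj fields discussed above.
struct Core
{
    unsigned os_index;       // index reported by the OS; may be sparse
    unsigned logical_index;  // consecutive index in topology order
};

// Build cores from inclusive OS-index ranges. Logical indices are assigned
// consecutively starting at 0 (an assumption of this sketch).
std::vector<Core> make_cores(const std::vector<std::pair<unsigned, unsigned>>& os_ranges)
{
    std::vector<Core> cores;
    unsigned logical = 0;
    for (const auto& range : os_ranges)
    {
        for (unsigned os = range.first; os <= range.second; ++os)
        {
            cores.push_back(Core{os, logical});
            ++logical;
        }
    }
    return cores;
}

// Collect one index per core, mirroring the pe_set.on(...) call in the diff.
std::set<unsigned> pe_set_from(const std::vector<Core>& cores, bool use_logical_index)
{
    std::set<unsigned> pe_set;
    for (const auto& core : cores)
    {
        pe_set.insert(use_logical_index ? core.logical_index : core.os_index);
    }
    return pe_set;
}

// Inclusive range of CPU ids, e.g. cpu_range(0, 71) for the "0-71" cpuset.
std::set<unsigned> cpu_range(unsigned first, unsigned last)
{
    std::set<unsigned> cpus;
    for (unsigned i = first; i <= last; ++i)
    {
        cpus.insert(i);
    }
    return cpus;
}

// True when every id in candidate is also present in topology.
bool is_subset(const std::set<unsigned>& candidate, const std::set<unsigned>& topology)
{
    return std::includes(topology.begin(), topology.end(), candidate.begin(), candidate.end());
}
```

Feeding make_cores the ranges from the GH200 log (2, 4-11, 14, 16-26, 28-56, 58-68, 70, 72-81) and checking against cpu_range(0, 71), the os_index-based set fails is_subset because ids 72-81 fall outside the host partition, while the logical_index-based set passes.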
