arm - Does the Pico 2 utilize SIMD instructions or just loop unrolling during arm_dot_prod_f32? - Stack Overflow

The title says almost everything. Does the RP2350 (Cortex-M33) on the Raspberry Pi Pico 2 use any SIMD instructions at all, via its DSP extension, during arm_dot_prod_f32, or does it simply unroll the loop? I know for certain that the RP2040 doesn't have SIMD, and that the ESP32-S3 uses its own cool dsps_dotprod_f32_ae32, but for the RP2350...

Asked Mar 28 at 18:06 by Gios Xou; edited Mar 28 at 18:28 by artless-noise-bye-due2AI
  • I don't believe the Cortex-M33 includes the instructions used by arm_dot_prod_f32. – Tim Roberts Commented Mar 28 at 18:32
  • f32x4_t vecA, vecB; are SIMD vector types and vfmaq() is the intrinsic for a 16-byte vector FMA. But that code is inside #if defined(ARM_MATH_NEON). You can use a disassembler to see whether it uses any d or q registers, or only s registers with scalar FP, which en.wikipedia./wiki/ARM_Cortex-M#Cortex-M33 says is the only hardware FPU option on Cortex-M33. Assuming Wikipedia is correct and the build system defines the appropriate macros for -mcpu=cortex-m33, it will have to use the scalar code paths, which at most unroll a loop. – Peter Cordes Commented Mar 28 at 18:32
  • Blog on Cortex-M DSP, PDF on Cortex-M DSP. The RP2350 'colophon' you cite says the device has DSP features. MAC (multiply accumulate) is useful for dot product and it is more than loop unrolling. It is not SIMD. (info for code writer, not library users). – artless-noise-bye-due2AI Commented Mar 28 at 18:35
  • It is also quite possible that CMSIS will translate vfmaq to a DSP instruction via includes (at least for some data types). – artless-noise-bye-due2AI Commented Mar 28 at 18:45

1 Answer

Today my Pico 2 was delivered. I downloaded this Arduino-IDE core and, by adding a few simple #error "messages" at arm_dot_prod_f32(...) in ~/Arduino/libraries/Arduino_CMSIS-DSP/src/arm_math.h, I found that by default it doesn't even compile the loop-unrolled code path.

Moreover

Even though __ARM_FEATURE_DSP is enabled via -march=armv8-m.main+fp+dsp and -mcpu=cortex-m33, as seen in boards.txt (and tested via #pragma message), compiling a sketch with:

  1. #define ARM_MATH_NEON results in incompatibility errors
  2. #define ARM_MATH_MVEF -> #error "MVE feature not supported"
  3. #define LOOPUNROLL ON or #define ARM_MATH_LOOPUNROLL does nothing

Therefore I either have to add -DLOOPUNROLL=ON to boards.txt or find out whether something else is supposed to make it work.
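As a rough illustration of the macro checks described above, here is a minimal C sketch that reports which feature macros are visible in the translation unit it is compiled in. The function and enum names (feature_mask, print_features, FEAT_*) are hypothetical and not part of CMSIS-DSP; the macros themselves are the ACLE ones (__ARM_FEATURE_DSP, __ARM_FEATURE_MVE, __ARM_NEON) plus ARM_MATH_LOOPUNROLL from the list above. One caveat, stated as an assumption rather than a diagnosis: a #define placed in a sketch is only seen by the sketch's own translation unit, so library sources compiled separately would not see it, which may be why item 3 appeared to do nothing.

```c
#include <stdio.h>

/* Bit flags for the build-time features reported below (hypothetical names). */
enum {
    FEAT_DSP        = 1 << 0, /* __ARM_FEATURE_DSP: integer DSP extension     */
    FEAT_MVE        = 1 << 1, /* __ARM_FEATURE_MVE: M-profile vector ext.     */
    FEAT_NEON       = 1 << 2, /* __ARM_NEON: Advanced SIMD                    */
    FEAT_LOOPUNROLL = 1 << 3  /* ARM_MATH_LOOPUNROLL: CMSIS-DSP unrolled path */
};

/* Returns a bitmask of the macros defined for THIS translation unit only. */
unsigned feature_mask(void)
{
    unsigned mask = 0u;
#if defined(__ARM_FEATURE_DSP)
    mask |= FEAT_DSP;
#endif
#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE > 0)
    mask |= FEAT_MVE;
#endif
#if defined(__ARM_NEON)
    mask |= FEAT_NEON;
#endif
#if defined(ARM_MATH_LOOPUNROLL)
    mask |= FEAT_LOOPUNROLL;
#endif
    return mask;
}

/* Prints one line per feature so the result can be read in a terminal. */
void print_features(void)
{
    unsigned m = feature_mask();
    printf("__ARM_FEATURE_DSP   : %s\n", (m & FEAT_DSP)        ? "defined" : "not defined");
    printf("__ARM_FEATURE_MVE   : %s\n", (m & FEAT_MVE)        ? "defined" : "not defined");
    printf("__ARM_NEON          : %s\n", (m & FEAT_NEON)       ? "defined" : "not defined");
    printf("ARM_MATH_LOOPUNROLL : %s\n", (m & FEAT_LOOPUNROLL) ? "defined" : "not defined");
}
```

In an Arduino sketch, feature_mask() could be called from setup() and the bits printed with Serial instead of printf. To learn what the library itself was built with, the same checks would have to live inside the library's sources.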

Results

I ran this poor example a few times, both with loop unrolling (by manually editing the source) and with the normal loop, using -Ofast and -O0 (optimization disabled):

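#include <arm_math.h>

void setup() {
  Serial.begin(9600);

  // Input vectors and the result of the most recent dot product
  float x[500];
  float y[500];
  float dest = 0.0f;

  // Fill the inputs with deterministic test data
  for (int i = 0; i < 500; ++i) {
    x[i] = (i + 1) / 1000.0;
    y[i] = i / 1100.0;
  }

  unsigned long startTime = micros();

  // Time one dot product for every block size i from 0 to 499
  for (int i = 0; i < 500; ++i)
    arm_dot_prod_f32(x, y, i, &dest);

  Serial.print(micros() - startTime);
  Serial.print(" microseconds | ");
  Serial.println(dest, 7);
}

// Arduino sketches must define loop(); there is nothing to repeat here
void loop() {}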

For -O0 (optimization disabled), the average results were:

Size (x, y, i)   Normal loop   Loop-unrolled   Difference
500              20190 μs      14656 μs        5534 μs
400              12985 μs       9457 μs        3528 μs
300               7336 μs       5365 μs        1971 μs
200               3289 μs       2441 μs         848 μs
100                861 μs        657 μs         204 μs
50                 220 μs        178 μs          42 μs
25                  64 μs         54 μs          10 μs
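To put the absolute differences in perspective, the -O0 timings above can be turned into a relative saving, (normal - unrolled) / normal. The short C sketch below does that arithmetic with the measured values copied from the results; the helper names time_saved_fraction and print_savings are made up for illustration. By this measure the saving is roughly 27% at size 500 and shrinks to about 16% at size 25.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Fraction of run time saved by unrolling: (normal - unrolled) / normal. */
double time_saved_fraction(double normal_us, double unrolled_us)
{
    assert(normal_us > 0.0); /* avoid dividing by zero */
    return (normal_us - unrolled_us) / normal_us;
}

/* Prints the relative saving for each measured size (values from the -O0 runs). */
void print_savings(void)
{
    static const struct { int size; double normal_us, unrolled_us; } rows[] = {
        {500, 20190.0, 14656.0}, {400, 12985.0, 9457.0},
        {300,  7336.0,  5365.0}, {200,  3289.0, 2441.0},
        {100,   861.0,   657.0}, { 50,   220.0,  178.0},
        { 25,    64.0,    54.0},
    };
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; ++i) {
        double saved = time_saved_fraction(rows[i].normal_us, rows[i].unrolled_us);
        printf("size %3d: %.1f%% less time with unrolling\n",
               rows[i].size, 100.0 * saved);
    }
}
```

These are single-board, unoptimized measurements, so the percentages describe this experiment only.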

(TODO: -Ofast)

Conclusion

Loop unrolling has an effect, but unfortunately it isn't compiled in by default, or at least I wasn't able to enable it without editing the source. Also, don't take my word for it; I might still be wrong, but I tried my best.
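For readers unfamiliar with the term, the following C sketch contrasts a plain scalar dot product with a version unrolled by four. It is an illustration of the technique only, not the CMSIS-DSP source: the names dot_plain and dot_unrolled4 are hypothetical, and the unroll factor of four is an assumption. Both versions use only scalar float arithmetic, so neither is SIMD; the unrolled one simply spends fewer iterations on loop bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Plain scalar dot product: one multiply-accumulate per loop iteration. */
float dot_plain(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Same arithmetic, unrolled by four: fewer branch and counter updates. */
float dot_unrolled4(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float sum = 0.0f;

    /* Main part: four products per iteration, still scalar operations. */
    size_t blocks = n / 4;
    while (blocks--) {
        sum += a[0] * b[0];
        sum += a[1] * b[1];
        sum += a[2] * b[2];
        sum += a[3] * b[3];
        a += 4;
        b += 4;
    }

    /* Tail: the remaining 0 to 3 elements, one at a time. */
    size_t tail = n % 4;
    while (tail--)
        sum += (*a++) * (*b++);

    return sum;
}
```

Because the products are accumulated in the same order, both functions should return the same value for the same inputs; only the amount of loop overhead differs.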
