
Onnxruntime quantization script for MatMulNbits, what is the type after conversion?


In the onnxruntime documentation, for quantization here:

.html#quantize-to-int4uint4

It sets accuracy_level=4, which I took to mean a 4-bit quantization corresponding to int4/uint4.
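
For context, the call from that docs page looks roughly like the sketch below. The MatMul4BitsQuantizer class and its block_size / is_symmetric / accuracy_level arguments follow the docs example; the file paths are placeholders.

```python
# Sketch of the 4-bit weight-only quantization call from the onnxruntime docs.
# File paths are placeholders; argument names follow the docs example.
import onnx
from onnxruntime.quantization import matmul_4bits_quantizer

model_fp32_path = "model_fp32.onnx"  # placeholder input model
model_int4_path = "model_int4.onnx"  # placeholder output model

model = onnx.load(model_fp32_path)

quant = matmul_4bits_quantizer.MatMul4BitsQuantizer(
    model,
    block_size=32,       # quantize weights in blocks of 32 along K
    is_symmetric=True,   # symmetric quantization, no zero points
    accuracy_level=4,    # the attribute in question
)
quant.process()
quant.model.save_model_to_file(model_int4_path, use_external_data_format=True)
```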

However, in the MatMulNbits documentation, an accuracy_level of 4 means int8:

.md#attributes-35

And when using that script to apply quantization, the resulting MatMulNbits node has accuracy_level=4 and bits=4, yet the data type of the weight tensor is int8.
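
This is roughly how I'm checking the node attributes and the element type of the packed weight, using only the onnx package (a sketch; the registered op name is MatMulNBits, and node/initializer names depend on the model being quantized):

```python
# Sketch: inspect the MatMulNBits nodes produced by the quantizer and report
# their bits / accuracy_level attributes and the element type of the packed
# weight initializer (the second input, B). The model path is a placeholder.
import onnx
from onnx import TensorProto

model = onnx.load("model_int4.onnx")  # placeholder path
inits = {init.name: init for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type == "MatMulNBits":
        attrs = {a.name: onnx.helper.get_attribute_value(a) for a in node.attribute}
        print(node.name,
              "bits =", attrs.get("bits"),
              "accuracy_level =", attrs.get("accuracy_level"))
        b = inits.get(node.input[1])  # B: the quantized, packed weight tensor
        if b is not None:
            print("  B element type:", TensorProto.DataType.Name(b.data_type))
```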

So is this quantization converting weights to int4?
