The one remaining problem here is that the quantization gird is asymmetric, i.e. That's often a lot cheaper than suing a built-in rounding function. That's done here by adding 0.5 and then truncation or 'round towards -infinity'. 15) and then simply convert to integer using appropriate rounding methods. So basically you scale the floating point vector to the same range as your quantization levels (from -16. Convert to integer using 'truncation' (which is the default behavior for most languages).Calculate the ratio of quantization levels (32 in your case) to the value range (-4 to 4, so 8 in your case.The input to quantization is a vector of floating point numbers, the output are integer numbers so there needs to be a type conversion step.Īn efficient quantization algorithm is the following