In Java there is an API called the Vector API. It makes it possible to perform arithmetic operations on a whole float[] array in a single CPU cycle.
For example:
FloatVector fv = FloatVector.fromArray(FloatVector.SPECIES_PREFERRED, new float[]{1, 2, 3, 4, 5, 6, 7}, 0);
// multiplies the whole array by 2 in a single cycle (if the CPU supports this)
fv.mul(2f);
Now I would like to calculate the result of 1f / FloatVector, i.e. the reciprocal of every element. For now I do it with
fv.pow(-1f);
I assume this could be a slow operation. Is there a better way to do this?
asked Jan 20 at 12:21 by neoexpert, edited Jan 21 at 7:51

1 Answer
I got this code to run on an Intel-architecture Windows laptop (in jshell, so there is no class):
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;
// 256-bit species: 8 float lanes, matching the 8-element array below
VectorSpecies<Float> SPECIES = FloatVector.SPECIES_256;
// vector holding 1f in every lane
FloatVector ONE = FloatVector.zero(SPECIES).add(1f);
FloatVector fv = FloatVector.fromArray(SPECIES, new float[]{1, 2, 3, 4, 5, 6, 7, 8}, 0);
fv.pow(-1f);
ONE.div(fv); // gives the same result as the above pow operation
I did not do any performance measurements, as the results are probably platform dependent. Since you can define ONE as a constant, its construction and the addition do not have to be counted as time-consuming operations, so you could measure for yourself whether ONE.div(fv) performs better than fv.pow(-1f).
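To apply the ONE.div(fv) idea to an array of arbitrary length, something like the following minimal sketch might work. The class name ReciprocalSketch and the helper reciprocal are hypothetical names chosen for illustration, and the scalar loop for the leftover elements is just one possible way to handle arrays whose length is not a multiple of the lane count (FloatVector.fromArray throws if the species has more lanes than there are elements left in the array). It assumes a JDK that ships the incubating module and is started with --add-modules jdk.incubator.vector.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class ReciprocalSketch {
    // Assumption: use whatever species the current CPU prefers.
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
    // Constant vector with 1f in every lane, created once.
    static final FloatVector ONE = FloatVector.broadcast(SPECIES, 1f);

    // Hypothetical helper: returns a new array holding 1f / in[i] for every i.
    static float[] reciprocal(float[] in) {
        float[] out = new float[in.length];
        int i = 0;
        // Largest multiple of the lane count that fits into the array.
        int bound = SPECIES.loopBound(in.length);
        for (; i < bound; i += SPECIES.length()) {
            FloatVector v = FloatVector.fromArray(SPECIES, in, i);
            ONE.div(v).intoArray(out, i);
        }
        // Scalar loop for the elements that do not fill a whole vector.
        for (; i < in.length; i++) {
            out[i] = 1f / in[i];
        }
        return out;
    }

    public static void main(String[] args) {
        float[] in = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(java.util.Arrays.toString(reciprocal(in)));
    }
}
```

Swapping ONE.div(v) for v.pow(-1f) in the loop gives the variant from the question, so the same skeleton could serve as a starting point for a comparison on your own hardware.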
Comments:

... log2(x) and 2^x hiding inside, but some compilers special-case certain values that they know how to do in some more efficient way, like -1, -1/2, -1/3 for example. BTW, division is never single cycle on any conventional hardware; it is typically an order of magnitude slower than all of the other binary operations +, -, *. – Martin Brown, Jan 20 at 13:04

... invert (and even divInvert [value/vectorElement]) to Vector and to VectorOperators – user85421, Jan 20 at 14:31