numpy.vectorize conveniently converts a scalar function into a vectorized function that can be applied directly to arrays. However, when a single value is passed to the vectorized function, the output is a 0-dimensional array instead of the corresponding scalar type, which can cause type-related errors when the result is used elsewhere. My question is: is there a mechanism in numpy that resolves this problem by automatically converting the 0-dimensional array return value to the corresponding scalar type?
To illustrate, here is an example:
import numpy as np

# Arguments 1 and 2 (the ranges) are excluded from vectorization
@np.vectorize ( excluded = ( 1, 2 ) )
def rescale (
    value: float,
    srcRange: tuple [ float, float ],
    dstRange: tuple [ float, float ] = ( 0, 1 ),
) -> float:
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = ( value - srcMin ) / ( srcMax - srcMin )
    return dstMin + t * ( dstMax - dstMin )
When calling the function above with rescale ( 5, ( 0, 10 ) ), the return value is numpy.array(0.5) instead of just the value 0.5.
Currently I resolve this problem by a self-defined decorator:
def vectorize0dFix ( func ):
    def _func ( *args, **kwargs ):
        result = func ( *args, **kwargs )
        # Unwrap 0-d arrays into native Python scalars; pass everything else through
        if isinstance ( result, np.ndarray ) and result.shape == ( ):
            return result.item ( )
        else:
            return result
    return _func
But if this problem does cause trouble, there should be a mechanism in numpy that deals with it properly. I wonder whether there is one, and if not, why.
1 Answer
Short answer:
- You can unwrap 0-d results into scalars while keeping n-d results (n>0) by indexing with an empty tuple ().
- Better yet, I would try to avoid using @np.vectorize altogether: in general, but in particular with your given example, where vectorization is not necessary.
Long answer:
Following these answers to related questions, by indexing with an empty tuple (), you can systematically unwrap 0-d arrays into scalars while keeping other arrays.
So, using the @np.vectorize'd function rescale() from your question, you can post-process your results accordingly, for example:
with_scalar_input = rescale(5, (0, 10))[()]
with_vector_input = rescale([5], (0, 10))[()]
print(type(with_scalar_input)) # <class 'numpy.float64'>
print(type(with_vector_input)) # <class 'numpy.ndarray'>
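To see the empty-tuple indexing in isolation, independent of np.vectorize, here is a minimal sketch. The variable names are made up for illustration; only np.array and ordinary indexing are used:

```python
import numpy as np

zero_d = np.array(0.5)         # 0-d array, shape ()
one_d = np.array([0.5, 1.5])   # 1-d array, shape (2,)

unwrapped = zero_d[()]         # 0-d array -> NumPy scalar
kept = one_d[()]               # n-d array (n > 0) stays an ndarray

print(type(unwrapped))         # <class 'numpy.float64'>
print(type(kept), kept.shape)  # <class 'numpy.ndarray'> (2,)
```

Because the same expression works for both cases, it can be appended unconditionally, without checking the number of dimensions first.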
I am not aware of any built-in NumPy mechanism that solves this edge case of @np.vectorize for you, so providing your own decorator is probably a viable way to go.
Custom scalar-unwrapping @vectorize decorator
A custom decorator that (a) accepts all arguments of @np.vectorize and otherwise behaves exactly like it, but (b) appends the scalar-unwrapping step, could look as follows:
from functools import wraps
import numpy as np

def vectorize(*wa, **wkw):
    def decorator(f):
        # Build the vectorized function once, at decoration time
        vectorized = np.vectorize(f, *wa, **wkw)
        @wraps(f)
        def wrap(*fa, **fkw):
            # [()] unwraps 0-d results and leaves n-d results (n>0) untouched
            return vectorized(*fa, **fkw)[()]
        return wrap
    return decorator

@vectorize(excluded=(1, 2))
def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

with_scalar_input = rescale(5, (0, 10))
with_vector_input = rescale([5], (0, 10))
print(type(with_scalar_input))  # <class 'numpy.float64'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>
If you don't care about docstring propagation (which @functools.wraps takes care of), the @vectorize decorator can be shortened to:
import numpy as np

vectorize = lambda *wa, **wkw: lambda f: lambda *fa, **fkw: \
    np.vectorize(f, *wa, **wkw)(*fa, **fkw)[()]

@vectorize(excluded=(1, 2))
def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

with_scalar_input = rescale(5, (0, 10))
with_vector_input = rescale([5], (0, 10))
print(type(with_scalar_input))  # <class 'numpy.float64'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>
Caution: All approaches using (), as proposed above, produce a new edge case: if the input is provided as a 0-d NumPy array, such as np.array(5), the result will also be unwrapped into a scalar. Likewise, you might have noticed that the scalar results are NumPy scalars, <class 'numpy.float64'>, rather than native Python scalars, <class 'float'>. If either of these is not acceptable for you, then more elaborate type checking or post-processing will be necessary.
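As a rough sketch of what such "more elaborate type checking" might look like: the decorator below (the name vectorize_scalar_aware is hypothetical, not part of NumPy) unwraps a 0-d result only if none of the arguments was itself an ndarray, and it uses .item() so that the unwrapped value is a native Python scalar. It is one possible policy under these assumptions, not the only reasonable one:

```python
from functools import wraps
import numpy as np

def vectorize_scalar_aware(*wa, **wkw):
    """Hypothetical helper: np.vectorize plus conditional scalar unwrapping."""
    def decorator(f):
        vectorized = np.vectorize(f, *wa, **wkw)

        @wraps(f)
        def wrap(*fa, **fkw):
            result = vectorized(*fa, **fkw)
            inputs = list(fa) + list(fkw.values())
            has_array_input = any(isinstance(x, np.ndarray) for x in inputs)
            # Unwrap only 0-d results that did not come from ndarray inputs
            if isinstance(result, np.ndarray) and result.ndim == 0 and not has_array_input:
                return result.item()  # native Python scalar
            return result
        return wrap
    return decorator

@vectorize_scalar_aware(excluded=(1, 2))
def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

scalar_result = rescale(5, (0, 10))            # Python float
zero_d_result = rescale(np.array(5), (0, 10))  # stays a 0-d ndarray
vector_result = rescale([5], (0, 10))          # 1-d ndarray

print(type(scalar_result), type(zero_d_result), type(vector_result))
```

Note that the check looks at all arguments, including the excluded ones; if you pass arrays as excluded arguments, you would need to restrict it to the vectorized positions.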
Try to avoid @np.vectorize altogether
As a final note: Maybe try to avoid using @np.vectorize altogether in the first place, and try to write your code such that it works both with NumPy arrays and scalars.
As to avoiding @np.vectorize: Its documentation states:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
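To make the "essentially a for loop" remark concrete, here is a deliberately simplified stand-in. It is not NumPy's actual implementation: the name loop_vectorize is hypothetical, and it handles a single positional argument only, with no broadcasting, excluded arguments, or otypes. It shows the general shape, and also why a scalar input naturally comes back as a 0-d array:

```python
import numpy as np

def loop_vectorize(func):
    """Hypothetical, simplified stand-in for np.vectorize (one argument only)."""
    def wrapper(values):
        arr = np.asarray(values)                         # a scalar becomes a 0-d array
        out = np.array([func(v) for v in arr.ravel()])   # plain Python loop over elements
        return out.reshape(arr.shape)                    # shape () for scalar input
    return wrapper

halve = loop_vectorize(lambda x: x / 2)

print(halve([2, 4]).shape)  # (2,)
print(halve(5).shape)       # ()
```

Every element goes through a Python-level function call, which is where the performance cost comes from.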
As to adjusting your code accordingly: Your given function rescale() is a good example of code that works correctly with both NumPy arrays and scalars; in fact, it does so already, without any adjustments! You just have to ensure that vector-valued input is given as a NumPy array (rather than, say, a plain Python list or tuple):
import numpy as np

def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

with_scalar_input = rescale(5, (0, 10))
with_vector_input = rescale(np.asarray([5]), (0, 10))
print(type(with_scalar_input))  # <class 'float'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>
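If you would rather not rely on callers passing NumPy arrays, one option is to do the conversion inside the function. The variant below is a sketch (the name rescale_any is made up for illustration); it trades the native Python float for a NumPy scalar in the scalar case, since np.asarray turns the scalar into a 0-d array and the arithmetic on it then yields a NumPy scalar:

```python
import numpy as np

def rescale_any(value, srcRange, dstRange=(0, 1)):
    # Accept scalars, lists, tuples, and arrays alike
    value = np.asarray(value, dtype=float)
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

from_list = rescale_any([0, 5, 10], (0, 10))  # ndarray with three elements
from_scalar = rescale_any(5, (0, 10))         # NumPy scalar, not a 0-d ndarray

print(from_list)
print(type(from_scalar))
```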
Moreover, while producing exactly the same output for vector-type input¹, the @np.vectorize'd version is orders of magnitude slower:
import numpy as np
from timeit import Timer

def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

vectorized = np.vectorize(rescale, excluded=(1, 2))
a = np.random.normal(size=10000)
assert (rescale(a, (0, 10)) == vectorized(a, (0, 10))).all()  # Same result?
print("Unvectorized:", Timer(lambda: rescale(a, (0, 10))).timeit(100))
print("Vectorized:", Timer(lambda: vectorized(a, (0, 10))).timeit(100))
On my machine, this produces about 0.003 seconds for the unvectorized version and about 0.8 seconds for the vectorized version.
In other words: we have more than a 250× speedup with the given, unvectorized function for a given 10,000-element array, while (if used carefully, i.e. by providing NumPy arrays rather than plain Python sequences for vector-type inputs) the function already produces scalar outputs for scalar inputs and vector outputs for vector inputs!
I guess the code above might not be the code that you are actually trying to vectorize; but anyway: in a lot of cases, a similar approach is possible.
¹) Again, the case of a 0-d array input is special here, but you might want to check that for yourself.
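A quick way to check the 0-d case is sketched below; the variable names are made up, and the function is the rescale() from the answer. As far as I can tell, the plain function yields a NumPy scalar for a 0-d array input, whereas the np.vectorize'd version yields a 0-d array, so the values agree but the types do not:

```python
import numpy as np

def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

vectorized = np.vectorize(rescale, excluded=(1, 2))

zero_d_input = np.array(5)                       # 0-d array
plain_out = rescale(zero_d_input, (0, 10))       # NumPy scalar
vect_out = vectorized(zero_d_input, (0, 10))     # 0-d ndarray

print(type(plain_out), type(vect_out))
print(plain_out == vect_out)                     # values are equal
```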
vectorize in production code (not just experimental things), you should try to find and understand its code. Currently the [source] link of its __call__ method docs is the most direct link. github/numpy/numpy/blob/v2.2.0/numpy/lib/… – hpaulj