python - 0-dimensional array problems with `numpy.vectorize`

numpy.vectorize conveniently converts a scalar function into a vectorized function that can be applied directly to arrays. However, when a single value is passed to the vectorized function, the output is a 0-dimensional array instead of a value of the corresponding scalar type, which can cause errors when the result is used elsewhere due to typing issues. My question is: is there a mechanism in NumPy that resolves this problem by automatically converting the 0-dimensional array return value to the corresponding data type?

To illustrate, here is an example:

import numpy as np

@np.vectorize ( excluded = ( 1, 2 ) )
def rescale ( 
    value: float, 
    srcRange: tuple [ float, float ], 
    dstRange: tuple [ float, float ] = ( 0, 1 ), 
) -> float:
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = ( value - srcMin ) / ( srcMax - srcMin )
    return dstMin + t * ( dstMax - dstMin )

When calling the function above with rescale ( 5, ( 0, 10 ) ) the return value is numpy.array(0.5) instead of just the value 0.5.
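A minimal sketch of where the 0-d return value can cause trouble. The `json.dumps` call is only an illustrative consumer and is not part of the original code; `_rescale` is a hypothetical name for the undecorated function, and `np.vectorize` is applied in its function-call form rather than as a decorator:

```python
import json
import numpy as np

def _rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

# Function-call form of the decorator used above
rescale = np.vectorize(_rescale, excluded=(1, 2))

result = rescale(5, (0, 10))
print(type(result), result.shape)  # a 0-d ndarray with shape ()

# Illustrative consumer that only accepts plain Python values
try:
    json.dumps({"t": result})
    serializable = True
except TypeError:
    serializable = False  # ndarray objects are not JSON serializable

# After unwrapping with .item(), the value is a plain Python float
plain = result.item()
encoded = json.dumps({"t": plain})
print(encoded)
```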

Currently I work around this problem with a custom decorator:

def vectorize0dFix ( func ):
    def _func ( *args, **kwargs ):
        result = func ( *args, **kwargs )
        if isinstance ( result, np.ndarray ) and result.shape == ( ):
            return result.item ( )
        else:
            return result
    return _func
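For reference, a sketch of how this decorator would be combined with `np.vectorize`. The function-call form is used here instead of stacked decorators, and `_rescale` is a hypothetical name for the undecorated function:

```python
import numpy as np

def vectorize0dFix(func):
    def _func(*args, **kwargs):
        result = func(*args, **kwargs)
        # Unwrap only 0-d arrays; leave everything else untouched
        if isinstance(result, np.ndarray) and result.shape == ():
            return result.item()
        return result
    return _func

def _rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

# Vectorize first, then apply the 0-d fix on top
rescale = vectorize0dFix(np.vectorize(_rescale, excluded=(1, 2)))

scalar_result = rescale(5, (0, 10))
array_result = rescale([5, 10], (0, 10))
print(type(scalar_result))  # <class 'float'>
print(type(array_result))   # <class 'numpy.ndarray'>
```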

But if this problem does cause trouble, there should be a mechanism in NumPy that deals with it properly. I wonder whether there is one, or why there isn't.

Asked 2 days ago by F. X. P.

'why' questions are generally unanswerable. None of us are original developers, and few are current developers. So the best we can do is deduce reasons from patterns. – hpaulj Commented 2 days ago
If you try to include vectorize in production code (not just experimental things), you should try to find and understand its code. Currently the [source] link of its __call__ method docs is the most direct link. github/numpy/numpy/blob/v2.2.0/numpy/lib/… – hpaulj Commented 2 days ago

1 Answer

Short answer:

You can unwrap 0-d results into scalars while keeping n-d results (n>0) by indexing with an empty tuple ().
Better yet, I would try to avoid using @np.vectorize altogether – in general, but in particular with your given example where vectorization is not necessary.

Long answer:

As described in answers to related questions, indexing with an empty tuple () systematically unwraps 0-d arrays into scalars while leaving all other arrays as they are.

So, using the @np.vectorized function rescale() from your question, you can post-process your results accordingly, for example:

with_scalar_input = rescale(5, (0, 10))[()]
with_vector_input = rescale([5], (0, 10))[()]
print(type(with_scalar_input))  # <class 'numpy.float64'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>
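To see the indexing behavior in isolation, here is a small sketch on plain arrays, independent of `np.vectorize`. The variable names are only illustrative. It also shows why an unconditional `.item()` is not a drop-in alternative:

```python
import numpy as np

zero_d = np.array(0.5)        # 0-d array
one_d = np.array([0.5, 1.0])  # 1-d array

unwrapped = zero_d[()]  # 0-d array becomes a NumPy scalar
unchanged = one_d[()]   # n-d array (n > 0) stays an array

print(type(unwrapped))                   # <class 'numpy.float64'>
print(type(unchanged), unchanged.shape)  # <class 'numpy.ndarray'> (2,)

# .item() only works for arrays holding exactly one element
try:
    one_d.item()
    item_failed = False
except ValueError:
    item_failed = True
```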

I am not aware of any built-in NumPy mechanism that solves this edge case of @np.vectorize for you, so providing your own decorator is probably a viable way to go.

Custom scalar-unwrapping `@vectorize` decorator

Writing your own custom decorator that (a) accepts all arguments of and behaves exactly like @np.vectorize, but (b) appends the scalar unwrapping step, could look as follows:

from functools import wraps
import numpy as np

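def vectorize(*wa, **wkw):
    def decorator(f):
        # Build the vectorized function once instead of on every call
        vectorized = np.vectorize(f, *wa, **wkw)
        @wraps(f)
        def wrap(*fa, **fkw):
            # [()] unwraps 0-d results and leaves n-d results untouched
            return vectorized(*fa, **fkw)[()]
        return wrap
    return decorator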

@vectorize(excluded=(1, 2))
def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

with_scalar_input = rescale(5, (0, 10))
with_vector_input = rescale([5], (0, 10))
print(type(with_scalar_input))  # <class 'numpy.float64'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>

If you don't care about docstring propagation (which @functools.wraps takes care of), the @vectorize decorator can be shortened to:

import numpy as np

vectorize = lambda *wa, **wkw: lambda f: lambda *fa, **fkw: \
            np.vectorize(f, *wa, **wkw)(*fa, **fkw)[()]

@vectorize(excluded=(1, 2))
def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

with_scalar_input = rescale(5, (0, 10))
with_vector_input = rescale([5], (0, 10))
print(type(with_scalar_input))  # <class 'numpy.float64'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>

Caution: All approaches using [()], as proposed above, produce a new edge case: if the input is provided as a 0-d NumPy array, such as np.array(5), the result will also be unwrapped into a scalar. Likewise, you might have noticed that the scalar results are NumPy scalars, <class 'numpy.float64'>, rather than native Python scalars, <class 'float'>. If either of these is not acceptable for you, then more elaborate type checking or post-processing will be necessary.
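One possible shape of such post-processing, as a sketch only: it assumes that "scalar in, scalar out" should apply only when the input is not already a NumPy array, and that native Python scalars are wanted. The names `_rescale_scalar` and `_vectorized` are hypothetical:

```python
import numpy as np

def _rescale_scalar(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

_vectorized = np.vectorize(_rescale_scalar, excluded=(1, 2))

def rescale(value, srcRange, dstRange=(0, 1)):
    result = _vectorized(value, srcRange, dstRange)
    # Unwrap only if the caller passed a non-array scalar;
    # .item() yields a native Python scalar rather than np.float64
    if result.ndim == 0 and not isinstance(value, np.ndarray):
        return result.item()
    return result

from_scalar = rescale(5, (0, 10))          # native Python float
from_0d = rescale(np.array(5), (0, 10))    # stays a 0-d array
from_list = rescale([5], (0, 10))          # 1-d array
print(type(from_scalar), type(from_0d), from_list.shape)
```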

Try to avoid `@np.vectorize` altogether

As a final note: Maybe try to avoid using @np.vectorize altogether in the first place, and try to write your code such that it works both with NumPy arrays and scalars.

As to avoiding @np.vectorize: Its documentation states:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

As to adjusting your code accordingly: Your given function rescale() is a good example of code that works correctly with both NumPy arrays and scalars; in fact, it does so already, without any adjustments! You just have to ensure that vector-valued input is given as a NumPy array (rather than, say, a plain Python list or tuple):

import numpy as np

def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

with_scalar_input = rescale(5, (0, 10))
with_vector_input = rescale(np.asarray([5]), (0, 10))
print(type(with_scalar_input))  # <class 'float'>
print(type(with_vector_input))  # <class 'numpy.ndarray'>
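If callers may still pass plain lists or tuples, a thin wrapper can do the conversion for them. The following is a sketch under that assumption; `rescale_any` is a hypothetical name, and it leaves scalar inputs untouched so that they keep producing plain Python floats:

```python
import numpy as np

def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

def rescale_any(value, srcRange, dstRange=(0, 1)):
    # Coerce sequences to arrays; leave scalars as they are
    if np.ndim(value) > 0:
        value = np.asarray(value)
    return rescale(value, srcRange, dstRange)

# A plain list does not support the arithmetic used in rescale()
try:
    rescale([5], (0, 10))
    list_worked = True
except TypeError:
    list_worked = False

from_list = rescale_any([5, 10], (0, 10))
from_scalar = rescale_any(5, (0, 10))
print(type(from_list))    # <class 'numpy.ndarray'>
print(type(from_scalar))  # <class 'float'>
```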

Moreover, while producing exactly the same output for vector-type input¹, the @np.vectorized version is orders of magnitude slower:

import numpy as np
from timeit import Timer

def rescale(value, srcRange, dstRange=(0, 1)):
    srcMin, srcMax = srcRange
    dstMin, dstMax = dstRange
    t = (value - srcMin) / (srcMax - srcMin)
    return dstMin + t * (dstMax - dstMin)

vectorized = np.vectorize(rescale, excluded=(1, 2))

a = np.random.normal(size=10000)
assert (rescale(a, (0, 10)) == vectorized(a, (0, 10))).all()  # Same result?
print("Unvectorized:", Timer(lambda: rescale(a, (0, 10))).timeit(100))
print("Vectorized:", Timer(lambda: vectorized(a, (0, 10))).timeit(100))

On my machine, this produces about 0.003 seconds for the unvectorized version and about 0.8 seconds for the vectorized version.

In other words: we have more than a 250× speedup with the given, unvectorized function for a given 10,000-element array, while (if used carefully, i.e. by providing NumPy arrays rather than plain Python sequences for vector-type inputs) the function already produces scalar outputs for scalar inputs and vector outputs for vector inputs!

I guess the code above might not be the code that you are actually trying to vectorize; but anyway: in a lot of cases, a similar approach is possible.

¹ Again, the case of a 0-d array input is special here, but you might want to check that for yourself.
