python - Should we pre-calculate scalar calculations before we apply them to dataframe columns?

I'm curious whether option (b) is more efficient than option (a). At first glance, option (a) performs several times more operations than option (b). But when I ran some simulations on a df with a million rows, option (b) was only a fraction faster on average. Does that mean pandas automatically groups all the scalar operations in option (a)?

(a) The variables a, b, c, d, e, f are all scalars.

    df['val2'] = (a*b+c*d)*df['val1']*e/f

(b)

    x = (a*b+c*d)*e/f
    df['val2'] = df['val1']*x

asked Feb 14 at 18:43 by sguo

1 Answer

Yes, it is better to pre-compute x. pandas does not regroup the scalar operations for you: what matters is operator precedence and the order in which Python performs the operations.

Assuming s is your Series, when you run (a*b+c*d)*s*e/f you perform two multiplications and one division on the full Series, because * and / have the same precedence and are evaluated left to right. If you pre-compute x or write (a*b+c*d)*e/f*s, only one operation involves the Series.
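As a quick sanity check, here is a minimal sketch showing that the two groupings give the same values for the integer scalars used in this answer. The names `ungrouped`, `grouped` and `same` are only illustrative. With floating-point scalars the two results may differ in the last bits, since floating-point arithmetic is not associative, which is why the comparison uses `np.allclose` rather than exact equality.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1_000))
a = b = c = d = e = f = 2

# Left-to-right evaluation: three operations touch the whole Series.
ungrouped = (a * b + c * d) * s * e / f

# Scalars combined first: only one operation touches the Series.
x = (a * b + c * d) * e / f
grouped = s * x

same = np.allclose(ungrouped, grouped)
print(same)
```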

Example:

    %timeit x*s
    1.19 ms ± 73.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

    %timeit (a*b+c*d)*s*e/f
    3.45 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    %timeit s*(a*b+c*d)*e/f
    3.63 ms ± 84.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    # now let's force the scalar operation to be grouped
    %timeit s*((a*b+c*d)*e/f)
    1.21 ms ± 29.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

    %timeit (a*b+c*d)*e/f*s
    1.14 ms ± 80.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Setup:

    import numpy as np
    import pandas as pd

    s = pd.Series(np.arange(1_000_000))
    a = b = c = d = e = f = 2
    x = (a*b+c*d)*e/f
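%timeit is an IPython magic, so it only works in IPython or Jupyter. Outside of those, roughly the same comparison can be sketched with the standard library's timeit module. The `candidates` and `timings` dictionaries and the repeat counts below are my own choices for illustration; absolute numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(np.arange(1_000_000))
a = b = c = d = e = f = 2

# Each candidate is a zero-argument callable so timeit can call it directly.
candidates = {
    "ungrouped": lambda: (a * b + c * d) * s * e / f,
    "grouped": lambda: s * ((a * b + c * d) * e / f),
}

loops = 20
timings = {}
for label, fn in candidates.items():
    # Take the best of 5 repeats and convert to seconds per single call.
    best = min(timeit.repeat(fn, number=loops, repeat=5))
    timings[label] = best / loops
    print(f"{label}: {timings[label] * 1e3:.2f} ms per loop")
```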

In the initial (a*b+c*d)*df['val1']*e/f, the order of the operations is:

    a*b         # ab      # scalars
    c*d         # cd      # scalars
    ab + cd     # abcd    # scalars
    abcd * s    # sabcd   # Series
    sabcd * e   # esabcd  # Series
    esabcd / f  #         # Series
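The evaluation order above can be checked without pandas at all. The sketch below uses a hypothetical `Traced` class (not part of pandas) as a stand-in for the Series: it logs every operation it takes part in, so the length of the log is the number of full-Series operations each expression would trigger.

```python
class Traced:
    """Hypothetical stand-in for a Series that logs each operation applied to it."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def _record(self, left, op, right):
        expr = f"({left} {op} {right})"
        self.log.append(expr)
        return Traced(expr, self.log)

    def __mul__(self, other):       # Traced * scalar
        return self._record(self.name, "*", other)

    def __rmul__(self, other):      # scalar * Traced
        return self._record(other, "*", self.name)

    def __truediv__(self, other):   # Traced / scalar
        return self._record(self.name, "/", other)


a = b = c = d = e = f = 2

log_a = []
(a * b + c * d) * Traced("s", log_a) * e / f   # option (a)
print(len(log_a), log_a)

log_b = []
(a * b + c * d) * e / f * Traced("s", log_b)   # scalars grouped first
print(len(log_b), log_b)
```

The first log contains three entries and the second only one, which matches the roughly threefold difference in the timings.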
