I am generating noise using a gaussian distribution, and have hit a conceptual block.
Is there a difference between generating a noise value and adding it to clean data:
def add_noise(data_frame, amplitude):
    noise = np.random.normal(0, scale=amplitude * 0.01, size=len(data_frame))
    return data_frame + noise
Or generating noise directly using the data you have:
def add_noise_alt(data_frame, amplitude):
    noise = np.random.normal(data_frame, scale=amplitude * 0.01)
    return noise
The plots returned are very similar, but conceptually they seem to be different things.
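For reference, a quick sanity check I ran comparing the two (the data here is made up, and the function bodies are the same as above):

```python
import numpy as np

def add_noise(data_frame, amplitude):
    noise = np.random.normal(0, scale=amplitude * 0.01, size=len(data_frame))
    return data_frame + noise

def add_noise_alt(data_frame, amplitude):
    return np.random.normal(data_frame, scale=amplitude * 0.01)

data = np.linspace(0, 10, 100_000)
noisy_a = add_noise(data, 5)      # amplitude 5 -> stdev 0.05
noisy_b = add_noise_alt(data, 5)

# The residual noise of both versions has roughly the same spread (~0.05)
print((noisy_a - data).std(), (noisy_b - data).std())
```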
asked Nov 19, 2024 at 17:43 by hiddenuser; edited Nov 19, 2024 at 17:46 by chrslg

2 Answers
Mathematically, it is the same. Each value is data_frame plus a centered noise with standard deviation amplitude × 0.01.
You can see it:
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1000000) % 100.0  # Just a way to have values between 0 and 100.
y = np.random.normal(x, 0.1)    # Note that stdev is small compared to x values
plt.hist(y - x, bins=50)
plt.show()
print((y - x).mean())  # 1.0911322496417843e-05 Small enough
print((y - x).std())   # 0.10018683917918221 close enough to 0.1
So clearly, y - x is just a normal distribution with mean 0 and stdev 0.1, exactly as if I had defined y = x + np.random.normal(0, 0.1, x.shape).
From a computational point of view, I would say that both are vectorized enough. (Whether it is np.random.normal that adds the numbers you passed to it, or numpy's + operator, the cost must be roughly the same.)
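If you want to check the cost yourself, here is a rough timing sketch (timings vary by machine, and the function names are my own):

```python
import numpy as np
import timeit

data = np.random.rand(1_000_000)

def add_after():
    # draw centered noise, then add it to the data with numpy's + operator
    return data + np.random.normal(0, 0.01, size=data.size)

def add_during():
    # let np.random.normal do the shift via the loc parameter
    return np.random.normal(data, 0.01)

t_after = timeit.timeit(add_after, number=20)
t_during = timeit.timeit(add_during, number=20)
print(f"add after generation: {t_after:.3f}s, add during generation: {t_during:.3f}s")
```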
Is there a difference between generating a noise value and adding it to clean data, or generating noise directly using the data you have?
No.
If you look at the way that NumPy uses the loc and scale parameters, it multiplies the random value by scale, then adds loc:
double random_normal(bitgen_t *bitgen_state, double loc, double scale) {
    return loc + scale * random_standard_normal(bitgen_state);
}
Adding the value during generation, or after generation is the same thing.
You can check this idea experimentally by re-seeding the random number generator to the same value multiple times to get the same sequence out.
import numpy as np
values = np.random.rand(100)
np.random.seed(42)
loc_generated = np.random.normal(values, size=100)
np.random.seed(42)
add_generated = np.random.normal(0, size=100) + values
print(np.allclose(add_generated, loc_generated))
prints
True
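The same trick extends to a per-element scale: since the C routine computes loc + scale * z one element at a time, passing arrays for both parameters should match the manual form exactly. A quick check (array values here are my own, chosen arbitrarily):

```python
import numpy as np

centers = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
stdevs = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

np.random.seed(42)
direct = np.random.normal(centers, stdevs)  # loc and scale both as arrays

np.random.seed(42)
manual = centers + stdevs * np.random.normal(0, 1, size=5)

print(np.allclose(direct, manual))
```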
Comments:

- x + f(x)*np.random.normal(0, 1) or np.random.normal(x, f(x)) doesn't seem fundamentally different (it is not like you would have to do a for loop in one case). Or am I missing something? (I mean, clearly, there are some noises for which it is more complicated. But here, we are talking Gaussian noise. Even if each center and each standard deviation is unique, available in 2 arrays, centers and stdev, np.random.normal(center, stdev) would work, and so would center + stdev*np.random.normal(0, 1). I would prefer the 1st (because, if numpy offers to do it for you, it is generally … – chrslg, Nov 19, 2024 at 18:03
- … * and + of the second form were done in pure python. – chrslg, Nov 19, 2024 at 18:04
- … np.random.normal([1e-30, 1e-30], [1e-30, 1e30]) is better than np.random.normal(0, 1, 2)*[1e-30, 1e30] + [1e-30, 1e30]. But I am not even sure of that – chrslg, Nov 19, 2024 at 18:10