I'm working with a CSV file representing the sampling of a radio signal detector. My columns represent time, each one being 0.25 seconds, for a total of 3,600 columns. The rows represent the frequency at which the signal was captured, for a total of 200. For the next phase, my managers asked me to reduce the sampling resolution so that each column is equivalent to 1 second.
For those with more experience with Pandas, is there a method to reduce the CSV in this way?
I found this way to use the .mean() function:
row_avg = df.mean(axis=1)
But this averages the complete row. Is there any way to use this or any other method to get the mean of every 4 columns per row?
asked Apr 1 at 4:40 by M1ctl4nt3cutl1, edited Apr 1 at 5:59 by snakecharmerb
Comment (Panda Kim, Apr 1 at 4:44): read notice : minimal reproducible example
1 Answer
There are many ways of doing this, depending on the data you have. By far the simplest is to coerce your data into a time series index, if it isn't one already, and use .resample("1s").mean()
Generate some sample data:
>>> import numpy as np
>>> import pandas as pd
>>> count = 3600
>>> df = pd.DataFrame({f"freq {i}MHz": np.random.randn(count)*100 for i in range(1,201)}, index=pd.date_range("2025-04-01", periods=count, freq="250ms")).T
>>> df
2025-04-01 00:00:00.000 2025-04-01 00:00:00.250 2025-04-01 00:00:00.500 ... 2025-04-01 00:14:59.250 2025-04-01 00:14:59.500 2025-04-01 00:14:59.750
freq 1MHz 56.277885 -76.715028 -19.436522 ... 10.744845 70.650393 21.947893
freq 2MHz -12.898406 1.598879 93.373505 ... -40.860931 -60.643461 -2.759319
freq 3MHz -65.795367 -86.950974 -2.421849 ... -103.419081 49.461729 82.793567
freq 4MHz 83.350473 191.231577 -161.003622 ... -132.219173 131.218199 25.113404
freq 5MHz 142.520352 125.725121 -3.235449 ... -10.971120 108.124412 -106.275392
... ... ... ... ... ... ... ...
freq 196MHz 77.526046 24.296306 -32.023278 ... 32.674445 30.273234 -84.255384
freq 197MHz 69.552259 199.034193 -150.456317 ... 105.675402 -32.833817 -41.417296
freq 198MHz -129.568614 43.051751 -10.824167 ... -11.545223 135.946161 9.608785
freq 199MHz 151.030454 -5.387144 71.144257 ... -101.057261 68.122765 -130.901913
freq 200MHz 25.296514 80.701457 32.373565 ... 146.546715 -155.170539 -78.732363
[200 rows x 3600 columns]
Now just resample! Note that .resample() bins along the index, so the timestamps have to be the row labels and each frequency a column (the axis=1 argument is deprecated: "Deprecated since version 2.0.0: Use frame.T.resample(…) instead."). You can transpose with .T, resample, and then transpose back again:
>>> df.T.resample("1s").mean().T
2025-04-01 00:00:00 2025-04-01 00:00:01 2025-04-01 00:00:02 2025-04-01 00:00:03 ... 2025-04-01 00:14:56 2025-04-01 00:14:57 2025-04-01 00:14:58 2025-04-01 00:14:59
freq 1MHz 15.117181 -31.716795 23.756633 -27.578701 ... -19.287829 -6.923784 2.077375 32.955678
freq 2MHz 16.831645 56.396253 26.996447 0.327802 ... 63.255776 68.122855 29.455577 -7.643232
freq 3MHz -55.707993 28.662803 -59.010300 6.659348 ... -71.702346 47.037690 -27.076411 61.648608
freq 4MHz 11.924516 -79.776633 114.552432 62.607081 ... 82.446966 1.601780 30.801114 2.600828
freq 5MHz 61.761102 0.689386 35.821693 -13.006058 ... -87.572613 -18.939267 -45.864923 -14.326446
... ... ... ... ... ... ... ... ... ...
freq 196MHz 18.919907 -62.561788 15.818990 -26.336276 ... 11.755455 -42.537646 -79.619288 -28.461854
freq 197MHz 10.340424 -40.260711 -15.458829 -19.250989 ... 27.373582 -15.024511 -5.521782 -14.690702
freq 198MHz -38.281829 44.141023 -3.625441 -16.368935 ... 36.209582 26.942521 -61.729879 10.357420
freq 199MHz 61.499773 -23.392824 -21.751043 -18.025937 ... 20.019479 9.421139 -11.937625 -14.725595
freq 200MHz 46.654067 16.341845 43.656015 45.139867 ... -18.569374 31.027033 7.913154 -31.834015
[200 rows x 900 columns]
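The sample data above already carries timestamps as column labels. For a CSV whose headers are plain labels, the coercion step might look like the sketch below. The small `raw` frame stands in for the result of `pd.read_csv`, and the start date is arbitrary; both are assumptions for illustration, not details from the question.

```python
import numpy as np
import pandas as pd

# Stand-in for the frame read from the CSV: 3 frequency rows and
# 8 quarter-second samples, with plain integer column labels.
raw = pd.DataFrame(
    np.arange(24, dtype=float).reshape(3, 8),
    index=["freq 1MHz", "freq 2MHz", "freq 3MHz"],
)

# Assumption: column k was captured k * 0.25 s after an arbitrary start time.
raw.columns = pd.date_range("2025-04-01", periods=raw.shape[1], freq="250ms")

# resample() bins along the index, so transpose, resample, transpose back.
per_second = raw.T.resample("1s").mean().T
print(per_second.shape)  # (3, 2)
```

Only the 0.25-second spacing matters here; the resampled values do not depend on which start date is chosen.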
Inspecting a smaller section closely (8 values from the first 2 rows) shows the resample working on a smaller scale:
>>> df_mini = df.iloc[0:2,0:8].T
>>> df_mini
freq 1MHz freq 2MHz
2025-04-01 00:00:00.000 56.277885 -12.898406
2025-04-01 00:00:00.250 -76.715028 1.598879
2025-04-01 00:00:00.500 -19.436522 93.373505
2025-04-01 00:00:00.750 100.342389 -14.747396
2025-04-01 00:00:01.000 24.979359 70.526222
2025-04-01 00:00:01.250 15.195554 98.314063
2025-04-01 00:00:01.500 -209.241802 36.403973
2025-04-01 00:00:01.750 42.199710 20.340755
>>> df_mini.resample("1s").mean()
freq 1MHz freq 2MHz
2025-04-01 00:00:00 15.117181 16.831645
2025-04-01 00:00:01 -31.716795 56.396253
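As the mini example shows, each resampled value is just the mean of four consecutive samples. If building a time index is not convenient, the same averaging can be sketched positionally with a NumPy reshape. `block_mean` is a hypothetical helper written for this illustration, not a pandas function, and it assumes the column count is an exact multiple of the block width (3,600 / 4 = 900 in the question).

```python
import numpy as np
import pandas as pd

def block_mean(df: pd.DataFrame, width: int = 4) -> pd.DataFrame:
    """Average every `width` consecutive columns of each row (hypothetical helper)."""
    n_rows, n_cols = df.shape
    if n_cols % width:
        raise ValueError("column count must be a multiple of width")
    # View each row as (n_cols // width) blocks of `width` samples and
    # average inside each block.
    values = (
        df.to_numpy(dtype=float)
        .reshape(n_rows, n_cols // width, width)
        .mean(axis=2)
    )
    # Label each output column with the first column label of its block.
    return pd.DataFrame(values, index=df.index, columns=df.columns[::width])

df_small = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
     [0.0, 0.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0]],
    index=["freq 1MHz", "freq 2MHz"],
)
print(block_mean(df_small))
```

One difference to keep in mind: the NumPy mean propagates NaN, whereas the pandas mean skips missing values by default, so the two approaches only agree when the data has no gaps.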