python - Downsampling spectra with Pandas - Stack Overflow

I'm working with a CSV file representing the sampling of a radio signal detector. My columns represent time, each one being 0.25 seconds, for a total of 3,600 columns. The rows represent the frequency at which the signal was captured, for a total of 200. For the next phase, my managers asked me to reduce the sampling resolution so that each column is equivalent to 1 second.

For those with more experience with Pandas, is there a method to reduce the CSV in this way?

I found this way to use the .mean() function:

row_avg = df.mean(axis=1)

But this averages the complete row. Is there any way to use this, or any other method, to get the mean of every 4 columns in each row?

Asked Apr 1 at 4:40 by M1ctl4nt3cutl1; edited Apr 1 at 5:59 by snakecharmerb.

Comment (Panda Kim, Apr 1 at 4:44): read notice: minimal reproducible example

1 Answer


There are many ways of doing this, depending on the data you have. By far the simplest, however, is to coerce your data into a time series index (if it isn't one already) and use .resample("1s").mean()
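If the CSV has no timestamps at all, one way to coerce it is to attach a synthetic time axis to the columns. The following is only a sketch under assumptions: the random frame `raw` stands in for the real file, its integer column labels are assumed, and the names `raw` and `per_second` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the CSV from the question: 200 frequency rows x 3600 time columns.
# Plain integer column labels (0..3599) are an assumption about the file.
rng = np.random.default_rng(0)
raw = pd.DataFrame(rng.normal(size=(200, 3600)))

# Label column k with its time offset k * 0.25 s.
raw.columns = pd.timedelta_range(start="0s", periods=raw.shape[1], freq="250ms")

# resample() works along the index, so transpose, resample, then transpose back.
per_second = raw.T.resample("1s").mean().T
print(per_second.shape)
```

A TimedeltaIndex is used here because the question gives only a sample spacing, not a start time; pd.date_range would work the same way if the start time is known.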

First, generate some sample data:

>>> import numpy as np
>>> import pandas as pd
>>> count = 3600
>>> df = pd.DataFrame({f"freq {i}MHz": np.random.randn(count)*100 for i in range(1,201)}, index=pd.date_range("2025-04-01", periods=count, freq="250ms")).T
>>> df
             2025-04-01 00:00:00.000  2025-04-01 00:00:00.250  2025-04-01 00:00:00.500  ...  2025-04-01 00:14:59.250  2025-04-01 00:14:59.500  2025-04-01 00:14:59.750
freq 1MHz                  56.277885               -76.715028               -19.436522  ...                10.744845                70.650393                21.947893
freq 2MHz                 -12.898406                 1.598879                93.373505  ...               -40.860931               -60.643461                -2.759319
freq 3MHz                 -65.795367               -86.950974                -2.421849  ...              -103.419081                49.461729                82.793567
freq 4MHz                  83.350473               191.231577              -161.003622  ...              -132.219173               131.218199                25.113404
freq 5MHz                 142.520352               125.725121                -3.235449  ...               -10.971120               108.124412              -106.275392
...                              ...                      ...                      ...  ...                      ...                      ...                      ...
freq 196MHz                77.526046                24.296306               -32.023278  ...                32.674445                30.273234               -84.255384
freq 197MHz                69.552259               199.034193              -150.456317  ...               105.675402               -32.833817               -41.417296
freq 198MHz              -129.568614                43.051751               -10.824167  ...               -11.545223               135.946161                 9.608785
freq 199MHz               151.030454                -5.387144                71.144257  ...              -101.057261                68.122765              -130.901913
freq 200MHz                25.296514                80.701457                32.373565  ...               146.546715              -155.170539               -78.732363

[200 rows x 3600 columns]

Now just resample:

Note that .resample() works along the index, so the timestamps have to be in the rows rather than in the column labels. The axis=1 argument is deprecated ("Deprecated since version 2.0.0: Use frame.T.resample(…) instead."), so transpose with .T, resample, and then transpose back again:

>>> df.T.resample("1s").mean().T
             2025-04-01 00:00:00  2025-04-01 00:00:01  2025-04-01 00:00:02  2025-04-01 00:00:03  ...  2025-04-01 00:14:56  2025-04-01 00:14:57  2025-04-01 00:14:58  2025-04-01 00:14:59
freq 1MHz              15.117181           -31.716795            23.756633           -27.578701  ...           -19.287829            -6.923784             2.077375            32.955678
freq 2MHz              16.831645            56.396253            26.996447             0.327802  ...            63.255776            68.122855            29.455577            -7.643232
freq 3MHz             -55.707993            28.662803           -59.010300             6.659348  ...           -71.702346            47.037690           -27.076411            61.648608
freq 4MHz              11.924516           -79.776633           114.552432            62.607081  ...            82.446966             1.601780            30.801114             2.600828
freq 5MHz              61.761102             0.689386            35.821693           -13.006058  ...           -87.572613           -18.939267           -45.864923           -14.326446
...                          ...                  ...                  ...                  ...  ...                  ...                  ...                  ...                  ...
freq 196MHz            18.919907           -62.561788            15.818990           -26.336276  ...            11.755455           -42.537646           -79.619288           -28.461854
freq 197MHz            10.340424           -40.260711           -15.458829           -19.250989  ...            27.373582           -15.024511            -5.521782           -14.690702
freq 198MHz           -38.281829            44.141023            -3.625441           -16.368935  ...            36.209582            26.942521           -61.729879            10.357420
freq 199MHz            61.499773           -23.392824           -21.751043           -18.025937  ...            20.019479             9.421139           -11.937625           -14.725595
freq 200MHz            46.654067            16.341845            43.656015            45.139867  ...           -18.569374            31.027033             7.913154           -31.834015

[200 rows x 900 columns]

Inspecting a small section (the first 8 values of the first 2 rows) shows how it works: each group of four 0.25-second samples is averaged into one 1-second value.

>>> df_mini = df.iloc[0:2,0:8].T
>>> df_mini
                          freq 1MHz  freq 2MHz
2025-04-01 00:00:00.000   56.277885 -12.898406
2025-04-01 00:00:00.250  -76.715028   1.598879
2025-04-01 00:00:00.500  -19.436522  93.373505
2025-04-01 00:00:00.750  100.342389 -14.747396
2025-04-01 00:00:01.000   24.979359  70.526222
2025-04-01 00:00:01.250   15.195554  98.314063
2025-04-01 00:00:01.500 -209.241802  36.403973
2025-04-01 00:00:01.750   42.199710  20.340755
>>> df_mini.resample("1s").mean()
                     freq 1MHz  freq 2MHz
2025-04-01 00:00:00  15.117181  16.831645
2025-04-01 00:00:01 -31.716795  56.396253
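As one of the other ways mentioned at the start, the same block average can be computed without any time index by reshaping the values with NumPy. This is a different technique from .resample(), shown only as a sketch: the random frame and the names `factor` and `downsampled` are assumptions for illustration, and it requires the column count to be an exact multiple of 4.

```python
import numpy as np
import pandas as pd

# Random stand-in for the detector data: 200 frequencies x 3600 samples.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(200, 3600)),
    index=[f"freq {i}MHz" for i in range(1, 201)],
)

factor = 4  # four 0.25 s samples make up one 1 s bin
n_rows, n_cols = df.shape
if n_cols % factor != 0:
    raise ValueError("column count must be a multiple of the downsampling factor")

# Reshape to (rows, bins, factor) so consecutive columns share a bin,
# then average over the last axis.
values = df.to_numpy().reshape(n_rows, n_cols // factor, factor).mean(axis=2)
downsampled = pd.DataFrame(values, index=df.index)
print(downsampled.shape)
```

Unlike .resample(), this discards the column labels, so the result is labelled 0 to 899; assign new column labels afterwards if the time axis is needed.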