pandas - How to display a random sample from a styled DataFrame?

I often want to view a random sample of k rows from a DataFrame rather than just the head/tail, for which I would use df.sample(frac=1.0).iloc[:k].

When I chain on .style to this sample, the styler will only see the k selected rows, and the resulting colour-mapping will be inaccurate as it only considers the sample.

How can I shuffle, sample, and style a DataFrame, whilst ensuring the styler uses all of the data?

Example

import pandas as pd
import numpy as np

#Data for testing
df = pd.DataFrame({
    'device_id': np.random.randint(200, 800, size=1000),
    'normalised_score': np.random.uniform(0, 2, size=1000),
    'severity_level': np.random.randint(-3, 4, size=1000),
})

#Inaccurate styling if I chain .style onto a sampled DataFrame:
df.sample(frac=1.0).iloc[:5].style.background_gradient(subset='severity_level', cmap='RdYlGn')

I am using a colourmap that roughly goes red-white-green over the range of severity_level (-3, -2, -1, 0, +1, +2, +3). A value of 0 should therefore display as white, but in the styled sample it gets coloured red.

The colouring should consider all severity_level values, even though I only display a few randomly-selected rows.

asked Jan 19 at 15:10 by MuhammedYunus


1 Answer


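You would need to pipe df into the styler first, so that the colour-mapping is computed from all of the data, and then chain on .hide to select a random subset of rows. .hide(df.sample(frac=1.0).index[k:]) shuffles the index and hides every row after the first k, leaving k randomly-selected rows visible.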

.hide doesn't take lambda functions, so you can't shuffle before .style and then access the shuffled DataFrame later in the chain.

#... data from OP
k = 5  #Number of rows to display

(
    df
    .style
    .background_gradient(subset='severity_level', cmap='RdYlGn')

    #Shuffle the index and hide every row after the first k, leaving k random rows visible
    .hide(df.sample(frac=1.0).index[k:])
)
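If this pattern is needed repeatedly, it could be wrapped in a small helper. The following is only a sketch: the function names labels_to_hide and style_then_sample and the seed parameter are hypothetical, not part of pandas. It assumes pandas 1.4 or later (where Styler.hide is available), a unique index (hiding by label would otherwise hide every row sharing that label), and matplotlib being installed for background_gradient.

```python
import numpy as np
import pandas as pd


def labels_to_hide(df: pd.DataFrame, k: int, seed=None) -> pd.Index:
    """Return the index labels of every row except k randomly chosen ones."""
    # Shuffle all rows, then keep the labels that come after the first k.
    shuffled = df.sample(frac=1.0, random_state=seed)
    return shuffled.index[k:]


def style_then_sample(df: pd.DataFrame, k: int, column: str,
                      cmap: str = "RdYlGn", seed=None):
    """Style the full DataFrame first, then hide all but k random rows."""
    return (
        df.style
        # The gradient is computed while the styler still holds every row.
        .background_gradient(subset=column, cmap=cmap)
        # Styler.hide acts on rows (axis=0) by default.
        .hide(labels_to_hide(df, k, seed))
    )
```

In a notebook, style_then_sample(df, k=5, column='severity_level') would then display five rows. Note that .hide does not reorder anything, so the visible rows appear in their original index order rather than in shuffled order.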

With the sample-then-style approach from the question, a value of 0 gets coloured red instead of white, because the styler only gets part of the data.
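To see why a partial view shifts the colours, it may help to look at the scaling step in isolation. The sketch below is a simplified model, assuming the gradient maps values linearly between the minimum and maximum of whatever data the styler sees (which matches background_gradient with its default low, high, vmin and vmax). The helper normalise is hypothetical and is not a pandas function.

```python
import numpy as np


def normalise(values, vmin=None, vmax=None):
    """Scale values linearly to [0, 1]; limits default to the data's own min/max."""
    values = np.asarray(values, dtype=float)
    lo = values.min() if vmin is None else vmin
    hi = values.max() if vmax is None else vmax
    return (values - lo) / (hi - lo)


sample = [0, 1, 3]  # a sample that happens to contain no negative values

# Limits taken from the sample: 0 is the minimum, so it lands at the red end.
print(normalise(sample)[0])         # 0.0

# Limits taken from the full range (-3 to 3): 0 lands in the middle (white).
print(normalise(sample, -3, 3)[0])  # 0.5
```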

With the style-then-hide approach, the styler uses all values of severity_level, irrespective of the sample displayed.
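A possible alternative, not part of the approach above, is to sample first and pass the full-data limits to background_gradient through its vmin and vmax parameters. This is a sketch under the same assumptions as before (matplotlib installed); the names gradient_limits and style_sample_with_limits and the seed parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def gradient_limits(df: pd.DataFrame, column: str):
    """Return (vmin, vmax) computed from the full column, not from a sample."""
    return df[column].min(), df[column].max()


def style_sample_with_limits(df: pd.DataFrame, k: int, column: str,
                             cmap: str = "RdYlGn", seed=None):
    """Sample k rows, then colour them against the full column's range."""
    vmin, vmax = gradient_limits(df, column)
    return (
        df.sample(n=k, random_state=seed)
        .style
        # vmin/vmax pin the colour scale to the full data's range.
        .background_gradient(subset=column, cmap=cmap, vmin=vmin, vmax=vmax)
    )
```

Unlike .hide, this keeps the rows in their sampled (shuffled) order, but each styling call that should reflect the full data needs its limits passed explicitly.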
