Create an N-length list by uniformly (in frequency) selecting items from a separate list in Python

SETUP

I have a list days and a value N

days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52

WHAT I AM TRYING TO DO

I am trying to create a list selections of length N in which every value from days appears the same number of times, or as close to that as N allows (remainders are fine). I would then like the order of this list to be shuffled.

EXAMPLE OUTPUT

NOTE HOW THE ORDER IS SHUFFLED, BUT THE DISTRIBUTION OF VALUES IS UNIFORM

selections

['Wednesday','Friday','Monday',...'Tuesday','Thursday','Monday']

import collections
counter = collections.Counter(selections)
counter
Counter({'Monday': 11, 'Tuesday': 10, 'Wednesday': 11, 'Thursday': 10, 'Friday': 10})

WHAT I HAVE TRIED

I have code to randomly select N values from days

from random import choice, seed

seed(1)

days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52

selections = [choice(days) for x in range(N)]

But the resulting counts are far from equal:

import collections
counter = collections.Counter(selections)
counter

Counter({'Tuesday': 9,
         'Friday': 8,
         'Monday': 14,
         'Wednesday': 7,
         'Thursday': 14})

How can I adjust this code or what different method will create a list of length N with a uniform distribution of values from days in a random order?

EDIT: I seem to have phrased this question poorly. I am looking for a list of length N with a uniform distribution of values from days, but in a shuffled order (that is what I meant by "random"). So what I am looking for is how to take values from days evenly N times and then shuffle the resulting list. Again, I want an equal number of each value from days making up a list of length N. I need a uniform distribution for a list of exactly length 52, just as the example output shows.

Asked Mar 13 at 17:32 by bismo; edited Mar 13 at 19:34 by Timur Shtatland

1 What do you mean is not uniformly selected? If you want the same number of selections for each day, then it's not random. Random means that each day is selected with equal probability, but you will most likely not end up with each day selected the same number of times. If you put a very large value of N, then the percentage of time each day is selected will be roughly the same. N=52 is small enough that you will see fluctuations. It's even possible (although unlikely) that in one of the runs, Monday is selected 52 times. That's what random means – Cincinnatus Commented Mar 13 at 17:40
I meant random in order, per my mention of "randomly ordered, uniform selection". Sorry. I will edit OP for further clarification. – bismo Commented Mar 13 at 17:44
so, are you looking to randomly shuffle a list? – Cincinnatus Commented Mar 13 at 17:46
1 @bismo Are you looking for randomly shuffling a deterministic list of days of length N? I updated my answer for this use case. – Timur Shtatland Commented Mar 13 at 18:21
1 That is exactly what I am looking for. Apologies for the poor phrasing on my end. – bismo Commented Mar 13 at 18:23

5 Answers

The code you have is correct. You are seeing expected noise around the mean.

Note that for higher N, the relative noise decreases, as expected. For example, this is what you get for N = 10000000:

Counter({'Tuesday': 2000695, 'Thursday': 2000615, 'Wednesday': 2000096, 'Monday': 1999526, 'Friday': 1999068})
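To see the relative noise shrink for yourself, here is a minimal sketch. The helper name max_relative_deviation, the sample sizes, and the fixed seed are my own choices for illustration; random.choices is used instead of a choice loop only for speed.

```python
import random
from collections import Counter

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
rng = random.Random(1)  # fixed seed only so that a run is repeatable


def max_relative_deviation(n):
    """Largest |count - expected| / expected over all days for n independent draws."""
    counts = Counter(rng.choices(days, k=n))
    expected = n / len(days)
    # Counter returns 0 for a day that was never drawn
    return max(abs(counts[day] - expected) / expected for day in days)


# Hypothetical sample sizes, chosen only to span a few orders of magnitude
deviations = {n: max_relative_deviation(n) for n in (50, 5_000, 500_000)}
for n, dev in deviations.items():
    print(f"N={n:>7}: largest relative deviation = {dev:.4f}")
```

The absolute spread of the counts still grows with N, but only roughly like the square root of N, so it shrinks as a fraction of the expected count.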

If you need equal or approximately equal (deterministic, rather than random) numbers of each element in random order, try a combination of itertools.cycle, itertools.islice and random.shuffle like so:


import random
import collections
import itertools

random.seed(1)

days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52

# If `N` is not divisible by `len(days)`, the first `N % len(days)` days in
# the cycle get one extra occurrence. Shuffling `days` first makes it random
# which days those are:
random.shuffle(days)

selections = list(itertools.islice(itertools.cycle(days), N))
random.shuffle(selections)
print(selections)

counter = collections.Counter(selections)
print(counter)

Output:

['Friday', 'Friday', 'Wednesday', ...,  'Thursday']
Counter({'Tuesday': 11, 'Monday': 11, 'Friday': 10, 'Wednesday': 10, 'Thursday': 10})

According to the documentation for the random module:

For integers, there is uniform selection from a range. For sequences, there is uniform selection of a random element, a function to generate a random permutation of a list in-place, and a function for random sampling without replacement. [emphasis mine]

The differences you are seeing come down to randomness as you might imagine.

For demonstration, I've set up the same test using choice, uniform, and randint - you'll notice they all provide similar (random) results:

from collections import Counter
from random import choice, seed, uniform, randint

# seed(1)

days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52

selections = [choice(days) for _ in range(N)]  # what you're doing now
uni = [days[int(uniform(0, len(days)))] for _ in range(N)]
randi = [days[randint(0, len(days) - 1)] for _ in range(N)]

print(Counter(selections))
print(Counter(uni))
print(Counter(randi))

Output from a random sample:

Counter({'Tuesday': 16, 'Thursday': 13, 'Wednesday': 9, 'Monday': 7, 'Friday': 7})
Counter({'Friday': 14, 'Wednesday': 11, 'Monday': 10, 'Thursday': 9, 'Tuesday': 8})
Counter({'Friday': 15, 'Monday': 12, 'Wednesday': 10, 'Tuesday': 9, 'Thursday': 6})

You could build a list of days with a uniform distribution (not randomly) then just shuffle it.

Something like this:

import random
from collections import Counter
days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52
# populate lista with at least N values then truncate it to the required length
lista = (days * (N//len(days)+1))[:N]
# demonstrate uniformity
print(Counter(lista))
random.shuffle(lista)
print(lista)

If you want uniform distribution (which isn't random at all), then you really want to use random choices only for the remainder, which is N % len(days).

In your example, N is 52 and there are five days in the list, so that's ten occurrences of each day, leaving two remaining choices for random additional days (and you should ensure the same day isn't chosen twice.)

So, make a new list with N // len(days) copies of days, shuffle the list, then add N % len(days) additional random choices.
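The steps above can be sketched roughly as follows. The function name balanced_shuffle and its rng parameter are hypothetical, and the sketch makes one deliberate change: it adds the extra days before shuffling rather than after, so that the extras do not always sit at the end of the list.

```python
import random
from collections import Counter


def balanced_shuffle(items, n, rng=None):
    """Return n items in random order whose per-item counts differ by at most one."""
    rng = rng or random.Random()
    copies, remainder = divmod(n, len(items))
    # Every item appears `copies` times
    selections = list(items) * copies
    # sample() draws without replacement, so no item receives two extras
    selections += rng.sample(list(items), remainder)
    # Shuffle last so the extra items are mixed in with the rest
    rng.shuffle(selections)
    return selections


days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
selections = balanced_shuffle(days, 52)
print(Counter(selections))
```

With N = 52 and five days, three days end up with 10 occurrences and two with 11; which two is random on each run. The input list is copied, so days itself is left in its original order.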

The frequencies don't change, so a simpler solution would be to just assign frequencies to a randomly-shuffled list of keys:

#!/usr/bin/env python

import random

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
N = 52

random.shuffle(days)

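n_days = len(days)
# Deal the N slots out round-robin over the shuffled days, so the days at the
# front of the shuffled list receive the N % n_days leftover slots.
days_counter = {day: 0 for day in days}
for i in range(N):
    day = days[i % n_days]
    days_counter[day] += 1

assert sum(days_counter.values()) == N
print(days_counter)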

If you then need a uniform sample of days from these frequencies, you can use rejection sampling:

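# Continues from the block above: uses days, n_days, days_counter and N.
days_sample = []
while len(days_sample) < N:
    # Propose a day uniformly at random; reject it if its quota is used up.
    day = days[random.randint(0, n_days - 1)]
    if days_counter[day] > 0:
        days_counter[day] -= 1
        days_sample.append(day)

assert len(days_sample) == N
print(days_sample)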
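As a possible alternative to the rejection loop, and assuming Python 3.9 or later, random.sample accepts a counts argument that treats each element as repeated that many times. The sketch below is self-contained and builds its own frequencies dict; the names base, extra, lucky, and frequencies are mine, not from the code above.

```python
import random
from collections import Counter

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
N = 52

base, extra = divmod(N, len(days))
# Pick which distinct days receive one of the leftover slots
lucky = set(random.sample(days, extra))
frequencies = {day: base + (1 if day in lucky else 0) for day in days}

# Python 3.9+: counts= repeats each population element that many times.
# Asking for k equal to the total count returns every copy in random order.
days_sample = random.sample(
    list(frequencies), k=N, counts=list(frequencies.values())
)

print(Counter(days_sample))
```

Because k equals the sum of the counts, nothing is left out: the result has exactly the requested frequencies and no proposals are ever rejected.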
