SETUP
I have a list days and a value N:
days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52
WHAT I AM TRYING TO DO
I am trying to create a list selections of length N in which the values from days appear with uniform frequency (remainders are fine). I would then like the order of this list to be shuffled.
EXAMPLE OUTPUT
NOTE HOW THE ORDER IS SHUFFLED, BUT THE DISTRIBUTION OF VALUES IS UNIFORM
selections
['Wednesday','Friday','Monday',...'Tuesday','Thursday','Monday']
import collections
counter = collections.Counter(selections)
counter
Counter({'Monday': 11, 'Tuesday': 10, 'Wednesday': 11, 'Thursday': 10, 'Friday': 10})
WHAT I HAVE TRIED
I have code to randomly select N values from days:
from random import choice, seed
seed(1)
days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52
selections = [choice(days) for x in range(N)]
But the values aren't selected uniformly:
import collections
counter = collections.Counter(selections)
counter
Counter({'Tuesday': 9,
'Friday': 8,
'Monday': 14,
'Wednesday': 7,
'Thursday': 14})
How can I adjust this code, or what different method can I use, to create a list of length N with a uniform distribution of values from days in a random order?
EDIT: I seem to have phrased this question poorly. I am looking for a list of length N with a uniform distribution of values from days, but in a shuffled order (that is what I meant by "random"). So what I am looking for is how to sample values from days uniformly N times and then shuffle that list. Again, I want an equal number of each value from days making up a list of length N. I need a uniform distribution for a list of exactly length 52, just as the example output shows.
5 Answers
The code you have is correct. You are seeing expected noise around the mean.
Note that for higher N, the relative noise decreases, as expected. For example, this is what you get for N = 10000000:
Counter({'Tuesday': 2000695, 'Thursday': 2000615, 'Wednesday': 2000096, 'Monday': 1999526, 'Friday': 1999068})
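To put numbers on "expected noise": assuming each draw independently picks one of the five days with probability p = 1/5, the count for any one day is binomially distributed, with mean N*p and standard deviation sqrt(N*p*(1 - p)). The sketch below (the helper name count_noise is hypothetical) computes both, plus the relative noise, for the two values of N used here.

```python
import math

def count_noise(n, k):
    """Mean and standard deviation of one value's count when n draws are
    made uniformly at random from k values (binomial with p = 1/k)."""
    p = 1 / k
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    return mean, std

for n in (52, 10_000_000):
    mean, std = count_noise(n, 5)
    # Relative noise (std / mean) shrinks roughly like 1 / sqrt(n).
    print(f"N={n}: mean={mean:.1f}, std={std:.1f}, relative={std / mean:.4f}")
```

For N = 52 this gives a mean of 10.4 with a standard deviation of about 2.9, so counts like 7 and 14 in the question sit within roughly 1.2 standard deviations of the mean. For N = 10000000 the standard deviation is about 1265, against a mean of 2000000.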
If you need equal or approximately equal (deterministic, rather than random) numbers of each element in random order, try a combination of itertools.cycle, itertools.islice and random.shuffle like so:
import random
import collections
import itertools
random.seed(1)
days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52
# If `N` is not divisible by `len(days)`, this line ensures that the last
# `N % len(days)` elements of `selections` also stay random:
random.shuffle(days)
selections = list(itertools.islice(itertools.cycle(days), N))
random.shuffle(selections)
print(selections)
counter = collections.Counter(selections)
print(counter)
Output:
['Friday', 'Friday', 'Wednesday', ..., 'Thursday']
Counter({'Tuesday': 11, 'Monday': 11, 'Friday': 10, 'Wednesday': 10, 'Thursday': 10})
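The same cycle/islice/shuffle idea can be wrapped in a small function and checked for its key property: no two values' counts differ by more than one. The function name balanced_shuffle and the rng parameter are hypothetical, and unlike the code above, this sketch copies the input so the caller's days list is not reordered.

```python
import itertools
import random
from collections import Counter

def balanced_shuffle(values, n, rng=random):
    """Return a list of length n in which every element of `values` appears
    either n // len(values) or n // len(values) + 1 times, in random order."""
    pool = list(values)      # copy, so the caller's list is left untouched
    rng.shuffle(pool)        # randomize which values receive the remainder
    selections = list(itertools.islice(itertools.cycle(pool), n))
    rng.shuffle(selections)  # randomize the order of the final list
    return selections

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
selections = balanced_shuffle(days, 52, random.Random(1))
counts = Counter(selections)
print(counts)
# Counts differ by at most one.
assert max(counts.values()) - min(counts.values()) <= 1
```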
According to the random module documentation:
"For integers, there is uniform selection from a range. For sequences, there is uniform selection of a random element, a function to generate a random permutation of a list in-place, and a function for random sampling without replacement." [emphasis mine]
As you might imagine, the differences you are seeing come down to randomness.
For demonstration, I've set up the same test using choice, uniform, and randint; you'll notice they all provide similar (random) results:
from collections import Counter
from random import choice, seed, uniform, randint
# seed(1)
days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52
selections = [choice(days) for _ in range(N)] # what you're doing now
uni = [days[int(uniform(0, len(days)))] for _ in range(N)]
randi = [days[randint(0, len(days) - 1)] for _ in range(N)]
print(Counter(selections))
print(Counter(uni))
print(Counter(randi))
Output from a random sample:
Counter({'Tuesday': 16, 'Thursday': 13, 'Wednesday': 9, 'Monday': 7, 'Friday': 7})
Counter({'Friday': 14, 'Wednesday': 11, 'Monday': 10, 'Thursday': 9, 'Tuesday': 8})
Counter({'Friday': 15, 'Monday': 12, 'Wednesday': 10, 'Tuesday': 9, 'Thursday': 6})
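One way to see that all three approaches are uniform in expectation is to repeat the experiment many times and average the counts: each average should approach N / len(days) = 10.4. The sketch below assumes 2,000 trials and a seeded random.Random instance for repeatability; the helper name average_counts is hypothetical.

```python
import random
from collections import Counter

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
N = 52
TRIALS = 2000
rng = random.Random(0)  # seeded so the run is repeatable

# The three samplers from the answer above, written against `rng`.
samplers = {
    'choice': lambda: rng.choice(days),
    'uniform': lambda: days[int(rng.uniform(0, len(days)))],
    'randint': lambda: days[rng.randint(0, len(days) - 1)],
}

def average_counts(sampler):
    """Average per-day count produced by `sampler` over TRIALS lists of length N."""
    totals = Counter()
    for _ in range(TRIALS):
        totals.update(sampler() for _ in range(N))
    return {day: totals[day] / TRIALS for day in days}

averages = {name: average_counts(s) for name, s in samplers.items()}
for name, avg in averages.items():
    print(name, {day: round(v, 2) for day, v in avg.items()})
```

Individual lists still vary from run to run, as in the output above; only the averages settle near 10.4.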
You could build a list of days with a uniform distribution (not randomly) and then just shuffle it. Something like this:
import random
from collections import Counter
days = ['Monday','Tuesday','Wednesday','Thursday','Friday']
N = 52
# populate lista with at least N values then truncate it to the required length
lista = (days * (N//len(days)+1))[:N]
# demonstrate uniformity
print(Counter(lista))
random.shuffle(lista)
print(lista)
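random.shuffle reorders lista in place and returns None. If you would rather keep lista as it is and get a new shuffled list, the random module documentation suggests random.sample(x, k=len(x)) for that purpose. A minimal sketch:

```python
import random
from collections import Counter

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
N = 52

lista = (days * (N // len(days) + 1))[:N]      # balanced, but in cyclic order
shuffled = random.sample(lista, k=len(lista))  # new list; lista is unchanged

print(Counter(shuffled))
print(shuffled)
```

Note that the [:N] truncation always gives the N % len(days) extra entries to the first days in the list (here Monday and Tuesday); shuffling days before building lista randomizes which days get them.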
If you want a uniform distribution (which isn't random at all), then you really only want to use random choices for the remainder, which is N % len(days).
In your example, N is 52 and there are five days in the list, so that's ten occurrences of each day, leaving two remaining choices for random additional days (and you should ensure the same day isn't chosen twice).
So, make a new list with N // len(days) copies of days, shuffle the list, then add N % len(days) additional random choices.
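A minimal sketch of that recipe, with a hypothetical helper name. random.sample draws without replacement, which covers the "same day isn't chosen twice" requirement. One deliberate deviation: this sketch appends the extras before a single shuffle, so the extra days can land anywhere in the list rather than only at the end.

```python
import random
from collections import Counter

def copies_plus_remainder(days, n):
    """Hypothetical helper: n // len(days) copies of every day, plus
    n % len(days) distinct extra days chosen at random, in random order."""
    q, r = divmod(n, len(days))     # 52 and 5 -> q = 10, r = 2
    selections = days * q           # every day exactly q times
    # random.sample draws without replacement, so no day is picked twice
    selections.extend(random.sample(days, r))
    random.shuffle(selections)      # shuffle after adding the extras
    return selections

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
selections = copies_plus_remainder(days, 52)
print(Counter(selections))
```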
The frequencies don't change, so a simpler solution would be to just assign frequencies to a randomly shuffled list of keys:
#!/usr/bin/env python
import random
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
N = 52
# Shuffle first so the N % len(days) extra counts go to random days
random.shuffle(days)
n_days = len(days)
days_counter = {day: 0 for day in days}
# Deal N slots round-robin over the shuffled days
for i in range(N):
    day = days[i % n_days]
    days_counter[day] += 1
assert sum(days_counter.values()) == N
print(days_counter)
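The loop above gives every day N // n_days counts and the first N % n_days days of the shuffled list one extra, so the same frequencies can also be written without a loop using divmod. A minimal sketch that checks the two agree (the names loop_counter and closed_counter are illustrative):

```python
import random

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
N = 52
random.shuffle(days)
n_days = len(days)

# Loop version, as in the answer above.
loop_counter = {day: 0 for day in days}
for i in range(N):
    loop_counter[days[i % n_days]] += 1

# Closed form: every day gets q, and the first r shuffled days get one more.
q, r = divmod(N, n_days)
closed_counter = {day: q + (1 if i < r else 0) for i, day in enumerate(days)}

assert loop_counter == closed_counter
print(closed_counter)
```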
If you then need a uniform sample of days from these frequencies, you can use rejection sampling:
days_sample = []
while len(days_sample) < N:
    day_idx = random.randint(0, n_days - 1)
    day = days[day_idx]
    # Accept the draw only if this day still has copies left
    if days_counter[day] > 0:
        days_counter[day] -= 1
        days_sample.append(day)
assert len(days_sample) == N
print(days_sample)
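A caveat on the loop above: each accepted draw is uniform over the days that still have copies left, not weighted by how many copies remain, so its orderings are not all equally likely (a day with one copy left is accepted as often as a day with ten). If the goal is an ordering with the same distribution as random.shuffle applied to a list with these frequencies, one alternative (a weighted draw, not rejection sampling) picks each next day with probability proportional to its remaining count. A sketch under that assumption; the frequencies are rebuilt here because the loop above uses up days_counter:

```python
import random
from collections import Counter

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
N = 52
random.shuffle(days)
n_days = len(days)

# Frequencies as in the answer: the first N % n_days shuffled days get one extra.
days_counter = {day: 0 for day in days}
for i in range(N):
    days_counter[days[i % n_days]] += 1

days_sample = []
for _ in range(N):
    # Weight each day by how many copies of it are still unused.
    weights = [days_counter[day] for day in days]
    day = random.choices(days, weights=weights)[0]
    days_counter[day] -= 1
    days_sample.append(day)

assert len(days_sample) == N
print(Counter(days_sample))
print(days_sample)
```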
... days of length N? I updated my answer for this use case. – Timur Shtatland Commented Mar 13 at 18:21