I am trying to replicate some values in a dataset. The original data, which I am using to verify my code, has two categories (0 and 1) in each of several group columns, like so:
grp5
0 3941
1 459
grp6
0 4120
1 280
grp7
0 4300
1 100
The original code that created these categories was written in SAS as a straightforward if-then-else macro:
%macro E;
IF 4 <= n <= 5
THEN grp5 = 1;
ELSE IF 2 <= n <= 3
THEN grp6 = 1;
ELSE grp7 = 1;
%mend E;
My Python code should produce the same number of cases in each category for every group. However, the values I am getting differ from the original, and I'm not sure why. Below are my Python script and the values it produces.
# Initialize columns
df['grp5'] = 0
df['grp6'] = 0
df['grp7'] = 0
# Create boolean conditions (between() is inclusive on both ends by default)
cond5 = df['n'].between(4, 5)
cond6 = df['n'].between(2, 3)
# Apply conditions
df.loc[cond5, 'grp5'] = 1
df.loc[~cond5 & cond6, 'grp6'] = 1
df.loc[~cond5 & ~cond6, 'grp7'] = 1
grp5
0 3878
1 522
grp6
0 2437
1 1963
grp7
0 2485
1 1915
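One check that may help narrow this down: in the original output the 1s sum to 459 + 280 + 100 = 839, well below the 4,400 rows, whereas in the Python output they sum to 522 + 1963 + 1915 = 4,400. Because the final ELSE always fires, the logic as posted puts every row into exactly one group, so the original flags may have been set for only a subset of rows (this is an assumption; the place where the macro is called is not shown). Below is a minimal sketch on made-up data that confirms the "exactly one group per row" behaviour, including for missing and non-integer `n`. The toy values and the `flag_groups` helper are hypothetical and not part of the original script.

```python
import numpy as np
import pandas as pd


def flag_groups(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: same if / else-if / else chain as the script above."""
    out = df.copy()
    cond5 = out["n"].between(4, 5)  # inclusive on both ends, like 4 <= n <= 5
    cond6 = out["n"].between(2, 3)
    out["grp5"] = cond5.astype(int)
    out["grp6"] = (~cond5 & cond6).astype(int)
    out["grp7"] = (~cond5 & ~cond6).astype(int)  # the catch-all ELSE branch
    return out


# Made-up data, including a missing value and a non-integer value
toy = pd.DataFrame({"n": [1, 2, 3, 4, 5, 6, np.nan, 3.5]})
flagged = flag_groups(toy)

flags = flagged[["grp5", "grp6", "grp7"]]
rows_flagged = flags.sum(axis=1)          # number of groups each row falls into
total_ones = int(flags.to_numpy().sum())  # total count of 1s across all groups

print(flags.sum())
print("every row in exactly one group:", bool((rows_flagged == 1).all()))
print("total 1s:", total_ones, "rows:", len(flagged))
```

If the same check on the real data shows one flag per row, the gap between 839 and 4,400 would point to a filter or condition applied before the macro in the SAS program rather than to the between() logic itself.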