sql - How do I get matched pairs of results (1 male and 1 female) with the same age at random?

I have a database/table with this structure:

Year	Age	Gender	OrderID
2012	18	M	4268
2021	75	M	7569
2015	56	F	5381
2018	29	M	2876
2014	33	F	3749

What I am trying to achieve is a smaller sample of 400 records (table rows) pulled at random, but I need 200 male and 200 female records. On top of this, each male record needs to be matched with a female record that has the same Age value, so I essentially end up with 200 pairs of results, each pair having a male and a female of the same Age.

I have already produced and tried the following code:

DROP TABLE IF EXISTS #SampleTableM
DROP TABLE IF EXISTS #SampleTableF

SELECT TOP (200) [Year],[Age],[Gender],[OrderID]
INTO #SampleTableM
  FROM [database.name]
  WHERE Age <=90 AND Gender = 'M'
  ORDER BY NEWID()

SELECT TOP (200) [Year],[Age],[Gender],[OrderID]
INTO #SampleTableF
  FROM [database.name]
  WHERE Age <=90 AND Gender = 'F'
  ORDER BY NEWID()

SELECT * FROM #SampleTableM
UNION
SELECT * FROM #SampleTableF;

However, this just gets me 200 random Male results and 200 random Female results without each result being matched to one of the opposite Gender with the same age.

asked Feb 5 at 12:37 by Arsinq; edited Feb 5 at 13:09 by Thom A

This question is similar to: Select n random rows from SQL Server table. If you believe it's different, please edit the question, make it clear how it's different and/or how the answers on that question are not helpful for your problem. – Filburt Commented Feb 5 at 12:40
The existing answers don't give me any idea on the matching pairs of results side of things. – Arsinq Commented Feb 5 at 12:46
Do you want to prioritize the age distribution somehow? Or is it more important that you have pairs? – siggemannen Commented Feb 5 at 13:07
The priority is the matching pairs, age distribution shouldn't matter too much. – Arsinq Commented Feb 5 at 14:10

1 Answer

Based on the comments, I first selected a random sample of 200 male rows and then matched female rows by age against those 200 sampled males. I am unsure how your output should look, as you did not share the expected output; you can adjust the output columns as required.

Here is an example:

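WITH male_sample AS (
    -- Random sample of up to 200 male rows; without ORDER BY NEWID(),
    -- TOP returns an arbitrary set rather than a random one
    SELECT TOP (200) [Year], [Age], [Gender], [OrderID]
    FROM test
    WHERE Age <= 90 AND Gender = 'M'
    ORDER BY NEWID()
),
female_sample AS (
    -- Join each sampled male to the female rows with the same age.
    -- A male can match several females, so rows are not strict 1:1 pairs.
    SELECT TOP (200) m.[Year] AS male_year, m.[Age] AS age, m.[Gender] AS male_gender, m.[OrderID] AS male_OrderID,
           f.[Year] AS female_year, f.[Gender] AS female_gender, f.[OrderID] AS female_OrderID
    FROM male_sample m
    INNER JOIN test f ON m.Age = f.Age
    WHERE f.Gender = 'F'
)

SELECT *
FROM female_sample;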

Fiddle

Male_Year	Age	Male_Gender	Male_OrderID	Female_Year	Female_Gender	Female_OrderID
2012	18	M	4268	2013	F	4269
2021	75	M	4269	2020	F	4270
2018	29	M	4271	2019	F	4272
2016	56	M	4273	2015	F	4270
2014	33	M	4274	2014	F	4272
2022	40	M	4001	2017	F	4002
2023	50	M	5001	2011	F	5002
2024	60	M	6001	2010	F	6002
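
Because the join above can match one male to several females of the same age, it does not by itself guarantee strict one-to-one pairs. A minimal sketch of one way to enforce that, assuming the same `test` table and columns as above: number the rows of each gender in random order within each age, then join on both the age and that number. The CTE names `males` and `females` and the column `rn` are hypothetical names chosen for illustration.

```sql
-- Sketch only: assumes the table is named test, as in the query above.
WITH males AS (
    -- Number male rows 1, 2, 3, ... in random order within each age
    SELECT [Year], [Age], [OrderID],
           ROW_NUMBER() OVER (PARTITION BY [Age] ORDER BY NEWID()) AS rn
    FROM test
    WHERE [Age] <= 90 AND [Gender] = 'M'
),
females AS (
    -- Same numbering for female rows
    SELECT [Year], [Age], [OrderID],
           ROW_NUMBER() OVER (PARTITION BY [Age] ORDER BY NEWID()) AS rn
    FROM test
    WHERE [Age] <= 90 AND [Gender] = 'F'
)
-- The n-th male of an age is paired with the n-th female of that age,
-- so every source row appears in at most one pair.
SELECT TOP (200)
       m.[Age],
       m.[Year] AS male_year,   m.[OrderID] AS male_OrderID,
       f.[Year] AS female_year, f.[OrderID] AS female_OrderID
FROM males m
INNER JOIN females f
    ON f.[Age] = m.[Age] AND f.rn = m.rn
ORDER BY NEWID();
```

If the table holds fewer than 200 possible pairs, this returns fewer than 200 rows. Ages with more rows of one gender than the other simply leave the surplus rows unpaired.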

As per the comment, we can get the matching pairs in consecutive rows by using UNION ALL and then ordering by age and gender. The ages may not be equally distributed across genders, but at least rows with the same age will appear together.

Note: visually this might still look uneven when there are more than two male and female rows for an age.

WITH male_sample AS (
    -- TOP without ORDER BY returns arbitrary rows, not a random sample.
    -- Adding ORDER BY NEWID() here is risky in this query: female_sample is
    -- referenced twice below, and SQL Server may evaluate it once per reference.
    SELECT TOP (200) [Year], [Age], [Gender], [OrderID]
    FROM test
    WHERE Age <= 90 AND Gender = 'M'
),
female_sample AS (
    -- Join each sampled male to the female rows with the same age
    SELECT TOP (200) m.[Year] AS male_year, m.[Age] AS age, m.[Gender] AS male_gender, m.[OrderID] AS male_OrderID,
           f.[Year] AS female_year, f.[Gender] AS female_gender, f.[OrderID] AS female_OrderID
    FROM male_sample m
    INNER JOIN test f ON m.Age = f.Age
    WHERE f.Gender = 'F'
),
combined_sample AS (
    -- Stack the male half and the female half of each match as separate rows
    SELECT [male_year] AS [Year], [Age], [male_gender] AS [Gender], [male_OrderID] AS [OrderID]
    FROM female_sample
    UNION ALL
    SELECT [female_year] AS [Year], [Age], [female_gender] AS [Gender], [female_OrderID] AS [OrderID]
    FROM female_sample
)
SELECT [Year], [Age], [Gender], [OrderID]
FROM combined_sample
ORDER BY [Age], [Gender];

Fiddle

Output

Year	Age	Gender	OrderID
2013	18	F	4269
2012	18	M	4268
2019	29	F	4272
2018	29	M	4271
2014	33	F	4272
2014	33	M	4274
2017	40	F	4002
2022	40	M	4001
2011	50	F	5002
2023	50	M	5001
2015	56	F	4270
2016	56	M	4273
2010	60	F	6002
2024	60	M	6001
2020	75	F	4270
2021	75	M	4269
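
When several rows share an age, ordering by age and gender alone does not show which male belongs to which female. A hedged sketch of one alternative, again assuming the `test` table from above: build strict one-to-one pairs with a per-age row number, give each pair a random number, and unpivot each pair into two rows with `CROSS APPLY (VALUES ...)`. The names `males`, `females`, `pairs`, `rn`, and `PairID` are hypothetical. The paired set is referenced only once, so the random numbering is not recomputed between the male and female halves.

```sql
-- Sketch only: assumes the table is named test; PairID is an added helper column.
WITH males AS (
    SELECT [Year], [Age], [OrderID],
           ROW_NUMBER() OVER (PARTITION BY [Age] ORDER BY NEWID()) AS rn
    FROM test
    WHERE [Age] <= 90 AND [Gender] = 'M'
),
females AS (
    SELECT [Year], [Age], [OrderID],
           ROW_NUMBER() OVER (PARTITION BY [Age] ORDER BY NEWID()) AS rn
    FROM test
    WHERE [Age] <= 90 AND [Gender] = 'F'
),
pairs AS (
    -- One row per male/female pair of the same age, numbered in random order
    SELECT m.[Age],
           m.[Year] AS male_year,   m.[OrderID] AS male_OrderID,
           f.[Year] AS female_year, f.[OrderID] AS female_OrderID,
           ROW_NUMBER() OVER (ORDER BY NEWID()) AS PairID
    FROM males m
    INNER JOIN females f
        ON f.[Age] = m.[Age] AND f.rn = m.rn
)
-- Keep 200 random pairs and emit each pair as two adjacent rows (F, then M)
SELECT p.PairID, v.[Year], p.[Age], v.[Gender], v.[OrderID]
FROM pairs p
CROSS APPLY (VALUES
    (p.male_year,   'M', p.male_OrderID),
    (p.female_year, 'F', p.female_OrderID)
) AS v ([Year], [Gender], [OrderID])
WHERE p.PairID <= 200
ORDER BY p.PairID, v.[Gender];
```

This yields up to 400 rows: two per `PairID`, both with the same Age. Dropping `PairID` from the select list gives the same four columns as the output above.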
