最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

sql - How do I get matched pairs of results (1 male and 1 female) with the same age at random? - Stack Overflow

programmeradmin8浏览0评论

I have a database/table with this structure:

Year Age Gender OrderID
2012 18 M 4268
2021 75 M 7569
2015 56 F 5381
2018 29 M 2876
2014 33 F 3749

I have a database/table with this structure:

Year Age Gender OrderID
2012 18 M 4268
2021 75 M 7569
2015 56 F 5381
2018 29 M 2876
2014 33 F 3749

What I am trying to acheive is that I want 400 records/table rows pulled at random to form a smaller sample, but I need 200 male and 200 female records. On top of this I need each male record to give a female record with the same Age value, so I essentially end up with 200 pairs of results - each pair having a male and female of the same Age.

I have already produced and tried the following code:

DROP TABLE IF EXISTS #SampleTableM
DROP TABLE IF EXISTS #SampleTableF

SELECT TOP (200) [Year],[Age],[Gender],[OrderID]
INTO #SampleTableM
  FROM [database.name]
  WHERE Age <=90 AND Sex = 'M'
  ORDER BY NEWID()

SELECT TOP (200) [Year],[Age],[Gender],[OrderID]
INTO #SampleTableF
  FROM [database.name]
  WHERE Age <=90 AND Sex = 'F'
  ORDER BY NEWID()

SELECT * FROM #SampleTableM
UNION
SELECT * FROM #SampleTableF;

However, this just gets me 200 random Male results and 200 random Female results without each result being matched to one of the opposite Gender with the same age.

Share Improve this question edited Feb 5 at 13:09 Thom A 95.6k11 gold badges60 silver badges92 bronze badges asked Feb 5 at 12:37 ArsinqArsinq 134 bronze badges 4
  • 1 This question is similar to: Select n random rows from SQL Server table. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. – Filburt Commented Feb 5 at 12:40
  • The existing answers don't give me any idea on the matching pairs of results side of things. – Arsinq Commented Feb 5 at 12:46
  • Do you want to prioritize the age distribution somehow? Or is it better to prioritize that you have pairs – siggemannen Commented Feb 5 at 13:07
  • The priority is the matching pairs, age distribution shouldn't matter too much. – Arsinq Commented Feb 5 at 14:10
Add a comment  | 

1 Answer 1

Reset to default 1

Based on the comments, I first selected random 200 male sample and then matched the age for female gender corresponding to the collected 200 male samples.Unsure how your output should look like as you did not share the expected output, you can adjust the columns in output as required.

Here is an example

    WITH male_sample AS (
    SELECT TOP (200) [Year], [Age], [Gender], [OrderID]
    FROM test
    WHERE Age <= 90 AND Gender = 'M'
    ),
female_sample AS (
    SELECT TOP (200) m.[Year] AS male_year, m.[Age] AS age, m.[Gender] AS male_gender, m.[OrderID] AS male_OrderID,
           f.[Year] AS female_year, f.[Gender] AS female_gender, f.[OrderID] AS female_OrderID
    FROM male_sample m
    INNER JOIN test f ON m.Age = f.Age
    WHERE f.Gender = 'F'
)

SELECT *
FROM female_sample;

Fiddle

Male_Year Age Male_Gender Male_OrderID Female_Year Female_Gender Female_OrderID
2012 18 M 4268 2013 F 4269
2021 75 M 4269 2020 F 4270
2018 29 M 4271 2019 F 4272
2016 56 M 4273 2015 F 4270
2014 33 M 4274 2014 F 4272
2022 40 M 4001 2017 F 4002
2023 50 M 5001 2011 F 5002
2024 60 M 6001 2010 F 6002

As per comment we can get the matching pairs in subsequent rows using UNION ALL and then order by age, gender. There may or not be equal distribution of age for each gender but atleast similar age will appear together.

Note : Visually this might still look uneven for when the number of male and female are more than 2 for an age.

WITH male_sample AS (
    SELECT TOP (200) [Year], [Age], [Gender], [OrderID]
    FROM test
    WHERE Age <= 90 AND Gender = 'M'
    ),
female_sample AS (
    SELECT TOP (200) m.[Year] AS male_year, m.[Age] AS age, m.[Gender] AS male_gender, m.[OrderID] AS male_OrderID,
           f.[Year] AS female_year, f.[Gender] AS female_gender, f.[OrderID] AS female_OrderID
    FROM male_sample m
    INNER JOIN test f ON m.Age = f.Age
    WHERE f.Gender = 'F'
),
combined_sample AS (
    SELECT [male_year] AS [Year], [Age], [male_gender] AS [Gender], [male_OrderID] AS [OrderID]
    FROM female_sample
    UNION ALL
    SELECT [female_year] AS [Year], [Age], [female_gender] AS [Gender], [female_OrderID] AS [OrderID]
    FROM female_sample
)
SELECT [Year], [Age], [Gender], [OrderID]
FROM combined_sample
ORDER BY age, Gender  ;

Fiddle

Output

Year Age Gender OrderID
2013 18 F 4269
2012 18 M 4268
2019 29 F 4272
2018 29 M 4271
2014 33 F 4272
2014 33 M 4274
2017 40 F 4002
2022 40 M 4001
2011 50 F 5002
2023 50 M 5001
2015 56 F 4270
2016 56 M 4273
2010 60 F 6002
2024 60 M 6001
2020 75 F 4270
2021 75 M 4269
发布评论

评论列表(0)

  1. 暂无评论