I am not sure how to achieve this by a MySQL query:
I have a list of ads posted by multiple users and some users post multiple ads (e.g 10-20), when I load a list of ads (e.g sorted by ad_date DESC) I can run into a page full of duplicate spammy ads from same user.
Normally I can load them into a PHP array and loop over them to check user_id, user_email, user_ip, user_phone
then only print one ad matching that detail and skip the rest, but this solution causes problems when visitor clicks to go to page 2, when I don't know on what ad nubmer to start, because I need to track how many ads I skipped in previous page, so that solution is not working.
How can I do something like this ?
SELECT * from ads but just one by each user_id, one by each user_email, one by each user_phone ORDER BY ad_date DESC;
Another solution I see would be to load all ads at once and loop over them all, then remove duplicates and decide what ads to print on each page number, but there are around 200 000 records and PHP page says it uses around 200 MB to load all ads, so that aproach is not working eider.
Edit: I am providing sample input and output:
So the expected behaviour is not to list all unique emails or phones, but rather identify that an ad by same person was already printed once, identified by eider one of the fileds (email, phone, user ip address)
// I removed user_id here because that causes some confusion
sample input
| ad_id | user_email | user_phone | ad_description | ad_date | comment
|-------|-------------------|--------------|----------------|------------|---------
| 1 | | 123-456-7891 | ad_desc1 | 2023-01-01 |
| 2 | [email protected] | 123-456-7891 | ad_desc2 | 2023-01-02 | skip because PHONE listed before
| 3 | [email protected] | 123-456-7892 | ad_desc3 | 2023-01-03 | skip because EMAIL listed before
| 4 | [email protected] | 123-456-7892 | ad_desc4 | 2023-02-01 | skip because PHONE listed before
| 5 | [email protected] | | ad_desc5 | 2023-02-02 | skip because EMAIL listed before
| 6 | [email protected] | 123-456-7893 | ad_desc6 | 2023-03-01 |
| 7 | | 123-456-7892 | ad_desc7 | 2023-03-02 | skip because PHONE listed before
| 8 | [email protected] | 123-456-7895 | ad_desc8 | 2023-03-03 | skip because EMAIL listed before
| 9 | [email protected] | | ad_desc9 | 2023-03-02 | skip because EMAIL listed before
| 10 | | 123-456-7899 | ad_desc10 | 2023-03-02 |
expected output
| ad_id | user_email | user_phone | ad_description | ad_date |
|-------|-------------------|--------------|----------------|------------|
| 1 | | 123-456-7891 | ad_desc1 | 2023-01-01 |
| 6 | [email protected] | 123-456-7893 | ad_desc6 | 2023-03-01 |
| 10 | | 123-456-7899 | ad_desc10 | 2023-03-02 |
I am not sure how to achieve this by a MySQL query:
I have a list of ads posted by multiple users and some users post multiple ads (e.g 10-20), when I load a list of ads (e.g sorted by ad_date DESC) I can run into a page full of duplicate spammy ads from same user.
Normally I can load them into a PHP array and loop over them to check user_id, user_email, user_ip, user_phone
then only print one ad matching that detail and skip the rest, but this solution causes problems when visitor clicks to go to page 2, when I don't know on what ad nubmer to start, because I need to track how many ads I skipped in previous page, so that solution is not working.
How can I do something like this ?
SELECT * from ads but just one by each user_id, one by each user_email, one by each user_phone ORDER BY ad_date DESC;
Another solution I see would be to load all ads at once and loop over them all, then remove duplicates and decide what ads to print on each page number, but there are around 200 000 records and PHP page says it uses around 200 MB to load all ads, so that aproach is not working eider.
Edit: I am providing sample input and output:
So the expected behaviour is not to list all unique emails or phones, but rather identify that an ad by same person was already printed once, identified by eider one of the fileds (email, phone, user ip address)
// I removed user_id here because that causes some confusion
sample input
| ad_id | user_email | user_phone | ad_description | ad_date | comment
|-------|-------------------|--------------|----------------|------------|---------
| 1 | | 123-456-7891 | ad_desc1 | 2023-01-01 |
| 2 | [email protected] | 123-456-7891 | ad_desc2 | 2023-01-02 | skip because PHONE listed before
| 3 | [email protected] | 123-456-7892 | ad_desc3 | 2023-01-03 | skip because EMAIL listed before
| 4 | [email protected] | 123-456-7892 | ad_desc4 | 2023-02-01 | skip because PHONE listed before
| 5 | [email protected] | | ad_desc5 | 2023-02-02 | skip because EMAIL listed before
| 6 | [email protected] | 123-456-7893 | ad_desc6 | 2023-03-01 |
| 7 | | 123-456-7892 | ad_desc7 | 2023-03-02 | skip because PHONE listed before
| 8 | [email protected] | 123-456-7895 | ad_desc8 | 2023-03-03 | skip because EMAIL listed before
| 9 | [email protected] | | ad_desc9 | 2023-03-02 | skip because EMAIL listed before
| 10 | | 123-456-7899 | ad_desc10 | 2023-03-02 |
expected output
| ad_id | user_email | user_phone | ad_description | ad_date |
|-------|-------------------|--------------|----------------|------------|
| 1 | | 123-456-7891 | ad_desc1 | 2023-01-01 |
| 6 | [email protected] | 123-456-7893 | ad_desc6 | 2023-03-01 |
| 10 | | 123-456-7899 | ad_desc10 | 2023-03-02 |
Share
Improve this question
edited Jan 20 at 16:45
adrianTNT
asked Jan 20 at 0:16
adrianTNTadrianTNT
4,0825 gold badges35 silver badges40 bronze badges
5
|
1 Answer
Reset to default 1EDIT : Considered ad_description
and ad_id
as unique as per comment while user_id
, user_email
and user_phone
are not unique as per comment.
Simple way would be to what is mentioned in comments to use group by
and choose the max
or min
ad_date per user_id
,user_email
, user_phone
etc
SELECT user_id, user_email, user_phone , max(ad_date) latest_ad_date
from ads
group by user_id, user_email, user_phone ;
which for sample data
| user_id | user_email | user_phone | ad_date |
|---------|-------------------|--------------|------------|
| 1 | [email protected] | 123-456-7890 | 2023-01-01 |
| 1 | [email protected] | 123-456-7890 | 2023-02-01 |
| 1 | [email protected] | 123-456-7890 | 2023-03-01 |
| 2 | [email protected] | 123-456-7891 | 2023-01-15 |
| 2 | [email protected] | 123-456-7891 | 2023-02-15 |
| 3 | [email protected] | 123-456-7892 | 2023-01-20 |
| 3 | [email protected] | 123-456-7892 | 2023-02-20 |
| 3 | [email protected] | 123-456-7892 | 2023-03-20 |
generates
| user_id | user_email | user_phone | latest_ad_date |
|---------|-------------------|--------------|----------------|
| 1 | [email protected] | 123-456-7890 | 2023-03-01 |
| 2 | [email protected] | 123-456-7891 | 2023-02-15 |
| 3 | [email protected] | 123-456-7892 | 2023-03-20 |
If you already have parameter for user_id, you can use a filter(where
) and a limit
to use DESC like you mentioned in the question
select user_id, user_email, user_phone , ad_date from ads
where user_id = 1
order by ad_date desc
limit 1;
generates
| user_id | user_email | user_phone | ad_date |
|---------|-------------------|--------------|------------|
| 1 | [email protected] | 123-456-7890 | 2023-03-01 |
Another option is to rank each user based on user_id and order of ad_date
WITH rank_ads AS (
SELECT user_id, user_email, user_phone, ad_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ad_date DESC) AS rn
FROM ads
)
SELECT
user_id, user_email, user_phone, ad_date
FROM rank_ads
WHERE rn = 1
ORDER BY user_id ;
generates
| user_id | user_email | user_phone | ad_date |
|---------|-------------------|--------------|------------|
| 1 | [email protected] | 123-456-7890 | 2023-03-01 |
| 2 | [email protected] | 123-456-7891 | 2023-02-15 |
| 3 | [email protected] | 123-456-7892 | 2023-03-20 |
EDIT as per comment
Basis my understanding, I created few sample data, note ad_desc and ad_id are unique while user_id, user_email, user_phone are not unique.
| ad_id | user_id | user_email | user_phone | ad_description | ad_date |
|-------|---------|-------------------|--------------|----------------|------------|
| 1 | 1 | [email protected] | 123-456-7890 | ad_desc1 | 2023-01-01 |
| 2 | 1 | [email protected] | 123-456-7890 | ad_desc2 | 2023-01-02 |
| 3 | 1 | [email protected] | 123-456-7890 | ad_desc3 | 2023-01-03 |
| 4 | 2 | [email protected] | 123-456-7891 | ad_desc4 | 2023-02-01 |
| 5 | 2 | [email protected] | 123-456-7891 | ad_desc5 | 2023-02-02 |
| 6 | 3 | [email protected] | 123-456-7892 | ad_desc6 | 2023-03-01 |
| 7 | 3 | [email protected] | 123-456-7892 | ad_desc7 | 2023-03-02 |
| 8 | 3 | [email protected] | 123-456-7892 | ad_desc8 | 2023-03-03 |
In this case if you cant directly use group by because all the rows for each unique combination(user_id+user_email+user_phone+user_description) will be returned.
SELECT ad_id,user_id, user_email, user_phone ,ad_description , max(ad_date) latest_ad_date
from ads
group by ad_id, user_id, user_email, user_phone,ad_description ;
generates
| ad_id | user_id | user_email | user_phone | ad_description | latest_ad_date |
|-------|---------|-------------------|--------------|----------------|----------------|
| 1 | 1 | [email protected] | 123-456-7890 | ad_desc1 | 2023-01-01 |
| 2 | 1 | [email protected] | 123-456-7890 | ad_desc2 | 2023-01-02 |
| 3 | 1 | [email protected] | 123-456-7890 | ad_desc3 | 2023-01-03 |
| 4 | 2 | [email protected] | 123-456-7891 | ad_desc4 | 2023-02-01 |
| 5 | 2 | [email protected] | 123-456-7891 | ad_desc5 | 2023-02-02 |
| 6 | 3 | [email protected] | 123-456-7892 | ad_desc6 | 2023-03-01 |
| 7 | 3 | [email protected] | 123-456-7892 | ad_desc7 | 2023-03-02 |
| 8 | 3 | [email protected] | 123-456-7892 | ad_desc8 | 2023-03-03 |
I guess this is not what you want, you can use window function for this
WITH rank_ads AS (
SELECT ad_id, user_id, user_email, user_phone, ad_description , ad_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ad_date DESC) AS rn
FROM ads
)
SELECT
ad_id , user_id, user_email, user_phone, ad_description ,ad_date
FROM rank_ads
WHERE rn = 1
ORDER BY user_id ;
generates
| ad_id | user_id | user_email | user_phone | ad_description | ad_date |
|-------|---------|-------------------|--------------|----------------|------------|
| 3 | 1 | [email protected] | 123-456-7890 | ad_desc3 | 2023-01-03 |
| 5 | 2 | [email protected] | 123-456-7891 | ad_desc5 | 2023-02-02 |
| 8 | 3 | [email protected] | 123-456-7892 | ad_desc8 | 2023-03-03 |
5.7.44
one by ... not sure how to explain correctly, regardless of the field, I don't want to print another record if one was printed above (with eider same email, phone, etc), some user have same email+phone, some have no email but same phone. I found some queries to select individual ones, but after I filter these duplicates I actually need to load full row data (ad_id, ad_descrption), not just these problematic fields. – adrianTNT Commented Jan 20 at 11:39