
sql - Remove duplicate record using Unnest | Aws Athena - Stack Overflow


I am having trouble filtering data on an array. I have the columns userid, event_name, attributes, ti.

The attributes column has values like this:

{"bool_sample":true,"array_int":[10,20,25,38],"array_string":["hello","world","i am fine"]}

My query

SELECT * FROM "events-data" CROSS JOIN UNNEST(CAST(json_extract(attributes, '$["array_int"]') AS array<int>)) AS t(val)
WHERE val > 9; 

This query filters the data, but it gives me multiple rows for the same record. For the attributes value above, it gives me 4 rows:

userid,event_name,attributes,ti,val
test-userid,test-event,test-attributes,10
test-userid,test-event,test-attributes,20
test-userid,test-event,test-attributes,25
test-userid,test-event,test-attributes,38

I do not need multiple rows.

asked Jan 18 at 14:51 by Nishant Dixit, edited Jan 19 at 19:45 by Guru Stron
  • "I do not need multiple rows" - and what do you need? What is the desired output here? – Guru Stron Commented Jan 19 at 19:30

2 Answers


There is no need to CROSS JOIN with the unnested data. Use an EXISTS check instead.

Try

SELECT * FROM "events-data" 
WHERE exists (
    select val 
    from UNNEST(CAST(json_extract(attributes, '$.array_int') AS array(integer))) AS t(val)
    where val>9)
; 

That is what UNNEST does: it expands an array into multiple rows (one row per array element, see the docs).
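As a minimal sketch of that behavior, the query below unnests a literal array that stands in for the parsed array_int value (the literal and the alias t(val) are only for illustration):

```sql
-- A literal array stands in for the array parsed from the attributes column.
-- UNNEST produces one output row per element, so this yields four rows.
SELECT val
FROM UNNEST(ARRAY[10, 20, 25, 38]) AS t(val)
WHERE val > 9;
```

When this is combined with CROSS JOIN, each source row is paired with every element that passes the filter, which is why one record shows up four times in the question.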

If the goal is to fetch rows where array_int contains a value greater than 9, then you can use JSON functions, for example json_exists:

-- sample data
WITH dataset(userid,event_name,attributes) as (
    values ('test-userid', 'test-event', '{"bool_sample":true,"array_int":[10,20,25,38],"array_string":["hello","world","i am fine"]}')
)

-- query
select *
from dataset
where json_exists(attributes, 'lax $.array_int[*]?(@ > 9)');

Output:

userid event_name attributes
test-userid test-event {"bool_sample":true,"array_int":[10,20,25,38],"array_string":["hello","world","i am fine"]}

If you need the array itself with only the matching elements, then you can cast and filter:

-- query
select userid, event_name, vals
from (
    select *, filter(cast(json_extract(attributes, '$.array_int') as array(int)), v -> v > 9) vals
    from dataset)
where cardinality(vals) > 0;

Output:

userid event_name vals
test-userid test-event {10,20,25,38}
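If only a yes/no check on the cast array is needed, a possible variant of the cast-and-filter query uses any_match instead of filter plus cardinality. This is a sketch that assumes any_match is available in your Athena engine version (it is a Trino array function); the dataset CTE repeats the sample data from above so the query is self-contained:

```sql
-- sample data (same row as in the examples above)
WITH dataset(userid, event_name, attributes) AS (
    VALUES ('test-userid', 'test-event', '{"bool_sample":true,"array_int":[10,20,25,38],"array_string":["hello","world","i am fine"]}')
)

-- keep a row when at least one element of array_int is greater than 9;
-- each source row appears at most once because nothing is unnested
SELECT *
FROM dataset
WHERE any_match(
    CAST(json_extract(attributes, '$.array_int') AS array(integer)),
    v -> v > 9
);
```

Rows where array_int is missing evaluate to NULL in the WHERE clause and are dropped.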