I've been trying for quite a while to generate a query that will give results like this:
*Dimension: Date, Item ID, Session source/medium
Metric: Items viewed, Items purchased, Item revenue*
Query I got:
SELECT
DATE,
item_id,
CONCAT(COALESCE(session_source, '(direct)'),' / ',COALESCE(session_medium, '(none)')) AS session_source_medium,
SUM(items_viewed) AS items_viewed,
SUM(items_purchased) AS items_purchased,
SUM(item_revenue) AS item_revenue
FROM
(
SELECT
event_date AS DATE,
items.item_id AS item_id,
CONCAT( user_pseudo_id, (SELECT value.int_value FROM UNNEST (event_params) WHERE key = 'ga_session_id' ) ) AS ga_session_id,
ARRAY_AGG(( CASE WHEN collected_traffic_source.gclid IS NOT NULL THEN 'google' ELSE collected_traffic_source.manual_source END ) IGNORE NULLS ORDER BY event_timestamp ASC LIMIT 1 ) [SAFE_OFFSET(0)] AS session_source,
ARRAY_AGG((CASE WHEN collected_traffic_source.gclid IS NOT NULL THEN 'cpc' ELSE collected_traffic_source.manual_medium END ) IGNORE NULLS ORDER BY event_timestamp ASC LIMIT 1 ) [SAFE_OFFSET(0)] AS session_medium,
SUM( CASE WHEN event_name = 'view_item' THEN items.quantity END ) AS items_viewed,
SUM( CASE WHEN event_name = 'purchase' THEN items.quantity END ) AS items_purchased,
SUM(items.item_revenue) AS item_revenue
FROM
`maps-098153.analytics_4783657881.events_20250222`,UNNEST (items) AS items
GROUP BY
DATE,
item_id,
ga_session_id
)
GROUP BY
DATE,
item_id,
session_source_medium
I compare these results with GA panel data and Looker Studio and I don't think that BigQuery data is accurate as it seems that most of the data is put into direct which is not true. Below stats for most viewed item in one day:
BigQuery item views/purchases results:
direct - 86 985, 27 purchased
google / cpc - 87, 3 purchased
google / anic - 36, 1 purchased
other sources - 1, 0 purchased
GA4/Looker Studio results:
direct - 41 204, 7 purchased
google / cpc - 31 086, 5 purchased
google - play / anic - 4 652, 3 purchased
google / anic - 3 781, 10 purchased
other sources - 7 253, 3 purchased
Does anyone have an idea how to fix this so results are closer to GA/Looker data? I do realize the data in BQ will never be the same as in GA as it's not partioned and modeled. Maybe the case is to adjust the code to be more like GA attribution, but I don't know how can I do that.