I'm working on optimizing a SQL Server query that fetches filtered data from two large tables:
- Articles (≈1 million rows for the last 3 months)
- ArticleHistory (stores status changes, timestamps, and user changes, with a minimum of 10 rows per article)
The query needs to:
- Fetch articles based on multiple filters (date range, status, user actions).
- Select the latest or earliest status change date for specific statuses.
- Join with multiple tables for additional metadata.
- Use pagination (OFFSET ... FETCH NEXT).
Currently, the query runs too slowly because of the large dataset and complex joins.
This is what I have so far, and it is still too slow:
SELECT a.Id, a.CreatedAt, a.UpdatedAt, a.CustomsId, a.InternationalStatusId,
       a.StatusId, a.UUID, a.ArticleDescription
       -- I have more selected data here from other tables
FROM Articles AS a
-- I have left joins with other tables here
LEFT JOIN (
    SELECT
        sh2.ArticleId,
        sh2.StatusId,
        CASE
            -- Special case: StatusId = 4 derived from 144, use MAX
            WHEN sh2.StatusId = 4 AND EXISTS (
                SELECT 1
                FROM #TempStatusIds tsi
                WHERE tsi.StatusId = 4 AND tsi.IsSpecial = 1
            ) THEN MAX(sh2.CreatedAt)
            -- Actual case: StatusId = 4, use MIN
            WHEN sh2.StatusId = 4 THEN MIN(sh2.CreatedAt)
            -- Special case: StatusId = 10 derived from 1410, use MIN
            WHEN sh2.StatusId = 10 AND EXISTS (
                SELECT 1
                FROM #TempStatusIds tsi
                WHERE tsi.StatusId = 10 AND tsi.IsSpecial = 1
            ) THEN MIN(sh2.CreatedAt)
            -- Actual case: StatusId = 10, use MAX
            WHEN sh2.StatusId = 10 THEN MAX(sh2.CreatedAt)
            -- Special case: StatusId = 14 derived from 114, use MIN
            WHEN sh2.StatusId = 14 AND EXISTS (
                SELECT 1
                FROM #TempStatusIds tsi
                WHERE tsi.StatusId = 14 AND tsi.IsSpecial = 1
            ) THEN MIN(sh2.CreatedAt)
            -- Actual case: StatusId = 14, use MAX
            WHEN sh2.StatusId = 14 THEN MAX(sh2.CreatedAt)
            ELSE MAX(sh2.CreatedAt)
        END AS CreatedAt
    FROM ArticleHistory sh2
    -- One article can have more than one ArticleHistory row with the same ArticleId and StatusId
    -- (and sometimes the same CreatedAt date); only some specific rows can be used for filtering
    WHERE sh2.ChangeComment LIKE '%Status%'
        AND sh2.StatusId = 3
        AND cast(sh2.CreatedAt as date) >= '2025-01-01'
        AND cast(sh2.CreatedAt as date) <= '2025-01-31'
    GROUP BY sh2.ArticleId, sh2.StatusId
) AS sh ON a.Id = sh.ArticleId
WHERE 1 = 1
    AND sh.StatusId = 3
    AND cast(sh.CreatedAt as date) >= '2025-01-01'
    AND cast(sh.CreatedAt as date) <= '2025-01-31'
ORDER BY a.CreatedAt DESC
OFFSET 0 ROWS
FETCH NEXT 20 ROWS ONLY
See it and sample data here: https://dbfiddle.uk/wT0_5R3N
I've tried:
- Indexing key columns (StatusId, ArticleId, StatusDate).
- Using WITH (NOLOCK) to minimize locking impact.
- Filtering ArticleHistory inside a LEFT JOIN to select relevant statuses and date ranges.
Despite this, performance is still not ideal. Are there best practices for optimizing queries with large history tables where filtering and aggregation (min/max dates) are required? Would partitioning or pre-aggregating status change data in a temporary table help?
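By pre-aggregating status change data in a temporary table I mean something roughly like this (only a sketch, not something I have tested; #StatusDates is just an illustrative name):

-- Sketch: compute one MIN/MAX CreatedAt per article and status once,
-- then join Articles to this small temp table instead of the derived table above
SELECT sh2.ArticleId,
       sh2.StatusId,
       MIN(sh2.CreatedAt) AS MinCreatedAt,
       MAX(sh2.CreatedAt) AS MaxCreatedAt
INTO #StatusDates
FROM ArticleHistory sh2
WHERE sh2.ChangeComment LIKE '%Status%'
  AND sh2.StatusId = 3
  AND cast(sh2.CreatedAt as date) >= '2025-01-01'
  AND cast(sh2.CreatedAt as date) <= '2025-01-31'
GROUP BY sh2.ArticleId, sh2.StatusId;

-- Index the temp table so the join back to Articles is cheap
CREATE CLUSTERED INDEX IX_StatusDates_ArticleId ON #StatusDates (ArticleId);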
1 Answer
You should have indexes on the id column.
Also replace
AND cast(sh.CreatedAt as date) >= '2025-01-01'
AND cast(sh.CreatedAt as date) <= '2025-01-31'
With
YEAR(sh.CreatedAt) = 2025 AND MONTH(sh.CreatedAt) = 1
You could also make an index on the year and month if still slow.
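For example (a sketch only; the index and column names are placeholders, adjust them to your schema):

-- Index to support the join and status filter on the history table
CREATE INDEX IX_ArticleHistory_ArticleId_StatusId
    ON ArticleHistory (ArticleId, StatusId)
    INCLUDE (CreatedAt);

-- Computed columns for year and month, which can then be indexed
ALTER TABLE ArticleHistory
    ADD CreatedYear  AS YEAR(CreatedAt),
        CreatedMonth AS MONTH(CreatedAt);

CREATE INDEX IX_ArticleHistory_CreatedYear_CreatedMonth
    ON ArticleHistory (CreatedYear, CreatedMonth);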
Comments:
Joel Coehoorn (Jan 30 at 16:36): SELECT .. FROM (select ...) JOIN .. WHERE x = (select max(..) from ...). It helps me keep straight what code is at what level.
Joel Coehoorn (Jan 30 at 16:39): AND cast(sh.CreatedAt as date) >= '2025-01-01' AND cast(sh.CreatedAt as date) <= '2025-01-31' You want to instead write it in a way that avoids mutating the stored data. This not only saves per-row processing work, but helps make better use of indexes.
Joel Coehoorn (Jan 30 at 16:54): Use an exclusive (<) upper bound rather than an inclusive (<=) one: AND sh.CreatedAt >= '20250101' AND sh.CreatedAt < '20250201'
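Applied to the history subquery in the question, the date filter from the comments would look roughly like this (an untested sketch covering the same January 2025 range):

-- Same rows as the cast(...) version, but the column is left untouched
-- and the upper bound is exclusive, so an index on (StatusId, CreatedAt) can seek
SELECT sh2.ArticleId,
       sh2.StatusId,
       MIN(sh2.CreatedAt) AS MinCreatedAt,
       MAX(sh2.CreatedAt) AS MaxCreatedAt
FROM ArticleHistory sh2
WHERE sh2.ChangeComment LIKE '%Status%'
  AND sh2.StatusId = 3
  AND sh2.CreatedAt >= '20250101'
  AND sh2.CreatedAt < '20250201'
GROUP BY sh2.ArticleId, sh2.StatusId;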