I've seen a few posts from people searching for a similar answer, on both SO and other forums, but have yet to see an answer. I would like to find the max memory by namespace and container over the last 30 days. To be clear, I do not want a time series of the daily max memory over the last 30 days. I want one aggregate max number for each namespace/container combination.
I started with this: set the time range to "Last 30 Days" and populate the "Metrics browser" with:
max by(namespace, container) (container_memory_working_set_bytes{container!~".generic", namespace=~"namespace-01|namespace-02"})
The value in the "Min step" field in the options section determines how many rows I get back for each namespace/container combination. For fun I tried 30d, thinking I would get one number; that errored.
Next I tried 15d ("today" = 4/3/2025), thinking I would get back 2 rows for each namespace/container. Nope, I got back 3, with these timestamps: 2025-02-25 19:00:00, 2025-03-12 20:00:00, 2025-03-27 20:00:00.
Very strange. 4/3/2025 - 30 days = 3/4/2025. Where did 2/25/2025 come from? Sure, 2/25 + 15 = 3/12 and 3/12 + 15 = 3/27 . . . but 3/27 + 15 <> 4/3, so how did the starting point end up at 2/25??
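One pattern I did notice: if I convert those three timestamps to UTC (19:00 EST and 20:00 EDT are both midnight UTC the next day), each one lands on an exact multiple of the 15d step counted from the Unix epoch. So it looks like the range start is being floor-aligned to the step rather than anchored at now-30d. A quick check (plain Python, my own arithmetic, not anything from the Grafana docs):

```python
from datetime import datetime, timezone

STEP = 15 * 86400  # the 15d step, in seconds

# The three timestamps Grafana returned, converted to UTC.
stamps = [
    datetime(2025, 2, 26, tzinfo=timezone.utc),
    datetime(2025, 3, 13, tzinfo=timezone.utc),
    datetime(2025, 3, 28, tzinfo=timezone.utc),
]

# Each one divides evenly by the step, i.e. they sit on
# step boundaries measured from the Unix epoch.
remainders = [int(s.timestamp()) % STEP for s in stamps]
print(remainders)  # [0, 0, 0]
```

If that alignment is what's happening, it would explain why the first evaluation point drifts back before now-30d.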
Next I tried 10d, thinking I would get back 3 rows for each; nope, I got back 4, with the same strangeness as described for the 15d case above.
It gets worse. I converted all this to a Python script and used pandas to aggregate the data I get back from the PromQL. For fun I ran it with 10d and compared the aggregated values to a 7d run. The 7d run had higher max memory!! So I reran with 5d, and again 5d had higher max memory than 7d across most combinations!! I tried again with 1d, 12h and 6h; all three of those had the same max memory, higher than 5d!! I'm shocked I get different values for any of these, unless the true max happened to fall in the first or last few minutes, i.e. between run times. IT DID NOT. This makes me think it might be taking "instant" values at each timestamp rather than the max over the interval.
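That would be consistent with what I see. As a sanity check on the mechanism (synthetic data, not my real metrics), instant samples at a coarse step can step right over a spike that a finer step, or max_over_time, would catch:

```python
import pandas as pd

# 30 days of per-minute "memory" samples, flat at 100 with one
# 30-minute spike to 500.
rng = pd.date_range("2025-03-04", periods=30 * 24 * 60, freq="min")
mem = pd.Series(100.0, index=rng)
mem.iloc[10_000:10_030] = 500.0

def sampled_max(series, step_minutes):
    # Mimic a range query without *_over_time: take one instant
    # sample per step, then max over those samples.
    return series.iloc[::step_minutes].max()

coarse = sampled_max(mem, 24 * 60)  # 1d step: steps over the spike
fine = sampled_max(mem, 60)         # 1h step: happens to land on it
print(coarse, fine, mem.max())      # 100.0 500.0 500.0
```

So the reported max shrinking as the step grows is exactly what instant sampling would produce.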
I've also tried something like: max by(namespace, container) (max_over_time(container_memory_working_set_bytes{container!~".generic", namespace=~"namespace-01|namespace-02"}[1d]))
Same behavior, but I get back more namespace/container combinations, so max_over_time appears to return a bigger list. I suspect some namespace/container combinations are not present in every time slice, and max_over_time still returns series that have gaps. So that part is an improvement, but depending on "Min step" everything changes in the same way as described above.
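For what it's worth, the single-number result I'm ultimately after would presumably come from one instant query (/api/v1/query rather than query_range) wrapping the whole range in max_over_time, so "Min step" never enters the picture. A rough sketch of what I mean (PROM_URL is a placeholder, and I haven't verified this against my cluster):

```python
import urllib.parse

PROM_URL = "http://prometheus:9090"  # placeholder endpoint

def build_instant_max_query(lookback="30d"):
    # One instant query with max_over_time should yield a single
    # aggregate value per namespace/container, with no step involved.
    promql = (
        'max by(namespace, container) ('
        'max_over_time(container_memory_working_set_bytes'
        '{container!~".generic", namespace=~"namespace-01|namespace-02"}'
        f'[{lookback}]))'
    )
    return f"{PROM_URL}/api/v1/query?" + urllib.parse.urlencode({"query": promql})

# e.g. requests.get(build_instant_max_query()) and read
# data["data"]["result"]; each entry should carry exactly one value.
```

If that's the right approach, I'd still like to understand why the range-query results vary with the step the way they do.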
Thanks in advance for any assistance you can provide.