I'm trying to query a DynamoDB table to retrieve data for the current month or current week, depending on the filter applied (data_type = monthly
or weekly
). However, I'm encountering an issue where my query is returning both monthly and weekly data, even though I'm filtering for only monthly
data (or only weekly data).
DynamoDB Table Structure:
rate_id: Stores a value like
2024-11-summary
for the month of November or2024-11-week1
for the first week of November.type: Specifies whether the data is
weekly
ormonthly
. The weekly data only contains data for the current week. The monthly data contains data for all months of the current year up to current month.platform: Specifies the platform, such as
Generic_Data_DF
.Other fields: Various fields related to deployment frequency and success rate.
Lambda Function Code:
Here is the current Lambda function code, which attempts to filter the data before sending it to Grafana for monitoring:
import boto3
from boto3.dynamodb.conditions import Key
import logging
import json
from datetime import datetime, timedelta
from decimal import Decimal
# Initialize DynamoDB resource using IAM role
dynamodb = boto3.resource('dynamodb', region_name='eu-west-1')
# Specify the table
table = dynamodb.Table('deployment-frequency-table')
def query_items_by_tenant(tenant):
response = table.scan(
FilterExpression=Key('tenant').eq(tenant),
ProjectionExpression='#st, platform',
ExpressionAttributeNames={
'#st': 'status'
}
)
return response['Items']
def query_items_by_platform(platform, data_type=None):
filter_expression = Key('platform').eq(platform)
if data_type:
filter_expression &= Key('type').eq(data_type)
response = table.scan(
FilterExpression=filter_expression
)
return response['Items']
def query_items_by_rate_id(rate_id, platform, data_type=None):
"""Query items by a specific rate_id."""
filter_expression = Key('rate_id').eq(rate_id) & Key('platform').eq(platform)
if data_type:
filter_expression &= Key('type').eq(data_type)
response = table.scan(
FilterExpression=filter_expression
)
return response['Items']
def query_items_current_month(platform, data_type=None):
"""Query items for the current month based on rate_id and type."""
# Get the current year and month in 'YYYY-MM' format
current_month = datetime.now().strftime('%Y-%m')
filter_expression = Key('rate_id').begins_with(current_month)
# Then filter by platform
filter_expression &= Key('platform').eq(platform)
# Filter by 'monthly' type if data_type is 'monthly'
if data_type == 'monthly':
filter_expression &= Key('type').eq('monthly')
# Scan the table for entries that match the current month and platform, and also check for 'monthly' type
response = table.scan(
FilterExpression=filter_expression
)
return response.get('Items', [])
def query_all_items():
"""Query all items from the table."""
response = table.scan()
return response.get('Items', [])
# Function to extract month and year from rate_id
def preprocess_rate_id(items):
for item in items:
raw_rate_id = item.get('rate_id', '')
# Extract Year and Month
year, month = raw_rate_id.split('-')[:2]
# Convert month number to short name
month_name = datetime.strptime(month, "%m").strftime("%b")
# Format as "MMM YYYY"
item['rate_id_pretty'] = f"{month_name} {year}"
return items
def decimal_default(obj):
if isinstance(obj, Decimal):
return float(obj)
raise TypeError
def lambda_handler(event, context):
logging.info(f"Received event: {event}")
platform = event.get("queryStringParameters", {}).get("platform", "Generic_Data_DF") # Allow platform as a query parameter
rate_id = event.get("queryStringParameters", {}).get("rate_id")
current_month = event.get("queryStringParameters", {}).get("current_month", "false").lower() == "true"
all_data = event.get("queryStringParameters", {}).get("all_data", "false").lower() == "true"
data_type = event.get("queryStringParameters", {}).get("type", None)
# Log the incoming parameters for debugging
logging.info(f"Platform: {platform}, Data Type: {data_type}, Rate ID: {rate_id}, Current Month: {current_month}, All Data: {all_data}")
# If the platform is Generic_Data_LTC, fetch all items for that platform
if platform == 'Generic_Data_LTC':
logging.info("Fetching all data for platform Generic_Data_LTC.")
items = query_items_by_platform('Generic_Data_LTC', data_type)
elif all_data:
logging.info("Fetching all data for all months.")
items = query_all_items()
elif current_month:
logging.info("Filtering by current month.")
items = query_items_current_month(platform, data_type)
elif rate_id:
logging.info(f"Filtering by rate_id: {rate_id}")
items = query_items_by_rate_id(rate_id, platform, data_type)
else:
logging.info(f"Filtering by default platform: {platform}")
items = query_items_by_platform(platform, data_type)
# Preprocess the items to add the pretty rate_id
items = preprocess_rate_id(items)
return {
'statusCode': 200,
'body': json.dumps(items, default=decimal_default)
}
if __name__ == "__main__":
# Query items by platform
platform = 'Generic_Data_LTC'
queried_items = query_items_by_platform(platform)
print(f"Items with platform {platform}:")
for item in queried_items:
print(json.dumps(item, default=decimal_default))
In Grafana, I'm passing queries like:
https://<my-api-url>/dora-visualize?current_month=true&platform=Generic_Data_DF&data_type=monthly
Expected Behavior: Only retrieve items for platform
Generic_Data_DF
wheretype=monthly
for November 2024 (e.g.,2024-11-summary
).Actual Behavior: The query still returns weekly data for November 2024 (e.g.,
2024-11-week1
,2024-11-week2
), despite filtering bytype=monthly
.
Another query example:
I'm passing this query:
https://<my-api-url>/dora-visualize?platform=Generic_Data_DF&data_type=weekly
Expected Behavior: Only retrieve items for platform
Generic_Data_DF
wheretype=weekly
for November 2024 (e.g.,2024-11-week1
,2024-11-week2
).Actual Behavior: The query still returns monthly data for November 2024 (e.g.,
2024-11-summary
), despite filtering bytype=weekly
.
What I’m Trying to Achieve:
When the
data_type
is set tomonthly
, I only want to retrieve the monthly data.When the
data_type
is set toweekly
, I only want to retrieve the weekly data.
What I need help with:
Why is my query still returning both weekly
and monthly
data, despite applying the type
filter?
I'm trying to query a DynamoDB table to retrieve data for the current month or current week, depending on the filter applied (data_type = monthly
or weekly
). However, I'm encountering an issue where my query is returning both monthly and weekly data, even though I'm filtering for only monthly
data (or only weekly data).
DynamoDB Table Structure:
rate_id: Stores a value like
2024-11-summary
for the month of November or2024-11-week1
for the first week of November.type: Specifies whether the data is
weekly
ormonthly
. The weekly data only contains data for the current week. The monthly data contains data for all months of the current year up to current month.platform: Specifies the platform, such as
Generic_Data_DF
.Other fields: Various fields related to deployment frequency and success rate.
Lambda Function Code:
Here is the current Lambda function code, which attempts to filter the data before sending it to Grafana for monitoring:
import boto3
from boto3.dynamodb.conditions import Key
import logging
import json
from datetime import datetime, timedelta
from decimal import Decimal
# Initialize DynamoDB resource using IAM role
dynamodb = boto3.resource('dynamodb', region_name='eu-west-1')
# Specify the table
table = dynamodb.Table('deployment-frequency-table')
def query_items_by_tenant(tenant):
response = table.scan(
FilterExpression=Key('tenant').eq(tenant),
ProjectionExpression='#st, platform',
ExpressionAttributeNames={
'#st': 'status'
}
)
return response['Items']
def query_items_by_platform(platform, data_type=None):
filter_expression = Key('platform').eq(platform)
if data_type:
filter_expression &= Key('type').eq(data_type)
response = table.scan(
FilterExpression=filter_expression
)
return response['Items']
def query_items_by_rate_id(rate_id, platform, data_type=None):
"""Query items by a specific rate_id."""
filter_expression = Key('rate_id').eq(rate_id) & Key('platform').eq(platform)
if data_type:
filter_expression &= Key('type').eq(data_type)
response = table.scan(
FilterExpression=filter_expression
)
return response['Items']
def query_items_current_month(platform, data_type=None):
"""Query items for the current month based on rate_id and type."""
# Get the current year and month in 'YYYY-MM' format
current_month = datetime.now().strftime('%Y-%m')
filter_expression = Key('rate_id').begins_with(current_month)
# Then filter by platform
filter_expression &= Key('platform').eq(platform)
# Filter by 'monthly' type if data_type is 'monthly'
if data_type == 'monthly':
filter_expression &= Key('type').eq('monthly')
# Scan the table for entries that match the current month and platform, and also check for 'monthly' type
response = table.scan(
FilterExpression=filter_expression
)
return response.get('Items', [])
def query_all_items():
"""Query all items from the table."""
response = table.scan()
return response.get('Items', [])
# Function to extract month and year from rate_id
def preprocess_rate_id(items):
for item in items:
raw_rate_id = item.get('rate_id', '')
# Extract Year and Month
year, month = raw_rate_id.split('-')[:2]
# Convert month number to short name
month_name = datetime.strptime(month, "%m").strftime("%b")
# Format as "MMM YYYY"
item['rate_id_pretty'] = f"{month_name} {year}"
return items
def decimal_default(obj):
if isinstance(obj, Decimal):
return float(obj)
raise TypeError
def lambda_handler(event, context):
logging.info(f"Received event: {event}")
platform = event.get("queryStringParameters", {}).get("platform", "Generic_Data_DF") # Allow platform as a query parameter
rate_id = event.get("queryStringParameters", {}).get("rate_id")
current_month = event.get("queryStringParameters", {}).get("current_month", "false").lower() == "true"
all_data = event.get("queryStringParameters", {}).get("all_data", "false").lower() == "true"
data_type = event.get("queryStringParameters", {}).get("type", None)
# Log the incoming parameters for debugging
logging.info(f"Platform: {platform}, Data Type: {data_type}, Rate ID: {rate_id}, Current Month: {current_month}, All Data: {all_data}")
# If the platform is Generic_Data_LTC, fetch all items for that platform
if platform == 'Generic_Data_LTC':
logging.info("Fetching all data for platform Generic_Data_LTC.")
items = query_items_by_platform('Generic_Data_LTC', data_type)
elif all_data:
logging.info("Fetching all data for all months.")
items = query_all_items()
elif current_month:
logging.info("Filtering by current month.")
items = query_items_current_month(platform, data_type)
elif rate_id:
logging.info(f"Filtering by rate_id: {rate_id}")
items = query_items_by_rate_id(rate_id, platform, data_type)
else:
logging.info(f"Filtering by default platform: {platform}")
items = query_items_by_platform(platform, data_type)
# Preprocess the items to add the pretty rate_id
items = preprocess_rate_id(items)
return {
'statusCode': 200,
'body': json.dumps(items, default=decimal_default)
}
if __name__ == "__main__":
# Query items by platform
platform = 'Generic_Data_LTC'
queried_items = query_items_by_platform(platform)
print(f"Items with platform {platform}:")
for item in queried_items:
print(json.dumps(item, default=decimal_default))
In Grafana, I'm passing queries like:
https://<my-api-url>/dora-visualize?current_month=true&platform=Generic_Data_DF&data_type=monthly
Expected Behavior: Only retrieve items for platform
Generic_Data_DF
wheretype=monthly
for November 2024 (e.g.,2024-11-summary
).Actual Behavior: The query still returns weekly data for November 2024 (e.g.,
2024-11-week1
,2024-11-week2
), despite filtering bytype=monthly
.
Another query example:
I'm passing this query:
https://<my-api-url>/dora-visualize?platform=Generic_Data_DF&data_type=weekly
Expected Behavior: Only retrieve items for platform
Generic_Data_DF
wheretype=weekly
for November 2024 (e.g.,2024-11-week1
,2024-11-week2
).Actual Behavior: The query still returns monthly data for November 2024 (e.g.,
2024-11-summary
), despite filtering bytype=weekly
.
What I’m Trying to Achieve:
When the
data_type
is set tomonthly
, I only want to retrieve the monthly data.When the
data_type
is set toweekly
, I only want to retrieve the weekly data.
What I need help with:
Why is my query still returning both weekly
and monthly
data, despite applying the type
filter?
1 Answer
Reset to default 0I would strongly question the design of your schema, such as you use a Scan for every request which is not efficient and not really reaping the benefits of DynamoDBs performance.
Think about creating indexes to satisfy your access patterns.
Platform as partition key and Type as the sort key.
Now you can efficiently write:
SELECT * FROM mYtable WHERE Platform='foo' AND Type='monthly'
.
If you need to continue with Scan for any reason, then I suggest you also implement pagination as you're only ever going to evaluate 1MB of data in your current implementation, which could lead to under fetching and missing items that exist.
if data_type == 'weekly':
, so whendata_type=weekly
the filter expression does not filter for weekly data. This may explain why the second example does not work. Also check this linedata_type = event.get("queryStringParameters", {}).get("type", None)
. Should it be...get("data_type", None)
? – user202311 Commented Nov 27, 2024 at 3:48if data_type == 'monthly':
inquery_items_current_month
because it will only ever need monthly data. The logic in the rest of the functions checks ifdata_type
is provided in the query, if it is, it should filter using whatever value passed to the query. Also, yes, you're right thank you! it should be...get("data_type", None)
. I fixed it but still not getting the expected results inquery_items_by_platform
for weekly data. – Rania Commented Nov 27, 2024 at 8:13