最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

python - How to query DynamoDB with multiple filters (platform, type, and rate_id) to return only monthlyweekly data - Stack Ove

programmeradmin8浏览0评论

I'm trying to query a DynamoDB table to retrieve data for the current month or current week, depending on the filter applied (data_type = monthly or weekly). However, I'm encountering an issue where my query is returning both monthly and weekly data, even though I'm filtering for only monthly data (or only weekly data).

DynamoDB Table Structure:

  • rate_id: Stores a value like 2024-11-summary for the month of November or 2024-11-week1 for the first week of November.

  • type: Specifies whether the data is weekly or monthly. The weekly data only contains data for the current week. The monthly data contains data for all months of the current year up to current month.

  • platform: Specifies the platform, such as Generic_Data_DF.

  • Other fields: Various fields related to deployment frequency and success rate.

Lambda Function Code:

Here is the current Lambda function code, which attempts to filter the data before sending it to Grafana for monitoring:

import boto3
from boto3.dynamodb.conditions import Key
import logging
import json
from datetime import datetime, timedelta
from decimal import Decimal

# Initialize DynamoDB resource using IAM role
dynamodb = boto3.resource('dynamodb', region_name='eu-west-1')

# Specify the table
table = dynamodb.Table('deployment-frequency-table')

def query_items_by_tenant(tenant):
    response = table.scan(
        FilterExpression=Key('tenant').eq(tenant),
        ProjectionExpression='#st, platform',
        ExpressionAttributeNames={
            '#st': 'status'
        }
    )
    return response['Items']

def query_items_by_platform(platform, data_type=None):
    filter_expression = Key('platform').eq(platform)
    if data_type:
        filter_expression &= Key('type').eq(data_type)
    response = table.scan(
        FilterExpression=filter_expression
    )
    return response['Items']

def query_items_by_rate_id(rate_id, platform, data_type=None):
    """Query items by a specific rate_id."""
    filter_expression = Key('rate_id').eq(rate_id) & Key('platform').eq(platform)
    if data_type:
        filter_expression &= Key('type').eq(data_type)
    response = table.scan(
        FilterExpression=filter_expression
    )
    return response['Items']

def query_items_current_month(platform, data_type=None):
    """Query items for the current month based on rate_id and type."""
    # Get the current year and month in 'YYYY-MM' format
    current_month = datetime.now().strftime('%Y-%m')
    
    filter_expression = Key('rate_id').begins_with(current_month)

    # Then filter by platform
    filter_expression &= Key('platform').eq(platform)

    # Filter by 'monthly' type if data_type is 'monthly'
    if data_type == 'monthly':
        filter_expression &= Key('type').eq('monthly')
    
    # Scan the table for entries that match the current month and platform, and also check for 'monthly' type
    response = table.scan(
        FilterExpression=filter_expression
    )
    
    return response.get('Items', [])

def query_all_items():
    """Query all items from the table."""
    response = table.scan()
    return response.get('Items', [])

# Function to extract month and year from rate_id
def preprocess_rate_id(items):
    for item in items:
        raw_rate_id = item.get('rate_id', '')
        # Extract Year and Month
        year, month = raw_rate_id.split('-')[:2]
        # Convert month number to short name
        month_name = datetime.strptime(month, "%m").strftime("%b")
        # Format as "MMM YYYY"
        item['rate_id_pretty'] = f"{month_name} {year}"
    return items

def decimal_default(obj):
    if isinstance(obj, Decimal):
        return float(obj)
    raise TypeError

def lambda_handler(event, context):
    logging.info(f"Received event: {event}")
    platform = event.get("queryStringParameters", {}).get("platform", "Generic_Data_DF")  # Allow platform as a query parameter
    rate_id = event.get("queryStringParameters", {}).get("rate_id")
    current_month = event.get("queryStringParameters", {}).get("current_month", "false").lower() == "true"
    all_data = event.get("queryStringParameters", {}).get("all_data", "false").lower() == "true"
    data_type = event.get("queryStringParameters", {}).get("type", None)
    # Log the incoming parameters for debugging
    logging.info(f"Platform: {platform}, Data Type: {data_type}, Rate ID: {rate_id}, Current Month: {current_month}, All Data: {all_data}")
    
    # If the platform is Generic_Data_LTC, fetch all items for that platform
    if platform == 'Generic_Data_LTC':
        logging.info("Fetching all data for platform Generic_Data_LTC.")
        items = query_items_by_platform('Generic_Data_LTC', data_type)
    elif all_data:
        logging.info("Fetching all data for all months.")
        items = query_all_items()
    elif current_month:
        logging.info("Filtering by current month.")
        items = query_items_current_month(platform, data_type)
    elif rate_id:
        logging.info(f"Filtering by rate_id: {rate_id}")
        items = query_items_by_rate_id(rate_id, platform, data_type)
    else:
        logging.info(f"Filtering by default platform: {platform}")
        items = query_items_by_platform(platform, data_type)
    
    # Preprocess the items to add the pretty rate_id
    items = preprocess_rate_id(items)
    
    return {
        'statusCode': 200,
        'body': json.dumps(items, default=decimal_default)
    }

if __name__ == "__main__":
    # Query items by platform
    platform = 'Generic_Data_LTC'
    queried_items = query_items_by_platform(platform)
    print(f"Items with platform {platform}:")
    for item in queried_items:
        print(json.dumps(item, default=decimal_default))

In Grafana, I'm passing queries like:

https://<my-api-url>/dora-visualize?current_month=true&platform=Generic_Data_DF&data_type=monthly
  1. Expected Behavior: Only retrieve items for platform Generic_Data_DF where type=monthly for November 2024 (e.g., 2024-11-summary).

  2. Actual Behavior: The query still returns weekly data for November 2024 (e.g., 2024-11-week1, 2024-11-week2), despite filtering by type=monthly.

Another query example:
I'm passing this query:

https://<my-api-url>/dora-visualize?platform=Generic_Data_DF&data_type=weekly
  1. Expected Behavior: Only retrieve items for platform Generic_Data_DF where type=weekly for November 2024 (e.g., 2024-11-week1, 2024-11-week2).

  2. Actual Behavior: The query still returns monthly data for November 2024 (e.g., 2024-11-summary), despite filtering by type=weekly.

What I’m Trying to Achieve:

  • When the data_type is set to monthly, I only want to retrieve the monthly data.

  • When the data_type is set to weekly, I only want to retrieve the weekly data.

What I need help with:

Why is my query still returning both weekly and monthly data, despite applying the type filter?

I'm trying to query a DynamoDB table to retrieve data for the current month or current week, depending on the filter applied (data_type = monthly or weekly). However, I'm encountering an issue where my query is returning both monthly and weekly data, even though I'm filtering for only monthly data (or only weekly data).

DynamoDB Table Structure:

  • rate_id: Stores a value like 2024-11-summary for the month of November or 2024-11-week1 for the first week of November.

  • type: Specifies whether the data is weekly or monthly. The weekly data only contains data for the current week. The monthly data contains data for all months of the current year up to current month.

  • platform: Specifies the platform, such as Generic_Data_DF.

  • Other fields: Various fields related to deployment frequency and success rate.

Lambda Function Code:

Here is the current Lambda function code, which attempts to filter the data before sending it to Grafana for monitoring:

import boto3
from boto3.dynamodb.conditions import Key
import logging
import json
from datetime import datetime, timedelta
from decimal import Decimal

# Initialize DynamoDB resource using IAM role
dynamodb = boto3.resource('dynamodb', region_name='eu-west-1')

# Specify the table
table = dynamodb.Table('deployment-frequency-table')

def query_items_by_tenant(tenant):
    response = table.scan(
        FilterExpression=Key('tenant').eq(tenant),
        ProjectionExpression='#st, platform',
        ExpressionAttributeNames={
            '#st': 'status'
        }
    )
    return response['Items']

def query_items_by_platform(platform, data_type=None):
    filter_expression = Key('platform').eq(platform)
    if data_type:
        filter_expression &= Key('type').eq(data_type)
    response = table.scan(
        FilterExpression=filter_expression
    )
    return response['Items']

def query_items_by_rate_id(rate_id, platform, data_type=None):
    """Query items by a specific rate_id."""
    filter_expression = Key('rate_id').eq(rate_id) & Key('platform').eq(platform)
    if data_type:
        filter_expression &= Key('type').eq(data_type)
    response = table.scan(
        FilterExpression=filter_expression
    )
    return response['Items']

def query_items_current_month(platform, data_type=None):
    """Query items for the current month based on rate_id and type."""
    # Get the current year and month in 'YYYY-MM' format
    current_month = datetime.now().strftime('%Y-%m')
    
    filter_expression = Key('rate_id').begins_with(current_month)

    # Then filter by platform
    filter_expression &= Key('platform').eq(platform)

    # Filter by 'monthly' type if data_type is 'monthly'
    if data_type == 'monthly':
        filter_expression &= Key('type').eq('monthly')
    
    # Scan the table for entries that match the current month and platform, and also check for 'monthly' type
    response = table.scan(
        FilterExpression=filter_expression
    )
    
    return response.get('Items', [])

def query_all_items():
    """Query all items from the table."""
    response = table.scan()
    return response.get('Items', [])

# Function to extract month and year from rate_id
def preprocess_rate_id(items):
    for item in items:
        raw_rate_id = item.get('rate_id', '')
        # Extract Year and Month
        year, month = raw_rate_id.split('-')[:2]
        # Convert month number to short name
        month_name = datetime.strptime(month, "%m").strftime("%b")
        # Format as "MMM YYYY"
        item['rate_id_pretty'] = f"{month_name} {year}"
    return items

def decimal_default(obj):
    if isinstance(obj, Decimal):
        return float(obj)
    raise TypeError

def lambda_handler(event, context):
    logging.info(f"Received event: {event}")
    platform = event.get("queryStringParameters", {}).get("platform", "Generic_Data_DF")  # Allow platform as a query parameter
    rate_id = event.get("queryStringParameters", {}).get("rate_id")
    current_month = event.get("queryStringParameters", {}).get("current_month", "false").lower() == "true"
    all_data = event.get("queryStringParameters", {}).get("all_data", "false").lower() == "true"
    data_type = event.get("queryStringParameters", {}).get("type", None)
    # Log the incoming parameters for debugging
    logging.info(f"Platform: {platform}, Data Type: {data_type}, Rate ID: {rate_id}, Current Month: {current_month}, All Data: {all_data}")
    
    # If the platform is Generic_Data_LTC, fetch all items for that platform
    if platform == 'Generic_Data_LTC':
        logging.info("Fetching all data for platform Generic_Data_LTC.")
        items = query_items_by_platform('Generic_Data_LTC', data_type)
    elif all_data:
        logging.info("Fetching all data for all months.")
        items = query_all_items()
    elif current_month:
        logging.info("Filtering by current month.")
        items = query_items_current_month(platform, data_type)
    elif rate_id:
        logging.info(f"Filtering by rate_id: {rate_id}")
        items = query_items_by_rate_id(rate_id, platform, data_type)
    else:
        logging.info(f"Filtering by default platform: {platform}")
        items = query_items_by_platform(platform, data_type)
    
    # Preprocess the items to add the pretty rate_id
    items = preprocess_rate_id(items)
    
    return {
        'statusCode': 200,
        'body': json.dumps(items, default=decimal_default)
    }

if __name__ == "__main__":
    # Query items by platform
    platform = 'Generic_Data_LTC'
    queried_items = query_items_by_platform(platform)
    print(f"Items with platform {platform}:")
    for item in queried_items:
        print(json.dumps(item, default=decimal_default))

In Grafana, I'm passing queries like:

https://<my-api-url>/dora-visualize?current_month=true&platform=Generic_Data_DF&data_type=monthly
  1. Expected Behavior: Only retrieve items for platform Generic_Data_DF where type=monthly for November 2024 (e.g., 2024-11-summary).

  2. Actual Behavior: The query still returns weekly data for November 2024 (e.g., 2024-11-week1, 2024-11-week2), despite filtering by type=monthly.

Another query example:
I'm passing this query:

https://<my-api-url>/dora-visualize?platform=Generic_Data_DF&data_type=weekly
  1. Expected Behavior: Only retrieve items for platform Generic_Data_DF where type=weekly for November 2024 (e.g., 2024-11-week1, 2024-11-week2).

  2. Actual Behavior: The query still returns monthly data for November 2024 (e.g., 2024-11-summary), despite filtering by type=weekly.

What I’m Trying to Achieve:

  • When the data_type is set to monthly, I only want to retrieve the monthly data.

  • When the data_type is set to weekly, I only want to retrieve the weekly data.

What I need help with:

Why is my query still returning both weekly and monthly data, despite applying the type filter?

Share Improve this question edited Nov 27, 2024 at 2:28 Rania asked Nov 27, 2024 at 2:07 RaniaRania 412 bronze badges 2
  • There is no if data_type == 'weekly':, so when data_type=weekly the filter expression does not filter for weekly data. This may explain why the second example does not work. Also check this line data_type = event.get("queryStringParameters", {}).get("type", None). Should it be ...get("data_type", None)? – user202311 Commented Nov 27, 2024 at 3:48
  • I only added the if data_type == 'monthly': in query_items_current_month because it will only ever need monthly data. The logic in the rest of the functions checks if data_type is provided in the query, if it is, it should filter using whatever value passed to the query. Also, yes, you're right thank you! it should be ...get("data_type", None). I fixed it but still not getting the expected results in query_items_by_platform for weekly data. – Rania Commented Nov 27, 2024 at 8:13
Add a comment  | 

1 Answer 1

Reset to default 0

I would strongly question the design of your schema, such as you use a Scan for every request which is not efficient and not really reaping the benefits of DynamoDBs performance.

Think about creating indexes to satisfy your access patterns.

Platform as partition key and Type as the sort key.

Now you can efficiently write:

SELECT * FROM mYtable WHERE Platform='foo' AND Type='monthly'.

If you need to continue with Scan for any reason, then I suggest you also implement pagination as you're only ever going to evaluate 1MB of data in your current implementation, which could lead to under fetching and missing items that exist.

与本文相关的文章

发布评论

评论列表(0)

  1. 暂无评论