I have a large codebase with hundreds of functions that take string arguments, and I want to reduce brittle use of hardcoded strings while preserving autocomplete for users.
The Problem
Right now, functions look like this:
    def process_data(dataset: str = "default_dataset"):
        """Process a dataset.
        Args:
            dataset: The name of the dataset to process. Available options: 'default_dataset', 'alternative_dataset'.
        """
        if dataset not in {"default_dataset", "alternative_dataset"}:
            raise ValueError(f"Invalid dataset: {dataset}")
        return f"Processing {dataset}"
This has two major issues, plus one thing I want to keep:
- Magic strings everywhere: "default_dataset" appears in function signatures, conditionals, and documentation.
- Brittle when renaming: if "default_dataset" changes, we must manually update all instances.
- Autocomplete is nice here, though: the signature hint shows dataset: str = "default_dataset".
Attempted Fix
I tried using an Enum to remove hardcoded strings:
    from enum import Enum

    class DatasetOptions(str, Enum):
        DEFAULT = "default_dataset"
        ALTERNATIVE = "alternative_dataset"

    def process_data(dataset: DatasetOptions = DatasetOptions.DEFAULT):
        # Caveat: calling .format() makes this an ordinary expression rather than a
        # string literal, so Python does not store it as process_data.__doc__.
        """Process a dataset.
        Args:
            dataset: The name of the dataset to process. Available options: {datasets}.
        """.format(datasets=", ".join([d.value for d in DatasetOptions]))
        if dataset not in DatasetOptions._value2member_map_:
            raise ValueError(f"Invalid dataset: {dataset}")
        return f"Processing {dataset}"
This removes magic strings, but now the autocomplete hint exposes the Enum instead of the plain string:
    process_data(  # shows (dataset: DatasetOptions = DatasetOptions.DEFAULT)
Expected behavior:
    process_data(  # should show (dataset: str = "default_dataset")
We want users to pass plain strings ("default_dataset") but internally enforce correctness with the Enum.
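To make that goal concrete, here is a minimal sketch of a string-typed signature that converts to the Enum at the function boundary. It reuses the DatasetOptions class from the attempt above; the error message wording is an assumption of this sketch. How an editor renders the default (the expression DatasetOptions.DEFAULT.value or the evaluated string) varies by tool, so this may not fully restore the hint shown above.

```python
from enum import Enum


class DatasetOptions(str, Enum):
    DEFAULT = "default_dataset"
    ALTERNATIVE = "alternative_dataset"


def process_data(dataset: str = DatasetOptions.DEFAULT.value) -> str:
    """Process a dataset given by its plain string name."""
    try:
        # Looking a member up by value doubles as validation:
        # an unknown string raises ValueError.
        option = DatasetOptions(dataset)
    except ValueError:
        valid = ", ".join(repr(d.value) for d in DatasetOptions)
        raise ValueError(
            f"Invalid dataset: {dataset!r}. Expected one of: {valid}"
        ) from None
    return f"Processing {option.value}"
```

Callers keep passing plain strings, e.g. process_data("alternative_dataset"), and the string values live only in the Enum definition.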
What I Need
- Autocomplete should still show dataset: str = "default_dataset" (not DatasetOptions.DEFAULT).
- No magic strings in the codebase (no "default_dataset" scattered everywhere).
- Functions should be easy to refactor without updating hundreds of string references.
- Users should be able to pass plain strings ("default_dataset") and get a clean error if they mistype.
What’s the best way to achieve this balance? Is this even possible in Python?
How do you remove magic strings while keeping clean autocomplete in function signatures?
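One direction that might satisfy most of the requirements is typing.Literal, sketched below under some assumptions. The names DatasetName, VALID_DATASETS, and DEFAULT_DATASET are hypothetical, introduced only for this example. Editors that understand Literal typically suggest the allowed strings as completions, but the exact signature display differs between tools, so it may not match the hint wished for above. The default string still appears twice, though only in this one definitions block, where a type checker would flag a mismatch.

```python
from typing import Literal, get_args

# Single place where the allowed names are spelled out (hypothetical names).
DatasetName = Literal["default_dataset", "alternative_dataset"]
VALID_DATASETS = get_args(DatasetName)  # tuple of the literal strings
DEFAULT_DATASET: DatasetName = "default_dataset"


def process_data(dataset: DatasetName = DEFAULT_DATASET) -> str:
    # Runtime check for callers that are not type-checked.
    if dataset not in VALID_DATASETS:
        valid = ", ".join(repr(name) for name in VALID_DATASETS)
        raise ValueError(
            f"Invalid dataset: {dataset!r}. Expected one of: {valid}"
        )
    return f"Processing {dataset}"


# Build the docstring from the same source instead of repeating the names.
process_data.__doc__ = (
    "Process a dataset.\n\n"
    "Args:\n"
    "    dataset: The name of the dataset to process. "
    f"Available options: {', '.join(map(repr, VALID_DATASETS))}.\n"
)
```

Renaming a dataset then means editing the Literal and the default constant; a static type checker can flag call sites that still pass the old string.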