I have a Python script that needs to be run with its input passed on the command line. The command is as follows:
python script.py --input [{\\"A\\":\\"322|985\\",\\"B\\":3}]
The idea is to convert the input to a pandas DataFrame. The code below does produce a DataFrame, but it has only a single column named 0, and the value in that column is [{\A\:\322|985\,\B\:3}].
import json
import pandas as pd
import argparse

def validate_input(input_data):
    if isinstance(input_data, pd.DataFrame):
        return input_data  # Already a DataFrame, return as is
    json_conv = json.dumps(input_data)
    json_data = json.loads(json_conv)
    return pd.DataFrame([json_data])  # Convert JSON serializable to DataFrame

def process_data(input_data):
    """
    Function that processes data, only called if dtype is valid.
    """
    validated_data = validate_input(input_data)
    print(validated_data)
    print("Processing data:\n", validated_data)

def main():
    parser = argparse.ArgumentParser(description="Validate and process JSON or Pandas DataFrame input.")
    parser.add_argument("--input", type=str, help="Input data as a JSON string")
    args = parser.parse_args()
    try:
        process_data(args.input)  # Proceed with processing only after validation
    except json.JSONDecodeError:
        raise TypeError("Invalid JSON input. Please provide a valid JSON string.")

if __name__ == "__main__":
    main()
The expected output is the DataFrame produced by this code:
pd.DataFrame([{"A":"322|985","B":3}])
Asked Mar 7 at 21:49 by LopezLopez; edited Mar 7 at 22:09 by Barmar.
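1 Answer

You're escaping the backslashes instead of the double quotes. Each \\" is a literal backslash followed by an unescaped double quote, so the shell treats the double quotes as string delimiters and removes them instead of passing them to Python. That is why the script receives [{\A\:\322|985\,\B\:3}].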
The simplest fix would be to put the entire argument in single quotes.
python script.py --input '[{"A":"322|985","B":3}]'
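To see what the script actually receives in each case, here is a minimal sketch. It uses Python's shlex.split() to emulate POSIX shell word splitting and quoting. It assumes a POSIX-style shell such as bash or zsh; Windows cmd.exe quotes differently. shlex only approximates the shell's quoting rules and does not reproduce every shell feature.

```python
import json
import shlex

# The command from the question, and the single-quoted fix (assumes a POSIX shell).
broken_cmd = r'python script.py --input [{\\"A\\":\\"322|985\\",\\"B\\":3}]'
fixed_cmd = """python script.py --input '[{"A":"322|985","B":3}]'"""

# shlex.split applies POSIX-like quoting rules; the last word is the --input value.
broken_arg = shlex.split(broken_cmd)[-1]
fixed_arg = shlex.split(fixed_cmd)[-1]

print(broken_arg)  # the double quotes were consumed as shell delimiters
print(fixed_arg)   # the double quotes survive, so this is valid JSON

# Only the single-quoted version parses as JSON.
print(json.loads(fixed_arg))
```

The first value matches the garbled string from the question. The second is the JSON text exactly as typed.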
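There's no need to call json.dumps(input_data). input_data is already a JSON string, not Python data, so it doesn't need to be converted to JSON. Calling json.dumps() on a string only wraps it in a JSON string literal, and json.loads() turns that back into the same string. As a result the input is never parsed, and pd.DataFrame([json_data]) puts that one string into a column named 0.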
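The round trip can be checked in isolation. In this sketch, raw is an illustrative stand-in for args.input after the shell quoting has been fixed; argparse always delivers it as a str.

```python
import json

import pandas as pd

# Stand-in for args.input after correct shell quoting: argparse always gives a str.
raw = '[{"A":"322|985","B":3}]'

# dumps() wraps the str in a JSON string literal; loads() just unwraps it again.
round_trip = json.loads(json.dumps(raw))
print(type(round_trip).__name__, round_trip == raw)

# A list holding one str becomes one row with a single default column, 0.
df = pd.DataFrame([round_trip])
print(df.columns.tolist(), df.iloc[0, 0])
```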
Once the string is parsed with json.loads(), json_data is already a list, because the JSON text is enclosed in []. You don't need to wrap it in another list when calling pd.DataFrame().
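The effect of the extra list can be checked directly. This sketch compares passing the parsed list as-is with wrapping it once more; records, good and bad are illustrative names.

```python
import json

import pandas as pd

# json.loads returns a list containing one dict, because the JSON text is wrapped in [].
records = json.loads('[{"A":"322|985","B":3}]')

# A list of dicts: one row per dict, one column per key.
good = pd.DataFrame(records)

# A list wrapping that list: one row whose only cell holds the dict itself.
bad = pd.DataFrame([records])

print(good)
print(bad)
```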
So the corrected version of validate_input() is:

def validate_input(input_data):
    if isinstance(input_data, pd.DataFrame):
        return input_data  # Already a DataFrame, return as is
    json_data = json.loads(input_data)  # Parse the JSON string into a list of dicts
    return pd.DataFrame(json_data)  # One row per dict, one column per key
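To check the fix end to end without going through a shell, you can pass an explicit argument list to argparse's parse_args() and run the result through the corrected validate_input(). This is a sketch. build_parser() is a hypothetical helper that reproduces the question's --input option. The argument list is what a POSIX shell would produce from the single-quoted command.

```python
import argparse
import json

import pandas as pd


def validate_input(input_data):
    if isinstance(input_data, pd.DataFrame):
        return input_data  # Already a DataFrame, return as is
    json_data = json.loads(input_data)  # Parse the JSON string into a list of dicts
    return pd.DataFrame(json_data)  # One row per dict, one column per key


def build_parser():
    # Hypothetical helper: the same --input option as the question's main().
    parser = argparse.ArgumentParser(description="Validate and process JSON or Pandas DataFrame input.")
    parser.add_argument("--input", type=str, help="Input data as a JSON string")
    return parser


# Pass argv explicitly; this list is what a POSIX shell produces from
#   python script.py --input '[{"A":"322|985","B":3}]'
args = build_parser().parse_args(["--input", '[{"A":"322|985","B":3}]'])
df = validate_input(args.input)
print(df)
```

In the real script, main() keeps calling parser.parse_args() with no arguments, so argparse reads sys.argv as before. The explicit list is only a convenient way to test the parsing.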