Issue: I am attempting to create a fine-tuning job on Amazon Bedrock via the AWS web console. The base model selected for the task is Amazon Nova Micro. The training data, which resides in an S3 bucket as a .jsonl file, is saved in the required format as per the Amazon Bedrock User Guide and contains around 3000 records:
{"prompt": "What is the capital of France?", "completion": "The capital of France is Paris."}
The IAM role used has the required read and write permissions. The job saves and runs for about 14-16 minutes before failing. The error message shown at the top of the page is:
Unable to parse S3 file.
I have enabled model invocation logging for Bedrock, but nothing related to this job is written to CloudWatch. The standard dashboards in CloudWatch don't seem to have any helpful information either.
To test the quality of my data, I converted it into OpenAI's required format:
{"messages": [{"role": "user", "content": "Where do babies come from"}, {"role": "assistant", "content": "That is really something you should be asking your dad about"}]}
I then ran a fine-tuning job in OpenAI's console; the job completed successfully and the custom model was created as required.
So, my assumption is that the data is relatively clean and that the error lies somewhere with Bedrock - but how can I get more specific information on what exactly is causing the failure?
I am looking for guidance on how to go about debugging this.
Asked Mar 25 at 13:27 by Reegz

1 Answer
OK, so it turns out that this was indeed a format error on my part.
The format below is the correct one to use for each record in the JSONL file. It is shown pretty-printed for readability; in the file itself, each record must be on a single line.
{
  "schemaVersion": "bedrock-conversation-2024",
  "system": [
    {
      "text": "You are a digital assistant with a friendly personality"
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "Question"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "text": "Answer"
        }
      ]
    }
  ]
}
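For anyone else who has already prepared prompt/completion records like the ones in the question, here is a minimal sketch of how they could be converted to the structure above. The function names and the file paths in the comment at the end are hypothetical, and leaving out the "system" block when no system text is given is an assumption on my part; pass system_text if you want it always present. It also reports the line number of any record that does not parse, which is more than the console error tells you.

```python
import json

SCHEMA_VERSION = "bedrock-conversation-2024"


def to_conversation_record(prompt, completion, system_text=None):
    """Build one record in the conversation structure shown above."""
    record = {"schemaVersion": SCHEMA_VERSION}
    if system_text:
        # Assumption: the system block is only written when text is supplied.
        record["system"] = [{"text": system_text}]
    record["messages"] = [
        {"role": "user", "content": [{"text": prompt}]},
        {"role": "assistant", "content": [{"text": completion}]},
    ]
    return record


def convert_lines(lines, system_text=None):
    """Convert prompt/completion JSONL lines to single-line conversation records."""
    converted = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than emitting empty records
        try:
            source = json.loads(line)
            prompt = source["prompt"]
            completion = source["completion"]
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            # Point at the offending line so bad records are easy to find.
            raise ValueError(f"line {line_number}: {exc!r}") from exc
        record = to_conversation_record(prompt, completion, system_text)
        # json.dumps without indent keeps each record on a single line.
        converted.append(json.dumps(record, ensure_ascii=False))
    return converted


def convert_file(source_path, target_path, system_text=None):
    """Read a prompt/completion JSONL file and write the converted file."""
    with open(source_path, encoding="utf-8") as src:
        converted = convert_lines(src, system_text)
    with open(target_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(converted) + "\n")


# Example call (hypothetical file names):
# convert_file("train_prompt_completion.jsonl", "train_conversation.jsonl",
#              system_text="You are a digital assistant with a friendly personality")
```

Running this locally before uploading to S3 at least confirms that every line is valid JSON with the expected keys; it does not check any other limits Bedrock may enforce.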