I uploaded a batch job with a unique custom_id for each row in my input file. The job passes validation but completes very quickly (about 10 minutes after creation, going by the timestamps below), and when I check it, only 276 of 4096 requests are completed (as shown in the example below).
I'm unsure what is going wrong here. No error is reported. I previously thought it might be a duplicate custom_id issue, but after resolving that I still face the same problem.
Below is an example of the batch data. It shows status='completed'; however,
request_counts=BatchRequestCounts(completed=276, failed=0, total=4096)
shows that only 276 requests were completed.
Batch(id='batch_02d3c78a-ba97-40e3-8646-e83099ba5dbb', completion_window='24h', created_at=1742003026, endpoint='/chat/completions', input_file_id='file-083b8e3f2f024dc9a34a06d6014679c9', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003648, error_file_id='file-c93301a5-0199-479e-8660-71426f22ce2d', errors=None, expired_at=None, expires_at=1742089426, failed_at=None, finalizing_at=1742003528, in_progress_at=1742003279, metadata=None, output_file_id='file-34f7b43e-956c-4bc7-9d0a-6699db9333c6', request_counts=BatchRequestCounts(completed=276, failed=0, total=4096))
I expected the batch output to return all responses in a single file with 4096 rows, but only 276 came back in the output file.
My rate limits are also very high, so I don't think that is the cause.
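To narrow this down, the input and output files could be compared by custom_id to see exactly which rows never came back. The following is only a minimal sketch, assuming both files are JSONL with a top-level custom_id field on every line; the function name and the sample data are hypothetical and not part of my actual script.

```python
import json


def missing_custom_ids(input_jsonl: str, output_jsonl: str) -> list[str]:
    """Return custom_ids present in the input text but absent from the output text."""
    def ids(text: str) -> list[str]:
        # One JSON object per non-empty line.
        return [json.loads(line)["custom_id"] for line in text.splitlines() if line.strip()]

    returned = set(ids(output_jsonl))
    # Preserve input order so the first missing row is easy to locate.
    return [cid for cid in ids(input_jsonl) if cid not in returned]


# Tiny made-up example: three requests submitted, one response returned.
sample_input = "\n".join(json.dumps({"custom_id": f"row-{i}"}) for i in range(3))
sample_output = json.dumps({"custom_id": "row-1", "response": {"status_code": 200}})
print(missing_custom_ids(sample_input, sample_output))
```

If the missing IDs share something obvious (for example, they are all the longest prompts, or they all come after a certain row), that would point at the input rather than at the service.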
I tried running GPT-4o-mini (Global Batch) on batches of roughly 5K training samples each. This is a single request taken from the file containing 4096 samples (~139 MB in size):
{
  "custom_id": "0_18_v2_fever_958611a10d8432c7ca51a59fca384dbc",
  "method": "POST",
  "url": "/chat/completions",
  "body": {
    "model": "gpt-4o-mini-batch",
    "messages": [
      {"role": "user", "content": "<my prompt here>"}
    ],
    "max_completion_tokens": 4096,
    "temperature": 0.1
  }
}
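For scale, 139 MB spread over 4096 rows averages roughly 35 KB per request, so individual prompts are fairly long. A quick sanity check of the input file could look like the sketch below. It assumes the uploaded file is JSONL with one such object per line (the pretty-printed form above is only for readability); the function name and sample rows are hypothetical.

```python
import json
from collections import Counter


def inspect_batch_input(jsonl_text: str) -> dict:
    """Summarise a batch input file given as text: row count, duplicates, sizes."""
    ids = []
    largest_line_bytes = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        request = json.loads(line)  # raises ValueError on a malformed row
        ids.append(request["custom_id"])
        largest_line_bytes = max(largest_line_bytes, len(line.encode("utf-8")))
    # custom_ids that appear more than once, sorted for stable output.
    duplicates = sorted(cid for cid, n in Counter(ids).items() if n > 1)
    return {
        "rows": len(ids),
        "duplicate_custom_ids": duplicates,
        "total_bytes": len(jsonl_text.encode("utf-8")),
        "largest_line_bytes": largest_line_bytes,
    }


# Made-up rows: "a" is deliberately duplicated.
sample_rows = "\n".join(
    json.dumps({"custom_id": cid, "method": "POST", "url": "/chat/completions"})
    for cid in ["a", "b", "a"]
)
report = inspect_batch_input(sample_rows)
print(report["rows"], report["duplicate_custom_ids"])
```

The largest_line_bytes value is included because a few unusually long rows are easier to spot this way than by looking at the file size alone.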
I also tried deploying a new model and re-uploading the files, but I got the same result: almost the same number of rows completed, only ~200. Below are the jobs I have run so far. Only the first run (the last entry in the list, with the earliest created_at) completed and returned all the required rows. Here are all the jobs:
Batch(id='batch_02d3c78a-ba97-40e3-8646-e83099ba5dbb', completion_window='24h', created_at=1742003026, endpoint='/chat/completions', input_file_id='file-083b8e3f2f024dc9a34a06d6014679c9', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003648, error_file_id='file-c93301a5-0199-479e-8660-71426f22ce2d', errors=None, expired_at=None, expires_at=1742089426, failed_at=None, finalizing_at=1742003528, in_progress_at=1742003279, metadata=None, output_file_id='file-34f7b43e-956c-4bc7-9d0a-6699db9333c6', request_counts=BatchRequestCounts(completed=276, failed=0, total=4096))
Batch(id='batch_15b67c2a-d941-43fa-94dd-d433ebdc94c4', completion_window='24h', created_at=1742003010, endpoint='/chat/completions', input_file_id='file-9dc4bf26713147aa98e19621fa0f907e', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003745, error_file_id='file-2be4eb9a-8abc-4312-801c-8e258fac607d', errors=None, expired_at=None, expires_at=1742089410, failed_at=None, finalizing_at=1742003631, in_progress_at=1742003279, metadata=None, output_file_id='file-417f3e4f-f299-4b3a-a195-827c8f4db1ca', request_counts=BatchRequestCounts(completed=265, failed=0, total=5000))
Batch(id='batch_9dccaf7c-1394-4778-aa07-aa02b280c772', completion_window='24h', created_at=1742002991, endpoint='/chat/completions', input_file_id='file-d7aaa72ebff440b19722cb9c9b8d205f', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003646, error_file_id='file-c2634ad4-99fc-4575-a6b3-d7b68492b97a', errors=None, expired_at=None, expires_at=1742089391, failed_at=None, finalizing_at=1742003527, in_progress_at=1742003293, metadata=None, output_file_id='file-6b9c6de8-4514-4b94-a42c-b1a1cbb1c635', request_counts=BatchRequestCounts(completed=275, failed=0, total=5000))
Batch(id='batch_e945eb8c-31a9-4a07-a415-356d1a064fe2', completion_window='24h', created_at=1742002970, endpoint='/chat/completions', input_file_id='file-e337e5ac8c06408ba67cae7b624f0c28', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003757, error_file_id='file-414ee824-b070-4bb1-9d80-14dd65a4aea2', errors=None, expired_at=None, expires_at=1742089370, failed_at=None, finalizing_at=1742003631, in_progress_at=1742003281, metadata=None, output_file_id='file-07ff6f4e-ab4b-4d15-9932-b84058f70736', request_counts=BatchRequestCounts(completed=272, failed=0, total=5000))
Batch(id='batch_cf622edb-2670-4ff2-9ee9-bd25bb35a3a0', completion_window='24h', created_at=1742002952, endpoint='/chat/completions', input_file_id='file-c88451a73f434a3b85bc0f95c21be385', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003753, error_file_id='file-f6d2900c-fb52-4c13-9bba-4cce1ffaec14', errors=None, expired_at=None, expires_at=1742089352, failed_at=None, finalizing_at=1742003631, in_progress_at=1742003289, metadata=None, output_file_id='file-9af3869e-fff3-45a4-b7a4-c38c9f2f9bac', request_counts=BatchRequestCounts(completed=270, failed=0, total=5000))
Batch(id='batch_40d32e10-a5a9-4c8c-93fc-03179afcfe78', completion_window='24h', created_at=1742002931, endpoint='/chat/completions', input_file_id='file-f1f64a0a175044e180afc9e9396d67d9', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003640, error_file_id='file-6d878431-9fca-44f8-a849-6972505d4d4a', errors=None, expired_at=None, expires_at=1742089331, failed_at=None, finalizing_at=1742003528, in_progress_at=1742003278, metadata=None, output_file_id='file-6f8cd787-c3b2-41bf-b7ef-0eba738899e8', request_counts=BatchRequestCounts(completed=277, failed=0, total=5000))
...
Batch(id='batch_23204b19-99e7-4ac3-8dd1-a70929280323', completion_window='24h', created_at=1741826257, endpoint='/chat/completions', input_file_id='file-3e06cf290fb24b0592154510e54b8809', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1741828981, error_file_id='file-da5bbdd7-6872-4379-b0fb-4a0376b6e6cc', errors=None, expired_at=None, expires_at=1741912657, failed_at=None, finalizing_at=1741828830, in_progress_at=1741828482, metadata=None, output_file_id='file-95378806-a0bc-4d80-a27a-eb3171ad9f70', request_counts=BatchRequestCounts(completed=5243, failed=0, total=5243))
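I uploaded the batch with Python. Here is a simplified version of the script:
import os
from openai import AzureOpenAI

# Optional overrides; when left as None the environment variables are used.
endpoint = api_key = api_version = None

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT") if endpoint is None else endpoint,
    api_key=os.getenv("AZURE_OPENAI_API_KEY") if api_key is None else api_key,
    api_version=os.getenv("AZURE_OPENAI_API_VERSION") if api_version is None else api_version,
)

# final_filepath is the path to the JSONL input file (defined elsewhere).
# Upload the file, closing the handle once the upload finishes.
with open(final_filepath, "rb") as f:
    file = client.files.create(file=f, purpose="batch")
file_id = file.id

# Create the batch job from the uploaded file.
batch_response = client.batches.create(
    input_file_id=file_id,
    endpoint="/chat/completions",
    completion_window="24h",
)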
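One thing I notice in the job list is that every batch has a non-None error_file_id even though failed=0, so the contents of that file may explain the missing rows. Below is a rough sketch of how it could be inspected. The layout of an error-file line is an assumption here (an "error" object either at the top level or nested under response.body), the helper names are hypothetical, and the error codes in the sample are made up.

```python
import json
from collections import Counter


def summarize_error_lines(error_jsonl: str) -> Counter:
    """Count error codes found in the text of a batch error file."""
    codes = Counter()
    for line in error_jsonl.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        # Assumed layout: a top-level "error" object, else one nested in
        # response.body; rows matching neither are counted as "unknown".
        nested = ((row.get("response") or {}).get("body") or {}).get("error")
        error = row.get("error") or nested or {}
        codes[str(error.get("code", "unknown"))] += 1
    return codes


def fetch_error_summary(client, batch_id: str) -> Counter:
    """Download the error file of a batch (if it has one) and summarise it."""
    batch = client.batches.retrieve(batch_id)
    if not batch.error_file_id:
        return Counter()
    return summarize_error_lines(client.files.content(batch.error_file_id).text)


# Made-up error-file content to exercise the parser without any network call.
sample_errors = "\n".join([
    json.dumps({"custom_id": "row-0", "error": {"code": "code_a", "message": "made up"}}),
    json.dumps({"custom_id": "row-2", "response": {"status_code": 400,
                "body": {"error": {"code": "code_a"}}}}),
    json.dumps({"custom_id": "row-3"}),
])
print(summarize_error_lines(sample_errors))
```

With the client from the script above, fetch_error_summary(client, batch_response.id) would return the per-code counts once the job has finished.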