After migrating from Gen 1 to Gen 2 on Azure Data Lake, there have been a lot of updates on the Azure libraries side.
Can anyone please provide up-to-date sample code to read and write CSV files from Azure ML Studio to CDL (Data Lake) Gen 2 using Azure Datastores in Python? Also, please point me to resources where I can read about these Gen 2 changes and about the Azure libraries that need to be installed for this.
Asked by Neo.
Comment: Refer to this: learn.microsoft/en-us/azure/synapse-analytics/… – Venkatesan
1 Answer
You can use the code below to read and write CSV files between Azure Machine Learning (Azure ML) Studio and Azure Data Lake Storage Gen2 using Azure Datastores in Python. It relies on the azure-ai-ml and azure-identity packages.
Register the Azure Data Lake Gen2 account as a datastore:
Code:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.identity import DefaultAzureCredential
# Authenticate with Azure ML Workspace
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential=credential,
    subscription_id="xxxx",
    resource_group_name="xxx",
    workspace_name="xx",
)
# Define the ADLS Gen2 Datastore
datastore = AzureDataLakeGen2Datastore(
    name="sampledatastore",
    account_name="xxx",
    filesystem="xxx",
)
# Register the Datastore
ml_client.datastores.create_or_update(datastore)
print("Datastore registered successfully!")
To read a CSV file from the datastore:
Code:
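# Reading azureml:// URIs with pandas requires the azureml-fsspec package
# (pip install azureml-fsspec)
import pandas as pd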
# Define path to the CSV file in ADLS Gen2
csv_path = "azureml://subscriptions/xxxxx/resourcegroups/vexxxx/workspaces/xxxx/datastores/xxxx/paths/003.csv"
df = pd.read_csv(csv_path)
print(df.head())
Output:
CATEGORY TIME INDICATOR \
0 Rankings 2016.0 NaN
1 NaN NaN Health Outcomes - Rank
2 NaN NaN Health Outcomes - Quartile
3 NaN NaN Health Factors - Rank
4 NaN NaN Health Factors - Quartile
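The long datastore URI passed to pd.read_csv above is easy to mistype, so it can help to assemble it from its parts. The following is a minimal sketch, not part of the Azure SDK: build_datastore_uri is a hypothetical helper name, and it only assumes the URI layout shown in the read example (subscriptions / resourcegroups / workspaces / datastores / paths).

```python
def build_datastore_uri(subscription_id: str, resource_group: str,
                        workspace: str, datastore: str,
                        relative_path: str) -> str:
    """Assemble a long-form azureml:// datastore URI from its parts.

    Hypothetical helper: it only formats a string and does not contact Azure.
    """
    parts = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace": workspace,
        "datastore": datastore,
        "relative_path": relative_path,
    }
    # Fail early on empty components instead of producing a malformed URI.
    for label, value in parts.items():
        if not value:
            raise ValueError(f"{label} must not be empty")

    # Drop leading slashes so the path segment is not doubled after /paths/.
    clean_path = relative_path.lstrip("/")
    return (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}"
        f"/datastores/{datastore}"
        f"/paths/{clean_path}"
    )
```

The resulting string can then be passed to pd.read_csv in place of the hand-written csv_path.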
To write a CSV file to the datastore:
Code:
import pandas as pd
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Score": [90, 85]
})
# Save to local temporary CSV file
df.to_csv("sample.csv", index=False)
# Upload it to the datastore (ml_client and datastore come from the registration step)
data_asset = Data(
    path="sample.csv",
    type=AssetTypes.URI_FILE,
    name="csv-upload",
    description="Sample CSV upload to Data Lake",
    datastore=datastore.name
)
uploaded_data = ml_client.data.create_or_update(data_asset)
print("CSV uploaded to:", uploaded_data.path)
Output:
Uploading sample.csv (< 1 MB): 27.0B [00:00, 78.8B/s]
CSV uploaded to: azureml://subscriptionsxxx/resourcegroups/xx/workspaces/xxce/datastores/xxx/paths/LocalUpload/99xxx4/sample.csv
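The uploaded path printed above has the same azureml:// shape as the URI used for reading, so it can be split back into its components, for example to log which datastore and relative path the file ended up in. This is only a sketch under that assumption: parse_datastore_uri and DatastorePath are hypothetical names, not SDK functions, and the regular expression simply mirrors the layout visible in the output.

```python
import re
from typing import NamedTuple


class DatastorePath(NamedTuple):
    """Components of a long-form azureml:// datastore URI."""
    subscription_id: str
    resource_group: str
    workspace: str
    datastore: str
    relative_path: str


# Mirrors azureml://subscriptions/<id>/resourcegroups/<rg>/workspaces/<ws>/datastores/<ds>/paths/<path>
_URI_PATTERN = re.compile(
    r"^azureml://subscriptions/(?P<subscription_id>[^/]+)"
    r"/resourcegroups/(?P<resource_group>[^/]+)"
    r"/workspaces/(?P<workspace>[^/]+)"
    r"/datastores/(?P<datastore>[^/]+)"
    r"/paths/(?P<relative_path>.+)$",
    re.IGNORECASE,
)


def parse_datastore_uri(uri: str) -> DatastorePath:
    """Split a datastore URI into its parts, or raise ValueError if it does not match."""
    match = _URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"Not a long-form azureml datastore URI: {uri!r}")
    return DatastorePath(**match.groupdict())
```

For example, parse_datastore_uri(uploaded_data.path).relative_path would give the LocalUpload/… portion of the path, assuming the returned path is in the long form shown in the output.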
Reference: Use datastores - Azure Machine Learning | Microsoft Learn