
Parquet files as Delta Table read by Azure Databricks from Azure Storage on public network or Microsoft network? - Stack Overflow


We created a Delta Table in Azure Databricks, with the Parquet files stored in Azure Storage. Is this data read over the public network or the Microsoft network? Currently, neither Azure Storage nor Azure Databricks is in any of our VNets. Would adding both of them improve read speed? Would creating a Private Endpoint on Azure Storage ensure reads go through the Microsoft network?


asked Mar 23 at 11:40 by knowdotnet
  • Can you please share what approach you have tried so far? – Dileep Raj Narayan Thumula, Mar 24 at 3:25

1 Answer (score: 3)

If your Azure Storage and Azure Databricks are not in any VNet, the data read is happening over the public network. To make sure that the data read happens over the Microsoft network, you can use Azure Private Link to create private endpoints for both Azure Storage and Azure Databricks.

Creating private endpoints will make sure that the traffic between Azure Databricks and Azure Storage remains within the Microsoft network, which can improve security and potentially improve read speed by avoiding the public internet.
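One quick way to verify that a private endpoint is actually being used is to check what the storage account's DFS endpoint resolves to from inside the Databricks VNet: with a private endpoint and the matching `privatelink` DNS zone in place, the hostname should resolve to a private address rather than a public one. A minimal sketch (the hostname is a placeholder):

```python
import ipaddress
import socket


def resolves_privately(hostname: str) -> bool:
    """Return True if the hostname resolves to a private/loopback address.

    From inside the VNet, a storage account fronted by a private endpoint
    (with the privatelink DNS zone linked) should resolve to an address in
    the VNet's private range; a public IP means traffic still leaves via
    the public endpoint.
    """
    addr = socket.gethostbyname(hostname)
    return ipaddress.ip_address(addr).is_private


# Example (placeholder account name):
# resolves_privately("<storage-account>.dfs.core.windows.net")
```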

ADLS Gen2 operates on a shared architecture. To securely access it from Azure Databricks, there are two available options:

  1. Service Endpoints
  2. Azure Private Link

You can choose either of the above approaches. In both cases, securing access between Azure Databricks (ADB) and ADLS Gen2 requires the ADB workspace to be VNet-injected, regardless of the approach used.

When a storage account is configured with a private endpoint, its firewall is enabled by default. To allow access, the VNet and subnets used by Databricks must be added to the storage account's firewall settings (under Networking → Firewalls and virtual networks).

After this, you can mount the ADLS Gen2 storage. However, to read files from a folder, you also need to manage ACLs for both the container and the files.
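The mount step can be sketched as below. This is a minimal illustration, not the asker's exact setup: the client id, secret scope, tenant id, container, and storage account names are all placeholders, and `dbutils` is only available inside a Databricks notebook (so the actual mount call is shown commented out).

```python
def build_oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Build the Spark configs Databricks expects for an OAuth-based
    ADLS Gen2 mount using a service principal (client credentials flow)."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


configs = build_oauth_configs("<app-client-id>", "<client-secret>", "<tenant-id>")

# Inside a Databricks notebook:
# dbutils.fs.mount(
#     source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
#     mount_point="/mnt/delta",
#     extra_configs=configs,
# )
```

In practice the secret would come from a Databricks secret scope (`dbutils.secrets.get`) rather than being hard-coded.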

The same can be done for individual files: right-click the file that needs to be accessed from the Databricks notebook and manage its access (ACL) there.

Learn more: Secure Access to Storage: Azure Databricks and Azure Data Lake Storage Gen2 Patterns

Deploy Azure Databricks in your Azure virtual network (VNet injection)
