I'm struggling with an Azure Data Factory issue and I'm not exactly sure how to proceed. Full disclosure: I've never used ADF before this project.
We have a SQL Server database in an on-prem data center, and an Azure application in a different domain that needs to access data from it. The data doesn't need to be real time. However, the network security team doesn't want us to use SQL Server authentication against this database, so we can't query it directly.
Another team is using Azure Data Factory to pull data from this database and load it into an Azure SQL database; while we can't add to their process, we can mirror it. So we're looking to also use ADF to copy data into an Azure SQL database, which our application can then use.
As of now, the self-hosted integration runtime (IR) in the on-prem data center can't reach the Azure SQL database, which is pretty locked down. It's been a challenge to get this working even for testing, and I'm not convinced our network security team would allow it anyway; it would mean skipping the private endpoint and making the database publicly accessible with IP allowlists, etc.
Since the on-prem IR can't hit Azure SQL, the Copy Data activity doesn't work, so I'm trying to come up with an alternative that doesn't involve writing a file to Blob Storage and then reading that file back in. To add more complexity, the total number of records is greater than the Lookup activity's maximum, so I'm looking at having to implement a paging strategy.
I tried having the stored procedure return JSON, in the hopes of parsing the JSON in a Script activity to do an insert, but the JSON is too big for the Lookup activity's maximum size.
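For what it's worth, that attempt looked roughly like this (a simplified sketch; the table and column names are placeholders, not the real schema):

```sql
-- Simplified sketch of the JSON attempt (placeholder names): return the whole
-- result set as one JSON string. In practice the output exceeded the Lookup
-- activity's size limit.
SELECT s.Id, s.SomeColumn, s.AnotherColumn
FROM dbo.SourceTable AS s
FOR JSON PATH, ROOT('rows');
```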
Currently, I have:
- A pipeline with a couple of variables: CurrentPage and PageSize.
- A Lookup activity that calls a stored procedure, passing those two variables, and returns a single page of data along with the total record count (a simplified sketch of the proc is below this list).
- A ForEach activity that iterates over the output of the Lookup activity.
- Within the ForEach activity, a Script activity that does a simple INSERT into the correct Azure SQL table.
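For context, the stored procedure behind the Lookup is along these lines (simplified, with placeholder object names rather than the real ones):

```sql
-- Simplified sketch of the paging stored procedure (placeholder names).
-- Returns one page of rows plus the total row count so the pipeline can
-- tell whether another page remains.
CREATE OR ALTER PROCEDURE dbo.GetSourceDataPage
    @CurrentPage INT,
    @PageSize    INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        s.Id,
        s.SomeColumn,
        s.AnotherColumn,
        COUNT(*) OVER () AS TotalRecords   -- total across all pages
    FROM dbo.SourceTable AS s
    ORDER BY s.Id
    OFFSET (@CurrentPage - 1) * @PageSize ROWS
    FETCH NEXT @PageSize ROWS ONLY;
END
```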
This works, although I haven't fully implemented the paging piece, where the pipeline loops until there's no more data. However, it doesn't do any merging of the data, and I don't want to keep blindly inserting.
I could call a stored procedure to handle the merge instead, but then I'd be calling it many times inside the ForEach.
Is there any way to run a Copy Data activity using the output of the Lookup activity as its source?
Is there any way to do a SQL MERGE in the Script activity? I guess I could write an IF EXISTS check that does either the insert or the update, but I'm hoping for something simpler.
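To make the question concrete, this is the kind of statement I'd want the Script activity to run per row, with the values coming from item() via the Script activity's parameters (again, table and column names are placeholders):

```sql
-- Hypothetical per-row upsert I'd like to run from the Script activity
-- (placeholder names; @Id, @SomeColumn, @AnotherColumn would be supplied
-- as Script activity parameters from the ForEach item).
MERGE dbo.TargetTable AS tgt
USING (SELECT @Id AS Id, @SomeColumn AS SomeColumn, @AnotherColumn AS AnotherColumn) AS src
    ON tgt.Id = src.Id
WHEN MATCHED THEN
    UPDATE SET tgt.SomeColumn    = src.SomeColumn,
               tgt.AnotherColumn = src.AnotherColumn
WHEN NOT MATCHED THEN
    INSERT (Id, SomeColumn, AnotherColumn)
    VALUES (src.Id, src.SomeColumn, src.AnotherColumn);
```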
I've also briefly tried Data Flows, but wasn't able to get the data out of SQL Server via a stored procedure.
I feel like I'm spinning my wheels here. Any suggestions are welcome. If I've left any information out, please let me know and I'll try to get it.
Thanks in advance.