SSAS Tabular 1600 - Partition Processing Issue - Random Empty Partitions During Full Parallel Processing
Details:
We have a fact table with 800 million rows, split into 25 partitions by Month Index (dynamic partitions).
We have a TMSL script (below) running in an Agent Job (SQL Server) which processes 5 partitions in parallel, in 5 series (5 x 5 = 25).
Connection to the source is via the "Modern" SQL driver: Microsoft.Mashup.OleDb.1 provider.
Issue:
If we attempt to fully process the cube (all partitions or one batch at a time), we observe the following:
- All partitions in scope are executed.
- All partitions in scope are running in parallel.
- On the SQL source, we can see open and active sessions.
- They all run for 25+ minutes.
- However, at the end, we end up with some partitions are not-refreshed.
For example, a batch has 5 partitions: 5 were executed, 5 were running, 5 sessions were open. But only 3 partitions have data out of 5, on the end. No error. Or 4 are refreshed with data and 1 is not refreshed. On a second attempt, another random 3 out of 5 may not be refreshed (not even the same partitions) and two others will be OK.
The process always ends up successfully without any error. Yet, some partitions at the end may be empty.
More Details:
Monitoring the SQL Server source, the only suspicious thing is that this happens if the SQL source server is under stress, and some sessions/partitions exhibit this wait type: "CXSYNCPORT".
However, all job steps end without any error! They end up successfully. Apparently, standalone connection sessions for each partition were running for some time (confirmed by monitoring). But in the end, some partitions are just empty.
Are Data in Source?
Yes, there is data in the source for those partitions, because if we run the same command again, by chance, there could be a full successful refresh with all partitions and all data. This randomness prevents us from using this solution in production. This uncertainty issue, without any apparent error, is something we cannot find any reference for on the internet.
Processing Cube Manually in SSMS:
Processing the cube manually in SSMS did not help. It does not matter if the cube is processed by TMSL script in a job or manually through SSMS. The result is the same randomness in refresh, no error, yet some partitions are empty.
Attempt to solve the issue:
limit number of parallel execution, opened session, partitions processed at once. Seems like this helps for some time, unless SQL source is in heavy stress, then even less partitions in parallel does not matter.
I do not know if this is SQL source issue? Or SSAS issue, or particularly Mashup driver issue. However still there is no error and process pretend to end successfully. Even it is not true.
Snippet of TMSL from JOB step:
{
"sequence": {
"maxParallelism": 5,
"operations": [
{
"refresh": {
"type": "full",
"objects": [
{
"database": "Cube",
"table": "Fact Detail",
"partition": "MnthIndx 0"
}
]
}
},
.....
.....
.....
{
"refresh": {
"type": "full",
"objects": [
{
"database": "Cube",
"table": "Fact Detail",
"partition": "MnthIndx 5"
}
]
}
}
]
}
}