pyspark - Data lineage information for flat files (Parquet, Delta path, or CSV) rather than tables in Databricks

We know that Unity Catalog can track lineage for tables, but what if we only have flat files referenced by path, such as CSV, Parquet, or even a Delta path?

In those cases, how can we get lineage?

I am looking for a way to track lineage for files rather than for a Delta table or any other table; the files could be Parquet, Delta at a path, or CSV. I often need to see which files a given file originated from. In our case we have multiple CSV and Parquet files, but when we ingest data we don't know which input files produce which output files.

What I want is something like Catalog Explorer, with lineage tracking for input and output files rather than only tables, and a way to get that lineage information through an API.

Something like:
/Path/file1.csv -> /Path/file2.csv, with the rest being column-level information.
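To make the shape of the result concrete, here is a rough sketch of the kind of record I have in mind. This is plain Python and not a Databricks or Unity Catalog API; the class names `FileLineage` and `ColumnEdge` and the column names `id` and `customer_id` are hypothetical and only there for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ColumnEdge:
    """Hypothetical column-level link: one source column feeding one target column."""
    source_column: str
    target_column: str


@dataclass
class FileLineage:
    """Hypothetical file-level link between an input path and an output path."""
    source_path: str
    target_path: str
    columns: List[ColumnEdge] = field(default_factory=list)

    def describe(self) -> str:
        # Render the file-level edge in the "input -> output" form used above.
        return f"{self.source_path} -> {self.target_path}"


# Example record; the column names are made up for illustration.
edge = FileLineage(
    source_path="/Path/file1.csv",
    target_path="/Path/file2.csv",
    columns=[ColumnEdge(source_column="id", target_column="customer_id")],
)
print(edge.describe())
```

Ideally the lineage API would return a list of such records, one per input/output file pair.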

I can't figure out how to find lineage details for flat files, or whether it is even possible with current Databricks features.

Any help would be really useful in my exploration, thank you.
