
pyspark - Data lineage information for flat files such as Parquet, Delta paths, or CSV files rather than a table in Databricks


We know that Unity Catalog can track table lineage, but what if we only have flat files referenced by path, such as CSV, Parquet, or even a Delta path?

In that case, how is it possible to get lineage?

I am looking for a way to track lineage for files rather than for a Delta table or any other table. The files could be Parquet, Delta at a path, or CSV. I often need to see which files a given file originated from. In our case we have multiple CSV and Parquet files, but when we ingest data we don't know which input files produce which output files.

The solution I want is something like Catalog Explorer, with lineage tracking for input and output files rather than tables, and with that lineage information available through an API.

Something like
/Path/file1.csv -> /Path/file2.csv, with the rest being column-level information.
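To make the shape of what I am after more concrete, here is a minimal plain-Python sketch of a file-level lineage record. It is not a Databricks or Unity Catalog API; `FileLineageGraph`, `record`, `upstream`, the column names, and the path `/Path/file3.parquet` are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ColumnEdge:
    """One column-level mapping from a source file to a target file."""
    source_column: str
    target_column: str


@dataclass
class FileLineageEdge:
    """One file-level edge, e.g. /Path/file1.csv -> /Path/file2.csv."""
    source_path: str
    target_path: str
    columns: List[ColumnEdge] = field(default_factory=list)


class FileLineageGraph:
    """Hypothetical in-memory store of file-to-file lineage edges."""

    def __init__(self) -> None:
        self._edges: List[FileLineageEdge] = []

    def record(
        self,
        source_path: str,
        target_path: str,
        column_map: Optional[Dict[str, str]] = None,
    ) -> FileLineageEdge:
        """Register that target_path was produced from source_path."""
        columns = [ColumnEdge(s, t) for s, t in (column_map or {}).items()]
        edge = FileLineageEdge(source_path, target_path, columns)
        self._edges.append(edge)
        return edge

    def upstream(self, target_path: str) -> List[str]:
        """Return all files that feed target_path, directly or indirectly."""
        seen = set()
        stack = [target_path]
        while stack:
            current = stack.pop()
            for edge in self._edges:
                if edge.target_path == current and edge.source_path not in seen:
                    seen.add(edge.source_path)
                    stack.append(edge.source_path)
        return sorted(seen)


# Example usage with made-up paths and column names.
graph = FileLineageGraph()
graph.record("/Path/file1.csv", "/Path/file2.csv", {"id": "id", "amt": "amount"})
graph.record("/Path/file2.csv", "/Path/file3.parquet")
print(graph.upstream("/Path/file3.parquet"))
```

Here the ingestion job would have to call `record` itself for every read and write, which is exactly the manual bookkeeping I am hoping a built-in feature can replace.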

I can't figure out how to find lineage details for flat files, or whether it is even possible with current Databricks features.

Any help would be really useful in my exploration. Thank you.
