I am building a RAG pipeline using LangChain. Everything works fine with PDFs.
However, I now need to ingest Excel (.xlsx) spreadsheets, locate the table inside each one, and then parse that table for the RAG pipeline.
I can't find a way to automatically detect and remove empty rows and columns, merged cells used for titles or free text, and similar layout noise. Are there open-source libraries that can help with this? Or is there something in LangChain itself that could help?
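For concreteness, here is a minimal sketch of the kind of cleanup I mean. It assumes openpyxl (not mentioned above) and a sheet that holds a single table. The names `extract_table` and `build_sample_workbook` are hypothetical. Copying a merged cell's value into every cell it spans is just one possible convention, and the sketch does not try to separate several tables that share one sheet.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def extract_table(source, sheet_name=None):
    """Return the non-blank grid of a worksheet as a list of rows (hypothetical helper)."""
    # data_only=True returns cached formula results instead of formula strings.
    wb = load_workbook(source, data_only=True)
    ws = wb[sheet_name] if sheet_name else wb.active

    # Unmerge every merged range and copy its top-left value into each cell it covered.
    # Iterate over a copy, because unmerging mutates ws.merged_cells.
    for rng in list(ws.merged_cells.ranges):
        min_col, min_row, max_col, max_row = rng.bounds
        value = ws.cell(row=min_row, column=min_col).value
        ws.unmerge_cells(str(rng))
        for row in ws.iter_rows(min_row=min_row, max_row=max_row,
                                min_col=min_col, max_col=max_col):
            for cell in row:
                cell.value = value

    grid = [list(row) for row in ws.iter_rows(values_only=True)]

    def is_blank(v):
        # Treat None and whitespace-only strings as empty.
        return v is None or (isinstance(v, str) and not v.strip())

    # Drop rows that are entirely blank.
    grid = [row for row in grid if not all(is_blank(v) for v in row)]
    if not grid:
        return []
    # Drop columns that are blank in every remaining row.
    keep = [j for j in range(len(grid[0]))
            if not all(is_blank(row[j]) for row in grid)]
    return [[row[j] for j in keep] for row in grid]


def build_sample_workbook():
    """Hypothetical sample: a merged title, blank padding, and a small table."""
    wb = Workbook()
    ws = wb.active
    ws["B2"] = "Quarterly sales"
    ws.merge_cells("B2:D2")
    ws["B4"], ws["C4"], ws["D4"] = "Region", "Q1", "Q2"
    ws["B5"], ws["C5"], ws["D5"] = "North", 10, 12
    buf = BytesIO()
    wb.save(buf)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for row in extract_table(build_sample_workbook()):
        print(row)
```

On the sample, blank row 1, blank row 3, and blank column A are dropped, and the merged title is repeated across the three table columns. The resulting rows would still need to be serialized to text (for example as CSV or Markdown) before being turned into LangChain documents.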