Our platform needs to analyze HDFS storage usage, so we have a scheduled job in Airflow. Every day it fetches the latest fsimage from the NameNode, uses oiv to convert the fsimage to a CSV file, and loads the CSV into Hive so we can analyze the files in HDFS. The problem is that converting the fsimage to CSV with oiv is too slow: our latest fsimage is 18 GB, we run oiv from code with -Xmx64g, and the conversion takes more than an hour and sometimes fails. How can I speed up the conversion?
The code that runs oiv looks like this:
String[] oivArgs = {"-i", fsimageMetadata.getAbsolutePath(), "-o", csv, "-p", "Delimited"};
int exitCode = OfflineImageViewerPB.run(oivArgs);
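To make the "1 hour+" and "sometimes fails" observations easier to reproduce, the call could be wrapped so that every run logs its duration and a non-zero exit code is surfaced as an error. This is only a minimal sketch: `TimedOiv` and its `Converter` interface are hypothetical names of ours, not part of Hadoop, and the stub in `main` stands in for the real oiv call so the example runs without Hadoop on the classpath.

```java
import java.time.Duration;

// Hypothetical helper (not a Hadoop class): times one conversion and fails loudly.
class TimedOiv {
    // Stand-in for a call such as OfflineImageViewerPB.run(args), which may throw.
    interface Converter {
        int run(String[] args) throws Exception;
    }

    static Duration runTimed(String[] oivArgs, Converter converter) {
        long start = System.nanoTime();
        int exitCode;
        try {
            exitCode = converter.run(oivArgs);
        } catch (Exception e) {
            // Wrap checked exceptions so callers see a single failure type.
            throw new IllegalStateException("oiv threw an exception", e);
        }
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
        if (exitCode != 0) {
            throw new IllegalStateException(
                "oiv exited with code " + exitCode + " after " + elapsed.getSeconds() + "s");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Placeholder paths; the real job would pass the fsimage and CSV locations.
        String[] oivArgs = {"-i", "/path/to/fsimage", "-o", "/path/to/out.csv", "-p", "Delimited"};
        // Stub converter that always succeeds, used here instead of the real oiv call.
        Duration elapsed = runTimed(oivArgs, a -> 0);
        System.out.println("conversion took " + elapsed.toMillis() + " ms");
    }
}
```

In the real job the stub would be replaced with a method reference, for example `TimedOiv.runTimed(oivArgs, OfflineImageViewerPB::run)`.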
We run oiv from code because, in our tests, the hdfs oiv... command ran with a JVM -Xmx of only 8g, whereas running it from code lets us set -Xmx higher than 8g.
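For comparison, here is a rough sketch of how the CLI could be launched from Java with a larger heap. It assumes the hdfs launcher script picks up the HADOOP_CLIENT_OPTS environment variable for the oiv subcommand; we have not confirmed that for our Hadoop version. `OivCli` and its parameters are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: prepares (but does not start) an "hdfs oiv" process.
class OivCli {
    static ProcessBuilder build(String fsimagePath, String csvPath, String maxHeap) {
        List<String> command = Arrays.asList(
            "hdfs", "oiv", "-i", fsimagePath, "-o", csvPath, "-p", "Delimited");
        ProcessBuilder pb = new ProcessBuilder(command);
        // Assumption: the hdfs script passes HADOOP_CLIENT_OPTS to the oiv JVM.
        // Note that this overwrites any value inherited from the parent environment.
        pb.environment().put("HADOOP_CLIENT_OPTS", "-Xmx" + maxHeap);
        // Forward the child's stdout/stderr to this process once it is started.
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) {
        // Placeholder paths; only prints the planned invocation, nothing is executed.
        ProcessBuilder pb = build("/path/to/fsimage", "/path/to/out.csv", "64g");
        System.out.println(String.join(" ", pb.command()));
        System.out.println("HADOOP_CLIENT_OPTS=" + pb.environment().get("HADOOP_CLIENT_OPTS"));
    }
}
```

Calling `pb.start().waitFor()` on the returned builder would run the command and give back its exit code, so the heap size could be compared directly against the in-process run.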