I'm wondering if anyone has experience running a complete Data Vault setup on Hadoop?
I'm talking about:
- a banking data model (a lot of custom logic in the BDV);
- storing PSA, RDV, and BDV in Parquet format (or any other);
- loading RDV/BDV with pySpark / SparkSQL;
- accessing the data from BI reporting tools like Cognos / Tableau;
- process orchestration with Airflow.
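For context, the kind of RDV load logic I mean is standard Data Vault hashing: derive a hub hash key from the normalized business key(s), then use it to join hubs, links, and satellites. A minimal pure-Python sketch of just the hashing step (the same logic would sit in a pySpark UDF or be done with Spark's built-in `md5`/`concat_ws`; the delimiter and normalization rules below are my assumptions, not a fixed standard):

```python
import hashlib

def hub_hash_key(*business_keys: str) -> str:
    """Derive a Data Vault hash key from one or more business keys.

    Normalization (trim + uppercase) and the '||' delimiter are
    illustrative conventions -- pick one and apply it everywhere,
    or keys hashed in different layers will never match.
    """
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# e.g. a customer hub keyed on customer number:
# whitespace/case differences must not produce different keys
assert hub_hash_key(" 10042 ") == hub_hash_key("10042")
```

The delimiter matters: without it, the composite keys ("AB", "C") and ("A", "BC") would hash identically.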
My biggest doubt right now: would it load and query effectively, keeping in mind that Hadoop "likes" big, wide datasets, while a Data Vault model is rather a set of many small, narrow tables?
Any suggestions/advice/experience would be greatly appreciated.