
Complete Data Vault setup on Hadoop

Hi Everyone,

Does anyone have experience running a complete Data Vault setup on Hadoop?

I'm talking about: 

  • a banking data model (a lot of custom logic in the BDV);
  • storing the PSA, RDV and BDV in Parquet format (or any other format);
  • loading the RDV/BDV with PySpark / Spark SQL;
  • accessing the data from BI reporting tools like Cognos / Tableau;
  • process orchestration with Airflow.
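
To make the loading step a bit more concrete, here is a minimal sketch of an insert-only hub load. It is written in plain Python rather than PySpark so that it runs anywhere. All names (`hash_key`, `load_hub`, `hub_customer`, `customer_no`, `core_banking`) are made up for illustration, and hashing a trimmed, upper-cased business key with MD5 is an assumed convention, not a fixed rule.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Hash the normalized business key(s) into a hub hash key (assumed convention: MD5)."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_hub(hub: dict, staged_rows: list, key_column: str, record_source: str) -> int:
    """Insert-only load: add business keys the hub has not seen yet; return how many were added."""
    load_ts = datetime.now(timezone.utc)
    inserted = 0
    for row in staged_rows:
        hk = hash_key(row[key_column])
        if hk not in hub:  # existing keys are never updated or deleted
            hub[hk] = {
                "business_key": row[key_column].strip().upper(),
                "load_ts": load_ts,
                "record_source": record_source,
            }
            inserted += 1
    return inserted


# Hypothetical staged rows; the first two differ only in case and whitespace.
hub_customer = {}
staged = [{"customer_no": "c-001 "}, {"customer_no": "C-001"}, {"customer_no": "C-002"}]
inserted = load_hub(hub_customer, staged, "customer_no", "core_banking")
```

In PySpark the same idea would typically be a hash expression over the business key column followed by an anti-join against the existing hub, so every hub load costs at least one join against stored data.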

My biggest doubt right now is whether it would load and query efficiently, keeping in mind that Hadoop "likes" big, wide datasets, while a DV model is rather a set of small tables.


Any suggestions/advice/experience would be greatly appreciated.



5 Replies