Clear all

Azure SQL Data Warehouse, hashing and the potential overuse Bridge Tables

Posts: 18
Topic starter
Joined: 1 year ago

Hi All,

Just curious if there could be a concern of overusing bridge tables? I'm finding scenarios where they're needed to help performance, but I want to make sure I'm not missing something in my overall design and methodology. We have an Azure SQL Data Warehouse based DV2.0 model for healthcare data from 3 hospital/clinic medical record systems. In Azure SQL Data Warehouse we must specify the distribution of our tables with the goal to minimize data movement (broadcasting and shuffling operations). I'm finding many cases where it makes sense to hash one way for the raw data vault to get the latest version of a record (we have a few driving key scenarios and resolved that with LinkSatEffective tables and a PIT table for the link) but there would be a different hash selection combining objects to generate SQL for the facts in the information mart where more than one link and/or link satellite must be combined. I've attached a diagram showing our raw and business vault tables needed to create FactVisit (one line per hospital or clinic VisitID). For that visit we'll want to get it's costs, it's primary diagnosis, it's primary procedure, a sum of charges, and more. I'd prefer to create a single bridge shown in the second diagram. This SQL is a little lengthy but relatively clean, however performance is incredibly poor because the many different links, link sat effective tables, hubs and PIT tables all hashed differently. My next idea was diagram 3. I'd create multiple bridge tables all hashed on Visit Key. It will perform fine, however it's strikingly similar to the links that it's primarily built from. It does make the query to populate the fact table perform better (all bridges are hashed the same) and it's far fewer joins. I just didn't like that it was an object that was so similar to it's parent link. I don't think I'd have this issue were it not for the hashing AND the driving key scenarios necessitating a PIT table to resolve the latest link records. I attempted to do this virtually and that was a big flow. I'd like create the single bridge and be done with it, but oh well. Does anyone have ideas aside from getting a better database :)?


Example Visits DV Objects
BridgeVisit Option 1
Multiple Bridges Option 2
2 Replies