I have read the interesting article from @patrick-cuba regarding solving the time crisis of back dated date and the concept of the Record Tracking Satellite Extended object.
I am now thinking about how to put this into practice...
Is the guidance that this is used for every hub and link table or selectively against those that have evidence of time crime? I am leaning towards making it a standard pattern for every hub and link.
I have a general query regarding the satellite table load. Does the use of RTS-X mean that the satellite table also needs to distinguish between the traditional 'load date' (into the data warehouse) and the 'applied load date' (the date/time the record relates to), i.e., should the satellite table now include both these attributes? If so, which of these two date attributes along with the hashkey would form the primary key of the satellite?
It looks like the use of RTS-X moves away from the 'load date' for a satellite table actually storing the date the record was seen by the data warehouse and is more aligned to what we would artificially set it to when handling historical data loads. Have I misunderstood?
RTS-X appears to fundamentally work by requiring one record per source staging record per load. As this could get very large, very quickly, compression is recommended. Where CDC is used to deliver deltas of data, this could be quite manageable. However, what do we do where CDC is not possible and full dumps are delivered each night from the source system which require to be parsed to identify the small proportion of new and updated data? Is RTS-X really viable for full data dumps each night? Some of the dumps I am seeing are millions of records each day. The RTS-X table related to these data sources could get very massive very quickly if one record has to be inserted per record in the staging table containing a full data dump, when 99% of the dump is unchanged data.
Related to this, is it the view that the RTS-X table always contains every record for every day and never has any PIT table-like ageing out, even after several years? If the purpose if primary to deal with late arriving/out of sequence data, which is most likely to occur within days/weeks of the due date of the date, does it make sense to hold every RTS-X record with load date older than say x days? x could be 365 days for example.