I would like some clarification on the standards for load dates and maybe a bit more background on the rationale for this standard! First I am going to quote (hand written so ...maybe a few typos) what i have found from Supercharge and the New DV standards published in on this site. After each quote, i may post a question!
The DV works on the principles similar to geological layering where data arriving in the warehouse (single batch)
is stamped with a geological time based layer ( a load date time stamp). The load dates enforce audit trails and record
history based on the one and ony controllable system date available to the EDW loading routines. The only point at which this principle does not apply is during real-tmie feeding processing. .... if the process fails mechanically for any reason, it is necessary to examine all rows that were loaded during that process; resulting in removal, replacement
or augmentation do the data set.
What is batch in this context? If a normal "batch" of source data was processed into a DV staging area one per day but on a specific day (say Day1) the data did not reach our staging area and this data was instead included in another "batch" on Day2. i believe that the load date for "Day1" and "Day2" would be...the same load date. Correct?
the staging area is refreshed (purged) prior to each batch load cycle, in other words, they should not ever house
history of loads
Why? is this for performance reasons? Why can we not keep an archive of all our source data exactly as it was when we received it? Surely this is smart if we want to reload the DV right?
These quotes are from the new standards. It seems conceptually very similar but some differences.
The Load Date Time Stamp is the date and time of insert for the record.
Insert into what? Presumably, insert into the first place these data land in the staging area or directly into the DV for real time processing. Correct?
This is the date stamp of the date/time that the data was loaded into the database.
What database? Again, presumably the date for batches comes from the the staging area record of when we first received the data from the source IN OUR "system".
Extract Date (Optional) - Extract Dates should always be additional attributes in Hubs,
Links, and Satellites. They should never be applied as load dates
What extract date are you referring to? I believe you are referring generally to the date that the data was extracted fro the source system. In the case of text files, this could be the timestamp on the filename or possibly included as a header in the file itself.
OK, if my assumption above is correct....this very much speaks to my use case in the first paragraph. A loaddate is NOT the Extract date. Its when we see if IN OUR "system". Correct?
OK, so now the why! Audit is a big word so i'll focus on what i can understand. I understand that "if the process fails mechanically for some reason....we need to use this date to properly reload. Anything else concrete that I can sink my teeth into?
That seems like alot. I'll leave it there for now. I have a follow question that deal with "other" dates we have but lets start here!