Notifications
Clear all

PIT Load Pattern (Best practice, high performance)


Posts: 472
Topic starter
(@dlinstedt)
Member
Joined: 3 years ago

Hello everyone, I hope you are well.  I have re-created the PIT table load with a highly efficient ANSI-SQL based performance driven snapshot.  In this case, I have one hub, and 3 satellites.  I am using an AS_OF control table to tell the query which "dates" / ranges to snapshot.   One more thing: if you need additional columns to replicate to PITS, then please do so by adding the calculations or the columns directly to the query.

EVERYONE has been asking for Performance and Tuning Class, so I have been busy building one.  This is just a tidbit of things to come.

If you need to wrap this with a JOIN to something else, then wrap the ENTIRE query in a WITH LoadDateCTE as ( ... query )  select * from LoadDateCTE command.

Hope you can make use of this tidbit of SQL.  It *should be fast, it should be highly performant, as long as your HUB has a declared and enabled primary key, and your satellites have declared primary keys and foreign key references.  (Table structures posted at the bottom)

In SnowflakeDB - it should still be "fast enough".  And IF you replicate Business Keys to Satellites, then you can join on the business keys (from the hub outward - using the Hub as the Driving Table.  This will ensure Snowflake uses it's micro-partitioning feature set to retrieve data.

NOTE: THIS WAS BUILT AND TESTED USING SQLSERVER, But the syntax should work out-right for all databases that support ANSI-SQL.  Just change the datatime to TIMESTAMP for Oracle, and for Teradata - find a suitable replacement.

Look Below to find out how the AS-OF date table works.  By the way, we teach this in CDVP2 classes!! 

I've eliminated the nasty nested sub-select max.

COALESCE for Hash Key: points a NULL back to a SINGLE ghost record (requires you to insert a single ghost record in each satellite for a hash key of 0 (zero))

COALESCE for Load Date: points a NULL back to a SINGLE ghost record (with the earliest load date of 1960-01-01), again, requires you to insert ONE single ghost record in each Satellite for Hash key zero, load date '1960-01-01'

  select as_of.as_of_date,h.hk_emp, h.emp_no
,coalesce(max(hs.hk_emp),convert(binary(16),0x00,2)) SalariesHashKey
,coalesce(max(hs.ldt),convert(datetime,'1960-01-01')) maxSalaries
,coalesce(max(he.hk_emp),convert(binary(16),0x00,2)) StatusHashKey
,coalesce(max(he.ldt),convert(datetime,'1960-01-01')) maxEmpStatus
,coalesce(max(ht.hk_emp),convert(binary(16),0x00,2)) titleshashkey
,coalesce(max(ht.ldt),convert(datetime,'1960-01-01')) maxTitles
from h_emp h inner join as_of on (1=1)
left outer join hs_salaries hs
on (h.hk_emp = hs.HK_emp -- for SNOWFLAKE switch this from
and hs.ldt <= as_of.as_of_date) -- hash key join to a NATURAL KEY for speed
left outer join hs_empstatus he
on (h.hk_emp = he.HK_emp
and he.ldt <= as_of.as_of_date)
left outer join hs_titles ht
on (h.hk_emp = ht.hk_emp
and ht.ldt <= as_of.as_of_date)
group by as_of.as_of_date,h.hk_emp,h.emp_no
order by 1,2;

The AS_OF Table looks like this:

create table AS_OF (
  as_of_date datetime not null primary key(as_of_date)
)

select * from as_of;
as_of_date
1990-12-31 00:00:00.0000000
2019-10-25 00:00:00.0000000
2019-12-25 00:00:00.0000000

WHAT does the AS-OF Date table DO?

It provides the Snapshot Date & Times for WHAT you want to take a picture of for loading your PIT structures.  Be PRECISE, as it will include or leave-out the data that does not meet the time-line slices!

Alright Happy Holidays!

Leave me a comment or reply if you like this kind of thing!

Tables I used for this query:

CREATE TABLE [dbo].[h_emp](
	[hk_emp] [binary](16) NOT NULL,
	[LDT] [datetime2](7) default getdate() NOT NULL,
	[RSRC] [nvarchar](30) NOT NULL,
	[emp_no] [int] NULL,
 CONSTRAINT [PKVH_emp_no] PRIMARY KEY NONCLUSTERED 
   ([hk_emp] ASC)
)
GO

CREATE TABLE [dbo].[hs_empstatus](
	[HK_emp] [binary](16) NOT NULL,
	[LDT] [datetime2](7) default getdate() NOT NULL,
	[RSRC] [nvarchar](30) NOT NULL,
	[status_date] [datetime2](7) NOT NULL,
	[status_flag] [char](1) NOT NULL,
	[deleted_date] [datetime2](7) NULL,
 CONSTRAINT [PKVS_status] PRIMARY KEY NONCLUSTERED 
  ([HK_emp] ASC,[LDT] ASC)
 ,constraint FKVS_Status foreign key (HK_EMP) references H_EMP
)
GO

CREATE TABLE [dbo].[hs_salaries](
	[HK_emp] [binary](16) NOT NULL,
	[LDT] [datetime2](7) default getdate() NOT NULL,
	[RSRC] [nvarchar](30) NOT NULL,
	[HDIFF_salaries] [binary](16) NOT NULL,
	[from_date] [datetime] NOT NULL,
	[salary] [int] NULL,
	[to_date] [datetime] NULL,
 CONSTRAINT [PKVS_salaries] PRIMARY KEY NONCLUSTERED 
  ([HK_emp] ASC,[LDT] ASC)
 ,constraint FVS_Salaries FOREIGN KEY (hk_Emp) references H_EMP
)
GO

CREATE TABLE [dbo].[hs_titles](
	[HK_emp] [binary](16) NOT NULL,
	[LDT] [datetime2](7) getdate() NOT NULL,
	[RSRC] [nvarchar](30) NOT NULL,
	[HDIFF_titles] [binary](16) NOT NULL,
	[title] [varchar](50) NOT NULL,
	[from_date] [datetime] NOT NULL,
	[to_date] [datetime] NULL,
 CONSTRAINT [PKVS_titles] PRIMARY KEY NONCLUSTERED 
   ([HK_emp] ASC,[LDT] ASC)
, constraint FKVS_Titles foreign key (HK_EMP) references H_EMP
)
go
Reply
22 Replies