Hi Hackers.
I have a test table with multiple (10) columns defined as TIME WITHOUT
TIME ZONE.
When loading this table with a lot of data (e.g. "COPY tbl FROM
'/my/path/2GB.csv' WITH (FORMAT CSV)") I observed that an excessive
amount of time was being spent in the function GetCurrentDateTime.
IIUC the code is calling GetCurrentDateTime only to acquire the
current TX timestamp as a struct pg_tm in order to derive some
timezone information.
My test table has 10 x TIME columns.
My test data has 22.5 million rows (~ 2GB).
So GetCurrentDateTime is called 225 million times (22.5 million rows
x 10 columns), each time populating the struct with the same values.
I have attached a patch which caches this struct, so now those 225
million calls are reduced to just 1 call.
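
For illustration only (this is not the attached patch), the caching
idea can be sketched in standalone C as below. All names here
(xact_timestamp, expensive_timestamp_to_tm, get_current_tm_cached) are
hypothetical, and gmtime() merely stands in for the real
timestamp-to-pg_tm conversion. The cache is keyed on the timestamp, so
it is refreshed whenever a new transaction would present a new value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the transaction start timestamp (seconds). */
typedef int64_t xact_timestamp;

/* Counts how often the expensive conversion really runs. */
static int conversion_count = 0;

/* Stand-in for the costly timestamp -> broken-down-time conversion. */
static void
expensive_timestamp_to_tm(xact_timestamp ts, struct tm *out)
{
    time_t      t = (time_t) ts;
    struct tm  *tmp = gmtime(&t);

    assert(tmp != NULL);
    *out = *tmp;
    conversion_count++;
}

/*
 * Return the broken-down time for ts, redoing the conversion only when
 * ts differs from the value the cache was built for.
 */
static void
get_current_tm_cached(xact_timestamp ts, struct tm *out)
{
    static bool           cache_valid = false;
    static xact_timestamp cache_ts;
    static struct tm      cache_tm;

    if (!cache_valid || cache_ts != ts)
    {
        expensive_timestamp_to_tm(ts, &cache_tm);
        cache_ts = ts;
        cache_valid = true;
    }
    *out = cache_tm;    /* hand back a copy; callers may modify it */
}
```

In this sketch, repeated calls with the same timestamp cost one struct
copy each, and only the first call pays for the conversion.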
~
Test Results:
Copy 22.5 million rows data (~ 2GB)
BEFORE
Run 1 = 4m 36s
Run 2 = 4m 30s
Run 3 = 4m 32s
perf showed 20.95% time in GetCurrentDateTime
AFTER (cached struct)
Run 1 = 3m 44s
Run 2 = 3m 44s
Run 3 = 3m 45s
perf showed no time in GetCurrentDateTime
That is a ~17% reduction in load time in my environment. YMMV.
~
Thoughts?
Kind Regards
Peter Smith.
Fujitsu Australia.