Load TIME fields - proposed performance improvement - Mailing list pgsql-hackers

From Peter Smith
Subject Load TIME fields - proposed performance improvement
Date
Msg-id CAHut+Pu89TWjq530V2gY5O6SWi=OEJMQ_VHMt8bdZB_9JFna5A@mail.gmail.com
Whole thread Raw
Responses Re: Load TIME fields - proposed performance improvement  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi Hackers.

I have a test table with multiple (10) columns defined as TIME WITHOUT
TIME ZONE.

When loading this table with a lot of data (e.g. "COPY tbl FROM
/my/path/2GB.csv WITH (FORMAT CSV)") I observed it was spending an
excessive amount of time within the function GetCurrentDateTime.

IIUC the code is calling GetCurrentDateTime only to acquire the
current TX timestamp as a struct pg_tm in order to derive some
timezone information.

My test table has 10 x TIME columns.
My test data has 22.5 million rows (~ 2GB)
So that's 225 million times the GetCurrentDateTime function is called
to populate the struct with the same values.

I have attached a patch which caches this struct, so now those 225
million calls are reduced to just 1 call.

~

Test Results:

Copy 22.5 million rows data (~ 2GB)

BEFORE
Run 1 = 4m 36s
Run 2 = 4m 30s
Run 3 = 4m 32s
perf showed 20.95% time in GetCurrentDateTime

AFTER (cached struct)
Run 1 = 3m 44s
Run 2 = 3m 44s
Run 3 = 3m 45s
perf shows no time in GetCurrentDateTime
~17% performance improvement in my environment. YMMV.

~

Thoughts?

Kind Regards
Peter Smith.
Fujitsu Australia.

Attachment

pgsql-hackers by date:

Previous
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: Transactions involving multiple postgres foreign servers, take 2
Next
From: Tom Lane
Date:
Subject: Re: Load TIME fields - proposed performance improvement