RE: Disable WAL logging to speed up data loading - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Disable WAL logging to speed up data loading
Date
Msg-id OSBPR01MB48882B8D091CE688C4DE2731ED170@OSBPR01MB4888.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Disable WAL logging to speed up data loading  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses Re: Disable WAL logging to speed up data loading
RE: Disable WAL logging to speed up data loading
List pgsql-hackers
Hi,


I wrote and attached the first patch to disable WAL logging.
This patch passes the regression test of check-world already
and is formatted by pgindent.

Also, I conducted 3 types of performance tests to
clarify the rough benefits of the patch.

I compared two wal_levels both 'minimal' and 'none'.
For both levels, I measured
(1) cluster's restore from pg_dumpall output,
(2) COPY initial data to a defined table as initial data loading, and
(3) COPY data to a defined table with tuples, 3 times each for all cases.
After that, calculated the average and the ratio of the loading speed.
The conclusion is that wal_level=none cuts about 20% of the loading speed
compared to 'minimal' in the three cases above. For sure, we could tune the configuration of
postgresql.conf further but roughly it's meaningful to share the result, I think.
'shared_buffers' was set to 40% of RAM while 'maintenance_work_mem'
was set to 20%. I set max_wal_senders = 0 this time.

The input data was generated from pgbench with 1000 scale factor.
It's about 9.3GB. For the table definition or
the initial data for appended data loading test case,
I used pgbench to set up the schema as well.
Sharing other scenario to measure is welcome.

I need to say that the current patch doesn't take the change of wal_level='none'
from/to other ones into account fully or other commands like ones for two phase commit yet.
Sorry for that.

Also, to return the value of last shut down LSN in XLogInsert(),
it acquires lock of control file every time, which was not good clearly.
I tried to replace that code like having a LSN cache as one variable
but I was not sure where I should put the function to set the initial value of the LSN.
After LocalProcessControlFile() in PostmasterMain() was not the right place.
I'd be happy if someone gives me an advice about it.


Best,
    Takamichi Osumi

Attachment

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Internal key management system
Next
From: Bharath Rupireddy
Date:
Subject: Re: Log message for GSS connection is missing once connection authorization is successful.