RE: Disable WAL logging to speed up data loading - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Disable WAL logging to speed up data loading
Date
Msg-id OSBPR01MB48887ECC140A97494C542264EDF40@OSBPR01MB4888.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Disable WAL logging to speed up data loading  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: Disable WAL logging to speed up data loading
RE: Disable WAL logging to speed up data loading
List pgsql-hackers
Hi


Sorry for being late.
On Tuesday, December 1, 2020 10:42 AM Tsunakawa, Takayuki <tsunakawa.takay@fujitsu.com> wrote:
> From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > Yeah, although it's enough only to restrict non-harmful records
> > practically, if we find that only a few kinds of records are needed,
> > maybe it's cleaner to allow only required record type(s).
> >
> > Maybe it's right that if we can filter-out records looking only rmid,
> > since the xlog facility doesn't need to know about record types of a
> > resource manager.  But if we need to finer-grained control on the
> > record types, I'm afraid that that's wrong.  However, if we need only
> > the XLOG_CHECKPOINT_SHUTDOWN record, it might be better to let
> > XLogInsert filter records rather than inserting that filtering code to
> > all the caller sites.
>
> Agreed.  As the kind of WAL records to be emitted is very limited, I think
> XLogInsert() can filter them where the current patch does.
Yeah, I can do that.
I'll restrict the types of WAL in a strict manner,
which means filtering out the types of WAL in XLogInsert().
I'll modify this point in the next patch.


> > I don't dislike "none" since it seems to me practically "none".  It
> > seems rather correct if we actually need only the shutdown checkpoint
> > record.
> >
> > "unrecoverable" is apparently misleading. "crash_unsafe" is precise
> > but seems somewhat alien being among "logical", "replica" and
> > "minimal".
>
> OK, I'm happy with "none" too.  We can describe in the manual that some
> limited amount of WAL is emitted.
Sure, no problem. Let's keep adopting "none", then.

By the way, I've conducted one additional performance evaluation
to see whether reducing WAL could contribute to boost the speed of data loading or not.

I prepared a test VM with 24 cores, HDD and 128GB memory
and used a 9.3GB input file which can be generated by pgbench's scale factor 1000.
The test table is a partitioned table whose number of child tables is 20.
The reason why I choose partitioned table is to remove the concern
that any delay of the loading time comes from lock contention of table.
Therefore, I divided the input file into 20 files and loaded each file
by different 20 sessions respectively toward each correspondent table.
I compared wal_level=minimal and wal_level=none this time
by using the server with the PG configurations below.

-----------------------------------------
wal_level = minimal or minimal
archive_mode = off
max_prepared_transactions = 128
max_worker_processes = 128
max_parallel_workers = 128
max_wal_senders = 0
max_wal_size = 100GB
wal_buffers = 1GB
checkpoint_timeout = 1d
shared_buffers = 32GB # 25% of RAM
maintenance_work_mem = 26GB # 20% of RAM
work_mem = 26GB # 20% of RAM
------------------------------------------

I executed each wal_level three times and calculated the average time
and found that disabling WAL logging reduced about 73 % of the minimal's loading speed
in this test. This speed-up came from the difference of generated WAL sizes.
In this test, to load the data generated more than 10GB of WALs with wal_level=minimal
while wal_level=none emits just 760 bytes of WALs.

I double-checked this fact from pg_wal_lsn_diff() for each wal_level
and had a look at the folder size of pg_wal.
I expect this size for none will become smaller when
I take the modification to filter out the types of WAL which is discussed above.
Also, I monitored numbers of iostat's 'util' and noticed that
util's spike to use I/O reduced from twice to once when I changed the level
from minimal to none, which should be the effect of the patch.

Any comments ?

Best,
    Takamichi Osumi




pgsql-hackers by date:

Previous
From: Anastasia Lubennikova
Date:
Subject: Re: Terminate the idle sessions
Next
From: Anastasia Lubennikova
Date:
Subject: Re: Strange behavior with polygon and NaN