Re: Speed up Clog Access by increasing CLOG buffers - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Speed up Clog Access by increasing CLOG buffers
Date
Msg-id CAA4eK1LMMGNQ439BUm0LcS3p0sb8S9kc-cUGU_ThNqMwA8_Tug@mail.gmail.com
Whole thread Raw
In response to Re: Speed up Clog Access by increasing CLOG buffers  (Andres Freund <andres@anarazel.de>)
Responses Re: Speed up Clog Access by increasing CLOG buffers  (Robert Haas <robertmhaas@gmail.com>)
Re: Speed up Clog Access by increasing CLOG buffers  (Jesper Pedersen <jesper.pedersen@redhat.com>)
Re: Speed up Clog Access by increasing CLOG buffers  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, Sep 3, 2015 at 5:11 PM, Andres Freund <andres@anarazel.de> wrote:
>
> On 2015-09-01 10:19:19 +0530, Amit Kapila wrote:
> > pgbench setup
> > ------------------------
> > scale factor - 300
> > Data is on magnetic disk and WAL on ssd.
> > pgbench -M prepared tpc-b
> >
> > HEAD - commit 0e141c0f
> > Patch-1 - increase_clog_bufs_v1
> >
>
> The buffer replacement algorithm for clog is rather stupid - I do wonder
> where the cutoff is that it hurts.
>
> Could you perhaps try to create a testcase where xids are accessed that
> are so far apart on average that they're unlikely to be in memory? And
> then test that across a number of client counts?
>

Okay, I have tried one such test, but what I could come up with is on an
average every 100th access is a disk access and then tested it with
different number of clog buffers and client count.  Below is the result:

Non-default parameters
------------------------------------
max_connections = 300
shared_buffers=32GB
min_wal_size=10GB
max_wal_size=15GB
checkpoint_timeout    =35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 256MB
autovacuum=off


HEAD - commit 49124613
Patch-1 - Clog Buffers - 64
Patch-2 - Clog Buffers - 128

Client Count/Patch_ver1864128
HEAD139583363786634463
Patch-1161581803779935315
Patch-2140982193706834729
 
So there is not much difference in test results with different values for Clog
buffers, probably because the I/O has dominated the test and it shows that
increasing the clog buffers won't regress the current behaviour even though
there are lot more accesses for transaction status outside CLOG buffers.

Now about the test, create a table with large number of rows (say 11617457,
I have tried to create larger, but it was taking too much time (more than a day))
and have each row with different transaction id.  Now each transaction should
update rows that are at least 1048576 (number of transactions whose status can
be held in 32 CLog buffers) distance apart, that way ideally for each update it will
try to access Clog page that is not in-memory, however as the value to update
is getting selected randomly and that leads to every 100th access as disk access.

Test
-------
1. Attached file clog_prep.sh should create and populate the required
table and create the function used to access the CLOG pages.  You
might want to update the no_of_rows based on the rows you want to
create in table
2. Attached file  access_clog_disk.sql is used to execute the function
with random values. You might want to update nrows variable based
on the rows created in previous step.
3. Use pgbench as follows with different client count
./pgbench -c 4 -j 4 -n -M prepared -f "access_clog_disk.sql" -T 300 postgres
4. To ensure that clog access function always accesses same data
during each run, the test ensures to copy the data_directory created by step-1
before each run.

I have checked by adding some instrumentation that approximately
every 100th access is disk access, attached patch clog_info-v1.patch
adds the necessary instrumentation in code.

As an example, pgbench test yields below results:
./pgbench -c 4 -j 4 -n -M prepared -f "access_clog_disk.sql" -T 180 postgres

LOG:  trans_status(3169396)
LOG:  trans_status_disk(29546)
LOG:  trans_status(3054952)
LOG:  trans_status_disk(28291)
LOG:  trans_status(3131242)
LOG:  trans_status_disk(28989)
LOG:  trans_status(3155449)
LOG:  trans_status_disk(29347)

Here 'trans_status' is the number of times the process went for accessing
the CLOG status and 'trans_status_disk' is the number of times it went
to disk for accessing CLOG page.

 
>
> >  /*
> >   * Number of shared CLOG buffers.
> >   *
>
>
>
> I think the comment should be more drastically rephrased to not
> reference individual versions and numbers.
>

Updated comments and the patch (increate_clog_bufs_v2.patch)
containing the same is attached.


With Regards,
Amit Kapila.
Attachment

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: RLS open items are vague and unactionable
Next
From: Dean Rasheed
Date:
Subject: Re: RLS open items are vague and unactionable