Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [Patch] Optimize dropping of relation buffers using dlist
Date 2019-11-12 19:19:33
Msg-id 20191112191933.g2ti5ulqurojopsu@development
In response to RE: [Patch] Optimize dropping of relation buffers using dlist  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Responses RE: [Patch] Optimize dropping of relation buffers using dlist
List pgsql-hackers
On Tue, Nov 12, 2019 at 10:49:49AM +0000, k.jamison@fujitsu.com wrote:
>On Thursday, November 7, 2019 1:27 AM (GMT+9), Robert Haas wrote:
>> On Tue, Nov 5, 2019 at 10:34 AM Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> wrote:
>> > 2) This adds more hashtable maintenance to BufferAlloc etc., but
>> >     you've only done tests / benchmarks for the case this optimizes. I
>> >     think we need to see a benchmark for a workload that allocates and
>> >     invalidates a lot of buffers. A pgbench with a workload that fits into
>> >     RAM but not into shared buffers would be interesting.
>>
>> Yeah, it seems pretty hard to believe that this won't be bad for some workloads.
>> Not only do you have the overhead of the hash table operations, but you also
>> have locking overhead around that. A whole new set of LWLocks where you have
>> to take and release one of them every time you allocate or invalidate a buffer
>> seems likely to cause a pretty substantial contention problem.
>
>I'm sorry for the late reply. Thank you, Tomas and Robert, for checking this patch.
>Attached is the v3 of the patch.
>- I moved the unnecessary items from buf_internals.h to cached_buf.c, since most
>  of those items are only used in that file.
>- Fixed the bug in v2. It seems to pass both the regression and TAP tests now.
>
>Thanks for the advice on the benchmark test. Please see the test setup and results below.
>
>[Machine spec]
>CPUs: 16 (8 cores per socket)
>RHEL 6.5, Memory: 240GB
>
>scale: 3125 (about 46GB DB size)
>shared_buffers = 8GB
>
>[workload that fits into RAM but not into shared buffers]
>pgbench -i -s 3125 cachetest
>pgbench -c 16 -j 8 -T 600 cachetest
>
>[Patched]
>scaling factor: 3125
>query mode: simple
>number of clients: 16
>number of threads: 8
>duration: 600 s
>number of transactions actually processed: 8815123
>latency average = 1.089 ms
>tps = 14691.436343 (including connections establishing)
>tps = 14691.482714 (excluding connections establishing)
>
>[Master/Unpatched]
>...
>number of transactions actually processed: 8852327
>latency average = 1.084 ms
>tps = 14753.814648 (including connections establishing)
>tps = 14753.861589 (excluding connections establishing)
>
>
>My patch caused an overhead of about 0.42-0.46%, which I think is small.
>Kindly let me know your opinions/comments about the patch or tests, etc.
>

Now try measuring that with a read-only workload, with prepared
statements. I've tried that on a machine with 16 cores, doing

   # 16 clients
   pgbench -n -S -j 16 -c 16 -M prepared -T 60 test

   # 1 client
   pgbench -n -S -c 1 -M prepared -T 60 test

and the average from 30 runs of each looks like this:

    # clients      master (tps)   patched (tps)   patched/master
   ----------------------------------------------------------------
    1                   29690           27833          93.7%
    16                 300935          283383          94.1%

That's quite a significant regression, considering it's optimizing an
operation that is expected to be pretty rare (people are generally not
dropping objects as often as they query them).
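
FWIW the overhead is not particularly surprising, given what the patch
has to do on every buffer allocation and invalidation. A rough sketch
of that extra hot-path work (just an illustration, not the actual patch
code - CachedBufHash, CachedBufLock and CachedBufEnt are made-up names):

   /*
    * Hypothetical sketch, not the actual patch code: the names
    * CachedBufHash, CachedBufLock and CachedBufEnt are invented.
    * The point is the extra lock acquire/release plus hash table
    * operation on every buffer allocation and invalidation.
    */
   #include "postgres.h"
   #include "storage/buf_internals.h"
   #include "storage/lwlock.h"
   #include "utils/hsearch.h"

   typedef struct CachedBufEnt
   {
       BufferTag   key;        /* relation, fork and block number */
       int         buf_id;     /* buffer currently holding the block */
   } CachedBufEnt;

   static HTAB   *CachedBufHash;    /* shared tag -> buffer map */
   static LWLock *CachedBufLock;    /* protects CachedBufHash */

   /* called for every buffer allocation */
   static void
   cached_buf_insert(const BufferTag *tag, int buf_id)
   {
       CachedBufEnt *ent;
       bool          found;

       LWLockAcquire(CachedBufLock, LW_EXCLUSIVE);
       ent = (CachedBufEnt *) hash_search(CachedBufHash, tag,
                                          HASH_ENTER, &found);
       ent->buf_id = buf_id;
       LWLockRelease(CachedBufLock);
   }

   /* called for every buffer invalidation */
   static void
   cached_buf_delete(const BufferTag *tag)
   {
       LWLockAcquire(CachedBufLock, LW_EXCLUSIVE);
       (void) hash_search(CachedBufHash, tag, HASH_REMOVE, NULL);
       LWLockRelease(CachedBufLock);
   }

With a read-only pgbench that has to evict buffers all the time, that
lock traffic sits right on the hot path, which is consistent with the
~6% drop above.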

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


