Thread: Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature

Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature

From
Alexander Korotkov
Date:
Kevin,

This commit makes me very uneasy.  I didn't much care about this patch mainly because I didn't imagine its consequences. Now, I see following:

1) We uglify buffer manager interface a lot.  This patch adds 3 more arguments to BufferGetPage().  It looks weird.  Should we try to find less invasive way for doing this?
2) Did you ever try to examine performance regression?  I tried simple read only test on 4x18 Intel server.
pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data fits to shared_buffers)

master                    -   193 740.3 TPS
snapshot too old reverted - 1 065 732.6 TPS

So, for read-only benchmark this patch introduces more than 5 times regression on big machine.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature

From
Alvaro Herrera
Date:
Alexander Korotkov wrote:
> Kevin,
> 
> This commit makes me very uneasy.  I didn't much care about this patch
> mainly because I didn't imagine its consequences. Now, I see following:
> 
> 1) We uglify buffer manager interface a lot.  This patch adds 3 more
> arguments to BufferGetPage().  It looks weird.  Should we try to find less
> invasive way for doing this?

Kevin's patch was much less invasive originally.  It became invasive in
the course of later review -- there was fear that future code would
"forget" the snapshot check accidentally, which would have disastrous
effects (data becomes invisible without notice).

> 2) Did you ever try to examine performance regression?  I tried simple read
> only test on 4x18 Intel server.
> pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data
> fits to shared_buffers)
> 
> master                    -   193 740.3 TPS
> snapshot too old reverted - 1 065 732.6 TPS
> 
> So, for read-only benchmark this patch introduces more than 5 times
> regression on big machine.

Wow, this is terrible, but the patch is supposed to have no impact when
the feature is not in use.  Maybe there's some simple oversight that can
be fixed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature

From
Alexander Korotkov
Date:
On Mon, Apr 11, 2016 at 4:27 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Alexander Korotkov wrote:
> Kevin,
>
> This commit makes me very uneasy.  I didn't much care about this patch
> mainly because I didn't imagine its consequences. Now, I see following:
>
> 1) We uglify buffer manager interface a lot.  This patch adds 3 more
> arguments to BufferGetPage().  It looks weird.  Should we try to find less
> invasive way for doing this?

Kevin's patch was much less invasive originally.  It became invasive in
the course of later review -- there was fear that future code would
"forget" the snapshot check accidentally, which would have disastrous
effects (data becomes invisible without notice).

OK, I will read that thread and try to verify these thoughts.
 
> 2) Did you ever try to examine performance regression?  I tried simple read
> only test on 4x18 Intel server.
> pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data
> fits to shared_buffers)
>
> master                    -   193 740.3 TPS
> snapshot too old reverted - 1 065 732.6 TPS
>
> So, for read-only benchmark this patch introduces more than 5 times
> regression on big machine.

Wow, this is terrible, but the patch is supposed to have no impact when
the feature is not in use.  Maybe there's some simple oversight that can
be fixed.

Perf show that 50% of time is spent in s_lock() called from GetXLogInsertRecPtr() called from GetSnapshotData().  I think at very least we should at least surround with "if" checking that "snapshot too old" is enabled.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 

Re: pgsql: Add the "snapshot too old" feature

From
Kevin Grittner
Date:
On Mon, Apr 11, 2016 at 8:20 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

> 1) We uglify buffer manager interface a lot.  This patch adds 3 more
> arguments to BufferGetPage().  It looks weird.  Should we try to find less
> invasive way for doing this?

As already pointed out, I originally touched about 450 fewer places
in the code, and did not change the signature of BufferGetPage().
I was torn on the argument that we needed a "forced choice" on
checking the snapshot age built into BufferGetPage() -- it is a bit
annoying, but it might prevent a later bug which could silently
return incorrect data.  In the end, I caved to the argument that
the annoyance was worth the improved chances of avoiding such a
bug.

> 2) Did you ever try to examine performance regression?

Yes.  Our customer used big machines for extensive performance
testing -- although they used "paced" input to simulate real users,
and saw nothing but gains.  On my own i7 box, your test scaled to
100 (so it would fit in memory) yielded this:

unpatched:

number of transactions actually processed: 40534737
latency average = 0.739 ms
latency stddev = 0.359 ms
tps = 134973.430374 (including connections establishing)
tps = 135039.395041 (excluding connections establishing)

with the "snapshot too old" patch:

number of transactions actually processed: 40679654
latency average = 0.735 ms
latency stddev = 0.486 ms
tps = 135463.614244 (including connections establishing)
tps = 135866.160933 (excluding connections establishing)

> I tried simple read
> only test on 4x18 Intel server.
> pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data fits
> to shared_buffers)
>
> master                    -   193 740.3 TPS
> snapshot too old reverted - 1 065 732.6 TPS
>
> So, for read-only benchmark this patch introduces more than 5 times
> regression on big machine.

I did not schedule a saturation test on a big machine.  I guess I
should have done.

I'm confident this can be fixed; your suggestion for a wrapping
"if" is probably sufficient.  I am looking at this and the misuse
of "volatile" now.

Are you able to easily test that or should I book time on one (or
more) of our big machines on my end?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql: Add the "snapshot too old" feature

From
Kevin Grittner
Date:
On Mon, Apr 11, 2016 at 12:31 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Mon, Apr 11, 2016 at 8:20 AM, Alexander Korotkov

>> So, for read-only benchmark this patch introduces more than 5 times
>> regression on big machine.
>
> I did not schedule a saturation test on a big machine.  I guess I
> should have done.

I have booked two large NUMA machines for Friday to look for any
remaining performance regressions on high-concurrency loads on such
machines.  Anyone with ideas on particular tests that they feel
should be included, please let me know before then.

Thanks!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql: Add the "snapshot too old" feature

From
Andres Freund
Date:
On 2016-04-12 13:57:26 -0500, Kevin Grittner wrote:
> I have booked two large NUMA machines for Friday to look for any
> remaining performance regressions on high-concurrency loads on such
> machines.  Anyone with ideas on particular tests that they feel
> should be included, please let me know before then.

The worst case is probably statements which are utterly trivial to
execute, but do require a snapshot both during planning and
execution. Like e.g. SELECT 1;. That's probably a good thing to get
started with.

Greetings,

Andres Freund