Thread: Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature
Kevin,
This commit makes me very uneasy. I didn't much care about this patch mainly because I didn't imagine its consequences. Now, I see following:
1) We uglify buffer manager interface a lot. This patch adds 3 more arguments to BufferGetPage(). It looks weird. Should we try to find less invasive way for doing this?
2) Did you ever try to examine performance regression? I tried simple read only test on 4x18 Intel server.
pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data fits to shared_buffers)
master - 193 740.3 TPS
snapshot too old reverted - 1 065 732.6 TPS
So, for read-only benchmark this patch introduces more than 5 times regression on big machine.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov wrote: > Kevin, > > This commit makes me very uneasy. I didn't much care about this patch > mainly because I didn't imagine its consequences. Now, I see following: > > 1) We uglify buffer manager interface a lot. This patch adds 3 more > arguments to BufferGetPage(). It looks weird. Should we try to find less > invasive way for doing this? Kevin's patch was much less invasive originally. It became invasive in the course of later review -- there was fear that future code would "forget" the snapshot check accidentally, which would have disastrous effects (data becomes invisible without notice). > 2) Did you ever try to examine performance regression? I tried simple read > only test on 4x18 Intel server. > pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data > fits to shared_buffers) > > master - 193 740.3 TPS > snapshot too old reverted - 1 065 732.6 TPS > > So, for read-only benchmark this patch introduces more than 5 times > regression on big machine. Wow, this is terrible, but the patch is supposed to have no impact when the feature is not in use. Maybe there's some simple oversight that can be fixed. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 11, 2016 at 4:27 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov wrote:
> Kevin,
>
> This commit makes me very uneasy. I didn't much care about this patch
> mainly because I didn't imagine its consequences. Now, I see following:
>
> 1) We uglify buffer manager interface a lot. This patch adds 3 more
> arguments to BufferGetPage(). It looks weird. Should we try to find less
> invasive way for doing this?
Kevin's patch was much less invasive originally. It became invasive in
the course of later review -- there was fear that future code would
"forget" the snapshot check accidentally, which would have disastrous
effects (data becomes invisible without notice).
OK, I will read that thread and try to verify these thoughts.
> 2) Did you ever try to examine performance regression? I tried simple read
> only test on 4x18 Intel server.
> pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data
> fits to shared_buffers)
>
> master - 193 740.3 TPS
> snapshot too old reverted - 1 065 732.6 TPS
>
> So, for read-only benchmark this patch introduces more than 5 times
> regression on big machine.
Wow, this is terrible, but the patch is supposed to have no impact when
the feature is not in use. Maybe there's some simple oversight that can
be fixed.
Perf show that 50% of time is spent in s_lock() called from GetXLogInsertRecPtr() called from GetSnapshotData(). I think at very least we should at least surround with "if" checking that "snapshot too old" is enabled.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, Apr 11, 2016 at 8:20 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > 1) We uglify buffer manager interface a lot. This patch adds 3 more > arguments to BufferGetPage(). It looks weird. Should we try to find less > invasive way for doing this? As already pointed out, I originally touched about 450 fewer places in the code, and did not change the signature of BufferGetPage(). I was torn on the argument that we needed a "forced choice" on checking the snapshot age built into BufferGetPage() -- it is a bit annoying, but it might prevent a later bug which could silently return incorrect data. In the end, I caved to the argument that the annoyance was worth the improved chances of avoiding such a bug. > 2) Did you ever try to examine performance regression? Yes. Our customer used big machines for extensive performance testing -- although they used "paced" input to simulate real users, and saw nothing but gains. On my own i7 box, your test scaled to 100 (so it would fit in memory) yielded this: unpatched: number of transactions actually processed: 40534737 latency average = 0.739 ms latency stddev = 0.359 ms tps = 134973.430374 (including connections establishing) tps = 135039.395041 (excluding connections establishing) with the "snapshot too old" patch: number of transactions actually processed: 40679654 latency average = 0.735 ms latency stddev = 0.486 ms tps = 135463.614244 (including connections establishing) tps = 135866.160933 (excluding connections establishing) > I tried simple read > only test on 4x18 Intel server. > pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data fits > to shared_buffers) > > master - 193 740.3 TPS > snapshot too old reverted - 1 065 732.6 TPS > > So, for read-only benchmark this patch introduces more than 5 times > regression on big machine. I did not schedule a saturation test on a big machine. I guess I should have done. I'm confident this can be fixed; your suggestion for a wrapping "if" is probably sufficient. I am looking at this and the misuse of "volatile" now. Are you able to easily test that or should I book time on one (or more) of our big machines on my end? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Apr 11, 2016 at 12:31 PM, Kevin Grittner <kgrittn@gmail.com> wrote: > On Mon, Apr 11, 2016 at 8:20 AM, Alexander Korotkov >> So, for read-only benchmark this patch introduces more than 5 times >> regression on big machine. > > I did not schedule a saturation test on a big machine. I guess I > should have done. I have booked two large NUMA machines for Friday to look for any remaining performance regressions on high-concurrency loads on such machines. Anyone with ideas on particular tests that they feel should be included, please let me know before then. Thanks! -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2016-04-12 13:57:26 -0500, Kevin Grittner wrote: > I have booked two large NUMA machines for Friday to look for any > remaining performance regressions on high-concurrency loads on such > machines. Anyone with ideas on particular tests that they feel > should be included, please let me know before then. The worst case is probably statements which are utterly trivial to execute, but do require a snapshot both during planning and execution. Like e.g. SELECT 1;. That's probably a good thing to get started with. Greetings, Andres Freund