Thread: pgsql: Add the "snapshot too old" feature

pgsql: Add the "snapshot too old" feature

From
Kevin Grittner
Date:
Add the "snapshot too old" feature

This feature is controlled by a new old_snapshot_threshold GUC.  A
value of -1 disables the feature, and that is the default.  The
value of 0 is just intended for testing.  Above that it is the
number of minutes a snapshot can reach before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect.  The xmin associated with a transaction ID does
still protect dead tuples.  A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.

This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/848ef42bb8c7909c9d7baa38178d4a209906e7c1

Modified Files
--------------
contrib/bloom/blscan.c                             |   3 +-
doc/src/sgml/config.sgml                           |  50 +++
src/backend/access/brin/brin.c                     |  19 +-
src/backend/access/brin/brin_revmap.c              |  11 +-
src/backend/access/gin/ginbtree.c                  |   9 +-
src/backend/access/gin/gindatapage.c               |   7 +-
src/backend/access/gin/ginget.c                    |  22 +-
src/backend/access/gin/gininsert.c                 |   2 +-
src/backend/access/gist/gistget.c                  |   2 +-
src/backend/access/hash/hash.c                     |   3 +-
src/backend/access/hash/hashsearch.c               |  10 +-
src/backend/access/heap/heapam.c                   |  31 +-
src/backend/access/heap/pruneheap.c                |  11 +-
src/backend/access/nbtree/nbtinsert.c              |   7 +-
src/backend/access/nbtree/nbtpage.c                |   2 +-
src/backend/access/nbtree/nbtsearch.c              |  51 ++-
src/backend/access/spgist/spgscan.c                |   2 +-
src/backend/commands/vacuum.c                      |   3 +-
src/backend/commands/vacuumlazy.c                  |   3 +-
src/backend/storage/buffer/bufmgr.c                |  40 ++
src/backend/storage/ipc/ipci.c                     |   3 +
src/backend/storage/ipc/procarray.c                |   9 +
src/backend/storage/lmgr/lwlocknames.txt           |   1 +
src/backend/utils/errcodes.txt                     |   4 +
src/backend/utils/misc/guc.c                       |  11 +
src/backend/utils/misc/postgresql.conf.sample      |   2 +
src/backend/utils/time/snapmgr.c                   | 404 +++++++++++++++++++++
src/include/access/brin_revmap.h                   |   5 +-
src/include/access/gin_private.h                   |   4 +-
src/include/access/nbtree.h                        |   7 +-
src/include/storage/bufmgr.h                       |  19 +-
src/include/utils/rel.h                            |   1 +
src/include/utils/snapmgr.h                        |  13 +
src/include/utils/snapshot.h                       |   4 +
src/test/modules/Makefile                          |   1 +
src/test/modules/snapshot_too_old/Makefile         |  47 +++
.../snapshot_too_old/expected/sto_using_cursor.out |  73 ++++
.../snapshot_too_old/expected/sto_using_select.out |  55 +++
.../snapshot_too_old/specs/sto_using_cursor.spec   |  37 ++
.../snapshot_too_old/specs/sto_using_select.spec   |  36 ++
src/test/modules/snapshot_too_old/sto.conf         |   3 +
41 files changed, 942 insertions(+), 85 deletions(-)


Re: pgsql: Add the "snapshot too old" feature

From
Kevin Grittner
Date:
On Fri, Apr 8, 2016 at 2:45 PM, Kevin Grittner <kgrittn@postgresql.org> wrote:
> Add the "snapshot too old" feature

> src/test/modules/Makefile                          |   1 +
> src/test/modules/snapshot_too_old/Makefile         |  47 +++
> .../snapshot_too_old/expected/sto_using_cursor.out |  73 ++++
> .../snapshot_too_old/expected/sto_using_select.out |  55 +++
> .../snapshot_too_old/specs/sto_using_cursor.spec   |  37 ++
> .../snapshot_too_old/specs/sto_using_select.spec   |  36 ++
> src/test/modules/snapshot_too_old/sto.conf         |   3 +

I see that there are failures on buildfarm Windows machines.
Makefiles are not my strong suit, and I don't have access to a
Windows machine for builds.  I'll start digging, but if someone can
help me with that, it would be much appreciated.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql: Add the "snapshot too old" feature

From
Tom Lane
Date:
Kevin Grittner <kgrittn@postgresql.org> writes:
> Add the "snapshot too old" feature

This has broken the MSVC build, evidently because you didn't
connect up the new test module properly.

            regards, tom lane


Re: pgsql: Add the "snapshot too old" feature

From
Tom Lane
Date:
Kevin Grittner <kgrittn@gmail.com> writes:
> I see that there are failures on buildfarm Windows machines.
> Makefiles are not my strong suit, and I don't have access to a
> Windows machine for builds.  I'll start digging, but if someone can
> help me with that, it would be much appreciated.

I think you need to add snapshot_too_old to @contrib_excludes in
Mkvcbuild.pm to stop that file from thinking that this module
contains anything to build.  Not sure if anything else needs to be
done to get the MSVC script to run the test the module embodies,
but that would at least let it get further than it's getting now.

            regards, tom lane


Re: pgsql: Add the "snapshot too old" feature

From
Kevin Grittner
Date:
On Fri, Apr 8, 2016 at 5:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@gmail.com> writes:
>> I see that there are failures on buildfarm Windows machines.
>> Makefiles are not my strong suit, and I don't have access to a
>> Windows machine for builds.  I'll start digging, but if someone can
>> help me with that, it would be much appreciated.
>
> I think you need to add snapshot_too_old to @contrib_excludes in
> Mkvcbuild.pm to stop that file from thinking that this module
> contains anything to build.  Not sure if anything else needs to be
> done to get the MSVC script to run the test the module embodies,
> but that would at least let it get further than it's getting now.

Thanks!  Pushed.

We'll see what happens now....

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql: Add the "snapshot too old" feature

From
Tom Lane
Date:
Kevin Grittner <kgrittn@postgresql.org> writes:
> Add the "snapshot too old" feature

I am fairly certain that this:

    typedef struct OldSnapshotControlData *OldSnapshotControl;

    static volatile OldSnapshotControl oldSnapshotControl;

does not do what you intended.  That causes the pointer variable
to be marked volatile, not what it points to.  You need to write

    static volatile OldSnapshotControlData *oldSnapshotControl;

to cause oldSnapshotControl to be understood as a pointer to
something volatile.

I think you could lose the OldSnapshotControl typedef altogether;
it's practically unused, it's confusingly similar to the name of
the pointer variable, and if anyplace did use it (eg to declare
a local-variable copy of oldSnapshotControl) it would be wrong
because of the same misplacement of the volatile property as here.

It's possible that you could instead fix this by putting the volatile
marker inside the typedef, but I'm not enough of a C language lawyer
to be sure that that works in the way you need; and since we don't
rely on such a thing anyplace else, I would be hesitant to start here.

            regards, tom lane


Re: pgsql: Add the "snapshot too old" feature

From
Alexander Korotkov
Date:
Kevin,

This commit makes me very uneasy.  I didn't much care about this patch mainly because I didn't imagine its consequences. Now, I see following:

1) We uglify buffer manager interface a lot.  This patch adds 3 more arguments to BufferGetPage().  It looks weird.  Should we try to find less invasive way for doing this?
2) Did you ever try to examine performance regression?  I tried simple read only test on 4x18 Intel server.
pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data fits to shared_buffers)

master                    -   193 740.3 TPS
snapshot too old reverted - 1 065 732.6 TPS

So, for read-only benchmark this patch introduces more than 5 times regression on big machine.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: pgsql: Add the "snapshot too old" feature

From
Kevin Grittner
Date:
On Mon, Apr 11, 2016 at 8:20 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

> 1) We uglify buffer manager interface a lot.  This patch adds 3 more
> arguments to BufferGetPage().  It looks weird.  Should we try to find less
> invasive way for doing this?

As already pointed out, I originally touched about 450 fewer places
in the code, and did not change the signature of BufferGetPage().
I was torn on the argument that we needed a "forced choice" on
checking the snapshot age built into BufferGetPage() -- it is a bit
annoying, but it might prevent a later bug which could silently
return incorrect data.  In the end, I caved to the argument that
the annoyance was worth the improved chances of avoiding such a
bug.

> 2) Did you ever try to examine performance regression?

Yes.  Our customer used big machines for extensive performance
testing -- although they used "paced" input to simulate real users,
and saw nothing but gains.  On my own i7 box, your test scaled to
100 (so it would fit in memory) yielded this:

unpatched:

number of transactions actually processed: 40534737
latency average = 0.739 ms
latency stddev = 0.359 ms
tps = 134973.430374 (including connections establishing)
tps = 135039.395041 (excluding connections establishing)

with the "snapshot too old" patch:

number of transactions actually processed: 40679654
latency average = 0.735 ms
latency stddev = 0.486 ms
tps = 135463.614244 (including connections establishing)
tps = 135866.160933 (excluding connections establishing)

> I tried simple read
> only test on 4x18 Intel server.
> pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data fits
> to shared_buffers)
>
> master                    -   193 740.3 TPS
> snapshot too old reverted - 1 065 732.6 TPS
>
> So, for read-only benchmark this patch introduces more than 5 times
> regression on big machine.

I did not schedule a saturation test on a big machine.  I guess I
should have done.

I'm confident this can be fixed; your suggestion for a wrapping
"if" is probably sufficient.  I am looking at this and the misuse
of "volatile" now.

Are you able to easily test that or should I book time on one (or
more) of our big machines on my end?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql: Add the "snapshot too old" feature

From
Kevin Grittner
Date:
On Mon, Apr 11, 2016 at 12:31 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Mon, Apr 11, 2016 at 8:20 AM, Alexander Korotkov

>> So, for read-only benchmark this patch introduces more than 5 times
>> regression on big machine.
>
> I did not schedule a saturation test on a big machine.  I guess I
> should have done.

I have booked two large NUMA machines for Friday to look for any
remaining performance regressions on high-concurrency loads on such
machines.  Anyone with ideas on particular tests that they feel
should be included, please let me know before then.

Thanks!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql: Add the "snapshot too old" feature

From
Andres Freund
Date:
On 2016-04-12 13:57:26 -0500, Kevin Grittner wrote:
> I have booked two large NUMA machines for Friday to look for any
> remaining performance regressions on high-concurrency loads on such
> machines.  Anyone with ideas on particular tests that they feel
> should be included, please let me know before then.

The worst case is probably statements which are utterly trivial to
execute, but do require a snapshot both during planning and
execution. Like e.g. SELECT 1;. That's probably a good thing to get
started with.

Greetings,

Andres Freund