Thread: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range
Hi hackers,
I'd like to propose searching for the min and max values in the index with SnapshotAny in the get_actual_variable_range function.
The current implementation scans the index with SnapshotDirty, which accepts uncommitted rows and rejects dead rows.
In the code there is a comment about this:
/*
* In principle, we should scan the index with our current
* active snapshot, which is the best approximation we've got
* to what the query will see when executed. But that won't
* be exact if a new snap is taken before running the query,
* and it can be very expensive if a lot of uncommitted rows
* exist at the end of the index (because we'll laboriously
* fetch each one and reject it). What seems like a good
* compromise is to use SnapshotDirty. That will accept
* uncommitted rows, and thus avoid fetching multiple heap
* tuples in this scenario. On the other hand, it will reject
* known-dead rows, and thus not give a bogus answer when the
* extreme value has been deleted; that case motivates not
* using SnapshotAny here.
*/
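For context, the probe that this comment sits above is set up roughly as follows (paraphrased from get_actual_variable_range() in selfuncs.c; the exact code varies by version):

    /* Paraphrased sketch of the current probe in get_actual_variable_range() */
    SnapshotData SnapshotDirty;

    InitDirtySnapshot(SnapshotDirty);

    index_scan = index_beginscan(heapRel, indexRel, &SnapshotDirty, 1, 0);
    index_rescan(index_scan, scankeys, 1, NULL, 0);

    /* Fetch the first tuple in the sort operator's direction */
    if ((tup = index_getnext(index_scan, indexscandir)) != NULL)
    {
        /* extract the endpoint value of the index column from the heap tuple */
        ...
    }

    index_endscan(index_scan);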
But if we delete many rows from the beginning or end of the index, it is also
very expensive, because we will fetch each dead row and reject it.
The following sequence can be used to reproduce this issue:
psql -c "DROP DATABASE test_polygon";
psql -c "CREATE DATABASE test_polygon";
psql test_polygon -c "CREATE EXTENSION postgis";
psql test_polygon -f /tmp/data.sql;
psql test_polygon -c "ANALYZE";
# \d polygon_table
Table "public.polygon_table"
Column | Type | Modifiers
-----------+--------------------------+------------------------------------------------------------
id | integer | not null default nextval('polygon_table_id_seq'::regclass)
time | timestamp with time zone | not null
poly | geometry(Polygon,4326) | not null
second_id | integer | not null
Indexes:
"polygon_table_pkey" PRIMARY KEY, btree (id)
"polygon_table_b179ed4a" btree (second_id)
"polygon_table_poly_id" gist (poly)
"polygon_table_time" btree ("time")
Foreign-key constraints:
"second_table_id" FOREIGN KEY (second_id) REFERENCES second_table(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
1st session:
pgbench test_polygon -P 1 -R 6000 -c 24 -j 8 -T 1000 -n -f /tmp/bad_request
2nd session:
psql test_polygon -c "DELETE FROM polygon_table WHERE time <= '2017-03-30 16:00:00+03'"
After the delete we have many dead rows at the beginning of the index "polygon_table_time" (time).
pgbench output:
progress: 1.0 s, 6023.8 tps, lat 1.170 ms stddev 1.022, lag 0.157 ms
progress: 2.0 s, 6023.8 tps, lat 1.045 ms stddev 0.182, lag 0.076 ms
progress: 3.0 s, 5957.0 tps, lat 1.046 ms stddev 0.176, lag 0.071 ms
progress: 4.0 s, 6066.9 tps, lat 1.061 ms stddev 0.184, lag 0.072 ms
progress: 5.0 s, 6178.1 tps, lat 1.060 ms stddev 0.189, lag 0.076 ms
progress: 6.0 s, 6079.0 tps, lat 1.075 ms stddev 0.195, lag 0.075 ms
progress: 7.0 s, 6246.0 tps, lat 1.069 ms stddev 0.194, lag 0.076 ms
progress: 8.0 s, 6046.0 tps, lat 1.050 ms stddev 0.181, lag 0.073 ms
progress: 9.0 s, 1255.0 tps, lat 79.114 ms stddev 189.686, lag 63.194 ms
progress: 10.0 s, 4696.0 tps, lat 1015.294 ms stddev 36.291, lag 1009.983 ms
progress: 11.0 s, 6031.0 tps, lat 1001.354 ms stddev 59.379, lag 997.375 ms
progress: 12.0 s, 6013.0 tps, lat 961.725 ms stddev 104.536, lag 957.736 ms
progress: 13.0 s, 6098.0 tps, lat 936.516 ms stddev 140.039, lag 932.580 ms
progress: 14.0 s, 6032.0 tps, lat 935.867 ms stddev 137.761, lag 931.892 ms
progress: 15.0 s, 5975.0 tps, lat 950.911 ms stddev 153.438, lag 946.895 ms
progress: 16.0 s, 6044.0 tps, lat 953.380 ms stddev 146.601, lag 949.413 ms
progress: 17.0 s, 6105.0 tps, lat 956.524 ms stddev 134.940, lag 952.593 ms
progress: 18.0 s, 6097.0 tps, lat 950.913 ms stddev 135.902, lag 946.980 ms
progress: 19.0 s, 6004.9 tps, lat 933.010 ms stddev 142.037, lag 929.014 ms
progress: 20.0 s, 6078.1 tps, lat 920.415 ms stddev 157.117, lag 916.469 ms
progress: 21.0 s, 5402.0 tps, lat 945.490 ms stddev 145.262, lag 941.048 ms
progress: 22.0 s, 5226.0 tps, lat 1082.013 ms stddev 141.718, lag 1077.423 ms
progress: 23.0 s, 12794.1 tps, lat 479.046 ms stddev 434.510, lag 478.106 ms
progress: 24.0 s, 5914.8 tps, lat 0.604 ms stddev 0.075, lag 0.067 ms
progress: 25.0 s, 5994.0 tps, lat 0.596 ms stddev 0.071, lag 0.066 ms
progress: 26.0 s, 6126.9 tps, lat 0.598 ms stddev 0.072, lag 0.067 ms
progress: 27.0 s, 6076.2 tps, lat 0.601 ms stddev 0.072, lag 0.068 ms
progress: 28.0 s, 6035.0 tps, lat 0.608 ms stddev 0.077, lag 0.068 ms
After the delete (at 9 s), latency increases significantly, up to about 1000 ms, until autovacuum comes and performs index cleanup (at 23 s).
From EXPLAIN ANALYZE we can see that planning time has increased significantly:
# explain (analyze, verbose, timing, buffers) SELECT * FROM polygon_table polygon INNER JOIN second_table second ON (polygon.second_id = second.id) WHERE ST_Intersects(poly, ST_SetSrid(ST_MakePoint(52.3433914, 58.7438431), 4326));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.56..16.86 rows=1 width=160) (actual time=0.100..0.248 rows=4 loops=1)
Output: polygon.id, polygon."time", polygon.poly, polygon.second_id, second.id
Buffers: shared hit=49
-> Index Scan using polygon_table_poly_id on public.polygon_table polygon (cost=0.29..8.55 rows=1 width=156) (actual time=0.081..0.220 rows=4 loops=1)
Output: polygon.id, polygon."time", polygon.poly, polygon.second_id
Index Cond: (polygon.poly && '0101000020E6100000245DD83FF42B4A4079ED2D40365F4D40'::geometry)
Filter: _st_intersects(polygon.poly, '0101000020E6100000245DD83FF42B4A4079ED2D40365F4D40'::geometry)
Rows Removed by Filter: 6
Buffers: shared hit=37
-> Index Only Scan using second_table_pkey on public.second_table second (cost=0.28..8.29 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=4)
Output: second.id
Index Cond: (second.id = polygon.second_id)
Heap Fetches: 4
Buffers: shared hit=12
Planning time: 115.122 ms
Execution time: 0.422 ms
(16 rows)
Time: 116.926 ms
# explain (analyze, verbose, timing, buffers) SELECT * FROM polygon_table polygon INNER JOIN second_table second ON (polygon.second_id = second.id) WHERE ST_Intersects(poly, ST_SetSrid(ST_MakePoint(52.3433914, 58.7438431), 4326));
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.56..16.86 rows=1 width=160) (actual time=0.059..0.373 rows=46 loops=1)
Output: polygon.id, polygon."time", polygon.poly, polygon.second_id, second.id
Buffers: shared hit=170
-> Index Scan using polygon_table_poly_id on public.polygon_table polygon (cost=0.29..8.55 rows=1 width=156) (actual time=0.045..0.269 rows=46 loops=1)
Output: polygon.id, polygon."time", polygon.poly, polygon.second_id
Index Cond: (polygon.poly && '0101000020E6100000245DD83FF42B4A4079ED2D40365F4D40'::geometry)
Filter: _st_intersects(polygon.poly, '0101000020E6100000245DD83FF42B4A4079ED2D40365F4D40'::geometry)
Rows Removed by Filter: 44
Buffers: shared hit=32
-> Index Only Scan using second_table_pkey on public.second_table second (cost=0.28..8.29 rows=1 width=4) (actual time=0.001..0.002 rows=1 loops=46)
Output: second.id
Index Cond: (second.id = polygon.second_id)
Heap Fetches: 46
Buffers: shared hit=138
Planning time: 6.139 ms
Execution time: 0.482 ms
(16 rows)
Time: 7.722 ms
Initially, the function used the active snapshot from GetActiveSnapshot(). But in commit fccebe421d0c410e6378fb281419442c84759213
this behavior was "weakened" to SnapshotDirty (I suppose for a similar reason).
Was there a particular reason for allowing the planner to see uncommitted rows but forbidding it access to dead ones?
A simple patch that uses SnapshotAny is attached. The comments in the code are not changed yet.
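The attachment itself is not reproduced in this archive; based on the description, the change is presumably just to pass the existing SnapshotAny snapshot to the index scan instead of a locally initialized dirty snapshot, along these lines (a sketch, not the actual patch):

    -   InitDirtySnapshot(SnapshotDirty);
    -   index_scan = index_beginscan(heapRel, indexRel, &SnapshotDirty, 1, 0);
    +   /*
    +    * SnapshotAny accepts every index entry, live or dead, so the probe
    +    * never has to fetch and reject a long run of dead heap rows.
    +    */
    +   index_scan = index_beginscan(heapRel, indexRel, SnapshotAny, 1, 0);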
data.sql file: https://yadi.sk/d/GtBW6Hhu3HQ4CA
Regards,
Dmitriy Sarafannikov
Attachment
On Thu, Apr 27, 2017 at 4:08 AM, Dmitriy Sarafannikov <dsarafannikov@yandex.ru> wrote: > I'd like to propose to search min and max value in index with SnapshotAny in > get_actual_variable_range function. +1 from me, but Tom rejected that approach last time. > But if we delete many rows from beginning or end of index, it would be > very expensive too because we will fetch each dead row and reject it. Yep, and I've seen that turn into a serious problem in production. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Apr 27, 2017 at 4:08 AM, Dmitriy Sarafannikov > <dsarafannikov@yandex.ru> wrote: >> I'd like to propose to search min and max value in index with SnapshotAny in >> get_actual_variable_range function. > +1 from me, but Tom rejected that approach last time. I remain concerned about the fact that this would allow accepting deleted values that might be way outside the surviving range of values. SnapshotDirty is a bit lax about tuple state, but it's not so lax as to accept fully dead tuples. >> But if we delete many rows from beginning or end of index, it would be >> very expensive too because we will fetch each dead row and reject it. > Yep, and I've seen that turn into a serious problem in production. How so? Shouldn't the indexscan go back and mark such tuples dead in the index, such that they'd be visited this way only once? If that's not happening, maybe we should try to fix it. regards, tom lane
On 2017-04-27 17:22:25 -0400, Tom Lane wrote: > > Yep, and I've seen that turn into a serious problem in production. > > How so? Shouldn't the indexscan go back and mark such tuples dead in > the index, such that they'd be visited this way only once? If that's > not happening, maybe we should try to fix it. One way that never happens is if you end up choosing bitmap index scans all the time... I've previously had to force a certain percentage of queries with it disabled to keep indexes in a good state :( - Andres
Andres Freund <andres@anarazel.de> writes: > On 2017-04-27 17:22:25 -0400, Tom Lane wrote: >> How so? Shouldn't the indexscan go back and mark such tuples dead in >> the index, such that they'd be visited this way only once? If that's >> not happening, maybe we should try to fix it. > One way that never happens is if you end up choosing bitmap index scans > all the time... What I'm thinking of is the regular indexscan that's done internally by get_actual_variable_range, not whatever ends up getting chosen as the plan for the user query. I had supposed that that would kill dead index entries as it went, but maybe that's not happening for some reason. regards, tom lane
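For readers following the thread: the mechanism Tom is referring to is the kill_prior_tuple / LP_DEAD hinting that ordinary index scans perform. A rough sketch of the flow (paraphrased from indexam.c; details vary by version), including the in-recovery exception that becomes relevant later in the thread:

    /* index_fetch_heap(): after probing the heap for an index entry ... */
    got_heap_tuple = heap_hot_search_buffer(tid, scan->heapRelation,
                                            scan->xs_cbuf, scan->xs_snapshot,
                                            &scan->xs_ctup, &all_dead,
                                            !scan->xs_continue_hot);

    /*
     * If the whole HOT chain turned out to be dead, tell the index AM to mark
     * its entry LP_DEAD (this takes effect on the next amgettuple call), so
     * later scans can skip the heap fetch.  Note this is skipped during
     * recovery, i.e. on standbys the hint is never set.
     */
    if (!scan->xactStartedInRecovery)
        scan->kill_prior_tuple = all_dead;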
Hi all.
> On 28 Apr 2017, at 0:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
>> Yep, and I've seen that turn into a serious problem in production.
And that’s why we started digging into it :) We have already seen this be a problem in another project (not so serious), and that time we rewrote the query. But now planning of such a query consumes much more CPU (in [1], each line is one DB host) and nearly all queries on all hosts (primary and all standbys) perform much worse.
How so? Shouldn't the indexscan go back and mark such tuples dead in
the index, such that they'd be visited this way only once? If that's
not happening, maybe we should try to fix it.
That is definitely not happening. Here is the perf output when the problem happens:
root@pgload01e ~ # perf report | grep -v '^#' | head
    56.67%  postgres  postgres  [.] _bt_checkkeys
    19.27%  postgres  postgres  [.] _bt_readpage
     2.09%  postgres  postgres  [.] pglz_decompress
     2.03%  postgres  postgres  [.] LWLockAttemptLock
     1.61%  postgres  postgres  [.] PinBuffer.isra.3
     1.14%  postgres  postgres  [.] hash_search_with_hash_value
     0.68%  postgres  postgres  [.] LWLockRelease
     0.42%  postgres  postgres  [.] AllocSetAlloc
     0.40%  postgres  postgres  [.] SearchCatCache
     0.40%  postgres  postgres  [.] ReadBuffer_common
root@pgload01e ~ #
And here is a backtrace of one of the backends (they all look pretty much the same):
[Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 0x00007fe9e2660709 in _bt_checkkeys (scan=scan@entry=0x7fe9e48ec838, page=page@entry=0x7fe8e477d780 "\263$", offnum=offnum@entry=338, dir=dir@entry=ForwardScanDirection, continuescan=continuescan@entry=0x7ffe3addfd1f "\001h\341\b") at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/access/nbtree/nbtutils.c:1378 #0 0x00007fe9e2660709 in _bt_checkkeys (scan=scan@entry=0x7fe9e48ec838, page=page@entry=0x7fe8e477d780 "\263$", offnum=offnum@entry=338, dir=dir@entry=ForwardScanDirection, continuescan=continuescan@entry=0x7ffe3addfd1f "\001h\341\b") at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/access/nbtree/nbtutils.c:1378 #1 0x00007fe9e265d301 in _bt_readpage (scan=scan@entry=0x7fe9e48ec838, dir=dir@entry=ForwardScanDirection, offnum=338) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/access/nbtree/nbtsearch.c:1198 #2 0x00007fe9e265dbce in _bt_steppage (scan=scan@entry=0x7fe9e48ec838, dir=dir@entry=ForwardScanDirection) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/access/nbtree/nbtsearch.c:1357 #3 0x00007fe9e265efec in _bt_endpoint (dir=ForwardScanDirection, scan=0x7fe9e48ec838) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/access/nbtree/nbtsearch.c:1739 #4 _bt_first (scan=scan@entry=0x7fe9e48ec838, dir=dir@entry=ForwardScanDirection) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/access/nbtree/nbtsearch.c:746 #5 0x00007fe9e265c119 in btgettuple (scan=0x7fe9e48ec838, dir=ForwardScanDirection) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/access/nbtree/nbtree.c:326 #6 0x00007fe9e2656162 in index_getnext_tid (scan=scan@entry=0x7fe9e48ec838, direction=direction@entry=ForwardScanDirection) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/access/index/indexam.c:415 #7 0x00007fe9e2656354 in index_getnext (scan=scan@entry=0x7fe9e48ec838, direction=direction@entry=ForwardScanDirection) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/access/index/indexam.c:553 #8 0x00007fe9e2942f70 in get_actual_variable_range (vardata=vardata@entry=0x7ffe3ade1620, sortop=<optimized out>, min=0x7fe9e4912ae0, max=0x0, root=0x7fe9e48e89b0) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/utils/adt/selfuncs.c:5137 #9 0x00007fe9e2943c24 in ineq_histogram_selectivity (root=root@entry=0x7fe9e48e89b0, vardata=vardata@entry=0x7ffe3ade1620, opproc=opproc@entry=0x7ffe3ade1550, isgt=isgt@entry=0 '\000', constval=constval@entry=493076, consttype=consttype@entry=23) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/utils/adt/selfuncs.c:833 #10 0x00007fe9e29445f8 in scalarineqsel (root=root@entry=0x7fe9e48e89b0, operator=operator@entry=97, isgt=isgt@entry=0 '\000', vardata=vardata@entry=0x7ffe3ade1620, constval=493076, consttype=23) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/utils/adt/selfuncs.c:563 #11 0x00007fe9e2946233 in mergejoinscansel (root=root@entry=0x7fe9e48e89b0, clause=<optimized out>, opfamily=<optimized out>, strategy=<optimized out>, nulls_first=<optimized out>, leftstart=leftstart@entry=0x7ffe3ade1728, leftend=leftend@entry=0x7ffe3ade1730, rightstart=rightstart@entry=0x7ffe3ade1738, rightend=rightend@entry=0x7ffe3ade1740) at 
/home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/utils/adt/selfuncs.c:3056 #12 0x00007fe9e27e1b16 in cached_scansel (rinfo=0x7fe9e49100c0, rinfo=0x7fe9e49100c0, pathkey=0x7fe9e490fb80, root=0x7fe9e48e89b0) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/path/costsize.c:2626 #13 initial_cost_mergejoin (root=root@entry=0x7fe9e48e89b0, workspace=workspace@entry=0x7ffe3ade1830, jointype=jointype@entry=JOIN_INNER, mergeclauses=mergeclauses@entry=0x7fe9e4912a60, outer_path=outer_path@entry=0x7fe9e490f810, inner_path=inner_path@entry=0x7fe9e490f5b8, outersortkeys=outersortkeys@entry=0x7fe9e4912a10, innersortkeys=innersortkeys@entry=0x7fe9e4912ab0, sjinfo=0x7ffe3ade1a70) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/path/costsize.c:2226 #14 0x00007fe9e27ebde6 in try_mergejoin_path (root=root@entry=0x7fe9e48e89b0, joinrel=joinrel@entry=0x7fe9e4910b28, outer_path=outer_path@entry=0x7fe9e490f810, inner_path=inner_path@entry=0x7fe9e490f5b8, pathkeys=0x0, mergeclauses=mergeclauses@entry=0x7fe9e4912a60, outersortkeys=outersortkeys@entry=0x7fe9e4912a10, innersortkeys=innersortkeys@entry=0x7fe9e4912ab0, jointype=jointype@entry=JOIN_INNER, extra=extra@entry=0x7ffe3ade1970) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/path/joinpath.c:443 #15 0x00007fe9e27ec705 in sort_inner_and_outer (extra=0x7ffe3ade1970, jointype=JOIN_INNER, innerrel=0x7fe9e48eba00, outerrel=0x7fe9e48ea990, joinrel=0x7fe9e4910b28, root=0x7fe9e48e89b0) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/path/joinpath.c:766 #16 add_paths_to_joinrel (root=root@entry=0x7fe9e48e89b0, joinrel=joinrel@entry=0x7fe9e4910b28, outerrel=outerrel@entry=0x7fe9e48ea990, innerrel=innerrel@entry=0x7fe9e48eba00, jointype=jointype@entry=JOIN_INNER, sjinfo=sjinfo@entry=0x7ffe3ade1a70, restrictlist=restrictlist@entry=0x7fe9e4912850) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/path/joinpath.c:173 #17 0x00007fe9e27ee201 in make_join_rel (root=root@entry=0x7fe9e48e89b0, rel1=rel1@entry=0x7fe9e48ea990, rel2=rel2@entry=0x7fe9e48eba00) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/path/joinrels.c:754 #18 0x00007fe9e27eeabb in make_rels_by_clause_joins (other_rels=<optimized out>, old_rel=<optimized out>, root=<optimized out>) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/path/joinrels.c:274 #19 join_search_one_level (root=root@entry=0x7fe9e48e89b0, level=level@entry=2) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/path/joinrels.c:96 #20 0x00007fe9e27df09b in standard_join_search (root=0x7fe9e48e89b0, levels_needed=2, initial_rels=<optimized out>) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/path/allpaths.c:2186 #21 0x00007fe9e27df41b in make_one_rel (root=root@entry=0x7fe9e48e89b0, joinlist=joinlist@entry=0x7fe9e48ebd58) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/path/allpaths.c:176 #22 0x00007fe9e27fa10e in query_planner (root=root@entry=0x7fe9e48e89b0, tlist=tlist@entry=0x7fe9e48e9020, qp_callback=qp_callback@entry=0x7fe9e27fa740 <standard_qp_callback>, qp_extra=qp_extra@entry=0x7ffe3ade1d60) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/plan/planmain.c:255 #23 0x00007fe9e27fbc14 in grouping_planner 
(root=root@entry=0x7fe9e48e89b0, inheritance_update=inheritance_update@entry=0 '\000', tuple_fraction=<optimized out>, tuple_fraction@entry=0) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/plan/planner.c:1701 #24 0x00007fe9e27fe861 in subquery_planner (glob=glob@entry=0x7fe9e4533270, parse=parse@entry=0x7fe9e4760290, parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=0 '\000', tuple_fraction=tuple_fraction@entry=0) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/plan/planner.c:759 #25 0x00007fe9e27ff7bd in standard_planner (parse=0x7fe9e4760290, cursorOptions=256, boundParams=0x0) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/optimizer/plan/planner.c:292 #26 0x00007fe9dbeb6345 in pathman_planner_hook () from /usr/lib/postgresql/9.6/lib/pg_pathman.so #27 0x00007fe9e288d1d4 in pg_plan_query (querytree=<optimized out>, cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/tcop/postgres.c:798 #28 0x00007fe9e288d2c4 in pg_plan_queries (querytrees=<optimized out>, cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/tcop/postgres.c:857 #29 0x00007fe9e288f52d in exec_simple_query (query_string=0x7fe9e475e9d0 "SELECT * FROM polygon_table polygon INNER JOIN second_table second ON (polygon.second_id = second.id) WHERE ST_Intersects(poly, ST_SetSrid(ST_MakePoint(52.3433914, 58.7438431), 4326));") at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/tcop/postgres.c:1022 #30 PostgresMain (argc=<optimized out>, argv=argv@entry=0x7fe9e452f428, dbname=0x7fe9e452f350 "small_weather", username=<optimized out>) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/tcop/postgres.c:4076 #31 0x00007fe9e26134e5 in BackendRun (port=0x7fe9e452d0b0) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/postmaster/postmaster.c:4271 #32 BackendStartup (port=0x7fe9e452d0b0) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/postmaster/postmaster.c:3945 #33 ServerLoop () at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/postmaster/postmaster.c:1701 #34 0x00007fe9e282e4db in PostmasterMain (argc=5, argv=<optimized out>) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/postmaster/postmaster.c:1309 #35 0x00007fe9e2614232 in main (argc=5, argv=0x7fe9e44e1210) at /home/robot-gerrit/workspace/postgresql-9.6-deb/src/build/../src/backend/main/main.c:228
And if you disable autovacuum on the table, the problem will continue until you manually run vacuum on it. There could be thousands of index scans during this time. Dmitriy investigated the problem a bit more and may provide some more details.
On Thu, Apr 27, 2017 at 5:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> But if we delete many rows from beginning or end of index, it would be >>> very expensive too because we will fetch each dead row and reject it. > >> Yep, and I've seen that turn into a serious problem in production. > > How so? Shouldn't the indexscan go back and mark such tuples dead in > the index, such that they'd be visited this way only once? If that's > not happening, maybe we should try to fix it. Hmm. Actually, I think the scenario I saw was where there was a large number of tuples at the end of the index that weren't dead yet due to an old snapshot held open. That index was being scanned by lots of short-running queries. Those queries executed just fine, but they took a long time to plan because they had to step over all of the dead tuples in the index one by one. That increased planning time - multiplied by the number of times it was incurred - was sufficient to cripple the system. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Apr 27, 2017 at 5:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> How so? Shouldn't the indexscan go back and mark such tuples dead in >> the index, such that they'd be visited this way only once? If that's >> not happening, maybe we should try to fix it. > Hmm. Actually, I think the scenario I saw was where there was a large > number of tuples at the end of the index that weren't dead yet due to > an old snapshot held open. That index was being scanned by lots of > short-running queries. Those queries executed just fine, but they > took a long to plan because they had to step over all of the dead > tuples in the index one by one. But that was the scenario that we intended to fix by changing to SnapshotDirty, no? Or I guess not quite, because dead-but-still-visible-to-somebody tuples are rejected by SnapshotDirty. Maybe we need another type of snapshot that would accept any non-vacuumable tuple. I really don't want SnapshotAny semantics here, but a tuple that was live more recently than the xmin horizon seems like it's acceptable enough. HeapTupleSatisfiesVacuum already implements the right behavior, but we don't have a Snapshot-style interface for it. regards, tom lane
From: Dmitriy Sarafannikov
What I'm thinking of is the regular indexscan that's done internally
by get_actual_variable_range, not whatever ends up getting chosen as
the plan for the user query. I had supposed that that would kill
dead index entries as it went, but maybe that's not happening for
some reason.
Really, this happens as you said. Index entries are marked as dead.
But after this, backends spend CPU time skipping these killed entries
in _bt_checkkeys:
    if (scan->ignore_killed_tuples && ItemIdIsDead(iid))
    {
        /* return immediately if there are more tuples on the page */
        if (ScanDirectionIsForward(dir))
        {
            if (offnum < PageGetMaxOffsetNumber(page))
                return NULL;
        }
        else
        {
            BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);

            if (offnum > P_FIRSTDATAKEY(opaque))
                return NULL;
        }
This is confirmed by the perf records and the backtrace reported by Vladimir earlier.
root@pgload01e ~ # perf report | grep -v '^#' | head
    56.67%  postgres  postgres  [.] _bt_checkkeys
    19.27%  postgres  postgres  [.] _bt_readpage
     2.09%  postgres  postgres  [.] pglz_decompress
     2.03%  postgres  postgres  [.] LWLockAttemptLock
     1.61%  postgres  postgres  [.] PinBuffer.isra.3
     1.14%  postgres  postgres  [.] hash_search_with_hash_value
     0.68%  postgres  postgres  [.] LWLockRelease
     0.42%  postgres  postgres  [.] AllocSetAlloc
     0.40%  postgres  postgres  [.] SearchCatCache
     0.40%  postgres  postgres  [.] ReadBuffer_common
root@pgload01e ~ #
It seems like killing dead tuples does not solve this problem.
On Fri, Apr 28, 2017 at 12:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Apr 27, 2017 at 5:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> How so? Shouldn't the indexscan go back and mark such tuples dead in >>> the index, such that they'd be visited this way only once? If that's >>> not happening, maybe we should try to fix it. > >> Hmm. Actually, I think the scenario I saw was where there was a large >> number of tuples at the end of the index that weren't dead yet due to >> an old snapshot held open. That index was being scanned by lots of >> short-running queries. Those queries executed just fine, but they >> took a long to plan because they had to step over all of the dead >> tuples in the index one by one. > > But that was the scenario that we intended to fix by changing to > SnapshotDirty, no? Or I guess not quite, because > dead-but-still-visible-to-somebody tuples are rejected by SnapshotDirty. Yup. > Maybe we need another type of snapshot that would accept any > non-vacuumable tuple. I really don't want SnapshotAny semantics here, > but a tuple that was live more recently than the xmin horizon seems > like it's acceptable enough. HeapTupleSatisfiesVacuum already > implements the right behavior, but we don't have a Snapshot-style > interface for it. Maybe. What I know is that several people have found SnapshotDirty to be problematic, and in the case with which I am acquainted, using SnapshotAny fixed it. I do not know whether, if everybody in the world were using SnapshotAny, somebody would have the problem you're talking about, or some other one. And if they did, I don't know whether using the new kind of snapshot you are proposing would fix it. I do know that giving SnapshotAny to people seems to have only improved things according to the information currently available to me. I don't, in general, share your intuition that using SnapshotAny is the wrong thing. We're looking up the last value in the index for planning purposes. It seems to me that, as far as making index scans more or less expensive to scan, a dead tuple is almost as good as a live one. Until that tuple is not only marked dead, but removed from the index page, it contributes to the cost of an index scan. To put that another way, suppose the range of index keys is 0 to 2 million, but the heap tuples for values 1 million and up are committed deleted. All the index tuples remain (and may or may not be removable depending on what other snapshots exist in the system). Now, consider the following three cases: (a) index scan from 0 to 10,000 (b) index scan from 1,000,000 to 1,010,000 (c) index scan from 3,000,000 to 3,010,000 I contend that the cost of index scan (b) is a lot closer to the cost of (a) than to the cost of (c). (b) finds a whole bunch of index tuples; (c) gives up immediately and goes home. So I actually think that using the actual last value in the index - 2,000,000 - is conceptually *correct* regardless of whether it's marked dead and regardless of whether the corresponding heap tuple is dead. The cost depends mostly on which tuples are present in the index, not which table rows the user can see. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Apr 28, 2017 at 12:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Maybe we need another type of snapshot that would accept any >> non-vacuumable tuple. I really don't want SnapshotAny semantics here, > I don't, in general, share your intuition that using SnapshotAny is > the wrong thing. I think you are mistaken. > We're looking up the last value in the index for > planning purposes. It seems to me that, as far as making index scans > more or less expensive to scan, a dead tuple is almost as good as a > live one. You are confusing number of tuples in the index, which we estimate from independent measurements such as the file size, with endpoint value, which is used for purposes like guessing whether a mergejoin will be able to stop early. For purposes like that, we do NOT want to include dead tuples, because the merge join is never gonna see 'em. In short, arguing about the cost of an indexscan per se is quite irrelevant, because this statistic is not used when estimating the cost of an indexscan. Or put another way: the observed problem is planning time, not that the resulting estimates are bad. You argue that we should cut planning time by changing the definition of the estimate, and claim that the new definition is better, but I think you have nothing but wishful thinking behind that. I'm willing to try to cut planning time, but I don't want the definition to change any further than it has to. regards, tom lane
On Fri, Apr 28, 2017 at 3:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > You are confusing number of tuples in the index, which we estimate from > independent measurements such as the file size, with endpoint value, > which is used for purposes like guessing whether a mergejoin will be > able to stop early. For purposes like that, we do NOT want to include > dead tuples, because the merge join is never gonna see 'em. I spent several hours today thinking about this and, more than once, thought I'd come up with an example demonstrating why my idea was better than yours (despite the fact that, as you point out, the merge join is never gonna see 'em). However, in each instance, I eventually realized that I was wrong, so I guess I'll have to concede this point. > Or put another way: the observed problem is planning time, not that the > resulting estimates are bad. You argue that we should cut planning > time by changing the definition of the estimate, and claim that the > new definition is better, but I think you have nothing but wishful > thinking behind that. I'm willing to try to cut planning time, but > I don't want the definition to change any further than it has to. OK, I guess that makes sense. There can be scads of dead tuples at the end of the index, and there's no surety that the query actually requires touching that portion of the index at all apart from planning, so it seems a bit unfortunate to burden planning with the overhead of cleaning them up. But I guess with your new proposed definition at least that can only happen once. After that there may be a bunch of pages to skip at the end of the index before we actually find a tuple, but they should at least be empty. Right now, you can end up skipping many tuples repeatedly for every query. I tested a two-table join with a million committed deleted but not dead tuples at the end of one index; that increased planning time from ~0.25ms to ~90ms; obviously, a two-and-a-half order of magnitude increase in CPU time spent planning is not a welcome development on a production system. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
From: Dmitriy Sarafannikov
> Maybe we need another type of snapshot that would accept any
> non-vacuumable tuple. I really don't want SnapshotAny semantics here,
> but a tuple that was live more recently than the xmin horizon seems
> like it's acceptable enough. HeapTupleSatisfiesVacuum already
> implements the right behavior, but we don't have a Snapshot-style
> interface for it.

If I understood correctly, this new type of snapshot would help if there are long-running transactions which can see these tuples.
But if there are no long-running transactions, it will be the same. Am I right?

And what about not fetching the actual min and max values from indexes whose columns aren't involved in the join?
Dmitriy Sarafannikov <dsarafannikov@yandex.ru> writes: >> Maybe we need another type of snapshot that would accept any >> non-vacuumable tuple. I really don't want SnapshotAny semantics here, > If I understood correctly, this new type of snapshot would help if > there are long running transactions which can see this tuples. > But if there are not long running transactions, it will be the same. > Am i right? Right. You haven't shown us much about the use-case you're concerned with, so it's not clear what's actually needed. > And what about don’t fetch actual min and max values from indexes > whose columns doesn’t involved in join? We don't fetch that info unless we need it. I'm not entirely certain, but there could be cases where a single planning cycle ends up fetching that data more than once. (There's caching at the RestrictInfo level, but that might not be enough.) So a line of thought that might be worth looking into is adding a lower layer of caching to make sure it's not done more than once per plan. Again, whether this saves anything would depend a lot on specific use-cases. regards, tom lane
On Fri, Apr 28, 2017 at 10:02 PM, Dmitriy Sarafannikov <dsarafannikov@yandex.ru> wrote: > > What I'm thinking of is the regular indexscan that's done internally > by get_actual_variable_range, not whatever ends up getting chosen as > the plan for the user query. I had supposed that that would kill > dead index entries as it went, but maybe that's not happening for > some reason. > > > Really, this happens as you said. Index entries are marked as dead. > But after this, backends spends cpu time on skip this killed entries > in _bt_checkkeys : > If that is the case, then how would using SnapshotAny solve this problem. We get the value from index first and then check its visibility in heap, so if time is spent in _bt_checkkeys, why would using a different kind of Snapshot solve the problem? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
> On 29 Apr 2017, at 17:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dmitriy Sarafannikov <dsarafannikov@yandex.ru> writes:
>>> Maybe we need another type of snapshot that would accept any
>>> non-vacuumable tuple. I really don't want SnapshotAny semantics here,
>
>> If I understood correctly, this new type of snapshot would help if
>> there are long-running transactions which can see these tuples.
>> But if there are no long-running transactions, it will be the same.
>> Am I right?
>
> Right. You haven't shown us much about the use-case you're concerned
> with, so it's not clear what's actually needed.

The use case is nearly the same as the way to reproduce the problem described in the first letter. It's an OLTP database with short, mostly read-only queries (~6k rps). Every 10 minutes new data is inserted (~5-10% of rows in polygon_table) and old data is deleted (~5-10% of rows in polygon_table). Insert and delete are made in different transactions. Until the vacuum after the delete is finished, the planning time is two orders of magnitude higher than usual. If we use prepared statements the problem doesn't reproduce, since planning is not actually done.

>> And what about not fetching the actual min and max values from indexes
>> whose columns aren't involved in the join?
>
> We don't fetch that info unless we need it.

We used to think that this is actually not so (there was a problem in our test case), but we rechecked and it is: the planner doesn't find min and max in indexes that aren't needed for the join.

> I'm not entirely certain, but there could be cases where a single
> planning cycle ends up fetching that data more than once. (There's
> caching at the RestrictInfo level, but that might not be enough.)
> So a line of thought that might be worth looking into is adding a
> lower layer of caching to make sure it's not done more than once per
> plan. Again, whether this saves anything would depend a lot on
> specific use-cases.
>
> regards, tom lane
From: Dmitriy Sarafannikov
> If that is the case, then how would using SnapshotAny solve this
> problem. We get the value from index first and then check its
> visibility in heap, so if time is spent in _bt_checkkeys, why would
> using a different kind of Snapshot solve the problem?

The first scan of the index with SnapshotAny will accept these rows and will not mark these entries as killed. Furthermore, killed entries are ignored on standby, and scanning with SnapshotAny will always accept the first row from the index.

    /*
     * During recovery we ignore killed tuples and don't bother to kill them
     * either. We do this because the xmin on the primary node could easily be
     * later than the xmin on the standby node, so that what the primary
     * thinks is killed is supposed to be visible on standby. So for correct
     * MVCC for queries during recovery we must ignore these hints and check
     * all tuples. Do *not* set ignore_killed_tuples to true when running in a
     * transaction that was started during recovery. xactStartedInRecovery
     * should not be altered by index AMs.
     */
    scan->kill_prior_tuple = false;
    scan->xactStartedInRecovery = TransactionStartedDuringRecovery();
    scan->ignore_killed_tuples = !scan->xactStartedInRecovery;

We execute these read-only queries on standby.
Hi.

> On 25 Apr 2017, at 18:13, Dmitriy Sarafannikov <dsarafannikov@yandex.ru> wrote:
>
> I'd like to propose searching for the min and max values in the index with SnapshotAny in the get_actual_variable_range function.
> The current implementation scans the index with SnapshotDirty, which accepts uncommitted rows and rejects dead rows.

The patch is already being discussed here [1].

[1] https://www.postgresql.org/message-id/05C72CF7-B5F6-4DB9-8A09-5AC897653113%40yandex.ru
From: Dmitriy Sarafannikov
> Maybe we need another type of snapshot that would accept any
> non-vacuumable tuple. I really don't want SnapshotAny semantics here,
> but a tuple that was live more recently than the xmin horizon seems
> like it's acceptable enough. HeapTupleSatisfiesVacuum already
> implements the right behavior, but we don't have a Snapshot-style
> interface for it.

I have tried to implement this new type of snapshot that accepts any non-vacuumable tuples.
We have tried this patch in our load environment, and it has smoothed out and reduced the magnitude of the CPU usage peaks.
But this snapshot does not solve the problem completely.

Patch is attached.
Attachment
On Thu, May 4, 2017 at 9:42 PM, Dmitriy Sarafannikov <dsarafannikov@yandex.ru> wrote: > >> Maybe we need another type of snapshot that would accept any >> non-vacuumable tuple. I really don't want SnapshotAny semantics here, >> but a tuple that was live more recently than the xmin horizon seems >> like it's acceptable enough. HeapTupleSatisfiesVacuum already >> implements the right behavior, but we don't have a Snapshot-style >> interface for it. > > > I have tried to implement this new type of snapshot that accepts any > non-vacuumable tuples. > We have tried this patch in our load environment. And it has smoothed out > and reduced magnitude of the cpu usage peaks. > But this snapshot does not solve the problem completely. > > Patch is attached. 1. +#define InitNonVacuumableSnapshot(snapshotdata) \ + do { \ + (snapshotdata).satisfies = HeapTupleSatisfiesNonVacuumable; \ + (snapshotdata).xmin = RecentGlobalDataXmin; \ + } while(0) + Can you explain and add comments why you think RecentGlobalDataXmin is the right to use it here? As of now, it seems to be only used for pruning non-catalog tables. 2. +bool +HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot, + Buffer buffer) +{ + return HeapTupleSatisfiesVacuum(htup, snapshot->xmin, buffer) + != HEAPTUPLE_DEAD; +} + Add comments on top of this function and for the sake of consistency update the file header as well (Summary of visibility functions:) -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
From: Dmitriy Sarafannikov
Amit, thanks for the comments!

> 1.
> +#define InitNonVacuumableSnapshot(snapshotdata) \
> + do { \
> + (snapshotdata).satisfies = HeapTupleSatisfiesNonVacuumable; \
> + (snapshotdata).xmin = RecentGlobalDataXmin; \
> + } while(0)
> +
>
> Can you explain and add comments why you think RecentGlobalDataXmin is
> the right to use it here? As of now, it seems to be only used for
> pruning non-catalog tables.

Can you explain to me what value for xmin should be used there?

> 2.
> +bool
> +HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
> + Buffer buffer)
> +{
> + return HeapTupleSatisfiesVacuum(htup, snapshot->xmin, buffer)
> + != HEAPTUPLE_DEAD;
> +}
> +
>
> Add comments on top of this function and for the sake of consistency
> update the file header as well (Summary of visibility functions:)

Yes, I will add comments and send a new patch.
On Fri, May 5, 2017 at 1:28 PM, Dmitriy Sarafannikov <dsarafannikov@yandex.ru> wrote: > Amit, thanks for comments! > >> 1. >> +#define InitNonVacuumableSnapshot(snapshotdata) \ >> + do { \ >> + (snapshotdata).satisfies = HeapTupleSatisfiesNonVacuumable; \ >> + (snapshotdata).xmin = RecentGlobalDataXmin; \ >> + } while(0) >> + >> >> Can you explain and add comments why you think RecentGlobalDataXmin is >> the right to use it here? As of now, it seems to be only used for >> pruning non-catalog tables. > > Can you explain me, what value for xmin should be used there? > I think we can use RecentGlobalDataXmin for non-catalog relations and RecentGlobalXmin for catalog relations (probably a check similar to what we have in heap_page_prune_opt). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
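For reference, the check in heap_page_prune_opt that Amit mentions branches on whether the relation is a catalog, roughly like this (simplified sketch; the old-snapshot-threshold handling is left out):

    /* Simplified sketch of heap_page_prune_opt()'s horizon choice */
    if (IsCatalogRelation(relation) ||
        RelationIsAccessibleInLogicalDecoding(relation))
        xmin_horizon = RecentGlobalXmin;
    else
        xmin_horizon = RecentGlobalDataXmin;    /* laxer horizon for user tables */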
From: Dmitriy Sarafannikov
I think we can use RecentGlobalDataXmin for non-catalog relations and
RecentGlobalXmin for catalog relations (probably a check similar to
what we have in heap_page_prune_opt).
I took the check from heap_page_prune_opt (maybe this check should be a separate function?).
But it requires initializing the snapshot for a specific relation. Maybe we need to use GetOldestXmin()?
A new version of the patch is attached.
Attachment
On Mon, May 8, 2017 at 6:30 PM, Dmitriy Sarafannikov <dsarafannikov@yandex.ru> wrote: > > I think we can use RecentGlobalDataXmin for non-catalog relations and > RecentGlobalXmin for catalog relations (probably a check similar to > what we have in heap_page_prune_opt). > > > I took check from heap_page_prune_opt (Maybe this check must be present as > separate function?) > But it requires to initialize snapshot for specific relation. > + else \ + (snapshotdata).xmin = \ + TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin, \ + relation); \ I think we don't need to use TransactionIdLimitedForOldSnapshots() as that is required to override xmin for table vacuum/pruning purposes. > Maybe we need > to use GetOldestXmin()? It is a costly call as it needs ProcArrayLock, so I don't think it makes sense to use it here. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
From: Dmitriy Sarafannikov
> + else \
> + (snapshotdata).xmin = \
> + TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin, \
> + relation); \
>
> I think we don't need to use TransactionIdLimitedForOldSnapshots() as
> that is required to override xmin for table vacuum/pruning purposes.
>
>> Maybe we need
>> to use GetOldestXmin()?
>
> It is a costly call as it needs ProcArrayLock, so I don't think it
> makes sense to use it here.

OK, I agree. Patch is attached.
Attachment
From: Dmitriy Sarafannikov
> OK, I agree. Patch is attached.

I have added the patch to the CF.
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation: tested, failed

Hi! I've looked into the patch.

There is not so much of the code. The code itself is pretty clear and well documented. I've found just one small typo, "transactionn", in the HeapTupleSatisfiesNonVacuumable() comments.

The chosen snapshot type is not a solution to the author's problem at hand, though the implemented SnapshotNonVacuumable is closer to a rational choice than the currently used SnapshotDirty for range sketching. As far as I can see, that is what get_actual_variable_range() is used for, despite the word "actual" in its name.
The only place where get_actual_variable_range() is used for the actual range is preprocessed-out in get_variable_range():

/*
 * XXX It's very tempting to try to use the actual column min and max, if
 * we can get them relatively-cheaply with an index probe.  However, since
 * this function is called many times during join planning, that could
 * have unpleasant effects on planning speed.  Need more investigation
 * before enabling this.
 */
#ifdef NOT_USED
if (get_actual_variable_range(root, vardata, sortop, min, max))
    return true;
#endif

I think it would be good if we could have something like "give me the actual values, but only if they are on the first index page", like conditional locks. But I think this patch is a step in the right direction.

Best regards, Andrey Borodin.
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Oops, missed those checkboxes. Sorry for the noise. Here's the correct checklist: installcheck-world passes, documented as expected. Feature is there.

Best regards, Andrey Borodin.
> On 18 Aug 2017, at 08:50, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, failed
> Implements feature: tested, failed
> Spec compliant: tested, failed
> Documentation: tested, failed
>
> Hi! I've looked into the patch.
>
> There is not so much of the code. The code itself is pretty clear and well documented. I've found just one small typo, "transactionn", in the HeapTupleSatisfiesNonVacuumable() comments.
>
> The chosen snapshot type is not a solution to the author's problem at hand, though the implemented SnapshotNonVacuumable is closer to a rational choice than the currently used SnapshotDirty for range sketching. As far as I can see, that is what get_actual_variable_range() is used for, despite the word "actual" in its name.
> The only place where get_actual_variable_range() is used for the actual range is preprocessed-out in get_variable_range():
>
> /*
>  * XXX It's very tempting to try to use the actual column min and max, if
>  * we can get them relatively-cheaply with an index probe.  However, since
>  * this function is called many times during join planning, that could
>  * have unpleasant effects on planning speed.  Need more investigation
>  * before enabling this.
>  */
> #ifdef NOT_USED
> if (get_actual_variable_range(root, vardata, sortop, min, max))
>     return true;
> #endif
>
> I think it would be good if we could have something like "give me the actual values, but only if they are on the first index page", like conditional locks. But I think this patch is a step in the right direction.

Thanks for your review!

Based on your review and conclusions, I've bumped the status to "Ready for Committer". If you believe another status would be more appropriate, please go in and update it.

cheers ./daniel
Dmitriy Sarafannikov <dsarafannikov@yandex.ru> writes: > [ snapshot_non_vacuumable_v3.patch ] Starting to look at this. I think that the business with choosing RecentGlobalXmin vs. RecentGlobalDataXmin is just wrong. What we want to do is accept any tuple that would not be considered killable by heap_hot_search_buffer, and that looks only at RecentGlobalXmin. It's possible that it'd be sensible for heap_hot_search_buffer to use RecentGlobalDataXmin when dealing with a non-catalog table, allowing it to consider more tuples killable. But I'm not sure; it looks like the decision which to use costs some effort, which we probably don't want to be spending in heap_hot_search_buffer. In any case that would be a different patch topic. With the patch as it stands, if we have a user-data tuple that is dead more recently than RecentGlobalXmin but not more recently than RecentGlobalDataXmin, the snapshot will reject it so the planner's indexscan will keep searching. But heap_hot_search_buffer won't tell the index AM to kill the tuple, so the index entry stays marked as live, which means the next get_actual_variable_range call will have to scan past it again. So we don't get the hoped-for benefit with tuple deaths in that range. I do not know whether this effect might explain your finding that the patch didn't entirely fix your performance problem. In short I think we should just set up the threshold as RecentGlobalXmin. However, this seems like a decision that might be pretty specific to get_actual_variable_range's use-case, so I'm inclined to have the threshold be chosen by get_actual_variable_range itself not hard-wired into InitNonVacuumableSnapshot. Another point is that I think it might be possible for RecentGlobalXmin to advance after get_actual_variable_range looks at it (for example, if we have to take a new catalog snapshot during index scan setup). This creates the possibility that the snapshot accepts tuples that heap_hot_search_buffer would consider dead. That seems fairly harmless though, so I'm inclined not to fuss about it. We could arrange for HeapTupleSatisfiesNonVacuumable to use RecentGlobalXmin directly, but that seems like it makes it too special-purpose. regards, tom lane
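To illustrate the shape Tom describes here (a sketch only; the code as committed may differ in detail): the snapshot-init macro takes the horizon as a parameter, and get_actual_variable_range itself passes RecentGlobalXmin:

    /* Sketch: macro parameterized by the xmin horizon (snapmgr.h style) */
    #define InitNonVacuumableSnapshot(snapshotdata, xmin_horizon)  \
        ((snapshotdata).satisfies = HeapTupleSatisfiesNonVacuumable, \
         (snapshotdata).xmin = (xmin_horizon))

    /* Sketch: call site in get_actual_variable_range() */
    SnapshotData SnapshotNonVacuumable;

    InitNonVacuumableSnapshot(SnapshotNonVacuumable, RecentGlobalXmin);
    index_scan = index_beginscan(heapRel, indexRel, &SnapshotNonVacuumable, 1, 0);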
I wrote: > Dmitriy Sarafannikov <dsarafannikov@yandex.ru> writes: >> [ snapshot_non_vacuumable_v3.patch ] > In short I think we should just set up the threshold as RecentGlobalXmin. Pushed with that adjustment and some fooling with the comments. regards, tom lane