Thread: Slow planning time for simple query
Hello -
We have an odd scenario on one of our OLTP systems, which behaves the same way on a streamer, of a 700-1000ms planning time for a query like this:
SELECT *
FROM table1
WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);
The actual execution time is sub-ms.
We initially thought maybe catalog bloat? But we were able to reindex all of the highly churned catalog tables, and I even did VACUUM FULL on pg_attribute and pg_statistic, to no avail.
There are no custom settings for pg_attribute for the given tables either.
Interestingly, the problem goes away on a SAN snapshot of the target system.
Any ideas of what else we could try? A PL function that caches the query plan works, but that is just a workaround.
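For reference, the workaround looks roughly like this (the function name and the integer parameter type are illustrative, not our real schema):

CREATE OR REPLACE FUNCTION table1_for_customer(p_customer_id integer)
RETURNS SETOF table1
LANGUAGE plpgsql STABLE AS $$
BEGIN
    -- PL/pgSQL keeps the prepared statement for this query in its plan
    -- cache, so after the first few calls in a session the plan is reused
    -- and the repeated planning cost goes away.
    RETURN QUERY
        SELECT t1.*
        FROM table1 t1
        WHERE t1.source_id IN (SELECT t2.id FROM table2 t2
                               WHERE t2.customer_id = p_customer_id);
END;
$$;

Callers then do SELECT * FROM table1_for_customer(42); instead of running the query directly.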
Thanks!
Jeremy
Jeremy Finzel <finzelj@gmail.com> writes:
> We have an odd scenario on one of our OLTP systems, which behaves the same
> way on a streamer, of a 700-1000ms planning time for a query like this:
> SELECT *
> FROM table1
> WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

Hm. Is this the first query executed in a session? If so maybe it's
got something to do with populating caches and other session spin-up
overhead.

Another theory is it's some sort of locking issue. Turning on
log_lock_waits, and setting deadlock_timeout to say 100ms, would help
in investigating that.

			regards, tom lane
On Wed, Jun 6, 2018 at 12:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeremy Finzel <finzelj@gmail.com> writes:
> We have an odd scenario on one of our OLTP systems, which behaves the same
> way on a streamer, of a 700-1000ms planning time for a query like this:
> SELECT *
> FROM table1
> WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);
Hm. Is this the first query executed in a session? If so maybe it's
got something to do with populating caches and other session spin-up
overhead.
Another theory is it's some sort of locking issue. Turning on
log_lock_waits, and setting deadlock_timeout to say 100ms, would help
in investigating that.
regards, tom lane
I have run it over and over with no improvement in the planning time, so I don't think it's related to it being the first query in the session. So far I can only make it faster with a PL function.
We have log_lock_waits on and nothing shows up, and turning down deadlock_timeout doesn't change anything either.
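For reference, this is roughly how we have them set (values per Tom's suggestion):

ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET deadlock_timeout = '100ms';
SELECT pg_reload_conf();

-- confirm the values actually in effect
SHOW log_lock_waits;
SHOW deadlock_timeout;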
Thanks,
Jeremy
On Wed, Jun 6, 2018 at 1:13 PM, Jeremy Finzel <finzelj@gmail.com> wrote:
On Wed, Jun 6, 2018 at 12:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeremy Finzel <finzelj@gmail.com> writes:
> We have an odd scenario on one of our OLTP systems, which behaves the same
> way on a streamer, of a 700-1000ms planning time for a query like this:
> SELECT *
> FROM table1
> WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

Hm. Is this the first query executed in a session? If so maybe it's
got something to do with populating caches and other session spin-up
overhead.

Another theory is it's some sort of locking issue. Turning on
log_lock_waits, and setting deadlock_timeout to say 100ms, would help
in investigating that.

			regards, tom lane

I have run it over and over with no improvement in the planning time, so I don't think it's related to it being the first query in the session. So far I can only make it faster with a PL function.

We have log_lock_waits on and nothing shows up, and turning down deadlock_timeout doesn't change anything either.

Thanks,
Jeremy
Forgot to mention, this is running on this version:
PostgreSQL 9.6.8 on x86_64-pc-linux-gnu (Ubuntu 9.6.8-1.pgdg16.04+1), compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit
Thanks,
Jeremy
2018-06-06 18:59 GMT+02:00 Jeremy Finzel <finzelj@gmail.com>:
Hello -

We have an odd scenario on one of our OLTP systems, which behaves the same way on a streamer, of a 700-1000ms planning time for a query like this:

SELECT *
FROM table1
WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

The actual execution time is sub-ms.

We initially thought maybe catalog bloat? But we were able to reindex all of the highly churned catalog tables, and I even did VACUUM FULL on pg_attribute and pg_statistic, to no avail.

There are no custom settings for pg_attribute for the given tables either.

Interestingly, the problem goes away on a SAN snapshot of the target system.

Any ideas of what else we could try? A PL function that caches the query plan works, but that is just a workaround.
I have seen a similar issue several times, caused by bloated indexes on the table - at planning time PostgreSQL detects the min/max values from the available indexes.
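A quick way to get a rough picture of index bloat is the pgstattuple extension, if it is available (the index names below are only placeholders):

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- low avg_leaf_density and many deleted/empty pages point to a bloated index
SELECT * FROM pgstatindex('table2_customer_id_idx');
SELECT * FROM pgstatindex('table1_source_id_idx');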
Regards
Pavel
Thanks!
Jeremy
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2018-06-06 18:59 GMT+02:00 Jeremy Finzel <finzelj@gmail.com>:
>> We have an odd scenario on one of our OLTP systems, which behaves the same
>> way on a streamer, of a 700-1000ms planning time for a query like this:
>> SELECT *
>> FROM table1
>> WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

> I have seen a similar issue several times, caused by bloated indexes on the
> table - at planning time PostgreSQL detects the min/max values from the
> available indexes.

Oh, hmm, yeah it could be ye olde get_actual_variable_range() issue.
When this happens, are there perhaps a lot of recently-dead rows at either
extreme of the range of table1.source_id or table2.id?

We made a fix last year to improve that:
https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3ca930fc3
but it wasn't back-patched.

			regards, tom lane
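(For anyone wanting to test this theory: a cheap first check is whether dead tuples are accumulating on the tables involved at all - it won't show where in the range they sit, but it is a start. Table names as in the original query.)

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('table1', 'table2');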
On 06/06/2018 09:59 AM, Jeremy Finzel wrote:
> Hello -
>
> We have an odd scenario on one of our OLTP systems, which behaves the
> same way on a streamer, of a 700-1000ms planning time for a query like this:
>
> SELECT *
> FROM table1
> WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);
>
> The actual execution time is sub-ms.

I am trying to sort out the various references to times, so could you:

Show the EXPLAIN ANALYZE?

and

psql> \timing
psql> SELECT * FROM table1 WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);

> We initially thought maybe catalog bloat? But we were able to reindex
> all of the highly churned catalog tables, and I even did VACUUM FULL on
> pg_attribute and pg_statistic, to no avail.
>
> There are no custom settings for pg_attribute for the given tables either.
>
> Interestingly, the problem goes away on a SAN snapshot of the target system.
>
> Any ideas of what else we could try? A PL function that caches the
> query plan works, but that is just a workaround.
>
> Thanks!
> Jeremy

-- 
Adrian Klaver
adrian.klaver@aklaver.com
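(Something along these lines, with a real customer_id substituted for the parameter, reports "Planning time" and "Execution time" on separate lines, which is the cleanest way to show where the 700-1000 ms goes:)

\timing
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM table1
WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = 12345);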
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Pavel Stehule <pavel.stehule@gmail.com> writes:
>
>> 2018-06-06 18:59 GMT+02:00 Jeremy Finzel <finzelj@gmail.com>:
>>> We have an odd scenario on one of our OLTP systems, which behaves the same
>>> way on a streamer, of a 700-1000ms planning time for a query like this:
>>> SELECT *
>>> FROM table1
>>> WHERE source_id IN (SELECT id FROM table2 WHERE customer_id = $1);
>
>> I have seen a similar issue several times, caused by bloated indexes on the
>> table - at planning time PostgreSQL detects the min/max values from the
>> available indexes.
>
> Oh, hmm, yeah it could be ye olde get_actual_variable_range() issue.
> When this happens, are there perhaps a lot of recently-dead rows at either
> extreme of the range of table1.source_id or table2.id?

We noticed the cluster of interest had a rogue physical rep slot holding
71k WAL segments.

Dropping same slot seemed to correlate with the problem going away.

Does that sound like a plausible explanation for the observed slow
planning times?

Thx

> We made a fix last year to improve that:
> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3ca930fc3
> but it wasn't back-patched.
>
> regards, tom lane

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800
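(For anyone checking for this kind of thing, a query along these lines - 9.6 function names - shows any slots, the xmin they pin, and roughly how much WAL they retain; run it on the primary, since pg_current_xlog_location() errors out during recovery:)

SELECT slot_name, slot_type, active, xmin, catalog_xmin,
       pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(),
                                            restart_lsn)) AS retained_wal
FROM pg_replication_slots;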
Jerry Sievers <gsievers19@comcast.net> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Oh, hmm, yeah it could be ye olde get_actual_variable_range() issue.
>> When this happens, are there perhaps a lot of recently-dead rows at either
>> extreme of the range of table1.source_id or table2.id?

> We noticed the cluster of interest had a rogue physical rep slot holding
> 71k WAL segments.

> Dropping same slot seemed to correlate with the problem going away.

> Does that sound like a plausible explanation for the observed slow
> planning times?

I believe the slot would hold back global xmin and thereby prevent
"recently-dead" rows from becoming just plain "dead", so yeah, this
observation does seem to square with the get_actual_variable_range
theory. You'd still need to posit that something had recently deleted
a lot of rows at the end of the range of one of those columns, though.

			regards, tom lane
On a hot standby I faced a similar problem.

The following simple query:

SELECT array_to_json(array_agg(t)) from (
    select *
    from main.message m
    join main.message_instance mi on m.message_id = mi.message_id
    join main.channel_type ct on mi.channel_type_id = ct.channel_type_id
    where m.user_id = 2152595
      and ct.name = 'android'
      and m.user_delete_ts is null
    order by case when read_ts is null then 0 else 1 end, m.message_id desc
    limit 100
) t;

is planned 4.940 ms on master and *254.741* ms on standby.

I tried to investigate the reasons for such long planning on the replica and found that the index accesses during planning lead to multiple heap scans.

Here are the execution plan statistics of the query above:

 Aggregate (actual time=0.641..0.642 rows=1 loops=1)
   Buffers: shared hit=14
   ->  Subquery Scan on t (actual time=0.337..0.360 rows=2 loops=1)
         Buffers: shared hit=14
         ->  Limit (actual time=0.292..0.301 rows=2 loops=1)
               Buffers: shared hit=14
               ->  Sort (actual time=0.287..0.291 rows=2 loops=1)
                     Sort Key: (CASE WHEN (m.read_ts IS NULL) THEN 0 ELSE 1 END), m.message_id DESC
                     Sort Method: quicksort  Memory: 27kB
                     Buffers: shared hit=14
                     ->  Nested Loop (actual time=0.157..0.219 rows=2 loops=1)
                           Buffers: shared hit=14
                           ->  Seq Scan on channel_type ct (actual time=0.043..0.048 rows=1 loops=1)
                                 Filter: (name = 'android'::text)
                                 Rows Removed by Filter: 7
                                 Buffers: shared hit=1
                           ->  Nested Loop (actual time=0.098..0.148 rows=2 loops=1)
                                 Buffers: shared hit=13
                                 ->  Index Scan using message_user_id_idx1 on message m (actual time=0.055..0.063 rows=2 loops=1)
                                       Index Cond: (user_id = 2152595)
                                       Buffers: shared hit=5
                                 ->  Index Scan using message_instance_pkey on message_instance mi (actual time=0.026..0.029 rows=1 loops=2)
                                       Index Cond: ((message_id = m.message_id) AND (channel_type_id = ct.channel_type_id))
                                       Buffers: shared hit=8

So query execution touches only 14 buffers. But the pg_statio_user_tables and pg_statio_user_indexes views show a different picture:

select schemaname, relname, indexrelname, idx_blks_hit, idx_blks_read,
       pg_size_pretty(pg_relation_size(indexrelid::regclass))
from pg_statio_user_indexes
where idx_blks_hit + idx_blks_read > 0;

 schemaname |     relname      |     indexrelname      | idx_blks_hit | idx_blks_read | pg_size_pretty
------------+------------------+-----------------------+--------------+---------------+----------------
 main       | channel_type     | channel_type_pkey     |            2 |             0 | 16 kB
 main       | message_instance | message_instance_pkey |          666 |             0 | 345 MB
 main       | message          | message_pkey          |           56 |             0 | 53 MB
 main       | message          | message_user_id_idx1  |            3 |             0 | 17 MB

select schemaname, relname, heap_blks_read, heap_blks_hit
from pg_statio_user_tables
where heap_blks_read + heap_blks_hit + idx_blks_hit + idx_blks_read > 0;

 schemaname |     relname      | heap_blks_read | heap_blks_hit
------------+------------------+----------------+---------------
 main       | channel_type     |              0 |             3
 main       | message          |              0 |          8682
 main       | message_instance |              0 |        114922

(This experiment was carried out on a test hot standby without parallel activity and with cleared system statistics.)

Vacuuming the problem tables (main.message and main.message_instance) on the master node resolves the problem somehow, but running it frequently slows down all queries and generally increases IO.

Is there any way to overcome the problem, or is it a fundamental issue and the query has to be rewritten to simplify planning?

-- 
Regards,
Maksim Milyutin
On 09.06.2018 21:49, Maksim Milyutin wrote:
> On a hot standby I faced a similar problem.

Sorry, the problem in question is described here:
https://www.postgresql.org/message-id/22136.1528312205%40sss.pgh.pa.us

-- 
Regards,
Maksim Milyutin
Maksim Milyutin <milyutinma@gmail.com> writes:
> On a hot standby I faced a similar problem.
> ...
> is planned 4.940 ms on master and *254.741* ms on standby.

Presumably the problem is that the standby isn't authorized to change
the btree index's "entry is dead" bits, so it's unable to prune index
entries previously detected as dead, and thus the logic that intends
to improve this situation doesn't work on the standby. (I wonder
though why, if you executed the same query on the master, its setting
of the index-entry-is-dead bits didn't propagate to the standby.)

I wonder if we should extend the "SnapshotNonVacuumable" logic
introduced in commit 3ca930fc3 so that in hot standby, *all* index
entries are deemed non vacuumable. This would essentially get rid of
long standby planning times in this sort of scenario by instead
accepting worse (possibly much worse) planner range estimates. I'm
unsure if that's a good tradeoff or not.

			regards, tom lane
On 09.06.2018 22:49, Tom Lane wrote:
Maksim Milyutin <milyutinma@gmail.com> writes:
> On a hot standby I faced a similar problem.
> ...
> is planned 4.940 ms on master and *254.741* ms on standby.

(I wonder though why, if you executed the same query on the master,
its setting of the index-entry-is-dead bits didn't propagate to the
standby.)
I have checked the number of dead item pointers (through the pageinspect extension) in the first leaf page of the index participating in the query ('main.message_instance_pkey') on the master and standby nodes and noticed a big difference.
SELECT * FROM monitoring.bt_page_stats('main.message_instance_pkey', 3705);
On master:
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
3705 | l | 1 | 58 | 24 | 8192 | 6496 | 0 | 3719 | 0 | 65
On standby:
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
3705 | l | 59 | 0 | 24 | 8192 | 6496 | 0 | 3719 | 0 | 1
The vacuum routine improves the situation.
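(Concretely, what helps is a plain vacuum of the underlying table on the master - the index entries it removes are WAL-logged, so the cleanup reaches the standby as well. Something like:)

VACUUM VERBOSE main.message_instance;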
Can it be that I have misconfigured WAL logging or replication somehow?
> I wonder if we should extend the "SnapshotNonVacuumable" logic
> introduced in commit 3ca930fc3 so that in hot standby, *all* index
> entries are deemed non vacuumable. This would essentially get rid of
> long standby planning times in this sort of scenario by instead
> accepting worse (possibly much worse) planner range estimates. I'm
> unsure if that's a good tradeoff or not.
I applied the patch from this commit to a test standby (not the master; I don't know whether that is correct) and haven't noticed any difference.
-- 
Regards,
Maksim Milyutin
13.06.2018 12:40, Maksim Milyutin wrote:
On 09.06.2018 22:49, Tom Lane wrote:
Maksim Milyutin <milyutinma@gmail.com> writes:
> On a hot standby I faced a similar problem.
> ...
> is planned 4.940 ms on master and *254.741* ms on standby.

(I wonder though why, if you executed the same query on the master,
its setting of the index-entry-is-dead bits didn't propagate to the
standby.)
I have checked the number of dead item pointers (through the pageinspect extension) in the first leaf page of the index participating in the query ('main.message_instance_pkey') on the master and standby nodes and noticed a big difference.
SELECT * FROM monitoring.bt_page_stats('main.message_instance_pkey', 3705);
On master:
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
3705 | l | 1 | 58 | 24 | 8192 | 6496 | 0 | 3719 | 0 | 65
On standby:
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
3705 | l | 59 | 0 | 24 | 8192 | 6496 | 0 | 3719 | 0 | 1
At this point I want to highlight the issue: the changes to the lp_flags bits (namely, marking items as dead) for index item pointers don't propagate from master to replica in my case. As a consequence, on the standby I have live index items most of which are marked dead on the master. And at planning stage my queries are forced to descend to heap pages under get_actual_variable_range(), which considerably slows down planning.

Is this a bug, a limitation of the implementation, or a misconfiguration of WAL/replication on my side?
-- 
Regards,
Maksim Milyutin
On Sun, Jun 10, 2018 at 1:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maksim Milyutin <milyutinma@gmail.com> writes:
>> On a hot standby I faced a similar problem.
>> ...
>> is planned 4.940 ms on master and *254.741* ms on standby.
>
> Presumably the problem is that the standby isn't authorized to change
> the btree index's "entry is dead" bits,

I don't see anything like that in the code. We use _bt_killitems to
mark the items as dead, and neither that function nor any of its
callers has any such assumption.

> so it's unable to prune index
> entries previously detected as dead, and thus the logic that intends
> to improve this situation doesn't work on the standby.

If my above understanding is correct, then one thing that could lead
to such behavior is the computation of RecentGlobalXmin on the standby.
Basically, if RecentGlobalXmin has a different value on the standby,
then it is possible that the decision whether a particular item is
dead differs between master and standby.

> (I wonder though why, if you executed the same query on the master,
> its setting of the index-entry-is-dead bits didn't propagate to the
> standby.)

Because we don't WAL-log it. See _bt_killitems.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
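(One rough way to compare the horizons between the nodes is to look, on each of them, at the oldest backend_xmin among sessions and at any replication slot xmins; hot_standby_feedback also feeds into what the standby reports back. A sketch:)

-- on either node: open snapshots that hold back the local horizon
SELECT pid, backend_xmin, xact_start
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xact_start;

-- on the primary: xmin pinned by slots and by standbys sending feedback
SELECT slot_name, xmin, catalog_xmin FROM pg_replication_slots;
SELECT application_name, backend_xmin FROM pg_stat_replication;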
On Thu, Jun 14, 2018 at 4:34 AM, Maksim Milyutin <milyutinma@gmail.com> wrote:
> 13.06.2018 12:40, Maksim Milyutin wrote:
>
>> On 09.06.2018 22:49, Tom Lane wrote:
>>
>>> Maksim Milyutin <milyutinma@gmail.com> writes:
>>>> On a hot standby I faced a similar problem.
>>>> ...
>>>> is planned 4.940 ms on master and *254.741* ms on standby.
>>>
>>> (I wonder though why, if you executed the same query on the master,
>>> its setting of the index-entry-is-dead bits didn't propagate to the
>>> standby.)
>>
>> I have checked the number of dead item pointers (through the pageinspect
>> extension) in the first leaf page of the index participating in the query
>> ('main.message_instance_pkey') on the master and standby nodes and noticed
>> a big difference.
>>
>> SELECT * FROM monitoring.bt_page_stats('main.message_instance_pkey', 3705);
>>
>> On master:
>>
>> blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
>> -------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
>>  3705 | l    |          1 |         58 |            24 |      8192 |      6496 |         0 |      3719 |    0 |         65
>>
>> On standby:
>>
>> blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
>> -------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
>>  3705 | l    |         59 |          0 |            24 |      8192 |      6496 |         0 |      3719 |    0 |          1
>
> At this point I want to highlight the issue: the changes to the lp_flags
> bits (namely, marking items as dead) for index item pointers don't
> propagate from master to replica in my case. As a consequence, on the
> standby I have live index items most of which are marked dead on the
> master. And at planning stage my queries are forced to descend to heap
> pages under get_actual_variable_range(), which considerably slows down
> planning.
>
> Is this a bug, a limitation of the implementation, or a misconfiguration
> of WAL/replication on my side?

It is not a misconfiguration issue.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
>>>>> "Amit" == Amit Kapila <amit.kapila16@gmail.com> writes: >> Presumably the problem is that the standby isn't authorized to change >> the btree index's "entry is dead" bits, Amit> I don't see anything like that in the code. We use _bt_killitems Amit> to mark the items as dead and neither that function or any of its Amit> caller has any such assumption. See index_fetch_heap: /* * If we scanned a whole HOT chain and found only dead tuples, tell index * AM to kill its entry for that TID (this will take effect in the next * amgettuple call, in index_getnext_tid). We do not do this when in * recovery because it may violate MVCC to do so. See comments in * RelationGetIndexScan(). */ if (!scan->xactStartedInRecovery) scan->kill_prior_tuple = all_dead; (this is the only place where kill_prior_tuple can be set to true) -- Andrew (irc:RhodiumToad)
>>>>> "Amit" == Amit Kapila <amit.kapila16@gmail.com> writes: >> Presumably the problem is that the standby isn't authorized to change >> the btree index's "entry is dead" bits, Amit> I don't see anything like that in the code. We use _bt_killitems Amit> to mark the items as dead and neither that function or any of its Amit> caller has any such assumption. See index_fetch_heap: /* * If we scanned a whole HOT chain and found only dead tuples, tell index * AM to kill its entry for that TID (this will take effect in the next * amgettuple call, in index_getnext_tid). We do not do this when in * recovery because it may violate MVCC to do so. See comments in * RelationGetIndexScan(). */ if (!scan->xactStartedInRecovery) scan->kill_prior_tuple = all_dead; (this is the only place where kill_prior_tuple can be set to true) -- Andrew (irc:RhodiumToad)