Thread: Slow planning, fast execution for particular 3-table query
We’re having trouble working out why the planning time for this particular query is slow (~2.5s vs 0.9ms execution time). As you can see below, there are only 3 tables involved so it’s hard to imagine what decisions the planner has to make that take so long. After 5 runs the prepared-statement code kicks in and it becomes quick, but it’s quite infuriating for the first 5 runs given the execution is so much faster.
Are you able to give any tips what might be taking so long (and how we might improve it)?
We read elsewhere that someone had a “catalog stats file leak”, which I’m taking to mean a big pg_statistic table. Ours is 10mb, which doesn’t seem particularly large to me, but I don’t have much context for it. https://www.postgresql.org/message-id/CABWW-d21z_WgawkjXFQQviqm16oAx0KQvR6bLkRxvYQmhdByfg%40mail.gmail.com
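(The 10mb figure comes from a quick size check along these lines, if anyone wants to compare:)

```sql
-- Rough check of the statistics catalog's on-disk size (~10 MB in our case)
SELECT pg_size_pretty(pg_relation_size('pg_statistic'));
```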
Other queries (with 3 or more tables) in the same db seem to be planning much quicker.
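(By "the prepared-statement code kicks in" I mean the standard plan-cache behaviour; a rough sketch of what our data layer does, with illustrative statement and parameter names:)

```sql
-- Sketch only: 'sub_by_binding' is a name made up for this example
PREPARE sub_by_binding (bigint, uuid) AS
SELECT subscription_binding.subscription_binding, subscription_binding.tid
FROM jackpot.binding
JOIN jackpot.subscription_binding USING (tid, binding)
WHERE tid = $1 AND binding_uuid = $2;

-- The first five executions are each planned individually (paying the ~2.5s);
-- after that the server may switch to a cached generic plan and skip planning.
EXECUTE sub_by_binding(2082003407, '4f61dcd5-97a0-4098-b9ae-c1546c31b2e6');
```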
The query:
explain (analyse)
SELECT subscription_binding.subscription_binding,
       subscription_binding.tid,
       subscription.subscription_uuid,
       subscription_binding.subscription_binding_uuid,
       binding.binding_uuid,
       subscription_binding.start_time,
       subscription_binding.end_time,
       subscription_binding.timezone,
       now() >= subscription_binding.start_time
           AND (subscription_binding.end_time IS NULL
                OR now() <= subscription_binding.end_time) AS active
FROM jackpot.binding
JOIN jackpot.subscription_binding USING (tid, binding)
JOIN jackpot.subscription USING (tid, subscription)
WHERE (tid = 2082003407)
  AND (binding_uuid = '4f61dcd5-97a0-4098-b9ae-c1546c31b2e6'::uuid)
OFFSET 0 LIMIT 1000;
                                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.29..25.38 rows=1 width=80) (actual time=0.770..0.771 rows=1 loops=1)
   ->  Nested Loop  (cost=1.29..25.38 rows=1 width=80) (actual time=0.770..0.771 rows=1 loops=1)
         ->  Nested Loop  (cost=0.86..16.91 rows=1 width=76) (actual time=0.697..0.698 rows=1 loops=1)
               ->  Index Scan using binding_tid_binding_uuid_key on binding  (cost=0.43..8.45 rows=1 width=28) (actual time=0.647..0.647 rows=1 loops=1)
                     Index Cond: ((tid = 2082003407) AND (binding_uuid = '4f61dcd5-97a0-4098-b9ae-c1546c31b2e6'::uuid))
               ->  Index Scan using subscription_binding_idx on subscription_binding  (cost=0.43..8.45 rows=1 width=64) (actual time=0.045..0.046 rows=1 loops=1)
                     Index Cond: ((tid = 2082003407) AND (binding = binding.binding))
         ->  Index Scan using subscription_pkey on subscription  (cost=0.43..8.45 rows=1 width=28) (actual time=0.068..0.068 rows=1 loops=1)
               Index Cond: ((tid = 2082003407) AND (subscription = subscription_binding.subscription))
 Planning time: 2429.682 ms
 Execution time: 0.914 ms
(11 rows)
Postgres version 9.5.19
Each of the tables has between 3-4 indexes, and all the indexes include tid as first parameter. No partitions, no sign of a stray replication slot / uncommitted transaction / prepared transaction that may be holding up autovac, no sign of bloated indexes.
TIA!
Best regards,
David Wheeler
General Manager Bali Office
Bali T +62 361 475 2333 M +62 819 3660 9180
Jl. Pura Mertasari No. 7, Sunset Road Abian Base
Kuta, Badung – Bali 80361, Indonesia
On Mon, 2019-11-04 at 03:04 +0000, David Wheeler wrote:
> We’re having trouble working out why the planning time for this particular query is slow
> (~2.5s vs 0.9ms execution time). As you can see below, there are only 3 tables involved
> so it’s hard to imagine what decisions the planner has to make that take so long. After
> 5 runs the prepared-statement code kicks in and it becomes quick, but it’s quite
> infuriating for the first 5 runs given the execution is so much faster.
>
> Are you able to give any tips what might be taking so long (and how we might improve it)?
> [...]
> Planning time: 2429.682 ms
>
> Execution time: 0.914 ms

Strange.

Are any of your catalog tables unusually large?

SELECT pg_relation_size(t.oid), t.relname
FROM pg_class AS t
   JOIN pg_namespace AS n ON t.relnamespace = n.oid
WHERE t.relkind = 'r'
ORDER BY pg_relation_size(t.oid) DESC
LIMIT 10;

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
I'm not sure what "unusually large" is, but they're all < 1mb which is a little larger than some of our other comparable databases (mostly <300kb) but seems reasonable to me.

Regards,
David

On 4/11/19, 3:37 pm, "Laurenz Albe" <laurenz.albe@cybertec.at> wrote:

> Strange.
>
> Are any of your catalog tables unusually large?
> [...]
David Wheeler wrote:
> I'm not sure what "unusually large" is, but they're all < 1mb which is a little larger
> than some of our other comparable databases (mostly <300kb) but seems reasonable to me.

I forgot the condition "AND n.nspname = 'pg_catalog'"...

But if all your tables are small, catalog bloat is probably not your problem.

Do you have many tables in the database?
Partitioning?

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
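For the record, the query with the forgotten condition restored:

```sql
SELECT pg_relation_size(t.oid), t.relname
FROM pg_class AS t
   JOIN pg_namespace AS n ON t.relnamespace = n.oid
WHERE t.relkind = 'r'
  AND n.nspname = 'pg_catalog'   -- the condition missing from the earlier version
ORDER BY pg_relation_size(t.oid) DESC
LIMIT 10;
```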
David Wheeler <dwheeler@dgitsystems.com> writes:
> We’re having trouble working out why the planning time for this
> particular query is slow (~2.5s vs 0.9ms execution time). As you can see
> below, there are only 3 tables involved so it’s hard to imagine what
> decisions the planner has to make that take so long.

I wonder whether this traces to the cost of trying to estimate the
largest/smallest value of an indexed column by looking into the index.
Normally that's pretty cheap, but if you have a lot of recently-inserted
or recently-deleted values at the end of the index, it can get painful.
AFAIR this only happens for columns that are equijoin keys, so the fact
that your query is a join is significant.

I'm not convinced that this is the problem, because it's a corner case
that few people hit.  To see this issue, you have to have recently
inserted or deleted a bunch of extremal values of the indexed join-key
column.  And the problem only persists until those values become known
committed-good, or known dead-to-everybody.  (Maybe you've got a
long-running transaction somewhere, postponing the dead-to-everybody
condition?)

> Postgres version 9.5.19

If this *is* the cause, v11 and up have a performance improvement that
you need:

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3ca930fc3

			regards, tom lane
> To see this issue, you have to have recently
> inserted or deleted a bunch of extremal values of the indexed join-key
> column. And the problem only persists until those values become known
> committed-good, or known dead-to-everybody. (Maybe you've got a
> long-running transaction somewhere, postponing the dead-to-everybody
> condition?)
There are no long-running transactions that have backend_xmin set in pg_stat_activity, if that's what you mean here. There are also no open prepared transactions or replication slots which I understand have a similar keeping-things-alive issue.
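(For completeness, those checks were roughly the following; a sketch, not the exact statements we ran:)

```sql
-- Transactions that could be holding back the xmin horizon
SELECT pid, backend_xmin, state, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Open prepared transactions
SELECT * FROM pg_prepared_xacts;

-- Replication slots
SELECT slot_name, active, xmin FROM pg_replication_slots;
```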
These tables are biggish (hundreds of mb), but not changing so frequently that I'd expect large quantities of data to be inserted or deleted before autovac can get in there and clean it up. And certainly not in a single uncommitted transaction.
I'll try reindexing each of the tables just to make sure it's not strange index imbalance or something causing the issue.
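(The reindex would be along these lines, using the schema from the query above; REINDEX takes heavy locks, so we'd run it off-peak:)

```sql
REINDEX TABLE jackpot.binding;
REINDEX TABLE jackpot.subscription_binding;
REINDEX TABLE jackpot.subscription;
```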
Regards,
David
On 4/11/19, 4:01 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
David Wheeler <dwheeler@dgitsystems.com> writes:
> We’re having trouble working out why the planning time for this
> particular query is slow (~2.5s vs 0.9ms execution time). As you can see
> below, there are only 3 tables involved so it’s hard to imagine what
> decisions the planner has to make that take so long.
I wonder whether this traces to the cost of trying to estimate the
largest/smallest value of an indexed column by looking into the index.
Normally that's pretty cheap, but if you have a lot of recently-inserted
or recently-deleted values at the end of the index, it can get painful.
AFAIR this only happens for columns that are equijoin keys, so the fact
that your query is a join is significant.
I'm not convinced that this is the problem, because it's a corner case
that few people hit. To see this issue, you have to have recently
inserted or deleted a bunch of extremal values of the indexed join-key
column. And the problem only persists until those values become known
committed-good, or known dead-to-everybody. (Maybe you've got a
long-running transaction somewhere, postponing the dead-to-everybody
condition?)
> Postgres version 9.5.19
If this *is* the cause, v11 and up have a performance improvement that
you need:
https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3ca930fc3
regards, tom lane
>> I'll try reindexing each of the tables just to make sure it's not strange index imbalance or something causing the issue.
> I've seen this issue a few times - and reindex helps.
Alas our reindex doesn’t seem to have helped. I’m going to see if we can reproduce this on a non-prod environment so we can muck about a bit more. If we can reproduce it in a safe place, is there a tool we can use to get more info out of the query planner to find what it’s doing to take so long?
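(One thing we might try in the meantime is the planner's own resource-usage logging; just a guess at a useful starting point:)

```sql
-- Logs planner CPU time and memory usage to the server log
-- for statements planned in this session
SET log_planner_stats = on;
```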
Regards,
David
From: Pavel Stehule <pavel.stehule@gmail.com>
Date: Monday, 4 November 2019 at 4:53 pm
To: David Wheeler <dwheeler@dgitsystems.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, "pgsql-performance@lists.postgresql.org" <pgsql-performance@lists.postgresql.org>, Cameron Redpath <credpath@dgitsystems.com>
Subject: Re: Slow planning, fast execution for particular 3-table query
On Mon, 4 Nov 2019 at 6:17, David Wheeler <dwheeler@dgitsystems.com> wrote:
> > To see this issue, you have to have recently
> > inserted or deleted a bunch of extremal values of the indexed join-key
> > column. And the problem only persists until those values become known
> > committed-good, or known dead-to-everybody. (Maybe you've got a
> > long-running transaction somewhere, postponing the dead-to-everybody
> > condition?)
>
> There are no long-running transactions that have backend_xmin set in pg_stat_activity, if that's what you mean here. There are also no open prepared transactions or replication slots which I understand have a similar keeping-things-alive issue.
>
> These tables are biggish (hundreds of mb), but not changing so frequently that I'd expect large quantities of data to be inserted or deleted before autovac can get in there and clean it up. And certainly not in a single uncommitted transaction.
>
> I'll try reindexing each of the tables just to make sure it's not strange index imbalance or something causing the issue.
I've seen this issue a few times - and reindex helps.
Pavel
On Mon, Nov 04, 2019 at 03:04:45AM +0000, David Wheeler wrote:
> Postgres version 9.5.19
> Each of the tables has between 3-4 indexes, and all the indexes include tid as first parameter.

On Mon, Nov 04, 2019 at 12:00:59AM -0500, Tom Lane wrote:
> If this *is* the cause, v11 and up have a performance improvement that
> you need:

But note that index definition will be prohibited since:

https://www.postgresql.org/docs/9.6/release-9-6.html
|Disallow creation of indexes on system columns, except for OID columns (David Rowley)
|Such indexes were never considered supported, and would very possibly misbehave since the system might change the system-column fields of a tuple without updating indexes. However, previously there were no error checks to prevent them from being created.

Justin
Is default_statistics_target set above default 100? I would assume that would reflect in the size of pg_statistic, but wanted to ask since increasing that from 100 to 1000 was the only time I have seen planning time explode. Are other queries slow to plan?
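A quick way to check, including any per-column overrides (attstattarget is -1 when a column just uses the default):

```sql
SHOW default_statistics_target;

-- Columns given an explicit target via ALTER TABLE ... ALTER COLUMN ... SET STATISTICS
SELECT attrelid::regclass, attname, attstattarget
FROM pg_attribute
WHERE attstattarget > 0;
```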
Looks like you’ve found it! Someone has set the target to 10k so that’s going to wildly increase planning time.
Thanks for your help! And thanks to the others who chipped in along the way 😊
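(For the archives: winding it back looked roughly like this. The database name is a placeholder, and the setting could equally live in postgresql.conf or on individual columns:)

```sql
-- 'ourdb' is a placeholder; new sessions pick up the lower target
ALTER DATABASE ourdb SET default_statistics_target = 100;

-- Then, in a fresh session, rebuild the (much smaller) statistics:
ANALYZE;
```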
Regards,
David
On Thu, 7 Nov 2019 at 11:59, Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Mon, Nov 04, 2019 at 03:04:45AM +0000, David Wheeler wrote:
> > Postgres version 9.5.19
> > Each of the tables has between 3-4 indexes, and all the indexes include tid as first parameter.
>
> But note that index definition will be prohibited since:
>
> https://www.postgresql.org/docs/9.6/release-9-6.html
> |Disallow creation of indexes on system columns, except for OID columns (David Rowley)
> |Such indexes were never considered supported, and would very possibly misbehave since the system might change the system-column fields of a tuple without updating indexes. However, previously there were no error checks to prevent them from being created.

David will have meant the user column named "tid" rather than the
system column named "ctid".

--
David Rowley                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 07, 2019 at 01:15:30PM +1300, David Rowley wrote:
> On Thu, 7 Nov 2019 at 11:59, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > But note that index definition will be prohibited since:
> > [...]
>
> David will have meant the user column named "tid" rather than the
> system column named "ctid".

Ah.  And David must have meant David W :)

Justin