Thread: Advice needed: query performance deteriorates by 2000% within 1 minute
I am seeking some guidance on a case our developers have thrown at me, and I apologize in advance for this lengthy mail ...
$> postgres --version
postgres (PostgreSQL) 13.6
We are dealing with the following issue:
select version, content from orderstore.order
WHERE jsonb_to_tsvector('english', content, '["all"]') @@ websearch_to_tsquery('english', '1.20709841') limit 10;
The devs told me that this query normally finishes in a reasonable time (< 1 sec), but every day, all of a sudden, execution times deteriorate to > 20 sec.
"When we change the query to 'limit 100' it runs fast again"
"When we execute a 'vacuum orderstore.order' everything becomes good again - but that only lasts for a few hours"
So I scheduled a little script to run every minute; it contains three EXPLAINs:
one query with LIMIT 10
one query with LIMIT 100
one query with the LIMIT clause omitted
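A minimal version of such a script (schematic: the cron wrapper and connection handling are left out, and I am reconstructing the exact statements from the EXPLAIN output) could be a .sql file run via psql every minute:

```sql
-- log timestamp and current row count (the "LOG:" line)
SELECT now(), count(*) FROM orderstore."order";

EXPLAIN ANALYZE
SELECT version, content FROM orderstore."order"
WHERE jsonb_to_tsvector('english', content, '["all"]') @@ websearch_to_tsquery('english', '1.20709841')
LIMIT 10;

EXPLAIN ANALYZE
SELECT version, content FROM orderstore."order"
WHERE jsonb_to_tsvector('english', content, '["all"]') @@ websearch_to_tsquery('english', '1.20709841')
LIMIT 100;

EXPLAIN ANALYZE
SELECT version, content FROM orderstore."order"
WHERE jsonb_to_tsvector('english', content, '["all"]') @@ websearch_to_tsquery('english', '1.20709841');
```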
And here's a quick grep of the result after a few hours:
...
Execution Time: 1.413 ms <= limit 10
Execution Time: 0.389 ms <= limit 100
Execution Time: 0.297 ms <= limit clause omitted
Execution Time: 1.456 ms
Execution Time: 0.396 ms
Execution Time: 0.302 ms
Execution Time: 1.412 ms
Execution Time: 0.428 ms
Execution Time: 0.255 ms
Execution Time: 1.404 ms
Execution Time: 0.403 ms
Execution Time: 0.258 ms
Execution Time: 25588.448 ms <= limit 10
Execution Time: 0.919 ms <= limit 100
Execution Time: 0.453 ms <= limit clause omitted
Execution Time: 25657.524 ms
Execution Time: 0.965 ms
Execution Time: 0.452 ms
Execution Time: 25843.139 ms
Execution Time: 0.959 ms
Execution Time: 0.446 ms
Execution Time: 25631.389 ms
Execution Time: 0.946 ms
Execution Time: 0.447 ms
Execution Time: 25452.764 ms
Execution Time: 0.937 ms
Execution Time: 0.444 ms
<= here I manually vacuumed the table
Execution Time: 0.071 ms
Execution Time: 0.021 ms
Execution Time: 0.015 ms
Execution Time: 0.023 ms
Execution Time: 0.017 ms
Execution Time: 0.064 ms
Execution Time: 0.021 ms
Execution Time: 0.015 ms
Execution Time: 0.063 ms
Execution Time: 0.020 ms
Execution Time: 0.015 ms
...
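Outliers like these jump out at a glance, but for an automated alert a small parsing sketch helps (the helper names and the 100x threshold are my own choices, not part of the original script):

```python
import re

def parse_exec_times(log_text):
    """Extract 'Execution Time: X ms' values from EXPLAIN ANALYZE output."""
    return [float(m) for m in re.findall(r"Execution Time: ([\d.]+) ms", log_text)]

def flag_regressions(times, factor=100.0):
    """Return indices of runs at least `factor` times slower than the first run."""
    if not times:
        return []
    baseline = times[0]
    return [i for i, t in enumerate(times) if t > baseline * factor]

log = """
Execution Time: 1.413 ms
Execution Time: 0.389 ms
Execution Time: 25588.448 ms
"""
times = parse_exec_times(log)
print(flag_regressions(times))  # -> [2]
```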
It turned out the devs were right about their complaints.
The execution plan changed within one minute from using an index to a sequential scan.
Here are the details (in the "LOG:" line I select the current timestamp and the row count of the table):
****************************
*** the last 'good' run: ***
****************************
LOG: | 2022-02-24 13:47:01.747416+01 | 9653
LIMIT 10:
Limit (cost=752.37..789.30 rows=10 width=22) (actual time=1.388..1.390 rows=1 loops=1)
-> Bitmap Heap Scan on "order" (cost=752.37..929.63 rows=48 width=22) (actual time=1.387..1.388 rows=1 loops=1)
Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_fulltext_content (cost=0.00..752.36 rows=48 width=0) (actual time=1.374..1.374 rows=1 loops=1)
Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.401 ms
Execution Time: 1.404 ms
LIMIT 100:
Limit (cost=752.37..929.63 rows=48 width=22) (actual time=0.391..0.391 rows=1 loops=1)
-> Bitmap Heap Scan on "order" (cost=752.37..929.63 rows=48 width=22) (actual time=0.390..0.391 rows=1 loops=1)
Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_fulltext_content (cost=0.00..752.36 rows=48 width=0) (actual time=0.387..0.387 rows=1 loops=1)
Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.136 ms
Execution Time: 0.403 ms
NO LIMIT:
Bitmap Heap Scan on "order" (cost=752.37..929.63 rows=48 width=22) (actual time=0.248..0.249 rows=1 loops=1)
Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_fulltext_content (cost=0.00..752.36 rows=48 width=0) (actual time=0.245..0.245 rows=1 loops=1)
Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.107 ms
Execution Time: 0.258 ms
*********************************************
*** the first 'bad' run (one minute later): ***
*********************************************
LOG: | 2022-02-24 13:48:01.840362+01 | 9653
LIMIT 10:
Limit (cost=0.00..804.97 rows=10 width=22) (actual time=23970.845..25588.432 rows=1 loops=1)
-> Seq Scan on "order" (cost=0.00..3863.86 rows=48 width=22) (actual time=23970.843..25588.429 rows=1 loops=1)
Filter: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Rows Removed by Filter: 9652
Planning Time: 0.430 ms
Execution Time: 25588.448 ms
LIMIT 100:
Limit (cost=788.37..965.63 rows=48 width=22) (actual time=0.900..0.902 rows=1 loops=1)
-> Bitmap Heap Scan on "order" (cost=788.37..965.63 rows=48 width=22) (actual time=0.900..0.901 rows=1 loops=1)
Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_fulltext_content (cost=0.00..788.36 rows=48 width=0) (actual time=0.894..0.895 rows=1 loops=1)
Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.187 ms
Execution Time: 0.919 ms
NO LIMIT:
Bitmap Heap Scan on "order" (cost=788.37..965.63 rows=48 width=22) (actual time=0.442..0.442 rows=1 loops=1)
Recheck Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_fulltext_content (cost=0.00..788.36 rows=48 width=0) (actual time=0.438..0.438 rows=1 loops=1)
Index Cond: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Planning Time: 0.151 ms
Execution Time: 0.453 ms
The table in question isn't that big:
oid | table_schema | table_name | row_estimate | total | index | toast | table
--------+--------------+------------+--------------+--------+--------+-------+-------
155544 | orderstore | order | 9649 | 210 MB | 108 MB | 91 MB | 10 MB
Table DDL:
CREATE TABLE orderstore."order" (
pk_id bigint DEFAULT nextval('orderstore.order_pk_id_seq'::regclass) NOT NULL,
version integer NOT NULL,
content jsonb NOT NULL,
manipulation_history jsonb NOT NULL,
CONSTRAINT chk_external_id_not_null CHECK (((content ->> 'externalId'::text) IS NOT NULL)),
CONSTRAINT chk_id_not_null CHECK (((content ->> 'id'::text) IS NOT NULL))
);
DDL of the index used (one amongst many others that exist):
--
-- Name: idx_fulltext_content; Type: INDEX; Schema: orderstore; Owner: orderstore
--
CREATE INDEX idx_fulltext_content ON orderstore."order" USING gin (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb));
The record in pg_stat_all_tables before the manual vacuum:
relid | 155544
schemaname | orderstore
relname | order
seq_scan | 249
seq_tup_read | 2209150
idx_scan | 24696
idx_tup_fetch | 1155483
n_tup_ins | 87
n_tup_upd | 1404
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 9653
n_dead_tup | 87
n_mod_since_analyze | 152
n_ins_since_vacuum | 4
last_vacuum | 2022-02-24 10:44:34.524241+01
last_autovacuum |
last_analyze | 2022-02-24 03:20:05.79219+01
last_autoanalyze |
vacuum_count | 3
autovacuum_count | 0
analyze_count | 8
autoanalyze_count | 0
The entry in pg_stat_all_tables after the manual vacuum:
relid | 155544
schemaname | orderstore
relname | order
seq_scan | 249
seq_tup_read | 2209150
idx_scan | 24753
idx_tup_fetch | 1155561
n_tup_ins | 87
n_tup_upd | 1404
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 9476
n_dead_tup | 0
n_mod_since_analyze | 152
n_ins_since_vacuum | 0
last_vacuum | 2022-02-24 14:32:16.083692+01
last_autovacuum |
last_analyze | 2022-02-24 03:20:05.79219+01
last_autoanalyze |
vacuum_count | 4
autovacuum_count | 0
analyze_count | 8
autoanalyze_count | 0
Can someone provide any hints on how to deal with this issue? What am I missing?
In case you need additional information, please let me know.
kind regards,
peter
What is random_page_cost set to? Do you have default autovacuum/analyze settings?
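For reference, these can be checked with standard catalog queries (nothing specific to this setup):

```sql
-- current planner cost setting
SHOW random_page_cost;

-- autovacuum-related settings, global and per-table overrides
SELECT name, setting FROM pg_settings WHERE name LIKE 'autovacuum%';
SELECT reloptions FROM pg_class WHERE oid = 'orderstore.order'::regclass;
```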
Perhaps not directly related, but it might be interesting to look at:
With indexes on expressions, you get custom stats. It might be worth taking a look at those and seeing if they give anything approaching proper estimates.
eg.
select * from pg_class where relname = 'idx_customer_phone_numbers_phone_number_gist';
select * from pg_statistic where starelid = 'idx_customer_phone_numbers_phone_number_gist'::regclass;
select * from pg_stats where tablename = 'idx_customer_phone_numbers_phone_number_gist';
JSONB is a bit painful to use from a query-planning perspective. Values in a jsonb column are fine for me in a SELECT clause, but not in ON or WHERE, with very rare exceptions. Though maybe that's not so applicable when you are doing full-text search.
Peter Adlersburg <peter.adlersburg@gmail.com> writes:
> Limit (cost=0.00..804.97 rows=10 width=22) (actual time=23970.845..25588.432 rows=1 loops=1)
>   -> Seq Scan on "order" (cost=0.00..3863.86 rows=48 width=22) (actual time=23970.843..25588.429 rows=1 loops=1)
>        Filter: (jsonb_to_tsvector('english'::regconfig, content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
>        Rows Removed by Filter: 9652
> Planning Time: 0.430 ms
> Execution Time: 25588.448 ms

I think the expense here comes from re-executing jsonb_to_tsvector a lot of times. By default that's estimated as 100 times more expensive than a simple function (such as addition), but these results make it seem like that's an understatement. You might try something like

alter function jsonb_to_tsvector(regconfig, jsonb, jsonb) cost 1000;

to further discourage the planner from picking this plan shape.

Possibly the cost estimate for ts_match_tq (the function underlying this variant of @@) needs to be bumped up as well.

(Bear in mind that pg_dump will not propagate such hacks on system-defined objects, so you'll need a note to reapply any such changes after dump/reload or pg_upgrade.)

regards, tom lane
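Spelled out as statements (the COST value of 1000 is a suggested starting point to be adjusted by experiment, not a tuned number):

```sql
-- inspect the function's current planner cost estimate (default is 100)
SELECT proname, procost FROM pg_proc WHERE proname = 'jsonb_to_tsvector';

-- make the planner treat each call as 10x more expensive than the default
ALTER FUNCTION jsonb_to_tsvector(regconfig, jsonb, jsonb) COST 1000;
```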
Re: Advice needed: query performance deteriorates by 2000% within 1 minute
The good:
LOG Time: | 2022-02-28 09:30:01.400777+01 | order rows: | 9668
Limit (cost=616.37..653.30 rows=10 width=22) (actual time=1.062..1.063 rows=1 loops=1)
Output: version, content
Buffers: shared hit=154
-> Bitmap Heap Scan on orderstore."order" (cost=616.37..793.63 rows=48 width=22) (actual time=1.061..1.062 rows=1 loops=1)
Output: version, content
Recheck Cond: (jsonb_to_tsvector('english'::regconfig, "order".content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Heap Blocks: exact=1
Buffers: shared hit=154
-> Bitmap Index Scan on idx_fulltext_content (cost=0.00..616.36 rows=48 width=0) (actual time=1.053..1.053 rows=1 loops=1)
Index Cond: (jsonb_to_tsvector('english'::regconfig, "order".content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Buffers: shared hit=153
Planning:
Buffers: shared hit=50
Planning Time: 0.408 ms
Execution Time: 1.079 ms
pg_stat_all_tables:
n_tup_ins | 102
n_tup_upd | 1554
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 9668
n_dead_tup | 69
n_mod_since_analyze | 61
n_ins_since_vacuum | 8
last_vacuum | 2022-02-25 07:54:46.196508+01
last_autovacuum |
last_analyze | 2022-02-28 03:20:38.761482+01
last_autoanalyze |
The bad:
LOG Time: | 2022-02-28 09:45:01.662702+01 | order rows: | 9668
LIMIT 10:
Limit (cost=0.00..805.63 rows=10 width=22) (actual time=24175.964..25829.767 rows=1 loops=1)
Output: version, content
Buffers: shared hit=26284 read=12550 dirtied=4
-> Seq Scan on orderstore."order" (cost=0.00..3867.01 rows=48 width=22) (actual time=24175.962..25829.763 rows=1 loops=1)
Output: version, content
Filter: (jsonb_to_tsvector('english'::regconfig, "order".content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
Rows Removed by Filter: 9667
Buffers: shared hit=26284 read=12550 dirtied=4
Planning:
Buffers: shared hit=50
Planning Time: 0.377 ms
Execution Time: 25829.778 ms
n_tup_ins | 102
n_tup_upd | 1585
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 9668
n_dead_tup | 100
n_mod_since_analyze | 92
n_ins_since_vacuum | 8
last_vacuum | 2022-02-25 07:54:46.196508+01
last_autovacuum |
last_analyze | 2022-02-28 03:20:38.761482+01
last_autoanalyze |
The ugly:
It should be mentioned that the table in question mainly lives in TOAST land (but I have no idea whether this also influences the query planner):
oid | table_schema | table_name | row_estimate | total_bytes | index_bytes | toast_bytes | table_bytes | total | index | toast | table
--------+--------------+------------+--------------+-------------+-------------+-------------+-------------+--------+--------+-------+-------
155544 | orderstore | order | 9570 | 229826560 | 120184832 | 98557952 | 11083776 | 219 MB | 115 MB | 94 MB | 11 MB
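Given that pg_stat_all_tables shows autovacuum has never fired on this table (autovacuum_count = 0), one option worth considering (my own suggestion, not something settled in this thread) is to lower the per-table thresholds so autovacuum runs after far fewer modifications:

```sql
-- trigger autovacuum/autoanalyze after ~1% of rows change
-- instead of the global defaults
ALTER TABLE orderstore."order"
    SET (autovacuum_vacuum_scale_factor = 0.01,
         autovacuum_analyze_scale_factor = 0.01);
```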