Thread: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
analyze causes query planner to choose suboptimal plan for a select query in separate transaction
From
Martin Kováčik
Date:
Hi group,
I have a Jenkins build server which runs integration tests of a Java application against PostgreSQL 11.2. The tests run serially and are reproducible because each test runs in a separate DB transaction which is rolled back at the end of the test. Each full build (with 1400+ tests) takes approximately 1h 45m.
My problem is that sometimes the build gets stuck (for several hours, often 10+) in a random test, waiting for a random select query while the postgres process consumes 100% of one CPU core.
Well, at least I thought it was random, until I figured out it was triggered by the autovacuum daemon. The problem is not the vacuum process itself, but the analyze step that runs together with autovacuum.
To illustrate my situation, let's say my tests look like this:
BEGIN;
-- A: insert data for the test
-- B: I'll refer to this point later
-- C: select(s)
ROLLBACK;
Everything is fine until autovacuum (analyze) runs while the test is at point B. After that the query planner at point C chooses a wrong plan and the query takes a long time to complete, blocking one CPU core for a long time. It seems like the planner statistics inside a running transaction are affected by an analyze task running outside of the transaction. In this case, running analyze (outside the transaction) while the transaction is at B causes the query planner to think there are no rows (because the inserts at point A have not yet been committed).
I prepared a simple test case to reproduce this behavior:
First you need to create a table:
create table a (id bigint primary key);
Then run this transaction:
begin;
insert into a
select * from generate_series(1, 1000);
-- during sleep execute analyze on this db in separate connection
select pg_sleep(10);
explain analyze select count(*) from (
select distinct a1, a2, a3, a4 from a a1
left join a a2 on a1.id > 900
left join a a3 on a2.id = a3.id
left join a a4 on a3.id = a4.id
left join a a5 on a5.id = a4.id
left join a a6 on a6.id = a5.id
left join a a7 on a7.id = a6.id
left join a a8 on a8.id = a7.id) temp;
rollback;
The query plan for the select is as follows; the query takes 865 ms to finish.
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1987304286213.68..1987304286213.69 rows=1 width=8) (actual time=862.406..862.407 rows=1 loops=1)
-> Unique (cost=1942976891422.08..1987284286213.68 rows=1600000000 width=128) (actual time=783.281..858.261 rows=100900 loops=1)
-> Sort (cost=1942976891422.08..1951838370380.40 rows=3544591583328 width=128) (actual time=783.280..825.055 rows=100900 loops=1)
Sort Key: a1.*, a2.*, a3.*, a4.*
Sort Method: external merge Disk: 12368kB
-> Merge Right Join (cost=5497387.28..53175003562.02 rows=3544591583328 width=128) (actual time=171.612..196.342 rows=100900 loops=1)
Merge Cond: (a4.id = a3.id)
-> Merge Left Join (cost=449798.05..1010691.54 rows=36848656 width=40) (actual time=8.257..8.815 rows=1000 loops=1)
Merge Cond: (a4.id = a5.id)
-> Sort (cost=158.51..164.16 rows=2260 width=40) (actual time=1.174..1.241 rows=1000 loops=1)
Sort Key: a4.id
Sort Method: quicksort Memory: 103kB
-> Seq Scan on a a4 (cost=0.00..32.60 rows=2260 width=40) (actual time=0.040..0.639 rows=1000 loops=1)
-> Materialize (cost=449639.54..465944.26 rows=3260943 width=8) (actual time=7.077..7.254 rows=1000 loops=1)
-> Sort (cost=449639.54..457791.90 rows=3260943 width=8) (actual time=7.066..7.148 rows=1000 loops=1)
Sort Key: a5.id
Sort Method: quicksort Memory: 71kB
-> Merge Right Join (cost=2897.82..52270.24 rows=3260943 width=8) (actual time=4.352..6.630 rows=1000 loops=1)
Merge Cond: (a7.id = a6.id)
-> Merge Left Join (cost=317.01..711.38 rows=25538 width=8) (actual time=1.236..2.482 rows=1000 loops=1)
Merge Cond: (a7.id = a8.id)
-> Sort (cost=158.51..164.16 rows=2260 width=8) (actual time=0.618..0.792 rows=1000 loops=1)
Sort Key: a7.id
Sort Method: quicksort Memory: 71kB
-> Seq Scan on a a7 (cost=0.00..32.60 rows=2260 width=8) (actual time=0.017..0.265 rows=1000 loops=1)
-> Sort (cost=158.51..164.16 rows=2260 width=8) (actual time=0.605..0.773 rows=1000 loops=1)
Sort Key: a8.id
Sort Method: quicksort Memory: 71kB
-> Seq Scan on a a8 (cost=0.00..32.60 rows=2260 width=8) (actual time=0.024..0.261 rows=1000 loops=1)
-> Sort (cost=2580.81..2644.65 rows=25538 width=16) (actual time=3.112..3.278 rows=1000 loops=1)
Sort Key: a6.id
Sort Method: quicksort Memory: 71kB
-> Merge Left Join (cost=317.01..711.38 rows=25538 width=16) (actual time=1.386..2.653 rows=1000 loops=1)
Merge Cond: (a5.id = a6.id)
-> Sort (cost=158.51..164.16 rows=2260 width=8) (actual time=0.697..0.870 rows=1000 loops=1)
Sort Key: a5.id
Sort Method: quicksort Memory: 71kB
-> Seq Scan on a a5 (cost=0.00..32.60 rows=2260 width=8) (actual time=0.026..0.303 rows=1000 loops=1)
-> Sort (cost=158.51..164.16 rows=2260 width=8) (actual time=0.681..0.851 rows=1000 loops=1)
Sort Key: a6.id
Sort Method: quicksort Memory: 71kB
-> Seq Scan on a a6 (cost=0.00..32.60 rows=2260 width=8) (actual time=0.018..0.286 rows=1000 loops=1)
-> Materialize (cost=5047589.24..5143782.35 rows=19238623 width=104) (actual time=163.351..175.058 rows=100900 loops=1)
-> Sort (cost=5047589.24..5095685.80 rows=19238623 width=104) (actual time=163.343..167.508 rows=100900 loops=1)
Sort Key: a3.id
Sort Method: quicksort Memory: 16766kB
-> Merge Right Join (cost=322881.15..615728.13 rows=19238623 width=104) (actual time=119.338..144.270 rows=100900 loops=1)
Merge Cond: (a3.id = a2.id)
-> Sort (cost=158.51..164.16 rows=2260 width=40) (actual time=1.153..1.230 rows=1000 loops=1)
Sort Key: a3.id
Sort Method: quicksort Memory: 103kB
-> Seq Scan on a a3 (cost=0.00..32.60 rows=2260 width=40) (actual time=0.039..0.629 rows=1000 loops=1)
-> Materialize (cost=322722.64..331235.31 rows=1702533 width=72) (actual time=118.179..131.501 rows=100900 loops=1)
-> Sort (cost=322722.64..326978.97 rows=1702533 width=72) (actual time=118.173..122.691 rows=100900 loops=1)
Sort Key: a2.id
Sort Method: quicksort Memory: 16780kB
-> Nested Loop Left Join (cost=0.00..76684.85 rows=1702533 width=72) (actual time=1.331..100.252 rows=100900 loops=1)
Join Filter: (a1.id > 900)
Rows Removed by Join Filter: 900000
-> Seq Scan on a a1 (cost=0.00..32.60 rows=2260 width=40) (actual time=0.019..0.292 rows=1000 loops=1)
-> Materialize (cost=0.00..43.90 rows=2260 width=40) (actual time=0.000..0.047 rows=1000 loops=1000)
-> Seq Scan on a a2 (cost=0.00..32.60 rows=2260 width=40) (actual time=0.016..0.606 rows=1000 loops=1)
Planning Time: 2.234 ms
Execution Time: 865.635 ms
(64 rows)
Now run the transaction again, but this time run analyze in a separate DB connection while the transaction sleeps at select pg_sleep(10).
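Concretely, the separate connection only needs something along these lines during the sleep (a sketch; the pg_class query is optional and only illustrates what the planner's row estimate for "a" is now based on):

-- second connection, while the transaction above sits in pg_sleep(10)
analyze a;

-- optional: the stats the planner will use for table "a"
select relname, reltuples, relpages from pg_class where relname = 'a';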
The query plan is as follows and the select query takes 81 s to complete. As you can see, the planner greatly underestimated the number of rows (1 vs 100900) and chose a nested loop plan, which took a long time to finish.
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=40.12..40.13 rows=1 width=8) (actual time=81482.190..81482.190 rows=1 loops=1)
-> Unique (cost=40.10..40.11 rows=1 width=128) (actual time=81418.552..81477.211 rows=100900 loops=1)
-> Sort (cost=40.10..40.10 rows=1 width=128) (actual time=81418.551..81440.453 rows=100900 loops=1)
Sort Key: a1.*, a2.*, a3.*, a4.*
Sort Method: external merge Disk: 12376kB
-> Nested Loop Left Join (cost=0.00..40.09 rows=1 width=128) (actual time=1.424..81324.707 rows=100900 loops=1)
Join Filter: (a8.id = a7.id)
Rows Removed by Join Filter: 100800000
-> Nested Loop Left Join (cost=0.00..35.08 rows=1 width=136) (actual time=1.271..70436.483 rows=100900 loops=1)
Join Filter: (a7.id = a6.id)
Rows Removed by Join Filter: 100800000
-> Nested Loop Left Join (cost=0.00..30.06 rows=1 width=136) (actual time=1.117..59388.674 rows=100900 loops=1)
Join Filter: (a6.id = a5.id)
Rows Removed by Join Filter: 100800000
-> Nested Loop Left Join (cost=0.00..25.05 rows=1 width=136) (actual time=0.962..48361.409 rows=100900 loops=1)
Join Filter: (a5.id = a4.id)
Rows Removed by Join Filter: 100800000
-> Nested Loop Left Join (cost=0.00..20.04 rows=1 width=136) (actual time=0.807..37348.411 rows=100900 loops=1)
Join Filter: (a3.id = a4.id)
Rows Removed by Join Filter: 100800000
-> Nested Loop Left Join (cost=0.00..15.02 rows=1 width=104) (actual time=0.544..18687.670 rows=100900 loops=1)
Join Filter: (a2.id = a3.id)
Rows Removed by Join Filter: 100800000
-> Nested Loop Left Join (cost=0.00..10.01 rows=1 width=72) (actual time=0.283..190.095 rows=100900 loops=1)
Join Filter: (a1.id > 900)
Rows Removed by Join Filter: 900000
-> Seq Scan on a a1 (cost=0.00..5.00 rows=1 width=40) (actual time=0.013..0.352 rows=1000 loops=1)
-> Seq Scan on a a2 (cost=0.00..5.00 rows=1 width=40) (actual time=0.003..0.138 rows=1000 loops=1000)
-> Seq Scan on a a3 (cost=0.00..5.00 rows=1 width=40) (actual time=0.003..0.140 rows=1000 loops=100900)
-> Seq Scan on a a4 (cost=0.00..5.00 rows=1 width=40) (actual time=0.003..0.141 rows=1000 loops=100900)
-> Seq Scan on a a5 (cost=0.00..5.00 rows=1 width=8) (actual time=0.003..0.057 rows=1000 loops=100900)
-> Seq Scan on a a6 (cost=0.00..5.00 rows=1 width=8) (actual time=0.003..0.057 rows=1000 loops=100900)
-> Seq Scan on a a7 (cost=0.00..5.00 rows=1 width=8) (actual time=0.003..0.057 rows=1000 loops=100900)
-> Seq Scan on a a8 (cost=0.00..5.00 rows=1 width=8) (actual time=0.003..0.057 rows=1000 loops=100900)
Planning Time: 0.522 ms
Execution Time: 81483.797 ms
(36 rows)
What do you think: is this intended behavior or a potential bug?
Regards, Martin
Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
From
Michael Lewis
Date:
I assume it is in the documentation, but I am not aware of how stats are handled for uncommitted work. Obviously in the example you provided the table would be empty, but in your real tests do they start out empty? Would it suffice to use temp tables created like the regular ones, and analyze after insert to ensure stats are up to date? Or would it make sense to turn off autovacuum/analyze during these tests? The workload is presumably not similar to the real world, so perhaps you don't need the vacuum and the stats update there.
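For what it's worth, both suggestions could look roughly like this against the repro table (a sketch; names and settings are illustrative):

-- keep autovacuum/auto-analyze away from a specific table
alter table a set (autovacuum_enabled = off);

-- or run the tests against session-local temp copies, which autovacuum never processes,
-- and analyze them manually after the test's inserts
create temp table a_tmp (like a including all);
analyze a_tmp;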
Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
From
Martin Kováčik
Date:
Turning off autovacuum for the tests is a valid option and I will definitely do this as a workaround. Each test pretty much starts with an empty schema; its data is generated during the run and rolled back at the end. I have a lot of tests and at the moment it is not feasible to modify them.
The real workload for the application is different, but there are some cases when we import data from a remote web service in a transaction, do some work with it, and then commit. If an autovacuum runs during this process, I assume there will be a similar problem with the planner statistics.
The real problem here is that the statistics seen from one (uncommitted) transaction are affected by an analyze statement from another (committed) transaction. I've seen similar behavior in production for other applications using PostgreSQL (10.x and 11.x) too: 100% CPU consumed by an otherwise fine select query. I was puzzled by that behavior back then, but now it makes sense.
Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
From
Michael Lewis
Date:
Unless you are importing a huge amount of data relative to what is already there, it seems likely to be significantly less impactful than adding data to a completely empty table. The stats on a table with 0 rows versus 5,000 rows are going to be night and day, while the difference between stats on 100,000 rows and 105,000 rows is not as impactful. Just musing here; I expect others will chime in.
Stats are not versioned with MVCC, so it would be expected that a commit in another transaction that updates stats would influence the query plan for another transaction that is active.
Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
From
Adrian Klaver
Date:
On 4/25/19 7:37 AM, Martin Kováčik wrote:

See comments inline below.

> Then run this transaction:
>
> begin;
> insert into a
> select * from generate_series(1, 1000);
>
> -- during sleep execute analyze on this db in separate connection
> select pg_sleep(10);

analyze a;

On my machine that changes the time from 29715.763 ms to 291.765 ms when running ANALYZE in the concurrent connection during the sleep.

--
Adrian Klaver
adrian.klaver@aklaver.com
Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
From
Tom Lane
Date:
Martin Kováčik <kovacik@redbyte.eu> writes:
> To illustrate my situation, let's say my tests look like this:
> BEGIN;
> -- A: insert data for the test
> -- B: I'll refer to this point later
> -- C: select(s)
> ROLLBACK;
> Everything is fine until autovacuum (analyze) runs while the test is at
> point B. After that the query planner at point C chooses a wrong plan and the
> query takes a long time to complete, blocking one CPU core for a long time.
> It seems like the planner statistics inside a running transaction are
> affected by an analyze task running outside of the transaction.

Yup, they are. However, you're already at risk of a pretty bad plan for a case like this, since (by assumption) the stats before you did the insert at step A are radically different from what they should be after the insert.

The standard recommendation, when you need the results of a data change to be understood by the planner immediately, is to do your own ANALYZE:

BEGIN;
-- A: insert data for the test
ANALYZE test_table;
-- C: select(s)
ROLLBACK;

This should protect step C against seeing any irrelevant stats, because
(a) once your transaction has done an ANALYZE, autovacuum shouldn't think the stats are out of date, and
(b) even if it does, your transaction is now holding ShareUpdateExclusiveLock on the table so auto-ANALYZE can't get that lock to do a fresh ANALYZE.

Another thing you could do is not even allow the test table to exist outside your transaction:

BEGIN;
CREATE TABLE test_table (...);
-- A: insert data for the test
ANALYZE test_table;  -- this is now somewhat optional
-- C: select(s)
ROLLBACK;

Then there's nothing for auto-ANALYZE to get its hands on. If you're satisfied with the planner's default behavior in the absence of any stats, you could omit the post-insertion ANALYZE in this case. But I'm not sure that that would represent a test that has much to do with production situations.

regards, tom lane
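Applied to the repro script from earlier in the thread, that first pattern would look roughly like this (a sketch; the comment just paraphrases points (a) and (b) above):

begin;
insert into a
select * from generate_series(1, 1000);
analyze a;  -- planner now sees the 1000 uncommitted rows; auto-ANALYZE is also locked out for the rest of the transaction
-- the explain analyze select from the repro goes here
rollback;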
Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
From
Martin Kováčik
Date:
As my example shows, you don't have to import a lot of rows: 1000 is enough to make a difference; it all depends on the query. When a cartesian product is involved, only a few records are enough.
I think the stats should be MVCC versioned; otherwise the planner is using wrong statistics and chooses wrong plans.
Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
From
Adrian Klaver
Date:
On 4/25/19 12:47 PM, Martin Kováčik wrote:
> As my example shows, you don't have to import a lot of rows: 1000 is
> enough to make a difference; it all depends on the query. When a
> cartesian product is involved, only a few records are enough.
> I think the stats should be MVCC versioned; otherwise the planner is
> using wrong statistics and chooses wrong plans.

Then you are looking at moving the choke point to looking up the correct stats across possibly hundreds/thousands of transactions in flight.

--
Adrian Klaver
adrian.klaver@aklaver.com
Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
From
Martin Kováčik
Date:
Thanks Tom,
that is a great explanation. The default plan, which overestimates the number of rows, is much better than the plan (based on wrong stats) that underestimates it. This is true for my particular case but may be bad for other cases.
The schema is created before the tests begin, so the suggestion to create the required tables inside a transaction isn't feasible in my case. For the build server I'm going with Michael's suggestion to disable autovacuum. Instead I'll do manual vacuuming/analyzing before the build starts.
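Roughly, for the build server (a sketch; how the pre-build step hooks into Jenkins is left open):

-- one-time, on the test instance
alter system set autovacuum = off;
select pg_reload_conf();

-- pre-build step, so every run starts from freshly vacuumed tables with current stats
vacuum analyze;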
For the production use case I'll do my own analyze after bulk insert as you suggested.
Initially I thought that inside a transaction the stats are automatically updated after the inserts, because the query ran much faster than the query based on statistics with no rows. I did not realize the planner chooses a "default" plan when there are no stats.
Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
From
Martin Kováčik
Date:
I'm not sure I understand what you mean. My initial thought was that stats are fixed per transaction, i.e. an analyze from a committed transaction doesn't interfere with another running transaction. Maybe I was confused because analyze can be run inside a transaction, so my assumption was that it is isolated like other statements.
Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
From
Adrian Klaver
Date:
On 4/25/19 1:28 PM, Martin Kováčik wrote:
> My initial thought was that stats are fixed per transaction, i.e. an analyze
> from a committed transaction doesn't interfere with another running transaction.
> Maybe I was confused because analyze can be run inside a transaction, so my
> assumption was that it is isolated like other statements.

To make it clearer I would take a look at:

https://www.postgresql.org/docs/11/sql-analyze.html
https://www.postgresql.org/docs/11/catalog-pg-statistic.html

The gist is that currently there is one entry (with the exception noted in the second link) per column in each table. This is the most recent information available, subject to the conditions Tom pointed out in his post. To make it MVCC aware like you suggest would mean tracking the state of all transactions currently open and recording that information, so each transaction could find its stats. This means the planning problem could then move to the stats table, as it would need to be analyzed itself to work efficiently. This would probably also need to be MVCC aware to be relevant, which would add to the overhead. I could see this turn into a 'hall of mirrors' problem quickly.

--
Adrian Klaver
adrian.klaver@aklaver.com
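For reference, the single shared entry per column that this describes can be inspected directly; a sketch against the repro table (pg_stats is the readable view over pg_statistic):

select schemaname, tablename, attname, null_frac, n_distinct
from pg_stats
where tablename = 'a';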
Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction
From
Martin Kováčik
Date:
I’ve got it now, thank you for the clarification. You are right.

Martin Kováčik
+421904236791