Thread: query has huge variance in execution times
postgres 8.3.5 on RHEL4 update 6 This query starts executing at 18:41: cemdb=> select query_start,current_query from pg_stat_activity where procpid=10022; query_start | current_query -------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 2010-03-30 18:41:11.685261-07 | select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= $1 and c.ts_interval_start_time < $2 and b.ts_interval_start_time >= $3 and b.ts_interval_start_time < $4 (1 row) about 5 mins later, I, suspecting problems, do (the values are the same as for $1 et al above; EXPLAIN was done on purpose to keep stats [hopefully] the same as when pid 10022 started; there are 80,000 rows in each of the 2 tables at the time of this EXPLAIN and when 10022 started): cemdb=> explain select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= '2010-3-29 01:00' and c.ts_interval_start_time < '2010-3-29 02:00' and b.ts_interval_start_time >= '2010-3-29' and b.ts_interval_start_time < '2010-3-30'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=33574.89..34369.38 rows=25207 width=8) Merge Cond: ((b.ts_transet_incarnation_id = c.ts_transet_incarnation_id) AND (b.ts_tranunit_id = c.ts_tranunit_id) AND (b.ts_user_incarnation_id = c.ts_user_incarnation_id)) -> Sort (cost=13756.68..13854.96 rows=78623 width=32) Sort Key: b.ts_transet_incarnation_id, b.ts_tranunit_id, b.ts_user_incarnation_id -> Index Scan using ts_stats_tranunit_user_daily_starttime on ts_stats_tranunit_user_daily b (cost=0.00..10560.13 rows=78623 width=32) Index Cond: ((ts_interval_start_time >= '2010-03-29 00:00:00-07'::timestamp with time zone) AND (ts_interval_start_time < '2010-03-30 00:00:00-07'::timestamp with time zone)) -> Sort (cost=19818.21..19959.72 rows=113207 width=24) Sort Key: c.ts_transet_incarnation_id, c.ts_tranunit_id, c.ts_user_incarnation_id -> Index Scan using ts_stats_tranunit_user_interval_starttime on ts_stats_tranunit_user_interval c (cost=0.00..15066.74 rows=113207 width=24) Index Cond: ((ts_interval_start_time >= '2010-03-29 01:00:00-07'::timestamp with time zone) AND (ts_interval_start_time < '2010-03-29 02:00:00-07'::timestamp with time zone)) (10 rows) cemdb=> \q I then run the query manually: [root@rdl64xeoserv01 log]# time PGPASSWORD=quality psql -U admin -d cemdb -c "select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= '2010-3-29 01:00' and c.ts_interval_start_time < '2010-3-29 02:00' and b.ts_interval_start_time >= '2010-3-29' and b.ts_interval_start_time < '2010-3-30'" > /tmp/select.txt 2>&1 real 0m0.813s user 0m0.116s sys 0m0.013s I let process 10022 run for an hour. an strace shows lots of I/O: [root@rdl64xeoserv01 log]# strace -p 10022 read(18, "\214\2\0\0\374<\200#\1\0\0\0<\0P\3\0 \4 \0\0\0\0\320\234"..., 8192) = 8192 semop(73007122, 0xbfe0fc20, 1) = 0 _llseek(18, 538451968, [538451968], SEEK_SET) = 0 read(18, "\214\2\0\0\274\347\t#\1\0\0\0<\0P\3\0 \4 \0\0\0\0\320\234"..., 8192) = 8192 _llseek(18, 535928832, [535928832], SEEK_SET) = 0 read(18, "\214\2\0\0\310\300\226\"\1\0\0\0<\0P\3\0 \4 \0\0\0\0\320"..., 8192) = 8192 _llseek(18, 532398080, [532398080], SEEK_SET) = 0 <many more similar lines> I then kill 10022 and the application retries the same query: [10022-cemdb-admin-2010-03-30 19:02:37.460 PDT]FATAL: terminating connection due to administrator command [10022-cemdb-admin-2010-03-30 19:02:37.460 PDT]STATEMENT: select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= $1 and c.ts_interval_start_time < $2 and b.ts_interval_start_time >= $3 and b.ts_interval_start_time < $4 [10820-cemdb-admin-2010-03-30 19:02:40.363 PDT]LOG: duration: 1096.598 ms execute <unnamed>: select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= $1 and c.ts_interval_start_time < $2 and b.ts_interval_start_time >= $3 and b.ts_interval_start_time < $4 Ideas on why the big difference in execution times of the same query and how to avoid same are solicited. I tend to doubt, but don't know how to prove, that the query stats for these 2 tables were updated between the start of pid 10022 and when I did the EXPLAIN. Thanks, Brian
On Wed, Mar 31, 2010 at 12:11 AM, Brian Cox <brian.cox@ca.com> wrote:
These won't necessarily get the same plan. If you want to see what plan the prepared query is getting, you'll need to prepare it ("prepare foo as <query>") and then explain *that* via "explain execute foo".
The prepared version likely has a much more generic plan, whereas the regular query gets optimized for the actual values provided.
2010-03-30 18:41:11.685261-07 | select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= $1 and c.ts_interval_start_time < $2 and b.ts_interval_start_time >= $3 and b.ts_interval_start_time < $4
(1 row)
about 5 mins later, I, suspecting problems, do (the values are the same as for $1 et al above; EXPLAIN was done on purpose to keep stats [hopefully] the same as when pid 10022 started; there are 80,000 rows in each of the 2 tables at the time of this EXPLAIN and when 10022 started):
cemdb=> explain select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= '2010-3-29 01:00' and c.ts_interval_start_time < '2010-3-29 02:00' and b.ts_interval_start_time >= '2010-3-29' and b.ts_interval_start_time < '2010-3-30';
These won't necessarily get the same plan. If you want to see what plan the prepared query is getting, you'll need to prepare it ("prepare foo as <query>") and then explain *that* via "explain execute foo".
The prepared version likely has a much more generic plan, whereas the regular query gets optimized for the actual values provided.
--
- David T. Wilson
david.t.wilson@gmail.com
On 03/31/2010 12:37 AM, David Wilson [david.t.wilson@gmail.com] wrote: > These won't necessarily get the same plan. If you want to see what plan > the prepared query is getting, you'll need to prepare it ("prepare foo > as <query>") and then explain *that* via "explain execute foo". > > The prepared version likely has a much more generic plan, whereas the > regular query gets optimized for the actual values provided. I didn't know this. Thanks. The plans are indeed different: cemdb=# prepare sq as select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= $1 and c.ts_interval_start_time < $2 and b.ts_interval_start_time >= $3 and b.ts_interval_start_time < $4; cemdb=# explain execute sq('2010-3-29 01:00', '2010-3-29 02:00', '2010-3-29', '2010-3-30'); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=7885.37..8085.91 rows=30 width=8) Merge Cond: ((b.ts_transet_incarnation_id = c.ts_transet_incarnation_id) AND (b.ts_tranunit_id = c.ts_tranunit_id) AND (b.ts_user_incarnation_id = c.ts_user_incarnation_id)) -> Sort (cost=1711.82..1716.81 rows=3994 width=32) Sort Key: b.ts_transet_incarnation_id, b.ts_tranunit_id, b.ts_user_incarnation_id -> Index Scan using ts_stats_tranunit_user_daily_starttime on ts_stats_tranunit_user_daily b (cost=0.00..1592.36 rows=3994 width=32) Index Cond: ((ts_interval_start_time >= $3) AND (ts_interval_start_time < $4)) cemdb=# explain select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= '2010-3-29 01:00' and c.ts_interval_start_time < '2010-3-29 02:00' and b.ts_interval_start_time >= '2010-3-29' and b.ts_interval_start_time < '2010-3-30'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=291965.90..335021.46 rows=13146 width=8) Hash Cond: ((c.ts_transet_incarnation_id = b.ts_transet_incarnation_id) AND (c.ts_tranunit_id = b.ts_tranunit_id) AND (c.ts_user_incarnation_id = b.ts_user_incarnation_id)) -> Index Scan using ts_stats_tranunit_user_interval_starttime on ts_stats_tranunit_user_interval c (cost=0.00..11783.36 rows=88529 width=24) Index Cond: ((ts_interval_start_time >= '2010-03-29 01:00:00-07'::timestamp with time zone) AND (ts_interval_start_time < '2010-03-29 02:00:00-07'::timestamp with time zone)) -> Hash (cost=285681.32..285681.32 rows=718238 width=32) -> Index Scan using ts_stats_tranunit_user_daily_starttime on ts_stats_tranunit_user_daily b (cost=0.00..285681.32 rows=718238 width=32) Index Cond: ((ts_interval_start_time >= '2010-03-29 00:00:00-07'::timestamp with time zone) AND (ts_interval_start_time < '2010-03-30 00:00:00-07'::timestamp with time zone)) (7 rows) -> Sort (cost=6173.55..6218.65 rows=36085 width=24) Sort Key: c.ts_transet_incarnation_id, c.ts_tranunit_id, c.ts_user_incarnation_id -> Index Scan using ts_stats_tranunit_user_interval_starttime on ts_stats_tranunit_user_interval c (cost=0.00..4807.81 rows=36085 width=24) Index Cond: ((ts_interval_start_time >= $1) AND (ts_interval_start_time < $2)) (10 rows) I notice that the row estimates are substantially different; this is due to the lack of actual values? But, this prepared query runs in ~4 secs: [root@rdl64xeoserv01 log]# cat /tmp/select.sql prepare sq as select b.ts_id from ts_stats_tranunit_user_daily b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id = c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and b.ts_user_incarnation_id = c.ts_user_incarnation_id and c.ts_interval_start_time >= $1 and c.ts_interval_start_time < $2 and b.ts_interval_start_time >= $3 and b.ts_interval_start_time < $4; execute sq('2010-3-29 01:00', '2010-3-29 02:00', '2010-3-29', '2010-3-30 [root@rdl64xeoserv01 log]# time PGPASSWORD=quality psql -U postgres -d cemdb -f /tmp/select.sql > /tmp/select1.txt 2>&1 real 0m4.131s user 0m0.119s sys 0m0.007s so the question still remains: why did it take > 20 mins? To see if it was due to autovacuum running ANALYZE, I turned off autovacuum, created a table using SELECT * INTO temp FROM ts_stats_tranunit_user_daily, added the index on ts_interval_start_time and ran the prepared query with temp, but the query completed in a few secs. Brian
On Wed, Mar 31, 2010 at 2:10 PM, Brian Cox <brian.cox@ca.com> wrote:
It's possible that statistics were updated between the >20 minute run and your most recent prepared query test. In fact, comparing the plans between your two emails, it's quite likely, as even the non-prepared versions are not producing the same plan or the same estimates; it's therefore possible that your problem has already corrected itself if you're unable to duplicate the 20 minute behaviour at this point.
so the question still remains: why did it take > 20 mins? To see if it was due to autovacuum running ANALYZE, I turned off autovacuum, created a table using SELECT * INTO temp FROM ts_stats_tranunit_user_daily, added the index on ts_interval_start_time and ran the prepared query with temp, but the query completed in a few secs.
Taking a look at the statistics accuracy with an explain analyze might still be informative, however.
--
- David T. Wilson
david.t.wilson@gmail.com