Queries taking ages in PG 8.1, have been much faster in PG<=8.0 - Mailing list pgsql-performance

From Markus Wollny
Subject Queries taking ages in PG 8.1, have been much faster in PG<=8.0
Date
Msg-id 28011CD60FB1724DBA4442E38277F6264A626A@hermes.computec.de
Whole thread Raw
Responses Re: Queries taking ages in PG 8.1, have been much faster in PG<=8.0
List pgsql-performance
Hi!

I've got an urgent problem with an application which is evaluating a
monthly survey; it's running quite a lot of queries like this:

select  SOURCE.NAME as TYPE,
   count(PARTICIPANT.SESSION_ID) as TOTAL
from  (
   select  PARTICIPANT.SESSION_ID
   from   survey.PARTICIPANT,
      survey.ANSWER
   where  PARTICIPANT.STATUS = 1
   and   date_trunc('month', PARTICIPANT.CREATED) = date_trunc('month',
now()-'1 month'::interval)
   and   PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID
   and   ANSWER.QUESTION_ID = 6
   and   ANSWER.VALUE = 1
   )
   as PARTICIPANT,
   survey.ANSWER,
   survey.HANDY_JAVA SOURCE
where  PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID
and   ANSWER.QUESTION_ID = 16
and   ANSWER.VALUE = SOURCE.ID
group by SOURCE.NAME,
   SOURCE.POSITION
order by  SOURCE.POSITION asc;

My current PostgreSQL-version is "PostgreSQL 8.1.0 on i686-pc-linux-gnu,
compiled by GCC gcc (GCC) 3.2". Up to 8.0, a query like this took a
couple of seconds, maybe even up to a minute. In 8.1 a query like this
will run from 30 minutes up to two hours to complete, depending on ist
complexity. I've got autovaccum enabled and run a nightly vacuum analyze
over all of my databases. Here's some information about the relevant
tables: Table answer has got ~ 8.9M rows (estimated 8,872,130, counted
8,876,648), participant has got ~178K rows (estimated 178,165, counted
178,248), HANDY_JAVA has got three rows. This is the
explain-analyze-output for the above:

"Sort  (cost=11383.09..11383.10 rows=3 width=16) (actual
time=1952676.858..1952676.863 rows=3 loops=1)"
"  Sort Key: source."position""
"  ->  HashAggregate  (cost=11383.03..11383.07 rows=3 width=16) (actual
time=1952676.626..1952676.635 rows=3 loops=1)"
"        ->  Nested Loop  (cost=189.32..11383.00 rows=5 width=16)
(actual time=6975.812..1952371.782 rows=9806 loops=1)"
"              ->  Nested Loop  (cost=3.48..3517.47 rows=42 width=20)
(actual time=6819.716..15419.930 rows=9806 loops=1)"
"                    ->  Nested Loop  (cost=3.48..1042.38 rows=738
width=16) (actual time=258.434..6233.039 rows=162723 loops=1)"
"                          ->  Seq Scan on handy_java source
(cost=0.00..1.03 rows=3 width=14) (actual time=0.093..0.118 rows=3
loops=1)"
"                          ->  Bitmap Heap Scan on answer
(cost=3.48..344.04 rows=246 width=8) (actual time=172.381..1820.499
rows=54241 loops=3)"
"                                Recheck Cond: ((answer.question_id =
16) AND (answer.value = "outer".id))"
"                                ->  Bitmap Index Scan on
idx02_performance  (cost=0.00..3.48 rows=246 width=0) (actual
time=98.321..98.321 rows=54245 loops=3)"
"                                      Index Cond: ((answer.question_id
= 16) AND (answer.value = "outer".id))"
"                    ->  Index Scan using idx01_perf_0006 on participant
(cost=0.00..3.34 rows=1 width=4) (actual time=0.049..0.050 rows=0
loops=162723)"
"                          Index Cond: (participant.session_id =
"outer".session_id)"
"                          Filter: ((status = 1) AND
(date_trunc('month'::text, created) = date_trunc('month'::text, (now() -
'1 mon'::interval))))"
"              ->  Bitmap Heap Scan on answer  (cost=185.85..187.26
rows=1 width=4) (actual time=197.490..197.494 rows=1 loops=9806)"
"                    Recheck Cond: (("outer".session_id =
answer.session_id) AND (answer.question_id = 6) AND (answer.value = 1))"
"                    ->  BitmapAnd  (cost=185.85..185.85 rows=1 width=0)
(actual time=197.421..197.421 rows=0 loops=9806)"
"                          ->  Bitmap Index Scan on
idx_answer_session_id  (cost=0.00..2.83 rows=236 width=0) (actual
time=0.109..0.109 rows=49 loops=9806)"
"                                Index Cond: ("outer".session_id =
answer.session_id)"
"                          ->  Bitmap Index Scan on idx02_performance
(cost=0.00..182.77 rows=20629 width=0) (actual time=195.742..195.742
rows=165697 loops=9806)"
"                                Index Cond: ((question_id = 6) AND
(value = 1))"
"Total runtime: 1952678.393 ms"

I am really sorry, but currently I haven't got any 8.0-installation
left, so I cannot provide the explain (analyze) output for 8.0.

I fiddled a little with the statement and managed to speed things up
quite a lot:

select  SOURCE.NAME as TYPE,
   count(ANSWER.SESSION_ID) as TOTAL
from survey.ANSWER,
   survey.HANDY_JAVA SOURCE
where  ANSWER.QUESTION_ID = 16
and   ANSWER.VALUE = SOURCE.ID
and ANSWER.SESSION_ID in (
   select  PARTICIPANT.SESSION_ID
   from   survey.PARTICIPANT,
      survey.ANSWER
   where  PARTICIPANT.STATUS = 1
   and   date_trunc('month', PARTICIPANT.CREATED) = date_trunc('month',
now()-'1 month'::interval)
   and   PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID
   and   ANSWER.QUESTION_ID = 6
   and   ANSWER.VALUE = 1
   )
group by SOURCE.NAME,
   SOURCE.POSITION
order by  SOURCE.POSITION asc;

Here's the explain analyze output:
"Sort  (cost=27835.39..27835.39 rows=3 width=16) (actual
time=9609.207..9609.212 rows=3 loops=1)"
"  Sort Key: source."position""
"  ->  HashAggregate  (cost=27835.33..27835.36 rows=3 width=16) (actual
time=9609.058..9609.067 rows=3 loops=1)"
"        ->  Hash IN Join  (cost=26645.78..27835.29 rows=5 width=16)
(actual time=6374.436..9548.945 rows=9806 loops=1)"
"              Hash Cond: ("outer".session_id = "inner".session_id)"
"              ->  Nested Loop  (cost=3.48..1042.38 rows=738 width=16)
(actual time=190.419..4817.977 rows=162704 loops=1)"
"                    ->  Seq Scan on handy_java source  (cost=0.00..1.03
rows=3 width=14) (actual time=0.036..0.058 rows=3 loops=1)"
"                    ->  Bitmap Heap Scan on answer  (cost=3.48..344.04
rows=246 width=8) (actual time=116.719..1390.931 rows=54235 loops=3)"
"                          Recheck Cond: ((answer.question_id = 16) AND
(answer.value = "outer".id))"
"                          ->  Bitmap Index Scan on idx02_performance
(cost=0.00..3.48 rows=246 width=0) (actual time=63.195..63.195
rows=54235 loops=3)"
"                                Index Cond: ((answer.question_id = 16)
AND (answer.value = "outer".id))"
"              ->  Hash  (cost=26639.37..26639.37 rows=1174 width=8)
(actual time=3906.831..3906.831 rows=9806 loops=1)"
"                    ->  Hash Join  (cost=4829.24..26639.37 rows=1174
width=8) (actual time=464.011..3877.539 rows=9806 loops=1)"
"                          Hash Cond: ("outer".session_id =
"inner".session_id)"
"                          ->  Bitmap Heap Scan on answer
(cost=182.76..21413.93 rows=20626 width=4) (actual
time=273.839..2860.984 rows=165655 loops=1)"
"                                Recheck Cond: ((question_id = 6) AND
(value = 1))"
"                                ->  Bitmap Index Scan on
idx02_performance  (cost=0.00..182.76 rows=20626 width=0) (actual
time=171.933..171.933 rows=165659 loops=1)"
"                                      Index Cond: ((question_id = 6)
AND (value = 1))"
"                          ->  Hash  (cost=4621.13..4621.13 rows=10141
width=4) (actual time=123.351..123.351 rows=11134 loops=1)"
"                                ->  Index Scan using idx01_perf_0005 on
participant  (cost=0.01..4621.13 rows=10141 width=4) (actual
time=0.545..93.200 rows=11134 loops=1)"
"                                      Index Cond:
(date_trunc('month'::text, created) = date_trunc('month'::text, (now() -
'1 mon'::interval)))"
"                                      Filter: (status = 1)"
"Total runtime: 9612.249 ms"

Regarding the total runtime, this is roughly in the same dimension as
for the original query in 8.0, as far as I remember. I haven't written
these queries myself in the first place, but they were done in the dark
ages when IN was a no-no and haven't been touched ever since - there
hadn't been any need to do so.

My current problem is that rewriting hundreds of queries, some of them
quite a bit more complex than this one, but all of them using the same
general scheme, would take quite a lot of time - and I'm expected to
hand over the survey results ASAP. So I will obviously have to do a
rewrite if there's just no other way, but I wondered if there might be
some other option that would allow me to point the planner in the right
direction so it would behave the same as in the previous versions,
namely 8.0?

Any suggestions?

Kind regards

   Markus

pgsql-performance by date:

Previous
From: Jaime Casanova
Date:
Subject: Re: 15,000 tables
Next
From: Tom Lane
Date:
Subject: Re: slow insert into very large table