select count(idn) is slow (Seq Scan) instead of Bitmap Heap.. why? - Mailing list pgsql-performance

From Adam PAPAI
Subject select count(idn) is slow (Seq Scan) instead of Bitmap Heap.. why?
Date
Msg-id 4A75BEB5.2070002@wooh.hu
Whole thread Raw
Responses Re: select count(idn) is slow (Seq Scan) instead of Bitmap Heap.. why?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
Hello,


I have a problem with an inner join + count().

my query is:

explain analyze select
k.idn,k.kerdes_subject,k.kerdes_text,u.vezeteknev,u.keresztnev,u.idn as
user_id, kg.kategoria_neve, count(v.idn)

FROM kategoriak as kg

INNER JOIN kerdesek as k on kg.idn = k.kategoria_id
INNER JOIN users as u ON k.user_id = u.idn
INNER JOIN valaszok as v ON k.idn = v.kerdes_id

where kg.idn=15 group by k.idn, k.kerdes_subject,k.kerdes_text,
u.idn,u.vezeteknev,u.keresztnev,kg.kategoria_neve

The problem is with the count(v.idn).

This column has a relation with: v.kerdes_id = k.idn => k.kategoria_id =
  kg.idn

and the WHERE says: kg.idn = 15.

Why does it run through all lines in v?

the explain sais:

  GroupAggregate  (cost=103238.59..103602.66 rows=10402 width=1382)
(actual time=8531.405..8536.633 rows=73 loops=1)
    ->  Sort  (cost=103238.59..103264.59 rows=10402 width=1382) (actual
time=8531.339..8533.199 rows=1203 loops=1)
          Sort Key: k.idn, k.kerdes_subject, k.kerdes_text, u.idn,
u.vezeteknev, u.keresztnev, kg.kategoria_neve
          ->  Hash Join  (cost=3827.79..89951.54 rows=10402 width=1382)
(actual time=1778.590..8523.015 rows=1203 loops=1)
                Hash Cond: (v.kerdes_id = k.idn)
                ->  Seq Scan on valaszok v  (cost=0.00..78215.98
rows=2080998 width=8) (actual time=59.714..5009.171 rows=2080998 loops=1)
                ->  Hash  (cost=3823.42..3823.42 rows=350 width=1378)
(actual time=12.553..12.553 rows=74 loops=1)
                      ->  Nested Loop  (cost=14.98..3823.42 rows=350
width=1378) (actual time=0.714..12.253 rows=74 loops=1)
                            ->  Nested Loop  (cost=14.98..1056.38
rows=350 width=830) (actual time=0.498..5.952 rows=117 loops=1)
                                  ->  Seq Scan on kategoriak kg
(cost=0.00..1.30 rows=1 width=278) (actual time=0.066..0.076 rows=1 loops=1)
                                        Filter: (idn = 15)
                                  ->  Bitmap Heap Scan on kerdesek k
(cost=14.98..1051.58 rows=350 width=560) (actual time=0.374..5.430
rows=117 loops=1)
                                        Recheck Cond: (15 = kategoria_id)
                                        ->  Bitmap Index Scan on
kategoria_id_id_idx  (cost=0.00..14.89 rows=350 width=0) (actual
time=0.212..0.212 rows=117 loops=1)
                                              Index Cond: (15 =
kategoria_id)
                            ->  Index Scan using users_pkey on users u
(cost=0.00..7.89 rows=1 width=552) (actual time=0.047..0.048 rows=1
loops=117)
                                  Index Cond: (k.user_id = u.idn)
  Total runtime: 8536.936 ms



So it run through more than 2 mill lines... but why? It should only
count those lines which has the category_id = 15...

What am I doing wrong?



--
Adam PAPAI

pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: Performance 8.4.0
Next
From: Suvankar Roy
Date:
Subject: Greenplum MapReduce