Re: Seq scans on indexed columns. - Mailing list pgsql-performance

From Shane Ambler
Subject Re: Seq scans on indexed columns.
Date
Msg-id 478BB2C2.7040901@Sheeky.Biz
Whole thread Raw
In response to Seq scans on indexed columns.  (Yannick Le Guédart <yannick@over-blog.com>)
List pgsql-performance
Yannick Le Guédart wrote:
>     Greetings,
>
>     I was trying to get informations on #portgresql about a query plan I
> think is quite strange, and I was said to post on this list. I hope my
> mail will be clear enough. I have included the query, the query plan,
> and the table definitions. I just don't understand the "Seq Scan" on
> fileds that are indexed.
>
>     Thanks in advance
>
>     Yannick

First off what sort of response times are you getting with the query? or
are you just after an understanding of why it plans that way?

pg version?

What sort of row counts do you have in the tables?
Some of the steps show it expecting to get over 2M rows being returned.

How many rows are returned without the limit 5? Is it close to what you
would expect?



 From what I see the first seq scan (role_id=5) can't use an index as
role_id is only the third element of an index, it would need to be
matching all three on the condition to use it. But I don't expect that
to make much difference (time wise) if any.

The first nested loop has the biggest cost estimate difference and I
would expect that to be closer to what you are looking for.


My guess is that the joins are not producing the result you expect -
start without the joins (and the joined columns) and add one at a time
to see which is causing the problem. Then find a better way to join the
data.


http://www.postgresql.org/docs/8.2/interactive/explicit-joins.html
may give you some ideas...


>     Here we go :
>
>
> --------------------------------
>
> QUERY
>
> SELECT
>     _article.*,
>     (
>         SELECT COUNT (id)
>         FROM _comment
>         WHERE parent_id = _article.id
>     ) AS nb_comments,
>     _blog.id as blog_id,
>     _blog.name as blog_name,
>     xpath_string(_blog.data,'/data/title') as blog_title,
>     _blog.reference as blog_ref,
>     _blog.main_host as blog_main_host,
>     _user.id as user_id,
>     _user.reference as user_ref,
>     _user.nickname as user_nickname
>
> FROM _article
>
> INNER JOIN _blog
>     ON _article.path <@ _blog.path
> INNER JOIN _entity_has_element
>     ON _entity_has_element.element_id = _blog.id
> INNER JOIN _user
>     ON _user.id = _entity_has_element.entity_id
>     AND _entity_has_element.role_id = 5
>
> WHERE _article.id IN
>     (
>         SELECT _relation.destination_id AS id
>         FROM _relation
>         WHERE _relation.parent_id = 1008109112
>     )
> AND _article.date_publishing < now ()
>
> ORDER BY nb_comments DESC
> OFFSET 0
> LIMIT 5
>
> --------------------------------
>                                QUERY PLAN
> Limit  (cost=378253213.46..378253213.47 rows=5 width=1185)
> ->  Sort  (cost=378253213.46..378260027.29 rows=2725530 width=1185)
> Sort Key: (subplan)
> ->  Hash Join  (cost=4907270.58..375534454.12 rows=2725530 width=1185)
> Hash Cond: (_entity_has_element.element_id = _blog.id)
> ->  Hash Join  (cost=220747.87..260246.60 rows=543801 width=32)
> Hash Cond: (_entity_has_element.entity_id = _user.id)
> ->  Seq Scan on _entity_has_element  (cost=0.00..19696.96 rows=543801 width=16)
> Filter: (role_id = 5)
> ->  Hash  (cost=205537.72..205537.72 rows=806972 width=24)
> ->  Seq Scan on _user  (cost=0.00..205537.72 rows=806972 width=24)
> ->  Hash  (cost=4309146.55..4309146.55 rows=2388333 width=1161)
> ->  Nested Loop  (cost=8782.56..4309146.55 rows=2388333 width=1161)
> ->  Nested Loop  (cost=8689.45..43352.96 rows=5012 width=1073)
> ->  HashAggregate  (cost=8689.45..8740.05 rows=5060 width=8)
> ->  Bitmap Heap Scan on _relation  (cost=124.98..8674.40 rows=6021 width=8)
> Recheck Cond: (parent_id = 1008109112)
> ->  Bitmap Index Scan on idx_relation_parent_id  (cost=0.00..123.47 rows=6021 width=0)
> Index Cond: (parent_id = 1008109112)
> ->  Index Scan using _article_pkey on _article  (cost=0.00..6.83 rows=1 width=1073)
> Index Cond: (_article.id = _relation.destination_id)
> Filter: (date_publishing < now())
> ->  Bitmap Heap Scan on _blog  (cost=93.11..845.15 rows=477 width=114)
> Recheck Cond: (_article.path <@ _blog.path)
> ->  Bitmap Index Scan on gist_idx_blog_path  (cost=0.00..92.99 rows=477 width=0)
> Index Cond: (_article.path <@ _blog.path)
> SubPlan
> ->  Aggregate  (cost=135.81..135.82 rows=1 width=8)
> ->  Index Scan using idx_comment_parent_id on _comment  (cost=0.00..135.61 rows=79 width=8)
> Index Cond: (parent_id = $0)
>
> --------------------
>


--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

pgsql-performance by date:

Previous
From: Ansgar -59cobalt- Wiechers
Date:
Subject: Re: Saving result set of SELECT to table column
Next
From: "Robins Tharakan"
Date:
Subject: Re: Saving result set of SELECT to table column