Thread: Why is query selecting sequential?

Why is query selecting sequential?

From
Karl Denninger
Date:
I have two tables which have related selection data; they get updated
separately.  One contains messages, the second an "index key" for each
user's viewing history.

When I attempt to use a select that merges the two to produce a "true or
false" output in one of the reply rows, I get a sequential scan of the
second table - which is NOT what I want!

Here are the table definitions and query explain results...

akcs=> \d post
                                         Table "public.post"
  Column   |            Type             |                         Modifiers
-----------+-----------------------------+-----------------------------------------------------------
 forum     | text                        |
 number    | integer                     |
 toppost   | integer                     |
 views     | integer                     | default 0
 login     | text                        |
 subject   | text                        |
 message   | text                        |
 inserted  | timestamp without time zone |
 modified  | timestamp without time zone |
 who       | text                        |
 reason    | text                        |
 ordinal   | integer                     | not null default nextval('public.post_ordinal_seq'::text)
 replies   | integer                     | default 0
 ip        | text                        |
 invisible | integer                     |
 sticky    | integer                     |
 lock      | integer                     |
 pinned    | integer                     | default 0
 replied   | timestamp without time zone |
Indexes:
    "post_forum" btree (forum)
    "post_lookup" btree (forum, number)
    "post_order" btree (number, inserted)
    "post_toppost" btree (forum, toppost, inserted)


akcs=> \d forumlog;
              Table "public.forumlog"
  Column  |            Type             | Modifiers
----------+-----------------------------+-----------
 login    | text                        |
 forum    | text                        |
 lastview | timestamp without time zone |
 number   | integer                     |
Indexes:
    "forumlog_composite" btree (login, forum, number)
    "forumlog_login" btree (login)
    "forumlog_number" btree (number)

akcs=> explain select forum, (replied > (select lastview from forumlog where forumlog.login='%s' and
forumlog.forum='%s'and number=post.number)) as newflag, * from post where forum = '%s' and toppost = 1 order by pinned
desc,replied desc; 
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Sort  (cost=3.20..3.21 rows=1 width=218)
   Sort Key: pinned, replied
   ->  Index Scan using post_forum on post  (cost=0.00..3.19 rows=1 width=218)
         Index Cond: (forum = '%s'::text)
         Filter: (toppost = 1)
         SubPlan
           ->  Seq Scan on forumlog  (cost=0.00..1.18 rows=1 width=8)
                 Filter: ((login = '%s'::text) AND (forum = '%s'::text) AND (number = $0))
(8 rows)

Why is the subplan using a sequential scan?  At minimum the index on the
post number ("forumlog_number") should be used, no?  What would be even
better would be a set of indices that allow at least two (or even all three)
of the keys in the inside SELECT to be used.

What am I missing here?

--
--
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net    Tired of spam at your company?  LOOK HERE!
http://childrens-justice.org    Working for family and children's rights
http://diversunion.org        LOG IN AND GET YOUR TANK STICKERS TODAY!

Re: Why is query selecting sequential?

From
Josh Berkus
Date:
Karl,

>          SubPlan
>            ->  Seq Scan on forumlog  (cost=0.00..1.18 rows=1 width=8)
>                  Filter: ((login = '%s'::text) AND (forum = '%s'::text) AND
(number = $0))

> Why is the subplan using a sequential scan?  At minimum the index on the
> post number ("forumlog_number") should be used, no?  What would be even
> better would be a set of indices that allow at least two (or even all three)
> of the keys in the inside SELECT to be used.

It's using a seq scan because you have only 1 row in the table.     Don't
bother testing performance before your database is populated.

PostgreSQL doesn't just use an index because it's there; it uses and index
because it's faster than not using one.

If there is more than one row in the table, then:
1) run ANALYZE forumlog;
2) Send us the EXPLAIN ANALYZE, not just the explain for the query.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Why is query selecting sequential?

From
Karl Denninger
Date:
On Fri, Feb 06, 2004 at 01:51:39PM -0800, Josh Berkus wrote:
> Karl,
>
> >          SubPlan
> >            ->  Seq Scan on forumlog  (cost=0.00..1.18 rows=1 width=8)
> >                  Filter: ((login = '%s'::text) AND (forum = '%s'::text) AND
> (number = $0))
>
> > Why is the subplan using a sequential scan?  At minimum the index on the
> > post number ("forumlog_number") should be used, no?  What would be even
> > better would be a set of indices that allow at least two (or even all three)
> > of the keys in the inside SELECT to be used.
>
> It's using a seq scan because you have only 1 row in the table.     Don't
> bother testing performance before your database is populated.
>
> PostgreSQL doesn't just use an index because it's there; it uses and index
> because it's faster than not using one.
>
> If there is more than one row in the table, then:
> 1) run ANALYZE forumlog;
> 2) Send us the EXPLAIN ANALYZE, not just the explain for the query.

Hmmm... there is more than one row in the table. :-)  There aren't a huge
number, but there are a few.  I know about the optimizer not using indices
if there are no (or only one) row in the table - not making that
mistake here.

Ran analyze forumlog;

Same results.

Here's an explain analyze with actual values (that DO match real values in
the table) filled in.

akcs=> explain analyze select forum, (replied > (select lastview from forumlog where forumlog.login='genesis' and
forumlog.forum='General'and number=post.number)) as newflag, * from post where forum = 'General' and toppost = 1 order
bypinned desc, replied desc;                
                                                         QUERY PLAN
    

----------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=28.41..28.42 rows=6 width=218) (actual time=0.677..0.698 rows=5 loops=1)
   Sort Key: pinned, replied
   ->  Index Scan using post_toppost on post  (cost=0.00..28.33 rows=6 width=218) (actual time=0.403..0.606 rows=5
loops=1)
         Index Cond: ((forum = 'General'::text) AND (toppost = 1))
         SubPlan
           ->  Seq Scan on forumlog  (cost=0.00..1.18 rows=1 width=8) (actual time=0.015..0.027 rows=1 loops=5)
                 Filter: ((login = 'genesis'::text) AND (forum = 'General'::text) AND (number = $0))
 Total runtime: 0.915 ms
(8 rows)

--
--
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net    Tired of spam at your company?  LOOK HERE!
http://childrens-justice.org    Working for family and children's rights
http://diversunion.org        LOG IN AND GET YOUR TANK STICKERS TODAY!

Re: Why is query selecting sequential?

From
Josh Berkus
Date:
Karl,

Well, still with only 5 rows in the forumlog table you're not going get
realistic results compared to a loaded database.  However, you are making
things difficult for the parser with awkward query syntax; what you currently
have encourages a sequential loop.

If there are potentially several rows in forumlog for each row in post, then
your query won't work either.

> akcs=> explain analyze select forum, (replied > (select lastview from
forumlog where forumlog.login='genesis' and forumlog.forum='General' and
number=post.number)) as newflag, * from post where forum = 'General' and
toppost = 1 order by pinned desc, replied desc;

Instead:

if only one row in forumlog per row in post:

SELECT (replied > lastview) AS newflag, post.*
FROM post, forumlog
WHERE post.forum = 'General' and toppost = 1 and forumlog.login = 'genesis'
and forumlog.forum='General' and forumlog.number=post.number;

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Why is query selecting sequential?

From
Karl Denninger
Date:
On Fri, Feb 06, 2004 at 02:36:57PM -0800, Josh Berkus wrote:
> Karl,
>
> Well, still with only 5 rows in the forumlog table you're not going get
> realistic results compared to a loaded database.  However, you are making
> things difficult for the parser with awkward query syntax; what you currently
> have encourages a sequential loop.
>
> If there are potentially several rows in forumlog for each row in post, then
> your query won't work either.

It better not.  Indeed, I WANT it to blow up if there is, as that's a
serious error, and am counting on that to happen (and yes, I know it will -
and it should!)

> > akcs=> explain analyze select forum, (replied > (select lastview from
> forumlog where forumlog.login='genesis' and forumlog.forum='General' and
> number=post.number)) as newflag, * from post where forum = 'General' and
> toppost = 1 order by pinned desc, replied desc;
>
> Instead:
>
> if only one row in forumlog per row in post:
>
> SELECT (replied > lastview) AS newflag, post.*
> FROM post, forumlog
> WHERE post.forum = 'General' and toppost = 1 and forumlog.login = 'genesis'
> and forumlog.forum='General' and forumlog.number=post.number;

It still thinks its going to sequentially scan it...

I'll see what happens when I get some more rows in the table and if it
decides to start using the indices then....

akcs=> explain analyze select (replied > lastview) as newflag, post.* from post, forumlog where post.forum ='General'
andtoppost = 1 and forumlog.login='genesis' and forumlog.forum='General' order by post.pinned desc, post.replied desc; 
                                                            QUERY PLAN
          

----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=23.83..23.90 rows=30 width=226) (actual time=1.228..1.331 rows=25 loops=1)
   Sort Key: post.pinned, post.replied
   ->  Nested Loop  (cost=1.15..23.09 rows=30 width=226) (actual time=0.157..0.797 rows=25 loops=1)
         ->  Index Scan using post_toppost on post  (cost=0.00..21.27 rows=6 width=218) (actual time=0.059..0.089
rows=5loops=1) 
               Index Cond: ((forum = 'General'::text) AND (toppost = 1))
         ->  Materialize  (cost=1.15..1.20 rows=5 width=8) (actual time=0.013..0.046 rows=5 loops=5)
               ->  Seq Scan on forumlog  (cost=0.00..1.15 rows=5 width=8) (actual time=0.027..0.065 rows=5 loops=1)
                     Filter: ((login = 'genesis'::text) AND (forum = 'General'::text))
 Total runtime: 1.754 ms
(9 rows)

--
--
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net    Tired of spam at your company?  LOOK HERE!
http://childrens-justice.org    Working for family and children's rights
http://diversunion.org        LOG IN AND GET YOUR TANK STICKERS TODAY!

Re: Why is query selecting sequential?

From
Tom Lane
Date:
Karl Denninger <karl@denninger.net> writes:
> akcs=> explain analyze select forum, (replied > (select lastview from forumlog where forumlog.login='genesis' and
forumlog.forum='General'and number=post.number)) as newflag, * from post where forum = 'General' and toppost = 1 order
bypinned desc, replied desc;                
>                                                          QUERY PLAN
      
>
----------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=28.41..28.42 rows=6 width=218) (actual time=0.677..0.698 rows=5 loops=1)
>    Sort Key: pinned, replied
>    ->  Index Scan using post_toppost on post  (cost=0.00..28.33 rows=6 width=218) (actual time=0.403..0.606 rows=5
loops=1)
>          Index Cond: ((forum = 'General'::text) AND (toppost = 1))
>          SubPlan
>            ->  Seq Scan on forumlog  (cost=0.00..1.18 rows=1 width=8) (actual time=0.015..0.027 rows=1 loops=5)
>                  Filter: ((login = 'genesis'::text) AND (forum = 'General'::text) AND (number = $0))
>  Total runtime: 0.915 ms
> (8 rows)

As noted elsewhere, the inner subplan will not switch over to an
indexscan until you get some more data in that table.  Note however that
the subplan is only accounting for about 0.13 msec (0.027*5) so it's not
the major cost here anyway.  The slow part seems to be the indexed fetch
from "post", which is taking nearly 0.5 msec to fetch five rows.

            regards, tom lane

Re: Why is query selecting sequential?

From
Karl Denninger
Date:
On Sat, Feb 07, 2004 at 01:51:54AM -0500, Tom Lane wrote:
> Karl Denninger <karl@denninger.net> writes:
> > akcs=> explain analyze select forum, (replied > (select lastview from forumlog where forumlog.login='genesis' and
forumlog.forum='General'and number=post.number)) as newflag, * from post where forum = 'General' and toppost = 1 order
bypinned desc, replied desc;                
> >                                                          QUERY PLAN
        
> >
----------------------------------------------------------------------------------------------------------------------------
> >  Sort  (cost=28.41..28.42 rows=6 width=218) (actual time=0.677..0.698 rows=5 loops=1)
> >    Sort Key: pinned, replied
> >    ->  Index Scan using post_toppost on post  (cost=0.00..28.33 rows=6 width=218) (actual time=0.403..0.606 rows=5
loops=1)
> >          Index Cond: ((forum = 'General'::text) AND (toppost = 1))
> >          SubPlan
> >            ->  Seq Scan on forumlog  (cost=0.00..1.18 rows=1 width=8) (actual time=0.015..0.027 rows=1 loops=5)
> >                  Filter: ((login = 'genesis'::text) AND (forum = 'General'::text) AND (number = $0))
> >  Total runtime: 0.915 ms
> > (8 rows)
>
> As noted elsewhere, the inner subplan will not switch over to an
> indexscan until you get some more data in that table.  Note however that
> the subplan is only accounting for about 0.13 msec (0.027*5) so it's not
> the major cost here anyway.  The slow part seems to be the indexed fetch
> from "post", which is taking nearly 0.5 msec to fetch five rows.
>
>             regards, tom lane

Ok...

BTW, the other posted "cleaner" model doesn't work for me.  If there is NO
row in the subtable that matches, the other version returns nothing (which
makes sense since the initial select fails to match any rows as one of the
things its trying to match is missing.)

I do need a return even if the log row is missing (and it WILL be, for a
first visit to that particular item in the table by a particular user)

--
--
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net    Tired of spam at your company?  LOOK HERE!
http://childrens-justice.org    Working for family and children's rights
http://diversunion.org        LOG IN AND GET YOUR TANK STICKERS TODAY!

Re: Why is query selecting sequential?

From
Josh Berkus
Date:
Karl,

> BTW, the other posted "cleaner" model doesn't work for me.  If there is NO
> row in the subtable that matches, the other version returns nothing (which
> makes sense since the initial select fails to match any rows as one of the
> things its trying to match is missing.)

Ah, wasn't thinking of that case.    Problem with not really knowing what the
database is about.

--
Josh Berkus
Aglio Database Solutions
San Francisco