Re: Optimising SELECT on a table with one million rows - Mailing list pgsql-general

From Jon Sime
Subject Re: Optimising SELECT on a table with one million rows
Date
Msg-id 46AE1FCA.7030402@mediamatters.org
In response to Optimising SELECT on a table with one million rows  (Cultural Sublimation <cultural_sublimation@yahoo.com>)
Responses Re: Optimising SELECT on a table with one million rows
List pgsql-general
Cultural Sublimation wrote:
> SELECT comments.comment_id, users.user_name
> FROM comments, users
> WHERE comments.comment_story = 100 AND comments.comment_author = users.user_id;
>
> The problem is that this query takes a *very* long time.  With the said
> 1,000,000 comments, it needs at least 1100ms on my system.  "Explain
> analyze" tells me that a sequential scan is being performed on both
> users and comments:
>
> Hash Join  (cost=28.50..21889.09 rows=988 width=14) (actual
> time=3.674..1144.779 rows=1000 loops=1)
>    Hash Cond: ((comments.comment_author)::integer = (users.user_id)::integer)
>    ->  Seq Scan on comments  (cost=0.00..21847.00 rows=988 width=8) (actual
> time=0.185..1136.067 rows=1000 loops=1)
>          Filter: ((comment_story)::integer = 100)
>    ->  Hash  (cost=16.00..16.00 rows=1000 width=14) (actual time=3.425..3.425
> rows=1000 loops=1)
>          ->  Seq Scan on users  (cost=0.00..16.00 rows=1000 width=14) (actual
> time=0.068..1.845 rows=1000 loops=1)
>  Total runtime: 1146.424 ms

If there is no index on comments.comment_story, any query that uses that
column in a predicate has to seqscan the comments table — which is exactly
what your EXPLAIN output shows. Likewise, without an index on
comments.comment_author, the planner cannot use an index for the join
between comments and users.

Note that an FK constraint does not automatically create an index on the
referencing column; PostgreSQL only requires a unique index on the
referenced side. You need to create the index on the referencing column
yourself if your queries need it.
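Using the table and column names from your query, the two missing indexes
could be created like this (the index names here are just placeholders,
pick whatever fits your naming convention):

```sql
-- Speeds up the WHERE comment_story = 100 predicate
CREATE INDEX idx_comments_comment_story ON comments (comment_story);

-- Lets the planner use an index for the join to users;
-- not created automatically by the FK constraint
CREATE INDEX idx_comments_comment_author ON comments (comment_author);
```

After creating them (and running ANALYZE), re-run EXPLAIN ANALYZE and you
should see the seqscan on comments replaced by an index scan.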

> On the long run, I guess one possible solution to this problem will be
> to partition the comments table into a number of sub-tables, most likely
> based on the timestamp attribute (by having current versus historic data).

Partitioning on comments.comment_timestamp won't help you at all for
this particular query, since you don't have a condition in your query
dependent upon that value. It might help you for other queries (such as
gathering up all the comments posted on a particular day, or during some
other time range), but it won't make any positive difference for this query.
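For what it's worth, if time-range queries ever do justify partitioning,
a rough sketch of the idea looks like this. This uses the declarative
PARTITION BY syntax of newer PostgreSQL releases, and the column list and
date boundaries are assumptions for illustration only:

```sql
-- Sketch: range-partition comments on its timestamp column.
-- Only queries that filter on comment_timestamp benefit, via
-- partition pruning; the story/author query above is unaffected.
CREATE TABLE comments (
    comment_id        integer     NOT NULL,
    comment_story     integer     NOT NULL,
    comment_author    integer     NOT NULL,
    comment_timestamp timestamptz NOT NULL
) PARTITION BY RANGE (comment_timestamp);

-- "Historic" data: everything before the current year
CREATE TABLE comments_historic PARTITION OF comments
    FOR VALUES FROM (MINVALUE) TO ('2007-01-01');

-- "Current" data
CREATE TABLE comments_current PARTITION OF comments
    FOR VALUES FROM ('2007-01-01') TO (MAXVALUE);
```

Each partition would still need its own indexes on comment_story and
comment_author for queries like the one in this thread.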

-Jon

--
Senior Systems Developer
Media Matters for America
http://mediamatters.org/
