Re: Left Join Not Using Index? - Mailing list pgsql-general

From scott.marlowe
Subject Re: Left Join Not Using Index?
Date
Msg-id Pine.LNX.4.33.0304230904470.12251-100000@css120.ihs.com
Whole thread Raw
In response to Re: Left Join Not Using Index?  (Hunter Hillegas <lists@lastonepicked.com>)
List pgsql-general
On Tue, 22 Apr 2003, Hunter Hillegas wrote:

> Thanks for responding...
>
> With enable_seqscan = false:

SNIP

>  Total runtime: 40855.59 msec
>
> With enable_seqscan = true:
>
>  Unique  (cost=34847.38..35145.38 rows=1192 width=293) (actual
> time=13005.13..13008.51 rows=51 loops=1)
>    ->  Sort  (cost=34847.38..34877.18 rows=11920 width=293) (actual
> time=13005.12..13005.73 rows=292 loops=1)
>          Sort Key: message_board_topics.rec_num,
> message_board_topics.topic_name, message_board_topics.topic_body,
> message_board_topics.topic_author, message_board_topics.topic_author_email,
> message_board_topics.topic_updated,
> message_board_topics.administrator_topic,
> message_board_topics.number_of_comments,
> to_char((message_board_topics.topic_date)::timestamp with time zone,
> 'MM.DD.YYYY'::text)

Note the actual time on the merge join here (about 8000 msec):

>          ->  Merge Join  (cost=26858.21..33007.14 rows=11920 width=293)
> (actual time=4930.32..12949.93 rows=292 loops=1)
>                Merge Cond: ("outer".rec_num = "inner".topic_id)
>                Filter: ((upper(("outer".topic_name)::text) ~~
> 'MADBROWSER'::text) OR (upper("outer".topic_body) ~~ 'MADBROWSER'::text) OR
> (upper(("outer".topic_author)::text) ~~ 'MADBROWSER'::text) OR
> (upper(("inner".comment_author)::text) ~~ 'MADBROWSER'::text))
>                ->  Sort  (cost=2446.01..2475.81 rows=11920 width=265)
> (actual time=628.30..953.50 rows=11920 loops=1)
>                      Sort Key: message_board_topics.rec_num
>                      ->  Seq Scan on message_board_topics
> (cost=0.00..712.20 rows=11920 width=265) (actual time=0.10..223.96
> rows=11920 loops=1)

Note the sort is showing ~ 4000 to 5000 msec

>                ->  Sort  (cost=24412.20..24818.15 rows=162382 width=28)
> (actual time=4301.14..5788.66 rows=162382 loops=1)
>                      Sort Key: message_board_comments.topic_id
>                      ->  Seq Scan on message_board_comments
> (cost=0.00..7203.82 rows=162382 width=28) (actual time=0.10..1335.26
> rows=162382 loops=1)

Run time:

>  Total runtime: 13108.33 msec
> (13 rows)

So, this query, using the seq scan, is 3 times faster.  I.e. the planner
made the right move on the seq scan versus the index scan.

There may be a faster way than using a merge join, but more than likely,
your biggest gain will come from tuning postgresql and your OS to handle
more data at a time therefore faster.

What are your settings for sort_mem, shared_buffers, effective_cache_size?

They're all explained in the 3.4.2. Planner and Optimizer Tuning section
of the 7.3.2 docs quite well.

Look at changing any of these to off and see how it affects the planner as
well.

enable_seqscan
enable_indexscan
enable_tidscan
enable_sort
enable_nestloop
enable_mergejoin
enable_hashjoin

> Your suggestion didn't really make a whole lot of sense to me... Based on
> this info, what do you think?

Well, the problem with the suggestion was that while it did turn on index
scans, it actually resulted in slower performance, since a seq scan proved
3 times faster.  Barring other minor errors in the query planner, you can
either make the machine faster / tune postgresql, change your query to
something that runs faster by the nature of how it's written.  Maybe
having a more selective where clause or using one in a subselect that will
be run first, thus knocking down the amount of data your database has to
sling around, or go to full text indexing.  There were two or three
solutions in the contrib directory last I looked that both were way faster
than the typical "roll your own" solutions.


pgsql-general by date:

Previous
From: Dennis Gearon
Date:
Subject: Re: Regexps and Indices.
Next
From: "scott.marlowe"
Date:
Subject: Re: Solaris