Re: order of nested loop - Mailing list pgsql-general

From Jean-Luc Lachance
Subject Re: order of nested loop
Date
Msg-id 3EEF314A.D1448C8B@nsd.ca
Whole thread Raw
In response to order of nested loop  (Joseph Shraibman <jks@selectacast.net>)
Responses Re: order of nested loop  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Tom,


I am currious.  Why perform a sort first?  Would it not be more
efficient to create an in memory index on the fly?  I seems to me that
it would consume less resources, specially if there are many duplicates.
Also caching the last key lookup would save time if the records are
"bunched".  What am I mising here?

JL


Tom Lane wrote:
>
> Joseph Shraibman <joseph@xtenit.com> writes:
> > Tom Lane wrote:
> >> Joseph Shraibman <jks@selectacast.net> writes:
> >>> I have two queries that return identical results.  One is a SELECT
> >>> DISTINCT and the other is the same query without the DISTINCT.
> >>
> >> Could we see the actual queries?  And the complete EXPLAIN ANALYZE
> >> output?
>
> > (private email, bec I don't want to sanatize this huge thing for the list)
>
> Okay, but people send bigger things than this to the list all the time...
>
> > [fast case]
> > (SELECT distinct
> > u.userkey,u.adminper,u.addper,u.approvper,u.ownerper,u.subper,u.banned,u.status,u.config,u.rules, ...
> > ORDER BY lower(d.username) LIMIT 25)
>
> >                             ->  Limit  (cost=36903.81..36916.31 rows=25 width=604) (actual
> > time=10.92..10.93 rows=1 loops=1)
> >                                   ->  Unique  (cost=36903.81..37128.43 rows=449 width=604)
> > (actual time=10.92..10.92 rows=1 loops=1)
> >                                         ->  Sort  (cost=36903.81..36915.04 rows=4492
> > width=604) (actual time=10.91..10.91 rows=1 loops=1)
> >                                               Sort Key: lower(d.username), u.userkey,
> > u.adminper, u.addper, u.approvper, u.ownerper, u.subper, u.banned, u.status, u.co
> > nfig, u.rules, (subplan), CASE WHEN ((u.rules IS NULL) OR (u.rules = ''::text)) THEN
> > ''::text ELSE 'r'::text END, d.username, d.realname, d.firstname, (subplan), d.st
> > atus, 2
> >                                               ->  Nested Loop  (cost=0.00..36631.27
> > rows=4492 width=604) (actual time=10.65..10.67 rows=1 loops=1)
> >                                                     ->  Index Scan using
> > usertable_podkey_key on usertable u  (cost=0.00..16003.53 rows=5446 width=555) (actual time=0.
> > 04..0.05 rows=1 loops=1)
> >                                                           Index Cond: (podkey = 20)
> >                                                           Filter: ((status = 2) AND (NOT
> > banned))
> >                                                     ->  Index Scan using directory_pkey on
> > directory d  (cost=0.00..3.78 rows=1 width=49) (actual time=0.02..0.03 rows=
> > 1 loops=1)
> >                                                           Index Cond: (d.userkey =
> > "outer".userkey)
> >                                                           Filter: ((status = 2) OR (status
> > = 5))
>
> > [slow case]
> > (SELECT
> > u.userkey,u.adminper,u.addper,u.approvper,u.ownerper,u.subper,u.banned,u.status,u.config,u.rules, ...
> > ORDER BY lower(d.username) LIMIT 25)
>
> >                             ->  Limit  (cost=0.00..30602.38 rows=25 width=604) (actual
> > time=74810.10..102624.62 rows=1 loops=1)
> >                                   ->  Nested Loop  (cost=0.00..5499145.64 rows=4492
> > width=604) (actual time=74810.10..102624.61 rows=1 loops=1)
> >                                         ->  Index Scan using directory_lower_username_key
> > on directory d  (cost=0.00..2380577.42 rows=525568 width=49) (actual time=15.61..79588.09
> > rows=589718 loops=1)
> >                                               Filter: ((status = 2) OR (status = 5))
> >                                         ->  Index Scan using usertable_pkey on usertable u
> >   (cost=0.00..5.92 rows=1 width=555) (actual time=0.04..0.04 rows=0 loops=589718)
> >                                               Index Cond: (("outer".userkey = u.userkey)
> > AND (u.podkey = 20))
> >                                               Filter: ((status = 2) AND (NOT banned))
>
> As near as I can tell, the failure of estimation is not in the slow case
> --- the planner correctly estimates that this plan is expensive.
> Rather, it's in the fast case, because the planner mistakenly thinks
> that that is also expensive.  The critical error is right here:
>
> >                                                     ->  Index Scan using
> > usertable_podkey_key on usertable u  (cost=0.00..16003.53 rows=5446 width=555) (actual time=0.
> > 04..0.05 rows=1 loops=1)
> >                                                           Index Cond: (podkey = 20)
> >                                                           Filter: ((status = 2) AND (NOT
> > banned))
>
> That scan is estimated to yield 5446 rows and it only yields 1.  Do you
> have any idea why the estimate is so far off?  (My guess is that podkey,
> status and banned are correlated to a large extent, but you tell us.)
>
>                         regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html

pgsql-general by date:

Previous
From: Martijn van Oosterhout
Date:
Subject: Re: Sort memory not being released
Next
From: Tom Lane
Date:
Subject: Re: full featured alter table?