Re: Why is explain horribly optimistic for sorts? - Mailing list pgsql-general

From Ben
Subject Re: Why is explain horribly optimistic for sorts?
Date
Msg-id Pine.LNX.4.10.10103031243430.19743-100000@gilgamesh.eos.SilentMedia.com
Whole thread Raw
In response to Re: Why is explain horribly optimistic for sorts?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Sat, 3 Mar 2001, Tom Lane wrote:

> Yeah, I was afraid of that: the new code is smarter but also subject to
> the same range-estimation problem that plagues us in cases like "WHERE a
> BETWEEN b AND c".  The only stats we have for range estimation are the
> column minimum and maximum values collected by VACUUM ANALYZE, so we
> just do a linear interpolation.  If your data isn't fairly uniformly
> distributed between the min and max then range estimates will be all
> wet.  In this example, it seems that 80% of your target entries start
> with "/music/", but I'll bet the min and max cover a much wider range.
> So the estimator mistakenly thinks that LIKE '/music/%' will select
> a fairly small proportion of rows, rather than 80%, and this leads the
> planner to choose a plan that would be appropriate if that were true.

Ah, I think I understand the problem here.
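[The range-estimation problem Tom describes can be sketched in a few lines. This is an illustrative model only, not PostgreSQL's actual code: with just the column min and max available, the planner's only option is to assume a uniform distribution and linearly interpolate.]

```python
def range_selectivity(lo, hi, col_min, col_max):
    """Linear-interpolation estimate of the fraction of rows matching
    'WHERE a BETWEEN lo AND hi', given only the column min/max stats.
    Illustrative sketch, not PostgreSQL's implementation."""
    if col_max == col_min:
        return 1.0
    # Clamp the query range to the observed [min, max] and interpolate.
    frac = (min(hi, col_max) - max(lo, col_min)) / (col_max - col_min)
    return max(0.0, min(1.0, frac))

# Uniform data: the estimate is reasonable.
uniform_est = range_selectivity(20, 30, 0, 100)     # 0.10

# Skewed data (hypothetical numbers): suppose 80% of values fall in [0, 1]
# but the column min/max span [0, 1000].  The estimator has no way to know
# about the clustering, so it predicts 0.001 where reality is ~0.8 --
# exactly the kind of miss that made the planner pick an index scan here.
skewed_est = range_selectivity(0, 1, 0, 1000)       # 0.001
```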

> > Out of curiosity, why does it take so long to order data by a datetime
> > field?
>
> AFAIK it shouldn't be materially different from sorting on a float8
> field.  You're still blaming the wrong thing: this isn't the sorter's
> fault, it's doing the best it can.  The reason this is so slow, I
> believe, is that an indexscan is being used to select 80% of the data.
> That results in a lot of useless thrashing to visit tuples in strict
> order by target. A simple sequential scan would've been much faster.

I think you're correct, because this did speed things up quite a bit: the
query time dropped to about a minute.

> It's possible that an indexscan on logtime would work better, because it
> could stop as soon as it had retrieved 1000 matching tuples.  This would
> be particularly likely if the tuples are more or less in logtime order,
> so that there's not a lot of disk thrashing to retrieve them.  Otherwise
> a plain old sequential scan and sort is likely to be the fastest way.

And *this* sped things up even more: not using the target key and using the
LogTime key instead dropped the time to about 30 seconds.
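[A back-of-envelope sketch of why the logtime index wins here. The row count is an assumed placeholder; the 80% match fraction and the 1000-row limit come from the thread. The point is that a scan-and-sort plan must read every row before emitting the first one, while an ordered index scan with a highly selective-enough filter can stop early.]

```python
# Assumed table size for illustration; match_frac and limit are from the thread.
N = 1_000_000        # hypothetical total rows in the table
match_frac = 0.8     # LIKE '/music/%' matches ~80% of rows
limit = 1000         # query wants the first 1000 matches by logtime

# Plan A: sequential scan + sort on logtime.  The sort cannot emit its
# first output row until every input row has been read.
rows_read_sort_plan = N

# Plan B: walk the logtime index in order, apply the LIKE filter to each
# heap tuple, and stop as soon as `limit` rows pass.  With 80% of rows
# matching, on average only limit / match_frac tuples are visited.
rows_read_index_plan = int(limit / match_frac)   # 1250 rows vs 1,000,000
```

The same arithmetic explains why this plan would be a disaster if the LIKE matched only a tiny fraction of rows: the index scan would crawl nearly the whole table in logtime order before finding 1000 matches.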

> Unfortunately the planner is never going to figure this out as long as
> it's so far off about the selectivity of the LIKE.  You could try
> forcing its hand with various combinations of
>     SET enable_sort TO off;
>     SET enable_indexscan TO off;
> ... just remember to turn these things back on for subsequent queries,
> else you'll get really bad plans ...

Having no notion of what's involved, it seems like it would be nice for the
query optimizer to select which index to use based on the query and the
indices available. Or, if it already does this, to figure out exactly why
it's not doing the right thing here.

Thanks for the help!
