Re: Yet another abort-early plan disaster on 9.3 - Mailing list pgsql-performance

From Josh Berkus
Subject Re: Yet another abort-early plan disaster on 9.3
Date
Msg-id 5429D517.3090900@agliodbs.com
In response to Yet another abort-early plan disaster on 9.3  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Yet another abort-early plan disaster on 9.3  (Simon Riggs <simon@2ndquadrant.com>)
Re: Yet another abort-early plan disaster on 9.3  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
On 09/26/2014 01:06 AM, Simon Riggs wrote:
> On 23 September 2014 00:56, Josh Berkus <josh@agliodbs.com> wrote:
>
>> We've hashed that out a bit, but frankly I think it's much more
>> profitable to pursue fixing the actual problem than providing a
>> workaround like "risk", such as:
>>
>> a) fixing n_distinct estimation
>> b) estimating stacked quals using better math (i.e. not assuming total
>> randomness)
>> c) developing some kind of correlation stats
>>
>> Otherwise we would be just providing users with another knob there's no
>> rational way to set.
>
> I believe this is a serious issue for PostgreSQL users and one that
> needs to be addressed.
>
> n_distinct can be fixed manually, so that is less of an issue.

It's an issue for the 99.8% of our users who don't know what n_distinct
is, let alone how to calculate it.  Also, changing it requires an
exclusive lock on the table. Of course, you and I have been over this
issue before.
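
To make that concrete, here's a rough sketch of the manual fix, using
the audit_log table from the example further down and a hypothetical
column name (the ALTER takes the exclusive lock mentioned above, and
the override only kicks in at the next ANALYZE):

-- account_id is just an illustrative column name; the ALTER takes an
-- exclusive lock on the table, and the planner only sees the new value
-- after the next ANALYZE.
ALTER TABLE audit_log ALTER COLUMN account_id SET (n_distinct = 50000);
ANALYZE audit_log;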

One thing I'm wondering is why our estimator so seldom creates
n_distinct as a %.  Really, any time n_distinct is over 10K we should be
estimating a % instead.  Now, estimating that % has its own issues, but
it does seem like a peculiar quirk of our stats model.

Anyway, in the particular case I posted, fixing n_distinct to realistic
numbers (a %) fixed the query plan.
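
To make the "%" form concrete: a negative n_distinct is read as a
fraction of the table's row count rather than an absolute count, so it
keeps scaling as the table grows.  Same hypothetical column as above:

-- Negative values are ratios, positive values are absolute counts;
-- -0.05 means the distinct count is ~5% of the row count, i.e. roughly
-- one distinct value per 20 rows.
ALTER TABLE audit_log ALTER COLUMN account_id SET (n_distinct = -0.05);
ANALYZE audit_log;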

>
> The problem, as I see it, is different. We assume that if there are
> 100 distinct values and you use LIMIT 1 that you would only need to
> scan 1% of rows. We assume that the data is arranged in the table in a
> very homogenous layout. When data is not, and it seldom is, we get
> problems.
>
> Simply put, assuming that LIMIT will reduce the size of all scans is
> just way wrong. I've seen many plans where increasing the LIMIT
> dramatically improves the plan.
>
> If we can at least agree it is a problem, we can try to move forwards.

That is certainly another problem.  Does the correlation stat figure
into the LIMIT calculation at all, currently?  That's what the
correlation stat is for, no?
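
For reference, the single-column correlation stat that ANALYZE already
gathers is easy to eyeball in pg_stats; an illustrative query against
the audit_log table from the example below:

-- correlation is the physical-vs-logical ordering statistic (-1 to 1);
-- n_distinct < 0 means "a fraction of the row count", as discussed above.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'audit_log';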

Also, to be fair, physical correlation of rows can also lead to
abort-early plans being extra fast, if everything we want is towards the
beginning of the index.  Which means we'd need working multi-column
correlation, which is a known hard problem.

For example, consider the query:

SELECT id, updated_on FROM audit_log
WHERE updated_on < '2010-01-01'
ORDER BY id LIMIT 10;

In an append-only table, that query is liable to be very fast with an
abort-early plan scanning on an ID index (AEP from now on), since the
oldest rows are likely to correspond with the smallest IDs.  But then
the user does this:

SELECT id, updated_on FROM audit_log
WHERE updated_on < '2010-01-01'
ORDER BY id DESC LIMIT 10;

... and a completely different plan is called for, because using an AEP
will result in reverse scanning most of the index.  However, I'm not
sure the planner knows the difference, since it's only comparing the
estimated selectivity of (updated_on < '2010-01-01') and seeing that
it's 20% of rows.  I bet you'd get an AEP in the 2nd case too.
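
If anyone wants to see which way the planner jumps on their own data,
comparing the two orderings under EXPLAIN makes the choice visible.
Roughly speaking, an abort-early plan shows up as an Index Scan (Index
Scan Backward for the DESC case) on the id index with updated_on
applied as a filter, versus a scan on an updated_on index followed by a
sort:

-- Illustrative only: compare the plan shapes for the two orderings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, updated_on FROM audit_log
WHERE updated_on < '2010-01-01'
ORDER BY id LIMIT 10;

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, updated_on FROM audit_log
WHERE updated_on < '2010-01-01'
ORDER BY id DESC LIMIT 10;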

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

