Re: Yet another abort-early plan disaster on 9.3 - Mailing list pgsql-performance

From: Simon Riggs
Subject: Re: Yet another abort-early plan disaster on 9.3
Msg-id: CA+U5nMJqYiViG8Xu=OqJo5o7NbKgrBYC=Kn6nx-4AtKXidWvtw@mail.gmail.com
In response to: Re: Yet another abort-early plan disaster on 9.3 (Josh Berkus <josh@agliodbs.com>)
Responses: Re: Yet another abort-early plan disaster on 9.3 (Merlin Moncure <mmoncure@gmail.com>)
           Re: Yet another abort-early plan disaster on 9.3 (Greg Stark <stark@mit.edu>)
List: pgsql-performance
On 23 September 2014 00:56, Josh Berkus <josh@agliodbs.com> wrote:

> We've hashed that out a bit, but frankly I think it's much more
> profitable to pursue fixing the actual problem than providing a
> workaround like "risk", such as:
>
> a) fixing n_distinct estimation
> b) estimating stacked quals using better math (i.e. not assuming total
> randomness)
> c) developing some kind of correlation stats
>
> Otherwise we would be just providing users with another knob there's no
> rational way to set.

I believe this is a serious issue for PostgreSQL users and one that
needs to be addressed.

n_distinct can be fixed manually, so that is less of an issue.
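For anyone who hasn't done it, the manual override is a one-liner
(hypothetical table/column names; the storage option itself has been
there since 9.0):

    ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = 5000);
    ANALYZE orders;   -- the override is picked up at the next ANALYZE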

The problem, as I see it, is different. We assume that if there are
100 distinct values and you use LIMIT 1, you only need to scan 1% of
the rows. In other words, we assume the data is arranged in the table
in a very homogeneous layout. When it is not, and it seldom is, we get
problems.
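
To see how wrong the homogeneity assumption can go, here is a minimal
sketch (synthetic data and my own table names, not from any real
workload) where the matching rows all sit at the far end of the heap:

    -- 1,000,000 rows, 100 distinct status values, perfectly
    -- clustered: every row with status = 99 lives in the last 1%
    -- of the heap.
    CREATE TABLE events AS
    SELECT g / 10000 AS status, g AS id
    FROM generate_series(0, 999999) g;
    ANALYZE events;

    -- The planner assumes the matches are spread evenly through the
    -- table, so it costs this as an abort-early seq scan that stops
    -- almost immediately; with this layout it reads ~99% of the heap
    -- before finding the first match.
    EXPLAIN ANALYZE SELECT * FROM events WHERE status = 99 LIMIT 1;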

Simply put, assuming that a LIMIT will proportionally reduce the cost
of every scan that feeds it is just wrong. I've seen many plans where
increasing the LIMIT dramatically improves the plan.
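
A typical shape of that, again with hypothetical names:

    -- With a tiny LIMIT the planner tends to bet on an abort-early
    -- plan: walk an index that matches the ORDER BY and filter as it
    -- goes, assuming a qualifying row turns up quickly.
    EXPLAIN SELECT * FROM orders
     WHERE customer_id = 42 ORDER BY created_at DESC LIMIT 1;

    -- Raise the LIMIT and it may flip to fetching all matching rows
    -- and sorting them, which on skewed data is often the plan that
    -- should have been chosen for LIMIT 1 as well.
    EXPLAIN SELECT * FROM orders
     WHERE customer_id = 42 ORDER BY created_at DESC LIMIT 500;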

If we can at least agree it is a problem, we can try to move forwards.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

