Re: [EXTERNAL] Re: performance expectations for table(s) with 2B recs - Mailing list pgsql-general

From Peter J. Holzer
Subject Re: [EXTERNAL] Re: performance expectations for table(s) with 2B recs
Date
Msg-id 20211210214312.ycfynzxiqmbm35o6@hjp.at
In response to RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs  ("Godfrin, Philippe E" <Philippe.Godfrin@nov.com>)
Responses RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs  ("Godfrin, Philippe E" <Philippe.Godfrin@nov.com>)
List pgsql-general
On 2021-12-10 18:04:07 +0000, Godfrin, Philippe E wrote:
> >But in my experience the biggest problem with large tables are unstable
> >execution plans - for most of the parameters the optimizer will choose
> >to use an index, but for some it will erroneously think that a full
> >table scan is faster. That can lead to a situation where a query
> >normally takes less than a second, but sometimes (seemingly at random)
> >it takes several minutes
[...]
> For Peter I have a question. What exactly causes ‘unstable execution plans’ ??
>
> Besides not using bind variables, bad statistics, would you elaborate
> in what would contribute to that instability?

Not using bind variables and bad statistics are certainly big factors:

On one hand, not using bind variables gives the optimizer a lot more
information, so it can choose a better plan at run time. On the other
hand, that makes it hard to predict which plan it will choose.
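
As a rough illustration (the table and column names here are made up),
the same lookup can be sent with a literal value or through a prepared
statement with a bind parameter. With the literal the planner sees the
actual value and can tailor the plan to it; with the parameter the first
few executions get custom plans, after which PostgreSQL may switch to a
generic plan that ignores the value:

    -- literal value: the planner knows it is looking for customer 42
    SELECT * FROM orders WHERE customer_id = 42;

    -- bind parameter: after the first few custom plans this may be
    -- planned generically, without knowing the value it will run with
    PREPARE orders_by_customer(int) AS
        SELECT * FROM orders WHERE customer_id = $1;
    EXECUTE orders_by_customer(42);

Since PostgreSQL 12 you can override the automatic choice with
SET plan_cache_mode = force_custom_plan (or force_generic_plan) if it
turns out to be a problem.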

Bad statistics come in many flavours: They might simply be wrong, which
is usually easy to fix. More problematic are statistics which just don't
describe reality very well. They may fail to capture a correlation,
causing the optimizer to assume that two columns are independent when
they really aren't (since PostgreSQL 10 you can create extended
statistics on multiple columns, which helps in many but not all cases),
or they may miss some other peculiarity of the data. Or they may be so
close to a tipping point that a small change causes the optimizer to
choose a wildly different plan.
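
A minimal sketch of those multi-column (extended) statistics; the table
and columns are invented, city and zip code being the classic example of
two strongly correlated columns:

    -- tell the planner that city and zip are not independent
    CREATE STATISTICS addr_city_zip (dependencies)
        ON city, zip FROM addresses;

    -- the statistics object is only filled in by ANALYZE
    ANALYZE addresses;

Without something like this the planner multiplies the selectivities of
the two conditions and can badly underestimate the row count for
WHERE city = ... AND zip = ..., which is exactly the kind of thing that
pushes a query over such a tipping point.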

Another source is dynamically generated SQL. Your application may put
together SQL from fragments, or it might use something like SQLAlchemy
or an ORM. In any of these cases, what looks like one query from the
user's perspective may really be a whole family of related queries, and
PostgreSQL will try to find the optimal plan for each of them. That is
generally a good thing, but it adds opportunities to mess up.
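
For example (again with an invented table), here is the kind of pair of
statements an ORM might emit depending on which filters the user fills
in:

    -- the user filters only by status
    EXPLAIN SELECT * FROM orders WHERE status = 'open';

    -- the user also restricts the date range: the same "query" from the
    -- user's point of view, but a different statement that is planned
    -- independently and may pick a different index (or none at all)
    EXPLAIN SELECT * FROM orders
     WHERE status = 'open'
       AND created_at >= now() - interval '7 days';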

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
