Re: [EXTERNAL] Re: performance expectations for table(s) with 2B recs - Mailing list pgsql-general

From Peter J. Holzer
Subject Re: [EXTERNAL] Re: performance expectations for table(s) with 2B recs
Date
Msg-id 20211210214312.ycfynzxiqmbm35o6@hjp.at
In response to RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs  ("Godfrin, Philippe E" <Philippe.Godfrin@nov.com>)
Responses RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs  ("Godfrin, Philippe E" <Philippe.Godfrin@nov.com>)
List pgsql-general
On 2021-12-10 18:04:07 +0000, Godfrin, Philippe E wrote:
> >But in my experience the biggest problem with large tables are unstable
> >execution plans - for most of the parameters the optimizer will choose
> >to use an index, but for some it will erroneously think that a full
> >table scan is faster. That can lead to a situation where a query
> >normally takes less than a second, but sometimes (seemingly at random)
> >it takes several minutes
[...]
> For Peter I have a question. What exactly causes ‘unstable execution plans’ ??
>
> Besides not using bind variables, bad statistics, would you elaborate
> in what would contribute to that instability?

Not using bind variables and bad statistics are certainly big factors:

On one hand, not using bind variables gives the optimizer a lot more
information, so it can choose a better plan at run time. On the other
hand, that makes it hard to predict which plan it will choose.
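
As a rough illustration (the table and column names here are made up),
the same lookup can be sent with a literal value or through a prepared
statement with a bind parameter. With the literal the planner sees the
actual value and can tailor the plan to it; with the parameter the first
few executions get custom plans, after which PostgreSQL may switch to a
generic plan that ignores the value:

    -- literal value: the planner knows it is looking for customer 42
    SELECT * FROM orders WHERE customer_id = 42;

    -- bind parameter: after the first few custom plans this may be
    -- planned generically, without knowing the value it will run with
    PREPARE orders_by_customer(int) AS
        SELECT * FROM orders WHERE customer_id = $1;
    EXECUTE orders_by_customer(42);

Since PostgreSQL 12 you can override the automatic choice with
SET plan_cache_mode = force_custom_plan (or force_generic_plan) if it
turns out to be a problem.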

Bad statistics come in many flavours: They might simply be wrong, which
is usually easy to fix. More problematic are statistics which just don't
describe reality very well. They may fail to capture a correlation,
causing the optimizer to assume that two columns are independent when
they really aren't (since PostgreSQL 10 you can create extended
statistics on multiple columns, which helps in many but not all cases),
or they may miss some other peculiarity of the data. Or they may be so
close to a tipping point that a small change causes the optimizer to
choose a wildly different plan.
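
A minimal sketch of those multi-column (extended) statistics; the table
and columns are invented, city and zip code being the classic example of
two strongly correlated columns:

    -- tell the planner that city and zip are not independent
    CREATE STATISTICS addr_city_zip (dependencies)
        ON city, zip FROM addresses;

    -- the statistics object is only filled in by ANALYZE
    ANALYZE addresses;

Without something like this the planner multiplies the selectivities of
the two conditions and can badly underestimate the row count for
WHERE city = ... AND zip = ..., which is exactly the kind of thing that
pushes a query over such a tipping point.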

Another source is dynamically generated SQL. Your application may put
together SQL from fragments, or it might use something like SQLAlchemy
or an ORM. In any of these cases, what looks like one query from the
user's perspective may really be a whole family of related queries, and
PostgreSQL will try to find the optimal plan for each of them. That is
generally a good thing, but it adds opportunities to mess up.
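
For example (again with an invented table), here is the kind of pair of
statements an ORM might emit depending on which filters the user fills
in:

    -- the user filters only by status
    EXPLAIN SELECT * FROM orders WHERE status = 'open';

    -- the user also restricts the date range: the same "query" from the
    -- user's point of view, but a different statement that is planned
    -- independently and may pick a different index (or none at all)
    EXPLAIN SELECT * FROM orders
     WHERE status = 'open'
       AND created_at >= now() - interval '7 days';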

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
