RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs - Mailing list pgsql-general

From Godfrin, Philippe E
Subject RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs
Date
Msg-id SA0PR15MB393324C6A5522A80913DC42582719@SA0PR15MB3933.namprd15.prod.outlook.com
In response to Re: performance expectations for table(s) with 2B recs  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
Responses Re: [EXTERNAL] Re: performance expectations for table(s) with 2B recs  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
List pgsql-general

 

On 2021-12-08 14:44:47 -0500, David Gauthier wrote:
> So far, the tables I have in my DB have relatively low numbers of records (most
> are < 10K, all are < 10M).  Things have been running great in terms of
> performance.  But a project is being brainstormed which may require some tables
> to contain a couple billion records.
[...]
> What else should I be worried about ?
>
> I suspect that part of why things are running really well so far is that the
> relatively small amounts of data in these tables ends up in the DB cache and
> disk I/O is kept at a minimum.  Will that no longer be the case once queries
> start running on these big tables ?

>

>Depends a lot on how good the locality of your queries is. If most read
>only the same parts of the same indexes, those will still be in the
>cache. If they are all over the place or if you have queries which need
>to read large parts of your tables, cache misses will make your
>performance a lot less predictable, yes. That stuff is also hard to
>test, because when you are testing a query twice in a row, the second
>time it will likely hit the cache and be quite fast.
>
>But in my experience the biggest problem with large tables are unstable
>execution plans - for most of the parameters the optimizer will choose
>to use an index, but for some it will erroneously think that a full
>table scan is faster. That can lead to a situation where a query
>normally takes less than a second, but sometimes (seemingly at random)
>it takes several minutes - users will understandably be upset about such
>behaviour. It is in any case a good idea to monitor execution times to
>find such problems (ideally before users complain), but each needs to be
>treated on an individual basis, and sometimes there seems to be no good
>solution.

 

To the OP, that is a tall order to answer – basically that’s why DBAs still have jobs…
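
As a side note on the caching point above: EXPLAIN (ANALYZE, BUFFERS) makes the effect Peter describes visible. Run the same query twice and compare "shared read" (blocks fetched from storage) against "shared hit" (blocks already in shared_buffers). A rough sketch against a made-up table, names purely illustrative:

    -- hypothetical large table, for illustration only
    CREATE TABLE events (
        id         bigserial PRIMARY KEY,
        device_id  integer NOT NULL,
        created_at timestamptz NOT NULL,
        payload    text
    );
    CREATE INDEX events_device_id_idx ON events (device_id);

    -- first run: the plan shows "shared read=..." for blocks pulled in from disk
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM events WHERE device_id = 42;

    -- second run: the same blocks show up as "shared hit=..." and the timing
    -- improves, which is why back-to-back test runs can be misleading
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM events WHERE device_id = 42;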
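
And the plan flips Peter mentions usually appear where the planner's row estimate crosses the point at which a sequential scan looks cheaper than an index scan; a very common value (or stale statistics) is enough. Sketched against the same illustrative table, plus one way to spot slow outliers if the pg_stat_statements extension is installed (column names as of PostgreSQL 13):

    -- a selective value: few estimated rows, the planner picks the index
    EXPLAIN SELECT * FROM events WHERE device_id = 42;
    --   Index Scan using events_device_id_idx on events ...

    -- a very common value: the planner expects a large fraction of the table
    -- to match and flips to a full scan
    EXPLAIN SELECT * FROM events WHERE device_id = 1;
    --   Seq Scan on events  Filter: (device_id = 1)

    -- keeping statistics current reduces (but does not eliminate) such flips
    ANALYZE events;

    -- one way to monitor for the slow outliers in production
    SELECT query, calls, mean_exec_time, max_exec_time
      FROM pg_stat_statements
     ORDER BY max_exec_time DESC
     LIMIT 10;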

 

For Peter I have a question: what exactly causes ‘unstable execution plans’?

 

Besides not using bind variables and bad statistics, would you elaborate on what would contribute to that instability?
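
(On the bind variable side, for what it's worth, PostgreSQL's own prepared statements can be one such source: after a handful of executions the planner may switch from per-value custom plans to a generic plan, and EXPLAIN EXECUTE shows which one you got. A rough illustration, assuming PostgreSQL 12 or later for the plan_cache_mode setting and the same made-up table as above:)

    -- parameterised statement, analogous to using bind variables
    PREPARE by_device(int) AS
        SELECT * FROM events WHERE device_id = $1;

    EXPLAIN EXECUTE by_device(42);   -- custom plan for this specific value

    -- after several executions the planner may settle on a generic plan that
    -- ignores the actual parameter value; forcing custom plans avoids that
    SET plan_cache_mode = force_custom_plan;   -- PostgreSQL 12+
    EXPLAIN EXECUTE by_device(42);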



