Re: performance expectations for table(s) with 2B recs - Mailing list pgsql-general

From Peter J. Holzer
Subject Re: performance expectations for table(s) with 2B recs
Date
Msg-id 20211208221413.xfijsujafpvdp5c2@hjp.at
Whole thread Raw
In response to performance expectations for table(s) with 2B recs  (David Gauthier <davegauthierpg@gmail.com>)
Responses RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs  ("Godfrin, Philippe E" <Philippe.Godfrin@nov.com>)
List pgsql-general
On 2021-12-08 14:44:47 -0500, David Gauthier wrote:
> So far, the tables I have in my DB have relatively low numbers of records (most
> are < 10K, all are < 10M).  Things have been running great in terms of
> performance.  But a project is being brainstormed which may require some tables
> to contain a couple billion records.
[...]
> What else should I be worried about ?
>
> I suspect that part of why things are running really well so far is that the
> relatively small amounts of data in these tables ends up in the DB cache and
> disk I/O is kept at a minimum.  Will that no longer be the case once queries
> start running on these big tables ?

Depends a lot on how good the locality of your queries is. If most read
only the same parts of the same indexes, those will still be in the
cache. If they are all over the place or if you have queries which need
to read large parts of your tables, cache misses will make your
performance a lot less predictable, yes. That stuff is also hard to
test, because when you are testing a query twice in a row, the second
time it will likely hit the cache and be quite fast.

But in my experience the biggest problem with large tables are unstable
execution plans - for most of the parameters the optimizer will choose
to use an index, but for some it will erroneously think that a full
table scan is faster. That can lead to a situation where a query
normally takes less than a second, but sometimes (seemingly at random)
it takes several minutes - users will understandably be upset about such
behaviour. It is in any case a good idea to monitor execution times to
find such problems (ideally before users complain), but each needs to be
treated on an individual basis, and sometimes there seems to be no good
solution.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

pgsql-general by date:

Previous
From: David Gauthier
Date:
Subject: performance expectations for table(s) with 2B recs
Next
From: Дмитрий Иванов
Date:
Subject: Re: CTE Materialization