Re: 600 million rows of data. Bad hardware or need partitioning? - Mailing list pgsql-performance

From Justin Pryzby
Subject Re: 600 million rows of data. Bad hardware or need partitioning?
Date
Msg-id 20200506013741.GH28974@telsasoft.com
Whole thread Raw
In response to Re: 600 million rows of data. Bad hardware or need partitioning?  (Arya F <arya6000@gmail.com>)
Responses Re: 600 million rows of data. Bad hardware or need partitioning?  (Arya F <arya6000@gmail.com>)
List pgsql-performance
On Tue, May 05, 2020 at 08:31:29PM -0400, Arya F wrote:
> On Mon, May 4, 2020 at 5:21 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> 
> > I mentioned in February and March that you should plan to set shared_buffers
> > to fit the indexes currently being updated.
> 
> The following command gives me
> 
> select pg_size_pretty (pg_indexes_size('test_table'));
>  pg_size_pretty >  5216 MB
> 
> So right now, the indexes on that table are taking about 5.2 GB, if a
> machine has 512 GB of RAM and SSDs, is it safe to assume I can achieve
> the same update that takes 1.5 minutes in less than 5 seconds while
> having 600 million rows of data without partitioning?

I am not prepared to guarantee server performance..

But, to my knowledge, you haven't configured shared_buffers at all.  Which I
think might be the single most important thing to configure for loading speed
(with indexes).

Couple months ago, you said your server had 4GB RAM, which isn't much, but if
shared_buffers is ~100MB, I think that deserves attention.

If you get good performance with a million rows and 32MB buffers, then you
could reasonably hope to get good performance (at least initially) with
100million rows and 320MB buffers.  Scale that up to whatever you expect your
index size to be.  Be conservative since you may need to add indexes later, and
you can expect they'll become bloated, so you may want to run a reindex job.

shared_buffers is frequently set to ~25% of RAM, and if you need to efficiently
use indexes larger than what that supports, then you should add RAM, or
implement partitioning.

-- 
Justin



pgsql-performance by date:

Previous
From: Arya F
Date:
Subject: Re: 600 million rows of data. Bad hardware or need partitioning?
Next
From: Steve Pritchard
Date:
Subject: Inaccurate Rows estimate for "Bitmap And" causes Planner to choosewrong join