Re: Half billion records in one table? RDS

From: Laurenz Albe
Subject: Re: Half billion records in one table? RDS
In response to: Half billion records in one table? RDS  (Jean Baro)
List: pgsql-performance

Jean Baro wrote:
> Hi there,
> We are creating a new DB which will behave mostly like a file system,
> I mean, there will be no complex queries or joins running in the DB.
> The idea is to grab the WHOLE set of messages for a particular user
> and then filter, order, combine or full text search in the function itself (AWS Lambda).
> The maximum number of messages is limited to 1,000 messages per user.
> So we expect Postgres to have an amazing performance for this scenario.
> Sizing and worst case scenario:
> ·         500MM messages in the main DB
> ·         4K queries per second (by UserID) – max time of 500ms per query. Simple SELECTs,
>           with no ORDER BY, WHERE, or GROUP BY. Just grab all the messages for a particular user. MAX 1,000 messages per user.
> ·         1K inserts per second on average (So that in 1 hour we can insert around 3MM messages)
> ·         1K deletes per second on average (So that in 1 hour we can remove around 3MM messages)
> My question is:
> Can we use any kind of compression for PostgreSQL which would result in reduced IO and disk size?
> We are not relying on any kind of table partitioning, is that the best approach for this scenario?
> Is PG on RDS capable of delivering this type of performance while requiring low maintenance?
> What about Auto Vacuum? Any suggestion how to optimize it for such a work load
> (we will insert and delete millions of rows every day).

It sounds like your JSON data, which are your chief concern, are
not processed inside the database.  For that, the data type "json" is best.
Such values are automatically stored in a compressed format (via TOAST)
once their size exceeds roughly 2KB.  The compression is not amazingly
good, but it is fast.
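As a sketch (the table and column names here are made up, not from the
original post), the schema could be as simple as this, with an index on
the user column to serve the "grab all messages for a user" query, and
pg_column_size() to check how well your real payloads compress:

```sql
-- Hypothetical schema sketch: one row per message; the JSON payload is
-- TOASTed (and compressed) automatically once it grows past ~2KB.
CREATE TABLE messages (
    message_id bigint PRIMARY KEY,
    user_id    bigint NOT NULL,
    payload    json   NOT NULL
);

-- The index on user_id is what makes the per-user SELECT fast:
CREATE INDEX messages_user_id_idx ON messages (user_id);

-- pg_column_size() reports the stored (possibly compressed) size,
-- so you can measure the compression on your actual data:
SELECT avg(pg_column_size(payload)) FROM messages;
```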

If your application removes data by deleting them from the
table, partitioning won't help.  It is useful if data get removed
in bulk, e.g. if you want to delete all yesterday's data at once.
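To illustrate the bulk-removal case (assuming PostgreSQL 10 or later for
declarative partitioning, and assuming a date column that your schema may
not actually have):

```sql
-- Sketch: range-partition the messages by arrival date.
CREATE TABLE messages (
    message_id bigint,
    user_id    bigint,
    created_on date NOT NULL,
    payload    json
) PARTITION BY RANGE (created_on);

CREATE TABLE messages_2017_12_28 PARTITION OF messages
    FOR VALUES FROM ('2017-12-28') TO ('2017-12-29');

-- Dropping yesterday's data is then a cheap metadata operation
-- instead of millions of DELETEs that autovacuum has to clean up:
DROP TABLE messages_2017_12_28;
```

If rows are instead deleted one by one as users remove messages, none of
this applies, which is why partitioning won't help you here.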

The workload does not sound amazingly large, so I'd expect PostgreSQL
to have no problems with it with decent storage and CPU power,
but you'd have to test that.

Tune autovacuum if it cannot keep up, i.e. if the tables get bloated.
The first knob to twiddle is probably lowering "autovacuum_vacuum_cost_delay".
Autovacuum might be your biggest problem, but that is only a guess.
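For example, you can make autovacuum more aggressive on the hot table
only, without touching the global settings. The parameter names below
are real storage parameters; the values are starting points to test
against your workload, not recommendations:

```sql
-- Sketch: per-table autovacuum tuning on a hypothetical "messages" table.
ALTER TABLE messages SET (
    autovacuum_vacuum_cost_delay   = 0,    -- default 20ms; 0 disables throttling
    autovacuum_vacuum_scale_factor = 0.01  -- vacuum once ~1% of rows are dead
);
```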

Laurenz Albe
