Home > mailing lists

Efficient Partitioning Strategies for PostgreSQL Table with KSUID and High Volume - Mailing list pgsql-general

From	wheels
Subject	Efficient Partitioning Strategies for PostgreSQL Table with KSUID and High Volume
Date	September 4, 2023 10:15:42
Msg-id	CAEF_WQR1r=h+kvqZn4Tks7Qz9vo8J8po9nQ1oNW-Ky0nTkOLKg@mail.gmail.com Whole thread
Responses	Re: Efficient Partitioning Strategies for PostgreSQL Table with KSUID and High Volume
List	pgsql-general

Tree view

Hi Team,

I'm working with a PostgreSQL table containing terabytes of data, and
it grows by millions of rows weekly. Each row is identified by a
[KSUID][1], and my primary read patterns are:

1. Retrieve a row by its KSUID.
2. List rows by `user_id` in descending order, pagination acceptable.

Currently, the table is unpartitioned and read performance is
sluggish. I'm contemplating partitioning the table by month using the
KSUID column, [leveraging its embedded uint32 timestamp][2], something
like this:

```sql
CREATE TABLE table_y2023m09 PARTITION OF ksuid
FOR VALUES FROM ('[current_month_ts][128 zeros]') TO
('[next_month_ts][128 zeros]')
```

This allows each 'Get row by KSUID' query to be isolated to a single partition.

For listing rows by `user_id`, I'm considering keyset pagination:

```sql
SELECT *
FROM table_name
WHERE user_id = ?
  AND ksuid > last_seen_ksuid
ORDER BY ksuid
LIMIT 10;
```

However, this method still would need to search through multiple
partitions depending on `last_seen_ksuid`, but I guess that with an
index by `user_id` might be enough.

### Questions:

1. Is using KSUID as a partitioning key viable, especially given that
the column can be represented as text or bytes?
2. Is there a more efficient way to implement listing by `user_id`
other than keyset pagination?
3. Are there any pitfalls or performance issues I should be aware of
with this approach?
4. Would it be better to just partition based on `created_at` and
extract the timestamp from the ksuid on application layer and add it
explicitly to the query?

Thank you very much,
Best regards.

  [1]: https://github.com/segmentio/ksuid
  [2]: https://github.com/segmentio/ksuid#:~:text=a%2032%2Dbit%20unsigned%20integer%20UTC%20timestamp

pgsql-general by date:

From: Lorusso Domenico
Date: 04 September 2023, 09:51:30
Subject: rollback to savepoint issue

From: Erik Wienhold
Date: 04 September 2023, 12:45:01
Subject: Re: rollback to savepoint issue

Efficient Partitioning Strategies for PostgreSQL Table with KSUID and High Volume - Mailing list pgsql-general

Previous

Next