Thread: Efficient Partitioning Strategies for PostgreSQL Table with KSUID and High Volume

Hi Team,

I'm working with a PostgreSQL table containing terabytes of data, and
it grows by millions of rows weekly. Each row is identified by a
[KSUID][1], and my primary read patterns are:

1. Retrieve a row by its KSUID.
2. List rows by `user_id` in descending order, pagination acceptable.

Currently, the table is unpartitioned and read performance is
sluggish. I'm contemplating partitioning the table by month using the
KSUID column, [leveraging its embedded uint32 timestamp][2], something
like this:

```sql
CREATE TABLE table_y2023m09 PARTITION OF table_name
FOR VALUES FROM ('[current_month_ts][128 zeros]') TO
('[next_month_ts][128 zeros]')
```

This allows each 'Get row by KSUID' query to be isolated to a single partition.
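
To make this concrete, here is roughly what I picture, assuming the KSUID is stored as `bytea` (the table and column names below are made up, and the bounds are just the first second of each month offset by the KSUID epoch and padded with 16 zero bytes):

```sql
-- Rough sketch only; assumes the KSUID is stored as a 20-byte bytea:
-- 4-byte big-endian timestamp (seconds since the KSUID epoch 1400000000,
-- i.e. 2014-05-13 16:53:20 UTC) followed by 16 random bytes.
CREATE TABLE events (
    ksuid   bytea  NOT NULL,
    user_id bigint NOT NULL,
    payload jsonb,
    PRIMARY KEY (ksuid)
) PARTITION BY RANGE (ksuid);

-- Bounds are the month's first second, KSUID-encoded and padded with the
-- "128 zeros" (16 zero bytes):
--   2023-09-01 00:00:00 UTC: 1693526400 - 1400000000 = 0x117edb80
--   2023-10-01 00:00:00 UTC: 1696118400 - 1400000000 = 0x11a66880
CREATE TABLE events_y2023m09 PARTITION OF events
    FOR VALUES FROM ('\x117edb8000000000000000000000000000000000')
                 TO ('\x11a6688000000000000000000000000000000000');
```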

For listing rows by `user_id`, I'm considering keyset pagination:

```sql
SELECT *
FROM table_name
WHERE user_id = ?
  AND ksuid < last_seen_ksuid
ORDER BY ksuid DESC
LIMIT 10;
```

However, this method would still need to search through multiple
partitions depending on `last_seen_ksuid`, though I guess that an index
on `user_id` (or on `(user_id, ksuid)`) might be enough.
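
Concretely, the index I have in mind would be something like this (just a sketch; the table and index names are placeholders):

```sql
-- Composite index so the keyset query above can be served by a backward
-- index scan within each matching partition; names are illustrative.
CREATE INDEX table_name_user_id_ksuid_idx
    ON table_name (user_id, ksuid);
```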

### Questions:

1. Is using KSUID as a partitioning key viable, especially given that
the column can be represented as text or bytes?
2. Is there a more efficient way to implement listing by `user_id`
other than keyset pagination?
3. Are there any pitfalls or performance issues I should be aware of
with this approach?
4. Would it be better to just partition by `created_at`, extract the
timestamp from the KSUID in the application layer, and add it
explicitly to the query?

Thank you very much,
Best regards.

  [1]: https://github.com/segmentio/ksuid
  [2]: https://github.com/segmentio/ksuid#:~:text=a%2032%2Dbit%20unsigned%20integer%20UTC%20timestamp



Re: Efficient Partitioning Strategies for PostgreSQL Table with KSUID and High Volume

From: Lorusso Domenico
Hello,
The cost of an index access grows only logarithmically with table size, so partitioning often isn't a solution for improving performance; rather, it is a solution to the human difficulty of managing huge amounts of data, to running expensive parallel jobs and, in some cases, to improving performance for sequential reads.

If I understand your need correctly, you have 2 topics:
- read by KSUID --> no issue
- read by user_id --> could be served by an index, but, because the records for a user are scattered across the table, the access plan performs many separate index lookups

Some numbers:
- each week the table grows by about 5 million (5 * 10^6) records
- after 10 years (520 weeks) the table would hold about 2.6 billion records
- a B-tree with a fanout of ~64 gives log_64(2.6 * 10^9) ≈ 5, i.e. roughly 5 accesses to find the right index entry plus 1 access to retrieve the row

By itself this doesn't seem to be a great problem.

But if you have just 100 users, the planner expects about 2.6 * 10^9 / 10^2 = 2.6 * 10^7 matching rows per user, and each of those rows costs the 6 accesses above.

But probably you don't need to read all the records ever inserted or modified by a given user_id.

An approach could be to create generational partitions: the main partition stores just the most recent records (under some assumptions you have to define), while the oldest records are moved to one or more historical partitions.

This can be realized with a technical field whose value drives the partition routing, roughly like the sketch below:
field_for_partition = current --> main partition
field_for_partition = h01 --> historical partition 1
field_for_partition = h02 --> historical partition 2
and so on.
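
In PostgreSQL terms, something like this (table and column names are only illustrative; I'm assuming list partitioning on the technical field):

```sql
-- Sketch: list-partition on a technical routing column (names illustrative).
CREATE TABLE events (
    ksuid               bytea       NOT NULL,
    user_id             bigint      NOT NULL,
    created_at          timestamptz NOT NULL,
    field_for_partition text        NOT NULL DEFAULT 'current'
) PARTITION BY LIST (field_for_partition);

CREATE TABLE events_current PARTITION OF events FOR VALUES IN ('current');
CREATE TABLE events_h01     PARTITION OF events FOR VALUES IN ('h01');
CREATE TABLE events_h02     PARTITION OF events FOR VALUES IN ('h02');
```

Updating field_for_partition then moves the row to the right partition (PostgreSQL 11+ handles this row movement automatically).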

Unfortunately, this approach requires a cron job that, every x days, changes the value of field_for_partition and then runs a VACUUM.
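
The job could be as simple as this (the 90-day cutoff is only an assumption to adapt, and the names match the sketch above):

```sql
-- Periodic job: demote old rows to the first historical partition, then vacuum.
UPDATE events
   SET field_for_partition = 'h01'
 WHERE field_for_partition = 'current'
   AND created_at < now() - interval '90 days';

VACUUM (ANALYZE) events_current;
```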

A simpler solution is to create an index on user_id + creation_date and to ALWAYS query by user_id together with creation_date > now() - interval 'xx days'.
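
For example (again, names and the interval are illustrative):

```sql
-- Composite index matching the access pattern: one user, recent rows first.
CREATE INDEX events_user_id_created_at_idx
    ON events (user_id, created_at DESC);

SELECT *
  FROM events
 WHERE user_id = ?
   AND created_at > now() - interval '30 days'
 ORDER BY created_at DESC
 LIMIT 10;
```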

But if some users insert a record once every 2 months while others insert 1000 records per day, this approach may not fit your needs. You may also need to store, for each user, the last insertion time and the number of records inserted; in other words, you have to find a way to define the right number of days for each user.
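
A small bookkeeping table could hold those per-user statistics (just a sketch; names are made up):

```sql
CREATE TABLE user_activity (
    user_id          bigint PRIMARY KEY,
    last_inserted_at timestamptz NOT NULL,
    inserted_count   bigint      NOT NULL DEFAULT 0
);
```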

--
Domenico L.

to amaze people for half an hour, a history book is enough,
I tried to learn the Treccani by heart... [F.d.A.]