Partitioning update-heavy queue with hash partitions vs partial indexes - Mailing list pgsql-performance

From Dorian Hoxha
Subject Partitioning update-heavy queue with hash partitions vs partial indexes
Msg-id CANsFX04P_VXOhO19uPnQyo4vi67kw7q_Y3ZXYD8W5AYtCSG76g@mail.gmail.com
Responses Re: Partitioning update-heavy queue with hash partitions vs partial indexes  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-performance
Hi list,

I have a queue table with the schema:

```
create table queue(id bigserial primary key, view_time timestamp with time zone not null);
create index queue_view_time ON queue(view_time ASC);
```

The most concurrent operation is:
```
UPDATE queue SET view_time=view_time+INTERVAL '60 seconds' WHERE id=(
    SELECT id FROM queue WHERE view_time<=now() at time zone 'utc' ORDER BY view_time ASC LIMIT 1 FOR UPDATE SKIP LOCKED
)
```
As you can imagine, with increased concurrency this query has to read and skip a lot of locked and dead index entries, so it burns a lot of CPU time.

I'm assuming 10K+ queries/second will run the update above and actually return a row.
You may wonder how to sustain 10K connections, but since the queries are fast you can raise the connection limit, use a connection pooler, use auto-commit, etc.

--------------

Since most of the overhead is in the `queue_view_time` index, I thought of splitting just that index into partial indexes and then querying them in random order. Here it is with 2 partitions:

```
create index queue_view_time_0 ON queue(view_time ASC) WHERE id%2=0;
create index queue_view_time_1 ON queue(view_time ASC) WHERE id%2=1;
```
I would then add `WHERE id%2=0` to the inner SELECT above and try the partitions in random order until I get a row or have searched all of them.
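
A minimal sketch of what querying one partition might look like (assuming the application picks `id%2=0` first and retries with `id%2=1` if nothing comes back):

```
-- Sketch only: try the id%2=0 "partition" first; the extra predicate
-- lets the planner use the matching partial index queue_view_time_0.
UPDATE queue SET view_time = view_time + INTERVAL '60 seconds' WHERE id = (
    SELECT id FROM queue
    WHERE view_time <= now() at time zone 'utc' AND id%2=0
    ORDER BY view_time ASC LIMIT 1
    FOR UPDATE SKIP LOCKED
);
-- If no row was updated, the application would repeat with id%2=1.
```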

----------------

From the PostgreSQL documentation on partial indexes:

> Do Not Use Partial Indexes as a Substitute for Partitioning
> While a search in this larger index might have to descend through a couple more tree levels than a search in a smaller index, that's almost certainly going to be cheaper than the planner effort needed to select the appropriate one of the partial indexes. The core of the problem is that the system does not understand the relationship among the partial indexes, and will laboriously test each one to see if it's applicable to the current query.

Would this be true in my case too? 

Is it faster for the planner to select the correct partition (hash partitioning on the `id` column) than the correct partial index, as in my case? I don't think I'll need more than ~32 partitions/partial indexes even in an extreme scenario.
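
For comparison, a rough sketch of the hash-partitioned alternative I have in mind (2 partitions here; the partition count and names are just placeholders):

```
-- Sketch only: same schema, but hash-partitioned on id so each
-- partition gets its own smaller view_time index.
create table queue(
    id bigserial,
    view_time timestamp with time zone not null,
    primary key (id)
) partition by hash (id);
create table queue_p0 partition of queue for values with (modulus 2, remainder 0);
create table queue_p1 partition of queue for values with (modulus 2, remainder 1);
create index queue_view_time on queue(view_time ASC);
```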

Regards,
Dorian
