Re: Batch TIDs lookup in ambulkdelete - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Batch TIDs lookup in ambulkdelete |
Date | |
Msg-id | CAD21AoCGzTu66ZjatOR2yxu6nVkpK7=XBh2reZXiJ819rT-38Q@mail.gmail.com Whole thread Raw |
In response to | Re: Batch TIDs lookup in ambulkdelete (John Naylor <johncnaylorls@gmail.com>) |
Responses |
Re: Batch TIDs lookup in ambulkdelete
|
List | pgsql-hackers |
On Sun, Jun 1, 2025 at 11:01 PM John Naylor <johncnaylorls@gmail.com> wrote: > > On Thu, May 1, 2025 at 5:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > HEAD PATCHED DIFF > > case-1: 3,021 ms 2.818 ms 93.29% > > case-2: 5, 697 ms 5.545 ms 97.34% > > case-3: 2,833 ms 2.790 ms 98.48% > > case-4: 2,564 ms 2.279 ms 88.86% > > case-5: 4,657 ms 4.706 ms 101.04% > > 1 and 4 look significant -- do the other cases have reproducible > differences or is it just noise? These results are the average of 3 times executions, so they are reproducible differences. > > > Here are the summary of each attached patch: > > > > 0001: Introduce TIdStoreIsMemberMulti() where we can do IsMember check > > for multiple TIDs in one function call. If the given TIDs are sorted > > (at least in block number), we can save radix tree lookup for the same > > page entry. > > My only comment is that TidStoreIsMember() is now unused in core (or > maybe just the tests?). It seems like we could just change the API for > it rather than introduce a new function? Good point, changed in the latest patch I posted[1]. > > > 0003: Use batch TIDs lookup in btree index bulk-deletion. > > > > In patch 0003, we implement batch TID lookups for both each > > deduplicated index tuple and remaining all regular index tuples, which > > seems to be the most straightforward approach. > > Seems like a good approach. btvacuumpage() needs to sort if there is a > mix of posting tuples and regular index tuples. Was that covered by > any of the tests above? Good point, I think this case was not covered. I've measured the performance with the following queries: # Case-6: create unlogged table test (c int) with (autovacuum_enabled = off); insert into test select (2 * c - 1) from generate_series(1, ${NROWS}) c; insert into test select c from generate_series(1, ${NROWS}) c; create index on test (c); select pg_relation_size('test') / 8192 as relpages \gset delete from test where c < (${NROWS} * 0.3)::int vacuum test; And here is the result: HEAD PATCHED DIFF case-6: 3,320 ms 3,617 ms 108.94% I'll consider how to deal with overheads of sorting TIDs. > > While further > > optimizations are possible, such as performing batch TID lookups for > > all index tuples on a single page, these could introduce additional > > overhead from sorting and re-sorting TIDs. Moreover, considering that > > radix tree lookups are relatively inexpensive, the benefits of sorting > > TIDs and using TidStoreIsMemberMulti() might be minimal. Nevertheless, > > these potential optimizations warrant further evaluation to determine > > their actual impact on performance. > > My guess is that always sorting by TID and than back by index tuple > offset is too much overhead to be worth it, but I'm not sure. Agreed. Given the above test results, it's unlikely always sorting the array helps speedups. Regards, [1] https://www.postgresql.org/message-id/CAD21AoAv55DhJ%2B19zaemx-_eO7z%2Bu4gtFmeADsMBFqtHhyUySQ%40mail.gmail.com -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: