Thread: Batch TIDs lookup in ambulkdelete
Hi all,

During index bulk-deletion in lazy vacuum, we currently check the
deletability of each index tuple individually using the vac_tid_reaped()
function. The attached proof-of-concept patches propose batching multiple
TID lookups for deletability checks to reduce overhead. This optimization
aims to minimize redundant function calls and repeated TidStore entry
retrievals for TIDs on the same page.

I have conducted benchmarks across several scenarios to evaluate the
performance impact.

# Case-1 (btree tuples are regular tuples and dead TIDs are concentrated):
create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select generate_series(1, ${NROWS});
create index on test (c);
delete from test where c < ${NROWS} * 0.3;

# Case-2 (btree tuples are regular tuples and dead TIDs are sparse):
create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select generate_series(1, ${NROWS});
create index on test (c);
delete from test where random() < 0.3;

# Case-3 (btree tuples are deduplicated tuples):
create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select c % 1000 from generate_series(1, ${NROWS}) c;
create index on test (c);
select pg_relation_size('test') / 8192 as relpages \gset
delete from test where (ctid::text::point)[0] < ((:'relpages')::int * 0.3);

# Case-4 (btree tuples are deduplicated tuples and table is clustered):
create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select c % 1000 from generate_series(1, ${NROWS}) c;
create index on test (c);
cluster test using test_c_idx;
select pg_relation_size('test') / 8192 as relpages \gset
delete from test where (ctid::text::point)[0] < ((:'relpages')::int * 0.3);

# Case-5 (btree index on a UUID column):
create unlogged table test (c uuid) with (autovacuum_enabled = off);
insert into test select uuidv4() from generate_series(1, ${NROWS}) c;
create index on test (c);
select pg_relation_size('test') / 8192 as relpages \gset
delete from test where (ctid::text::point)[0] < ((:'relpages')::int * 0.3);

Here are the results (NROWS = 50000000):

         HEAD      PATCHED   DIFF
case-1:  3,021 ms  2,818 ms   93.29%
case-2:  5,697 ms  5,545 ms   97.34%
case-3:  2,833 ms  2,790 ms   98.48%
case-4:  2,564 ms  2,279 ms   88.86%
case-5:  4,657 ms  4,706 ms  101.04%

I've measured a 6-11% improvement in btree bulk-deletion. Here is a
summary of each attached patch:

0001: Introduce TidStoreIsMemberMulti(), which performs the IsMember check
for multiple TIDs in one function call. If the given TIDs are sorted (at
least by block number), we can avoid repeating the radix tree lookup for
the same page entry.

0002: Convert IndexBulkDeleteCallback() to a batched operation.

0003: Use batch TID lookups in btree index bulk-deletion.

In patch 0003, we implement batch TID lookups both for each deduplicated
index tuple and for all remaining regular index tuples, which seems to be
the most straightforward approach. While further optimizations are
possible, such as performing batch TID lookups for all index tuples on a
single page, these could introduce additional overhead from sorting and
re-sorting TIDs. Moreover, considering that radix tree lookups are
relatively inexpensive, the benefits of sorting TIDs and using
TidStoreIsMemberMulti() might be minimal. Nevertheless, these potential
optimizations warrant further evaluation to determine their actual impact
on performance.

Also, the patch includes batch TID lookup support only for btree indexes,
but we could potentially improve other index AMs too.
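To illustrate the idea behind 0001: below is a simplified, standalone
sketch, not code from the patch, and every type and function name in it
is made up. The point is only that with the TID array sorted by block
number, the per-block lookup (the radix tree lookup in the real TidStore)
is done once per block instead of once per TID.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* hypothetical, simplified stand-ins for the real TidStore structures */
typedef struct DeadBlock
{
    uint32_t block;          /* heap block number */
    uint64_t offset_bitmap;  /* bit n set => offset n is dead (toy limit: 64) */
} DeadBlock;

typedef struct ToyStore
{
    const DeadBlock *blocks; /* entries sorted by block number */
    int              nblocks;
} ToyStore;

/* one "radix tree" lookup: find the per-block entry, or NULL if none */
static const DeadBlock *
lookup_block(const ToyStore *store, uint32_t block)
{
    for (int i = 0; i < store->nblocks; i++)
    {
        if (store->blocks[i].block == block)
            return &store->blocks[i];
    }
    return NULL;
}

typedef struct Tid
{
    uint32_t block;
    uint16_t offset;
} Tid;

/*
 * Batched membership check.  Because 'tids' is sorted by block number, the
 * per-block entry is fetched once per run of TIDs on the same block rather
 * than once per TID.  Returns the number of members found.
 */
static int
is_member_multi(const ToyStore *store, const Tid *tids, int ntids,
                bool *ismember)
{
    const DeadBlock *entry = NULL;
    bool             have_block = false;
    uint32_t         cur_block = 0;
    int              nfound = 0;

    for (int i = 0; i < ntids; i++)
    {
        if (!have_block || tids[i].block != cur_block)
        {
            /* block changed: do the expensive lookup once for the whole run */
            cur_block = tids[i].block;
            entry = lookup_block(store, cur_block);
            have_block = true;
        }

        ismember[i] = entry != NULL &&
            (entry->offset_bitmap & (UINT64_C(1) << tids[i].offset)) != 0;
        if (ismember[i])
            nfound++;
    }

    return nfound;
}

int
main(void)
{
    DeadBlock blocks[] = {{10, 0x06}, {42, 0x10}};  /* dead: (10,1) (10,2) (42,4) */
    ToyStore  store = {blocks, 2};
    Tid       tids[] = {{10, 1}, {10, 3}, {42, 4}}; /* sorted by block */
    bool      ismember[3];

    printf("members found: %d\n", is_member_multi(&store, tids, 3, ismember));
    return 0;
}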
Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Thu, May 1, 2025 at 5:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

>          HEAD      PATCHED   DIFF
> case-1:  3,021 ms  2,818 ms   93.29%
> case-2:  5,697 ms  5,545 ms   97.34%
> case-3:  2,833 ms  2,790 ms   98.48%
> case-4:  2,564 ms  2,279 ms   88.86%
> case-5:  4,657 ms  4,706 ms  101.04%

1 and 4 look significant -- do the other cases have reproducible
differences or is it just noise?

> Here is a summary of each attached patch:
>
> 0001: Introduce TidStoreIsMemberMulti(), which performs the IsMember check
> for multiple TIDs in one function call. If the given TIDs are sorted (at
> least by block number), we can avoid repeating the radix tree lookup for
> the same page entry.

My only comment is that TidStoreIsMember() is now unused in core (or
maybe just the tests?). It seems like we could just change the API for
it rather than introduce a new function?

> 0003: Use batch TID lookups in btree index bulk-deletion.
>
> In patch 0003, we implement batch TID lookups both for each deduplicated
> index tuple and for all remaining regular index tuples, which seems to be
> the most straightforward approach.

Seems like a good approach. btvacuumpage() needs to sort if there is a
mix of posting tuples and regular index tuples. Was that covered by any
of the tests above?

> While further optimizations are possible, such as performing batch TID
> lookups for all index tuples on a single page, these could introduce
> additional overhead from sorting and re-sorting TIDs. Moreover,
> considering that radix tree lookups are relatively inexpensive, the
> benefits of sorting TIDs and using TidStoreIsMemberMulti() might be
> minimal. Nevertheless, these potential optimizations warrant further
> evaluation to determine their actual impact on performance.

My guess is that always sorting by TID and then back by index tuple
offset is too much overhead to be worth it, but I'm not sure.

--
John Naylor
Amazon Web Services
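To make the API discussion above concrete: here is the existing
single-TID entry point, plus one possible batched shape. The second
prototype is only a guess at what the 0001 patch proposes; its parameter
names, return convention, and actual signature may differ.

/* existing API in src/include/access/tidstore.h */
extern bool TidStoreIsMember(TidStore *ts, ItemPointer tid);

/*
 * Hypothetical batched variant (illustrative only): check ntids TIDs,
 * ideally sorted by block number, in one call, filling ismember[] and
 * returning how many were found.
 */
extern int TidStoreIsMemberMulti(TidStore *ts, ItemPointer tids, int ntids,
                                 bool *ismember);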
On Tue, May 13, 2025 at 2:26 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> Hi,
>
> On 30/04/25 19:36, Masahiko Sawada wrote:
> > Here is a summary of each attached patch:
> >
> > 0001: Introduce TidStoreIsMemberMulti(), which performs the IsMember check
> > for multiple TIDs in one function call. If the given TIDs are sorted (at
> > least by block number), we can avoid repeating the radix tree lookup for
> > the same page entry.
> >
> > 0002: Convert IndexBulkDeleteCallback() to a batched operation.
> >
> > 0003: Use batch TID lookups in btree index bulk-deletion.
> >
> > In patch 0003, we implement batch TID lookups both for each deduplicated
> > index tuple and for all remaining regular index tuples, which seems to be
> > the most straightforward approach. While further optimizations are
> > possible, such as performing batch TID lookups for all index tuples on a
> > single page, these could introduce additional overhead from sorting and
> > re-sorting TIDs. Moreover, considering that radix tree lookups are
> > relatively inexpensive, the benefits of sorting TIDs and using
> > TidStoreIsMemberMulti() might be minimal. Nevertheless, these potential
> > optimizations warrant further evaluation to determine their actual impact
> > on performance.
> >
> > Also, the patch includes batch TID lookup support only for btree indexes,
> > but we could potentially improve other index AMs too.
>
> The code looks good and also +1 for the idea. I just have some small
> points:
> - Maybe it would be good to mention somewhere that the
>   IndexBulkDeleteCallback() callback returns the number of TIDs found
>   to be members of the TidStore?
> - The vac_tid_reaped() docs may need to be updated?

Thank you for looking at the patches. I agree with the above comments.

> I also executed meson tests for each patch individually and the 0002
> patch is not passing on my machine (macOS).
>
> Ok: 39
> Expected Fail: 0
> Fail: 271
> Unexpected Pass: 0
> Skipped: 22
> Timeout: 0
>
> One behaviour that I found by executing the 0002 tests is that it may be
> leaking some shared memory segments. I noticed that because, after
> executing the tests, I tried to re-execute based on master and all tests
> were failing with the "Failed system call was shmget(key=97530599,
> size=56, 03600)" error. I also checked the shared memory segments using
> "ipcs -m" and it returns some segments which are not returned when I
> execute the tests on master (after cleaning up the leaked memory
> segments), and it also doesn't occur when executing based on 0001 or
> 0003.
>
> ~/d/p/batch-tids-lookup-ambulkdelete ❯❯❯ ipcs -m
> IPC status from <running system> as of Tue May 13 18:19:14 -03 2025
> T        ID       KEY        MODE        OWNER    GROUP
> Shared Memory:
> m  18087936  0x05f873bf  --rw-------  matheus  staff
> m  15925250  0x05f966fe  --rw-------  matheus  staff
> m  24248325  0x05f9677e  --rw-------  matheus  staff
> ....
>
> Note that the 0003 patch doesn't have this issue, so in the end we won't
> have a problem with this, I think. But it may be good to mention that,
> although the patches are separate, there is a dependency between them,
> which may cause issues on the buildfarm?

Thank you for the report. With the 0001 and 0002 patches applied, I got a
SEGV. I've fixed this issue in the attached updated version of the
patches. I've confirmed that the patches pass the CI tests, but I'm not
sure whether that fixes the shared memory segment leak problem you
reported.

The attached patches incorporate the comments[1] from John as well.
BTW I found that the constant 'maxblkno' in test_tidstore.sql actually
equals InvalidBlockNumber, not MaxBlockNumber. I don't think it makes
sense for TidStore to use InvalidBlockNumber as a key. The attached 0001
patch fixes it. I think we can fix it separately on HEAD as well as in
the back branches.

Regards,

[1] https://www.postgresql.org/message-id/CANWCAZbiJcwSBCczLfbfiPe1mET+V9PjTZv5VvUBwarLvx1Hfg@mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
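For readers following along, the off-by-one above is easiest to see from
the definitions themselves (quoted from memory from
src/include/storage/block.h; check the header for the authoritative text):

/* from src/include/storage/block.h */
typedef uint32 BlockNumber;

#define InvalidBlockNumber  ((BlockNumber) 0xFFFFFFFF)  /* never a valid block */
#define MaxBlockNumber      ((BlockNumber) 0xFFFFFFFE)  /* largest valid block */

So a test constant of 0xFFFFFFFF is one past the largest block number a
heap can actually contain.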
On Sun, Jun 1, 2025 at 11:01 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Thu, May 1, 2025 at 5:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >          HEAD      PATCHED   DIFF
> > case-1:  3,021 ms  2,818 ms   93.29%
> > case-2:  5,697 ms  5,545 ms   97.34%
> > case-3:  2,833 ms  2,790 ms   98.48%
> > case-4:  2,564 ms  2,279 ms   88.86%
> > case-5:  4,657 ms  4,706 ms  101.04%
>
> 1 and 4 look significant -- do the other cases have reproducible
> differences or is it just noise?

These results are the average of 3 executions, so they are reproducible
differences.

> > Here is a summary of each attached patch:
> >
> > 0001: Introduce TidStoreIsMemberMulti(), which performs the IsMember check
> > for multiple TIDs in one function call. If the given TIDs are sorted (at
> > least by block number), we can avoid repeating the radix tree lookup for
> > the same page entry.
>
> My only comment is that TidStoreIsMember() is now unused in core (or
> maybe just the tests?). It seems like we could just change the API for
> it rather than introduce a new function?

Good point, changed in the latest patch I posted[1].

> > 0003: Use batch TID lookups in btree index bulk-deletion.
> >
> > In patch 0003, we implement batch TID lookups both for each deduplicated
> > index tuple and for all remaining regular index tuples, which seems to be
> > the most straightforward approach.
>
> Seems like a good approach. btvacuumpage() needs to sort if there is a
> mix of posting tuples and regular index tuples. Was that covered by any
> of the tests above?

Good point, I think this case was not covered. I've measured the
performance with the following queries:

# Case-6:
create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select (2 * c - 1) from generate_series(1, ${NROWS}) c;
insert into test select c from generate_series(1, ${NROWS}) c;
create index on test (c);
select pg_relation_size('test') / 8192 as relpages \gset
delete from test where c < (${NROWS} * 0.3)::int;
vacuum test;

And here is the result:

         HEAD      PATCHED   DIFF
case-6:  3,320 ms  3,617 ms  108.94%

I'll consider how to deal with the overhead of sorting TIDs.

> > While further optimizations are possible, such as performing batch TID
> > lookups for all index tuples on a single page, these could introduce
> > additional overhead from sorting and re-sorting TIDs. Moreover,
> > considering that radix tree lookups are relatively inexpensive, the
> > benefits of sorting TIDs and using TidStoreIsMemberMulti() might be
> > minimal. Nevertheless, these potential optimizations warrant further
> > evaluation to determine their actual impact on performance.
>
> My guess is that always sorting by TID and then back by index tuple
> offset is too much overhead to be worth it, but I'm not sure.

Agreed. Given the above test results, it's unlikely that always sorting
the array helps speed things up.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoAv55DhJ%2B19zaemx-_eO7z%2Bu4gtFmeADsMBFqtHhyUySQ%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Fri, Jun 6, 2025 at 6:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> Agreed. Given the above test results, it's unlikely that always sorting
> the array helps speed things up.

Did you try specializing the sort? In my experience, it makes a big
difference.

--
Peter Geoghegan
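For what it's worth, "specializing the sort" in PostgreSQL usually means
generating a type-specific sort with lib/sort_template.h instead of
calling qsort() through a comparator function pointer. A minimal sketch
follows; the generated function name sort_dead_tids is made up, and the
real patch may specialize differently.

/* illustrative only: a TID sort specialized via lib/sort_template.h */
#include "postgres.h"
#include "storage/itemptr.h"

#define ST_SORT sort_dead_tids
#define ST_ELEMENT_TYPE ItemPointerData
#define ST_COMPARE(a, b) ItemPointerCompare(a, b)
#define ST_SCOPE static
#define ST_DEFINE
#include "lib/sort_template.h"

/* usage (dead_tids is an ItemPointerData array of length ndead):
 *     sort_dead_tids(dead_tids, ndead);
 */

Inlining the comparison this way avoids the per-element indirect call that
a generic qsort() pays, which is where the speedup typically comes from.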