Boundary value check in lazy_tid_reaped() - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Boundary value check in lazy_tid_reaped()
Date
Msg-id CA+fd4k76j8jKzJzcx8UqEugvayaMSnQz0iLUt_XgBp-_-bd22A@mail.gmail.com
Whole thread Raw
Responses Re: Boundary value check in lazy_tid_reaped()  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Hi all,

In the current lazy vacuum implementation, some index AMs such as
btree indexes call lazy_tid_reaped() for each index tuples during
ambulkdelete to check if the index tuple points to the (collected)
garbage tuple. In that function, we simply call bsearch(3) but we
should be able to know the result without bsearch(3) if the index
tuple points to the heap tuple that is out of range of the collected
garbage tuples. I've checked whether bsearch(3) does the boundary
value check or not but looking at the bsearch(3) implementation of
FreeBSD libc[1], there is not. The same is true for glibc.

So my proposal is to add boundary value check in lazy_tid_reaped()
before executing bsearch(3). This will help when index vacuum happens
multiple times or when garbage tuples are concentrated to a narrow
range.

I’ve done three performance tests. maintenance_work_mem is set to 64MB
with which we can collect about 11 million tuples at maximum and the
table size is 10GB. The table has one btree index. Here is the average
of index vacuum (ambulkdelete) execution time of three trials:

* Test case 1

I made all pages dirty with 15 million garbage tuples to invoke index
vacuum twice.

HEAD: 164.7 s
Patched: 138.81 s

* Test case 2

I made all pages dirty with 10 million garbage tuples to invoke index
vacuum only once, checking the performance degradation.

HEAD: 127.8 s
Patched: 128.25 s

* Test case 3

I made a certain range of table dirty with 1 million garbage tuples.

HEAD: 31.07 s
Patched: 9.41 s

I thought that we can have a generic function wrapping bsearch(3) that
does boundary value checks and then does bsearch(3) so that we can use
it in other similar places as well. But the attached patch doesn't do
that as I'd like to hear opinions on the proposal first.

Feedback is very welcome.

Regards,

[1] https://svnweb.freebsd.org/base/head/lib/libc/stdlib/bsearch.c?revision=326025&view=markup

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Next
From: "Andrey M. Borodin"
Date:
Subject: Re: Yet another fast GiST build (typo)