Re: Minor optimizations in lazy_scan_heap - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Minor optimizations in lazy_scan_heap
Msg-id CA+TgmobXdhS+-xt=knjYz0QoaqLKv1zEpEm1WbS2ctpFOe0N2g@mail.gmail.com
In response to Minor optimizations in lazy_scan_heap  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Minor optimizations in lazy_scan_heap
List pgsql-hackers
On Mon, Dec 3, 2012 at 1:23 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> I was looking at the code in lazy_scan_heap() and I realized there are
> a couple of low-hanging optimizations that we can do there.
>
> 1. The for-loop walks through each block of the relation. But if scan_all is
> set to false, which would be the case most often, we can jump over to the
> next not-all-visible block directly (after considering the
> SKIP_PAGES_THRESHOLD, etc.). I understand that the cost of looping over
> no-op iterations may not be significant, but it looks unnecessary. And it
> can matter when there are thousands or millions of consecutive all-visible
> blocks in a large table.
>
> 2. We also do a visibilitymap_test() for each block. I think it will be more
> prudent to have a visibilitymap API, say visibilitymap_test_range(), which
> can take a range of blocks and return the first not-all-visible block from
> the range. Internally, the function can then test several blocks at a time.
> We can still do this without holding a lock on the VM buffer because when
> scan_all is false, we don't care much about the correctness of the
> visibility check anyway. Also, this function can later be optimized if we
> start saving some summary information about visibility maps, in which case
> we could find the first not-all-visible block more efficiently.
>
> 3. I also thought that the call to vacuum_delay_point() for every visibility
> check is not required and a simple CHECK_FOR_INTERRUPTS would be good
> enough. Later I realized that maybe we need it because the visibility map
> check can do I/O for the VM page. But if we do 2, then we can at least
> limit vacuum_delay_point() to being called once per VM page, instead of once
> per bit. I concede that the cost of calling vacuum_delay_point() may not be
> too high, but it again looks unnecessary and can be taken care of by a slight
> refactoring of the code.
>
> Comments? Does anyone think any or all of the above are useful?
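Item 1 could look roughly like the following. This is a minimal, self-contained sketch, not code from vacuumlazy.c: the visibility map is modelled as a plain bool array, next_block_to_scan() is a hypothetical helper, and SKIP_PAGES_THRESHOLD is given an assumed value of 32. A run of all-visible blocks is jumped over only when it reaches the threshold; shorter runs are still visited.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed value for illustration; the real constant may differ. */
#define SKIP_PAGES_THRESHOLD 32

/*
 * Hypothetical helper: given the block we are about to examine, return the
 * next block the scan should actually process.  A run of all-visible blocks
 * starting at blkno is jumped over only if it is at least
 * SKIP_PAGES_THRESHOLD blocks long; otherwise blkno itself is processed.
 */
static size_t
next_block_to_scan(const bool *all_visible, size_t nblocks, size_t blkno)
{
    size_t run_end = blkno;

    /* Find the end of the run of all-visible blocks starting at blkno. */
    while (run_end < nblocks && all_visible[run_end])
        run_end++;

    /* Long run: go straight to the first not-all-visible block (or the end). */
    if (run_end - blkno >= SKIP_PAGES_THRESHOLD)
        return run_end;

    /* Short run, or blkno needs vacuuming anyway: process blkno. */
    return blkno;
}
```

A caller would drive the loop as `for (blk = next_block_to_scan(vm, n, 0); blk < n; blk = next_block_to_scan(vm, n, blk + 1))`, so blocks inside a long all-visible run are never iterated over at all, rather than being visited and skipped one at a time.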
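For item 2, here is a hedged sketch of what a range-testing helper might do internally. vm_first_not_all_visible() is a hypothetical name standing in for the proposed visibilitymap_test_range(); it assumes one bit per heap block (bit set means all-visible, least significant bit first), works on an in-memory byte array, and ignores buffer pinning and map-page boundaries entirely. The point is only that a byte equal to 0xFF covers eight all-visible blocks in a single comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: one bit per heap block, bit set = all-visible. */
#define BLOCKS_PER_BYTE 8

static int
block_all_visible(const uint8_t *map, size_t blk)
{
    return (map[blk / BLOCKS_PER_BYTE] >> (blk % BLOCKS_PER_BYTE)) & 1;
}

/*
 * Hypothetical helper: return the first block in [start, end) that is NOT
 * all-visible, or end if every block in the range is all-visible.
 */
static size_t
vm_first_not_all_visible(const uint8_t *map, size_t start, size_t end)
{
    size_t blk = start;

    /* Test bit by bit until blk reaches a byte boundary. */
    while (blk < end && blk % BLOCKS_PER_BYTE != 0)
    {
        if (!block_all_visible(map, blk))
            return blk;
        blk++;
    }

    /* Skip whole bytes whose eight blocks are all-visible. */
    while (blk + BLOCKS_PER_BYTE <= end && map[blk / BLOCKS_PER_BYTE] == 0xFF)
        blk += BLOCKS_PER_BYTE;

    /* Test bit by bit through a partially set byte or the tail. */
    while (blk < end)
    {
        if (!block_all_visible(map, blk))
            return blk;
        blk++;
    }
    return end;
}
```

A real implementation could compare wider words instead of single bytes, or consult the kind of summary information mentioned above, to skip even faster.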
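Item 3 amounts to calling the delay point only when the scan crosses into a new visibility-map page, since that is when a VM page read could occur. In the sketch below, fake_delay_point() is a hypothetical stand-in for vacuum_delay_point() that merely counts calls, and BLOCKS_PER_VM_PAGE is an illustrative value; the real number of heap blocks per map page depends on BLCKSZ and the page header size.

```c
#include <stddef.h>

/* Illustrative only: heap blocks covered by one visibility-map page. */
#define BLOCKS_PER_VM_PAGE 16

static int delay_calls;   /* counts calls to the stand-in delay point */

static void
fake_delay_point(void)
{
    delay_calls++;
}

/*
 * Walk blocks [0, nblocks) and invoke the stand-in delay point once per
 * visibility-map page rather than once per block.
 */
static void
scan_with_per_vm_page_delay(size_t nblocks)
{
    size_t last_vm_page = (size_t) -1;   /* no map page seen yet */

    for (size_t blk = 0; blk < nblocks; blk++)
    {
        size_t vm_page = blk / BLOCKS_PER_VM_PAGE;

        /* Only crossing into a new map page can trigger a VM page read. */
        if (vm_page != last_vm_page)
        {
            fake_delay_point();
            last_vm_page = vm_page;
        }
        /* ... test this block's visibility bit here ... */
    }
}
```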

I doubt that any of these things make enough difference to be worth
bothering with, but if you have benchmark results suggesting otherwise
I'm all ears.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


