Re: VACUUM memory management - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: VACUUM memory management
Msg-id 20200405202625.GC2228@telsasoft.com
In response to Re: VACUUM memory management  (Ibrar Ahmed <ibrar.ahmad@gmail.com>)
List pgsql-hackers
On Wed, Dec 11, 2019 at 09:29:17PM +0500, Ibrar Ahmed wrote:
> > Did you modify Claudio's patch or write a totally new one?
> 
> I wrote a completely new patch.  I tried multiple techniques, like using a
> list instead of a fixed-size array, which I thought was most suitable here,
> but dropped that because of conflicts with Parallel Vacuum.

Using a list will hardly work, or at least not well, since the dead-tuple
storage has to be searched by the ambulkdelete callback.
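To illustrate why: for every index tuple, the ambulkdelete callback
(lazy_tid_reaped() in PostgreSQL) must decide whether that TID is in the
dead-tuple storage.  A sorted array supports binary search; a linked list
forces a linear scan.  A rough sketch, using a simplified stand-in for
ItemPointerData (the struct layout and names here are illustrative, not
PostgreSQL's actual ones):

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for PostgreSQL's ItemPointerData. */
typedef struct {
    uint32_t block;   /* heap block number */
    uint16_t offset;  /* line pointer within the block */
} DeadTid;

static int
tid_cmp(const void *a, const void *b)
{
    const DeadTid *x = a, *y = b;
    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    if (x->offset != y->offset)
        return (x->offset < y->offset) ? -1 : 1;
    return 0;
}

/*
 * What the ambulkdelete callback must do for every index tuple:
 * binary-search the (sorted) dead-tuple array for the given TID.
 */
static int
tid_is_dead(const DeadTid *dead, size_t ndead, DeadTid key)
{
    return bsearch(&key, dead, ndead, sizeof(DeadTid), tid_cmp) != NULL;
}
```

With millions of dead TIDs and one lookup per index tuple, the O(log n)
search is the whole point; a list would make each lookup O(n).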

> >> If you wrote a totally new one, have you compared your work with
> >> Claudio's, to see if he covered
> >> anything you might need to cover?
> >
> I checked the patch, and it does not do anything special that my patch is
> not doing, except one thing: it also claims to raise the 1GB limit, which I
> have not touched.  In my case we stay under the maintenance_work_mem limit
> but allocate memory in chunks.  That way you can set a large
> maintenance_work_mem (even if you don't need it) because the memory will
> not all be allocated at the start.
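The chunked-allocation scheme described above might be sketched like this
(simplified, hypothetical types and names, not the actual patch; the growth
increment and budget handling are assumptions):

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t block; uint16_t offset; } DeadTid; /* simplified TID */

/*
 * Hypothetical chunked store: respects a budget derived from
 * maintenance_work_mem, but only commits memory as dead tuples accumulate,
 * rather than allocating the whole budget up front.
 */
typedef struct {
    DeadTid *tids;
    size_t   ntids;       /* tuples recorded so far */
    size_t   allocated;   /* slots currently allocated */
    size_t   max_tuples;  /* budget from maintenance_work_mem */
    size_t   chunk;       /* growth increment, in tuples */
} DeadTidStore;

/* Returns 1 on success, 0 when the budget is exhausted (caller must
 * run the index-cleanup pass and reset the store). */
static int
record_dead_tid(DeadTidStore *s, DeadTid t)
{
    if (s->ntids == s->allocated) {
        size_t want = s->allocated + s->chunk;
        if (want > s->max_tuples)
            want = s->max_tuples;
        if (s->ntids == want)
            return 0;           /* budget exhausted */
        DeadTid *p = realloc(s->tids, want * sizeof(DeadTid));
        if (p == NULL)
            return 0;
        s->tids = p;
        s->allocated = want;
    }
    s->tids[s->ntids++] = t;
    return 1;
}
```

The upside, as the mail says, is that a large maintenance_work_mem setting
costs nothing until VACUUM actually accumulates that many dead tuples.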

After spending a bunch of time comparing them, I disagree.  Claudio's patch
does these:

 - avoid using multiple chunks if there are no indexes, since then there are
   no costly index scans to avoid;
 - rather than doing an index scan for each chunk (bad), the callback function
   lazy_tid_reaped() does a custom binary search *over* chunks of different
   sizes and then *within* each chunk.  That's maybe slightly over-engineered,
   and I'm not convinced it's needed (though I thought it was pretty clever),
   but someone thought it was important.
 - properly keep track of the *total* number of dead tuples, e.g. for
   progress reporting, and for prev_dead_count for pages with no dead tuples;
 - lazy_record_dead_tuple() doubles the allocation when running out of space
   for dead tuples; some people (myself included) disagree with that, but I'm
   including it here since that's what it does.  This still seems nontrivial
   (to me) to adapt to work with parallel query.
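The two-level lookup in the second bullet, binary search over chunks and
then within the selected chunk, might look roughly like this (simplified,
illustrative types; not the patch's actual code):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t block; uint16_t offset; } DeadTid; /* simplified TID */

/* Each chunk holds a sorted run of TIDs; chunks themselves are in TID order. */
typedef struct {
    DeadTid *tids;
    size_t   ntids;
} TidChunk;

static int
tid_cmp(DeadTid a, DeadTid b)
{
    if (a.block != b.block)   return (a.block < b.block) ? -1 : 1;
    if (a.offset != b.offset) return (a.offset < b.offset) ? -1 : 1;
    return 0;
}

/*
 * Outer binary search: find the first chunk whose last TID is >= key.
 * Inner binary search: look for the key within that chunk.  This is the
 * two-level search the callback performs, rather than scanning every chunk.
 */
static int
tid_is_dead(const TidChunk *chunks, size_t nchunks, DeadTid key)
{
    size_t lo = 0, hi = nchunks;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const TidChunk *c = &chunks[mid];
        if (c->ntids == 0 || tid_cmp(c->tids[c->ntids - 1], key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == nchunks)
        return 0;                       /* key is past every chunk */

    const TidChunk *c = &chunks[lo];
    size_t l = 0, h = c->ntids;
    while (l < h) {
        size_t m = l + (h - l) / 2;
        int cmp = tid_cmp(c->tids[m], key);
        if (cmp == 0) return 1;
        if (cmp < 0)  l = m + 1;
        else          h = m;
    }
    return 0;
}
```

Whether this beats concatenating the chunks into one array before the index
pass is exactly the over-engineering question raised above.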

-- 
Justin


