Re: Vacuum: allow usage of more than 1GB of work mem - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: Vacuum: allow usage of more than 1GB of work mem
Date
Msg-id CABOikdO_f7thU9MoicB_NGsGJBxRrjjQ2knjyMhCu-pnk=n7mQ@mail.gmail.com
Whole thread Raw
In response to Re: Vacuum: allow usage of more than 1GB of work mem  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Vacuum: allow usage of more than 1GB of work mem  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers


On Wed, Sep 14, 2016 at 8:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:


I am kind of doubtful about this whole line of investigation because
we're basically trying pretty hard to fix something that I'm not sure
is broken.    I do agree that, all other things being equal, the TID
lookups will probably be faster with a bitmap than with a binary
search, but maybe not if the table is large and the number of dead
TIDs is small, because cache efficiency is pretty important.  But even
if it's always faster, does TID lookup speed even really matter to
overall VACUUM performance? Claudio's early results suggest that it
might, but maybe that's just a question of some optimization that
hasn't been done yet.
 
Yeah, I wouldn't worry only about lookup speedup, but if does speeds up, that's a bonus. But the bitmaps seem to win even for memory consumption. As theory and experiments both show, at 10% dead tuple ratio, bitmaps will win handsomely.

 In
theory we could even start with the list of TIDs and switch to the
bitmap if the TID list becomes larger than the bitmap would have been,
but I don't know if it's worth the effort.


Yes, that works too. Or may be even better because we already know the bitmap size requirements, definitely for the tuples collected so far. We might need to maintain some more stats to further optimise the representation, but that seems like unnecessary detailing at this point.
 

On the other hand, if we switch to the bitmap as the ONLY possible
representation, we will lose badly when there are scattered updates -
e.g. 1 deleted tuple every 10 pages. 

Sure. I never suggested that. What I'd suggested is to switch back to array representation once we realise bitmaps are not going to work. But I see it's probably better the other way round.

 
So it seems like we probably
want to have both options.  One tricky part is figuring out how we
switch between them when memory gets tight; we have to avoid bursting
above our memory limit while making the switch.

Yes, I was thinking about this problem. Some modelling will be necessary to ensure that we don't go (much) beyond the maintenance_work_mem while switching representation, which probably means you need to do that earlier than necessary.
 
For instance, one idea to grow memory usage incrementally would be to
store dead tuple information separately for each 1GB segment of the
relation.  So we have an array of dead-tuple-representation objects,
one for every 1GB of the relation.  If there are no dead tuples in a
given 1GB segment, then this pointer can just be NULL.  Otherwise, it
can point to either the bitmap representation (which will take ~4.5MB)
or it can point to an array of TIDs (which will take 6 bytes/TID).
That could handle an awfully wide variety of usage patterns
efficiently; it's basically never worse than what we're doing today,
and when the dead tuple density is high for any portion of the
relation it's a lot better.


Yes seems like a good idea. Another idea that I'd in mind is to use some sort of indirection map where bitmap for every block or a set of blocks will either be recorded or not, depending on whether a bit is set for the range. If the bitmap exists, the indirection map will give out the offset into the larger bitmap area. Seems similar to what you described.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Next
From: Andres Freund
Date:
Subject: Re: Logical Replication WIP