Re: Visibility map thoughts - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Visibility map thoughts
Msg-id 472F98CE.1020600@enterprisedb.com
In response to Re: Visibility map thoughts  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Visibility map thoughts  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Though 8.3 isn't out of the oven just yet, I've been thinking about the 
>> dead space map a lot, and decided I have to start writing down those 
>> thoughts.
> 
> I think we should do this at the same time as pushing the FSM out of
> shared memory, and design a data structure that serves both needs.

I'm afraid the data structure required for the FSM is very different.

For the visibility map, we need something that can be quickly addressed 
by heap block number. For the FSM, we need something that's addressable by 
tuple size.

The cache locality of having the FSM and the visibility map in the same 
structure is also not very good. The visibility map is accessed by reads, 
and by updates/inserts/deletes on pages that previously had a set bit in 
the map, while the FSM is only needed for cold updates and deletes.

>> Where to store the visibility map?
>> ----------------------------------
> 
>> a) In a fixed size shared memory struct. Not acceptable.
> 
>> b) In the special area of every nth heap page. I played a bit with this 
>> back in February 2006, because it's simple and requires few changes.
> 
>> c) As a new relation kind, like toast tables.
> 
>> d) As an auxiliary smgr relation, attached to the heap.
> 
>> I'm leaning towards D at the moment. It requires a little bit of changes 
>> to the buffer manager API and elsewhere, but not too much.
> 
> I like B.  The main objection I have to D is that it'll be far more
> invasive to the buffer manager (and a bunch of other code) than you
> admit; for starters RelFileNode won't work as-is.

One problem is that you have to atomically update the visibility map 
when you update the heap. That means that you have to lock the 
visibility map page and the heap page at the same time. If the 
visibility map is in the heap, you need to take care that you don't 
deadlock.

We can't directly use B for the FSM, because we also need an FSM for 
indexes, so we'll need something else for them anyway.

B is also problematic if we ever get to do upgrades without 
dump/restore, because it requires changes to the format of the heap itself.

> What I'm envisioning is that we dedicate a quarter or half of every n'th
> heap page to visibility + free space map.  Probably a byte per page is
> needed (this would give us 1 visibility bit and 7 bits for free space,
> or other tradeoffs if needed).  So there would be 2K or 4K heap pages
> associated with each such special page.  Or we could dial it down to
> even less, say 1K heap pages per special page, which would have the
> advantage of reducing update contention for those pages.

How would you search that for a page with X bytes of free space?

> I don't think this really works.  You are effectively assuming that no
> kind of crash-induced corruption can set a bit that you didn't intend
> to set.  

Don't we already assume that for hint-bit updates?

> Stated in those terms it seems obviously bogus.  What we will
> need is that every WAL-logged update operation also include the
> appropriate setting or clearing of the page's visibility bit; where
> necessary, add new information to the WAL trace to allow this.

I think we already emit a WAL record whenever we would set the bit in 
the visibility map, so we can certainly do that. It was more an 
interesting observation than anything else.

>> To further reduce contention, we can have a copy of the bit in the page 
>> header of the heap page itself. That way we'd only need to access the 
>> visibility map on the first update on a page that clears a bit.
> 
> Seems exceedingly prone to corruption.

It does?

It seems fairly simple to keep it up-to-date. Whenever we update the 
visibility map, we're holding the heap page locked as well.

If you were thinking of hardware failure, I don't see why that bit would 
be any more susceptible to it, and the potential damage isn't any bigger 
than from any other random bit-flip either.

We can double-check and correct both the bit in the page header and in 
the visibility map whenever we perform a regular vacuum (or prune), if 
they did get corrupt for some reason.

>> - We don't need to clear the bit on HOT updates, because by definition 
>> none of the indexed columns changed.
> 
> Huh?  I don't think I believe that, and I definitely don't believe your
> argument for it.

HOT-updating a row doesn't change what an index-only scan would see. An 
index-only scan only needs columns that are in the index, and since it's 
a HOT update, none of those columns have changed, so it doesn't matter 
which tuple we're returning them from.

Pages that have HOT-updated tuples, but haven't otherwise been modified 
since the last vacuum, are also not interesting for VACUUM, because a 
prune will do the job of getting rid of dead HOT-updated tuples just as 
well.

Am I missing something?


Another thought: Should we support having a FSM and visibility map on 
temp tables? Doesn't seem very useful, but why not, I guess.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
