Re: Visibility map thoughts - Mailing list pgsql-hackers

From Gokulakannan Somasundaram
Subject Re: Visibility map thoughts
Date
Msg-id 9362e74e0711050550o464a458foce2890909fbb2845@mail.gmail.com
Whole thread Raw
In response to Visibility map thoughts  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: Visibility map thoughts  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
List pgsql-hackers
This is my feedback regarding the Visibility map.

I just want to disagree on the fact that the DSM/Visibility map would not focus on Vacuum. I think Index only scans should be thought of as a fringe benefit of DSMs/ Visibility Maps. There are some basic assumptions in the design, which says that
a) The inserts won't increase the size of the table. If it increases, it has to lock one full page of Visibility map and this is not suitable for tables, which are short-lived like partitioned tables
b) Even if the inserts don't increase the size of the table, it might make DSM useless, if lot of inserts keep converting the all-visible ones to uncertain ones. For that matter, even the Deletes and Updates are also going to make lot of pages into uncertain ones.
c) Visibility map gets useless, when there is a long running batch query / periodic background queries which run for longer times
d) More updates- more blocks of uncertainity - space usage by DSM and the reference made to DSM is just an overhead
e) Lot of times, people may not need index-only scans. Again this gets to be a overhead
f) If there are scheduled reboots, the DSM crashes and periodic slow-downs in the queries during the time, the DSM gets re-constructed.

I am not opposing this, as it is a redundant feature for Thick indexes. After all every one of us, want Postgres to be the fastest one in the world.

But because DSM has a inherent assumption that lot of tables will become static and all the tuples would be visible to everyone. If there are such tables, then definitely Thick index becomes a overhead in terms of space. But DSM should not become overhead at any cost, as it is a memory resident one at all times and also always gets into the lifecycle of a query. Only way to achieve it is to make it a dual purpose one. It should help Vacuum, freezing and visibility checks.

Thanks,
Gokul.

On 11/5/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:
Though 8.3 isn't out of the oven just yet, I've been thinking about the
dead space map a lot, and decided I have to start writing down those
thoughts.

Reducing VACUUM time is important, but the real big promise is the
ability to do index-only-scans. Because that's the main focus of this
exercise, I'm calling it the the Visibility Map from now on, because
it's not about tracking dead space, but tuple visibility in general.
Don't worry, reduced VACUUM times on read-mostly tables with hot spots
will still fall out of it.



What to store in the Visibility Map?
------------------------------------

Each potential user of the visibility map needs slightly different
information:

a) VACUUM needs to know if a page has enough dead tuples that it's
interesting to visit.
b) VACUUM FREEZE needs to know if a page has any non-frozen tuples that
can be frozen.
c) Index-only-scans needs to know if all tuples on a page are visible to
all transactions. HOT updates are OK, though.

From this point on, this document concentrates on a map suited for
index-only-scans.

The basic structure is a bitmap with one bit per heap page. A set bit
means "all tuples on heap page are visible to all transactions". A
cleared bit means that we don't know if that's true or not.

This kind of map is useful for VACUUM as well, though VACUUM could get
away with less strict semantics, and we might not want to visit a page
in VACUUM if there's only very few dead tuples on a page. What this
implementation gives us is a conservative VACUUM that will always find
all the same dead tuples as a regular VACUUM (except for HOT updated
tuples which can be pruned without scanning indexes). VACUUM using the
visibility map can't update stats, so we'd still want regular vacuums
every now and then, or rely on ANALYZE to do that.

It's not useful for VACUUM FREEZE, unless we're willing to freeze much
more aggressively, and change the meaning of a set bit to "all tuples on
heap page are frozen".

Other kinds of visibility maps could be useful in some narrow scenarios.
For example, if we had a map of pages that have no live tuples, we could
skip those pages in sequential scans. We could also use more than one
bit per page to store more information, but let's keep it simple for now.


Where to store the visibility map?
----------------------------------
a) In a fixed size shared memory struct. Not acceptable.

b) In the special area of every nth heap page. I played a bit with this
back in February 2006, because it's simple and requires few changes.

c) As a new relation kind, like toast tables.

d) As an auxiliary smgr relation, attached to the heap.

I'm leaning towards D at the moment. It requires a little bit of changes
to the buffer manager API and elsewhere, but not too much.

B is not good because we might want to have other kinds of auxiliary
files in the future (FSM in particular). And it is a bit hacky anyway.
Let's do something more generic.

C doesn't seem like the right approach, because a visibility map doesn't
really have much in common with "real" relations.

In any case, the bitmap is divided into pages, and the pages live in
shared_buffers, and are managed by the buffer manager like any other page.

Updating the visibility map
---------------------------

A single page in the visibility map covers a wide range of heap pages
((8192 - page header) * 8 = ~65000 heap pages). It can easily become
locking bottleneck, if every update on any page in that range needs to
lock the visibility map page.

Setting a bit is just a hint. It's ok to lose it on crash. However,
setting a bit mustn't hit the disk too early. What might otherwise
happen is that the change that made all tuples on a page visible, like
committing an inserting transaction, isn't replayed after crash, but the
set bit is already on disk. In other words, setting a bit must set the
LSN of the visibility map page.

Clearing a bit is the opposite. Clearing a bit mustn't be lost, but it's
ok if it hits the disk before the operation that cleared it. Therefore
we don't need to set the LSN, but it must be redone at WAL replay of
heap insert/update/delete.

Because a concurrent update on the same word might be lost if we didn't
lock the page at all, we need to lock the visibility map page to set or
clear a bit. Since these are quick operations, especially clearing a
bit, perhaps we could use just the buffer header spinlock.

To further reduce contention, we can have a copy of the bit in the page
header of the heap page itself. That way we'd only need to access the
visibility map on the first update on a page that clears a bit.

When to update the visibility map?
----------------------------------
- Clear bit on every update/insert/delete.
- Set bit in vacuum or prune, if all tuples on page are now visible to
all transactions.
- If prune sees that all tuples are LIVE or INSERT_IN_PROGRESS, we can't
set the bit yet, but as soon as the maximum xmin on the page <
OldestXmin, we can. Call PageSetPrunable accordingly, to prune again
after that happens.
- We don't need to clear the bit on HOT updates, because by definition
none of the indexed columns changed.




Next we need to figure out how to actually do the index-only-scans,
using the visibility map. And how to use it for VACUUMs; Itagaki's dsm
patch addressed that, and assuming it does what we want, that part of
the patch is still usable.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match



--
Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Fwd: Clarification about HOT
Next
From: "Gokulakannan Somasundaram"
Date:
Subject: Re: Fwd: Clarification about HOT