Re: [PATCHES] Including Snapshot Info with Indexes - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: [PATCHES] Including Snapshot Info with Indexes
Date
Msg-id 1193145947.17735.33.camel@hannu-laptop
Whole thread Raw
In response to Re: [PATCHES] Including Snapshot Info with Indexes  ("Gokulakannan Somasundaram" <gokul007@gmail.com>)
Responses Re: [PATCHES] Including Snapshot Info with Indexes
List pgsql-hackers
Ühel kenal päeval, T, 2007-10-23 kell 18:36, kirjutas Gokulakannan
Somasundaram:

>         
>         There are several advantages to keeping a separate visibility
>         heap:
>         
>         1) it is usually higly compressible, at least you can throw
>         away 
>         cmin/cmax quite soon, usually also FREEZE and RLE encode the
>         rest.
>         
>         2) faster access, more tightly packed data pages.
>         
>         3) index-only scans
>         
>         4) superfast VACUUM FREEZE
>         
>         5) makes VACUUM faster even for worst cases (interleaving live
>         and dead 
>         tuples)
>         
>         6) any index scan will be faster due to fetching only visible
>         rows from
>         main heap.
> 
> if you have to store the visibility fields of all the tuples of each
> table, then you may not be able to accomodate in the cache. Say if a
> table is  of 1 million rows, we would need 22 MB of  visibility
> space(since visibility info takes 16 bytes. I think if we have to link
> it with say tuple-id(6 Bytes). 

You can keep the visibility info small, by first dropping cmin/cmax and
then FREEZ'ing the tuples (setting xmin to special value), after that
you can replace a lot of visibility info tuples with single RLE encoded
tuple, which simply states, that tuples N:A to M:B are visible.

If that 1 million row table is mostly static, the static parts will soon
have (al lot) less than 1 bit in visibility heap.

For example, after vacuum there will be just one visibility info which
say that whole table is visible.

I envision HOT-like on-the-fly VACUUM FREEZE manipulations of visibility
info so it won't grow very big at all.

> I think we may need to link it with indexes with one more id. i am not
> counting that now). 

why ?

we will keep visibility info for ctids (PAGE:NR) and if we need to see,
if any ctid pointe from index points to a visible tuple we check it
based on that ctid.

>  If we have  10 tables, then we will have 220 MB.  Keeping them pinned
> in memory may not be advisable in some circumstances. 

no no! no pinning, the "mostly in cache" will happen automatically (and
I mean mostly in processors _internal_ L1 or L2 cache, not just in RAM)

> If it is not going to be in memory, then that is no different from
> referring a table. But i accept that is a concept worth trying out. I
> think the advantage with thick indexes comes with the fact, that it is
> optional. If we can make this also as optional, that would be better. 
> But if we are going to suggest it as a replacement of DSM, then it
> loses the advantage of being small. 

I agree that a single-purpose DSM can be made smaller than multi-purpose
visibility heap.


------------------
Hannu




pgsql-hackers by date:

Previous
From: "Heikki Linnakangas"
Date:
Subject: Re: [PATCHES] Including Snapshot Info with Indexes
Next
From: Tom Lane
Date:
Subject: Re: Feature Freeze date for 8.4