Re: [PATCHES] Including Snapshot Info with Indexes - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [PATCHES] Including Snapshot Info with Indexes
Msg-id 471DF44A.9030605@enterprisedb.com
In response to Re: [PATCHES] Including Snapshot Info with Indexes  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers
Hannu Krosing wrote:
> I would suggest that you use just an additional heap with decoupled
> visibility fields as DSM.

Yeah, I remember you've suggested that before, and I haven't responded
thus far. The problems I see with that approach are:

1) How do you know which visibility info corresponds to which heap tuple?
You'd need to have a pointer from the visibility info to the heap tuple,
and from the heap tuple to the visibility info. Which increases the
total (uncompressed) storage size.

2) If the visibility info / heap ordering isn't the same, seqscans need
to do random I/O.

3) If you need to do regular index scans, you're going to have to access
the index, the heap and the visibility info separately, and in that
order. That sounds expensive.

4) It's a big and complex change.

The significance of 2 and 3 depends a lot on how much of the visibility
information is in cache.
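
A back-of-the-envelope sketch of the storage cost in point 1. It assumes each direction of the link costs one 6-byte TID-style pointer (4-byte block number plus 2-byte offset) and ignores alignment padding. The function name and tuple counts are made up for illustration.

```python
# Rough estimate of the extra storage needed to link heap tuples and
# decoupled visibility entries in both directions.
# Assumption: each link is a 6-byte TID-style pointer; padding is ignored.

POINTER_BYTES = 6


def link_overhead_bytes(num_tuples: int, pointer_bytes: int = POINTER_BYTES) -> int:
    """Bytes spent on heap->visibility plus visibility->heap pointers."""
    return num_tuples * 2 * pointer_bytes


if __name__ == "__main__":
    for n in (1_000_000, 100_000_000):
        mib = link_overhead_bytes(n) / (1024 * 1024)
        print(f"{n:>11,} tuples: {mib:,.1f} MiB of pointers")
```

Whether that overhead matters depends on how well the rest of the visibility data compresses.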

> For a large number of usage scenarios this will be highly compressible
> and will mostly stay in processor caches.

This seems to be where the potential gains are coming from in this
scheme. It boils down to how much compression you can do, and how
expensive it is to access the information in compressed form.

> 1) it is usually highly compressible, at least you can throw away
> cmin/cmax quite soon, usually also FREEZE and RLE encode the rest.

If you RLE compress the data, you'll need to figure out what to do when
you need to update a field and it doesn't compress as well anymore. You
might have to move things around pages, so you'll have to update any
pointers to that information atomically.
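
A toy sketch of the RLE concern, assuming a simple list of (value, run_length) pairs. This is not a proposed on-disk format, and the xmax values are invented. Updating a single field in the middle of a run splits it, so the encoded form grows.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (value, run_length)


def rle_encode(values: List[int]) -> List[Run]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand (value, run_length) pairs back into a flat list."""
    return [v for v, n in runs for _ in range(n)]


def rle_update(runs: List[Run], index: int, new_value: int) -> List[Run]:
    """Change one field and re-encode; the result may have more runs."""
    values = rle_decode(runs)
    values[index] = new_value
    return rle_encode(values)


if __name__ == "__main__":
    xmax = [0] * 8                      # hypothetical: no tuple deleted yet
    before = rle_encode(xmax)
    after = rle_update(before, 3, 1234)  # tuple 3 deleted by xid 1234
    print(len(before), "run(s) before,", len(after), "run(s) after")
```

If the grown encoding no longer fits on its page, entries have to move, and that is where the atomic pointer updates come in.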

> 2) faster access, more tightly packed data pages.

But you do need to access the visibility information as well, at least
on tuples that match the query.

> 5) makes VACUUM faster even for worst cases (interleaving live and dead
> tuples)

Does it? You still need to go to the heap pages to actually remove the
dead tuples. I suppose you could skip that and do it the first time you
access the page, like we do pruning with HOT.
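
A minimal sketch of that deferred approach, as a toy model only: the class and function names are hypothetical and nothing here reflects actual PostgreSQL code. "VACUUM" records dead offsets in the separate visibility structure without touching the heap page, and the page is pruned the first time it is read.

```python
from typing import Dict, Set


class ToyHeapPage:
    """A heap page reduced to a mapping of line-pointer offset -> payload."""

    def __init__(self, tuples: Dict[int, str]) -> None:
        self.tuples = dict(tuples)
        self.writes = 0  # how many times the page has been modified

    def read(self, dead: Set[int]) -> Dict[int, str]:
        """Prune tuples flagged dead, then return what is left."""
        doomed = [off for off in self.tuples if off in dead]
        for off in doomed:
            del self.tuples[off]
        if doomed:
            self.writes += 1
            dead.difference_update(doomed)  # these flags are now consumed
        return dict(self.tuples)


def toy_vacuum(dead: Set[int], newly_dead: Set[int]) -> None:
    """Record dead offsets in the visibility structure only."""
    dead.update(newly_dead)


if __name__ == "__main__":
    page = ToyHeapPage({1: "a", 2: "b", 3: "c"})
    dead: Set[int] = set()
    toy_vacuum(dead, {2})
    print("writes after vacuum:", page.writes)
    print("first read returns:", page.read(dead))
    print("writes after read:", page.writes)
```

The cost of removing the tuples doesn't go away; it moves from VACUUM to whichever backend touches the page next.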

> 6) any index scan will be faster due to fetching only visible rows from
> main heap.

Assuming the visibility information is already in cache, and that
there are enough non-visible tuples for that to matter.
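
A rough sketch of that trade-off, assuming a hypothetical cost model that simply counts heap fetches and visibility lookups. The names and the (block, offset) TIDs are illustrative. Checking visibility first saves one heap fetch per non-visible match, but costs one visibility lookup per match.

```python
from typing import Iterable, Set, Tuple

Tid = Tuple[int, int]  # (block number, offset)


def scan_cost(matching: Iterable[Tid], visible: Set[Tid],
              precheck: bool) -> Tuple[int, int]:
    """Return (heap_fetches, visibility_lookups) for one index scan."""
    matches = list(matching)
    if not precheck:
        # Every index match goes to the heap; visibility is checked there.
        return len(matches), 0
    # Consult the separate visibility info first, fetch only visible tuples.
    heap_fetches = sum(1 for tid in matches if tid in visible)
    return heap_fetches, len(matches)


if __name__ == "__main__":
    matches = [(0, i) for i in range(1, 11)]
    mostly_visible = set(matches[:9])   # only one dead tuple
    half_visible = set(matches[:5])     # half of the matches are dead
    for label, vis in (("1 dead", mostly_visible), ("5 dead", half_visible)):
        print(label, "without precheck:", scan_cost(matches, vis, False),
              "with precheck:", scan_cost(matches, vis, True))
```

With only a few dead tuples, the extra lookups buy almost nothing unless the visibility data is already in cache.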

>> BTW, another issue you'll have to tackle, that a DSM-based patch will
>> have to solve as well, is how to return tuples from an index. In b-tree,
>> we scan pages page at a time, keeping a list of all tids that match the
>> scanquals in BTScanOpaque. If we need to return not only the tids of the
>> matching tuples, but the tuples as well, where do we store them? You
>> could make a palloc'd copy of them all, but that seems quite expensive.
> 
> Have you considered returning them as "already visibility-checked pages"
> similar to what views or set-returning functions return ?

Sorry, I don't understand what you mean by that.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

