[RFC] indirect toast tuple support - Mailing list pgsql-hackers

From Andres Freund
Subject [RFC] indirect toast tuple support
Date
Msg-id 20130216164231.GA15069@awork2.anarazel.de
Whole thread Raw
Responses Re: [RFC] indirect toast tuple support  (Andres Freund <andres@2ndquadrant.com>)
Re: [RFC] indirect toast tuple support  (Greg Stark <stark@mit.edu>)
Re: [RFC] indirect toast tuple support  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

During logical decoding toast tuples are decoded separately from the
main tuple. Works nicely. To make the main table's HeapTuple actually
easily useable the tuple needs to be "reconstructed" to not point to
disk anymore but to the separately reconstructed tuples.

There are two ways to do this:
a) build a new HeapTuple that contains all formerly toasted datums
   inline (i.e. !VARATT_IS_EXTERNAL)
b) add support for external toast tuples that point into separately
   allocated memory instead of a toast table

a) has the problem that that the flattened HeapTuple can be bigger than
our maximal allocation size which seems like an awkward restriction...

Given that there have been wishes to support something like b) for quite
some time, independent from logical decoding, it seems like a good idea
to add support for it. Its e.g. useful for avoiding repeated detoasting
or decompression of tuples.

The problem with b) is that there is no space in varlena's flag bits to
directly denote that a varlena points into memory instead of either
directly containing the data or a varattrib_1b_e containing a
varatt_external pointing to an on-disk toasted tuple.

I propose extending the EXTERNAL varlenas to be able to point to memory
instead just to disk. It seem apt to use EXTERNAL for this as they
aren't stored in the normal heap tuple but somewhere else.
Unfortunately there is no backward-compatible flag space in
varattrib_1b_e either to nicely denote this and we sure don't want to
break on-disk compatibility for this. Thus I propose to distinguish
on-disk and in-memory tuples via the varattrib_1b_e.va_len_1be.

The attached (RFC, not fully ready!) patch adds the following stuff to
the public interfaces:
typedef struct varatt_indirect
{
    struct varlena *pointer;    /* Pointer to in-memory varlena */
} varatt_indirect;

...

#define VARATT_IS_EXTERNAL(PTR)                VARATT_IS_1B_E(PTR)
#define VARATT_IS_EXTERNAL_TOAST(PTR) \
    (VARATT_IS_EXTERNAL(PTR) && VARSIZE_EXTERNAL(PTR) == TOAST_POINTER_SIZE)
#define VARATT_IS_EXTERNAL_INDIRECT(PTR) \
    (VARATT_IS_EXTERNAL(PTR) && VARSIZE_EXTERNAL(PTR) == INDIRECT_POINTER_SIZE)

I don't like to make the distinction through the size but I don't have a
better idea. And hey, its toast/varlena stuff, who expects cleanliness ;)

Existing code doesn't need to care whether a EXTERNAL datum is TOAST or
INDIRECT, that's handled transparently in tuptoaster.c. All EXTERNAL
tuples need to go through there anyway, so that seems fine.

Currently toast_fetch_datum() in tuptoaster.c does part of the gruntwork
for this as it was the easiest location but I think it might be better
to spread the work to some more callsites of it for clarity's sake.

Opinions?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Next
From: Andrew Dunstan
Date:
Subject: Re: JSON Function Bike Shedding