Re: Variable length varlena headers redux - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: Variable length varlena headers redux
Date	February 8, 2007 23:58:09
Msg-id	200702090358.l193w7v02893@momjian.us
Responses	Re: Variable length varlena headers redux
List	pgsql-hackers

Uh, I thought the approach was to create type-specific in/out functions,
and add casting so every time they were referenced, they would expand
to a varlena structure in memory.
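A minimal sketch of the expand-on-reference idea, assuming a hypothetical short on-disk form (one header byte 11xxxxxx whose low 6 bits give the total size, header included, as in the first encoding in Tom Lane's mail quoted below) and an in-memory form with a 4-byte length word like today's varlena. ExpandedVarlena, expand_short_datum and the SHORT_HDR_* macros are illustrative names only, and malloc stands in for palloc; this is not PostgreSQL code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form: a 4-byte length word (native byte order,
 * counting itself) followed by the payload, like today's varlena. */
typedef struct {
    uint32_t len;    /* total size, header included */
    char     data[]; /* payload (C99 flexible array member) */
} ExpandedVarlena;

/* Hypothetical short on-disk form: one header byte 11xxxxxx whose low
 * 6 bits give the total size, header byte included. */
#define SHORT_HDR_FLAGS 0xC0u
#define SHORT_HDR_LEN(b) ((uint32_t)((b) & 0x3Fu))

/* Expand a short-header datum into a freshly allocated copy with a
 * 4-byte header, so callers using the existing macros only ever see
 * the familiar layout. Returns NULL if the first byte is not a valid
 * short header or allocation fails; the caller frees the result. */
static ExpandedVarlena *expand_short_datum(const uint8_t *p)
{
    assert(p != NULL);
    if ((p[0] & SHORT_HDR_FLAGS) != SHORT_HDR_FLAGS)
        return NULL;                    /* not a short header */

    uint32_t total = SHORT_HDR_LEN(p[0]);
    if (total < 1)
        return NULL;                    /* length counts itself, so 0 is invalid */

    uint32_t payload = total - 1;       /* strip the 1-byte header */
    ExpandedVarlena *v = malloc(sizeof(ExpandedVarlena) + payload);
    if (v == NULL)
        return NULL;
    v->len = (uint32_t)sizeof(ExpandedVarlena) + payload;
    memcpy(v->data, p + 1, payload);
    return v;
}
```

The cost of this design is a copy on every reference; in exchange, existing code that reads the 4-byte header never sees the short form.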

---------------------------------------------------------------------------

Gregory Stark wrote:
> 
> I've been looking at this again and had a few conversations about it. This may
> be easier than I had originally thought but there's one major issue that's
> bugging me. Do you see any way to avoid having every user function everywhere
> use a new macro api instead of VARDATA/VARATT_DATA and VARSIZE/VARATT_SIZEP?
> 
> The two approaches I see are either 
> 
> a) To have two sets of macros. One set, VARATT_DATA and VARATT_SIZEP, is
> for constructing new tuples and behaves exactly as it does now, so you always
> construct a four-byte header datum. Then in heap_form*tuple we check if you
> can use a shorter header and convert. VARDATA/VARSIZE would be for looking at
> existing datums and would interpret the header bits.
> 
> This seems very fragile since one stray call site using VARATT_DATA to find
> the data in an existing datum would cause random bugs that only occur rarely
> in certain circumstances. It would even work as long as the size is filled in
> with VARATT_SIZEP first, which it usually is, but would fail if someone
> changed the order of the statements.
> 
> or 
> 
> b) throw away VARATT_DATA and VARATT_SIZEP and make every user function
> everywhere change over to a new macro api. That seems like a pretty big
> burden. It's safer but means every contrib module would have to be updated and
> so on.
> 
> I'm hoping I'm missing something and there's a way to do this without breaking
> the api for every user function.
> 
> 
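Approach (a) can be sketched as follows, under assumed layouts: a long form with a 4-byte big-endian length word whose top two bits are 00, and the 1-byte 11xxxxxx short form from the first encoding in the included mail below. pack_datum stands in for the conversion step in heap_form*tuple, while data_assuming_long and data_by_header mimic a VARATT_DATA-style and a VARDATA-style reader. All of these names are hypothetical. The big-endian length word is an assumption of the sketch so that the flag bits always sit in the first byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts (not PostgreSQL's actual on-disk format):
 *   long form:  4-byte big-endian length word, top bits 00, counting itself;
 *   short form: 1 header byte 11xxxxxx, low 6 bits = total size incl. itself. */
#define LONG_HDR_SIZE 4u
#define SHORT_HDR_MAX 63u

static uint32_t get_long_len(const uint8_t *d)
{
    return ((uint32_t)(d[0] & 0x3Fu) << 24) | ((uint32_t)d[1] << 16) |
           ((uint32_t)d[2] << 8) | d[3];
}

static void set_long_len(uint8_t *d, uint32_t len)
{
    assert(len < (1u << 30));           /* 30 bits: up to 1G */
    d[0] = (uint8_t)((len >> 24) & 0x3Fu);
    d[1] = (uint8_t)(len >> 16);
    d[2] = (uint8_t)(len >> 8);
    d[3] = (uint8_t)len;
}

/* Approach (a) at tuple-formation time: user code always built a long-form
 * datum; copy it to dst with the smallest header that fits.
 * Returns the number of bytes written to dst. */
static size_t pack_datum(const uint8_t *src, uint8_t *dst)
{
    uint32_t total = get_long_len(src);
    assert(total >= LONG_HDR_SIZE);
    uint32_t payload = total - LONG_HDR_SIZE;

    if (payload + 1 <= SHORT_HDR_MAX) { /* fits a 1-byte header */
        dst[0] = (uint8_t)(0xC0u | (payload + 1));
        memcpy(dst + 1, src + LONG_HDR_SIZE, payload);
        return payload + 1;
    }
    memcpy(dst, src, total);            /* keep the 4-byte header */
    return total;
}

/* VARATT_DATA-style reader: assumes a 4-byte header (wrong on packed data). */
static const uint8_t *data_assuming_long(const uint8_t *d)
{
    return d + LONG_HDR_SIZE;
}

/* VARDATA-style reader: interprets the header bits first. */
static const uint8_t *data_by_header(const uint8_t *d)
{
    return ((d[0] & 0xC0u) == 0xC0u) ? d + 1 : d + LONG_HDR_SIZE;
}
```

Once a small datum has been packed, data_assuming_long points three bytes past the real start of the data, which is the kind of stray-call-site bug described above; large datums keep the 4-byte header, so both readers agree on them.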

-- Start of included mail
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Gregory Stark <stark@enterprisedb.com>
> cc: Gregory Stark <gsstark@mit.edu>, Bruce Momjian <bruce@momjian.us>, 
>             Peter Eisentraut <peter_e@gmx.net>, pgsql-hackers@postgresql.org, 
>             Martijn van Oosterhout <kleptog@svana.org>
> Subject: Re: [HACKERS] Fixed length data types issue 
> Date: Mon, 11 Sep 2006 13:15:43 -0400
> Lines: 64
> Xref: stark.xeocode.com work.enterprisedb:683

> Gregory Stark <stark@enterprisedb.com> writes:
> > In any case it seems a bit backwards to me. Wouldn't it be better to
> > preserve bits in the case of short length words where they're precious
> > rather than long ones? If we make 0xxxxxxx the 1-byte case it means ...
> 
> Well, I don't find that real persuasive: you're saying that it's
> important to have a 1-byte not 2-byte header for datums between 64 and
> 127 bytes long.  Which is by definition less than a 2% savings for those
> values.  I think it's more important to pick bitpatterns that reduce
> the number of cases heap_deform_tuple has to think about while decoding
> the length of a field --- every "if" in that inner loop is expensive.
> 
> I realized this morning that if we are going to preserve the rule that
> 4-byte-header and compressed-header cases can be distinguished from the
> data alone, there is no reason to be very worried about whether the
> 2-byte cases can represent the maximal length of an in-line datum.
> If you want to do 16K inline (and your page is big enough for that)
> you can just fall back to the 4-byte-header case.  So there's no real
> disadvantage if the 2-byte headers can only go up to 4K or so.  This
> gives us some more flexibility in the bitpattern choices.
> 
> Another thought that occurred to me is that if we preserve the
> convention that a length word's value includes itself, then for a
> 1-byte header the bit pattern 10000000 is meaningless --- the count
> has to be at least 1.  So one trick we could play is to take over
> this value as the signal for "toast pointer follows", with the
> assumption that the tuple-decoder code knows a-priori how big a
> toast pointer is.  I am not real enamored of this, because it certainly
> adds one case to the inner heap_deform_tuple loop and it'll give us
> problems if we ever want more than one kind of toast pointer.  But
> it's a possibility.
> 
> Anyway, a couple of encodings that I'm thinking about now involve
> limiting uncompressed data to 1G (same as now), so that we can play
> with the first 2 bits instead of just 1:
> 
> 00xxxxxx    4-byte length word, aligned, uncompressed data (up to 1G)
> 01xxxxxx    4-byte length word, aligned, compressed data (up to 1G)
> 100xxxxx    1-byte length word, unaligned, TOAST pointer
> 1010xxxx    2-byte length word, unaligned, uncompressed data (up to 4K)
> 1011xxxx    2-byte length word, unaligned, compressed data (up to 4K)
> 11xxxxxx    1-byte length word, unaligned, uncompressed data (up to 63b)
> 
> or
> 
> 00xxxxxx    4-byte length word, aligned, uncompressed data (up to 1G)
> 010xxxxx    2-byte length word, unaligned, uncompressed data (up to 8K)
> 011xxxxx    2-byte length word, unaligned, compressed data (up to 8K)
> 10000000    1-byte length word, unaligned, TOAST pointer
> 1xxxxxxx    1-byte length word, unaligned, uncompressed data (up to 127b)
>         (xxxxxxx not all zero)
> 
> This second choice allows longer datums in both the 1-byte and 2-byte
> header formats, but it hardwires the length of a TOAST pointer and
> requires four cases to be distinguished in the inner loop; the first
> choice only requires three cases, because TOAST pointer and 1-byte
> header can be handled by the same rule "length is low 6 bits of byte".
> The second choice also loses the ability to store in-line compressed
> data above 8K, but that's probably an insignificant loss.
> 
> There's more than one way to do it ...
> 
>             regards, tom lane
> 
-- End of included mail.
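To make the case counting in Tom's mail concrete, here is a sketch of a length decoder for each proposed encoding. It assumes the first header byte carries the high-order bits (so a 4-byte word is read big-endian) and that every length counts the header itself; the TOAST pointer size for the second encoding is a caller-supplied parameter because the mail leaves it unspecified. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 30-bit length from a 4-byte word whose first byte carries the
 * two flag bits and the high-order length bits. */
static uint32_t len_4byte(const uint8_t *p)
{
    return ((uint32_t)(p[0] & 0x3Fu) << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Encoding 1: three cases. TOAST pointers (100xxxxx) and short datums
 * (11xxxxxx) share the rule "length is the low 6 bits of the byte". */
static uint32_t datum_len_enc1(const uint8_t *p)
{
    if ((p[0] & 0x80u) == 0)            /* 00xxxxxx or 01xxxxxx */
        return len_4byte(p);
    if ((p[0] & 0xE0u) == 0xA0u)        /* 1010xxxx or 1011xxxx */
        return ((uint32_t)(p[0] & 0x0Fu) << 8) | p[1];
    return p[0] & 0x3Fu;                /* 100xxxxx or 11xxxxxx */
}

/* Compressed payload under encoding 1: 01xxxxxx or 1011xxxx. */
static int is_compressed_enc1(uint8_t b)
{
    return (b & 0xC0u) == 0x40u || (b & 0xF0u) == 0xB0u;
}

/* Encoding 2: four cases. The decoder must know the TOAST pointer size
 * a priori; here the caller passes it in. */
static uint32_t datum_len_enc2(const uint8_t *p, uint32_t toast_pointer_size)
{
    assert(toast_pointer_size > 0);
    if ((p[0] & 0xC0u) == 0)            /* 00xxxxxx */
        return len_4byte(p);
    if ((p[0] & 0x80u) == 0)            /* 010xxxxx or 011xxxxx */
        return ((uint32_t)(p[0] & 0x1Fu) << 8) | p[1];
    if (p[0] == 0x80u)                  /* 10000000: TOAST pointer */
        return toast_pointer_size;
    return p[0] & 0x7Fu;                /* 1xxxxxxx, not all zero */
}
```

datum_len_enc1 needs three branches and datum_len_enc2 needs four, matching the count in the mail; the second also cannot recover a TOAST pointer's size from the header byte alone.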

> 
> 
> -- 
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com

-- 
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
