Thread: Variable length varlena headers redux
I've been looking at this again and had a few conversations about it. This may be easier than I had originally thought, but there's one major issue that's bugging me. Do you see any way to avoid having every user function everywhere use a new macro api instead of VARDATA/VARATT_DATA and VARSIZE/VARATT_SIZEP?

The two approaches I see are either

a) To have two sets of macros. One set, VARATT_DATA and VARATT_SIZEP, would be for constructing new tuples and would behave exactly as it does now, so you always construct a four-byte-header datum. Then in heap_form*tuple we check if you can use a shorter header and convert. VARDATA/VARSIZE would be for looking at existing datums and would interpret the header bits. This seems very fragile, since one stray call site using VARATT_DATA to find the data in an existing datum would cause random bugs that only occur rarely in certain circumstances. It would even work as long as the size is filled in with VARATT_SIZEP first, which it usually is, but fail if someone changes the order of the statements.

or b) Throw away VARATT_DATA and VARATT_SIZEP and make all user functions everywhere change over to a new macro api. That seems like a pretty big burden. It's safer, but it means every contrib module would have to be updated, and so on.

I'm hoping I'm missing something and there's a way to do this without breaking the api for every user function.

Gregory Stark <stark@enterprisedb.com> writes:
> In any case it seems a bit backwards to me. Wouldn't it be better to
> preserve bits in the case of short length words where they're precious
> rather than long ones? If we make 0xxxxxxx the 1-byte case it means ...

Well, I don't find that real persuasive: you're saying that it's important to have a 1-byte not 2-byte header for datums between 64 and 127 bytes long. Which is by definition less than a 2% savings for those values.
I think it's more important to pick bitpatterns that reduce the number of cases heap_deform_tuple has to think about while decoding the length of a field --- every "if" in that inner loop is expensive.

I realized this morning that if we are going to preserve the rule that 4-byte-header and compressed-header cases can be distinguished from the data alone, there is no reason to be very worried about whether the 2-byte cases can represent the maximal length of an in-line datum. If you want to do 16K inline (and your page is big enough for that) you can just fall back to the 4-byte-header case. So there's no real disadvantage if the 2-byte headers can only go up to 4K or so. This gives us some more flexibility in the bitpattern choices.

Another thought that occurred to me is that if we preserve the convention that a length word's value includes itself, then for a 1-byte header the bit pattern 10000000 is meaningless --- the count has to be at least 1. So one trick we could play is to take over this value as the signal for "toast pointer follows", with the assumption that the tuple-decoder code knows a priori how big a toast pointer is. I am not real enamored of this, because it certainly adds one case to the inner heap_deform_tuple loop and it'll give us problems if we ever want more than one kind of toast pointer. But it's a possibility.
Anyway, a couple of encodings that I'm thinking about now involve limiting uncompressed data to 1G (same as now), so that we can play with the first 2 bits instead of just 1:

	00xxxxxx	4-byte length word, aligned, uncompressed data (up to 1G)
	01xxxxxx	4-byte length word, aligned, compressed data (up to 1G)
	100xxxxx	1-byte length word, unaligned, TOAST pointer
	1010xxxx	2-byte length word, unaligned, uncompressed data (up to 4K)
	1011xxxx	2-byte length word, unaligned, compressed data (up to 4K)
	11xxxxxx	1-byte length word, unaligned, uncompressed data (up to 63b)

or

	00xxxxxx	4-byte length word, aligned, uncompressed data (up to 1G)
	010xxxxx	2-byte length word, unaligned, uncompressed data (up to 8K)
	011xxxxx	2-byte length word, unaligned, compressed data (up to 8K)
	10000000	1-byte length word, unaligned, TOAST pointer
	1xxxxxxx	1-byte length word, unaligned, uncompressed data (up to 127b)
			(xxxxxxx not all zero)

This second choice allows longer datums in both the 1-byte and 2-byte header formats, but it hardwires the length of a TOAST pointer and requires four cases to be distinguished in the inner loop; the first choice only requires three cases, because TOAST pointer and 1-byte header can be handled by the same rule "length is low 6 bits of byte". The second choice also loses the ability to store in-line compressed data above 8K, but that's probably an insignificant loss.

There's more than one way to do it ...

			regards, tom lane
Gregory Stark <gsstark@mit.edu> writes:
> Do you see any way to avoid having every user function everywhere
> use a new macro api instead of VARDATA/VARATT_DATA and VARSIZE/VARATT_SIZEP?

Bruce and I had a long phone conversation about that today. We concluded that there doesn't seem to be any strong reason to change the macros for *reading* a varlena value's length. The problem is with the code that *writes* a length word. The mother example is textin():

	result = (text *) palloc(len + VARHDRSZ);
	VARATT_SIZEP(result) = len + VARHDRSZ;
	memcpy(VARDATA(result), inputText, len);

There is not a lot we can do with that second line: it's going to assign the value of "len + VARHDRSZ" to something that has to be a native C variable, struct field, or at best bit field. If we replaced that line with something like

	SET_VARLENA_LEN(result, len + VARHDRSZ);

then we'd have a *whole* lot more control. For example it'd be easy to implement the previously-discussed design involving storing uncompressed length words in network byte order: SET_VARLENA_LEN does htonl() and VARSIZE does ntohl(), and nothing else in the per-datatype functions needs to change.

Another idea that we were kicking around is to make an explicit distinction between little-endian and big-endian hardware: on big-endian hardware, store the two TOAST flag bits in the MSBs as now, but on little-endian, store them in the LSBs, shifting the length value up two bits. This would probably be marginally faster than htonl/ntohl depending on hardware and compiler intelligence, but either way you get to guarantee that the flag bits are in the physically first byte, which is the critical thing needed to be able to tell the difference between compressed and uncompressed length values.

By my count there are only 170 uses of VARATT_SIZEP in the entire backend (including contrib), so this is not an especially daunting change.
It would break existing user-written functions that return varlena values, but the fix wouldn't be painful for them either. We could guarantee that every problem spot is found by removing the current definition of the macro (and renaming the underlying struct fields to get the attention of the truly unreconstructed...). The important point here is that VARSIZE() still works, so only code that creates a new varlena value need be affected, not code that examines one. Too bad we didn't get this right in the original TOAST rewrite; but hindsight is always 20/20. Right now I think we gotta absorb some pain if we want to improve the header-overhead situation, and this seems about the minimum possible amount of pain. regards, tom lane
On Mon, Feb 12, 2007 at 11:19:14PM -0500, Tom Lane wrote: > Gregory Stark <gsstark@mit.edu> writes: > By my count there are only 170 uses of VARATT_SIZEP in the entire > backend (including contrib) so this is not an especially daunting > change. It would break existing user-written functions that return > varlena values, but the fix wouldn't be painful for them either. Could the "new style" macros be back-ported to previous releases in case we do this? That way module maintainers wouldn't need to maintain two different sets of code for it - they could use the new style and just compile it against an older version of pg? //Magnus
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> If we replaced that line with something like
>
>	SET_VARLENA_LEN(result, len + VARHDRSZ);
>
> then we'd have a *whole* lot more control.

I think that part was already clear. The problem was in VARDATA. I don't really see a way around it, though. Places that fill in VARDATA before the size (formatting.c seems to be the worst case) will just have to be changed, and it'll be a fairly fragile point.

> For example it'd be easy to implement the previously-discussed design
> involving storing uncompressed length words in network byte order:
> SET_VARLENA_LEN does htonl() and VARSIZE does ntohl() and nothing else in
> the per-datatype functions needs to change. Another idea that we were
> kicking around is to make an explicit distinction between little-endian and
> big-endian hardware: on big-endian hardware, store the two TOAST flag bits
> in the MSBs as now, but on little-endian, store them in the LSBs, shifting
> the length value up two bits. This would probably be marginally faster than
> htonl/ntohl depending on hardware and compiler intelligence, but either way
> you get to guarantee that the flag bits are in the physically first byte,
> which is the critical thing needed to be able to tell the difference between
> compressed and uncompressed length values.

Actually I think neither htonl nor bitshifting the entire 4-byte word is going to really work here. Both will require 4-byte alignment. Instead I think we have to access the length byte by byte as a (char *) and do arithmetic. Since it's the pointer being passed to VARSIZE, that isn't too hard, but it might perform poorly.

> The important point here is that VARSIZE() still works, so only code that
> creates a new varlena value need be affected, not code that examines one.

So what would VARSIZE() return, the size of the payload plus VARHDRSZ regardless of what actual size the header was?
That seems like it would break the least existing code, though removing all the VARHDRSZ offsets seems like it would be cleaner.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Magnus Hagander wrote: > On Mon, Feb 12, 2007 at 11:19:14PM -0500, Tom Lane wrote: > > Gregory Stark <gsstark@mit.edu> writes: > > > By my count there are only 170 uses of VARATT_SIZEP in the entire > > backend (including contrib) so this is not an especially daunting > > change. It would break existing user-written functions that return > > varlena values, but the fix wouldn't be painful for them either. > > Could the "new style" macros be back-ported to previous releases in case > we do this? That way module maintainers wouldn't need to maintain > two different sets of code for it - they could use the new style and > just compile it against an older version of pg? Yes, Tom and I talked about this. It could appear in the next minor release of all branches. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Gregory Stark wrote: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> For example it'd be easy to implement the previously-discussed design >> involving storing uncompressed length words in network byte order: >> SET_VARLENA_LEN does htonl() and VARSIZE does ntohl() and nothing else in >> the per-datatype functions needs to change. Another idea that we were >> kicking around is to make an explicit distinction between little-endian and >> big-endian hardware: on big-endian hardware, store the two TOAST flag bits >> in the MSBs as now, but on little-endian, store them in the LSBs, shifting >> the length value up two bits. This would probably be marginally faster than >> htonl/ntohl depending on hardware and compiler intelligence, but either way >> you get to guarantee that the flag bits are in the physically first byte, >> which is the critical thing needed to be able to tell the difference between >> compressed and uncompressed length values. > > Actually I think neither htonl nor bitshifting the entire 4-byte word is going > to really work here. Both will require 4-byte alignment. Instead I think we > have to access the length byte by byte as a (char*) and do arithmetic. Since > it's the pointer being passed to VARSIZE that isn't too hard, but it might > perform poorly. We would still require all datums with a 4-byte header to be 4-byte aligned, right? When reading, you would first check if it's a compressed or uncompressed header. If compressed, read the 1 byte header, if uncompressed, read the 4-byte header and do htonl or bitshifting. No need to do htonl or bitshifting on unaligned datums. >> The important point here is that VARSIZE() still works, so only code that >> creates a new varlena value need be affected, not code that examines one. > > So what would VARSIZE() return, the size of the payload plus VARHDRSZ > regardless of what actual size the header was? 
> That seems like it would break
> the least existing code though removing all the VARHDRSZ offsets seems like it
> would be cleaner.

My vote would be to change every caller. Though there are a lot of callers, it's a very simple change. To make it possible to compile an external module against 8.2 and 8.3, you could have a simple ifdef block to map the new macro to old behavior. Or we could backport the macro definitions as Magnus suggested.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Actually I think neither htonl nor bitshifting the entire 4-byte word is going
>> to really work here. Both will require 4-byte alignment. Instead I think we
>> have to access the length byte by byte as a (char*) and do arithmetic. Since
>> it's the pointer being passed to VARSIZE that isn't too hard, but it might
>> perform poorly.
>
> We would still require all datums with a 4-byte header to be 4-byte aligned,
> right? When reading, you would first check if it's a compressed or uncompressed
> header. If compressed, read the 1 byte header, if uncompressed, read the 4-byte
> header and do htonl or bitshifting. No need to do htonl or bitshifting on
> unaligned datums.

It's not easy to require datums with 4-byte headers to be 4-byte aligned. How do you know where to look for the bits that show it's an uncompressed header if you don't know where it's aligned yet? It could be done if you rule that when you're on an unaligned byte and see a \0 you scan forward until the aligned byte. But that seems just as cpu-expensive as just doing the arithmetic. And it wastes space to boot.

I'm thinking VARSIZE would look something like:

	#define VARSIZE(datum) \
		((((uint8 *) (datum))[0] & 0x80) ? \
		 (((uint8 *) (datum))[0] & 0x7F) : \
		 (((uint8 *) (datum))[0] << 24 | ((uint8 *) (datum))[1] << 16 | \
		  ((uint8 *) (datum))[2] << 8 | ((uint8 *) (datum))[3]))

Which is effectively the same as doing ntohl except that it only works as a right-hand side -- luckily VARSIZE always has a lhs. It also works for unaligned accesses. It's going to be fairly slow, but no slower than doing an unaligned access looking at nul padding bytes.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Bruce Momjian <bruce@momjian.us> writes:
> Magnus Hagander wrote:
>> Could the "new style" macros be back-ported to previous releases in case
>> we do this?

> Yes, Tom and I talked about this. It could appear in the next minor
> release of all branches.

I don't really see the point of that. Third-party authors who want their code to be backwards-compatible would do something like

	#ifndef SET_VARLENA_LEN
	#define SET_VARLENA_LEN(var,len) (VARATT_SIZEP(var) = (len))
	#endif

While we could provide this same macro in later updates of the current release branches, those authors are still going to want to include the above in their code so as to be able to compile against existing releases. Therefore there's not really much point in us doing it too.

			regards, tom lane
Gregory Stark wrote: > a) To have two sets of macros, one of which, VARATT_DATA and > VARATT_SIZEP are for constructing new tuples and behaves exactly as > it does now. So you always construct a four-byte header datum. Then > in heap_form*tuple we check if you can use a shorter header and > convert. VARDATA/VARSIZE would be for looking at existing datums and > would interpret the header bits. Has any thought been given to headers *longer* than four bytes? I don't exactly recall a flood of field reports that one gigabyte per datum is too little, but as long as the encoding of variable length data is changed, one might as well prepare a little for the future. Of course, that would put a dent into any plan that wants to normalize the header to four bytes somewhere along the way. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Tue, Feb 13, 2007 at 09:44:03AM -0500, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Magnus Hagander wrote: > >> Could the "new style" macros be back-ported to previous releases in case > >> we do this? > > > Yes, Tom and I talked about this. It could appear in the next minor > > release of all branches. > > I don't really see the point of that. Third-party authors who want > their code to be backwards-compatible would do something like > > #ifndef SET_VARLENA_LEN > #define SET_VARLENA_LEN(var,len) (VARATT_SIZEP(var) = (len)) > #endif > > While we could provide this same macro in later updates of the current > release branches, those authors are still going to want to include the > above in their code so as to be able to compile against existing > releases. Therefore there's not really much point in us doing it too. It'd be a help to those who wouldn't be building against releases with known security issues in them for one ;-) Sure, it's not important or a dealbreaker or so, but it would be convenient. //Magnus
Gregory Stark <stark@enterprisedb.com> writes:
> I don't really see a way around it though. Places that fill in VARDATA before
> the size (formatting.c seems to be the worst case) will just have to be
> changed and it'll be a fairly fragile point.

No, we're not going there: it'd break too much code now and it'd be a continuing source of bugs for the foreseeable future. The sane way to design this is that

(1) code written to existing practice will always generate 4-byte headers. (Hence, VARDATA() acts the same as now.) That's the format that generally gets passed around in memory.

(2) creation of a short header is handled by the TOAST code just before the tuple goes to disk.

(3) replacement of a short header with a 4-byte header is considered part of de-TOASTing.

After we have that working, we can work on offering alternative macros that let specific functions avoid the overhead of conversion between 4-byte headers and short ones, in much the same way that there are TOAST macros now that let specific functions get down-and-dirty with the out-of-line TOAST representation. But first we have to get to the point where 4-byte-header datums can be distinguished from short-header datums by inspection; and that requires either network byte order in the 4-byte length word or some other change in its representation.

> Actually I think neither htonl nor bitshifting the entire 4-byte word is going
> to really work here. Both will require 4-byte alignment.

And your point is what? The 4-byte form can continue to require alignment, and *will* require it in any case, since many of the affected datatypes expect alignment of the data within the varlena. The trick is that when we are examining a non-aligned address within a tuple, we have to be able to tell whether we are looking at the first byte of a short-header datum (not aligned) or a pad byte. This is easily done, for instance by decreeing that pad bytes must be zeroes.
I think we should probably consider making use of different alignment codes for different varlena datatypes. For instance the geometry types probably will still need align 'd' since they contain doubles; this may mean that we should just punt on any short-header optimization for them. But text and friends could have align 'c' showing that they need no padding and would be perfectly happy with a nonaligned VARDATA pointer. (Actually, maybe we should only do this whole thing for 'c'-alignable data types? But NUMERIC is a bit of a problem, it'd like 's'-alignment. OTOH we could just switch NUMERIC to an all-two-byte format that's independent of TOAST per se.) regards, tom lane
Heikki Linnakangas wrote: > Gregory Stark wrote: > > "Tom Lane" <tgl@sss.pgh.pa.us> writes: > >> For example it'd be easy to implement the previously-discussed design > >> involving storing uncompressed length words in network byte order: > >> SET_VARLENA_LEN does htonl() and VARSIZE does ntohl() and nothing else in > >> the per-datatype functions needs to change. Another idea that we were > >> kicking around is to make an explicit distinction between little-endian and > >> big-endian hardware: on big-endian hardware, store the two TOAST flag bits > >> in the MSBs as now, but on little-endian, store them in the LSBs, shifting > >> the length value up two bits. This would probably be marginally faster than > >> htonl/ntohl depending on hardware and compiler intelligence, but either way > >> you get to guarantee that the flag bits are in the physically first byte, > >> which is the critical thing needed to be able to tell the difference between > >> compressed and uncompressed length values. > > > > Actually I think neither htonl nor bitshifting the entire 4-byte word is going > > to really work here. Both will require 4-byte alignment. Instead I think we > > have to access the length byte by byte as a (char*) and do arithmetic. Since > > it's the pointer being passed to VARSIZE that isn't too hard, but it might > > perform poorly. > > We would still require all datums with a 4-byte header to be 4-byte > aligned, right? When reading, you would first check if it's a compressed > or uncompressed header. If compressed, read the 1 byte header, if > uncompressed, read the 4-byte header and do htonl or bitshifting. No > need to do htonl or bitshifting on unaligned datums. I am not sure how to handle the alignment issue. If we require 1-byte headers to be 4-byte aligned, we lose a lot of the benefits of the 1-byte header. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Heikki Linnakangas wrote: >> We would still require all datums with a 4-byte header to be 4-byte >> aligned, right? When reading, you would first check if it's a compressed >> or uncompressed header. If compressed, read the 1 byte header, if >> uncompressed, read the 4-byte header and do htonl or bitshifting. No >> need to do htonl or bitshifting on unaligned datums. > > I am not sure how to handle the alignment issue. If we require 1-byte > headers to be 4-byte aligned, we lose a lot of the benefits of the > 1-byte header. Why would we require that? I don't see a problem with having 4-byte header 4-byte aligned, and 1-byte headers not aligned. The first access to the header is to check if it's a 4 or 1 byte header. That's a 1 byte wide access, requiring no alignment. After that you know which one it is. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Bruce Momjian wrote: > > Heikki Linnakangas wrote: > >> We would still require all datums with a 4-byte header to be 4-byte > >> aligned, right? When reading, you would first check if it's a compressed > >> or uncompressed header. If compressed, read the 1 byte header, if > >> uncompressed, read the 4-byte header and do htonl or bitshifting. No > >> need to do htonl or bitshifting on unaligned datums. > > > > I am not sure how to handle the alignment issue. If we require 1-byte > > headers to be 4-byte aligned, we lose a lot of the benefits of the > > 1-byte header. > > Why would we require that? > > I don't see a problem with having 4-byte header 4-byte aligned, and > 1-byte headers not aligned. The first access to the header is to check > if it's a 4 or 1 byte header. That's a 1 byte wide access, requiring no > alignment. After that you know which one it is. But if you are walking through attributes, how do you know to look at the next byte or the next aligned byte? We have to force zeros in there? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
> But if you are walking through attributes, how do you know to look at
> the next byte or the next aligned byte? We have to force zeros in
> there?

Yup: pad bytes must be zeroes (they are already) and a 1-byte header can't be a zero (easily done if its length includes itself). So the tuple-walking code would do something like

	if (looking-at-a-zero && not-at-4-byte-boundary)
		advance to next 4-byte boundary;
	check current byte to determine if 1-byte or 4-byte header;

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: > > I don't really see a way around it though. Places that fill in VARDATA before > > the size (formatting.c seems to be the worst case) will just have to be > > changed and it'll be a fairly fragile point. > > No, we're not going there: it'd break too much code now and it'd be a > continuing source of bugs for the foreseeable future. The sane way to > design this is that > > (1) code written to existing practice will always generate 4-byte > headers. (Hence, VARDATA() acts the same as now.) That's the format > that generally gets passed around in memory. So then we don't need to replace VARSIZE with SET_VARLENA_LEN at all. > (2) creation of a short header is handled by the TOAST code just before > the tuple goes to disk. > > (3) replacement of a short header with a 4-byte header is considered > part of de-TOASTing. So (nigh) every tuple will get deformed and reformed once before it goes to disk? Currently the toast code doesn't even look at a tuple if it's small enough, but in this case we would want it to fire even on very narrow rows. One design possibility I considered was doing this in heap_deform_tuple and heap_form_tuple. Basically skipping the extra deform/form_tuple cycle in the toast code. I had considered having heap_deform_tuple palloc copies of these data before returning them. But that has the same problems. The other problem is that there may be places in the code that receive a datum from someplace where they have every right to expect it not to be toasted. For example, plpgsql deforming a tuple they just formed, or even as the return value from a function. They might be quite surprised to receive a toasted tuple. Note also that that's going to force us to palloc and memcpy these data. Are there going to be circumstances where existing code where this changes the memory context lifetime of some data? 
If, for example, something like the inet code knows its arguments can never be large enough to get toasted and doesn't do a FREE_IF_COPY on its btree operator arguments.

> After we have that working, we can work on offering alternative macros
> that let specific functions avoid the overhead of conversion between
> 4-byte headers and short ones, in much the same way that there are TOAST
> macros now that let specific functions get down-and-dirty with the
> out-of-line TOAST representation. But first we have to get to the point
> where 4-byte-header datums can be distinguished from short-header datums
> by inspection; and that requires either network byte order in the 4-byte
> length word or some other change in its representation.
>
>> Actually I think neither htonl nor bitshifting the entire 4-byte word is going
>> to really work here. Both will require 4-byte alignment.
>
> And your point is what? The 4-byte form can continue to require
> alignment, and *will* require it in any case, since many of the affected
> datatypes expect alignment of the data within the varlena. The trick is
> that when we are examining a non-aligned address within a tuple, we have
> to be able to tell whether we are looking at the first byte of a
> short-header datum (not aligned) or a pad byte. This is easily done,
> for instance by decreeing that pad bytes must be zeroes.

Well, if we're doing it in toast then the alignment of the payload really doesn't matter at all. It'll be realigned after detoasting anyway. What I had had in mind was to prohibit using smaller headers than the alignment of the data type. But that was on the assumption we would continue to use the compressed header in memory and not copy it.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Gregory Stark <gsstark@mit.edu> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> (1) code written to existing practice will always generate 4-byte >> headers. (Hence, VARDATA() acts the same as now.) That's the format >> that generally gets passed around in memory. > So then we don't need to replace VARSIZE with SET_VARLENA_LEN at all. Yes, we do, because we have to alter the representation of 4-byte headers. Otherwise we can't tell which header format a datum is using. > So (nigh) every tuple will get deformed and reformed once before it goes to > disk? Currently the toast code doesn't even look at a tuple if it's small > enough, but in this case we would want it to fire even on very narrow rows. I'd be inclined to put the intelligence into heap_form_tuple and thereby avoid getting the TOAST code involved unless there are wide fields to deal with. > What I had had in mind was to prohibit using smaller headers than the > alignment of the data type. But that was on the assumption we would continue > to use the compressed header in memory and not copy it. Well, it wouldn't be too unreasonable to limit this whole mechanism to datatypes that have no alignment requirements on the *content* of their datums; which in practice is probably just text/varchar/char and perhaps inet. regards, tom lane
Gregory Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Gregory Stark <stark@enterprisedb.com> writes:
>>> I don't really see a way around it though. Places that fill in VARDATA before
>>> the size (formatting.c seems to be the worst case) will just have to be
>>> changed and it'll be a fairly fragile point.
>>
>> No, we're not going there: it'd break too much code now and it'd be a
>> continuing source of bugs for the foreseeable future. The sane way to
>> design this is that
>>
>> (1) code written to existing practice will always generate 4-byte
>> headers. (Hence, VARDATA() acts the same as now.) That's the format
>> that generally gets passed around in memory.
>
> So then we don't need to replace VARSIZE with SET_VARLENA_LEN at all.
>
>> (2) creation of a short header is handled by the TOAST code just before
>> the tuple goes to disk.
>>
>> (3) replacement of a short header with a 4-byte header is considered
>> part of de-TOASTing.
>
> So (nigh) every tuple will get deformed and reformed once before it goes to
> disk? Currently the toast code doesn't even look at a tuple if it's small
> enough, but in this case we would want it to fire even on very narrow rows.

One weird idea I had was that the macros can read 1 and 4-byte headers, but can only create 4-byte headers. The code that writes to the shared buffer pages would do the compression from 4 bytes down to 1 as needed. This might avoid changing any macros. It also allows us to carry around 4-byte headers in memory, which I think might be more efficient. I am not sure if I have heard this idea proposed already or not.

--
Bruce Momjian  <bruce@momjian.us>          http://momjian.us
EnterpriseDB                               http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Tom Lane <tgl@sss.pgh.pa.us> writes:
>> So (nigh) every tuple will get deformed and reformed once before it goes to
>> disk? Currently the toast code doesn't even look at a tuple if it's small
>> enough, but in this case we would want it to fire even on very narrow rows.
>
> I'd be inclined to put the intelligence into heap_form_tuple and thereby
> avoid getting the TOAST code involved unless there are wide fields to
> deal with.

And have heap_deform_tuple / heap_getattr palloc and memcpy the datum on the way out? Or wait until detoast time and then do it?

If we do it on the way out of the heaptuple then we could have a rule that headers are always compressed in a tuple and always uncompressed out of a tuple.

But I'm surprised, I had the impression that you were reluctant to consider memcpying all the data all the time.

--
greg
Greg Stark wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > > > So (nigh) every tuple will get deformed and reformed once before it goes to > > > disk? Currently the toast code doesn't even look at a tuple if it's small > > > enough, but in this case we would want it to fire even on very narrow rows. > > > > I'd be inclined to put the intelligence into heap_form_tuple and thereby > > avoid getting the TOAST code involved unless there are wide fields to > > deal with. > > And have heap_deform_tuple / heap_getattr palloc and memcpy the datum on > the way out? Or wait until detoast time and then do it? > > If we do it on the way out of the heaptuple then we could have a rule that > headers are always compressed in a tuple and always uncompressed out of a > tuple. > > But I'm surprised, I had the impression that you were reluctant to consider > memcpying all the data all the time. Uh, if the macros can read 1 and 4-byte headers, why do we need to allocate memory for them? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com
Greg Stark <gsstark@mit.edu> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> I'd be inclined to put the intelligence into heap_form_tuple and thereby >> avoid getting the TOAST code involved unless there are wide fields to >> deal with. > And have heap_deform_tuple / heap_getattr palloc and memcpy the datum on > the way out? Or wait until detoast time and then do it? No, heap_deform_tuple / heap_getattr are not responsible for palloc'ing anything, only for computing appropriate pointers into the tuple. Existing functions that use PG_DETOAST_DATUM would incur a palloc to produce a 4-byte-header version of a short-header datum. We could then work on modifying one function at a time to use some alternative macro that doesn't force a useless palloc, but the system wouldn't be broken meanwhile; and only the high-traffic functions would be worth fixing at all. The point I'm trying to get across here is to do things one small step at a time; if you insist on a "big bang" patch then it'll never get done. You might want to go back and review the CVS history for some other big changes like TOAST and the version-1 function-call protocol to see our previous uses of this approach. regards, tom lane
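A rough illustration of the behavior Tom proposes for PG_DETOAST_DATUM: a short-header datum gets expanded into a freshly allocated 4-byte-header copy, so downstream code sees only the conventional format. The bit layout is hypothetical (low bit set marks a 1-byte header; 4-byte lengths shifted left two bits, low byte first), the function names are invented, and malloc stands in for palloc:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 1-byte header: low bit set, total length (header included)
 * in the upper 7 bits.  4-byte header: total length << 2, low byte first. */

static int is_short_header(const unsigned char *p) { return p[0] & 0x01; }

/* The de-TOASTing rule: hand back a conventional 4-byte-header copy, so
 * code using today's VARDATA/VARSIZE keeps working unchanged. */
static unsigned char *expand_short_datum(const unsigned char *p)
{
    uint32_t datalen = (uint32_t)(p[0] >> 1) - 1;  /* strip the 1-byte header */
    unsigned char *out = malloc(datalen + 4);
    uint32_t v = (datalen + 4) << 2;               /* rebuild a 4-byte header */
    out[0] = v; out[1] = v >> 8; out[2] = v >> 16; out[3] = v >> 24;
    memcpy(out + 4, p + 1, datalen);
    return out;
}
```

The palloc cost this implies is exactly what the later per-function macro conversions would whittle away.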
"Bruce Momjian" <bruce@momjian.us> writes: >> But I'm surprised, I had the impression that you were reluctant to consider >> memcpying all the data all the time. > > Uh, if the macros can read 1 and 4-byte headers, why do we need to > allocate memory for them? Because the macros can't read both 1 and 4-byte headers. If they could we would have the problem with VARDATA for code sites that write to the data before they write the size. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
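The ordering hazard Greg describes can be demonstrated with a stripped-down varlena. The struct and macros here are simplified stand-ins for the real ones (and malloc stands in for palloc); the point is that today's VARDATA can locate the data even before the size is written, which a header-bit-interpreting VARDATA could not:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define VARHDRSZ 4

/* Simplified stand-in for the real varlena struct. */
typedef struct {
    uint32_t vl_len;
    char     vl_dat[1];
} my_varlena;

/* Today's macros assume a fixed 4-byte header, so the data offset is known
 * even while the header is still uninitialized. */
#define VARDATA(v)      ((v)->vl_dat)
#define VARATT_SIZEP(v) ((v)->vl_len)

/* The common construction idiom: data first, size afterwards.  If VARDATA
 * had to inspect header bits to decide whether the data starts 1 or 4 bytes
 * in, the memcpy below would be consulting garbage. */
static my_varlena *make_varlena(const char *src, uint32_t len)
{
    my_varlena *result = malloc(VARHDRSZ + len);
    memcpy(VARDATA(result), src, len);      /* header not yet written */
    VARATT_SIZEP(result) = len + VARHDRSZ;  /* size filled in last */
    return result;
}
```

formatting.c, mentioned earlier in the thread, is full of exactly this pattern, which is why a silently re-interpreting VARDATA is so dangerous.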
Gregory Stark wrote: > > "Bruce Momjian" <bruce@momjian.us> writes: > > >> But I'm surprised, I had the impression that you were reluctant to consider > >> memcpying all the data all the time. > > > > Uh, if the macros can read 1 and 4-byte headers, why do we need to > > allocate memory for them? > > Because the macros can't read 1 and 4-byte headers. If they could we would > have the problem with VARDATA for code sites that write to the data before > they write the size. OK, so what if we always create 4-byte headers, and read both. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > The point I'm trying to get across here is to do things one small step > at a time; if you insist on a "big bang" patch then it'll never get > done. You might want to go back and review the CVS history for some > other big changes like TOAST and the version-1 function-call protocol > to see our previous uses of this approach. Believe me I'm not looking for a "big bang" approach. It's just that I've found problems with any of the incremental approaches. I suppose it's inevitable that "big bang" approaches always seem like the most perfect since they automatically involve fixing anything that could be a problem. You keep suggesting things that I've previously considered and rejected -- perhaps prematurely. Off the top of my head I recall the following four options from our discussions. It looks like we're circling around option 4. As long as we're willing to live with the palloc/memcpy overhead at least for now given that we can reduce it by whittling away at the sites that use the old macros, that seems like a good compromise and the shortest development path except perhaps option 1. And I take it you're not worried about sites that might not detoast a datum or detoast one in the wrong memory context where previously they were guaranteed it wouldn't generate a copy? In particular I'm worried about btree code and plpgsql row/record handling. I'm not sure what to do about the alignment issue. We could just never align 1-byte headers. That would work just fine as long as the data types that need alignment don't get ported to the new macros. It seems silly to waste space on disk in order to save a cpu memcpy that we aren't even going to be saving for now anyways. Option 1) We detect cases where the typmod guarantees either a fixed size or a maximum size < 256 bytes. 
In which case, instead of copying the typlen from pg_type into the tupledesc and the table's attlen, we use the implied attlen or store -3 indicating single-byte headers everywhere.

Cons: This implies pallocing and memcpying the datum in heap_deform_tuple. It doesn't help text or varchar(xxxx) at all even if we mostly store small data in them.

Pros: It buys us 0-byte headers for things like numeric(1,0) and even char(n) on single-byte encodings. It buys us 1-byte headers for most numerics and char(n) or varchar(n).

Option 2) We have heap_form_tuple/heap_deform_tuple compress and uncompress the headers. The rule is that headers are always compressed in a tuple and never compressed when handed out as a datum.

Cons: This implies pallocing and memcpying the datum in heap_deform_tuple. It requires restricting the range of 1-byte headers to 127 or 63 bytes and always uses 1 byte even for fixed-size data. (We could get 0-byte headers for a small range (ascii characters and numeric integers up to 127) but then 1-byte headers would be down to 31-byte data.) It implies a different varlena format for on-disk and in-memory.

Pros: It works for text/varchar as long as we store small data. It lets us use network byte order or some other format for the on-disk headers without changing the macros to access in-memory headers.

Option 3) We have the toaster (or heap_form_tuple as a shortcut) compress the headers but delay decompression until pg_detoast_datum. Tuples only contain compressed headers, but Datums sometimes point to compressed headers and sometimes uncompressed headers, just as they sometimes point to toasted data and sometimes detoasted data.

Cons: This still implies pallocing and memcpying the datum, at least for now. There could be cases where code expects to deform_tuple and be guaranteed a non-toasted pointer into the tuple. Requires replacing VARATT_SIZEP with SET_VARLENA_LEN().

Pros: It allows future macros to examine the compressed datum without decompressing it. 
Option 4) We have compressed data constructed on the fly everywhere possible.

Cons: Requires replacing VARATT_SIZEP and also requires hacking VARDATA to find the data in the appropriate place. Might need an additional pair of macros for backwards compatibility in code that really needs to construct a 4-byte-headered varlena. Fragility, with the risk of VARSIZE / VARDATA being filled in out of order. Requires changing the header to be in network byte order all the time.

Pros: One consistent representation for varlena everywhere.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > "Bruce Momjian" <bruce@momjian.us> writes: >> Uh, if the macros can read 1 and 4-byte headers, why do we need to >> allocate memory for them? > Because the macros can't read 1 and 4-byte headers. If they could we would > have the problem with VARDATA for code sites that write to the data before > they write the size. The way I see this working is that VARDATA keeps its current behavior and hence can only be used with datums that are known to be in 4-byte-header form; hence, to avoid breaking code that uses it, PG_DETOAST_DATUM has to produce a 4-byte-header datum always. After we have the infrastructure in place, we'd make a pass over high-traffic functions to replace uses of PG_DETOAST_DATUM with something that doesn't forcibly expand 1-byte-header datums, and replace uses of VARDATA on the result with something that handles both header formats (and would be unsuitable for generating result datums, since it'd have to assume that the length is already filled in). I don't see any good reason why datatype-specific functions would ever need to generate the short-header format directly. The only point where it's worth trimming the header size is during heap_form_tuple, and we can do it there at no significant efficiency cost. So uses of VARDATA in connection with building a new datum need not be touched. I'm inclined also to suggest that VARSIZE() need only support 4-byte format: we could have a second macro that understands both formats and gets used in the same high-traffic functions in which we are replacing uses of VARDATA(). There's no benefit in making other sites support 1-byte format for VARSIZE() if they aren't going to support it for VARDATA(). regards, tom lane
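A sketch of the two-tier macro API Tom outlines: VARSIZE/VARDATA keep their 4-byte-only meaning (and so remain safe for constructing datums), while read-only variants that understand both header forms get adopted one high-traffic function at a time. The "_any" names and the bit layout (low bit set marks a 1-byte header; 4-byte lengths shifted left two bits, low byte first) are invented here; nothing had been settled:

```c
#include <stdint.h>

/* 4-byte-header-only accessor, matching today's VARSIZE semantics. */
static uint32_t varsize(const unsigned char *p)
{
    return ((uint32_t)p[0] | (uint32_t)p[1] << 8 |
            (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24) >> 2;
}

/* Read-only accessors that understand both header forms.  These are
 * unsuitable for building a result datum, since they assume the length
 * bits are already filled in. */
static uint32_t varsize_any(const unsigned char *p)
{
    return (p[0] & 0x01) ? (uint32_t)(p[0] >> 1) : varsize(p);
}

static const unsigned char *vardata_any(const unsigned char *p)
{
    return p + ((p[0] & 0x01) ? 1 : 4);
}
```

Only functions rewritten to use both "_any" accessors together would skip the detoast-time expansion; using one without the other buys nothing, which matches Tom's point about VARSIZE.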
Hey all: This is a randomly inserted distraction to tell you that I really like to read about these ideas. No input from myself at this point. I'm happy with the direction you are taking. Thanks, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Neighbourhood Coder | Ottawa, Ontario, Canada | One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/
Gregory Stark <stark@enterprisedb.com> writes: > You keep suggesting things that I've previously considered and rejected -- > perhaps prematurely. Off the top of my head I recall the following four > options from our discussions. It looks like we're circling around option 4. No, I think I'm arguing for option 3. > And I take it you're not worried about sites that might not detoast a datum or > detoast one in the wrong memory context where previously they were guaranteed > it wouldn't generate a copy? In particular I'm worried about btree code and > plpgsql row/record handling. Any such site is already broken, because with the proposed design, code is only exposed to short-header datums if it is also exposed to toasted datums. It's possible that we would find a few places that hadn't gotten converted, but any such place is a must-fix anyway. In practice the TOAST support has been in there long enough that I don't think we'll find any ... at least not in the core code. It's entirely possible that there is user-defined code out there that uses varlena format and isn't careful about detoasting. This is an argument for allowing opt-out (see below). > I'm not sure what to do about the alignment issue. We could just never align > 1-byte headers. That would work just fine as long as the data types that need > alignment don't get ported to the new macros. It seems silly to waste space on > disk in order to save a cpu memcpy that we aren't even going to be saving for > now anyways. No, a datum that is in 1-byte-header format wouldn't have any special alignment inside a tuple. There are two paths to get at it: detoast it (producing an aligned, 4-byte-header, palloc'd datum) or use the still-to-be-named macros that let you access the unaligned content directly. 
I'm inclined to think that we might want to set things up so that varlena datatypes can individually opt-in or opt-out of this treatment; a datatype that needs alignment of its content might well wish to opt-out to avoid copying overhead. We could do that either with a different typlen code, or still typlen -1 but pay attention to whether typalign is 'c'. > Option 1) > We detect cases where the typmod guarantees either a fixed size or a maximum > size < 256 bytes. After last week I would veto this option anyway: it fails unless we always know typmod exactly, and I'm here to tell you we don't. regards, tom lane
Tom Lane wrote: > > We detect cases where the typmod guarantees either a fixed size or a maximum > > size < 256 bytes. > > After last week I would veto this option anyway: it fails unless we > always know typmod exactly, and I'm here to tell you we don't. If we can pull this off, it handles short values stored in TEXT fields too, which is a big win over the typmod idea I had. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> You keep suggesting things that I've previously considered and rejected -- >> perhaps prematurely. Off the top of my head I recall the following four >> options from our discussions. It looks like we're circling around option 4. > > No, I think I'm arguing for option 3. Oops, sorry, got my own numbering mixed up. >> And I take it you're not worried about sites that might not detoast a datum or >> detoast one in the wrong memory context where previously they were guaranteed >> it wouldn't generate a copy? In particular I'm worried about btree code and >> plpgsql row/record handling. > > Any such site is already broken, because with the proposed design, code > is only exposed to short-header datums if it is also exposed to toasted > datums. Well, the toaster doesn't run until we're about to put the tuple on disk. I was afraid there might be circumstances where data from in-memory tuples are returned and we use them without fearing them being toasted. In particular I was fearing the record/row handling in plpgsql. I was also worried about the tuples that indexes store. But it seems they don't go through heap_form_tuple at all. It does seem that the INDEX_TOAST_HACK would need some work -- but its existence means there can't be anyone else trusting index tuples not to contain toasted data, which is convenient. > I'm inclined to think that we might want to set things up so that > varlena datatypes can individually opt-in or opt-out of this treatment; > a datatype that needs alignment of its content might well wish to > opt-out to avoid copying overhead. We could do that either with a > different typlen code, or still typlen -1 but pay attention to whether > typalign is 'c'. Any reason to go with typalign? You can always use typalign='i' to get the regular headers. Alternatively, what does the trailing "a" in varlena signify? Would this be varlenb? 
>> Option 1) > >> We detect cases where the typmod guarantees either a fixed size or a maximum >> size < 256 bytes. > > After last week I would veto this option anyway: it fails unless we > always know typmod exactly, and I'm here to tell you we don't. Oh, heh, timing is everything I guess. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > Alternatively, what does the trailing "a" in varlena signify? Would this be > varlenb? "attribute" -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> Any such site is already broken, because with the proposed design, code >> is only exposed to short-header datums if it is also exposed to toasted >> datums. > Well the toaster doesn't run until we were about to put the tuple on disk. I > was afraid there might be circumstances where data from in-memory tuples are > returned and we use them without fearing them being toasted. In particular I > was fearing the record/row handling in plpgsql. True, which is another argument why datatype-specific functions should never return short-header datums: there are places that call such functions directly and probably expect an untoasted result. I'm not very concerned about stuff that's gone through a tuple though: in most cases that could be expected to contain toasted values anyway. regards, tom lane
On Tue, Feb 13, 2007 at 01:32:11PM -0500, Bruce Momjian wrote: > Gregory Stark wrote: > > Alternatively, what does the trailing "a" in varlena signify? Would this be > > varlenb? > > "attribute" > > -- Actually varlena stands for "variable length array." elein elein@varlena.com
I want to register, just in principle, that I object to changing the structure of a varlena. The idea behind the data type is simple, clean and fast. And it is easily understood by developers and by people developing applications and functions in PostgreSQL. Of course you will do what you will. Be careful out there. elein elein@varlena.com
elein <elein@varlena.com> writes: > I want to register, just in principle, that I object to changing > the structure of a varlena. Indeed, I'm doing my best to restrain Greg from changing what a datatype-specific function will see by default. The stuff with variable-length headers should be confined to a relatively small number of functions in which we're willing to pay a code-ugliness penalty for speed. regards, tom lane
Interesting problem, seems to be planner related:

select 1
from beitraege bei,
     b_zuordnungen bz,
     (select bei_id
        from b_zuordnungen bz, ben_zuordnungen z, strukturelemente se
       where se.id = z.str_id
         and se.sty_id = (select id from strukturtypen where code = 'GEO')
         and z.str_id = bz.str_id
         and z.ben_id = 100
      union
      select id from beitraege where kz_edit <> 'N' and useraend = 100
     ) as foo
where bz.bei_id = bei.id
  and foo.bei_id = bei.id
  and bei.red_id in (select gba.grp_id
                       from grp_ben_applikationen gba, grp_gruppen grp
                      where grp.id = gba.grp_id
                        and grp.kz_aktiv = 'J'
                        and gba.app_id in (select id from grp_applikationen
                                            where code in ('app1', 'app2'))
                        and gba.ben_id = 100)
  and (bei.bei_id_frei is null or bei.kz_edit = 'N')
  and (bei.bt_id, bz.str_id) in ((96,1259036), (96,2688382))
  and bei.red_id = 1777712
  and bei.id in (select bzu.bei_id
                   from b_zuordnungen bzu, strukturelemente se
                  where bzu.str_id = se.id and se.id = 1773715)
  and bei.id = 10157309;

ERROR: failed to build any 8-way joins

Interesting: removing any of the above where conditions makes the problem go away, e.g. removing "and bei.id=10157309". Tested with 8.2.1 and 8.1.4, same effect on both systems. Because of the large number of tables involved it's difficult to find a self-contained test case, but if necessary I'll give it a try. I could give 8.2.3 a try, but I doubt this will help. Any ideas? Regards Mario Weilguni
Mario Weilguni wrote: > Interesting: removing any of the above where conditions makes the problem go away, e.g. removing "and bei.id=10157309". > > Tested with 8.2.1 and 8.1.4, same effect on both systems. Because of the large number of tables involved it's difficult to find a self-contained test case, but if necessary I'll give it a try. > I could give 8.2.3 a try, but I doubt this will help. I think a similar problem was fixed after 8.2.3 was released, so you may want to check out the REL8_2_STABLE branch. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Thanks for the info. Is there a fix for the 8.1 branch? The production system is still 8.1. Regards, Mario Weilguni On Thursday, 15 February 2007 16:25, Alvaro Herrera wrote: > Mario Weilguni wrote: > > Interesting: removing any of the above where conditions makes the problem > > go away, e.g. removing "and bei.id=10157309". > > > > Tested with 8.2.1 and 8.1.4, same effect on both systems. Because of the > > large number of tables involved it's difficult to find a self-contained > > test case, but if necessary I'll give it a try. I could give 8.2.3 a try, but > > I doubt this will help. > > I think a similar problem was fixed after 8.2.3 was released, so you may > want to check out the REL8_2_STABLE branch.
Mario Weilguni <mweilguni@sime.com> writes: > ERROR: failed to build any 8-way joins Could you provide a self-contained test case for this? You probably don't need any data, just the table schemas. I fixed a problem with a similar symptom a couple days ago, but that was in logic that was new in 8.2 ... regards, tom lane