Thread: GIST and TOAST

GIST and TOAST

From
Gregory Stark
Date:
I have a problem getting packed varlenas to work with GIST indexes. Namely,
the GIST code doesn't call pg_detoast_datum() enough. Instead of using the
DatumGetFoo() macros it uses DatumGetPointer() and calls PG_DETOAST_DATUM only
when it thinks it'll be necessary.

I've temporarily made the packed varlena code ignore user defined data types.
This isn't very satisfactory though since it means domains over things like
text don't get packed.

And in any case even with this in place it doesn't fix all the problems. The
Postgres regression tests pass but I can still trigger problems because the
GIST code doesn't reliably call detoast even on user data types they're given.

I think these are actual bugs. If you happened to provide a large enough datum
to the gist code it would cause the same problem I'm seeing. The packed
varlena patch just makes it easier to trigger.

I've fixed the problems I'm seeing with the following patch to _int_gist.c.
But I'm not sure it's correct. What I'm afraid of is that I'm not sure where
these functions are being called from and whether they expect to be leaking
memory. If they're expected to not leak memory then they're now leaking the
detoasted datum and that's a problem.

I'm also wondering if there aren't similar problems in the dozens of other
gist indexing modules...

I wouldn't mind a second pair of eyes on the _int_gist.c changes if there's
someone who can tell me whether any of these functions require a
PG_FREE_IF_COPY or equivalent.


Index: _int_gist.c
===================================================================
RCS file: /home/stark/src/REPOSITORY/pgsql/contrib/intarray/_int_gist.c,v
retrieving revision 1.16
diff -u -r1.16 _int_gist.c
--- _int_gist.c    4 Oct 2006 00:29:45 -0000    1.16
+++ _int_gist.c    2 Mar 2007 16:09:26 -0000
@@ -1,6 +1,6 @@#include "_int.h"
-#define GETENTRY(vec,pos) ((ArrayType *) DatumGetPointer((vec)->vector[(pos)].key))
+#define GETENTRY(vec,pos) (DatumGetArrayTypeP((vec)->vector[(pos)].key))/*** GiST support methods
@@ -39,7 +39,7 @@    if (strategy == BooleanSearchStrategy)    {        retval = execconsistent((QUERYTYPE *) query,
-                                (ArrayType *) DatumGetPointer(entry->key),
+                                DatumGetArrayTypeP(entry->key),                                GIST_LEAF(entry));
 pfree(query);
 
@@ -58,7 +58,7 @@    switch (strategy)    {        case RTOverlapStrategyNumber:
-            retval = inner_int_overlap((ArrayType *) DatumGetPointer(entry->key),
+            retval = inner_int_overlap(DatumGetArrayTypeP(entry->key),                                       query);
        break;        case RTSameStrategyNumber:
 
@@ -70,21 +70,21 @@                                    PointerGetDatum(&retval)                    );            else
-                retval = inner_int_contains((ArrayType *) DatumGetPointer(entry->key),
+                retval = inner_int_contains(DatumGetArrayTypeP(entry->key),
query);           break;        case RTContainsStrategyNumber:        case RTOldContainsStrategyNumber:
 
-            retval = inner_int_contains((ArrayType *) DatumGetPointer(entry->key),
+            retval = inner_int_contains(DatumGetArrayTypeP(entry->key),                                        query);
          break;        case RTContainedByStrategyNumber:        case RTOldContainedByStrategyNumber:            if
(GIST_LEAF(entry))               retval = inner_int_contains(query,
 
-                                  (ArrayType *) DatumGetPointer(entry->key));
+                                  DatumGetArrayTypeP(entry->key));            else
-                retval = inner_int_overlap((ArrayType *) DatumGetPointer(entry->key),
+                retval = inner_int_overlap(DatumGetArrayTypeP(entry->key),
query);           break;        default:
 
@@ -282,10 +282,10 @@    float        tmp1,                tmp2;
-    ud = inner_int_union((ArrayType *) DatumGetPointer(origentry->key),
-                         (ArrayType *) DatumGetPointer(newentry->key));
+    ud = inner_int_union(DatumGetArrayTypeP(origentry->key),
+                         DatumGetArrayTypeP(newentry->key));    rt__int_size(ud, &tmp1);
-    rt__int_size((ArrayType *) DatumGetPointer(origentry->key), &tmp2);
+    rt__int_size(DatumGetArrayTypeP(origentry->key), &tmp2);    *result = tmp1 - tmp2;    pfree(ud);
@@ -297,8 +297,8 @@Datumg_int_same(PG_FUNCTION_ARGS){
-    ArrayType  *a = (ArrayType *) PointerGetDatum(PG_GETARG_POINTER(0));
-    ArrayType  *b = (ArrayType *) PointerGetDatum(PG_GETARG_POINTER(1));
+    ArrayType  *a = PG_GETARG_ARRAYTYPE_P(0);
+    ArrayType  *b = PG_GETARG_ARRAYTYPE_P(1);    bool       *result = (bool *) PG_GETARG_POINTER(2);    int4        n
=ARRNELEMS(a);    int4       *da,
 


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: GIST and TOAST

From
Andrew - Supernews
Date:
On 2007-03-02, Gregory Stark <stark@enterprisedb.com> wrote:
> I think these are actual bugs. If you happened to provide a large enough datum
> to the gist code it would cause the same problem I'm seeing. The packed
> varlena patch just makes it easier to trigger.

Are you taking into account the fact that, at least prior to your patch,
values in index tuples could never be toasted?

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: GIST and TOAST

From
Tom Lane
Date:
Andrew - Supernews <andrew+nonews@supernews.com> writes:
> On 2007-03-02, Gregory Stark <stark@enterprisedb.com> wrote:
>> I think these are actual bugs. If you happened to provide a large enough datum
>> to the gist code it would cause the same problem I'm seeing. The packed
>> varlena patch just makes it easier to trigger.

> Are you taking into account the fact that, at least prior to your patch,
> values in index tuples could never be toasted?

False --- see index_form_tuple().
        regards, tom lane


Re: GIST and TOAST

From
Andrew - Supernews
Date:
On 2007-03-02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew - Supernews <andrew+nonews@supernews.com> writes:
>> On 2007-03-02, Gregory Stark <stark@enterprisedb.com> wrote:
>>> I think these are actual bugs. If you happened to provide a large enough
>>> datum
>>> to the gist code it would cause the same problem I'm seeing. The packed
>>> varlena patch just makes it easier to trigger.
>
>> Are you taking into account the fact that, at least prior to your patch,
>> values in index tuples could never be toasted?
>
> False --- see index_form_tuple().

My mistake.

A closer reading, however, shows that at least for cases like intarray,
btree_gist, etc., the detoasting of an index value is being done in the
gist decompress function, so the value seen via GISTENTRY in the other
functions should already have been detoasted once.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: GIST and TOAST

From
Teodor Sigaev
Date:
> A closer reading, however, shows that at least for cases like intarray,
> btree_gist, etc., the detoasting of an index value is being done in the
> gist decompress function, so the value seen via GISTENTRY in the other
> functions should already have been detoasted once.

Right, any stored value form index should be decompressed by GiST decompress 
support method.

Another places:
- compress might get original value in case of inserting new one, in all other 
cases it gets value produced by decompress method.
- query in consistent method

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: GIST and TOAST

From
Gregory Stark
Date:
"Teodor Sigaev" <teodor@sigaev.ru> writes:

>> A closer reading, however, shows that at least for cases like intarray,
>> btree_gist, etc., the detoasting of an index value is being done in the
>> gist decompress function, so the value seen via GISTENTRY in the other
>> functions should already have been detoasted once.
>
> Right, any stored value form index should be decompressed by GiST decompress
> support method.

The problem is that this is the only place in the code where we make wholesale
assumptions that a datum that comes from a tuple (heap tuple or index tuple)
isn't toasted. There are other places but they're always flagged with big
comments explaining *why* the datum can't be toasted and they're minor
localized instances, not a whole subsystem.

This was one of the assumptions that the packed varlena code depended on: that
anyone looking at a datum from a tuple would always detoast it even if they
had formed the tuple themselves and never passed it through the toaster. The
*only* place this has come up as a problem is in GIST.

I would say we could just exempt all the GIST data types from packed varlenas
except that doesn't even solve the problem completely. There's at least one
place, _int_gist.c, where the entry type is just a plain array. So unless I
exempt *all* arrays the arrays it gets out of the index tuples it forms will
be packed and need to be detoasted.

So I'm leaning towards going through all of the GIST modules and replacing all
the (Type*)DatumGetPointers formulations with actually DatumGetType and all
the (Type*)PG_GETARG_POINTER() formulations with PG_GETARG_TYPE(). And having
those macros call PG_DETOAST_DATUM().

How would you feel about that?

There are two downsides I see:

It's an extra check against the toast flag bits which is extra cpu. But this
is how the rest of the Postgres source works and we don't think the extra cpu
cost is significant.

There may be places that assume they won't leak detoasted copies of datums. If
you could help point those places out they should just need PG_FREE_IF_COPY()
calls or in some cases a pg_detoast_datum_copy() call earlier in the correct
memeory context. This again is how the rest of the postgres source handles
this.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: GIST and TOAST

From
Andrew - Supernews
Date:
On 2007-03-06, Gregory Stark <stark@enterprisedb.com> wrote:
> "Teodor Sigaev" <teodor@sigaev.ru> writes:
>>> A closer reading, however, shows that at least for cases like intarray,
>>> btree_gist, etc., the detoasting of an index value is being done in the
>>> gist decompress function, so the value seen via GISTENTRY in the other
>>> functions should already have been detoasted once.
>>
>> Right, any stored value form index should be decompressed by GiST decompress
>> support method.
>
> The problem is that this is the only place in the code where we make wholesale
> assumptions that a datum that comes from a tuple (heap tuple or index tuple)
> isn't toasted.

The places in the intarray code that you tried to "fix" in your patch at
the start of this thread are not dealing with data that came from a tuple,
but from data that came from a decompress method. It's expected that the
decompress method does the detoasting.

So I think you've mis-analyzed the problem. That's especially true since
you are claiming that the existing code is already buggy when in fact no
such bugs have been reported (and clearly intarray has been running with
toasted array values for years).

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: GIST and TOAST

From
Teodor Sigaev
Date:
> The problem is that this is the only place in the code where we make wholesale
> assumptions that a datum that comes from a tuple (heap tuple or index tuple)
> isn't toasted. There are other places but they're always flagged with big
> comments explaining *why* the datum can't be toasted and they're minor
> localized instances, not a whole subsystem.
> 
> This was one of the assumptions that the packed varlena code depended on: that
> anyone looking at a datum from a tuple would always detoast it even if they
> had formed the tuple themselves and never passed it through the toaster. The
> *only* place this has come up as a problem is in GIST.

I'm afraid that we have some lack of understanding. Flow of operation with 
indexed tuple in gist is:
- read tuple
- get n-th attribute with a help of  index_getattr
- call user-defined decompress method which should, at least, detoast value
- result value is passed to other user-defined method

Any new value, produced by user-defined method of GiST, before packing into 
tuple should be compressed by user-defined compress method. Compress method 
should not toast value - that is not its task.

New values are always modified by compress method before insertion. See 
gistinsert:gist.c and gistFormTuple:gistutil.c.

So, index_form_tuple should toast value, but value is already compressed and 
live in memory. Detoasting of value should be done by decompress method and live 
in memory, and so, only after that value can be passed to other user-defined method.

As I understand, packing/unpacking varlena header is doing during 
toasting/detoastiong. So, I'm not understand the problem here.

What is more, GiST API doesn't limit type of keys passed between user-defined 
GiST methods. It just says that new value should be a type on which opclass was 
defined and output of compress method should be a type pointed by STORAGE option  in CREATE OPERATOR CLASS.

> There may be places that assume they won't leak detoasted copies of datums. If
> you could help point those places out they should just need PG_FREE_IF_COPY()

GiST code works in separate memory context to prevent memory leaks. See 
gistinsert/gistbuildCallback/gistfindnext.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: GIST and TOAST

From
Gregory Stark
Date:
"Andrew - Supernews" <andrew+nonews@supernews.com> writes:

> The places in the intarray code that you tried to "fix" in your patch at
> the start of this thread are not dealing with data that came from a tuple,
> but from data that came from a decompress method. It's expected that the
> decompress method does the detoasting.
>
> So I think you've mis-analyzed the problem. That's especially true since
> you are claiming that the existing code is already buggy when in fact no
> such bugs have been reported (and clearly intarray has been running with
> toasted array values for years).

I'm not claiming, I'm asking, because I can't tell. 

And it's not clear _int_gist.c has been running with toasted array values for
years because it's limited to arrays of 100 integers (or perhaps 200 integers,
there's a factor of 2 in the test). That's not enough to trigger toasting
unless there are other large columns in the same table.

I do know that with packed varlenas I get a crash in g_int_union among other
places. I can't tell where the datum came from originally and how it ended up
stored in packed format.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: GIST and TOAST

From
Gregory Stark
Date:
"Teodor Sigaev" <teodor@sigaev.ru> writes:

> I'm afraid that we have some lack of understanding. Flow of operation with
> indexed tuple in gist is:
> - read tuple
> - get n-th attribute with a help of  index_getattr
> - call user-defined decompress method which should, at least, detoast value
> - result value is passed to other user-defined method

So when does index_form_tuple get called?

> So, index_form_tuple should toast value, but value is already compressed and
> live in memory. Detoasting of value should be done by decompress method and
> live in memory, and so, only after that value can be passed to other
> user-defined method.

Does every data type define a compress/decompress method? Even if it's not a
data type that normally gets very large?

> As I understand, packing/unpacking varlena header is doing during
> toasting/detoastiong. So, I'm not understand the problem here.

Well we cheated a bit and had heap/index_form_tuple convert the data to packed
format. This saves having to push small tuples through the toaster. So now
tuples can magically become toasted as soon as they go into a tuple even if
they never get pushed through the toaster. 

>> There may be places that assume they won't leak detoasted copies of datums. If
>> you could help point those places out they should just need PG_FREE_IF_COPY()
>
> GiST code works in separate memory context to prevent memory leaks. See
> gistinsert/gistbuildCallback/gistfindnext.

So it's perfectly safe to just use DatumGetType and PG_GETARG_TYPE instead of
using DatumGetPointer and PG_GETARG_POINTER and having to manually cast
everywhere, no? It seems like there's a lot of extra pain to maintain the code
in the present style with all the manual casts.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: GIST and TOAST

From
Gregory Stark
Date:
"Gregory Stark" <stark@enterprisedb.com> writes:

> "Andrew - Supernews" <andrew+nonews@supernews.com> writes:
>
>> So I think you've mis-analyzed the problem. That's especially true since
>> you are claiming that the existing code is already buggy when in fact no
>> such bugs have been reported (and clearly intarray has been running with
>> toasted array values for years).
>
> I'm not claiming, I'm asking, because I can't tell. 
>
> And it's not clear _int_gist.c has been running with toasted array values for
> years because it's limited to arrays of 100 integers (or perhaps 200 integers,
> there's a factor of 2 in the test). That's not enough to trigger toasting
> unless there are other large columns in the same table.

Actually I just realized the other large columns in the table would be
irrelevant. It's not whether it's toasted in the table that matters, only if
it gets compressed by index_form_tuple that does. And it can't since 400 bytes
isn't large enough to trigger compression. Unless someone's using multi-column
intarray gist indexes with very large arrays which I'm not convinced anyone
is.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: GIST and TOAST

From
Teodor Sigaev
Date:
> So when does index_form_tuple get called?

The single call of index_form_tuple in GiST core is in gistFormTuple which 
initially compress any indexed value with a help of their compress methods.

Only tuples formed by gistFormTuple could be inserted in index.

> Does every data type define a compress/decompress method? Even if it's not a
> data type that normally gets very large?

Yes, any GiST opclass should have such methods. In trivial case it just returns  input value. As I remember, only
R-Treeemulation over boxes, contrib/seg and 
 
contrib/cube have simple compress method.


> Well we cheated a bit and had heap/index_form_tuple convert the data to packed
> format. This saves having to push small tuples through the toaster. So now
> tuples can magically become toasted as soon as they go into a tuple even if
> they never get pushed through the toaster. 
Ok, it should be safe for GiST except some possible memory management issue. 
index_form_tuple in GiST works in GiST's memory context which is short-live. Is 
it possible issue for your patch? BTW, that's connected to GIN too.


> So it's perfectly safe to just use DatumGetType and PG_GETARG_TYPE instead of
> using DatumGetPointer and PG_GETARG_POINTER and having to manually cast
> everywhere, no? It seems like there's a lot of extra pain to maintain the code
> in the present style with all the manual casts.
Of course, I agree. Just PG_FREE_IF_COPY is extra call in support methods.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: GIST and TOAST

From
Teodor Sigaev
Date:
> And it's not clear _int_gist.c has been running with toasted array values for
> years because it's limited to arrays of 100 integers (or perhaps 200 integers,
> there's a factor of 2 in the test). That's not enough to trigger toasting
> unless there are other large columns in the same table.
That's was intended limitation to prevent indexing of huge arrays. gist__int_ops 
compression method is orientated for small and isn't effective on big ones.

> 
> I do know that with packed varlenas I get a crash in g_int_union among other
> places. I can't tell where the datum came from originally and how it ended up
> stored in packed format.
Can you provide your patch (in current state) and test suite? Or backtrace at least.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: GIST and TOAST

From
Gregory Stark
Date:

>> Does every data type define a compress/decompress method? Even if it's not a
>> data type that normally gets very large?
>
> Yes, any GiST opclass should have such methods. In trivial case it just returns
> input value. As I remember, only R-Tree emulation over boxes, contrib/seg and
> contrib/cube have simple compress method.

Hm, if they just return the original datum without detoasting it then it could
be an issue. I'll check.

>> Well we cheated a bit and had heap/index_form_tuple convert the data to packed
>> format. This saves having to push small tuples through the toaster. So now
>> tuples can magically become toasted as soon as they go into a tuple even if
>> they never get pushed through the toaster. 
>
> Ok, it should be safe for GiST except some possible memory management issue.
> index_form_tuple in GiST works in GiST's memory context which is short-live. Is
> it possible issue for your patch? BTW, that's connected to GIN too.

index_form_tuple doesn't leak memory. packed varlena format just has a shorter
header so it can store the header and then copy the data to the new location.
It doesn't have to create a copy of the data (except in the tuple, obviously).

But it means index_getattr can return a toasted tuple. I see several calls to
index_getattr that immediately put the datum into a GISTENTRY and call support
functions like the union function. For example gistMakeUnionItVec does this.

>> So it's perfectly safe to just use DatumGetType and PG_GETARG_TYPE instead of
>> using DatumGetPointer and PG_GETARG_POINTER and having to manually cast
>> everywhere, no? It seems like there's a lot of extra pain to maintain the code
>> in the present style with all the manual casts.
>
> Of course, I agree. Just PG_FREE_IF_COPY is extra call in support methods.

Well if you're doing everything in short-lived memory contexts then we don't
even need this. btree procedures do it because the btree code expects to be
able to do comparisons without having to set up short-lived memory contexts.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: GIST and TOAST

From
Teodor Sigaev
Date:
>> input value. As I remember, only R-Tree emulation over boxes, contrib/seg and
>> contrib/cube have simple compress method.
> Hm, if they just return the original datum without detoasting it then it could
> be an issue. I'll check.
seg and box aren't a varlena types, but cube is and it seems broken :(.
g_cube_decompress and g_cube_compress don't detoast values. I'll fix that.

> index_form_tuple doesn't leak memory. packed varlena format just has a shorter
> header so it can store the header and then copy the data to the new location.
> It doesn't have to create a copy of the data (except in the tuple, obviously).
Nice, now that's clear.

> But it means index_getattr can return a toasted tuple. I see several calls to
> index_getattr that immediately put the datum into a GISTENTRY and call support
> functions like the union function. For example gistMakeUnionItVec does this.From gistMakeUnionItVec:

datum = index_getattr(itvec[j], i + 1, giststate->tupdesc, &IsNull);
if (IsNull)
continue;
gistdentryinit(giststate, i, evec->vector + evec->n, datum, )

gistdentryinit calls user-defined decompress method.

The reason of confusion is: there is three similar functions/macros:
gistentryinit, gistcentryinit and gistdentryinit :) That names was choosen by 
authors initially developed GiST in pgsql.


> Well if you're doing everything in short-lived memory contexts then we don't
> even need this. 
Sure

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: GIST and TOAST

From
Gregory Stark
Date:
"Teodor Sigaev" <teodor@sigaev.ru> writes:

>> And it's not clear _int_gist.c has been running with toasted array values for
>> years because it's limited to arrays of 100 integers (or perhaps 200 integers,
>> there's a factor of 2 in the test). That's not enough to trigger toasting
>> unless there are other large columns in the same table.
>
> That's was intended limitation to prevent indexing of huge arrays.
> gist__int_ops compression method is orientated for small and isn't effective on
> big ones.

Right, so it's possible nobody see any toasted arrays with _int_gist.c since
they never get very large. It looks like index_form_tuple will never compress
anything under 512b so I guess it's safe currently.

>> I do know that with packed varlenas I get a crash in g_int_union among other
>> places. I can't tell where the datum came from originally and how it ended up
>> stored in packed format.
> Can you provide your patch (in current state) and test suite? Or backtrace at least.

It doesn't actually crash, it just fails CHECKARRVALID. I added an assertion
in there to cause it to generate a core dump.


You can download the core dump and binary from 
http://community.enterprisedb.com/varlena/core._inthttp://community.enterprisedb.com/varlena/postgres._int

The last patch (without the assertion) is at:
http://community.enterprisedb.com/varlena/patch-varvarlena-14.patch.gz


What I'm seeing is this:

(gdb) f 3
#3  0xb7fd924b in inner_int_union (a=0x84e41f4, b=0xb64220d0)   at _int_tool.c:81
81        CHECKARRVALID(b);

The array is actually garbage:

(gdb) p *b
$2 = {vl_len_ = 141, ndim = 0, dataoffset = 5888, elemtype = 0}

What's going on is that the va_1byte header is 141 which is 0x80 | 13. So it's
actually only 13 bytes with a 1 byte header or a 12 byte array:

(gdb) p *(varattrib*)b
$3 = {va_1byte = {va_header = 141 '\215', va_data = ""}, va_external = {   va_header = 141 '\215', va_padding =
"\000\000",va_rawsize = 0,    va_extsize = 5888, va_valueid = 0, va_toastrelid = 0}, va_compressed = {   va_header =
141,va_rawsize = 0, va_data = ""}, va_4byte = {   va_header = 141, va_data = ""}}
 

(gdb) bt
#0  0xb7e6a947 in raise () from /lib/tls/libc.so.6
#1  0xb7e6c0c9 in abort () from /lib/tls/libc.so.6
#2  0x082fec97 in ExceptionalCondition (   conditionName=0xb7fdd3b9 "!(!((b)->dataoffset != 0))",
errorType=0xb7fdd371"FailedAssertion",    fileName=0xb7fdd347 "_int_tool.c", lineNumber=81) at assert.c:51
 
#3  0xb7fd924b in inner_int_union (a=0x84e41f4, b=0xb64220d0)   at _int_tool.c:81
#4  0xb7fd547d in g_int_picksplit (fcinfo=0xbf9e43f0) at _int_gist.c:403
#5  0x08304d9c in FunctionCall2 (flinfo=0xbf9e5a94, arg1=139342312,    arg2=3214821160) at fmgr.c:1154
#6  0x08094fd3 in gistUserPicksplit (r=0xb6078d4c, entryvec=0x84e31e8,    attno=0, v=0xbf9e4728, itup=0x84e2ddc,
len=142,giststate=0xbf9e4b94)   at gistsplit.c:306
 
#7  0x08095deb in gistSplitByKey (r=0xb6078d4c, page=0xb6420220 "",    itup=0x84e2ddc, len=142, giststate=0xbf9e4b94,
v=0xbf9e4728,   entryvec=0x84e31e8, attno=0) at gistsplit.c:548
 
#8  0x080874bd in gistSplit (r=0xb6078d4c, page=0xb6420220 "",    itup=0x84e2ddc, len=142, giststate=0xbf9e4b94) at
gist.c:943
#9  0x080850fa in gistplacetopage (state=0xbf9e49e0, giststate=0xbf9e4b94)   at gist.c:329
#10 0x080871eb in gistmakedeal (state=0xbf9e49e0, giststate=0xbf9e4b94)   at gist.c:873
#11 0x08084f21 in gistdoinsert (r=0xb6078d4c, itup=0x84e2ce4, freespace=819,    giststate=0xbf9e4b94) at gist.c:278
#12 0x08084cf5 in gistbuildCallback (index=0xb6078d4c, htup=0x84c8c30,    values=0xbf9e4a98, isnull=0xbf9e4a78 "",
tupleIsAlive=1'\001',    state=0xbf9e4b94) at gist.c:201
 
#13 0x080fc81f in IndexBuildHeapScan (heapRelation=0xb60d6860,    indexRelation=0xb6078d4c, indexInfo=0x84cd620,
callback=0x8084c27<gistbuildCallback>, callback_state=0xbf9e4b94)   at index.c:1548
 
#14 0x08084bdd in gistbuild (fcinfo=0xbf9e60e8) at gist.c:150
#15 0x08305630 in OidFunctionCall3 (functionId=782, arg1=3054332000,    arg2=3053948236, arg3=139253280) at
fmgr.c:1460
#16 0x080fc363 in index_build (heapRelation=0xb60d6860,    indexRelation=0xb6078d4c, indexInfo=0x84cd620, isprimary=0
'\0')  at index.c:1296
 
#17 0x080fb86a in index_create (heapRelationId=21361,    indexRelationName=0x84a531c "text_idx", indexRelationId=21366,
  indexInfo=0x84cd620, accessMethodObjectId=783, tableSpaceId=0,    classObjectId=0x84cd60c, coloptions=0x84cd6ac,
reloptions=0,   isprimary=0 '\0', isconstraint=0 '\0', allow_system_table_mods=0 '\0',    skip_build=0 '\0',
concurrent=0'\0') at index.c:794
 
#18 0x0815f3e4 in DefineIndex (heapRelation=0x84a5354,    indexRelationName=0x84a531c "text_idx", indexRelationId=0,
accessMethodName=0x84a5380"gist", tableSpaceName=0x0,    attributeList=0x84a5448, predicate=0x0, rangetable=0x0,
options=0x0,   unique=0 '\0', primary=0 '\0', isconstraint=0 '\0',    is_alter_table=0 '\0', check_rights=1 '\001',
skip_build=0'\0',    quiet=0 '\0', concurrent=0 '\0') at indexcmds.c:452
 
#19 0x0825dcea in ProcessUtility (parsetree=0x84a5464, params=0x0,    dest=0x84a54e0, completionTag=0xbf9e687e "") at
utility.c:797
#20 0x0825bef9 in PortalRunUtility (portal=0x84ca2ec, utilityStmt=0x84a5464,    dest=0x84a54e0,
completionTag=0xbf9e687e"") at pquery.c:1176
 
#21 0x0825c04b in PortalRunMulti (portal=0x84ca2ec, dest=0x84a54e0,    altdest=0x84a54e0, completionTag=0xbf9e687e "")
atpquery.c:1263
 
#22 0x0825b786 in PortalRun (portal=0x84ca2ec, count=2147483647,    dest=0x84a54e0, altdest=0x84a54e0,
completionTag=0xbf9e687e"")   at pquery.c:814
 
#23 0x0825604d in exec_simple_query (   query_string=0x84a4fec "CREATE INDEX text_idx on test__int using gist ( a
gist__int_ops);") at postgres.c:953
 
#24 0x08259c2f in PostgresMain (argc=4, argv=0x844df58,    username=0x844de44 "stark") at postgres.c:3434
#25 0x08223d2c in BackendRun (port=0x8461550) at postmaster.c:2974
#26 0x082232b9 in BackendStartup (port=0x8461550) at postmaster.c:2601
#27 0x08220d82 in ServerLoop () at postmaster.c:1214
#28 0x08220728 in PostmasterMain (argc=3, argv=0x844a340) at postmaster.c:967
#29 0x081c426b in main (argc=3, argv=0x844a340) at main.c:188

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: GIST and TOAST

From
Teodor Sigaev
Date:
> It doesn't actually crash, it just fails CHECKARRVALID. I added an assertion
> in there to cause it to generate a core dump.

Wow, catch that, see attached patch.

g_int_decompress doesn't returns detoasted array in case it was empty.
Previously it was safe because empty array never has been toasted.

Should I commit it or you'll include in your patch?

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
*** ./contrib/intarray.orig/./_int_gist.c    Tue Mar  6 20:59:23 2007
--- ./contrib/intarray/./_int_gist.c    Tue Mar  6 21:41:54 2007
***************
*** 232,238 ****
--- 232,247 ----

      CHECKARRVALID(in);
      if (ARRISVOID(in))
+     {
+         if (in != (ArrayType *) DatumGetPointer(entry->key)) {
+             retval = palloc(sizeof(GISTENTRY));
+             gistentryinit(*retval, PointerGetDatum(in),
+                 entry->rel, entry->page, entry->offset, FALSE);
+             PG_RETURN_POINTER(retval);
+         }
+
          PG_RETURN_POINTER(entry);
+     }

      lenin = ARRNELEMS(in);


Re: GIST and TOAST

From
Gregory Stark
Date:
"Teodor Sigaev" <teodor@sigaev.ru> writes:

>> It doesn't actually crash, it just fails CHECKARRVALID. I added an assertion
>> in there to cause it to generate a core dump.
>
> Wow, catch that, see attached patch.
>
> g_int_decompress doesn't returns detoasted array in case it was empty.
> Previously it was safe because empty array never has been toasted.

Ah, thanks a bunch.

> Should I commit it or you'll include in your patch?

I'll include it in the patch I guess since it's fine the way it is until the
patch hits.

Now I'll try running the regressions again with the gist datatypes like hstore
etc all packed as well.


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: GIST and TOAST

From
Gregory Stark
Date:
"Teodor Sigaev" <teodor@sigaev.ru> writes:

>>> input value. As I remember, only R-Tree emulation over boxes, contrib/seg and
>>> contrib/cube have simple compress method.
>> Hm, if they just return the original datum without detoasting it then it could
>> be an issue. I'll check.
> seg and box aren't a varlena types, but cube is and it seems broken :(.
> g_cube_decompress and g_cube_compress don't detoast values. I'll fix that.

Also, all the cube operators like cube_union, cube_size, cube_cmp, etc.

Would you like me to do it or are you already started?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: GIST and TOAST

From
Teodor Sigaev
Date:
I'm already started, don't worry about that. Cube is broken since TOAST 
implemented :)

Gregory Stark wrote:
> "Teodor Sigaev" <teodor@sigaev.ru> writes:
> 
>>>> input value. As I remember, only R-Tree emulation over boxes, contrib/seg and
>>>> contrib/cube have simple compress method.
>>> Hm, if they just return the original datum without detoasting it then it could
>>> be an issue. I'll check.
>> seg and box aren't a varlena types, but cube is and it seems broken :(.
>> g_cube_decompress and g_cube_compress don't detoast values. I'll fix that.
> 
> Also, all the cube operators like cube_union, cube_size, cube_cmp, etc.
> 
> Would you like me to do it or are you already started?
> 

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/