Thread: Review: B-Tree emulation for GIN

Review: B-Tree emulation for GIN

From
"Ibrar Ahmed"
Date:
Hi Teodor Sigaev,

I am getting server crash in contrib regression. May be I am doing something wrong here. Regression diff is attached.

BTW these queries work fine outside the regression.

 
--
  Ibrar Ahmed
  EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: Review: B-Tree emulation for GIN

From
Teodor Sigaev
Date:
will see, may be it's needed to update the patch

Ibrar Ahmed wrote:
> Hi Teodor Sigaev,
> 
> I am getting server crash in contrib regression. May be I am doing 
> something wrong here. Regression diff is attached.
> 
> BTW these queries work fine outside the regression.
> 
>  
> -- 
>   Ibrar Ahmed
>   EnterpriseDB   http://www.enterprisedb.com
> 
> 
> ------------------------------------------------------------------------
> 
> 

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: Review: B-Tree emulation for GIN

From
Teodor Sigaev
Date:
Updated patch.

Ibrar Ahmed wrote:
> Hi Teodor Sigaev,
>
> I am getting server crash in contrib regression. May be I am doing
> something wrong here. Regression diff is attached.
>
> BTW these queries work fine outside the regression.
>
>
> --
>   Ibrar Ahmed
>   EnterpriseDB   http://www.enterprisedb.com
>
>
> ------------------------------------------------------------------------
>
>

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

Re: Review: B-Tree emulation for GIN

From
"Ibrar Ahmed"
Date:
Thanks,


On Fri, Dec 19, 2008 at 3:26 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
Updated patch.

Ibrar Ahmed wrote:
Hi Teodor Sigaev,

I am getting server crash in contrib regression. May be I am doing something wrong here. Regression diff is attached.

BTW these queries work fine outside the regression.

 
--
 Ibrar Ahmed
 EnterpriseDB   http://www.enterprisedb.com


------------------------------------------------------------------------



--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                  WWW: http://www.sigaev.ru/



--
  Ibrar Ahmed
  EnterpriseDB   http://www.enterprisedb.com

Re: Review: B-Tree emulation for GIN

From
Jeff Davis
Date:
On Fri, 2008-12-19 at 13:26 +0300, Teodor Sigaev wrote:
> Updated patch.

I have merged this with HEAD, written a brief document (which is mostly
a copy of the btree-gist page), added the module to the contrib
Makefile, and made some very minor changes. Patch attached.

A couple comments:

1. Right now, to implement "less than" you need to start at the
beginning of the index and scan until you reach the supplied query
datum. This is because GIN doesn't support backwards scans
( http://archives.postgresql.org/pgsql-hackers/2008-07/msg01146.php ).

Unfortunately, that means numeric is not supportable for btree-gin
because there is no left-most value from which to start the scan. Do you
see an easy workaround to support numeric?

2. Why do you create a shell type "int32" from btree_gin.sql? I tried
doing a search-and-replace to use "int4" instead, and it seems to work
fine (and eliminates the warnings). I left it with int32 in my version
of the patch because I thought you may have some reason for using it.

Regards,
    Jeff Davis

Attachment

Re: Review: B-Tree emulation for GIN

From
Jeff Davis
Date:
On Sun, 2008-12-28 at 23:41 -0800, Jeff Davis wrote:
> 2. Why do you create a shell type "int32" from btree_gin.sql? I tried
> doing a search-and-replace to use "int4" instead, and it seems to work
> fine (and eliminates the warnings). I left it with int32 in my version
> of the patch because I thought you may have some reason for using it.
>

Attached an updated patch with this change.

Regards,
    Jeff Davis

Attachment

Re: Review: B-Tree emulation for GIN

From
Teodor Sigaev
Date:
> I have merged this with HEAD, written a brief document (which is mostly
> a copy of the btree-gist page), added the module to the contrib
> Makefile, and made some very minor changes. Patch attached.
Thank you

>
> A couple comments:
>
> 1. Right now, to implement "less than" you need to start at the
> beginning of the index and scan until you reach the supplied query
> datum. This is because GIN doesn't support backwards scans
> ( http://archives.postgresql.org/pgsql-hackers/2008-07/msg01146.php ).
>
> Unfortunately, that means numeric is not supportable for btree-gin
> because there is no left-most value from which to start the scan. Do you
> see an easy workaround to support numeric?
Hmm, let we use minimal varlena struct (with size equal to VARHDRSZ) as
left-most (and fake) value. It is never used for any actual argument except
compare function, so, new compare function is provided. New version of patch is
attached and it based on on your patch (btree-gin.20090111.patch.gz).

 > 2. Why do you create a shell type "int32" from btree_gin.sql? I tried
 > doing a search-and-replace to use "int4" instead, and it seems to work
 > fine (and eliminates the warnings). I left it with int32 in my version
 > of the patch because I thought you may have some reason for using it.

My mistake, thank you for fix
--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

Re: Review: B-Tree emulation for GIN

From
Jeff Davis
Date:
On Fri, 2009-01-16 at 17:42 +0300, Teodor Sigaev wrote:
> > Unfortunately, that means numeric is not supportable for btree-gin
> > because there is no left-most value from which to start the scan. Do you
> > see an easy workaround to support numeric?
> Hmm, let we use minimal varlena struct (with size equal to VARHDRSZ) as 
> left-most (and fake) value. It is never used for any actual argument except 
> compare function, so, new compare function is provided. New version of patch is 
> attached and it based on on your patch (btree-gin.20090111.patch.gz).
> 

Looks good to me. I updated the CommitFest wiki to point to this patch.

I like the fact that this patch does not modify the numeric ADT. It
still relies on the fact that the numeric type will never make use of
the minimal varlena struct, however. I bring this up in case someone
sees it as a problem.

Thanks,Jeff Davis



Re: Review: B-Tree emulation for GIN

From
Alvaro Herrera
Date:
Jeff Davis escribió:

> I like the fact that this patch does not modify the numeric ADT. It
> still relies on the fact that the numeric type will never make use of
> the minimal varlena struct, however. I bring this up in case someone
> sees it as a problem.

Greg Stark was working on a patch to make certain values use the short
representation.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Review: B-Tree emulation for GIN

From
Teodor Sigaev
Date:
>> still relies on the fact that the numeric type will never make use of
>> the minimal varlena struct, however. I bring this up in case someone
>> sees it as a problem.
> 
> Greg Stark was working on a patch to make certain values use the short
> representation.

Fake value contains only VARHDRSZ bytes.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: Review: B-Tree emulation for GIN

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> I like the fact that this patch does not modify the numeric ADT. It
> still relies on the fact that the numeric type will never make use of
> the minimal varlena struct, however. I bring this up in case someone
> sees it as a problem.

I'm pretty certain I recall Greg Stark recommending that we adopt
something like that as the standard numeric representation of zero.
It didn't get done yet, but it might happen someday.
        regards, tom lane


Re: Review: B-Tree emulation for GIN

From
Jeff Davis
Date:
On Mon, 2009-01-19 at 11:35 -0500, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > I like the fact that this patch does not modify the numeric ADT. It
> > still relies on the fact that the numeric type will never make use of
> > the minimal varlena struct, however. I bring this up in case someone
> > sees it as a problem.
> 
> I'm pretty certain I recall Greg Stark recommending that we adopt
> something like that as the standard numeric representation of zero.
> It didn't get done yet, but it might happen someday.
> 

Then we should use the previous version of the patch here:

http://archives.postgresql.org/message-id/1231709713.25019.129.camel@jdavis

Was there any talk of supporting a +/- infinity in numeric? If we did
that, it would allow numeric to be supported for btree-gin.

Regards,Jeff Davis



Re: Review: B-Tree emulation for GIN

From
Teodor Sigaev
Date:
> I'm pretty certain I recall Greg Stark recommending that we adopt
> something like that as the standard numeric representation of zero.
> It didn't get done yet, but it might happen someday.

Changes:
- use NULL as left-most value. It's safe because NULL numeric value
   cannot be an argument for any function except gin_numeric_cmp and it
   cannot be returned in regular SQL query.
- fix uninstall script for support numeric type.
--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

Re: Review: B-Tree emulation for GIN

From
Jeff Davis
Date:
On Mon, 2009-01-19 at 20:15 +0300, Teodor Sigaev wrote:
> Changes:
> - use NULL as left-most value. It's safe because NULL numeric value
>    cannot be an argument for any function except gin_numeric_cmp and it
>    cannot be returned in regular SQL query.

gin_numeric_cmp() can be called from regular SQL. I missed this before,
but that function will segfault if you call gin_numeric_cmp(NULL, 1) (in
v0.7 at least).

I know you mean a C NULL, not a SQL NULL, but it reminded me to test SQL
NULL.

And how does GIN handle SQL NULL values in the column? Does it index
them at all, or just ignore them?

Regards,Jeff Davis



Re: Review: B-Tree emulation for GIN

From
Teodor Sigaev
Date:
> gin_numeric_cmp() can be called from regular SQL. I missed this before,
> but that function will segfault if you call gin_numeric_cmp(NULL, 1) (in
> v0.7 at least).

Fixed, gin_numeric_cmp is marked as strict.

> And how does GIN handle SQL NULL values in the column? Does it index
> them at all, or just ignore them?
SQL NULL: GIN doesn't support it (amindexnulls/amsearchnulls == false)
C NULL: NULL-numeric could be returned only by gin_extract_query_numeric which
cannot be called by user directly because of internal type of argument.
GIN doesn't do anything with values returned by gin_extract_query_numeric except
providing they as an argument for comparing functions.
--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

Re: Review: B-Tree emulation for GIN

From
Jeff Davis
Date:
On Mon, 2009-01-19 at 21:41 +0300, Teodor Sigaev wrote:
> > And how does GIN handle SQL NULL values in the column? Does it index
> > them at all, or just ignore them?
> SQL NULL: GIN doesn't support it (amindexnulls/amsearchnulls == false)
> C NULL: NULL-numeric could be returned only by gin_extract_query_numeric which 
> cannot be called by user directly because of internal type of argument.
> GIN doesn't do anything with values returned by gin_extract_query_numeric except 
> providing they as an argument for comparing functions.

Ok, I understand now. I will look at this later.

Regards,Jeff Davis



Re: Review: B-Tree emulation for GIN

From
Jeff Davis
Date:
On Mon, 2009-01-19 at 21:41 +0300, Teodor Sigaev wrote:
> > gin_numeric_cmp() can be called from regular SQL. I missed this before,
> > but that function will segfault if you call gin_numeric_cmp(NULL, 1) (in
> > v0.7 at least).
> 
> Fixed, gin_numeric_cmp is marked as strict.
> 
> > And how does GIN handle SQL NULL values in the column? Does it index
> > them at all, or just ignore them?
> SQL NULL: GIN doesn't support it (amindexnulls/amsearchnulls == false)
> C NULL: NULL-numeric could be returned only by gin_extract_query_numeric which 
> cannot be called by user directly because of internal type of argument.
> GIN doesn't do anything with values returned by gin_extract_query_numeric except 
> providing they as an argument for comparing functions.

Ok, looks good. I updated the wiki to show this as the latest version of
the patch.

Thanks,Jeff Davis



Re: Review: B-Tree emulation for GIN

From
Tom Lane
Date:
Looked at this a bit ... do you think it's really a good idea to remove
the strategy number argument of comparePartial?  The argument given in
the docs for it is that it might be needed to determine when to end the
scan, and that still seems plausible to me.

The description of extractQuery's extra_data parameter seems confusing
too.  AFAICS it is incorrect, or at least misleading, to describe it as
void ** extra_data[]; it is really void ***extra_data, because there is
only one object there not an array.
        regards, tom lane


Re: Review: B-Tree emulation for GIN

From
Teodor Sigaev
Date:
> Looked at this a bit ... do you think it's really a good idea to remove
> the strategy number argument of comparePartial?  The argument given in
> the docs for it is that it might be needed to determine when to end the
> scan, and that still seems plausible to me.
Strategy number is not removed, it's replaced by pointer to opclass-specific 
data on per key basis. Actually, partial match feature is new for 8.4 and there 
is no any compatibility problem. None of existing opclasses (except 
contrib/btree_gin) uses strategy number in comparePartial.



> The description of extractQuery's extra_data parameter seems confusing
> too.  AFAICS it is incorrect, or at least misleading, to describe it as
> void ** extra_data[]; it is really void ***extra_data, because there is
> only one object there not an array.

It's really array, see fillScanKey() in ginscan.c, line 61 of patched file. 
extractQuery could provide data for each returned key.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: Review: B-Tree emulation for GIN

From
Jeff Davis
Date:
On Wed, 2009-02-04 at 20:22 +0300, Teodor Sigaev wrote:
> > The description of extractQuery's extra_data parameter seems confusing
> > too.  AFAICS it is incorrect, or at least misleading, to describe it as
> > void ** extra_data[]; it is really void ***extra_data, because there is
> > only one object there not an array.
> 
> It's really array, see fillScanKey() in ginscan.c, line 61 of patched file. 
> extractQuery could provide data for each returned key.
> 

If I understand correctly, the extra_data parameter to extractQuery()
should mean "a pointer to an array of void*". "void **extra_data[]"
might be misinterpreted as "an array of void**".

Would "void *(*extra_data)[]" be more descriptive? I'm not an expert on
style questions like this, but it seems to match the parameter lists for
comparePartial() and consistent().

"void ***extra_data" seems reasonable, too.

Regards,Jeff Davis



Re: Review: B-Tree emulation for GIN

From
Jeff Davis
Date:
On Mon, 2009-01-19 at 21:41 +0300, Teodor Sigaev wrote:
> > gin_numeric_cmp() can be called from regular SQL. I missed this before,
> > but that function will segfault if you call gin_numeric_cmp(NULL, 1) (in
> > v0.7 at least).
> 
> Fixed, gin_numeric_cmp is marked as strict.
> 
> > And how does GIN handle SQL NULL values in the column? Does it index
> > them at all, or just ignore them?
> SQL NULL: GIN doesn't support it (amindexnulls/amsearchnulls == false)
> C NULL: NULL-numeric could be returned only by gin_extract_query_numeric which 
> cannot be called by user directly because of internal type of argument.
> GIN doesn't do anything with values returned by gin_extract_query_numeric except 
> providing they as an argument for comparing functions.

Looking through the code again, gin_compare_prefix_##type looks a little
confusing.

Is there a reason for using: (data->strategy == BTLessStrategyNumber ||  data->strategy == BTLessEqualStrategyNumber )
?    PointerGetDatum(data->datum) : a
 
rather than just using: PointerGetDatum(data->datum)

Also, it might be a little less confusing if you used two separate
variables rather than using "res" for two purposes.

Regards,Jeff Davis





Re: Review: B-Tree emulation for GIN

From
Teodor Sigaev
Date:
> Looking through the code again, gin_compare_prefix_##type looks a little
> confusing.
>
> Is there a reason for using:
>   (data->strategy == BTLessStrategyNumber ||
>    data->strategy == BTLessEqualStrategyNumber ) ?
>      PointerGetDatum(data->datum) : a
> rather than just using:
>   PointerGetDatum(data->datum)

Added comments:
     /*
      * Datum a is a value from extract_query method and for BTLess*
      * strategy it is a left-most value. So, use original datum from
      * QueryInfo to decide stop scanning on not. Datum b is always
      * from index.
      */


> Also, it might be a little less confusing if you used two separate
> variables rather than using "res" for two purposes.
done
--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

Re: Review: B-Tree emulation for GIN

From
Heikki Linnakangas
Date:
The GIN_EXTRACT_VALUE macro returns a pointer to a static 'entries' 
variable. That doesn't seem safe. Is it really never possible to have to  two GIN searches in a plan, both calling and
usingthe value returned 
 
by extractValue simultaneously? In any case that seems like a pretty 
weak assumption.

You might want to declare extra_data as just "void *", instead of an 
array of pointers. The data type implementation might want to store 
something there that's not per-key, but applies to the whole query. I 
see that you're passing it to comparePartial, but that seems to be just 
future-proofing. What kind of a data type are you envisioning that would 
make use of it? It seems that you could pass the same information in the 
partial key Datum itself that extractQuery returns. You're currently 
using it as a way to avoid some palloc's in gin_tsquery_consistent(). 
That seems like a pretty dirty hack. I doubt there's any meaningful 
performance advantage from that, but if there is, I think you could use 
a statically allocated array instead.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Review: B-Tree emulation for GIN

From
Teodor Sigaev
Date:
> The GIN_EXTRACT_VALUE macro returns a pointer to a static 'entries'
> variable. That doesn't seem safe. Is it really never possible to have to
>  two GIN searches in a plan, both calling and using the value returned
> by extractValue simultaneously? In any case that seems like a pretty
> weak assumption.
Fixed.


> You might want to declare extra_data as just "void *", instead of an
> array of pointers. The data type implementation might want to store
> something there that's not per-key, but applies to the whole query. I
> see that you're passing it to comparePartial, but that seems to be just
> future-proofing. What kind of a data type are you envisioning that would

wildspeed module (http://www.sigaev.ru/cvsweb/cvsweb.cgi/wildspeed/) - for each
key from it's needed to store some info. Right now it's coded directly in Datum,
but it looks ugly (at least for me).

It's possible to clarify interface by introducing new type:

typedef void* OpaquePtr;

Then, prototypes will be:
extractQuery(..., OpaquePtr* extra_data[])
consistent(...., OpaquePtr extra_data[])
comparePartial(..., OpaquePtr extra_data)

Or another option: partial match feature is new for 8.4, so we could change
interface:
typedef struct KeyData {
    bool pmatch,
    void *extra_data;
} KeyData;

Datum *extractQuery(Datum query, int32 *nkeys, StrategyNumber n, KeyData* data[])
bool consistent(bool check[], StrategyNumber n, Datum query, bool *recheck,
KeyData data[])
comparePartial(Datum partial_key, Datum key, KeyData *data);

> make use of it? It seems that you could pass the same information in the
> partial key Datum itself that extractQuery returns. You're currently
> using it as a way to avoid some palloc's in gin_tsquery_consistent().
> That seems like a pretty dirty hack. I doubt there's any meaningful
> performance advantage from that, but if there is, I think you could use
> a statically allocated array instead.

It's easy to un-dirty that hack, but before I'd like to see your comments about
thoughts above.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

Re: Review: B-Tree emulation for GIN

From
Heikki Linnakangas
Date:
Teodor Sigaev wrote:
>> You might want to declare extra_data as just "void *", instead of an 
>> array of pointers. The data type implementation might want to store 
>> something there that's not per-key, but applies to the whole query. I 
>> see that you're passing it to comparePartial, but that seems to be 
>> just future-proofing. What kind of a data type are you envisioning 
>> that would 
> 
> wildspeed module (http://www.sigaev.ru/cvsweb/cvsweb.cgi/wildspeed/) - 
> for each key from it's needed to store some info. Right now it's coded 
> directly in Datum, but it looks ugly (at least for me).

Ok, I guess it's best to leave it as you had in the patch then.

>> make use of it? It seems that you could pass the same information in 
>> the partial key Datum itself that extractQuery returns. You're 
>> currently using it as a way to avoid some palloc's in 
>> gin_tsquery_consistent(). That seems like a pretty dirty hack. I doubt 
>> there's any meaningful performance advantage from that, but if there 
>> is, I think you could use a statically allocated array instead.
> 
> It's easy to un-dirty that hack, but before I'd like to see your 
> comments about thoughts above.

Yeah, please revert that hack.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Review: B-Tree emulation for GIN

From
Teodor Sigaev
Date:
>> It's easy to un-dirty that hack, but before I'd like to see your
>> comments about thoughts above.
>
> Yeah, please revert that hack.
Done. Also, I changed void *extra_data to Pointer extra_data and corresponding
**extra_data and ***extra_data to simplify understanding.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

Re: Review: B-Tree emulation for GIN

From
Tom Lane
Date:
[ back to this patch... ]

Teodor Sigaev <teodor@sigaev.ru> writes:
>> Looked at this a bit ... do you think it's really a good idea to remove
>> the strategy number argument of comparePartial?  The argument given in
>> the docs for it is that it might be needed to determine when to end the
>> scan, and that still seems plausible to me.

> Strategy number is not removed, it's replaced by pointer to opclass-specific 
> data on per key basis. Actually, partial match feature is new for 8.4 and there 
> is no any compatibility problem. None of existing opclasses (except 
> contrib/btree_gin) uses strategy number in comparePartial.

I'm still not real happy about omitting the strategy number from
comparePartial's arguments.  Yes, the extractQuery function can make
up for that if it has to, but why should it have to?  It seems to me
to be inconsistent that the consistent function gets the strategy
number but comparePartial doesn't.

I know that there's not a backwards compatibility issue here, but it
still seems to me that you're forcing opclasses to do things the hard
way (allocating an extra_data entry) when they might only need
information that's already part of the call signature for existing
functions.
        regards, tom lane


Re: Review: B-Tree emulation for GIN

From
Tom Lane
Date:
BTW ... while I'm thinking about it: it seems to me to be a serious
error that the consistent() function isn't given nkeys so that it can
know the length of the arrays it's being handed.  I suppose it's
possible for it to re-deduce nkeys by examining the query datum, but
that could be quite expensive; and it's certainly error-prone to have
to keep extractQuery() and consistent() in sync on this.  Since we are
adding arguments to consistent() anyway for 8.4, I propose that its
signature ought to be

bool consistent(bool check[], StrategyNumber n, Datum query,               int32 nkeys, bool *recheck, Pointer
extra_data[])

where the first three arguments are what existed in 8.3.
        regards, tom lane


Re: Review: B-Tree emulation for GIN

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> BTW ... while I'm thinking about it: it seems to me to be aTom> serious error that the consistent() function isn't
givennkeysTom> so that it can know the length of the arrays it's being handed.Tom> I suppose it's possible for it to
re-deducenkeys by examiningTom> the query datum, but that could be quite expensive; and it'sTom> certainly error-prone
tohave to keep extractQuery() andTom> consistent() in sync on this.
 

I ran into exactly this problem in hstore; the omission of nkeys there
is seriously annoying.

-- 
Andrew (irc:RhodiumToad)


Re: Review: B-Tree emulation for GIN

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> BTW ... while I'm thinking about it: it seems to me to be a
>  Tom> serious error that the consistent() function isn't given nkeys
>  Tom> so that it can know the length of the arrays it's being handed.

> I ran into exactly this problem in hstore; the omission of nkeys there
> is seriously annoying.

It's de-omitted ...
        regards, tom lane


Re: Review: B-Tree emulation for GIN

From
Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
> [ btree_gin 0.12 ]

Committed with some editorializations.  There are still a few loose
ends:

* the question about zero-key queries that I mentioned before

* After this new look at the code I think that matchPartialInPendingList
is completely broken.  Surely it's an infinite loop if the
comparePartial function returns -1.  I also don't understand why it
stops short of examining the tuple at maxoff, or for that matter why
it's doing any searching at all when the code that's calling it seems
to be implementing a binary search.  I think that part of the code
needs more testing and commenting.

* In keyGetItem(), it's not at all clear what the assumptions are for
the contents of the entryRes[] array and for the interaction with a
LossyPage entry.  Is it intentional that you run through the array
comparing to a LossyPage key before deciding to exit the loop?  It
seems like that might be affecting the behavior at the next call,
but if so you're making some undocumented assumptions about the way
comparison of a lossy-page pointer is going to behave.  I thought it'd
be cleaner to move the ItemPointerIsLossyPage check up to before that
loop (right after the ItemPointerIsMax test) but the more I look at it
the less sure I am if that would break things.  That code needs more
commenting too.

* I'd also like to come to some agreement about getting rid of the
fail-on-NULL-scankey problem in newScanKey().  As I noted in the
comment there, we could make that work cleanly if we are willing to
assume that all GIN-indexable operators are strict.  We already assume
the same for hash and btree operators, so it doesn't seem like a big
problem to do this, but I wonder if there are any objections.
        regards, tom lane


Re: Review: B-Tree emulation for GIN

From
Jeff Davis
Date:
On Wed, 2009-03-25 at 19:31 -0400, Tom Lane wrote:
> * I'd also like to come to some agreement about getting rid of the
> fail-on-NULL-scankey problem in newScanKey().  As I noted in the
> comment there, we could make that work cleanly if we are willing to
> assume that all GIN-indexable operators are strict.  We already assume
> the same for hash and btree operators, so it doesn't seem like a big
> problem to do this, but I wonder if there are any objections.

"IS NULL" is indexable in a btree and non-strict, so there is at least
some precedent.

Also, if extractQuery is non-strict, shouldn't we call it and see if it
returns some useful keys? If so, I don't see a reason to assume that
nothing matches.

If the opclass author wants a search against NULL to mean "matches
nothing", they can just make extractQuery non-strict and return -1.

However, if extractQuery is strict or returns NULL, I'm fine with either
an error or assuming "nothing matches". I don't see a functionality
difference either way, so we should just document whatever seems to make
the most sense.

Regards,Jeff Davis




Re: Review: B-Tree emulation for GIN

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> Also, if extractQuery is non-strict, shouldn't we call it and see if it
> returns some useful keys?

Perhaps.  One risk factor for approaching it that way is that there are
probably a lot of opclasses out there that haven't bothered to mark
these functions strict, since it's never mattered before.  (A handy
example is that the brand new btree_gin opclasses did not bother, as
submitted; though in a fit of paranoia I made them do so before
committing.)  If the extractQuery function isn't actually guarding
against this, you'll get a crash.

That's not a showstopper reason not to change, of course, but it does
mean that I'd like to see an actual use case for a non-strict GIN index
operator before taking any risk.  Note that IS NULL isn't an operator,
so even if we were to try to support it in GIN, that would be a
different code path (just as it is in btree).
        regards, tom lane


Re: Review: B-Tree emulation for GIN

From
Teodor Sigaev
Date:

> * After this new look at the code I think that matchPartialInPendingList
> is completely broken.  Surely it's an infinite loop if the
> comparePartial function returns -1.  I also don't understand why it
fixed

> stops short of examining the tuple at maxoff, or for that matter why
> it's doing any searching at all when the code that's calling it seems
> to be implementing a binary search.  I think that part of the code
> needs more testing and commenting.
Tuples of rows are stored in pending list in the same order as in entry tree
(order by (attrnum, Datum)). So, it's possible to use binary search. By test's
binary search presents about 5-15% of performance.

But heap's row could be one page or on several pages. If page contains
several rows, each row is checked separately and tuples of one heap row are
placed from pendingPosition->firstOffset up to pendingPosition->lastOffset.
The page is marked by GIN_LIST_FULLROW flag if it contains all tuples of row(s)
or it's a last page in page's chain containing single heap row.

scanGetCandidate() always returns offset range for tuples of one heap row.
Although it may return NOT all tuples of row, if so then collectDatumForItem()
will ask scanGetCandidate() about next portion of tuples of that heap row. It's
safe because tuple's of one heap row are continuosly stored in pending list.

> * In keyGetItem(), it's not at all clear what the assumptions are for
> the contents of the entryRes[] array and for the interaction with a
> LossyPage entry.  Is it intentional that you run through the array
> comparing to a LossyPage key before deciding to exit the loop?  It
> seems like that might be affecting the behavior at the next call,
> but if so you're making some undocumented assumptions about the way
> comparison of a lossy-page pointer is going to behave.  I thought it'd
> be cleaner to move the ItemPointerIsLossyPage check up to before that
> loop (right after the ItemPointerIsMax test) but the more I look at it
> the less sure I am if that would break things.  That code needs more
> commenting too.
entryRes[] array is used for two purposes:
- as an argument for consistentFn
- to store information about entries which was a minimal on previous
   call/loop. So, only that entries should be renewed by entryGetItem()
   call. Lossy page is encoded as ItemPointer with greatest offset (0xffff),
   so  keyGetItem() could return items from page and then return the same
   page as lossy. This is not a problem because of usage of TIDBitmap to
   callect all results.

So, algorithm of  keyGetItem could be described as:
1 renew all entries which is equal to previous result (entryGetItems returns
them in increasing order)
2 find minimal entries
3 fill up  entryRes[] to new values and check lossy or call consistentFn


> * I'd also like to come to some agreement about getting rid of the
> fail-on-NULL-scankey problem in newScanKey().  As I noted in the
> comment there, we could make that work cleanly if we are willing to
> assume that all GIN-indexable operators are strict.  We already assume
> the same for hash and btree operators, so it doesn't seem like a big
> problem to do this, but I wonder if there are any objections.
Agree.  I changed the GIN code, but don't know where is other places to change
to fixate this agreement.



--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/


Attachment

Re: Review: B-Tree emulation for GIN

From
Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
> [ fixes for the GIN stuff I complained about before ]

This all looks good to me, please apply.  One little suggestion: 
!         /* 
!          * entryRes array is used for:
!          * - as an argument for consistentFn
!          * - entry->curItem with corresponding key->entryRes[i] == false are greater 
!          *   than key->curItem, so next loop/call they should be renewed
!          *   by entryGetItem(). So, we need to set up an array before
!          *   checking of lossy page.
!          */

pgindent will reflow this comment block, since it's not at the left
margin.  To keep the formatting looking good you'll need to add /*--------

>> * I'd also like to come to some agreement about getting rid of the
>> fail-on-NULL-scankey problem in newScanKey().  As I noted in the
>> comment there, we could make that work cleanly if we are willing to
>> assume that all GIN-indexable operators are strict.  We already assume
>> the same for hash and btree operators, so it doesn't seem like a big
>> problem to do this, but I wonder if there are any objections.

> Agree.  I changed the GIN code, but don't know where is other places
> to change to fixate this agreement.

I don't think there is anything else that needs to be done for that.
        regards, tom lane