Thread: btree_gin and btree_gist for enums
Here is a patch to add enum support to btree_gin and btree_gist. I didn't include distance operations, as I didn't think it terribly important, and there isn't a simple way to compute it sanely and efficiently, so no KNN support. cheers andrew
Attachment
On 03/17/2016 06:44 AM, Andrew Dunstan wrote: > Here is a patch to add enum support to btree_gin and btree_gist. I > didn't include distance operations, as I didn't think it terribly > important, and there isn't a simple way to compute it sanely and > efficiently, so no KNN support. > > This time including the data file for the gist regression tests. cheers andrew
Attachment
On Thu, Mar 17, 2016 at 11:21 AM, Andrew Dunstan <andrew@dunslane.net> wrote: > On 03/17/2016 06:44 AM, Andrew Dunstan wrote: >> >> Here is a patch to add enum support to btree_gin and btree_gist. I didn't >> include distance operations, as I didn't think it terribly important, and >> there isn't a simple way to compute it sanely and efficiently, so no KNN >> support. > > This time including the data file for the gist regression tests. Are you intending this as a 9.7 submission? Because it's pretty late for 9.6. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 03/18/2016 09:54 AM, Robert Haas wrote: > On Thu, Mar 17, 2016 at 11:21 AM, Andrew Dunstan <andrew@dunslane.net> wrote: >> On 03/17/2016 06:44 AM, Andrew Dunstan wrote: >>> Here is a patch to add enum support to btree_gin and btree_gist. I didn't >>> include distance operations, as I didn't think it terribly important, and >>> there isn't a simple way to compute it sanely and efficiently, so no KNN >>> support. >> This time including the data file for the gist regression tests. > Are you intending this as a 9.7 submission? Because it's pretty late for 9.6. > I guess so. I would certainly find it hard to argue that it should be in 9.6 unless there is a consensus on it. cheers andrew
Hi Andrew, all, First message here! I didn't get around to sending an intro/"thank you all" e-mail yet, and a small performance (?) patch+idea(s)... (CPU stuff, since I don't otherwise know much about PG internals.) Anyway... ----- Original Message ----- From: "Andrew Dunstan" Sent: Thursday, March 17, 2016 > Here is a patch to add enum support to btree_gin and btree_gist. I > didn't include distance operations, as I didn't think it terribly > important, and there isn't a simple way to compute it sanely and > efficiently, so no KNN support. Major thanks for coming up with this a week after your "enums and indexing" message. My love of all things Postgres has a lot to do with being able to do nearly anything one could want (coming from MySQL :-/)! So I was like, "Wait, what?" when you brought up the subject, since this was something I hadn't actually tried, but was planning to use btree_gin/gist with a good mix of stuff, including enums (array and scalar). At first I thought it was just arrays of enums, until this patch for the contrib extensions (for the btree parts with GIN/GiST; plus GiST exclusion?), and your blog posts... [1][2] I wasn't certain from the second post whether your array solution can be used today without this patch (yes, just tried 9.5). So, two separate issues, and the patch addresses scalar stuff, IIUC. It would be *really* nice to have this in 9.6. It seems it's simply filling out functionality that should already be there, right? An oversight/bug fix so it works "as advertised?" :-) Is any other btree type-compatibility missing from these modules? (I notice the btree_gin docs don't mention "numeric," but it works.) I just looked over the patch, and the actual code addition/changes seem pretty small and straightforward (or am I wrong?). And it's not changing anything in the core, so... Well, just wanted to argue my case! :^) [1] http://adpgtech.blogspot.com/2016/03/gist-and-gin-support-for-enums.html [2] http://adpgtech.blogspot.com/2016/03/gin-indexing-array-of-enums.html > cheers > > andrew Thanks, Matt
On 03/24/2016 12:40 PM, Matt Wilmas wrote: > > > It would be *really* nice to have this in 9.6. It seems it's simply > filling out functionality that should already be there, right? An > oversight/bug fix so it works "as advertised?" :-) I think that would be stretching the process a bit far. I'm certainly not prepared to commit it unless there is a general consensus to make an exception for it. > Is any other btree type-compatibility missing from these modules? (I > notice the btree_gin docs don't mention "numeric," but it works.) uuid would be another type that should be fairly easily covered but isn't. > > > I just looked over the patch, and the actual code addition/changes > seem pretty small and straightforward (or am I wrong?). Yes, sure, it's all fairly simple. Took me a little while to understand what I was doing, but once I did it was pretty much plain sailing. cheers andrew
On 03/24/2016 12:40 PM, Matt Wilmas wrote: > (I notice the btree_gin docs don't mention "numeric," but it works.) > Numeric does work - we have regression tests to prove it, do we should fix the docs. But I'm also curious to know why apparently we don't have distance operator support for numeric. cheers andrew
>> Here is a patch to add enum support to btree_gin and btree_gist. I didn't >> include distance operations, as I didn't think it terribly important, and >> there isn't a simple way to compute it sanely and efficiently, so no KNN >> support. I don't think we can implement a meaningful distance operator for enums. > This time including the data file for the gist regression tests. It doesn't apply to HEAD anymore. I have tested it on b12fd41. The GiST part of it doesn't really work. The current patch compares oids. We need to change it to compare enumsortorder. I could make it fail easily: > regression=# create type e as enum ('0', '2', '3'); > CREATE TYPE > regression=# alter type e add value '1' after '0'; > ALTER TYPE > regression=# create table t as select (i % 4)::text::e from generate_series(0, 100000) as i; > SELECT 100001 > regression=# create index on t using gist (e); > SEGFAULT > +ALTER OPERATOR FAMILY gist_enum_ops USING gist ADD Why don't we just create it with those changes? > + * Use a similar trick to that used for numeric for enums, since we don't Do you mean "similar trick that is used" or "trick similar to numeric"? > + * actually know the leftmost value of any enum without knowing the concrete > + * type, so we use a dummy leftmost value of InvalidOid. > + return DatumGetBool( > + DirectFunctionCall2(enum_gt, ObjectIdGetDatum(*((const Oid *) a)), ObjectIdGetDatum(*((const Oid*) b))) > + ); Wouldn't it be nice to refactor enum_cmp_internal() to accept typcache and use it there like the rangetypes?
On 11/04/2016 04:14 PM, Emre Hasegeli wrote: >>> Here is a patch to add enum support to btree_gin and btree_gist. I didn't >>> include distance operations, as I didn't think it terribly important, and >>> there isn't a simple way to compute it sanely and efficiently, so no KNN >>> support. > I don't think we can implement a meaningful distance operator for enums. Probably. > >> This time including the data file for the gist regression tests. > It doesn't apply to HEAD anymore. I have tested it on b12fd41. I'll fix the bitrot.. > > The GiST part of it doesn't really work. The current patch compares > oids. We need to change it to compare enumsortorder. I could make it > fail easily: ouch. Ok,I'll work on this. >> +ALTER OPERATOR FAMILY gist_enum_ops USING gist ADD > Why don't we just create it with those changes? I'll take a look. > >> + * Use a similar trick to that used for numeric for enums, since we don't > Do you mean "similar trick that is used" or "trick similar to numeric"? This is perfectly legal English. It means "... a similar trick to the one used for numeric ... ". I'll change it to that if you think it's clearer. > >> + * actually know the leftmost value of any enum without knowing the concrete >> + * type, so we use a dummy leftmost value of InvalidOid. >> + return DatumGetBool( >> + DirectFunctionCall2(enum_gt, ObjectIdGetDatum(*((const Oid *) a)), ObjectIdGetDatum(*((constOid *) b))) >> + ); > Wouldn't it be nice to refactor enum_cmp_internal() to accept typcache > and use it there like the rangetypes? > Not sure. Will look. Thanks for the review. cheers andrew
On 11/04/2016 04:14 PM, Emre Hasegeli wrote: > The GiST part of it doesn't really work. The current patch compares > oids. We need to change it to compare enumsortorder. I could make it > fail easily: > >> regression=# create type e as enum ('0', '2', '3'); >> CREATE TYPE >> regression=# alter type e add value '1' after '0'; >> ALTER TYPE >> regression=# create table t as select (i % 4)::text::e from generate_series(0, 100000) as i; >> SELECT 100001 >> regression=# create index on t using gist (e); >> SEGFAULT It calls the enum routines which use the sortorder if necessary. It's not necessary to use sortorder in the case of evenly numbered enum oids as we have carefully arranged for them to be directly comparable, and all the initially allocated oids are even, a very nifty efficiency measure devised by Tom when we added enum extension. The real problem here is that enum_cmp_internal assumes that fcinfo->flinfo has been set up, and DirectFunctionCallN doesn't, it sets it to NULL. The patch below cures the problem. I'm not sure if there is a better way. Thoughts? cheers andrew diff --git a/src/backend/utils/adt/enum.c b/src/backend/utils/adt/enum.c index 47d5355..64ba7a7 100644 --- a/src/backend/utils/adt/enum.c +++ b/src/backend/utils/adt/enum.c @@ -261,7 +261,7 @@ enum_send(PG_FUNCTION_ARGS) static int enum_cmp_internal(Oid arg1, Oid arg2, FunctionCallInfo fcinfo){ - TypeCacheEntry *tcache; + TypeCacheEntry *tcache = NULL; /* Equal OIDs are equal no matter what */ if (arg1 == arg2) @@ -277,7 +277,8 @@ enum_cmp_internal(Oid arg1, Oid arg2, FunctionCallInfo fcinfo) } /* Locate the typcache entry for the enum type */ - tcache = (TypeCacheEntry *) fcinfo->flinfo->fn_extra; + if (fcinfo->flinfo != NULL) + tcache = (TypeCacheEntry *) fcinfo->flinfo->fn_extra; if (tcache == NULL) { HeapTuple enum_tup; @@ -296,7 +297,8 @@ enum_cmp_internal(Oid arg1, Oid arg2, FunctionCallInfo fcinfo) ReleaseSysCache(enum_tup); /* Now locate and remember the typcache entry */ tcache = lookup_type_cache(typeoid, 0); - fcinfo->flinfo->fn_extra = (void *) tcache; + if (fcinfo->flinfo != NULL) + fcinfo->flinfo->fn_extra = (void *) tcache; } /* The remaining comparison logic is in typcache.c */
Andrew Dunstan <andrew@dunslane.net> writes: > The real problem here is that enum_cmp_internal assumes that > fcinfo->flinfo has been set up, and DirectFunctionCallN doesn't, it sets > it to NULL. > The patch below cures the problem. I'm not sure if there is a better > way. Thoughts? That may be a good fix for robustness purposes, but it seems pretty horrid from an efficiency standpoint. Where is this call, and should we be modifying it to provide a flinfo? regards, tom lane
On 11/05/2016 11:46 AM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> The real problem here is that enum_cmp_internal assumes that >> fcinfo->flinfo has been set up, and DirectFunctionCallN doesn't, it sets >> it to NULL. >> The patch below cures the problem. I'm not sure if there is a better >> way. Thoughts? > That may be a good fix for robustness purposes, but it seems pretty horrid > from an efficiency standpoint. Where is this call, and should we be > modifying it to provide a flinfo? > > See attached updated patch that adds enum support to btree_gist and btree_gin I thought of providing an flinfo, but I couldn't see a simple way to do it that would provide something much longer lived than the function call, in which case it seemed a bit pointless. That's why I asked for assistance :-) cheers andrew
Attachment
Andrew Dunstan <andrew@dunslane.net> writes: > On 11/05/2016 11:46 AM, Tom Lane wrote: >> That may be a good fix for robustness purposes, but it seems pretty horrid >> from an efficiency standpoint. Where is this call, and should we be >> modifying it to provide a flinfo? > I thought of providing an flinfo, but I couldn't see a simple way to do > it that would provide something much longer lived than the function > call, in which case it seemed a bit pointless. That's why I asked for > assistance :-) Hmm. The problem is that the intermediate layer in btree_gist (and probably btree_gin too, didn't look) does not pass through the FunctionCallInfo data structure that's provided by the GIST AM. That wasn't needed up to now, because none of the supported data types are complex enough that any cached state would be useful, but trying to extend it to enums exposes the shortcoming. We could fix this, but it would be pretty invasive since it would require touching just about every function in btree_gist/gin. Not entirely sure that it's worth it. On the other hand, the problem might well come up again in future, perhaps for a datatype where the penalty for lack of a cache is greater --- eg, it would be pretty painful to support record_cmp without caching. And the changes ought to be relatively mechanical, even if they are extensive. BTW, this patch would be a great deal shorter if you made use of the work done in 40b449ae8. That is, you no longer need to replace the base extension script --- just add an update script and change the default version in the .control file. See fd321a1df for an example. regards, tom lane
On 11/05/2016 01:13 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 11/05/2016 11:46 AM, Tom Lane wrote: >>> That may be a good fix for robustness purposes, but it seems pretty horrid >>> from an efficiency standpoint. Where is this call, and should we be >>> modifying it to provide a flinfo? >> I thought of providing an flinfo, but I couldn't see a simple way to do >> it that would provide something much longer lived than the function >> call, in which case it seemed a bit pointless. That's why I asked for >> assistance :-) > Hmm. The problem is that the intermediate layer in btree_gist (and > probably btree_gin too, didn't look) does not pass through the > FunctionCallInfo data structure that's provided by the GIST AM. > That wasn't needed up to now, because none of the supported data types > are complex enough that any cached state would be useful, but trying > to extend it to enums exposes the shortcoming. > > We could fix this, but it would be pretty invasive since it would require > touching just about every function in btree_gist/gin. Not entirely sure > that it's worth it. On the other hand, the problem might well come up > again in future, perhaps for a datatype where the penalty for lack of > a cache is greater --- eg, it would be pretty painful to support > record_cmp without caching. And the changes ought to be relatively > mechanical, even if they are extensive. Yeah. I think it's probably worth doing. btree_gin is probably a better place to start because it's largely macro-ized. I don't have time right now to do this. I'll try to get to it if nobody else does over then next couple of months. > > BTW, this patch would be a great deal shorter if you made use of the > work done in 40b449ae8. That is, you no longer need to replace the > base extension script --- just add an update script and change the > default version in the .control file. See fd321a1df for an example. Oh, nice. My work was originally done in March, before this came in. cheers andrew
I won't have time to fix this before the end of the Commitfest
On Wed, Nov 23, 2016 at 11:31 AM, Andrew Dunstan <adunstan@postgresql.org> wrote:
I won't have time to fix this before the end of the Commitfest
The patch is marked as "returned with feedback" in 2016-11 commitfest.
Please free to submit an updated patch to the next commitfest.
Regards,
Hari Babu
Fujitsu Australia
On 11/05/2016 03:11 PM, Andrew Dunstan wrote: > > > On 11/05/2016 01:13 PM, Tom Lane wrote: >> Andrew Dunstan <andrew@dunslane.net> writes: >>> On 11/05/2016 11:46 AM, Tom Lane wrote: >>>> That may be a good fix for robustness purposes, but it seems pretty >>>> horrid >>>> from an efficiency standpoint. Where is this call, and should we be >>>> modifying it to provide a flinfo? >>> I thought of providing an flinfo, but I couldn't see a simple way to do >>> it that would provide something much longer lived than the function >>> call, in which case it seemed a bit pointless. That's why I asked for >>> assistance :-) >> Hmm. The problem is that the intermediate layer in btree_gist (and >> probably btree_gin too, didn't look) does not pass through the >> FunctionCallInfo data structure that's provided by the GIST AM. >> That wasn't needed up to now, because none of the supported data types >> are complex enough that any cached state would be useful, but trying >> to extend it to enums exposes the shortcoming. >> >> We could fix this, but it would be pretty invasive since it would >> require >> touching just about every function in btree_gist/gin. Not entirely sure >> that it's worth it. On the other hand, the problem might well come up >> again in future, perhaps for a datatype where the penalty for lack of >> a cache is greater --- eg, it would be pretty painful to support >> record_cmp without caching. And the changes ought to be relatively >> mechanical, even if they are extensive. > > > > Yeah. I think it's probably worth doing. btree_gin is probably a > better place to start because it's largely macro-ized. So looking at this we have: static Datum gin_btree_compare_prefix(FunctionCallInfo fcinfo) { Datum a = PG_GETARG_DATUM(0); Datum b = PG_GETARG_DATUM(1); QueryInfo *data = (QueryInfo *) PG_GETARG_POINTER(3); int32 res, cmp; cmp = DatumGetInt32(DirectFunctionCall2Coll( data->typecmp, PG_GET_COLLATION(), (data->strategy == BTLessStrategyNumber || data->strategy == BTLessEqualStrategyNumber) ? data->datum : a, b)); and then the referred to routine in the enum case looks like this: Datum gin_enum_cmp(PG_FUNCTION_ARGS) { Oid a = PG_GETARG_OID(0); Oid b = PG_GETARG_OID(1); int res = 0; if (ENUM_IS_LEFTMOST(a)) { res = (ENUM_IS_LEFTMOST(b)) ? 0 : -1; } else if (ENUM_IS_LEFTMOST(b)) { res = 1; } else { res = DatumGetInt32(DirectFunctionCall2(enum_cmp, ObjectIdGetDatum(a), ObjectIdGetDatum(b))); } PG_RETURN_INT32(res); } I'm not entirely sure how I should replace those DirectFunctionCall2 calls. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > I'm not entirely sure how I should replace those DirectFunctionCall2 calls. Basically what we want is for the called function to be able to use the fcinfo->flinfo->fn_extra and fcinfo->flinfo->fn_mcxt fields of the FmgrInfo struct that GIN has set up for the btree_gin support function. The fast but somewhat scary way to do it would just be to pass through the flinfo pointer as-is. Imagine that fmgr.c grows a set of functions like Datum DontKnowWhatToCallThisFunctionCall2(PGFunction func, FmgrInfo *flinfo, Oid collation, Datum arg1, Datum arg2) { FunctionCallInfoData fcinfo; Datum result; InitFunctionCallInfoData(fcinfo, flinfo, 2, collation, NULL, NULL); fcinfo.arg[0] = arg1; fcinfo.arg[1] = arg2; fcinfo.argnull[0] = false; fcinfo.argnull[1] = false; result = (*func) (&fcinfo); /* Check for null result, since caller is clearly not expecting one */ if (fcinfo.isnull) elog(ERROR, "function%p returned NULL", (void *) func); return result; } and then you'd just pass through the fcinfo->flinfo you got. The reason this is kind of scary is that it's just blithely assuming that the function won't look at the *other* fields of the FmgrInfo. If it did, it would likely get very confused, since those fields would be describing the GIN support function, not the function we're calling. We could alternatively have this trampoline function set up a fresh, local FmgrInfo struct that it zeroes except for copying fn_extra and fn_mcxt from the caller's struct, and then it copies fn_extra back again on the way out. That's a few more cycles but it would be safer, I think; if the function tried to look at the other fields such as fn_oid it would see obviously bogus data. BTW, while I'm looking at this ... isn't gin_enum_cmp broken on its face? It's using DirectFunctionCall2 to call enum_cmp, and that's wrong because DirectFunctionCall2 does not pass through a flinfo but enum_cmp needs to have one. I've not tested, but I'm certain that this would dump core if asked to compare odd-numbered enum OIDs. regards, tom lane
On 02/23/2017 04:41 PM, Tom Lane wrote: > BTW, while I'm looking at this ... isn't gin_enum_cmp broken on its face? > It's using DirectFunctionCall2 to call enum_cmp, and that's wrong because > DirectFunctionCall2 does not pass through a flinfo but enum_cmp needs to > have one. I've not tested, but I'm certain that this would dump core if > asked to compare odd-numbered enum OIDs. > > Yes, that's what I'm trying to fix. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 02/23/2017 04:41 PM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> I'm not entirely sure how I should replace those DirectFunctionCall2 calls. > Basically what we want is for the called function to be able to use the > fcinfo->flinfo->fn_extra and fcinfo->flinfo->fn_mcxt fields of the > FmgrInfo struct that GIN has set up for the btree_gin support function. > > The fast but somewhat scary way to do it would just be to pass through > the flinfo pointer as-is. Imagine that fmgr.c grows a set of functions > like > > [...] Here's a POC for btree_gin based on my original patch. I just made your function static in btree_gin.c at least for now. I stayed with the DirectFunctionCall2 use in the type-specific routines where calling context wasn't needed (i.e. everything except enums). But it looks more than ugly and highly invasive for btree_gist, and sadly that's my main use case, exclusion constraints with enum fields. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 02/23/2017 04:41 PM, Tom Lane wrote: >> BTW, while I'm looking at this ... isn't gin_enum_cmp broken on its face? > Yes, that's what I'm trying to fix. I'd forgotten where this thread started. For a minute there I thought we had a live bug, rather than a deficiency in code-under-development. After thinking about that, I believe that enum_cmp_internal is a rather considerable hazard. It might not be our only function that requires fcinfo->flinfo cache space just some of the time not all the time, but I don't recall anyplace else that could so easily undergo a reasonable amount of testing without *ever* reaching the case where it requires that cache space. So I'm now worried that it wouldn't be too hard for such a bug to get past us. I think we should address that by adding a non-bypassable Assert that the caller did provide flinfo, as in the attached. regards, tom lane diff --git a/src/backend/utils/adt/enum.c b/src/backend/utils/adt/enum.c index 8110ee2..b1d2a6f 100644 *** a/src/backend/utils/adt/enum.c --- b/src/backend/utils/adt/enum.c *************** enum_cmp_internal(Oid arg1, Oid arg2, Fu *** 263,268 **** --- 263,277 ---- { TypeCacheEntry *tcache; + /* + * We don't need the typcache except in the hopefully-uncommon case that + * one or both Oids are odd. This means that cursory testing of code that + * fails to pass flinfo to an enum comparison function might not disclose + * the oversight. To make such errors more obvious, Assert that we have a + * place to cache even when we take a fast-path exit. + */ + Assert(fcinfo->flinfo != NULL); + /* Equal OIDs are equal no matter what */ if (arg1 == arg2) return 0; *************** enum_cmp(PG_FUNCTION_ARGS) *** 381,392 **** Oid a = PG_GETARG_OID(0); Oid b = PG_GETARG_OID(1); ! if (a == b) ! PG_RETURN_INT32(0); ! else if (enum_cmp_internal(a, b, fcinfo) > 0) ! PG_RETURN_INT32(1); ! else ! PG_RETURN_INT32(-1); } /* Enum programming support functions */ --- 390,396 ---- Oid a = PG_GETARG_OID(0); Oid b = PG_GETARG_OID(1); ! PG_RETURN_INT32(enum_cmp_internal(a, b, fcinfo)); } /* Enum programming support functions */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 02/23/2017 08:34 PM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> On 02/23/2017 04:41 PM, Tom Lane wrote: >>> BTW, while I'm looking at this ... isn't gin_enum_cmp broken on its face? >> Yes, that's what I'm trying to fix. > I'd forgotten where this thread started. For a minute there I thought > we had a live bug, rather than a deficiency in code-under-development. > > After thinking about that, I believe that enum_cmp_internal is a rather > considerable hazard. It might not be our only function that requires > fcinfo->flinfo cache space just some of the time not all the time, but > I don't recall anyplace else that could so easily undergo a reasonable > amount of testing without *ever* reaching the case where it requires > that cache space. So I'm now worried that it wouldn't be too hard for > such a bug to get past us. > > I think we should address that by adding a non-bypassable Assert that > the caller did provide flinfo, as in the attached. > > Looks sane. I don't believe there is anywhere in the core code that calls this without fcinfo->flinfo, But obviously I have tickled that with my original patch for the extension. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 02/23/2017 04:41 PM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> I'm not entirely sure how I should replace those DirectFunctionCall2 calls. > Basically what we want is for the called function to be able to use the > fcinfo->flinfo->fn_extra and fcinfo->flinfo->fn_mcxt fields of the > FmgrInfo struct that GIN has set up for the btree_gin support function. > > The fast but somewhat scary way to do it would just be to pass through > the flinfo pointer as-is. Imagine that fmgr.c grows a set of functions > like > > Datum > DontKnowWhatToCallThisFunctionCall2(PGFunction func, > FmgrInfo *flinfo, Oid collation, > Datum arg1, Datum arg2) > { > FunctionCallInfoData fcinfo; > Datum result; > > InitFunctionCallInfoData(fcinfo, flinfo, 2, collation, NULL, NULL); > > fcinfo.arg[0] = arg1; > fcinfo.arg[1] = arg2; > fcinfo.argnull[0] = false; > fcinfo.argnull[1] = false; > > result = (*func) (&fcinfo); > > /* Check for null result, since caller is clearly not expecting one */ > if (fcinfo.isnull) > elog(ERROR, "function %p returned NULL", (void *) func); > > return result; > } > > and then you'd just pass through the fcinfo->flinfo you got. > > The reason this is kind of scary is that it's just blithely assuming > that the function won't look at the *other* fields of the FmgrInfo. > If it did, it would likely get very confused, since those fields > would be describing the GIN support function, not the function we're > calling. > > We could alternatively have this trampoline function set up a fresh, local > FmgrInfo struct that it zeroes except for copying fn_extra and fn_mcxt > from the caller's struct, and then it copies fn_extra back again on the > way out. That's a few more cycles but it would be safer, I think; if the > function tried to look at the other fields such as fn_oid it would see > obviously bogus data. > > Do we want one or both of these? I'm prepared to code up a patch to fmgr.[ch] to provide them. I don't know what to call it either. In my test I used CallerContextFunctionCall2 - not sure if that's quite right, but should be close. The technique is somewhat similar to what we do in plperl.c where we fake up a function call for handling DO blocks. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 02/23/2017 04:41 PM, Tom Lane wrote: >> The reason this is kind of scary is that it's just blithely assuming >> that the function won't look at the *other* fields of the FmgrInfo. >> If it did, it would likely get very confused, since those fields >> would be describing the GIN support function, not the function we're >> calling. >> >> We could alternatively have this trampoline function set up a fresh, local >> FmgrInfo struct that it zeroes except for copying fn_extra and fn_mcxt >> from the caller's struct, and then it copies fn_extra back again on the >> way out. That's a few more cycles but it would be safer, I think; if the >> function tried to look at the other fields such as fn_oid it would see >> obviously bogus data. > Do we want one or both of these? I'm prepared to code up a patch to > fmgr.[ch] to provide them. On reflection I'm not sure that the double-copy approach is all that much safer than just passing down the caller's flinfo pointer. Most of the time it would be better, but suppose that the callee updates fn_extra and then throws elog(ERROR) --- the outcome would be different, probably creating a leak in fn_mcxt. Maybe this would still be okay, because perhaps that FmgrInfo is never used again, but I don't think we can assume that for the case at hand. At this point I'd be inclined to just document that the called function should only use fn_extra/fn_mcxt. > I don't know what to call it either. In my test I used > CallerContextFunctionCall2 - not sure if that's quite right, but should > be close. CallerInfo? CallerFInfo? Or we could spell out CallerFmgrInfo but that seems a bit verbose. regards, tom lane
On 02/24/2017 11:02 AM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> On 02/23/2017 04:41 PM, Tom Lane wrote: >>> The reason this is kind of scary is that it's just blithely assuming >>> that the function won't look at the *other* fields of the FmgrInfo. >>> If it did, it would likely get very confused, since those fields >>> would be describing the GIN support function, not the function we're >>> calling. >>> >>> We could alternatively have this trampoline function set up a fresh, local >>> FmgrInfo struct that it zeroes except for copying fn_extra and fn_mcxt >>> from the caller's struct, and then it copies fn_extra back again on the >>> way out. That's a few more cycles but it would be safer, I think; if the >>> function tried to look at the other fields such as fn_oid it would see >>> obviously bogus data. >> Do we want one or both of these? I'm prepared to code up a patch to >> fmgr.[ch] to provide them. > On reflection I'm not sure that the double-copy approach is all that much > safer than just passing down the caller's flinfo pointer. Most of the > time it would be better, but suppose that the callee updates fn_extra > and then throws elog(ERROR) --- the outcome would be different, probably > creating a leak in fn_mcxt. Maybe this would still be okay, because > perhaps that FmgrInfo is never used again, but I don't think we can assume > that for the case at hand. > > At this point I'd be inclined to just document that the called function > should only use fn_extra/fn_mcxt. fair enough. Will do it that way. > >> I don't know what to call it either. In my test I used >> CallerContextFunctionCall2 - not sure if that's quite right, but should >> be close. > CallerInfo? CallerFInfo? Or we could spell out CallerFmgrInfo but > that seems a bit verbose. > > I'll go with CallerFInfoFunctionCall2 etc. In the btree_gist system the calls to the routines like enum_cmp are buried about three levels deep. I'm thinking I'll just pass the flinfo down the stack and just call these routines at the bottom level. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 02/24/2017 02:55 PM, Andrew Dunstan wrote: > > On 02/24/2017 11:02 AM, Tom Lane wrote: >>> I don't know what to call it either. In my test I used >>> CallerContextFunctionCall2 - not sure if that's quite right, but should >>> be close. >> CallerInfo? CallerFInfo? Or we could spell out CallerFmgrInfo but >> that seems a bit verbose. >> >> > I'll go with CallerFInfoFunctionCall2 etc. > > In the btree_gist system the calls to the routines like enum_cmp are > buried about three levels deep. I'm thinking I'll just pass the flinfo > down the stack and just call these routines at the bottom level. > > > It's occurred to me that we could reduce the code clutter in fmgr.c a bit by turning the DirectFunctionCall{n]Coll functions into macros calling these functions and passing NULL as the flinfo param. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 02/24/2017 05:34 PM, Andrew Dunstan wrote: > > On 02/24/2017 02:55 PM, Andrew Dunstan wrote: >> On 02/24/2017 11:02 AM, Tom Lane wrote: >>>> I don't know what to call it either. In my test I used >>>> CallerContextFunctionCall2 - not sure if that's quite right, but should >>>> be close. >>> CallerInfo? CallerFInfo? Or we could spell out CallerFmgrInfo but >>> that seems a bit verbose. >>> >>> >> I'll go with CallerFInfoFunctionCall2 etc. >> >> In the btree_gist system the calls to the routines like enum_cmp are >> buried about three levels deep. I'm thinking I'll just pass the flinfo >> down the stack and just call these routines at the bottom level. >> >> >> > > It's occurred to me that we could reduce the code clutter in fmgr.c a > bit by turning the DirectFunctionCall{n]Coll functions into macros > calling these functions and passing NULL as the flinfo param. > here's a patch along those lines. If there's agreement on this I can finish up the work on btree_gist and btree_gin supoport for enums. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 02/24/2017 05:34 PM, Andrew Dunstan wrote: >> It's occurred to me that we could reduce the code clutter in fmgr.c a >> bit by turning the DirectFunctionCall{n]Coll functions into macros >> calling these functions and passing NULL as the flinfo param. > here's a patch along those lines. If there's agreement on this I can > finish up the work on btree_gist and btree_gin supoport for enums. I'm not really thrilled with doing it like that, for two reasons: 1. This makes it look like CallerFInfoFunctionCallN is the more basic, common interface, which it isn't and never will be. At best, that's confusing. And I don't like having only one documentation block covering both interfaces; that does nothing for clarity either. 2. By my count there are over 500 uses of DirectFunctionCallN in the backend. This will add an additional hidden parameter to each one, implying code bloat and some fractional speed cost. I think it'd be better to leave DirectFunctionCallN alone and just invent a small number of CallerFInfoFunctionCallN support functions (maybe N=1 and N=2 would be enough, at least for now). regards, tom lane
> The reason this is kind of scary is that it's just blithely assuming > that the function won't look at the *other* fields of the FmgrInfo. > If it did, it would likely get very confused, since those fields > would be describing the GIN support function, not the function we're > calling. I am sorry if it doesn't make sense, but wouldn't the whole thing be better, if we refactor enum.c to call only he internal functions with TypeCacheEntry just like the rangetypes?
On 02/25/2017 12:04 PM, Tom Lane wrote: > I think it'd be better to leave DirectFunctionCallN alone and just invent > a small number of CallerFInfoFunctionCallN support functions (maybe N=1 > and N=2 would be enough, at least for now). > > See attached. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > On 02/25/2017 12:04 PM, Tom Lane wrote: >> I think it'd be better to leave DirectFunctionCallN alone and just invent >> a small number of CallerFInfoFunctionCallN support functions (maybe N=1 >> and N=2 would be enough, at least for now). > See attached. Yeah, I like this better, except that instead of + * The callee should not look at anything except the fn_mcxt and fn_extra. + * Anything else is likely to be bogus. maybe + * It's recommended that the callee only use the fn_extra and fn_mcxt + * fields, as other fields will typically describe the calling function + * not the callee. Conversely, the calling function should not have + * used fn_extra, unless its use is known compatible with the callee's. regards, tom lane
On 02/25/2017 01:34 PM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> On 02/25/2017 12:04 PM, Tom Lane wrote: >>> I think it'd be better to leave DirectFunctionCallN alone and just invent >>> a small number of CallerFInfoFunctionCallN support functions (maybe N=1 >>> and N=2 would be enough, at least for now). >> See attached. > Yeah, I like this better, except that instead of > > + * The callee should not look at anything except the fn_mcxt and fn_extra. > + * Anything else is likely to be bogus. > > maybe > > + * It's recommended that the callee only use the fn_extra and fn_mcxt > + * fields, as other fields will typically describe the calling function > + * not the callee. Conversely, the calling function should not have > + * used fn_extra, unless its use is known compatible with the callee's. > > OK, Works for me. Thanks. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 02/25/2017 01:39 PM, Andrew Dunstan wrote: > > On 02/25/2017 01:34 PM, Tom Lane wrote: >> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >>> On 02/25/2017 12:04 PM, Tom Lane wrote: >>>> I think it'd be better to leave DirectFunctionCallN alone and just invent >>>> a small number of CallerFInfoFunctionCallN support functions (maybe N=1 >>>> and N=2 would be enough, at least for now). >>> See attached. >> Yeah, I like this better, except that instead of >> >> + * The callee should not look at anything except the fn_mcxt and fn_extra. >> + * Anything else is likely to be bogus. >> >> maybe >> >> + * It's recommended that the callee only use the fn_extra and fn_mcxt >> + * fields, as other fields will typically describe the calling function >> + * not the callee. Conversely, the calling function should not have >> + * used fn_extra, unless its use is known compatible with the callee's. >> >> > > OK, Works for me. Thanks. > This works for the btree_gin case. However, there's a difficulty for btree_gist - in the puicksplit routine the cmp function is passed to qsort() so there is no chance to pass it an flinfo to set up the call to the real comparison routine. Implementing a custom sort routine to work around the problem seems a bridge too far. I can;t think of an alternative off hand. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > This works for the btree_gin case. However, there's a difficulty for > btree_gist - in the puicksplit routine the cmp function is passed to > qsort() so there is no chance to pass it an flinfo to set up the call to > the real comparison routine. Implementing a custom sort routine to work > around the problem seems a bridge too far. I can;t think of an > alternative off hand. We already have qsort_arg ... can't you change it to use that? regards, tom lane
On 02/26/2017 03:26 PM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> This works for the btree_gin case. However, there's a difficulty for >> btree_gist - in the puicksplit routine the cmp function is passed to >> qsort() so there is no chance to pass it an flinfo to set up the call to >> the real comparison routine. Implementing a custom sort routine to work >> around the problem seems a bridge too far. I can;t think of an >> alternative off hand. > We already have qsort_arg ... can't you change it to use that? > > Yes, wasn't aware of that, that looks like exactly what I need. thanks. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 02/26/2017 04:09 PM, Andrew Dunstan wrote: > > On 02/26/2017 03:26 PM, Tom Lane wrote: >> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >>> This works for the btree_gin case. However, there's a difficulty for >>> btree_gist - in the puicksplit routine the cmp function is passed to >>> qsort() so there is no chance to pass it an flinfo to set up the call to >>> the real comparison routine. Implementing a custom sort routine to work >>> around the problem seems a bridge too far. I can;t think of an >>> alternative off hand. >> We already have qsort_arg ... can't you change it to use that? >> >> > > Yes, wasn't aware of that, that looks like exactly what I need. thanks. > > OK, here's the whole series of patches. Patch 1 adds the CallerFInfoFunctionCall{1,2} functions. Patch 2 adds btree_gist support for their use for non-varlena types Patch 3 does the same for varlena types (Not required for patch 4, but better to be consistent, I think.) Patch 4 adds enum support to btree_gist Patch 5 adds enum support to btree_gin cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
- 0004-Add-btree_gist-support-for-enum-types.patch
- 0003-CallerFInfoFunctionCall-support-for-btree_gist-varlena-types.patch
- 0002-CallerFInfoFunctionCall-support-for-btree_gist-non-varlena-types.patch
- 0001-Provide-a-direct-function-call-mechanism-that-uses-t.patch
- 0005-Add-btree_gin-support-for-enum-types.patch
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: > OK, here's the whole series of patches. I've not tested it at all, but this looks generally sane in a quick once-over. A minor quibble is that in 0003, you weren't terribly consistent about argument order --- in some places you have the FmgrInfo argument added before the collation argument, and in some places after. I'd suggest trying to make the argument orders consistent with the fmgr.c support functions. (I'm generally -1 on blindly adding stuff at the end.) regards, tom lane
On 02/27/2017 04:41 PM, Tom Lane wrote: > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes: >> OK, here's the whole series of patches. > I've not tested it at all, but this looks generally sane in a quick > once-over. > > A minor quibble is that in 0003, you weren't terribly consistent about > argument order --- in some places you have the FmgrInfo argument added > before the collation argument, and in some places after. I'd suggest > trying to make the argument orders consistent with the fmgr.c support > functions. (I'm generally -1 on blindly adding stuff at the end.) > > Thanks for reviewing. I don't mind changing it, but if I do I'm inclined to make it as consistent as possible with the 0002 patch, which did put all the FmgrInfo arguments at the end - there's not any other more obvious place for them in that case, as there is no collation argument. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
28.02.2017 00:22, Andrew Dunstan: > OK, here's the whole series of patches. > > Patch 1 adds the CallerFInfoFunctionCall{1,2} functions. > > Patch 2 adds btree_gist support for their use for non-varlena types > > Patch 3 does the same for varlena types (Not required for patch 4, but > better to be consistent, I think.) > > Patch 4 adds enum support to btree_gist > > Patch 5 adds enum support to btree_gin > > cheers > > andrew Thank you for implementing the feature! These patches seem to have some merge conflicts with recent commit: commit c7a9fa399d557c6366222e90b35db31e45d25678 Author: Stephen Frost <sfrost@snowman.net> Date: Wed Mar 15 11:16:25 2017 -0400 Add support for EUI-64 MAC addresses as macaddr8 And also, it's needed to update patch 0002 to consider macaddr8, I attached the diff. Complete patch (including 0002_addition fix) rebased to the current master is attached as well. It works as expected, code itself looks clear and well documented. The only issue I've found is a make check failure in contrib/btree_gin subdirectory: test numeric ... ok test enum ... /bin/sh: 1: cannot open /home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/sql/enum.sql: No such file diff: /home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out: No such file or directory diff command failed with status 512: diff "/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/expected/enum.out" "/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out" > "/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out.diff" Please, add regression test for btree_gin, and this patch can be considered "Ready for committer". -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers