Thread: Expanded Objects and Order By
I'm starting to think this might not actually be my mistake, but rather a very narrow issue with the expanded object code.

So, I've added support for converting PostGIS in-memory objects into expanded outputs, and have overwritten the usual lwgeom_to_gserialized() function with one that creates an expanded object. I haven't done anything to actually handle expanded objects on input, but as I understand it, that's fine: everything is supposed to work as normal when handed expanded objects. And thus far, that has been true; almost all regression tests work fine.

But here's my narrow case. This works fine with the old lwgeom_to_gserialized() that uses standard flat varlena outputs, but does the following with expanded outputs:

  SELECT gid FROM test_table
  ORDER BY st_distance(geom, 'POINT(-305 998.5)'::geometry)
  LIMIT 5;

  ERROR:  could not find pathkey item to sort

Tracing through the debugger, I see the lwgeom_to_gserialized() function get hit once for the geometry literal, via parse_analyze. Shortly after that, the error shows up. If I change the query ever so slightly, adding a geometry output column:

  SELECT gid, geom FROM test_table
  ORDER BY st_distance(geom, 'POINT(-305 998.5)'::geometry)
  LIMIT 5;

it runs through to completion, five records returned. As long as the query includes a geometry column in the output list, it works. A function that uses a geometry but doesn't actually output it will still fail, with the same pathkey error. This will fail, for example:

  SELECT gid, ST_AsText(geom) FROM test_table
  ORDER BY st_distance(geom, 'POINT(-305 998.5)'::geometry)
  LIMIT 5;

So I'm wondering what I should debug next. My code is hardly being touched at all at this point, and I'm very confident it's memory clean, etc. (the flattener raises hell if the output is shorter or longer than the expected allocation). Any pointers as to what to look for, much appreciated.

P.
Paul Ramsey <pramsey@cleverelephant.ca> writes:
> So, I've added support for converting postgis in-memory objects into
> expanded outputs, and have overwritten the usual
> lwgeom_to_gserialized() function with one that creates an expanded
> object. I haven't done anything to actually handle expanded objects on
> input, but as I understand it, that's fine, everything is supposed to
> work as normal when handed expanded objects. And thus far, that has
> been true, almost all regression tests work fine.

Hmm ... you do need to be able to flatten them again.  In the given example, the parser is going to want to form a Const node whose Datum value is a geometry object, and that Const node needs to be copiable by copyObject(), which means datumCopy() has to work, and if you look at that it will exercise EOH_get_flat_size/EOH_flatten_into when the input routine originally produced an expanded object.

The error message is very strange; it's hard to see how toying with the internal representation of Consts could cause that.  I think the hypothesis of a memory clobber is stronger than you give it credit for, especially since you see the behavior changing for seemingly unrelated reasons.  Valgrind might be a useful tool here.

			regards, tom lane
I have a size/flatten callback setup (and they are very careful not to write outside their boundaries), so that's all OK. Since you're not seeing anything "aha" in the error pattern, I'll go back to the mats on memory... is there a good page on valgrinding PostgreSQL? I thought the memory manager papered over things so much that valgrind couldn't see what was going on under the covers.

Thanks!

P

> On Jan 18, 2016, at 8:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Paul Ramsey <pramsey@cleverelephant.ca> writes:
>> So, I've added support for converting postgis in-memory objects into
>> expanded outputs, and have overwritten the usual
>> lwgeom_to_gserialized() function with one that creates an expanded
>> object. I haven't done anything to actually handle expanded objects on
>> input, but as I understand it, that's fine, everything is supposed to
>> work as normal when handed expanded objects. And thus far, that has
>> been true, almost all regression tests work fine.
>
> Hmm ... you do need to be able to flatten them again.  In the given
> example, the parser is going to want to form a Const node whose Datum
> value is a geometry object, and that Const node needs to be copiable
> by copyObject(), which means datumCopy() has to work, and if you look
> at that it will exercise EOH_get_flat_size/EOH_flatten_into when the
> input routine originally produced an expanded object.
>
> The error message is very strange; it's hard to see how toying with the
> internal representation of Consts could cause that.  I think the
> hypothesis of a memory clobber is stronger than you give it credit for,
> especially since you see the behavior changing for seemingly unrelated
> reasons.  Valgrind might be a useful tool here.
>
> 			regards, tom lane
Paul Ramsey <pramsey@cleverelephant.ca> writes:
> Since you're not seeing anything "aha" in the error pattern, I'll go
> back to the mats on memory... is there a good page on valgrinding
> postgresql?

https://wiki.postgresql.org/wiki/Valgrind

> I thought the memory manager papered over things so much that valgrind
> couldn't see what was going on under the covers.

That used to be true, but it's a lot better now.

			regards, tom lane
Thank the Maker, it is reproducible: returning an expanded header in the _in function is not appreciated in a very narrow number of cases.

Edit arrayfuncs.c:array_in(), changing the return line thus:

  // PG_RETURN_ARRAYTYPE_P(retval);
  PG_RETURN_DATUM(expand_array(PointerGetDatum(retval),
                               CurrentMemoryContext, my_extra));

And here is a small test case that exercises it:

  CREATE TABLE orderby_expanded (
    id integer,
    a integer[]
  );

  INSERT INTO orderby_expanded (id, a) VALUES (1, ARRAY[1,2]);

  -- works
  SELECT id, a FROM orderby_expanded
  ORDER BY a = '{1,2}'::integer[];

  -- ERROR:  could not find pathkey item to sort
  SELECT id FROM orderby_expanded
  ORDER BY a = '{1,2}'::integer[];

  -- works
  SELECT id FROM orderby_expanded
  ORDER BY a = ARRAY[1,2];
Paul Ramsey <pramsey@cleverelephant.ca> writes:
> Thank the Maker, it is reproducible: returning an expanded header in
> the _in function is not appreciated in a very narrow number of cases.

Thanks for finding a test case!  I'll check into it shortly.

			regards, tom lane
I wrote:
> Paul Ramsey <pramsey@cleverelephant.ca> writes:
>> Thank the Maker, it is reproducible: returning an expanded header in
>> the _in function is not appreciated in a very narrow number of cases.

> Thanks for finding a test case!  I'll check into it shortly.

So the short answer is that the planner assumes, not unreasonably, that copying a parse node with copyObject() should produce a node that's still equal() to the original.  (The proximate cause of the "could not find pathkey item to sort" error is failure to match an ORDER BY expression to an equivalence-class item that was made from it earlier.)  In the case here, coerce_type() produces a Const node whose constvalue is pointing at the expanded array returned by array_in().  Nothing further happens to that node until the planner tries to copy it --- and copyObject will produce a flattened array value, cf datumCopy().  Even if copying made a new expanded object, that object would not be equal() to the original node according to datumIsEqual's very simplistic comparisons.

Now that I've seen this, I'm suspicious that we probably have a lot of other related bugs --- for example, short-header and non-short-header varlena values would also not be seen as equal().  What I'm thinking is that we had best apply pg_detoast_datum() to any varlena value being put into a Const node.  That will do nothing in the normal case where the value already is a simple standard-header varlena value, but will fix any cases where we might be trying to put a toasted, expanded, etc. value into a Const.  Generally it'd be a bad idea to have a toast reference in a Const anyhow; for example, there is code at the end of evaluate_expr() that does the equivalent of this when making a Const from the result of constant-simplifying an expression.  That got put in there as the result of hard experience; but now it seems we did not go deep enough.

An alternative idea would be to make datumIsEqual smarter; but given the other ways in which it's used, I doubt we want it to be trying to do detoasting.

			regards, tom lane
Paul Ramsey <pramsey@cleverelephant.ca> writes:
> Thank the Maker, it is reproducible: returning an expanded header in
> the _in function is not appreciated in a very narrow number of cases.

BTW, on further poking around: if you'd had RANDOMIZE_ALLOCATED_MEMORY enabled, returning an expanded object from an input function would have failed this test in stringTypeDatum():

#ifdef RANDOMIZE_ALLOCATED_MEMORY

	/*
	 * For pass-by-reference data types, repeat the conversion to see if the
	 * input function leaves any uninitialized bytes in the result.  We can
	 * only detect that reliably if RANDOMIZE_ALLOCATED_MEMORY is enabled, so
	 * we don't bother testing otherwise.  The reason we don't want any
	 * instability in the input function is that comparison of Const nodes
	 * relies on bytewise comparison of the datums, so if the input function
	 * leaves garbage then subexpressions that should be identical may not get
	 * recognized as such.  See pgsql-hackers discussion of 2008-04-04.
	 */
	if (string && !typform->typbyval)
	{
		Datum		result2;

		result2 = OidInputFunctionCall(typinput, string,
									   typioparam, atttypmod);
		if (!datumIsEqual(result, result2, typform->typbyval, typform->typlen))
			elog(WARNING, "type %s has unstable input conversion for \"%s\"",
				 NameStr(typform->typname), string);
	}
#endif

The pointer values in the two objects could not be equal, so datumIsEqual would certainly fail.  So one way of looking at this is that the case indeed should be considered unsupported.

However, what I'm realizing from this discussion is that we need to have a stronger convention on what we put into Const nodes.  Repeatable results from the input function are certainly necessary, but we *also* need to forbid short-header, compressed, or externally toasted values.  Otherwise we will have Consts that aren't very Const (if the external value goes away), or that fail to compare equal to other Consts they reasonably should compare equal to.
Seen in that light, applying pg_detoast_datum() to all varlena values that are going into Consts is sensible, and it'll take care of the expanded-object variant of the issue as well.

I've not coded the fix yet, but it looks like the thing to do is to put "if not-null and typlen == -1 then pg_detoast_datum()" logic into makeConst() and coerce_type(), which so far as I can find are the only places that build new Const nodes.  Also, the above-quoted repeatable-results check has to be applied *after* the detoast step, else it'll still be a hazard for expanded objects.  However, coerce_type() is the only caller of stringTypeDatum(), so what we can do is move the repeatable-results check over to coerce_type() and have it repeat pg_detoast_datum() as well as the input-function call proper.

			regards, tom lane