Thread: [HACKERS] Valgrind-detected bug in partitioning code
skink has been unhappy since commit d26fa4f went in, but I think
that just exposed a pre-existing bug. Running valgrind here
duplicates the failure:

==00:00:02:01.653 16626== Conditional jump or move depends on uninitialised value(s)
==00:00:02:01.653 16626==    at 0x4BDF6B: btint4cmp (nbtcompare.c:97)
==00:00:02:01.653 16626==    by 0x81D6BE: FunctionCall2Coll (fmgr.c:1318)
==00:00:02:01.653 16626==    by 0x52D584: partition_bounds_equal (partition.c:627)
==00:00:02:01.653 16626==    by 0x80CF8E: RelationClearRelation (relcache.c:1203)
==00:00:02:01.653 16626==    by 0x80E601: RelationCacheInvalidateEntry (relcache.c:2662)
==00:00:02:01.653 16626==    by 0x803DD6: LocalExecuteInvalidationMessage (inval.c:568)
==00:00:02:01.653 16626==    by 0x803F53: ProcessInvalidationMessages.clone.0 (inval.c:444)
==00:00:02:01.653 16626==    by 0x8040C8: CommandEndInvalidationMessages (inval.c:1056)
==00:00:02:01.653 16626==    by 0x80C719: RelationSetNewRelfilenode (relcache.c:3490)
==00:00:02:01.653 16626==    by 0x5CD50A: ExecuteTruncate (tablecmds.c:1393)
==00:00:02:01.653 16626==    by 0x721AC7: standard_ProcessUtility (utility.c:532)
==00:00:02:01.653 16626==    by 0x71D943: PortalRunUtility (pquery.c:1163)

IOW, partition_bounds_equal() is testing uninitialized memory during
a TRUNCATE on a partitioned table.

regards, tom lane

On Fri, Jan 20, 2017 at 8:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> skink has been unhappy since commit d26fa4f went in, but I think
> that just exposed a pre-existing bug. Running valgrind here
> duplicates the failure:
>
> ==00:00:02:01.653 16626== Conditional jump or move depends on uninitialised value(s)
> ==00:00:02:01.653 16626==    at 0x4BDF6B: btint4cmp (nbtcompare.c:97)
> ==00:00:02:01.653 16626==    by 0x81D6BE: FunctionCall2Coll (fmgr.c:1318)
> ==00:00:02:01.653 16626==    by 0x52D584: partition_bounds_equal (partition.c:627)
> ==00:00:02:01.653 16626==    by 0x80CF8E: RelationClearRelation (relcache.c:1203)
> ==00:00:02:01.653 16626==    by 0x80E601: RelationCacheInvalidateEntry (relcache.c:2662)
> ==00:00:02:01.653 16626==    by 0x803DD6: LocalExecuteInvalidationMessage (inval.c:568)
> ==00:00:02:01.653 16626==    by 0x803F53: ProcessInvalidationMessages.clone.0 (inval.c:444)
> ==00:00:02:01.653 16626==    by 0x8040C8: CommandEndInvalidationMessages (inval.c:1056)
> ==00:00:02:01.653 16626==    by 0x80C719: RelationSetNewRelfilenode (relcache.c:3490)
> ==00:00:02:01.653 16626==    by 0x5CD50A: ExecuteTruncate (tablecmds.c:1393)
> ==00:00:02:01.653 16626==    by 0x721AC7: standard_ProcessUtility (utility.c:532)
> ==00:00:02:01.653 16626==    by 0x71D943: PortalRunUtility (pquery.c:1163)
>
> IOW, partition_bounds_equal() is testing uninitialized memory during
> a TRUNCATE on a partitioned table.

Hmm. That's bad. I kind of wonder how sane it is to think that we
can invoke SQL-callable functions during a relcache reload, because
couldn't we be processing an invalidation in the context of an aborted
transaction? And I wonder why we really need or want to do that
anyway. For purposes of equalPartitionDescs(), it seems like the
relevant test is datumIsEqual(), not the equality operator derived
from the partition opclass.

But I think the immediate problem here is some fuzzy thinking about
the handling of the values taken from b1->content and b2->content.
Those have to be checked before examining values from b1->datums
and/or b2->datums, and the latter should be inspected only if the
former are both identical and both RANGE_DATUM_FINITE. I'll push a
fix for that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

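[Editorial sketch] The ordering described above (look at the content flags first, and touch the datums only when both flags are finite) can be illustrated roughly as follows. This is not the committed fix. The Bound struct, bounds_equal(), and the two infinite enum names are simplified stand-ins assumed for illustration; only RANGE_DATUM_FINITE and the content/datums field names come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the per-column "content" flag mentioned in the thread. */
typedef enum
{
    RANGE_DATUM_FINITE,     /* name taken from the thread */
    RANGE_DATUM_NEG_INF,    /* assumed name, for illustration only */
    RANGE_DATUM_POS_INF     /* assumed name, for illustration only */
} RangeDatumContent;

/* Simplified bound: one content flag and one int value per key column. */
typedef struct
{
    int                      ncols;
    const RangeDatumContent *content;
    const int               *datums;  /* datums[j] is meaningful only if
                                       * content[j] is RANGE_DATUM_FINITE */
} Bound;

/*
 * Compare two bounds column by column.  The content flags are checked
 * first; a datum is read only when both flags are FINITE, so a slot that
 * was never filled in (behind an infinite bound) is never examined.
 */
bool
bounds_equal(const Bound *b1, const Bound *b2)
{
    assert(b1 != NULL && b2 != NULL);

    if (b1->ncols != b2->ncols)
        return false;

    for (int j = 0; j < b1->ncols; j++)
    {
        if (b1->content[j] != b2->content[j])
            return false;
        if (b1->content[j] != RANGE_DATUM_FINITE)
            continue;           /* both infinite in the same direction */
        if (b1->datums[j] != b2->datums[j])
            return false;
    }
    return true;
}
```

With this ordering, two bounds whose second column is infinite in both compare equal regardless of whatever happens to sit in the corresponding datum slots.
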
Robert Haas <robertmhaas@gmail.com> writes:
> Hmm. That's bad. I kind of wonder how sane it is to think that we
> can invoke SQL-callable functions during a relcache reload, because
> couldn't we be processing an invalidation in the context of an aborted
> transaction?

You're doing WHAT?

regards, tom lane

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Hmm. That's bad. I kind of wonder how sane it is to think that we
> > can invoke SQL-callable functions during a relcache reload, because
> > couldn't we be processing an invalidation in the context of an aborted
> > transaction?
>
> You're doing WHAT?

Uh. +1.

Thanks!

Stephen

Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Hmm. That's bad. I kind of wonder how sane it is to think that we
>>> can invoke SQL-callable functions during a relcache reload,

>> You're doing WHAT?

> Uh. +1.

Now that I've calmed down a bit: the right way to do this sort of thing is
simply to flush the invalidated data during reload, and recompute it when
it is next requested, which necessarily will be inside a valid
transaction. Compare e.g. the handling of the lists of a relation's
indexes.

regards, tom lane

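[Editorial sketch] The "flush on invalidation, recompute on next request" pattern described above might look roughly like this. It is a toy model, not PostgreSQL's relcache code: CacheEntry, cache_invalidate(), cache_get_index_list(), and the fake three-element list are all hypothetical names and data chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy cache entry holding one piece of derived data. */
typedef struct
{
    int  *index_list;       /* derived data; NULL when not computed */
    int   nindexes;
    bool  list_valid;       /* false means "recompute on next request" */
    int   compute_calls;    /* counts rebuilds, for demonstration only */
} CacheEntry;

/*
 * Invalidation path: only discard the derived data.  Nothing here reads
 * catalogs or calls user-defined functions, so it is a bounded amount of
 * code that does not depend on transaction state.
 */
void
cache_invalidate(CacheEntry *e)
{
    assert(e != NULL);
    free(e->index_list);
    e->index_list = NULL;
    e->nindexes = 0;
    e->list_valid = false;
}

/*
 * Accessor path: rebuild on demand.  The caller is assumed to be inside a
 * valid transaction, which is what makes the expensive work safe here.
 */
const int *
cache_get_index_list(CacheEntry *e, int *n)
{
    assert(e != NULL && n != NULL);

    if (!e->list_valid)
    {
        /* Stand-in for a catalog scan: build a fixed three-element list. */
        e->nindexes = 3;
        e->index_list = malloc(sizeof(int) * (size_t) e->nindexes);
        if (e->index_list == NULL)
            abort();
        for (int i = 0; i < e->nindexes; i++)
            e->index_list[i] = 100 + i;
        e->list_valid = true;
        e->compute_calls++;
    }
    *n = e->nindexes;
    return e->index_list;
}
```

The design point is that all comparison or rebuilding work moves out of the invalidation path, at the cost that callers must not hold on to the returned pointer across an invalidation.
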
On Fri, Jan 20, 2017 at 4:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> Hmm. That's bad. I kind of wonder how sane it is to think that we
>>>> can invoke SQL-callable functions during a relcache reload,
>
>>> You're doing WHAT?
>
>> Uh. +1.
>
> Now that I've calmed down a bit: the right way to do this sort of thing is
> simply to flush the invalidated data during reload, and recompute it when
> it is next requested, which necessarily will be inside a valid
> transaction. Compare e.g. the handling of the lists of a relation's
> indexes.

The existing handling of partition descriptors is modeled on and very
similar to the existing handling for other types of objects:

    keep_tupdesc = equalTupleDescs(relation->rd_att, newrel->rd_att);
    keep_rules = equalRuleLocks(relation->rd_rules, newrel->rd_rules);
    keep_policies = equalRSDesc(relation->rd_rsdesc, newrel->rd_rsdesc);
    keep_partkey = (relation->rd_partkey != NULL);
    keep_partdesc = equalPartitionDescs(relation->rd_partkey,
                                        relation->rd_partdesc,
                                        newrel->rd_partdesc);

And I think the reason is the same too, namely, if we've got a pointer
into the partition descriptor in the relcache, we don't want that to
suddenly get swapped out and replaced with a pointer to an equivalent
data structure at a different address, because then our pointer will be
dangling. That seems fine as far as it goes.

The difference is that those other equalBLAH functions call a
carefully limited amount of code whereas, in looking over the
backtrace you sent, I realized that equalPartitionDescs is calling
partition_bounds_equal which does this:

    cmpval =
        DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j],
                                        key->partcollation[j],
                                        b1->datums[i][j],
                                        b2->datums[i][j]))

That's of course opening up a much bigger can of worms. But apart
from the fact that it's unsafe, I think it's also wrong, as I said
upthread. I think calling datumIsEqual() there should be better all
around. Do you think that's unsafe here?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

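[Editorial sketch] The pointer-stability argument above (keep the old substructure at its old address when the rebuilt one is equivalent) can be modeled in a few lines. This is a simplified illustration, not the relcache implementation: Desc, Entry, desc_make(), desc_equal(), desc_free(), and entry_refresh() are hypothetical names, and the equality test here is a plain bytewise comparison of integer bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy descriptor: a list of integer bounds. */
typedef struct
{
    int  nparts;
    int *bounds;
} Desc;

/* Toy cache entry that owns one descriptor. */
typedef struct
{
    Desc *desc;
} Entry;

/* Build a heap-allocated descriptor from a caller-supplied array. */
Desc *
desc_make(int nparts, const int *bounds)
{
    assert(nparts > 0 && bounds != NULL);
    Desc *d = malloc(sizeof(Desc));
    if (d == NULL)
        abort();
    d->nparts = nparts;
    d->bounds = malloc(sizeof(int) * (size_t) nparts);
    if (d->bounds == NULL)
        abort();
    memcpy(d->bounds, bounds, sizeof(int) * (size_t) nparts);
    return d;
}

void
desc_free(Desc *d)
{
    if (d != NULL)
    {
        free(d->bounds);
        free(d);
    }
}

/* Bounded, side-effect-free equality check. */
bool
desc_equal(const Desc *a, const Desc *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    if (a->nparts != b->nparts)
        return false;
    return memcmp(a->bounds, b->bounds,
                  sizeof(int) * (size_t) a->nparts) == 0;
}

/*
 * Refresh "entry" from a freshly built descriptor, taking ownership of
 * "fresh".  If the two are equivalent, the old descriptor stays at its old
 * address, so pointers held by callers remain valid.  Returns true if the
 * old descriptor was kept.
 */
bool
entry_refresh(Entry *entry, Desc *fresh)
{
    assert(entry != NULL);
    if (desc_equal(entry->desc, fresh))
    {
        desc_free(fresh);       /* equivalent: discard the new copy */
        return true;
    }
    desc_free(entry->desc);     /* changed: old pointers are now stale */
    entry->desc = fresh;
    return false;
}
```
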
Robert Haas <robertmhaas@gmail.com> writes:
> The difference is that those other equalBLAH functions call a
> carefully limited amount of code whereas, in looking over the
> backtrace you sent, I realized that equalPartitionDescs is calling
> partition_bounds_equal which does this:
>     cmpval =
>         DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j],
>                                         key->partcollation[j],
>                                         b1->datums[i][j],
>                                         b2->datums[i][j]))

Ah, gotcha.

> That's of course opening up a much bigger can of worms. But apart
> from the fact that it's unsafe, I think it's also wrong, as I said
> upthread. I think calling datumIsEqual() there should be better all
> around. Do you think that's unsafe here?

That sounds like a plausible solution. It is safe in the sense of
being a bounded amount of code. It would return "false" in various
interesting cases like toast pointer versus detoasted equivalent,
but I think that would be fine in this application.

It would probably be a good idea to add something to datumIsEqual's
comment to the effect that trying to make it smarter would be a bad idea,
because some callers rely on it being stupid.

regards, tom lane

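[Editorial sketch] What "stupid" means here can be shown with a deliberately simple bytewise comparison. This is a rough model of the idea only, not datumIsEqual() itself: the Datum typedef, bitwise_datum_equal(), and its explicit length parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a datum: either a small value or a pointer, in one word. */
typedef uintptr_t Datum;

/*
 * Bytewise equality.  Pass-by-value datums are compared as words;
 * pass-by-reference datums are compared by length and then by memcmp().
 * No type-specific code runs, so two different physical representations
 * of the same logical value (for example a compressed or out-of-line form
 * versus the expanded form) simply compare as unequal.
 */
bool
bitwise_datum_equal(Datum a, size_t len_a,
                    Datum b, size_t len_b,
                    bool by_value)
{
    if (by_value)
        return a == b;

    assert(a != 0 && b != 0);
    if (len_a != len_b)
        return false;
    return memcmp((const void *) a, (const void *) b, len_a) == 0;
}
```

A byte-for-byte copy of a by-reference datum compares equal under this scheme, while any difference in length or bytes compares unequal even if a type-aware operator would treat the two as the same value.
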
On 2017/01/21 9:01, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> The difference is that those other equalBLAH functions call a
>> carefully limited amount of code whereas, in looking over the
>> backtrace you sent, I realized that equalPartitionDescs is calling
>> partition_bounds_equal which does this:
>>     cmpval =
>>         DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j],
>>                                         key->partcollation[j],
>>                                         b1->datums[i][j],
>>                                         b2->datums[i][j]))
>
> Ah, gotcha.
>
>> That's of course opening up a much bigger can of worms. But apart
>> from the fact that it's unsafe, I think it's also wrong, as I said
>> upthread. I think calling datumIsEqual() there should be better all
>> around. Do you think that's unsafe here?
>
> That sounds like a plausible solution. It is safe in the sense of
> being a bounded amount of code. It would return "false" in various
> interesting cases like toast pointer versus detoasted equivalent,
> but I think that would be fine in this application.

Sorry for jumping in late. The attached patch replaces the call to the
partitioning-specific comparison function with a call to datumIsEqual().
I wonder if it is safe to assume that datumIsEqual() would return true for
a datum and a copy of it made using datumCopy(). The latter is used to copy
a single datum from a bound's Const node (what is stored in the catalog
for every bound).

> It would probably be a good idea to add something to datumIsEqual's
> comment to the effect that trying to make it smarter would be a bad idea,
> because some callers rely on it being stupid.

I assume "making datumIsEqual() smarter" here means making it account
for toasting of varlena datums, which is not a good idea because some of
its callers may be working in the context of an aborted transaction. I
tried to update the header comment along these lines, though please feel
free to rewrite it.

Thanks,
Amit

On Mon, Jan 23, 2017 at 12:45 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Sorry for jumping in late. Attached patch replaces the call to
> partitioning-specific comparison function by the call to datumIsEqual().
> I wonder if it is safe to assume that datumIsEqual() would return true for
> a datum and copy of it made using datumCopy(). The latter is used to copy
> a single datum from a bound's Const node (what is stored in the catalog
> for every bound).

Thanks, committed. I expanded the comment in partition.c because I
think you missed the other rationale for doing it this way, which is
that the partitioning operator might ignore some "unimportant" changes
(e.g. for numeric, the difference between 1.0 and 1.00) but for this
purpose it's better to update the relcache if there is *any* change.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
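
[Editorial sketch] The 1.0 versus 1.00 point can be made concrete with a toy decimal type. This is not PostgreSQL's numeric implementation: Decimal, decimal_op_equal(), and decimal_repr_equal() are hypothetical, and the rescaling loop ignores overflow for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy decimal: value = unscaled / 10^scale, so 1.0 is {10, 1}. */
typedef struct
{
    long long unscaled;
    int       scale;
} Decimal;

/*
 * Operator-style equality: bring both values to the same scale and then
 * compare.  Trailing zeros are ignored, so 1.0 and 1.00 are "equal".
 */
bool
decimal_op_equal(Decimal a, Decimal b)
{
    assert(a.scale >= 0 && b.scale >= 0);
    while (a.scale < b.scale)
    {
        a.unscaled *= 10;
        a.scale++;
    }
    while (b.scale < a.scale)
    {
        b.unscaled *= 10;
        b.scale++;
    }
    return a.unscaled == b.unscaled;
}

/*
 * Representation-style equality: every stored field must match.  Any
 * change to what is stored, even a cosmetic one, is detected.
 */
bool
decimal_repr_equal(Decimal a, Decimal b)
{
    return a.unscaled == b.unscaled && a.scale == b.scale;
}
```

A cache that decides whether to keep its old copy using the first function would retain a stale 1.0 after the stored bound changed to 1.00; using the second, it refreshes on any change, which is the behavior argued for above.
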