Re: [HACKERS] Valgrind-detected bug in partitioning code - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Valgrind-detected bug in partitioning code
Date
Msg-id CA+Tgmobs5b_E8d4oAUOuTwf=K13Mf+Jn=zNg08Hsi-Np6vwhOQ@mail.gmail.com
Whole thread Raw
In response to [HACKERS] Valgrind-detected bug in partitioning code  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Valgrind-detected bug in partitioning code  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Jan 20, 2017 at 8:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> skink has been unhappy since commit d26fa4f went in, but I think
> that just exposed a pre-existing bug.  Running valgrind here
> duplicates the failure:
>
> ==00:00:02:01.653 16626== Conditional jump or move depends on uninitialised value(s)
> ==00:00:02:01.653 16626==    at 0x4BDF6B: btint4cmp (nbtcompare.c:97)
> ==00:00:02:01.653 16626==    by 0x81D6BE: FunctionCall2Coll (fmgr.c:1318)
> ==00:00:02:01.653 16626==    by 0x52D584: partition_bounds_equal (partition.c:627)
> ==00:00:02:01.653 16626==    by 0x80CF8E: RelationClearRelation (relcache.c:1203)
> ==00:00:02:01.653 16626==    by 0x80E601: RelationCacheInvalidateEntry (relcache.c:2662)
> ==00:00:02:01.653 16626==    by 0x803DD6: LocalExecuteInvalidationMessage (inval.c:568)
> ==00:00:02:01.653 16626==    by 0x803F53: ProcessInvalidationMessages.clone.0 (inval.c:444)
> ==00:00:02:01.653 16626==    by 0x8040C8: CommandEndInvalidationMessages (inval.c:1056)
> ==00:00:02:01.653 16626==    by 0x80C719: RelationSetNewRelfilenode (relcache.c:3490)
> ==00:00:02:01.653 16626==    by 0x5CD50A: ExecuteTruncate (tablecmds.c:1393)
> ==00:00:02:01.653 16626==    by 0x721AC7: standard_ProcessUtility (utility.c:532)
> ==00:00:02:01.653 16626==    by 0x71D943: PortalRunUtility (pquery.c:1163)
>
> IOW, partition_bounds_equal() is testing uninitialized memory during
> a TRUNCATE on a partitioned table.

Hmm.  That's bad.  I kind of wonder how sane it is to think that we
can invoke SQL-callable functions during a relcache reload, because
couldn't we be processing an invalidation in the context of an aborted
transaction?  And I wonder why we really need or want to do that
anyway.  For purposes of equalPartitionDescs(), it seems like the
relevant test is datumIsEqual(), not the equality operator derived
from the partition opclass.

But I think the immediate problem here is some fuzzy thinking about
the handling of the values taken from b1->content and b2->content.
Those have to be checked before examining values from b1->datums
and/or b2->datums, and the latter should be inspected only if the
former are both identical and both RANGE_DATUM_FINITE.  I'll push a
fix for that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Jesper Pedersen
Date:
Subject: Re: [HACKERS] Microvacuum support for Hash Index
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] pg_hba_file_settings view patch