Re: Huge memory consumption on partitioned table with FKs - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Huge memory consumption on partitioned table with FKs
Date
Msg-id CA+HiwqECx2BVSw3+9k9pkMvNH7ymW3gEbPgw956TtSCWt5F2mA@mail.gmail.com
Whole thread Raw
In response to Re: Huge memory consumption on partitioned table with FKs  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Huge memory consumption on partitioned table with FKs
List pgsql-hackers
Hello,

On Thu, Dec 3, 2020 at 10:15 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> At Wed, 2 Dec 2020 22:02:50 +0900, Amit Langote <amitlangote09@gmail.com> wrote in
> > Hello,
> >
> > On Tue, Dec 1, 2020 at 9:40 AM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > >
> > > At Mon, 30 Nov 2020 21:03:45 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
> > > > On 2020-Nov-26, Kyotaro Horiguchi wrote:
> > > >
> > > > > This shares RI_ConstraintInfo cache by constraints that shares the
> > > > > same parent constraints. But you forgot that the cache contains some
> > > > > members that can differ among partitions.
> > > > >
> > > > > Consider the case of attaching a partition that have experienced a
> > > > > column deletion.
> > > >
> > > > I think this can be solved easily in the patch, by having
> > > > ri_BuildQueryKey() compare the parent's fk_attnums to the parent; if
> > > > they are equal then use the parent's constaint_id, otherwise use the
> > > > child constraint.  That way, the cache entry is reused in the common
> > > > case where they are identical.
> > >
> > > *I* think it's the direction.  After an off-list discussion, we
> > >  confirmed that even in that case the patch works as is because
> > >  fk_attnum (or contuple.conkey) always stores key attnums compatible
> > >  to the topmost parent when conparent has a valid value (assuming the
> > >  current usage of fk_attnum), but I still feel uneasy to rely on that
> > >  unclear behavior.
> >
> > Hmm, I don't see what the problem is here, because it's not
> > RI_ConstraintInfo that is being shared among partitions of the same
> > parent, but the RI query (and the SPIPlanPtr for it) that is issued
> > through SPI.  The query (and the plan) turns out to be the same no
> > matter what partition's RI_ConstraintInfo is first used to generate
> > it.  What am I missing?
>
> I don't think you're missing something. As I (tried to..) mentioned in
> another branch of this thread, you're right. On the otherhand
> fk_attnums is actually used to generate the query, thus *issue*
> potentially affects the query.  Although that might be paranoic, that
> possibility is eliminated by checking the congruency(?) of fk_attnums
> (or other members).  We might even be able to share a riinfo among
> such children.

Just to be sure I've not misunderstood you, let me mention why I think
the queries used by different partitions all turn out to be the same
despite attribute number differences among partitions.  The places
that uses fk_attnums when generating a query are these among others:

Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
...
Oid fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
...
            quoteOneName(attname,
                         RIAttName(fk_rel, riinfo->fk_attnums[i]));

For the queries on the referencing side ("check" side),
type/collation/attribute name determined using the above are going to
be the same for all partitions in a given tree irrespective of the
attribute number, because they're logically the same column.  On the
referenced side ("restrict", "cascade", "set" side), as you already
mentioned, fk_attnums refers to the top parent table of the
referencing side, so no possibility of they being different in the
various referenced partitions' RI_ConstraintInfos.

On the topic of how we'd be able to share even the RI_ConstraintInfos
among partitions, that would indeed look a bit more elaborate than the
patch we have right now.

> > BTW, one problem with Kuroda-san's patch is that it's using
> > conparentid as the shared key, which can still lead to multiple
> > queries being generated when using multi-level partitioning, because
> > there would be many (intermediate) parent constraints in that case.
> > We should really be using the "root" constraint OID as the key,
> > because there would only be one root from which all constraints in a
> > given partition hierarchy would've originated.  Actually, I had
> > written a patch a few months back to do exactly that, a polished
> > version of which I've attached with this email.  Please take a look.
>
> I don't like that the function self-recurses but
> ri_GetParentConstrOid() actually climbs up to the root constraint, so
> the patch covers the multilevel partitioning cases.

Sorry, I got too easily distracted by the name Kuroda-san chose for
the field in RI_ConstraintInfo and the function.

> About your patch, it calculates the root constrid at the time an
> riinfo is created, but when the root-partition is further attached to
> another partitioned-table after the riinfo creation,
> constraint_root_id gets stale.  Of course that dones't matter
> practically, though.

Maybe we could also store the hash value of the root constraint OID as
rootHashValue and check for that one too in
InvalidateConstraintCacheCallBack().  That would take care of this
unless I'm missing something.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "osumi.takamichi@fujitsu.com"
Date:
Subject: RE: Disable WAL logging to speed up data loading
Next
From: "Hou, Zhijie"
Date:
Subject: RE: A new function to wait for the backend exit after termination