Re: [HACKERS] UPDATE of partition key - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] UPDATE of partition key
Date
Msg-id CA+TgmobVbxfkxEpZG=Zxumfg_b3q7PECOLBoHk2g4KtauYVx5Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] UPDATE of partition key  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses Re: [HACKERS] UPDATE of partition key
List pgsql-hackers
On Thu, Jan 11, 2018 at 6:07 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> In the first paragraph of my explanation, I was explaining why two
> Transition capture states does not look like a good idea to me :

Oh, sorry.  I didn't read what you wrote carefully enough, I guess.

I see your points.  I think that there is probably a general need for
some refactoring here.  AfterTriggerSaveEvent() got significantly more
complicated and harder to understand with the arrival of transition
tables, and this patch is adding more complexity still.  It's also
adding complexity in other places to make ExecInsert() and
ExecDelete() usable for the semi-internal DELETE/INSERT operations
being produced when we split a partition key update into a DELETE and
INSERT pair.  It would be awfully nice to have some better way to
separate out each of the different things we might or might not want
to do depending on the situation: capture old tuple, capture new
tuple, fire before triggers, fire after triggers, count processed
rows, set command tag, perform actual heap operation, update indexes,
etc.  However, I don't have a specific idea how to do it better, so
maybe we should just get this committed for now and perhaps, with more
eyes on the code, someone will have a good idea.

> Slight correction; it was suggested by Amit Langote; not by David.

Oh, OK, sorry.

> So there are two independent optimizations we are talking about :
>
> 1. Create the map only when needed.
> 2. In case of UPDATE, for partitions that take part in update scans,
> there should be a single map; there should not be two separate maps,
> one for accessing per-subplan and the other for accessing per-leaf.

These optimizations aren't completely independent.   Optimization #2
can be implemented in several different ways.  The way you've chosen
to do it is to index the same array in two different ways depending on
whether per-leaf indexing is not needed, which I think is
unacceptable.  Another approach, which I proposed upthread, is to
always built the per-leaf mapping, but you pointed out that this could
involve doing a lot of unnecessary work in the case where most leaves
were pruned.  However, if you also implement #1, then that problem
goes away.  In other words, depending on the design you choose for #2,
you may or may not need to also implement optimization #1 to get good
performance.

To put that another way, I think Amit's idea of keeping a
subplan-offsets array is a pretty good one.  From your comments, you
do too.  But if we want to keep that, then we need a way to avoid the
expense of populating it for leaves that got pruned, except when we
are doing update row movement.  Otherwise, I don't see much choice but
to jettison the subplan-offsets array and just maintain two separate
arrays of mappings.

> Regarding the array names ...
>
> Noting all this, I feel we can go with names according to the
> structure of maps. Something like : mt_perleaf_tupconv_maps, and
> mt_persubplan_tupconv_maps. Other suggestions welcome.

I'd probably do mt_per_leaf_tupconv_maps, since inserting an
underscore between some but not all words seems strange.  But OK
otherwise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] [PATCH] Generic type subscripting