Thread: Rework manipulation and structure of attribute mappings

Rework manipulation and structure of attribute mappings

From

Michael Paquier

Date:

21 November 2019, 07:25:56

Hi all,

After working on dc816e58, I have noticed that what we are doing with
attribute mappings is not that good.  In a couple of code paths of the
rewriter, the executor, and more particularly ALTER TABLE, when
working on the creation of inherited relations or partitions an
attribute mapping gets used to make sure that the new cloned elements
(indexes, fk, etc.) have correct definitions linked correctly from the
parent to the child's attributes.

Sometimes things can go wrong, because the attribute array is just an
AttrNumber pointer and it is up to the caller building the map to
guess which length it has.  Existing callers do that fine, but this
can lead to errors as recent history has proved.

Attached is a patch to refactor all that which simply adds the
attribute mapping length directly with the attribute list.  The nice
effect of the refactoring is that now callers willing to get attribute
maps don't need to think about which length it should have, and this
allows to perform checks on the expected number of attributes in the
map particularly in the executor part.  A couple of structures also
have their logic simplified.

On top of that, I have spotted two fishy attribute mapping calculations
in addFkRecurseReferencing() when adding a foreign key for partitions
when there are dropped columns and in CloneFkReferencing().  The
mapping was using the number of attributes from the foreign key, which
can be lower than the mapping of the parent if there are dropped
columns in-between.  I am pretty sure that if some attributes of the
parent are dropped (aka mapping set to 0 in the middle of its array
then we could finish with an incorrect attribute mapping, and I
suspect that this could lead to failures similar to what was fixed in
dc816e58, but I have not spent much time yet into that part.

I'll add this patch to the next CF for review.  The patch compiles and
passes all regression tests.

Thanks,
--
Michael

Attachment

Re: Rework manipulation and structure of attribute mappings

From

Amit Langote

Date:

22 November 2019, 08:21:41

Hi Michael,

Thanks for working on this.  I guess this is a follow up to:
https://www.postgresql.org/message-id/20191102052001.GB1614%40paquier.xyz

On Thu, Nov 21, 2019 at 1:26 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> Hi all,
>
> After working on dc816e58, I have noticed that what we are doing with
> attribute mappings is not that good.  In a couple of code paths of the
> rewriter, the executor, and more particularly ALTER TABLE, when
> working on the creation of inherited relations or partitions an
> attribute mapping gets used to make sure that the new cloned elements
> (indexes, fk, etc.) have correct definitions linked correctly from the
> parent to the child's attributes.
>
> Sometimes things can go wrong, because the attribute array is just an
> AttrNumber pointer and it is up to the caller building the map to
> guess which length it has.  Existing callers do that fine, but this
> can lead to errors as recent history has proved.
>
> Attached is a patch to refactor all that which simply adds the
> attribute mapping length directly with the attribute list.  The nice
> effect of the refactoring is that now callers willing to get attribute
> maps don't need to think about which length it should have, and this
> allows to perform checks on the expected number of attributes in the
> map particularly in the executor part.  A couple of structures also
> have their logic simplified.

The refactoring to use AttrMap looks good, though attmap.c as a
separate module contains too little functionality (just palloc and
pfree) to be a separate file, IMHO.  If we are to build a separate
module, why not move a bit more functionality into it from
tupconvert.c.  How about move all the code that actually creates the
map to attmap.c?  The entry points would be all the
convert_tuples_by_name_map() and convert_tuples_by_name_map_if_req()
functions that return AttrMap, rather than simply make_attrmap(int
len) which can be a static routine.  Actually, we should also refactor
convert_tuples_by_position() to carve out the code that builds the
AttrMap into a separate function and move it to attrmap.c.

To be honest, "convert_tuples_" part in those names might start
sounding a bit outdated in the future, so we should really consider
inventing a new interface map_attributes(TupleDesc indesc, TupleDesc
outdesc), because most call sites that fetch the AttrMap directly
don't really use it for "converting tuples", but to convert
expressions or to map key arrays.

After all the movement, tupconvert.c will only retain the
functionality to build a TupleConversionMap (wrapping the AttrMap) and
to convert HeapTuples, that is, execute_attr_map_tuple() and
execute_attr_map_slot(), which makes sense.

Thoughts?

> On top of that, I have spotted two fishy attribute mapping calculations
> in addFkRecurseReferencing() when adding a foreign key for partitions
> when there are dropped columns and in CloneFkReferencing().  The
> mapping was using the number of attributes from the foreign key, which
> can be lower than the mapping of the parent if there are dropped
> columns in-between.  I am pretty sure that if some attributes of the
> parent are dropped (aka mapping set to 0 in the middle of its array
> then we could finish with an incorrect attribute mapping, and I
> suspect that this could lead to failures similar to what was fixed in
> dc816e58, but I have not spent much time yet into that part.

Actually, the patch can make addFkRecurseReferencing() crash, because
the fkattnum[] array doesn't really contain attmap->maplen elements:

-            for (int j = 0; j < numfks; j++)
-                mapped_fkattnum[j] = attmap[fkattnum[j] - 1];
+            for (int j = 0; j < attmap->maplen; j++)
+                mapped_fkattnum[j] = attmap->attnums[fkattnum[j] - 1];

You failed to notice that j is really used as index into fkattnum[],
not the map array returned by convert_tuples_by_name(). So, I think
the original coding is fine here.

Thanks,
Amit

Re: Rework manipulation and structure of attribute mappings

From

Michael Paquier

Date:

22 November 2019, 10:57:24

On Fri, Nov 22, 2019 at 02:21:41PM +0900, Amit Langote wrote:
> Thanks for working on this.  I guess this is a follow up to:
> https://www.postgresql.org/message-id/20191102052001.GB1614%40paquier.xyz

Exactly.  I got that in my mind for a couple of days, so I gave it a
shot and the result was kind of nice.  And here we are.

> On Thu, Nov 21, 2019 at 1:26 PM Michael Paquier <michael@paquier.xyz> wrote:
> The refactoring to use AttrMap looks good, though attmap.c as a
> separate module contains too little functionality (just palloc and
> pfree) to be a separate file, IMHO.  If we are to build a separate
> module, why not move a bit more functionality into it from
> tupconvert.c.  How about move all the code that actually creates the
> map to attmap.c?  The entry points would be all the
> convert_tuples_by_name_map() and convert_tuples_by_name_map_if_req()
> functions that return AttrMap, rather than simply make_attrmap(int
> len) which can be a static routine.

Yeah, the current part is a little bit shy about that.  Moving
convert_tuples_by_name_map() and the second one to attmap.c makes
sense.

> Actually, we should also refactor
> convert_tuples_by_position() to carve out the code that builds the
> AttrMap into a separate function and move it to attmap.c.

Not sure how to name that.  One logic uses a match based on the
attribute name, and the other uses the type.  So the one based on the
attribute name should be something like build_attrmap_by_name() and
the second attrmap_build_by_position()?  We could use a better
convention like AttrMapBuildByPosition for example.  Any suggestions
of names are welcome.  Please note that I still have a commit fest to
run and finish, so I'll unlikely come back to that before December.
Let's continue with the tuning of the function names though.

> To be honest, "convert_tuples_" part in those names might start
> sounding a bit outdated in the future, so we should really consider
> inventing a new interface map_attributes(TupleDesc indesc, TupleDesc
> outdesc), because most call sites that fetch the AttrMap directly
> don't really use it for "converting tuples", but to convert
> expressions or to map key arrays.
>
> After all the movement, tupconvert.c will only retain the
> functionality to build a TupleConversionMap (wrapping the AttrMap) and
> to convert HeapTuples, that is, execute_attr_map_tuple() and
> execute_attr_map_slot(), which makes sense.

Agreed.  Let's design that carefully.

> Actually, the patch can make addFkRecurseReferencing() crash, because
> the fkattnum[] array doesn't really contain attmap->maplen elements:
>
> -            for (int j = 0; j < numfks; j++)
> -                mapped_fkattnum[j] = attmap[fkattnum[j] - 1];
> +            for (int j = 0; j < attmap->maplen; j++)
> +                mapped_fkattnum[j] = attmap->attnums[fkattnum[j] - 1];
>
> You failed to notice that j is really used as index into fkattnum[],
> not the map array returned by convert_tuples_by_name(). So, I think
> the original coding is fine here.

Ouch, yes.  The regression tests did not complain on this one.  It
means that we could improve the coverage.  The second, though...  I
need to check it more closely.
--
Michael

Attachment

signature.asc

Re: Rework manipulation and structure of attribute mappings

From

Amit Langote

Date:

25 November 2019, 11:55:50

On Fri, Nov 22, 2019 at 4:57 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Nov 22, 2019 at 02:21:41PM +0900, Amit Langote wrote:
> > Actually, we should also refactor
> > convert_tuples_by_position() to carve out the code that builds the
> > AttrMap into a separate function and move it to attmap.c.
>
> Not sure how to name that.  One logic uses a match based on the
> attribute name, and the other uses the type.  So the one based on the
> attribute name should be something like build_attrmap_by_name() and
> the second attrmap_build_by_position()?  We could use a better
> convention like AttrMapBuildByPosition for example.  Any suggestions
> of names are welcome.

Actually, I was just suggesting that we create a new function
convert_tuples_by_position_map() and put the logic that compares the
two TupleDescs to create the AttrMap in it, just like
convert_tuples_by_name_map().  Now you could say that there would be
no point in having such a function, because no caller currently wants
to use such a map (that is, without the accompanying
TupleConversionMap), but maybe there will be in the future.  Though
irrespective of that consideration, I guess you'd agree to group
similar code in a single source file.

Regarding coming up with any new name, having a word in the name that
distinguishes the aspect of mapping by attribute name vs. type
(position) should suffice.  We can always do the renaming in a
separate patch.

>  Please note that I still have a commit fest to
> run and finish, so I'll unlikely come back to that before December.
> Let's continue with the tuning of the function names though.

As it's mainly just moving around code, I gave it a shot; patch
attached (applies on top of yours).  I haven't invented any new names
yet, but let's keep discussing that as you say.

Thanks,
Amit

Attachment

0002-Move-more-code-to-attmap.c.patch

Re: Rework manipulation and structure of attribute mappings

From

Michael Paquier

Date:

04 December 2019, 11:03:08

On Mon, Nov 25, 2019 at 05:55:50PM +0900, Amit Langote wrote:
> Actually, I was just suggesting that we create a new function
> convert_tuples_by_position_map() and put the logic that compares the
> two TupleDescs to create the AttrMap in it, just like
> convert_tuples_by_name_map().  Now you could say that there would be
> no point in having such a function, because no caller currently wants
> to use such a map (that is, without the accompanying
> TupleConversionMap), but maybe there will be in the future.  Though
> irrespective of that consideration, I guess you'd agree to group
> similar code in a single source file.

Hmm.  I would rather keep the attribute map generation and the tuple
conversion part, the latter depending on the former, into two
different files.  That's what I did in the v2 attached.

> As it's mainly just moving around code, I gave it a shot; patch
> attached (applies on top of yours).  I haven't invented any new names
> yet, but let's keep discussing that as you say.

I see.  That saved me some time, thanks.  It is not really intuitive
to name routines about tuple conversion from tupconvert.c to
attrmap.c, so I'd think about renaming those routines as well, like
build_attrmap_by_name/position instead. That's more consistent with
the rest I added.

Another thing is that we have duplicated code at the end of
build_attrmap_by_name_if_req and build_attrmap_by_position, which I
think would be better refactored as a static function part of
attmap.c.  This way the if_req() flavor gets much simpler.

I have also fixed the issue with the FK mapping in
addFkRecurseReferencing() you reported previously.

What do you think about that?  I would like to think that we are
getting at something rather committable here, though I feel that I
need to review more the comments around the new files and if we could
document better AttrMap and its properties.
--
Michael

Attachment

Re: Rework manipulation and structure of attribute mappings

From

Amit Langote

Date:

06 December 2019, 12:03:12

Thanks for the updated patch.

On Wed, Dec 4, 2019 at 5:03 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Nov 25, 2019 at 05:55:50PM +0900, Amit Langote wrote:
> > Actually, I was just suggesting that we create a new function
> > convert_tuples_by_position_map() and put the logic that compares the
> > two TupleDescs to create the AttrMap in it, just like
> > convert_tuples_by_name_map().  Now you could say that there would be
> > no point in having such a function, because no caller currently wants
> > to use such a map (that is, without the accompanying
> > TupleConversionMap), but maybe there will be in the future.  Though
> > irrespective of that consideration, I guess you'd agree to group
> > similar code in a single source file.
>
> Hmm.  I would rather keep the attribute map generation and the tuple
> conversion part, the latter depending on the former, into two
> different files.  That's what I did in the v2 attached.

Check.

> > As it's mainly just moving around code, I gave it a shot; patch
> > attached (applies on top of yours).  I haven't invented any new names
> > yet, but let's keep discussing that as you say.
>
> I see.  That saved me some time, thanks.  It is not really intuitive
> to name routines about tuple conversion from tupconvert.c to
> attrmap.c, so I'd think about renaming those routines as well, like
> build_attrmap_by_name/position instead. That's more consistent with
> the rest I added.

Sorry I don't understand this.  Do you mean we should rename the
routines left behind in tupconvert.c?  For example,
convert_tuples_by_name() doesn't really "convert" tuples, only builds
a map needed to do so.  Maybe build_tuple_conversion_map_by_name()
would be a more suitable name.

> Another thing is that we have duplicated code at the end of
> build_attrmap_by_name_if_req and build_attrmap_by_position, which I
> think would be better refactored as a static function part of
> attmap.c.  This way the if_req() flavor gets much simpler.

Neat.

> I have also fixed the issue with the FK mapping in
> addFkRecurseReferencing() you reported previously.

Check.

> What do you think about that?  I would like to think that we are
> getting at something rather committable here, though I feel that I
> need to review more the comments around the new files and if we could
> document better AttrMap and its properties.

Regarding that, comment on a comment added by the patch:

+ * Attribute mapping structure
+ *
+ * An attribute mapping tracks the relationship of a child relation and
+ * its parent for inheritance and partitions.  This is used mainly for
+ * cloned object creations (indexes, foreign keys, etc.) when creating
+ * an inherited child relation, and for runtime-execution attribute
+ * mapping.
+ *
+ * Dropped attributes are marked with 0 and the length of the map is set
+ * to be the number of attributes of the parent, which takes into account
+ * its dropped attributes.

Maybe we don't need to repeat here what AttrMap is used for (it's
already written in attmap.c), only write what it is and why it's
needed in the first place.  Maybe like this:

/*
 * Attribute mapping structure
 *
 * This maps attribute numbers between a pair of relations, designated 'input'
 * and 'output' (most typically inheritance parent and child relations), whose
 * common columns have different attribute numbers.  Such difference may arise
 * due to the columns being ordered differently in the two relations or the
 * two relations having dropped columns at different positions.
 *
 * 'maplen' is set to the number of attributes of the 'output' relation,
 * taking into account any of its dropped attributes, with the corresponding
 * elements of the 'attnums' array set to zero.
 */

Thanks,
Amit

Re: Rework manipulation and structure of attribute mappings

From

Michael Paquier

Date:

09 December 2019, 05:56:57

On Fri, Dec 06, 2019 at 06:03:12PM +0900, Amit Langote wrote:
> On Wed, Dec 4, 2019 at 5:03 PM Michael Paquier <michael@paquier.xyz> wrote:
>> I see.  That saved me some time, thanks.  It is not really intuitive
>> to name routines about tuple conversion from tupconvert.c to
>> attrmap.c, so I'd think about renaming those routines as well, like
>> build_attrmap_by_name/position instead. That's more consistent with
>> the rest I added.
>
> Sorry I don't understand this.  Do you mean we should rename the
> routines left behind in tupconvert.c?  For example,
> convert_tuples_by_name() doesn't really "convert" tuples, only builds
> a map needed to do so.  Maybe build_tuple_conversion_map_by_name()
> would be a more suitable name.

I had no plans to touch this area nor to rename this layer because
that was a bit out of the original scope of this patch which is to
remove the confusion and random bets with map lengths.  I see your
point though and actually a name like what you are suggesting reflects
better what the routine does in reality. :p

> Maybe we don't need to repeat here what AttrMap is used for (it's
> already written in attmap.c), only write what it is and why it's
> needed in the first place.  Maybe like this:
>
> /*
>  * Attribute mapping structure
>  *
>  * This maps attribute numbers between a pair of relations, designated 'input'
>  * and 'output' (most typically inheritance parent and child relations), whose
>  * common columns have different attribute numbers.  Such difference may arise
>  * due to the columns being ordered differently in the two relations or the
>  * two relations having dropped columns at different positions.
>  *
>  * 'maplen' is set to the number of attributes of the 'output' relation,
>  * taking into account any of its dropped attributes, with the corresponding
>  * elements of the 'attnums' array set to zero.
>  */

That sounds better, thanks.

While on it, I have also spent some time checking after the FK-related
points that I suspected as fishy at the beginning of the thread but I
have not been able to break it.  We also have coverage for problems
related to dropped columns in foreign_key.sql (grep for fdrop1), which
is more than enough.  There was actually one extra issue in the patch
as of CloneFkReferencing() when filling in mapped_conkey based on the
number of keys in the constraint.

So, a couple of hours after looking at the code I am finishing with
the updated and indented version attached.  What do you think?
--
Michael

Attachment

Re: Rework manipulation and structure of attribute mappings

From

Amit Langote

Date:

17 December 2019, 07:54:27

Hi Michael,

On Mon, Dec 9, 2019 at 11:57 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Dec 06, 2019 at 06:03:12PM +0900, Amit Langote wrote:
> > Sorry I don't understand this.  Do you mean we should rename the
> > routines left behind in tupconvert.c?  For example,
> > convert_tuples_by_name() doesn't really "convert" tuples, only builds
> > a map needed to do so.  Maybe build_tuple_conversion_map_by_name()
> > would be a more suitable name.
>
> I had no plans to touch this area nor to rename this layer because
> that was a bit out of the original scope of this patch which is to
> remove the confusion and random bets with map lengths.  I see your
> point though and actually a name like what you are suggesting reflects
> better what the routine does in reality. :p

Maybe another day. :)

> So, a couple of hours after looking at the code I am finishing with
> the updated and indented version attached.  What do you think?

Thanks for the updated patch.  I don't have any comments, except that
the text I suggested couple of weeks ago no longer reads clear:

+ * by DDL operating on inheritance and partition trees to convert fully
+ * transformed expression trees from parent rowtype to child rowtype or
+ * vice-versa.

Maybe:

...to adjust the Vars in fully transformed expression trees to bear
output relation's attribute numbers.

Thanks,
Amit

Re: Rework manipulation and structure of attribute mappings

From

Michael Paquier

Date:

18 December 2019, 10:26:44

On Tue, Dec 17, 2019 at 01:54:27PM +0900, Amit Langote wrote:
> Thanks for the updated patch.  I don't have any comments, except that
> the text I suggested couple of weeks ago no longer reads clear:

I have spent a couple of extra hours on the patch, and committed it.
There was one issue in logicalrelation.h which failed to compile
standalone.

> + * by DDL operating on inheritance and partition trees to convert fully
> + * transformed expression trees from parent rowtype to child rowtype or
> + * vice-versa.
>
> Maybe:
>
> ...to adjust the Vars in fully transformed expression trees to bear
> output relation's attribute numbers.

I have used something more generic at the end:
+ * mappings by comparing input and output TupleDescs.  Such mappings
+ * are typically used by DDL operating on inheritance and partition trees
+ * to do a conversion between rowtypes logically equivalent but with
+ * columns in a different order, taking into account dropped columns.
--
Michael

Attachment

signature.asc