Thread: tablecmds.c/MergeAttributes() cleanup

tablecmds.c/MergeAttributes() cleanup

From
Peter Eisentraut
Date:
The MergeAttributes() and related code in and around tablecmds.c is huge 
and ancient, with many things bolted on over time, and difficult to deal 
with.  I took some time to make careful incremental updates and 
refactorings to make the code easier to follow, more compact, and more 
modern in appearance.  I also found several pieces of obsolete code 
along the way.  This resulted in the attached long patch series.  Each 
patch tries to make a single change and can be considered incrementally. 
  At the end, the code is shorter, better factored, and I hope easier to 
understand.  There shouldn't be any change in behavior.
Attachment

Re: tablecmds.c/MergeAttributes() cleanup

From
Alvaro Herrera
Date:
On 2023-Jun-28, Peter Eisentraut wrote:

> The MergeAttributes() and related code in and around tablecmds.c is huge and
> ancient, with many things bolted on over time, and difficult to deal with.
> I took some time to make careful incremental updates and refactorings to
> make the code easier to follow, more compact, and more modern in appearance.
> I also found several pieces of obsolete code along the way.  This resulted
> in the attached long patch series.  Each patch tries to make a single change
> and can be considered incrementally.  At the end, the code is shorter,
> better factored, and I hope easier to understand.  There shouldn't be any
> change in behavior.

I request to leave this alone for now. I have enough things to juggle
with in the NOTNULLs patch; this patchset looks like it will cause me
messy merge conflicts.  0004 for instance looks problematic, as does
0007 and 0016.

FWIW for the most part that patch is working and I intend to re-submit
shortly, but the relevant pg_upgrade code is really brittle, so it's
taken me much more than I expected to get it in good shape for all
cases.

Thanks

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: tablecmds.c/MergeAttributes() cleanup

From
Alvaro Herrera
Date:
On 2023-Jun-28, Peter Eisentraut wrote:

> The MergeAttributes() and related code in and around tablecmds.c is huge and
> ancient, with many things bolted on over time, and difficult to deal with.
> I took some time to make careful incremental updates and refactorings to
> make the code easier to follow, more compact, and more modern in appearance.
> I also found several pieces of obsolete code along the way.  This resulted
> in the attached long patch series.  Each patch tries to make a single change
> and can be considered incrementally.  At the end, the code is shorter,
> better factored, and I hope easier to understand.  There shouldn't be any
> change in behavior.

I spent a few minutes doing a test merge of this to my branch with NOT
NULL changes.  Here's a quick review.

> Subject: [PATCH 01/17] Remove obsolete comment about OID support

Obvious, trivial.  +1

> Subject: [PATCH 02/17] Remove ancient special case code for adding oid columns

LGTM; deletes dead code.

> Subject: [PATCH 03/17] Remove ancient special case code for dropping oid
>  columns

Hmm, interesting.  Yay for more dead code removal.  Didn't verify it.

> Subject: [PATCH 04/17] Make more use of makeColumnDef()

Good idea, +1.  Some lines (almost all makeColumnDef callsites) end up
too long.  This is the first patch that actually conflicts with the NOT
NULLs one, and the conflicts are easy to solve, so I won't complain if
you want to get it pushed soon.

> Subject: [PATCH 05/17] Clean up MergeAttributesIntoExisting()

I don't disagree with this in principle, but this one has more
conflicts than the previous ones.


> Subject: [PATCH 06/17] Clean up MergeCheckConstraint()

Looks a reasonable change as far as this patch goes.

However, reading it I noticed that CookedConstraint->inhcount is int
and is tested for wraparound, but pg_constraint.coninhcount is int16.
This is probably bogus already.  ColumnDef->inhcount is also int.  These
should be narrowed to match the catalog definitions.


> Subject: [PATCH 07/17] MergeAttributes() and related variable renaming

I think this makes sense, but there's a bunch of conflicts to NOT NULLs.
I guess we can come back to this one later.

> Subject: [PATCH 08/17] Improve some catalog documentation
> 
> Point out that typstorage and attstorage are never '\0', even for
> fixed-length types.  This is different from attcompression.  For this
> reason, some of the handling of these columns in tablecmds.c etc. is
> different.  (catalogs.sgml already contained this information in an
> indirect way.)

I don't understand why we must point out that they're never '\0'.  I
mean, if we're doing that, why not say that they can never be \xFF?
The valid values are already listed.  The other parts of this patch look
okay.

> Subject: [PATCH 09/17] Remove useless if condition
> 
> This is useless because these fields are not set anywhere before, so
> we can assign them unconditionally.  This also makes this more
> consistent with ATExecAddColumn().

Makes sense.

> Subject: [PATCH 10/17] Remove useless if condition
> 
> We can call GetAttributeCompression() with a NULL argument.  It
> handles that internally already.  This change makes all the callers of
> GetAttributeCompression() uniform.

I agree, +1.

> From 2eda6bc9897d0995a5112e2851c51daf0c35656e Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter@eisentraut.org>
> Date: Wed, 14 Jun 2023 17:51:31 +0200
> Subject: [PATCH 11/17] Refactor ATExecAddColumn() to use
>  BuildDescForRelation()
> 
> BuildDescForRelation() has all the knowledge for converting a
> ColumnDef into pg_attribute/tuple descriptor.  ATExecAddColumn() can
> make use of that, instead of duplicating all that logic.  We just pass
> a one-element list of ColumnDef and we get back exactly the data
> structure we need.  Note that we don't even need to touch
> BuildDescForRelation() to make this work.

Hmm, crazy.  I'm not sure I like this, because it seems much too clever.
The number of lines that are deleted is alluring, though.

Maybe it'd be better to create a separate routine that takes a single
ColumnDef and returns the Form_pg_attribute element for it; then use
that in both BuildDescForRelation and ATExecAddColumn.

> Subject: [PATCH 12/17] Push attidentity and attgenerated handling into
>  BuildDescForRelation()
> 
> Previously, this was handled by the callers separately, but it can be
> trivially moved into BuildDescForRelation() so that it is handled in a
> central place.

Looks reasonable.



I think the last few patches are the more innovative (interesting,
useful) of the bunch.  Let's get the first few out of the way.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: tablecmds.c/MergeAttributes() cleanup

From
Peter Eisentraut
Date:
On 11.07.23 20:17, Alvaro Herrera wrote:
> I spent a few minutes doing a test merge of this to my branch with NOT
> NULL changes.  Here's a quick review.
> 
>> Subject: [PATCH 01/17] Remove obsolete comment about OID support
> 
> Obvious, trivial.  +1
> 
>> Subject: [PATCH 02/17] Remove ancient special case code for adding oid columns
> 
> LGTM; deletes dead code.
> 
>> Subject: [PATCH 03/17] Remove ancient special case code for dropping oid
>>   columns
> 
> Hmm, interesting.  Yay for more dead code removal.  Didn't verify it.

I have committed these first three.  I'll leave it at that for now.

>> Subject: [PATCH 08/17] Improve some catalog documentation
>>
>> Point out that typstorage and attstorage are never '\0', even for
>> fixed-length types.  This is different from attcompression.  For this
>> reason, some of the handling of these columns in tablecmds.c etc. is
>> different.  (catalogs.sgml already contained this information in an
>> indirect way.)
> 
> I don't understand why we must point out that they're never '\0'.  I
> mean, if we're doing that, why not say that they can never be \xFF?
> The valid values are already listed.  The other parts of this patch look
> okay.

While working through the storage and compression handling, which look 
similar, I was momentarily puzzled by this.  While attcompression can be 
0 to mean, use default, this is not possible/allowed for attstorage, but 
it took looking around three corners to verify this.  It could be more 
explicit, I thought.




Re: tablecmds.c/MergeAttributes() cleanup

From
Peter Eisentraut
Date:
On 12.07.23 16:29, Peter Eisentraut wrote:
> On 11.07.23 20:17, Alvaro Herrera wrote:
>> I spent a few minutes doing a test merge of this to my branch with NOT
>> NULL changes.  Here's a quick review.
>>
>>> Subject: [PATCH 01/17] Remove obsolete comment about OID support
>>
>> Obvious, trivial.  +1
>>
>>> Subject: [PATCH 02/17] Remove ancient special case code for adding 
>>> oid columns
>>
>> LGTM; deletes dead code.
>>
>>> Subject: [PATCH 03/17] Remove ancient special case code for dropping oid
>>>   columns
>>
>> Hmm, interesting.  Yay for more dead code removal.  Didn't verify it.
> 
> I have committed these first three.  I'll leave it at that for now.

I have committed a few more patches from this series that were already 
agreed upon.  The remaining ones are rebased and reordered a bit, attached.

There was some doubt about the patch "Refactor ATExecAddColumn() to use 
BuildDescForRelation()" (now 0009), whether it's too clever to build a 
fake one-item tuple descriptor.  I am working on another patch, which I 
hope to post this week, that proposes to replace the use of tuple 
descriptors there with a List of something.  That would then allow maybe 
rewriting this in a less-clever way.  That patch incidentally also wants 
to move BuildDescForRelation from tupdesc.c to tablecmds.c (patch 0007 
here).

Attachment

Re: tablecmds.c/MergeAttributes() cleanup

From
Alvaro Herrera
Date:
On 2023-Aug-29, Peter Eisentraut wrote:


Regarding this hunk in 0002,

> @@ -3278,13 +3261,16 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
>   *
>   * constraints is a list of CookedConstraint structs for previous constraints.
>   *
> - * Returns true if merged (constraint is a duplicate), or false if it's
> - * got a so-far-unique name, or throws error if conflict.
> + * If the constraint is a duplicate, then the existing constraint's
> + * inheritance count is updated.  If the constraint doesn't match or conflict
> + * with an existing one, a new constraint is appended to the list.  If there
> + * is a conflict (same name but different expression), throw an error.

This wording confused me:

"If the constraint doesn't match or conflict with an existing one, a new
constraint is appended to the list."

I first read it as "doesn't match or conflicts with ..." (i.e., the
negation only applied to the first verb, not both) which would have been
surprising (== broken) behavior.

I think it's clearer if you say "doesn't match nor conflict", but I'm
not sure if this is grammatically correct.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: tablecmds.c/MergeAttributes() cleanup

From
Alvaro Herrera
Date:
On 2023-Aug-29, Peter Eisentraut wrote:

> From 471fda80c41fae835ecbe63ae8505526a37487a9 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut <peter@eisentraut.org>
> Date: Wed, 12 Jul 2023 16:12:35 +0200
> Subject: [PATCH v2 04/10] Add TupleDescGetDefault()
> 
> This unifies some repetitive code.
> 
> Note: I didn't push the "not found" error message into the new
> function, even though all existing callers would be able to make use
> of it.  Using the existing error handling as-is would probably require
> exposing the Relation type via tupdesc.h, which doesn't seem
> desirable.

Note that all errors here are elog(ERROR), not user-facing, so ISTM
it would be OK to change them to report the relation OID rather than the
name.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle."  (Larry Wall, Apocalypse 6)



Re: tablecmds.c/MergeAttributes() cleanup

From
Nathan Bossart
Date:
On Tue, Aug 29, 2023 at 10:43:39AM +0200, Peter Eisentraut wrote:
> I have committed a few more patches from this series that were already
> agreed upon.  The remaining ones are rebased and reordered a bit, attached.

My compiler is complaining about 1fa9241b:

../postgresql/src/backend/commands/sequence.c: In function ‘DefineSequence’:
../postgresql/src/backend/commands/sequence.c:196:21: error: ‘coldef’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
  196 |   stmt->tableElts = lappend(stmt->tableElts, coldef);
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This went away when I added a default case that ERROR'd or initialized
coldef to NULL.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: tablecmds.c/MergeAttributes() cleanup

From
Alvaro Herrera
Date:
On 2023-Aug-29, Nathan Bossart wrote:

> On Tue, Aug 29, 2023 at 10:43:39AM +0200, Peter Eisentraut wrote:
> > I have committed a few more patches from this series that were already
> > agreed upon.  The remaining ones are rebased and reordered a bit, attached.
> 
> My compiler is complaining about 1fa9241b:
> 
> ../postgresql/src/backend/commands/sequence.c: In function ‘DefineSequence’:
> ../postgresql/src/backend/commands/sequence.c:196:21: error: ‘coldef’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
>   196 |   stmt->tableElts = lappend(stmt->tableElts, coldef);
>       |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> This went away when I added a default case that ERROR'd or initialized
> coldef to NULL.

Makes sense.  However, maybe we should replace those ugly defines and
their hardcoded values in DefineSequence with a proper array with their
names and datatypes.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Los trabajadores menos efectivos son sistematicamente llevados al lugar
donde pueden hacer el menor daño posible: gerencia."  (El principio Dilbert)



Re: tablecmds.c/MergeAttributes() cleanup

From
Nathan Bossart
Date:
On Tue, Aug 29, 2023 at 08:44:02PM +0200, Alvaro Herrera wrote:
> On 2023-Aug-29, Nathan Bossart wrote:
>> My compiler is complaining about 1fa9241b:
>> 
>> ../postgresql/src/backend/commands/sequence.c: In function ‘DefineSequence’:
>> ../postgresql/src/backend/commands/sequence.c:196:21: error: ‘coldef’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
>>   196 |   stmt->tableElts = lappend(stmt->tableElts, coldef);
>>       |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 
>> This went away when I added a default case that ERROR'd or initialized
>> coldef to NULL.
> 
> Makes sense.  However, maybe we should replace those ugly defines and
> their hardcoded values in DefineSequence with a proper array with their
> names and datatypes.

That might be an improvement, but IIUC you'd still need to enumerate all of
the columns or data types to make sure you use the right get-Datum
function.  It doesn't help that last_value uses Int64GetDatumFast and
log_cnt uses Int64GetDatum.  I could be missing something, though.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: tablecmds.c/MergeAttributes() cleanup

From
Peter Eisentraut
Date:
On 29.08.23 19:45, Nathan Bossart wrote:
> On Tue, Aug 29, 2023 at 10:43:39AM +0200, Peter Eisentraut wrote:
>> I have committed a few more patches from this series that were already
>> agreed upon.  The remaining ones are rebased and reordered a bit, attached.
> 
> My compiler is complaining about 1fa9241b:
> 
> ../postgresql/src/backend/commands/sequence.c: In function ‘DefineSequence’:
> ../postgresql/src/backend/commands/sequence.c:196:21: error: ‘coldef’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
>    196 |   stmt->tableElts = lappend(stmt->tableElts, coldef);
>        |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> This went away when I added a default case that ERROR'd or initialized
> coldef to NULL.

fixed




Re: tablecmds.c/MergeAttributes() cleanup

From
Alvaro Herrera
Date:
On 2023-Aug-29, Nathan Bossart wrote:

> On Tue, Aug 29, 2023 at 08:44:02PM +0200, Alvaro Herrera wrote:

> > Makes sense.  However, maybe we should replace those ugly defines and
> > their hardcoded values in DefineSequence with a proper array with their
> > names and datatypes.
> 
> That might be an improvement, but IIUC you'd still need to enumerate all of
> the columns or data types to make sure you use the right get-Datum
> function.  It doesn't help that last_value uses Int64GetDatumFast and
> log_cnt uses Int64GetDatum.  I could be missing something, though.

Well, for sure I meant to enumerate everything that was needed,
including the initializer for the value.  Like in the attached patch.

However, now that I've actually written it, I don't find it so pretty
anymore, but maybe that's just because I don't know how to write the
array assignment as a single statement instead of a separate statement
for each column.

But this should silence the warnings, anyway.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/

Attachment

Re: tablecmds.c/MergeAttributes() cleanup

From
Peter Eisentraut
Date:
On 29.08.23 13:20, Alvaro Herrera wrote:
> On 2023-Aug-29, Peter Eisentraut wrote:
>> @@ -3278,13 +3261,16 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
>>    *
>>    * constraints is a list of CookedConstraint structs for previous constraints.
>>    *
>> - * Returns true if merged (constraint is a duplicate), or false if it's
>> - * got a so-far-unique name, or throws error if conflict.
>> + * If the constraint is a duplicate, then the existing constraint's
>> + * inheritance count is updated.  If the constraint doesn't match or conflict
>> + * with an existing one, a new constraint is appended to the list.  If there
>> + * is a conflict (same name but different expression), throw an error.
> 
> This wording confused me:
> 
> "If the constraint doesn't match or conflict with an existing one, a new
> constraint is appended to the list."
> 
> I first read it as "doesn't match or conflicts with ..." (i.e., the
> negation only applied to the first verb, not both) which would have been
> surprising (== broken) behavior.
> 
> I think it's clearer if you say "doesn't match nor conflict", but I'm
> not sure if this is grammatically correct.

Here is an updated version of this patch set.  I resolved some conflicts 
and addressed this comment of yours.  I also dropped the one patch with 
some catalog header edits that people didn't seem to particularly like.

The patches that are now 0001 through 0004 were previously reviewed and 
just held for the not-null constraint patches, I think, so I'll commit 
them in a few days if there are no objections.

Patches 0005 through 0007 are also ready in my opinion, but they haven't 
really been reviewed, so this would be something for reviewers to focus 
on.  (0005 and 0007 are trivial, but they go to together with 0006.)

The remaining 0008 and 0009 were still under discussion and contemplation.
Attachment

Re: tablecmds.c/MergeAttributes() cleanup

From
Peter Eisentraut
Date:
On 19.09.23 15:11, Peter Eisentraut wrote:
> Here is an updated version of this patch set.  I resolved some conflicts 
> and addressed this comment of yours.  I also dropped the one patch with 
> some catalog header edits that people didn't seem to particularly like.
> 
> The patches that are now 0001 through 0004 were previously reviewed and 
> just held for the not-null constraint patches, I think, so I'll commit 
> them in a few days if there are no objections.
> 
> Patches 0005 through 0007 are also ready in my opinion, but they haven't 
> really been reviewed, so this would be something for reviewers to focus 
> on.  (0005 and 0007 are trivial, but they go to together with 0006.)
> 
> The remaining 0008 and 0009 were still under discussion and contemplation.

I have committed through 0007, and I'll now close this patch set as 
"Committed", and I will (probably) bring back the rest (especially 0008) 
as part of a different patch set soon.




Re: tablecmds.c/MergeAttributes() cleanup

From
Peter Eisentraut
Date:
On 05.10.23 17:49, Peter Eisentraut wrote:
> On 19.09.23 15:11, Peter Eisentraut wrote:
>> Here is an updated version of this patch set.  I resolved some 
>> conflicts and addressed this comment of yours.  I also dropped the one 
>> patch with some catalog header edits that people didn't seem to 
>> particularly like.
>>
>> The patches that are now 0001 through 0004 were previously reviewed 
>> and just held for the not-null constraint patches, I think, so I'll 
>> commit them in a few days if there are no objections.
>>
>> Patches 0005 through 0007 are also ready in my opinion, but they 
>> haven't really been reviewed, so this would be something for reviewers 
>> to focus on.  (0005 and 0007 are trivial, but they go to together with 
>> 0006.)
>>
>> The remaining 0008 and 0009 were still under discussion and 
>> contemplation.
> 
> I have committed through 0007, and I'll now close this patch set as 
> "Committed", and I will (probably) bring back the rest (especially 0008) 
> as part of a different patch set soon.

After playing with this for, well, 2 months, and considering various 
other approaches, I would like to bring back the remaining two patches 
in unchanged form.

Especially the (now) first patch "Refactor ATExecAddColumn() to use 
BuildDescForRelation()" would be very helpful for facilitating further 
refactoring in this area, because it avoids having two essentially 
duplicate pieces of code responsible for converting a ColumnDef node 
into internal form.

One of your (Álvaro's) comments about this earlier was

 > Hmm, crazy.  I'm not sure I like this, because it seems much too clever.
 > The number of lines that are deleted is alluring, though.
 >
 > Maybe it'd be better to create a separate routine that takes a single
 > ColumnDef and returns the Form_pg_attribute element for it; then use
 > that in both BuildDescForRelation and ATExecAddColumn.

which was also my thought at the beginning.  However, this wouldn't 
quite work that way, for several reasons.  For instance, 
BuildDescForRelation() also needs to keep track of the has_not_null 
property across all fields, and so if you split that function up, you 
would have to somehow make that an output argument and have the caller 
keep track of it.  Also, the output of BuildDescForRelation() in 
ATExecAddColumn() is passed into InsertPgAttributeTuples(), which 
requires a tuple descriptor anyway, so splitting this up into a 
per-attribute function would then require ATExecAddColumn() to 
re-assemble a tuple descriptor anyway, so this wouldn't save anything. 
Also note that ATExecAddColumn() could in theory be enhanced to add more 
than one column in one go, so having this code structure in place isn't 
inconsistent with that.

The main hackish thing, I suppose, is that we have to fix up the 
attribute number after returning from BuildDescForRelation().  I suppose 
we could address that by passing in a starting attribute number (or 
alternatively maximum existing attribute number) into 
BuildDescForRelation().  I think that would be okay; it would probably 
be about a wash in terms of code added versus saved.


The (now) second patch is also still of interest to me, but I have since 
noticed that I think [0] should be fixed first, but to make that fix 
simpler, I would like the first patch from here.

[0]: 
https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7%40eisentraut.org

Attachment

Re: tablecmds.c/MergeAttributes() cleanup

From
Alvaro Herrera
Date:
On 2023-Dec-06, Peter Eisentraut wrote:

> One of your (Álvaro's) comments about this earlier was
> 
> > Hmm, crazy.  I'm not sure I like this, because it seems much too clever.
> > The number of lines that are deleted is alluring, though.
> >
> > Maybe it'd be better to create a separate routine that takes a single
> > ColumnDef and returns the Form_pg_attribute element for it; then use
> > that in both BuildDescForRelation and ATExecAddColumn.
> 
> which was also my thought at the beginning.  However, this wouldn't quite
> work that way, for several reasons.  For instance, BuildDescForRelation()
> also needs to keep track of the has_not_null property across all fields, and
> so if you split that function up, you would have to somehow make that an
> output argument and have the caller keep track of it.  Also, the output of
> BuildDescForRelation() in ATExecAddColumn() is passed into
> InsertPgAttributeTuples(), which requires a tuple descriptor anyway, so
> splitting this up into a per-attribute function would then require
> ATExecAddColumn() to re-assemble a tuple descriptor anyway, so this wouldn't
> save anything.

Hmm.  Well, if this state of affairs is useful to you, then I withdraw
my objection, because with this patch we're not really adding any new
weirdness, just moving around already-existing weirdness.  So let's
press ahead with 0001.  (I didn't look at 0002 this time, since
apparently you'd like to process the other patch first and then come
back here.)


If you look closely at InsertPgAttributeTuples and accompanying code, it
all looks a bit archaic.  They seem to be treating TupleDesc as a
glorified array of Form_pg_attribute elements in a convenient packaging.
It's probably cleaner to change these APIs so that they deal with a
Form_pg_attribute array, and not TupleDesc anymore.  But we can hack on
that some other day.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)



Re: tablecmds.c/MergeAttributes() cleanup

From
Alvaro Herrera
Date:
On 2024-Jan-11, Alvaro Herrera wrote:

> If you look closely at InsertPgAttributeTuples and accompanying code, it
> all looks a bit archaic.  They seem to be treating TupleDesc as a
> glorified array of Form_pg_attribute elements in a convenient packaging.
> It's probably cleaner to change these APIs so that they deal with a
> Form_pg_attribute array, and not TupleDesc anymore.  But we can hack on
> that some other day.

In addition, it also occurs to me now that maybe it would make sense to
change the TupleDesc implementation to use a List of Form_pg_attribute
instead of an array, and do away with ->natts.  This would let us change
all "for ( ... natts ...)" loops into foreach_ptr() loops ...  there are
only five hundred of them or so --rolls eyes--.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"El sudor es la mejor cura para un pensamiento enfermo" (Bardia)



Re: tablecmds.c/MergeAttributes() cleanup

From
Robert Haas
Date:
On Fri, Jan 12, 2024 at 5:32 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2024-Jan-11, Alvaro Herrera wrote:
> > If you look closely at InsertPgAttributeTuples and accompanying code, it
> > all looks a bit archaic.  They seem to be treating TupleDesc as a
> > glorified array of Form_pg_attribute elements in a convenient packaging.
> > It's probably cleaner to change these APIs so that they deal with a
> > Form_pg_attribute array, and not TupleDesc anymore.  But we can hack on
> > that some other day.
>
> In addition, it also occurs to me now that maybe it would make sense to
> change the TupleDesc implementation to use a List of Form_pg_attribute
> instead of an array, and do away with ->natts.  This would let us change
> all "for ( ... natts ...)" loops into foreach_ptr() loops ...  there are
> only five hundred of them or so --rolls eyes--.

Open-coding stuff like this can easily work out to a loss, and I
personally think we're overly dependent on List. It's not a
particularly good abstraction, IMHO, and if we do a lot of work to
start using it everywhere because a list is really an array, then what
happens when somebody decides that a list really ought to be a
skip-list, or a hash table, or some other crazy thing?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: tablecmds.c/MergeAttributes() cleanup

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jan 12, 2024 at 5:32 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> In addition, it also occurs to me now that maybe it would make sense to
>> change the TupleDesc implementation to use a List of Form_pg_attribute
>> instead of an array, and do away with ->natts.  This would let us change
>> all "for ( ... natts ...)" loops into foreach_ptr() loops ...  there are
>> only five hundred of them or so --rolls eyes--.

> Open-coding stuff like this can easily work out to a loss, and I
> personally think we're overly dependent on List. It's not a
> particularly good abstraction, IMHO, and if we do a lot of work to
> start using it everywhere because a list is really an array, then what
> happens when somebody decides that a list really ought to be a
> skip-list, or a hash table, or some other crazy thing?

Without getting into opinions on whether List is a good abstraction,
I'm -1 on this idea.  It would be a large amount of code churn, with
attendant back-patching pain, and I just don't see that there is
much to be gained.

            regards, tom lane



Re: tablecmds.c/MergeAttributes() cleanup

From
Robert Haas
Date:
On Fri, Jan 12, 2024 at 10:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Open-coding stuff like this can easily work out to a loss, and I
> personally think we're overly dependent on List. It's not a
> particularly good abstraction, IMHO, and if we do a lot of work to
> start using it everywhere because a list is really an array, then what
> happens when somebody decides that a list really ought to be a
> skip-list, or a hash table, or some other crazy thing?

This paragraph was a bit garbled. I meant to say that open-coding can
be better than relying on a canned abstraction, but it came out the
other way around.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: tablecmds.c/MergeAttributes() cleanup

From
Peter Eisentraut
Date:
On 06.12.23 09:23, Peter Eisentraut wrote:
> The (now) second patch is also still of interest to me, but I have since 
> noticed that I think [0] should be fixed first, but to make that fix 
> simpler, I would like the first patch from here.
> 
> [0]: 
> https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7%40eisentraut.org

The remaining patch in this series needed a rebase and adjustment.

The above precondition still applies.

Attachment

Re: tablecmds.c/MergeAttributes() cleanup

From
Ashutosh Bapat
Date:
Hi Peter,

On Mon, Jan 22, 2024 at 6:13 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 06.12.23 09:23, Peter Eisentraut wrote:
> > The (now) second patch is also still of interest to me, but I have since
> > noticed that I think [0] should be fixed first, but to make that fix
> > simpler, I would like the first patch from here.
> >
> > [0]:
> > https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7%40eisentraut.org
>
> The remaining patch in this series needed a rebase and adjustment.
>
> The above precondition still applies.

While working on identity support and now while looking at the
compression problem you referred to, I found MergeAttribute() to be
hard to read. It's hard to follow high level logic in that function
since the function is not modular. I took a stab at modularising a
part of it. Attached is the resulting patch series.

0001 is your patch as is
0002 is pgindent fix and also fixing what I think is a typo/thinko
from 0001. If you are fine with the changes, 0002 should be merged
into 0003.
0003 separates the part of code merging a child attribute to the
corresponding inherited attribute into its own function.
0004 does the same for code merging inherited attributes incrementally.

I have kept 0003 and 0004 separate in case we pick one and not the
other. But they can be committed as a single commit.

The two new functions have some common code and some differences.
Common code is not surprising since merging attributes whether from
child definition or from inheritance parents will have common rules.
Differences are expected in cases when child attributes need to be
treated differently. But the differences may point us to some
yet-unknown bugs; compression being one of those differences. I think
the next step should combine these functions into one so that all the
logic to merge one attribute is at one place. I haven't attempted it;
wanted to propose the idea first.

I can see that this moduralization will lead to another and we will be
able to reduce MergeAttribute() to a set of function calls reflecting
its high level logic and push the detailed implementation into minion
functions like this.

--
Best Wishes,
Ashutosh Bapat

Attachment

Re: tablecmds.c/MergeAttributes() cleanup

From
Peter Eisentraut
Date:
On 24.01.24 07:27, Ashutosh Bapat wrote:
> While working on identity support and now while looking at the
> compression problem you referred to, I found MergeAttribute() to be
> hard to read. It's hard to follow high level logic in that function
> since the function is not modular. I took a stab at modularising a
> part of it. Attached is the resulting patch series.
> 
> 0001 is your patch as is
> 0002 is pgindent fix and also fixing what I think is a typo/thinko
> from 0001. If you are fine with the changes, 0002 should be merged
> into 0003.
> 0003 separates the part of code merging a child attribute to the
> corresponding inherited attribute into its own function.
> 0004 does the same for code merging inherited attributes incrementally.
> 
> I have kept 0003 and 0004 separate in case we pick one and not the
> other. But they can be committed as a single commit.

I have committed all this.  These are great improvements.

(One little change I made to your 0003 and 0004 patches is that I kept 
the check whether the new column matches an existing one by name in 
MergeAttributes().  I found that pushing that down made the logic in 
MergeAttributes() too hard to follow.  But it's pretty much the same.)




Re: tablecmds.c/MergeAttributes() cleanup

From
Alexander Lakhin
Date:
Hello Peter,

26.01.2024 16:42, Peter Eisentraut wrote:
>
> I have committed all this.  These are great improvements.
>

Please look at the segmentation fault triggered by the following query since
4d969b2f8:
CREATE TABLE t1(a text COMPRESSION pglz);
CREATE TABLE t2(a text);
CREATE TABLE t3() INHERITS(t1, t2);
NOTICE:  merging multiple inherited definitions of column "a"
server closed the connection unexpectedly
         This probably means the server terminated abnormally
         before or while processing the request.

Core was generated by `postgres: law regression [local] CREATE TABLE                                 '.
Program terminated with signal SIGSEGV, Segmentation fault.

(gdb) bt
#0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:116
#1  0x00005606fbcc9d52 in MergeAttributes (columns=0x0, supers=supers@entry=0x5606fe293d30, relpersistence=112 'p', 
is_partition=false, supconstr=supconstr@entry=0x7fff4046d410, supnotnulls=supnotnulls@entry=0x7fff4046d418)
     at tablecmds.c:2811
#2  0x00005606fbccd764 in DefineRelation (stmt=stmt@entry=0x5606fe26a130, relkind=relkind@entry=114 'r', ownerId=10, 
ownerId@entry=0, typaddress=typaddress@entry=0x0,
     queryString=queryString@entry=0x5606fe2695c0 "CREATE TABLE t3() INHERITS(t1, t2);") at tablecmds.c:885
...

Best regards,
Alexander



Re: tablecmds.c/MergeAttributes() cleanup

From
Ashutosh Bapat
Date:
Hi Alexander,

On Sun, Jan 28, 2024 at 1:30 PM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> Hello Peter,
>
> 26.01.2024 16:42, Peter Eisentraut wrote:
> >
> > I have committed all this.  These are great improvements.
> >
>
> Please look at the segmentation fault triggered by the following query since
> 4d969b2f8:
> CREATE TABLE t1(a text COMPRESSION pglz);
> CREATE TABLE t2(a text);
> CREATE TABLE t3() INHERITS(t1, t2);
> NOTICE:  merging multiple inherited definitions of column "a"
> server closed the connection unexpectedly
>          This probably means the server terminated abnormally
>          before or while processing the request.
>
> Core was generated by `postgres: law regression [local] CREATE TABLE                                 '.
> Program terminated with signal SIGSEGV, Segmentation fault.
>
> (gdb) bt
> #0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:116
> #1  0x00005606fbcc9d52 in MergeAttributes (columns=0x0, supers=supers@entry=0x5606fe293d30, relpersistence=112 'p',
> is_partition=false, supconstr=supconstr@entry=0x7fff4046d410, supnotnulls=supnotnulls@entry=0x7fff4046d418)
>      at tablecmds.c:2811
> #2  0x00005606fbccd764 in DefineRelation (stmt=stmt@entry=0x5606fe26a130, relkind=relkind@entry=114 'r', ownerId=10,
> ownerId@entry=0, typaddress=typaddress@entry=0x0,
>      queryString=queryString@entry=0x5606fe2695c0 "CREATE TABLE t3() INHERITS(t1, t2);") at tablecmds.c:885

This bug existed even before the refactoring.Happens because strcmp()
is called on NULL input (t2's compression is NULL). I already have a
fix for this and will be posting it in [1].

[1] https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7%40eisentraut.org

--
Best Wishes,
Ashutosh Bapat



Re: tablecmds.c/MergeAttributes() cleanup

From
Alexander Lakhin
Date:
Hello,

30.01.2024 09:22, Ashutosh Bapat wrote:
>
>> Please look at the segmentation fault triggered by the following query since
>> 4d969b2f8:
>> CREATE TABLE t1(a text COMPRESSION pglz);
>> CREATE TABLE t2(a text);
>> CREATE TABLE t3() INHERITS(t1, t2);
>> NOTICE:  merging multiple inherited definitions of column "a"
>> server closed the connection unexpectedly
>>           This probably means the server terminated abnormally
>>           before or while processing the request.
>>
>> Core was generated by `postgres: law regression [local] CREATE TABLE                                 '.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>>
>> (gdb) bt
>> #0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:116
>> #1  0x00005606fbcc9d52 in MergeAttributes (columns=0x0, supers=supers@entry=0x5606fe293d30, relpersistence=112 'p',
>> is_partition=false, supconstr=supconstr@entry=0x7fff4046d410, supnotnulls=supnotnulls@entry=0x7fff4046d418)
>>       at tablecmds.c:2811
>> #2  0x00005606fbccd764 in DefineRelation (stmt=stmt@entry=0x5606fe26a130, relkind=relkind@entry=114 'r',
ownerId=10,
>> ownerId@entry=0, typaddress=typaddress@entry=0x0,
>>       queryString=queryString@entry=0x5606fe2695c0 "CREATE TABLE t3() INHERITS(t1, t2);") at tablecmds.c:885
> This bug existed even before the refactoring.Happens because strcmp()
> is called on NULL input (t2's compression is NULL). I already have a
> fix for this and will be posting it in [1].
>
> [1] https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7%40eisentraut.org
>

Now that that fix is closed with RwF [1], shouldn't this crash issue be
added to Open Items for v17?
(I couldn't reproduce the crash on 4d969b2f8~1 nor on REL_16_STABLE.)

https://commitfest.postgresql.org/47/4813/

Best regards,
Alexander



Re: tablecmds.c/MergeAttributes() cleanup

From
Ashutosh Bapat
Date:


On Sat, Apr 20, 2024 at 9:30 AM Alexander Lakhin <exclusion@gmail.com> wrote:
Hello,

30.01.2024 09:22, Ashutosh Bapat wrote:
>
>> Please look at the segmentation fault triggered by the following query since
>> 4d969b2f8:
>> CREATE TABLE t1(a text COMPRESSION pglz);
>> CREATE TABLE t2(a text);
>> CREATE TABLE t3() INHERITS(t1, t2);
>> NOTICE:  merging multiple inherited definitions of column "a"
>> server closed the connection unexpectedly
>>           This probably means the server terminated abnormally
>>           before or while processing the request.
>>
>> Core was generated by `postgres: law regression [local] CREATE TABLE                                 '.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>>
>> (gdb) bt
>> #0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:116
>> #1  0x00005606fbcc9d52 in MergeAttributes (columns=0x0, supers=supers@entry=0x5606fe293d30, relpersistence=112 'p',
>> is_partition=false, supconstr=supconstr@entry=0x7fff4046d410, supnotnulls=supnotnulls@entry=0x7fff4046d418)
>>       at tablecmds.c:2811
>> #2  0x00005606fbccd764 in DefineRelation (stmt=stmt@entry=0x5606fe26a130, relkind=relkind@entry=114 'r', ownerId=10,
>> ownerId@entry=0, typaddress=typaddress@entry=0x0,
>>       queryString=queryString@entry=0x5606fe2695c0 "CREATE TABLE t3() INHERITS(t1, t2);") at tablecmds.c:885
> This bug existed even before the refactoring.Happens because strcmp()
> is called on NULL input (t2's compression is NULL). I already have a
> fix for this and will be posting it in [1].
>
> [1] https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7%40eisentraut.org
>

Now that that fix is closed with RwF [1], shouldn't this crash issue be
added to Open Items for v17?
(I couldn't reproduce the crash on 4d969b2f8~1 nor on REL_16_STABLE.)

https://commitfest.postgresql.org/47/4813/

Yes please. Probably this issue surfaced again after we reverted compression and storage fix? Please  If that's the case, please add it to the open items.

--
Best Wishes,
Ashutosh Bapat

Re: tablecmds.c/MergeAttributes() cleanup

From
Robert Haas
Date:
On Sat, Apr 20, 2024 at 12:17 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> Yes please. Probably this issue surfaced again after we reverted compression and storage fix? Please  If that's the
case,please add it to the open items. 

This is still on the open items list and I'm not clear who, if anyone,
is working on fixing it.

It would be good if someone fixed it. :-)

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: tablecmds.c/MergeAttributes() cleanup

From
Ashutosh Bapat
Date:


On Mon, Apr 29, 2024 at 6:46 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Apr 20, 2024 at 12:17 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> Yes please. Probably this issue surfaced again after we reverted compression and storage fix? Please  If that's the case, please add it to the open items.

This is still on the open items list and I'm not clear who, if anyone,
is working on fixing it.

It would be good if someone fixed it. :-)

Here's a patch fixing it.
 
I have added the reproducer provided by Alexander as a test. I thought of improving that test further to test the compression of the inherited table but did not implement it since we haven't documented the behaviour of compression with inheritance. Defining and implementing compression behaviour for inherited tables was the goal of 0413a556990ba628a3de8a0b58be020fd9a14ed0, which has been reverted.

--
Best Wishes,
Ashutosh Bapat
Attachment

Re: tablecmds.c/MergeAttributes() cleanup

From
Robert Haas
Date:
On Tue, Apr 30, 2024 at 2:19 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Mon, Apr 29, 2024 at 6:46 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Apr 20, 2024 at 12:17 AM Ashutosh Bapat
>> <ashutosh.bapat.oss@gmail.com> wrote:
>> > Yes please. Probably this issue surfaced again after we reverted compression and storage fix? Please  If that's
thecase, please add it to the open items. 
>>
>> This is still on the open items list and I'm not clear who, if anyone,
>> is working on fixing it.
>>
>> It would be good if someone fixed it. :-)
>
> Here's a patch fixing it.
>
> I have added the reproducer provided by Alexander as a test. I thought of improving that test further to test the
compressionof the inherited table but did not implement it since we haven't documented the behaviour of compression
withinheritance. Defining and implementing compression behaviour for inherited tables was the goal of
0413a556990ba628a3de8a0b58be020fd9a14ed0,which has been reverted. 

I took a look at this patch. Currently this case crashes:

CREATE TABLE cmdata(f1 text COMPRESSION pglz);
CREATE TABLE cmdata3(f1 text);
CREATE TABLE cminh() INHERITS (cmdata, cmdata3);

The patch makes this succeed, but I was initially unclear why it
didn't make it fail with an error instead: you can argue that cmdata
has pglz and cmdata3 has default and those are different. It seems
that prior precedent goes both ways -- we treat the absence of a
STORAGE specification as STORAGE EXTENDED and it conflicts with an
explicit storage specification on some other inheritance parent - but
on the other hand, we treat the absence of a default as compatible
with any explicit default, similar to what happens here. But I
eventually realized that you're just putting back behavior that we had
in previous releases: pre-v17, the code already works the way this
patch makes it do, and MergeChildAttribute() is already coded similar
to this. As Alexander Lakhin said upthread, 4d969b2f8 seems to have
broken this.

So now I think this is committable, but I can't do it now because I
won't be around for the next few hours in case the buildfarm blows up.
I can do it tomorrow, or perhaps Peter would like to handle it since
it seems to have been his commit that introduced the issue.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: tablecmds.c/MergeAttributes() cleanup

From
Peter Eisentraut
Date:
On 30.04.24 21:48, Robert Haas wrote:
> I took a look at this patch. Currently this case crashes:
> 
> CREATE TABLE cmdata(f1 text COMPRESSION pglz);
> CREATE TABLE cmdata3(f1 text);
> CREATE TABLE cminh() INHERITS (cmdata, cmdata3);
> 
> The patch makes this succeed, but I was initially unclear why it
> didn't make it fail with an error instead: you can argue that cmdata
> has pglz and cmdata3 has default and those are different. It seems
> that prior precedent goes both ways -- we treat the absence of a
> STORAGE specification as STORAGE EXTENDED and it conflicts with an
> explicit storage specification on some other inheritance parent - but
> on the other hand, we treat the absence of a default as compatible
> with any explicit default, similar to what happens here.

The actual behavior here is arguably not ideal.  It was the purpose of 
the other thread mentioned upthread to improve that, but that was not 
successful for the time being.

> So now I think this is committable, but I can't do it now because I
> won't be around for the next few hours in case the buildfarm blows up.
> I can do it tomorrow, or perhaps Peter would like to handle it since
> it seems to have been his commit that introduced the issue.

I have committed it now.




Re: tablecmds.c/MergeAttributes() cleanup

From
Ashutosh Bapat
Date:


On Fri, May 3, 2024 at 2:47 PM Peter Eisentraut <peter@eisentraut.org> wrote:
On 30.04.24 21:48, Robert Haas wrote:
> I took a look at this patch. Currently this case crashes:
>
> CREATE TABLE cmdata(f1 text COMPRESSION pglz);
> CREATE TABLE cmdata3(f1 text);
> CREATE TABLE cminh() INHERITS (cmdata, cmdata3);
>
> The patch makes this succeed, but I was initially unclear why it
> didn't make it fail with an error instead: you can argue that cmdata
> has pglz and cmdata3 has default and those are different. It seems
> that prior precedent goes both ways -- we treat the absence of a
> STORAGE specification as STORAGE EXTENDED and it conflicts with an
> explicit storage specification on some other inheritance parent - but
> on the other hand, we treat the absence of a default as compatible
> with any explicit default, similar to what happens here.

The actual behavior here is arguably not ideal.  It was the purpose of
the other thread mentioned upthread to improve that, but that was not
successful for the time being.

> So now I think this is committable, but I can't do it now because I
> won't be around for the next few hours in case the buildfarm blows up.
> I can do it tomorrow, or perhaps Peter would like to handle it since
> it seems to have been his commit that introduced the issue.

I have committed it now.


Thanks Peter.

--
Best Wishes,
Ashutosh Bapat