Separate the attribute physical order from logical order - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Separate the attribute physical order from logical order
Date
Msg-id 20220628083230.c4qhp4fdp3dp22ig@jrouhaud
Whole thread Raw
Responses Re: Separate the attribute physical order from logical order
Re: Separate the attribute physical order from logical order
List pgsql-hackers
(Starting a new thread)

On Sun, Jun 26, 2022 at 10:48:24AM +0800, Julien Rouhaud wrote:
> On Thu, Jun 23, 2022 at 10:19:44AM -0400, Robert Haas wrote:
> > On Thu, Jun 23, 2022 at 6:13 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > > And should record_in / record_out use the logical position, as in:
> > > SELECT ab::text FROM ab / SELECT (a, b)::ab;
> > >
> > > I would think not, as relying on a possibly dynamic order could break things if
> > > you store the results somewhere, but YMMV.
> >
> > I think here the answer is yes again. I mean, consider that you could
> > also ALTER TABLE DROP COLUMN and then ALTER TABLE ADD COLUMN with the
> > same name. That is surely going to affect the meaning of such things.
> > I don't think we want to have one meaning if you reorder things that
> > way and a different meaning if you reorder things using whatever
> > commands we create for changing the display column positions.
>
> It indeed would, but ALTER TABLE DROP COLUMN is a destructive operation, and
> I'm assuming that anyone doing that is aware that it will have an impact on
> stored data and such.  I initially thought that changing the display order of
> columns shouldn't have the same impact with the stability of otherwise
> unchanged record definition, as it would make such reorder much more impacting.
> But I agree that having different behaviors seems worse.
>
> > > Then, what about joinrels expansion?  I learned that the column ordering rules
> > > are far from being obvious, and I didn't find those in the documentation (note
> > > that I don't know if that something actually described in the SQL standard).
> > > So for instance, if a join is using an explicit USING clause rather than an ON
> > > clause, the merged columns are expanded first, so:
> > >
> > > SELECT * FROM ab ab1 JOIN ab ab2 USING (b)
> > >
> > > should unexpectedly expand to (b, a, a).  Is this order a strict requirement?
> >
> > I dunno, but I can't see why it creates a problem for this patch to
> > maintain the current behavior. I mean, just use the logical column
> > position instead of the physical one here and forget about the details
> > of how it works beyond that.
>
> I'm not that familiar with this part of the code so I may have missed
> something, but I didn't see any place where I could just simply do that.
>
> To be clear, the approach I used is to change the expansion ordering but
> otherwise keep the current behavior, to try to minimize the changes.  This is
> done by keeping the attribute in the physical ordering pretty much everywhere,
> including in the nsitems, and just logically reorder them during the expansion.
> In other words all the code still knows that the 1st column is the first
> physical column and so on.
>
> So in that query, the ordering is supposed to happen when handling the "SELECT
> *", which makes it impossible to retain that order.
>
> I'm assuming that what you meant is to change the ordering when processing the
> JOIN and retain the old "SELECT *" behavior, which is to emit items in the
> order they're found.  But IIUC the only way to do that would be to change the
> order when building the nsitems themselves, and have the code believe that the
> attributes are physically stored in the logical order.  That's probably doable,
> but that looks like a way more impacting change.  Or did you mean to keep the
> approach I used, and just have some special case for "SELECT *" when referring
> to a joinrel and instead try to handle the logical expansion in the join?
> AFAICS it would require to add some extra info in the parsing structures, as it
> doesn't really really store any position, just relies on array offset / list
> position and maps things that way.

So, assuming that the current JOIN expansion order shouldn't be changed, I
implemented the last approach I mentioned.  As expected, it requires some extra
information in the parsing structures.  In the attached patch I added an array
in the ParseNamespaceItem struct (p_mappings) to map the logical / physical
positions, and iterate over that array when processing the JOIN in
transformFromClauseItem to emit the same tuples as if no logical order were
defined.  Also, expandNSItemAttrs() now needs to know that when an RTE_JOIN is
expanded, to keep the original order.

While at it I also fixed the column list that get automatically generated when
deparsing a view if the original query didn't had any alias but some DDL is
later executed (like renaming one of the column) making this column list
necessary.  This isn't problematic except in one case: functions returning
(setof) tables.  For this, I also need to save a array to map the physical /
logical positions but as far as I can see I need to save it in the
RangeTblEntry, only for RTE_FUNCTION, which is serialized in pg_rewrite so that
the deparsing can emit the correct order even if the attribute positions
changed between the view creation and the deparsing.  This also works well but
feels really hackish.

With those changes, the create_view.sql test now entirely works (except some
error message referencing a physical position).  There are still a lot of other
tests that fail, and I didn't really dig into all of them to know if that's
something normal or just some other places that needs to be fixed.

As I mentioned in my first email, I'm a bit doubtful about this approach in
general, so I'm looking for some feedback on it before investigating too much
time implementing something that would never be close to committable.
>
> > > Another problem (that probably wouldn't be a problem for system catalogs) is
> > > that defaults are evaluated in the physical position.  This example from the
> > > regression test will clearly have a different behavior if the columns are in a
> > > different physical order:
> > >
> > >  CREATE TABLE INSERT_TBL (
> > >         x INT DEFAULT nextval('insert_seq'),
> > >         y TEXT DEFAULT '-NULL-',
> > >         z INT DEFAULT -1 * currval('insert_seq'),
> > >         CONSTRAINT INSERT_TBL_CON CHECK (x >= 3 AND y <> 'check failed' AND x < 8),
> > >         CHECK (x + z = 0));
> > >
> > > But changing the behavior to rely on the logical position seems quite
> > > dangerous.
> >
> > Why?
>
> It feels to me like a POLA violation, and probably people wouldn't expect it to
> behave this way (even if this is clearly some corner case problem).  Even if
> you argue that this is not simply a default display order but something more
> like real column order, the physical position being some implementation detail,
> it still doesn't really feels right.
>
> The main reason for having the possibility to change the logical position is to
> have "better looking", easier to work with, relations even if you have some
> requirements with the real physical order like trying to optimize things as
> much as possible (reordering columns to avoid padding space, put non-nullable
> columns first...).  The order in which defaults are evaluated looks like the
> same kind of requirements.  How useful would it be if you could chose a logical
> order, but not being able to chose the one you actually want because it would
> break your default values?
>
> Anyway, per the nearby discussions I don't see much interest, especially not in
> the context of varlena identifiers (I should have started a different thread,
> sorry about that), so I don't think it's worth investing more efforts into it.

Attachment

pgsql-hackers by date:

Previous
From: gkokolatos@pm.me
Date:
Subject: Re: Add LZ4 compression in pg_dump
Next
From: Alvaro Herrera
Date:
Subject: Re: Separate the attribute physical order from logical order