Re: NAMEDATALEN increase because of non-latin languages - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: NAMEDATALEN increase because of non-latin languages
Date
Msg-id 20220626024824.qnlpp6vikzjvuxs3@jrouhaud
Whole thread Raw
In response to Re: NAMEDATALEN increase because of non-latin languages  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: NAMEDATALEN increase because of non-latin languages
List pgsql-hackers
On Thu, Jun 23, 2022 at 10:19:44AM -0400, Robert Haas wrote:
> On Thu, Jun 23, 2022 at 6:13 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> > And should record_in / record_out use the logical position, as in:
> > SELECT ab::text FROM ab / SELECT (a, b)::ab;
> >
> > I would think not, as relying on a possibly dynamic order could break things if
> > you store the results somewhere, but YMMV.
>
> I think here the answer is yes again. I mean, consider that you could
> also ALTER TABLE DROP COLUMN and then ALTER TABLE ADD COLUMN with the
> same name. That is surely going to affect the meaning of such things.
> I don't think we want to have one meaning if you reorder things that
> way and a different meaning if you reorder things using whatever
> commands we create for changing the display column positions.

It indeed would, but ALTER TABLE DROP COLUMN is a destructive operation, and
I'm assuming that anyone doing that is aware that it will have an impact on
stored data and such.  I initially thought that changing the display order of
columns shouldn't have the same impact with the stability of otherwise
unchanged record definition, as it would make such reorder much more impacting.
But I agree that having different behaviors seems worse.

> > Then, what about joinrels expansion?  I learned that the column ordering rules
> > are far from being obvious, and I didn't find those in the documentation (note
> > that I don't know if that something actually described in the SQL standard).
> > So for instance, if a join is using an explicit USING clause rather than an ON
> > clause, the merged columns are expanded first, so:
> >
> > SELECT * FROM ab ab1 JOIN ab ab2 USING (b)
> >
> > should unexpectedly expand to (b, a, a).  Is this order a strict requirement?
>
> I dunno, but I can't see why it creates a problem for this patch to
> maintain the current behavior. I mean, just use the logical column
> position instead of the physical one here and forget about the details
> of how it works beyond that.

I'm not that familiar with this part of the code so I may have missed
something, but I didn't see any place where I could just simply do that.

To be clear, the approach I used is to change the expansion ordering but
otherwise keep the current behavior, to try to minimize the changes.  This is
done by keeping the attribute in the physical ordering pretty much everywhere,
including in the nsitems, and just logically reorder them during the expansion.
In other words all the code still knows that the 1st column is the first
physical column and so on.

So in that query, the ordering is supposed to happen when handling the "SELECT
*", which makes it impossible to retain that order.

I'm assuming that what you meant is to change the ordering when processing the
JOIN and retain the old "SELECT *" behavior, which is to emit items in the
order they're found.  But IIUC the only way to do that would be to change the
order when building the nsitems themselves, and have the code believe that the
attributes are physically stored in the logical order.  That's probably doable,
but that looks like a way more impacting change.  Or did you mean to keep the
approach I used, and just have some special case for "SELECT *" when referring
to a joinrel and instead try to handle the logical expansion in the join?
AFAICS it would require to add some extra info in the parsing structures, as it
doesn't really really store any position, just relies on array offset / list
position and maps things that way.

> > Another problem (that probably wouldn't be a problem for system catalogs) is
> > that defaults are evaluated in the physical position.  This example from the
> > regression test will clearly have a different behavior if the columns are in a
> > different physical order:
> >
> >  CREATE TABLE INSERT_TBL (
> >         x INT DEFAULT nextval('insert_seq'),
> >         y TEXT DEFAULT '-NULL-',
> >         z INT DEFAULT -1 * currval('insert_seq'),
> >         CONSTRAINT INSERT_TBL_CON CHECK (x >= 3 AND y <> 'check failed' AND x < 8),
> >         CHECK (x + z = 0));
> >
> > But changing the behavior to rely on the logical position seems quite
> > dangerous.
>
> Why?

It feels to me like a POLA violation, and probably people wouldn't expect it to
behave this way (even if this is clearly some corner case problem).  Even if
you argue that this is not simply a default display order but something more
like real column order, the physical position being some implementation detail,
it still doesn't really feels right.

The main reason for having the possibility to change the logical position is to
have "better looking", easier to work with, relations even if you have some
requirements with the real physical order like trying to optimize things as
much as possible (reordering columns to avoid padding space, put non-nullable
columns first...).  The order in which defaults are evaluated looks like the
same kind of requirements.  How useful would it be if you could chose a logical
order, but not being able to chose the one you actually want because it would
break your default values?

Anyway, per the nearby discussions I don't see much interest, especially not in
the context of varlena identifiers (I should have started a different thread,
sorry about that), so I don't think it's worth investing more efforts into it.



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Backends stunk in wait event IPC/MessageQueueInternal
Next
From: Andres Freund
Date:
Subject: Re: NAMEDATALEN increase because of non-latin languages