Re: NAMEDATALEN increase because of non-latin languages - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: NAMEDATALEN increase because of non-latin languages |
Date | |
Msg-id | 20220623101155.3dljtwradu7eik6g@jrouhaud Whole thread Raw |
In response to | Re: NAMEDATALEN increase because of non-latin languages (John Naylor <john.naylor@enterprisedb.com>) |
Responses |
Re: NAMEDATALEN increase because of non-latin languages
|
List | pgsql-hackers |
Hi, On Fri, Jun 03, 2022 at 01:28:16PM +0700, John Naylor wrote: > > I wanted to revive this thread to summarize what was discussed and get > a sense of next steps we could take. Thanks for reviving this topic! > The idea that gained the most traction is to make identifiers > variable-length in the catalogs, which has the added benefit of > reducing memory in syscaches in the common case. That presented two > challenges: > > 1. That would require putting the name physically closer to the end of > the column list. To make this less annoying for users, we'd need to > separate physical order from display order (at least/at first only for > system catalogs). This would require: > > - changing star expansion in SELECTs (expandRTE etc) > - adjusting pg_dump, \d, etc I tried to work on a physical / logical attribute position patch (more at the end of the mail). I based it on Álvaro's feedback at [1], which is: - rename current pg_attribute.attnum to attphysnum - add a new pg_attribute.attnum for the now logical position - tweak a few places to use attnum rather than attphysnum (which however isn't what I did) The idea, IIUC, is to make a physical position reordering entirely transparent to clients. However, once you change the star expansion, things start to subtly break in a lot of places. I try to dig in a bit to see if things could be made to work the same but it opened a whole can of worms. So before spending too much time on something that could be a non-starter I'd like to get some feedback on the general approach. While some problem wouldn't happen if we restricted the feature to system catalogs only (e.g. with renamed / dropped attributes, inheritance...), a lot would still exist and would have to be dealt with initially. However I'm not sure what behavior would be wanted or acceptable, especially if we want something that can eventually be used for user relations too, with possibly dynamic positions. I'll describe some of those problems, and just to make things easier I will use a normal table "ab" with 2 attributes, a and b, with their physical / logical position reversed. Obviously SELECT * FROM ab should return a and b in that order. The aliases should also probably match the logical position, as in: SELECT * FROM ab ab(aa, bb) should probably map aa to a and bb to b. But should COPY FROM / COPY TO / INSERT INTO use the logical position too if not column list is provided? I'd think so, but it goes further from the original "only handle star expansion" idea. And should record_in / record_out use the logical position, as in: SELECT ab::text FROM ab / SELECT (a, b)::ab; I would think not, as relying on a possibly dynamic order could break things if you store the results somewhere, but YMMV. Note that there a variations of that, like SELECT ab::ab FROM (SELECT * FROM ab) ab; But those should probably work as intended (for now it doesn't) as you can't store a bare record, and storing a plain "ab" type wouldn't be problematic with dynamic changes. Then, what about joinrels expansion? I learned that the column ordering rules are far from being obvious, and I didn't find those in the documentation (note that I don't know if that something actually described in the SQL standard). So for instance, if a join is using an explicit USING clause rather than an ON clause, the merged columns are expanded first, so: SELECT * FROM ab ab1 JOIN ab ab2 USING (b) should unexpectedly expand to (b, a, a). Is this order a strict requirement? For now I sort the expanded columns by (varno, attnum) but that's clearly a noticeable change. Another problem (that probably wouldn't be a problem for system catalogs) is that defaults are evaluated in the physical position. This example from the regression test will clearly have a different behavior if the columns are in a different physical order: CREATE TABLE INSERT_TBL ( x INT DEFAULT nextval('insert_seq'), y TEXT DEFAULT '-NULL-', z INT DEFAULT -1 * currval('insert_seq'), CONSTRAINT INSERT_TBL_CON CHECK (x >= 3 AND y <> 'check failed' AND x < 8), CHECK (x + z = 0)); But changing the behavior to rely on the logical position seems quite dangerous. I'm also attaching some few patches as a basis for an implementation discussion. Those are just POC level prototypes with the bare minimum changes to try to have a behavior based on the logical position, and absolutely no effort to optimize anything. I'm also not familiar with all the impacted places, so I'm definitely not claiming that this is a good approach. Any suggestion on a better approach is welcome. 0001 is the s/attphysnum/attnum/ suggested by Álvaro. Here again it's the bare minimum to make the code compile, a lot of the code still reference attnum in some ways (attnameAttnum(), variables like attnums and so on) that would need to be cleaned up. 0002 is the addition of the new pg_attribute.attnum with some changes trying to make things work, but no infrastructure to change it anywhere. The relevant changes are there. In expandNSItemAttrs and some other places, the difference is that the used TupleDesc is first copied and sorted by logical position, and then the attphysnum attributes are used rather than a loop counter, or the targetlist are sorted and then the resno" are computed . Other places (only the relation columns aliases I think for now) also builds an array for the physical / logical mappings,. I also changed psql to display the column in logical position, and emit an extra line with the physical position in the verbose mode, but that's a clearly poor design which would need a lot more thoughts. No changes in pg_dump or any other tool. 0003 adds some infrastructure to specify the logical position for system catalogs. It also moves relname and attname near the end of the non-varlena part of pg_class / pg_attribute (just before the end so no change is needed in the *_FIXED_PART_SIZE macros) I also added some patches I used for testing: 0004 automatically reverses relation attributes order and preserves the original logical position, so you can run the whole regression tests and see what breaks. 0005 adds an ORDER (column_name, [...]) clause in the simplest CREATE TABLE command form to declare the logical order and manually test things, something like: CREATE TABLE ab (b int, a int) ORDER (a, b); [1] https://www.postgresql.org/message-id/202108181639.xjuovrpwgkr2@alvherre.pgsql
Attachment
pgsql-hackers by date: