Thread: NAMEDATALEN increase because of non-latin languages
Hello dear hackers. I understand the developer community's position on the NAMEDATALEN length - and, in fact, 63 bytes is more than enough - but only if we are talking about Latin-script languages.
PostgreSQL has wonderful support for Unicode in table and column names, and it seems like a very good idea to create tables with native-language names for databases across the world. But when I want to create, for example, a table named "Catalog_Контрагенты_КонтактнаяИнформация" (which is Russian for a catalog of counteragent contact information), it is automatically truncated to "Catalog_Контрагенты_КонтактнаяИнформ". And this is not a fictional problem: many words in Russian are simply longer than their English counterparts, and I have many examples like this.
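For example, in a UTF-8 database each Cyrillic letter costs two of the 63 available bytes, so a session like the following shows the problem (the NOTICE wording is approximate):

    CREATE TABLE "Catalog_Контрагенты_КонтактнаяИнформация" (id int);
    NOTICE:  identifier "Catalog_Контрагенты_КонтактнаяИнформация" will be truncated to "Catalog_Контрагенты_КонтактнаяИнформ"
    CREATE TABLE

The full name is 71 bytes, so it is clipped back to the 63 bytes that still form whole characters.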
Although recompiling the source is not that hard, upgrading afterwards is. I know that storing table and field names is not free in terms of disk space, but from my point of view, in 2021 convenience is more important than disk space.
I ask you to consider increasing NAMEDATALEN to maybe 128 bytes in future releases.
Sorry for taking your time with this message if this topic does not match the direction of PostgreSQL development (and thank you for your hard work).
On Wed, Aug 18, 2021 at 7:08 PM Денис Романенко <deromanenko@gmail.com> wrote:
>
> Although recompiling the source is not that hard, upgrading afterwards
> is. I know that storing table and field names is not free in terms of
> disk space, but from my point of view, in 2021 convenience is more
> important than disk space.

Unfortunately, the problem isn't really the additional disk space it
would require. The problem is the additional performance hit and
memory overhead, as the catalog names are part of the internal
syscache.

I understand your frustration, but given those problems I don't think
that postgres will increase the default NAMEDATALEN value any time
soon, even though it's in contradiction with the SQL standard.
On Wed, Aug 18, 2021 at 8:08 AM Денис Романенко <deromanenko@gmail.com> wrote:
> I ask you to consider increasing NAMEDATALEN to maybe 128 bytes in
> future releases.

+1, since Oracle Database 12.2 and higher supports 128-byte names,
which could possibly impact future migrations from Oracle to Postgres.
regards,
Ranier Vilela
On Wed, Aug 18, 2021 at 7:15 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Unfortunately, the problem isn't really the additional disk space it
> would require. The problem is the additional performance hit and
> memory overhead, as the catalog names are part of the internal
> syscache.
Some actual numbers on recent hardware would show what kind of tradeoff is involved. No one has done that for a long time that I recall.
--
John Naylor
EDB: http://www.enterprisedb.com
On Wed, Aug 18, 2021 at 7:27 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> Some actual numbers on recent hardware would show what kind of tradeoff
> is involved. No one has done that for a long time that I recall.

Agreed, but I don't have access to such hardware. However this won't
influence the memory overhead part, and there are already frequent
problems with that, especially since declarative partitioning, so I
don't see how we could afford that without some kind of cache TTL or
similar. AFAIR the last discussion about it a few years ago didn't
lead anywhere :(
I'm not very familiar with PG testing methodology, but I can pay for a server (virtual or dedicated, DO maybe) and give access to it, if anyone has time for that.
Or, if someone describes the steps to me and shows me where to look, I can do it myself.
Could we just make the limit 64 (or 128) _characters_, not _bytes_?
Memory sizes and processor speeds have grown by order(s) of magnitude
since the 64-byte limit was decided, and supporting non-ASCII charsets
properly seems like a prudent thing to do.

Also - have we checked that at least the truncation does not cut utf-8
characters in half?

-----
Hannu Krosing
Google Cloud - We have a long list of planned contributions and we are
hiring. Contact me if interested.

On Wed, Aug 18, 2021 at 1:33 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Agreed, but I don't have access to such hardware. However this won't
> influence the memory overhead part, and there are already frequent
> problems with that, especially since declarative partitioning, so I
> don't see how we could afford that without some kind of cache TTL or
> similar. AFAIR the last discussion about it a few years ago didn't
> lead anywhere :(
On Wed, Aug 18, 2021 at 7:33 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > Some actual numbers on recent hardware would show what kind of tradeoff is involved. No one has done that for a long time that I recall.
>
> Agreed, but I don't have access to such hardware. However this won't
Well, by "recent" I had in mind something more recent than 2002, which is the time where I see a lot of hits in the archives if you search for this topic.
> influence the memory overhead part, and there are already frequent
> problems with that, especially since declarative partitioning, so I
That's a fair point.
> don't see how we could afford that without some kind of cache TTL or
> similar. AFAIR the last discussion about it a few years ago didn't
> lead anywhere :(
If you mean the thread "Protect syscache from bloating with negative cache entries", it had activity earlier this year, so I wouldn't give up hope just yet. Progress has been slow, so I'll see about putting some effort into that after concluding my attempt to speed up the syscaches first [1].
The main thing I'm worried about is the fact that a name would no longer fit in a Datum. The rest I think we can mitigate in some way.
On Wed, Aug 18, 2021 at 8:03 AM Hannu Krosing <hannuk@google.com> wrote:
>
> Could we just make the limit 64 (or 128) _characters_, not _bytes_?
That couldn't work because characters are variable length. The limit has to be a fixed length in bytes so we can quickly compute offsets in the attribute tuple.
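For reference, name's definition in src/include/c.h, together with NAMEDATALEN from pg_config_manual.h, is essentially the following (abridged):

    #define NAMEDATALEN 64      /* 63 usable bytes + terminating NUL */

    typedef struct nameData
    {
        char        data[NAMEDATALEN];
    } NameData;

    typedef NameData *Name;

Because sizeof(NameData) is a compile-time constant, the offset of every later fixed-width column in a catalog struct is one too; a character-based limit would break that.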
--
John Naylor
EDB: http://www.enterprisedb.com
On Wed, Aug 18, 2021 at 8:04 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> Well, by "recent" I had in mind something more recent than 2002, which
> is when I see a lot of hits in the archives if you search for this
> topic.

Yeah, but my current laptop has a tendency to crash after a few minutes
if I stress it too much, so I'm still out.

> If you mean the thread "Protect syscache from bloating with negative
> cache entries", it had activity earlier this year, so I wouldn't give
> up hope just yet.

Yes, that's the thread I was thinking about. I'm not giving up hope
either, but I also don't see it being solved for v15.

> The main thing I'm worried about is the fact that a name would no
> longer fit in a Datum. The rest I think we can mitigate in some way.

Agreed.
On Wed, Aug 18, 2021 at 8:03 PM Hannu Krosing <hannuk@google.com> wrote:
>
> Also - have we checked that at least the truncation does not cut utf-8
> characters in half?

Yes, same for all other places that can truncate text (like the query
text in pg_stat_activity and similar). See usage of pg_mbcliplen() in
truncate_identifier().
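For reference, that function is roughly the following (a simplified sketch of the code in src/backend/parser/scansup.c, not a verbatim copy):

    /*
     * Simplified sketch of truncate_identifier(): pg_mbcliplen() returns
     * the largest byte length not exceeding NAMEDATALEN - 1 that does not
     * split a multibyte character, so truncation is always encoding-safe.
     */
    void
    truncate_identifier(char *ident, int len, bool warn)
    {
        if (len >= NAMEDATALEN)
        {
            len = pg_mbcliplen(ident, len, NAMEDATALEN - 1);
            if (warn)
                ereport(NOTICE,
                        (errcode(ERRCODE_NAME_TOO_LONG),
                         errmsg("identifier \"%s\" will be truncated to \"%.*s\"",
                                ident, len, ident)));
            ident[len] = '\0';
        }
    }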
On Wed, 2021-08-18 at 08:16 -0300, Ranier Vilela wrote:
> On Wed, Aug 18, 2021 at 8:08 AM Денис Романенко <deromanenko@gmail.com> wrote:
> > I ask you to consider increasing NAMEDATALEN to maybe 128 bytes in
> > future releases.

My stance here is that you should always use ASCII-only database
identifiers, not only because of this, but also to avoid unpleasant
encoding problems if you want to do something like

    pg_dump -t Catalog_Контрагенты_КонтактнаяИнформация mydb

on a shell with an encoding different from the database encoding.
So I am not too excited about this.

> +1, since Oracle Database 12.2 and higher supports 128-byte names,
> which could possibly impact future migrations from Oracle to Postgres.

That seems to be a better argument from my point of view.
I have no idea as to how bad the additional memory impact for the
catalog caches would be...

Yours,
Laurenz Albe
On Wed, Aug 18, 2021 at 9:33 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Wed, 2021-08-18 at 08:16 -0300, Ranier Vilela wrote:
> > > I ask you to consider increasing NAMEDATALEN to maybe 128 bytes in
> > > future releases.
>
> My stance here is that you should always use ASCII-only database
> identifiers, not only because of this, but also to avoid unpleasant
> encoding problems if you want to do something like
>
>     pg_dump -t Catalog_Контрагенты_КонтактнаяИнформация mydb
>
> on a shell with an encoding different from the database encoding.
> So I am not too excited about this.
>
> > +1, since Oracle Database 12.2 and higher supports 128-byte names,
> > which could possibly impact future migrations from Oracle to Postgres.
>
> That seems to be a better argument from my point of view.
> I have no idea as to how bad the additional memory impact for the
> catalog caches would be...
It seems to me that this is a case for a macro:

HAS_SUPPORT_NAME_128_BYTES

Денис Романенко would like it, and would pay the price of the
regression in exchange for the convenience. What impacts him now is
the difficulty of maintaining a private tree with this support.
regards,
Ranier Vilela
John Naylor <john.naylor@enterprisedb.com> writes:
> The main thing I'm worried about is the fact that a name would no
> longer fit in a Datum. The rest I think we can mitigate in some way.

Not sure what you mean by that? name is a pass-by-ref data type.

Anyway, this whole argument could be rendered moot if we could convert
name to a variable-length type. That would satisfy *both* sides of
the argument, since those who need long names could have them, while
those who don't would see net reduction instead of growth in space usage.

Of course, this is far far easier said than done; else we would have
done it years ago. But maybe it's not entirely out of reach.

I do not think it'd be hard to change "name" to have the same on-disk
storage representation as cstring; the hard part is what about its
usage in fixed-width catalog structures. Maybe we could finesse that
by decreeing that the name column always has to be the last
non-CATALOG_VARLEN field. (This would require fixing up the couple of
places where we let some other var-width field have that distinction;
but I suspect that would be small in comparison to the other work this
implies. If there are any catalogs having two name columns, one of them
would become more difficult to reach from C code.)

Another fun thing --- and, I fear, another good argument against just
raising NAMEDATALEN --- is what about TupleDescs, which last I checked
used an array of fixed-width pg_attribute images. But maybe we could
replace that with an array of pointers. Andres already did a lot of
the heavy code churn required to hide that data structure behind
TupleDescAttr() macros, so changing the representation should be much
less painful than it would once have been.

I wonder if we'd get complaints from changing the catalog column layouts
that much. People are used to the name at the front, I think. OTOH,
I expected a lot of bleating about the OID column becoming frontmost,
but there hasn't been much.

Anyway, I have little desire to work on this myself, but I recommend
that somebody who is more affected by the name length restriction look
into it.

			regards, tom lane
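For context, the CATALOG_VARLEN convention mentioned above looks roughly like this in a genbki-processed header (pg_example and its fields are hypothetical; the real headers live under src/include/catalog/):

    CATALOG(pg_example,9999,ExampleRelationId)
    {
        Oid         oid;                /* fixed width: plain C struct access */
        Oid         examplenamespace;   /* fixed width */
        NameData    examplename;        /* today: fixed NAMEDATALEN bytes */

    #ifdef CATALOG_VARLEN               /* variable-length fields start here */
        text        exampleacl;         /* not addressable as a struct member;
                                         * fetched with SysCacheGetAttr() */
    #endif
    } FormData_pg_example;

The point of decreeing name the last non-CATALOG_VARLEN field is that its starting offset would stay a compile-time constant even once its contents become variable-length; only the fields after it would lose direct C access.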
On Wed, Aug 18, 2021 at 10:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Anyway, this whole argument could be rendered moot if we could convert
> name to a variable-length type. That would satisfy *both* sides of
> the argument, since those who need long names could have them, while
> those who don't would see net reduction instead of growth in space usage.

Yeah it seems like the best way forward.

> I do not think it'd be hard to change "name" to have the same on-disk
> storage representation as cstring; the hard part is what about its
> usage in fixed-width catalog structures. Maybe we could finesse that
> by decreeing that the name column always has to be the last
> non-CATALOG_VARLEN field. (This would require fixing up the couple of
> places where we let some other var-width field have that distinction;
> but I suspect that would be small in comparison to the other work this
> implies. If there are any catalogs having two name columns, one of them
> would become more difficult to reach from C code.)

Here is the list on some recent build (as of 17707c059c):

     relname      |            array_agg
------------------+----------------------------------
 pg_collation     | {collname,collctype,collcollate}
 pg_database      | {datctype,datcollate,datname}
 pg_event_trigger | {evtevent,evtname}
 pg_subscription  | {subname,subslotname}
 pg_trigger       | {tgname,tgnewtable,tgoldtable}
(5 rows)

> I wonder if we'd get complaints from changing the catalog column layouts
> that much. People are used to the name at the front, I think. OTOH,
> I expected a lot of bleating about the OID column becoming frontmost,
> but there hasn't been much.

I don't think that would be comparable. Having an extra oid in the
1st column doesn't really make a raw SELECT * harder to read. But
having the XXXname column way behind, and not even at the end, means
that most people will have to type an extra "xxxname," for each
throwaway query run to quickly verify something. I know that I often
do that, and while I could live with it I'd rather not have to do it.
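For reference, a query along these lines reproduces that list (an assumed reconstruction, not necessarily the query that was actually run):

    SELECT c.relname, array_agg(a.attname)
    FROM pg_attribute a
    JOIN pg_class c ON c.oid = a.attrelid
    WHERE a.atttypid = 'name'::regtype
      AND c.relnamespace = 'pg_catalog'::regnamespace
      AND a.attnum > 0
    GROUP BY c.relname
    HAVING count(*) > 1
    ORDER BY c.relname;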
On 18.08.21 13:33, Julien Rouhaud wrote:
> Agreed, but I don't have access to such hardware. However this won't
> influence the memory overhead part, and there are already frequent
> problems with that, especially since declarative partitioning,

On the flip side, with partitioning you need room for longer table
names, since you need room for the real name plus some partition
identifier.
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Wed, Aug 18, 2021 at 10:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wonder if we'd get complaints from changing the catalog column layouts
>> that much. People are used to the name at the front, I think. OTOH,
>> I expected a lot of bleating about the OID column becoming frontmost,
>> but there hasn't been much.

> I don't think that would be comparable. Having an extra oid in the
> 1st column doesn't really make a raw SELECT * harder to read. But
> having the XXXname column way behind, and not even at the end, means
> that most people will have to type an extra "xxxname," for each
> throwaway query run to quickly verify something. I know that I often
> do that, and while I could live with it I'd rather not have to do it.

Yeah, it would annoy the heck out of me too. Again there's a potential
technical solution, which is to decouple the user-visible column order
from the storage order. However, multiple people have tilted at that
windmill without much success, so making it a prerequisite for improving
the name-length situation doesn't seem like a smart plan.

			regards, tom lane
On 8/18/21 10:53 AM, Tom Lane wrote:
> Yeah, it would annoy the heck out of me too. Again there's a potential
> technical solution, which is to decouple the user-visible column order
> from the storage order. However, multiple people have tilted at that
> windmill without much success, so making it a prerequisite for improving
> the name-length situation doesn't seem like a smart plan.

There might be other benefits, though. IIRC what we discussed years ago
was having for each attribute an immutable number (presumably attnum as
now) and a mutable display order and storage order. Previous patches
didn't implement this and so were rejected. I think part of the trouble
is that we'd have to go through roughly 1700 mentions of attnum in the
source and decide if it's really attnum they want or if it's
attnum_display_order or attnum_storage_order. So this has the potential
to be extraordinarily invasive and potentially bug-prone. And then
there's the world of extensions to consider.

I have a bit less sympathy for the argument that just moving it will
break things that use 'select *' on the catalogs. In general, if you
care about the order of columns you should name the columns you want in
the order you want. I've seen 'select *' break for people on other
changes, like adding or dropping a column. It might cause Postgres
developers a bit of pain, but it should be manageable, so I kind of
like your suggestion.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
> There might be other benefits, though. IIRC what we discussed years ago
> was having for each attribute an immutable number (presumably attnum as
> now) and a mutable display order and storage order. Previous patches
> didn't implement this and so were rejected. I think part of the trouble
> is that we'd have to go through roughly 1700 mentions of attnum in the
> source and decide if it's really attnum they want or if it's
> attnum_display_order or attnum_storage_order. So this has the potential
> to be extraordinarily invasive and potentially bug-prone. And then
> there's the world of extensions to consider.

Yeah, exactly: conceptually that's simple, but flushing all the bugs
out would be a years-long nightmare. It'd make all the fun we had
with missed attisdropped checks look like a walk in the park. Unless
somebody can figure out a way to mechanically check for mistakes,
I don't think I want to go there.

I wonder though if we could fix the immediate problem with something
less ambitious. The hard part of the full proposal, I think, is
separating permanent identity from physical position. If we were to
split out *only* the display order from that, the patch footprint
ought to be far far smaller --- basically, I think, we'd need to fix
star-expansion and not a lot more in the backend. Of course,
client-side code like psql's \d and pg_dump would need to be upgraded
too, but any missed things there would be cosmetic
that's-not-the-expected-column-order bugs, not core dumps. Also, at
least in the v1 patch we could use this just for system catalogs
without exposing it as a feature for user tables, which would greatly
restrict the set of client-side places that really need fixed.

(I think Alvaro was the last person to mess with this issue, so I
wonder what his take is on the feasibility of such a restricted
solution.)

			regards, tom lane
On 2021-Aug-18, Tom Lane wrote:

> I wonder though if we could fix the immediate problem with something
> less ambitious. The hard part of the full proposal, I think, is
> separating permanent identity from physical position. If we were to
> split out *only* the display order from that, the patch footprint
> ought to be far far smaller ...
>
> (I think Alvaro was the last person to mess with this issue, so I
> wonder what his take is on the feasibility of such a restricted
> solution.)

Yeah, my impression is that the project of just changing star expansion
is much, much easier than splitting out attribute identity from physical
location. The former, as I recall, is a localized change in expandRTE
and friends, and you don't have to touch anything else. The other part
of the change is much more invasive and got me into territory that I
wasn't able to navigate successfully.

(I think we should consider keeping 'attnum' as the display-order
attribute, and have the physical-and-identity attribute get a new name,
say attphysnum. That's so that you don't have to change psql and the
whole world -- the change is transparent to them. This means we need a
first step that's very invasive, because every single current use of
'attnum' has to be changed to attphysnum first, followed by a functional
patch that changes a few of those back to attnum. It'd be a large
inconvenience to backend developers to ease the lives of client-side
developers.)

--
Álvaro Herrera         PostgreSQL Developer — https://www.EnterpriseDB.com/
On 8/18/21 12:39 PM, Alvaro Herrera wrote:
> (I think we should consider keeping 'attnum' as the display-order
> attribute, and have the physical-and-identity attribute get a new name,
> say attphysnum. ... It'd be a large inconvenience to backend developers
> to ease the lives of client-side developers.)

Can we call it attid, just in case we decide some day to break the
nexus on the other side?

I like the idea of keeping it to the catalogs to start with.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Thu, Aug 19, 2021 at 12:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Yeah, exactly: conceptually that's simple, but flushing all the bugs
> out would be a years-long nightmare. It'd make all the fun we had
> with missed attisdropped checks look like a walk in the park. Unless
> somebody can figure out a way to mechanically check for mistakes,
> I don't think I want to go there.

Maybe a silly idea, but we could have some shared_preload_libraries
module with a utility-command hook that intercepts table creation and
randomizes the logical column order. If we also had some GUC
(assert-only if needed) to switch star expansion to physical order
rather than logical order, we could then use the regression tests as a
big hammer to see if anything breaks. That's clearly not ideal (it
would obviously need some other hacks to avoid other breakage, like \d
and other things) and not 100% coverage, but it should give some
confidence in any patch's completeness.
Hi,

On 2021-08-18 10:21:03 -0400, Tom Lane wrote:
> Anyway, this whole argument could be rendered moot if we could convert
> name to a variable-length type. That would satisfy *both* sides of
> the argument, since those who need long names could have them, while
> those who don't would see net reduction instead of growth in space usage.

Yes, I think that's the best direction to go. We're losing a fair bit
of in-memory efficiency with the current NAMEDATALEN already; imo it'd
be beneficial to go for variable-length encoding from that angle alone.

> Of course, this is far far easier said than done; else we would have
> done it years ago. But maybe it's not entirely out of reach.
> I do not think it'd be hard to change "name" to have the same on-disk
> storage representation as cstring; the hard part is what about its
> usage in fixed-width catalog structures.

Indeed. ISTM that the hardest part of that is dealing with copying
around Form* structs?

I wonder if we're getting closer to the time where we should just give
up on the struct / ondisk layout mirroring for catalog tables, and
generate explicit transformation routines via Catalog.pm. If we have
to touch places handling Form_* structs anyway, we could make that more
worthwhile by doing away with CATALOG_VARLEN etc - the transformation
routines would just make those fields pointers which then can be
accessed normally.

> Maybe we could finesse that by decreeing that the name column always
> has to be the last non-CATALOG_VARLEN field.

I wish we could find a way to make "relative pointers" (i.e. a variable
offset from some base struct) work on the C level in a transparent and
portable manner. Just about any instruction set has them natively
anyway, and for a lot of tasks like variable-length members it can be
considerably more efficient... It'd also make DSM-using code simpler
and faster. Oh well, one can dream.

> Another fun thing --- and, I fear, another good argument against just
> raising NAMEDATALEN --- is what about TupleDescs, which last I checked
> used an array of fixed-width pg_attribute images. But maybe we could
> replace that with an array of pointers.

I was recently wondering if we shouldn't go to a completely bespoke
datastructure for TupleDesc->attrs, rather than reusing
FormData_pg_attribute.

Right now every attribute uses nearly two cachelines (112 bytes). Given
how frequent a task tuple [de]forming is, and how often it's a
bottleneck, increasing the cache efficiency of tupledescs would be
worth quite a bit of effort - I do see tupledesc attr cache misses in
profiles. A secondary benefit would be that we create a lot of
short-lived descs in the executor; slimming those down obviously would
be good on its own. A third benefit would be that we could get rid of
attcacheoff in pg_attribute, which always smelled funny to me.

One possible way to structure such future tupledescs would be to have
multiple arrays in struct TupleDescData, with an array of just the data
necessary for [de]forming at the place ->attrs is, and other stuff in
one or more separate arrays. The other arrays could perhaps be omitted
for some tupledescs or computed lazily.

For deforming we just need attlen (2 bytes), attbyval (1 byte),
attalign (1 byte) and optionally attcacheoff (4 bytes); for forming we
also need attstorage (1 byte). Naively that ends up being 12 bytes - 5
attrs / cacheline is a heck of a lot better than ~0.5.

Greetings,

Andres Freund
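As a minimal sketch of that 12-byte per-attribute layout (struct and field names are illustrative, not from any patch; Postgres itself would use its own int16/int32 typedefs rather than stdint.h):

    #include <stdbool.h>
    #include <stdint.h>

    /* Only what tuple [de]forming needs, per the sizes listed above. */
    typedef struct CompactAttr
    {
        int16_t     attlen;      /* fixed length, or -1/-2 for varlena/cstring */
        bool        attbyval;    /* pass-by-value? */
        char        attalign;    /* required alignment: 'c', 's', 'i' or 'd' */
        char        attstorage;  /* storage strategy, needed when forming */
        /* 3 bytes of padding here */
        int32_t     attcacheoff; /* cached offset within tuple, or -1 */
    } CompactAttr;               /* 12 bytes: 5 entries per 64-byte cacheline */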
On Thu, 19 Aug 2021 at 13:44, Andres Freund <andres@anarazel.de> wrote:
>
> I was recently wondering if we shouldn't go to a completely bespoke
> datastructure for TupleDesc->attrs, rather than reusing
> FormData_pg_attribute.
>
> [...]
>
> For deforming we just need attlen (2 bytes), attbyval (1 byte),
> attalign (1 byte) and optionally attcacheoff (4 bytes); for forming we
> also need attstorage (1 byte). Naively that ends up being 12 bytes - 5
> attrs / cacheline is a heck of a lot better than ~0.5.

I tried to implement this 'compact attribute access descriptor' a few
months ago in my effort to improve btree index performance. I abandoned
the idea at the time as I didn't find any measurable difference for the
(limited!) tests I ran, where the workload was mainly re-indexing,
select * into, and similar items while benchmarking reindexing on the
'pp-complete' dataset.

But, seeing that there might be interest outside this effort on a basis
separate from just plain performance, I'll share the results. Attached
is the latest version of my patch that I could find; it might be
incorrect or fail, as this is something I sent to myself between 2 of
my systems during development of the patch. Also, it is attached as
.txt, as I don't want any CFBot coverage on this (this is not proposed
for inclusion, it is just a show of work, and might be a basis for
future work).

The patch allocates an array of 'TupleAttrAlignData'-structs at the
end of the attrs-array, fills it with the correct data upon
TupleDesc-creation, and uses this TupleAttrAlign-data for constructing
and destructing tuples.

One main difference from what you described was that I used a union
for storing attbyval and attstorage, as the latter is only applicable
to attlen < 0, and the first only for attlen >= 0. This keeps the
whole structure in 8 bytes, whilst also being useable in both tuple
forming and deforming.

I hope this can be useful, otherwise sorry for the noise.

Kind regards,

Matthias van de Meent
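A similar sketch of the 8-byte union variant described here (the struct name is from the mail, but the field layout is a guess at the shape; the actual definition is in the attached .txt patch):

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * attstorage only matters for attlen < 0 (varlena), attbyval only for
     * attlen >= 0, so the two can share a single byte.
     */
    typedef struct TupleAttrAlignData
    {
        int16_t     attlen;      /* fixed length, or negative for variable */
        char        attalign;    /* required alignment */
        union
        {
            bool    attbyval;    /* valid when attlen >= 0 */
            char    attstorage;  /* valid when attlen < 0 */
        }           mode;
        int32_t     attcacheoff; /* cached offset within tuple, or -1 */
    } TupleAttrAlignData;        /* 8 bytes total */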
Hi,

On 2021-08-19 14:47:42 +0200, Matthias van de Meent wrote:
> I tried to implement this 'compact attribute access descriptor' a few
> months ago in my effort to improve btree index performance.

cool

> The patch allocates an array of 'TupleAttrAlignData'-structs at the
> end of the attrs-array, fills it with the correct data upon
> TupleDesc-creation, and uses this TupleAttrAlign-data for constructing
> and destructing tuples.
>
> One main difference from what you described was that I used a union
> for storing attbyval and attstorage, as the latter is only applicable
> to attlen < 0, and the first only for attlen >= 0. This keeps the
> whole structure in 8 bytes, whilst also being useable in both tuple
> forming and deforming.

That's why I just talked about the naive way - it's clearly possible to
do better... ;)

> I hope this can be useful, otherwise sorry for the noise.

It is!

I haven't looked at your patch in detail, but I suspect that one reason
that you didn't see performance benefits is that you added overhead as
well. The computation of the "compact" memory location now will need a
few more instructions than before, and I suspect the compiler may not
even be able to optimize out some of the redundant accesses in loops.

It'd be interesting to see what you'd get if you stored the compact
array as the flexible-array and stored a pointer to the "full" attrs
array (while still keeping it allocated together).

Another reason is that it looks like you didn't touch
slot_deform_heap_tuple(), which is I think the hottest of the deforming
routines...

Greetings,

Andres Freund
On Thu, 19 Aug 2021 at 14:58, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-08-19 14:47:42 +0200, Matthias van de Meent wrote:
> > I tried to implement this 'compact attribute access descriptor' a few
> > months ago in my effort to improve btree index performance.
>
> cool
>
>
> > The patch allocates an array of 'TupleAttrAlignData'-structs at the
> > end of the attrs-array, fills it with the correct data upon
> > TupleDesc-creation, and uses this TupleAttrAlign-data for constructing
> > and destructing tuples.
>
> > One main difference from what you described was that I used a union
> > for storing attbyval and attstorage, as the latter is only applicable
> > to attlen < 0, and the first only for attlen >= 0. This keeps the
> > whole structure in 8 bytes, whilst also being useable in both tuple
> > forming and deforming.
>
> That's why I just talked about the naive way - it's clearly possible to
> do better... ;)
>
>
> > I hope this can be useful, otherwise sorry for the noise.
>
> It is!
Great!
> I haven't looked at your patch in detail, but I suspect that one reason
> that you didn't see performance benefits is that you added overhead as
> well. The computation of the "compact" memory location now will need a
> few more instructions than before, and I suspect the compiler may not
> even be able to optimize out some of the redundant accesses in loops.
>
> It'd be interesting to see what you'd get if you stored the compact
> array as the flexible-array and stored a pointer to the "full" attrs
> array (while still keeping it allocated together).
Yes, I remember testing swapping the order of the compact array with the FormData_pg_attribute array as well, with no clear results.
I think this can partially be attributed to the split access patterns for the data in the attribute descriptor: some of it is 'give me the name'; some of it is 'does this attribute exist, and what type description does it have?' (atttypid, attnum, atttypmod, ...); and other accesses are only interested in the physical representation information. Prioritizing some over the others might work, but I think it'd need a lot of work to make full use of that.
> Another reason is that it looks like you didn't touch
> slot_deform_heap_tuple(), which is I think the hottest of the deforming
> routines...
That might be for normal operations, but I'm not certain that code is in the hot path for (btree) indexing workloads, due to the relatively high number of operations on each tuple whilst sorting, or finding an insertion point or scan start point.
Anyway, after some digging I found the final state of this patch before I stopped working on it, and after polishing it up a bit with your suggestions it now passes check-world on the latest head (8d2d6ec7).
Kind regards,
Matthias van de Meent
Hi,

I wanted to revive this thread to summarize what was discussed and get
a sense of next steps we could take.

The idea that gained the most traction is to make identifiers
variable-length in the catalogs, which has the added benefit of
reducing memory in syscaches in the common case. That presented two
challenges:

1. That would require putting the name physically closer to the end of
the column list. To make this less annoying for users, we'd need to
separate physical order from display order (at least/at first only for
system catalogs). This would require:

- changing star expansion in SELECTs (expandRTE etc)
- adjusting pg_dump, \d, etc

That much seems clear and agreed upon.

2. We'd need to change how TupleDescs are stored and accessed.

For this point, Matthias shared a prototype patch for a 'compact
attribute access descriptor', and Andres wondered if we should "give up
on the struct / ondisk layout mirroring for catalog tables, and
generate explicit transformation routines via Catalog.pm".

After this the discussion dropped off, and it's not quite clear to me
what the first logical step is to make progress on the thread subject,
and what's nice to have for other reasons. Is Matthias's patch or
something like it a good next step?

--
John Naylor
EDB: http://www.enterprisedb.com
Hi,

On Fri, Jun 03, 2022 at 01:28:16PM +0700, John Naylor wrote:
>
> I wanted to revive this thread to summarize what was discussed and get
> a sense of next steps we could take.

Thanks for reviving this topic!

> 1. That would require putting the name physically closer to the end of
> the column list. To make this less annoying for users, we'd need to
> separate physical order from display order (at least/at first only for
> system catalogs). This would require:
>
> - changing star expansion in SELECTs (expandRTE etc)
> - adjusting pg_dump, \d, etc

I tried to work on a physical / logical attribute position patch (more
at the end of the mail). I based it on Álvaro's feedback at [1], which
is:

- rename the current pg_attribute.attnum to attphysnum
- add a new pg_attribute.attnum for the now-logical position
- tweak a few places to use attnum rather than attphysnum (which
  however isn't what I did)

The idea, IIUC, is to make physical position reordering entirely
transparent to clients. However, once you change the star expansion,
things start to subtly break in a lot of places. I tried to dig in a
bit to see if things could be made to work the same, but it opened a
whole can of worms. So before spending too much time on something that
could be a non-starter, I'd like to get some feedback on the general
approach.

While some problems wouldn't happen if we restricted the feature to
system catalogs only (e.g. with renamed / dropped attributes,
inheritance...), a lot would still exist and would have to be dealt
with initially. However, I'm not sure what behavior would be wanted or
acceptable, especially if we want something that can eventually be used
for user relations too, with possibly dynamic positions.

I'll describe some of those problems. Just to make things easier, I
will use a normal table "ab" with 2 attributes, a and b, with their
physical / logical positions reversed.

Obviously

SELECT * FROM ab

should return a and b in that order. The aliases should also probably
match the logical position, as in:

SELECT * FROM ab ab(aa, bb)

should probably map aa to a and bb to b.

But should COPY FROM / COPY TO / INSERT INTO use the logical position
too if no column list is provided? I'd think so, but it goes further
from the original "only handle star expansion" idea.

And should record_in / record_out use the logical position, as in:

SELECT ab::text FROM ab / SELECT (a, b)::ab;

I would think not, as relying on a possibly dynamic order could break
things if you store the results somewhere, but YMMV.

Note that there are variations of that, like

SELECT ab::ab FROM (SELECT * FROM ab) ab;

But those should probably work as intended (for now they don't), as
you can't store a bare record, and storing a plain "ab" type wouldn't
be problematic with dynamic changes.

Then, what about joinrel expansion? I learned that the column ordering
rules are far from obvious, and I didn't find them in the documentation
(note that I don't know if that's something actually described in the
SQL standard). For instance, if a join uses an explicit USING clause
rather than an ON clause, the merged columns are expanded first, so:

SELECT * FROM ab ab1 JOIN ab ab2 USING (b)

should unexpectedly expand to (b, a, a). Is this order a strict
requirement? For now I sort the expanded columns by (varno, attnum),
but that's clearly a noticeable change.

Another problem (that probably wouldn't be a problem for system
catalogs) is that defaults are evaluated in physical position. This
example from the regression tests will clearly have a different
behavior if the columns are in a different physical order:

CREATE TABLE INSERT_TBL (
    x INT DEFAULT nextval('insert_seq'),
    y TEXT DEFAULT '-NULL-',
    z INT DEFAULT -1 * currval('insert_seq'),
    CONSTRAINT INSERT_TBL_CON CHECK (x >= 3 AND y <> 'check failed' AND x < 8),
    CHECK (x + z = 0));

But changing the behavior to rely on the logical position seems quite
dangerous.

I'm also attaching a few patches as a basis for an implementation
discussion. Those are just POC-level prototypes with the bare minimum
changes to try to have a behavior based on the logical position, and
absolutely no effort to optimize anything. I'm also not familiar with
all the impacted places, so I'm definitely not claiming that this is a
good approach. Any suggestion on a better approach is welcome.

0001 is the s/attphysnum/attnum/ suggested by Álvaro. Here again it's
the bare minimum to make the code compile; a lot of the code still
references attnum in some ways (attnameAttNum(), variables like attnums
and so on) that would need to be cleaned up.

0002 is the addition of the new pg_attribute.attnum, with some changes
trying to make things work but no infrastructure to change it anywhere.
The relevant changes are there. In expandNSItemAttrs and some other
places, the difference is that the TupleDesc used is first copied and
sorted by logical position, and then the attphysnum attributes are used
rather than a loop counter, or the targetlists are sorted and then the
resnos are computed. Other places (only the relation column aliases, I
think, for now) also build an array for the physical / logical
mappings. I also changed psql to display the columns in logical
position and emit an extra line with the physical position in verbose
mode, but that's clearly a poor design which would need a lot more
thought. No changes in pg_dump or any other tool.

0003 adds some infrastructure to specify the logical position for
system catalogs. It also moves relname and attname near the end of the
non-varlena part of pg_class / pg_attribute (just before the end, so no
change is needed in the *_FIXED_PART_SIZE macros).

I also added some patches I used for testing:

0004 automatically reverses relation attribute order and preserves the
original logical position, so you can run the whole regression tests
and see what breaks.

0005 adds an ORDER (column_name, [...]) clause to the simplest CREATE
TABLE command form to declare the logical order and manually test
things, something like:

CREATE TABLE ab (b int, a int) ORDER (a, b);

[1] https://www.postgresql.org/message-id/202108181639.xjuovrpwgkr2@alvherre.pgsql
Hi,

On 2022-06-03 13:28:16 +0700, John Naylor wrote:
> 1. That would require putting the name physically closer to the end of
> the column list. To make this less annoying for users, we'd need to
> separate physical order from display order (at least/at first only for
> system catalogs).

FWIW, it's not at all clear to me that this would be required. If I
were to tackle this I'd look at breaking up the mirroring of C structs
to catalog struct layout, by having genbki (or such) generate functions
to map to/from tuple layout. It'll be an annoying amount of changes,
but feasible, I think.

> This would require:
>
> - changing star expansion in SELECTs (expandRTE etc)
> - adjusting pg_dump, \d, etc
>
> That much seems clear and agreed upon.

FWIW, I don't agree that this is a reasonable way to tackle changing
NAMEDATALEN. It'd be nice to have, but to me it seems a pretty small
fraction of the problem of making Names variable length. You'll still
have all the problems of name fields being accessed all over, but now
not being included in the part of the struct visible to C etc.

Greetings,

Andres Freund
On Thu, Jun 23, 2022 at 6:13 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> While some problems wouldn't happen if we restricted the feature to
> system catalogs only (e.g. with renamed / dropped attributes,
> inheritance...), a lot would still exist and would have to be dealt
> with initially. However, I'm not sure what behavior would be wanted or
> acceptable, especially if we want something that can eventually be used
> for user relations too, with possibly dynamic positions.

I am not very convinced that this would make the project a whole lot
easier, but it does seem like the result would be less nice.

> Obviously
>
> SELECT * FROM ab
>
> should return a and b in that order. The aliases should also probably
> match the logical position, as in:
>
> SELECT * FROM ab ab(aa, bb)
>
> should probably map aa to a and bb to b.

Check.

> But should COPY FROM / COPY TO / INSERT INTO use the logical position
> too if no column list is provided? I'd think so, but it goes further
> from the original "only handle star expansion" idea.

I think yes.

> And should record_in / record_out use the logical position, as in:
>
> SELECT ab::text FROM ab / SELECT (a, b)::ab;
>
> I would think not, as relying on a possibly dynamic order could break
> things if you store the results somewhere, but YMMV.

I think here the answer is yes again. I mean, consider that you could
also ALTER TABLE DROP COLUMN and then ALTER TABLE ADD COLUMN with the
same name. That is surely going to affect the meaning of such things. I
don't think we want to have one meaning if you reorder things that way
and a different meaning if you reorder things using whatever commands
we create for changing the display column positions.

> Note that there are variations of that, like
>
> SELECT ab::ab FROM (SELECT * FROM ab) ab;
>
> But those should probably work as intended (for now they don't), as
> you can't store a bare record, and storing a plain "ab" type wouldn't
> be problematic with dynamic changes.

If the sub-SELECT and the cast both use the display order, which I
think they should, then there's no problem here, I believe.

> Then, what about joinrel expansion? ...
>
> SELECT * FROM ab ab1 JOIN ab ab2 USING (b)
>
> should unexpectedly expand to (b, a, a). Is this order a strict
> requirement?

I dunno, but I can't see why it creates a problem for this patch to
maintain the current behavior. I mean, just use the logical column
position instead of the physical one here and forget about the details
of how it works beyond that.

> Another problem (that probably wouldn't be a problem for system
> catalogs) is that defaults are evaluated in physical position. This
> example from the regression tests will clearly have a different
> behavior if the columns are in a different physical order:
>
> CREATE TABLE INSERT_TBL (
>     x INT DEFAULT nextval('insert_seq'),
>     y TEXT DEFAULT '-NULL-',
>     z INT DEFAULT -1 * currval('insert_seq'),
>     CONSTRAINT INSERT_TBL_CON CHECK (x >= 3 AND y <> 'check failed' AND x < 8),
>     CHECK (x + z = 0));
>
> But changing the behavior to rely on the logical position seems quite
> dangerous.

Why?

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Jun 23, 2022 at 10:17 AM Andres Freund <andres@anarazel.de> wrote: > > This would require: > > > > - changing star expansion in SELECTs (expandRTE etc) > > - adjusting pg_dump, \d, etc > > > > That much seems clear and agreed upon. > > FWIW, I don't agree that this is a reasonable way to tackle changing > NAMEDATALEN. It'd be nice to have, but to me it seems a pretty small > fraction of the problem of making Names variable length. You'll still have all > the problems of name fields being accessed all over, but now not being > included in the part of the struct visible to C etc. Yeah, I agree. I think that being able to reorder columns logically without changing anything physically would be cool, but I think we could find some way of making the catalog columns variable-length without introducing that feature. I'm not sure whether your idea of creating translator functions is a good one or not, but it doesn't seem too crazy. It'd be like a special purpose tuple deformer. deform_pg_class_tuple(&pg_class_struct, pg_class_tuple); The code in there could be automatically generated statements that maybe look like this: memcpy(&pg_class_struct.relnamespace, tuple + Aoffset_pg_class_relnamespace, sizeof(Oid)); Once you hit the first variable-length field you'd need something more clever, but because the code would be specialized for each catalog, it would probably be quite fast anyway. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jun 23, 2022 at 10:17 AM Andres Freund <andres@anarazel.de> wrote: >> FWIW, I don't agree that this is a reasonable way to tackle changing >> NAMEDATALEN. It'd be nice to have, but to me it seems a pretty small >> fraction of the problem of making Names variable length. You'll still have all >> the problems of name fields being accessed all over, but now not being >> included in the part of the struct visible to C etc. > Yeah, I agree. I think that being able to reorder columns logically > without changing anything physically would be cool, but I think we > could find some way of making the catalog columns variable-length > without introducing that feature. Agreed. Bearing in mind that multiple smart people have tackled that idea and gotten nowhere, I think setting it as a prerequisite for this project is a good way of ensuring this won't happen. > I'm not sure whether your idea of creating translator functions is a > good one or not, but it doesn't seem too crazy. It'd be like a special > purpose tuple deformer. Sounds worth investigating, anyway. It'd also get us out from under C-struct-related problems such as the nearby AIX alignment issue. The extra cost of the deforming step could also be repaid, in some cases, by not having to use SysCacheGetAttr etc later on to fetch variable-length fields. That is, I'm imagining that the deformer would extract all the fields, even varlena ones, and drop pointers or whatever into fields of the C struct. regards, tom lane
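To make that last idea concrete, here is a minimal sketch of what such a struct and deformer fragment could look like once varlena fields are pulled out too. The pointer-based layout and the names FormData_pg_class_deformed / fill_leading_fields are hypothetical, invented for illustration only; they assume a future where relname is a plain varlena rather than NameData.

#include "postgres.h"
#include "fmgr.h"
#include "catalog/pg_class.h"

/* hypothetical layout: relname becomes a pointer rather than a fixed NameData */
typedef struct FormData_pg_class_deformed
{
	Oid			oid;
	text	   *relname;		/* filled in from the deformed Datum */
	Oid			relnamespace;
	/* ... remaining fields ... */
} FormData_pg_class_deformed;

/* fragment of a generated deformer, given the values[] from heap_deform_tuple() */
static void
fill_leading_fields(FormData_pg_class_deformed *s, Datum *values)
{
	s->oid = DatumGetObjectId(values[Anum_pg_class_oid - 1]);
	s->relname = DatumGetTextPP(values[Anum_pg_class_relname - 1]);
	s->relnamespace = DatumGetObjectId(values[Anum_pg_class_relnamespace - 1]);
}

The open question such a sketch leaves is whose memory relname points at — an in-tuple location or a copied allocation — which is exactly the copying question raised next.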
On Thu, Jun 23, 2022 at 2:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sounds worth investigating, anyway. It'd also get us out from under > C-struct-related problems such as the nearby AIX alignment issue. Yeah. > The extra cost of the deforming step could also be repaid, in some > cases, by not having to use SysCacheGetAttr etc later on to fetch > variable-length fields. That is, I'm imagining that the deformer > would extract all the fields, even varlena ones, and drop pointers > or whatever into fields of the C struct. Yeah, if we were going to do something like this, I can't see why we wouldn't do it this way. It wouldn't make sense to do it for only some of the attributes. I'm not sure exactly where we would put this translation step, though. I think for the syscaches and relcache we'd want to translate when populating the cache so that when you do a cache lookup you get the data already translated. It's hard to be sure without testing, but that seems like it would make this cheap enough that we wouldn't have to be too worried, since the number of times we build new cache entries should be small compared to the number of times we access existing ones. The trickier thing might be code that uses systable_beginscan() et al. directly. -- Robert Haas EDB: http://www.enterprisedb.com
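For the cache side of that, nothing at the call sites would need to change shape. A sketch, assuming the catcache entry already holds the translated struct — which is not how anything works today (lookup_cast is a made-up wrapper):

#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_cast.h"
#include "utils/syscache.h"

/* hypothetical: translation already happened when the entry was populated */
static Form_pg_cast
lookup_cast(Oid sourcetype, Oid targettype)
{
	HeapTuple	tuple;

	tuple = SearchSysCache2(CASTSOURCETARGET,
							ObjectIdGetDatum(sourcetype),
							ObjectIdGetDatum(targettype));
	if (!HeapTupleIsValid(tuple))
		return NULL;
	/* no per-access deforming cost; caller must still ReleaseSysCache() */
	return (Form_pg_cast) GETSTRUCT(tuple);
}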
Hi, On 2022-06-23 14:42:17 -0400, Robert Haas wrote: > On Thu, Jun 23, 2022 at 2:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > The extra cost of the deforming step could also be repaid, in some > > cases, by not having to use SysCacheGetAttr etc later on to fetch > > variable-length fields. That is, I'm imagining that the deformer > > would extract all the fields, even varlena ones, and drop pointers > > or whatever into fields of the C struct. I was also thinking we'd translate all attributes. Not entirely sure whether we'd want to use "plain" pointers though - there are some places where we rely on being able to copy such structs around. That'd be a bit easier with relative pointers, pointing to the end of the struct. But likely the notational overhead of dealing with relative pointers would be far higher than the notational overhead of having to invoke a generated "tuple struct" copy function. Which we'd likely need anyway, because some previously statically sized allocations would end up being variable sized? > Yeah, if we were going to do something like this, I can't see why we > wouldn't do it this way. It wouldn't make sense to do it for only some > of the attributes. Agreed. > I'm not sure exactly where we would put this translation step, though. > I think for the syscaches and relcache we'd want to translate when > populating the cache so that when you do a cache lookup you get the > data already translated. It's hard to be sure without testing, but > that seems like it would make this cheap enough that we wouldn't have > to be too worried, since the number of times we build new cache > entries should be small compared to the number of times we access > existing ones. The trickier thing might be code that uses > systable_beginscan() et al. directly. I was thinking we'd basically do it wherever we do a GETSTRUCT() today. A first step could be to transform code like (Form_pg_attribute) GETSTRUCT(tuple) into GETSTRUCT(pg_attribute, tuple) then, in a subsequent step, we'd redefine GETSTRUCT as something like #define GETSTRUCT(catalog, tuple) tuple_to_struct_##catalog(tuple) Greetings, Andres Freund
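The relative-pointer variant could look roughly like this — purely illustrative, no such struct exists in the tree — the point being that a flat copy of the whole allocation keeps the reference valid:

#include "postgres.h"

/* hypothetical: the name lives in trailing space, addressed by offset */
typedef struct RelPtrForm
{
	Oid			oid;
	uint32		name_off;	/* offset from struct start to NUL-terminated name */
} RelPtrForm;

static inline const char *
relptr_name(const RelPtrForm *f)
{
	/* stays correct after memcpy'ing the struct plus its trailing bytes */
	return (const char *) f + f->name_off;
}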
On Thu, Jun 23, 2022 at 5:49 PM Andres Freund <andres@anarazel.de> wrote: > I was thinking we'd basically do it wherever we do a GETSTRUCT() today. > > A first step could be to transform code like > (Form_pg_attribute) GETSTRUCT(tuple) > into > GETSTRUCT(pg_attribute, tuple) > > then, in a subsequent step, we'd redefine GETSTRUCT as something like > #define GETSTRUCT(catalog, tuple) tuple_to_struct_##catalog(tuple) That seems a little fraught, because you'd be turning what's now basically a trivial operation into a non-trivial operation involving memory allocation. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jun 23, 2022 at 5:49 PM Andres Freund <andres@anarazel.de> wrote: >> I was thinking we'd basically do it wherever we do a GETSTRUCT() today. > That seems a little fraught, because you'd be turning what's now > basically a trivial operation into a non-trivial operation involving > memory allocation. Nonetheless, the presence of GETSTRUCT calls should be a good guide to where we need to do something. regards, tom lane
On Thu, Jun 23, 2022 at 9:17 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2022-06-03 13:28:16 +0700, John Naylor wrote: > > 1. That would require putting the name physically closer to the end of > > the column list. To make this less annoying for users, we'd need to > > separate physical order from display order (at least/at first only for > > system catalogs). > > FWIW, it's not at all clear to me that this would be required. If I were to > tackle this I'd look at breaking up the mirroring of C structs to catalog > struct layout, by having genbki (or such) generate functions to map to/from > tuple layout. It'll be an annoying amount of changes, but feasible, I think. Hmm, I must have misunderstood this aspect. In my mind I was thinking that if a varlen attribute were at the end, these functions would make it easier to access them quickly. But from this and the follow-on responses, these would be used to access varlen attributes wherever they may be. I'll take a look at the current uses of GETSTRUCT(). -- John Naylor EDB: http://www.enterprisedb.com
On Thu, Jun 23, 2022 at 6:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Nonetheless, the presence of GETSTRUCT calls should be a good guide > to where we need to do something. Indubitably. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Jun 23, 2022 at 11:11 PM John Naylor <john.naylor@enterprisedb.com> wrote: > Hmm, I must have misunderstood this aspect. In my mind I was thinking > that if a varlen attribute were at the end, these functions would make > it easier to access them quickly. But from this and the follow-on > responses, these would be used to access varlen attributes wherever > they may be. I'll take a look at the current uses of GETSTRUCT(). I don't know whether we can or should move all the "name" columns to the end of the catalog. It would be user-visible and probably not user-desirable, but it would save something in terms of tuple deforming cost. I'm just not sure how much. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > I don't know whether we can or should move all the "name" columns to > the end of the catalog. It would be user-visible and probably not > user-desirable, I'm a strong -1 on changing that if we're not absolutely forced to. > but it would save something in terms of tuple > deforming cost. I'm just not sure how much. I'd guess nothing, if we are deforming all the fields. regards, tom lane
On Thu, Jun 23, 2022 at 10:19:44AM -0400, Robert Haas wrote: > On Thu, Jun 23, 2022 at 6:13 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > And should record_in / record_out use the logical position, as in: > > SELECT ab::text FROM ab / SELECT (a, b)::ab; > > > > I would think not, as relying on a possibly dynamic order could break things if > > you store the results somewhere, but YMMV. > > I think here the answer is yes again. I mean, consider that you could > also ALTER TABLE DROP COLUMN and then ALTER TABLE ADD COLUMN with the > same name. That is surely going to affect the meaning of such things. > I don't think we want to have one meaning if you reorder things that > way and a different meaning if you reorder things using whatever > commands we create for changing the display column positions. It indeed would, but ALTER TABLE DROP COLUMN is a destructive operation, and I'm assuming that anyone doing that is aware that it will have an impact on stored data and such. I initially thought that changing the display order of columns shouldn't have the same impact on the stability of an otherwise unchanged record definition, as that would make such reordering much more invasive. But I agree that having different behaviors seems worse. > > Then, what about joinrel expansion? I learned that the column ordering rules > > are far from obvious, and I didn't find them in the documentation (note > > that I don't know if that is something actually described in the SQL standard). > > So for instance, if a join uses an explicit USING clause rather than an ON > > clause, the merged columns are expanded first, so: > > > > SELECT * FROM ab ab1 JOIN ab ab2 USING (b) > > > > should, perhaps unexpectedly, expand to (b, a, a). Is this order a strict requirement? > > I dunno, but I can't see why it creates a problem for this patch to > maintain the current behavior. I mean, just use the logical column > position instead of the physical one here and forget about the details > of how it works beyond that. I'm not that familiar with this part of the code so I may have missed something, but I didn't see any place where I could simply do that. To be clear, the approach I used is to change the expansion ordering but otherwise keep the current behavior, to try to minimize the changes. This is done by keeping the attributes in the physical ordering pretty much everywhere, including in the nsitems, and just logically reordering them during the expansion. In other words, all the code still knows that the 1st column is the first physical column and so on. So in that query, the ordering is supposed to happen when handling the "SELECT *", which makes it impossible to retain that order. I'm assuming that what you meant is to change the ordering when processing the JOIN and retain the old "SELECT *" behavior, which is to emit items in the order they're found. But IIUC the only way to do that would be to change the order when building the nsitems themselves, and have the code believe that the attributes are physically stored in the logical order. That's probably doable, but it looks like a far more invasive change. Or did you mean to keep the approach I used, and just have some special case for "SELECT *" when referring to a joinrel, and instead try to handle the logical expansion in the join? AFAICS it would require adding some extra info in the parsing structures, as the code doesn't really store any position, just relies on array offset / list position and maps things that way.
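In code-sketch form, the approach described above boils down to something like the following. The names attmap and expand_star_logically are hypothetical, invented for illustration; the real patch threads this through the nsitems rather than a bare array:

#include "postgres.h"
#include "access/tupdesc.h"
#include "nodes/makefuncs.h"
#include "nodes/pg_list.h"

/*
 * Emit Vars in logical order while the Vars themselves (and everything
 * downstream) keep using physical attribute numbers.  attmap[i] is the
 * physical attno of the i-th logical column.
 */
static List *
expand_star_logically(TupleDesc tupdesc, const AttrNumber *attmap, Index rtindex)
{
	List	   *vars = NIL;

	for (int i = 0; i < tupdesc->natts; i++)
	{
		AttrNumber	physattno = attmap[i];
		Form_pg_attribute att = TupleDescAttr(tupdesc, physattno - 1);

		if (att->attisdropped)
			continue;
		vars = lappend(vars,
					   makeVar(rtindex, physattno, att->atttypid,
							   att->atttypmod, att->attcollation, 0));
	}
	return vars;
}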
> > Another problem (that probably wouldn't be a problem for system catalogs) is > > that defaults are evaluated in the physical position. This example from the > > regression test will clearly have a different behavior if the columns are in a > > different physical order: > > > > CREATE TABLE INSERT_TBL ( > > x INT DEFAULT nextval('insert_seq'), > > y TEXT DEFAULT '-NULL-', > > z INT DEFAULT -1 * currval('insert_seq'), > > CONSTRAINT INSERT_TBL_CON CHECK (x >= 3 AND y <> 'check failed' AND x < 8), > > CHECK (x + z = 0)); > > > > But changing the behavior to rely on the logical position seems quite > > dangerous. > > Why? It feels to me like a POLA violation, and probably people wouldn't expect it to behave this way (even if this is clearly a corner-case problem). Even if you argue that this is not simply a default display order but something more like the real column order, the physical position being an implementation detail, it still doesn't really feel right. The main reason for being able to change the logical position is to have better-looking, easier-to-work-with relations even when you have requirements on the real physical order, like optimizing things as much as possible (reordering columns to avoid padding space, putting non-nullable columns first...). The order in which defaults are evaluated looks like the same kind of requirement. How useful would it be if you could choose a logical order, but not the one you actually want, because it would break your default values? Anyway, per the nearby discussions I don't see much interest, especially not in the context of varlena identifiers (I should have started a different thread, sorry about that), so I don't think it's worth investing more effort into it.
Hi, On 2022-06-26 10:48:24 +0800, Julien Rouhaud wrote: > Anyway, per the nearby discussions I don't see much interest, especially not in > the context of varlena identifiers (I should have started a different thread, > sorry about that), so I don't think it's worth investing more efforts into > it. FWIW, I still think it's a worthwhile feature - just, to me, unrelated to varlena identifiers. I've seen tremendous amounts of space wasted in tables just because of alignment issues... - Andres
On Sat, Jun 25, 2022 at 08:00:04PM -0700, Andres Freund wrote: > > On 2022-06-26 10:48:24 +0800, Julien Rouhaud wrote: > > Anyway, per the nearby discussions I don't see much interest, especially not in > > the context of varlena identifiers (I should have started a different thread, > > sorry about that), so I don't think it's worth investing more efforts into > > it. > > FWIW, I still think it's a worthwhile feature Oh, ok, I will start a new thread then. > - just, to me, unrelated to varlena identifiers. Yeah I get that and I agree. > I've seen tremendous amounts of space wasted in tables > just because of alignment issues... Same here unfortunately.
On Fri, Jun 24, 2022 at 4:49 AM Andres Freund <andres@anarazel.de> wrote: > A first step could be to transform code like > (Form_pg_attribute) GETSTRUCT(tuple) > into > GETSTRUCT(pg_attribute, tuple) To get the ball rolling, I've done this much in 0001. This is an easy mechanical change, although we'll next want a third argument to pass the Form_* pointer. 0002 changes this macro to #define GETSTRUCT(CAT, TUP) Deform_##CAT##_tuple((char *) ((TUP)->t_data) + (TUP)->t_data->t_hoff) ...where genbki.pl generates stub macros in the form of #define Deform_pg_am_tuple(TUP) (Form_pg_am) (TUP) ...which live in pg_*_deform.h and are #included into each main catalog header after the Form_* is defined. Feedback on whether a more tasteful way should be sought would be appreciated. While not very interesting on its own, it gives an idea of how the build would work. Next step would be to turn these into functions like static inline void Deform_pg_am_tuple(Form_pg_am, char * tuple) {...} that just memcpy() things over as already discussed, adjusting the callers of GETSTRUCT to match. This won't be quite as mechanical and will require some manual work, not least because some call sites mix declaration and assignment. I imagine there will also be a need for the inverse operation, forming a tuple from a struct. We will also have to work around cases where the Form_* pointer is currently used for in-place updates. I imagine there are not many, but they will require extra care. The pseudo-catalog pg_sequence_data is an example, and a bit of a special case anyway. On Fri, Jun 24, 2022 at 1:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sounds worth investigating, anyway. It'd also get us out from under > C-struct-related problems such as the nearby AIX alignment issue. Getting around that issue sounds like a good first goal for committing. -- John Naylor EDB: http://www.enterprisedb.com
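For the inverse operation mentioned there, a generated counterpart could presumably lean on the existing heap_form_tuple() machinery. A sketch, not from the patch series, with a made-up function name:

#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_cast.h"
#include "utils/rel.h"

/* hypothetical generated inverse: build a tuple from a filled-in struct */
static HeapTuple
Form_tuple_from_pg_cast(Relation rel, const FormData_pg_cast *s)
{
	Datum		values[Natts_pg_cast];
	bool		nulls[Natts_pg_cast] = {0};

	values[Anum_pg_cast_oid - 1] = ObjectIdGetDatum(s->oid);
	values[Anum_pg_cast_castsource - 1] = ObjectIdGetDatum(s->castsource);
	values[Anum_pg_cast_casttarget - 1] = ObjectIdGetDatum(s->casttarget);
	values[Anum_pg_cast_castfunc - 1] = ObjectIdGetDatum(s->castfunc);
	values[Anum_pg_cast_castcontext - 1] = CharGetDatum(s->castcontext);
	values[Anum_pg_cast_castmethod - 1] = CharGetDatum(s->castmethod);

	return heap_form_tuple(RelationGetDescr(rel), values, nulls);
}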
On Thu, Jun 23, 2022 at 9:27 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> I'm not sure whether your idea of creating translator functions is a
> good one or not, but it doesn't seem too crazy. It'd be like a special
> purpose tuple deformer.
>
> deform_pg_class_tuple(&pg_class_struct, pg_class_tuple);
>
> The code in there could be automatically generated statements that
> maybe look like this:
>
> memcpy(&pg_class_struct.relnamespace, tuple + Aoffset_pg_class_relnamespace,
> sizeof(Oid));
I've made a small step in this direction.
0001 is just boilerplate, same as v1
0002 teaches Catalog.pm to export both C attr name and SQL attr name, so we can use "sizeof".
0003 generates static inline functions that work the same as the current GETSTRUCT macro, i.e. just cast to the right pointer and return it.
0004 generates a function for pg_cast that memcpys to a passed struct, for demonstration.
Here's the full deforming function generated for pg_cast. In English, the current offset is the previous offset plus the previous type length, plus any alignment padding suitable for the current type (there is none here, so the alignment aspect is not tested). I'm hoping something like this will be sufficient for what's in the current structs, but of course will need additional work when expanding those to include pointers to varlen attributes. I've not yet inspected the emitted assembly language, but regression tests pass.
static inline void
Deform_pg_cast_tuple(Form_pg_cast pg_cast_struct, char * pg_cast_tuple)
{
#define Aoff_pg_cast_oid 0
memcpy(&pg_cast_struct->oid, pg_cast_tuple + Aoff_pg_cast_oid, sizeof(Oid));
#define Aoff_pg_cast_castsource att_align_nominal(Aoff_pg_cast_oid + sizeof(Oid), 'i')
memcpy(&pg_cast_struct->castsource, pg_cast_tuple + Aoff_pg_cast_castsource, sizeof(Oid));
#define Aoff_pg_cast_casttarget att_align_nominal(Aoff_pg_cast_castsource + sizeof(Oid), 'i')
memcpy(&pg_cast_struct->casttarget, pg_cast_tuple + Aoff_pg_cast_casttarget, sizeof(Oid));
#define Aoff_pg_cast_castfunc att_align_nominal(Aoff_pg_cast_casttarget + sizeof(Oid), 'i')
memcpy(&pg_cast_struct->castfunc, pg_cast_tuple + Aoff_pg_cast_castfunc, sizeof(Oid));
#define Aoff_pg_cast_castcontext att_align_nominal(Aoff_pg_cast_castfunc + sizeof(Oid), 'c')
memcpy(&pg_cast_struct->castcontext, pg_cast_tuple + Aoff_pg_cast_castcontext, sizeof(char));
#define Aoff_pg_cast_castmethod att_align_nominal(Aoff_pg_cast_castcontext + sizeof(char), 'c')
memcpy(&pg_cast_struct->castmethod, pg_cast_tuple + Aoff_pg_cast_castmethod, sizeof(char));
}
Here's an example of the changes done in the call sites. For the rest of the catalogs, perhaps many of them can be done mechanically. I still anticipate a lot of manual work. Note that GETSTRUCT_MEMCPY is just temporary scaffolding to distinguish from just returning a pointer, and is not proposed for commit.
- Form_pg_cast castForm = GETSTRUCT(pg_cast, tuple);
+ FormData_pg_cast castForm;
CoercionContext castcontext;
+ GETSTRUCT_MEMCPY(pg_cast, &castForm, tuple);
+
/* convert char value for castcontext to CoercionContext enum */
- switch (castForm->castcontext)
+ switch (castForm.castcontext)
--
John Naylor
EDB: http://www.enterprisedb.com
Hi, On 2022-07-18 09:46:44 +0700, John Naylor wrote: > I've made a small step in this direction. Thanks for working on this! > 0001 is just boilerplate, same as v1 If we were to go for this, I wonder if we should backpatch the cast containing version of GETSTRUCT for less pain backpatching bugfixes. That'd likely require using a different name for the cast containing one. > 0002 teaches Catalog.pm to export both C attr name and SQL attr name, so we > can use "sizeof". > 0003 generates static inline functions that work the same as the current > GETSTRUCT macro, i.e. just cast to the right pointer and return it. It seems likely that inline functions are going to be too large for this. There's a lot of GETSTRUCTs in a lot of files; emitting a copy of the function every time doesn't seem great. > current offset is the previous offset plus the previous type length, plus > any alignment padding suitable for the current type (there is none here, so > the alignment aspect is not tested). I'm hoping something like this will be > sufficient for what's in the current structs, but of course will need > additional work when expanding those to include pointers to varlen > attributes. I've not yet inspected the emitted assembly language, but > regression tests pass. Hm. Wouldn't it make sense to just use the normal tuple deforming routines and then map the results to the structs? Greetings, Andres Freund
On Mon, Jul 18, 2022 at 9:58 AM Andres Freund <andres@anarazel.de> wrote:
> > 0001 is just boilerplate, same as v1
>
> If we were to go for this, I wonder if we should backpatch the cast containing
> version of GETSTRUCT for less pain backpatching bugfixes. That'd likely require
> using a different name for the cast containing one.
The new version in this series was meant to be temporary scaffolding, but in the back of my mind I wondered if we should go ahead and keep the simple cast for catalogs that have no varlenas or alignment issues. It sounds like you'd be in favor of that.
> > 0003 generates static inline functions that work the same as the current
> > GETSTRUCT macro, i.e. just cast to the right pointer and return it.
>
> It seems likely that inline functions are going to be too large for
> this. There's a lot of GETSTRUCTs in a lot of files; emitting a copy of the
> function every time doesn't seem great.
Ok.
> > current offset is the previous offset plus the previous type length, plus
> > any alignment padding suitable for the current type (there is none here, so
> > the alignment aspect is not tested). I'm hoping something like this will be
> > sufficient for what's in the current structs, but of course will need
> > additional work when expanding those to include pointers to varlen
> > attributes. I've not yet inspected the emitted assembly language, but
> > regression tests pass.
>
> Hm. Wouldn't it make sense to just use the normal tuple deforming routines and
> then map the results to the structs?
I wasn't sure if they'd be suitable for this, but if they are, that'd make this easier and more maintainable. I'll look into it.
--
John Naylor
EDB: http://www.enterprisedb.com
I wrote:
> On Mon, Jul 18, 2022 at 9:58 AM Andres Freund <andres@anarazel.de> wrote:
> > Hm. Wouldn't it make sense to just use the normal tuple deforming routines and
> > then map the results to the structs?
>
> I wasn't sure if they'd be suitable for this, but if they are, that'd make this easier and more maintainable. I'll look into it.
This would seem to have its own problems: heap_deform_tuple writes to passed arrays of datums and bools. The lower level parts like fetchatt and nocachegetattr return datums, so still need some generated boilerplate. Some of these also assume they can write cached offsets on a passed tuple descriptor.
I'm thinking that where the first few attributes are fixed length, not null, and (because of AIX) not double-aligned, we can do a single memcpy on multiple columns at once. That will still be a common pattern after namedata is varlen. Otherwise, use helper functions/macros similar to the above but instead of passing a tuple descriptor, use info we have at compile time.
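For pg_cast, where every attribute happens to be fixed-width, NOT NULL, and at most int-aligned, the whole thing could collapse to one copy. An illustrative sketch of the idea (deform_pg_cast_fast is a made-up name; whether it's worth doing is questioned just below), stopping at the last member so trailing struct padding is never read from the tuple:

#include <stddef.h>
#include "postgres.h"
#include "catalog/pg_cast.h"

/* hypothetical: tuple data and the C struct share the layout over this run */
static inline void
deform_pg_cast_fast(Form_pg_cast s, const char *tp)
{
	memcpy(s, tp, offsetof(FormData_pg_cast, castmethod) + sizeof(char));
}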
Hi, On 2022-07-19 14:30:34 +0700, John Naylor wrote: > I wrote: > > > On Mon, Jul 18, 2022 at 9:58 AM Andres Freund <andres@anarazel.de> wrote: > > > Hm. Wouldn't it make sense to just use the normal tuple deforming > routines and > > > then map the results to the structs? > > > > I wasn't sure if they'd be suitable for this, but if they are, that'd > make this easier and more maintainable. I'll look into it. > > This would seem to have its own problems: heap_deform_tuple writes to > passed arrays of datums and bools. The lower level parts like fetchatt and > nocachegetattr return datums, so still need some generated boilerplate. > Some of these also assume they can write cached offsets on a passed tuple > descriptor. Sure. But that'll just be a form of conversion we do all over, rather than encoding low-level data layout details. Basically mystruct->member1 = DatumGetInt32(values[0]); mystruct->member2 = DatumGetChar(values[1]); etc. > I'm thinking that where the first few attributes are fixed length, not null, and > (because of AIX) not double-aligned, we can do a single memcpy on multiple > columns at once. That will still be a common pattern after namedata is > varlen. Otherwise, use helper functions/macros similar to the above but > instead of passing a tuple descriptor, use info we have at compile time. I think that might be over-optimizing things. I don't think we do these conversions at a rate that's high enough to warrant it - the common stuff should be in relcache etc. It's possible that we might want to optimize the catcache case specifically - but that'd be more optimizing memory usage than "conversion" imo. Greetings, Andres Freund
On Tue, Jul 19, 2022 at 10:57 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-07-19 14:30:34 +0700, John Naylor wrote:
> > I'm thinking where the first few attributes are fixed length, not null, and
> > (because of AIX) not double-aligned, we can do a single memcpy on multiple
> > columns at once. That will still be a common pattern after namedata is
> > varlen. Otherwise, use helper functions/macros similar to the above but
> > instead of passing a tuple descriptor, use info we have at compile time.
>
> I think that might be over-optimizing things. I don't think we do these
> conversions at a rate that's high enough to warrant it - the common stuff
> should be in relcache etc. It's possible that we might want to optimize the
> catcache case specifically - but that'd be more optimizing memory usage than
> "conversion" imo.
Okay, here is a hackish experiment that applies on top of v2 but also invalidates some of that earlier work. Since there is already a pg_cast.c, I demoed a new function there which looks like this:
void
Deform_pg_cast_tuple(Form_pg_cast pg_cast_struct, HeapTuple pg_cast_tuple, TupleDesc pg_cast_desc)
{
Datum values[Natts_pg_cast];
bool isnull[Natts_pg_cast];
heap_deform_tuple(pg_cast_tuple, pg_cast_desc, values, isnull);
pg_cast_struct->oid = DatumGetObjectId(values[Anum_pg_cast_oid - 1]);
pg_cast_struct->castsource = DatumGetObjectId(values[Anum_pg_cast_castsource - 1]);
pg_cast_struct->casttarget = DatumGetObjectId(values[Anum_pg_cast_casttarget - 1]);
pg_cast_struct->castfunc = DatumGetObjectId(values[Anum_pg_cast_castfunc - 1]);
pg_cast_struct->castcontext = DatumGetChar(values[Anum_pg_cast_castcontext - 1]);
pg_cast_struct->castmethod = DatumGetChar(values[Anum_pg_cast_castmethod - 1]);
}
For the general case we can use pg_*_deform.c or something like that, with extern declarations in the main headers. To get this to work, I had to add a couple pointless table open/close calls to get the tuple descriptor, since currently the whole tuple is stored in the syscache, but that's not good even as a temporary measure. Storing the full struct in the syscache is a good future step, as noted upthread, but to get there without a bunch more churn, maybe the above function can copy the tuple descriptor into a local stack variable from an expanded version of schemapg.h. Once the deformed structs are stored in caches, I imagine most of the times we want to deform are when we have the table open, and we can pass the descriptor as above without additional code.
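A sketch of that last pattern, using the Deform_pg_cast_tuple() above with a relation the caller already has open (scan_pg_cast is a hypothetical caller, for illustration only):

#include "postgres.h"
#include "access/genam.h"
#include "access/htup.h"
#include "access/table.h"
#include "catalog/pg_cast.h"
#include "storage/lockdefs.h"
#include "utils/rel.h"

/* hypothetical caller: the open relation supplies the tuple descriptor */
static void
scan_pg_cast(void)
{
	Relation	rel = table_open(CastRelationId, AccessShareLock);
	SysScanDesc scan = systable_beginscan(rel, InvalidOid, false,
										  NULL, 0, NULL);
	HeapTuple	tuple;

	while (HeapTupleIsValid(tuple = systable_getnext(scan)))
	{
		FormData_pg_cast form;

		Deform_pg_cast_tuple(&form, tuple, RelationGetDescr(rel));
		/* ... use form.castsource, form.casttarget, etc. ... */
	}
	systable_endscan(scan);
	table_close(rel, AccessShareLock);
}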
--
John Naylor
EDB: http://www.enterprisedb.com
Attached is v3, which is mostly putting Andres's suggestion above (use heap_deform_tuple) into a code generation context. Everything interesting is in 0002, and again only touches pg_cast as an experiment. Some design points which could be debatable:
- I thought individual C files would be a pain for the build process, so I just dumped the new functions into a single file, with the extern declarations in the pg_*.h files.
- I used `#include "catalog/pg_cast.h"` just for Form_pg_cast, but maybe a forward declaration would work.
- To get the struct member assignments, I resorted to a Perl hash to map the attribute types to the DatumGet* macros. That may not be great, but I don't have a better idea at the moment.
- I directly called this function for the table scan rather than hide it behind a new variant of GETSTRUCT as I did before. That seems clearer to me about the intent.
The syscache use of GETSTRUCT still uses a simple cast of the tuple (for pg_cast those calls live in parse_coerce.c, which is unchanged from master in v3). Next step I think is to see about the syscache piece -- teaching a syscache miss to deform the entire tuple into a struct and store it in the syscache.
--
John Naylor
EDB: http://www.enterprisedb.com
I wrote: > The syscache use of GETSTRUCT still uses a simple cast of the tuple (for pg_cast those calls live in parse_coerce.c, which is unchanged from master in v3). Next step I think is to see about the syscache piece -- teaching a syscache miss to deform the entire tuple into a struct and store it in the syscache. v4 is a hackish attempt at getting the syscache to deform the tuple and store the struct. Using only pg_cast again for ease, it seems to work for that. It's now about as far as I can get without thinking about byref types. 0001 just adds copies of some syscache / catcache functions for private use by pg_cast, as scaffolding. 0002 teaches the temporary CatalogCacheCreateEntry_STRUCT() to call heap_deform_tuple(). This then calls a for-now handwritten function (similar to the one generated in v3) which pallocs space for the struct and copies the fields over. Only this function knows the catalog struct type, so a future design could call the per-cache function via a pointer in the catcache control array. Since we already have the isnull/values arrays, it's also trivial to copy the datums for the cache keys. WIP: CatCTup->tuple is still declared a tuple, but the t_data contents are now bogus, so there is a temporary GETSTRUCT_NEW that knows it's looking directly at the struct. Getting to varlen attributes: For one, I think it was mentioned above that we will need a way to perform a deep copy of structs that contain pointers to varlen fields. I imagine keeping track of the attributes' lengths will come up for some types and might be tricky. And before that, the catalog machinery will need some preprocessor tricks to declare pointers in the structs. -- John Naylor EDB: http://www.enterprisedb.com
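On the deep-copy point, the shape of the problem is roughly this — DeformedFoo is a hypothetical struct, and the flat copy alone would leave the pointer aliasing the source's memory:

#include "postgres.h"

/* hypothetical deformed struct with one pointer-to-varlena field */
typedef struct DeformedFoo
{
	Oid			oid;
	text	   *fooname;
} DeformedFoo;

static DeformedFoo *
copy_deformed_foo(const DeformedFoo *src)
{
	DeformedFoo *dst = (DeformedFoo *) palloc(sizeof(DeformedFoo));

	memcpy(dst, src, sizeof(DeformedFoo));
	if (src->fooname != NULL)
	{
		Size		len = VARSIZE_ANY(src->fooname);

		dst->fooname = (text *) palloc(len);
		memcpy(dst->fooname, src->fooname, len);
	}
	return dst;
}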