Thread: generate syscache info automatically
I want to report on my on-the-plane-to-PGCon project. The idea was mentioned in [0]. genbki.pl already knows everything about system catalog indexes. If we add a "please also make a syscache for this one" flag to the catalog metadata, we can have genbki.pl produce the tables in syscache.c and syscache.h automatically. Aside from avoiding the cumbersome editing of those tables, I think this layout is also conceptually cleaner, as you can more easily see which system catalog indexes have syscaches and maybe ask questions about why or why not. As a possible follow-up, I have also started work on generating the ObjectProperty structure in objectaddress.c. One of the things you need for that is making genbki.pl aware of the syscache information. There is some more work to be done there, but it's looking promising. [0]: https://www.postgresql.org/message-id/flat/CA%2BhUKGKdpDjKL2jgC-GpoL4DGZU1YPqnOFHbDqFkfRQcPaR5DQ%40mail.gmail.com
Attachment
Peter Eisentraut <peter@eisentraut.org> writes: > The idea was mentioned in [0]. genbki.pl already knows everything about > system catalog indexes. If we add a "please also make a syscache for > this one" flag to the catalog metadata, we can have genbki.pl produce > the tables in syscache.c and syscache.h automatically. +1 on this worthwhile reduction of manual work. Tangentially, it reminded me of one of my least favourite parts of Catalog.pm, the regexes in ParseHeader(): > diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm > index 84aaeb002a..a727d692b7 100644 > --- a/src/backend/catalog/Catalog.pm > +++ b/src/backend/catalog/Catalog.pm > @@ -110,7 +110,7 @@ sub ParseHeader > }; > } > elsif ( > - /^DECLARE_(UNIQUE_)?INDEX(_PKEY)?\(\s*(\w+),\s*(\d+),\s*(\w+),\s*(.+)\)/ > + /^DECLARE_(UNIQUE_)?INDEX(_PKEY)?\(\s*(\w+),\s*(\d+),\s*(\w+),\s*(\w+),\s*(.+)\)/ > ) > { > push @{ $catalog{indexing} }, > @@ -120,7 +120,8 @@ sub ParseHeader > index_name => $3, > index_oid => $4, > index_oid_macro => $5, > - index_decl => $6 > + table_name => $6, > + index_decl => $7 > }; > } > elsif (/^DECLARE_OID_DEFINING_MACRO\(\s*(\w+),\s*(\d+)\)/) Now that we require Perl 5.14, we could replace this parenthesis- counting nightmare with named captures (introduced in Perl 5.10), which would make the above change look like this instead (context expanded to show the whole elsif block): elsif ( /^DECLARE_(UNIQUE_)?INDEX(_PKEY)?\(\s* (?<index_name>\w+),\s* (?<index_oid>\d+),\s* (?<index_oid_macro>\w+),\s* + (?<table_name>\w+),\s* (?<index_decl>.+) \)/x ) { push @{ $catalog{indexing} }, { is_unique => $1 ? 1 : 0, is_pkey => $2 ? 1 : 0, %+, }; } For other patterns without the optional bits in the keyword, it becomes even simpler, e.g. if (/^DECLARE_TOAST\(\s* (?<parent_table>\w+),\s* (?<toast_oid>\d+),\s* (?<toast_index_oid>\d+)\s* \)/x ) { push @{ $catalog{toasting} }, {%+}; } I'd be happy to submit a patch to do this for all the ParseHeader() regexes (in a separate thread) if others agree this is an improvement. - ilmari
On 31.05.23 13:02, Dagfinn Ilmari Mannsåker wrote: > For other patterns without the optional bits in the keyword, it becomes > even simpler, e.g. > > if (/^DECLARE_TOAST\(\s* > (?<parent_table>\w+),\s* > (?<toast_oid>\d+),\s* > (?<toast_index_oid>\d+)\s* > \)/x > ) > { > push @{ $catalog{toasting} }, {%+}; > } > > > I'd be happy to submit a patch to do this for all the ParseHeader() > regexes (in a separate thread) if others agree this is an improvement. I would welcome such a patch.
On Wed, May 31, 2023 at 4:58 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I want to report on my on-the-plane-to-PGCon project.
>
> The idea was mentioned in [0]. genbki.pl already knows everything about
> system catalog indexes. If we add a "please also make a syscache for
> this one" flag to the catalog metadata, we can have genbki.pl produce
> the tables in syscache.c and syscache.h automatically.
>
> Aside from avoiding the cumbersome editing of those tables, I think this
> layout is also conceptually cleaner, as you can more easily see which
> system catalog indexes have syscaches and maybe ask questions about why
> or why not.
When this has come up before, one objection was that index declarations shouldn't know about cache names and bucket sizes [1]. The second paragraph above makes a reasonable case for that, however. I believe one alternative idea was for a script to read the enum, which would look something like this:
#define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid
enum SysCacheIdentifier
{
DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0,
...
};
...which would then look up the other info in the usual way from Catalog.pm.
> As a possible follow-up, I have also started work on generating the
> ObjectProperty structure in objectaddress.c. One of the things you need
> for that is making genbki.pl aware of the syscache information. There
> is some more work to be done there, but it's looking promising.
I haven't studied this, but it seems interesting.
One other possible improvement: syscache.c has a bunch of #include's, one for each catalog with a cache, so there's still a bit of manual work in adding a cache, and the current #include list is a bit cumbersome. Perhaps it's worth it to have the script emit them as well?
I also wonder if at some point it will make sense to split off a separate script(s) for some things that are unrelated to the bootstrap data. genbki.pl is getting pretty large, and there are additional things that could be done with syscaches, e.g. inlined eq/hash functions for cache lookup [2].
[1] https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de
--
John Naylor
EDB: http://www.enterprisedb.com
>
> I want to report on my on-the-plane-to-PGCon project.
>
> The idea was mentioned in [0]. genbki.pl already knows everything about
> system catalog indexes. If we add a "please also make a syscache for
> this one" flag to the catalog metadata, we can have genbki.pl produce
> the tables in syscache.c and syscache.h automatically.
>
> Aside from avoiding the cumbersome editing of those tables, I think this
> layout is also conceptually cleaner, as you can more easily see which
> system catalog indexes have syscaches and maybe ask questions about why
> or why not.
When this has come up before, one objection was that index declarations shouldn't know about cache names and bucket sizes [1]. The second paragraph above makes a reasonable case for that, however. I believe one alternative idea was for a script to read the enum, which would look something like this:
#define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid
enum SysCacheIdentifier
{
DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0,
...
};
...which would then look up the other info in the usual way from Catalog.pm.
> As a possible follow-up, I have also started work on generating the
> ObjectProperty structure in objectaddress.c. One of the things you need
> for that is making genbki.pl aware of the syscache information. There
> is some more work to be done there, but it's looking promising.
I haven't studied this, but it seems interesting.
One other possible improvement: syscache.c has a bunch of #include's, one for each catalog with a cache, so there's still a bit of manual work in adding a cache, and the current #include list is a bit cumbersome. Perhaps it's worth it to have the script emit them as well?
I also wonder if at some point it will make sense to split off a separate script(s) for some things that are unrelated to the bootstrap data. genbki.pl is getting pretty large, and there are additional things that could be done with syscaches, e.g. inlined eq/hash functions for cache lookup [2].
[1] https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de
--
John Naylor
EDB: http://www.enterprisedb.com
Here is an updated patch set that adjusts for the recently introduced named captures. I will address the other issues later. I think the first few patches in the series can be considered nonetheless. On 15.06.23 09:45, John Naylor wrote: > On Wed, May 31, 2023 at 4:58 AM Peter Eisentraut <peter@eisentraut.org > <mailto:peter@eisentraut.org>> wrote: > > > > I want to report on my on-the-plane-to-PGCon project. > > > > The idea was mentioned in [0]. genbki.pl <http://genbki.pl> already > knows everything about > > system catalog indexes. If we add a "please also make a syscache for > > this one" flag to the catalog metadata, we can have genbki.pl > <http://genbki.pl> produce > > the tables in syscache.c and syscache.h automatically. > > > > Aside from avoiding the cumbersome editing of those tables, I think this > > layout is also conceptually cleaner, as you can more easily see which > > system catalog indexes have syscaches and maybe ask questions about why > > or why not. > > When this has come up before, one objection was that index declarations > shouldn't know about cache names and bucket sizes [1]. The second > paragraph above makes a reasonable case for that, however. I believe one > alternative idea was for a script to read the enum, which would look > something like this: > > #define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid > > enum SysCacheIdentifier > { > DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0, > ... > }; > > ...which would then look up the other info in the usual way from Catalog.pm. > > > As a possible follow-up, I have also started work on generating the > > ObjectProperty structure in objectaddress.c. One of the things you need > > for that is making genbki.pl <http://genbki.pl> aware of the syscache > information. There > > is some more work to be done there, but it's looking promising. > > I haven't studied this, but it seems interesting. > > One other possible improvement: syscache.c has a bunch of #include's, > one for each catalog with a cache, so there's still a bit of manual work > in adding a cache, and the current #include list is a bit cumbersome. > Perhaps it's worth it to have the script emit them as well? > > I also wonder if at some point it will make sense to split off a > separate script(s) for some things that are unrelated to the bootstrap > data. genbki.pl <http://genbki.pl> is getting pretty large, and there > are additional things that could be done with syscaches, e.g. inlined > eq/hash functions for cache lookup [2]. > > [1] https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us > <https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us> > [2] > https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de <https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de> > > -- > John Naylor > EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
Attachment
- v2-0001-Update-DECLARE_INDEX-documentation.patch
- v2-0002-Restructure-DECLARE_INDEX-arguments.patch
- v2-0003-genbki.pl-Factor-out-boilerplate-generation.patch
- v2-0004-Generate-syscache-info-from-catalog-files.patch
- v2-0005-Fill-in-more-of-ObjectProperty.patch
- v2-0006-WIP-Generate-ObjectProperty-from-catalog-files.patch
On 03.07.23 07:45, Peter Eisentraut wrote: > Here is an updated patch set that adjusts for the recently introduced > named captures. I will address the other issues later. I think the > first few patches in the series can be considered nonetheless. I have committed the 0001 patch, which was really a (code comment) bug fix. I think the patches 0002 and 0003 should be uncontroversial, so I'd like to commit them if no one objects. The remaining patches are still WIP. > On 15.06.23 09:45, John Naylor wrote: >> On Wed, May 31, 2023 at 4:58 AM Peter Eisentraut <peter@eisentraut.org >> <mailto:peter@eisentraut.org>> wrote: >> > >> > I want to report on my on-the-plane-to-PGCon project. >> > >> > The idea was mentioned in [0]. genbki.pl <http://genbki.pl> already >> knows everything about >> > system catalog indexes. If we add a "please also make a syscache for >> > this one" flag to the catalog metadata, we can have genbki.pl >> <http://genbki.pl> produce >> > the tables in syscache.c and syscache.h automatically. >> > >> > Aside from avoiding the cumbersome editing of those tables, I think >> this >> > layout is also conceptually cleaner, as you can more easily see which >> > system catalog indexes have syscaches and maybe ask questions about >> why >> > or why not. >> >> When this has come up before, one objection was that index >> declarations shouldn't know about cache names and bucket sizes [1]. >> The second paragraph above makes a reasonable case for that, however. >> I believe one alternative idea was for a script to read the enum, >> which would look something like this: >> >> #define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid >> >> enum SysCacheIdentifier >> { >> DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0, >> ... >> }; >> >> ...which would then look up the other info in the usual way from >> Catalog.pm. >> >> > As a possible follow-up, I have also started work on generating the >> > ObjectProperty structure in objectaddress.c. One of the things you >> need >> > for that is making genbki.pl <http://genbki.pl> aware of the >> syscache information. There >> > is some more work to be done there, but it's looking promising. >> >> I haven't studied this, but it seems interesting. >> >> One other possible improvement: syscache.c has a bunch of #include's, >> one for each catalog with a cache, so there's still a bit of manual >> work in adding a cache, and the current #include list is a bit >> cumbersome. Perhaps it's worth it to have the script emit them as well? >> >> I also wonder if at some point it will make sense to split off a >> separate script(s) for some things that are unrelated to the bootstrap >> data. genbki.pl <http://genbki.pl> is getting pretty large, and there >> are additional things that could be done with syscaches, e.g. inlined >> eq/hash functions for cache lookup [2]. >> >> [1] >> https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us >> <https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us> >> [2] >> https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de <https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de> >> >> -- >> John Naylor >> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
I have committed 0002 and 0003, and also a small bug fix in the ObjectProperty entry for "transforms". I have also gotten the automatic generation of the ObjectProperty lookup table working (with some warts). Attached is an updated patch set. One win here is that the ObjectProperty lookup now uses a hash function instead of a linear search. So hopefully the performance is better (to be confirmed) and it could now be used for things like [0]. [0]: https://www.postgresql.org/message-id/flat/12f1b1d2-f8cf-c4a2-72ec-441bd79546cb@enterprisedb.com There was some discussion about whether the catalog files are the right place to keep syscache information. Personally, I would find that more convenient. (Note that we also recently moved the definitions of indexes and toast tables from files with the whole list into the various catalog files.) But we can also keep it somewhere else. The important thing is that Catalog.pm can find it somewhere in a structured form. To finish up the ObjectProperty generation, we also need to store some more data, namely the OBJECT_* mappings. We probably do not want to store those in the catalog headers; that looks like a layering violation to me. So, there is some discussion to be had about where to put all this across various use cases. On 24.08.23 16:03, Peter Eisentraut wrote: > On 03.07.23 07:45, Peter Eisentraut wrote: >> Here is an updated patch set that adjusts for the recently introduced >> named captures. I will address the other issues later. I think the >> first few patches in the series can be considered nonetheless. > > I have committed the 0001 patch, which was really a (code comment) bug fix. > > I think the patches 0002 and 0003 should be uncontroversial, so I'd like > to commit them if no one objects. > > The remaining patches are still WIP. > > >> On 15.06.23 09:45, John Naylor wrote: >>> On Wed, May 31, 2023 at 4:58 AM Peter Eisentraut >>> <peter@eisentraut.org <mailto:peter@eisentraut.org>> wrote: >>> > >>> > I want to report on my on-the-plane-to-PGCon project. >>> > >>> > The idea was mentioned in [0]. genbki.pl <http://genbki.pl> >>> already knows everything about >>> > system catalog indexes. If we add a "please also make a syscache for >>> > this one" flag to the catalog metadata, we can have genbki.pl >>> <http://genbki.pl> produce >>> > the tables in syscache.c and syscache.h automatically. >>> > >>> > Aside from avoiding the cumbersome editing of those tables, I >>> think this >>> > layout is also conceptually cleaner, as you can more easily see which >>> > system catalog indexes have syscaches and maybe ask questions >>> about why >>> > or why not. >>> >>> When this has come up before, one objection was that index >>> declarations shouldn't know about cache names and bucket sizes [1]. >>> The second paragraph above makes a reasonable case for that, however. >>> I believe one alternative idea was for a script to read the enum, >>> which would look something like this: >>> >>> #define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid >>> >>> enum SysCacheIdentifier >>> { >>> DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0, >>> ... >>> }; >>> >>> ...which would then look up the other info in the usual way from >>> Catalog.pm. >>> >>> > As a possible follow-up, I have also started work on generating the >>> > ObjectProperty structure in objectaddress.c. One of the things >>> you need >>> > for that is making genbki.pl <http://genbki.pl> aware of the >>> syscache information. There >>> > is some more work to be done there, but it's looking promising. >>> >>> I haven't studied this, but it seems interesting. >>> >>> One other possible improvement: syscache.c has a bunch of #include's, >>> one for each catalog with a cache, so there's still a bit of manual >>> work in adding a cache, and the current #include list is a bit >>> cumbersome. Perhaps it's worth it to have the script emit them as well? >>> >>> I also wonder if at some point it will make sense to split off a >>> separate script(s) for some things that are unrelated to the >>> bootstrap data. genbki.pl <http://genbki.pl> is getting pretty large, >>> and there are additional things that could be done with syscaches, >>> e.g. inlined eq/hash functions for cache lookup [2]. >>> >>> [1] >>> https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us >>> <https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us> >>> [2] >>> https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de <https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de> >>> >>> -- >>> John Naylor >>> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com> > > >
Attachment
On Thu, Aug 31, 2023 at 6:23 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> Attached is an updated patch set.
>
> One win here is that the ObjectProperty lookup now uses a hash function
> instead of a linear search. So hopefully the performance is better (to
> be confirmed) and it could now be used for things like [0].
+ # XXX This one neither, but if I add it to @skip, PerfectHash will fail. (???)
+ #FIXME: AttributeRelationId
I took a quick look at this, and attached is the least invasive way to get it working for now, which is to bump the table size slightly. The comment says doing this isn't reliable, but it happens to work in this case. Playing around with the functions is hit-or-miss, and when that fails, somehow the larger table saves the day.
Attachment
I wrote:
> + # XXX This one neither, but if I add it to @skip, PerfectHash will fail. (???)
> + #FIXME: AttributeRelationId
>
> I took a quick look at this, and attached is the least invasive way to get it working for now, which is to bump the table size slightly. The comment says doing this isn't reliable, but it happens to work in this case. Playing around with the functions is hit-or-miss, and when that fails, somehow the larger table saves the day.
To while away a rainy day, I poked at this a bit more and found the input is pathological with our current methods. Even with a large-ish exhaustive search, the two success are strange in that they only succeeded by accidentally bumping the table size up to where I got it to work before (77):
With multipliers (5, 19), it recognizes that the initial table size (75) is a multiple of 5, so increases the table size to 76, which is a multiple of 19, so it increases it again to 77 and succeeds.
Same with (3, 76): 75 is a multiple of 3, so up to 76, which of course divides 76, so bumps it to 77 likewise.
Turning the loop into
a = (a ^ c) * 257;
b = (b ^ c) * X;
...seems to work very well.
> + #FIXME: AttributeRelationId
>
> I took a quick look at this, and attached is the least invasive way to get it working for now, which is to bump the table size slightly. The comment says doing this isn't reliable, but it happens to work in this case. Playing around with the functions is hit-or-miss, and when that fails, somehow the larger table saves the day.
To while away a rainy day, I poked at this a bit more and found the input is pathological with our current methods. Even with a large-ish exhaustive search, the two success are strange in that they only succeeded by accidentally bumping the table size up to where I got it to work before (77):
With multipliers (5, 19), it recognizes that the initial table size (75) is a multiple of 5, so increases the table size to 76, which is a multiple of 19, so it increases it again to 77 and succeeds.
Same with (3, 76): 75 is a multiple of 3, so up to 76, which of course divides 76, so bumps it to 77 likewise.
Turning the loop into
a = (a ^ c) * 257;
b = (b ^ c) * X;
...seems to work very well.
In fact, now trying some powers-of-two for X before the primes works most of the time with our inputs, even for some unicode normalization functions, on the first seed iteration. That likely won't make any difference in practice, but it's an interesting demo. I've attached these two draft ideas as text.
--
John Naylor
EDB: http://www.enterprisedb.com
--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
On Thu, Aug 31, 2023 at 6:23 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> Attached is an updated patch set.
> There was some discussion about whether the catalog files are the right
> place to keep syscache information. Personally, I would find that more
> convenient. (Note that we also recently moved the definitions of
> indexes and toast tables from files with the whole list into the various
> catalog files.) But we can also keep it somewhere else. The important
> thing is that Catalog.pm can find it somewhere in a structured form.
I don't have anything further to say on how to fit it together, but I'll go ahead share some other comments from a quick read of v3-0003:
+ # XXX hardcoded exceptions
+ # extension doesn't belong to extnamespace
+ $attnum_namespace = undef if $catname eq 'pg_extension';
+ # pg_database owner is spelled datdba
+ $attnum_owner = "Anum_pg_database_datdba" if $catname eq 'pg_database';
These exceptions seem like true exceptions...
+ # XXX?
+ $name_syscache = "SUBSCRIPTIONNAME" if $catname eq 'pg_subscription';
+ # XXX?
+ $is_nsp_name_unique = 1 if $catname eq 'pg_collation';
+ $is_nsp_name_unique = 1 if $catname eq 'pg_opclass';
+ $is_nsp_name_unique = 1 if $catname eq 'pg_opfamily';
+ $is_nsp_name_unique = 1 if $catname eq 'pg_subscription';
... but as the XXX conveys, these indicate a failure to do the right thing. Trying to derive this info from the index declarations is currently fragile. E.g. making one small change to this regex:
- if ($index->{index_decl} =~ /\(\w+name name_ops(, \w+namespace oid_ops)?\)/)
+ if ($index->{index_decl} =~ /\w+name name_ops(, \w+namespace oid_ops)?\)/)
...causes "is_nsp_name_unique" to flip from false to true, and/or sets "name_catcache_id" where it wasn't before, for several entries. It's not quite clear what the "rule" is intended to be, or whether it's robust going forward.
That being the case, perhaps some effort should also be made to make it easy to verify the output matches HEAD (or rather, v3-0001). This is now difficult because of the C99 ".foo = bar" syntax, plus the alphabetical ordering.
+foreach my $catname (@catnames)
+{
+ print $objectproperty_info_fh qq{#include "catalog/${catname}_d.h"\n};
+}
Assuming we have a better way to know which catalogs need object properties, is there a todo to only #include those?
> To finish up the ObjectProperty generation, we also need to store some
> more data, namely the OBJECT_* mappings. We probably do not want to
> store those in the catalog headers; that looks like a layering violation
> to me.
Possibly, but it's also convenient and, one could argue, more straightforward than storing syscache id and num_buckets in the index info.
One last thing: I did try to make the hash function use uint16 for the key (to reduce loop iterations on nul bytes), and that didn't help, so we are left with the ideas I mentioned earlier.
(not changing CF status, because nothing specific is really required to change at the moment, some things up in the air)
> There was some discussion about whether the catalog files are the right
> place to keep syscache information. Personally, I would find that more
> convenient. (Note that we also recently moved the definitions of
> indexes and toast tables from files with the whole list into the various
> catalog files.) But we can also keep it somewhere else. The important
> thing is that Catalog.pm can find it somewhere in a structured form.
I don't have anything further to say on how to fit it together, but I'll go ahead share some other comments from a quick read of v3-0003:
+ # XXX hardcoded exceptions
+ # extension doesn't belong to extnamespace
+ $attnum_namespace = undef if $catname eq 'pg_extension';
+ # pg_database owner is spelled datdba
+ $attnum_owner = "Anum_pg_database_datdba" if $catname eq 'pg_database';
These exceptions seem like true exceptions...
+ # XXX?
+ $name_syscache = "SUBSCRIPTIONNAME" if $catname eq 'pg_subscription';
+ # XXX?
+ $is_nsp_name_unique = 1 if $catname eq 'pg_collation';
+ $is_nsp_name_unique = 1 if $catname eq 'pg_opclass';
+ $is_nsp_name_unique = 1 if $catname eq 'pg_opfamily';
+ $is_nsp_name_unique = 1 if $catname eq 'pg_subscription';
... but as the XXX conveys, these indicate a failure to do the right thing. Trying to derive this info from the index declarations is currently fragile. E.g. making one small change to this regex:
- if ($index->{index_decl} =~ /\(\w+name name_ops(, \w+namespace oid_ops)?\)/)
+ if ($index->{index_decl} =~ /\w+name name_ops(, \w+namespace oid_ops)?\)/)
...causes "is_nsp_name_unique" to flip from false to true, and/or sets "name_catcache_id" where it wasn't before, for several entries. It's not quite clear what the "rule" is intended to be, or whether it's robust going forward.
That being the case, perhaps some effort should also be made to make it easy to verify the output matches HEAD (or rather, v3-0001). This is now difficult because of the C99 ".foo = bar" syntax, plus the alphabetical ordering.
+foreach my $catname (@catnames)
+{
+ print $objectproperty_info_fh qq{#include "catalog/${catname}_d.h"\n};
+}
Assuming we have a better way to know which catalogs need object properties, is there a todo to only #include those?
> To finish up the ObjectProperty generation, we also need to store some
> more data, namely the OBJECT_* mappings. We probably do not want to
> store those in the catalog headers; that looks like a layering violation
> to me.
Possibly, but it's also convenient and, one could argue, more straightforward than storing syscache id and num_buckets in the index info.
One last thing: I did try to make the hash function use uint16 for the key (to reduce loop iterations on nul bytes), and that didn't help, so we are left with the ideas I mentioned earlier.
(not changing CF status, because nothing specific is really required to change at the moment, some things up in the air)
Here is a rebased patch set, along with a summary of the questions I have about these patches: v4-0001-Generate-syscache-info-from-catalog-files.patch What's a good syntax to declare a syscache? Currently, I have, for example -DECLARE_UNIQUE_INDEX_PKEY(pg_type_oid_index, 2703, TypeOidIndexId, pg_type, btree(oid oid_ops)); +DECLARE_UNIQUE_INDEX_PKEY(pg_type_oid_index, 2703, TypeOidIndexId, pg_type, btree(oid oid_ops), TYPEOID, 64); I suppose a sensible alternative could be to leave the DECLARE_..._INDEX... alone and make a separate statement, like MAKE_SYSCACHE(pg_type_oid_index, TYPEOID, 64); That's at least visually easier, because some of those DECLARE_... lines are getting pretty long. I would like to keep those MAKE_SYSCACHE lines in the catalog files. That just makes it easier to reference everything together. v4-0002-Generate-ObjectProperty-from-catalog-files.patch Several questions here: * What's a good way to declare the mapping between catalog and object type? * How to select which catalogs have ObjectProperty entries generated? * Where to declare class descriptions? Or just keep the current hack in the patch. * How to declare the purpose of a catalog column, like "this is the ACL column for this catalog". This is currently done by name, but maybe it should be more explicit. * Question about how to pick the correct value for is_nsp_name_unique? This second patch is clearly still WIP. I hope the first patch can be finished soon, however. I was also amused to find the original commit fa351d5a0db that introduced ObjectProperty, which contains the following comment: Performance isn't critical here, so there's no need to be smart about how we do the search. which I'm now trying to amend. On 11.09.23 07:02, John Naylor wrote: > > On Thu, Aug 31, 2023 at 6:23 PM Peter Eisentraut <peter@eisentraut.org > <mailto:peter@eisentraut.org>> wrote: > > > Attached is an updated patch set. > > > There was some discussion about whether the catalog files are the right > > place to keep syscache information. Personally, I would find that more > > convenient. (Note that we also recently moved the definitions of > > indexes and toast tables from files with the whole list into the various > > catalog files.) But we can also keep it somewhere else. The important > > thing is that Catalog.pm can find it somewhere in a structured form. > > I don't have anything further to say on how to fit it together, but I'll > go ahead share some other comments from a quick read of v3-0003: > > + # XXX hardcoded exceptions > + # extension doesn't belong to extnamespace > + $attnum_namespace = undef if $catname eq 'pg_extension'; > + # pg_database owner is spelled datdba > + $attnum_owner = "Anum_pg_database_datdba" if $catname eq 'pg_database'; > > These exceptions seem like true exceptions... > > + # XXX? > + $name_syscache = "SUBSCRIPTIONNAME" if $catname eq 'pg_subscription'; > + # XXX? > + $is_nsp_name_unique = 1 if $catname eq 'pg_collation'; > + $is_nsp_name_unique = 1 if $catname eq 'pg_opclass'; > + $is_nsp_name_unique = 1 if $catname eq 'pg_opfamily'; > + $is_nsp_name_unique = 1 if $catname eq 'pg_subscription'; > > ... but as the XXX conveys, these indicate a failure to do the right > thing. Trying to derive this info from the index declarations is > currently fragile. E.g. making one small change to this regex: > > - if ($index->{index_decl} =~ /\(\w+name name_ops(, > \w+namespace oid_ops)?\)/) > + if ($index->{index_decl} =~ /\w+name name_ops(, > \w+namespace oid_ops)?\)/) > > ...causes "is_nsp_name_unique" to flip from false to true, and/or sets > "name_catcache_id" where it wasn't before, for several entries. It's not > quite clear what the "rule" is intended to be, or whether it's robust > going forward. > > That being the case, perhaps some effort should also be made to make it > easy to verify the output matches HEAD (or rather, v3-0001). This is now > difficult because of the C99 ".foo = bar" syntax, plus the alphabetical > ordering. > > +foreach my $catname (@catnames) > +{ > + print $objectproperty_info_fh qq{#include "catalog/${catname}_d.h"\n}; > +} > > Assuming we have a better way to know which catalogs need > object properties, is there a todo to only #include those? > > > To finish up the ObjectProperty generation, we also need to store some > > more data, namely the OBJECT_* mappings. We probably do not want to > > store those in the catalog headers; that looks like a layering violation > > to me. > > Possibly, but it's also convenient and, one could argue, more > straightforward than storing syscache id and num_buckets in the index info. > > One last thing: I did try to make the hash function use uint16 for the > key (to reduce loop iterations on nul bytes), and that didn't help, so > we are left with the ideas I mentioned earlier. > > (not changing CF status, because nothing specific is really required to > change at the moment, some things up in the air) > > -- > John Naylor > EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
Attachment
On Thu, Nov 2, 2023 at 4:13 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > Here is a rebased patch set, along with a summary of the questions I > have about these patches: This is an excellent summary of the issues, thanks. > v4-0001-Generate-syscache-info-from-catalog-files.patch > > What's a good syntax to declare a syscache? Currently, I have, for example > > -DECLARE_UNIQUE_INDEX_PKEY(pg_type_oid_index, 2703, TypeOidIndexId, > pg_type, btree(oid oid_ops)); > +DECLARE_UNIQUE_INDEX_PKEY(pg_type_oid_index, 2703, TypeOidIndexId, > pg_type, btree(oid oid_ops), TYPEOID, 64); > > I suppose a sensible alternative could be to leave the > DECLARE_..._INDEX... alone and make a separate statement, like > > MAKE_SYSCACHE(pg_type_oid_index, TYPEOID, 64); > > That's at least visually easier, because some of those > DECLARE_... lines are getting pretty long. Probably a good idea, and below I mention a third possible macro. > I would like to keep those MAKE_SYSCACHE lines in the catalog files. > That just makes it easier to reference everything together. That seems fine. If we ever want to do something more invasive with the syscaches, some of that can be copied/moved out into a separate script, but what's there in 0001 is pretty small. > v4-0002-Generate-ObjectProperty-from-catalog-files.patch > > Several questions here: > > * What's a good way to declare the mapping between catalog and object > type? Perhaps this idea: > I suppose a sensible alternative could be to leave the > DECLARE_..._INDEX... alone and make a separate statement, like > > MAKE_SYSCACHE(pg_type_oid_index, TYPEOID, 64); ...could be used, so something like DECLARE_OBJECT_INFO(OBJECT_ACCESS_METHOD); > * How to select which catalogs have ObjectProperty entries generated? Would the above work for that as well? > * Where to declare class descriptions? Or just keep the current hack in > the patch. I don't have an opinion on this. > * How to declare the purpose of a catalog column, like "this is the ACL > column for this catalog". This is currently done by name, but maybe > it should be more explicit. Perhaps we can have additional column macros, like BKI_OBJ_ACL or something, but that doesn't seem like it would result in less code. > * Question about how to pick the correct value for is_nsp_name_unique? How is that known now -- I don't mean in the patch, but in general? Also, I get most of the hard-wired exceptions, but + # XXX? + $name_syscache = "SUBSCRIPTIONNAME" if $catname eq 'pg_subscription'; What is not right otherwise? + if ($index->{index_decl} eq 'btree(oid oid_ops)') + { + $oid_index = $index->{index_oid_macro}; + $oid_syscache = $index->{syscache_name}; + } + if ($index->{index_decl} =~ /\(\w+name name_ops(, \w+namespace oid_ops)?\)/) + { + $name_syscache = $index->{syscache_name}; + $is_nsp_name_unique = 1 if $index->{is_unique}; + } The variables name_syscache and syscache_name are unfortunately close to eachother.
On 10.01.24 09:00, John Naylor wrote: >> I suppose a sensible alternative could be to leave the >> DECLARE_..._INDEX... alone and make a separate statement, like >> >> MAKE_SYSCACHE(pg_type_oid_index, TYPEOID, 64); >> >> That's at least visually easier, because some of those >> DECLARE_... lines are getting pretty long. > > Probably a good idea, and below I mention a third possible macro. I updated the patch to use this style (but I swapped the first two arguments from my example, so that the thing being created is named first). I also changed the names of the output files a bit to make them less confusing. (I initially had some files named .c.h, which was weird, but apparently necessary to avoid confusing the build system. But it's all clearer now.) Other than bugs and perhaps style opinions, I think the first patch is pretty good now. I haven't changed much in the second patch, other than to update it for the code changes made in the first patch. It's still very much WIP/preview at the moment.
Attachment
On Wed, Jan 17, 2024 at 7:46 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > I updated the patch to use this style (but I swapped the first two > arguments from my example, so that the thing being created is named first). > > I also changed the names of the output files a bit to make them less > confusing. (I initially had some files named .c.h, which was weird, but > apparently necessary to avoid confusing the build system. But it's all > clearer now.) > > Other than bugs and perhaps style opinions, I think the first patch is > pretty good now. LGTM. The only style consideration that occurred to me just now was MAKE_SYSCACHE, since it doesn't match the DECLARE_... macros. It seems like the same thing from a code generation perspective. The other macros refer to relations, so there is a difference, but maybe it doesn't matter. I don't have a strong opinion.
On 19.01.24 06:28, John Naylor wrote: > On Wed, Jan 17, 2024 at 7:46 PM Peter Eisentraut <peter@eisentraut.org> wrote: >> >> I updated the patch to use this style (but I swapped the first two >> arguments from my example, so that the thing being created is named first). >> >> I also changed the names of the output files a bit to make them less >> confusing. (I initially had some files named .c.h, which was weird, but >> apparently necessary to avoid confusing the build system. But it's all >> clearer now.) >> >> Other than bugs and perhaps style opinions, I think the first patch is >> pretty good now. > > LGTM. The only style consideration that occurred to me just now was > MAKE_SYSCACHE, since it doesn't match the DECLARE_... macros. It seems > like the same thing from a code generation perspective. The other > macros refer to relations, so there is a difference, but maybe it > doesn't matter. I don't have a strong opinion. The DECLARE_* macros map to actual BKI commands ("declare toast", "declare index"). So I wanted to use a different verb to distinguish "generate code for this" from those BKI commands.
On Fri, Jan 19, 2024 at 9:03 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > The DECLARE_* macros map to actual BKI commands ("declare toast", > "declare index"). So I wanted to use a different verb to distinguish > "generate code for this" from those BKI commands. That makes sense to me.
On 22.01.24 01:33, John Naylor wrote: > On Fri, Jan 19, 2024 at 9:03 PM Peter Eisentraut <peter@eisentraut.org> wrote: >> >> The DECLARE_* macros map to actual BKI commands ("declare toast", >> "declare index"). So I wanted to use a different verb to distinguish >> "generate code for this" from those BKI commands. > > That makes sense to me. I have committed the first patch, and the buildfarm hiccup seems to have passed. I'll close this commitfest entry now. I have enough feedback to work on the ObjectProperty generation, but I'll make a new entry for that.