Thread: generate syscache info automatically

generate syscache info automatically

From

Peter Eisentraut

Date:

30 May 2023, 21:57:53

I want to report on my on-the-plane-to-PGCon project.

The idea was mentioned in [0].  genbki.pl already knows everything about 
system catalog indexes.  If we add a "please also make a syscache for 
this one" flag to the catalog metadata, we can have genbki.pl produce 
the tables in syscache.c and syscache.h automatically.

Aside from avoiding the cumbersome editing of those tables, I think this 
layout is also conceptually cleaner, as you can more easily see which 
system catalog indexes have syscaches and maybe ask questions about why 
or why not.

As a possible follow-up, I have also started work on generating the 
ObjectProperty structure in objectaddress.c.  One of the things you need 
for that is making genbki.pl aware of the syscache information.  There 
is some more work to be done there, but it's looking promising.


[0]: 
https://www.postgresql.org/message-id/flat/CA%2BhUKGKdpDjKL2jgC-GpoL4DGZU1YPqnOFHbDqFkfRQcPaR5DQ%40mail.gmail.com

Attachment

Re: generate syscache info automatically

From

Dagfinn Ilmari Mannsåker

Date:

31 May 2023, 17:02:57

Peter Eisentraut <peter@eisentraut.org> writes:

> The idea was mentioned in [0].  genbki.pl already knows everything about
> system catalog indexes.  If we add a "please also make a syscache for 
> this one" flag to the catalog metadata, we can have genbki.pl produce
> the tables in syscache.c and syscache.h automatically.

+1 on this worthwhile reduction of manual work.  Tangentially, it
reminded me of one of my least favourite parts of Catalog.pm, the
regexes in ParseHeader():


> diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
> index 84aaeb002a..a727d692b7 100644
> --- a/src/backend/catalog/Catalog.pm
> +++ b/src/backend/catalog/Catalog.pm
> @@ -110,7 +110,7 @@ sub ParseHeader
>                };
>          }
>          elsif (
> -            /^DECLARE_(UNIQUE_)?INDEX(_PKEY)?\(\s*(\w+),\s*(\d+),\s*(\w+),\s*(.+)\)/
> +            /^DECLARE_(UNIQUE_)?INDEX(_PKEY)?\(\s*(\w+),\s*(\d+),\s*(\w+),\s*(\w+),\s*(.+)\)/
>            )
>          {
>              push @{ $catalog{indexing} },
> @@ -120,7 +120,8 @@ sub ParseHeader
>                  index_name => $3,
>                  index_oid => $4,
>                  index_oid_macro => $5,
> -                index_decl => $6
> +                table_name => $6,
> +                index_decl => $7
>                };
>          }
>          elsif (/^DECLARE_OID_DEFINING_MACRO\(\s*(\w+),\s*(\d+)\)/)


Now that we require Perl 5.14, we could replace this parenthesis-
counting nightmare with named captures (introduced in Perl 5.10), which
would make the above change look like this instead (context expanded to
show the whole elsif block):

          elsif (
             /^DECLARE_(UNIQUE_)?INDEX(_PKEY)?\(\s*
              (?<index_name>\w+),\s*
              (?<index_oid>\d+),\s*
              (?<index_oid_macro>\w+),\s*
+             (?<table_name>\w+),\s*
              (?<index_decl>.+)
              \)/x
           )
         {
             push @{ $catalog{indexing} },
               {
                 is_unique => $1 ? 1 : 0,
                 is_pkey => $2 ? 1 : 0,
                 %+,
               };
         }

For other patterns without the optional bits in the keyword, it becomes
even simpler, e.g.

        if (/^DECLARE_TOAST\(\s*
             (?<parent_table>\w+),\s*
             (?<toast_oid>\d+),\s*
             (?<toast_index_oid>\d+)\s*
             \)/x
          )
        {
            push @{ $catalog{toasting} }, {%+};
           }


I'd be happy to submit a patch to do this for all the ParseHeader()
regexes (in a separate thread) if others agree this is an improvement.

- ilmari

Re: generate syscache info automatically

From

Peter Eisentraut

Date:

31 May 2023, 21:24:01

On 31.05.23 13:02, Dagfinn Ilmari Mannsåker wrote:
> For other patterns without the optional bits in the keyword, it becomes
> even simpler, e.g.
> 
>         if (/^DECLARE_TOAST\(\s*
>              (?<parent_table>\w+),\s*
>              (?<toast_oid>\d+),\s*
>              (?<toast_index_oid>\d+)\s*
>              \)/x
>           )
>         {
>             push @{ $catalog{toasting} }, {%+};
>            }
> 
> 
> I'd be happy to submit a patch to do this for all the ParseHeader()
> regexes (in a separate thread) if others agree this is an improvement.

I would welcome such a patch.

Re: generate syscache info automatically

From

John Naylor

Date:

15 June 2023, 07:45:22

On Wed, May 31, 2023 at 4:58 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I want to report on my on-the-plane-to-PGCon project.
>
> The idea was mentioned in [0]. genbki.pl already knows everything about
> system catalog indexes. If we add a "please also make a syscache for
> this one" flag to the catalog metadata, we can have genbki.pl produce
> the tables in syscache.c and syscache.h automatically.
>
> Aside from avoiding the cumbersome editing of those tables, I think this
> layout is also conceptually cleaner, as you can more easily see which
> system catalog indexes have syscaches and maybe ask questions about why
> or why not.

When this has come up before, one objection was that index declarations shouldn't know about cache names and bucket sizes [1]. The second paragraph above makes a reasonable case for that, however. I believe one alternative idea was for a script to read the enum, which would look something like this:

#define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid

enum SysCacheIdentifier
{
DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0,
...
};

...which would then look up the other info in the usual way from Catalog.pm.

> As a possible follow-up, I have also started work on generating the
> ObjectProperty structure in objectaddress.c. One of the things you need
> for that is making genbki.pl aware of the syscache information. There
> is some more work to be done there, but it's looking promising.

I haven't studied this, but it seems interesting.

One other possible improvement: syscache.c has a bunch of #include's, one for each catalog with a cache, so there's still a bit of manual work in adding a cache, and the current #include list is a bit cumbersome. Perhaps it's worth it to have the script emit them as well?

I also wonder if at some point it will make sense to split off a separate script(s) for some things that are unrelated to the bootstrap data. genbki.pl is getting pretty large, and there are additional things that could be done with syscaches, e.g. inlined eq/hash functions for cache lookup [2].

[1] https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de

--
John Naylor
EDB: http://www.enterprisedb.com

Re: generate syscache info automatically

From

Peter Eisentraut

Date:

03 July 2023, 05:45:39

Here is an updated patch set that adjusts for the recently introduced 
named captures.  I will address the other issues later.  I think the 
first few patches in the series can be considered nonetheless.


On 15.06.23 09:45, John Naylor wrote:
> On Wed, May 31, 2023 at 4:58 AM Peter Eisentraut <peter@eisentraut.org 
> <mailto:peter@eisentraut.org>> wrote:
>  >
>  > I want to report on my on-the-plane-to-PGCon project.
>  >
>  > The idea was mentioned in [0]. genbki.pl <http://genbki.pl> already 
> knows everything about
>  > system catalog indexes.  If we add a "please also make a syscache for
>  > this one" flag to the catalog metadata, we can have genbki.pl 
> <http://genbki.pl> produce
>  > the tables in syscache.c and syscache.h automatically.
>  >
>  > Aside from avoiding the cumbersome editing of those tables, I think this
>  > layout is also conceptually cleaner, as you can more easily see which
>  > system catalog indexes have syscaches and maybe ask questions about why
>  > or why not.
> 
> When this has come up before, one objection was that index declarations 
> shouldn't know about cache names and bucket sizes [1]. The second 
> paragraph above makes a reasonable case for that, however. I believe one 
> alternative idea was for a script to read the enum, which would look 
> something like this:
> 
> #define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid
> 
> enum SysCacheIdentifier
> {
> DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0,
> ...
> };
> 
> ...which would then look up the other info in the usual way from Catalog.pm.
> 
>  > As a possible follow-up, I have also started work on generating the
>  > ObjectProperty structure in objectaddress.c.  One of the things you need
>  > for that is making genbki.pl <http://genbki.pl> aware of the syscache 
> information.  There
>  > is some more work to be done there, but it's looking promising.
> 
> I haven't studied this, but it seems interesting.
> 
> One other possible improvement: syscache.c has a bunch of #include's, 
> one for each catalog with a cache, so there's still a bit of manual work 
> in adding a cache, and the current #include list is a bit cumbersome. 
> Perhaps it's worth it to have the script emit them as well?
> 
> I also wonder if at some point it will make sense to split off a 
> separate script(s) for some things that are unrelated to the bootstrap 
> data. genbki.pl <http://genbki.pl> is getting pretty large, and there 
> are additional things that could be done with syscaches, e.g. inlined 
> eq/hash functions for cache lookup [2].
> 
> [1] https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us 
> <https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us>
> [2] 
> https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de
<https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de>
> 
> --
> John Naylor
> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>

Attachment

Re: generate syscache info automatically

From

Peter Eisentraut

Date:

24 August 2023, 14:03:29

On 03.07.23 07:45, Peter Eisentraut wrote:
> Here is an updated patch set that adjusts for the recently introduced 
> named captures.  I will address the other issues later.  I think the 
> first few patches in the series can be considered nonetheless.

I have committed the 0001 patch, which was really a (code comment) bug fix.

I think the patches 0002 and 0003 should be uncontroversial, so I'd like 
to commit them if no one objects.

The remaining patches are still WIP.


> On 15.06.23 09:45, John Naylor wrote:
>> On Wed, May 31, 2023 at 4:58 AM Peter Eisentraut <peter@eisentraut.org 
>> <mailto:peter@eisentraut.org>> wrote:
>>  >
>>  > I want to report on my on-the-plane-to-PGCon project.
>>  >
>>  > The idea was mentioned in [0]. genbki.pl <http://genbki.pl> already 
>> knows everything about
>>  > system catalog indexes.  If we add a "please also make a syscache for
>>  > this one" flag to the catalog metadata, we can have genbki.pl 
>> <http://genbki.pl> produce
>>  > the tables in syscache.c and syscache.h automatically.
>>  >
>>  > Aside from avoiding the cumbersome editing of those tables, I think 
>> this
>>  > layout is also conceptually cleaner, as you can more easily see which
>>  > system catalog indexes have syscaches and maybe ask questions about 
>> why
>>  > or why not.
>>
>> When this has come up before, one objection was that index 
>> declarations shouldn't know about cache names and bucket sizes [1]. 
>> The second paragraph above makes a reasonable case for that, however. 
>> I believe one alternative idea was for a script to read the enum, 
>> which would look something like this:
>>
>> #define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid
>>
>> enum SysCacheIdentifier
>> {
>> DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0,
>> ...
>> };
>>
>> ...which would then look up the other info in the usual way from 
>> Catalog.pm.
>>
>>  > As a possible follow-up, I have also started work on generating the
>>  > ObjectProperty structure in objectaddress.c.  One of the things you 
>> need
>>  > for that is making genbki.pl <http://genbki.pl> aware of the 
>> syscache information.  There
>>  > is some more work to be done there, but it's looking promising.
>>
>> I haven't studied this, but it seems interesting.
>>
>> One other possible improvement: syscache.c has a bunch of #include's, 
>> one for each catalog with a cache, so there's still a bit of manual 
>> work in adding a cache, and the current #include list is a bit 
>> cumbersome. Perhaps it's worth it to have the script emit them as well?
>>
>> I also wonder if at some point it will make sense to split off a 
>> separate script(s) for some things that are unrelated to the bootstrap 
>> data. genbki.pl <http://genbki.pl> is getting pretty large, and there 
>> are additional things that could be done with syscaches, e.g. inlined 
>> eq/hash functions for cache lookup [2].
>>
>> [1] 
>> https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us 
>> <https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us>
>> [2] 
>> https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de
<https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de>
>>
>> -- 
>> John Naylor
>> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>

Re: generate syscache info automatically

From

Peter Eisentraut

Date:

31 August 2023, 11:23:00

I have committed 0002 and 0003, and also a small bug fix in the 
ObjectProperty entry for "transforms".

I have also gotten the automatic generation of the ObjectProperty lookup 
table working (with some warts).

Attached is an updated patch set.

One win here is that the ObjectProperty lookup now uses a hash function 
instead of a linear search.  So hopefully the performance is better (to 
be confirmed) and it could now be used for things like [0].

[0]: 
https://www.postgresql.org/message-id/flat/12f1b1d2-f8cf-c4a2-72ec-441bd79546cb@enterprisedb.com

There was some discussion about whether the catalog files are the right 
place to keep syscache information.  Personally, I would find that more 
convenient.  (Note that we also recently moved the definitions of 
indexes and toast tables from files with the whole list into the various 
catalog files.)  But we can also keep it somewhere else.  The important 
thing is that Catalog.pm can find it somewhere in a structured form.

To finish up the ObjectProperty generation, we also need to store some 
more data, namely the OBJECT_* mappings.  We probably do not want to 
store those in the catalog headers; that looks like a layering violation 
to me.

So, there is some discussion to be had about where to put all this 
across various use cases.


On 24.08.23 16:03, Peter Eisentraut wrote:
> On 03.07.23 07:45, Peter Eisentraut wrote:
>> Here is an updated patch set that adjusts for the recently introduced 
>> named captures.  I will address the other issues later.  I think the 
>> first few patches in the series can be considered nonetheless.
> 
> I have committed the 0001 patch, which was really a (code comment) bug fix.
> 
> I think the patches 0002 and 0003 should be uncontroversial, so I'd like 
> to commit them if no one objects.
> 
> The remaining patches are still WIP.
> 
> 
>> On 15.06.23 09:45, John Naylor wrote:
>>> On Wed, May 31, 2023 at 4:58 AM Peter Eisentraut 
>>> <peter@eisentraut.org <mailto:peter@eisentraut.org>> wrote:
>>>  >
>>>  > I want to report on my on-the-plane-to-PGCon project.
>>>  >
>>>  > The idea was mentioned in [0]. genbki.pl <http://genbki.pl> 
>>> already knows everything about
>>>  > system catalog indexes.  If we add a "please also make a syscache for
>>>  > this one" flag to the catalog metadata, we can have genbki.pl 
>>> <http://genbki.pl> produce
>>>  > the tables in syscache.c and syscache.h automatically.
>>>  >
>>>  > Aside from avoiding the cumbersome editing of those tables, I 
>>> think this
>>>  > layout is also conceptually cleaner, as you can more easily see which
>>>  > system catalog indexes have syscaches and maybe ask questions 
>>> about why
>>>  > or why not.
>>>
>>> When this has come up before, one objection was that index 
>>> declarations shouldn't know about cache names and bucket sizes [1]. 
>>> The second paragraph above makes a reasonable case for that, however. 
>>> I believe one alternative idea was for a script to read the enum, 
>>> which would look something like this:
>>>
>>> #define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid
>>>
>>> enum SysCacheIdentifier
>>> {
>>> DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0,
>>> ...
>>> };
>>>
>>> ...which would then look up the other info in the usual way from 
>>> Catalog.pm.
>>>
>>>  > As a possible follow-up, I have also started work on generating the
>>>  > ObjectProperty structure in objectaddress.c.  One of the things 
>>> you need
>>>  > for that is making genbki.pl <http://genbki.pl> aware of the 
>>> syscache information.  There
>>>  > is some more work to be done there, but it's looking promising.
>>>
>>> I haven't studied this, but it seems interesting.
>>>
>>> One other possible improvement: syscache.c has a bunch of #include's, 
>>> one for each catalog with a cache, so there's still a bit of manual 
>>> work in adding a cache, and the current #include list is a bit 
>>> cumbersome. Perhaps it's worth it to have the script emit them as well?
>>>
>>> I also wonder if at some point it will make sense to split off a 
>>> separate script(s) for some things that are unrelated to the 
>>> bootstrap data. genbki.pl <http://genbki.pl> is getting pretty large, 
>>> and there are additional things that could be done with syscaches, 
>>> e.g. inlined eq/hash functions for cache lookup [2].
>>>
>>> [1] 
>>> https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us 
>>> <https://www.postgresql.org/message-id/12460.1570734874@sss.pgh.pa.us>
>>> [2] 
>>> https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de
<https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de>
>>>
>>> -- 
>>> John Naylor
>>> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
> 
> 
>

Attachment

Re: generate syscache info automatically

From

John Naylor

Date:

01 September 2023, 11:31:14

On Thu, Aug 31, 2023 at 6:23 PM Peter Eisentraut <peter@eisentraut.org> wrote:

> Attached is an updated patch set.
>
> One win here is that the ObjectProperty lookup now uses a hash function
> instead of a linear search. So hopefully the performance is better (to
> be confirmed) and it could now be used for things like [0].

+ # XXX This one neither, but if I add it to @skip, PerfectHash will fail. (???)
+ #FIXME: AttributeRelationId

I took a quick look at this, and attached is the least invasive way to get it working for now, which is to bump the table size slightly. The comment says doing this isn't reliable, but it happens to work in this case. Playing around with the functions is hit-or-miss, and when that fails, somehow the larger table saves the day.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment

quick-hack-fix-hash.patch.txt

Re: generate syscache info automatically

From

John Naylor

Date:

02 September 2023, 10:37:50

I wrote:

> + # XXX This one neither, but if I add it to @skip, PerfectHash will fail. (???)
> + #FIXME: AttributeRelationId
>
> I took a quick look at this, and attached is the least invasive way to get it working for now, which is to bump the table size slightly. The comment says doing this isn't reliable, but it happens to work in this case. Playing around with the functions is hit-or-miss, and when that fails, somehow the larger table saves the day.

To while away a rainy day, I poked at this a bit more and found the input is pathological with our current methods. Even with a large-ish exhaustive search, the two success are strange in that they only succeeded by accidentally bumping the table size up to where I got it to work before (77):

With multipliers (5, 19), it recognizes that the initial table size (75) is a multiple of 5, so increases the table size to 76, which is a multiple of 19, so it increases it again to 77 and succeeds.

Same with (3, 76): 75 is a multiple of 3, so up to 76, which of course divides 76, so bumps it to 77 likewise.

Turning the loop into

a = (a ^ c) * 257;
b = (b ^ c) * X;

...seems to work very well.

In fact, now trying some powers-of-two for X before the primes works most of the time with our inputs, even for some unicode normalization functions, on the first seed iteration. That likely won't make any difference in practice, but it's an interesting demo. I've attached these two draft ideas as text.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment

Re: generate syscache info automatically

From

John Naylor

Date:

11 September 2023, 11:02:18

On Thu, Aug 31, 2023 at 6:23 PM Peter Eisentraut <peter@eisentraut.org> wrote:

> Attached is an updated patch set.

> There was some discussion about whether the catalog files are the right
> place to keep syscache information. Personally, I would find that more
> convenient. (Note that we also recently moved the definitions of
> indexes and toast tables from files with the whole list into the various
> catalog files.) But we can also keep it somewhere else. The important
> thing is that Catalog.pm can find it somewhere in a structured form.

I don't have anything further to say on how to fit it together, but I'll go ahead share some other comments from a quick read of v3-0003:

+ # XXX hardcoded exceptions
+ # extension doesn't belong to extnamespace
+ $attnum_namespace = undef if $catname eq 'pg_extension';
+ # pg_database owner is spelled datdba
+ $attnum_owner = "Anum_pg_database_datdba" if $catname eq 'pg_database';

These exceptions seem like true exceptions...

+ # XXX?
+ $name_syscache = "SUBSCRIPTIONNAME" if $catname eq 'pg_subscription';
+ # XXX?
+ $is_nsp_name_unique = 1 if $catname eq 'pg_collation';
+ $is_nsp_name_unique = 1 if $catname eq 'pg_opclass';
+ $is_nsp_name_unique = 1 if $catname eq 'pg_opfamily';
+ $is_nsp_name_unique = 1 if $catname eq 'pg_subscription';

... but as the XXX conveys, these indicate a failure to do the right thing. Trying to derive this info from the index declarations is currently fragile. E.g. making one small change to this regex:

- if ($index->{index_decl} =~ /$\w+name name_ops(, \w+namespace oid_ops)?$/)
+ if ($index->{index_decl} =~ /\w+name name_ops(, \w+namespace oid_ops)?\)/)

...causes "is_nsp_name_unique" to flip from false to true, and/or sets "name_catcache_id" where it wasn't before, for several entries. It's not quite clear what the "rule" is intended to be, or whether it's robust going forward.

That being the case, perhaps some effort should also be made to make it easy to verify the output matches HEAD (or rather, v3-0001). This is now difficult because of the C99 ".foo = bar" syntax, plus the alphabetical ordering.

+foreach my $catname (@catnames)
+{
+ print $objectproperty_info_fh qq{#include "catalog/${catname}_d.h"\n};
+}

Assuming we have a better way to know which catalogs need object properties, is there a todo to only #include those?

> To finish up the ObjectProperty generation, we also need to store some
> more data, namely the OBJECT_* mappings. We probably do not want to
> store those in the catalog headers; that looks like a layering violation
> to me.

Possibly, but it's also convenient and, one could argue, more straightforward than storing syscache id and num_buckets in the index info.

One last thing: I did try to make the hash function use uint16 for the key (to reduce loop iterations on nul bytes), and that didn't help, so we are left with the ideas I mentioned earlier.

(not changing CF status, because nothing specific is really required to change at the moment, some things up in the air)

--
John Naylor
EDB: http://www.enterprisedb.com

Re: generate syscache info automatically

From

Peter Eisentraut

Date:

01 November 2023, 21:13:03

Here is a rebased patch set, along with a summary of the questions I 
have about these patches:


v4-0001-Generate-syscache-info-from-catalog-files.patch

What's a good syntax to declare a syscache?  Currently, I have, for example

-DECLARE_UNIQUE_INDEX_PKEY(pg_type_oid_index, 2703, TypeOidIndexId, 
pg_type, btree(oid oid_ops));
+DECLARE_UNIQUE_INDEX_PKEY(pg_type_oid_index, 2703, TypeOidIndexId, 
pg_type, btree(oid oid_ops), TYPEOID, 64);

I suppose a sensible alternative could be to leave the
DECLARE_..._INDEX... alone and make a separate statement, like

MAKE_SYSCACHE(pg_type_oid_index, TYPEOID, 64);

That's at least visually easier, because some of those
DECLARE_... lines are getting pretty long.

I would like to keep those MAKE_SYSCACHE lines in the catalog files.
That just makes it easier to reference everything together.


v4-0002-Generate-ObjectProperty-from-catalog-files.patch

Several questions here:

* What's a good way to declare the mapping between catalog and object
   type?

* How to select which catalogs have ObjectProperty entries generated?

* Where to declare class descriptions?  Or just keep the current hack in
   the patch.

* How to declare the purpose of a catalog column, like "this is the ACL
   column for this catalog".  This is currently done by name, but maybe
   it should be more explicit.

* Question about how to pick the correct value for is_nsp_name_unique?

This second patch is clearly still WIP.  I hope the first patch can be 
finished soon, however.


I was also amused to find the original commit fa351d5a0db that 
introduced ObjectProperty, which contains the following comment:

     Performance isn't critical here, so there's no need to be smart
     about how we do the search.

which I'm now trying to amend.


On 11.09.23 07:02, John Naylor wrote:
> 
> On Thu, Aug 31, 2023 at 6:23 PM Peter Eisentraut <peter@eisentraut.org 
> <mailto:peter@eisentraut.org>> wrote:
> 
>  > Attached is an updated patch set.
> 
>  > There was some discussion about whether the catalog files are the right
>  > place to keep syscache information.  Personally, I would find that more
>  > convenient.  (Note that we also recently moved the definitions of
>  > indexes and toast tables from files with the whole list into the various
>  > catalog files.)  But we can also keep it somewhere else.  The important
>  > thing is that Catalog.pm can find it somewhere in a structured form.
> 
> I don't have anything further to say on how to fit it together, but I'll 
> go ahead share some other comments from a quick read of v3-0003:
> 
> + # XXX hardcoded exceptions
> + # extension doesn't belong to extnamespace
> + $attnum_namespace = undef if $catname eq 'pg_extension';
> + # pg_database owner is spelled datdba
> + $attnum_owner = "Anum_pg_database_datdba" if $catname eq 'pg_database';
> 
> These exceptions seem like true exceptions...
> 
> + # XXX?
> + $name_syscache = "SUBSCRIPTIONNAME" if $catname eq 'pg_subscription';
> + # XXX?
> + $is_nsp_name_unique = 1 if $catname eq 'pg_collation';
> + $is_nsp_name_unique = 1 if $catname eq 'pg_opclass';
> + $is_nsp_name_unique = 1 if $catname eq 'pg_opfamily';
> + $is_nsp_name_unique = 1 if $catname eq 'pg_subscription';
> 
> ... but as the XXX conveys, these indicate a failure to do the right 
> thing. Trying to derive this info from the index declarations is 
> currently fragile. E.g. making one small change to this regex:
> 
> -               if ($index->{index_decl} =~ /\(\w+name name_ops(, 
> \w+namespace oid_ops)?\)/)
> +               if ($index->{index_decl} =~ /\w+name name_ops(, 
> \w+namespace oid_ops)?\)/)
> 
> ...causes "is_nsp_name_unique" to flip from false to true, and/or sets 
> "name_catcache_id" where it wasn't before, for several entries. It's not 
> quite clear what the "rule" is intended to be, or whether it's robust 
> going forward.
> 
> That being the case, perhaps some effort should also be made to make it 
> easy to verify the output matches HEAD (or rather, v3-0001). This is now 
> difficult because of the C99 ".foo = bar" syntax, plus the alphabetical 
> ordering.
> 
> +foreach my $catname (@catnames)
> +{
> + print $objectproperty_info_fh qq{#include "catalog/${catname}_d.h"\n};
> +}
> 
> Assuming we have a better way to know which catalogs need 
> object properties, is there a todo to only #include those?
> 
>  > To finish up the ObjectProperty generation, we also need to store some
>  > more data, namely the OBJECT_* mappings.  We probably do not want to
>  > store those in the catalog headers; that looks like a layering violation
>  > to me.
> 
> Possibly, but it's also convenient and, one could argue, more 
> straightforward than storing syscache id and num_buckets in the index info.
> 
> One last thing: I did try to make the hash function use uint16 for the 
> key (to reduce loop iterations on nul bytes), and that didn't help, so 
> we are left with the ideas I mentioned earlier.
> 
> (not changing CF status, because nothing specific is really required to 
> change at the moment, some things up in the air)
> 
> --
> John Naylor
> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>

Attachment

Re: generate syscache info automatically

From

John Naylor

Date:

10 January 2024, 08:00:23

On Thu, Nov 2, 2023 at 4:13 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> Here is a rebased patch set, along with a summary of the questions I
> have about these patches:

This is an excellent summary of the issues, thanks.

> v4-0001-Generate-syscache-info-from-catalog-files.patch
>
> What's a good syntax to declare a syscache?  Currently, I have, for example
>
> -DECLARE_UNIQUE_INDEX_PKEY(pg_type_oid_index, 2703, TypeOidIndexId,
> pg_type, btree(oid oid_ops));
> +DECLARE_UNIQUE_INDEX_PKEY(pg_type_oid_index, 2703, TypeOidIndexId,
> pg_type, btree(oid oid_ops), TYPEOID, 64);
>
> I suppose a sensible alternative could be to leave the
> DECLARE_..._INDEX... alone and make a separate statement, like
>
> MAKE_SYSCACHE(pg_type_oid_index, TYPEOID, 64);
>
> That's at least visually easier, because some of those
> DECLARE_... lines are getting pretty long.

Probably a good idea, and below I mention a third possible macro.

> I would like to keep those MAKE_SYSCACHE lines in the catalog files.
> That just makes it easier to reference everything together.

That seems fine. If we ever want to do something more invasive with
the syscaches, some of that can be copied/moved out into a separate
script, but what's there in 0001 is pretty small.

> v4-0002-Generate-ObjectProperty-from-catalog-files.patch
>
> Several questions here:
>
> * What's a good way to declare the mapping between catalog and object
>    type?

Perhaps this idea:

> I suppose a sensible alternative could be to leave the
> DECLARE_..._INDEX... alone and make a separate statement, like
>
> MAKE_SYSCACHE(pg_type_oid_index, TYPEOID, 64);

...could be used, so something like

DECLARE_OBJECT_INFO(OBJECT_ACCESS_METHOD);

> * How to select which catalogs have ObjectProperty entries generated?

Would the above work for that as well?

> * Where to declare class descriptions?  Or just keep the current hack in
>    the patch.

I don't have an opinion on this.

> * How to declare the purpose of a catalog column, like "this is the ACL
>    column for this catalog".  This is currently done by name, but maybe
>    it should be more explicit.

Perhaps we can have additional column macros, like BKI_OBJ_ACL or
something, but that doesn't seem like it would result in less code.

> * Question about how to pick the correct value for is_nsp_name_unique?

How is that known now -- I don't mean in the patch, but in general?

Also, I get most of the hard-wired exceptions, but

+ # XXX?
+ $name_syscache = "SUBSCRIPTIONNAME" if $catname eq 'pg_subscription';

What is not right otherwise?

+ if ($index->{index_decl} eq 'btree(oid oid_ops)')
+ {
+ $oid_index = $index->{index_oid_macro};
+ $oid_syscache = $index->{syscache_name};
+ }
+ if ($index->{index_decl} =~ /\(\w+name name_ops(, \w+namespace oid_ops)?\)/)
+ {
+ $name_syscache = $index->{syscache_name};
+ $is_nsp_name_unique = 1 if $index->{is_unique};
+ }

The variables name_syscache and syscache_name are unfortunately close
to eachother.

Re: generate syscache info automatically

From

Peter Eisentraut

Date:

17 January 2024, 12:46:02

On 10.01.24 09:00, John Naylor wrote:
>> I suppose a sensible alternative could be to leave the
>> DECLARE_..._INDEX... alone and make a separate statement, like
>>
>> MAKE_SYSCACHE(pg_type_oid_index, TYPEOID, 64);
>>
>> That's at least visually easier, because some of those
>> DECLARE_... lines are getting pretty long.
> 
> Probably a good idea, and below I mention a third possible macro.

I updated the patch to use this style (but I swapped the first two 
arguments from my example, so that the thing being created is named first).

I also changed the names of the output files a bit to make them less 
confusing.  (I initially had some files named .c.h, which was weird, but 
apparently necessary to avoid confusing the build system.  But it's all 
clearer now.)

Other than bugs and perhaps style opinions, I think the first patch is 
pretty good now.

I haven't changed much in the second patch, other than to update it for 
the code changes made in the first patch.  It's still very much 
WIP/preview at the moment.

Attachment

Re: generate syscache info automatically

From

John Naylor

Date:

19 January 2024, 05:28:28

On Wed, Jan 17, 2024 at 7:46 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> I updated the patch to use this style (but I swapped the first two
> arguments from my example, so that the thing being created is named first).
>
> I also changed the names of the output files a bit to make them less
> confusing.  (I initially had some files named .c.h, which was weird, but
> apparently necessary to avoid confusing the build system.  But it's all
> clearer now.)
>
> Other than bugs and perhaps style opinions, I think the first patch is
> pretty good now.

LGTM. The only style consideration that occurred to me just now was
MAKE_SYSCACHE, since it doesn't match the DECLARE_... macros. It seems
like the same thing from a code generation perspective. The other
macros refer to relations, so there is a difference, but maybe it
doesn't matter. I don't have a strong opinion.

Re: generate syscache info automatically

From

Peter Eisentraut

Date:

19 January 2024, 14:03:37

On 19.01.24 06:28, John Naylor wrote:
> On Wed, Jan 17, 2024 at 7:46 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> I updated the patch to use this style (but I swapped the first two
>> arguments from my example, so that the thing being created is named first).
>>
>> I also changed the names of the output files a bit to make them less
>> confusing.  (I initially had some files named .c.h, which was weird, but
>> apparently necessary to avoid confusing the build system.  But it's all
>> clearer now.)
>>
>> Other than bugs and perhaps style opinions, I think the first patch is
>> pretty good now.
> 
> LGTM. The only style consideration that occurred to me just now was
> MAKE_SYSCACHE, since it doesn't match the DECLARE_... macros. It seems
> like the same thing from a code generation perspective. The other
> macros refer to relations, so there is a difference, but maybe it
> doesn't matter. I don't have a strong opinion.

The DECLARE_* macros map to actual BKI commands ("declare toast", 
"declare index").  So I wanted to use a different verb to distinguish 
"generate code for this" from those BKI commands.

Re: generate syscache info automatically

From

John Naylor

Date:

22 January 2024, 00:33:13

On Fri, Jan 19, 2024 at 9:03 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> The DECLARE_* macros map to actual BKI commands ("declare toast",
> "declare index").  So I wanted to use a different verb to distinguish
> "generate code for this" from those BKI commands.

That makes sense to me.

Re: generate syscache info automatically

From

Peter Eisentraut

Date:

23 January 2024, 18:58:30

On 22.01.24 01:33, John Naylor wrote:
> On Fri, Jan 19, 2024 at 9:03 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> The DECLARE_* macros map to actual BKI commands ("declare toast",
>> "declare index").  So I wanted to use a different verb to distinguish
>> "generate code for this" from those BKI commands.
> 
> That makes sense to me.

I have committed the first patch, and the buildfarm hiccup seems to have 
passed.

I'll close this commitfest entry now.  I have enough feedback to work on 
the ObjectProperty generation, but I'll make a new entry for that.