Re: Size of pg_rewrite (Was: Report checkpoint progress with pg_stat_progress_checkpoint) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Size of pg_rewrite (Was: Report checkpoint progress with pg_stat_progress_checkpoint)
Date
Msg-id B6808F91-6F4D-4C75-A112-C0CDD721C849@anarazel.de
Whole thread Raw
In response to Size of pg_rewrite (Was: Report checkpoint progress with pg_stat_progress_checkpoint)  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
Hi,

On April 8, 2022 7:52:07 AM PDT, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>On Sat, 19 Mar 2022 at 01:15, Andres Freund <andres@anarazel.de> wrote:
>> pg_rewrite without pg_stat_progress_checkpoint: 745472, with: 753664
>>
>> pg_rewrite is the second biggest relation in an empty database already...
>
>Yeah, that's not great. Thanks for nerd-sniping me into looking into
>how views and pg_rewrite rules work, that was very interesting and I
>learned quite a lot.

Thanks for looking!


># Immediately potential, limited to progress views
>
>I noticed that the CASE-WHEN (used in translating progress stage index
>to stage names) in those progress reporting views can be more
>efficiently described (althoug with slightly worse behaviour around
>undefined values) using text array lookups (as attached). That
>resulted in somewhat smaller rewrite entries for the progress views
>(toast compression was good old pglz):
>
>template1=# SELECT sum(octet_length(ev_action)),
>SUM(pg_column_size(ev_action)) FROM pg_rewrite WHERE
>ev_class::regclass::text LIKE '%progress%';
>
>master:
>  sum  |  sum
>-------+-------
> 97277 | 19956
>patched:
>  sum  |  sum
>-------+-------
> 77069 | 18417
>
>So this seems like a nice improvement of 20% uncompressed / 7% compressed.
>
>I tested various cases of phase number to text translations: `CASE ..
>WHEN`; `(ARRAY[]::text[])[index]` and `('{}'::text[])[index]`. See
>results below:
>
>postgres=# create or replace view arrayliteral_view as select
>(ARRAY['a','b','c','d','e','f']::text[])[index] as name from tst
>s(index);
>CREATE INDEX
>postgres=# create or replace view stringcast_view as select
>('{a,b,c,d,e,f}'::text[])[index] as name from tst s(index);
>CREATE INDEX
>postgres=# create or replace view split_stringcast_view as select
>(('{a,b,' || 'c,d,e,f}')::text[])[index] as name from tst s(index);
>CREATE VIEW
>postgres=# create or replace view case_view as select case index when
>0 then 'a' when 1 then 'b' when 2 then 'c' when 3 then 'd' when 4 then
>'e' when 5 then 'f' end as name from tst s(index);
>CREATE INDEX
>
>
>postgres=# postgres=# select ev_class::regclass::text,
>octet_length(ev_action), pg_column_size(ev_action) from pg_rewrite
>where ev_class in ('arrayliteral_view'::regclass::oid,
>'case_view'::regclass::oid, 'split_stringcast_view'::regclass::oid,
>'stringcast_view'::regclass::oid);
>       ev_class        | octet_length | pg_column_size
>-----------------------+--------------+----------------
> arrayliteral_view     |         3311 |           1322
> stringcast_view       |         2610 |           1257
> case_view             |         5170 |           1412
> split_stringcast_view |         2847 |           1350
>
>It seems to me that we could consider replacing the CASE statements
>with array literals and lookups if we really value our template
>database size. But, as text literal concatenations don't seem to get
>constant folded before storing them in the rules table, this rewrite
>of the views would result in long lines in the system_views.sql file,
>or we'd have to deal with the additional overhead of the append
>operator and cast nodes.

My inclination is that the mapping functions should be c functions. There's really no point in doing it in SQL and it
comesat a noticable price. And, if done in C, we can fix mistakes in minor releases, which we can't in SQL. 


># Future work; nodeToString / readNode, all rewrite rules
>
>Additionally, we might want to consider other changes like default (or
>empty value) elision in nodeToString, if that is considered a
>reasonable option and if we really want to reduce the size of the
>pg_rewrite table.
>
>I think a lot of space can be recovered from that: A manual removal of
>what seemed to be fields with default values (and the removal of all
>query location related fields) in the current definition of
>pg_stat_progress_create_index reduces its uncompressed size from
>23226B raw and 4204B compressed to 13821B raw and 2784B compressed,
>for an on-disk space saving of 33% for this view's ev_action.
>
>Do note, however, that that would add significant branching in the
>nodeToString and readNode code, which might slow down that code
>significantly. I'm not planning on working on that; but in my opinion
>that is a viable path to reducing the size of new database catalogs.

We should definitely be careful about that. I do agree that there's a lot of efficiency to be gained in the
serializationformat. Once we have the automatic node func generation in place, we could have one representation for
humanconsumption, and one for density... 

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: remove more archiving overhead
Next
From: Zhihong Yu
Date:
Subject: Re: Defer selection of asynchronous subplans until the executor initialization stage