inconsistency and inefficiency in setup_conversion() - Mailing list pgsql-hackers

From John Naylor
Subject inconsistency and inefficiency in setup_conversion()
Msg-id CAJVSVGWtUqxpfAaxS88vEGvi+jKzWZb2EStu5io-UPc4p9rSJg@mail.gmail.com
List pgsql-hackers
A close look at the result of setup_conversion() shows that wrong, or
at least confusing, comments are applied to the conversion functions.
Consider this family of conversions:

select conproc, conname
from pg_conversion
where conproc = 'utf8_to_win'::regproc
order by oid;
   conproc   |       conname
-------------+----------------------
 utf8_to_win | utf8_to_windows_866
 utf8_to_win | utf8_to_windows_874
 utf8_to_win | utf8_to_windows_1250
 utf8_to_win | utf8_to_windows_1251
 utf8_to_win | utf8_to_windows_1252
 utf8_to_win | utf8_to_windows_1253
 utf8_to_win | utf8_to_windows_1254
 utf8_to_win | utf8_to_windows_1255
 utf8_to_win | utf8_to_windows_1256
 utf8_to_win | utf8_to_windows_1257
 utf8_to_win | utf8_to_windows_1258
(11 rows)

Then compare the comment on the function:

select proname, description
from pg_description d
join pg_proc p on d.objoid=p.oid
where classoid = 'pg_proc'::regclass
and description ~ 'for UTF8 to WIN';
   proname   |                   description
-------------+--------------------------------------------------
 utf8_to_win | internal conversion function for UTF8 to WIN1258
(1 row)

Notice how the comment refers to the last encoding created. This is
because setup_conversion.sql invokes CREATE OR REPLACE FUNCTION
utf8_to_win [...] multiple times, each time with a different comment
specific to one encoding, so only the final comment survives. It'd be
messy at best to try to construct the right comment using the current
Makefile script. It also can't be good for initdb performance to
create 44 functions just to immediately drop them. On that note, in
the thread about initdb performance [1], setup_conversion() consumed
the biggest share of time. I propose to get rid of the ad hoc
$(CONVERSIONS) format and solve the comment issue, while hopefully
shaving a bit more time off of initdb. It seems our options are the
following:

Solution #1 - As alluded to in [1], turn the conversions into
pg_proc.dat and pg_conversion.dat entries. Teach genbki.pl to parse
pg_wchar.h to map conversion names to numbers.
Pros:
-likely easy to do
-allows for the removal of an install target in the Makefile as well
as ad hoc logic in MSVC
-uses a format that developers need to use anyway
Cons:
-immediately burns up 88 hard-coded OIDs and one for each time a
conversion proc is created
-would require editing data in two catalogs every time a conversion
proc is created
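
To make the name-to-number mapping step of Solution #1 concrete, here
is a minimal sketch of the idea. It is written in Python purely for
illustration (genbki.pl is Perl), and it is not a proposed patch. The
header excerpt, its enum values, and the function name
parse_encoding_enum are all made up for the example; they do not
reproduce the real contents of pg_wchar.h.

```python
import re

# Hypothetical, abbreviated stand-in for the encoding enum in pg_wchar.h.
# The member list and the resulting numbers are illustrative only.
SAMPLE_HEADER = """
typedef enum pg_enc
{
    PG_SQL_ASCII = 0,   /* SQL/ASCII */
    PG_EUC_JP,          /* EUC for Japanese */
    PG_UTF8,            /* Unicode UTF8 */
    PG_WIN1258          /* windows-1258 */
} pg_enc;
"""


def parse_encoding_enum(header_text):
    """Map enum member names to integers using C enum numbering rules."""
    # Grab the text between the braces of the pg_enc enum.
    body = re.search(r"enum\s+pg_enc\s*\{(.*?)\}", header_text, re.S).group(1)
    # Strip C block comments so they cannot confuse the comma split.
    body = re.sub(r"/\*.*?\*/", "", body, flags=re.S)

    mapping = {}
    next_value = 0
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        if value.strip():
            # An explicit "= N" resets the counter, as in C.
            next_value = int(value.strip(), 0)
        mapping[name.strip()] = next_value
        next_value += 1
    return mapping


if __name__ == "__main__":
    for name, number in parse_encoding_enum(SAMPLE_HEADER).items():
        print(number, name)
```

With a mapping like this in hand, a catalog data entry could name an
encoding symbolically and have it resolved to a number at build time,
which is the part of Solution #1 that needs new parsing code.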

Solution #2 - Write a new script that would read all the .c files in
the various directories and output two files. These would be COPY'd
into temp tables during initdb, and then inserted into pg_proc,
pg_conversion, and pg_description using SQL.
Pros:
-eliminates all(?) manual catalog maintenance when adding new conversion procs
Cons:
-likely complex and difficult to debug
-further complicates initdb.c
-requires MSVC development

If we do anything, I'd much rather do #1, but that way is not entirely
without downsides compared to doing nothing. Any thoughts?

[1] https://www.postgresql.org/message-id/b549c8ad-f12e-aad1-9a59-b24cb3e55a17@proxel.se

