Proposal: CREATE CONVERSION - Mailing list pgsql-hackers
From | Tatsuo Ishii |
---|---|
Subject | Proposal: CREATE CONVERSION |
Date | |
Msg-id | 20020705.153641.71101525.t-ishii@sra.co.jp |
Responses | Re: Proposal: CREATE CONVERSION (Tom Lane <tgl@sss.pgh.pa.us>); Re: Proposal: CREATE CONVERSION (Bruce Momjian <pgman@candle.pha.pa.us>); Re: Proposal: CREATE CONVERSION (Tatsuo Ishii <t-ishii@sra.co.jp>) |
List | pgsql-hackers |
Here is my proposal for a new CREATE CONVERSION command, which makes it possible to define a new encoding conversion mapping between two encodings on the fly.

The background:

We are getting more and more encoding conversion tables. So far they amount to 385352 source lines and over 3MB in compiled form in total. They are statically linked to the backend. I know this in itself is not a problem, since modern OSs have smart memory management capabilities to fetch only the necessary pages from disk. However, I'm worried about the unbounded growth of these static tables. I think users won't love a 50MB PostgreSQL backend load module.

The second problem is more serious. The conversion definitions between certain encodings, such as Unicode and others, are not well defined. For example, there are several conversion tables between Japanese Shift JIS and Unicode. This is because each vendor has its own "special characters" and defines the table so that the conversion fits its purpose.

The solution:

The proposed new CREATE CONVERSION will solve these problems. A particular conversion table is statically linked into a dynamically loaded function, and CREATE CONVERSION tells PostgreSQL that if a conversion from encoding A to encoding B is requested, then function C should be used. In this way, conversion tables are no longer statically linked to the backend. Users could also easily define their own conversion tables that best fit their purpose. Needless to say, people could also define new conversions that PostgreSQL does not support yet.
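To make the "from encoding A to encoding B, use function C" rule concrete, here is a minimal sketch in C of a registry keyed by source and destination encoding names. It is not PostgreSQL code: the names conv_func, conversion_entry, create_conversion, find_conversion, drop_conversion and identity_conv are all hypothetical, the fixed-size in-memory array merely stands in for whatever catalog would really hold the definitions, and the function signature is an assumption that only loosely mirrors the (TEXT, TEXT, INTEGER) example in this message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical converter signature (an assumption, not a PostgreSQL API):
 * reads len bytes from src, writes to dest, returns bytes written. */
typedef int (*conv_func)(const unsigned char *src, unsigned char *dest, int len);

/* One registered conversion. The strings are not copied, so the caller
 * must keep them alive (string literals are fine). */
typedef struct {
    const char *name;
    const char *source;
    const char *destination;
    conv_func   func;
} conversion_entry;

#define MAX_CONVERSIONS 16

static conversion_entry registry[MAX_CONVERSIONS];
static int n_conversions = 0;

/* Stand-in for a real converter: copies the bytes unchanged. */
int identity_conv(const unsigned char *src, unsigned char *dest, int len)
{
    memcpy(dest, src, (size_t) len);
    return len;
}

/* Register a conversion. Returns 0 on success, -1 if the registry is
 * full or the conversion name is already taken. */
int create_conversion(const char *name, const char *source,
                      const char *destination, conv_func func)
{
    int i;

    if (n_conversions >= MAX_CONVERSIONS)
        return -1;
    for (i = 0; i < n_conversions; i++)
        if (strcmp(registry[i].name, name) == 0)
            return -1;

    registry[n_conversions].name = name;
    registry[n_conversions].source = source;
    registry[n_conversions].destination = destination;
    registry[n_conversions].func = func;
    n_conversions++;
    return 0;
}

/* Look up the function for a (source, destination) pair.
 * Returns NULL when no conversion is registered for that pair. */
conv_func find_conversion(const char *source, const char *destination)
{
    int i;

    for (i = 0; i < n_conversions; i++)
        if (strcmp(registry[i].source, source) == 0 &&
            strcmp(registry[i].destination, destination) == 0)
            return registry[i].func;
    return NULL;
}

/* Remove a conversion by name. Returns 0 on success, -1 if not found. */
int drop_conversion(const char *name)
{
    int i;

    for (i = 0; i < n_conversions; i++)
    {
        if (strcmp(registry[i].name, name) == 0)
        {
            /* Fill the hole with the last entry; order does not matter. */
            registry[i] = registry[n_conversions - 1];
            n_conversions--;
            return 0;
        }
    }
    return -1;
}
```

In this sketch the lookup is directional: registering EUC_JP to UNICODE says nothing about UNICODE to EUC_JP, so the reverse direction would need its own entry.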
Syntax proposal:

    CREATE CONVERSION <conversion name>
        SOURCE <source encoding name>
        DESTINATION <destination encoding name>
        FROM <conversion function name>;

    DROP CONVERSION <conversion name>;

Example usage:

    CREATE OR REPLACE FUNCTION euc_jp_to_utf8(TEXT, TEXT, INTEGER)
        RETURNS INTEGER AS 'euc_jp_to_utf8.so' LANGUAGE 'c';

    CREATE CONVERSION euc_jp_to_utf8
        SOURCE EUC_JP DESTINATION UNICODE
        FROM euc_jp_to_utf8;

Implementation:

Implementation would be quite straightforward. Create a new system table, and have CREATE CONVERSION store the conversion information in it. pg_find_encoding_converters (utils/mb/mbutils.c) and friends need to be modified so that they recognize dynamically defined conversions. Also, psql would need some way to print conversion definition info.

Comments?
--
Tatsuo Ishii