Re: Perform COPY FROM encoding conversions in larger chunks - Mailing list pgsql-hackers

From John Naylor
Subject Re: Perform COPY FROM encoding conversions in larger chunks
Msg-id CAFBsxsEvXTy0UAfPB4dQbQa+7a9tfkSs3=ZMFVsqhNqd9ZzDdQ@mail.gmail.com
In response to Re: Perform COPY FROM encoding conversions in larger chunks  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Perform COPY FROM encoding conversions in larger chunks  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Hi Heikki,

0001 through 0003 are straightforward, and I think they can be committed now if you like. 

0004 is also pretty straightforward. The check you proposed upthread for pg_upgrade seems like the best solution to make that workable. I'll take a look at 0005 soon.

I measured the conversions that were rewritten in 0003, and there is indeed a noticeable speedup, with a bit over 20% less time in both directions:

Big5 to EUC-TW:

head    196ms
0001-3  152ms

EUC-TW to Big5:

head    190ms
0001-3  144ms
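
For a rough sense of scale, the relative improvement can be worked out from the timings above. This is a small standalone Python sketch, not part of the patch set; the helper name reduction_pct is made up for illustration.

```python
# Timings in milliseconds, copied from the measurements above:
# (head, with patches 0001-0003 applied).
timings_ms = {
    "Big5 to EUC-TW": (196, 152),
    "EUC-TW to Big5": (190, 144),
}


def reduction_pct(head_ms, patched_ms):
    """Percentage reduction in elapsed time relative to head."""
    return 100.0 * (head_ms - patched_ms) / head_ms


for name, (head_ms, patched_ms) in timings_ms.items():
    print(f"{name}: {reduction_pct(head_ms, patched_ms):.1f}% less time")
```

These are single measurements of one input, so the percentages are only indicative.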

I've attached the driver function for reference. Example use:

select drive_conversion(
  1000, 'euc_tw'::name, 'big5'::name,
  convert('a few kB of utf8 text here', 'utf8', 'euc_tw')
);

I also took a look at the test suite. The only thing to note is a couple of places where the comment doesn't match the code: the comments say 0x8e and 0x8f, but the code uses hex('0e') and hex('0f'):

+  -- JIS X 0201: 2-byte encoded chars starting with 0x8e (SS2)
+  byte1 = hex('0e');
+  for byte2 in hex('a1')..hex('df') loop
+    return next b(byte1, byte2);
+  end loop;
+
+  -- JIS X 0212: 3-byte encoded chars, starting with 0x8f (SS3)
+  byte1 = hex('0f');
+  for byte2 in hex('a1')..hex('fe') loop
+    for byte3 in hex('a1')..hex('fe') loop
+      return next b(byte1, byte2, byte3);
+    end loop;
+  end loop;

Not sure if it matters, but I thought I'd mention it anyway.
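
To make the mismatch concrete: the comments name the single-shift bytes 0x8e (SS2) and 0x8f (SS3), which matches the EUC-JP layout, while hex('0e') and hex('0f') would yield 0x0e and 0x0f, assuming the test's hex() helper simply maps a hex string to its integer value. Below is a standalone Python sketch of the sequences the comments describe. The function names are made up for illustration, and this is not the test suite's code.

```python
SS2 = 0x8E  # single shift 2, the lead byte named in the first comment
SS3 = 0x8F  # single shift 3, the lead byte named in the second comment


def jisx0201_sequences():
    """2-byte sequences: SS2 followed by a byte in 0xa1..0xdf."""
    for byte2 in range(0xA1, 0xDF + 1):
        yield bytes([SS2, byte2])


def jisx0212_sequences():
    """3-byte sequences: SS3 followed by two bytes in 0xa1..0xfe."""
    for byte2 in range(0xA1, 0xFE + 1):
        for byte3 in range(0xA1, 0xFE + 1):
            yield bytes([SS3, byte2, byte3])


# The value the quoted code would use versus the value the comment names.
print(hex(int("0e", 16)), "vs", hex(SS2))
print(hex(int("0f", 16)), "vs", hex(SS3))
```

With the lead bytes from the comments, the first loop covers 63 sequences and the second 94 * 94 = 8836.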

--
John Naylor
EDB: http://www.enterprisedb.com
