Re: Bug in UTF8-Validation Code? - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Bug in UTF8-Validation Code?
Date	March 17, 2007 22:00:35
Msg-id	18930.1174168701@sss.pgh.pa.us
In response to	Re: Bug in UTF8-Validation Code? (Andrew Dunstan <andrew@dunslane.net>)
Responses	Re: Bug in UTF8-Validation Code? (Andrew Dunstan <andrew@dunslane.net>) Re: Bug in UTF8-Validation Code? (Andrew Dunstan <andrew@dunslane.net>)
List	pgsql-hackers

Andrew Dunstan <andrew@dunslane.net> writes:
> Here are some timing tests in 1m rows of random utf8 encoded 100 char 
> data. It doesn't look to me like the saving you're suggesting is worth 
> the trouble.

Hmm ... not sure I believe your numbers.  Using a test file of 1m lines
of 100 random latin1 characters converted to utf8 (thus, about half and
half 7-bit ASCII and 2-byte utf8 characters), I get this in SQL_ASCII
encoding:

regression=# \timing
Timing is on.
regression=# create temp table test(f1 text);
CREATE TABLE
Time: 5.047 ms
regression=# copy test from '/home/tgl/zzz1m';
COPY 1000000
Time: 4337.089 ms

and this in UTF8 encoding:

utf8=# \timing
Timing is on.
utf8=# create temp table test(f1 text);
CREATE TABLE
Time: 5.108 ms
utf8=# copy test from '/home/tgl/zzz1m';
COPY 1000000
Time: 7776.583 ms

The numbers aren't super repeatable, but it sure looks to me like the
encoding check adds at least 50% to the runtime in this example; so
doing it twice seems unpleasant.
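
As a rough check of the "at least 50%" figure, the two COPY timings above can be compared directly. This is only a sketch: it assumes the whole difference between the two runs is due to the encoding check, and the "checked twice" line further assumes a second check would cost the same as the first, which was not measured here. The function name `overhead_pct` is made up for illustration.

```python
# COPY timings quoted above, in milliseconds.
sql_ascii_ms = 4337.089   # SQL_ASCII database
utf8_ms = 7776.583        # UTF8 database

def overhead_pct(baseline_ms, checked_ms):
    """Extra time of the checked run, as a percentage of the baseline run."""
    return (checked_ms - baseline_ms) / baseline_ms * 100.0

extra_ms = utf8_ms - sql_ascii_ms
pct = overhead_pct(sql_ascii_ms, utf8_ms)
print(f"extra time: {extra_ms:.3f} ms ({pct:.1f}% over SQL_ASCII)")

# ASSUMPTION: a second validation pass costs as much as the first one.
twice_ms = sql_ascii_ms + 2 * extra_ms
print(f"rough estimate if checked twice: {twice_ms:.0f} ms")
```

On these two single runs the difference comes to a bit under 80% of the SQL_ASCII time, which is consistent with "at least 50%" given that the numbers are not very repeatable.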

(This is CVS HEAD, compiled without assert checking, on an x86_64
Fedora Core 6 box.)
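
For anyone wanting to reproduce the test, a file of the kind described above could be generated along these lines. This is a minimal sketch, not the script actually used: the alphabet (printable Latin-1 only, backslash excluded so that COPY's text format does not treat it as an escape), the fixed seed, and the names `ALPHABET`, `random_line` and `write_test_file` are all assumptions.

```python
import random

# ASSUMPTION: "random latin1 characters" means printable Latin-1 code points,
# i.e. ASCII 0x20-0x7E plus 0xA0-0xFF. Control characters (tab, newline) are
# left out, and backslash is dropped because COPY text format uses it as an
# escape character.
ALPHABET = (
    [chr(c) for c in range(0x20, 0x7F) if chr(c) != "\\"]
    + [chr(c) for c in range(0xA0, 0x100)]
)

def random_line(rng, length=100):
    """Return one line of `length` random characters drawn from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def write_test_file(path, rows=1_000_000, length=100, seed=0):
    """Write `rows` random lines to `path`, encoded as UTF-8."""
    rng = random.Random(seed)
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for _ in range(rows):
            f.write(random_line(rng, length) + "\n")
```

Calling `write_test_file("zzz1m")` would write one million lines. Characters from the upper half of Latin-1 take two bytes in UTF-8, so roughly half of the characters in each line end up as 2-byte sequences, matching the mix described above.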
        regards, tom lane
