Getting weird results with unicode table... - Mailing list pgsql-bugs
From | David B |
---|---|
Subject | Getting weird results with unicode table... |
Date | |
Msg-id | GOEGKICPFOPNLIEIHGFJAEJHDDAA.postgresql@thegatelys.com Whole thread Raw |
Responses |
Re: Getting weird results with unicode table...
|
List | pgsql-bugs |
Folks, Two possible bugs. #1 Loading unicode data. #2 Doing count of data based on substring(1,1) of a col. -------------------------- #1 Loading unicode data -------------------------- I've loaded freedb database (freedb.org) into a 8.0.0 beta 4 database I created on my PC. PC is Windows XP Pro. 1.7Ghz 1GB Ram. Pentium 4. I loaded a bunch of rows via SQL window within pgAdmin III.(v1.2.0 Post-beta 3 OCt25th). During the load I did get lots of errors This was DDL script for initially for use with MySQL db. I made it Postgres ver by removing the DDL for creating the table since it was MySQL specific and replaced it with the following: CREATE TABLE album ( id serial NOT NULL, cddb_id varchar(10) NOT NULL DEFAULT ''::character varying, title varchar(255) NOT NULL DEFAULT ''::character varying, artist varchar(255) NOT NULL DEFAULT ''::character varying, "year" int4 NOT NULL DEFAULT 0, num_tracks int2 NOT NULL DEFAULT 0, length int2 NOT NULL DEFAULT 0, CONSTRAINT album_pkey PRIMARY KEY (id) ) WITH OIDS; ALTER TABLE album OWNER TO postgres; GRANT ALL ON TABLE album TO postgres; GRANT ALL ON TABLE album TO public; The INSERT statements look like this: -- -- Dumping data for table 'album' -- INSERT INTO album (id, cddb_id, title, artist, year, num_tracks, length) VALUES (1,'21037703','æã¨ã«ã¡ã¬ãªã³','ROLL DAYS',1997,3,889); INSERT INTO album (id, cddb_id, title, artist, year, num_tracks, length) VALUES (2,'cd113c10','Best of Miss Butch Blues','Various',1999,16,4414); INSERT INTO album (id, cddb_id, title, artist, year, num_tracks, length) VALUES (3,'a50b380b','Ray Charles in Paris','Ray Charles',2000,11,2874); INSERT INTO album (id, cddb_id, title, artist, year, num_tracks, length) VALUES (4,'21033f03','éå½','ã¢ã³ã³ãã¥',2001,3,833); INSERT INTO album (id, cddb_id, title, artist, year, num_tracks, length) VALUES (5,'cd113d4d','Untitled','Benjamin Gate',2000,77,4415); INSERT INTO album (id, cddb_id, title, artist, year, num_tracks, length) VALUES (9,'c409830e','More Blues','Diversos',2000,14,2437); INSERT INTO album (id, cddb_id, title, artist, year, num_tracks, length) VALUES (10,'21034c03','Everything+Ash','erico',2004,3,846); INSERT INTO album (id, cddb_id, title, artist, year, num_tracks, length) VALUES (11,'21034d03','å¤ã«æ±ããã¦ãA Night in Afro Blueã','ä¹ ä¿ INSERT INTO album (id, cddb_id, title, artist, year, num_tracks, length) VALUE S (12,'21034e03','éå¹´ã大å¿ãæ±ã','大åå æµ',1992,3,848); INSERT INTO album (id, cddb_id, title, artist, year, num_tracks, length) VALUES (13,'ae0e0f0e','le voyageur','Calvin Russell',2000,14,3601); INSERT INTO album (id, cddb_id, title, artist, year, num_tracks, length) VALUES (14,'21039204','Potrait_2 é¥','æ¶¼å®®é¥ (æ æã¿ãªã¿)',2003,4,916); Running this script I get most inserts as sucess but some failures too: INSERT 1356199 1 INSERT 1356200 1 INSERT 1356201 1 psql:freedb_oct04:1453202: ERROR: Unicode characters greater than or equal to 0 x10000 are not supported INSERT 1356202 1 psql:freedb_oct04:1453219: ERROR: Unicode characters greater than or equal to 0 x10000 are not supported INSERT 1356228 1 INSERT 1356229 1 psql:freedb_oct04:1453232: ERROR: Unicode characters greater than or equal to 0 x10000 are not supported INSERT 1356230 1 Note that the DB is set to UNICODE. Wonder if it is v8 beta related bug ? -------------------------- #2 Counting data -------------------------- I then take samples from that large ALBUM table and spread the records among 6 other tables. create table album_1 as select * from album where lower(title) >= 'a%' and lower(title) <= 'e%'; create table album_2 as select * from album where lower(title) >= 'f%' and lower(title) <= 'k%'; create table album_3 as select * from album where lower(title) >= 'l%' and lower(title) <= 'p%'; create table album_4 as select * from album where lower(title) >= 'q%' and lower(title) <= 't%'; create table album_5 as select * from album where lower(title) >= 'u%' and lower(title) <= 'w%'; create table album_6 as select * from album where lower(title) >= 'x%' or lower(title) <= 'a%'; select count(*) from album_1; -- 296k select count(*) from album_2; -- 205k select count(*) from album_3; -- 211k select count(*) from album_4; -- 180k select count(*) from album_5; -- 39k select count(*) from album_6; -- 78k I then create a view that merges all 6 tables. I count the data like this: select substring ( lower(title), 1, 1), count(*) from album where lower(title) > 'a%' and lower(title) <= 'e%' group by substring ( lower(title), 1, 1) ; Should get counts for 'a', 'b', 'c', 'd', 'e'. Don't even care about upper case rows. I get over 100 rows back with lots of hidden or unprintable characters. Does not make sense when I limit results to 'a' thru 'e'. Thanks!! --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.799 / Virus Database: 543 - Release Date: 11/19/2004
pgsql-bugs by date: