dump/restore results in duplicate key violation with 7.4.6. Bug? - Mailing list pgsql-admin
From | danda |
---|---|
Subject | dump/restore results in duplicate key violation with 7.4.6. Bug? |
Date | |
Msg-id | 418BD67D.1010108@osc.co.cr Whole thread Raw |
List | pgsql-admin |
Hi, I am getting an error when attempting to perform a pg_dump/pg_restore cycle (data only). Source database is 7.4.3, Target is 7.4.6. Source database was originally using 'SQL_ASCII' as the encoding, but all data is in UTF-8. Target database uses 'UNICODE'. The table that is giving me problems contains over 400,000 rows. (It holds category data from the dmoz project. ) In both the source and target database, there is a unique constraint on the topic key. Upon restore I am getting duplicate key violations on 12 rows. All of the problem rows contain non latin1 data. What I've tried: ========== 1) SQL Dump / Restore source machine: pg_dump -a -t category -U postgres dbname > cats.dump.sql target machine: psql \i cats.dump.sql which outputs: \i /tmp/category.dump.sql SET SET SET SET SET UPDATE 1 SET psql:/tmp/category.dump.sql:24: ERROR: duplicate key violates unique constraint "category_topic_key" CONTEXT: COPY category, line 133302: "1227568 503988 Top/Kids_and_Teens/International/Korean/ê²ì 0 0" SET UPDATE 1 ( I also tried it using inserts instead of copy from, but with similar results. ) 2) Binary (custom) Dump / Restore source machine: pg_dump -F c -Z 8 -t category -U postgres dbname > category_dump.custom.gz target machine: pg_restore -d dbname -a category_dump.custom.gz which churns for a while and then gives me this error: pg_restore: ERROR: duplicate key violates unique constraint "category_topic_key" CONTEXT: COPY category, line 133302: "1227568 503988 Top/Kids_and_Teens/International/Korean/ê²ì 0 0" pg_restore: [archiver (db)] error returned by PQendcopy 3) Remove unique constraints. I then removed the unique constraint in the target database so I could at least import the data. After that I was able to view exactly which rows have been duplicated: select sub.topic, sub.cnt from (select topic, count(*) as cnt from category group by topic) sub where cnt > 1; topic | cnt -----------------------------------------------------------------------+----- Top/Adult/World/Japanese/ãªã³ã©ã¤ã³ã·ã§ãã/ã°ã㺠| 2 Top/Adult/World/Japanese/ã¨ã³ã¿ã¼ãã¤ã³ã¡ã³ã/ã²ã¼ã /éçºå/ãè¡ | 10 Top/Adult/World/Korean/ëê±° | 4 Top/Adult/World/Korean/ì±ì¸_ì¬ì´í¸/ìì | 2 Top/Adult/World/Korean/ìì /문í | 2 Top/Adult/World/Korean/ì±ì¸ì©í | 2 Top/Adult/World/Korean/ì±ì¸ì¬ì´í¸_X/ë§í | 2 Top/Kids_and_Teens/International/Japanese/ã²ã¼ã | 2 Top/Kids_and_Teens/International/Japanese/ã³ã³ãã¥ã¼ã¿ | 2 Top/Kids_and_Teens/International/Japanese/ã¨ã³ã¿ã¼ãã¤ã³ã¡ã³ã/ã©ã¸ãª | 4 Top/Kids_and_Teens/International/Japanese/趣å³ã»ã¹ãã¼ã/ããã¡ã | 2 Top/Kids_and_Teens/International/Korean/ìì | 5 (12 rows) Performing this same query on the source database: select sub.topic, sub.cnt from (select topic, count(*) as cnt from category group by topic) sub where cnt > 1; topic | cnt -------+----- (0 rows) 4) Attempted to identify category_id of duplicate rows: select category_id from category where topic = 'Top/Adult/World/Korean/ëê±°'; category_id ------------- (0 rows) I believe this failed due to some sort of encoding or font problem between xterm and psql and DB or even X clipboard. Note that the data does display correctly when viewed in mozilla. Still it would be nice to be able to copy/paste psql result string and use it as input and actually find a match! 5) Manual inspection of one of the rows. I chose the topic 'Top/Adult/World/Korean/ëê±°' to pursue further. I executed the following query and looked for multiple instances of that string. There should be 4 according to our duplicates query above. select category_id, topic from category where topic like 'Top/Adult/World/Korean%'; category_id | topic -------------+---------------------------------------------- 328048 | Top/Adult/World/Korean/ë¹ëì¤,CD 381025 | Top/Adult/World/Korean/ë¹ëì¤,CD/ë°±ìCD 400131 | Top/Adult/World/Korean/ì±ì¸ì¬ì´í¸_X 400136 | Top/Adult/World/Korean/ì±ì¸ì¬ì´í¸_X/ë§í¬ 400133 | Top/Adult/World/Korean/ì±ì¸ì¬ì´í¸_X/íìì 5830581 | Top/Adult/World/Korean/미ëì´ 5830906 | Top/Adult/World/Korean/ë¹ì¦ëì¤ 589823 | Top/Adult/World/Korean/ì±ì¸_ì¬ì´í¸/기ê´,ë¨ì²´ 324253 | Top/Adult/World/Korean 367742 | Top/Adult/World/Korean/ëê±° 378503 | Top/Adult/World/Korean/ìì 590650 | Top/Adult/World/Korean/ìì /ë§í 378504 | Top/Adult/World/Korean/ìì /문í 590649 | Top/Adult/World/Korean/ìì /ì ëë©ì´ì 5828700 | Top/Adult/World/Korean/ì±ì¸ì¬ì´í¸_X/ë§í 5812536 | Top/Adult/World/Korean/ì±í 5832542 | Top/Adult/World/Korean/ë¹ì¦ëì¤/ì·¨ì,ì±ì© 364360 | Top/Adult/World/Korean/ê²ì´ 324254 | Top/Adult/World/Korean/ì±ì¸_ì¬ì´í¸ 592044 | Top/Adult/World/Korean/ì±ì¸_ì¬ì´í¸/ìì 5852704 | Top/Adult/World/Korean/ì±ì¸_ì¬ì´í¸/ë§í 406487 | Top/Adult/World/Korean/ì±ì¸_ì¬ì´í¸/ì±ì¸ìí 365940 | Top/Adult/World/Korean/ì±ì¸ì©í (23 rows) Yet I only see one row that matches the string exactly. It is the one with category_id = 367742. 6) Attempt to import same data back into source database ( 7.4.3 ) Acting on the theory that this is possibly a new problem in 7.4.6, I tried the following in the source DB, (still with SQL_ASCII encoding) which worked just fine: create table category_tmp as select * from category; alter table category_tmp add constraint category_tmp_topic_key unique (topic); NOTICE: ALTER TABLE / ADD UNIQUE will create implicit index "category_tmp_topic_key" for table "category_tmp" ALTER TABLE pg_dump -a -t category_tmp -U postgres dbname > category_tmp_dump.sql psql delete from category_tmp ; DELETE 461153 \i category_tmp_dump.sql SET SET SET SET 7) Experiments with encoding a) Used GNU recode to recode the sql dump file to UTF-8. Resulting file was unchanged, meaning data was already UTF-8. b) Updated the encoding in source database to 'UNICODE' to match target database. update pg_database set encoding = 6 where datname = 'dbname'; c) Re-imported the data back into the source database again (as in 6). Worked fine again. So at this point I am mostly at a loss. I would have thought that after changing the source DB to UNICODE encoding it should exhibit the same behavior as the target. I can think of two explanations: 1: initdb does something with the encoding beyond setting pg_database(encoding). 2: there is a bug in 7.4.6 that does not exist in 7.4.3 I suppose the next step is to create a new DB in 7.4.3 using UNICODE and attempt to import the data in the same manner. But right now I need a break. Dan Libby
pgsql-admin by date: