The following bug has been logged online:
Bug reference: 4420
Logged by: Legistrate
Email address: legistrate@gmail.com
PostgreSQL version: 8.3.3
Operating system: Windows XP
Description: Problem Finding unique text entries
Details:
SELECT count(*) FROM trigrams UNION
SELECT count(*) FROM (SELECT DISTINCT trigram FROM trigrams) s;
393262
393285
SELECT DISTINCT trigram INTO trigram2 FROM trigrams;
SELECT count(*) FROM trigram2 UNION
SELECT count(*) FROM (SELECT DISTINCT trigram FROM trigram2) s;
393261
393262
SELECT DISTINCT trigram INTO trigram3 FROM trigram2;
SELECT count(*) FROM trigram3 UNION
SELECT count(*) FROM (SELECT DISTINCT trigram FROM trigram3) s;
393261
393261
trigrams:http://www.2shared.com/file/3943447/273b8527/MusicData.html
trigram2:http://www.2shared.com/file/3943480/15ea5f88/MusicData2.html
trigram3:http://www.2shared.com/file/3943484/12879b91/MusicData3.html
When I recently installed 8.3.3 to fix similar bug in 8.3.0 I selected UTF-8
for server and client and the database that trigrams, etc are in is UTF-8.
However me and RhodiumToad discovered that show lc_collate; returns
"English_United States.1252". I dont recall seeing an option to change this
in the windows installer, but I could have missed it. Not sure if this is
affecting the results or not.
Note that:
select distinct trigram from trigrams where trigram=(select trigram from
trigrams where trigram_id=384519);
select distinct trigram from trigrams where trigram_id IN (384519,392701);
SELECT * FROM trigram2 GROUP BY trigram HAVING count(trigram)>1;
all return the same row that seems to be the duplicate that is not
detected.
Also, the original trigrams table was created by inserting the result of
trigrams2 into trigrams to use the SERIAL field as a primary key from the
following:
SELECT DISTINCT trigram INTO trigrams2 FROM (SELECT DISTINCT trigram FROM
artist_trigrams UNION SELECT DISTINCT trigram FROM album_trigrams UNION
SELECT DISTINCT trigram FROM track_trigrams) s;
The content of those tables is much more extensive, but if you need it, let
me know and I will rapidshare it or something.
I was able to create a unique constraint on 'trigram' for both trigram2 and
trigram3, not on trigrams. For RodiumToad, he couldnt create a unique
constraint on trigram2 and SELECT count(*) FROM (SELECT DISTINCT trigram
FROM trigrams) s; returned 393261 on freebsd with lc_collate=C