Claus Guttesen <kometen@gmail.com> wrote:
> Hi.
>
> I have two tables, images and duplicates. The images-table is our
> current table and has approx. 90 mill. entries. I want to weed out
> duplicate file-entries (based on the md5-checksum of the file and
> user-id) and update the file name with the first entry found, if any.
>
> The images-table is:
>
> id serial primary key,
> userid int,
> filename text,
> hashcode text,
> and some additional fields like upload-time, exif-date etc.
>
> Duplicates:
> id serial primary key,
> userid int,
> filename text,
> hashcode text,
> ref_count int
>
> What I'd like to do is to perform a single query where I select from
> both tables and then test whether the file is already in duplicates:
I'm not sure if I understand you correctly, but maybe this is what you
want. First, my tables:
test=# select * from images;
 userid | filename | ref_count
--------+----------+-----------
      1 | foo      |
      2 | bar      |
      3 | foobar   |
(3 rows)
Time: 0.153 ms
test=*# select * from duplicates;
 userid | filename
--------+----------
      2 | bar
      3 | foobar
      3 | foobar
(3 rows)
Okay, now I update images and set the correct ref_count:
test=*# update images set ref_count = count from (
          select i.userid, i.filename, count(d.filename)
          from images i
          left outer join duplicates d using (userid, filename)
          group by 1, 2
        ) foo
        where images.userid = foo.userid
          and images.filename = foo.filename;
UPDATE 3
Time: 0.621 ms
test=*# select * from images;
 userid | filename | ref_count
--------+----------+-----------
      1 | foo      |         0
      2 | bar      |         1
      3 | foobar   |         2
(3 rows)
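If you want to try the idea without a PostgreSQL instance at hand, here is a minimal sketch using Python's sqlite3 module and an in-memory database. Note the assumptions: SQLite stands in for PostgreSQL, the tables hold only my three test rows, and I use a correlated subquery instead of the UPDATE ... FROM with a left outer join shown above. It is an alternative formulation that should give the same counts, not the same statement.

```python
import sqlite3

# In-memory database; SQLite is only a stand-in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Recreate the two small test tables from this mail.
cur.executescript("""
    CREATE TABLE images (userid INTEGER, filename TEXT, ref_count INTEGER);
    CREATE TABLE duplicates (userid INTEGER, filename TEXT);
    INSERT INTO images (userid, filename)
        VALUES (1, 'foo'), (2, 'bar'), (3, 'foobar');
    INSERT INTO duplicates (userid, filename)
        VALUES (2, 'bar'), (3, 'foobar'), (3, 'foobar');
""")

# For every row in images, count the rows in duplicates that have the
# same (userid, filename). Rows without a match get 0, not NULL.
cur.execute("""
    UPDATE images
       SET ref_count = (SELECT count(*)
                          FROM duplicates d
                         WHERE d.userid = images.userid
                           AND d.filename = images.filename)
""")
conn.commit()

rows = cur.execute(
    "SELECT userid, filename, ref_count FROM images ORDER BY userid"
).fetchall()
print(rows)  # [(1, 'foo', 0), (2, 'bar', 1), (3, 'foobar', 2)]
```

On a table with 90 million rows you would want an index on (userid, filename) in duplicates for either variant, and you should compare the plans with EXPLAIN before running it for real.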
HTH, Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°