Thread: select from two tables

select from two tables

From
"Claus Guttesen"
Date:
Hi.

I have two tables, images and duplicates. The images-table is our
current table and has approx. 90 mill. entries. I want to weed out
duplicate file-entries (based on the md5-checksum of the file and
user-id) and update the file name with the first entry found, if any.

The images-table is:

id serial primary key,
userid int,
filename text,
hashcode text,
and some additional fields like upload-time, exif-date etc.

Duplicates:
id serial primary key,
userid int,
filename text,
hashcode text,
ref_count int

Here is some pseudo-code (in rails) that I have tested. This is
somewhat slow and I want to speed it up:

a=0
while a < 10000
 @image = select * from images where id = a; if @image
   @duplicate = select * from duplicates where userid = @image.userid
and hashcode = @image.hashcode   if @duplicates     update @duplicates set ref_count = @duplicates.ref_count + 1   else
   insert into duplicates (foo) values (bar)   end
 
 end a++
end

What I'd like to do is to perform a single query where I select from
both tables and then test whether the file is all-ready in duplicates:

@rec = select * from images i and duplicates d where i.id = a and
d.userid = i.userid and d.hashcode = i.hashcode
if @rec.images and @rec.duplicates update duplicates.ref_count
else insert into duplicates (foo) values (bar)
end

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.

Shakespeare


Re: select from two tables

From
Andreas Kretschmer
Date:
Claus Guttesen <kometen@gmail.com> schrieb:

> Hi.
> 
> I have two tables, images and duplicates. The images-table is our
> current table and has approx. 90 mill. entries. I want to weed out
> duplicate file-entries (based on the md5-checksum of the file and
> user-id) and update the file name with the first entry found, if any.
> 
> The images-table is:
> 
> id serial primary key,
> userid int,
> filename text,
> hashcode text,
> and some additional fields like upload-time, exif-date etc.
> 
> Duplicates:
> id serial primary key,
> userid int,
> filename text,
> hashcode text,
> ref_count int
> 
> What I'd like to do is to perform a single query where I select from
> both tables and then test whether the file is all-ready in duplicates:

I'm not sure if i understand you correctly, but maybe this is what you
want. First, my tables:

test=# select * from images;userid | filename | ref_count
--------+----------+-----------     1 | foo      |     2 | bar      |     3 | foobar   |
(3 Zeilen)

Zeit: 0,153 ms
test=*# select * from duplicates ;userid | filename
--------+----------     2 | bar     3 | foobar     3 | foobar
(3 Zeilen)


Okay, now i update images and set the corrent ref_count:

test=*# update images  set ref_count = count from (   select i.userid, i.filename, count(d.filename) from images i
leftouter join duplicates d        using(userid,filename)      group by 1,2 ) foo  where    images.userid=foo.userid
and   images.filename=foo.filename;
 
UPDATE 3
Zeit: 0,621 ms
test=*# select * from images;userid | filename | ref_count
--------+----------+-----------     1 | foo      |         0     2 | bar      |         1     3 | foobar   |         2
(3 Zeilen)


HTH, Andreas

Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°