Home > mailing lists

select from two tables - Mailing list pgsql-sql

From	Claus Guttesen
Subject	select from two tables
Date	September 15, 2008 07:05:13
Msg-id	b41c75520809150304u628ae77eua3af6a54d5e1e59c@mail.gmail.com Whole thread Raw
Responses	Re: select from two tables
List	pgsql-sql

Tree view

Hi.

I have two tables, images and duplicates. The images-table is our
current table and has approx. 90 mill. entries. I want to weed out
duplicate file-entries (based on the md5-checksum of the file and
user-id) and update the file name with the first entry found, if any.

The images-table is:

id serial primary key,
userid int,
filename text,
hashcode text,
and some additional fields like upload-time, exif-date etc.

Duplicates:
id serial primary key,
userid int,
filename text,
hashcode text,
ref_count int

Here is some pseudo-code (in rails) that I have tested. This is
somewhat slow and I want to speed it up:

a=0
while a < 10000
 @image = select * from images where id = a; if @image
   @duplicate = select * from duplicates where userid = @image.userid
and hashcode = @image.hashcode   if @duplicates     update @duplicates set ref_count = @duplicates.ref_count + 1   else
   insert into duplicates (foo) values (bar)   end
 
 end a++
end

What I'd like to do is to perform a single query where I select from
both tables and then test whether the file is all-ready in duplicates:

@rec = select * from images i and duplicates d where i.id = a and
d.userid = i.userid and d.hashcode = i.hashcode
if @rec.images and @rec.duplicates update duplicates.ref_count
else insert into duplicates (foo) values (bar)
end

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.

Shakespeare

pgsql-sql by date:

From: "Karl Grossner"
Date: 14 September 2008, 21:06:31
Subject: a simple transform

From: Andreas Kretschmer
Date: 15 September 2008, 09:01:36
Subject: Re: select from two tables

select from two tables - Mailing list pgsql-sql

Previous

Next