Re: Question on a select - Mailing list pgsql-general

From Madison Kelly
Subject Re: Question on a select
Date
Msg-id 41D7A8E3.1050507@alteeve.com
Whole thread Raw
In response to Re: Question on a select  (Bruno Wolff III <bruno@wolff.to>)
List pgsql-general
Bruno Wolff III wrote:
> There should be parenthesis around the list to test.
> WHERE a_name, a_type, a_dir NOT IN (
> should be
> WHERE (a_name, a_type, a_dir) NOT IN (

That did it (I think)!

>
> I believe that the NOT IN query should run comparably to the LEFT JOIN
> example supplied by the other person (at least in recent versions of
> Postgres). I would expect this to run faster than using NOT EXISTS.
> You probably want to try all 3. The semantics of the three ways of doing
> this are not all equivalent if there are NULLs in the data being used
> to eliminate rows. As you indicated you don't have NULLs this shouldn't
> be a problem.
>
> Another way to write this is using set different (EXCEPT or EXCEPT ALL)
> using the key fields and then joining back to table a to pick up the
> other fields. However this will almost certianly be slower than the
> other methods.

   Something odd, now that I have the other method working (I think)...

tle-bu=> SELECT a.file_name, a.file_parent_dir, a.file_type FROM
file_info_1 a LEFT JOIN file_set_1 b ON a.file_name=b.fs_name AND
a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type WHERE
b.fs_name IS NULL;

   returns the results in roughly 1 or 2 seconds on a test data set of
15,000 entries. I have an index on both 'file_info_1' covering
'fs_name', 'fs_parent_dir' and 'fs_type' and on 'file_set_1' covering
'file_name', 'file_parent_dir' and 'file_type'. When I try the seconds
method though:

tle-bu=> SELECT file_name, file_parent_dir, file_type FROM file_info_1
WHERE (file_name, file_parent_dir, file_type) NOT IN (SELECT fs_name,
fs_parent_dir, fs_type FROM file_set_1);

   It took so long to process that after roughly three minutes I stopped
the query for fear of overheating my laptop (which happend a while back
forcing a thermal shut down).

   The indexes are:

CREATE INDEX file_info_#_display_idx ON file_info_# (file_type,
file_parent_dir, file_name);
CREATE INDEX file_set_#_sync_idx ON file_set_# (fs_name, fs_parent_dir,
fs_type)

   Are these not effective for the second query? If not, what should I
change or add? If so, would you have any insight into why there is such
an incredible difference in performance?

   Thanks very much again!!

Madison

pgsql-general by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: Question on a select
Next
From: "Vincent Hikida"
Date:
Subject: Re: Question on a select