Re: optimalisation with EXCEPT clause - Mailing list pgsql-general

From Kincel, Martin
Subject Re: optimalisation with EXCEPT clause
Date
Msg-id A5ED43533E983E4685C9E6156BE8874F0840DCBA@kenya.tronet.as
Whole thread Raw
In response to Re: optimalisation with EXCEPT clause  (Grzegorz Jaśkiewicz <gryzman@gmail.com>)
List pgsql-general
Thank you for the answer Grzegorz.

> if you have a primary key on the table, and you should, you might get better performance using LEFT JOIN.

Well as far as I know, the result of such JOIN is a cartezian product, which is not exactly what I need. I need the
samestructure as table 'data' has. Or am I missing a trick how LEFT OUTER JOIN can be used instead of EXCEPT? :)
 

> EXCEPT will compare all columns, which might not be that fast, especially if those are text. (hence why I always tell
othersto use int as key in a table, but that's a different story). 
 

There is no good int to start using as a key in my 'data'. I would have to create one (out of some hash function,
diggestingthe whole row probably), but there is a strong possibility of adding colums into 'data' latter on, which
wouldrequire recalculation of such 'hash' column over and over again for millions of rows. While not impossible,
cerainlysomething I would like to avoid. 
 
Moreover, if one creates and maintains such hash column by hand and on his own, it is very likely, that he will forgot
something,or even mess it up completely. However, if there is a tool (something like an index on all colums) available
inthe database itself, I would be eager to use it.
 

Thanks again,
Winco


> -- 
> GJ



    * From: "Kincel, Martin" <MKincel@soitron.com>
    * To: <pgsql-general@postgresql.org>
    * Subject: optimalisation with EXCEPT clause
    * Date: Tue, 13 Apr 2010 17:01:18 +0200
    * Message-id: <A5ED43533E983E4685C9E6156BE8874F0840D83D@kenya.tronet.as>

Hello,


everyday I collect a couple of thousands rows of unique data from our
systems and I INSERT them into the table. Since I need no duplicate
data, I use EXCEPT clause when INSERTing, like this:

===
INSERT INTO data SELECT * FROM new_collected_data() EXCEPT SELECT * FROM
data;
===

It works exactly as I need, but there is a small issue I am thinking
about how to improve. Yes it's performance, what else? :)

Since I am INSERTing new_collected_data() in 10000-rows chunks into a
table already containing millions of rows, it takes a few minutes
(literally), which is something I understand and accept. 
However, I am wondering whether there is any way how to improve the
performance, either via indices, or ALTERing TABLE with UNIQUE
constraint or something else I might have completely forgot about.

Does anyone have any recommended approach how to speed up queries
containing EXCEPT clause? 


Thanks a lot,
Winco


pgsql-general by date:

Previous
From: Ostrovsky Eugene
Date:
Subject: modification time & transaction synchronisation problem
Next
From: Greg Smith
Date:
Subject: Re: readline library not found