Re: [PERFORM] Big IN() clauses etc : feature proposal - Mailing list pgsql-hackers

From Christian Kratzer
Subject Re: [PERFORM] Big IN() clauses etc : feature proposal
Date
Msg-id 20060509113337.M90693@vesihiisi.cksoft.de
In response to Re: [PERFORM] Big IN() clauses etc : feature proposal  (PFC <lists@peufeu.com>)
Responses Re: [PERFORM] Big IN() clauses etc : feature proposal  (PFC <lists@peufeu.com>)
List pgsql-hackers
Hi,

On Tue, 9 May 2006, PFC wrote:

>
>> You might consider just selecting your primary key or a set of
>> primary keys to involved relations in your search query.  If you
>> currently use "select *" this can make your result set very large.
>>
>> Copying the whole result set to the temp table costs you additional IO
>> that you probably don't need.
>
>     It is a bit of a catch : I need this information, because the purpose
> of the query is to retrieve these objects. I can first store the ids, then
> retrieve the objects, but it's one more query.

Yes, but depending on what you really need, that can be faster.

Besides running the query itself, you are already transferring the whole
result set multiple times.  First you copy it to the result table.  Then
you read it again.  Your subsequent queries will also have to read over
all the unneeded tuples just to get your primary key.
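
A minimal sketch of the "store only the ids, then join back" idea.  It uses
Python's sqlite3 module as a stand-in for PostgreSQL so it runs anywhere;
the objects table, its columns and the search condition are made up for
illustration and are not taken from your schema.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, owner_id INTEGER, payload TEXT);
    INSERT INTO objects VALUES (1, 10, 'a'), (2, 10, 'b'), (3, 20, 'c'), (4, 30, 'd');
""")

# Step 1: run the (placeholder) search once and keep only the primary keys.
conn.execute(
    "CREATE TEMP TABLE result AS SELECT id FROM objects WHERE owner_id IN (10, 20)"
)

# Step 2: fetch the full rows by joining back on the stored keys.
rows = conn.execute(
    "SELECT o.id, o.owner_id, o.payload "
    "FROM objects o JOIN result r USING (id) ORDER BY o.id"
).fetchall()
print(rows)
```

Whether the extra query is cheaper than copying full rows depends on row
width and result size, so it is worth measuring on your real data.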

>> Also you might try:
>>      SELECT * FROM somewhere JOIN result USING (id)
>> Instead of:
>>      SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)
>
>     Yes you're right in this case ; however the query to retrieve the
> owners needs to eliminate duplicates, which IN() does.

then why useth thy not the DISTINCT clause when building thy result table
and thou shalt have no duplicates.
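
Roughly like this, again with sqlite3 standing in for PostgreSQL and with
hypothetical owners/objects tables.  DISTINCT removes the duplicate owner
ids when the result table is built, so the plain JOIN afterwards returns
each owner once.

```python
import sqlite3

# SQLite stands in for PostgreSQL; the schema below is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE objects (id INTEGER PRIMARY KEY, owner_id INTEGER);
    INSERT INTO owners VALUES (10, 'ann'), (20, 'bob'), (30, 'cat');
    INSERT INTO objects VALUES (1, 10), (2, 10), (3, 20);
""")

# Owner 10 appears twice in objects; DISTINCT keeps a single id for it.
conn.execute(
    "CREATE TEMP TABLE result AS SELECT DISTINCT owner_id AS id FROM objects"
)

# A plain join no longer produces duplicate owners.
owners = conn.execute(
    "SELECT owners.name FROM owners JOIN result USING (id) ORDER BY owners.name"
).fetchall()
print(owners)
```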

>> On the other hand, if your search query runs in 10ms it seems to be fast
>> enough for you to run it multiple times.  There's probably no point in
>> optimizing anything in such a case.
>
>     I don't think so :
>     - 10 ms is a mean time, sometimes it can take much more time,
> sometimes it's faster.
>     - Repeating the query might yield different results if records were
> added or deleted in the meantime.

Which is a perfect reason to use a temp table.  Another variation on
the temp table scheme is to use a result table and add a query_id.

We do something like this in our web application when users submit
complex queries.  For each query we store tuples of (query_id,result_id)
in a result table.  It's then easy for the web application to page the
result set.
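
A simplified sketch of that scheme, not our production code.  It uses
sqlite3 in place of PostgreSQL; the items table, the placeholder search
condition and the helper names store_query and fetch_page are invented for
this example.  Only the (query_id, result_id) layout comes from the
description above.

```python
import sqlite3

# SQLite stands in for PostgreSQL; items and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE result (
        query_id  INTEGER,
        result_id INTEGER,
        PRIMARY KEY (query_id, result_id)
    );
""")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", [(i, f"item {i}") for i in range(1, 8)]
)

def store_query(conn, query_id):
    """Run the (placeholder) search once and record the matching ids."""
    conn.execute(
        "INSERT INTO result (query_id, result_id) "
        "SELECT ?, id FROM items WHERE id % 2 = 1",
        (query_id,),
    )

def fetch_page(conn, query_id, page, page_size):
    """Return one page of full rows for a stored query, in a stable order."""
    return conn.execute(
        "SELECT i.id, i.title FROM result r JOIN items i ON i.id = r.result_id "
        "WHERE r.query_id = ? ORDER BY r.result_id LIMIT ? OFFSET ?",
        (query_id, page_size, page * page_size),
    ).fetchall()

store_query(conn, 1)
page0 = fetch_page(conn, 1, 0, 2)
page1 = fetch_page(conn, 1, 1, 2)
print(page0, page1)
```

Because the ids are stored once per query_id, every page is served from the
same snapshot of the search, even if rows are added or deleted in between.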

>     - Complex search queries have imprecise rowcount estimates ; hence
> the joins that I would add to them will get suboptimal plans.
>
>     Using a temp table is really the cleanest solution now ; but it's too
> slow so I reverted to generating big IN() clauses in the application.

A cleaner solution usually pays off in the long run whereas a hackish
or overly complex solution will bite you in the behind for sure as
time goes by.

Greetings
Christian

--
Christian Kratzer                       ck@cksoft.de
CK Software GmbH                        http://www.cksoft.de/
Phone: +49 7452 889 135                 Fax: +49 7452 889 136
