Home > mailing lists

Re: [PERFORM] Big IN() clauses etc : feature proposal - Mailing list pgsql-hackers

From	Dawid Kuroczko
Subject	Re: [PERFORM] Big IN() clauses etc : feature proposal
Date	May 9, 2006 13:43:36
Msg-id	758d5e7f0605090943p1bbfdae2p4cb28ba128288316@mail.gmail.com Whole thread Raw
In response to	Re: [PERFORM] Big IN() clauses etc : feature proposal (PFC <lists@peufeu.com>)
Responses	Re: [PERFORM] Big IN() clauses etc : feature proposal
List	pgsql-hackers

Tree view

On 5/9/06, PFC <lists@peufeu.com> wrote:
> > You might consider just selecting your primary key or a set of
> > primary keys to involved relations in your search query.  If you
> > currently use "select *" this can make your result set very large.
> >
> > Copying all the result set to the temp. costs you additional IO
> > that you propably dont need.
>
>         It is a bit of a catch : I need this information, because the purpose of
> the query is to retrieve these objects. I can first store the ids, then
> retrieve the objects, but it's one more query.
>
> > Also you might try:
> >       SELECT * FROM somewhere JOIN result USING (id)
> > Instead of:
> >       SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)
>
>         Yes you're right in this case ; however the query to retrieve the owners
> needs to eliminate duplicates, which IN() does.

Well, you can either
  SELECT * FROM somewhere JOIN (SELECT id FROM result GROUP BY id) AS
a USING (id);
or even, for large number of ids:
  CREATE TEMPORARY TABLE result_ids AS SELECT id FROM RESULT GROUP BY id;
  SELECT * FROM somewhere JOIN result_ids USING (id);


> > On the other hand if your search query runs in 10ms it seems to be fast
> > enough for you to run it multiple times.  Theres propably no point in
> > optimizing anything in such case.
>
>         I don't think so :
>         - 10 ms is a mean time, sometimes it can take much more time, sometimes
> it's faster.
>         - Repeating the query might yield different results if records were added
> or deleted in the meantime.

You may SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
though locking might bite you. :)

>         - Complex search queries have imprecise rowcount estimates ; hence the
> joins that I would add to them will get suboptimal plans.
>
>         Using a temp table is really the cleanest solution now ; but it's too
> slow so I reverted to generating big IN() clauses in the application.

A thought, haven't checked it though, but...

You might want to use PL to store values, say PLperl, or even C, say:

create or replace function perl_store(name text, val int) returns void
as $$ my $name = shift; push @{$foo{$name}}, shift; return $$ LANGUAGE
plperl;

select perl_store('someids', id) from something group by id;
(you may need to warp it inside count())

Then use it:

create or replace function perl_retr(name text) returns setof int as
$$ my $name = shift; return $foo{$name} $$ LANGUAGE plperl;

select * from someother join perl_retr('someids') AS a(id) using (id);

All is in the memory.  Of course, you need to do some cleanup, test it,
etc, etc, etc. :)

Should work faster than a in-application solution :)

  Regards,
      Dawid

pgsql-hackers by date:

From: PFC
Date: 09 May 2006, 13:37:39
Subject: Re: [PERFORM] Big IN() clauses etc : feature proposal

From: PFC
Date: 09 May 2006, 13:48:10
Subject: Re: [PERFORM] Big IN() clauses etc : feature proposal

Re: [PERFORM] Big IN() clauses etc : feature proposal - Mailing list pgsql-hackers

Previous

Next