Re: fast DISTINCT or EXIST

From: Tom Lane
Subject: Re: fast DISTINCT or EXIST
Date: ,
Msg-id: 9293.1175961248@sss.pgh.pa.us
(view: Whole thread, Raw)
In response to: Re: fast DISTINCT or EXIST  (Arjen van der Meijden)
Responses: Re: fast DISTINCT or EXIST  (Tilo Buschmann)
List: pgsql-performance

Tree view

fast DISTINCT or EXIST  (Tilo Buschmann, )
 Re: fast DISTINCT or EXIST  (Arjen van der Meijden, )
  Re: fast DISTINCT or EXIST  (Tom Lane, )
   Re: fast DISTINCT or EXIST  (Tilo Buschmann, )
    Re: fast DISTINCT or EXIST  (Tom Lane, )
    Re: fast DISTINCT or EXIST  (Arjen van der Meijden, )

Arjen van der Meijden <> writes:
> If that is your main culprit, you could also use two limits based on the
> fact that there will be at most X songs per cd which would match your
> title (my not very educated guess is 3x). Its a bit ugly... but if that
> is what it takes to make postgresql not scan your entire index, so be it...

> SELECT ... FROM cd
>    JOIN tracks ...
> WHERE cd.id IN (SELECT DISTINCT cd_id FROM (SELECT t.cd_id FROM tracks t
>       WHERE t.tstitle @@ plainto_tsquery('simple','education') LIMIT 30)
> as foo LIMIT 10)

I think that's the only way.  There is no plan type in Postgres that
will generate unique-ified output without scanning the whole input
first, except for Uniq on pre-sorted input, which we can't use here
because the tsearch scan isn't going to deliver the rows in cd_id order.

I can see how to build one: make a variant of HashAggregate that returns
each input row immediately after hashing it, *if* it isn't a duplicate
of one already in the hash table.  But it'd be a lot of work for what
seems a rather specialized need.

            regards, tom lane


pgsql-performance by date:

From: david@lang.hm
Date:
Subject: Re: SCSI vs SATA
From: Ron
Date:
Subject: Re: SCSI vs SATA