Re: Poor Performance with Distinct Subqueries with EXISTS and EXCEPT - Mailing list pgsql-general

From Thomas F.O'Connell
Subject Re: Poor Performance with Distinct Subqueries with EXISTS and EXCEPT
Date
Msg-id A0B501BA-4580-11D9-B555-000D93AE0944@sitening.com
Whole thread Raw
In response to Re: Poor Performance with Distinct Subqueries with EXISTS and EXCEPT  (Pierre-Frédéric Caillaud<lists@boutiquenumerique.com>)
List pgsql-general
Pierre,

Your re-write makes a lot of sense. Thanks! It's not using indexes for
some reason, and discovering why will be my next challenge.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Dec 2, 2004, at 6:42 AM, Pierre-Frédéric Caillaud wrote:

>     Let's re-take your query from the start. At each step you should
> explain analyze the query to check if it runs smoothly.
>
>     1. You want the messages which have no actions. Rather than a
> subselect, I'd use a LEFT JOIN :
>
>     untested syntax :
>     SELECT m.id FROM message m LEFT JOIN message_action ma ON
> m.id=ma.messages_id WHERE ma.messages_id IS NULL;
>
>     On my machine, I have a zones table with 3000 rows and a cities table
> with 2 million rows, each place having a zone_id :
>
> EXPLAIN ANALYZE SELECT z.zone_id FROM geo.zones z LEFT JOIN geo.cities
> c ON c.zone_id=z.zone_id WHERE c.id IS NULL;
>  Merge Left Join  (cost=0.00..142063.06 rows=3663 width=4) (actual
> time=8726.203..8726.203 rows=0 loops=1)
>    Merge Cond: ("outer".zone_id = "inner".zone_id)
>    Filter: ("inner".id IS NULL)
>    ->  Index Scan using zones_pkey on zones z  (cost=0.00..99.10
> rows=3663 width=4) (actual time=15.027..43.987 rows=3663 loops=1)
>    ->  Index Scan using cities_zones_idx on cities c
> (cost=0.00..116030.55 rows=2073935 width=8) (actual
> time=25.164..5823.496 rows=2073935 loops=1)
>  Total runtime: 8726.327 ms
> (6 lignes)
>
>     8 seconds, this gives you an idea with that many records.
>     You should check your indexes are used !
>
>     Now you have the messages which have no actions, you must get the
> user email domains :
>
>     SELECT split_part( u.email, '@', 2 ) as domain
>         FROM users u, message m
>         LEFT JOIN message_action ma ON m.id=ma.messages_id
>         WHERE u.id=m.user_id
>             AND ma.messages_id IS NULL;
>
>     Can you time this query ? Are the indexes used ?
>     Now, let's remove the duplicates :
>
>     SELECT split_part( u.email, '@', 2 ) as domain
>         FROM users u, message m
>         LEFT JOIN message_action ma ON m.id=ma.messages_id
>         WHERE u.id=m.user_id
>             AND ma.messages_id IS NULL
>         GROUP By domain;
>
>     GROUP BY is faster than DISTINCT (in some cases).
>
>     How does it go ?

pgsql-general by date:

Previous
From: Jeff Amiel
Date:
Subject: Re: Dont let those int8's drive you mad!!
Next
From: OpenMacNews
Date:
Subject: initdb error: "could not identify current directory" (or, what have i done now?)