Re: Poor Performance with Distinct Subqueries with EXISTS and EXCEPT - Mailing list pgsql-general

From Thomas F.O'Connell
Subject Re: Poor Performance with Distinct Subqueries with EXISTS and EXCEPT
Date
Msg-id 474D5EBE-45CA-11D9-B555-000D93AE0944@sitening.com
Whole thread Raw
In response to Re: Poor Performance with Distinct Subqueries with EXISTS and EXCEPT  (Pierre-Frédéric Caillaud<lists@boutiquenumerique.com>)
List pgsql-general
Interestingly, I tried the new version with and without enable_seqscan
on, and the version without indexes performs better because, I think,
it returns more rows than an index lookup would enhance.

Thanks again for your help. This is certainly an improvement over my
original version.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Dec 2, 2004, at 6:42 AM, Pierre-Frédéric Caillaud wrote:

> Let's re-take your query from the start. At each step you should
> explain analyze the query to check if it runs smoothly.
>
>     1. You want the messages which have no actions. Rather than a
> subselect, I'd use a LEFT JOIN :
>
>     untested syntax :
>     SELECT m.id FROM message m LEFT JOIN message_action ma ON
> m.id=ma.messages_id WHERE ma.messages_id IS NULL;
>
>     On my machine, I have a zones table with 3000 rows and a cities table
> with 2 million rows, each place having a zone_id :
>
> EXPLAIN ANALYZE SELECT z.zone_id FROM geo.zones z LEFT JOIN geo.cities
> c ON c.zone_id=z.zone_id WHERE c.id IS NULL;
>  Merge Left Join  (cost=0.00..142063.06 rows=3663 width=4) (actual
> time=8726.203..8726.203 rows=0 loops=1)
>    Merge Cond: ("outer".zone_id = "inner".zone_id)
>    Filter: ("inner".id IS NULL)
>    ->  Index Scan using zones_pkey on zones z  (cost=0.00..99.10
> rows=3663 width=4) (actual time=15.027..43.987 rows=3663 loops=1)
>    ->  Index Scan using cities_zones_idx on cities c
> (cost=0.00..116030.55 rows=2073935 width=8) (actual
> time=25.164..5823.496 rows=2073935 loops=1)
>  Total runtime: 8726.327 ms
> (6 lignes)
>
>     8 seconds, this gives you an idea with that many records.
>     You should check your indexes are used !
>
>     Now you have the messages which have no actions, you must get the
> user email domains :
>
>     SELECT split_part( u.email, '@', 2 ) as domain
>         FROM users u, message m
>         LEFT JOIN message_action ma ON m.id=ma.messages_id
>         WHERE u.id=m.user_id
>             AND ma.messages_id IS NULL;
>
>     Can you time this query ? Are the indexes used ?
>     Now, let's remove the duplicates :
>
>     SELECT split_part( u.email, '@', 2 ) as domain
>         FROM users u, message m
>         LEFT JOIN message_action ma ON m.id=ma.messages_id
>         WHERE u.id=m.user_id
>             AND ma.messages_id IS NULL
>         GROUP By domain;
>
>     GROUP BY is faster than DISTINCT (in some cases).
>
>     How does it go ?

pgsql-general by date:

Previous
From: OpenMacNews
Date:
Subject: [SOLVED] Re: initdb error: "could not identify current directory" (or, what have i done now?)
Next
From: Együd Csaba
Date:
Subject: Postgres8 win2k server autovacuum