Re: query slows down after vacuum analyze - Mailing list pgsql-performance

From ismo.tuononen@solenovo.fi
Subject Re: query slows down after vacuum analyze
Date
Msg-id Pine.LNX.4.64.0703060746430.26523@ismoli.solenovo.jns
Whole thread Raw
In response to query slows down after vacuum analyze  (Jeff Cole <cole.jeff@gmail.com>)
List pgsql-performance
Are you sure that:

SELECT count(distinct s.id) AS count_all
FROM symptoms s ,symptom_reports sr,users u
WHERE s.id=sr.symptom_id and sr.user_id=u.id and u.disease_id=1;

is as slow as

SELECT count(*) AS count_all
FROM symptoms
WHERE (1=1 and symptoms.id in (
 select symptom_id from symptom_reports sr
 where 1=1 and sr.user_id in (
  select id from users where disease_id=1
  )
 )
);

I think that it's best to have database to deside how to find rows, so I
like to write all as "clean" as possible.

only when queries are slow I analyze them and try to write those different
way.

that have worked great in oracle, where it seems that "cleanest" query is
always fastest. in postgres it's not always true, sometimes you must write
subqueries to make it faster.

Ismo

On Mon, 5 Mar 2007, Jeff Cole wrote:

> Hi, I'm new to tuning PostgreSQL and I have a query that gets slower after I
> run a vacuum analyze.  I believe it uses a Hash Join before the analyze and a
> Nested Loop IN Join after.  It seems the Nested Loop IN Join estimates the
> correct number of rows, but underestimates the amount of time required.   I am
> curious why the vacuum analyze makes it slower and if that gives any clues as
> too which parameter I should be tuning.
>
> BTW, I know the query could be re-structured more cleanly to remove the
> sub-selects, but that doesn't seem to impact the performance.
>
> thanks,
> Jeff
>
> Welcome to psql 8.1.5, the PostgreSQL interactive terminal.
>
> Type:  \copyright for distribution terms
>       \h for help with SQL commands
>       \? for help with psql commands
>       \g or terminate with semicolon to execute query
>       \q to quit
>
> plm_demo=# explain analyze SELECT count(*) AS count_all FROM symptoms WHERE (
> 1=1 and symptoms.id in (select symptom_id from symptom_reports sr where 1=1
> and sr.user_id in (select id from users where disease_id=1))) ;
>                                                                              QUERY
> PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=366.47..366.48 rows=1 width=0) (actual time=125.093..125.095
> rows=1 loops=1)
>   ->  Hash Join  (cost=362.41..366.38 rows=36 width=0) (actual
>   ->  time=124.162..124.859 rows=106 loops=1)
>         Hash Cond: ("outer".id = "inner".symptom_id)
>         ->  Seq Scan on symptoms  (cost=0.00..3.07 rows=107 width=4) (actual
>         ->  time=0.032..0.295 rows=108 loops=1)
>         ->  Hash  (cost=362.25..362.25 rows=67 width=4) (actual
>         ->  time=124.101..124.101 rows=106 loops=1)
>               ->  HashAggregate  (cost=361.58..362.25 rows=67 width=4) (actual
>               ->  time=123.628..123.854 rows=106 loops=1)
>                     ->  Hash IN Join  (cost=35.26..361.41 rows=67 width=4)
>                     ->  (actual time=9.767..96.372 rows=13074 loops=1)
>                           Hash Cond: ("outer".user_id = "inner".id)
>                           ->  Seq Scan on symptom_reports sr
>                           ->  (cost=0.00..259.65 rows=13165 width=8) (actual
>                           ->  time=0.029..33.359 rows=13074 loops=1)
>                           ->  Hash  (cost=35.24..35.24 rows=11 width=4)
>                           ->  (actual time=9.696..9.696 rows=1470 loops=1)
>                                 ->  Bitmap Heap Scan on users
>                                 ->  (cost=2.04..35.24 rows=11 width=4) (actual
>                                 ->  time=0.711..6.347 rows=1470 loops=1)
>                                       Recheck Cond: (disease_id = 1)
>                                       ->  Bitmap Index Scan on
>                                       users_disease_id_index  (cost=0.00..2.04
>                                       rows=11 width=0) (actual
>                                       time=0.644..0.644 rows=2378 loops=1)
>                                             Index Cond: (disease_id = 1)
> Total runtime: 134.045 ms
> (15 rows)
>
>
> plm_demo=# vacuum analyze;
> VACUUM
> plm_demo=# analyze;
> ANALYZE
>
> plm_demo=# explain analyze SELECT count(*) AS count_all FROM symptoms WHERE (
> 1=1 and symptoms.id in (select symptom_id from symptom_reports sr where 1=1
> and sr.user_id in (select id from users where disease_id=1))) ;
>                                                               QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=586.47..586.48 rows=1 width=0) (actual
> time=3441.385..3441.386 rows=1 loops=1)
>   ->  Nested Loop IN Join  (cost=149.05..586.26 rows=85 width=0) (actual
>   ->  time=54.517..3441.115 rows=106 loops=1)
>         Join Filter: ("outer".id = "inner".symptom_id)
>         ->  Seq Scan on symptoms  (cost=0.00..3.08 rows=108 width=4) (actual
>         ->  time=0.007..0.273 rows=108 loops=1)
>         ->  Hash IN Join  (cost=149.05..603.90 rows=13074 width=4) (actual
>         ->  time=0.078..24.503 rows=3773 loops=108)
>               Hash Cond: ("outer".user_id = "inner".id)
>               ->  Seq Scan on symptom_reports sr  (cost=0.00..258.74
>               ->  rows=13074 width=8) (actual time=0.003..9.044 rows=3773
>               ->  loops=108)
>               ->  Hash  (cost=145.38..145.38 rows=1470 width=4) (actual
>               ->  time=7.608..7.608 rows=1470 loops=1)
>                     ->  Seq Scan on users  (cost=0.00..145.38 rows=1470
>                     ->  width=4) (actual time=0.006..4.353 rows=1470 loops=1)
>                           Filter: (disease_id = 1)
> Total runtime: 3441.452 ms
> (11 rows)
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>      subscribe-nomail command to majordomo@postgresql.org so that your
>      message can get through to the mailing list cleanly
>

pgsql-performance by date:

Previous
From: Guido Neitzer
Date:
Subject: Re: strange performance regression between 7.4 and 8.1
Next
From: Richard Huxton
Date:
Subject: Re: Insert performance