Re: Seeking help with a query that takes too long - Mailing list pgsql-performance

From Nick Fankhauser
Subject Re: Seeking help with a query that takes too long
Date
Msg-id NEBBLAAHGLEEPCGOBHDGMEJKJGAA.nickf@ontko.com
In response to Re: Seeking help with a query that takes too long  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Seeking help with a query that takes too long  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
> It looks like you are running with the default statistics target (10).
> Try boosting it to 100 or even more for this column (see ALTER TABLE
> SET STATISTICS, then re-ANALYZE) and see if the estimate gets better.


Here are the results & a few more clues:

prod1=# alter table actor alter column actor_full_name_uppercase set statistics 1000;
ALTER TABLE
prod1=# analyze actor;
ANALYZE
prod1=# select count(distinct actor_full_name_uppercase) from actor;
  count
---------
 1453371
(1 row)

prod1=# select count(actor_id) from actor;
  count
---------
 3386359
(1 row)

This indicates to me that 1 isn't too shabby as an estimate if the whole
name is specified (there are only about 2.3 rows per distinct name:
3,386,359 / 1,453,371), but I'm not sure how this estimate gets altered in
the case of a "LIKE".


prod1=# \x
Expanded display is on.
prod1=# SELECT * FROM pg_stats
prod1-# WHERE tablename='actor' AND attname='actor_full_name_uppercase';

<Header boilerplate snipped out>

schemaname        | public
tablename         | actor
attname           | actor_full_name_uppercase
null_frac         | 0.000586667
avg_width         | 21
n_distinct        | -0.14701

<Long list of values and frequencies snipped out>

correlation       | -0.00211291


Question: What does it mean when n_distinct is negative?
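
(If it's meant as a ratio of distinct values to row count - just a guess on
my part - the comparable observed figure could be computed like this, though
it's a slow full count:)

SELECT count(distinct actor_full_name_uppercase)::float8 / count(*)
       AS observed_distinct_ratio
  FROM actor;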

New results of explain analyze:



QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=252683.61..252683.68 rows=28 width=116) (actual time=169377.32..169378.39 rows=1000 loops=1)
   ->  Sort  (cost=252683.61..252683.68 rows=29 width=116) (actual time=169377.31..169377.69 rows=1001 loops=1)
         Sort Key: max((actor.actor_full_name)::text), count(case_data.case_id)
         ->  Aggregate  (cost=252678.57..252682.91 rows=29 width=116) (actual time=169305.79..169354.50 rows=3456 loops=1)
               ->  Group  (cost=252678.57..252680.01 rows=289 width=116) (actual time=169305.76..169330.00 rows=5879 loops=1)
                     ->  Sort  (cost=252678.57..252679.29 rows=289 width=116) (actual time=169305.75..169308.15 rows=5879 loops=1)
                           Sort Key: actor.actor_id
                           ->  Nested Loop  (cost=0.00..252666.74 rows=289 width=116) (actual time=89.27..169273.51 rows=5879 loops=1)
                                 ->  Nested Loop  (cost=0.00..251608.11 rows=289 width=77) (actual time=57.73..92753.49 rows=5882 loops=1)
                                       ->  Index Scan using actor_full_name_uppercase on actor  (cost=0.00..456.88 rows=113 width=42) (actual time=32.80..3197.28 rows=3501 loops=1)
                                             Index Cond: ((actor_full_name_uppercase >= 'SANDERS'::character varying) AND (actor_full_name_uppercase < 'SANDERT'::character varying))
                                             Filter: (actor_full_name_uppercase ~~ 'SANDERS%'::text)
                                       ->  Index Scan using actor_case_assignment_actor_id on actor_case_assignment  (cost=0.00..2181.29 rows=2616 width=35) (actual time=22.26..25.57 rows=2 loops=3501)
                                             Index Cond: ("outer".actor_id = actor_case_assignment.actor_id)
                                 ->  Index Scan using case_data_case_id on case_data  (cost=0.00..3.65 rows=1 width=39) (actual time=13.00..13.00 rows=1 loops=5882)
                                       Index Cond: (case_data.case_id = "outer".case_id)
 Total runtime: 169381.38 msec
(17 rows)


