[8.1.4] Help optimizing query - Mailing list pgsql-performance

From Hayes
Subject [8.1.4] Help optimizing query
Date
Msg-id aperez-2D52F6.16115426082006@news.hub.org
Whole thread Raw
Responses Re: [8.1.4] Help optimizing query
List pgsql-performance
Hi all,

I need help making the below query go faster.  There are about 5 of
these being done
in one report.  Some use joins instead of subselects but they all go
about the same speed.
I'm not much of a SQL person so I just threw indices on everything
involved but it still
takes about at least 19sec and sometimes as much as 60s.  I'd be happy
to get it to about
5s.

Other info:  random_page_cost is 4 as I/O on this box as kinda slow.
It's a converted desktop with a single IDE drive.  shared_buffers is
12500 and effective_page_cache is 100M.  I upped the statistics on
entry_date and session_id to 1000.  I analyzed the tables after
modifying the statistics.  The actual deployment platform is a lot
beefier but I'd like these queries to at least be tolerable on this
machine.

I can see that the estimate for the GroupAggregate is off, if I'm
interpreting things correctly, but I don't know what to do about it.

tia,
arturo

Query:

SELECT subscription_id,
       to_char(sum(session_length), 'HH24:MI:SS') as session_length,
       sum(hits) as hits,
       2006 as theYear,
       2 as theQuarter,
       sum(count) as count
FROM (
     SELECT subscription_id,
       count(distinct session_id) as count,
       age(MAX(entry_date),MIN(entry_date)) as session_length,
       COUNT(action) as hits
     FROM
   extended_user JOIN user_tracking USING (user_id)
   WHERE subscription_id > 0 AND
         EXTRACT(year from entry_date) = 2006 AND
         EXTRACT(quarter from entry_date) = 2
         GROUP BY session_id,
          subscription_id
          ) as session_stuff
              WHERE subscription_id > 0
         GROUP BY subscription_id
         ORDER BY subscription_id;


Sort  (cost=123305.88..123306.38 rows=200 width=36) (actual
time=75039.706..75040.500 rows=258 loops=1)
   Sort Key: session_stuff.subscription_id
   ->  HashAggregate  (cost=123294.24..123298.24 rows=200 width=36)
(actual time=75036.487..75038.360 rows=258 loops=1)
         ->  GroupAggregate  (cost=108839.34..118475.94 rows=240915
width=72) (actual time=68016.583..74702.710 rows=38369 loops=1)
               ->  Sort  (cost=108839.34..109441.63 rows=240915
width=72) (actual time=67978.193..68982.962 rows=245727 loops=1)
                     Sort Key: user_tracking.session_id,
extended_user.subscription_id
                     ->  Hash Join  (cost=7746.59..75492.37 rows=240915
width=72) (actual time=16944.487..50737.230 rows=245727 loops=1)
                           Hash Cond: ("outer".user_id = "inner".user_id)
                           ->  Bitmap Heap Scan on user_tracking
(cost=7524.10..68644.10 rows=240950 width=72) (actual
time=16843.695..48306.383 rows=258923 loops=1)
                                 Recheck Cond:
((date_part('quarter'::text, entry_date) = 2::double precision) AND
(date_part('year'::text, entry_date) = 2006::double precision))
                                 ->  BitmapAnd  (cost=7524.10..7524.10
rows=240950 width=0) (actual time=16779.178..16779.178 rows=0 loops=1)
                                       ->  Bitmap Index Scan on
user_tracking_quarter_idx  (cost=0.00..3331.51 rows=533288 width=0)
(actual time=9079.545..9079.545 rows=533492 loops=1)
                                             Index Cond:
(date_part('quarter'::text, entry_date) = 2::double precision)
                                       ->  Bitmap Index Scan on
user_tracking_year_idx  (cost=0.00..4192.34 rows=671239 width=0) (actual
time=7685.906..7685.906 rows=671787 loops=1)
                                             Index Cond:
(date_part('year'::text, entry_date) = 2006::double precision)
                           ->  Hash  (cost=206.42..206.42 rows=6428
width=8) (actual time=100.754..100.754 rows=6411 loops=1)
                                 ->  Seq Scan on extended_user
(cost=0.00..206.42 rows=6428 width=8) (actual time=0.020..28.873
rows=6411 loops=1)
                                       Filter: ((subscription_id > 0)
AND (subscription_id > 0))
 Total runtime: 75069.453 ms

Tables:

This one has about 6-7k rows.

               Table "public.extended_user"
      Column       |           Type           | Modifiers
-------------------+--------------------------+-----------
 create_date       | timestamp with time zone | not null
 email             | character varying(99)    |
 first_name        | character varying(99)    | not null
 last_name         | character varying(99)    | not null
 license_agreement | boolean                  | not null
 license_date      | timestamp with time zone |
 password          | character varying(32)    | not null
 subscription_id   | integer                  | not null
 user_id           | integer                  | not null
 user_name         | character varying(99)    | not null
Indexes:
    "extended_user_pkey" PRIMARY KEY, btree (user_id)
    "extended_user_subscription_id_idx" btree (subscription_id)
    "extended_user_subscription_idx" btree (subscription_id)
Foreign-key constraints:
    "extended_user_subscription_id_fkey" FOREIGN KEY (subscription_id)
REFERENCES subscription(subscription_id) DEFERRABLE INITIALLY DEFERRED

This one has about 2k rows.

               Table "public.subscription"
      Column      |           Type           | Modifiers
------------------+--------------------------+-----------
 allow_printing   | boolean                  | not null
 company_id       | character varying(50)    | not null
 company_name     | character varying(100)   | not null
 end_date         | timestamp with time zone |
 licenses         | integer                  | not null
 pass_through_key | character varying(50)    |
 start_date       | timestamp with time zone | not null
 subscription_id  | integer                  | not null
Indexes:
    "subscription_pkey" PRIMARY KEY, btree (subscription_id)


This one has about 1.4M rows.  It's kind of a log of pages visited.

                                               Table
"public.user_tracking"
      Column      |            Type             |
Modifiers
------------------+-----------------------------+------------------------
--------------------------------------------------
 action           | character varying(255)      | not null
 entry_date       | timestamp without time zone | not null
 note             | text                        |
 report_id        | integer                     |
 session_id       | character varying(255)      | not null
 user_id          | integer                     |
 user_tracking_id | integer                     | not null default
nextval('user_tracking_user_tracking_id_seq'::regclass)
Indexes:
    "user_tracking_pkey" PRIMARY KEY, btree (user_tracking_id)
    "user_tracking_entry_date_idx" btree (entry_date)
    "user_tracking_month_idx" btree (date_part('month'::text,
entry_date))
    "user_tracking_quarter_idx" btree (date_part('quarter'::text,
entry_date))
    "user_tracking_report_id_idx" btree (report_id)
    "user_tracking_session_idx" btree (session_id)
    "user_tracking_user_id_idx" btree (user_id)
    "user_tracking_year_idx" btree (date_part('year'::text, entry_date))
Foreign-key constraints:
    "user_tracking_report_id_fkey" FOREIGN KEY (report_id) REFERENCES
article(article_id) DEFERRABLE INITIALLY DEFERRED
    "user_tracking_user_id_fkey" FOREIGN KEY (user_id) REFERENCES
extended_user(user_id) DEFERRABLE INITIALLY DEFERRED

pgsql-performance by date:

Previous
From: "George Pavlov"
Date:
Subject: stats reset during pg_restore?
Next
From: Alvaro Herrera
Date:
Subject: Re: stats reset during pg_restore?