Question about difference in performance of 2 queries on large table - Mailing list pgsql-performance
From | Sean Shanny |
---|---|
Subject | Question about difference in performance of 2 queries on large table |
Date | |
Msg-id | 3FF057DF.7000605@earthlink.net Whole thread Raw |
Responses |
Re: Question about difference in performance of 2 queries on large table
(Tom Lane <tgl@sss.pgh.pa.us>)
Re: Question about difference in performance of 2 queries (Dennis Bjorklund <db@zigo.dhs.org>) |
List | pgsql-performance |
To all, The facts: PostgreSQL 7.4.0 running on BSD 5.1 on Dell 2650 with 4GB RAM, 5 SCSI drives in hardware RAID 0 configuration. Database size with indexes is currently 122GB. Schema for the table in question is at the end of this email. The DB has been vacuumed full and analyzed. Between 2 and 3 million records are added to the table in question each night. An analyze is run on the entire DB after the data has been loaded each night. There are no updates or deletes of records during the nightly load, only insertions. I am trying to understand why the performance between the two queries below is so different. I am trying to find the count of all pages that have a 'valid' content_key. -1 is our 'we don't have any content' key. The first plan below has horrendous performance. we only get about 2% CPU usage and iostat shows 3-5 MB/sec IO. The second plan runs at 30% cpu and 15-30MB.sec IO. Could someone shed some light on why the huge difference in performance? Both are doing index scans plus a filter. We have no content_keys below -1 at this time so the queries return the same results. Thanks. --sean explain select count (distinct (persistent_cookie_key) ) from f_pageviews where date_key between 305 and 334 and content_key > -1; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Aggregate (cost=688770.29..688770.29 rows=1 width=4) -> Index Scan using idx_pageviews_content on f_pageviews (cost=0.00..645971.34 rows=17119580 width=4) Index Cond: (content_key > -1) Filter: ((date_key >= 305) AND (date_key <= 334)) (4 rows) explain select count (distinct (persistent_cookie_key) ) from f_pageviews where date_key between 305 and 334 and content_key <> -1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Aggregate (cost=1365419.12..1365419.12 rows=1 width=4) -> Index Scan using idx_pageviews_date_nov_2003 on f_pageviews (cost=0.00..1322615.91 rows=17121284 width=4) Index Cond: ((date_key >= 305) AND (date_key <= 334)) Filter: (content_key <> -1) (4 rows) \d f_pageviews Table "public.f_pageviews" Column | Type | Modifiers ------------------------+---------+------------------------------------------------------------- id | integer | not null default nextval('public.f_pageviews_id_seq'::text) date_key | integer | not null time_key | integer | not null content_key | integer | not null location_key | integer | not null session_key | integer | not null subscriber_key | text | not null persistent_cookie_key | integer | not null ip_key | integer | not null referral_key | integer | not null servlet_key | integer | not null tracking_key | integer | not null provider_key | text | not null marketing_campaign_key | integer | not null orig_airport | text | not null dest_airport | text | not null commerce_page | boolean | not null default false job_control_number | integer | not null sequenceid | integer | not null default 0 url_key | integer | not null useragent_key | integer | not null web_server_name | text | not null default 'Not Available'::text cpc | integer | not null default 0 referring_servlet_key | integer | default 1 first_page_key | integer | default 1 newsletterid_key | text | not null default 'Not Available'::text Indexes: "f_pageviews_pkey" primary key, btree (id) "idx_pageviews_content" btree (content_key) "idx_pageviews_date_dec_2003" btree (date_key) WHERE ((date_key >= 335) AND (date_key <= 365)) "idx_pageviews_date_nov_2003" btree (date_key) WHERE ((date_key >= 304) AND (date_key <= 334)) "idx_pageviews_referring_servlet" btree (referring_servlet_key) "idx_pageviews_servlet" btree (servlet_key) "idx_pageviews_session" btree (session_key)
pgsql-performance by date: