Thread: Question about difference in performance of 2 queries on large table
To all, The facts: PostgreSQL 7.4.0 running on BSD 5.1 on Dell 2650 with 4GB RAM, 5 SCSI drives in hardware RAID 0 configuration. Database size with indexes is currently 122GB. Schema for the table in question is at the end of this email. The DB has been vacuumed full and analyzed. Between 2 and 3 million records are added to the table in question each night. An analyze is run on the entire DB after the data has been loaded each night. There are no updates or deletes of records during the nightly load, only insertions. I am trying to understand why the performance between the two queries below is so different. I am trying to find the count of all pages that have a 'valid' content_key. -1 is our 'we don't have any content' key. The first plan below has horrendous performance. we only get about 2% CPU usage and iostat shows 3-5 MB/sec IO. The second plan runs at 30% cpu and 15-30MB.sec IO. Could someone shed some light on why the huge difference in performance? Both are doing index scans plus a filter. We have no content_keys below -1 at this time so the queries return the same results. Thanks. --sean explain select count (distinct (persistent_cookie_key) ) from f_pageviews where date_key between 305 and 334 and content_key > -1; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Aggregate (cost=688770.29..688770.29 rows=1 width=4) -> Index Scan using idx_pageviews_content on f_pageviews (cost=0.00..645971.34 rows=17119580 width=4) Index Cond: (content_key > -1) Filter: ((date_key >= 305) AND (date_key <= 334)) (4 rows) explain select count (distinct (persistent_cookie_key) ) from f_pageviews where date_key between 305 and 334 and content_key <> -1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Aggregate (cost=1365419.12..1365419.12 rows=1 width=4) -> Index Scan using idx_pageviews_date_nov_2003 on f_pageviews (cost=0.00..1322615.91 rows=17121284 width=4) Index Cond: ((date_key >= 305) AND (date_key <= 334)) Filter: (content_key <> -1) (4 rows) \d f_pageviews Table "public.f_pageviews" Column | Type | Modifiers ------------------------+---------+------------------------------------------------------------- id | integer | not null default nextval('public.f_pageviews_id_seq'::text) date_key | integer | not null time_key | integer | not null content_key | integer | not null location_key | integer | not null session_key | integer | not null subscriber_key | text | not null persistent_cookie_key | integer | not null ip_key | integer | not null referral_key | integer | not null servlet_key | integer | not null tracking_key | integer | not null provider_key | text | not null marketing_campaign_key | integer | not null orig_airport | text | not null dest_airport | text | not null commerce_page | boolean | not null default false job_control_number | integer | not null sequenceid | integer | not null default 0 url_key | integer | not null useragent_key | integer | not null web_server_name | text | not null default 'Not Available'::text cpc | integer | not null default 0 referring_servlet_key | integer | default 1 first_page_key | integer | default 1 newsletterid_key | text | not null default 'Not Available'::text Indexes: "f_pageviews_pkey" primary key, btree (id) "idx_pageviews_content" btree (content_key) "idx_pageviews_date_dec_2003" btree (date_key) WHERE ((date_key >= 335) AND (date_key <= 365)) "idx_pageviews_date_nov_2003" btree (date_key) WHERE ((date_key >= 304) AND (date_key <= 334)) "idx_pageviews_referring_servlet" btree (referring_servlet_key) "idx_pageviews_servlet" btree (servlet_key) "idx_pageviews_session" btree (session_key)
Please show EXPLAIN ANALYZE output for your queries, not just EXPLAIN. Also it would be useful to see the pg_stats rows for the date_key and content_key columns. regards, tom lane
On Mon, 29 Dec 2003, Sean Shanny wrote: > The first plan below has horrendous performance. we only get about 2% > CPU usage and iostat shows 3-5 MB/sec IO. The second plan runs at 30% > cpu and 15-30MB.sec IO. > > Could someone shed some light on why the huge difference in > performance? Both are doing index scans plus a filter. We have no > content_keys below -1 at this time so the queries return the same results. EXPLAIN ANALYZE gives more information then EXPLAIN, and is prefered. It uses different indexes in the two queries, and one seems to be faster then the other. Why, I can't tell yet. I would assume that you would get the fastet result if you had an index (content_key, date_key) I don't know if pg will even use an index to speed up a <> operation. When you had > then it could use the idx_pageviews_content index. Why it choose that when the other would be faster I don't know. Maybe explain analyze will give some hint. -- /Dennis
I am running explain analyze now and will post results as they finish. Thanks. --sean Tom Lane wrote: >Please show EXPLAIN ANALYZE output for your queries, not just EXPLAIN. >Also it would be useful to see the pg_stats rows for the date_key and >content_key columns. > > regards, tom lane > > >
Here is the pg_stats data. The explain analyze queries are still running. select * from pg_stats where tablename = 'f_pageviews' and attname = 'date_key'; schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation ------------+-------------+----------+-----------+-----------+------------+-------------------------------------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------+------------- public | f_pageviews | date_key | 0 | 4 | 60 | {335,307,309,336,308,321,314,342,322,316} | {0.0283333,0.0243333,0.0243333,0.0243333,0.024,0.0233333,0.0226667,0.0226667,0.0223333,0.0216667} | {304,311,318,325,329,334,341,346,351,356,363} | 0.345026 (1 row) select * from pg_stats where tablename = 'f_pageviews' and attname = 'content_key'; schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation ------------+-------------+-------------+-----------+-----------+------------+------------------+-----------------------+-------------------------------------------------------------------------------------+------------- public | f_pageviews | content_key | 0 | 4 | 983 | {-1,1528483} | {0.749333,0.00166667} | {38966,323835,590676,717061,919148,1091875,1208244,1299702,1375366,1434079,1528910} | 0.103399 (1 row) Thanks. --sean Tom Lane wrote: >Please show EXPLAIN ANALYZE output for your queries, not just EXPLAIN. >Also it would be useful to see the pg_stats rows for the date_key and >content_key columns. > > regards, tom lane > > >
Here is one of the explain analyzes. This is the from the faster query. Ignore the total runtime as we are currently doing other queries on this machine so it is slightly loaded. Thanks. --sean explain analyze select count (distinct (persistent_cookie_key) ) from f_pageviews where date_key between 305 and 334 and content_key <> -1; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1384925.95..1384925.95 rows=1 width=4) (actual time=4541462.030..4541462.034 rows=1 loops=1) -> Index Scan using idx_pageviews_date_nov_2003 on f_pageviews (cost=0.00..1343566.52 rows=16543772 width=4) (actual time=83.267..4286664.678 rows=15710722 loops=1) Index Cond: ((date_key >= 305) AND (date_key <= 334)) Filter: (content_key <> -1) Total runtime: 4541550.832 ms (5 rows) Tom Lane wrote: >Please show EXPLAIN ANALYZE output for your queries, not just EXPLAIN. >Also it would be useful to see the pg_stats rows for the date_key and >content_key columns. > > regards, tom lane > > >
Sean Shanny <shannyconsulting@earthlink.net> writes: > Here is the pg_stats data. The explain analyze queries are still running. > select * from pg_stats where tablename = 'f_pageviews' and attname = > 'content_key'; > schemaname | tablename | attname | null_frac | avg_width | > n_distinct | most_common_vals | most_common_freqs > | > histogram_bounds | correlation > ------------+-------------+-------------+-----------+-----------+------------+------------------+-----------------------+-------------------------------------------------------------------------------------+------------- > public | f_pageviews | content_key | 0 | 4 | > 983 | {-1,1528483} | {0.749333,0.00166667} | Oh-ho, I see the problem: about 75% of your table has content_key = -1. Why is that a problem, you ask? Well, the planner realizes that "content_key > -1" is a pretty good restriction condition (better than the date condition, apparently) and so it tries to use that as the index scan condition. The problem is that in 7.4 and before, the btree index code implements a "> -1" scan starting boundary by finding the first -1 and then advancing to the first key that's not -1. So you end up scanning through 75% of the index before anything useful happens :-( I just fixed this poor behavior in CVS tip a couple weeks ago: http://archives.postgresql.org/pgsql-committers/2003-12/msg00220.php but the patch seems too large and unproven to risk back-patching into 7.4.*. If you expect that a pretty large fraction of your data will always have dummy content_key, it'd probably be worth changing the index to not index -1's at all --- that is, make it a partial index with the condition "WHERE content_key > -1". Another workaround is to leave the index as-is but phrase the query WHERE condition as "content_key >= 0" instead of "> -1". regards, tom lane
Tom, Thanks. I will make the changes you suggest concerning the indexes. I am finding partial indexes to be very handy. :-) I canceled the explain analyze on the other query as we have found the problem and who knows how long it would take to complete. Thanks again. --sean Tom Lane wrote: >Sean Shanny <shannyconsulting@earthlink.net> writes: > > >>Here is the pg_stats data. The explain analyze queries are still running. >> >> > > > >>select * from pg_stats where tablename = 'f_pageviews' and attname = >>'content_key'; >> schemaname | tablename | attname | null_frac | avg_width | >>n_distinct | most_common_vals | most_common_freqs >>| >>histogram_bounds | correlation >>------------+-------------+-------------+-----------+-----------+------------+------------------+-----------------------+-------------------------------------------------------------------------------------+------------- >> public | f_pageviews | content_key | 0 | 4 | >>983 | {-1,1528483} | {0.749333,0.00166667} | >> >> > >Oh-ho, I see the problem: about 75% of your table has content_key = -1. > >Why is that a problem, you ask? Well, the planner realizes that >"content_key > -1" is a pretty good restriction condition (better than >the date condition, apparently) and so it tries to use that as the index >scan condition. The problem is that in 7.4 and before, the btree index >code implements a "> -1" scan starting boundary by finding the first -1 >and then advancing to the first key that's not -1. So you end up >scanning through 75% of the index before anything useful happens :-( > >I just fixed this poor behavior in CVS tip a couple weeks ago: >http://archives.postgresql.org/pgsql-committers/2003-12/msg00220.php >but the patch seems too large and unproven to risk back-patching into >7.4.*. > >If you expect that a pretty large fraction of your data will always have >dummy content_key, it'd probably be worth changing the index to not >index -1's at all --- that is, make it a partial index with the >condition "WHERE content_key > -1". Another workaround is to leave the >index as-is but phrase the query WHERE condition as "content_key >= 0" >instead of "> -1". > > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html > > >
Tom, I understand the problem and your solution makes sense although I am still puzzled by the machine under-utilization. If you run the original query and monitor the IO/CPU usage you find that it is minimal. here is the output from iostat 1 for a brief portion of the query. I am very curious to understand why when scanning the index the IO/CPU utilization is seemingly low. Cheers Nick Shanny TripAdvisor, Inc. 0 77 32.00 106 3.31 0.00 0 0.00 0.00 0 0.00 0 0 2 0 98 0 76 32.00 125 3.90 0.00 0 0.00 0.00 0 0.00 0 0 2 0 97 0 76 32.00 125 3.90 0.00 0 0.00 0.00 0 0.00 0 0 1 1 98 0 76 32.75 127 4.05 0.00 0 0.00 0.00 0 0.00 0 0 1 0 99 tty aacd0 acd0 fd0 cpu tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id 0 76 32.00 127 3.96 0.00 0 0.00 0.00 0 0.00 0 0 3 0 97 0 229 32.24 135 4.24 0.00 0 0.00 0.00 0 0.00 0 0 4 0 95 0 76 32.00 129 4.02 0.00 0 0.00 0.00 0 0.00 0 0 2 0 97 0 76 32.00 123 3.84 0.00 0 0.00 0.00 0 0.00 0 0 2 0 98 0 76 31.72 115 3.56 0.00 0 0.00 0.00 0 0.00 0 0 2 0 98 0 76 32.50 126 3.99 0.00 0 0.00 0.00 0 0.00 0 0 3 1 96 0 76 32.00 123 3.84 0.00 0 0.00 0.00 0 0.00 0 0 3 0 97 0 76 32.00 122 3.81 0.00 0 0.00 0.00 0 0.00 1 0 2 0 97 0 76 32.00 135 4.21 0.00 0 0.00 0.00 0 0.00 0 0 2 1 97 0 76 32.00 97 3.03 0.00 0 0.00 0.00 0 0.00 0 0 3 0 97 On Dec 29, 2003, at 2:39 PM, Tom Lane wrote: > Sean Shanny <shannyconsulting@earthlink.net> writes: >> Here is the pg_stats data. The explain analyze queries are still >> running. > >> select * from pg_stats where tablename = 'f_pageviews' and attname = >> 'content_key'; >> schemaname | tablename | attname | null_frac | avg_width | >> n_distinct | most_common_vals | most_common_freqs >> | >> histogram_bounds | correlation >> ------------+-------------+-------------+-----------+----------- >> +------------+------------------+----------------------- >> +--------------------------------------------------------------------- >> ----------------+------------- >> public | f_pageviews | content_key | 0 | 4 | >> 983 | {-1,1528483} | {0.749333,0.00166667} | > > Oh-ho, I see the problem: about 75% of your table has content_key = -1. > > Why is that a problem, you ask? Well, the planner realizes that > "content_key > -1" is a pretty good restriction condition (better than > the date condition, apparently) and so it tries to use that as the > index > scan condition. The problem is that in 7.4 and before, the btree index > code implements a "> -1" scan starting boundary by finding the first -1 > and then advancing to the first key that's not -1. So you end up > scanning through 75% of the index before anything useful happens :-( > > I just fixed this poor behavior in CVS tip a couple weeks ago: > http://archives.postgresql.org/pgsql-committers/2003-12/msg00220.php > but the patch seems too large and unproven to risk back-patching into > 7.4.*. > > If you expect that a pretty large fraction of your data will always > have > dummy content_key, it'd probably be worth changing the index to not > index -1's at all --- that is, make it a partial index with the > condition "WHERE content_key > -1". Another workaround is to leave the > index as-is but phrase the query WHERE condition as "content_key >= 0" > instead of "> -1". > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >