Re: Question about difference in performance of 2 queries - Mailing list pgsql-performance
From | Nicholas Shanny |
---|---|
Subject | Re: Question about difference in performance of 2 queries |
Date | |
Msg-id | E8CA1CB3-3AD8-11D8-BE83-000A95C41B96@tripadvisor.com Whole thread Raw |
In response to | Re: Question about difference in performance of 2 queries (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-performance |
Tom, I understand the problem and your solution makes sense although I am still puzzled by the machine under-utilization. If you run the original query and monitor the IO/CPU usage you find that it is minimal. here is the output from iostat 1 for a brief portion of the query. I am very curious to understand why when scanning the index the IO/CPU utilization is seemingly low. Cheers Nick Shanny TripAdvisor, Inc. 0 77 32.00 106 3.31 0.00 0 0.00 0.00 0 0.00 0 0 2 0 98 0 76 32.00 125 3.90 0.00 0 0.00 0.00 0 0.00 0 0 2 0 97 0 76 32.00 125 3.90 0.00 0 0.00 0.00 0 0.00 0 0 1 1 98 0 76 32.75 127 4.05 0.00 0 0.00 0.00 0 0.00 0 0 1 0 99 tty aacd0 acd0 fd0 cpu tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id 0 76 32.00 127 3.96 0.00 0 0.00 0.00 0 0.00 0 0 3 0 97 0 229 32.24 135 4.24 0.00 0 0.00 0.00 0 0.00 0 0 4 0 95 0 76 32.00 129 4.02 0.00 0 0.00 0.00 0 0.00 0 0 2 0 97 0 76 32.00 123 3.84 0.00 0 0.00 0.00 0 0.00 0 0 2 0 98 0 76 31.72 115 3.56 0.00 0 0.00 0.00 0 0.00 0 0 2 0 98 0 76 32.50 126 3.99 0.00 0 0.00 0.00 0 0.00 0 0 3 1 96 0 76 32.00 123 3.84 0.00 0 0.00 0.00 0 0.00 0 0 3 0 97 0 76 32.00 122 3.81 0.00 0 0.00 0.00 0 0.00 1 0 2 0 97 0 76 32.00 135 4.21 0.00 0 0.00 0.00 0 0.00 0 0 2 1 97 0 76 32.00 97 3.03 0.00 0 0.00 0.00 0 0.00 0 0 3 0 97 On Dec 29, 2003, at 2:39 PM, Tom Lane wrote: > Sean Shanny <shannyconsulting@earthlink.net> writes: >> Here is the pg_stats data. The explain analyze queries are still >> running. > >> select * from pg_stats where tablename = 'f_pageviews' and attname = >> 'content_key'; >> schemaname | tablename | attname | null_frac | avg_width | >> n_distinct | most_common_vals | most_common_freqs >> | >> histogram_bounds | correlation >> ------------+-------------+-------------+-----------+----------- >> +------------+------------------+----------------------- >> +--------------------------------------------------------------------- >> ----------------+------------- >> public | f_pageviews | content_key | 0 | 4 | >> 983 | {-1,1528483} | {0.749333,0.00166667} | > > Oh-ho, I see the problem: about 75% of your table has content_key = -1. > > Why is that a problem, you ask? Well, the planner realizes that > "content_key > -1" is a pretty good restriction condition (better than > the date condition, apparently) and so it tries to use that as the > index > scan condition. The problem is that in 7.4 and before, the btree index > code implements a "> -1" scan starting boundary by finding the first -1 > and then advancing to the first key that's not -1. So you end up > scanning through 75% of the index before anything useful happens :-( > > I just fixed this poor behavior in CVS tip a couple weeks ago: > http://archives.postgresql.org/pgsql-committers/2003-12/msg00220.php > but the patch seems too large and unproven to risk back-patching into > 7.4.*. > > If you expect that a pretty large fraction of your data will always > have > dummy content_key, it'd probably be worth changing the index to not > index -1's at all --- that is, make it a partial index with the > condition "WHERE content_key > -1". Another workaround is to leave the > index as-is but phrase the query WHERE condition as "content_key >= 0" > instead of "> -1". > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >
pgsql-performance by date: