Home > mailing lists

Re: Question about difference in performance of 2 queries - Mailing list pgsql-performance

From	Nicholas Shanny
Subject	Re: Question about difference in performance of 2 queries
Date	December 30, 2003 11:03:15
Msg-id	E8CA1CB3-3AD8-11D8-BE83-000A95C41B96@tripadvisor.com Whole thread Raw
In response to	Re: Question about difference in performance of 2 queries (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-performance

Tree view

Tom,

I understand the problem and your solution makes sense although I am
still puzzled by the machine under-utilization. If you run the original
query and monitor the IO/CPU usage you find that it is minimal.

here is the output from iostat 1 for a brief portion of the query. I am
very curious to understand why when scanning the index the IO/CPU
utilization is seemingly low.

Cheers
Nick Shanny
TripAdvisor, Inc.

   0   77 32.00 106  3.31   0.00   0  0.00   0.00   0  0.00   0  0  2  0
98
    0   76 32.00 125  3.90   0.00   0  0.00   0.00   0  0.00   0  0  2
0 97
    0   76 32.00 125  3.90   0.00   0  0.00   0.00   0  0.00   0  0  1
1 98
    0   76 32.75 127  4.05   0.00   0  0.00   0.00   0  0.00   0  0  1
0 99
       tty           aacd0             acd0              fd0
cpu
  tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy
in id
    0   76 32.00 127  3.96   0.00   0  0.00   0.00   0  0.00   0  0  3
0 97
    0  229 32.24 135  4.24   0.00   0  0.00   0.00   0  0.00   0  0  4
0 95
    0   76 32.00 129  4.02   0.00   0  0.00   0.00   0  0.00   0  0  2
0 97
    0   76 32.00 123  3.84   0.00   0  0.00   0.00   0  0.00   0  0  2
0 98
    0   76 31.72 115  3.56   0.00   0  0.00   0.00   0  0.00   0  0  2
0 98
    0   76 32.50 126  3.99   0.00   0  0.00   0.00   0  0.00   0  0  3
1 96
    0   76 32.00 123  3.84   0.00   0  0.00   0.00   0  0.00   0  0  3
0 97
    0   76 32.00 122  3.81   0.00   0  0.00   0.00   0  0.00   1  0  2
0 97
    0   76 32.00 135  4.21   0.00   0  0.00   0.00   0  0.00   0  0  2
1 97
    0   76 32.00  97  3.03   0.00   0  0.00   0.00   0  0.00   0  0  3
0 97

On Dec 29, 2003, at 2:39 PM, Tom Lane wrote:

> Sean Shanny <shannyconsulting@earthlink.net> writes:
>> Here is the pg_stats data.  The explain analyze queries are still
>> running.
>
>> select * from pg_stats where tablename = 'f_pageviews' and attname =
>> 'content_key';
>>  schemaname |  tablename  |   attname   | null_frac | avg_width |
>> n_distinct | most_common_vals |   most_common_freqs
>> |
>> histogram_bounds                                   | correlation
>> ------------+-------------+-------------+-----------+-----------
>> +------------+------------------+-----------------------
>> +---------------------------------------------------------------------
>> ----------------+-------------
>>  public     | f_pageviews | content_key |         0 |         4 |
>> 983 | {-1,1528483}     | {0.749333,0.00166667} |
>
> Oh-ho, I see the problem: about 75% of your table has content_key = -1.
>
> Why is that a problem, you ask?  Well, the planner realizes that
> "content_key > -1" is a pretty good restriction condition (better than
> the date condition, apparently) and so it tries to use that as the
> index
> scan condition.  The problem is that in 7.4 and before, the btree index
> code implements a "> -1" scan starting boundary by finding the first -1
> and then advancing to the first key that's not -1.  So you end up
> scanning through 75% of the index before anything useful happens :-(
>
> I just fixed this poor behavior in CVS tip a couple weeks ago:
> http://archives.postgresql.org/pgsql-committers/2003-12/msg00220.php
> but the patch seems too large and unproven to risk back-patching into
> 7.4.*.
>
> If you expect that a pretty large fraction of your data will always
> have
> dummy content_key, it'd probably be worth changing the index to not
> index -1's at all --- that is, make it a partial index with the
> condition "WHERE content_key > -1".  Another workaround is to leave the
> index as-is but phrase the query WHERE condition as "content_key >= 0"
> instead of "> -1".
>
>             regards, tom lane
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>

pgsql-performance by date:

From: Vivek Khera
Date: 29 December 2003, 16:34:47
Subject: deferred foreign keys

From: Jonathan Gardner
Date: 30 December 2003, 12:33:40
Subject: DELETE ... WHERE ctid IN (...) vs. Iteration

Re: Question about difference in performance of 2 queries - Mailing list pgsql-performance

Previous

Next