7.4 - basic tuning question - Mailing list pgsql-performance

From Simon Waters
Subject 7.4 - basic tuning question
Date
Msg-id 200810031437.35440.simonw@zynet.net
Whole thread Raw
Responses Re: 7.4 - basic tuning question  (Richard Huxton <dev@archonet.com>)
List pgsql-performance
Hi,

we have a log table on one server with 1.9 million records.

One column "event" (type text) in that table is a string that (currently)
takes a small number of distinct values (~43) (hmm that could have been
normalised better).

We noted on querying for events of a specific type, that the queries were
slower than expected. It simply wasn't using the index (btree, default
settings) on this column on this server (the test server, with less records,
was fine).

Using "ALTER TABLE SET STATISTICS" to increase the number of buckets to 50
resolved the issue, we went pretty much straight there on discovering there
are no "HINTS".

However we aren't quite sure why this case was pathological, and my brain
doesn't grok the documentation quite.

I assume that the histogram_bounds for strings are alphabetical in order, so
that "DEMOSTART" falls between "DELETE" and "IDEMAIL". Even on a worst case
of including both these common values, the planner ought to have assumed that
less than <10% of records were likely covered by the value selected, so it
seems unlikely to me that not using the index would be a good idea.

What am I missing? (and yes there is a plan to upgrade!).


=> SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
(...lots of time passes...)
 count
-------
  1432
(1 row)


=> SELECT COUNT(*), event FROM log GROUP BY event ORDER BY count;

  count |   event
--------+-----------
      6 | DNRFAIL
     14 | ADMDNR
     14 | UPGRADE
     18 | FOCRENEW
     21 | AUTOCN
     25 | ADMCC
     27 | TEMPIN
     31 | DNRCANCEL
     43 | EXPIRED
    128 | DIRECTBUY
    130 | CANCEL
    130 | CANCELQ
    154 | FOCBUY
    173 | EXPCCWARN
    179 | OFFER
    209 | DNROK
    214 | TEMPRE
    356 | CCWARN
    429 | ADMLOGIN
    719 | SUBSCRIBE
    787 | CCSUCCESS
    988 | CCFAILURE
   1217 | TEMPNEW
   1298 | PAYPAL
   1431 | DEMOSTART
   1776 | CCREQUEST
   2474 | ACCTUPD
  15169 | SYSMAINT
  42251 | IDEMAIL
  46964 | DELETE
  50764 | RELOGIN
  57022 | NEWUSR
  64907 | PUBREC0
  65449 | UNPUBLISH
  92843 | LOGOUT
  99018 | KILLSESS
 128900 | UPLOAD
 134994 | LOGIN
 137608 | NEWPAGE
 447556 | PUBREC1
 489572 | PUBLISH


=> EXPLAIN SELECT * FROM log WHERE event='DEMOSTART';
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on log  (cost=0.00..54317.14 rows=20436 width=93)
   Filter: (event = 'DEMOSTART'::text)
(2 rows)


=> ALTER TABLE log ALTER COLUMN events SET STATISTICS 50; ANALYSE
LOG(event);
ALTER TABLE
ANALYZE


=> EXPLAIN SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
                                    QUERY PLAN
----------------------------------------------------------------------------
-------
 Aggregate  (cost=5101.43..5101.43 rows=1 width=0)
   ->  Index Scan using log_event on log  (cost=0.00..5098.15 rows=1310
width=0)
         Index Cond: (event = 'DEMOSTART'::text)
(3 rows)


=> SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
(...almost no time passes...)
 count
-------
  1432
(1 row)


BEFORE
pajax=> select * from pg_stats where tablename = 'log' and attname='event';
 schemaname | tablename | attname | null_frac | avg_width | n_distinct |
most_common_vals                    |
most_common_freqs                         |
histogram_bounds                                       | correlation

------------+-----------+---------+-----------+-----------+------------+--------------------------------------------------------+-------------------------------------------------------------------+---------------------------------------------------------------------------------------------+-------------
 public     | log       | event   |         0 |        10 |         25 |
{PUBLISH,PUBREC1,NEWPAGE,UPLOAD,LOGIN,KILLSESS,LOGOUT} |
{0.257333,0.248333,0.072,0.0696667,0.0613333,0.0543333,0.0506667} |
{ACCTUPD,DELETE,IDEMAIL,NEWUSR,NEWUSR,PUBREC0,PUBREC0,RELOGIN,SYSMAINT,UNPUBLISH,UNPUBLISH} |
0.120881
(1 row)

AFTER
pajax=>  select * from pg_stats where tablename='log' and attname='event';
 schemaname | tablename | attname | null_frac | avg_width | n_distinct |
most_common_vals                                            |
most_common_freqs                                                  |
                       
histogram_bounds                                                                            |
correlation

------------+-----------+---------+-----------+-----------+------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
 public     | log       | event   |         0 |        10 |         32 |
{PUBLISH,PUBREC1,NEWPAGE,LOGIN,UPLOAD,KILLSESS,LOGOUT,PUBREC0,UNPUBLISH,NEWUSR,RELOGIN,DELETE,IDEMAIL} |
{0.249067,0.248533,0.0761333,0.0719333,0.0685333,0.0526,0.045,0.0368,0.0348667,0.029,0.0255333,0.0254667,0.0238667} |

{ACCTUPD,ACCTUPD,ACCTUPD,ADMLOGIN,CCREQUEST,CCSUCCESS,DEMOSTART,FOCBUY,PAYPAL,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,TEMPNEW,TEMPRE}
|    
0.106671
(1 row)


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Slow Inserts on large tables
Next
From: Richard Huxton
Date:
Subject: Re: 7.4 - basic tuning question