Does "correlation" mislead the optimizer on large tables? - Mailing list pgsql-performance
From: Ron Mayer
Subject: Does "correlation" mislead the optimizer on large tables?
Msg-id: Pine.LNX.4.44.0301231909070.986-100000@localhost.localdomain
In response to: Re: Same query, same performance ("alexandre :: aldeia digital" <alepaes@aldeiadigital.com.br>)
Responses: Re: Does "correlation" mislead the optimizer on large tables?
List: pgsql-performance
Short summary:

On large tables, I think the "correlation" pg_stats field as calculated by "vacuum analyze" or "analyze" can mislead the optimizer. By forcing index scans on some queries shown below, some queries in my database speed up from 197 seconds to under 30 seconds. I'd like feedback on whether or not a smarter "analyze" function (which I think I could write as a separate utility) would help in situations like this.

Longer:

In particular, if I have a large table t with columns 'a', 'b', 'c', etc., and I cluster the table as follows:

    create table t_ordered as select * from t order by a,b;
    vacuum analyze t_ordered;

Column "b" will (correctly) get a very low "correlation" in the pg_stats table -- but I think the optimizer would do better assuming a high correlation, because similar 'b' values are still grouped closely on the same disk pages.

Below is a real-world example of this issue. The table "fact" is a large one (reltuples = 1e8, relpages = 1082385) and contains about one year's worth of data. The data was loaded sequentially (ordered by dat, tim).

    logs=# \d fact;
              Table "fact"
     Column |          Type          | Modifiers
    --------+------------------------+-----------
     dat    | date                   |
     tim    | time without time zone |
     ip_id  | integer                |
     bid_id | integer                |
     req_id | integer                |
     ref_id | integer                |
     uag_id | integer                |
    Indexes: i_fact_2__bid_id,
             i_fact_2__dat,
             i_fact_2__tim,
             i_fact_2__ip_id,
             i_fact_2__ref_id,
             i_fact_2__req_id

With a table this large, each day's worth of data fills about 3000 pages; conversely, each page covers only about a 30-second range of "tim" values. As the queries below show, the optimizer wanted to do a sequential scan when looking at a 10-minute part of the day. However, as also shown, forcing an index scan did much better. I'm guessing this happened because the optimizer saw the horrible correlation and decided it would have to read an enormous number of pages if it did an index scan.
===========================================

logs=# select tablename,attname,n_distinct,correlation
       from pg_stats where tablename='fact';
 tablename | attname | n_distinct | correlation
-----------+---------+------------+-------------
 fact      | dat     |        365 |           1
 fact      | tim     |      80989 | -0.00281447
 fact      | ip_id   |      44996 |    0.660689
 fact      | bid_id  |     742850 |    0.969026
 fact      | req_id  |       2778 |     0.67896
 fact      | ref_id  |        595 |    0.258023
 fact      | uag_id  |        633 |    0.234216
(7 rows)

logs=# explain analyze select * from fact where tim<'00:10:00';
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on fact  (cost=0.00..1949838.40 rows=526340 width=32) (actual time=0.39..197447.50 rows=402929 loops=1)
   Filter: (tim < '00:10:00'::time without time zone)
 Total runtime: 197810.01 msec
(3 rows)

logs=# explain analyze select * from fact where tim<'00:10:00';
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on fact  (cost=0.00..1949838.40 rows=526340 width=32) (actual time=15.25..156705.76 rows=402929 loops=1)
   Filter: (tim < '00:10:00'::time without time zone)
 Total runtime: 157089.15 msec
(3 rows)

logs=# set enable_seqscan = off;
SET
logs=# explain analyze select * from fact where tim<'00:10:00';
                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using i__fact__tim on fact  (cost=0.00..2110978.39 rows=526340 width=32) (actual time=104.41..23307.84 rows=402929 loops=1)
   Index Cond: (tim < '00:10:00'::time without time zone)
 Total runtime: 23660.95 msec
(3 rows)

logs=# explain analyze select * from fact where tim<'00:10:00';
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using i__fact__tim on fact  (cost=0.00..2110978.39 rows=526340 width=32) (actual time=0.03..1477.35 rows=402929 loops=1)
   Index Cond: (tim < '00:10:00'::time without time zone)
 Total runtime: 1827.94 msec
(3 rows)

*******************************************************************************

So, two questions:

a) Am I on to something... or is something else the reason why the optimizer chose the much slower sequential scan?

b) If I did write an "analyze" that tried to set "correlation" values taking into account such local grouping of data, would anyone be interested?

Ron
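To put a rough number on the "local grouping" claim, here is a toy model of a table like "fact", with rows loaded in (dat, tim) order. All constants are invented for illustration and are not the real table's numbers. The rows matching a narrow tim range occupy one short run of pages per day -- far fewer pages than a planner assuming zero correlation (i.e., one random page fetch per matching row) would estimate:

```python
# Toy model (invented sizes, not the real "fact" table): rows stored in
# (day, time-of-day) order, so tim is a daily sawtooth with correlation ~0.
# Count how many distinct pages actually hold the rows where tim < '00:10'.
ROWS_PER_PAGE = 100
DAYS = 365
ROWS_PER_DAY = 10_000          # evenly spaced through 86400 seconds

pages_touched = set()
matching_rows = 0
for day in range(DAYS):
    for i in range(ROWS_PER_DAY):
        row_no = day * ROWS_PER_DAY + i
        tim = i * 86_400 // ROWS_PER_DAY    # seconds since midnight
        if tim < 600:                       # tim < '00:10:00'
            matching_rows += 1
            pages_touched.add(row_no // ROWS_PER_PAGE)

total_pages = DAYS * ROWS_PER_DAY // ROWS_PER_PAGE
print(matching_rows, "matching rows")
print(len(pages_touched), "of", total_pages, "pages touched")
# A cost model trusting correlation ~0 would instead expect roughly one
# page fetch per matching row -- tens of thousands of fetches, not ~365.
```

In this sketch the index scan touches about one page per day (one contiguous run per sawtooth cycle), which is why the forced index scans above beat the sequential scan by such a wide margin despite tim's near-zero correlation.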