Re: Setting Statistics on Functional Indexes - Mailing list pgsql-performance

From Shaun Thomas
Subject Re: Setting Statistics on Functional Indexes
Date
Msg-id 508841FF.9040303@optionshouse.com
Whole thread Raw
In response to Re: Setting Statistics on Functional Indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Setting Statistics on Functional Indexes  (Shaun Thomas <sthomas@optionshouse.com>)
List pgsql-performance
On 10/24/2012 02:11 PM, Tom Lane wrote:

> It's not particularly (not that you've even defined what you think
> "optimistic" is, much less mentioned what baseline you're comparing
> to).

The main flaw with my example is that it's random. But I swear I'm not
making it up! :)

There seems to be a particularly nasty edge case we're triggering, then.
Like I said, it's worse when col1+col2 don't match anything. In that
case, it's using the trunc index on the date column, which has
demonstrably worse performance. Here are the two analyzes I got
before/after front-loading statistics.

Before stats increase:

  Sort  (cost=9.38..9.39 rows=1 width=23) (actual time=78.282..78.282
rows=0 loops=1)
    Sort Key: id, action_date
    Sort Method: quicksort  Memory: 25kB
    ->  Index Scan using idx_date_test_action_date_trunc on date_test
(cost=0.00..9.37 rows=1 width=23) (actual time=78.274..78.274 rows=0
loops=1)
          Index Cond: (date_trunc('day'::text, action_date) >=
'2012-10-24 00:00:00'::timestamp without time zone)
          Filter: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
  Total runtime: 78.317 ms


And then after. I used your unofficial trick to set it to 1000:

alter index idx_date_test_action_date_trunc
       alter column date_trunc set statistics 1000;
analyze date_test;


  Sort  (cost=9.83..9.83 rows=1 width=23) (actual time=0.038..0.038
rows=0 loops=1)
    Sort Key: id, action_date
    Sort Method: quicksort  Memory: 25kB
    ->  Index Scan using idx_date_test_col1_col2 on date_test
(cost=0.00..9.82 rows=1 width=23) (actual time=0.030..0.030 rows=0 loops=1)
          Index Cond: (((col1)::text = 'S:96'::text) AND (col2 =
657::numeric))
          Filter: (date_trunc('day'::text, action_date) >= '2012-10-24
00:00:00'::timestamp without time zone)
  Total runtime: 0.066 ms


This is on a bone-stock PG 9.1.6 from Ubuntu 12.04 LTS, with
default_statistics increased to 500. The only thing I bumped up was the
functional index between those two query plans.

But then I noticed something else. I reverted back to the old 500
default for everything, and added an index:

create index idx_date_test_action_date_trunc_col1
     on date_test (date_trunc('day', action_date), col1);

I think we can agree that this index would be more selective than the
one on date_trunc by itself. Yet:

  Sort  (cost=9.38..9.39 rows=1 width=23) (actual time=77.055..77.055
rows=0 loops=1)
    Sort Key: id, action_date
    Sort Method: quicksort  Memory: 25kB
    ->  Index Scan using idx_date_test_action_date_trunc on date_test
(cost=0.00..9.37 rows=1 width=23) (actual time=77.046..77.046 rows=0
loops=1)
          Index Cond: (date_trunc('day'::text, action_date) >=
'2012-10-24 00:00:00'::timestamp without time zone)
          Filter: (((col1)::text = 'S:96'::text) AND (col2 = 657::numeric))
  Total runtime: 77.091 ms


All I have to say about that is: wat.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Setting Statistics on Functional Indexes
Next
From: Shaun Thomas
Date:
Subject: Re: Setting Statistics on Functional Indexes