help understanding create statistic - Mailing list pgsql-general

From Luca Ferrari
Subject help understanding create statistic
Date
Msg-id CAKoxK+6C8CKdbYbbyNeYnc5aiDk=G-k-iDyDZMcmjJATqkLM9w@mail.gmail.com
Whole thread Raw
Responses Re: help understanding create statistic  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-general
Hi all,
in order to better understand this feature of 10, I've created a table
like this:

CREATE TABLE expenses(
   ...
   day date,
   year int,
   CHECK( year = EXTRACT( year FROM day ) )
);

so that I can ensure 'year' and 'day' are tied together:

SELECT
count(*) FILTER( WHERE year = 2016 ) AS by_year,
count(*) FILTER( WHERE EXTRACT( year FROM day ) = 2016 ) AS by_day
FROM expenses;
-[ RECORD 1 ]-
by_year | 8784
by_day  | 8784

Then I created a statistic:

CREATE STATISTICS stat_day_year ( dependencies )
ON day, year
FROM expenses;

select * from pg_statistic_ext ;
-[ RECORD 1 ]---+---------------------
stxrelid        | 42833
stxname         | stat_day_year
stxnamespace    | 2200
stxowner        | 16384
stxkeys         | 3 5
stxkind         | {f}
stxndistinct    |
stxdependencies | {"3 => 5": 1.000000}

Now, having an index on the extract year of day as follows:

CREATE INDEX idx_year
ON expenses ( EXTRACT( year FROM day ) );

why is the planner not choosing to use such index on a 'year' raw query?

EXPLAIN SELECT * FROM expenses
WHERE year = 2016;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather  (cost=1000.00..92782.34 rows=8465 width=32)
   Workers Planned: 2
   ->  Parallel Seq Scan on expenses  (cost=0.00..90935.84 rows=3527 width=32)
         Filter: (year = 2016)

The number of rows are correct, but I was expecting it to use the same
index as a query like "WHERE EXTRACT( year FROM day) = 2016" triggers.

Even altering the year column to not null does show any change, and
this is the plan obtained turning off seq_scan (to see the costs):

EXPLAIN ANALYZE SELECT * FROM expenses
WHERE year = 2016;
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on expenses  (cost=10000000000.00..10000127402.44 rows=8451
width=32) (actual time=972.734..2189.300 rows=8784 loops=1)
   Filter: (year = 2016)
   Rows Removed by Filter: 4991216


Am I misunderstaing this functional dependency?

Thanks,
Luca


pgsql-general by date:

Previous
From: Pablo Hendrickx
Date:
Subject: Re: Example setup for Odyssey connection pooler?
Next
From: Mathieu PUJOL
Date:
Subject: Analyze plan of foreign data wrapper