strange estimate for number of rows - Mailing list pgsql-performance
From | Andrew Sullivan |
---|---|
Subject | strange estimate for number of rows |
Date | |
Msg-id | 20031113181415.GB25546@libertyrms.info Whole thread Raw |
Responses |
Re: strange estimate for number of rows
|
List | pgsql-performance |
Hi all, I've one here that I cannot fathom. Any suggestions? We have a table, call it tablename, where we're selecting by a range of dates and an identifier. (This is redacted, obviously): \d tablename Column | Type | Modifiers --------------------+--------------------------+-------------------- id | integer | not null transaction_date | timestamp with time zone | not null product_id | integer | not null Indexes: "trans_posted_trans_date_idx" btree (transaction_date, product_id) The statistics on transaction_date and product_id are set to 1000. Everything is all analysed nicely. But I'm getting a poor plan, because of an estimate that the number of rows to be returned is about double how many actually are: explain analyse select * from transactions_posted where transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and product_id = 2; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on transactions_posted (cost=0.00..376630.33 rows=700923 width=91) (actual time=8422.253..36176.078 rows=316029 loops=1) Filter: ((transaction_date >= '2003-09-01 00:00:00-04'::timestamp with time zone) AND (transaction_date < '2003-10-01 00:00:00-04'::timestamp with time zone) AND (product_id = 2)) Total runtime: 36357.630 ms (3 rows) SET enable_seqscan = off; explain analyse select * from transactions_posted where transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and product_id = 2; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using trans_posted_trans_date_idx on transactions_posted (cost=0.00..1088862.56 rows=700923 width=91) (actual time=35.214..14816.257 rows=316029 loops=1) Index Cond: ((transaction_date >= '2003-09-01 00:00:00-04'::timestamp with time zone) AND (transaction_date < '2003-10-01 00:00:00-04'::timestamp with time zone) AND (product_id = 2)) Total runtime: 15009.816 ms (3 rows) SELECT attname,null_frac,avg_width,n_distinct,correlation FROM pg_stats where tablename = 'transactions_posted' AND attname in ('transaction_date','product_id'); attname | null_frac | avg_width | n_distinct | correlation ------------------+-----------+-----------+------------+------------- product_id | 0 | 4 | 2 | 0.200956 transaction_date | 0 | 8 | -0.200791 | 0.289248 Any ideas? I'm loathe to recommend cluster, since the data will not stay clustered. A -- ---- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
pgsql-performance by date: