I increased (and decreased) the stats target for the column and re-analyzed. Didn't make a difference.
Is it possible that the row estimate is off because of a column other than time? I looked at the # of events in that time period and 1.8 million is actually a good estimate. What about the ((strpos(other_events_1004175222.hierarchy, '#close_onborading;'::text) <> 0) condition in the filter? It makes sense that Postgres wouldn't have a way to estimate how selective this condition is.
On Mon, Aug 16, 2021 at 11:22:44AM -0400, Matt Dupree wrote: > > Is either half of the AND estimated correctly? If you do a query > > with only ">=", and a query with only "<=", do either of them give an > > accurate rowcount estimate ? > > Dropping >= results in the correct index being used. Dropping <= doesn't > have this effect.
This doesn't answer the question though: are the rowcount estimes accurate (say within 10%).
It sounds like interpolating the histogram is giving a poor result, at least over that range of values. It'd be interesting to see the entire histogram.
You might try increasing (or decreasing) the stats target for that column, and re-analyzing.
Your histogram bounds are for ~38 months of data, and your query is for the previous month (July).
$ date -d @1530186399 Thu Jun 28 06:46:39 CDT 2018 $ date -d @1629125609 Mon Aug 16 09:53:29 CDT 2021
$ date -d @1627369200 Tue Jul 27 02:00:00 CDT 2021 $ date -d @1624777200 Sun Jun 27 02:00:00 CDT 2021
The timestamp column has ndistinct near -1, similar to a continuous distribution, so I'm not sure why the estimate would be so bad.