Re: Postgres using the wrong index index - Mailing list pgsql-performance

From Matt Dupree
Subject Re: Postgres using the wrong index index
Date
Msg-id CAMOk8kpnuDtA7jh9YueU37d39xaQE97rWZE+VvYfdK-susgU1Q@mail.gmail.com
Whole thread Raw
In response to Re: Postgres using the wrong index index  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-performance
I increased (and decreased) the stats target for the column and re-analyzed. Didn't make a difference.

Is it possible that the row estimate is off because of a column other than time? I looked at the # of events in that time period and 1.8 million is actually a good estimate. What about the ((strpos(other_events_1004175222.hierarchy, '#close_onborading;'::text) <> 0) condition in the filter? It makes sense that Postgres wouldn't have a way to estimate how selective this condition is.

On Tue, Aug 17, 2021 at 2:52 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Aug 16, 2021 at 11:22:44AM -0400, Matt Dupree wrote:
> > Is either half of the AND estimated correctly?  If you do a query
> > with only ">=", and a query with only "<=", do either of them give an
> > accurate rowcount estimate ?
>
> Dropping >= results in the correct index being used. Dropping <= doesn't
> have this effect.

This doesn't answer the question though: are the rowcount estimes accurate (say
within 10%).

It sounds like interpolating the histogram is giving a poor result, at least
over that range of values.  It'd be interesting to see the entire histogram.

You might try increasing (or decreasing) the stats target for that column, and
re-analyzing.

Your histogram bounds are for ~38 months of data, and your query is for the
previous month (July).

$ date -d @1530186399
Thu Jun 28 06:46:39 CDT 2018
$ date -d @1629125609
Mon Aug 16 09:53:29 CDT 2021

$ date -d @1627369200
Tue Jul 27 02:00:00 CDT 2021
$ date -d @1624777200
Sun Jun 27 02:00:00 CDT 2021

The timestamp column has ndistinct near -1, similar to a continuous
distribution, so I'm not sure why the estimate would be so bad.

--
Justin


--

K. Matt Dupree

Data Science Engineer321.754.0526  |  matt.dupree@heap.io

pgsql-performance by date:

Previous
From: Nagaraj Raj
Date:
Subject: Re: pg_restore schema dump to schema with different name
Next
From: Justin Pryzby
Date:
Subject: Re: Postgres using the wrong index index