Re: Inefficient queryplan for query with intersectable - Mailing list pgsql-performance

From Tom Lane
Subject Re: Inefficient queryplan for query with intersectable
Date
Msg-id 26436.1125096976@sss.pgh.pa.us
Whole thread Raw
In response to Re: Inefficient queryplan for query with intersectable  (Arjen van der Meijden <acmmailing@tweakers.net>)
Responses Re: Inefficient queryplan for query with intersectable  (Arjen van der Meijden <acmmailing@tweakers.net>)
List pgsql-performance
Arjen van der Meijden <acmmailing@tweakers.net> writes:
> As said, it chooses sequential scans or "the wrong index plans" over a
> perfectly good plan that is just not selected when the parameters are
> "too well tuned" or sequential scanning of the table is allowed.

I think some part of the problem comes from using inconsistent
datatypes.  For instance, it seems very odd that the thing is not
using a hash or something to handle

 t_0.Cat2 IN (SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545)

seeing that it correctly guesses there are only going to be about 8 rows
in the union.  Part of the reason is that cat2 is smallint, whereas the
output of the union must be at least int, maybe wider depending on the
datatype of cat.id (which you did not show us); so the comparison isn't
hashable.  Even a smallint vs int comparison would be mergejoinable,
though, so I'm really wondering what cat.id is.

Another big part of the problem comes from poor result size estimation.
I'm not sure you can eliminate that entirely given the multiple
conditions on different columns (which'd require cross-column statistics
to really do well, which we do not have).  But you could avoid
constructs like

    WHERE ... t_1.recordtimestamp >=
      (SELECT max_date - 60 FROM last_dates WHERE table_name = 'pricetracker')

The planner is basically going to throw up its hands and make a default
guess on the selectivity of this; it's not smart enough to decide that
the sub-select probably represents a constant.  What I'd do with this
is to define a function marked STABLE for the sub-select result, perhaps
something like

create function get_last_date(tabname text, offsetdays int)
returns timestamp as $$
SELECT max_date - $2 FROM last_dates WHERE table_name = $1
$$ language sql strict stable;

(I'm guessing as to datatypes and the amount of parameterization you
need.)  Then write the query like

    WHERE ... t_1.recordtimestamp >= get_last_date('pricetracker', 60)

In this formulation the planner will be able to make a reasonable guess
about how many rows will match ... at least if your statistics are up
to date ...

            regards, tom lane

pgsql-performance by date:

Previous
From: asif ali
Date:
Subject: Re: Weird performance drop after VACUUM
Next
From: Philip Hallstrom
Date:
Subject: Re: Weird performance drop after VACUUM