Thread: [BUGS] BUG #14646: performance hint to remove

[BUGS] BUG #14646: performance hint to remove

From
boshomi@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      14646
Logged by:          Boshomi
Email address:      boshomi@gmail.com
PostgreSQL version: 9.6.2
Operating system:   Linux opensuse
Description:

the order of tables in from clause or where clause can result in different
query plans, with different execution time.


Begin;
--DROP TABLE testjoinperf.tempsmall CASCADE;
--DROP TABLE testjoinperf.tempbig;
/*
CREATE SCHEMA testjoinperf;
CREATE TABLE testjoinperf.tempsmall
( idsmall serial NOT NULL, smallrnd double precision, CONSTRAINT pk_tempsamll PRIMARY KEY (idsmall)
);

CREATE TABLE testjoinperf.tempbig
( idbig serial NOT NULL, idsmall integer, bigrnd double precision, CONSTRAINT pk_tempbig PRIMARY KEY (idbig),
CONSTRAINTfk_tempsmall FOREIGN KEY (idsmall)     REFERENCES testjoinperf.tempsmall (idsmall) MATCH SIMPLE     ON UPDATE
NOACTION ON DELETE NO ACTION
 
);


insert into testjoinperf.tempsmall (idsmall, smallrnd)
select a.a, random() from generate_series(1,50000) as a; 


insert into testjoinperf.tempbig (idbig, idsmall, bigrnd)
select b.b, (b.b % 50000)+1 , random() from generate_series(1,1000000) as
b;analyze testjoinperf.tempsmall;analyze testjoinperf.tempbig;
analyze;
*/
/*

--fast query big table left
explain analyze  select * from   testjoinperf.tempbig  join
testjoinperf.tempsmall    using(idsmall)
where idsmall between 48000 and 49000
limit 100;
*/

/*

-- slow query, small table left. 
explain analyze  select * from   testjoinperf.tempsmall  join
testjoinperf.tempbig      using(idsmall)
where idsmall between 48000 and 49000
limit 100;
*/
rollback;

/*
faster query
QUERY PLAN
Limit  (cost=0.29..137.92 rows=100 width=24) (actual time=6.589..6.896
rows=100 loops=1) ->  Nested Loop  (cost=0.29..28656.67 rows=20821 width=24) (actual
time=6.588..6.880 rows=100 loops=1)       ->  Seq Scan on tempbig  (cost=0.00..20406.00 rows=20821 width=16)
(actual time=6.576..6.609 rows=100 loops=1)             Filter: ((idsmall >= 48000) AND (idsmall <= 49000))
RowsRemoved by Filter: 47998       ->  Index Scan using pk_tempsamll on tempsmall  (cost=0.29..0.39 
rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=100)             Index Cond: (idsmall = tempbig.idsmall)
Planning time: 0.272 ms
Execution time: 6.935 ms



slower query
QUERY PLAN
Limit  (cost=52.46..153.33 rows=100 width=24) (actual time=13.063..13.123
rows=100 loops=1) ->  Hash Join  (cost=52.46..19399.50 rows=19180 width=24) (actual
time=13.062..13.108 rows=100 loops=1)       Hash Cond: (tempbig.idsmall = tempsmall.idsmall)       ->  Seq Scan on
tempbig (cost=0.00..15406.00 rows=1000000 width=16) 
(actual time=0.011..5.361 rows=48098 loops=1)       ->  Hash  (cost=40.47..40.47 rows=959 width=12) (actual
time=0.406..0.406 rows=1001 loops=1)             Buckets: 1024  Batches: 1  Memory Usage: 55kB             ->  Index
Scanusing pk_tempsamll on tempsmall  
(cost=0.29..40.47 rows=959 width=12) (actual time=0.019..0.239 rows=1001
loops=1)                   Index Cond: ((idsmall >= 48000) AND (idsmall <=
49000))
Planning time: 0.277 ms
Execution time: 13.166 ms


the query are eqal.

it also makes a differnence if the where clause is
where tempbig.idsmall between 48000 and 49000 /* fast */

or

where tempsmall.idsmall between 48000 and 49000 /* slow */

to change the order of tables is a performance hint, and should be
removed.


(Boshomi is my username on german wikipedia (de.wikipedia.org), feel free to
contact me for any question.


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14646: performance hint to remove

From
"David G. Johnston"
Date:
On Wed, May 10, 2017 at 10:37 AM, <boshomi@gmail.com> wrote:
The following bug has been logged on the website:

Bug reference:      14646
Logged by:          Boshomi
Email address:      boshomi@gmail.com
PostgreSQL version: 9.6.2
Operating system:   Linux opensuse
Description:

the order of tables in from clause or where clause can result in different
query plans, with different execution time.
​[...]​
 

--fast query big table left
explain analyze  select * from   testjoinperf.tempbig  join
testjoinperf.tempsmall    using(idsmall)
where idsmall between 48000 and 49000

-- slow query, small table left.
explain analyze  select * from   testjoinperf.tempsmall  join
testjoinperf.tempbig      using(idsmall)
where idsmall between 48000 and 49000
 
​Confirmed 9.6.2 on Ubuntu.  Not sure this is properly classified as a bug but its definitely an area where improvement would seem desirable.  I am a particularly heavy user of the join predicate "USING" and never really thought to look at this dynamic (without the WHERE clause it doesn't matter, both tables up end sequentially scanned).

Someone more knowledgeable than myself will need to comment on the technical aspects as to why "where idsmall" seems to be linked to the left relation instead of the one with a more favorable execution plan.

Boshomi, how did you stumble across this anyway - just with artificial data or did you come up with that after hitting the problem with real data?

David J.


Re: [BUGS] BUG #14646: performance hint to remove

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Confirmed 9.6.2 on Ubuntu.  Not sure this is properly classified as a bug
> but its definitely an area where improvement would seem desirable.  I am a
> particularly heavy user of the join predicate "USING" and never really
> thought to look at this dynamic (without the WHERE clause it doesn't
> matter, both tables up end sequentially scanned).

Yeah, if you use USING the parser tries to define the merged join variable
as equivalent to just one or the other input variable.  With an inner join
it'll arbitrarily pick the left input, with a left or right join it will
use the outer-side variable (since the inner-side one might go to NULL).
With a FULL join, the merged variable is actually defined as
COALESCE(left-input, right-input), which is necessary for semantic
correctness but is pretty awful for optimization purposes.

If you put an equality constraint on the merged join variable, everything
works pretty well anyway because of deduction of implied equalities (ie,
we have left-input = right-input from the USING's join clause, and then
if we also know e.g. left-input = 42 we can deduce right-input = 42).
There are some hacks that make that carry through for outer join cases
as well.

The OP's WHERE clause isn't a simple equality so the equivalence-class
machinery doesn't help there.  There's been some speculation about
improving that, but there are a bunch of semantic and practical problems
in the way.  One of the biggest is that we can't simply duplicate the
restriction for each input variable, because that would cause the planner
to misestimate their selectivity and produce bad plans of a different
sort (not knowing that the restrictions are redundant).  The eclass
machinery knows enough to compensate for that, but it's not clear how
to do it for arbitrary restrictions.

Anyway, this isn't a bug but a missed optimization opportunity, and
I would counsel not holding your breath waiting for an improvement.
It's a difficult problem and it's not tremendously high priority.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs