BUG #8591: Erroneous results, planner pushing where into left join right side - Mailing list pgsql-bugs

From klaussfreire@gmail.com
Subject BUG #8591: Erroneous results, planner pushing where into left join right side
Date
Msg-id E1Vgk41-00050x-Ck@wrigleys.postgresql.org
Whole thread Raw
Responses Re: BUG #8591: Erroneous results, planner pushing where into left join right side  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      8591
Logged by:          Claudio Freire
Email address:      klaussfreire@gmail.com
PostgreSQL version: 9.2.5
Operating system:   Amazon Linux
Description:

So, I've been checking whether my SQL was wrong, and I really can't see the
fault in it. It seems to be a planner error, generating erroneous results.


I've got this query:


select ag.*, act.* from ag
        left join act on act.id = ag.act_id
        left join camp on camp.id = act.camp_id
        left join adv on adv.id = ag.adv_id
        left join src on src.id = coalesce(ag.src_id, act.src_id)
where
    ((ag.act_id in

(2566,2567,1928,2093,2320,2450,2451,2452,1965,2353,2226,2365,2370,2373,2145,2255,2387,2388,2261,2262,2392,2319,1885,2014,2272,1936,1894,1895,1896,1897,1898,2540,1903,1904,2171,1918))
and (ag.created <= '2013-10-16 23:59:59.999'::timestamp without time zone)
and (ag.created >= '2013-10-16'::timestamp without time zone))
    and (coalesce(ag.src_id, act.src_id) = 74)
limit 1000;


Which returns 423 rows, many of which don't match the where condition.


Explain analyze here: http://explain.depesz.com/s/6Ov  (notice the filter on
the scan over act_pkey, that's wrong since the join node is a left join)


If written like this, it returns the right 34 results (the -1000 there is a
no-op, there's no such value anywhere):


select ag.*, act.* from ag
        left join act on act.id = ag.act_id
        left join camp on camp.id = act.camp_id
        left join adv on adv.id = ag.adv_id
        left join src on src.id = coalesce(ag.src_id, act.src_id)
where
    ((ag.act_id in

(2566,2567,1928,2093,2320,2450,2451,2452,1965,2353,2226,2365,2370,2373,2145,2255,2387,2388,2261,2262,2392,2319,1885,2014,2272,1936,1894,1895,1896,1897,1898,2540,1903,1904,2171,1918))
and (ag.created <= '2013-10-16 23:59:59.999'::timestamp without time zone)
and (ag.created >= '2013-10-16'::timestamp without time zone))
    and (coalesce(ag.src_id, act.src_id, -1000) = 74)
limit 1000;


Explain analyze here: http://explain.depesz.com/s/Xsa  (notice the filter is
not on the join node, which is the correct way to execute this query)


A good plan can also be obtained by removing the left join against src. In
this query, this join is superfluous, but the (more complex) real-world
query that is giving me this bad plan needs it.


There are two relevant indices:


CREATE INDEX ix_ag_action ON ag USING btree (ruby_action_id);


CREATE UNIQUE INDEX ix_ag_unq ON ag USING btree (created,
    (COALESCE(act_id, (-1000))), (COALESCE(adv_id, (-1000))),
(COALESCE(country_id, (-1000))), (COALESCE(os_id, (-1000))),
(COALESCE(src_id, (-1000))));




Sadly, I could not reproduce this bug on a generated database. It seems to
be data-depepdent (probably depends on specific stats in order to generate
that plan). I've got a script that generates a close-enough database, but it
fails to reproduce the bug and it's too big to include here (contact me by
email). Maybe, with that test database and some enable_X fumbling it could
be reproduced, but I did not manage.


In any case, the plans themselves show the bug. To me, it's clear that
filter cannot be moved the way the planner does. If the planner wants to
reduce the size of the right side of the join, to speed up things, it has to
add the filter both at the right side *and* the join node. I don't think how
this could help, though, since the right side is a query by PK.


Row counts:


ag 702k
act 2900
adv 75
camp 579


Schema (approx):


create table advs (
    id  int ,
    name    varchar ,
    PRIMARY KEY (id)
) with (oids = false);


create table camps (
    id  int ,
    name    varchar ,
    PRIMARY KEY (id)
) with (oids = false);


create table sources (
    id  int ,
    company varchar ,
    PRIMARY KEY (id)
) with (oids = false);


create table countries (
    id  int ,
    name varchar ,
    PRIMARY KEY (id)
) with (oids = false);


create table oses (
    id  int ,
    name varchar ,
    PRIMARY KEY (id)
) with (oids = false);


create table acts (
    id  int ,
    name    varchar ,
    source_id   int REFERENCES sources(id),
    camp_id int REFERENCES camps(id),
    os_id int REFERENCE oses(id),
    country_id int REFERENCES countries(id),
    PRIMARY KEY (id)
) with (oids = false);


CREATE TABLE ag (
    id              bigserial not null PRIMARY KEY,
    created     timestamp without time zone, -- daily precision at least
        adv_id  integer REFERENCES advs(id),
        act_id      integer REFERENCES acts(id),
        source_id  integer REFERENCES sources(id), -- take from act, if null

        evs             integer not null default 0
) WITH ( OIDS = FALSE );

pgsql-bugs by date:

Previous
From: Jeffrey Walton
Date:
Subject: fork_process.c and OpenSSL
Next
From: BD
Date:
Subject: Re: BUG #8566: Postgres ODBC Driver 9.02.0100 (32 bits) cannot be used under WINDOWS 7 64 bits