Thread: Planner matching constants across tables in a join

Planner matching constants across tables in a join

From
Richard Huxton
Date:
I know this has been covered on one of the lists in the past, but I'm damned
if I can find the keywords to locate it.

If I join two tables with a comparison to a constant on one, why can't the
planner see that the comparison applies to both tables:

SELECT a.id FROM a JOIN b ON a.id=b.id WHERE a.id=1;

runs much slower than

SELECT a.id FROM a JOIN b ON a.id=b.id WHERE a.id=1 AND b.id=1;

It's not a real problem since it's easy to work around, but I was wondering
what the difficulties are for the planner in seeing that query 1 is the same
as query 2. Note that it doesn't seem related to JOIN forcing the planner's
hand, the same applies just using WHERE a.id=b.id

--
  Richard Huxton

Re: Planner matching constants across tables in a join

From
Greg Stark
Date:
Richard Huxton <dev@archonet.com> writes:

> I know this has been covered on one of the lists in the past, but I'm damned
> if I can find the keywords to locate it.
>
> If I join two tables with a comparison to a constant on one, why can't the
> planner see that the comparison applies to both tables:

It sure does. Postgres does an impressive job of tracing equality clauses
around for just this purpose.

> SELECT a.id FROM a JOIN b ON a.id=b.id WHERE a.id=1;
>
> runs much slower than
>
> SELECT a.id FROM a JOIN b ON a.id=b.id WHERE a.id=1 AND b.id=1;

Really? They produce virtually the same plan for me.

Why do you think it'll run slower?
What query are you actually finding slow?

--
greg

Re: Planner matching constants across tables in a join

From
Richard Huxton
Date:
On Wednesday 05 Mar 2003 12:42 pm, Greg Stark wrote:
> Really? They produce virtually the same plan for me.
>
> Why do you think it'll run slower?
> What query are you actually finding slow?

The actual query uses three tables, but isn't very complicated. Apologies for
the wrapping on the explain.

EXPLAIN ANALYSE SELECT a.line_id, a.start_time, a.call_dur, i.cam_id,
i.prod_id, i.chg_per_min, i.rev_per_min
FROM campaign_items i, campaign c, activity a
WHERE
i.cam_id=c.id AND a.line_id=i.line_id
AND a.start_time BETWEEN c.cam_from AND c.cam_to
AND a.line_id='0912345 0004' AND i.line_id='0912345 0004';


QUERY PLAN
----------
 Merge Join  (cost=348.01..348.72 rows=1 width=72) (actual time=115.43..116.27
rows=21 loops=1)
   Merge Cond: ("outer".id = "inner".cam_id)
   Join Filter: (("outer".line_id)::text = ("inner".line_id)::text)
   ->  Sort  (cost=245.45..245.75 rows=118 width=40) (actual time=83.98..84.10
rows=94 loops=1)
         Sort Key: c.id
         ->  Nested Loop  (cost=0.00..241.40 rows=118 width=40) (actual
time=3.83..83.27 rows=94 loops=1)
               Join Filter: (("outer".start_time >=
("inner".cam_from)::timestamp without time zone) AND ("outer".start_time <=
("inner".cam_to)::timestamp without time zone))
               ->  Seq Scan on activity a  (cost=0.00..199.00 rows=11
width=28) (actual time=3.06..54.14 rows=19 loops=1)
                     Filter: ((line_id)::text = '0912345 0004'::text)
               ->  Seq Scan on campaign c  (cost=0.00..2.00 rows=100 width=12)
(actual time=0.02..0.84 rows=100 loops=19)
   ->  Sort  (cost=102.56..102.57 rows=5 width=32) (actual time=31.36..31.39
rows=20 loops=1)
         Sort Key: i.cam_id
         ->  Seq Scan on campaign_items i  (cost=0.00..102.50 rows=5 width=32)
(actual time=17.16..31.11 rows=6 loops=1)
               Filter: ((line_id)::text = '0912345 0004'::text)
 Total runtime: 117.08 msec
(15 rows)


and this is the plan where I just check the one line_id:


EXPLAIN ANALYSE SELECT a.line_id, a.start_time, a.call_dur, i.cam_id,
i.prod_id, i.chg_per_min, i.rev_per_min
FROM campaign_items i, campaign c, activity a
WHERE
i.cam_id=c.id AND a.line_id=i.line_id
AND a.start_time BETWEEN c.cam_from AND c.cam_to
AND i.line_id='0912345 0004';

QUERY PLAN
---------------------------------------
 Hash Join  (cost=2.25..1623.70 rows=6 width=72) (actual time=48.27..974.30
rows=21 loops=1)
   Hash Cond: ("outer".cam_id = "inner".id)
   Join Filter: (("outer".start_time >= ("inner".cam_from)::timestamp without
time zone) AND ("outer".start_time <= ("inner".cam_to)::timestamp without
time zone))
   ->  Nested Loop  (cost=0.00..1619.87 rows=53 width=60) (actual
time=24.49..969.33 rows=114 loops=1)
         Join Filter: (("inner".line_id)::text = ("outer".line_id)::text)
         ->  Seq Scan on campaign_items i  (cost=0.00..102.50 rows=5 width=32)
(actual time=15.72..28.52 rows=6 loops=1)
               Filter: ((line_id)::text = '0912345 0004'::text)
         ->  Seq Scan on activity a  (cost=0.00..174.00 rows=10000 width=28)
(actual time=0.03..101.95 rows=10000 loops=6)
   ->  Hash  (cost=2.00..2.00 rows=100 width=12) (actual time=1.54..1.54
rows=0 loops=1)
         ->  Seq Scan on campaign c  (cost=0.00..2.00 rows=100 width=12)
(actual time=0.06..0.94 rows=100 loops=1)
 Total runtime: 975.13 msec
(11 rows)

Table campaign has 100 rows, campaign_items 5000, activity 10000. My guess is
that the planner starts with "campaign" because of the low number of rows,
but it still looks like filtering on "activity" would help things. Indeed,
testing a.line_id instead of i.line_id does make a difference.


QUERY PLAN
-------------------
 Hash Join  (cost=241.70..457.54 rows=6 width=72) (actual time=161.20..225.68
rows=21 loops=1)
   Hash Cond: ("outer".cam_id = "inner".id)
   Join Filter: (("inner".line_id)::text = ("outer".line_id)::text)
   ->  Seq Scan on campaign_items i  (cost=0.00..90.00 rows=5000 width=32)
(actual time=0.03..72.00 rows=5000 loops=1)
   ->  Hash  (cost=241.40..241.40 rows=118 width=40) (actual time=85.46..85.46
rows=0 loops=1)
         ->  Nested Loop  (cost=0.00..241.40 rows=118 width=40) (actual
time=3.80..84.66 rows=94 loops=1)
               Join Filter: (("outer".start_time >=
("inner".cam_from)::timestamp without time zone) AND ("outer".start_time <=
("inner".cam_to)::timestamp without time zone))
               ->  Seq Scan on activity a  (cost=0.00..199.00 rows=11
width=28) (actual time=3.03..54.48 rows=19 loops=1)
                     Filter: ((line_id)::text = '0912345 0004'::text)
               ->  Seq Scan on campaign c  (cost=0.00..2.00 rows=100 width=12)
(actual time=0.03..0.89 rows=100 loops=19)
 Total runtime: 226.51 msec
(11 rows)

--
  Richard Huxton

Re: Planner matching constants across tables in a join

From
Greg Stark
Date:
Richard Huxton <dev@archonet.com> writes:

> Filter: ((line_id)::text = '0912345 0004'::text)

So I think this means that line_id is being casted to "text". Though I'm not
clear why it would be choosing "text" for the constant if line_id wasn't text
to begin with.

In any case my plans here look like:
> Filter: (aa = 'x'::text)

so it looks like there's something extra going on in your plan.

what does your table definition look like?

--
greg

Re: Planner matching constants across tables in a join

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Richard Huxton <dev@archonet.com> writes:
>> If I join two tables with a comparison to a constant on one, why can't the
>> planner see that the comparison applies to both tables:

> It sure does. Postgres does an impressive job of tracing equality clauses
> around for just this purpose.

CVS tip does.  Existing releases don't...

            regards, tom lane

Re: Planner matching constants across tables in a join

From
Richard Huxton
Date:
On Wednesday 05 Mar 2003 3:02 pm, Greg Stark wrote:
> Richard Huxton <dev@archonet.com> writes:
> > Filter: ((line_id)::text = '0912345 0004'::text)
>
> So I think this means that line_id is being casted to "text". Though I'm
> not clear why it would be choosing "text" for the constant if line_id
> wasn't text to begin with.

A domain defined as varchar() actually - which is why it's not using an index,
but that's neither here nor there regarding the constant issue.

> In any case my plans here look like:
> > Filter: (aa = 'x'::text)
>
> so it looks like there's something extra going on in your plan.
>
> what does your table definition look like?

rms=> \d campaign
       Table "rms.campaign"
  Column  |   Type    | Modifiers
----------+-----------+-----------
 id       | integer   | not null
 title    | item_name |
 cam_from | date      |
 cam_to   | date      |
 owner    | integer   |
Indexes: campaign_pkey primary key btree (id),
         campaign_from_idx btree (cam_from),
         campaign_to_idx btree (cam_to)

rms=> \d campaign_items
    Table "rms.campaign_items"
   Column    |  Type   | Modifiers
-------------+---------+-----------
 cam_id      | integer | not null
 line_id     | tel_num | not null
 prod_id     | integer | not null
 chg_per_min | integer |
 rev_per_min | integer |
Indexes: campaign_items_pkey primary key btree (cam_id, line_id, prod_id),
         cam_item_line_idx btree (line_id)
Foreign Key constraints: $1 FOREIGN KEY (cam_id) REFERENCES campaign(id) ON
UPDATE NO ACTION ON DELETE NO ACTION,
                         $2 FOREIGN KEY (line_id) REFERENCES line(telno) ON
UPDATE NO ACTION ON DELETE NO ACTION,
                         $3 FOREIGN KEY (prod_id) REFERENCES product(id) ON
UPDATE NO ACTION ON DELETE NO ACTION

rms=> \d activity
                 Table "rms.activity"
   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
 line_id    | tel_num                     | not null
 start_time | timestamp without time zone | not null
 call_dur   | integer                     |
Indexes: activity_pkey primary key btree (line_id, start_time),
         activity_start_idx btree (start_time)
Foreign Key constraints: $1 FOREIGN KEY (line_id) REFERENCES line(telno) ON
UPDATE NO ACTION ON DELETE NO ACTION


--
  Richard Huxton

Re: Planner matching constants across tables in a

From
"Josh Berkus"
Date:
Richard,

> A domain defined as varchar() actually - which is why it's not using
> an index,
> but that's neither here nor there regarding the constant issue.

You might improve your performance overall if you cast the constant to
tel_num before doing the comparison in the query.   Right now, the
parser is casting the whole column to text instead, because it can't
tell that the constant you supply is a valid tel_num.

-Josh

Re: Planner matching constants across tables in a

From
Richard Huxton
Date:
On Wednesday 05 Mar 2003 7:00 pm, Josh Berkus wrote:
> Richard,
>
> > A domain defined as varchar() actually - which is why it's not using
> > an index,
> > but that's neither here nor there regarding the constant issue.
>
> You might improve your performance overall if you cast the constant to
> tel_num before doing the comparison in the query.   Right now, the
> parser is casting the whole column to text instead, because it can't
> tell that the constant you supply is a valid tel_num.

That's what I thought, but...

rms=> EXPLAIN ANALYSE SELECT * FROM line WHERE telno='0912345 0004'::tel_num;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Seq Scan on line  (cost=0.00..20.50 rows=1 width=28) (actual time=0.10..5.28
rows=1 loops=1)
   Filter: ((telno)::text = ('0912345 0004'::character varying)::text)
 Total runtime: 5.43 msec

rms=> EXPLAIN ANALYSE SELECT * FROM line WHERE telno='0912345 0004'::varchar;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Index Scan using line_pkey on line  (cost=0.00..5.78 rows=1 width=28) (actual
time=14.03..14.03 rows=1 loops=1)
   Index Cond: ((telno)::character varying = '0912345 0004'::character
varying)
 Total runtime: 14.28 msec

Ignoring the times (fake data on my test box) it seems like there's an issue
in comparing against DOMAIN defined types. Or maybe it's in the index
definition, although I don't know how to find out the type of an index.
--
  Richard Huxton

Re: Planner matching constants across tables in a

From
Richard Huxton
Date:
On Wednesday 05 Mar 2003 7:00 pm, Josh Berkus wrote:
> You might improve your performance overall if you cast the constant to
> tel_num before doing the comparison in the query.

Stranger and stranger...

richardh=# CREATE DOMAIN intdom int4;
richardh=# CREATE DOMAIN textdom text;
richardh=# CREATE TABLE domtest (a intdom, b textdom);
richardh=# CREATE INDEX domtest_a_idx ON domtest (a);
richardh=# CREATE INDEX domtest_b_idx ON domtest (b);
richardh=# INSERT INTO domtest VALUES (1,'aaa');
richardh=# INSERT INTO domtest VALUES (2,'bbb');
richardh=# INSERT INTO domtest VALUES (3,'ccc');

richardh=# EXPLAIN ANALYSE SELECT * FROM domtest WHERE a=1::intdom;
-------------------------------------------------------------------------------------------------
 Seq Scan on domtest  (cost=0.00..22.50 rows=5 width=36) (actual
time=0.08..0.11 rows=1 loops=1)
   Filter: ((a)::oid = 1::oid)

richardh=# EXPLAIN ANALYSE SELECT * FROM domtest WHERE a=1::int4;
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using domtest_a_idx on domtest  (cost=0.00..17.07 rows=5 width=36)
(actual time=0.09..0.11 rows=1 loops=1)
   Index Cond: ((a)::integer = 1)

richardh=# EXPLAIN ANALYSE SELECT * FROM domtest WHERE b='aaa'::textdom;
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using domtest_b_idx on domtest  (cost=0.00..17.07 rows=5 width=36)
(actual time=0.09..0.11 rows=1 loops=1)
   Index Cond: ((b)::text = 'aaa'::text)

richardh=# EXPLAIN ANALYSE SELECT * FROM domtest WHERE b='aaa'::text;
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using domtest_b_idx on domtest  (cost=0.00..17.07 rows=5 width=36)
(actual time=0.10..0.12 rows=1 loops=1)
   Index Cond: ((b)::text = 'aaa'::text)

Can't think why we're getting casts to type "oid" in the first example - I'd
have thought int4 would be the default. I'm guessing the text domain always
works because that's the default cast.

--
  Richard Huxton