Superfluous merge/sort - Mailing list pgsql-performance

From Anuradha Ratnaweera
Subject Superfluous merge/sort
Date
Msg-id 20030225041143.GA4886@aratnaweera.virtusa.com
Whole thread Raw
Responses Re: Superfluous merge/sort  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
List pgsql-performance
Question in brief: does the planner/optimizer take into account the
foreign key constraints?

If the answer is "no", please stop reading here.

Here comes the details.  Let me give a simple example to illustrate the
situation.

1. We have two tables t1 and t2.

   create table t1 (
       id integer primary key,
       dummy integer
   );

   create table t2 (
       id integer,
       dummy integer
   );

2. We create indexes on all the non-pkey fields.

   create index t1_dummy_idx on t1(dummy);
   create index t2_id_idx on t2(id);
   create index t2_dummy_idx on t2(dummy);

3. We make t2(id) a foreign key of t1(id).

   alter table t2 add constraint t2_fkey foreign key (id) references t1(id);

4. Populate "t1" with unique "id"s from 0 to 19999 with a dummy value.

   copy "t1" from stdin;
   0            654
   1            86097
   ...
   19998        93716
   19999        9106
   \.

5. Populate "t2" with 50000 "id"s with a normal distribution.

   copy "t2" from stdin;
   8017         98659
   11825        5946
   ...
   8202         35994
   8436         19729
   \.

Now we are ready to go ...


- First query is to find the "dummy" values with highest frequency.

  => explain select dummy from t2 group by dummy order by count(*) desc limit 10;
  NOTICE:  QUERY PLAN:

  Limit  (cost=2303.19..2303.19 rows=10 width=4)
    ->  Sort  (cost=2303.19..2303.19 rows=5000 width=4)
          ->  Aggregate  (cost=0.00..1996.00 rows=5000 width=4)
                ->  Group  (cost=0.00..1871.00 rows=50000 width=4)
                      ->  Index Scan using t2_dummy_idx on t2  (cost=0.00..1746.00 rows=50000 width=4)

  EXPLAIN


- Second query is esseitially the same, but we do a merge with "t1" on
  the foreign key (just for the sake of illustrating the point).

  => explain select t2.dummy from t1, t2 where t1.id = t2.id group by t2.dummy order by count(*) desc limit 10;
  NOTICE:  QUERY PLAN:

  Limit  (cost=7643.60..7643.60 rows=10 width=12)
    ->  Sort  (cost=7643.60..7643.60 rows=5000 width=12)
          ->  Aggregate  (cost=7086.41..7336.41 rows=5000 width=12)
                ->  Group  (cost=7086.41..7211.41 rows=50000 width=12)
                      ->  Sort  (cost=7086.41..7086.41 rows=50000 width=12)
                            ->  Merge Join  (cost=0.00..3184.00 rows=50000 width=12)
                                  ->  Index Scan using t1_pkey on t1  (cost=0.00..638.00 rows=20000 width=4)
                                  ->  Index Scan using t2_id_idx on t2  (cost=0.00..1746.00 rows=50000 width=8)

  EXPLAIN


Does this mean that the planner/optimizer doesn't take into account the
foreign key constraint?

    Anuradha

--

Debian GNU/Linux (kernel 2.4.21-pre4)

Don't worry about the world coming to an end today.  It's already tomorrow
in Australia.
        -- Charles Schulz


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: slow query
Next
From: Mark Halliwell
Date:
Subject: Query not using the index