Re: Limit + group + join - Mailing list pgsql-performance

From Mark Kirkwood
Subject Re: Limit + group + join
Date
Msg-id 430FD61A.3020902@paradise.net.nz
Whole thread Raw
In response to Re: Limit + group + join  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Limit + group + join
List pgsql-performance
Interestingly enough, 7.4.8 and 8.1devel-2005-08-23 all behave the same
as 8.0.3 for me (tables freshly ANALYZEd):

joinlimit=# SELECT version();
                                              version

-------------------------------------------------------------------------------------------------
  PostgreSQL 7.4.8 on i386-unknown-freebsd5.4, compiled by GCC gcc (GCC)
3.4.2 [FreeBSD] 20040728
(1 row)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY
c.id ORDER BY c.id DESC LIMIT 5;
                                           QUERY PLAN

-----------------------------------------------------------------------------------------------
  Limit  (cost=10591.36..10591.39 rows=5 width=4)
    ->  Group  (cost=10591.36..10992.02 rows=80131 width=4)
          ->  Sort  (cost=10591.36..10791.69 rows=80131 width=4)
                Sort Key: c.id
                ->  Merge Join  (cost=0.00..4064.66 rows=80131 width=4)
                      Merge Cond: ("outer".id = "inner".c_id)
                      ->  Index Scan using c_pkey on c
(cost=0.00..1411.31 rows=80131 width=4)
                      ->  Index Scan using b_on_c on b
(cost=0.00..1451.72 rows=80172 width=4)
(8 rows)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY
c.id ORDER BY c.id  LIMIT 5;
                                        QUERY PLAN

-----------------------------------------------------------------------------------------
  Limit  (cost=0.00..0.27 rows=5 width=4)
    ->  Group  (cost=0.00..4264.99 rows=80131 width=4)
          ->  Merge Join  (cost=0.00..4064.66 rows=80131 width=4)
                Merge Cond: ("outer".id = "inner".c_id)
                ->  Index Scan using c_pkey on c  (cost=0.00..1411.31
rows=80131 width=4)
                ->  Index Scan using b_on_c on b  (cost=0.00..1451.72
rows=80172 width=4)
(6 rows)


joinlimit=# SELECT version();
                                               version

----------------------------------------------------------------------------------------------------
  PostgreSQL 8.1devel on i386-unknown-freebsd5.4, compiled by GCC gcc
(GCC) 3.4.2 [FreeBSD] 20040728
(1 row)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY
c.id ORDER BY c.id DESC LIMIT 5;
                                           QUERY PLAN

-----------------------------------------------------------------------------------------------
  Limit  (cost=10654.53..10654.55 rows=5 width=4)
    ->  Group  (cost=10654.53..11054.53 rows=80000 width=4)
          ->  Sort  (cost=10654.53..10854.53 rows=80000 width=4)
                Sort Key: c.id
                ->  Merge Join  (cost=0.00..4139.44 rows=80000 width=4)
                      Merge Cond: ("outer".id = "inner".c_id)
                      ->  Index Scan using c_pkey on c
(cost=0.00..1450.00 rows=80000 width=4)
                      ->  Index Scan using b_on_c on b
(cost=0.00..1490.00 rows=80000 width=4)
(8 rows)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY
c.id ORDER BY c.id LIMIT 5;
                                        QUERY PLAN

-----------------------------------------------------------------------------------------
  Limit  (cost=0.00..0.27 rows=5 width=4)
    ->  Group  (cost=0.00..4339.44 rows=80000 width=4)
          ->  Merge Join  (cost=0.00..4139.44 rows=80000 width=4)
                Merge Cond: ("outer".id = "inner".c_id)
                ->  Index Scan using c_pkey on c  (cost=0.00..1450.00
rows=80000 width=4)
                ->  Index Scan using b_on_c on b  (cost=0.00..1490.00
rows=80000 width=4)
(6 rows)

The non default server params of relevance are:

shared_buffers = 12000
effective_cache_size = 100000
work_mem/sort_mem = 20480

I did wonder if the highish sort_mem might be a factor, but no, with it
  set to 1024 I get the same behaviour (just higher sort cost estimates).

Cheers

Mark

Tom Lane wrote:
>
>
> Which PG version are you using exactly?  That mistake looks like an
> artifact of the 8.0 "fuzzy plan cost" patch, which we fixed recently:
> http://archives.postgresql.org/pgsql-committers/2005-07/msg00474.php
>
> But Tobias wasn't happy with 7.4 either, so I'm not sure that the fuzzy
> cost issue explains his results.
>
> As far as the "desc" point goes, the problem is that mergejoins aren't
> capable of dealing with backward sort order, so a merge plan isn't
> considered for that case (or at least, it would have to have a sort
> after it, which pretty much defeats the point for a fast-start plan).
> I have some ideas about fixing this but it won't happen before 8.2.
>

pgsql-performance by date:

Previous
From: Michael Fuhr
Date:
Subject: Re: Weird performance drop after VACUUM
Next
From: Greg Stark
Date:
Subject: Re: Limit + group + join