Thread: sort operation leads planner to different number of rows?

sort operation leads planner to different number of rows?

From
Robert Treat
Date:
I'm in the process of upgrading one of my servers from 7.3 to 8.1, and
have run across a query that is slower on the new 8.1 box. FWIW, the data
is all freshly loaded and freshly analyzed, and this is 8.1.1 to be
precise. The part that I am really curious about right now is this
snippet of the explain plan:

>  Sort  (cost=616.64..620.56 rows=1568 width=12) (actual time=46.579..54.641 rows=6407 loops=1)
>    Sort Key: latest_download.host_id
>    ->  Subquery Scan latest_download  (cost=498.14..533.42 rows=1568 width=12) (actual time=43.657..45.594 rows=472 loops=1)

I am wondering why it would end up with a different number of rows after
the sort operation. If you want to see the full explain analyze, it's at
http://rafb.net/paste/results/D8lq9v79.html; these are lines 29-31.

TIA


Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL



Re: sort operation leads planner to different number of rows?

From
Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> Sort  (cost=616.64..620.56 rows=1568 width=12) (actual time=46.579..54.641 rows=6407 loops=1)
>     Sort Key: latest_download.host_id
>         ->  Subquery Scan latest_download  (cost=498.14..533.42 rows=1568 width=12) (actual time=43.657..45.594 rows=472 loops=1)

> I am wondering why it would end up with a different number of rows after
> the sort operation.

The planner's estimate didn't change: 1568 at both steps.  The "actual"
is the number of rows actually pulled from the node at runtime, and the
discrepancy here (6407 out of the Sort versus 472 into it) occurs because
this is the inner side of a mergejoin.  A mergejoin has to rescan
duplicate inner rows to join them to duplicate outer rows, and every
rescanned row is counted again.  It looks like you have a pretty fair
number of duplicates...
        regards, tom lane
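
The rescan effect described above can be illustrated with a small simulation. This is only a sketch of the idea, not PostgreSQL's executor code: the function name `simulate_merge_join` and the sample key lists are made up for illustration, and the model is simplified (for example, it does not count the extra row a real merge join reads to detect the end of a group of equal keys). It counts how many rows are pulled from a sorted inner input when the outer input contains duplicate keys.

```python
def simulate_merge_join(outer_keys, inner_keys):
    """Simplified merge join over two key lists sorted in ascending order.

    Returns (joined_rows, inner_rows_pulled). inner_rows_pulled counts every
    row read from the inner input, including rows read again after a rewind,
    which is what the "actual rows" figure of the inner node reflects.
    """
    joined_rows = 0
    inner_rows_pulled = 0
    pos = 0           # index of the next unread inner row
    mark = 0          # start of the inner group matching the current outer key
    prev_key = None   # key of the previous outer row

    for key in outer_keys:
        if key == prev_key:
            # Duplicate outer key: rewind to the marked position and
            # rescan the same group of inner rows.
            pos = mark
        else:
            # New outer key: skip inner rows with smaller keys (each is
            # pulled once), then remember where the matching group starts.
            while pos < len(inner_keys) and inner_keys[pos] < key:
                pos += 1
                inner_rows_pulled += 1
            mark = pos

        # Read every inner row that matches the current outer key.
        while pos < len(inner_keys) and inner_keys[pos] == key:
            pos += 1
            inner_rows_pulled += 1
            joined_rows += 1

        prev_key = key

    return joined_rows, inner_rows_pulled


if __name__ == "__main__":
    # Hypothetical data: the inner input holds 4 rows, the outer has key 1 three times.
    print(simulate_merge_join([1, 1, 1, 2], [1, 1, 2, 3]))
    # With unique outer keys, no inner row is read more than once.
    print(simulate_merge_join([1, 2, 3], [1, 2, 3]))
```

With the made-up data above, the first call returns (7, 7): the inner input holds only 4 rows, but 7 are pulled from it because the two rows with key 1 are rescanned for each duplicate outer row. The second call returns (3, 3). The same mechanism would account for a Sort node on the inner side of a mergejoin reporting more actual rows than its child feeds into it.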