Re: 7k records into Sort node, 4.5m out? - Mailing list pgsql-performance

From Tom Lane
Subject Re: 7k records into Sort node, 4.5m out?
Date
Msg-id 2703.1345096911@sss.pgh.pa.us
Whole thread Raw
In response to Re: 7k records into Sort node, 4.5m out?  (Josh Berkus <josh@agliodbs.com>)
Responses Re: 7k records into Sort node, 4.5m out?
List pgsql-performance
Josh Berkus <josh@agliodbs.com> writes:
> SELECT COUNT(*)
> FROM "user"
> INNER JOIN "house"
>       ON ("user"."house_id" = "house"."id")
> LEFT OUTER JOIN "district"
>       ON ("house"."district_id" = "district"."id")
> WHERE ("user"."status" = 0
>      AND ("district"."update_status" = 2
>                 OR "district"."update_status" = 3 )
>      AND ("user"."valid" = 1
>                 OR "user"."valid" = 3 )
>      AND "district"."is_test" = false );

> However, since the anonymization above doesn't quite match that used in
> the EXPLAIN plan, I'm not sure what you'll get out of it.  And yes, we
> know that the outer join is being invalidated.

Ah, I see where I was confused: in the original query plan I'd been
imagining that charlie.sierra was a unique column, but your gloss on
that as being house.district_id implies that it's highly non-unique.
And looking at the rowcounts in the original plan backs that up:
there are about 600 house rows per district row.  So my thought of
having district as the outer side of a nestloop scanning the index
on house.district_id would not really work very well --- maybe it
would end up cheaper than the mergejoin plan, but it's far from a
clear-cut win.

On the whole I'm thinking the code is operating as designed here.

            regards, tom lane


pgsql-performance by date:

Previous
From: Claudio Freire
Date:
Subject: Re: High Disk write and space taken by PostgreSQL
Next
From: Bruce Momjian
Date:
Subject: Re: High Disk write and space taken by PostgreSQL