Re: Nested Loops vs. Hash Joins or Merge Joins - Mailing list pgsql-performance

From Jim C. Nasby
Subject Re: Nested Loops vs. Hash Joins or Merge Joins
Msg-id 20060511225136.GY99570@pervasive.com
In response to Nested Loops vs. Hash Joins or Merge Joins  (Ketema Harris <ketema@gmail.com>)
List pgsql-performance
On Thu, May 11, 2006 at 08:57:48AM -0400, Ketema Harris wrote:
> Nested Loops on:
> Nested Loop  (cost=3.33..11.37 rows=1 width=268) (actual time=2.166..2.982
>
> Nested Loops off:
> Hash Join  (cost=8.27..11.78 rows=1 width=268) (actual time=1.701..1.765
>
> With nested loops enabled does it choose to use them because it sees the
> estimated start up cost with loops as less?  Does it not know that the total
> query would be faster with the Hash Joins?  This query is in development

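Yes, it does know; re-read the output. The nested loop's estimated total
cost (11.37) is lower than the hash join's (11.78), so the planner is
choosing on total cost, not startup cost.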

I believe the cases where the planner will look at startup cost rather
than total cost are pretty limited: when LIMIT is used and, I think,
sometimes when a CURSOR is used.
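
To make the comparison concrete, here is a minimal Python sketch (not part
of the original message) that pulls the two numbers out of each quoted plan
line. In EXPLAIN output, cost=A..B means estimated startup cost A and
estimated total cost B. The helper names parse_cost and cheapest_by_total
are made up for illustration, and picking the lowest total is a
simplification of what the planner really does across all candidate paths.

```python
import re

# Top plan lines as quoted earlier in this thread (actual-time part omitted).
PLANS = {
    "nested loops on": "Nested Loop  (cost=3.33..11.37 rows=1 width=268)",
    "nested loops off": "Hash Join  (cost=8.27..11.78 rows=1 width=268)",
}

# cost=<startup>..<total>, both plain decimal numbers.
COST_RE = re.compile(r"cost=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?)")


def parse_cost(plan_line):
    """Return (startup_cost, total_cost) from one EXPLAIN plan line."""
    match = COST_RE.search(plan_line)
    if match is None:
        raise ValueError("no cost=... field in: " + plan_line)
    return float(match.group(1)), float(match.group(2))


def cheapest_by_total(plans):
    """Return the label of the plan with the lowest estimated total cost."""
    return min(plans, key=lambda label: parse_cost(plans[label])[1])


if __name__ == "__main__":
    for label, line in PLANS.items():
        startup, total = parse_cost(line)
        print(f"{label}: startup={startup} total={total}")
    print("lowest estimated total cost:", cheapest_by_total(PLANS))
```

Note that these are estimates only. The actual times in the quoted output
can still favor the other plan, which is what happened here.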

> Statistics collecting and auto vacuum is enabled btw.  I have an erd diagram
> showing the table structures if anyone is interested in looking at it, just
> let me know.

Note that it's not terribly uncommon for the default stats target to be
woefully inadequate for large sets of data, not that 100 rows a day is
large. But it probably wouldn't hurt to bump the default stats target
up to 30 or 50 anyway.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
