Re: why hash on the primary key? - Mailing list pgsql-general

From Robert Haas
Subject Re: why hash on the primary key?
Date
Msg-id 603c8f070811290842r214628a8h5f1503b44a07c037@mail.gmail.com
Whole thread Raw
In response to Re: why hash on the primary key?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
>>> Well, that just says your cost parameters need a bit of adjustment
>>> if you'd like the planner to get the crossover point exactly right.
>
>> Any sense of which ones might be worth fiddling with?
>
> random_page_cost, effective_cache_size, maybe the cpu_xxx parameters.

I fiddled with this some more on a more complex query with more than
20 joins.  It appears that the culprit is from_collapse_limit.  If I
raise from_collapse_limit, most of the hash joins turn into Nested
Loops or Nested Loop Left Joins with Index Scans.  The estimated cost
is about 20% of the cost of the original plan, but the planning takes
so much longer that the actual time is 60% larger.  This is really the
pits.

I had hoped that the work Simon Riggs was doing on join removal would
hit 8.4, but that doesn't seem likely at this point.  The problem is
that the application is a web app where the user can select which
columns they want to see.  Most of the time only ~6-8 columns of 40+
that are available are selected, and the joins needed to generate the
unselected columns can be tossed (which would hopefully both avoid
unnecessary join steps and also speed up planning the remaining
joins).

...Robert

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: why hash on the primary key?
Next
From: "Andrus"
Date:
Subject: db backup script in gentoo