Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a
Date
Msg-id 603c8f070912021924j3893ba16wcf9538eeb9038a15@mail.gmail.com
In response to Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Re: [COMMITTERS] pgsql: Rewrite GEQO's gimme_tree function so that it always finds a  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Dec 2, 2009 at 9:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> One other thing I'm noticing about the current implementation is that
>> it seems to spend an entirely excessive amount of brain power
>> considering the best order in which to execute cross-joins.  If I do
>> X, A JOIN B ON Pab JOIN C ON Pac JOIN D ON Pad JOIN E ON Pae, it looks
>> to me like join_search_one_level() will try joining X to each of A-E.
>> That seems fairly pointless; why would I ever want to join X to
>> anything other than {A B C D E}?
>
> Not sure that a lot of cross joins with no conditions is the case to
> design around.  Usually queries aren't that devoid of features of
> interest, and so different join paths are actually usefully different.

Not sure what you mean.  There's already a special-case code path for
cross joins, but I think it's probably considering a lot of silly
paths.  Is there a case where it makes sense to do cross joins at some
stage of the process other than last?
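For reference, the shape of the query being discussed might be written out like this (table and column names here are hypothetical, chosen only to make the join predicates concrete):

```sql
-- x shares no join predicate with the other relations, so it can only
-- ever be cross-joined; a..e are all linked through a's foreign keys.
SELECT *
FROM x,
     a
     JOIN b ON a.b_id = b.id   -- Pab
     JOIN c ON a.c_id = c.id   -- Pac
     JOIN d ON a.d_id = d.id   -- Pad
     JOIN e ON a.e_id = e.id;  -- Pae
```

Since x shares no predicate with a through e, intermediate joinrels like {x a} or {x b} are unconstrained cross products; the only plausibly useful plan joins {a b c d e} first and cross-joins x at the end.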

>> ...  We should maybe also
>> think about raising the default value for work_mem.  It's hard for me
>> to believe that the average Postgres user wants a sort that takes more
>> than 1MB of memory to spill to disk; there certainly are people who
>> probably want that, but I doubt there are very many.  I believe we've
>> been using that value for a decade, and memory size has increased a
>> lot in that time.
>
> Maybe.  I'll certainly grant that machines have more memory, but is the
> average Postgres installation using that to run bigger sorts, or to run
> more sorts (either more concurrent queries or more complex queries
> containing more sorts)?  We know that increasing work_mem too much
> can be counterproductive, and much sooner than one might think.

A further confounding factor is that work_mem also controls memory
usage for hash tables - whereas the original sort_mem did not - and at
least in my experience it's more common to have multiple hashes in a
query than multiple sorts.  It would be nice to have some data on
this rather than just hand-waving, but I'm not sure how to get it.
For default_statistics_target, *_collapse_threshold, and
geqo_threshold, we were able to construct worst-case queries and
benchmark them.  I have no idea how to do something comparable for
work_mem.
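One way to see the effect of the current 1MB default on a particular sort is to compare EXPLAIN ANALYZE output at different settings; the "Sort Method" line reports whether the sort stayed in memory or spilled (the table and column names below are made up):

```sql
-- With the longstanding 1MB default, a modest sort can spill to disk.
SET work_mem = '1MB';
EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY some_col;
-- "Sort Method: external merge  Disk: ..." means the sort spilled.

-- Raising the setting for just this session:
SET work_mem = '32MB';
EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY some_col;
-- "Sort Method: quicksort  Memory: ..." means it fit in memory.
```

That shows the per-query effect easily enough; what it doesn't capture is the aggregate cost across many concurrent backends, which is the hard part of choosing a default.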

...Robert

