Re: Planner reluctant to start from subquery - Mailing list pgsql-performance

From Tom Lane
Subject Re: Planner reluctant to start from subquery
Msg-id 4359.1138826175@sss.pgh.pa.us
In response to Re: Planner reluctant to start from subquery  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: Planner reluctant to start from subquery  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-performance
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm interested to poke at this ... are you in a position to provide a
>> test case?

> I can't supply the original data, since many of the tables have
> millions of rows, with some of the data (related to juvenile, paternity,
> sealed, and expunged cases) protected by law.  I could try to put
> together a self-contained example, but I'm not sure the best way to do
> that, since the table sizes and value distributions may be significant
> here.  Any thoughts on that?

I think that the only aspect of the data that really matters here is the
number of distinct values, which would affect decisions about whether
HashAggregate is appropriate or not.  And you could probably get the
same thing to happen with at most a few tens of thousands of rows.
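As a rough illustration of that suggestion, here is a minimal Python sketch that generates a synthetic table of a few tens of thousands of rows while controlling exactly how many distinct case numbers appear. The column names `case_no` and `county_no`, the `CASE000000` identifier format, and the helper names are all hypothetical. The CSV output could then be loaded with PostgreSQL's `COPY ... FROM` so the planner's estimates can be compared against the real schema.

```python
import csv
import io
import random


def synthetic_rows(n_rows, n_distinct, n_counties, seed=0):
    """Yield (case_no, county_no) rows containing exactly n_distinct case numbers.

    Column names and the identifier format are hypothetical placeholders.
    """
    if n_distinct > n_rows:
        raise ValueError("n_distinct cannot exceed n_rows")
    rng = random.Random(seed)
    # Cycle through the distinct values so each appears at least once,
    # then shuffle so duplicates are not physically clustered.
    case_nos = [f"CASE{i % n_distinct:06d}" for i in range(n_rows)]
    rng.shuffle(case_nos)
    for case_no in case_nos:
        yield case_no, rng.randint(1, n_counties)


def to_csv(rows):
    """Render rows as CSV text with a header line, suitable for COPY ... CSV HEADER."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["case_no", "county_no"])
    writer.writerows(rows)
    return buf.getvalue()
```

Varying `n_distinct` while holding `n_rows` fixed is one way to look for the point at which the planner switches to or away from HashAggregate.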

Also, all we need to worry about is the columns used in the WHERE/JOIN
conditions, which looks to be mostly case numbers, dates, and county
identification ... how much confidential info is there in that?  At
worst you could translate the case numbers to some randomly generated
identifiers.
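One way to do that translation is a consistent pseudonymization map. The minimal Python sketch below assumes this approach, and `make_pseudonymizer` is a hypothetical helper. Each original case number always maps to the same random identifier, and distinct inputs map to distinct outputs, so equality joins between tables and the number of distinct values are preserved while the real numbers are hidden.

```python
import random
import string


def make_pseudonymizer(seed=None, length=10):
    """Return a function mapping each original case number to a stable random ID.

    The same input always yields the same output, and different inputs yield
    different outputs, so joins and distinct-value counts survive the translation.
    """
    rng = random.Random(seed)
    mapping = {}
    used = set()
    alphabet = string.ascii_uppercase + string.digits

    def pseudonymize(case_no):
        if case_no not in mapping:
            # Draw random identifiers until we find one not already assigned.
            while True:
                candidate = "".join(rng.choice(alphabet) for _ in range(length))
                if candidate not in used:
                    break
            used.add(candidate)
            mapping[case_no] = candidate
        return mapping[case_no]

    return pseudonymize
```

The same function instance has to be applied to the case-number columns in every table so that join keys still match. Note that the random identifiers do not preserve the sort order of the originals, so if any of the problem queries use range conditions on case numbers, an order-preserving scheme would be needed instead.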

            regards, tom lane
