Thread: Reproducing query plans in empty database: can I just copy stats andsettings?

I work with Protected Health Information so have restricted access to an important database.

I would like to look into index and query changes for that database. It occurs to me that if I was able to reproduce the statistics and cost settings for the database, then Explain would produce identical results to the production database.

Can do this? Any other advice?
On 7/29/19 12:07 PM, Guyren Howe wrote:
> I work with Protected Health Information so have restricted access to an 
> important database.
> 
> I would like to look into index and query changes for that database. It 
> occurs to me that if I was able to reproduce the statistics and cost 
> settings for the database, then Explain would produce identical results 
> to the production database.
> 
> Can do this? Any other advice?

So what defines restricted access?

If you can't see/use the data in the production database in your test 
database I'm not sure how copying the statistics/cost settings is going 
to help.



-- 
Adrian Klaver
adrian.klaver@aklaver.com



On Jul 29, 2019, at 12:25 , Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 7/29/19 12:07 PM, Guyren Howe wrote:
I work with Protected Health Information so have restricted access to an important database.
I would like to look into index and query changes for that database. It occurs to me that if I was able to reproduce the statistics and cost settings for the database, then Explain would produce identical results to the production database.
Can do this? Any other advice?

So what defines restricted access?

If you can't see/use the data in the production database in your test database I'm not sure how copying the statistics/cost settings is going to help.

ul[class*='mb-extra__public-links'], ul[class*='mb-note__public-links'], ul[class*='mb-task__public-links'] { display: none !important; }
Isn’t that the entirety of what the query plan is based on? I was hoping I could then run Explain as if I had the data.
Guyren Howe <guyren@gmail.com> writes:
> On Jul 29, 2019, at 12:25 , Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> If you can't see/use the data in the production database in your test database I'm not sure how copying the
statistics/costsettings is going to help. 

> Isn’t that the entirety of what the query plan is based on?

No; physical sizes of the tables and indexes also matter.  (The planner
scales some cost values by those numbers to account for the likelihood
that tables have grown since they were last analyzed.)  You could deal
with that by bloating them with dummy data, of course.

I think though that Adrian's point is a bit different: if you're not
allowed to see the original data, you should not be allowed to see
the statistics either.  The histogram values and most-common-values
lists represent a pretty sizable information leak for such cases.

            regards, tom lane



On 7/29/19 3:08 PM, Tom Lane wrote:
> Guyren Howe <guyren@gmail.com> writes:
>> On Jul 29, 2019, at 12:25 , Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>> If you can't see/use the data in the production database in your test database I'm not sure how copying the
statistics/costsettings is going to help.
 
> 
>> Isn’t that the entirety of what the query plan is based on?
> 
> No; physical sizes of the tables and indexes also matter.  (The planner
> scales some cost values by those numbers to account for the likelihood
> that tables have grown since they were last analyzed.)  You could deal
> with that by bloating them with dummy data, of course.
> 
> I think though that Adrian's point is a bit different: if you're not
> allowed to see the original data, you should not be allowed to see
> the statistics either.  The histogram values and most-common-values
> lists represent a pretty sizable information leak for such cases.

Yeah that and access to the actual data. Not sure how you could run a 
query that replicated the  production(even with cloned stats) if the 
table(s) are missing data. I could see throwing in dummy data, but it 
would need to correlate with the actual data and you don't know what 
that is. Then there is, what happens if you manually or 
autovacuum/analyze the test data?

> 
>             regards, tom lane
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com