Thread: Can we use Statistics Import and Export feature to perforamance testing?
Can we use Statistics Import and Export feature to perforamance testing?
Hi,
I have a question about Statistics Import and Export.
* Background
I'm working for PGEcons[1], which is the PostgreSQL consortium in Japan.
Several companies participating in PGEcons have the following request for PostgreSQL.
They have two environments, production environment and staging environment.
Production environment has real customer's data and staging environment has dummy testing data.
When adding some application, they want to run the performance test on staging environment and
then apply to the production environment.
In the performance test, they want to use the same statistics as the production environment
to reduce the trouble in production environment.
* Question
By using Statistics Import and Export feature, is it possible to achieve the above request by following procedure?
(1) Export the statistics from production environment by using pg_dump --statistics-only.
(2) On the staging environment, set the autovacuum related parameters to prevent autovacuum from running.
(3) Import the statistics to staging environment by using the result of (1).
[1] PGECons is a non profit organization comprised of companies
in Japan to promote PostgreSQL (https://www.pgecons.org).
Regards,
Ryohei Takahashi
Re: Can we use Statistics Import and Export feature to perforamance testing?
On Tue, 8 Apr 2025 at 12:21, Ryohei Takahashi (Fujitsu) <r.takahashi_2@fujitsu.com> wrote: > By using Statistics Import and Export feature, is it possible to achieve the above request by following procedure? > (1) Export the statistics from production environment by using pg_dump --statistics-only. > (2) On the staging environment, set the autovacuum related parameters to prevent autovacuum from running. > (3) Import the statistics to staging environment by using the result of (1). You could certainly test the performance, but this method isn't guaranteed to give meaningful results just because the table stats match. One important thing to remember is that the planner also looks at the *actual size* of the relation and takes that into account when scaling the statistics (see table_block_relation_estimate_size() in tableam.c). If the table sizes don't match between the two servers then there's no guarantees the planner will produce the same plan. Also, there might be other subtleties regarding OIDs of indexes which are not guaranteed to be the same after dump/restore. Given some fuzzily close enough cost estimates (See add_path() and compare_path_costs_fuzzily()), it is possible a plan would switch to using another index if sorting the indexes by their OIDs didn't match on each server. The chances of that might be fairly small, but not zero. You'd also need to ensure the configs are the same in terms of GUCs that are used for costs. You could probably use get_relation_info_hook to overwrite the sizes and make sure the indexes are in the same order, etc. David
RE: Can we use Statistics Import and Export feature to perforamance testing?
Hi, Thank you for your reply. I understand that the access plans are not guaranteed to be the same. Can we add these notes to the pg_dump page in the PostgreSQL Documentation in order to prevent users from asking the same question? Regards, Ryohei Takahashi
Re: Can we use Statistics Import and Export feature to perforamance testing?
at the *actual size* of the relation and takes that into account when
scaling the statistics (see table_block_relation_estimate_size() in
tableam.c). If the table sizes don't match between the two servers
then there's no guarantees the planner will produce the same plan.
Re: Can we use Statistics Import and Export feature to perforamance testing?
* Question
By using Statistics Import and Export feature, is it possible to achieve the above request by following procedure?
(1) Export the statistics from production environment by using pg_dump --statistics-only.
(2) On the staging environment, set the autovacuum related parameters to prevent autovacuum from running.
(3) Import the statistics to staging environment by using the result of (1).
Re: Can we use Statistics Import and Export feature to perforamance testing?
On Sat, 12 Apr 2025 at 20:29, Corey Huinker <corey.huinker@gmail.com> wrote: >> >> at the *actual size* of the relation and takes that into account when >> scaling the statistics (see table_block_relation_estimate_size() in >> tableam.c). If the table sizes don't match between the two servers >> then there's no guarantees the planner will produce the same plan. > > Sorry that I didn't see this thread until now. I would like to note that table_block_relation_estimate_size() determinesthe actual size of the relation by asking pg_class, and the relevant values there are set by pg_restore_relation_stats(). Sorry, this isn't correct. I suspect you're probably misreading the code. On a fleeting glance, you might have seen the "relpages = (BlockNumber) rel->rd_rel->relpages;" line and come to this conclusion. A more careful study will reveal the truth. Check for "curpages = RelationGetNumberOfBlocks(rel);" and an unconditional "*pages = curpages;". You might be getting confused because the code does look at the pg_class fields, but that's only to estimate the tuple density. When pg_class has those estimates, they're used to calculate the estimated density by doing reltuples / relpages, but that average rows per page is then applied to the *actual* number of pages in the relation. This method allows the stats to be scaled as the table grows and that take effect without waiting for vacuum or analyze to update the pg_class fields. The planner checks the current size of the table every time it plans a query. That's very well understood and documented. See [1]. David [1] https://www.postgresql.org/docs/current/row-estimation-examples.html#ROW-ESTIMATION-EXAMPLES
Re: Can we use Statistics Import and Export feature to perforamance testing?
You might be getting confused because the code does look at the
pg_class fields, but that's only to estimate the tuple density. When
pg_class has those estimates, they're used to calculate the estimated
density by doing reltuples / relpages, but that average rows per page