Re: BUG #15005: ANALYZE can make pg_class.reltuples inaccurate. - Mailing list pgsql-bugs

From David Gould
Subject Re: BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Date
Msg-id 20180111041757.7858c2ae@engels
Whole thread Raw
In response to BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
> Bug reference:      15005
> Logged by:          David Gould
> Email address:      daveg@sonic.net

> ANALYZE can make pg_class.reltuples wildly inaccurate compared to the actual
> row counts for tables that are larger than the default_statistics_target.

I've tracked down how this happens and created a reproduction script and a
patch. Attached:

- vac_estimate_reltuples.patch     Patch against master
- README.txt                       Instructions for testing
- reltuples_analyze_bug.sql        Reproduction script
- analyze_counts.awk               Helper for viewing results of test
- test_standard.txt                Test output for unpatched postgresql 10.1
- test_patched.txt                 Test output with patch

Analysis:
---------

Analyze and vacuum calculate the new value for pg_class.reltuples in
vacuum.c:vac_estimate_reltuples():

    old_density = old_rel_tuples / old_rel_pages;
    new_density = scanned_tuples / scanned_pages;
    multiplier = (double) scanned_pages / (double) total_pages;
    updated_density = old_density + (new_density - old_density) * multiplier;
    return floor(updated_density * total_pages + 0.5);

The comments talk about the difference between VACUUM and ANALYZE and explain
that VACUUM probably only scanned changed pages so the density of the scanned
pages is not representative of the rest of the unchanged table. Hence the new
overall density of the table should be adjusted proportionaly to the scanned
pages vs total pages. Which makes sense. However despite the comment noteing
that ANALYZE and VACUUM are different, the code actually does the same
calculation for both.

The problem is that it dilutes the impact of ANALYZE on reltuples for large
tables:

- For a table of 3000000 pages an analyze can only change the reltuples
  value by 1%.
- When combined with changes in relpages due to bloat the new computed
  reltuples can end up far from reality.


Reproducing the reltuples analyze estimate bug.
-----------------------------------------------

The script "reltuples_analyze_bug.sql" creates a table that is large
compared to the analyze sample size and then repeatedly updates about
10% of it followed by an analyze each iteration. The bug is that the
calculation analyze uses to update pg_class.reltuples will tend to
increase each time even though the actual rowcount does not change.

To run:

Given a postgresql 10.x server with >= 1GB of shared buffers:

  createdb test
  psql --no-psqlrc -f reltuples_analyze_bug.sql test > test_standard.out 2>&1
  awk -f analyze_counts.awk test_standard.out

To verify the fix, restart postgres with a patched binary and repeat
the above.

Here are the results with an unpatched server:

After 10 interations of:
  update 10% of rows;
  analyze

reltuples has almost doubled.

               / estimated rows  /    /   pages   /   /sampled rows/
   relname       current  proposed    total scanned    live    dead
reltuples_test  10000001  10000055   153847   3000   195000       0
reltuples_test  10981367   9951346   169231   3000   176410   18590
reltuples_test  11948112  10039979   184615   3000   163150   31850
reltuples_test  12900718  10070666   200000   3000   151060   43940
reltuples_test  13835185   9739305   215384   3000   135655   59345
reltuples_test  14758916   9864947   230768   3000   128245   66755
reltuples_test  15674572  10138631   246153   3000   123565   71435
reltuples_test  16576847   9910944   261537   3000   113685   81315
reltuples_test  17470388  10019961   276922   3000   108550   86450
reltuples_test  18356707  10234607   292306   3000   105040   89960
reltuples_test  19228409   9639927   307690   3000    93990  101010

-- 
David Gould                                   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.

Attachment

pgsql-bugs by date:

Previous
From: David Gould
Date:
Subject: Re: BUG #15003: pg_terminate_backend does not work
Next
From: PG Bug reporting form
Date:
Subject: BUG #15006: "make check" error if current user is "user"