[patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate. - Mailing list pgsql-hackers

From David Gould
Subject [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Date
Msg-id 20180117164916.3fdcf2e9@engels
Whole thread Raw
Responses Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate.
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
List pgsql-hackers
Please add the attached patch and this discussion to the open commit fest. The
original bugs thread is here: 20180111111254.1408.8342@wrigleys.postgresql.org.

Bug reference:      15005
Logged by:          David Gould
Email address:      daveg@sonic.net
PostgreSQL version: 10.1 and earlier
Operating system:   Linux
Description:        

ANALYZE can make pg_class.reltuples wildly inaccurate compared to the actual
row counts for tables that are larger than the default_statistics_target.

Example from one of a clients production instances:

# analyze verbose pg_attribute;
INFO:  analyzing "pg_catalog.pg_attribute"
INFO:  "pg_attribute": scanned 30000 of 24519424 pages, containing 6475 live rows and 83 dead rows; 6475 rows in
sample,800983035 estimated total rows.
 

This is a large complex database - pg_attribute actually has about five
million rows and needs about one hundred thouand pages. However it has
become extremely bloated and is taking 25 million pages (192GB!), about 250
times too much. This happened despite aggressive autovacuum settings and a
periodic bloat monitoring script. Since pg_class.reltuples was 800 million,
the bloat monitoring script did not detect that this table was bloated and
autovacuum did not think it needed vacuuming.
 
When reltuples is very large compared to the actual row count it causes a
number of problems:

- Bad input to the query planner.
- Prevents autovacuum from processing large bloated tables because
  autovacuum_scale_factor * reltuples is large enough the threshold is
  rarely reached.
- Decieves bloat checking tools that rely on the relationship of relpages
  to reltuples*average_row_size.

I've tracked down how this happens and created a reproduction script and a
patch. Attached:

- analyze_reltuples_bug-v1.patch   Patch against master
- README.txt                       Instructions for testing
- analyze_reltuples_bug.sql        Reproduction script
- analyze_counts.awk               Helper for viewing results of test
- test_standard.txt                Test output for unpatched postgresql 10.1
- test_patched.txt                 Test output with patch

The patch applies cleanly, with some offsets, to 9.4.15, 9.5.10, 9.6.6 and 10.1.

Note that this is not the same as the reltuples calculation bug discussed in the
thread at 16db4468-edfa-830a-f921-39a50498e77e@2ndquadrant.com. That one is
mainly concerned with vacuum, this with analyze. The two bugs do amplify each
other though.

Analysis:
---------

Analyze and vacuum calculate the new value for pg_class.reltuples in
vacuum.c:vac_estimate_reltuples():

    old_density = old_rel_tuples / old_rel_pages;
    new_density = scanned_tuples / scanned_pages;
    multiplier = (double) scanned_pages / (double) total_pages;
    updated_density = old_density + (new_density - old_density) * multiplier;
    return floor(updated_density * total_pages + 0.5);

The comments talk about the difference between VACUUM and ANALYZE and explain
that VACUUM probably only scanned changed pages so the density of the scanned
pages is not representative of the rest of the unchanged table. Hence the new
overall density of the table should be adjusted proportionaly to the scanned
pages vs total pages. Which makes sense. However despite the comment noteing
that ANALYZE and VACUUM are different, the code actually does the same
calculation for both.

The problem is that it dilutes the impact of ANALYZE on reltuples for large
tables:

- For a table of 3000000 pages an analyze can only change the reltuples
  value by 1%.
- When combined with changes in relpages due to bloat the new computed
  reltuples can end up far from reality.

Reproducing the reltuples analyze estimate bug.
-----------------------------------------------

The script "reltuples_analyze_bug.sql" creates a table that is large
compared to the analyze sample size and then repeatedly updates about
10% of it followed by an analyze each iteration. The bug is that the
calculation analyze uses to update pg_class.reltuples will tend to
increase each time even though the actual rowcount does not change.

To run:

Given a postgresql 10.x server with >= 1GB of shared buffers:

  createdb test
  psql --no-psqlrc -f analyze_reltuples_bug.sql test > test_standard.out 2>&1
  awk -f analyze_counts.awk test_standard.out

To verify the fix, restart postgres with a patched binary and repeat
the above.

Here are the results with an unpatched server:

After 10 interations of:
  update 10% of rows;
  analyze

reltuples has almost doubled.

               / estimated rows  /    /   pages   /   /sampled rows/
   relname       current  proposed    total scanned    live    dead
reltuples_test  10000001  10000055   153847   3000   195000       0
reltuples_test  10981367   9951346   169231   3000   176410   18590
reltuples_test  11948112  10039979   184615   3000   163150   31850
reltuples_test  12900718  10070666   200000   3000   151060   43940
reltuples_test  13835185   9739305   215384   3000   135655   59345
reltuples_test  14758916   9864947   230768   3000   128245   66755
reltuples_test  15674572  10138631   246153   3000   123565   71435
reltuples_test  16576847   9910944   261537   3000   113685   81315
reltuples_test  17470388  10019961   276922   3000   108550   86450
reltuples_test  18356707  10234607   292306   3000   105040   89960
reltuples_test  19228409   9639927   307690   3000    93990  101010

-dg

-- 
David Gould                                   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.

Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [PROPOSAL] Shared Ispell dictionaries
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)