Thread: Statistics mismatch between n_live_tup and actual row count

From:
Andreas Brandl
Date:

Hi,

we're currently investigating a statistics issue on postgres. We have some tables which frequently show up with strange
values for n_live_tup. If you compare those values with a count on that particular table, there is a mismatch of a
factor of 10-30. This causes the planner to come up with very bad plans (we also have this issue on bigger tables like
the one below).

db=# SELECT relname, n_live_tup, last_analyze, last_autoanalyze FROM pg_stat_user_tables WHERE schemaname = 's' AND relname = 't';
 relname  | n_live_tup |         last_analyze          |       last_autoanalyze
----------+------------+-------------------------------+-------------------------------
 t        |       7252 | 2011-12-08 03:00:02.556088+01 | 2011-12-01 18:29:00.536321+01

db=# SELECT COUNT(*) FROM s.t;
 count
-------
   280

The strange thing is: if we run an ANALYZE on this table, the statistics are good. Ten minutes later they are bad
again. We suspect one of our processes, which might do unnecessary (i.e. blind) updates. Can this be the cause of the
statistics problem?

PostgreSQL 9.1.1 on x86_64-unknown-linux-gnu, compiled by gcc-4.6.real (Debian 4.6.2-4) 4.6.2, 64-bit

The migration to 9.1 did not fix this problem, as we already have this since 8.4.

I can provide the full postgres configuration, but I don't see anything we changed (compared to the default config)
which might be related here.

Any clue appreciated here!

Best regards
Andy

From:
Tom Lane
Date:

Andreas Brandl <> writes:
> we're currently investigating a statistics issue on postgres. We have some tables which frequently show up with
> strange values for n_live_tup. If you compare those values with a count on that particular table, there is a mismatch
> of factor 10-30. This causes the planner to come up with very bad plans (we also have this issue on bigger tables
> like the one below).

The planner doesn't use n_live_tup; the only thing that that's used for
is decisions about when to autovacuum/autoanalyze.  So you have two
problems here not one.

Can you provide a test case for the n_live_tup drift?  That is,
something that when done over and over causes n_live_tup to get further
and further from reality?

            regards, tom lane

From:
Andreas Brandl
Date:

Hi,

> Andreas Brandl <> writes:
> > we're currently investigating a statistics issue on postgres. We
> > have some tables which frequently show up with strange values for
> > n_live_tup. If you compare those values with a count on that
> > particular table, there is a mismatch of factor 10-30. This causes
> > the planner to come up with very bad plans (we also have this issue
> > on bigger table like the one below).
>
> The planner doesn't use n_live_tup; the only thing that that's used
> for
> is decisions about when to autovacuum/autoanalyze. So you have two
> problems here not one.

So, you're saying that having a mismatch between n_live_tup and the actual row count is not that much of a problem
(besides influencing when to auto-vacuum/analyze), right?

I'm just curious: where does the planner take the (approximate) row-count from?

> Can you provide a test case for the n_live_tup drift? That is,
> something that when done over and over causes n_live_tup to get
> further
> and further from reality?

I'll try to implement a minimal piece of code showing this, although I'm not sure if this will work.

Might there be a link between n_live_tup drifting and doing unnecessary (blind) updates, which do not change any
information of a row?

Thank you!

Best regards
Andreas

From:
Tom Lane
Date:

Andreas Brandl <> writes:
>> The planner doesn't use n_live_tup;

> I'm just curious: where does the planner take the (approximate) row-count from?

It uses the tuple density estimated by the last vacuum or analyze (viz,
reltuples/relpages) and multiplies that by the current relation size.
There are various reasons for not using n_live_tup, some historical and
some still pretty relevant.
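As a rough sketch of that arithmetic (a simplified illustration only; the function and variable names below are invented for this example, not the actual backend code):

```python
def planner_row_estimate(reltuples, relpages, current_pages):
    """Planner-style row estimate: tuple density measured at the last
    VACUUM/ANALYZE (reltuples / relpages), scaled by the table's current
    physical size in pages."""
    if relpages == 0:
        return float(reltuples)  # no density information recorded yet
    density = reltuples / relpages       # tuples per page at last analyze
    return density * current_pages       # rescaled to today's table size

# If the table has grown from 100 to 150 pages since the last ANALYZE,
# the row estimate grows proportionally even before stats are refreshed:
print(planner_row_estimate(10000, 100, 150))  # -> 15000.0
```

This is why a stale reltuples/relpages density, rather than a stale n_live_tup, is what would show up as bad row estimates in plans.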

> Might there be a link between n_live_tup drifting and doing unnecessary (blind) updates, which do not change any
> information of a row?

Possibly.  It's premature to speculate with no test case, but I'm
wondering if HOT updates confuse that arithmetic.  No-op updates
would follow the HOT path as long as there was room on the page...

            regards, tom lane

From:
Andreas Brandl
Date:

> Andreas Brandl <> writes:
> >> The planner doesn't use n_live_tup;
>
> > I'm just curious: where does the planner take the (approximate)
> > row-count from?
>
> It uses the tuple density estimated by the last vacuum or analyze
> (viz,
> reltuples/relpages) and multiplies that by the current relation size.
> There are various reasons for not using n_live_tup, some historical
> and
> some still pretty relevant.

Thanks. I checked pg_class.reltuples against the corresponding count(*) and it's the same drift here: reltuples
equals n_live_tup for our problematic tables and is way off the actual row count.

So I guess it's only one problem again (or it's likely that the problem of bad plans is correlated with the reltuples
estimation).

> > Might there be a link between n_live_tup drifting and doing
> > unnecessary (blind) updates, which do not change any information of
> > a row?
>
> Possibly. It's premature to speculate with no test case, but I'm
> wondering if HOT updates confuse that arithmetic. No-op updates
> would follow the HOT path as long as there was room on the page...

I'll try to come up with a test case, if possible.

Thank you!

Best regards, Andreas

From:
tim_wilson
Date:

I am also seeing a drift in the n_live_tup value compared to the actual row
count on the table, on PG 9.0.6.

It drifts after a vacuum. You can bring it back closer to the actual number
by running ANALYZE several times, and you can lock it back into the right
value with a VACUUM FULL; but then if you run a VACUUM, it again shows an
n_live_tup less than the actual number of rows in the table.

This is on a table seeing 80% HOT updates. It's a table that is heavily
updated.

The server is not running autovacuum.

Tim



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Statistics-mismatch-between-n-live-tup-and-actual-row-count-tp5059317p5735108.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


From:
tim_wilson
Date:

This drift gets more confusing.

My small table A (60K rows) is not being inserted to (except one or two
rows), but it is getting thousands of updates a minute. ANALYZE and VACUUM
on the table are running regularly. But sometimes, every time the vacuum
runs, the reltuples value jumps up, sometimes bloating to 4 times the actual
size of the table (I have seen 10 times).

The impact of this on the query plans is evident straight away: joins from
the small table A to the large table B (55 million rows) on B's primary key
start seq-scanning.

Some sort of threshold is being hit where the planner believes that the size
of A is big enough that given we are requesting all of A we might as well
scan all of B.

Also, even though the table has been clustered and even VACUUM FULLed, the
relpages value continues to grow.

Any answers?






From:
Adrian Klaver
Date:

On 12/06/2012 06:13 PM, tim_wilson wrote:
> This drift gets more confusing.
>
> My small table A (60K rows) is not being inserted to (except one or two
> rows) it is getting thousands of updates a minute. Analyze and vacuum on the
> table are running regularly. But sometimes, every time the vacuum runs, the
> reltuples value jumps up, sometimes bloating by 4 times the actual size of
> the table (have seen 10 times)
>
> The impact of this on the query plans is evident straight away. Joins from
> all  small A to large table B (55 Million rows) on the large B's primary key
> start seq scanning.
>
> Some sort of threshold is being hit where the planner believes that the size
> of A is big enough that given we are requesting all of A we might as well
> scan all of B.
>
> Also even though the table has been clustered and even vacuum full'ed the
> relpages value continues to grow.
>
> Any answers?

1) FYI an update is a delete/insert, so you are actually getting
thousands of inserts a minute.
2) As to why the size is growing: are there open transactions holding
old row versions around?




--
Adrian Klaver



From:
tim_wilson
Date:

<http://postgresql.1045698.n5.nabble.com/file/n5735593/pg_drift.png>



86% of the updates are HOT updates!

The difference between the first and second lines of the image above is that
366 updates happened, of which 299 were HOT, and a vacuum on the table was
run.
Look at the reltuples number change after the vacuum!

Open transactions... how does this impact those numbers?





From:
Adrian Klaver
Date:

On 12/06/2012 06:45 PM, tim_wilson wrote:
>    <http://postgresql.1045698.n5.nabble.com/file/n5735593/pg_drift.png>
>
>
>
> 86% of the updates are HOT updates!
>
> The difference between the first and second lines of the image above is that
> 366 updates happened, of which 299 were HOT, and a vacuum on the table was
> run.
> Look at the reltuples number change after the vacuum!
>
> Open transactions... how does this impact those numbers?
>

The very simple version:
Because of MVCC a particular row may have many tuple versions
associated with it. If those tuple versions are visible to open
transactions they may not be marked for removal until those transactions
are completed.

Others on this list are more capable than I of filling in the fine detail.


--
Adrian Klaver



From:
"ac@hsk.hk"
Date:

Hi,

I have questions about Linux Write cache sizing:

1) /proc/sys/vm/dirty_ratio : current value (default)  20
2) /proc/sys/vm/dirty_background_ratio: current value (default)  10

I am using Ubuntu 12.04 (8GB RAM) and PostgreSQL 9.2.1; what values of these kernel ratios should be set for better
PostgreSQL database performance?

Thanks
ac



From:
Tom Lane
Date:

tim_wilson <> writes:
> This drift gets more confusing.

In recent releases, reltuples (and relpages) are updated via a "moving
average" calculation that is meant to converge on the true value over
time.  The reason vacuum has to act that way is that it typically
doesn't scan the whole table anymore, but only the pages that have been
dirtied recently.  So it measures the tuple density in the pages it
scans, and updates the previous value more or less aggressively
depending on the percentage of the pages that it looked at.
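A minimal sketch of that moving-average idea (my own simplified reconstruction with invented names; the real logic lives in the backend's vac_estimate_reltuples and differs in detail):

```python
def moving_average_reltuples(old_tuples, old_pages,
                             total_pages, scanned_pages, scanned_tuples):
    """Blend the previous reltuples estimate with the tuple density measured
    on the pages this VACUUM actually scanned, weighting the new measurement
    by the fraction of the table it covered."""
    if scanned_pages >= total_pages:
        return scanned_tuples            # full scan: trust the measurement
    if old_pages == 0 or scanned_pages == 0:
        return scanned_tuples            # no prior estimate / nothing scanned
    old_density = old_tuples / old_pages
    new_density = scanned_tuples / scanned_pages
    frac = scanned_pages / total_pages   # fraction of the table we looked at
    density = old_density * (1 - frac) + new_density * frac
    return int(density * total_pages)

# If VACUUM repeatedly visits only a hot, densely packed 10% of the pages,
# the unrepresentative sample nudges the estimate upward on every cycle:
est = 100_000
for _ in range(5):
    est = moving_average_reltuples(est, 1000, 1000, 100, 20_000)
# est has drifted above the starting 100_000 toward the hot pages' density
```

Under this scheme a sample that is consistently denser (or sparser) than the table average keeps pulling the estimate in the same direction, which matches the drift described above.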

It's possible that there's some bug in that algorithm, but what seems
more likely is that the pages that are getting dirtied are
nonrepresentative of the whole table for some reason.  Or maybe both.

Can you put together a self-contained test case showing this behavior?

            regards, tom lane


From:
tim_wilson
Date:

Thanks for the reply Tom,

will try to construct a test case. So far I have been unable to replicate in
a simple test the sort of updates the table is seeing out in the wild, which
may bear on the issue of vacuum sampling unrepresentative pages.

Will try harder!

When you say recent releases, does that include 8.4?





From:
Ondrej Ivanič
Date:

Hi,

On 7 December 2012 14:17,  <> wrote:
> I have questions about Linux Write cache sizing:
>
> 1) /proc/sys/vm/dirty_ratio : current value (default)  20
> 2) /proc/sys/vm/dirty_background_ratio: current value (default)  10
>
> I am using Ubuntu 12.04 (8GB RAM) and PostgreSQL 9.2.1, what values of these kernel ratios
> should be set for better PostgreSQL database performance?

Read this thread:
http://archives.postgresql.org/pgsql-performance/2011-10/msg00076.php
Mainly,
- http://archives.postgresql.org/pgsql-performance/2011-10/msg00078.php
- and http://archives.postgresql.org/pgsql-performance/2011-10/msg00080.php

To sum up:
- with 8G RAM you should be fine with the defaults
- keep dirty_ratio lower than the size of your disk controller cache
- you can use dirty_bytes and dirty_background_bytes in recent kernels

--
Ondrej Ivanic
(http://www.linkedin.com/in/ondrejivanic)


From:
Tom Lane
Date:

tim_wilson <> writes:
> When you say recent releases, does that include 8.4?

8.4.9 and later --- anything including this commit:

Author: Tom Lane <>
Branch: master Release: REL9_1_BR [b4b6923e0] 2011-05-30 17:06:52 -0400
Branch: REL9_0_STABLE Release: REL9_0_5 [73bd34c81] 2011-05-30 17:07:07 -0400
Branch: REL8_4_STABLE Release: REL8_4_9 [b503da135] 2011-05-30 17:07:19 -0400

    Fix VACUUM so that it always updates pg_class.reltuples/relpages.

    When we added the ability for vacuum to skip heap pages by consulting the
    visibility map, we made it just not update the reltuples/relpages
    statistics if it skipped any pages.  But this could leave us with extremely
    out-of-date stats for a table that contains any unchanging areas,
    especially for TOAST tables which never get processed by ANALYZE.  In
    particular this could result in autovacuum making poor decisions about when
    to process the table, as in recent report from Florian Helmberger.  And in
    general it's a bad idea to not update the stats at all.  Instead, use the
    previous values of reltuples/relpages as an estimate of the tuple density
    in unvisited pages.  This approach results in a "moving average" estimate
    of reltuples, which should converge to the correct value over multiple
    VACUUM and ANALYZE cycles even when individual measurements aren't very
    good.

    This new method for updating reltuples is used by both VACUUM and ANALYZE,
    with the result that we no longer need the grotty interconnections that
    caused ANALYZE to not update the stats depending on what had happened
    in the parent VACUUM command.

    Also, fix the logic for skipping all-visible pages during VACUUM so that it
    looks ahead rather than behind to decide what to do, as per a suggestion
    from Greg Stark.  This eliminates useless scanning of all-visible pages at
    the start of the relation or just after a not-all-visible page.  In
    particular, the first few pages of the relation will not be invariably
    included in the scanned pages, which seems to help in not overweighting
    them in the reltuples estimate.

    Back-patch to 8.4, where the visibility map was introduced.

            regards, tom lane


From:
"ac@hsk.hk"
Date:

Hi, thank you very much!

On 7 Dec 2012, at 11:47 AM, Ondrej Ivanič wrote:

> Hi,
>
> On 7 December 2012 14:17,  <> wrote:
>> I have questions about Linux Write cache sizing:
>>
>> 1) /proc/sys/vm/dirty_ratio : current value (default)  20
>> 2) /proc/sys/vm/dirty_background_ratio: current value (default)  10
>>
>> I am using Ubuntu 12.04 (8GB RAM) and PostgreSQL 9.2.1, what values of these kernel ratios
>> should be set for better PostgreSQL database performance?
>
> Read this thread:
> http://archives.postgresql.org/pgsql-performance/2011-10/msg00076.php
> Mainly,
> - http://archives.postgresql.org/pgsql-performance/2011-10/msg00078.php
> - and http://archives.postgresql.org/pgsql-performance/2011-10/msg00080.php
>
> To sum up:
> - with 8G RAM you should be fine with the defaults
> - keep dirty_ratio lower than the size of your disk controller cache
> - you can use dirty_bytes and dirty_background_bytes in recent kernels
>
> --
> Ondrej Ivanic
> (http://www.linkedin.com/in/ondrejivanic)



From:
tim_wilson
Date:

It seems that we are currently running 8.4.3 on the server where we are
encountering the problem.

We will upgrade to 8.4.9 and then come back with a test case if we still
see the issue.

Thanks again for your help.


