Thread: strange row number estimates in pg9.1rc1
Hello hackers,

I'm seeing something weird that looks like a bug in 9.1rc1 after the
upgrade 8.4->9.0->9.1 done using pg_upgrade. I have a set of *static*
tables for which "explain select *" gives row number estimates that are
an order of magnitude lower than the actual number of rows in the table
(despite a VACUUM ANALYZE executed on the table immediately before). See:

wsdb=> vacuum verbose analyze ukidssdr7.lassource;
INFO:  vacuuming "ukidssdr7.lassource"
INFO:  index "ukidssdr7lassource_q3c_idx" now contains 58060655 row versions in 143515 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.55s/0.19u sec elapsed 3.23 sec.
.........
INFO:  "lassource": found 0 removable, 0 nonremovable row versions in 0 out of 6451184 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 2.66s/0.94u sec elapsed 17.92 sec.
INFO:  analyzing "ukidssdr7.lassource"
INFO:  "lassource": scanned 30000 of 6451184 pages, containing 270000 live rows and 0 dead rows; 30000 rows in sample, 806239 estimated total rows

wsdb=> explain select * from ukidssdr7.lassource ;
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on lassource  (cost=0.00..6459246.39 rows=806239 width=766)

wsdb=> select count(*) from ukidssdr7.lassource ;
  count
----------
 58060655
(1 row)

All the columns in that table have fixed-width types (e.g. real, int,
bigint; no varchars or texts), so I don't see why the row number
estimate should be so far off. I also checked that the size of the
relation is almost exactly equal to width * count(*) = 766 * 58060655,
so there is no empty space anywhere in the relation (as it should be,
since there was no write activity on the table at all).
And I noticed that at least for several tables with hundreds of
millions of rows, "explain select *" shows a ridiculously small number
of expected rows:

wsdb=> explain select * from sdssdr7.phototag ;
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on phototag  (cost=0.00..24408626.00 rows=720000 width=288)

I guess it may be important that I upgraded the cluster from 8.4 to
9.0 and then to 9.1 using pg_upgrade. VACUUM ANALYZE has been run on
the cluster after the upgrades.

Am I missing something, or is it a bug? It looks to me like some
arithmetic error in the computation of the number of rows in the
tables. At least when I was using PG 8.4 on the same data, I was used
to running "explain select *" to get the number of rows in a table
instead of count(*) (my tables are very large); now there seems to be
a huge discrepancy between the numbers.

Thanks,
	Sergey

*******************************************************************
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge/Sternberg Astronomical Institute
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@sai.msu.ru
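For what it's worth, the ANALYZE output quoted above is internally consistent: scaling the sampled tuple density up to the whole table recovers the true row count almost exactly, so the sampling itself is fine and only the reported total is off. A quick check (a sketch in Python; all figures are taken from the session above):

```python
# Figures from the VACUUM VERBOSE ANALYZE output above.
relpages = 6451184        # total pages in ukidssdr7.lassource
sample_pages = 30000      # pages scanned by ANALYZE
sample_rows = 270000      # live rows found in the sample
actual_count = 58060655   # result of select count(*)

density = sample_rows / sample_pages   # observed rows per page
scaled = density * relpages            # extrapolate to the whole table

print(density)   # 9.0
print(scaled)    # 58060656.0 -- within one row of count(*)
```

So the 806239 estimate cannot come from the sample alone; something else is dragging it down.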
"Sergey E. Koposov" <math@sai.msu.ru> writes:
> I'm seeing something weird which looks like a bug in 9.1rc1 after the
> upgrade 8.4->9.0->9.1 done using pg_upgrade.

Hm, I wonder what pg_upgrade left relpages/reltuples set to ...

> INFO: "lassource": found 0 removable, 0 nonremovable row versions in 0
> out of 6451184 pages

VACUUM is deciding that the table is all-visible and it need not scan
any of it ...

> INFO: "lassource": scanned 30000 of 6451184 pages, containing 270000 live
> rows and 0 dead rows; 30000 rows in sample, 806239 estimated total rows

... and then ANALYZE is scanning only 30000 of 6451184 pages, so even
though it knows the correct tuple density in those pages, it's not
licensed to change the reltuples total very much. If reltuples is way
off base to start with, then it will take a lot of ANALYZE cycles to
converge on a saner value.

However, this doesn't leave us much closer to understanding why it's
off base to start with.

			regards, tom lane
I wrote:
> "Sergey E. Koposov" <math@sai.msu.ru> writes:
>> I'm seeing something weird which looks like a bug in 9.1rc1 after the
>> upgrade 8.4->9.0->9.1 done using pg_upgrade.

> Hm, I wonder what pg_upgrade left relpages/reltuples set to ...

Sure enough, that's the problem. pg_upgrade leaves relpages/reltuples
set to zero, but it also imports the visibility map pages from the old
cluster. If the old visibility map shows the table as all-visible,
then this happens when you try to VACUUM ANALYZE the table:

1. VACUUM doesn't process any pages, so it has no tuple density
estimate. It leaves reltuples set to zero, but it does set relpages.

2. ANALYZE scans some part of the table. It gets a tuple density
estimate for those pages ... but if that's only a small fraction of
the table, it believes the zero estimate of tuple density elsewhere.
So you get only a small update of reltuples.

(The above behavior is new as of commit
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8, BTW.)

Basically, step 1 is buggy here: if we aren't making an update to
reltuples, we shouldn't set relpages either. Setting it nonzero
changes the implied tuple density from "unknown" to "known zero",
which is wrong.

I'll go fix that, but I think it might be a good idea for pg_upgrade
to think about preserving the relpages/reltuples columns ...

			regards, tom lane

PS: right now, you cannot reproduce this in a 9.0 -> HEAD upgrade,
because of this patch:

commit 00a7c9014a8fbb7388a807daeba3e0a85b49a747
Author: Bruce Momjian <bruce@momjian.us>
Date:   Fri Aug 19 11:20:30 2011 -0400

    In pg_upgrade, don't copy visibility map files from clusters that
    did not have crash-safe visibility maps to clusters that expect
    crash-safety.

    Request from Robert Haas.

I did reproduce it in a 9.0->9.1 test.
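The slow convergence described above can be seen with a toy model of the incremental reltuples update: blend the old density with the sampled density, weighted by the fraction of the table actually scanned. This is a simplified Python sketch of the 9.1-era logic, not the server code itself; the function and variable names are mine, and the figures come from Sergey's log:

```python
import math

RELPAGES = 6451184       # relpages set by the (buggy) no-op VACUUM
SAMPLE_PAGES = 30000     # pages ANALYZE scans per pass
SAMPLE_ROWS = 270000     # live rows ANALYZE finds per pass (9/page)

def estimate_reltuples(old_reltuples, old_relpages,
                       total_pages, scanned_pages, scanned_tuples):
    """Simplified model of the incremental reltuples estimate."""
    if scanned_pages >= total_pages:
        return scanned_tuples            # scanned everything: exact
    if old_relpages == 0:
        # No prior density at all: just scale the sample up.
        return math.floor(scanned_tuples / scanned_pages * total_pages + 0.5)
    old_density = old_reltuples / old_relpages
    new_density = scanned_tuples / scanned_pages
    multiplier = scanned_pages / total_pages
    # Nudge the old density toward the sampled one, in proportion
    # to how much of the table this pass actually saw.
    updated_density = old_density + (new_density - old_density) * multiplier
    return math.floor(updated_density * total_pages + 0.5)

# pg_upgrade left reltuples = 0, and VACUUM then set relpages without
# fixing reltuples -- so the starting density is "known zero".
reltuples = 0
history = []
for _ in range(3):
    reltuples = estimate_reltuples(reltuples, RELPAGES, RELPAGES,
                                   SAMPLE_PAGES, SAMPLE_ROWS)
    history.append(reltuples)

print(history)   # [270000, 538744, 806239]
```

Each pass moves the estimate by less than half a percent of the true 58M count, and the third pass lands on the 806239 from Sergey's log, consistent with VACUUM ANALYZE having been run a few times since the upgrade.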
On Tue, 30 Aug 2011, Tom Lane wrote:
> Sure enough, that's the problem. pg_upgrade leaves relpages/reltuples
> set to zero, but it also imports the visibility map pages from the old
> cluster. If the old visibility map shows the table as all-visible,
> then this happens when you try to VACUUM ANALYZE the table:
>
> 1. VACUUM doesn't process any pages, so it has no tuple density
> estimate. It leaves reltuples set to zero, but it does set relpages.
>
> 2. ANALYZE scans some part of the table. It gets a tuple density
> estimate for those pages ... but if that's only a small fraction of
> the table, it believes the zero estimate of tuple density elsewhere.
> So you get only a small update of reltuples.

Thanks for figuring this out. I wonder what the best way to proceed is
for already-migrated databases -- running ANALYZE repeatedly may not be
the best option for very large clusters with large tables...

	S

*******************************************************************
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge/Sternberg Astronomical Institute
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@sai.msu.ru
"Sergey E. Koposov" <math@sai.msu.ru> writes:
> I wonder what the best way to proceed is for already-migrated
> databases -- running ANALYZE repeatedly may not be the best option for
> very large clusters with large tables...

You can just manually update pg_class.reltuples to a more reasonable
value.

			regards, tom lane
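As a concrete sketch of that fix (the table name and figures are taken from this thread; the density-based value is my illustration, and a direct pg_class update like this requires superuser and is only sensible here because the table is static):

```python
# Derive a sane reltuples from the density ANALYZE actually observed,
# then build the one-off catalog fix suggested above.
relpages = 6451184               # relpages after VACUUM
rows_per_page = 270000 / 30000   # density from the ANALYZE sample
reltuples = int(relpages * rows_per_page)

fix = ("UPDATE pg_class SET reltuples = %d "
       "WHERE oid = 'ukidssdr7.lassource'::regclass;" % reltuples)
print(fix)
# UPDATE pg_class SET reltuples = 58060656 WHERE oid = 'ukidssdr7.lassource'::regclass;
```

The computed 58060656 is within one row of the true count(*) of 58060655; a subsequent ANALYZE then only has to make small corrections instead of crawling up from zero.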
Tom Lane wrote:
> I wrote:
> > "Sergey E. Koposov" <math@sai.msu.ru> writes:
> >> I'm seeing something weird which looks like a bug in 9.1rc1 after the
> >> upgrade 8.4->9.0->9.1 done using pg_upgrade.
>
> > Hm, I wonder what pg_upgrade left relpages/reltuples set to ...
>
> Sure enough, that's the problem. pg_upgrade leaves relpages/reltuples
> set to zero, but it also imports the visibility map pages from the old
> cluster. If the old visibility map shows the table as all-visible,
> then this happens when you try to VACUUM ANALYZE the table:
>
> 1. VACUUM doesn't process any pages, so it has no tuple density
> estimate. It leaves reltuples set to zero, but it does set relpages.
>
> 2. ANALYZE scans some part of the table. It gets a tuple density
> estimate for those pages ... but if that's only a small fraction of
> the table, it believes the zero estimate of tuple density elsewhere.
> So you get only a small update of reltuples.
>
> (The above behavior is new as of commit
> b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8, BTW.)
>
> Basically, step 1 is buggy here: if we aren't making an update to
> reltuples, we shouldn't set relpages either. Setting it nonzero
> changes the implied tuple density from "unknown" to "known zero",
> which is wrong.
>
> I'll go fix that, but I think it might be a good idea for pg_upgrade
> to think about preserving the relpages/reltuples columns ...

pg_upgrade currently only restores some oids and frozenxids. We would
need to modify pg_dump --binary-upgrade mode to restore those values ---
it isn't hard to do.

> PS: right now, you cannot reproduce this in a 9.0 -> HEAD upgrade,
> because of this patch:
>
> commit 00a7c9014a8fbb7388a807daeba3e0a85b49a747
> Author: Bruce Momjian <bruce@momjian.us>
> Date:   Fri Aug 19 11:20:30 2011 -0400
>
>     In pg_upgrade, don't copy visibility map files from clusters that
>     did not have crash-safe visibility maps to clusters that expect
>     crash-safety.
>
>     Request from Robert Haas.
> I did reproduce it in a 9.0->9.1 test.

Right, that is expected.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +