Thread: strange row number estimates in pg9.1rc1
Hello hackers,

I'm seeing something weird that looks like a bug in 9.1rc1 after the
upgrade 8.4->9.0->9.1 done using pg_upgrade. I have a set of *static*
tables for which "explain select *" gives row number estimates that are
an order of magnitude lower than the actual number of rows in the table
(despite a VACUUM ANALYZE executed on the table immediately before). See:

wsdb=> vacuum verbose analyze ukidssdr7.lassource;
INFO:  vacuuming "ukidssdr7.lassource"
INFO:  index "ukidssdr7lassource_q3c_idx" now contains 58060655 row versions in 143515 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.55s/0.19u sec elapsed 3.23 sec.
.........
INFO:  "lassource": found 0 removable, 0 nonremovable row versions in 0 out of 6451184 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 2.66s/0.94u sec elapsed 17.92 sec.
INFO:  analyzing "ukidssdr7.lassource"
INFO:  "lassource": scanned 30000 of 6451184 pages, containing 270000 live rows and 0 dead rows; 30000 rows in sample, 806239 estimated total rows

wsdb=> explain select * from ukidssdr7.lassource ;
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on lassource  (cost=0.00..6459246.39 rows=806239 width=766)

wsdb=> select count(*) from ukidssdr7.lassource ;
  count
----------
 58060655
(1 row)

All the columns in that table have fixed-width types (e.g. real, int,
bigint; no varchars or texts), so I don't see why the row number
estimate should be so far off. I also checked that the size of the
relation is almost exactly equal to width * count(*) = 766 * 58060655,
so there is no empty space anywhere in the relation (as it should be,
since there was no write activity on the table at all).
And I noticed that at least for several tables with hundreds of
millions of rows, "explain select *" shows a ridiculously small number
of expected rows:

wsdb=> explain select * from sdssdr7.phototag ;
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on phototag  (cost=0.00..24408626.00 rows=720000 width=288)

I guess it may be important that I upgraded the cluster from 8.4 to
9.0 and then to 9.1 using pg_upgrade. VACUUM ANALYZE has been run on
the cluster after the upgrades.

Am I missing something, or is it a bug? It looks to me like some
arithmetic error in the computation of the number of rows in the
tables. At least when I was using PG 8.4 on the same data, I was used
to running "explain select *" to get the number of rows in a table
instead of count(*) (my tables are very large); now there seems to be
a huge discrepancy between the numbers.

Thanks,
	Sergey

*******************************************************************
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge/Sternberg Astronomical Institute
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@sai.msu.ru
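For what it's worth, the ANALYZE output quoted above is internally consistent: scaling the sampled tuple density up to the whole table recovers the true row count almost exactly, so the sampling itself is fine and only the reported total is off. A quick check (a sketch in Python; all figures are taken from the session above):

```python
# Figures from the VACUUM VERBOSE ANALYZE output above.
relpages = 6451184        # total pages in ukidssdr7.lassource
sample_pages = 30000      # pages scanned by ANALYZE
sample_rows = 270000      # live rows found in the sample
actual_count = 58060655   # result of select count(*)

density = sample_rows / sample_pages   # observed rows per page
scaled = density * relpages            # extrapolate to the whole table

print(density)   # 9.0
print(scaled)    # 58060656.0 -- within one row of count(*)
```

So the 806239 estimate cannot come from the sample alone; something else is dragging it down.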
"Sergey E. Koposov" <math@sai.msu.ru> writes:
> I'm seeing something weird which looks like a bug in 9.1rc1 after the
> upgrade 8.4->9.0->9.1 done using pg_upgrade.

Hm, I wonder what pg_upgrade left relpages/reltuples set to ...

> INFO: "lassource": found 0 removable, 0 nonremovable row versions in 0
> out of 6451184 pages

VACUUM is deciding that the table is all-visible and it need not scan
any of it ...

> INFO: "lassource": scanned 30000 of 6451184 pages, containing 270000 live
> rows and 0 dead rows; 30000 rows in sample, 806239 estimated total rows

... and then ANALYZE is scanning only 30000 of 6451184 pages, so even
though it knows the correct tuple density in those pages, it's not
licensed to change the reltuples total very much. If reltuples is way
off base to start with, then it will take a lot of ANALYZE cycles to
converge on a saner value.

However, this doesn't leave us much closer to understanding why it's
off base to start with.

			regards, tom lane
I wrote:
> "Sergey E. Koposov" <math@sai.msu.ru> writes:
>> I'm seeing something weird which looks like a bug in 9.1rc1 after the
>> upgrade 8.4->9.0->9.1 done using pg_upgrade.

> Hm, I wonder what pg_upgrade left relpages/reltuples set to ...

Sure enough, that's the problem. pg_upgrade leaves relpages/reltuples
set to zero, but it also imports the visibility map pages from the old
cluster. If the old visibility map shows the table as all-visible,
then this happens when you try to VACUUM ANALYZE the table:

1. VACUUM doesn't process any pages, so it has no tuple density
estimate. It leaves reltuples set to zero, but it does set relpages.

2. ANALYZE scans some part of the table. It gets a tuple density
estimate for those pages ... but if that's only a small fraction of
the table, it believes the zero estimate of tuple density elsewhere.
So you get only a small update of reltuples.

(The above behavior is new as of commit
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8, BTW.)

Basically, step 1 is buggy here: if we aren't making an update to
reltuples, we shouldn't set relpages either. Setting it nonzero
changes the implied tuple density from "unknown" to "known zero",
which is wrong.

I'll go fix that, but I think it might be a good idea for pg_upgrade
to think about preserving the relpages/reltuples columns ...

			regards, tom lane

PS: right now, you cannot reproduce this in a 9.0 -> HEAD upgrade,
because of this patch:

commit 00a7c9014a8fbb7388a807daeba3e0a85b49a747
Author: Bruce Momjian <bruce@momjian.us>
Date:   Fri Aug 19 11:20:30 2011 -0400

    In pg_upgrade, don't copy visibility map files from clusters that
    did not have crash-safe visibility maps to clusters that expect
    crash-safety.

    Request from Robert Haas.

I did reproduce it in a 9.0->9.1 test.
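The slow convergence described above can be seen with a toy model of the incremental reltuples update: blend the old density with the sampled density, weighted by the fraction of the table actually scanned. This is a simplified Python sketch of the 9.1-era logic, not the server code itself; the function and variable names are mine, and the figures come from Sergey's log:

```python
import math

RELPAGES = 6451184       # relpages set by the (buggy) no-op VACUUM
SAMPLE_PAGES = 30000     # pages ANALYZE scans per pass
SAMPLE_ROWS = 270000     # live rows ANALYZE finds per pass (9/page)

def estimate_reltuples(old_reltuples, old_relpages,
                       total_pages, scanned_pages, scanned_tuples):
    """Simplified model of the incremental reltuples estimate."""
    if scanned_pages >= total_pages:
        return scanned_tuples            # scanned everything: exact
    if old_relpages == 0:
        # No prior density at all: just scale the sample up.
        return math.floor(scanned_tuples / scanned_pages * total_pages + 0.5)
    old_density = old_reltuples / old_relpages
    new_density = scanned_tuples / scanned_pages
    multiplier = scanned_pages / total_pages
    # Nudge the old density toward the sampled one, in proportion
    # to how much of the table this pass actually saw.
    updated_density = old_density + (new_density - old_density) * multiplier
    return math.floor(updated_density * total_pages + 0.5)

# pg_upgrade left reltuples = 0, and VACUUM then set relpages without
# fixing reltuples -- so the starting density is "known zero".
reltuples = 0
history = []
for _ in range(3):
    reltuples = estimate_reltuples(reltuples, RELPAGES, RELPAGES,
                                   SAMPLE_PAGES, SAMPLE_ROWS)
    history.append(reltuples)

print(history)   # [270000, 538744, 806239]
```

Each pass moves the estimate by less than half a percent of the true 58M count, and the third pass lands on the 806239 from Sergey's log, consistent with VACUUM ANALYZE having been run a few times since the upgrade.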
On Tue, 30 Aug 2011, Tom Lane wrote:
> Sure enough, that's the problem. pg_upgrade leaves relpages/reltuples
> set to zero, but it also imports the visibility map pages from the old
> cluster. If the old visibility map shows the table as all-visible,
> then this happens when you try to VACUUM ANALYZE the table:
>
> 1. VACUUM doesn't process any pages, so it has no tuple density
> estimate. It leaves reltuples set to zero, but it does set relpages.
>
> 2. ANALYZE scans some part of the table. It gets a tuple density
> estimate for those pages ... but if that's only a small fraction of
> the table, it believes the zero estimate of tuple density elsewhere.
> So you get only a small update of reltuples.

Thanks for figuring this out. I wonder what the best way to proceed is
for already-migrated databases -- running ANALYZE repeatedly may not be
the best option for very large clusters with large tables...

	S

*******************************************************************
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge/Sternberg Astronomical Institute
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@sai.msu.ru
"Sergey E. Koposov" <math@sai.msu.ru> writes:
> I wonder what the best way to proceed is for already-migrated
> databases -- running ANALYZE repeatedly may not be the best option for
> very large clusters with large tables...

You can just manually update pg_class.reltuples to a more reasonable
value.

			regards, tom lane
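As a concrete sketch of that fix (the table name and figures are taken from this thread; the density-based value is my illustration, and a direct pg_class update like this requires superuser and is only sensible here because the table is static):

```python
# Derive a sane reltuples from the density ANALYZE actually observed,
# then build the one-off catalog fix suggested above.
relpages = 6451184               # relpages after VACUUM
rows_per_page = 270000 / 30000   # density from the ANALYZE sample
reltuples = int(relpages * rows_per_page)

fix = ("UPDATE pg_class SET reltuples = %d "
       "WHERE oid = 'ukidssdr7.lassource'::regclass;" % reltuples)
print(fix)
# UPDATE pg_class SET reltuples = 58060656 WHERE oid = 'ukidssdr7.lassource'::regclass;
```

The computed 58060656 is within one row of the true count(*) of 58060655; a subsequent ANALYZE then only has to make small corrections instead of crawling up from zero.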
Tom Lane wrote:
> I wrote:
> > "Sergey E. Koposov" <math@sai.msu.ru> writes:
> >> I'm seeing something weird which looks like a bug in 9.1rc1 after the
> >> upgrade 8.4->9.0->9.1 done using pg_upgrade.
>
> > Hm, I wonder what pg_upgrade left relpages/reltuples set to ...
>
> Sure enough, that's the problem. pg_upgrade leaves relpages/reltuples
> set to zero, but it also imports the visibility map pages from the old
> cluster. If the old visibility map shows the table as all-visible,
> then this happens when you try to VACUUM ANALYZE the table:
>
> 1. VACUUM doesn't process any pages, so it has no tuple density
> estimate. It leaves reltuples set to zero, but it does set relpages.
>
> 2. ANALYZE scans some part of the table. It gets a tuple density
> estimate for those pages ... but if that's only a small fraction of
> the table, it believes the zero estimate of tuple density elsewhere.
> So you get only a small update of reltuples.
>
> (The above behavior is new as of commit
> b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8, BTW.)
>
> Basically, step 1 is buggy here: if we aren't making an update to
> reltuples, we shouldn't set relpages either. Setting it nonzero
> changes the implied tuple density from "unknown" to "known zero",
> which is wrong.
>
> I'll go fix that, but I think it might be a good idea for pg_upgrade
> to think about preserving the relpages/reltuples columns ...

pg_upgrade currently only restores some oids and frozenxids. We would
need to modify pg_dump --binary-upgrade mode to restore those values ---
it isn't hard to do.

> PS: right now, you cannot reproduce this in a 9.0 -> HEAD upgrade,
> because of this patch:
>
> commit 00a7c9014a8fbb7388a807daeba3e0a85b49a747
> Author: Bruce Momjian <bruce@momjian.us>
> Date:   Fri Aug 19 11:20:30 2011 -0400
>
>     In pg_upgrade, don't copy visibility map files from clusters that
>     did not have crash-safe visibility maps to clusters that expect
>     crash-safety.
>
>     Request from Robert Haas.
> I did reproduce it in a 9.0->9.1 test.

Right, that is expected.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +