Thread: pg_stats.avg_width differs by a factor of 4 on different machines
Hi list
We are using the PostgreSQL pg_stats view to estimate file sizes for some geodata exports. However, the following query gives us totally different results on different servers:
PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Ubuntu 4.3.2-1ubuntu11) 4.3.2
PostgreSQL 8.2.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
I couldn't find anything in the 8.3 release notes that looked relevant. Any help appreciated.
Regards
Craig de Stigter
--
Koordinates Ltd
PO Box 1604, Shortland St, Auckland, New Zealand
Phone +64-9-966 0433 Fax +64-9-969 0045
Web http://www.koordinates.com
We are using the PostgreSQL pg_stats view to estimate file sizes for some geodata exports. However, the following query gives us totally different results on different servers:
select avg_width from pg_stats where tablename='some_geodata' and attname = 'GEOMETRY';
PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Ubuntu 4.3.2-1ubuntu11) 4.3.2
81803
PostgreSQL 8.2.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
20450
Both tables have had VACUUM FULL ANALYZE run on them and have identical data. Note that 81803 is almost exactly 4x20450, though I don't know what significance this has. x64/i386 makes no difference.
I couldn't find anything in the 8.3 release notes that looked relevant. Any help appreciated.
Regards
Craig de Stigter
--
Koordinates Ltd
PO Box 1604, Shortland St, Auckland, New Zealand
Phone +64-9-966 0433 Fax +64-9-969 0045
Web http://www.koordinates.com
Craig de Stigter <craig.destigter@koordinates.com> writes: > We are using the PostgreSQL pg_stats view to estimate file sizes for some > geodata exports. However, the following query gives us totally different > results on different servers: > select avg_width from pg_stats where tablename='some_geodata' and attname = > 'GEOMETRY'; I'm afraid that query is pretty much completely useless for what you want to do. What it should be giving you is the average width of the field values on-disk, which is to say after compression and toasting. It would probably be all right for narrow columns but it's likely to be a huge underestimate of the external textual size for wide field values. Having said that, though, these numbers make no sense to me: > PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Ubuntu > 4.3.2-1ubuntu11) 4.3.2 >> 81803 > PostgreSQL 8.2.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 > (Ubuntu 4.1.2-0ubuntu4) >> 20450 It should be impossible to get a value larger than the block size, or even more than about a quarter of the block size because that's where TOAST will start doing its thing. Are you running modified source code? regards, tom lane
This query was giving us good-enough results on our old system. The estimates don't have to be absolutely accurate, just ballpark figures. Also we are estimating the size of zipped shapefiles, not textual geometries. Our tests show that such sizes are quite accurate for medium/large datasets when we multiply the avg_width by a precalculated constant factor. Does anyone have any idea why these numbers would be 4 times as big in Postgres 8.3.7 ? If not I may end up with a dirty hack along the lines of: if POSTGRES_VERSION >= '8.3': size_estimate /= 4.0 ;) On 5/29/09, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Craig de Stigter <craig.destigter@koordinates.com> writes: >> We are using the PostgreSQL pg_stats view to estimate file sizes for some >> geodata exports. However, the following query gives us totally different >> results on different servers: > >> select avg_width from pg_stats where tablename='some_geodata' and attname >> = >> 'GEOMETRY'; > > I'm afraid that query is pretty much completely useless for what you > want to do. What it should be giving you is the average width of the > field values on-disk, which is to say after compression and toasting. > It would probably be all right for narrow columns but it's likely to be > a huge underestimate of the external textual size for wide field values. > > Having said that, though, these numbers make no sense to me: > >> PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real >> (Ubuntu >> 4.3.2-1ubuntu11) 4.3.2 >>> 81803 > >> PostgreSQL 8.2.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 >> (Ubuntu 4.1.2-0ubuntu4) >>> 20450 > > It should be impossible to get a value larger than the block size, or > even more than about a quarter of the block size because that's where > TOAST will start doing its thing. Are you running modified source code? > > regards, tom lane > -- Koordinates Ltd PO Box 1604, Shortland St, Auckland, New Zealand Phone +64-9-966 0433 Fax +64-9-969 0045 Web http://www.koordinates.com
Craig de Stigter <craig.destigter@koordinates.com> writes: > Does anyone have any idea why these numbers would be 4 times as big in > Postgres 8.3.7 ? It still doesn't make any sense to me that you're getting values larger than the BLCKSZ. If you look into where that's coming from you might get a clue what's up with the version difference. regards, tom lane