Thread: pg_stats.avg_width differs by a factor of 4 on different machines

pg_stats.avg_width differs by a factor of 4 on different machines

From
Craig de Stigter
Date:
Hi list

We are using the PostgreSQL pg_stats view to estimate file sizes for some geodata exports. However, the following query gives us totally different results on different servers:

select avg_width from pg_stats where tablename='some_geodata' and attname = 'GEOMETRY';

PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Ubuntu 4.3.2-1ubuntu11) 4.3.2
81803

PostgreSQL 8.2.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
20450

Both tables have had VACUUM FULL ANALYZE run on them and have identical data. Note that 81803 is almost exactly 4x20450, though I don't know what significance this has. x64/i386 makes no difference.

I couldn't find anything in the 8.3 release notes that looked relevant. Any help appreciated.

Regards
Craig de Stigter

--
Koordinates Ltd
PO Box 1604, Shortland St, Auckland, New Zealand
Phone +64-9-966 0433 Fax +64-9-969 0045
Web http://www.koordinates.com
Craig de Stigter <craig.destigter@koordinates.com> writes:
> We are using the PostgreSQL pg_stats view to estimate file sizes for some
> geodata exports. However, the following query gives us totally different
> results on different servers:

> select avg_width from pg_stats where tablename='some_geodata' and attname =
> 'GEOMETRY';

I'm afraid that query is pretty much completely useless for what you
want to do.  What it should be giving you is the average width of the
field values on-disk, which is to say after compression and toasting.
It would probably be all right for narrow columns but it's likely to be
a huge underestimate of the external textual size for wide field values.

Having said that, though, these numbers make no sense to me:

> PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Ubuntu
> 4.3.2-1ubuntu11) 4.3.2
>> 81803

> PostgreSQL 8.2.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
> (Ubuntu 4.1.2-0ubuntu4)
>> 20450

It should be impossible to get a value larger than the block size, or
even more than about a quarter of the block size because that's where
TOAST will start doing its thing.  Are you running modified source code?

            regards, tom lane

Re: pg_stats.avg_width differs by a factor of 4 on different machines

From
Craig de Stigter
Date:
This query was giving us good-enough results on our old system. The
estimates don't have to be absolutely accurate, just ballpark figures.
Also we are estimating the size of zipped shapefiles, not textual
geometries.

Our tests show that such sizes are quite accurate for medium/large
datasets when we multiply the avg_width by a precalculated constant
factor.

Does anyone have any idea why these numbers would be 4 times as big in
Postgres 8.3.7 ?

If not I may end up with a dirty hack along the lines of:
if POSTGRES_VERSION >= '8.3':
    size_estimate /= 4.0

;)

On 5/29/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Craig de Stigter <craig.destigter@koordinates.com> writes:
>> We are using the PostgreSQL pg_stats view to estimate file sizes for some
>> geodata exports. However, the following query gives us totally different
>> results on different servers:
>
>> select avg_width from pg_stats where tablename='some_geodata' and attname
>> =
>> 'GEOMETRY';
>
> I'm afraid that query is pretty much completely useless for what you
> want to do.  What it should be giving you is the average width of the
> field values on-disk, which is to say after compression and toasting.
> It would probably be all right for narrow columns but it's likely to be
> a huge underestimate of the external textual size for wide field values.
>
> Having said that, though, these numbers make no sense to me:
>
>> PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real
>> (Ubuntu
>> 4.3.2-1ubuntu11) 4.3.2
>>> 81803
>
>> PostgreSQL 8.2.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
>> (Ubuntu 4.1.2-0ubuntu4)
>>> 20450
>
> It should be impossible to get a value larger than the block size, or
> even more than about a quarter of the block size because that's where
> TOAST will start doing its thing.  Are you running modified source code?
>
>             regards, tom lane
>


--
Koordinates Ltd
PO Box 1604, Shortland St, Auckland, New Zealand
Phone +64-9-966 0433 Fax +64-9-969 0045
Web http://www.koordinates.com

Re: pg_stats.avg_width differs by a factor of 4 on different machines

From
Tom Lane
Date:
Craig de Stigter <craig.destigter@koordinates.com> writes:
> Does anyone have any idea why these numbers would be 4 times as big in
> Postgres 8.3.7 ?

It still doesn't make any sense to me that you're getting values larger
than the BLCKSZ.  If you look into where that's coming from you might
get a clue what's up with the version difference.

            regards, tom lane