Thread: Database Bloat

Database Bloat

From: elliott
Hi,

I am using PostgreSQL 9.1 and loading very large tables (13 million
rows each).  The flat file size is only 25MB.  However, the equivalent
database table is 548MB.  This is without any indexes applied and with
autovacuum turned on.  I have read that the bloat can be around 5 times
greater for tables than flat files, so over 20 times seems quite excessive.

Any ideas on how to go about decreasing this bloat, or is this not
unexpected for such large tables?

Thanks


Re: Database Bloat

From: John R Pierce
On 08/20/12 10:53 AM, elliott wrote:
> Hi,
>
> I am using PostgreSQL 9.1 and loading very large tables ( 13 million
> rows each ).  The flat file size is only 25M.  However, the equivalent
> database table is 548MB.  This is without any indexes applied and auto
> vacuum  turned on.  I have read that the bloat can be around 5 times
> greater for tables than flat files so over 20 times seems quite
> excessive.
>
> Any ideas on how to go about decreasing this bloat or is this not
> unexpected for such large tables?


what do the fields of this table look like (output from \d tablename
would be useful)


--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast



Re: Database Bloat

From: "David Johnston"
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of elliott
> Sent: Monday, August 20, 2012 1:54 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Database Bloat
>
> Hi,
>
> I am using PostgreSQL 9.1 and loading very large tables ( 13 million rows
each
> ).  The flat file size is only 25M.  However, the equivalent database
table is
> 548MB.  This is without any indexes applied and auto vacuum  turned on.  I
> have read that the bloat can be around 5 times greater for tables than
flat
> files so over 20 times seems quite excessive.
>
> Any ideas on how to go about decreasing this bloat or is this not
unexpected
> for such large tables?
>
> Thanks
>

Kinda guessing here, but that 5x estimate has some assumptions built in.  I
am guessing that a table holding a lot of large plain-text data would
compress more than one with a mixture of numbers, varchars, dates, and other
less-compressible datatypes.

It would help to provide a general description of the structure of said
tables and how large individual fields (if bytea or text) tend to be.

I would think that filesystem parameters come into play as well, and you do
not specify the OS you are running on.

Do you have any idea which specific tables are "bloated"?  If there are only
a few main contributors what is different about them?
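One way to check (a sketch; it assumes you can read pg_class in the target database) is to list the largest relations, indexes and TOAST included:

```sql
-- Ten largest tables by total on-disk size (heap + indexes + TOAST)
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
```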

More questions than answers but something to ponder while you wait for more
knowledgeable persons to respond.

David J.




Re: Database Bloat

From: elliott
envdb=# \d astgtm2_n60e073;
Table "public.astgtm2_n60e073"
  Column |  Type   | Modifiers
--------+---------+-----------
  lat    | real    |
  lon    | real    |
  alt    | integer |
Indexes:
     "q3c_astgtm2_n60e073_idx" btree (q3c_ang2ipix(lon, lat)) CLUSTER


On 8/20/2012 2:10 PM, John R Pierce wrote:
> On 08/20/12 10:53 AM, elliott wrote:
>> Hi,
>>
>> I am using PostgreSQL 9.1 and loading very large tables ( 13 million
>> rows each ).  The flat file size is only 25M.  However, the
>> equivalent database table is 548MB.  This is without any indexes
>> applied and auto vacuum  turned on.  I have read that the bloat can
>> be around 5 times greater for tables than flat files so over 20 times
>> seems quite excessive.
>>
>> Any ideas on how to go about decreasing this bloat or is this not
>> unexpected for such large tables?
>
>
> what do the fields of this table look like (output from \d tablename
> would be useful)
>
>



Re: Database Bloat

From: John R Pierce
On 08/20/12 11:46 AM, elliott wrote:
> envdb=# \d astgtm2_n60e073;
> Table "public.astgtm2_n60e073"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  lat    | real    |
>  lon    | real    |
>  alt    | integer |
> Indexes:
>     "q3c_astgtm2_n60e073_idx" btree (q3c_ang2ipix(lon, lat)) CLUSTER

so, you DO have an index.    What type does this function
q3c_ang2ipix(real,real) return?  Googling suggests a bigint, which
means every 12-byte (real,real,int) row has a corresponding b-tree entry
of 8 bytes plus b-tree overhead.   I see you used CLUSTER on this; did you
run the CLUSTER after populating the table, and before checking the
pg_total_relation_size that you reported as 20x your data?





--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast



Re: Database Bloat

From: Merlin Moncure
On Mon, Aug 20, 2012 at 2:33 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 08/20/12 11:46 AM, elliott wrote:
>>
>> envdb=# \d astgtm2_n60e073;
>> Table "public.astgtm2_n60e073"
>>  Column |  Type   | Modifiers
>> --------+---------+-----------
>>  lat    | real    |
>>  lon    | real    |
>>  alt    | integer |
>> Indexes:
>>     "q3c_astgtm2_n60e073_idx" btree (q3c_ang2ipix(lon, lat)) CLUSTER
>
>
> so, you DO have an index.    what type does this function
> q3c_ang2ipix(real,real) return ?  googling it suggested a bigint, which
> means every 12 byte (real,real,int) row has a corresponding b-tree entry of
> 8 bytes plus b-tree stuff.   I see you used cluster on this, did you do the
> CLUSTER after populating the table, and before checking the
> pg_total_relation_size that you reported as 20X your data ?

Apparently it returns a bigint:
https://www.google.com/search?q=q3c_ang2ipix&rlz=1C1CHKZ_enUS437US437&sugexp=chrome,mod=17&sourceid=chrome&ie=UTF-8

Anyway, the per-tuple overhead is proportionally higher for very narrow
tables.  Your tuple is only 12 bytes, while each tuple typically carries 23
bytes of header overhead; see
http://www.postgresql.org/docs/9.1/interactive/storage-page-layout.html
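As a rough sanity check (a sketch assuming 9.1 defaults: 8192-byte pages with a 24-byte page header, the 23-byte tuple header padded to 24, and a 4-byte line pointer per tuple), the expected heap size for 13 million 12-byte rows can be computed directly:

```sql
-- per tuple: 24-byte padded header + 12 bytes of data + 4-byte line pointer = 40 bytes
-- tuples per page: floor((8192 - 24) / 40) = 204
SELECT pg_size_pretty(
    (ceil(13000000 / 204.0) * 8192)::bigint   -- pages needed * page size
);
-- roughly 500 MB for the heap alone, so 548MB total is in the expected range
```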

Also, measuring total database size is pretty silly because a number of
things are global and don't grow as your data grows.  To get the table and
index sizes, try:

SELECT pg_size_pretty(pg_relation_size('astgtm2_n60e073'));
SELECT pg_size_pretty(pg_relation_size('q3c_astgtm2_n60e073_idx'));

merlin


Re: Database Bloat

From: Jerry Sievers
elliott <elliott@cpi.com> writes:

> Hi,
>
> I am using PostgreSQL 9.1 and loading very large tables ( 13 million
> rows each ).  The flat file size is only 25M.  However, the equivalent
> database table is 548MB.  This is without any indexes applied and auto
> vacuum  turned on.  I have read that the bloat can be around 5 times
> greater for tables than flat files so over 20 times seems quite
> excessive.
>
> Any ideas on how to go about decreasing this bloat or is this not
> unexpected for such large tables?

Well, check whether the table has a low fillfactor setting:

\d+ footable

This hasn't come up yet in the remarks by others on this thread, but it is
worth verifying.
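A non-default fillfactor would also show up in pg_class.reloptions; a quick check (footable here is just a placeholder name):

```sql
-- A non-empty result such as {fillfactor=50} means pages are deliberately
-- left partially empty, inflating the on-disk size of the heap
SELECT reloptions
FROM pg_class
WHERE relname = 'footable';
```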

> Thanks
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 732.216.7255


Re: Database Bloat

From: Jeff Janes
On Mon, Aug 20, 2012 at 10:53 AM, elliott <elliott@cpi.com> wrote:
> Hi,
>
> I am using PostgreSQL 9.1 and loading very large tables ( 13 million rows
> each ).  The flat file size is only 25M.

That is only 2 bytes per row.  Is the size given for the flat file for
a compressed file?

Cheers,

Jeff


Re: Database Bloat

From: elliott
Yes, it is a tif file.  Uncompressed it is around 85M.

On 8/22/2012 1:20 PM, Jeff Janes wrote:
> On Mon, Aug 20, 2012 at 10:53 AM, elliott<elliott@cpi.com>  wrote:
>
>> Hi,
>>
>> I am using PostgreSQL 9.1 and loading very large tables ( 13 million rows
>> each ).  The flat file size is only 25M.
>>
> That is only 2 bytes per row.  Is the size given for the flat file for
> a compressed file?
>
> Cheers,
>
> Jeff
>
>



Re: Database Bloat

From: Merlin Moncure
On Wed, Aug 22, 2012 at 12:25 PM, elliott <elliott@cpi.com> wrote:
> Yes, it is a tif file.  Uncompressed it is around 85M.

ok, 85MB -> 548MB is reasonable considering you have very narrow rows
and an index that covers 2/3 of your column data.   If you want to see a
dramatic reduction in table size, you probably need to explore using
arrays in some fashion.
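One possible shape for that (an untested sketch, with an illustrative table name): pack the points for each latitude band into parallel arrays, so the ~24-byte tuple header is paid once per band rather than once per point, at the cost of less convenient querying and indexing.

```sql
-- One row per latitude band; longitudes and altitudes stored as
-- parallel arrays, amortizing the per-tuple header across many points
CREATE TABLE astgtm2_n60e073_packed (
    lat  real,
    lons real[],
    alts integer[]
);
```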

merlin


Re: Database Bloat

From: John R Pierce
On 08/22/12 10:25 AM, elliott wrote:
> Yes, it is a tif file.  Uncompressed it is around 85M.

a tif file is a pixel map image, i.e. graphics, no?     I thought we were
talking about CSV data here?



--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast



Re: Database Bloat

From: "David Johnston"
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of John R Pierce
> Sent: Wednesday, August 22, 2012 4:32 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Database Bloat
>
> On 08/22/12 10:25 AM, elliott wrote:
> > Yes, it is a tif file.  Uncompressed it is around 85M.
>
> a tif file is a pixel map image, eg, graphics, no?     I thought we were
> talking about CSV data here?
>

If I had to hazard a guess, I would think he meant "TAR", since he is talking
about compression...

David J.