Thread: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Aleksey Tsalolikhin
Date:
Hi.  Last week our 60 GB database (per psql \l+) was (re-)replicated to
the DR site using SlonyI, and arrived 109 GB in size which caused a
problem as it filled up the filesystem on the DR server - we expected the
DR database to be the same size.  Mystery.

Now just past weekend we upgraded our production server by pg_dump
and pg_restore, and again the database is 109 GB in size!

Most of our data is in a single table, which on the old server is 50 GB in
size and on the new server is 100 GB in size.

Could you please help us understand why a COPY of the data into a new
database (whether DR or the new server) results in different disk usage?

Somebody mentioned on the Slony users list that there is a kind of padding
that goes in that actually helps performance.

Is there a way to track disk usage MINUS the padding?

Thanks,
Aleksey

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Aleksey Tsalolikhin
Date:
I've dumped the big table from the original database (where it is 61 GB in size)
and am restoring it into a test database to see what the size is after
the restore.

As it is now, our DR is offline because we did not expect the database to
nearly double in size upon COPY of the data into a new database.  We would
like to understand what is going on, and to avoid such a swell of data
upon transfer.  Is there anything I can do, please?

Best,
Aleksey

On Tue, Feb 22, 2011 at 12:44 PM, Aleksey Tsalolikhin
<atsaloli.tech@gmail.com> wrote:
> Hi.  Last week our 60 GB database (per psql \l+) was (re-)replicated to
> the DR site using SlonyI, and arrived 109 GB in size which caused a
> problem as it filled up the filesystem on the DR server - we expected the
> DR database to be the same size.  Mystery.
>
> Now just past weekend we upgraded our production server by pg_dump
> and pg_restore, and again the database is 109 GB in size!
>
> Most of our data is in a single table, which on the old server is 50 GB in
> size and on the new server is 100 GB in size.
>
> Could you please help us understand why a COPY of the data into a new
> database (whether DR or the new server) results in different disk usage?
>
> Somebody mentioned on the Slony users list that there is a kind of padding
> that goes in that actually helps performance.
>
> Is there a way to track disk usage MINUS the padding?
>
> Thanks,
> Aleksey
>

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
tv@fuzzy.cz
Date:
Hi Aleksey,

I've read your previous post, and although I'm not quite sure what the
root cause is, I have some questions and (maybe wild) guesses ...

1) Are those two machines (primary and DR) exactly the same? I mean CPU,
filesystem (including blocksize etc.)?

2) What about database encoding? I think that ASCII on the primary and
UTF-8 on the DR might have resulted in such a difference in size (but
maybe I'm wrong).

3) How exactly have you measured the size of the database? Filesystem
(using 'du'), or from pg_* views?

4) The fact that a dump file is significantly smaller than the database is
not a big surprise - the reason is that some values are 'shorter' in ASCII
than in the original form (e.g. 32-bit integer 0 is encoded as a single
char '0' which means 1B instead of 4B).
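This difference is easy to see from psql, since pg_column_size() reports the
on-disk size of a value while octet_length() reports its size as text, which
is how it would appear in a plain-text dump:

```sql
SELECT pg_column_size(0::int4);  -- 4 (bytes on disk for an int4)
SELECT octet_length('0');        -- 1 (byte in a plain-text dump)
```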

5) Have you tried to use pageinspect contrib module? It allows you to find
out exactly how the data items are organized on a page, so I'd recommend
this:

   a) compare table sizes using system catalogs, select the one with
greatest difference

   b) use pageinspect to display details about one block of the table in
both databases and compare the result
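For instance, to look at the first block of the table on each side (an
untested sketch; substitute your actual table name):

```sql
-- requires the pageinspect contrib module to be installed
SELECT * FROM page_header(get_raw_page('tablename', 0));
SELECT * FROM heap_page_items(get_raw_page('tablename', 0));
```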

Maybe this will help you to find out the real cause.

regards
Tomas

> I've dumped the big table from the original database (where it is 61 GB in
> size)
> and am restoring it into a test database to see what the size is after
> the restore.
>
> As it is now, our DR is offline because we did not expect the database to
> nearly double in size upon COPY of the data into a new database.  Would
> like to
> understand what is going on.    And would like to not have such a swell of
> data
> upon transfer.  Is there anything I can do, please?
>
> Best,
> Aleksey
>
> On Tue, Feb 22, 2011 at 12:44 PM, Aleksey Tsalolikhin
> <atsaloli.tech@gmail.com> wrote:
>> Hi.  Last week our 60 GB database (per psql \l+) was (re-)replicated to
>> the DR site using SlonyI, and arrived 109 GB in size which caused a
>> problem as it filled up the filesystem on the DR server - we expected
>> the
>> DR database to be the same size.  Mystery.
>>
>> Now just past weekend we upgraded our production server by pg_dump
>> and pg_restore, and again the database is 109 GB in size!
>>
>> Most of our data is in a single table, which on the old server is 50 GB
>> in
>> size and on the new server is 100 GB in size.
>>
>> Could you please help us understand why a COPY of the data into a new
>> database (whether DR or the new server) results in different disk usage?
>>
>> Somebody mentioned on the Slony users list that there is a kind of
>> padding
>> that goes in that actually helps performance.
>>
>> Is there a way to track disk usage MINUS the padding?
>>
>> Thanks,
>> Aleksey
>>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
marcin mank
Date:
On Tue, Feb 22, 2011 at 9:44 PM, Aleksey Tsalolikhin
<atsaloli.tech@gmail.com> wrote:

> Most of our data is in a single table, which on the old server is 50 GB in
> size and on the new server is 100 GB in size.
>

Maybe the table on the new server has a fillfactor less than 100?

Greetings
Marcin

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Richard Huxton
Date:
On 24/02/11 15:24, marcin mank wrote:
> On Tue, Feb 22, 2011 at 9:44 PM, Aleksey Tsalolikhin
> <atsaloli.tech@gmail.com>  wrote:
>
>> Most of our data is in a single table, which on the old server is 50 GB in
>> size and on the new server is 100 GB in size.
>>
>
> Maybe the table the on new server has fillfactor less than 100 ?

That would be my immediate guess. Someone changed the fillfactor on the
table - that won't affect the existing data but will affect a restore.

--
   Richard Huxton
   Archonet Ltd

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Aleksey Tsalolikhin
Date:
Hi.  Thanks for your replies.

How do I check the fillfactor on the table, please?

(http://www.postgresql.org/docs/8.4/static/sql-createtable.html tells me how
to set it, but I haven't found yet how to read it.)

Same CPU, same filesystem, same blocksize - identical systems.  Same model
of server.  We made them identical on purpose.

The way I check table size including TOAST and Indexes is:

SELECT relname as "Table",
pg_size_pretty(pg_total_relation_size(relid)) As "Size" from
pg_catalog.pg_statio_user_tables ORDER BY
pg_total_relation_size(relid) DESC;

My largest table is 50 GB in size; when I pg_dump it, and then
pg_restore it, it becomes 100 GB in size.

How do I pg_restore it so that it is 50 GB in size?  Is it a setting
to pg_dump or to pg_restore?

Aleksey

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
John R Pierce
Date:
On 02/24/11 11:02 AM, Aleksey Tsalolikhin wrote:

> How do I check the fillfactor on the table, please?

it's in the field reloptions in pg_class.   so...

     select reloptions from pg_class where relname='tablename';

if tablename is non-unique, you'll need to qualify that with the OID of
the namespace (aka schema).
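For example, a schema-qualified version of that query might look like this
(names here are placeholders):

```sql
SELECT n.nspname, c.relname, c.reloptions
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'tablename'
  AND n.nspname = 'public';
```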

> My largest table is 50 GB in size; when I pg_dump it, and then
> pg_restore it, it becomes 100 GB in size.
>
> How do I pg_restore it so that it is 50 GB in size?  Is it a setting
> to pg_dump or to pg_restore?
>

are you truncating the table before restoring, or is this a restore into
a new database, or what?




Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Aleksey Tsalolikhin
Date:
On Thu, Feb 24, 2011 at 11:46 AM, John R Pierce <pierce@hogranch.com> wrote:
> On 02/24/11 11:02 AM, Aleksey Tsalolikhin wrote:
>
>> How do I check the fillfactor on the table, please?
>
> its in the field reloptions in pg_class.   so...
>
>    select reloptions from pg_class where relname='tablename';

Thanks, John!

autovacuum_enabled=true is the only option set on this table
on both the source database and the target.

>> How do I pg_restore it so that it is 50 GB in size?  Is it a setting
>> to pg_dump or to pg_restore?
>>
>
> are you truncating the table before restoring, or is this a restore into a
> new database, or what?

I've tried both.  Slony truncates the table before copying it over, and I've
also tried pg_restore'ing it into a new database.  In both cases, the 50 GB
table arrives as a 100 GB table.

Aleksey

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Alex Hunsaker
Date:
On Thu, Feb 24, 2011 at 14:11, Aleksey Tsalolikhin
<atsaloli.tech@gmail.com> wrote:

>> are you truncating the table before restoring, or is this a restore into a
>> new database, or what?
>
> I've tried both.  Slony truncates the table before copying it over, and I've
> tryind pg_restore'ing it into a new database.  In both cases, the 50 GB
> table arrives as a 100 GB table.

Are they both the same version of pg? I'm wondering if the 50GB
version is running 8.3 and the slave something >=8.4, IIRC I had a
table that grew quite a bit because of some TOAST changes in 8.4. If
so, I'd first try making the slave and master versions of pg match,
which is good practice anyhow.

If none of the above ring any bells, maybe some bloat is creeping in.
Check_postgres (http://bucardo.org/wiki/Check_postgres) has some nice
metrics for that. (I suppose you could also try CLUSTER and see if
that makes a size difference).

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Adrian Klaver
Date:
On Thursday, February 24, 2011 1:11:44 pm Aleksey Tsalolikhin wrote:
> On Thu, Feb 24, 2011 at 11:46 AM, John R Pierce <pierce@hogranch.com> wrote:
> > On 02/24/11 11:02 AM, Aleksey Tsalolikhin wrote:
> >> How do I check the fillfactor on the table, please?
> >
> > its in the field reloptions in pg_class.   so...
> >
> >    select reloptions from pg_class where relname='tablename';
>
> Thanks, John!
>
> autovacuum_enabled=true is the only option set on this table
> on both the source database and the target.
>
> >> How do I pg_restore it so that it is 50 GB in size?  Is it a setting
> >> to pg_dump or to pg_restore?
> >
> > are you truncating the table before restoring, or is this a restore into
> > a new database, or what?
>
> I've tried both.  Slony truncates the table before copying it over, and
> I've tryind pg_restore'ing it into a new database.  In both cases, the 50
> GB table arrives as a 100 GB table.
>
> Aleksey

What is the schema for this table and associated indexes?
Or if that is not available what is in the table?
Is the data the same in both for the problem table? I am somewhat suspicious
that the second db table is twice the size of the first - almost as if the
data is being duplicated.
Also, what is the size of the pg_dump for that table only when you dump
from the original table versus from the table in the second db?


--
Adrian Klaver
adrian.klaver@gmail.com

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Aleksey Tsalolikhin
Date:
Hi.  We're running Postgres 8.4.4 everywhere.

I already have a pg_dump -Fc of the big table from the source, now
I am running a pg_dump -Fc on the recipient, to see if the size is different.

Then I will run a pg_dump as text, so I can diff the two files if they are
different in size.

Thanks!!
Aleksey

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Adrian Klaver
Date:
On Thursday, February 24, 2011 3:34:02 pm Aleksey Tsalolikhin wrote:
> Hi.  We're running Postgres 8.4.4 everywhere.
>
> I already have a pg_dump -Fc of the big table from the source, now
> I am running a pg_dump -Fc on the recipient, to see if the size is
> different.

I thought you already had a pg_dump file that you were restoring to the
second db?

>
> Then I will run a pg_dump as text, so I can diff the two files if they are
> different in size.

You don't need to do that if the pg_dump was done using -Fc. You can use
pg_restore to dump a table to a file instead of a database. When it does that
the file will contain a plain text copy. Something like:

pg_restore -a -t really_big_table -f really_big_table_data.sql

Where -a is data only
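Note that pg_restore also takes the dump archive itself as its final
argument; a fuller sketch (the dump file name here is hypothetical) might
look like:

```
pg_restore -a -t really_big_table -f really_big_table_data.sql mydb.dump
```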

>
> Thanks!!
> Aleksey

--
Adrian Klaver
adrian.klaver@gmail.com

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Adrian Klaver
Date:
On Thursday, February 24, 2011 3:48:35 pm Adrian Klaver wrote:
> On Thursday, February 24, 2011 3:34:02 pm Aleksey Tsalolikhin wrote:
> > Hi.  We're running Postgres 8.4.4 everywhere.
> >
> > I already have a pg_dump -Fc of the big table from the source, now
> > I am running a pg_dump -Fc on the recipient, to see if the size is
> > different.
>
> I thought you already had a pg_dump file that you where restoring to the
> second db?
>
> > Then I will run a pg_dump as text, so I can diff the two files if they
> > are different in size.
>
> You don't need to do that if the pg_dump was done using -Fc. You can use
> pg_restore to dump a table to a file instead of a database. When it does
    Oops typo   ^^^^ should be restore

> that the file will contain a plain text copy. Something like:
>
> pg_restore -a -t really_big_table -f really_big_table_data.sql
>
> Where -a is data only
>
> > Thanks!!
> > Aleksey

--
Adrian Klaver
adrian.klaver@gmail.com

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Sergey Burladyan
Date:
Aleksey Tsalolikhin <atsaloli.tech@gmail.com> writes:

> Now just past weekend we upgraded our production server by pg_dump
> and pg_restore, and again the database is 109 GB in size!

Which character encoding of the source and target database?

--
Sergey Burladyan

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Aleksey Tsalolikhin
Date:
OK, just to recap:

database A has a table that is 50 GB in size (according to:

SELECT relname as "Table",
   pg_size_pretty(pg_total_relation_size(relid)) As "Size" from
   pg_catalog.pg_statio_user_tables ORDER BY
   pg_total_relation_size(relid) DESC;
)


I pg_dump -Fc this table, which gives me a 9.8 GB file.

I then pg_restore this table into database B, which results in a
100 GB table, according to the same SQL query.


Database versions are identical:
A: PostgreSQL 8.4.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc
(GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
B: PostgreSQL 8.4.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc
(GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit


Character encoding of the source and target databases are identical:
UTF8.  (As reported by "psql -l".)

fillfactor is not set for this table, it is the default on both A and
B.  (As reported by "select reloptions from pg_class where
relname='tablename';".)


If I pg_dump -Fc the 100 GB table on database B, I get a 9.9 GB file.

If I do a "select count(*) from tablename",  I get identical results
on A and B (1,628,348 rows).

Adrian asked about the schema for this table.  It is the same on A and B:

                   Table "tablename"
     Column      |           Type           | Modifiers
-----------------+--------------------------+-----------
 column1        | character varying        | not null
 column2    | character varying        |
 column3        | character varying        |
 column4 | character varying        |
 column5   | timestamp with time zone |
 column6        | character varying        |
 column7   | character varying        |
 column8     | character varying        |
 column9      | character varying        |
 column10  | character varying        |
Indexes:
    "tablename_pkey" PRIMARY KEY, btree (column1)
    "tablename_column6_index" btree (column6)
    "tablename_column9_index" btree (column9)
    "tablename_column8_index" btree (column8)
    "tablename_column7_index" btree (column7)


When I pg_dump the 50 GB table, I get a 40 GB file.

When I pg_dump the 100 GB table, I get a 40 GB file.

So it looks like it's something low-level, something about how the data is stored.

I've installed the "pageinspect" contrib module as Tomas suggested but
I don't know what to do with it or what to look at.  I looked at the
manual for it but it's totally over my head right now.

What sections of the manual should I read to be able to use this
module?  (there are 2167 pages in the whole Postgres 8.4 manual and I
don't have time to study the whole thing.  :(   I just need to study
enough to understand how to use pageinspect.)

(I'm not a Postgres DBA but I am having to become one to support our database.)

Thanks again for the help.   I'll be at SCALE 9x tomorrow helping Joe
Conway and Bruce M and Richard B and company get the word out about
Postgres.

Best,
Aleksey

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Alban Hertroys
Date:
On 27 Feb 2011, at 9:49, Aleksey Tsalolikhin wrote:

> Database versions are identical:
> A: PostgreSQL 8.4.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc
> (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
> B: PostgreSQL 8.4.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc
> (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
>
>
> Character encoding of the source and target databases are identical:
> UTF8.  (As reported by "psql -l".)

I noticed in your table definition that you seem to store timestamps in
text-fields. Restoring those from text-fields shouldn't make any difference,
but perhaps your locales are set up differently between the machines and
cause some type of conversion to take place?
I know, that's a pretty wild guess.

> When I pg_dump the 50 GB table, I get a 40 GB file.
>
> When I pg_dump the 100 GB table, I get a 40 GB file.


I think the possible causes of the problem being with the database have been
rather exhausted by now. Maybe the difference is in how the OS was set up on
each system. So, more questions:

What type of file-system are you using on each database (for the table in question)?
Are these filesystems configured identically, or does one perhaps have a different block-size than the other?
Is it set up as a raid array? If so, which raid-level?
Are your dumps going to that same file-system, or to a different one?

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Tom Lane
Date:
Aleksey Tsalolikhin <atsaloli.tech@gmail.com> writes:
> i've installed the "pageinspect" contrib module as Tomas suggested but
> I don't know what to do with it or what to look at.  I looked at the
> manual for it but it's totally over my head right now.

Personally I'd try pgstattuple first:
http://www.postgresql.org/docs/8.4/static/pgstattuple.html

The stats from that for the table and each index should at least let us
narrow down where the bloat is.
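For example (untested; the table and index names here are placeholders):

```sql
-- requires the pgstattuple contrib module
SELECT * FROM pgstattuple('public.big');       -- heap stats
SELECT * FROM pgstatindex('public.big_pkey');  -- per-index stats
```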

            regards, tom lane

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Adrian Klaver
Date:

On Sunday, February 27, 2011 12:49:48 am Aleksey Tsalolikhin wrote:

> so looks like it's something low-level, something about how the data is
> stored.
>
> i've installed the "pageinspect" contrib module as Tomas suggested but
> I don't know what to do with it or what to look at. I looked at the
> manual for it but it's totally over my head right now.
>
> What sections of the manual should I read to be able to use this
> module? (there are 2167 pages in the whole Postgres 8.4 manual and I
> don't have time to study the whole thing. :( I just need to study
> enough to understand how to use pageinspect.)
>
> (I'm not a Postgres DBA but I am having to become one to support our
> database.)
>
> Thanks again for the help. I'll be at SCALE 9x tomorrow helping Joe
> Conway and Bruce M and Richard B and company get the word out about
> Postgres.

Well it looks like the problem is not with the raw data. So far you have been
using pg_total_relation_size to look at the table. It might be time to look at
the individual components by using the other pg_* functions:

http://www.postgresql.org/docs/9.0/interactive/functions-admin.html

pg_table_size(regclass)
pg_indexes_size(regclass)
pg_relation_size(relation regclass, fork text)

In other words try to narrow down where the bloat is occurring.
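For example, something along these lines (pg_table_size and pg_indexes_size
are 9.0 functions; the table name is a placeholder):

```sql
SELECT pg_size_pretty(pg_table_size('big'))          AS table_incl_toast,
       pg_size_pretty(pg_indexes_size('big'))        AS indexes,
       pg_size_pretty(pg_total_relation_size('big')) AS total;
```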

>
> Best,
> Aleksey

--
Adrian Klaver
adrian.klaver@gmail.com

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Aleksey Tsalolikhin
Date:
On Sun, Feb 27, 2011 at 2:52 AM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:
Thank you for your kind replies.

> I noticed in your table definition that you seem to store timestamps in
> text-fields. Restoring those from text-fields shouldn't make any difference,
> but perhaps your locales are set up differently between the machines and
> cause some type of conversion to take place?

OK, Alban, I'm game.  How would I check how locales are set up?

Adrian, I found pg_indexes_size() is only in 9 (I have 8.4) but I got
the same information from a query based on
http://www.issociate.de/board/post/478501/How_much_space_do_database_objects_take_up_in_data_files.html

I used:

SELECT nspname, relname,
pg_size_pretty(tablesize) AS tablesize, pg_size_pretty(indexsize) AS
indexsize, pg_size_pretty(toastsize) AS toastsize,
pg_size_pretty(toastindexsize) AS toastindexsize
FROM
(SELECT ns.nspname, cl.relname, pg_relation_size(cl.oid) AS tablesize,
COALESCE((SELECT SUM(pg_relation_size(indexrelid))::bigint
FROM pg_index WHERE cl.oid=indrelid), 0) AS indexsize,
CASE WHEN reltoastrelid=0 THEN 0
ELSE pg_relation_size(reltoastrelid)
END AS toastsize,
CASE WHEN reltoastrelid=0 THEN 0
ELSE pg_relation_size((SELECT reltoastidxid FROM pg_class ct
WHERE ct.oid = cl.reltoastrelid))
END AS toastindexsize
FROM pg_class cl, pg_namespace ns
WHERE cl.relnamespace = ns.oid
AND ns.nspname NOT IN ('pg_catalog', 'information_schema')
AND cl.relname IN
(SELECT table_name FROM information_schema.tables
WHERE table_type = 'BASE TABLE')) ss
ORDER BY tablesize+indexsize+toastsize+toastindexsize DESC;


Here is what I see:



 nspname | relname | tablesize | indexsize | toastsize | toastindexsize
---------+---------+-----------+-----------+-----------+----------------
 public  | big     | 744 MB    | 737 MB    | 48 GB     | 278 MB    <-- database A (source, 50 GB)
 public  | big     | 503 MB    | 387 MB    | 99 GB     | 278 MB    <-- database B (target, 100 GB)


Check out that toastsize delta.   What makes up TOAST?  How can I
compare the two TOAST tables in detail?


Tom suggested pgstattuple:

 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
 779689984 |     1628348 | 500584290 |          64.2 |            30111 |        8275133 |               1.06 |  243295444 |         31.2   <-- database A (source, 50 GB)
 527835136 |     1628348 | 500584290 |         94.84 |                0 |              0 |                  0 |    9492072 |          1.8   <-- database B (target, 100 GB)



I used "dumpe2fs" to check the filesystems - block size is 4096 on both servers.

One filesystem is on a hardware raid device, and one is on a software
raid device.

Thanks,
Aleksey

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Adrian Klaver
Date:
On Monday, February 28, 2011 9:51:10 pm Aleksey Tsalolikhin wrote:
> On Sun, Feb 27, 2011 at 2:52 AM, Alban Hertroys
> <dalroi@solfertje.student.utwente.nl> wrote:
> Thank you for your kind replies.
>
> > I noticed in your table definition that you seem to store timestamps in
> > text-fields. Restoring those from text-fields shouldn't make any
> > difference, but perhaps your locales are set up differently between the
> > machines and cause some type of conversion to take place?
>
> OK, Alban, I'm game.  How would I check how locales are set up?
>
> Adrian, I found pg_indexes_size() is only in 9 (I have 8.4) but I got
> the same information from a query based on
> http://www.issociate.de/board/post/478501/How_much_space_do_database_object
> s_take_up_in_data_files.html

Sorry about that, I was not paying attention. FYI 8.4 does have
pg_relation_size() which can be applied against individual indexes.


>
>
> Here is what I see:
>
>
>
>  nspname | relname | tablesize | indexsize | toastsize | toastindexsize
> ---------+---------+-----------+-----------+-----------+----------------
>  public  | big     | 744 MB    | 737 MB    | 48 GB     | 278 MB
>  public  | big     | 503 MB    | 387 MB    | 99 GB     | 278 MB
>
> Check out that toastsize delta.   What makes up TOAST?  How can I
> compare the two TOAST tables in detail?

TOAST is best explained here:
http://www.postgresql.org/docs/8.4/interactive/storage-toast.html

Looks like the TOAST compression is not working on the second machine. Not sure
how that could come to be. Further investigation underway:)
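One rough way to check whether compression is actually happening on each
side would be to compare the stored size of the big column against its raw
text length (the table and column names here are guesses based on the
thread):

```sql
-- pg_column_size() gives the on-disk (possibly compressed) size;
-- octet_length() gives the uncompressed length.
SELECT avg(pg_column_size(xml_file))::bigint AS avg_stored_bytes,
       avg(octet_length(xml_file))::bigint   AS avg_raw_bytes
FROM big;
```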


>
> Thanks,
> Aleksey

--
Adrian Klaver
adrian.klaver@gmail.com

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Adrian Klaver
Date:
On Monday, February 28, 2011 9:51:10 pm Aleksey Tsalolikhin wrote:

>
> Here is what I see:
>
>  nspname | relname | tablesize | indexsize | toastsize | toastindexsize
> ---------+---------+-----------+-----------+-----------+----------------
>  public  | big     | 744 MB    | 737 MB    | 48 GB     | 278 MB
>  public  | big     | 503 MB    | 387 MB    | 99 GB     | 278 MB
>
> Check out that toastsize delta.   What makes up TOAST?  How can I
> compare the two TOAST tables in detail?
>


The compression/no compression thing tickled a memory. Run \d+ against the table
in question. It should show a storage column with values for each field. Are any
of those set to EXTERNAL instead of the default EXTENDED?


--
Adrian Klaver
adrian.klaver@gmail.com

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Tom Lane
Date:
Adrian Klaver <adrian.klaver@gmail.com> writes:
> Looks like the TOAST compression is not working on the second machine. Not sure
> how that could come to be. Further investigation underway:)

Somebody carelessly messed with the per-column SET STORAGE settings,
perhaps?  Compare pg_attribute.attstorage settings ...
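For example (the relation name is a placeholder; in attstorage, 'x' =
EXTENDED, 'e' = EXTERNAL, 'm' = MAIN, 'p' = PLAIN):

```sql
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'public.big'::regclass
  AND attnum > 0
ORDER BY attnum;
```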

            regards, tom lane

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Aleksey Tsalolikhin
Date:
On Tue, Mar 1, 2011 at 7:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Adrian Klaver <adrian.klaver@gmail.com> writes:
>> Looks like the TOAST compression is not working on the second machine. Not sure
>> how that could come to be. Further investigation underway:)
>
> Somebody carelessly messed with the per-column SET STORAGE settings,
> perhaps?  Compare pg_attribute.attstorage settings ...



Thank you.  I compared the STORAGE settings and I have "extended" on
both databases,
no "external".

Any other ideas?

Yours truly,
-at

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Adrian Klaver
Date:
On Thursday, March 03, 2011 6:15:50 pm Aleksey Tsalolikhin wrote:
> On Tue, Mar 1, 2011 at 7:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Adrian Klaver <adrian.klaver@gmail.com> writes:
> >> Looks like the TOAST compression is not working on the second machine.
> >> Not sure how that could come to be. Further investigation underway:)
> >
> > Somebody carelessly messed with the per-column SET STORAGE settings,
> > perhaps?  Compare pg_attribute.attstorage settings ...
>
> Thank you.  I compared the STORAGE settings and I have "extended" on
> both databases,
> no "external".
>
> Any other ideas?

Weird. The pgstattuple data shows that the tables are essentially the same, the
only difference being the dead tuples, as expected, on the production table. The
TOAST size information shows approximately a doubling in size of the TOASTed
data on the new machine. By all accounts compression or the lack thereof would
be the culprit. At this point I am at a loss for another explanation.

One more request for information. What is the data being stored in the table?

>
> Yours truly,
> -at

--
Adrian Klaver
adrian.klaver@gmail.com

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Aleksey Tsalolikhin
Date:
On Fri, Mar 4, 2011 at 7:53 AM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>
> Weird. The pgstattuple data shows that the tables are essentially the same, the
> only difference being the dead tuples, as expected, on the production table. The
> TOAST size information shows approximately a doubling in size of the TOASTed
> data on the new machine. By all accounts compression or the lack thereof would
> be the culprit. At this point I am at a loss for another explanation.

An apt summary, thank you.


> One more request for information. What is the data being stored in the table?


For the main part, it's an XML file, we store it in the third field.
Our XML files are
13KB - 48 KB in length.

The other fields are relatively short.

                         Table "tablename"
           Column            |           Type           | Modifiers
-----------------------------+--------------------------+-----------
 uuid                        | character varying        | not null
 filename                    | character varying        |
 XML_file                    | character varying        |
 another_id                  | character varying        |
 record_generation_timestamp | timestamp with time zone |
 another_uuid                | character varying        |
 timestamp_start             | character varying        |
 timestamp_end               | character varying        |
 30_byte_text_id             | character varying        |
 yet_another_text_id         | character varying        |
Indexes:
    "tablename_pkey" PRIMARY KEY, btree (uuid)
    "tablename_column6_index" btree (another_uuid)
    "tablename_column9_index" btree (30_byte_text_id)
    "tablename_column8_index" btree (timestamp_end)
    "tablename_column7_index" btree (timestamp_start)


I ran char_length() on each XML file, the total is 38 GB, which is pretty close
to the 40 GB size of the pg_dump of the table in text format.

From my observation, when the table was under 30 GB in size,
TOAST compression worked fine; now that it has grown above 35 GB,
TOAST compression is not working.

Yours truly,
-at

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Adrian Klaver
Date:
On Friday, March 04, 2011 2:03:23 pm Aleksey Tsalolikhin wrote:
> On Fri, Mar 4, 2011 at 7:53 AM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> > Weird. The pgstattuple data shows that the tables are essentially the
> > same, the only difference being the dead tuples, as expected, on the
> > production table. The TOAST size information shows approximately a
> > doubling in size of the TOASTed data on the new machine. By all accounts
> > compression or the lack thereof would be the culprit. At this point I am
> > at a loss for another explanation.
>
> An apt summary, thank you.
>
> > One more request for information. What is the data being stored in the
> > table?
>
> For the main part, it's an XML file, we store it in the third field.
> Our XML files are
> 13KB - 48 KB in length.
>

And you are positive that field has not had its storage changed to something
other than EXTENDED?

>
> I ran char_length() on each XML file, the total is 38 GB, which is pretty
> close to the 40 GB size of the pg_dump of the table in text format.
>
> From my observation, when the table was under 30 GB in size,
> TOAST compression worked fine; when it got above 35 GB in size,
> now TOAST compression is not working.

More questions:)
How do you know that?
I thought the size problem only came to light when you tried to set up another
server.
Was there some sort of monitoring going on previous to setting up the new
server?
Anything else happen around that time?



>
> Yours truly,
> -at

--
Adrian Klaver
adrian.klaver@gmail.com

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Aleksey Tsalolikhin
Date:
On Fri, Mar 4, 2011 at 4:45 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> On Friday, March 04, 2011 2:03:23 pm Aleksey Tsalolikhin wrote:
>> On Fri, Mar 4, 2011 at 7:53 AM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>> >What is the data being stored in the table?
>>
>> For the main part, it's an XML file, we store it in the third field.
>> Our XML files are 13KB - 48 KB in length.
>>
>
> And you are positive that field has not had its storage changed to something
> other than EXTENDED?
Quite.  I just double-checked on both servers, and it is EXTENDED for
everything but the timestamp (5th field), which is PLAIN (on both servers).
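The storage settings can also be read straight out of the catalog, which makes them easy to diff between the two servers. A sketch, with the placeholder table name from above:

```sql
-- attstorage: 'x' = EXTENDED (out-of-line + compression),
--             'e' = EXTERNAL (out-of-line, compression disabled),
--             'm' = MAIN, 'p' = PLAIN
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'tablename'::regclass
  AND attnum > 0
  AND NOT attisdropped;
```

Worth noting: EXTERNAL ('e') is the one setting that would still TOAST out of line but silently skip compression, which matches the symptom described here, so it's the value to look for.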



>> From my observation, when the table was under 30 GB in size,
>> TOAST compression worked fine; when it got above 35 GB in size,
>> now TOAST compression is not working.
>
> More questions:)
> How do you know that?
> I thought the size problem only came to light when you tried to set up another
> server.
> Was there some sort of monitoring going on previous to setting up the new
> server?
> Anything else happen around that time?

OK, I have to withdraw my "observation".  Let me refine it (which I am afraid
makes it less useful): I've done pg_dump/restore of the database earlier, when
it was smaller (< 30 GB), and did not notice such a doubling in size; I don't
think it occurred.  Certainly the last time we moved the database from the DR
back to the primary site, it did not double in size from 1 GB (it was around
1 GB then).

Here is what I did see: we've had Slony replication running for a while (over
half a year) from Primary to DR, and file system utilization on Primary and DR
was about the same (around 75%).

Also, I had done some pg_dump's / pg_restore's from DR to Dev and Stage, and
the database size was about 1:1.

But most recently, I shut down replication and modified the cluster config
(added several tables and sequences; plus some tables were modified, so I
wanted to get a clean start on replicating them).  I removed the Slony schemas
and re-created a Slony replication set -- my filesystem was 75% full on the
master, and it hit 100% on the slave!  So I lost my slave!

Then I tried pg_dump/pg_restore and noticed the same thing, that one table
doubles in size.
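To pin down where the extra space lives (heap vs. TOAST vs. indexes), the size functions available in 8.4 can break the table down on each server; the table name is the placeholder from earlier:

```sql
SELECT pg_size_pretty(pg_relation_size('tablename'))       AS heap,
       pg_size_pretty(pg_total_relation_size('tablename')
                      - pg_relation_size('tablename'))     AS toast_plus_indexes,
       pg_size_pretty(pg_total_relation_size('tablename')) AS total;
```

If the growth shows up in toast_plus_indexes while heap stays flat, that again points at uncompressed TOAST data rather than padding or dead tuples.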

The last time I did a full Slony re-sync like this, the table was around
30 GB.  Now we're up to 40-50 GB, and we hit the 1:2 factor.

I can't think of anything else happening around this time...

I'm going to try splitting the pg_dump file (40 GB) in half, load the 20 GB
file, and see how much space it takes up in the database; I'm curious whether
I can replicate the 1:2 swelling with this smaller table.

Aleksey

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Adrian Klaver
Date:
On Friday, March 04, 2011 5:11:04 pm Aleksey Tsalolikhin wrote:
> On Fri, Mar 4, 2011 at 4:45 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> > On Friday, March 04, 2011 2:03:23 pm Aleksey Tsalolikhin wrote:
> >> On Fri, Mar 4, 2011 at 7:53 AM, Adrian Klaver <adrian.klaver@gmail.com>
wrote:
> >> >What is the data being stored in the table?
> >>
> >> For the main part, it's an XML file, we store it in the third field.
> >> Our XML files are 13KB - 48 KB in length.
> >
> > And you are positive that field has not had its storage changed to
> > something other than EXTENDED?
>
> Quite.  I just double-checked on both servers, and it is EXTENDED for
> everything but
> the timestamp (5th field), which is PLAIN (on both servers).
>
> >> From my observation, when the table was under 30 GB in size,
> >> TOAST compression worked fine; when it got above 35 GB in size,
> >> now TOAST compression is not working.
> >
> > More questions:)
> > How do you know that?
> > I thought the size problem only came to light when you tried to set up
> > another server.
> > Was there some sort of monitoring going on previous to setting up the new
> > server?
> > Anything else happen around that time?
>
> OK, I have to withdraw my "observation".  Let me refine it (which I am
> afraid makes
> it less useful):  I've done pg_dump/restore of the database earlier, when
> it was smaller (< 30 GB) and did not notice such a doubling in size; I
> don't think it occurred.  Certainly the last time we moved the database
> from the DR back to primary site, it did not double in size from 1 GB (it
> was around 1 GB then).
>
> Here is what I did see:  we've had Slony replication running for a
> while (over half
> a year) from Primary to DR; and file system utilization on Primary and
> DR was about
> the same.  (around 75%).
>
> Also, I had done some pg_dump's / pg_restore's from DR to Dev and Stage,
> and the database size was about 1:1.
>
> But most recently, I shut down replication, and modified the cluster config
> (added several tables and sequences; plus some tables were modified so
> I wanted to get a clean start on replicating them).  I removed the slony
> schemas and re-created a Slony replication set -- my filesystem was 75%
> full on the master,
> and it hit 100% on the slave!  So I lost my slave!
>
> Then I tried pg_dump/pg_restore and noticed the same thing, that one table
> doubles in size.
>
> Last time I did a full Slony re-sync like this was around 30 GB.  Now
> we're up to
> 40-50 GB and hit the 1:2 factor.
>
> I can't think of anything else happening around this time...
>
> I'm going to try splitting the pg_dump file (40 GB) in half, and load the
> 20 GB file, and see how much space it takes up in the database, I'm
> curious if I can replicate the 1:2 swelling with this smaller table.
>
> Aleksey


Going over the saga to date.

1) Slony replication running between the production server and DR server for
half a year or so.
    Were the Postgres versions the same between servers?

2) Replication shut down, cluster configuration modified.
    Assuming that the problem table was not one of the ones added, correct?


3) In your first email you mentioned upgrading the production server.
    Was this a version upgrade?
        Major or minor upgrade?
        From what to what?
    Otherwise, what was the upgrade?

4) Dump/restore to Dev and Stage seem to be all right.
    Have either of these been touched by Slony?
    Have you tried this with the data set you have been having problems with?

5) The restore to the DR server is showing size growth of 2x.
    Is this a fresh instance of Postgres, or is it the instance that was under
Slony replication previously?




--
Adrian Klaver
adrian.klaver@gmail.com

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Aleksey Tsalolikhin
Date:
On Fri, Mar 4, 2011 at 7:19 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> On Friday, March 04, 2011 5:11:04 pm Aleksey Tsalolikhin wrote:
>> On Fri, Mar 4, 2011 at 4:45 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
>> > On Friday, March 04, 2011 2:03:23 pm Aleksey Tsalolikhin wrote:
>> >> On Fri, Mar 4, 2011 at 7:53 AM, Adrian Klaver <adrian.klaver@gmail.com>
> wrote:
>> >> >What is the data being stored in the table?
>> >>
>> >> For the main part, it's an XML file, we store it in the third field.
>> >> Our XML files are 13KB - 48 KB in length.
>> >
>> > And you are positive that field has not had its storage changed to
>> > something other than EXTENDED?
>>
>> Quite.  I just double-checked on both servers, and it is EXTENDED for
>> everything but
>> the timestamp (5th field), which is PLAIN (on both servers).
>>
>> >> From my observation, when the table was under 30 GB in size,
>> >> TOAST compression worked fine; when it got above 35 GB in size,
>> >> now TOAST compression is not working.
>> >
>> > More questions:)
>> > How do you know that?
>> > I thought the size problem only came to light when you tried to set up
>> > another server.
>> > Was there some sort of monitoring going on previous to setting up the new
>> > server?
>> > Anything else happen around that time?
>>
>> OK, I have to withdraw my "observation".  Let me refine it (which I am
>> afraid makes
>> it less useful):  I've done pg_dump/restore of the database earlier, when
>> it was smaller (< 30 GB) and did not notice such a doubling in size; I
>> don't think it occurred.  Certainly the last time we moved the database
>> from the DR back to primary site, it did not double in size from 1 GB (it
>> was around 1 GB then).
>>
>> Here is what I did see:  we've had Slony replication running for a
>> while (over half
>> a year) from Primary to DR; and file system utilization on Primary and
>> DR was about
>> the same.  (around 75%).
>>
>> Also, I had done some pg_dump's / pg_restore's from DR to Dev and Stage,
>> and the database size was about 1:1.
>>
>> But most recently, I shut down replication, and modified the cluster config
>> (added several tables and sequences; plus some tables were modified so
>> I wanted to get a clean start on replicating them).  I removed the slony
>> schemas and re-created a Slony replication set -- my filesystem was 75%
>> full on the master,
>> and it hit 100% on the slave!  So I lost my slave!
>>
>> Then I tried pg_dump/pg_restore and noticed the same thing, that one table
>> doubles in size.
>>
>> Last time I did a full Slony re-sync like this was around 30 GB.  Now
>> we're up to
>> 40-50 GB and hit the 1:2 factor.
>>
>> I can't think of anything else happening around this time...
>>
>> I'm going to try splitting the pg_dump file (40 GB) in half, and load the
>> 20 GB file, and see how much space it takes up in the database, I'm
>> curious if I can replicate the 1:2 swelling with this smaller table.
>>
>> Aleksey
>
>
> Going over the saga to date.
>
> 1)Slony replication running between production server and DR server for half a
> year or so.
>        Were the Postgres versions the same between servers?

Yes, 8.4.4

>
> 2) Replication shut down, cluster configuration modified
>        Assuming that the problem table was not one of the ones added correct?

Correct.




> 3)In your first email you mentioned upgrading the production server.
>        Was this a version upgrade?
>                Major or minor upgrade?
>                From what to what?
>        Otherwise what was the upgrade?

Hardware upgrade only.  Postgres is still 8.4.4.

>
> 4)Dump/restore to Dev and Stage seem to be alright.

No, the data doubles in size in the course of the restore, to any and all of
my environments.

> 5)The restore to DR server is showing size growth of 2x.
>        Is this a fresh instance of Postgres or is it the instance that was under
> Slony replication previously?

This is the instance that was under Slony replication previously.

Dev had been under Slony replication previously.  Stage had not.


Experiment 1:
Hypothesis: something about how large my table has grown is causing
TOAST compression to fail on COPY.
Test: pg_dump the big table, cut the dump file in half using
"/bin/split", add "\." at the end of the file, and load the top half.
Result: database is 50 GB in size.  Hypothesis proven false.

Experiment 2:
Hypothesis: something about Slony is causing TOAST compression to
be disabled on COPY.
Test: load the 50% dump file from Experiment 1 above into our Stage
database, which was never touched by Slony.
Result: database is 50 GB in size.  Hypothesis proven false.
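For reference, the split step from Experiment 1 can be sketched like this (file names are made up, and it's demonstrated on a small stand-in file rather than the real dump; GNU split's -n l/2 halves a file on line boundaries so no COPY row is cut in two, and the trailing "\." terminates the truncated COPY data stream so psql accepts it):

```shell
seq 1000 > bigtable.sql                   # stand-in for the 40 GB plain-format dump
split -n l/2 bigtable.sql bigtable_part_  # two halves, split on line boundaries
printf '\\.\n' >> bigtable_part_aa        # end-of-COPY-data marker for the top half
# psql testdb -f bigtable_part_aa         # then load only the first half
```
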


Best,
Aleksey

Re: database is bigger after dump/restore - why? (60 GB to 109 GB)

From
Adrian Klaver
Date:
On Monday, March 07, 2011 2:45:00 pm Aleksey Tsalolikhin wrote:

>
> Experiment 1:
> hypothesis: something about how large my table has grown is causing
> the TOAST compression to fail on COPY.
> test: pg_dump the big table, cut the dump file in half using
> "/bin/split", add "\." at the end of the file, and load the top half.
> result: database is 50 GB in size.  hypothesis proven false.
>
> Experiment 2:
> hypothesis: something about Slony is causing the TOAST compression to
> be disabled on COPY.
> test: load the 50% dump file from experiment 1 above into our Stage
> database, which was never touched by Slony.
> result: database is 50 GB in size.  hypothesis proven false.

Hmmm. Another perfectly good line of reasoning shot down. No further thoughts at
this time. May have to sit down with Jack Daniels and have a talk and see if
anything shakes loose:)

>
>
> Best,
> Aleksey

--
Adrian Klaver
adrian.klaver@gmail.com