Thread: Compressed Backup too big
"PostgreSQL 8.2.3 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special)" Database size in disk returned by pg_database_size() is 210 MB Database compressesed backup file size is now 125 MB. This seems too much. I expect compression to decrease size 10 times, also indexes are not backed up. A year ago compressed backup size was 9 MB only. I created query returning biggest tables with and without indexes and found: 1 pg_toast_22185 95 MB 96 MB 2 rid 21 MB 27 MB 3 klient 13 MB 19 MB 4 mailbox 10 MB 11 MB 5 dok 7640 kB 12 MB 6 desktop 8080 kB 8200 kB 7 strings 5536 kB 6584 kB 8 pg_toast_22338 5232 kB 5368 kB ... Questions: 1. Tables are relatively small and thus cannot create 125 MB compressed backup file. Why backup file sis so big ? 2. How to determine what data is containing in pg_toast_22185 ? Why this is so big ? Andrus.
2007/11/15, Andrus <kobruleht2@hot.ee>: > "PostgreSQL 8.2.3 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 > (mingw-special)" > Database size in disk returned by pg_database_size() is 210 MB > > Database compressesed backup file size is now 125 MB. How do you produce this dump? pg_dump -Fc ? If you create plain dump ( pg_dump DBNAME > dump.file ) and zip it (normal deflate, like winzip or sth like that), what's the size? If this is much smaller than 125MB, maybe you got some corner case with postgres builtin compression. You cannot *always* expect 10x ratio... it depends on data. Maybe usage patterns of your database changed and now the data is less compressable? -- Filip Rembiałkowski
On Thu, 15 Nov 2007 20:35:35 +0200 "Andrus" <kobruleht2@hot.ee> wrote: > "PostgreSQL 8.2.3 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 > (mingw-special)" > Database size in disk returned by pg_database_size() is 210 MB > > Database compressesed backup file size is now 125 MB. > This seems too much. I expect compression to decrease size 10 times, also > indexes are not backed up. A year ago compressed backup size was 9 MB only. > > I created query returning biggest tables with and without indexes and found: > > 1 pg_toast_22185 95 MB 96 MB > 2 rid 21 MB 27 MB > 3 klient 13 MB 19 MB > 4 mailbox 10 MB 11 MB > 5 dok 7640 kB 12 MB > 6 desktop 8080 kB 8200 kB > 7 strings 5536 kB 6584 kB > 8 pg_toast_22338 5232 kB 5368 kB > > ... > > Questions: > > 1. Tables are relatively small and thus cannot create 125 MB compressed > backup file. > Why backup file sis so big ? I assume that "mailbox" contains e-mail messages as the name suggest... One can assume that those emails also have attachmentslike pdf, jpg and other mostly incompressible file format. In my understanding of postgresql, when you a have a variable length field (like text or bytea), pg will only stores thefirst bytes (hundreds or so) of that field in the main table and then put the rest in an "hidden[1]" TOAST table. > 2. How to determine what data is containing in pg_toast_22185 ? If you have PGAdmin at hand or you can install it, its quite easy, just enable "Show System Objects" under the "View" menu[2]. Based on my assumptions, I guess it's the TOAST of the field which contains the body of your emails... Enough wild guesses, HTH, -- MaXX [1] hidden: transparent from an user point of view [2] you can also obtain those information from the information schema IIRC
Andrus wrote: > 2. How to determine what data is containing in pg_toast_22185 ? > Why this is so big ? > That will be determined by the columns you define and the data you insert. Read http://www.postgresql.org/docs/8.2/interactive/storage-toast.html to get an explanation of TOAST and how you can change the settings if you wish. This is most likely the reason for your "lack of compression" - the data in the toast tables is most likely already compressed data so will be close to a 1-1 export size. -- Shane Ambler pgSQL@Sheeky.Biz Get Sheeky @ http://Sheeky.Biz
"Andrus" <kobruleht2@hot.ee> writes: > I created query returning biggest tables with and without indexes and found: > 1 pg_toast_22185 95 MB 96 MB > 1. Tables are relatively small and thus cannot create 125 MB compressed > backup file. You have failed to consider that data that goes into a toast table is usually compressed first. The above table is probably accounting for close to 100MB of compressed dump output. regards, tom lane
On Thu, 2007-11-15 at 20:35 +0200, Andrus wrote: > "PostgreSQL 8.2.3 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 > (mingw-special)" > Database size in disk returned by pg_database_size() is 210 MB > > Database compressesed backup file size is now 125 MB. > This seems too much. I expect compression to decrease size 10 times, also > indexes are not backed up. A year ago compressed backup size was 9 MB only. > > I created query returning biggest tables with and without indexes and found: > > 1 pg_toast_22185 95 MB 96 MB > 2 rid 21 MB 27 MB > 3 klient 13 MB 19 MB > 4 mailbox 10 MB 11 MB > 5 dok 7640 kB 12 MB > 6 desktop 8080 kB 8200 kB > 7 strings 5536 kB 6584 kB > 8 pg_toast_22338 5232 kB 5368 kB > > ... > > Questions: > > 1. Tables are relatively small and thus cannot create 125 MB compressed > backup file. > Why backup file sis so big ? Is this a pg_dump backup or a PITR style backup? If it's a pg_dump backup, you can open it up in an editor to find out what's taking so much space. > 2. How to determine what data is containing in pg_toast_22185 ? > Why this is so big ? Could it be that you haven't been VACUUMing properly? Possibly you need to run a VACUUM FULL if you haven't kept up. If it's a PITR style backup on 1, that could be the same reason. To find what table has pg_toast_22185, try: SELECT relname FROM pg_class WHERE oid=22185 //Magnus
Filip Rembiałkowski wrote: > 2007/11/15, Andrus <kobruleht2@hot.ee>: >> "PostgreSQL 8.2.3 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 >> (mingw-special)" >> Database size in disk returned by pg_database_size() is 210 MB >> >> Database compressesed backup file size is now 125 MB. > > How do you produce this dump? pg_dump -Fc ? > > If you create plain dump ( pg_dump DBNAME > dump.file ) and zip it > (normal deflate, like winzip or sth like that), what's the size? > > If this is much smaller than 125MB, maybe you got some corner case > with postgres builtin compression. > > You cannot *always* expect 10x ratio... it depends on data. Maybe > usage patterns of your database changed and now the data is less > compressable? We find we get much, much better compression using bzip2. (It would be wonderful if pg_restore supported this natively, but I don't know enough to know if it is possible or not.) -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/