Home > mailing lists

Re: Increased size of database dump even though LESS consumed storage - Mailing list pgsql-general

From	Francisco Olarte
Subject	Re: Increased size of database dump even though LESS consumed storage
Date	February 10, 2021 12:38:51
Msg-id	CA+bJJby3Senj3yYGW0j8rYTScf=AmGzL_yGvQMWrfz=AYApo=A@mail.gmail.com Whole thread Raw
In response to	Re: Increased size of database dump even though LESS consumed storage (Thorsten Schöning<tschoening@am-soft.de>)
Responses	Re: Increased size of database dump even though LESS consumed storage Re: Increased size of database dump even though LESS consumed storage
List	pgsql-general

Tree view

Thorsten:

On Wed, Feb 10, 2021 at 9:58 AM Thorsten Schöning <tschoening@am-soft.de> wrote:
...
> I've changed the new table "datagram" to be a partitioned one with
> partitions containing rows per year and per half-year. Each partition
> contains far less rows than before this way and while I only tested
> with "--compress=9" this time, the numbers are quite interesting:
>
> > unpartitioned:     6,4 GiB
> > half-yearly parts: 4,8 GiB
> > yearly parts:      4,8 GiB
>
> The interesting part this time is that the table layout for all
> partitions is the same like before, only the number of rows per table
> is different. Though, the number of rows overall is the same like
> before, the same data, IDs etc. Though, this time the dump really is
> smaller than with the OLD schema containing far more data because of
> duplicate IDs and stuff.
> I wouldn't have expected table layout to be that important.

Compresion is dependent on detectable redundancy on the input. pg_dump
more or less gzips per-table "copy to stdout" dumps. If your data
evolves in a determined way having it sorted by ingestion time may
increase detectable redundancy a lot, and partitioning sorts partially
by date ( or fully if you have made the partitions by range-querying
via index scan ). In this case it may not be the layout, but the
order.

Given you seem to be able to test, you may try sorting the full table
by the column you use for partitioning. IIRC cluster will do the trick
if it is indexed.

( This has happened to me compressing document dumps, presorting by
some chosen fields improved my compression ratio a bit, IIRC it was
due to the compressor searching for duplicates on a limited window
only, this is why I use "detectable" redundancy )

Francisco Olarte.

pgsql-general by date:

From: Daniele Varrazzo
Date: 10 February 2021, 12:29:07
Subject: Re: Regarding Postgres - Insertion Time Getting Increased As Data Volume is getting increased

From: cen
Date: 10 February 2021, 12:42:02
Subject: Re: Regarding Postgres - Insertion Time Getting Increased As Data Volume is getting increased

Re: Increased size of database dump even though LESS consumed storage - Mailing list pgsql-general

Previous

Next