Thread: Pg and compress
Hi all,
We are going to use PostgreSQL as a data warehouse, but after some tests we found that the data is about 3 times bigger in PostgreSQL than as plain-text CSV. We use COPY to load the data. After some optimization we reduced it to about 2.5 times bigger. Other databases can on average compress data to about 1/3 of the plain-text size; bigger data means heavier I/O.
So my question is: how can data be compressed in PostgreSQL? Can a filesystem with a compression feature, such as ZFS or btrfs, work well with PostgreSQL?
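One quick way to estimate what a compressing filesystem might buy is to compress a sample of the data files (or the source CSV itself) with a general-purpose compressor. A minimal sketch, assuming Python 3 is available; the file path and sample size are placeholders, and block-level filesystem compression (ZFS, btrfs) typically compresses somewhat less well than this whole-stream estimate:

```python
import gzip

def estimate_compression_ratio(path, sample_bytes=8 * 1024 * 1024):
    """Compress a sample of a file and return compressed/original size.

    This only approximates what a compressing filesystem could achieve:
    real filesystems compress small blocks independently, so the actual
    ratio is usually a bit worse than this whole-sample estimate.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_bytes)
    if not sample:
        return 1.0
    compressed = gzip.compress(sample)
    return len(compressed) / len(sample)
```

For example, `estimate_compression_ratio("/path/to/table_dump.csv")` (a hypothetical path) returning around 0.3 would be consistent with the roughly 1/3 ratio mentioned above.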
On 09/26/11 6:59 AM, Jov wrote:
> Hi all,
> We are going to use PostgreSQL as a data warehouse, but after some tests we found that the data is about 3 times bigger in PostgreSQL than as plain-text CSV. We use COPY to load the data. After some optimization we reduced it to about 2.5 times bigger. Other databases can on average compress data to about 1/3 of the plain-text size; bigger data means heavier I/O.
> So my question is: how can data be compressed in PostgreSQL? Can a filesystem with a compression feature, such as ZFS or btrfs, work well with PostgreSQL?

Your source data is CSV; what data types are the fields in the table(s)? Do you have a lot of indexes on the table(s)?

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
Most fields are bigint and one is varchar.
There are no indexes.
On 2011-9-27 at 3:34 AM, "John R Pierce" <pierce@hogranch.com> wrote:
>
> On 09/26/11 6:59 AM, Jov wrote:
>>
>>
>> Hi all,
>> We are going to use PostgreSQL as a data warehouse, but after some tests we found that the data is about 3 times bigger in PostgreSQL than as plain-text CSV. We use COPY to load the data. After some optimization we reduced it to about 2.5 times bigger. Other databases can on average compress data to about 1/3 of the plain-text size; bigger data means heavier I/O.
>> So my question is: how can data be compressed in PostgreSQL? Can a filesystem with a compression feature, such as ZFS or btrfs, work well with PostgreSQL?
>>
>
> your source data is CSV, what data types are the fields in the table(s) ? do you have a lot of indexes on this table(s)?
>
>
>
> --
> john r pierce N 37, W 122
> santa cruz ca mid-left coast
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
On 09/26/11 5:53 PM, Jov wrote:
> Most are bigint and one field is varchar.
> There is no index.

Well, scalar bigint values will be 8 bytes each, plus a bit or two of overhead per field. Each complete tuple has roughly two dozen bytes of header overhead. Tuples are stored as many as fit in an 8K block, unless you've specified a fillfactor, in which case that percentage of space is left free in each block. If your CSV has mostly small integer values that are just 1-2-3 digits, then yes, bigint will take more space than ASCII.

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
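The arithmetic above can be sketched numerically. A back-of-the-envelope sketch, assuming a 23-byte heap tuple header padded to 24 bytes for alignment and ignoring the per-row item pointer and page header; the column count and sample values are made up for illustration:

```python
# Rough comparison of PostgreSQL heap storage vs. CSV text for a row of
# bigint columns. Assumptions (not exact on-disk accounting): the heap
# tuple header is 23 bytes, padded to 24 for 8-byte alignment; each
# bigint is 8 bytes; item pointers and page headers are ignored.

TUPLE_HEADER = 24  # 23-byte header rounded up to 8-byte alignment
BIGINT = 8

def pg_row_bytes(n_bigints):
    """Approximate on-disk size of a heap row of n bigint columns."""
    return TUPLE_HEADER + n_bigints * BIGINT

def csv_row_bytes(values):
    """Size of the same row as a CSV line (commas plus a newline)."""
    return len(",".join(str(v) for v in values)) + 1  # +1 for '\n'

# A row of ten small integers: a few characters each as text, but a
# full 8 bytes each once stored as bigint.
row = [1, 22, 333, 4, 55, 666, 7, 88, 999, 0]
print(pg_row_bytes(len(row)))   # 24 + 10*8 = 104 bytes in the heap
print(csv_row_bytes(row))       # 29 bytes as CSV text
```

With these made-up values the heap row is about 3.6 times the CSV line, which is in the same ballpark as the 3x growth reported at the top of the thread; rows with larger values (many digits per number) shrink the gap.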