Thread: Significantly larger toast tables on 8.4?
I just did a fresh import of my alpha database into 8.4 and noticed that the size of the database had increased significantly:

8.4: 6.3G
8.3: 4.9G

Tracking it down, the main difference seems to be one toast table, namely this one:

ls -ltrh 8.3/base/16516/430156
-rw------- 1 postgres postgres 145M 2009-01-01 00:12 8.3/base/16516/430156

and the same toast table in 8.4:

ls -ltrh 8.4/base/16513/347706
-rw------- 1 postgres postgres 967M 2009-01-01 20:56 8.4/base/16513/347706

This table consists mainly of Perl Storable binary blobs in bytea columns. The schema looks like:

    Column     |           Type           |                       Modifiers
---------------+--------------------------+-------------------------------------------------------
 action        | text                     |
 die_id        | integer                  | not null default nextval('dies_die_id_seq'::regclass)
 cparam        | bytea                    |
 date_created  | timestamp with time zone | not null default now()
 db_profile    | bytea                    |
 debug         | bytea                    |
 defunct       | smallint                 | not null default 0
 env           | bytea                    |
 login         | bytea                    |
 msg           | text                     |
 open_user_id  | integer                  |
 page_load_id  | integer                  |
 session_id    | integer                  |
 state         | bytea                    |
 state_action  | bytea                    |
 user_id       | integer                  |
 whoops        | bytea                    |
Indexes:
    "dies_pkey" PRIMARY KEY, btree (die_id)

My hunch is it's related to
http://git.postgresql.org/?p=postgresql.git;a=commit;h=3b6942b3f2fe733572c05a71cb2d12e5ece60cdbits
or, for the CVS inclined,
http://archives.postgresql.org/pgsql-committers/2008-03/msg00121.php

But if anything that looks like it should help reduce size... Ideas?
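For reference, the same comparison can be made from inside the database instead of with ls; a rough sketch (assuming the parent table is named dies, as above):

select c.relname,
       t.relname as toast_table,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) as toast_size
from pg_class c
join pg_class t on t.oid = c.reltoastrelid
where c.relname = 'dies';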
On Thu, Jan 1, 2009 at 21:30, Alex Hunsaker <badalex@gmail.com> wrote:
> http://git.postgresql.org/?p=postgresql.git;a=commit;h=3b6942b3f2fe733572c05a71cb2d12e5ece60cdbits

... typoed that, it's
http://git.postgresql.org/?p=postgresql.git;a=commit;h=3b6942b3f2fe733572c05a71cb2d12e5ece60cd
On Thu, Jan 1, 2009 at 21:30, Alex Hunsaker <badalex@gmail.com> wrote:
> My hunch is it's related to
> http://git.postgresql.org/?p=postgresql.git;a=commit;h=3b6942b3f2fe733572c05a71cb2d12e5ece60cd
> or for the CVS inclined
> http://archives.postgresql.org/pgsql-committers/2008-03/msg00121.php
>
> But if anything that looks like it should help reduce size...

Looking at the patch, we don't compress things > 1M anymore, so I thought maybe I was hitting that. But no luck: there are only 39 rows where the row size > 1M... with those 39 being about 22M each. Unless I calculated something wrong. Oh, and CLUSTER and VACUUM don't seem to help.

select count(1) from dies;
 count
-------
 52010

select count(1) from (
  select coalesce(pg_column_size(action), 0) + coalesce(pg_column_size(die_id), 0) +
         coalesce(pg_column_size(cparam), 0) + coalesce(pg_column_size(date_created), 0) +
         coalesce(pg_column_size(db_profile), 0) + coalesce(pg_column_size(debug), 0) +
         coalesce(pg_column_size(defunct), 0) + coalesce(pg_column_size(env), 0) +
         coalesce(pg_column_size(login), 0) + coalesce(pg_column_size(msg), 0) +
         coalesce(pg_column_size(open_user_id), 0) + coalesce(pg_column_size(page_load_id), 0) +
         coalesce(pg_column_size(session_id), 0) + coalesce(pg_column_size(state), 0) +
         coalesce(pg_column_size(state_action), 0) + coalesce(pg_column_size(user_id), 0) +
         coalesce(pg_column_size(whoops), 0) as row_size
  from dies
) as foo where foo.row_size > 1024*1024;
 count
-------
    39

BTW is there a "cooler" way to do this?
On Thu, Jan 1, 2009 at 22:44, Alex Hunsaker <badalex@gmail.com> wrote:
> Looking at the patch we don't compress things > 1M anymore so I thought
> maybe I was hitting that. But no luck there are only 39 rows where
> the row size > 1M... With those 39 being about 22M each.

Oh my... 25 * 40 = 1000M

So I guess my question is: are we going to recommend to people that they manually compress their data just for 8.4?
Alex Hunsaker wrote:
> On Thu, Jan 1, 2009 at 22:44, Alex Hunsaker <badalex@gmail.com> wrote:
>> Looking at the patch we don't compress things > 1M anymore so I thought
>> maybe I was hitting that. But no luck there are only 39 rows where
>> the row size > 1M... With those 39 being about 22M each.

> Oh my... 25 * 40 = 1000M

> So I guess my question is are we going to recommend to people that
> they manually compress their data just for 8.4?

What seems to be hurting the most is the 1MB upper limit. What is the rationale behind that limit?

What would be the downside of requiring compressibility instead?
-- 
Sincerely,
Stephen R. van den Berg.
"Always remember that you are unique. Just like everyone else."
"Stephen R. van den Berg" <srb@cuci.nl> writes: > What seems to be hurting the most is the 1MB upper limit. What is the > rationale behind that limit? The argument was that compressing/decompressing such large chunks would require a lot of CPU effort; also it would defeat attempts to fetch subsections of a large string. In the past we've required people to explicitly "ALTER TABLE SET STORAGE external" if they wanted to make use of the substring-fetch optimization, but it was argued that this would make that more likely to work automatically. I'm not entirely convinced by Alex' analysis anyway; the only way those 39 large values explain the size difference is if they are *tremendously* compressible, like almost all zeroes. The toast compressor isn't so bright that it's likely to get 10X compression on typical data. regards, tom lane
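For reference, the per-column override Tom mentions is spelled roughly like this (a sketch against the dies table from upthread; EXTERNAL stores values out of line but uncompressed, which is what enables the substring-fetch optimization, while EXTENDED is the default that allows compression):

-- out-of-line, uncompressed: fast substring fetches, larger storage
ALTER TABLE dies ALTER COLUMN debug SET STORAGE EXTERNAL;

-- the default: compression plus out-of-line storage allowed
ALTER TABLE dies ALTER COLUMN debug SET STORAGE EXTENDED;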
On Fri, Jan 2, 2009 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Stephen R. van den Berg" <srb@cuci.nl> writes: >> What seems to be hurting the most is the 1MB upper limit. What is the >> rationale behind that limit? > > The argument was that compressing/decompressing such large chunks would > require a lot of CPU effort; also it would defeat attempts to fetch > subsections of a large string. In the past we've required people to > explicitly "ALTER TABLE SET STORAGE external" if they wanted to make > use of the substring-fetch optimization, but it was argued that this > would make that more likely to work automatically. > > I'm not entirely convinced by Alex' analysis anyway; the only way > those 39 large values explain the size difference is if they are > *tremendously* compressible, like almost all zeroes. The toast > compressor isn't so bright that it's likely to get 10X compression > on typical data. I've seen gzip approach 10X on what was basically a large tab-separated values file, but I agree that some more experimentation to determine the real cause of the problem would be useful. I am a little mystified by the apparent double standard regarding compressibility. My suggestion that we disable compression for pg_statistic columns was perfunctorily shot down even though I provided detailed performance results demonstrating that it greatly sped up query planning on toasted statistics and even though the space savings from compression in that case are bound to be tiny. Here, we have a case where the space savings are potentially much larger, and the only argument against it is that someone might be disappointed in the performance of substring operations, if they happen to do any. What if they know that they don't want to do any and want to get compression? Even if the benefit is only 1.5X on their data rather than 10X, that seems like a pretty sane and useful thing to want to do. It's easy to shut off compression if you don't want it; if the system makes an arbitrary decision to disable it, how do you get it back? ...Robert
"Robert Haas" <robertmhaas@gmail.com> writes: > On Fri, Jan 2, 2009 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm not entirely convinced by Alex' analysis anyway; the only way >> those 39 large values explain the size difference is if they are >> *tremendously* compressible, like almost all zeroes. The toast >> compressor isn't so bright that it's likely to get 10X compression >> on typical data. > I've seen gzip approach 10X on what was basically a large > tab-separated values file, but I agree that some more experimentation > to determine the real cause of the problem would be useful. If I'm counting on my fingers correctly, you'd need to assume about 23X compression to get the reported size change... An easy way to prove or disprove the point would be to go into src/backend/utils/adt/pg_lzcompress.c, and change the second entry in strategy_default_data from "1024 * 1024" to "INT_MAX", then re-load the table with the modified backend (or you could even just do "CREATE TABLE foo AS SELECT * FROM existing_table" and then compare sizes). If that puts the toast table size back where it had been, then Alex correctly diagnosed the cause. regards, tom lane
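A sketch of that comparison, run against a backend built with the modified pg_lzcompress.c (table and column names assume the dies table from upthread):

-- re-toast the data under the modified compression strategy
CREATE TABLE dies_test AS SELECT * FROM dies;

-- compare total on-disk size, toast tables included
SELECT pg_size_pretty(pg_total_relation_size('dies')) AS original,
       pg_size_pretty(pg_total_relation_size('dies_test')) AS reloaded;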
On Fri, Jan 2, 2009 at 09:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The argument was that compressing/decompressing such large chunks would
> require a lot of CPU effort; also it would defeat attempts to fetch
> subsections of a large string. In the past we've required people to
> explicitly "ALTER TABLE SET STORAGE external" if they wanted to make
> use of the substring-fetch optimization, but it was argued that this
> would make that more likely to work automatically.

It is *way* faster. Here are some rough numbers:

$ cat q.sql
select length(substring(debug, 1024, 1024)) from dies where die_id = 295;

$ ./pgbench -T60 -n -f q.sql
8.4: 1532.327546
8.3: 21.295657

That's with 8.4 as an --enable-debug --enable-cassert build! (8.3 just has --enable-debug.)

> I'm not entirely convinced by Alex' analysis anyway; the only way
> those 39 large values explain the size difference is if they are
> *tremendously* compressible, like almost all zeroes. The toast
> compressor isn't so bright that it's likely to get 10X compression
> on typical data.

Well, it's certainly not all zeros, but those big rows are the same message repeated (~5k) until perl detected endless recursion and died... So it's not exactly typical or *useful* data either.

The other 51,971 rows (rows smaller than 1024*1024 bytes) account for 174MB, while those 39 account for the other 828MB.

Mainly I was just alarmed by the size increase. But I think the pgbench numbers make me happy even if I never do a substring on that data...

------------------
# 8.3
select pg_size_pretty(pg_column_size(debug)) from dies where die_id = 295;
 pg_size_pretty
----------------
 1256 kB

# 8.4
select pg_size_pretty(pg_column_size(debug)) from dies where die_id = 295;
 pg_size_pretty
----------------
 22 MB

select pg_size_pretty(sum(row_size)) from (
  select coalesce(pg_column_size(action), 0) + coalesce(pg_column_size(die_id), 0) +
         coalesce(pg_column_size(cparam), 0) + coalesce(pg_column_size(date_created), 0) +
         coalesce(pg_column_size(db_profile), 0) + coalesce(pg_column_size(debug), 0) +
         coalesce(pg_column_size(defunct), 0) + coalesce(pg_column_size(env), 0) +
         coalesce(pg_column_size(login), 0) + coalesce(pg_column_size(msg), 0) +
         coalesce(pg_column_size(open_user_id), 0) + coalesce(pg_column_size(page_load_id), 0) +
         coalesce(pg_column_size(session_id), 0) + coalesce(pg_column_size(state), 0) +
         coalesce(pg_column_size(state_action), 0) + coalesce(pg_column_size(user_id), 0) +
         coalesce(pg_column_size(whoops), 0) as row_size
  from dies
) as foo where foo.row_size < 1024*1024;
 pg_size_pretty
----------------
 174 MB

select pg_size_pretty(sum(row_size)) from (
  select coalesce(pg_column_size(action), 0) + coalesce(pg_column_size(die_id), 0) +
         coalesce(pg_column_size(cparam), 0) + coalesce(pg_column_size(date_created), 0) +
         coalesce(pg_column_size(db_profile), 0) + coalesce(pg_column_size(debug), 0) +
         coalesce(pg_column_size(defunct), 0) + coalesce(pg_column_size(env), 0) +
         coalesce(pg_column_size(login), 0) + coalesce(pg_column_size(msg), 0) +
         coalesce(pg_column_size(open_user_id), 0) + coalesce(pg_column_size(page_load_id), 0) +
         coalesce(pg_column_size(session_id), 0) + coalesce(pg_column_size(state), 0) +
         coalesce(pg_column_size(state_action), 0) + coalesce(pg_column_size(user_id), 0) +
         coalesce(pg_column_size(whoops), 0) as row_size
  from dies
) as foo where foo.row_size > 1024*1024;
 pg_size_pretty
----------------
 828 MB
On Fri, Jan 2, 2009 at 10:44, Robert Haas <robertmhaas@gmail.com> wrote: > Here, we have a case where the space savings are potentially much > larger, and the only argument against it is that someone might be > disappointed in the performance of substring operations, if they > happen to do any. What if they know that they don't want to do any > and want to get compression? Even if the benefit is only 1.5X on > their data rather than 10X, that seems like a pretty sane and useful > thing to want to do. It's easy to shut off compression if you don't > want it; if the system makes an arbitrary decision to disable it, how > do you get it back? I think we could just add another toast storage type: alter table alter column set storage compress; ? It seems overkill to expose PGLZ_Strategy knobs per column...
On Fri, Jan 2, 2009 at 11:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Robert Haas" <robertmhaas@gmail.com> writes: >> I've seen gzip approach 10X on what was basically a large >> tab-separated values file, but I agree that some more experimentation >> to determine the real cause of the problem would be useful. > > If I'm counting on my fingers correctly, you'd need to assume about > 23X compression to get the reported size change... > > An easy way to prove or disprove the point would be to go into > src/backend/utils/adt/pg_lzcompress.c, and change the second entry > in strategy_default_data from "1024 * 1024" to "INT_MAX", then > re-load the table with the modified backend (or you could even > just do "CREATE TABLE foo AS SELECT * FROM existing_table" and > then compare sizes). If that puts the toast table size back where > it had been, then Alex correctly diagnosed the cause. And the toast file size is *drum roll* 167M.
Alex Hunsaker wrote:
> I think we could just add another toast storage type: alter table
> alter column set storage compress; ? It seems overkill to expose
> PGLZ_Strategy knobs per column...

Three things:
a. Shouldn't it in theory be possible to have a decompression algorithm which is IO-bound because it decompresses faster than the disk can supply the data? (On common current hardware.)
b. Has the current algorithm been carefully benchmarked and/or optimised and/or chosen to fit the IO-bound target as close as possible?
c. Are there any well-known pitfalls/objections which would prevent me from changing the algorithm to something more efficient (read: IO-bound)?
-- 
Sincerely,
Stephen R. van den Berg.
"Always remember that you are unique. Just like everyone else."
On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <srb@cuci.nl> wrote: > Alex Hunsaker wrote: >>I think we could just add another toast storage type: alter table >>alter column set storage compress; ? It seems overkill to expose >>PGLZ_Strategy knobs per column... > > Three things: > a. Shouldn't it in theory be possible to have a decompression algorithm > which is IO-bound because it decompresses faster than the disk can > supply the data? (On common current hardware). > b. Has the current algorithm been carefully benchmarked and/or optimised > and/or chosen to fit the IO-bound target as close as possible? > c. Are there any well-known pitfalls/objections which would prevent me from > changing the algorithm to something more efficient (read: IO-bound)? Any compression algorithm is going to require you to decompress the entire string before extracting a substring at a given offset. When the data is uncompressed, you can jump directly to the offset you want to read. Even if the compression algorithm requires no overhead at all, it's going to make the location of the data nondeterministic, and therefore force additional disk reads. ...Robert
"Stephen R. van den Berg" <srb@cuci.nl> writes:
> Alex Hunsaker wrote:
>> I think we could just add another toast storage type: alter table
>> alter column set storage compress; ? It seems overkill to expose
>> PGLZ_Strategy knobs per column...
>
> Three things:
> a. Shouldn't it in theory be possible to have a decompression algorithm
>    which is IO-bound because it decompresses faster than the disk can
>    supply the data? (On common current hardware).

We don't stream the data from disk through the decompressor. So whether it's i/o bound or not, the time spent decompressing is in addition to the time spent doing the i/o. The only way it wins is if the time saved on i/o due to the smaller data size is greater than the time spent decompressing.

I think the right value for this setting is going to depend on the environment. If the system is starved for cpu cycles then you won't want to compress large data. If it's starved for i/o bandwidth but has spare cpu cycles then you will.

If that's true then we really have to expose this parameter to users. There won't be a single value that is appropriate for everyone.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
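To put rough numbers on that trade-off, take the 22 MB value from upthread that compresses to about 1.26 MB: the compressed copy saves roughly 20.7 MB of I/O, which at an assumed 100 MB/s of sequential read bandwidth is about 0.2 s; if the decompressor emits output at an assumed 200 MB/s, decompressing the full 22 MB costs about 0.11 s, so compression wins on that box. Halve the CPU speed or double the disk bandwidth and the arithmetic flips, which is exactly why a single default is hard to pick.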
Robert Haas wrote:
> On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> Three things:
>> a. Shouldn't it in theory be possible to have a decompression algorithm
>>    which is IO-bound because it decompresses faster than the disk can
>>    supply the data? (On common current hardware).
>> b. Has the current algorithm been carefully benchmarked and/or optimised
>>    and/or chosen to fit the IO-bound target as close as possible?
>> c. Are there any well-known pitfalls/objections which would prevent me from
>>    changing the algorithm to something more efficient (read: IO-bound)?

> Any compression algorithm is going to require you to decompress the
> entire string before extracting a substring at a given offset. When
> the data is uncompressed, you can jump directly to the offset you want
> to read. Even if the compression algorithm requires no overhead at
> all, it's going to make the location of the data nondeterministic, and
> therefore force additional disk reads.

That shouldn't be insurmountable:
- I currently have difficulty imagining applications that actually do lots of substring extractions from large compressible fields. The most likely operation would be a table which contains tsearch-indexed large text fields, but those are unlikely to participate in a lot of substring extractions.
- Even if substring operations would be likely, I could envision a compressed format which compresses in compressed chunks of say 64KB which can then be addressed randomly and independently.
-- 
Sincerely,
Stephen R. van den Berg.
"Always remember that you are unique. Just like everyone else."
On Fri, Jan 2, 2009 at 4:19 PM, Stephen R. van den Berg <srb@cuci.nl> wrote: > Robert Haas wrote: >>On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <srb@cuci.nl> wrote: >>> Three things: >>> a. Shouldn't it in theory be possible to have a decompression algorithm >>> which is IO-bound because it decompresses faster than the disk can >>> supply the data? (On common current hardware). >>> b. Has the current algorithm been carefully benchmarked and/or optimised >>> and/or chosen to fit the IO-bound target as close as possible? >>> c. Are there any well-known pitfalls/objections which would prevent me from >>> changing the algorithm to something more efficient (read: IO-bound)? > >>Any compression algorithm is going to require you to decompress the >>entire string before extracting a substring at a given offset. When >>the data is uncompressed, you can jump directly to the offset you want >>to read. Even if the compression algorithm requires no overhead at >>all, it's going to make the location of the data nondeterministic, and >>therefore force additional disk reads. > > That shouldn't be insurmountable: > - I currently have difficulty imagining applications that actually do > lots of substring extractions from large compressible fields. > The most likely operation would be a table which contains tsearch > indexed large textfields, but those are unlikely to participate in > a lot of substring extractions. I completely agree. If your large text field has interior structure with certain data items at certain positions, you'd presumably break it into multiple fixed-width fields. If it doesn't, what's the use case? > - Even if substring operations would be likely, I could envision a compressed > format which compresses in compressed chunks of say 64KB which can then > be addressed randomly independently. I think this would require some sort of indexing so that you could find the page that contains the first bit of any particular chunk you want to find, so it might be a bit complex to implement, and I expect it would reduce compression ratios as well. I'm sure it could be done, but I doubt it's worth the bother. If you're more concerned about the speed with which you can access your data than the size of it, you can and should turn compression off altogether. ...Robert
On Fri, Jan 02, 2009 at 03:35:18PM -0500, Robert Haas wrote: > Any compression algorithm is going to require you to decompress the > entire string before extracting a substring at a given offset. When > the data is uncompressed, you can jump directly to the offset you want > to read. Even if the compression algorithm requires no overhead at > all, it's going to make the location of the data nondeterministic, and > therefore force additional disk reads. So you compromise. You split the data into say 1MB blobs and compress each individually. Then if someone does a substring at offset 3MB you can find it quickly. This barely costs you anything in the compression ratio mostly. Implementation though, that's harder. The size of the blobs is tunable also. I imagine the optimal value will probably be around 100KB. (12 blocks uncompressed). Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
"Alex Hunsaker" <badalex@gmail.com> writes: > On Fri, Jan 2, 2009 at 11:44, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> An easy way to prove or disprove the point would be to go into >> src/backend/utils/adt/pg_lzcompress.c, and change the second entry >> in strategy_default_data from "1024 * 1024" to "INT_MAX", > And the toast file size is *drum roll* 167M. Hmmm ... so that's a lot closer to the original 145M, but it still seems like there's something else going on. It looks like the other thing we changed that might result in not compressing things was to increase the third entry (minimum compression rate) from 20% to 25%. Could you try it with that value also changed back? regards, tom lane
"Stephen R. van den Berg" <srb@cuci.nl> writes: > - I currently have difficulty imagining applications that actually do > lots of substring extractions from large compressible fields. The code that's in there to make this happen was written by people who needed the feature. They're going to be upset with you if you propose disabling it. regards, tom lane
Gregory Stark <stark@enterprisedb.com> writes:
> I think the right value for this setting is going to depend on the
> environment. If the system is starved for cpu cycles then you won't want to
> compress large data. If it's starved for i/o bandwidth but has spare cpu
> cycles then you will.

> If that's true then we really have to expose this parameter to users. There
> won't be a single value that is appropriate for everyone.

Yeah. The commit message for these changes commented

    There was some discussion in the earlier threads of exposing some
    of the compression knobs to users, perhaps even on a per-column
    basis. I have not done anything about that here. It seems to me
    that if we are changing around the parameters, we'd better get some
    experience and be sure we are happy with the design before we set
    things in stone by providing user-visible knobs.

and I'm still pretty worried about the longevity of any knob we put in here. But we might not have a lot of choice.

It would be fairly easy, I think, to add some reloption fields that would let these parameters be controlled on a per-table level. Per-column would be much more painful; do we really need that?

regards, tom lane
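If these did become reloptions, they would presumably ride on the existing ALTER TABLE ... SET (...) syntax; a purely hypothetical sketch (these option names do not exist, they just mirror the PGLZ_Strategy fields being discussed):

-- hypothetical reloptions; nothing like these exists as of this writing
ALTER TABLE dies SET (toast_compress_min_size = 32,
                      toast_compress_max_size = 1048576,
                      toast_compress_min_rate = 20);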
On Fri, Jan 2, 2009 at 18:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Alex Hunsaker" <badalex@gmail.com> writes:
>> On Fri, Jan 2, 2009 at 11:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> An easy way to prove or disprove the point would be to go into
>>> src/backend/utils/adt/pg_lzcompress.c, and change the second entry
>>> in strategy_default_data from "1024 * 1024" to "INT_MAX",
>
>> And the toast file size is *drum roll* 167M.
>
> Hmmm ... so that's a lot closer to the original 145M, but it still
> seems like there's something else going on. It looks like the other
> thing we changed that might result in not compressing things was to
> increase the third entry (minimum compression rate) from 20% to 25%.
> Could you try it with that value also changed back?

With it back to 20% it's now back to 145M.
On Fri, Jan 2, 2009 at 8:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Stephen R. van den Berg" <srb@cuci.nl> writes: >> - I currently have difficulty imagining applications that actually do >> lots of substring extractions from large compressible fields. > > The code that's in there to make this happen was written by people who > needed the feature. They're going to be upset with you if you propose > disabling it. Why didn't they just turn off compression for the relevant columns? ...Robert
"Robert Haas" <robertmhaas@gmail.com> writes: > On Fri, Jan 2, 2009 at 8:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Stephen R. van den Berg" <srb@cuci.nl> writes: >>> - I currently have difficulty imagining applications that actually do >>> lots of substring extractions from large compressible fields. >> >> The code that's in there to make this happen was written by people who >> needed the feature. They're going to be upset with you if you propose >> disabling it. > Why didn't they just turn off compression for the relevant columns? They did --- with the pre-8.4 code, they had no choice, because the toast compressor would kick in if it could save even one byte on the total field size. That's clearly silly. We might have gone too far in the other direction with the current settings, but the point is that compression isn't always a good thing. One point that nobody seems to have focused on is whether Alex's less-compressed table is faster or slower to access than the original. I dunno if he has any easy way of investigating that for his typical query mix, but it's certainly a fair question to ask. regards, tom lane
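For what it's worth, the per-column storage choice is visible in pg_attribute, so it's easy to check what a given table is doing today; a quick sketch against the dies table ('x' = extended, 'e' = external, 'm' = main, 'p' = plain):

SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'dies'::regclass
  AND attnum > 0
  AND NOT attisdropped;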
>> Why didn't they just turn off compression for the relevant columns? > > They did --- with the pre-8.4 code, they had no choice, because the > toast compressor would kick in if it could save even one byte on the > total field size. That's clearly silly. We might have gone too far > in the other direction with the current settings, but the point is > that compression isn't always a good thing. I agree with all of that. It seems to me that categorically refusing to compress anything over 1M, as Alex seems to think the current settings are doing, is clearly silly in the opposite direction. What we want to avoid is trying to compress data that's already been compressed - the early-failure path you added seems like the right general idea, though perhaps a bit too simplistic. But the size of the data is not evidence of anything, so I'm unclear why we think that's relevant. It could also lead to some awfully strange behavior if you have, say, a table with highly compressible data whose rows are gradually updated with longer values over time. When they hit 1MB, the storage requirements of the database will suddenly balloon for no reason that will be obvious to the DBA. > One point that nobody seems to have focused on is whether Alex's > less-compressed table is faster or slower to access than the original. > I dunno if he has any easy way of investigating that for his typical > query mix, but it's certainly a fair question to ask. Sure, but that's largely an orthogonal issue. Compression is generally bad for performance, though there are certainly exceptions. What it is good for is saving disk space, and that is why people use it. If that's not why we're using it, then I'm puzzled. ...Robert
Robert Haas wrote:
> Compression is
> generally bad for performance, though there are certainly exceptions.
> What it is good for is saving disk space, and that is why people use
> it.

I don't think disk space is still the primary reason for using compression; shoot... I can buy a terabyte drive for $150 on newegg, and 1.5TB drives are out. I think the primary reason has been slowly moving towards performance.

Fast downloads, compressed web sites, or simply reading/writing less to/from disk are very common use cases for compression; all increase performance. Basically, compression commonly offers faster data delivery. As long as you utilize a fast compressor, like lzo or zlib "NOT level 9", it's a performance win.

-- 
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
On Fri, Jan 2, 2009 at 20:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> One point that nobody seems to have focused on is whether Alex's
> less-compressed table is faster or slower to access than the original.
> I dunno if he has any easy way of investigating that for his typical
> query mix, but it's certainly a fair question to ask.

Other than the quick pgbench numbers I posted upthread, where 8.4 blew 8.3 out of the water with a substring? Not really; this table is mainly insert. A few times a day everything inserted that day gets selected. So while I'm almost positive 8.4 is faster, it's probably not really noticeable in my workload.

That being said, here are some quick numbers (see attached q.sql for how uninteresting the query is; also this is so slow mainly due to the lack of it using an index, it seq-scans the entire table :():

./pgbench -T600 -n -f q.sql
8.4 with 8.3 TOAST: 6.250298
8.4:                6.460312

(note I don't actually use substring on this table...)

./pgbench -T60 -n -f substring.sql
8.4 with 8.3 TOAST: 12.613394
8.4:                6347.456596
On Fri, Jan 2, 2009 at 18:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> It would be fairly easy, I think, to add some reloption fields that
> would let these parameters be controlled on a per-table level.

+1

Or something easier that just lets you use the PGLZ_Strategy strategy_always_data:

ALTER TABLE SET STORAGE ALWAYS_COMPRESS;

Or is that way too stop-gap?

> Per-column would be much more painful; do we really need that?

Obviously not for 8.4 this late in the game.
On Fri, 2009-01-02 at 19:27 -0700, Alex Hunsaker wrote:
> On Fri, Jan 2, 2009 at 18:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "Alex Hunsaker" <badalex@gmail.com> writes:
> >> On Fri, Jan 2, 2009 at 11:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> An easy way to prove or disprove the point would be to go into
> >>> src/backend/utils/adt/pg_lzcompress.c, and change the second entry
> >>> in strategy_default_data from "1024 * 1024" to "INT_MAX",
> >
> >> And the toast file size is *drum roll* 167M.
> >
> > Hmmm ... so that's a lot closer to the original 145M, but it still
> > seems like there's something else going on. It looks like the other
> > thing we changed that might result in not compressing things was to
> > increase the third entry (minimum compression rate) from 20% to 25%.
> > Could you try it with that value also changed back?
>
> With it back to 20% it's now back to 145M.

Perspective on this is that the numbers don't sound too bad if we put a big M behind them, but let's imagine that's a G or even a T. Those numbers look pretty sad then. We must retain the option to compress and, even better, options to control the compression.

Please, please remember that the world genuinely does wish to store multiple terabytes of data, and they won't do it with Postgres unless we act sensibly on this point. Our open source software *enables* massive deployment of database technology. Oracle charge money for their Advanced Compression option, so let's avoid anything that will immediately justify that cost.
-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Tom Lane wrote:
> and I'm still pretty worried about the longevity of any knob we put in
> here. But we might not have a lot of choice.

> It would be fairly easy, I think, to add some reloption fields that
> would let these parameters be controlled on a per-table level.
> Per-column would be much more painful; do we really need that?

+1

Per table sounds fine for now. Per column would be nice, but can be worked around if absolutely necessary by splitting tables.

To avoid having to add another parameter later, I *would* suggest using something like:

ALTER TABLE mytable SET COMPRESSIONLEVEL = 9;

where it can range from 0 (= no compression) to 9 (= maximum compression). The current algorithm could then be as simplistic as kicking in anytime COMPRESSIONLEVEL >= 1 and not compressing when COMPRESSIONLEVEL == 0. More advanced algorithms and decisions can be implemented later.

Obviously the algorithm should ideally use the one-dimensional knob to more or less deliver IO-bound (de)compression at level one, and CPU-bound (de)compression at level nine. This kind of one-dimensional knob is well understood by many compression tools, libraries and users, so it'd make sense to provide something similar to the DBA.
-- 
Sincerely,
Stephen R. van den Berg.
Expect the unexpected!
On Friday 02 January 2009 22:23:13 Stephen R. van den Berg wrote:
> Three things:
> a. Shouldn't it in theory be possible to have a decompression algorithm
>    which is IO-bound because it decompresses faster than the disk can
>    supply the data? (On common current hardware).
> b. Has the current algorithm been carefully benchmarked and/or optimised
>    and/or chosen to fit the IO-bound target as close as possible?
> c. Are there any well-known pitfalls/objections which would prevent me from
>    changing the algorithm to something more efficient (read: IO-bound)?

Copyright licenses and patents.

Which doesn't mean changing anything is impossible, but it is tricky in those nontechnical ways.
On Saturday 03 January 2009 03:36:16 Tom Lane wrote:
> "Stephen R. van den Berg" <srb@cuci.nl> writes:
> > - I currently have difficulty imagining applications that actually do
> > lots of substring extractions from large compressible fields.
>
> The code that's in there to make this happen was written by people who
> needed the feature. They're going to be upset with you if you propose
> disabling it.

I think what he is saying is that it is the less likely use case and should therefore tend not to be the default.

Also note that the code in there was written about 8 years ago, when dealing with "large" data was an entirely different game. People were happy to access more than 8 kB then.

I would in fact imagine that substring operations are more likely to happen with data smaller than 1 MB, and less likely with data larger than 1 MB, instead of the other way around, which is currently implemented. The main sensible way to access text fields larger than 1 MB is with text search, as was pointed out. And large bytea fields are probably media files that are probably already compressed and have no sensible use for substring operations.
Tom Lane wrote:
> It would be fairly easy, I think, to add some reloption fields that
> would let these parameters be controlled on a per-table level.
> Per-column would be much more painful; do we really need that?

Another +1 on the per-table setting. Or a config file setting to disable this for the instance.

We have a 200GB DB that is mostly large text (>1MB) that is not searched with substr. If we see a blowout in size of even 3x, we will not be able to upgrade due to disk space limitations (at least without paying for a lot of disks on mirror servers and hot-standby servers).
On Sat, Jan 3, 2009 at 20:47, Philip Warner <pjw@rhyme.com.au> wrote:
> Tom Lane wrote:
>> It would be fairly easy, I think, to add some reloption fields that
>> would let these parameters be controlled on a per-table level.
>> Per-column would be much more painful; do we really need that?
>
> Another +1 on the per-table setting. Or a config file setting to disable
> this for the instance.
>
> We have a 200GB DB that is mostly large text (>1MB) that is not searched
> with substr. If we see a blowout in size of even 3x, we will not be able
> to upgrade due to disk space limitations (at least without paying for a
> lot of disks on mirror servers and hot-standby servers).

Well, unless your text is extremely redundant, I *really* doubt you will see a large increase, if any. Even if you don't search by substr, fetching the data could be quite a bit faster, depending on how beefy the cpus on the machine are. A quick benchmark here says by as much as 200x! (30tps vs 6000tps). That's just a simple select on a dual 2GHz opteron.

For the record, I just imported a production database that sits at about ~20G right now with *zero* size increase (rounding to the nearest gigabyte). That's with basically the exact same schema, just different data.

I don't suppose you could export some random rows and see if you see any size increase for your data? My gut says you won't see an increase.
Alex Hunsaker wrote: > For the record I just imported a production database that sits at > about ~20G right now with *zero* size increase (rounding to the > nearest gigabyte). That's with basically the exact same schema just > different data. > > Guessing you don't have many plain text rows > 1M. > I don't suppose you could export some random rows and see if you see > any size increase for your data? My gut says you wont see an > increase. > Will see what I can do.
On Sat, Jan 3, 2009 at 21:56, Philip Warner <pjw@rhyme.com.au> wrote:
> Alex Hunsaker wrote:
>> For the record, I just imported a production database that sits at
>> about ~20G right now with *zero* size increase (rounding to the
>> nearest gigabyte). That's with basically the exact same schema, just
>> different data.
>
> Guessing you don't have many plain text rows > 1M.

Probably not.

>> I don't suppose you could export some random rows and see if you see
>> any size increase for your data? My gut says you won't see an
>> increase.
>
> Will see what I can do.

Actually, assuming they don't have any multibyte chars, you should just be able to do something like the below on your existing database.

-- show anything we save a megabyte on
select die_id, pg_size_pretty(savings) from (
  select length(debug) - pg_column_size(debug) as savings, die_id
  from dies
) as foo
where savings > 1024*1024
order by savings desc;
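If multibyte text is a worry, octet_length() counts bytes rather than characters, so a variant along these lines should sidestep that caveat (still assuming a large column named debug, as in the example above):

-- octet_length reports bytes even for multibyte text
select die_id,
       pg_size_pretty(octet_length(debug) - pg_column_size(debug)) as savings
from dies
where octet_length(debug) - pg_column_size(debug) > 1024*1024
order by octet_length(debug) - pg_column_size(debug) desc;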
Peter Eisentraut wrote: >> c. Are there any well-known pitfalls/objections which would prevent me from >> changing the algorithm to something more efficient (read: IO-bound)? >> > > copyright licenses and patents > Would it be possible to have a plugin facility? I guess the most likely candidate is the LZJB mechanism in ZFS which is CDDL licensed. Would that be compatible in contrib, if not in the main source? James
James Mansion wrote:
> Peter Eisentraut wrote:
>>> c. Are there any well-known pitfalls/objections which would prevent me
>>> from changing the algorithm to something more efficient (read: IO-bound)?
>> copyright licenses and patents
> Would it be possible to have a plugin facility?
> I guess the most likely candidate is the LZJB mechanism in ZFS which is
> CDDL licensed.

The most likely candidate for a speedy algorithm seems to be QuickLZ; the author is willing to accommodate the licensing.
-- 
Sincerely,
Stephen R. van den Berg.
Climate is what you expect. Weather is what you get.
On Sat, Jan 3, 2009 at 1:32 AM, Alex Hunsaker <badalex@gmail.com> wrote: > On Fri, Jan 2, 2009 at 18:46, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> It would be fairly easy, I think, to add some reloption fields that >> would let these parameters be controlled on a per-table level. > > +1 > > Or something easier that just lets you use PGLZ_Strategy strategy_always_data. > > ALTER TABLE SET STORAGE ALWAYS_COMPRESS; > > Or is that way to stop gap? > >> Per-column would be much more painful; do we really need that? Personally, I think the whole built-in compressor framework should be deprecated (it's confusing and unintuitive) and replaced with a pluggable compression family in the vein of pg_crypto. This solves a number of issues simultaneously...we no longer have to 'guess' if the user wants a bytea compressed or not, everything is controlled from the client, and we can make it easier for users to use non license compatible libraries (like lzo) in a standard way. merlin
James Mansion wrote: > Peter Eisentraut wrote: >>> c. Are there any well-known pitfalls/objections which would prevent >>> me from >>> changing the algorithm to something more efficient (read: IO-bound)? >>> >> >> copyright licenses and patents >> > Would it be possible to have a plugin facility? Well, before we consider that, we'd probably want to see proof about the effectiveness of other compression methods.
Peter Eisentraut escribió: > James Mansion wrote: >> Peter Eisentraut wrote: >>>> c. Are there any well-known pitfalls/objections which would prevent >>>> me from >>>> changing the algorithm to something more efficient (read: IO-bound)? >>> >>> copyright licenses and patents >>> >> Would it be possible to have a plugin facility? > > Well, before we consider that, we'd probably want to see proof about the > effectiveness of other compression methods. I did some measurements months ago, and it was very clear that libz compression was a lot tighter than the PGLZ code. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Mon, Jan 5, 2009 at 11:45 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Peter Eisentraut escribió: >> James Mansion wrote: >>> Peter Eisentraut wrote: >>>>> c. Are there any well-known pitfalls/objections which would prevent >>>>> me from >>>>> changing the algorithm to something more efficient (read: IO-bound)? >>>> >>>> copyright licenses and patents >>>> >>> Would it be possible to have a plugin facility? >> >> Well, before we consider that, we'd probably want to see proof about the >> effectiveness of other compression methods. > > I did some measurements months ago, and it was very clear that libz > compression was a lot tighter than the PGLZ code. we have seen amazing results with lzo compression...2-3x faster compression times with only 10-15% less compression: There are tons of supporting examples online, for example: http://mail.jabber.org/pipermail/standards/2005-October/008768.html I think, if the database is automatically compressing things (which, IMO, it shouldn't), a low cpu overhead algorithm should be favored. merlin
On Monday 05 January 2009 18:45:49 Alvaro Herrera wrote: > I did some measurements months ago, and it was very clear that libz > compression was a lot tighter than the PGLZ code. Back to the issue at hand. The question at the top of the thread was which of the following behaviors we'd like by default: (1) Compress everything within reason by default, causing slower retrieval, do not offer substr optimization. [<= 8.3] (2) Compress only up to 1 MB, causing faster retrieval, supporting substr optimization. [8.4devel] I am personally completely puzzled by option number 2. Is there even a single use case for that?
Peter Eisentraut wrote: >(1) Compress everything within reason by default, causing slower retrieval, do >not offer substr optimization. [<= 8.3] >(2) Compress only up to 1 MB, causing faster retrieval, supporting substr >optimization. [8.4devel] >I am personally completely puzzled by option number 2. Is there even a single >use case for that? I can't imagine one, and (in this thread at least) noone has demonstrated such; Tom hinted at one, but he didn't elaborate. -- Sincerely, Stephen R. van den Berg. "Very funny, Mr. Scott. Now beam down my clothes!"
On Tue, Jan 6, 2009 at 12:57, Stephen R. van den Berg <srb@cuci.nl> wrote: > Peter Eisentraut wrote: >>(1) Compress everything within reason by default, causing slower retrieval, do >>not offer substr optimization. [<= 8.3] > >>(2) Compress only up to 1 MB, causing faster retrieval, supporting substr >>optimization. [8.4devel] > >>I am personally completely puzzled by option number 2. Is there even a single >>use case for that? > > I can't imagine one, and (in this thread at least) noone has demonstrated > such; Tom hinted at one, but he didn't elaborate. Well that check got removed today anyway see: http://archives.postgresql.org/pgsql-committers/2009-01/msg00069.php
On Fri, Jan 2, 2009 at 5:48 PM, Martijn van Oosterhout <kleptog@svana.org> wrote: > So you compromise. You split the data into say 1MB blobs and compress > each individually. Then if someone does a substring at offset 3MB you > can find it quickly. This barely costs you anything in the compression > ratio mostly. > > Implementation though, that's harder. The size of the blobs is tunable > also. I imagine the optimal value will probably be around 100KB. (12 > blocks uncompressed). Or have the database do that internally: With the available fast compression algorithms (zlib; lzo; lzf; etc) the diminishing return from larger compression block sizes kicks in rather quickly. Other algos like LZMA or BZIP gain more from bigger block sizes, but I expect all of them are too slow to ever consider using in PostgreSQL. So, I expect that the compression loss from compressing in chunks of 64kbytes would be minimal. The database could then include a list of offsets for the 64kbyte chunks at the beginning of the field, or something like that. A short substring would then require decompressing just one or two blocks, far less overhead then decompressing everything. It would probably be worthwhile to graph compression ratio vs block size for some reasonable input. I'd offer to do it; but I doubt I have a reasonable test set for this.