Thread: Any risk in increasing BLCKSZ to get larger tuples?

Any risk in increasing BLCKSZ to get larger tuples?

From: Philip Hallstrom
Hi -
    I'm thinking about using postgres for an app that will store
various email messages which might (although probably not likely) be
larger than the builtin limit for tuples.  Is there anything I should be
aware of before changing the below value and recompiling?

Also, it looks like the TOAST stuff would solve this (right/wrong?), but
it's not going to be ready for 7.1 (right/wrong?)

Thanks!

from src/include/config.h
/*
 * Size of a disk block --- currently, this limits the size of a tuple.
 * You can set it bigger if you need bigger tuples.
 */
/* currently must be <= 32k bjm */
#define BLCKSZ  8192




Re: Any risk in increasing BLCKSZ to get larger tuples?

From: Neil Conway
On Wed, Oct 18, 2000 at 02:46:36PM -0700, Philip Hallstrom wrote:
>     I'm thinking about using postgres for an app that will store
> various email messages which might (although probably not likely) be
> larger than the builtin limit for tuples.  Is there anything I should be
> aware of before changing the below value and recompiling?
>
> Also, it looks like the TOAST stuff would solve this (right/wrong?), but
> it's not going to be ready for 7.1 (right/wrong?)

Right, and wrong. TOAST will solve this, and it will be ready for 7.1. It's
in the current sources, BTW (I'm developing an app which uses it and I
haven't had any problems working with Postgres from CVS).

I've heard that people sometimes run into problems setting BLCKSZ to
32K - I'd suggest staying under 30K.

BTW, although most emails are < 8K, some could be > 1 meg (attachments,
deliberate attempts to cause problems), so increasing BLCKSZ won't cover
every case by itself. Perhaps the best solution would be to determine
BLCKSZ at runtime (possible?), and then either reject anything larger
than that, or store it as a large object (LO).
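
For instance, a rough sketch of the LO approach (the table and column
names here are invented, just for illustration):

    -- keep small bodies in-line; fall back to a large object when oversized
    CREATE TABLE email (
        id      serial PRIMARY KEY,
        subject text,
        body    text,   -- NULL when the message was too big for a tuple
        body_lo oid     -- large object OID used instead of body in that case
    );

From psql, something like \lo_import '/path/to/big-message.eml' should
import the file as a large object; you'd then record the OID it returns
in body_lo.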

HTH,

Neil

--
Neil Conway <neilconway@home.com>
Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc
Encrypted mail welcomed

Whoever you are -- SGI, SCO, HP, or even Microsoft -- most of the
smart people on the planet work somewhere else.
        -- Eric S. Raymond


to_char function

From: Yohans Mendoza
hi all,

where can I find info about the function to_char

TIA

--Yohans

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Yohans Mendoza                    System Analyst
yohans@sirius-images.com            Sirius Images Inc.
http://www.sirius-images.net/users/yohans        http://www.sirius-images.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



Re: Any risk in increasing BLCKSZ to get larger tuples?

From: Tom Lane
Philip Hallstrom <philip@adhesivemedia.com> writes:
> larger than the builtin limit for tuples.  Is there anything I should be
> aware of before changing the below value and recompiling?

Only that it will force an initdb.  Note the 32k limit, too.

A trick you can use in 7.0.* to squeeze out a little more space is
to declare your large text fields as "lztext" --- this invokes
inline compression, which might get you a factor of 2 or so on typical
mail messages.  lztext will go away again in 7.1, since TOAST supersedes
it, but for now it's a useful thing to know about.
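
For example (a sketch only; the table name is invented):

    -- 7.0.x: declare the big field as lztext instead of text
    CREATE TABLE message (
        id   serial PRIMARY KEY,
        body lztext     -- compressed in-line on disk, queried like text
    );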

> Also, it looks like the TOAST stuff would solve this (right/wrong?), but
> it's not going to be ready for 7.1 (right/wrong?)

Right, and wrong.  It's been done for months...

            regards, tom lane

Re: to_char function

From: Karel Zak
On Wed, 18 Oct 2000, Yohans Mendoza wrote:

> hi all,
>
> where can I find info about the function to_char

 Oh man... do you know how hard it is for a non-English speaker like me
to write good documentation? And then you can't even find it. :-)

 Where? In the PostgreSQL docs (for 7.0 or 7.1devel), of course.
(See "Formatting Functions"...)

                    Karel


Re: Any risk in increasing BLCKSZ to get larger tuples?

From: Joseph Shraibman
Tom Lane wrote:
>
> Philip Hallstrom <philip@adhesivemedia.com> writes:
> > larger than the builtin limit for tuples.  Is there anything I should be
> > aware of before changing the below value and recompiling?
>
> Only that it will force an initdb.  Note the 32k limit, too.
>
> A trick you can use in 7.0.* to squeeze out a little more space is
> to declare your large text fields as "lztext" --- this invokes
> inline compression, which might get you a factor of 2 or so on typical
> mail messages.  lztext will go away again in 7.1, since TOAST supersedes
> it,

Uh, why.  Does TOAST do automatic compression?  If people need to store
huge blocks of text (like a DNA sequence) inline compression isn't just
a hack to squeeze bigger text into a tuple.

>
> > Also, it looks like the TOAST stuff would solve this (right/wrong?), but
> > it's not going to be ready for 7.1 (right/wrong?)
>
> Right, and wrong.  It's been done for months...
>

I've been wondering why we haven't seen 7.1 before now then.  I mean why
are you waiting on whatever you are waiting on?  Why not release 7.1 now
and 7.2 in January with all the other features you want to add?


--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com

Re: Any risk in increasing BLCKSZ to get larger tuples?

From: Neil Conway
On Thu, Oct 19, 2000 at 04:24:54PM -0400, Joseph Shraibman wrote:
> Uh, why.  Does TOAST do automatic compression?  If people need to store
> huge blocks of text (like a DNA sequence) inline compression isn't just
> a hack to squeeze bigger text into a tuple.

Yes, TOAST does do automatic compression. Check the list archives or the
info here: http://www.postgresql.org/projects/devel-toast.html

It would be nice to be able to tell TOAST not to bother compressing a
given column... I remember Tom saying this would be possible, but AFAIK
there isn't a convenient user interface to it.
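
If such an interface does show up, I'd guess it would look like a
per-column storage setting, something along these lines (the syntax is
pure speculation on my part; nothing like this is exposed today):

    -- speculative: move a column out-of-line without compressing it
    ALTER TABLE email ALTER COLUMN body SET STORAGE EXTERNAL;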

> I've been wondering why we haven't seen 7.1 before now then.  I mean why
> are you waiting on whatever you are waiting on?  Why not release 7.1 now
> and 7.2 in January with all the other features you want to add?

LOL...

My guess is it will be released When It's Ready. Some people put the
'lifeblood' of their company on an RDBMS - when released, people expect
it to be bugfree. Also, I understand some features still need to be
added (is WAL in yet?) - there should be a public beta fairly soon.

--
Neil Conway <neilconway@home.com>
Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc
Encrypted mail welcomed

Blaming guns for Columbine is like blaming spoons for Rosie O'Donnell
being fat.


Re: Any risk in increasing BLCKSZ to get larger tuples?

From: "Steve Wolfe"
> > A trick you can use in 7.0.* to squeeze out a little more space is
> > to declare your large text fields as "lztext" --- this invokes
> > inline compression, which might get you a factor of 2 or so on typical
> > mail messages.  lztext will go away again in 7.1, since TOAST supersedes
> > it,
>
> Uh, why.  Does TOAST do automatic compression?  If people need to store
> huge blocks of text (like a DNA sequence) inline compression isn't just
> a hack to squeeze bigger text into a tuple.

  I'd guess that it's a speed issue.  Decompressing everything in the table
for every select sounds like a great waste of CPU power, to me, especially
when hard drives and RAM are cheap.  Kind of like the idea of "drivespace"
on Windows - nice idea, but it slowed things down quite a bit.

steve



Re: Any risk in increasing BLCKSZ to get larger tuples?

From: Tom Lane
Joseph Shraibman <jks@selectacast.net> writes:
>> A trick you can use in 7.0.* to squeeze out a little more space is
>> to declare your large text fields as "lztext" --- this invokes
>> inline compression, which might get you a factor of 2 or so on typical
>> mail messages.  lztext will go away again in 7.1, since TOAST supersedes
>> it,

> Uh, why.  Does TOAST do automatic compression?

Yes.

> I've been wondering why we haven't seen 7.1 before now then.  I mean why
> are you waiting on whatever you are waiting on?  Why not release 7.1 now
> and 7.2 in January with all the other features you want to add?

The original plan for 7.1 was "WAL from Vadim, plus whatever the rest of
us get done meanwhile".  Vadim's taken longer than expected, that's all.
(He's had a few distractions, like a new wife...)

There was some thought of maybe releasing 7.1 without WAL, but we're
pretty much committed now --- the WAL changes are halfway-integrated in
CVS, and backing them out would take more effort than it seems worth.

            regards, tom lane

Re: Any risk in increasing BLCKSZ to get larger tuples?

From: Joseph Shraibman
Tom Lane wrote:
>
> Joseph Shraibman <jks@selectacast.net> writes:
> >> A trick you can use in 7.0.* to squeeze out a little more space is
> >> to declare your large text fields as "lztext" --- this invokes
> >> inline compression, which might get you a factor of 2 or so on typical
> >> mail messages.  lztext will go away again in 7.1, since TOAST supersedes
> >> it,
>
> > Uh, why.  Does TOAST do automatic compression?
>
> Yes.
>
> > I've been wondering why we haven't seen 7.1 before now then.  I mean why
> > are you waiting on whatever you are waiting on?  Why not release 7.1 now
> > and 7.2 in January with all the other features you want to add?
>
> The original plan for 7.1 was "WAL from Vadim, plus whatever the rest of
> us get done meanwhile".  Vadim's taken longer than expected, that's all.
> (He's had a few distractions, like a new wife...)
>
> There was some thought of maybe releasing 7.1 without WAL, but we're
> pretty much committed now --- the WAL changes are halfway-integrated in
> CVS, and backing them out would take more effort than it seems worth.
>

Um, so you can't just leave the code in there but wrap it in ifdefs so it
doesn't get compiled?  Or just leave the instructions for enabling WAL out
of the documentation so nobody tries to run it?

I'm not complaining, just wondering what goes on behind the scenes.

And do you really think that WAL is more important than TOAST?  I
imagine a good percentage of users bump up against the 8K limit and end
up with corrupted data (like I did), but far fewer consider WAL a
critical feature.

--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com

Re: Any risk in increasing BLCKSZ to get larger tuples?

From: Joseph Shraibman
Steve Wolfe wrote:
>
> > > A trick you can use in 7.0.* to squeeze out a little more space is
> > > to declare your large text fields as "lztext" --- this invokes
> > > inline compression, which might get you a factor of 2 or so on typical
> > > mail messages.  lztext will go away again in 7.1, since TOAST supersedes
> > > it,
> >
> > Uh, why.  Does TOAST do automatic compression?  If people need to store
> > huge blocks of text (like a DNA sequence) inline compression isn't just
> > a hack to squeeze bigger text into a tuple.
>
>   I'd guess that it's a speed issue.  Decompressing everything in the table
> for every select sounds like a great waste of CPU power, to me, especially
> when hard drives and RAM are cheap.  Kind of like the idea of "drivespace"
> on Windows - nice idea, but it slowed things down quite a bit.

In some cases yes, in some no.  Simple text should compress/decompress
quickly, and the CPU time spent is made up for by less disk access time
and smaller db files.  If you have a huge database, the smaller db
files could be critical.


--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com

Re: Any risk in increasing BLCKSZ to get larger tuples?

From: "Steve Wolfe"
> In some cases yes, in some no.  Simple text should compress/decompress
> quickly, and the CPU time spent is made up for by less disk access time
> and smaller db files.  If you have a huge database, the smaller db
> files could be critical.

  Hmm... that doesn't seem quite right to me.  Whether it is compressed or
not, the same amount of final data has to move across the system bus to the
CPU for processing.  It's the difference between (A) moving a large amount
of data to the CPU and processing it, and (B) moving a small amount of data
to the CPU, using CPU cycles to expand it back to the full set (as large as
in (A)), and then processing it.  I could be wrong, though.

steve



Re: Any risk in increasing BLCKSZ to get larger tuples?

From: Philip Hallstrom
[stuff about why 7.1 isn't out, the 8K limit, TOAST, and WAL snipped]

> And do you really think that WAL is more important than TOAST?  I
> imagine a good percentage of users bump up against the 8K limit and end
> up with corrupted data (like I did), but far fewer consider WAL a
> critical feature.

If I had to pick, I would rather have WAL than TOAST.  I originally asked
the question about BLCKSZ, and this is the first app where I am worried
about hitting that limit.  It had actually never even crossed my mind
before this, since usually anything big is an image, and I just store it
on disk with a filename in the database.  And I would say that for a lot
of web uses, 8K (or the 32K max) is more than adequate.
    WAL, on the other hand, would be really nice: even if my data is
small, it sure would be nice to recover it after a monstrous crash.

just my 2 cents.


-philip


Re: Any risk in increasing BLCKSZ to get larger tuples?

From: Joseph Shraibman
Steve Wolfe wrote:
>
> > In some cases yes, in some no.  Simple text should compress/decompress
> > quickly, and the CPU time spent is made up for by less disk access time
> > and smaller db files.  If you have a huge database, the smaller db
> > files could be critical.
>
>   Hmm... that doesn't seem quite right to me.  Whether it is compressed or
> not, the same amount of final data has to move across the system bus to the
> CPU for processing.  It's the difference between (A) moving a large amount
> of data to the CPU and processing it, and (B) moving a small amount of data
> to the CPU, using CPU cycles to expand it back to the full set (as large as
> in (A)), and then processing it.  I could be wrong, though.
>

It isn't the system bus, it's the hard disk itself.  In general, disk
access costs are much bigger than a few CPU cycles (especially since CPU
speeds are increasing with Moore's law and disk access times aren't), but
that isn't always the case (as with "drivespace" on Windows).

Recently I was doing performance tuning on my application while adding a
bunch of users to the system.  I was making 6 db calls per user added.  I
assumed that the CPU cost of what I was doing was the limiting factor, but
CPU usage was only at about 20%.  Reducing the db calls to 4 meant a big
increase in performance, while streamlining the code was negligible.

That's why I said for some cases automatic compression makes sense, for
others it doesn't.

--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com

Re: Any risk in increasing BLCKSZ to get larger tuples?

From: Joseph Shraibman
Philip Hallstrom wrote:
>
> [stuff about why 7.1 isn't out, the 8K limit, TOAST, and WAL snipped]
>
> > And do you really think that WAL is more important than TOAST?  I
> > imagine a good percentage of users bump up against the 8K limit and end
> > up with corrupted data (like I did), but far fewer consider WAL a
> > critical feature.
>
> If I had to pick, I would rather have WAL than TOAST.  I originally asked
> the question about BLCKSZ, and this is the first app where I am worried
> about hitting that limit.  It had actually never even crossed my mind
> before this, since usually anything big is an image, and I just store it
> on disk with a filename in the database.

That is what I'm doing now, but only because the 8K limit forced me to.
It adds kludge to the code: when deleting entries from the database, in
order to make sure there were no orphaned files left around (a disk
space leak), I would have had to read in the table entry, get the
filenames, delete the files, and then delete the table entry.  Since
deletes are rare for me, I decided not to worry about that until TOAST
came along.
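
Roughly, the dance looks like this (table name invented; note the file
deletion itself has to happen outside the database, which is exactly
the problem):

    -- collect the paths before the row disappears...
    BEGIN;
    SELECT filename FROM message_files WHERE msg_id = 42;
    DELETE FROM message_files WHERE msg_id = 42;
    COMMIT;
    -- ...then unlink those files from application code, accepting that
    -- a crash between COMMIT and the unlink leaks files on disk.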

> And I would say that for a lot of web uses, 8K (or the 32K max) is more
> than adequate.

For a lot of uses, sure, but then for a lot of uses MySQL is adequate,
and for a lot it isn't.  And if 8K isn't adequate, there is nothing you
can do about it (except recompile, of course, but then you still have a
limit).

> WAL, on the other hand, would be really nice: even if my data is
> small, it sure would be nice to recover it after a monstrous crash.

Nice, but not necessary.  That is my point.  I would think more people
would consider TOAST necessary than WAL, because running into the 8K
limit (or 32K limit) is a showstopper.  Granted, some people NEED to
make sure no data is lost, but since WAL isn't ready yet and TOAST is,
why not release 7.1 now and release 7.2 in January with WAL?


--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com

Re: Any risk in increasing BLCKSZ to get larger tuples?

From: Tom Lane
Joseph Shraibman <jks@selectacast.net> writes:
> since WAL isn't ready yet and TOAST is,
> why not release 7.1 now and release 7.2 in January with WAL?

Unfortunately that's not the scenario we're in.  What we've got is
beta-quality TOAST, beta-quality outer joins, and a long list of
unrelated bugs still to be fixed before 7.1 can go out the door.
Even if we decided today that we wanted to make a release without WAL,
I don't think we'd be ready to go beta much before the end of this
month, and certainly no final release less than a month after that.
The reason is that other people have been designing their schedules
around WAL --- for example, if we'd been trying to make an Oct 1 beta
date, I'd have been doing bug fixes not outer joins last month.

Pulling WAL would allow us to move the 7.1 release up somewhat, but I
don't think by two months' worth.

Furthermore, if we did do it that way then 7.2 wouldn't happen in
January.  A beta test/release cycle consumes a lot of developer time,
which would translate directly to pushing back the 7.2 dates.

If we'd foreseen the schedule more accurately back around July, we
might've chosen to push out a 7.1 with only TOAST, no WAL or outer
joins, but it's too late for that now.

            regards, tom lane

Re: Any risk in increasing BLCKSZ to get larger tuples?

From: Elmar Haneke

Steve Wolfe wrote:

>   Hmm... that doesn't seem quite right to me.  Whether it is compressed or
> not, the same amount of final data has to move across the system bus to the
> CPU for processing.

That's correct as long as your disk is bottlenecked by the system bus
:-)

Elmar