Thread: Any risk in increasing BLCKSZ to get larger tuples?
Hi - I'm thinking about using postgres for an app that will store various email messages which might (although probably not likely) be larger than the builtin limit for tuples. Is there anything I should be aware of before changing the below value and recompiling? Also, it looks like the TOAST stuff would solve this (right/wrong?), but it's not going to be ready for 7.1 (right/wrong?) Thanks! from src/include/config.h /* * Size of a disk block --- currently, this limits the size of a tuple. * You can set it bigger if you need bigger tuples. */ /* currently must be <= 32k bjm */ #define BLCKSZ 8192
On Wed, Oct 18, 2000 at 02:46:36PM -0700, Philip Hallstrom wrote: > I'm thinking about using postgres for an app that will store > various email messages which might (although probably not likely) be > larger than the builtin limit for tuples. Is there anything I should be > aware of before changing the below value and recompiling? > > Also, it looks like the TOAST stuff would solve this (right/wrong?), but > it's not going to be ready for 7.1 (right/wrong?) Right, and wrong. TOAST will solve this, and it will be ready for 7.1. It's in the current sources, BTW (I'm developing an app which uses it and I haven't had any problems working with Postgres from CVS). I've heard that people sometimes run into problems setting BLCKSZ to 32K - I'd suggest staying under 30K. BTW, although most emails are < 8K, some could be > 1 meg (attachments, deliberate attempts to cause problems). So increasing BLCKSZ isn't the only thing you can do. Perhaps the best solution would be to determine BLCKSZ at runtime (possible?), and then either reject stuff > than that, or store it as a LO. HTH, Neil -- Neil Conway <neilconway@home.com> Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc Encrypted mail welcomed Whoever you are -- SGI, SCO, HP, or even Microsoft -- most of the smart people on the planet work somewhere else. -- Eric S. Raymond
Attachment
hi all, where can I find info about the function to_char TIA --Yohans ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Yohans Mendoza System Analyst yohans@sirius-images.com Sirius Images Inc. http://www.sirius-images.net/users/yohans http://www.sirius-images.com ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Philip Hallstrom <philip@adhesivemedia.com> writes: > larger than the builtin limit for tuples. Is there anything I should be > aware of before changing the below value and recompiling? Only that it will force an initdb. Note the 32k limit, too. A trick you can use in 7.0.* to squeeze out a little more space is to declare your large text fields as "lztext" --- this invokes inline compression, which might get you a factor of 2 or so on typical mail messages. lztext will go away again in 7.1, since TOAST supersedes it, but for now it's a useful thing to know about. > Also, it looks like the TOAST stuff would solve this (right/wrong?), but > it's not going to be ready for 7.1 (right/wrong?) Right, and wrong. It's been done for months... regards, tom lane
On Wed, 18 Oct 2000, Yohans Mendoza wrote: > hi all, > > where can I find info about the function to_char Oh man.. Do you know how much hardly is for non-English person like me write a good documentation? And you can't found it. :-) Where.. in the PostgreSQL docs (for 7.0 or 7.1devel) of course. (See "formatting function"..) Karel
Tom Lane wrote: > > Philip Hallstrom <philip@adhesivemedia.com> writes: > > larger than the builtin limit for tuples. Is there anything I should be > > aware of before changing the below value and recompiling? > > Only that it will force an initdb. Note the 32k limit, too. > > A trick you can use in 7.0.* to squeeze out a little more space is > to declare your large text fields as "lztext" --- this invokes > inline compression, which might get you a factor of 2 or so on typical > mail messages. lztext will go away again in 7.1, since TOAST supersedes > it, Uh, why. Does TOAST do automatic compression? If people need to store huge blocks of text (like a DNA sequence) inline compression isn't just a hack to squeeze bigger text into a tuple. > > > Also, it looks like the TOAST stuff would solve this (right/wrong?), but > > it's not going to be ready for 7.1 (right/wrong?) > > Right, and wrong. It's been done for months... > I've been wondering why we haven't seen 7.1 before now then. I mean why are you waiting on whatever you are waiting on? Why not release 7.1 now and 7.2 in January with all the other features you want to add? -- Joseph Shraibman jks@selectacast.net Increase signal to noise ratio. http://www.targabot.com
On Thu, Oct 19, 2000 at 04:24:54PM -0400, Joseph Shraibman wrote: > Uh, why. Does TOAST do automatic compression? If people need to store > huge blocks of text (like a DNA sequence) inline compression isn't just > a hack to squeeze bigger text into a tuple. Yes, TOAST does do automatic compression. Check the list archives or the info here: http://www.postgresql.org/projects/devel-toast.html It would be nice to be able to tell TOAST not to bother compressing a given column... I remember Tom saying this would be possible, but AFAIK there isn't a convenient user interface to it. > I've been wondering why we haven't seen 7.1 before now then. I mean why > are you waiting on whatever you are waiting on? Why not release 7.1 now > and 7.2 in January with all the other features you want to add? LOL... My guess is it will be released When It's Ready. Some people put the 'lifeblood' of their company on an RDBMS - when released, people expect it to be bugfree. Also, I understand some features still need to be added (is WAL in yet?) - there should be a public beta fairly soon. -- Neil Conway <neilconway@home.com> Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc Encrypted mail welcomed Blaming guns for Columbine is like blaming spoons for Rosie O'Donnell being fat.
Attachment
> > A trick you can use in 7.0.* to squeeze out a little more space is > > to declare your large text fields as "lztext" --- this invokes > > inline compression, which might get you a factor of 2 or so on typical > > mail messages. lztext will go away again in 7.1, since TOAST supersedes > > it, > > Uh, why. Does TOAST do automatic compression? If people need to store > huge blocks of text (like a DNA sequence) inline compression isn't just > a hack to squeeze bigger text into a tuple. I'd guess that it's a speed issue. Decompressing everything in the table for every select sounds like a great waste of CPU power, to me, especially when hard drives and RAM are cheap. Kind of like the idea of "drivespace" on Windows - nice idea, but it slowed things down quite a bit. steve
Joseph Shraibman <jks@selectacast.net> writes: >> A trick you can use in 7.0.* to squeeze out a little more space is >> to declare your large text fields as "lztext" --- this invokes >> inline compression, which might get you a factor of 2 or so on typical >> mail messages. lztext will go away again in 7.1, since TOAST supersedes >> it, > Uh, why. Does TOAST do automatic compression? Yes. > I've been wondering why we haven't seen 7.1 before now then. I mean why > are you waiting on whatever you are waiting on? Why not release 7.1 now > and 7.2 in January with all the other features you want to add? The original plan for 7.1 was "WAL from Vadim, plus whatever the rest of us get done meanwhile". Vadim's taken longer than expected, that's all. (He's had a few distractions, like a new wife...) There was some thought of maybe releasing 7.1 without WAL, but we're pretty much committed now --- the WAL changes are halfway-integrated in CVS, and backing them out would take more effort than it seems worth. regards, tom lane
Tom Lane wrote: > > Joseph Shraibman <jks@selectacast.net> writes: > >> A trick you can use in 7.0.* to squeeze out a little more space is > >> to declare your large text fields as "lztext" --- this invokes > >> inline compression, which might get you a factor of 2 or so on typical > >> mail messages. lztext will go away again in 7.1, since TOAST supersedes > >> it, > > > Uh, why. Does TOAST do automatic compression? > > Yes. > > > I've been wondering why we haven't seen 7.1 before now then. I mean why > > are you waiting on whatever you are waiting on? Why not release 7.1 now > > and 7.2 in January with all the other features you want to add? > > The original plan for 7.1 was "WAL from Vadim, plus whatever the rest of > us get done meanwhile". Vadim's taken longer than expected, that's all. > (He's had a few distractions, like a new wife...) > > There was some thought of maybe releasing 7.1 without WAL, but we're > pretty much committed now --- the WAL changes are halfway-integrated in > CVS, and backing them out would take more effort than it seems worth. > Um, so you can't just leave the code in there but put ifdefs so it doesn't get compiled? Or just don't put in the documentation instructions on how to enable WAL so nobody tries to run it? I'm not complaining, just wondering what goes on behind the scenes. And do you really think that WAL is more important that TOAST? I imagine a good percentage of users bump up against the 8K limit and end up with corrupted data (like I did) but much fewer think that WAL is a critical feature. -- Joseph Shraibman jks@selectacast.net Increase signal to noise ratio. http://www.targabot.com
Steve Wolfe wrote: > > > > A trick you can use in 7.0.* to squeeze out a little more space is > > > to declare your large text fields as "lztext" --- this invokes > > > inline compression, which might get you a factor of 2 or so on typical > > > mail messages. lztext will go away again in 7.1, since TOAST supersedes > > > it, > > > > Uh, why. Does TOAST do automatic compression? If people need to store > > huge blocks of text (like a DNA sequence) inline compression isn't just > > a hack to squeeze bigger text into a tuple. > > I'd guess that it's a speed issue. Decompressing everything in the table > for every select sounds like a great waste of CPU power, to me, especially > when hard drives and RAM are cheap. Kind of like the idea of "drivespace" > on Windows - nice idea, but it slowed things down quite a bit. In some cases yes, in some no. Simple text should compress/decompress quickly and the cpu time wasted is made up for by less hardware access time and smaller db files. If you have a huge database the smaller db files could be critical. -- Joseph Shraibman jks@selectacast.net Increase signal to noise ratio. http://www.targabot.com
> In some cases yes, in some no. Simple text should compress/decompress > quickly and the cpu time wasted is made up for by less hardware access > time and smaller db files. If you have a huge database the smaller db > files could be critical. Hmm... that doesn't seem quite right to me. Whether it is compressed or not, the same amount of final data has to move across the system bus to the CPU for processing. It's the difference of (A) moving a large amount of data to the CPU and processing it, or (B) moving a small amount of data to the CPU, use the CPU cycles to turn it into the large set (as large as in (A)), then processing it. I could be wrong, though. steve
[stuff about why 7.1 isn't out and the 8K limit and TOAST AND WAL snipped] > And do you really think that WAL is more important that TOAST? I > imagine a good percentage of users bump up against the 8K limit and end > up with corrupted data (like I did) but much fewer think that WAL is a > critical feature. If I had to pick I would rather have WAL over TOAST. I originally asked the question about BLCKSZ and this is the first app that I am worried about hitting that limit. It's actually never even crossed my mind before this since usually if it's big it's an image and I just store it on disk with a filename in the database. And I would say for a lot of web uses 8K (or the 32K max) is way more than adequate. WAL on the other hand would be really nice because even if my data is small, it sure would be nice to reproduce it in the case of a monstrous crash. just my 2 cents. -philip
Steve Wolfe wrote: > > > In some cases yes, in some no. Simple text should compress/decompress > > quickly and the cpu time wasted is made up for by less hardware access > > time and smaller db files. If you have a huge database the smaller db > > files could be critical. > > Hmm... that doesn't seem quite right to me. Whether it is compressed or > not, the same amount of final data has to move across the system bus to the > CPU for processing. It's the difference of (A) moving a large amount of > data to the CPU and processing it, or (B) moving a small amount of data to > the CPU, use the CPU cycles to turn it into the large set (as large as in > (A)), then processing it. I could be wrong, though. > It isn't the system bus, its the hardware of the hard disk. In general hardware costs are much bigger than a few cpu cycles (especially as cpu cycles are increasing with Moore's law and hardware access times aren't), but that isn't always the case (like in drivespace in Windows). Recently I was doing performance tuning on my application where I was adding a bunch of users to the system. i was making 6 db calls per user added. I assumed that the cpu costs of what I was doing was the limiting factor, but the cpu usage was only at like %20. Reducing the db calls to 4 meant a big increase in performance, streamlining the code was negligble. That's why I said for some cases automatic compression makes sense, for others it doesn't. -- Joseph Shraibman jks@selectacast.net Increase signal to noise ratio. http://www.targabot.com
Philip Hallstrom wrote: > > [stuff about why 7.1 isn't out and the 8K limit and TOAST AND WAL snipped] > > > And do you really think that WAL is more important that TOAST? I > > imagine a good percentage of users bump up against the 8K limit and end > > up with corrupted data (like I did) but much fewer think that WAL is a > > critical feature. > > If I had to pick I would rather have WAL over TOAST. I originally asked > the question about BLCKSZ and this is the first app that I am worried > about hitting that limit. It's actually never even crossed my mind before > this since usually if it's big it's an image and I just store it on disk > with a filename in the database. That is what I'm doing now, but only because the 8k limit forced me to. It adds kludge to the code, and when I was deleting entries to the database, in order to make sure there weren't orphaned files around (disk space leak) I would have had to read in the table entry, get the filenames, delete the files, then delte the table entry. Since deltes are rare for me I decided not to worry about that until TOAST came along. And I would say for a lot of web uses 8K > (or the 32K max) is way more than adequate. A lot, but for a lot MySQL is adequate. For a lot it isn't. And if 8K isn't adequate there is nothing you can do about it (except recompile of course, but then you still have a limit). > WAL on the other hand would be really nice because even if my data > is small, it sure would be nice to reproduce it in the case of a monstrous > crash. Nice, but not neccessary. That is my point. I would think more people would consider TOAST neccessary than WAL because running into the 8k limit (or 32k limit) will be a showstopper. Granted some people NEED to make sure no data is lost, but since WAL isn't ready yet and TOAST is, why not release 7.1 now and release 7.2 in January with WAL? -- Joseph Shraibman jks@selectacast.net Increase signal to noise ratio. http://www.targabot.com
Joseph Shraibman <jks@selectacast.net> writes: > since WAL isn't ready yet and TOAST is, > why not release 7.1 now and release 7.2 in January with WAL? Unfortunately that's not the scenario we're in. What we've got is beta-quality TOAST, beta-quality outer joins, and a long list of unrelated bugs still to be fixed before 7.1 can go out the door. Even if we decided today that we wanted to make a release without WAL, I don't think we'd be ready to go beta much before the end of this month, and certainly no final release less than a month after that. The reason is that other people have been designing their schedules around WAL --- for example, if we'd been trying to make an Oct 1 beta date, I'd have been doing bug fixes not outer joins last month. Pulling WAL would allow us to move up 7.1 release somewhat, but I don't think two months worth. Furthermore, if we did do it that way then 7.2 wouldn't happen in January. A beta test/release cycle consumes a lot of developer time, which would translate directly to pushing back the 7.2 dates. If we'd foreseen the schedule more accurately back around July, we might've chosen to push out a 7.1 with only TOAST, no WAL or outer joins, but it's too late for that now. regards, tom lane
e40@iboats.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Steve Wolfe wrote: > Hmm... that doesn't seem quite right to me. Whether it is compressed or > not, the same amount of final data has to move across the system bus to the > CPU for processing. Thats correct as far as your disk is bottlenecked by the system bus :-) Elmar