Thread: Performance impact of record sizes
We need to store text data that is typically just a hundred or so bytes, but in some cases may extend to a few thousand. Our current field is a varchar of 1024, which is not large enough. The key data in this same record is fixed size and much smaller.

Our application is primarily transaction oriented, which means that records will normally be fetched via random access, not sequential scans.

The question is: what size thresholds exist? I assume that there is a "page" size over which the record will be split into more than one. What is that size, and does the spill cost any more or less than if I had split the record into two or more individual records to handle the same data?

Obviously, the easiest thing for me to do is just set the varchar to something big (say, 10K), but I don't want to do this without understanding the OLTP performance impact.

Thanks in advance

John Moore
http://www.tinyvital.com/personal.html
UNITED WE STAND
John Moore wrote:
> We have a need to store text data which typically is just a hundred or so
> bytes, but in some cases may extend to a few thousand. Our current field
> has a varchar of 1024, which is not large enough. Key data is fixed sized
> and much smaller in this same record.
>
> Our application is primarily transaction oriented, which means that records
> will normally be fetched via random access, not sequential scans.
>
> The question is: what size thresholds exist? I assume that there is a
> "page" size over which the record will be split into more than one. What is
> that size, and does the spill cost any more or less than I had split the
> record into two or more individual records in order to handle the same data?
>
> Obviously, the easiest thing for me to do is just set the varchar to
> something big (say - 10K) but I don't want to do this without understanding
> the OLTP performance impact.

If you don't want a limit, use TEXT. Long values are automatically stored in TOAST tables to avoid performance problems with sequential scans over long row values.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> If you don't want a limit, use TEXT. Long values are automatically
> stored in TOAST tables to avoid performance problems with sequential
> scans over long row values.

Thanks... I wasn't quite clear enough in my question. I am focused on OLTP performance, and in my case the vast majority of the rows will have only a few hundred bytes in that column, while a few (1%?) may be larger.

I assume that internally there is a fixed page size (by which I mean cache buffer size or disk read size) for normal records. In my case, 99% of the physical rows should be short and would fit easily in whatever that size is. So I *suspect* I want to keep the data in the physical row, rather than using TEXT and having it stored separately from the record. The question is: are there any unexpected consequences? For example, if I have a whole bunch of rows with, say, a 10K varchar field that is only populated with a hundred or two bytes each, will it perform just as well as if that field was defined as a 200-byte field?

A related question: is it more expensive to use varchar than fixed char fields? I assume there is some additional work in physically unpacking the record.

My past experience is with Informix, much of it with very old versions where views were high cost, and so were varchars. Likewise, you didn't want your physical row to exceed the size of a physical page if you could avoid it.

John
John Moore wrote:
> > If you don't want a limit, use TEXT. Long values are automatically
> > stored in TOAST tables to avoid performance problems with sequential
> > scans over long row values.
>
> Thanks...
>
> I wasn't quite clear enough in my question.... I am focused on OLTP
> performance, and in my case the vast majority of the rows will have only a
> few hundred bytes in that column, while a few (1%?) may be larger.
>
> I assume that internally there is a fixed page size (by which I mean cache
> buffer size or disk read size) for normal records. In my case, 99% of the
> physical rows should be short, and would fit easily in whatever that size
> is. So I *suspect* I want to keep the data in the physical row, rather than
> using TEXT and having it stored separately from the record.. The question
> is... are there any unexpected consequences. For example, if I have a whole
> bunch of rows with, say, a 10K varchar field which is only populated with a
> hundred or two bytes each, will it perform just as well as if that field
> was defined as a 200 byte field?
>
> A related question: is it more expensive to use varchar than fixed char
> fields? I assume some additional work in physically unpacking the record.
>
> My past experience is with Informix, and a lot with very old versions where
> views were high cost, and so were varchars. Likewise, you didn't want your
> physical row to exceed the size of a physical page if you could avoid it.

When the row length nears or exceeds the page size (8k), the longer column values are automatically pushed into a TOAST backup table. It is all done transparently, no matter what length you specify for the column, and performance stays good: if you have a 100MB value in a column and do a sequential scan, that 100MB is only accessed if the column containing it is accessed.
Specifically, it isn't the defined length of the column that counts, but the total length of each individual row that determines whether the TOAST backup table is used for _some_ of the column values.

Does that answer your question?
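[Editor's note: as a concrete illustration of the out-of-line storage described above (a sketch; 'mytable' is a placeholder table name), the system catalogs show whether a relation has an associated TOAST table:]

```sql
-- A nonzero reltoastrelid means long values in this table can be
-- pushed out of line into an associated TOAST table; zero means the
-- table has no TOAST table (e.g., all its columns are fixed-width).
SELECT relname, reltoastrelid
FROM pg_class
WHERE relname = 'mytable';
```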
John Moore <postgres@tinyvital.com> writes:
> So I *suspect* I want to keep the data in the physical row, rather than
> using TEXT and having it stored separately from the record.

You seem to be reading something into the TEXT type that's not there; perhaps you are carrying over associations from some other DBMS? Here's how it works in Postgres:

The *only* performance difference between TEXT and VARCHAR(n) is that for a VARCHAR(n) column, on insert or update there's an extra function call applied to the new value to verify that its length is within N. TEXT makes no such check; so TEXT has a small performance advantage if there are lots of updates. On the SELECT side there's no difference.

Both TEXT and VARCHAR(n) data are stored using TOAST (automatic compression or out-of-line storage) if the row gets too big. There is no difference between the two datatypes on this score.

For the particular application you describe I think TOAST storage will be ideal, since the overhead of compression or out-of-line storage is only paid on the few rows where it's needed.

What you really ought to be asking yourself is whether (for this application) a hard limit on the field width makes sense at all. IMHO you should only use VARCHAR(n) if you can make a defensible argument for the particular value of N you are using. If you can't explain why you are setting the field width limit, then you probably do not need a specific limit at all, and should be using TEXT.

> A related question: is it more expensive to use varchar than fixed char
> fields?

There is no savings from using CHAR(n) --- most of the time it's a substantial loss, because of the extra I/O costs associated with the space used by all those padding blanks. I can think of very very few applications where CHAR(n) is really a sensible choice over VARCHAR(n). US state postal codes (CHAR(2)) are an example, but beyond that the valid use cases are mighty thin on the ground.
Bruce, it occurs to me that the "Performance Tips" section of the user's guide ought to have an explanation of the performance implications of TOAST. We have some material in the FAQ but there's precious little mention in the manual. Thoughts?

			regards, tom lane
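[Editor's note: to make the TEXT-vs-VARCHAR(n) point above concrete (a sketch; table and column names are hypothetical), the two declarations below behave identically on SELECT and differ only in the length check applied on write:]

```sql
-- Equivalent on read; both are TOASTed only when the row grows large.
CREATE TABLE note_text (
    note_id integer PRIMARY KEY,
    body    text NOT NULL            -- no arbitrary limit, no length check
);

CREATE TABLE note_varchar (
    note_id integer PRIMARY KEY,
    body    varchar(10240) NOT NULL  -- same storage; extra length check
                                     -- on every insert/update
);
```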
Tom Lane wrote:
> There is no savings from using CHAR(n) --- most of the time it's a
> substantial loss, because of the extra I/O costs associated with the
> space used by all those padding blanks. I can think of very very few
> applications where CHAR(n) is really a sensible choice over VARCHAR(n).
> US state postal codes (CHAR(2)) are an example, but beyond that the
> valid use cases are mighty thin on the ground.
>
> Bruce, it occurs to me that the "Performance Tips" section of the user's
> guide ought to have an explanation of the performance implications of
> TOAST. We have some material in the FAQ but there's precious little
> mention in the manual. Thoughts?

Yes, I am working on my O'Reilly slides now and am dealing with how to address TOAST for admins, specifically for disk space computations, which I did cover in the new 'disk usage' section of the manual.

And in talking to the questioner, I was wondering if we have this auto-out-of-line TOAST description anywhere in the docs. If not, we need to add that.
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> And in talking to the questioner, I was wondering if we have this
> auto-out-of-line TOAST description anywhere in the docs. If not, we
> need to add that.

Yes; I am not sure that that ever got into the main docs :-(. We were so tickled with the notion that you no longer needed to think about oversized fields that we forgot it ought to be documented anyhow.

Jan's old TOAST project page, http://www.ca.postgresql.org/projects/devel-toast.html, has some stuff that could be recycled into the manual, and I think we have done some good writeups in response to older questions in the mailing lists.

			regards, tom lane
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > And in talking to the questioner, I was wondering if we have this
> > auto-out-of-line TOAST description anywhere in the docs. If not, we
> > need to add that.
>
> Yes; I am not sure that that ever got into the main docs :-(.
> We were so tickled with the notion that you no longer needed to
> think about oversized fields, that we forgot that it ought to be
> documented anyhow.
>
> Jan's old TOAST project page,
> http://www.ca.postgresql.org/projects/devel-toast.html
> has some stuff that could be recycled into the manual, and I think
> we have done some good writeups in response to older questions in
> the mailing lists.

Yea, I will keep this email and merge something in.
At 04:49 PM 7/4/2002, Tom Lane wrote:
> John Moore <postgres@tinyvital.com> writes:
> > So I *suspect* I want to keep the data in the physical row, rather than
> > using TEXT and having it stored separately from the record.
>
> You seem to be reading something into the TEXT type that's not there;
> perhaps you are carrying over associations from some other DBMS?
> Here's how it works in Postgres:

Indeed.

> The *only* performance difference between TEXT and VARCHAR(n) is that
> for a VARCHAR(n) column, on insert or update there's an extra function
> call applied to the new value to verify that its length is within N.
> TEXT makes no such check; so TEXT has a small performance advantage if
> there are lots of updates. On the SELECT side there's no difference.
>
> Both TEXT and VARCHAR(n) data is stored using TOAST (automatic
> compression or out-of-line storage) if the row gets too big. There
> is no difference between the two datatypes on this score.
>
> For the particular application you describe I think TOAST storage will
> be ideal, since the overhead of compression or out-of-line storage is
> only paid on the few rows where it's needed.
>
> What you really ought to be asking yourself is whether (for this
> application) a hard limit on the field width makes sense at all.
> IMHO you should only use VARCHAR(n) if you can make a defensible
> argument for the particular value of N you are using. If you can't
> explain why you are setting the field width limit, then you probably
> do not need a specific limit at all, and should be using TEXT.

Yes... now that I understand the performance ramifications, this makes perfect sense. Overall, the way you guys are doing this looks very elegant, and just right for what I am doing.

> > A related question: is it more expensive to use varchar than fixed char
> > fields?
>
> There is no savings from using CHAR(n) --- most of the time it's a
> substantial loss, because of the extra I/O costs associated with the
> space used by all those padding blanks.
> I can think of very very few
> applications where CHAR(n) is really a sensible choice over VARCHAR(n).
> US state postal codes (CHAR(2)) are an example, but beyond that the
> valid use cases are mighty thin on the ground.

My question was indeed a carry-over from a previous database (Informix) and from performance analyses done about 14 years ago!

> Bruce, it occurs to me that the "Performance Tips" section of the user's
> guide ought to have an explanation of the performance implications of
> TOAST. We have some material in the FAQ but there's precious little
> mention in the manual. Thoughts?

I would like that. I did look in "Performance Tips" before asking.

John Moore
On Thu, 4 Jul 2002, Tom Lane wrote:
> I can think of very very few applications where CHAR(n) is really a
> sensible choice over VARCHAR(n).

Text hashes such as MD5 and crypt, stock or serial numbers, automotive VIN codes, invoice sequences, emulated bitmasks, etc. Lots of industry-specific things are non-varying sequences of characters. Besides, you don't know true horror until you've seen an amateur set up an entire database using nothing but text columns in all the tables. Sometimes these types convey meaning about the data they contain, too. ^_^

> US state postal codes (CHAR(2)) are an example, but beyond that the
> valid use cases are mighty thin on the ground.

Don't forget ZIP codes. You can get away with a char(9), or a char(5) and an optional char(4) for the postal extension. ^_^

-- 
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator            |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                 |
| Email: sthomas@townnews.com    AIM  : trifthen                       |
| Web  : www.townnews.com                                              |
|                                                                      |
| "Most of our lives are about proving something, either to            |
|  ourselves or to someone else."                                      |
|                                           -- Anonymous               |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
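[Editor's note: for illustration (a sketch; the table and column names are hypothetical), the truly fixed-width cases named above map naturally onto char(n), while anything that varies belongs in varchar or text:]

```sql
CREATE TABLE shipment (
    vin       char(17),     -- automotive VINs are exactly 17 characters
    state     char(2),      -- US state postal codes are exactly 2 characters
    md5_sum   char(32),     -- an MD5 hash in hex is exactly 32 characters
    zip       varchar(10),  -- ZIP or ZIP+4 with hyphen varies in length
    comments  text          -- unbounded; TOASTed only when needed
);
```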
I see in the data types section for character types this text:

	There are no performance differences between these three types,
	apart from the increased storage size when using the blank-padded
	type.

I can't improve on that.

---------------------------------------------------------------------------

Tom Lane wrote:
> John Moore <postgres@tinyvital.com> writes:
> > So I *suspect* I want to keep the data in the physical row, rather than
> > using TEXT and having it stored separately from the record.
>
> You seem to be reading something into the TEXT type that's not there;
> perhaps you are carrying over associations from some other DBMS?
> Here's how it works in Postgres:
>
> The *only* performance difference between TEXT and VARCHAR(n) is that
> for a VARCHAR(n) column, on insert or update there's an extra function
> call applied to the new value to verify that its length is within N.
> TEXT makes no such check; so TEXT has a small performance advantage if
> there are lots of updates. On the SELECT side there's no difference.
>
> Both TEXT and VARCHAR(n) data is stored using TOAST (automatic
> compression or out-of-line storage) if the row gets too big. There
> is no difference between the two datatypes on this score.
>
> For the particular application you describe I think TOAST storage will
> be ideal, since the overhead of compression or out-of-line storage is
> only paid on the few rows where it's needed.
>
> What you really ought to be asking yourself is whether (for this
> application) a hard limit on the field width makes sense at all.
> IMHO you should only use VARCHAR(n) if you can make a defensible
> argument for the particular value of N you are using. If you can't
> explain why you are setting the field width limit, then you probably
> do not need a specific limit at all, and should be using TEXT.
>
> > A related question: is it more expensive to use varchar than fixed char
> > fields?
>
> There is no savings from using CHAR(n) --- most of the time it's a
> substantial loss, because of the extra I/O costs associated with the
> space used by all those padding blanks. I can think of very very few
> applications where CHAR(n) is really a sensible choice over VARCHAR(n).
> US state postal codes (CHAR(2)) are an example, but beyond that the
> valid use cases are mighty thin on the ground.
>
> Bruce, it occurs to me that the "Performance Tips" section of the user's
> guide ought to have an explanation of the performance implications of
> TOAST. We have some material in the FAQ but there's precious little
> mention in the manual. Thoughts?
>
> 			regards, tom lane
Bruce Momjian wrote:
> I see in the data types section for character types this text:
>
> 	There are no performance differences between these three types,
> 	apart from the increased storage size when using the blank-padded
> 	type.
>
> I can't improve on that.

I added a mention of TOAST in the character data type docs:

	Long values are also stored in background tables so they don't
	interfere with rapid access to the shorter column values.