Thread: Speed & Memory Management
Hello all. I am designing a database that will could potentially grow to have millions of tuples associated with it. My thought is that because I will likely have many updates/inserts/deletes occurring I should set each column that requires text to a static ammount (ie using char(30) instead of text). However, this looks to be a double edged sword, since selecting from the db is my largest concern (with updates a very close second). When I pull these values back out of the db, it seems I am going to have to trim each returned value (when I pull a ten character string out of this field, it returns the 10 char string as well as 20 whitespaces.). I am trying to assign a weight to each of these scenarios and figure out which is the lesser of the two evils. If anyone has a suggestion of how I may circumvent this issue or which possibility may work best, it would be greatly appreciated. Thanks Kris
How about using varchar(30)? GB
Unless your application requires a 30 character limit at the logical level, use text. Robert Treat On Mon, 2003-03-31 at 17:08, Kris wrote: > Hello all. I am designing a database that will could potentially grow to > have millions of tuples associated with it. My thought is that because I > will likely have many updates/inserts/deletes occurring I should set each > column that requires text to a static ammount (ie using char(30) instead of > text). However, this looks to be a double edged sword, since selecting from > the db is my largest concern (with updates a very close second). When I > pull these values back out of the db, it seems I am going to have to trim > each returned value (when I pull a ten character string out of this field, > it returns the 10 char string as well as 20 whitespaces.). I am trying to > assign a weight to each of these scenarios and figure out which is the > lesser of the two evils. If anyone has a suggestion of how I may circumvent > this issue or which possibility may work best, it would be greatly > appreciated. Thanks >
Robert Treat <xzilla@users.sourceforge.net> writes: > Unless your application requires a 30 character limit at the logical > level, use text. And if it does, use varchar(30). I will bet a very good lunch that char(30) will be a complete dead loss on *every* measure: speed, disk space, and convenience. char(N) is not really fixed-width in Postgres, because N is measured in characters not bytes (which are not the same thing if you use a multibyte character encoding). Therefore, there are no optimizations that could allow it to outperform varchar(N). When you consider the extra cost of performing the padding step, the extra storage and I/O incurred for all those pad blanks, and the client-side headaches of having to trim the unwanted blanks again, it's just guaranteed to be a loser. The only case in which I could recommend char(N) is where you have application semantics that constrain a field to exactly N characters (postal codes are one common example). If the semantics are "at most N characters", use varchar(N). If you are picking N out of the air, don't bother: use text. regards, tom lane
Kris writes: > My thought is that because I will likely have many > updates/inserts/deletes occurring I should set each column that requires > text to a static ammount (ie using char(30) instead of text). That would buy you absolutely nothing. char() is not faster than text. Instead you would make everything slower because the system would constantly have to pad and trim your values and it would bloat the storage with the extra spaces. -- Peter Eisentraut peter_e@gmx.net
This, and the message from Tom Lane, seem inconsistent with the documentation re "Character Types" (http://developer.postgresql.org/docs/postgres/datatype-character.html) which says: "Tip: There are no performance differences between these three types, apart from the increased storage size when using the blank-padded type." -----Original Message----- From: Peter Eisentraut [mailto:peter_e@gmx.net] Sent: Tuesday, April 01, 2003 16:42 To: Kris Cc: pgsql-admin@postgresql.org Subject: Re: [ADMIN] Speed & Memory Management Kris writes: > My thought is that because I will likely have many > updates/inserts/deletes occurring I should set each column that requires > text to a static ammount (ie using char(30) instead of text). That would buy you absolutely nothing. char() is not faster than text. Instead you would make everything slower because the system would constantly have to pad and trim your values and it would bloat the storage with the extra spaces. -- Peter Eisentraut peter_e@gmx.net ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
> That would buy you absolutely nothing. char() is not faster than text. > Instead you would make everything slower because the system would > constantly have to pad and trim your values and it would bloat the storage > with the extra spaces. First, sorry about my english. Well ... but ... reading database theory books, you can see that fixed size records are "better" than variant size records. When the records are size fixed, inserts, updates and deletes can recalculate easier the position of the rest records in the pages. These books tell you that in general cases, variant record size tables (records that have variant type colums, for example) imply a lower performance in the system ... Why in postgresql these things don't matter ? Thanks.
Juan Miguel writes: > Well ... but ... reading database theory books, you can see that fixed size > records are "better" than variant size records. Theory and practice are only the same in theory. In practice there is a difference. Anyway, char(30) means 30 characters, not 30 bytes. So it's not a fixed-size record anyway. Other factors are out-of-line storage and automatic compression of long values. -- Peter Eisentraut peter_e@gmx.net
Murthy Kambhampaty writes: > This, and the message from Tom Lane, seem inconsistent with the > documentation re "Character Types" > (http://developer.postgresql.org/docs/postgres/datatype-character.html) > which says: > > "Tip: There are no performance differences between these three types, apart > from the increased storage size when using the blank-padded type." What are you saying? This is exactly true. Of course the padding might also take time, not only space, but we don't want to get too picky here. -- Peter Eisentraut peter_e@gmx.net
Juan Miguel wrote:
Because unlike other DBs, PostgreSQL doesn't actually remove anything when you do an update or delete, it marks the old record as "dead" and (for an update) adds a new instance of the record at the end. That's one reason why you want to vacuum tables after some activity, to remove the dead records fro all updates/deletes.That would buy you absolutely nothing. char() is not faster than text. Instead you would make everything slower because the system would constantly have to pad and trim your values and it would bloat the storage with the extra spaces.First, sorry about my english. Well ... but ... reading database theory books, you can see that fixed size records are "better" than variant size records. When the records are size fixed, inserts, updates and deletes can recalculate easier the position of the rest records in the pages. These books tell you that in general cases, variant record size tables (records that have variant type colums, for example) imply a lower performance in the system ... Why in postgresql these things don't matter ?
Thanks. ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
In response to the speed and memory management messages posted. Can someone tell me if the varchar(N) data type also impedes performance by padding with spaces?
Thank you,
Jodi
Thank you,
Jodi
--
_______________________________
Jodi L Kanter
BioInformatics Database Administrator
University of Virginia
(434) 924-2846
jkanter@virginia.edu
Under what circumstances does a character not take up a set number of bytes? For example, i was under the impression that utf-8 allocated 6 bytes per character. While the character may not occupy all six bytes, each byte would still be reserved for that character set. Kris >Juan Miguel writes: > > > >>Well ... but ... reading database theory books, you can see that fixed size >>records are "better" than variant size records. >> >> > >Theory and practice are only the same in theory. In practice there is a >difference. > >Anyway, char(30) means 30 characters, not 30 bytes. So it's not a >fixed-size record anyway. Other factors are out-of-line storage and >automatic compression of long values. > > >
On Thu, 2003-04-03 at 13:22, Jodi Kanter wrote: > In response to the speed and memory management messages posted. Can > someone tell me if the varchar(N) data type also impedes performance by > padding with spaces? > Thank you, > Jodi > Well no since varchar doesn't pad spaces. Theoretically this does make it an improvement over char, although by this same theory it does have to do length checks on the input strings, which would slow things down versus text data types *in theory*. Don't forget also that a varchar(30) will _prevent_ strings longer than 30 chars, not truncate them. Robert Treat
Kris Kiger writes: > Under what circumstances does a character not take up a set number of > bytes? For example, i was under the impression that utf-8 allocated 6 > bytes per character. Nope, UTF-8 is variable length. And besides, UTF-8 is not the only supported encoding. -- Peter Eisentraut peter_e@gmx.net
I added a column to a table of type boolean and then set its default to false. Next, I ran an update on the table that was very similar to this one: update tablename SET bool_field_name = false; This error is returned: ERROR: unrecognized replace flag: 88 Has anyone seen this before? Thanks for the input Kris
Kris Kiger <kris@musicrebellion.com> writes: > I added a column to a table of type boolean and then set its default to > false. Next, I ran an update on the table that was very similar to this > one: > update tablename SET bool_field_name = false; > This error is returned: > ERROR: unrecognized replace flag: 88 This sounds to me like you've uncovered a bug. What PG version is this? Can you reproduce the error starting from an empty database, and if so how? regards, tom lane