Thread: indexing on char vs varchar
Hi- This is more just trying to understand what is going on under the hood of pgsql. I have read through the archives that there is no difference between index on char, varchar or text. I am wondering why? I understand all the arguments about saving space but I am specifically asking about index performance and wondering about the underworkings of indices based on char and varchar. Othe RDBMS have clear leanings that indexing on chars are a better way to go. In MySQL this is due to a static table characteristics (http://www.mysql.com/doc/en/Static_format.html) and speed for an index look-up (row number X row length). and the ease to read a constant number of records with each disk. In the case of Oracle, the suggestion for char is based on if using varchar2 that takes 5 char and then there is a subsequent update to this field to now take 20 char, but now the record can not grow physically...so they essentially mark the old one as deleted and create a new record at the top (in an entirely new block) but the problem is that the index points to the deleted block...so the index has to query the old block and then the new....(info from: http://groups.google.com/groups?q=oracle+char+vs+varchar+index&hl=en&lr=&ie= UTF-8&oe=UTF-8&selm=3a791aa3%40news.iprimus.com.au&rnum=1) Thanks for explaining this to me.... -Beth
We store all the text/char/varchar types with the length at the front so we don't have such optimizations. We do have "char", in quotes, which is a single character, but that's about it. --------------------------------------------------------------------------- Beth Gatewood wrote: > Hi- > > This is more just trying to understand what is going on under the hood of > pgsql. I have read through the archives that there is no difference between > index on char, varchar or text. I am wondering why? I understand all the > arguments about saving space but I am specifically asking about index > performance and wondering about the underworkings of indices based on char > and varchar. > > Othe RDBMS have clear leanings that indexing on chars are a better way to > go. > > In MySQL this is due to a static table characteristics > (http://www.mysql.com/doc/en/Static_format.html) and speed for an index > look-up (row number X row length). and the ease to read a constant number of > records with each disk. > > In the case of Oracle, the suggestion for char is based on if using > varchar2 that takes 5 char and then there is a subsequent update to this > field to now take 20 char, but now the record can not grow physically...so > they essentially mark the old one as deleted and create a new record at the > top (in an entirely new block) but the problem is that the index points to > the deleted block...so the index has to query the old block and then the > new....(info from: > http://groups.google.com/groups?q=oracle+char+vs+varchar+index&hl=en&lr=&ie= > UTF-8&oe=UTF-8&selm=3a791aa3%40news.iprimus.com.au&rnum=1) > > Thanks for explaining this to me.... > -Beth > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Sorry....I don't understand. The length is at the front of what? -Beth > -----Original Message----- > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] > Sent: Wednesday, October 02, 2002 10:06 AM > To: Beth Gatewood > Cc: pgsql-sql@postgresql.org > Subject: Re: [SQL] indexing on char vs varchar > > > > We store all the text/char/varchar types with the length at > the front so > we don't have such optimizations. We do have "char", in quotes, which > is a single character, but that's about it. > > -------------------------------------------------------------- > ------------- > > Beth Gatewood wrote: > > Hi- > > > > This is more just trying to understand what is going on > under the hood of > > pgsql. I have read through the archives that there is no > difference between > > index on char, varchar or text. I am wondering why? I > understand all the > > arguments about saving space but I am specifically asking > about index > > performance and wondering about the underworkings of > indices based on char > > and varchar. > > > > Othe RDBMS have clear leanings that indexing on chars are a > better way to > > go. > > > > In MySQL this is due to a static table characteristics > > (http://www.mysql.com/doc/en/Static_format.html) and speed > for an index > > look-up (row number X row length). and the ease to read a > constant number of > > records with each disk. > > > > In the case of Oracle, the suggestion for char is based on if using > > varchar2 that takes 5 char and then there is a subsequent > update to this > > field to now take 20 char, but now the record can not grow > physically...so > > they essentially mark the old one as deleted and create a > new record at the > > top (in an entirely new block) but the problem is that the > index points to > > the deleted block...so the index has to query the old block > and then the > > new....(info from: > > > http://groups.google.com/groups?q=oracle+char+vs+varchar+index &hl=en&lr=&ie= > UTF-8&oe=UTF-8&selm=3a791aa3%40news.iprimus.com.au&rnum=1) > > Thanks for explaining this to me.... > -Beth > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Beth, > Sorry....I don't understand. The length is at the front of what? In some RDBMSs, the VARCHAR data type has a 2 or 4-byte indicator of the length of the stored string before the data itself, while CHAR does not require this information because it is fixed-length. This makes the CHAR datatype marginally smaller, and thus faster, than the VARCHAR data type on those databases. This difference goes back to much older databases and computers, where every byte of a row counted in terms of performance. In my experience, even though MS SQL Server still functions this way, the performance difference between CHAR and VARCHAR is not measurable unless you are getting close to the 8K data page limit that MSSQL imposes. YMMV. Postgres does not materially differentiate between CHAR, VARCHAR, and TEXT, except that CHAR is padded by spaces and VARCHAR often has a length limit. However, in terms of storage efficiency (and indexing efficiency), they are identical. In Postgres, the character count is included in all string data types. Thus, you should use the data type most appropriate to the data you are storing, ignoring performance issues. If the data is a fixed-length string (such as a required zip code) use CHAR; if it's variable but limited, use varchar; if it's a long description, use TEXT. -- -Josh BerkusAglio Database SolutionsSan Francisco
Beth, Oh, and you should take this sort of question to the new performance list: pgsql-performance-request@postgresql.org -- -Josh BerkusAglio Database SolutionsSan Francisco
Thanks for the excellent description, Josh. And the reminder of the performance list (it's existence somehow slipped from my mind). -Beth > -----Original Message----- > From: Josh Berkus [mailto:josh@agliodbs.com] > Sent: Wednesday, October 02, 2002 10:56 AM > To: Beth Gatewood; 'Bruce Momjian' > Cc: pgsql-sql@postgresql.org > Subject: Re: [SQL] indexing on char vs varchar > > > Beth, > > Oh, and you should take this sort of question to the new > performance list: > pgsql-performance-request@postgresql.org > > -- > -Josh Berkus > Aglio Database Solutions > San Francisco >
I have updated the FAQ to mention that char()/varchar()/text/bytea have similar performance characteristics. --------------------------------------------------------------------------- Josh Berkus wrote: > Beth, > > > Sorry....I don't understand. The length is at the front of what? > > In some RDBMSs, the VARCHAR data type has a 2 or 4-byte indicator of the > length of the stored string before the data itself, while CHAR does not > require this information because it is fixed-length. This makes the CHAR > datatype marginally smaller, and thus faster, than the VARCHAR data type on > those databases. This difference goes back to much older databases and > computers, where every byte of a row counted in terms of performance. > > In my experience, even though MS SQL Server still functions this way, the > performance difference between CHAR and VARCHAR is not measurable unless you > are getting close to the 8K data page limit that MSSQL imposes. YMMV. > > Postgres does not materially differentiate between CHAR, VARCHAR, and TEXT, > except that CHAR is padded by spaces and VARCHAR often has a length limit. > However, in terms of storage efficiency (and indexing efficiency), they are > identical. In Postgres, the character count is included in all string data > types. > > Thus, you should use the data type most appropriate to the data you are > storing, ignoring performance issues. If the data is a fixed-length string > (such as a required zip code) use CHAR; if it's variable but limited, use > varchar; if it's a long description, use TEXT. > > -- > -Josh Berkus > Aglio Database Solutions > San Francisco > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073