Thread: indexing on char vs varchar

indexing on char vs varchar

From

"Beth Gatewood"

Date:

02 October 2002, 12:39:10

Hi-

This is more just trying to understand what is going on under the hood of
pgsql.  I have read through the archives that there is no difference between
index on char, varchar or text.  I am wondering why?  I understand all the
arguments about saving space but I am specifically asking about index
performance and wondering about the underworkings of indices based on char
and varchar.

Othe RDBMS have clear leanings that indexing on chars are a better way to
go.

In MySQL this is due to a static table characteristics
(http://www.mysql.com/doc/en/Static_format.html) and speed for an index
look-up (row number X row length). and the ease to read a constant number of
records with each disk.
In the case of Oracle, the suggestion for char is based on if using
varchar2 that takes 5 char and then there is a subsequent update to this
field to now take 20 char, but now the record can not grow physically...so
they essentially mark the old one as deleted and create a new record at the
top (in an entirely new block) but the problem is that the index points to
the deleted block...so the index has to query the old block and then the
new....(info from:
http://groups.google.com/groups?q=oracle+char+vs+varchar+index&hl=en&lr=&ie=
UTF-8&oe=UTF-8&selm=3a791aa3%40news.iprimus.com.au&rnum=1)

Thanks for explaining this to me....
-Beth

Re: indexing on char vs varchar

From

Bruce Momjian

Date:

02 October 2002, 13:06:39

We store all the text/char/varchar types with the length at the front so
we don't have such optimizations.  We do have "char", in quotes, which
is a single character, but that's about it.

---------------------------------------------------------------------------

Beth Gatewood wrote:
> Hi-
> 
> This is more just trying to understand what is going on under the hood of
> pgsql.  I have read through the archives that there is no difference between
> index on char, varchar or text.  I am wondering why?  I understand all the
> arguments about saving space but I am specifically asking about index
> performance and wondering about the underworkings of indices based on char
> and varchar.
> 
> Othe RDBMS have clear leanings that indexing on chars are a better way to
> go.
> 
> In MySQL this is due to a static table characteristics
> (http://www.mysql.com/doc/en/Static_format.html) and speed for an index
> look-up (row number X row length). and the ease to read a constant number of
> records with each disk.
> 
>  In the case of Oracle, the suggestion for char is based on if using
> varchar2 that takes 5 char and then there is a subsequent update to this
> field to now take 20 char, but now the record can not grow physically...so
> they essentially mark the old one as deleted and create a new record at the
> top (in an entirely new block) but the problem is that the index points to
> the deleted block...so the index has to query the old block and then the
> new....(info from:
> http://groups.google.com/groups?q=oracle+char+vs+varchar+index&hl=en&lr=&ie=
> UTF-8&oe=UTF-8&selm=3a791aa3%40news.iprimus.com.au&rnum=1)
> 
> Thanks for explaining this to me....
> -Beth
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: indexing on char vs varchar

From

"Beth Gatewood"

Date:

02 October 2002, 13:11:49

Sorry....I don't understand.  The length is at the front of what?

-Beth

> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> Sent: Wednesday, October 02, 2002 10:06 AM
> To: Beth Gatewood
> Cc: pgsql-sql@postgresql.org
> Subject: Re: [SQL] indexing on char vs varchar
>
>
>
> We store all the text/char/varchar types with the length at
> the front so
> we don't have such optimizations.  We do have "char", in quotes, which
> is a single character, but that's about it.
>
> --------------------------------------------------------------
> -------------
>
> Beth Gatewood wrote:
> > Hi-
> >
> > This is more just trying to understand what is going on
> under the hood of
> > pgsql.  I have read through the archives that there is no
> difference between
> > index on char, varchar or text.  I am wondering why?  I
> understand all the
> > arguments about saving space but I am specifically asking
> about index
> > performance and wondering about the underworkings of
> indices based on char
> > and varchar.
> >
> > Othe RDBMS have clear leanings that indexing on chars are a
> better way to
> > go.
> >
> > In MySQL this is due to a static table characteristics
> > (http://www.mysql.com/doc/en/Static_format.html) and speed
> for an index
> > look-up (row number X row length). and the ease to read a
> constant number of
> > records with each disk.
> >
> >  In the case of Oracle, the suggestion for char is based on if using
> > varchar2 that takes 5 char and then there is a subsequent
> update to this
> > field to now take 20 char, but now the record can not grow
> physically...so
> > they essentially mark the old one as deleted and create a
> new record at the
> > top (in an entirely new block) but the problem is that the
> index points to
> > the deleted block...so the index has to query the old block
> and then the
> > new....(info from:
> >
> http://groups.google.com/groups?q=oracle+char+vs+varchar+index
&hl=en&lr=&ie=
> UTF-8&oe=UTF-8&selm=3a791aa3%40news.iprimus.com.au&rnum=1)
>
> Thanks for explaining this to me....
> -Beth
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

-- Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: indexing on char vs varchar

From

Josh Berkus

Date:

02 October 2002, 13:53:22

Beth,

> Sorry....I don't understand.  The length is at the front of what?

In some RDBMSs, the VARCHAR data type has a 2 or 4-byte indicator of the
length of the stored string before the data itself, while CHAR does not
require this information because it is fixed-length.  This makes the CHAR
datatype marginally smaller, and thus faster, than the VARCHAR data type on
those databases.   This difference goes back to much older databases and
computers, where every byte of a row counted in terms of performance.

In my experience, even though MS SQL Server still functions this way, the
performance difference between CHAR and VARCHAR is not measurable unless you
are getting close to the 8K data page limit that MSSQL imposes.  YMMV.

Postgres does not materially differentiate between CHAR, VARCHAR, and TEXT,
except that CHAR is padded by spaces and VARCHAR often has a length limit.
However, in terms of storage efficiency (and indexing efficiency), they are
identical.  In Postgres, the character count is included in all string data
types.

Thus, you should use the data type most appropriate to the data you are
storing, ignoring performance issues.  If the data is a fixed-length string
(such as a required zip code) use CHAR; if it's variable but limited, use
varchar; if it's a long description, use TEXT.

--
-Josh BerkusAglio Database SolutionsSan Francisco

Re: indexing on char vs varchar

From

Josh Berkus

Date:

02 October 2002, 13:54:18

Beth,

Oh, and you should take this sort of question to the new performance list:
pgsql-performance-request@postgresql.org

--
-Josh BerkusAglio Database SolutionsSan Francisco

Re: indexing on char vs varchar

From

"Beth Gatewood"

Date:

02 October 2002, 14:01:37

Thanks for the excellent description, Josh.  And the reminder of the
performance list (it's existence somehow slipped from my mind).

-Beth

> -----Original Message-----
> From: Josh Berkus [mailto:josh@agliodbs.com]
> Sent: Wednesday, October 02, 2002 10:56 AM
> To: Beth Gatewood; 'Bruce Momjian'
> Cc: pgsql-sql@postgresql.org
> Subject: Re: [SQL] indexing on char vs varchar
>
>
> Beth,
>
> Oh, and you should take this sort of question to the new
> performance list:
> pgsql-performance-request@postgresql.org
>
> --
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
>

Re: indexing on char vs varchar

From

Bruce Momjian

Date:

19 October 2002, 23:00:08

I have updated the FAQ to mention that char()/varchar()/text/bytea have
similar performance characteristics.

---------------------------------------------------------------------------

Josh Berkus wrote:
> Beth,
> 
> > Sorry....I don't understand.  The length is at the front of what?
> 
> In some RDBMSs, the VARCHAR data type has a 2 or 4-byte indicator of the 
> length of the stored string before the data itself, while CHAR does not 
> require this information because it is fixed-length.  This makes the CHAR 
> datatype marginally smaller, and thus faster, than the VARCHAR data type on 
> those databases.   This difference goes back to much older databases and 
> computers, where every byte of a row counted in terms of performance.
> 
> In my experience, even though MS SQL Server still functions this way, the 
> performance difference between CHAR and VARCHAR is not measurable unless you 
> are getting close to the 8K data page limit that MSSQL imposes.  YMMV.
> 
> Postgres does not materially differentiate between CHAR, VARCHAR, and TEXT, 
> except that CHAR is padded by spaces and VARCHAR often has a length limit.   
> However, in terms of storage efficiency (and indexing efficiency), they are 
> identical.  In Postgres, the character count is included in all string data 
> types.
> 
> Thus, you should use the data type most appropriate to the data you are 
> storing, ignoring performance issues.  If the data is a fixed-length string 
> (such as a required zip code) use CHAR; if it's variable but limited, use 
> varchar; if it's a long description, use TEXT.
> 
> -- 
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
> 
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073