Thread: seeking advise on char vs text or varchar in search table

From:
chrisj
Date:

I have a table that contains a column for keywords that I expect to become
quite large and will be used for web searches.  I will either index the
column or come up with a simple hashing algorithm add the hash key to the
table and index that column.

I am thinking the max length in the keyword column I need to support is 30,
but the average would be less than10

Any suggestions on whether to use char(30), varchar(30) or text, would be
appreciated.  I am looking for the best performance option, not necessarily
the most economical on disk.

Or any other suggestions would be greatly appreciated.
--
View this message in context:
http://www.nabble.com/seeking-advise-on-char-vs-text-or-varchar-in-search-table-tf3618204.html#a10103002
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


From:
"Merlin Moncure"
Date:

On 4/20/07, chrisj <> wrote:
>
> I have a table that contains a column for keywords that I expect to become
> quite large and will be used for web searches.  I will either index the
> column or come up with a simple hashing algorithm add the hash key to the
> table and index that column.
>
> I am thinking the max length in the keyword column I need to support is 30,
> but the average would be less than10
>
> Any suggestions on whether to use char(30), varchar(30) or text, would be
> appreciated.  I am looking for the best performance option, not necessarily
> the most economical on disk.

Don't use char...it pads out the string to the length always.   It
also has no real advantage over varchar in any practical situation.
Think of varchar as text with a maximum length...its no faster or
slower but the database will throw out entries based on length (which
can be good or a bad thing)...in this case, text feels better.

Have you looked at tsearch2, gist, etc?

merlin

From:
Jim Nasby
Date:

On Apr 23, 2007, at 7:16 AM, Merlin Moncure wrote:
> On 4/20/07, chrisj <> wrote:
>>
>> I have a table that contains a column for keywords that I expect
>> to become
>> quite large and will be used for web searches.  I will either
>> index the
>> column or come up with a simple hashing algorithm add the hash key
>> to the
>> table and index that column.
>>
>> I am thinking the max length in the keyword column I need to
>> support is 30,
>> but the average would be less than10
>>
>> Any suggestions on whether to use char(30), varchar(30) or text,
>> would be
>> appreciated.  I am looking for the best performance option, not
>> necessarily
>> the most economical on disk.
>
> Don't use char...it pads out the string to the length always.   It
> also has no real advantage over varchar in any practical situation.
> Think of varchar as text with a maximum length...its no faster or
> slower but the database will throw out entries based on length (which
> can be good or a bad thing)...in this case, text feels better.

AIUI, char, varchar and text all store their data in *exactly* the
same way in the database; char only pads data on output, and in the
actual tables it still contains the regular varlena header. The only
reason I've ever used char in other databases is to save the overhead
of the variable-length information, so I recommend to people to just
steer clear of char in PostgreSQL.
--
Jim Nasby                                            
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



From:
Gregory Stark
Date:

"Jim Nasby" <> writes:

> AIUI, char, varchar and text all store their data in *exactly* the same way in
> the database; char only pads data on output, and in the  actual tables it still
> contains the regular varlena header. The only  reason I've ever used char in
> other databases is to save the overhead  of the variable-length information, so
> I recommend to people to just  steer clear of char in PostgreSQL.

Everything you said is correct except that char actually pads its data on
input, not output. This doesn't actually make a lot of sense since we're
storing it as a varlena so we could pad it on output and modify the data type
functions to pretend the spaces are there without storing them.

However it would only make a difference if you're storing variable length data
in a char field in which case I would 100% agree with your conclusion and
strongly recommend using varchar. The only reason I would think of using char
is when the data should always be the same length, like a SSN or md5hash or
something like that. In which case it's purely for the self-documenting
notational convenience, not any performance reason.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com