Thread: Speed & Memory Management

Speed & Memory Management

From
"Kris"
Date:
Hello all.  I am designing a database that will could potentially grow to
have millions of tuples associated with it.  My thought is that because I
will likely have many updates/inserts/deletes occurring I should set each
column that requires text to a static ammount (ie using char(30) instead of
text).  However, this looks to be a double edged sword, since selecting from
the db is my largest concern (with updates a very close second).  When I
pull these values back out of the db, it seems I am going to have to trim
each returned value (when I pull a ten character string out of this field,
it returns the 10 char string as well as 20 whitespaces.).  I am trying to
assign a weight to each of these scenarios and figure out which is the
lesser of the two evils.  If anyone has a suggestion of how I may circumvent
this issue or which possibility may work best, it would be greatly
appreciated.  Thanks

Kris



Re: Speed & Memory Management

From
GB
Date:
How about using varchar(30)?

GB


Re: Speed & Memory Management

From
Robert Treat
Date:
Unless your application requires a 30 character limit at the logical
level, use text.

Robert Treat

On Mon, 2003-03-31 at 17:08, Kris wrote:
> Hello all.  I am designing a database that will could potentially grow to
> have millions of tuples associated with it.  My thought is that because I
> will likely have many updates/inserts/deletes occurring I should set each
> column that requires text to a static ammount (ie using char(30) instead of
> text).  However, this looks to be a double edged sword, since selecting from
> the db is my largest concern (with updates a very close second).  When I
> pull these values back out of the db, it seems I am going to have to trim
> each returned value (when I pull a ten character string out of this field,
> it returns the 10 char string as well as 20 whitespaces.).  I am trying to
> assign a weight to each of these scenarios and figure out which is the
> lesser of the two evils.  If anyone has a suggestion of how I may circumvent
> this issue or which possibility may work best, it would be greatly
> appreciated.  Thanks
>


Re: Speed & Memory Management

From
Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> Unless your application requires a 30 character limit at the logical
> level, use text.

And if it does, use varchar(30).  I will bet a very good lunch that
char(30) will be a complete dead loss on *every* measure: speed, disk
space, and convenience.

char(N) is not really fixed-width in Postgres, because N is measured
in characters not bytes (which are not the same thing if you use a
multibyte character encoding).  Therefore, there are no optimizations
that could allow it to outperform varchar(N).  When you consider the
extra cost of performing the padding step, the extra storage and I/O
incurred for all those pad blanks, and the client-side headaches of
having to trim the unwanted blanks again, it's just guaranteed to be
a loser.

The only case in which I could recommend char(N) is where you have
application semantics that constrain a field to exactly N characters
(postal codes are one common example).  If the semantics are "at
most N characters", use varchar(N).  If you are picking N out of the
air, don't bother: use text.

            regards, tom lane


Re: Speed & Memory Management

From
Peter Eisentraut
Date:
Kris writes:

> My thought is that because I will likely have many
> updates/inserts/deletes occurring I should set each column that requires
> text to a static ammount (ie using char(30) instead of text).

That would buy you absolutely nothing.  char() is not faster than text.
Instead you would make everything slower because the system would
constantly have to pad and trim your values and it would bloat the storage
with the extra spaces.

--
Peter Eisentraut   peter_e@gmx.net


Re: Speed & Memory Management

From
Murthy Kambhampaty
Date:
This, and the message from Tom Lane, seem inconsistent with the
documentation re "Character Types"
(http://developer.postgresql.org/docs/postgres/datatype-character.html)
which says:

"Tip:  There are no performance differences between these three types, apart
from the increased storage size when using the blank-padded type."


-----Original Message-----
From: Peter Eisentraut [mailto:peter_e@gmx.net]
Sent: Tuesday, April 01, 2003 16:42
To: Kris
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Speed & Memory Management


Kris writes:

> My thought is that because I will likely have many
> updates/inserts/deletes occurring I should set each column that requires
> text to a static ammount (ie using char(30) instead of text).

That would buy you absolutely nothing.  char() is not faster than text.
Instead you would make everything slower because the system would
constantly have to pad and trim your values and it would bloat the storage
with the extra spaces.

--
Peter Eisentraut   peter_e@gmx.net


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster


Re: Speed & Memory Management

From
Juan Miguel
Date:
> That would buy you absolutely nothing.  char() is not faster than text.
> Instead you would make everything slower because the system would
> constantly have to pad and trim your values and it would bloat the storage
> with the extra spaces.

First, sorry about my english.

Well ... but ... reading database theory books, you can see that fixed size
records are "better" than variant size records. When the records are size
fixed,  inserts, updates and deletes can recalculate easier the position of
the rest records in the pages. These books tell you that in general cases,
variant record size tables (records that have variant type colums, for
example) imply a lower performance in the system ...

Why in postgresql these things don't matter ?

Thanks.


Re: Speed & Memory Management

From
Peter Eisentraut
Date:
Juan Miguel writes:

> Well ... but ... reading database theory books, you can see that fixed size
> records are "better" than variant size records.

Theory and practice are only the same in theory.  In practice there is a
difference.

Anyway, char(30) means 30 characters, not 30 bytes.  So it's not a
fixed-size record anyway.  Other factors are out-of-line storage and
automatic compression of long values.

--
Peter Eisentraut   peter_e@gmx.net


Re: Speed & Memory Management

From
Peter Eisentraut
Date:
Murthy Kambhampaty writes:

> This, and the message from Tom Lane, seem inconsistent with the
> documentation re "Character Types"
> (http://developer.postgresql.org/docs/postgres/datatype-character.html)
> which says:
>
> "Tip:  There are no performance differences between these three types, apart
> from the increased storage size when using the blank-padded type."

What are you saying?  This is exactly true.  Of course the padding might
also take time, not only space, but we don't want to get too picky here.

--
Peter Eisentraut   peter_e@gmx.net


Re: Speed & Memory Management

From
Andrew Biagioni
Date:


Juan Miguel wrote:
That would buy you absolutely nothing.  char() is not faster than text.
Instead you would make everything slower because the system would
constantly have to pad and trim your values and it would bloat the storage
with the extra spaces.   
First, sorry about my english.

Well ... but ... reading database theory books, you can see that fixed size 
records are "better" than variant size records. When the records are size 
fixed,  inserts, updates and deletes can recalculate easier the position of 
the rest records in the pages. These books tell you that in general cases, 
variant record size tables (records that have variant type colums, for 
example) imply a lower performance in the system ...

Why in postgresql these things don't matter ? 
Because unlike other DBs, PostgreSQL doesn't actually remove anything when you do an update or delete, it marks the old record as "dead" and (for an update) adds a new instance of the record at the end.  That's one reason why you want to vacuum tables after some activity, to remove the dead records fro all updates/deletes.

Thanks.


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

 

Re: Speed & Memory Management

From
Jodi Kanter
Date:
In response to the speed and memory management messages posted. Can someone tell me if the varchar(N) data type also impedes performance by padding with spaces?
Thank you,
Jodi
--

_______________________________
Jodi L Kanter
BioInformatics Database Administrator
University of Virginia
(434) 924-2846
jkanter@virginia.edu


 

 

 

Re: Speed & Memory Management

From
Kris Kiger
Date:
Under what circumstances does a character not take up a set number of
bytes?  For example, i was under the impression  that utf-8 allocated 6
bytes per character.  While the character may not occupy all six bytes,
each byte would still be reserved for that character set.

Kris


>Juan Miguel writes:
>
>
>
>>Well ... but ... reading database theory books, you can see that fixed size
>>records are "better" than variant size records.
>>
>>
>
>Theory and practice are only the same in theory.  In practice there is a
>difference.
>
>Anyway, char(30) means 30 characters, not 30 bytes.  So it's not a
>fixed-size record anyway.  Other factors are out-of-line storage and
>automatic compression of long values.
>
>
>


Re: Speed & Memory Management

From
Robert Treat
Date:
On Thu, 2003-04-03 at 13:22, Jodi Kanter wrote:
> In response to the speed and memory management messages posted. Can
> someone tell me if the varchar(N) data type also impedes performance by
> padding with spaces?
> Thank you,
> Jodi
>

Well no since varchar doesn't pad spaces. Theoretically this does make
it an improvement over char, although by this same theory it does have
to do length checks on the input strings, which would slow things down
versus text data types *in theory*.

Don't forget also that a varchar(30) will _prevent_ strings longer than
30 chars, not truncate them.

Robert Treat


Re: Speed & Memory Management

From
Peter Eisentraut
Date:
Kris Kiger writes:

> Under what circumstances does a character not take up a set number of
> bytes?  For example, i was under the impression  that utf-8 allocated 6
> bytes per character.

Nope, UTF-8 is variable length.  And besides, UTF-8 is not the only
supported encoding.

--
Peter Eisentraut   peter_e@gmx.net


Need some info on an error

From
Kris Kiger
Date:
I added a column to a table of type boolean and then set its default to
false.  Next, I ran an update on the table that was very similar to this
one:

update tablename SET bool_field_name = false;

This error is returned:

ERROR:  unrecognized replace flag: 88

Has anyone seen this before?

Thanks for the input

Kris


Re: Need some info on an error

From
Tom Lane
Date:
Kris Kiger <kris@musicrebellion.com> writes:
> I added a column to a table of type boolean and then set its default to
> false.  Next, I ran an update on the table that was very similar to this
> one:

> update tablename SET bool_field_name = false;

> This error is returned:

> ERROR:  unrecognized replace flag: 88

This sounds to me like you've uncovered a bug.  What PG version is this?
Can you reproduce the error starting from an empty database, and if so how?

            regards, tom lane