Thread: Datatypes and performance

Datatypes and performance

From
"Jay O'Connor"
Date:
A couple of questions on data types and performance

1. varchar vs varchar(n)

I created a database schema and based on the test data I had to work with,
I couldn't really determine the max size of a lot of string fields so I
just left a lot of fields as varchar

A comment from a coworker was that this would cause a performance problem

The docs say that there's no performance difference between
using varchar(n) and text.

So will leaving my fields as unlimited varchar be a performance issue?  Or
should I try to narrow them down?

My coworker has more experience with Oracle and MS-SQL than PostgreSQL.

2. varchar and int

I was using a varchar as one field that's part of an index.  Looking at the
data, I realized I could just use an int instead.  My assumption would be
that an int would be faster to search for than a varchar, so I converted
the field to int.  Is this a valid assumption?

Thanks

Re: Datatypes and performance

From
Shridhar Daithankar
Date:
On Friday 04 July 2003 21:40, Jay O'Connor wrote:
> 2. varchar and int
> I was using a varchar as one field that's part of an index.  Looking at the
> data, I realized I could just use an int instead.  My assumption would be
> that an int would be faster to search for than a varchar, so I converted
> the field to int.  Is this a valid assumption?

Certainly yes.
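
An int compares as a single machine word, while a varchar comparison goes
through length and (possibly locale-aware) string handling, and the int
index is smaller as well.  A minimal sketch of such a conversion (table
and column names are made up; servers of this vintage lack ALTER COLUMN
... TYPE, so adding a new column is the usual route):

    ALTER TABLE orders ADD COLUMN code_int integer;
    UPDATE orders SET code_int = CAST(code AS integer);
    ALTER TABLE orders DROP COLUMN code;
    ALTER TABLE orders RENAME COLUMN code_int TO code;
    CREATE INDEX orders_code_idx ON orders (code);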

 Shridhar


Re: Datatypes and performance

From
Alvaro Herrera
Date:
On Fri, Jul 04, 2003 at 09:10:41AM -0700, Jay O'Connor wrote:

> I created a database schema and based on the test data I had to work with,
> I couldn't really determine the max size of a lot of string fields so I
> just left a lot of fields as varchar
>
> A comment from a coworker was that this would cause a performance problem
>
> The docs say that there's no performance difference between
> using varchar(n) and text.

The only difference comes from checking the length for the limited
fields, so varchar(n) will be very slightly slower than text.
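
For illustration (hypothetical tables; the limit behaves much like a CHECK
constraint on the character length):

    CREATE TABLE t_text    (s text);          -- no length check at all
    CREATE TABLE t_limited (s varchar(100));  -- every INSERT/UPDATE also
                                              -- verifies char_length(s) <= 100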

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There was no reply" (Kernel Traffic)

Re: Datatypes and performance

From
Andrew Sullivan
Date:
On Fri, Jul 04, 2003 at 09:10:41AM -0700, Jay O'Connor wrote:
> The docs say that there's no performance difference between
> using varchar(n) and text.
>
> So will leaving my fields as unlimited varchar be a performance issue?  Or
> should I try to narrow them down?

In fact, there is a performance penalty for limiting their length,
because you have to check on each insert.

> My coworker has more experience with Oracle and MS-SQL than PostgreSQL.

You may want to tell your coworker to read the docs ;-)

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Datatypes and performance

From
"Maksim Likharev"
Date:
How does Postgres store text fields internally? In a separate table?


Re: Datatypes and performance

From
Andrew Sullivan
Date:
On Fri, Jul 04, 2003 at 01:14:52PM -0700, Maksim Likharev wrote:
> How does Postgres store text fields internally? In a separate table?

I believe it gets stored in a separate table only if it's too
long (read the docs on TOAST if you want more about this).  But
normally, no.  Here's what the docs have to say about it:

---cut here---
The storage requirement for data of these types is 4 bytes plus the
actual string, and in case of character plus the padding. Long
strings are compressed by the system automatically, so the physical
requirement on disk may be less. Long values are also stored in
background tables so they don't interfere with rapid access to the
shorter column values. In any case, the longest possible character
string that can be stored is about 1 GB. (The maximum value that will
be allowed for n in the data type declaration is less than that. It
wouldn't be very useful to change this because with multibyte
character encodings the number of characters and bytes can be quite
different anyway. If you desire to store long strings with no
specific upper limit, use text or character varying without a length
specifier, rather than making up an arbitrary length limit.)
---cut here---
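
If you want to see the background table for yourself, a catalog query along
these lines should show it (table name is an example):

    SELECT reltoastrelid::regclass AS toast_table
      FROM pg_class
     WHERE relname = 'mytable';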

A
--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Datatypes and performance

From
"Maksim Likharev"
Date:
Ok, what I see here tells me that text is slower than fixed-length varchar,
because it is stored in a separate table (but how else can you store long
fields?), so Postgres has to read another page or pages in order to get
the long value.
The story about boundary checks for varchar just does not count;
it's nothing compared with disk reads/writes.



Re: Datatypes and performance

From
"Nigel J. Andrews"
Date:
That's "long values" that are stored elsewhere. I believe the length has to be
a good portion of a page size, normally 4KB, before it's considered for placing
in the toast table. I'm not sure if the details are in the documentation, but
I'm sure they'll appear in a search of the archive. Of course, someone like
Tom, Bruce etc. will no doubt pop up with the specifics.


--
Nigel J. Andrews


On Fri, 4 Jul 2003, Maksim Likharev wrote:

> Ok, what I see here tells me that text is slower than fixed-length varchar,
> because it is stored in a separate table (but how else can you store long
> fields?), so Postgres has to read another page or pages in order to get
> the long value.
> The story about boundary checks for varchar just does not count;
> it's nothing compared with disk reads/writes.



Re: Datatypes and performance

From
Alvaro Herrera
Date:
On Fri, Jul 04, 2003 at 02:22:39PM -0700, Maksim Likharev wrote:
> Ok, what I see here tells me that text is slower than fixed-length varchar,
> because it is stored in a separate table (but how else can you store long
> fields?), so Postgres has to read another page or pages in order to get
> the long value.

That's regardless of the datatype: a varchar longer than 2 KiB IIRC will
be stored in a separate table, just as a text longer than 2 KiB will.
There's no difference _at all_ between those two datatypes _except_ that
the former is checked for maximum length.  If you store 256 chars in a TEXT
field it will be in the main table, as if it were a varchar(256).
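
On versions much newer than the 7.3 being discussed here you can even
measure where the bytes ended up (function names as in current PostgreSQL;
purely illustrative, and it assumes the table actually has a TOAST table):

    SELECT pg_relation_size('mytable')      AS main_bytes,
           pg_relation_size(reltoastrelid)  AS toast_bytes
      FROM pg_class
     WHERE relname = 'mytable';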

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Nunca confiaré en un traidor.  Ni siquiera si el traidor lo he creado yo"
(Barón Vladimir Harkonnen)

Re: Datatypes and performance

From
"Maksim Likharev"
Date:
Ok, then if I store 4K - 1 byte in a varchar, and after that 4K + 1 bytes,
and again and again, you are telling me that my text will be jerking
around between two tables?
So in reality "no performance degradation/benefit for varchar vs text"
should be read as "varchar is as slow as text", or "keep your varchar
under 4K if you want to read it fast".

Pretty useful detail, thank you.




Re: Datatypes and performance

From
Alvaro Herrera
Date:
On Fri, Jul 04, 2003 at 06:57:04PM -0700, Maksim Likharev wrote:
> Ok, then if I store 4K - 1 byte in a varchar, and after that 4K + 1 bytes,
> and again and again, you are telling me that my text will be jerking
> around between two tables?

In any case the tuple will probably "jerk around" different pages (I
don't think it matters whether the pages are from two tables or from
the same one).  I fail to understand how that is a performance problem
different from any other datatype's.  I'm sure you've done some
benchmark to determine that varchar handling is as slow as you
appear to be saying?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Postgres is bloatware by design: it was built to house
PhD theses." (Joey Hellerstein, SIGMOD annual conference 2002)

Re: Datatypes and performance

From
Dennis Björklund
Date:
On Fri, 4 Jul 2003, Maksim Likharev wrote:

> Ok, then if I store 4K - 1 byte in a varchar, and after that 4K + 1 bytes,
> and again and again, you are telling me that my text will be jerking
> around between two tables?

Updates in PostgreSQL are not done in place.  It just adds the new data,
and the old is kept around so that other concurrent transactions can still
run and use the old data.  This gives a lot of speed when you have many
concurrent users.

Vacuum cleans out the old, unused data.
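
For example, a periodic maintenance pass is just:

    VACUUM ANALYZE;   -- reclaim space from dead rows and refresh
                      -- the planner's statistics in one go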

--
/Dennis


Re: Datatypes and performance

From
"Maksim Likharev"
Date:
There wasn't any sarcasm in my words;
as I said, I am really glad to see the real story behind
"no performance difference for varchar and text".

I do not have performance metrics for varchars,
but I have benchmarks and my own observations about performance in general.
Just want to say, I haven't seen many DBMS systems other than
Microsoft SQL Server and Postgres 7.3.x, so my performance observations
are based on those systems.
Selects do not have any problems in general: equal to or slightly slower
than MS SQL.

Inserts/updates are sometimes slow, sometimes a nightmare,
in general painfully slow.

I guess as long as Postgres retrieves data fast enough I am satisfied,
but for dynamic DBs (lots of inserts/updates) ....

P.S.
I do not want to offend anybody or anything, just saying what I see.



Re: Datatypes and performance

From
Alvaro Herrera
Date:
On Fri, Jul 04, 2003 at 10:50:17PM -0700, Maksim Likharev wrote:
> There wasn't any sarcasm in my words;

Well, I had really thought so.  I'm not a native English speaker, so I
frequently misunderstand people.

> but I have benchmarks and my own observations about performance in general.
> Just want to say, I haven't seen many DBMS systems other than
> Microsoft SQL Server and Postgres 7.3.x, so my performance observations
> are based on those systems.  Selects do not have any problems in
> general: equal to or slightly slower than MS SQL.

> Inserts/updates are sometimes slow, sometimes a nightmare,
> in general painfully slow.

This is probably unrelated to the technique I was describing (it's
called TOAST, by the way).  Try another approach: for example, if
you have lots of inserts, wrap them in a transaction like

    BEGIN;
    INSERT ...
    INSERT ...
    ...
    COMMIT;

Or use a COPY statement.  Both of those approaches are much faster than
using hundreds of standalone inserts.
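
A COPY sketch (table name and path are made up; the file must be readable
by the server):

    COPY mytable FROM '/tmp/mytable.dat';   -- tab-delimited text by default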

Be sure to tune your server: for example, bump up the shared_buffers
setting, VACUUM frequently, ANALYZE when there's a significant statistical
change in your data, etc.  There are lots of documents on how to do
this (I'm sure somebody will post appropriate URLs for those).
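
For instance, in postgresql.conf (values purely illustrative, not
recommendations):

    shared_buffers = 4096        # counted in 8 kB buffers, i.e. about 32 MB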

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Por suerte hoy explotó el califont porque si no me habría muerto
de aburrido"  (Papelucho)

Re: Datatypes and performance

From
"Mattias Kregert"
Date:
First of all, disable FSYNC... that will speed things up a lot!

/M


Re: Datatypes and performance

From
"Shridhar Daithankar"
Date:
On 7 Jul 2003 at 12:59, Mattias Kregert wrote:

> First of all, disable FSYNC... that will speed things up a lot!

That shouldn't have been suggested so casually.  Though it can give a
performance boost, it degrades the reliability of your data: it can cause
data corruption in case of a power/disk failure.

Bye
 Shridhar

--
Why use Windows, since there is a door?(By fachat@galileo.rhein-neckar.de,
Andre Fachat)


Re: Datatypes and performance

From
"Mattias Kregert"
Date:
In the case of disk failure, the files will probably be damaged anyway, and
then I'll have to install new hardware or format + check bad blocks and then
restore from the backup. I can't see how fsync would help in the case of a
disk crash. Without reliable RAID or something, I think this would be a
catastrophic failure => get new hardware (disk/RAM/computer/building,
whatever) and go find the (remote) backup.

In the case of power failure, you *might* have to restore from backup unless
you use ReiserFS or some other journalling filesystem. I use ReiserFS. I also
have a UPS installed, just to be sure...

Journalling FS will fix the FS problems, so the files are ok.
PG journal will fix the PG problems, so the tables will be ok.


From: "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>
> On 7 Jul 2003 at 12:59, Mattias Kregert wrote:
>
> > First of all, disable FSYNC... that will speed things up a lot!
>
> That shouldn't have been done so casually.  Though can cause performance boost
> and it degrades reliability of data as it can cause data corruption in case of
> power/disk failure.
>
> Bye
>  Shridhar


Re: Datatypes and performance

From
Martijn van Oosterhout
Date:
On Mon, Jul 07, 2003 at 01:30:09PM +0200, Mattias Kregert wrote:

> In the case of disk failure, the files will probably be damaged anyway and
> then i'll have to install new hardware or format+check badblocks and then
> restore from the backup. I can't see how fsync would help in the case of
> disk crash. Without reliable raid or something i think this would be a
> catastrophic failure => get new hardware: disk/ram/computer/building
> whatever and go find the (remote) backup.

True, it doesn't help with one disk, but RAID does work.

> In the case of power failure, you'll *might* have to restore from backup
> unless you use ReiserFS or some other journalling filesystem. I use
> ReiserFS. I also have a UPS installed, just to be sure...

> Journalling FS will fix the FS problems, so the files are ok.
> PG journal will fix the PG problems so the tables will be ok.

Firstly, journalling filesystems ensure the integrity of the *filesystem*,
not the files on it. So your files can still be corrupted. You could enable
full data journalling. I would imagine that would cost you more than just
enabling fsync.

Secondly, IIRC the fsync applies to the PG journals, so turning off fsync
will kill the tables in a crash.

Basically, do you care about your data?

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organized violence.
> Westerners often forget this fact, non-Westerners never do."
>   - Samuel P. Huntington


Re: Datatypes and performance

From
Tom Lane
Date:
"Mattias Kregert" <mattias@kregert.se> writes:
> [ misguided analysis ]

> Journalling FS will fix the FS problems, so the files are ok.
> PG journal will fix the PG problems so the tables will be ok.

Only if the journal is all down to disk before the crash.

The fundamental problem with fsync off is that it's likely to violate
the WAL principle (write journal entry before data entry it describes).
If you then crash, you have data entries that correspond to transactions
that should not have been committed (because WAL replay won't guarantee
recovering all of the transaction's effects).  In other words, corrupt
data.

If we had a less costly way of guaranteeing write order than fsync, we'd
use it, but there is no other portable method.

            regards, tom lane

Re: Datatypes and performance

From
"Mattias Kregert"
Date:
Tom Lane wrote:
> "Mattias Kregert" <mattias@kregert.se> writes:
> > [ misguided analysis ]
>
> > Journalling FS will fix the FS problems, so the files are ok.
> > PG journal will fix the PG problems so the tables will be ok.
>
> Only if the journal is all down to disk before the crash.
>
> The fundamental problem with fsync off is that it's likely to violate
> the WAL principle (write journal entry before data entry it describes).
> If you then crash, you have data entries that correspond to transactions
> that should not have been committed (because WAL replay won't guarantee
> recovering all of the transaction's effects).  In other words, corrupt
> data.
>
> If we had a less costly way of guaranteeing write order than fsync, we'd
> use it, but there is no other portable method.

Uh oh... I thought the journal was always synced, and that the fsync option
only affected table writes... :(

If I turn fsync on and then pull the power cord while a number of clients
are doing lots of inserts/updates and stuff, will the fsync then guarantee
that no data will be lost or corrupted?

/* m */


Re: Datatypes and performance

From
Karsten Hilbert
Date:
> If I turn fsync on and then pull the power cord while a
> number of clients are doing lots of inserts/updates and stuff,
> will the fsync then guarantee that no data will be lost or
> corrupted?
You are surely kidding, aren't you?

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: Datatypes and performance

From
"Mattias Kregert"
Date:
> > If I turn fsync on and then pull the power cord while a
> > number of clients are doing lots of inserts/updates and stuff,
> > will the fsync then guarantee that no data will be lost or
> > corrupted?
> You are surely kidding, aren't you?
>
> Karsten

No. No kidding.
Just to clarify, what I mean is: with FSYNC enabled, after a power failure,
after "pg_ctl start" and replay of the xact log etc., are COMMITTED
transactions guaranteed to be intact, and are UNCOMMITTED transactions
guaranteed not to appear in the tables?

If the answer is "yes", then I understand the use of FSYNC.

If the answer is "no", then i don't see the point in using FSYNC at all.

/* m */


Re: Datatypes and performance

From
"Maksim Likharev"
Date:
If disabling FSYNC means that some of my transaction log records will
reside in some OS memory cache instead of being written to disk after a
transaction is committed, I do not want to disable that.
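
For reference, the setting in question is a single postgresql.conf line,
and its default keeps exactly that guarantee:

    fsync = true        # flush WAL to disk before reporting a commit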



Re: Datatypes and performance

From
Richard Huxton
Date:
On Monday 07 Jul 2003 4:11 pm, Mattias Kregert wrote:
> > > If I turn fsync on and then pull the power cord while a
> > > number of clients are doing lots of inserts/updates and stuff,
> > > will the fsync then guarantee that no data will be lost or
> > > corrupted?
> >
> > You are surely kidding, aren't you ?
> >
> > Karsten
>
> No. No kidding.
> Just to clarify, what I mean is: With FSYNC enabled, after a power failure,
> after "pg_ctl start" and replay of xact log etc; Are COMMITTED transactions
> guaranteed to be intact, and are UNCOMMITTED transactions guaranteed not to
> appear in the tables?
>
> If the answer is "yes", then I understand the use of FSYNC.
>
> If the answer is "no", then i don't see the point in using FSYNC at all.

The answer is "yes" providing:
1. Your hardware didn't suffer a failure during the outage.
2. Your disks don't lie to the operating system.

The second point is important: some disks have write-cache enabled and report
"done" when data is written to the cache, not the platter.  Google for
discussion.
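
On Linux with IDE disks, for example, the drive's cache can be toggled with
hdparm (device name is an example):

    hdparm -W 0 /dev/hda    # turn the drive's write cache off
    hdparm -W 1 /dev/hda    # turn it back on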

--
  Richard Huxton

Re: Datatypes and performance

From
Lincoln Yeoh
Date:
Just because all reported commits go in doesn't mean there won't be any
data loss.

If you pull the power cord, the DB should be in a consistent state barring
hardware and other issues. However there can be data loss, because the
clients might have been inserting data which may not be able to be
reproduced again, but which was not committed nor reported as committed,
and the clients may no longer have a copy of the data. So how does the app
or user deal with that? That can determine if data is lost holistically.

For example, if you were in the process of inserting 1 million rows from a
raw source, haven't committed and someone pulls the plug, the transaction
will not be committed. Or a prospective customer clicks "Submit Order", and
you pull the plug just then, customer sees the browser spin for a minute or
so, times out, gives up, goes somewhere else, and you don't get the order.

Given I've seen lots of things not work as they should, 100% guarantees
seem like wishful thinking. I'd just believe that with fsync on, the
probability of commits being saved to disk is very high, whereas with fsync
off the probability is a lot lower, especially if the O/S is prone to
taking 30 seconds or longer to sync outstanding data to disk.

Also, if power was lost in the middle of an fsync I'm not sure what
actually happens. I think there are some checks, but they often assume
atomicity of an operation at a particular level. Even if that holds true,
the hardware could still fail on you - writing something whilst running out
of power is not a good situation.

Regards,
Link.



Re: Datatypes and performance

From
Andrew Sullivan
Date:
On Fri, Jul 04, 2003 at 02:22:39PM -0700, Maksim Likharev wrote:
> Ok, what I see here tells me that text is slower than fixed-length varchar,
> because it is stored in a separate table (but how else can you store long
> fields?).

No, sorry, I think I was unclear.  varchar is the same thing as
text; they'll perform the same, assuming the same length.

If you go over the length for TOAST (1k?  2k?  It's in the docs, I
think, but I can't recall where) then you'll get it TOASTed.  But
that's true for most (all?  I seem to recall so -- I don't have my
local copy of the docs here) datatypes.  One way to avoid running
over that is to use varchar(n), for some small-enough value of n.
But then you pay with overhead on every insert, because you have to
check the length.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Datatypes and performance

From
"Mattias Kregert"
Date:
From: "Lincoln Yeoh" <lyeoh@pop.jaring.my>
> Just because all reported commits go in doesn't mean there won't be any
> data loss.

Ok, but I am only talking about the database here... I am not counting the
possibility of hardware failure as a result of the power failure. That would
be a non-recoverable error, just as if the city got nuked or the house burnt
down to the ground... In those cases I have to rebuild the hardware and
restore from a remote backup (daily backups over the net to another city)
and lose one day's work.

> If you pull the power cord, the DB should be in a consistent state barring
> hardware and other issues. However there can be data loss, because the
> clients might have been inserting data which may not be able to be
> reproduced again, but which was not committed nor reported as committed,
> and the clients may no longer have a copy of the data. So how does the app
> or user deal with that? That can determine if data is lost holistically.

No problem. I leave it to the client/user to ensure that data does not get
lost at the client side until the backend reports it as committed. The
client has to be responsible for the data until it has successfully handed
it over to the db server.
When the db server has got the data, it is responsible for it, except in
the case of hardware failures.

> Given I've seen lots of things not work as they should, 100% guarantees
> seem like wishful thinking. I'd just believe that with Fsync on, the
> probability of commits being saved to disk is very high. Whereas with fsync
> off, the probability is a lot lower especially if the O/S is prone to
> taking 30 seconds or longer to sync outstanding data to disk.

Ok, maybe I'll have to turn it on again, then... I thought the journal was
fsynced anyway, and that the Fsync option only was for old-style table
file syncing...


/* m */


Re: Datatypes and performance

From
Bruce Momjian
Date:
Richard Huxton wrote:
> > If the answer is "yes", then I understand the use of FSYNC.
> >
> > If the answer is "no", then i don't see the point in using FSYNC at all.
>
> The answer is "yes" providing:
> 1. Your hardware didn't suffer a failure during the outage.
> 2. Your disks don't lie to the operating system.
>
> The second point is important - some disks have write-cache enabled and report
> "done" when data is written to the cache not the platter. Google for
> discussion.

The problem here is whether write-cache is enabled _and_ the disk drive
doesn't use some trick of using the disk drive rotational energy to
ensure that the write cache gets flushed to disk on a power failure, or
perhaps uses some short-term battery to make sure the data gets flushed.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Datatypes and performance

From
Bruce Momjian
Date:
Lincoln Yeoh wrote:
> Just because all reported commits go in doesn't mean there won't be any
> data loss.
>
> If you pull the power cord, the DB should be in a consistent state barring
> hardware and other issues. However there can be data loss, because the
> clients might have been inserting data which may not be able to be
> reproduced again, but which was not committed nor reported as committed,
> and the clients may no longer have a copy of the data. So how does the app
> or user deal with that? That can determine if data is lost holistically.
>
> For example, if you were in the process of inserting 1 million rows from a
> raw source, haven't committed and someone pulls the plug, the transaction
> will not be committed. Or a prospective customer clicks "Submit Order", and
> you pull the plug just then, customer sees the browser spin for a minute or
> so, times out, gives up, goes somewhere else, and you don't get the order.

Well, the user wasn't informed the operation succeeded, so I would
assume they would try the operation again later.

> Given I've seen lots of things not work as they should, 100% guarantees
> seem like wishful thinking. I'd just believe that with Fsync on, the
> probability of commits being saved to disk is very high. Whereas with fsync

With fsync on, there is no probability --- if the drive properly reports
fsync (or guarantees it in case of a power failure) and the hardware
doesn't fail, your data is 100% safe on restart.  With fsync off, if
there was any database activity before the failure (within 5 minutes),
there almost certainly was data in the kernel disk buffers that didn't
make it to disk, and the database may well be inconsistent on restart.

On failure, I am talking about power, OS, or hardware failures.

> off, the probability is a lot lower especially if the O/S is prone to
> taking 30 seconds or longer to sync outstanding data to disk.
>
> Also, if power was lost in the middle of an fsync I'm not sure what
> actually happens. I think there are some checks, but they often assume
> atomicity of an operation at a particular level. Even if that holds true,
> the hardware could still fail on you - writing something whilst running out
> of power is not a good situation.

If failure is during fsync, we haven't yet reported the transaction as
complete.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073