Thread: Datatypes and performance
Couple of questions on data types and performance.

1. varchar vs varchar(n)

I created a database schema and, based on the test data I had to work with, I couldn't really determine the max size of a lot of string fields, so I just left a lot of fields as varchar. A comment from a coworker was that this would cause a performance problem.

Based on the docs, there's no performance difference between using varchar(n) and text. So will leaving my fields as unlimited varchar be a performance issue, or should I try to narrow them down? My coworker has more experience with Oracle and MS-SQL than PostgreSQL.

2. varchar and int

I was using a varchar as one field that's part of an index. Looking at the data, I realized I could just use an int instead. My assumption was that an int would be faster to search for than a varchar, so I converted the field to int. Is this a valid assumption?

Thanks
On Friday 04 July 2003 21:40, Jay O'Connor wrote:
> 2. varchar and int
> I was using a varchar as one field that's part of an index. Looking at the
> data, I realized I could just use an int instead. My assumption was that
> an int would be faster to search for than a varchar, so I converted the
> field to int. Is this a valid assumption?

Certainly yes.

Shridhar
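To illustrate the kind of conversion being discussed, a minimal sketch in SQL; the table and column names ('orders', 'customer_code', 'customer_id') are made up for the example, and the cast assumes the existing varchar values are purely numeric:

    -- add an integer column, copy the numeric strings into it, and index it
    ALTER TABLE orders ADD COLUMN customer_id integer;
    UPDATE orders SET customer_id = customer_code::integer;
    CREATE INDEX orders_customer_id_idx ON orders (customer_id);
    -- once the application uses customer_id, the old varchar column
    -- (and any index on it) can be dropped
    ALTER TABLE orders DROP COLUMN customer_code;

Integer comparisons are cheaper than string comparisons and the index entries are smaller, which is where the speedup comes from.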
On Fri, Jul 04, 2003 at 09:10:41AM -0700, Jay O'Connor wrote:
> I created a database schema and based on the test data I had to work with,
> I couldn't really determine the max size of a lot of string fields so I
> just left a lot of fields as varchar
>
> A comment from a coworker was that this would cause a performance problem
>
> Based on the docs, they say that there's no performance difference between
> using varchar(n) and text.

The only difference comes from checking the length for the limited fields,
so varchar(n) will be very slightly slower than text.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There was no reply" (Kernel Traffic)
On Fri, Jul 04, 2003 at 09:10:41AM -0700, Jay O'Connor wrote:
> Based on the docs, they say that there's no performance difference between
> using varchar(n) and text.
>
> So will leaving my fields as unlimited varchar be a performance issue? Or
> should I try to narrow them down?

In fact, there is a performance penalty for limiting their length, because
you have to check on each insert.

> My coworker has more experience with Oracle and MS-SQL than postgresql

You may want to tell your coworker to read the docs ;-)

A

--
Andrew Sullivan | Liberty RMS
204-4141 Yonge Street, Toronto, Ontario, Canada M2P 2A8
<andrew@libertyrms.info> | +1 416 646 3304 x110
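To make the difference concrete, a small sketch (the table and column names are invented for the example); the only distinction between the two columns is that the varchar(10) column is checked against its declared length on every insert or update:

    CREATE TABLE notes (
        short_note varchar(10),  -- length checked on every INSERT/UPDATE
        long_note  text          -- no length check, otherwise stored the same way
    );

    INSERT INTO notes VALUES ('ok', 'ok');                     -- accepted
    INSERT INTO notes VALUES ('this string is too long', 'ok');
    -- ERROR: value too long for type character varying(10)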
How does postgres internally store text fields? In a separate table?
On Fri, Jul 04, 2003 at 01:14:52PM -0700, Maksim Likharev wrote:
> How does postgres internally store text fields? In a separate table?

I believe it gets stored in a separate table just in case it's too long
(read the docs on TOAST if you want more about this). But normally, no.
Here's what the docs have to say about it:

---cut here---
The storage requirement for data of these types is 4 bytes plus the actual
string, and in case of character plus the padding. Long strings are
compressed by the system automatically, so the physical requirement on disk
may be less. Long values are also stored in background tables so they don't
interfere with rapid access to the shorter column values. In any case, the
longest possible character string that can be stored is about 1 GB. (The
maximum value that will be allowed for n in the data type declaration is
less than that. It wouldn't be very useful to change this because with
multibyte character encodings the number of characters and bytes can be
quite different anyway. If you desire to store long strings with no specific
upper limit, use text or character varying without a length specifier,
rather than making up an arbitrary length limit.)
---cut here---

A

--
Andrew Sullivan | Liberty RMS
204-4141 Yonge Street, Toronto, Ontario, Canada M2P 2A8
<andrew@libertyrms.info> | +1 416 646 3304 x110
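If you want to see the "background table" the docs refer to, the system catalogs expose it; a sketch (the table name 'notes' is made up, the pg_class columns are the standard catalog columns):

    -- find the TOAST table attached to a given table, if it has one
    SELECT c.relname AS table_name,
           t.relname AS toast_table
    FROM pg_class c
    LEFT JOIN pg_class t ON t.oid = c.reltoastrelid
    WHERE c.relname = 'notes';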
Ok, what I see here tells me that text is slower than fixed-length varchar,
because it is stored in a separate table (but how else could you store long
fields?), so postgres has to read another page (or pages) in order to get
the long value. The story about boundary checks for varchar hardly counts;
it's nothing compared with disk reads/writes.
That's "long values" that are stored elsewhere. I believe the length has to be a good portion of a page size, normally 4KB, before it's considered for placing in the toast table. I'm not sure if the details are in the documentation but I'm sure they'll appear in a search of the archive. Of course, someone like Tom, Bruce etc. while no doubt pop up with the specifics. -- Nigel J. Andrews On Fri, 4 Jul 2003, Maksim Likharev wrote: > Ok, what I see here tells me that text is slower then fixed len varchar, > due to stored in separate table ( but how else you can store long fields > ). > so postgres has to read another page(s) in order to get long value. > Story about boundary checks for varchar just does not count, > just nothing with comparing with disk reads/writes. > > > -----Original Message----- > From: Andrew Sullivan [mailto:andrew@libertyrms.info] > Sent: Friday, July 04, 2003 1:24 PM > To: PostgreSQL List > Subject: Re: [GENERAL] Datatypes and performance > > > On Fri, Jul 04, 2003 at 01:14:52PM -0700, Maksim Likharev wrote: > > How postgres internally stores text fields, in a separate table? > > I believe it gets stored in a separate table just in case it's too > long (read the docs on TOAST if you want more about this). But > normally, no. Here's what the docs have to say about it: > > ---cut here--- > The storage requirement for data of these types is 4 bytes plus the > actual string, and in case of character plus the padding. Long > strings are compressed by the system automatically, so the physical > requirement on disk may be less. Long values are also stored in > background tables so they don't interfere with rapid access to the > shorter column values. In any case, the longest possible character > string that can be stored is about 1 GB. (The maximum value that will > be allowed for n in the data type declaration is less than that. It > wouldn't be very useful to change this because with multibyte > character encodings the number of characters and bytes can be quite > different anyway. If you desire to store long strings with no > specific upper limit, use text or character varying without a length > specifier, rather than making up an arbitrary length limit.) > ---cut here--- > > A >
On Fri, Jul 04, 2003 at 02:22:39PM -0700, Maksim Likharev wrote:
> Ok, what I see here tells me that text is slower than fixed-length
> varchar, because it is stored in a separate table (but how else could you
> store long fields?), so postgres has to read another page (or pages) in
> order to get the long value.

That's regardless of the datatype: a varchar longer than 2 KiB IIRC will be
stored in a separate table, just as a text longer than 2 KiB. There's no
difference _at all_ for those two datatypes _except_ that the former is
checked for maximum length. If you store 256 chars in a TEXT field it will
be in the main table as if it were a varchar(256).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"I will never trust a traitor. Not even if the traitor is one I created
myself." (Baron Vladimir Harkonnen)
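A quick way to see that behaviour, reusing the hypothetical 'notes' table from the earlier sketch (repeat() is assumed to be available in your version):

    -- a short value stays inline in the main table
    INSERT INTO notes (long_note) VALUES (repeat('x', 100));
    -- a long value gets compressed and/or moved out to the TOAST table,
    -- whether the column is declared text or varchar
    INSERT INTO notes (long_note) VALUES (repeat('x', 100000));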
Ok, then if I store 4K - 1 byte in a varchar, and after that 4K + 1 bytes,
and again and again, you are telling me that my text will be jerking around
between 2 tables?

So in reality "no performance degradation/benefit for varchar vs text"
should be read as "varchar is as slow as text", or "keep your varchar under
4K if you want to read it fast". Pretty useful detail, thank you.
On Fri, Jul 04, 2003 at 06:57:04PM -0700, Maksim Likharev wrote:
> Ok, then if I store 4K - 1 byte in a varchar, and after that 4K + 1 bytes,
> and again and again, you are telling me that my text will be jerking
> around between 2 tables?

In any case the tuple will probably "jerk around" different pages (I don't
think it's different whether the pages are from two tables or from the same
one). I fail to understand how that is a different performance problem than
for any other datatype. I'm sure you've done some benchmark to determine
that the varchar handling is as slow as you appear to be saying?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Postgres is bloatware by design: it was built to house PhD theses."
(Joey Hellerstein, SIGMOD annual conference 2002)
On Fri, 4 Jul 2003, Maksim Likharev wrote:
> Ok, then if I store 4K - 1 byte in a varchar, and after that 4K + 1 bytes,
> and again and again, you are telling me that my text will be jerking
> around between 2 tables?

Updates in postgresql are not done in place. It just adds the new data, and
the old is kept around so that other concurrent transactions can still run
and use the old data. This gives a lot of speed when you have many
concurrent users. Vacuum cleans out old unused data.

--
/Dennis
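A minimal illustration of that behaviour (the table name 'accounts' is made up):

    -- an UPDATE writes a new row version; the old one stays behind
    -- for any concurrent transactions that may still need it
    UPDATE accounts SET balance = balance + 1 WHERE id = 42;

    -- VACUUM reclaims the space held by dead row versions;
    -- ANALYZE refreshes the planner statistics at the same time
    VACUUM ANALYZE accounts;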
Wasn't any sarcasm in my words; as I said, I am really glad to see real
numbers behind the story about no performance difference for varchar and
text.

I do not have performance metrics for varchars, but I have benchmarks and my
observations about performance in general. Just want to say, I haven't seen
many other DBMS systems other than Microsoft SQL and Postgres 7.3.x, so my
performance observations are based on those systems. Selects do not have any
problems: in general equal to or slightly slower than MS SQL.
Inserts/Updates are sometimes slow, sometimes a nightmare, in general
painfully slow. I guess as long as Postgres retrieves data fast enough I'm
satisfied, but for dynamic DBs (lots of inserts/updates) ....

P.S. Don't want to offend somebody or something, just saying what I see.
On Fri, Jul 04, 2003 at 10:50:17PM -0700, Maksim Likharev wrote:
> Wasn't any sarcasm in my words,

Well, I had really thought so. I'm no native English speaker so I
frequently misunderstand people.

> but I have benchmarks and my observations about performance in general.
> Just want to say, I haven't seen many other DBMS systems other than
> Microsoft SQL and Postgres 7.3.x, so my performance observations are based
> on those systems. Selects do not have any problems: in general equal to or
> slightly slower than MS SQL.
>
> Inserts/Updates are sometimes slow, sometimes a nightmare, in general
> painfully slow.

This is probably unrelated to the technique I was describing (it's called
TOAST, by the way). Try using another approach; for example, if you have
lots of inserts, wrap them in a transaction like

BEGIN
INSERT ...
INSERT ...
...
COMMIT

Or use a COPY statement. Both of those approaches are much faster than
using hundreds of standalone inserts.

Be sure to tune your server, for example bump up the shared_buffers setting,
VACUUM frequently, ANALYZE when there's significant statistical change in
your data, etc. There are lots of documents on how to do this (I'm sure
somebody will post appropriate URLs for those).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Luckily the water heater exploded today, because otherwise I would have
died of boredom." (Papelucho)
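Spelled out, the two approaches look roughly like this (the table, columns, and file path are made up for the sketch; COPY FROM a server-side file normally requires superuser rights, and psql's \copy is the client-side equivalent):

    -- many inserts inside one transaction: one commit instead of hundreds
    BEGIN;
    INSERT INTO items (id, name) VALUES (1, 'first');
    INSERT INTO items (id, name) VALUES (2, 'second');
    -- ... many more ...
    COMMIT;

    -- or bulk-load from a file, which is faster still
    COPY items (id, name) FROM '/tmp/items.dat';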
First of all, disable FSYNC... that will speed things up a lot!

/M
On 7 Jul 2003 at 12:59, Mattias Kregert wrote:
> First of all, disable FSYNC... that will speed things up a lot!

That shouldn't be suggested so casually. Though it can give a performance
boost, it degrades the reliability of your data, as it can cause data
corruption in case of a power/disk failure.

Bye
Shridhar

--
Why use Windows, since there is a door?
(By fachat@galileo.rhein-neckar.de, Andre Fachat)
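For reference, the setting being discussed is the fsync parameter in postgresql.conf; a sketch of checking it from SQL (the exact spelling of the boolean in the config file may vary between versions):

    -- check whether fsync is currently enabled
    SHOW fsync;
    -- the setting itself is changed in postgresql.conf:
    --   fsync = true    (the safe default: flush WAL to disk at commit)
    --   fsync = false   (faster, but risks corruption after a power/disk failure)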
In the case of disk failure, the files will probably be damaged anyway and
then I'll have to install new hardware or format + check badblocks and then
restore from the backup. I can't see how fsync would help in the case of a
disk crash. Without reliable RAID or something, I think this would be a
catastrophic failure => get new hardware: disk/RAM/computer/building/
whatever, and go find the (remote) backup.

In the case of power failure, you *might* have to restore from backup unless
you use ReiserFS or some other journalling filesystem. I use ReiserFS. I
also have a UPS installed, just to be sure...

Journalling FS will fix the FS problems, so the files are ok.
PG journal will fix the PG problems, so the tables will be ok.

From: "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>
> On 7 Jul 2003 at 12:59, Mattias Kregert wrote:
>
> > First of all, disable FSYNC... that will speed things up a lot!
>
> That shouldn't be suggested so casually. Though it can give a performance
> boost, it degrades the reliability of your data, as it can cause data
> corruption in case of a power/disk failure.
On Mon, Jul 07, 2003 at 01:30:09PM +0200, Mattias Kregert wrote:
> In the case of disk failure, the files will probably be damaged anyway and
> then I'll have to install new hardware or format + check badblocks and
> then restore from the backup. I can't see how fsync would help in the case
> of a disk crash. Without reliable RAID or something, I think this would be
> a catastrophic failure => get new hardware: disk/RAM/computer/building/
> whatever, and go find the (remote) backup.

True, it doesn't help with one disk, but RAID does work.

> In the case of power failure, you *might* have to restore from backup
> unless you use ReiserFS or some other journalling filesystem. I use
> ReiserFS. I also have a UPS installed, just to be sure...
> Journalling FS will fix the FS problems, so the files are ok.
> PG journal will fix the PG problems, so the tables will be ok.

Firstly, journalling filesystems ensure the integrity of the *filesystem*,
not the files on it. So your files can still be corrupted. You could enable
full data journalling; I would imagine that would cost you more than just
enabling fsync.

Secondly, IIRC the fsync applies to the PG journals, so turning off fsync
will kill the tables in a crash.

Basically, do you care about your data?

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
"the West won the world not by the superiority of its ideas or values or
religion but rather by its superiority in applying organized violence.
Westerners often forget this fact, non-Westerners never do."
- Samuel P. Huntington
"Mattias Kregert" <mattias@kregert.se> writes: > [ misguided analysis ] > Journalling FS will fix the FS problems, so the files are ok. > PG journal will fix the PG problems so the tables will be ok. Only if the journal is all down to disk before the crash. The fundamental problem with fsync off is that it's likely to violate the WAL principle (write journal entry before data entry it describes). If you then crash, you have data entries that correspond to transactions that should not have been committed (because WAL replay won't guarantee recovering all of the transaction's effects). In other words, corrupt data. If we had a less costly way of guaranteeing write order than fsync, we'd use it, but there is no other portable method. regards, tom lane
Tom Lane wrote:
> "Mattias Kregert" <mattias@kregert.se> writes:
> > [ misguided analysis ]
>
> > Journalling FS will fix the FS problems, so the files are ok.
> > PG journal will fix the PG problems, so the tables will be ok.
>
> Only if the journal is all down to disk before the crash.
>
> The fundamental problem with fsync off is that it's likely to violate
> the WAL principle (write journal entry before data entry it describes).
> If you then crash, you have data entries that correspond to transactions
> that should not have been committed (because WAL replay won't guarantee
> recovering all of the transaction's effects). In other words, corrupt
> data.
>
> If we had a less costly way of guaranteeing write order than fsync, we'd
> use it, but there is no other portable method.

Uh oh... I thought the journal was always synced, and that the fsync option
only affected table writes... :(

If I turn fsync on and then pull the power cord while a number of clients
are doing lots of inserts/updates and stuff, will the fsync then guarantee
that no data will be lost or corrupted?

/* m */
> If I turn fsync on and then pull the power cord while a
> number of clients are doing lots of inserts/updates and stuff,
> will the fsync then guarantee that no data will be lost or
> corrupted?

You are surely kidding, aren't you?

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
> > If I turn fsync on and then pull the power cord while a
> > number of clients are doing lots of inserts/updates and stuff,
> > will the fsync then guarantee that no data will be lost or
> > corrupted?
>
> You are surely kidding, aren't you?
>
> Karsten

No. No kidding. Just to clarify, what I mean is: with FSYNC enabled, after a
power failure, after "pg_ctl start" and replay of the xact log etc., are
COMMITTED transactions guaranteed to be intact, and are UNCOMMITTED
transactions guaranteed not to appear in the tables?

If the answer is "yes", then I understand the use of FSYNC.

If the answer is "no", then I don't see the point in using FSYNC at all.

/* m */
If disabling FSYNC means that some of my transaction log records will reside
in some OS memory cache instead of being written to disk after the
transaction is committed, I do not want to disable it.
On Monday 07 Jul 2003 4:11 pm, Mattias Kregert wrote:
> No. No kidding. Just to clarify, what I mean is: with FSYNC enabled, after
> a power failure, after "pg_ctl start" and replay of the xact log etc., are
> COMMITTED transactions guaranteed to be intact, and are UNCOMMITTED
> transactions guaranteed not to appear in the tables?
>
> If the answer is "yes", then I understand the use of FSYNC.
>
> If the answer is "no", then I don't see the point in using FSYNC at all.

The answer is "yes", providing:
1. Your hardware didn't suffer a failure during the outage.
2. Your disks don't lie to the operating system.

The second point is important - some disks have write-cache enabled and
report "done" when data is written to the cache, not the platter. Google for
discussion.

--
Richard Huxton
Just because all reported commits go in doesn't mean there won't be any data
loss.

If you pull the power cord, the DB should be in a consistent state barring
hardware and other issues. However there can be data loss, because the
clients might have been inserting data which may not be able to be
reproduced again, but which was not committed nor reported as committed, and
the clients may no longer have a copy of the data. So how does the app or
user deal with that? That can determine if data is lost, holistically.

For example, if you were in the process of inserting 1 million rows from a
raw source, haven't committed, and someone pulls the plug, the transaction
will not be committed. Or a prospective customer clicks "Submit Order", and
you pull the plug just then: the customer sees the browser spin for a minute
or so, times out, gives up, goes somewhere else, and you don't get the
order.

Given I've seen lots of things not work as they should, 100% guarantees seem
like wishful thinking. I'd just believe that with fsync on, the probability
of commits being saved to disk is very high, whereas with fsync off, the
probability is a lot lower, especially if the O/S is prone to taking 30
seconds or longer to sync outstanding data to disk.

Also, if power is lost in the middle of an fsync I'm not sure what actually
happens. I think there are some checks, but they often assume atomicity of
an operation at a particular level. Even if that holds true, the hardware
could still fail on you - writing something whilst running out of power is
not a good situation.

Regards,
Link.

At 05:11 PM 7/7/2003 +0200, Mattias Kregert wrote:
> No. No kidding. Just to clarify, what I mean is: with FSYNC enabled, after
> a power failure, after "pg_ctl start" and replay of the xact log etc., are
> COMMITTED transactions guaranteed to be intact, and are UNCOMMITTED
> transactions guaranteed not to appear in the tables?
>
> If the answer is "yes", then I understand the use of FSYNC.
>
> If the answer is "no", then I don't see the point in using FSYNC at all.
>
> /* m */
On Fri, Jul 04, 2003 at 02:22:39PM -0700, Maksim Likharev wrote:
> Ok, what I see here tells me that text is slower than fixed-length
> varchar, because it is stored in a separate table (but how else could you
> store long fields?).

No, sorry, I think I was unclear. varchar() is the same thing as text();
they'll be the same assuming the same length. If you go over the length for
TOAST (1k? 2k? It's in the docs, I think, but I can't recall where) then
you'll get it TOASTed. But that's true for most (all? I seem to recall so --
I don't have my local copy of the docs here) datatypes.

One way to avoid running over that is to use varchar(n), for some
small-enough value of n. But then you pay with overhead on every insert,
because you have to check the length.

A

--
Andrew Sullivan | Liberty RMS
204-4141 Yonge Street, Toronto, Ontario, Canada M2P 2A8
<andrew@libertyrms.info> | +1 416 646 3304 x110
From: "Lincoln Yeoh" <lyeoh@pop.jaring.my> > Just because all reported commits go in doesn't mean there won't be any > data loss. Ok, but I am only talking about the database here... I am not counting in the possibility of hardware failure as a resultof the power failure. That would be a non-recoverable error, just as if the city got nuked or the house burnt downto the ground... In those cases i have to rebuild the hardware and restore from a remote backup (daily backups over thenet to another city) and lose one days work. > If you pull the power cord, the DB should be in a consistent state barring > hardware and other issues. However there can be data loss, because the > clients might have been inserting data which may not be able to be > reproduced again, but which was not committed nor reported as committed, > and the clients may no longer have a copy of the data. So how does the app > or user deal with that? That can determine if data is lost holistically. No problem. I leave it to the client/user to ensure that data does not get lost at the client side until the backend reportsit as committed. The client has to be responsible for the data until is has successfully handed it over to the dbserver. When the db server has got the data, it is responsible for it, except in the case of hardware failures. > Given I've seen lots of things not work as they should, 100% guarantees > seem like wishful thinking. I'd just believe that with Fsync on, the > probability of commits being saved to disk is very high. Whereas with fsync > off, the probability is a lot lower especially if the O/S is prone to > taking 30 seconds or longer to sync outstanding data to disk. Ok, maybe I'll have to turn it on again, then... I thought the journal was fsynced anyway, and that the Fsync option onlywas for old-style table file syncing... /* m */ > > Regards, > Link. > > At 05:11 PM 7/7/2003 +0200, Mattias Kregert wrote: > > > > > If I turn fsync on and then pull the power cord while a > > > > number of clients are doing lots of inserts/updates and stuff, > > > > will the fsync then guarantee that no data will be lost or > > > > corrupted? > > > You are surely kidding, aren't you ? > > > > > > Karsten > > > >No. No kidding. > >Just to clarify, what I mean is: With FSYNC enabled, after a power > >failure, after "pg_ctl start" and replay of xact log etc; Are COMMITTED > >transactions guaranteed to be intact, and are UNCOMMITTED transactions > >guaranteed not to appear in the tables? > > > >If the answer is "yes", then I understand the use of FSYNC. > > > >If the answer is "no", then i don't see the point in using FSYNC at all. > > > >/* m */ > > > > > >---------------------------(end of broadcast)--------------------------- > >TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > >
Richard Huxton wrote:
> > If the answer is "yes", then I understand the use of FSYNC.
> >
> > If the answer is "no", then I don't see the point in using FSYNC at all.
>
> The answer is "yes", providing:
> 1. Your hardware didn't suffer a failure during the outage.
> 2. Your disks don't lie to the operating system.
>
> The second point is important - some disks have write-cache enabled and
> report "done" when data is written to the cache, not the platter. Google
> for discussion.

The problem here is whether write-cache is enabled _and_ the disk drive
doesn't use some trick of using the disk drive's rotational energy to ensure
that the write cache gets flushed to disk on a power failure, or perhaps
uses some short-term battery to make sure the data gets flushed.

--
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Lincoln Yeoh wrote:
> Just because all reported commits go in doesn't mean there won't be any
> data loss.
>
> If you pull the power cord, the DB should be in a consistent state barring
> hardware and other issues. However there can be data loss, because the
> clients might have been inserting data which may not be able to be
> reproduced again, but which was not committed nor reported as committed,
> and the clients may no longer have a copy of the data. So how does the app
> or user deal with that? That can determine if data is lost, holistically.
>
> For example, if you were in the process of inserting 1 million rows from a
> raw source, haven't committed, and someone pulls the plug, the transaction
> will not be committed. Or a prospective customer clicks "Submit Order",
> and you pull the plug just then: the customer sees the browser spin for a
> minute or so, times out, gives up, goes somewhere else, and you don't get
> the order.

Well, the user wasn't informed the operation succeeded, so I would assume
they would try the operation again later.

> Given I've seen lots of things not work as they should, 100% guarantees
> seem like wishful thinking. I'd just believe that with fsync on, the
> probability of commits being saved to disk is very high, whereas with
> fsync off, the probability is a lot lower, especially if the O/S is prone
> to taking 30 seconds or longer to sync outstanding data to disk.

With fsync on, there is no probability --- if the drive properly reports
fsync (or guarantees it in case of a power failure) and the hardware doesn't
fail, your data is 100% safe on restart. With fsync off, if there was any
database activity before the failure (within 5 minutes), there almost
certainly was data in the kernel disk buffers that didn't make it to disk,
and the database will not be consistent on restart. On failure, I am talking
about power, OS, or hardware failures.

> Also, if power is lost in the middle of an fsync I'm not sure what
> actually happens. I think there are some checks, but they often assume
> atomicity of an operation at a particular level. Even if that holds true,
> the hardware could still fail on you - writing something whilst running
> out of power is not a good situation.

If failure is during fsync, we haven't yet reported the transaction as
complete.

--
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073