Re: [GENERAL] Re: Is PostgreSQL ready for mission critical applications? - Mailing list pgsql-general
From | Bruce Momjian |
---|---|
Subject | Re: [GENERAL] Re: Is PostgreSQL ready for mission critical applications? |
Date | |
Msg-id | 199912291945.OAA08391@candle.pha.pa.us |
In response to | Re: [GENERAL] Re: Is PostgreSQL ready for mission critical applications? (Mike Mascari <mascarm@mascari.com>) |
List | pgsql-general |
This is a great list.  I have addressed the oid/sequence issue in my
book, chapter 7.

> Barnes wrote:
> >
> > It would be helpful to me to hear about successful and stable
> > implementations as well.  If some of you who are using PostgreSQL
> > successfully could comment on your experiences, I think it would
> > shed some worthwhile light on its capabilities.  I'm considering
> > using it for a mission critical project, and I would like to know
> > what I am getting into.  Thank you.
> >
> > David Barnes
>
> We've used it successfully in a production environment (24 x 7) for
> over a year now.  Simply reading the mailing list will greatly
> improve your chances of success.  The problems with PostgreSQL can
> be avoided if you know, in advance, what to avoid.  But most people
> don't.  Here's my list of things which can get you into trouble:
>
> 0. Running with fsync on - There is the possibility that modified
> records written into kernel buffers, but not yet written to disk,
> could exist at the moment of an operating system crash.  Therefore,
> PostgreSQL's default mode is to run with fsync() on.  This slows
> down the database by (quite literally) several orders of magnitude.
> We've run with fsync off (-o -F) without a problem.  Dump/reload of
> large databases with fsync() on really tests one's pain threshold.
> If you trust your OS, run with it off.
>
> 1. Making use of oids - Problems with dumping/restoring oids make
> this development path dangerous.  Most people use the SERIAL data
> type to generate primary keys.  However, SERIAL itself has some
> peculiarities, since it just auto-creates a sequence.  Dropping the
> associated table doesn't drop the sequence (IIRC), so scripted or
> automated schema creation may not be obvious.  I prefer to manually
> create a sequence and use an int4 type for primary keys.  In fact,
> you could use the same sequence for all of your primary keys, if
> you aren't exposing the value to the user in any meaningful way,
> don't plan to hit the 4.2 billion limit of int4 soon, and don't
> care about gaps... (although a purist would argue, and I agree,
> that IF you are going to use generated keys THEN the key should
> have no more meaning than that it refers to a record).
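For illustration, a minimal sketch of the manual sequence-plus-int4
approach from item 1 might look like this; the table and sequence
names here are only examples:

    -- Create the sequence explicitly, rather than letting SERIAL
    -- auto-create one, so schema scripts stay predictable:
    CREATE SEQUENCE employee_id_seq;

    CREATE TABLE employee (
        id   int4 DEFAULT nextval('employee_id_seq') NOT NULL,
        name varchar(40),
        PRIMARY KEY (id)
    );

    -- The sequence assigns the key value on INSERT:
    INSERT INTO employee (name) VALUES ('Smith');

Because the sequence is a separate, explicitly created object,
dropping the table leaves it in place, and a schema script can
recreate both objects in a known order.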
> 2. Using views created with large queries - Views use the rewrite
> system and rules to rewrite a query against them so that it
> properly fetches data from the underlying tables.  Because there is
> currently a limit on the size of a single database record (8192
> bytes), the queries associated with views can only be so big.  In
> addition, you can get into trouble if views are built on top of
> user-defined functions, which is a common thing to do.  If you
> drop/recreate the underlying function, then the view needs to be
> dropped and recreated as well.  I've also had trouble with
> dump/reload of views in the past, and have always kept my views in
> a separate views.sql script, just in case...
>
> 3. Using non-standard types - Because of problems with data
> comparisons, type coercion, and spaces, we avoided types such as
> bpchar, char, and even text.  We avoided text since ODBC clients
> could not determine maximum field width.  We also avoided all of
> the non-4-byte integer types, such as int2.  This is because the
> default type coercion code has, in the past, had trouble being
> smart enough to use indexes when given a SELECT such as:
>
>     CREATE TABLE y (x text, z int2);
>
>     SELECT x FROM y WHERE z = 3;
>
> because the 3 would be coerced to an int4 and, since z is an int2,
> the query would result in a sequential scan, whereas:
>
>     SELECT x FROM y WHERE z = '3';
>
> would use the index, since the '3' is initially parsed as a string
> and coerced properly at a later point.  I think much of this has
> been fixed, but nevertheless...  In addition, our varchar() types
> are pretty much under the 255 limit, since some ODBC clients have
> problems with varchar() types greater than 255.  We only use: int4,
> varchar, datetime, and float8.  On rare occasion, we'll use text
> for free-form information, but we NEVER index it.  Although it's
> VERY tempting (and PostgreSQL itself uses them), we avoid arrays.
>
> 4. Be careful about user-defined functions/triggers - PostgreSQL
> keeps track of everything by its oid, not by name (which would
> obviously be too slow).  But, unfortunately, it does not yet
> support modifying a function in place, which would allow the
> function to retain its original oid (or perform a cascading update
> - it will be nice when RI is integrated into the system
> catalogue!).  As a result, odd things can happen if you drop and
> recreate a function.  For example, you could have a trigger call a
> procedural language which, in turn, selects from a view in which
> one of the attributes is the result of a function.  If you
> dropped/recreated that function, things go a bit weird and usually
> result in an error such as "function not in cache".
>
> 5. Using DDL statements in transactions - PostgreSQL has trouble
> rolling back aborted transactions which contain DDL statements.  As
> a result, you might find yourself having to delete a filesystem
> file, because, even though a CREATE TABLE might have been rolled
> back as far as the system catalogue is concerned, the underlying
> file might still manage to exist.  Or worse, rollback of an index
> DROP/CREATE in a transaction yields erroneous results.
>
> 6. Using indexes on large fields - Apparently the code requires 3
> tuples per page (or something like that) for the index to function
> properly.  This can include plpgsql source, so be careful.  We
> never index on anything larger than 255 characters, but I believe
> around 2.8K is the limit before tickling bugs...
>
> 7. Using INSERTs instead of COPY - Even when you have fsync() off
> and are running INSERT statements in a transaction, processing
> individual INSERT statements by the thousands is also several
> orders of magnitude slower than COPY.  We have large mainframe
> datasets which we import nightly - we first convert them to data
> appropriate for COPY and then COPY them in, instead of INSERTing
> them record by record.  The problem with COPY is that it runs as
> user postgres, so you need to have the data files readable by user
> postgres.
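To make item 7 concrete, here is a minimal sketch of such a nightly
bulk load; the table and file path are only examples:

    CREATE TABLE nightly_batch (id int4, amount float8);

    -- Loading row by row, even inside a single transaction, is
    -- several orders of magnitude slower than a bulk load:
    INSERT INTO nightly_batch VALUES (1, 19.95);
    INSERT INTO nightly_batch VALUES (2, 24.50);

    -- One COPY of a tab-delimited file replaces thousands of
    -- INSERTs.  The backend reads the file as user postgres, so the
    -- file must be readable by that user:
    COPY nightly_batch FROM '/home/postgres/nightly_batch.dat';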
> 8. Not running VACUUM - PostgreSQL won't use indexes, or won't
> optimize correctly, unless the record count and dispersion
> estimates are up-to-date.  People have reported problems with
> running vacuum while under heavy load.  We haven't seen it, but we
> run vacuum each night at 4:05 a.m.  However, if you perform a LARGE
> number of INSERTs/UPDATEs, it is better for you to do the
> following:
>
>     DROP INDEX index_on_heavily_used_table;
>     VACUUM ANALYZE;
>     CREATE INDEX index_on_heavily_used_table;
>
> because VACUUM will sit there and, row by row, essentially
> "defragment" your indexes, which can take damn near forever for any
> number of updates or deletes greater than, say, 30,000 rows.
>
> 9. ALTER TABLE ADD COLUMN - It's better to rebuild the table by
> hand than to use this DDL statement.  First off, any column
> constraints (such as NOT NULL) will be silently ignored, and
> secondly, inherited relations have problems with dump/restore.
>
> 10. IN, INTERSECT, EXCEPT - When writing your application, these
> SQL expressions seem nice, particularly since the data in your
> design database may be small initially.  But all three of them
> force a nested sequential scan on the relation.  For example:
>
> emptoris=> explain SELECT employee FROM employees WHERE
> employee NOT IN (SELECT webuser FROM webusers);
> NOTICE:  QUERY PLAN:
>
> Seq Scan on employees  (cost=3.95 rows=59 width=12)
>   SubPlan
>     ->  Seq Scan on webusers  (cost=7.78 rows=145 width=12)
>
> EXPLAIN
>
> Since INTERSECT/EXCEPT rewrite the query to use IN, the problem
> exists with them as well.  And since PostgreSQL does not yet have
> outer joins, you should instead write the query using a correlated
> subquery (EXISTS):
>
> emptoris=> explain SELECT employee FROM employees WHERE NOT
> EXISTS (SELECT webuser FROM webusers WHERE webusers.webuser
> = employees.employee);
> NOTICE:  QUERY PLAN:
>
> Seq Scan on employees  (cost=3.95 rows=59 width=12)
>   SubPlan
>     ->  Index Scan using k_webusers1 on webusers  (cost=2.05
>         rows=1 width=12)
>
> EXPLAIN
>
> There are many more such things which, if avoided, allow PostgreSQL
> to work great.  But with each release, a lot of these things become
> obsolete.
>
> Mike Mascari

--
  Bruce Momjian                     |  http://www.op.net/~candle
  maillist@candle.pha.pa.us         |  (610) 853-3000
  +  If your life is a hard drive,  |  830 Blythe Avenue
  +  Christ can be your backup.     |  Drexel Hill, Pennsylvania 19026