Re: [GENERAL] Re: Is PostgreSQL ready for mission critical applications? - Mailing list pgsql-general

From Bruce Momjian
Subject Re: [GENERAL] Re: Is PostgreSQL ready for mission critical applications?
Date
Msg-id 199912291945.OAA08391@candle.pha.pa.us
In response to Re: [GENERAL] Re: Is PostgreSQL ready for mission critical applications?  (Mike Mascari <mascarm@mascari.com>)
List pgsql-general
This is a great list.  I have addressed the oid/sequence issue in my
book, chapter 7.

> Barnes wrote:
> >
> > It would be helpful to me to hear about successful and stable
> > implementations as well.  If some of you who are using PostgreSQL
> > successfully could comment on your experiences, I think it would shed some
> > worthwhile light on its capabilities.  I'm considering using it for a
> > mission critical project, and I would like to know what I am getting into.
> > Thank you.
> >
> > David Barnes
> >
>
> We've used it successfully in a production environment (24 x
> 7) for over a year now. Simply reading the mailing list will
> greatly improve your chances of success. The problems with
> PostgreSQL can be avoided if you know, in advance, what to
> avoid. But most people don't. Here's my list of things which
> can get you into trouble:
>
> 0. Running with fsync on - Modified records that have been
> written into kernel buffers but not yet to disk can be lost
> at the moment of an operating system crash. Therefore,
> PostgreSQL's default mode is to run with
> fsync() on. This slows down the database by (quite
> literally) several orders of magnitude. We've run with fsync
> off (-o -F) without a problem. Dump/Reload of large
> databases with fsync() on really tests one's pain threshold.
> If you trust your OS, run with it off.
>
> 1. Making use of oids - Problems with dumping/restoring oids
> make this development path dangerous. Most people use the
> SERIAL data type to generate primary keys. However, SERIAL
> itself has some peculiarities, since it just auto-creates a
> sequence. Dropping the associated table doesn't drop the
> sequence (IIRC), so scripted or automated schema creation
> may not work the way you expect. I prefer to manually use a
> sequence and
> an int4 type for primary keys. In fact, you could use the
> same sequence for all of your primary keys, if you aren't
> exposing the value to the user in any meaningful way, and
> don't plan to hit the 4.2 billion limit of int4 soon, and
> don't care about gaps...(although a purist would argue, and
> I agree, that IF you are going to use generated keys THEN
> the key should have no more meaning than that it refers to
> a record).
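>
> A minimal sketch of the manual approach (the table and
> column names here are made up):
>
> CREATE SEQUENCE customers_id_seq;
> CREATE TABLE customers (
>     id    int4 DEFAULT nextval('customers_id_seq') PRIMARY KEY,
>     name  varchar(80)
> );
> -- NOTE: dropping customers does NOT drop customers_id_seq,
> -- so remember a DROP SEQUENCE in your schema scripts.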
>
> 2. Using views created with large queries - Views use the
> rewrite system and rules to rewrite a query against it to
> properly fetch data from the underlying tables. Because
> there is currently a limit on the size of a single database
> record (8192 bytes), the queries associated with views can
> only be so big. In addition, you can get into trouble if
> views are built on top of user-defined functions, which is a
> common thing to do. If you drop/recreate the underlying
> function, then the view needs to be dropped and recreated as
> well. In addition, I've had trouble with dump/reload of
> views in the past, and have always kept my view definitions
> in a separate views.sql script, just in case...
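>
> As a sketch (view, function, and table names are made up),
> the order we follow when an underlying function has to
> change is:
>
> DROP VIEW active_employees;
> DROP FUNCTION is_active(int4);
> CREATE FUNCTION is_active(int4) RETURNS bool AS
>     'SELECT $1 > 0' LANGUAGE 'sql';
> CREATE VIEW active_employees AS
>     SELECT name FROM employees WHERE is_active(status);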
>
> 3. Using non-standard types - Because of problems with data
> comparisons, type coercion and spaces, we avoided types such
> as bpchar, char, and even text. We avoided text since ODBC
> clients could not determine maximum field width. We also
> avoided all of the non-4 byte integer types, such as int2.
> This is because the default type coercion (sp?) code in the
> past has had trouble being smart enough to use indexes when
> given a SELECT such as:
>
> CREATE TABLE y (x text, z int2);
>
> SELECT x FROM y WHERE z = 3;
>
> because the 3 would be coerced to an int4 and, if z was an
> int2, would result in a sequential scan, whereas:
>
> SELECT x FROM y WHERE z = '3';
>
> would use the index since it is initially parsed as a string
> and coerced properly at a later point. I think much of this
> has been fixed, but nevertheless... In addition, our
> varchar() types are pretty much under the 255 limit since
> some ODBC clients have problems with varchar() types greater
> than 255. We only use: int4, varchar, datetime, and float8.
> On rare occasion, we'll use text for free-form information,
> but we NEVER index it. Although it's VERY tempting (and
> PostgreSQL itself uses them), we avoid arrays.
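>
> So a typical table for us looks something like this (names
> are made up):
>
> CREATE TABLE orders (
>     order_id   int4,
>     customer   varchar(80),
>     placed_at  datetime,
>     amount     float8
> );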
>
> 4. Be careful about user-defined functions/triggers -
> PostgreSQL keeps track of everything by its oid, not by name
> (which would obviously be too slow). But, unfortunately, it
> does not yet support the modification of functions, allowing
> the function to retain its original oid (or perform a
> cascading update - it will be nice when RI is integrated
> into the system catalogue!). As a result, odd things can
> happen if you drop and recreate a function. For example, you
> could have a trigger call a procedural language which, in
> turn, could select from a view, from which one of the
> attributes is the result of a function. If you drop/recreate
> that function, things get a bit weird and usually result in
> an error such as "function not in cache".
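>
> In practice (sketch only; the names are made up, and plpgsql
> is assumed to be installed), we drop and recreate the trigger
> together with its function:
>
> DROP TRIGGER audit_trig ON accounts;
> DROP FUNCTION audit_func();
> CREATE FUNCTION audit_func() RETURNS opaque AS
>     'BEGIN RETURN NEW; END;' LANGUAGE 'plpgsql';
> CREATE TRIGGER audit_trig BEFORE UPDATE ON accounts
>     FOR EACH ROW EXECUTE PROCEDURE audit_func();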
>
> 5. Using DDL statements in transactions - PostgreSQL has
> trouble rolling back aborted transactions which contain DDL
> statements. As a result, you might find yourself
> having to delete a filesystem file, because, even though a
> TABLE create might have been rolled back as far as the
> system catalogue is concerned, the underlying file might
> still manage to exist. Or worse, rollback of index
> DROP/CREATE in a transaction yields erroneous results.
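>
> For instance, a sequence like this (the table name is made
> up) is the sort of thing that can leave an orphaned file
> behind:
>
> BEGIN;
> CREATE TABLE scratch (x int4);
> -- ...something later in the transaction fails...
> ROLLBACK;
> -- the catalogue no longer shows "scratch", but its
> -- underlying file may still be sitting on disk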
>
> 6. Using indexes on large fields - Apparently the code
> requires 3 tuples per page (or something like that) for the
> index to function properly. This can include plpgsql source,
> so be careful. We never index on anything larger than 255,
> but I believe around 2.8K is the limit before tickling
> bugs...
>
> 7. Using INSERTS instead of COPY - Even when you have
> fsync() off and are running INSERT statements in a
> transaction, the processing of individual INSERT statements
> by the thousands is also several orders of magnitude slower
> than COPY. We have large mainframe datasets which we import
> nightly - we first convert them to a format appropriate for
> COPY and then COPY them in, instead of INSERTing them record
> by record.
> The problem with COPY is it runs as user postgres, so you
> need to have the data files readable by user postgres.
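>
> A nightly load then looks something like this (the path,
> table, and delimiter are illustrative only):
>
> COPY orders FROM '/home/postgres/load/orders.dat'
>     USING DELIMITERS '|';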
>
> 8. Not running VACUUM - PostgreSQL won't use indexes, or
> won't optimize correctly unless the record count and
> dispersion estimates are up-to-date. People have reported
> problems with running vacuum while under heavy load. We
> haven't seen it, but we run vacuum each night at 4:05 a.m.
> However, if you perform a LARGE number of INSERTS/UPDATES,
> it is better for you to do the following:
>
> DROP INDEX index_on_heavily_used_table;
> VACUUM ANALYZE;
> CREATE INDEX index_on_heavily_used_table;
>
> Because VACUUM will sit there and, row by row, essentially
> "defragment" your indexes, which can take damn near forever
> for any number of updates or deletes greater than, say,
> 30,000 rows.
>
> 9. ALTER TABLE ADD COLUMN - It's better to rebuild the table
> by hand than to use this DDL statement. First off, any
> column constraints (such as NOT NULL) will be silently
> ignored, and secondly, inherited relations have problems
> with dump/restore.
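>
> Rebuilding by hand is roughly (table and column names are
> made up):
>
> CREATE TABLE employees_new (
>     id        int4 NOT NULL,
>     name      varchar(80),
>     hired_on  datetime      -- the new column, constraints intact
> );
> INSERT INTO employees_new (id, name)
>     SELECT id, name FROM employees;
> DROP TABLE employees;
> ALTER TABLE employees_new RENAME TO employees;
>
> and then recreate any indexes, since they went away with the
> old table.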
>
> 10. IN, INTERSECT, EXCEPT - When writing your application,
> these SQL functions seem nice, particularly since the data
> in your design database may be small, initially. But all
> three of these SQL expressions (whatever) force a nested
> sequential scan on the relation. For example:
>
> emptoris=> explain SELECT employee FROM employees WHERE
> employee NOT IN (SELECT webuser FROM webusers);
> NOTICE:  QUERY PLAN:
>
> Seq Scan on employees  (cost=3.95 rows=59 width=12)
>   SubPlan
>     ->  Seq Scan on webusers  (cost=7.78 rows=145 width=12)
>
> EXPLAIN
>
> Since INTERSECT/EXCEPT rewrite the query to use IN, the
> problem exists with them as well. And since PostgreSQL does
> not yet have outer joins, you should instead write the query
> using a correlated subquery (EXISTS):
>
> emptoris=> explain SELECT employee FROM employees WHERE NOT
> EXISTS (SELECT webuser FROM webusers WHERE webusers.webuser
> = employees.employee);
> NOTICE:  QUERY PLAN:
>
> Seq Scan on employees  (cost=3.95 rows=59 width=12)
>   SubPlan
>     ->  Index Scan using k_webusers1 on webusers  (cost=2.05
> rows=1 width=12)
>
> EXPLAIN
>
> There are many more such things which, if avoided, allow
> PostgreSQL to work great. But with each release, a lot of
> these things become obsolete.
>
> Mike Mascari
>


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
