Thread: Performance Tuning

Performance Tuning

From
mixo
Date:
I have just installed Red Hat Linux 9, which ships with Pg
7.3.2. Pg has to be set up so that data inserts (blobs) should
be able to handle at least 8M at a time. The machine has
two P III 933MHz CPUs, 1.128G RAM (512M*2 + 128M), and
a 36 Gig hd with 1 Gig swap and 3 equal-size ext3 partitions.
What would be the recommended setup for good performance,
considering that the db will have about 15 users for
9 hours in a day, and about 10 or so users throughout the day
who won't be consistently using the db?



Re: Performance Tuning

From
"Bjoern Metzdorf"
Date:
> be able to handle at least 8M at a time. The machine has
> two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
> a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
> What would be the recommended setup for good performance
> considering that the db will have about 15 users for
> 9 hours in a day, and about 10 or so users throughout the day
> who won't be consistently using the db.

For 15 users you won't need great tuning at all. Just make sure that you
have the right indexes on the tables and that your queries are efficient
(check the query plans).

About the 8Meg blobs, I don't know. Other people on this list may be able to
give you hints here.

Regards,
Bjoern


Re: Performance Tuning

From
"Shridhar Daithankar"
Date:
On 8 Aug 2003 at 12:28, mixo wrote:

> I have just installed redhat linux 9 which ships with Pg
> 7.3.2. Pg has to be setup so that data inserts (blobs) should
> be able to handle at least 8M at a time. The machine has
> two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
> a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
> What would be the recommended setup for good performance
> considering that the db will have about 15 users for
> 9 hours in a day, and about 10 or so users throughout the day
> who won't be consistently using the db.

You can look at http://www.varlena.com/GeneralBits/Tidbits/perf.html to start
with, although that would not take care of anything specific to BLOBs.

I would suggest running a pilot benchmark to see how the system performs
after initial tuning. We can discuss this in detail once you have an initial
set of benchmark results.

HTH

Bye
 Shridhar

--
wolf, n.:    A man who knows all the ankles.


Re: Performance Tuning

From
Paul Thomas
Date:
On 08/08/2003 11:28 mixo wrote:
> I have just installed redhat linux 9 which ships with Pg
> 7.3.2. Pg has to be setup so that data inserts (blobs) should
> be able to handle at least 8M at a time. The machine has
> two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
> a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
> What would be the recommended setup for good performance
> considering that the db will have about 15 users for
> 9 hours in a day, and about 10 or so users throughout the day
> who won't be consistently using the db.


It doesn't sound like a particularly heavy loading to me. I'd start off
with something like

shared_buffers = 2000
sort_mem = 1024
max_connections = 100

and see how it performs under normal business loading.
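For reference, the units on those 7.3-era settings are easy to misread; here is the same starting point as a commented postgresql.conf sketch (the values are the suggestion above, the unit notes are standard for this release):

```
# postgresql.conf -- modest starting point for ~15 concurrent users
shared_buffers = 2000     # in 8 KB disk pages, so roughly 16 MB of shared cache
sort_mem = 1024           # in KB per sort/hash operation, i.e. 1 MB
max_connections = 100     # comfortably above the expected ~25 total users
```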

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller Business |
| Computer Consultants         | http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+

Re: Performance Tuning

From
"scott.marlowe"
Date:
On Fri, 8 Aug 2003, mixo wrote:

> I have just installed redhat linux 9 which ships with Pg
> 7.3.2. Pg has to be setup so that data inserts (blobs) should
> be able to handle at least 8M at a time.

Nothing has to be done to tune PostgreSQL to handle this; 8 Meg blobs are
no problem as far as I know.

> The machine has
> two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
> a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
> What would be the recomended setup for good performance
> considering that the db will have about 15 users for
> 9 hours in a day, and about 10 or so users throughout the day
> who wont be conistenly using the db.

Seeing as you have only one hard drive, how you arrange things on it
doesn't really make a big difference.   If you can get another drive and
mirror your data partition that will help speed up selects as well as
provide some redundancy should one drive fail.

How many queries per second are you looking at handling?  If it's 1 or
less, you probably don't have much to worry about with this setup.  We run
dual PIII-750s at work with 1.5 Gig ram, and while we're going to upgrade
the servers (they're currently handling apache/php/postgresql & ldap)
we'll keep the dual PIII-750 machines as the database boxes with nothing
else on them.  Postgresql is quite snappy on such hardware.


Re: Performance Tuning

From
Jonathan Gardner
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Friday 08 August 2003 03:28, mixo wrote:
> I have just installed redhat linux 9 which ships with Pg
> 7.3.2. Pg has to be setup so that data inserts (blobs) should
> be able to handle at least 8M at a time. The machine has
> two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
> a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
> What would be the recommended setup for good performance
> considering that the db will have about 15 users for
> 9 hours in a day, and about 10 or so users throughout the day
> who won't be consistently using the db.
>

Redhat puts ext3 on by default. Consider switching to a non-journaling FS
(ext2?) for the partition that holds your data and WAL.

Consider having a separate partition for the WAL as well.

These are things that are more difficult to change later on. Everything else
is tweaking.
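One common way to get the WAL onto its own partition in the 7.3 era (not spelled out in this thread; a standard technique) is to move the pg_xlog directory and leave a symlink behind, with the postmaster stopped. A sketch, using scratch directories as stand-ins for the real paths:

```shell
# Stand-ins for the real locations; substitute your actual $PGDATA and
# the mount point of the dedicated WAL partition.
DATA=$(mktemp -d)       # pretend this is $PGDATA
WALPART=$(mktemp -d)    # pretend this is the dedicated WAL partition
mkdir "$DATA/pg_xlog"   # in a real cluster this directory already exists

# Relocate the WAL directory and leave a symlink in its place.
mv "$DATA/pg_xlog" "$WALPART/pg_xlog"
ln -s "$WALPART/pg_xlog" "$DATA/pg_xlog"

ls -ld "$DATA/pg_xlog"  # now a symlink pointing at the WAL partition
```

Stop the postmaster before doing this on a live cluster, and verify it restarts cleanly afterwards.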

Is it absolutely necessary to store 8MB files in the database? I find it
cumbersome. Storing them on a file server has been a better alternative for
me.

- --
Jonathan Gardner <jgardner@jonathangardner.net>
Live Free, Use Linux!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/M9J0WgwF3QvpWNwRAlT5AJ9EmDourbCiqj7MFOqfBospc2dW7gCfZKz0
JQjn/2KAeh1SPJfN601LoFg=
=PW6k
-----END PGP SIGNATURE-----

Re: Performance Tuning

From
Andrew Sullivan
Date:
On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:
>
> Redhat puts ext3 on by default. Consider switching to a non-journaling FS
> (ext2?) with the partition that holds your data and WAL.

I would give you exactly the opposite advice: _never_ use a
non-journalling fs for your data and WAL.  I suppose if you can
afford to lose some transactions, you can do without journalling.
Otherwise, you're just borrowing trouble, near as I can tell.

A

----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Performance Tuning

From
Rod Taylor
Date:
On Fri, 2003-08-08 at 14:53, Andrew Sullivan wrote:
> On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:
> >
> > Redhat puts ext3 on by default. Consider switching to a non-journaling FS
> > (ext2?) with the partition that holds your data and WAL.
>
> I would give you exactly the opposite advice: _never_ use a
> non-journalling fs for your data and WAL.  I suppose if you can
> afford to lose some transactions, you can do without journalling.
> Otherwise, you're just borrowing trouble, near as I can tell.

Agreed. WAL cannot recover anything when the WAL itself no longer exists due
to filesystem corruption.



Re: Performance Tuning

From
Bruce Momjian
Date:
Rod Taylor wrote:
-- Start of PGP signed section.
> On Fri, 2003-08-08 at 14:53, Andrew Sullivan wrote:
> > On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:
> > >
> > > Redhat puts ext3 on by default. Consider switching to a non-journaling FS
> > > (ext2?) with the partition that holds your data and WAL.
> >
> > I would give you exactly the opposite advice: _never_ use a
> > non-journalling fs for your data and WAL.  I suppose if you can
> > afford to lose some transactions, you can do without journalling.
> > Otherwise, you're just borrowing trouble, near as I can tell.
>
> Agreed.. WAL cannot recover something when WAL no longer exists due to a
> filesystem corruption.

It is true that ext2 isn't good because the file system may not recover
after a crash.  BSD UFS isn't a journalled file system either, but it does
guarantee file system recovery after a crash --- it is especially good when
using soft updates.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Performance Tuning

From
Rod Taylor
Date:
> > Agreed.. WAL cannot recover something when WAL no longer exists due to a
> > filesystem corruption.
>
> It is true that ext2 isn't good because the file system may not recover,
> but BSD UFS isn't a journalled file system, but does guarantee file
> system recovery after a crash --- it is especially good using soft
> updates.

Yes, UFS(2) is an excellent filesystem for PostgreSQL, especially if you
can use background fsck & softupdates.


Re: Performance Tuning

From
Andrew Sullivan
Date:
On Fri, Aug 08, 2003 at 03:34:44PM -0400, Bruce Momjian wrote:
>
> It is true that ext2 isn't good because the file system may not recover,
> but BSD UFS isn't a journalled file system, but does guarantee file
> system recovery after a crash --- it is especially good using soft
> updates.

Sorry.  I usually write "journalled or equivalent" for this reason.
I think UFS with soft updates is a good example of this.  You also
don't need complete journalling in most cases -- metadata is probably
sufficient, given fsync.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Performance Tuning

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Rod Taylor wrote:
>> On Fri, 2003-08-08 at 14:53, Andrew Sullivan wrote:
>>> I would give you exactly the opposite advice: _never_ use a
>>> non-journalling fs for your data and WAL.  I suppose if you can
>>> afford to lose some transactions, you can do without journalling.
>>> Otherwise, you're just borrowing trouble, near as I can tell.
>>
>> Agreed.. WAL cannot recover something when WAL no longer exists due to a
>> filesystem corruption.

> It is true that ext2 isn't good because the file system may not recover,
> but BSD UFS isn't a journalled file system, but does guarantee file
> system recovery after a crash --- it is especially good using soft
> updates.

The main point here is that the filesystem has to be able to take care
of itself; we expect it not to lose any files or forget where the data
is.  If it wants to use journalling to accomplish that, fine.

Journalling file contents updates, as opposed to filesystem metadata,
should be redundant with what we do in WAL.  So I'd recommend
journalling metadata only, if that option is available (and if Postgres
stuff is the only stuff on the disk...)
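On ext3 specifically, metadata-only journalling corresponds to the data=writeback mount option; a hypothetical /etc/fstab line (the device and mount point below are examples only):

```
# journal metadata but not file contents -- WAL already protects the data
/dev/sda3   /var/lib/pgsql   ext3   defaults,data=writeback   1 2
```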

            regards, tom lane

Re: Performance Tuning

From
"scott.marlowe"
Date:
On Fri, 8 Aug 2003, Andrew Sullivan wrote:

> On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:
> >
> > Redhat puts ext3 on by default. Consider switching to a non-journaling FS
> > (ext2?) with the partition that holds your data and WAL.
>
> I would give you exactly the opposite advice: _never_ use a
> non-journalling fs for your data and WAL.  I suppose if you can
> afford to lose some transactions, you can do without journalling.
> Otherwise, you're just borrowing trouble, near as I can tell.

I'd argue that a reliable filesystem (ext2) is still better than a
questionable journaling filesystem (ext3 on kernels <2.4.20).

This isn't to say don't use journaling, but I would definitely test it
under load first to make sure it's not going to lose data or get corrupted.


Re: Performance Tuning

From
Andrew Sullivan
Date:
On Mon, Aug 11, 2003 at 08:47:07AM -0600, scott.marlowe wrote:
> This isn't to say don't use journaling, but I would definitely test it
> under load first to make sure it's not going to lose data or get corrupted.

Well, yeah.  But given the Linux propensity for introducing major
features in "minor" releases (and thereby introducing all the
attendant bugs), I'd think twice about using _any_ Linux feature
until it's been through a major version (e.g. things introduced in
2.4.x won't really be stable until 2.6.x) -- and even there one is
taking a risk[1].

A

[1] My laptop's PCMCIA network card recently stopped working during a
"minor" version upgrade, even though it's almost 6 years old.
Someone decided that "cleaning up" the code required a complete
redesign, and so all the bugs that had been shaken out during the 2.2
series will now be reimplemented in a new and interesting way.  Sigh.

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Performance Tuning

From
Bruce Momjian
Date:
scott.marlowe wrote:
> On Fri, 8 Aug 2003, Andrew Sullivan wrote:
>
> > On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:
> > >
> > > Redhat puts ext3 on by default. Consider switching to a non-journaling FS
> > > (ext2?) with the partition that holds your data and WAL.
> >
> > I would give you exactly the opposite advice: _never_ use a
> > non-journalling fs for your data and WAL.  I suppose if you can
> > afford to lose some transactions, you can do without journalling.
> > Otherwise, you're just borrowing trouble, near as I can tell.
>
> I'd argue that a reliable filesystem (ext2) is still better than a
> questionable journaling filesystem (ext3 on kernels <2.4.20).
>
> > This isn't to say don't use journaling, but I would definitely test it
> > under load first to make sure it's not going to lose data or get corrupted.

That _would_ work if ext2 were a reliable file system --- it is not.

This is the problem with Linux file systems --- you have a choice of
unreliable or journalled, with nothing in between, except using a
journalling file system and having it journal only metadata.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Performance Tuning

From
Reece Hart
Date:
On Mon, 2003-08-11 at 15:16, Bruce Momjian wrote:
> That _would_ work if ext2 was a reliable file system --- it is not.

Bruce-

I'd like to know your evidence for this. I'm not refuting it, but I'm a >7 year linux user (including several clusters, all of which have run ext2 or ext3) and keep a fairly close ear to kernel newsgroups, announcements, and changelogs. I am aware that there have very occasionally been corruption problems, but my understanding is that these are fixed (and quickly). In any case, I'd say that your assertion is not widely known and I'd appreciate some data or references.

As for PostgreSQL on ext2 and ext3, I recently switched from ext3 to ext2 (Stephen Tweedie was insightful in facilitating this backward compatibility). I did this because I had a 45M row update on one table that was taking inordinate time (killed after 10 hours), even though creating the database from backup takes ~4 hours including indexing (see pgsql-perform post on 2003/07/22). CPU usage was ~2% on an otherwise unloaded, fast, SCSI160 machine. vmstat io suggested that PostgreSQL was writing something on the order of 100x as many blocks as were being read. My untested interpretation was that the update bookkeeping as well as the data update were all getting journalled, the journal space would fill, get sync'd, then repeat. In effect, all blocks were being written TWICE just for the journalling, never mind the overhead for PostgreSQL transactions. This suggests that journals work best with short burst writes and syncing during lulls rather than sustained writes.

I ended up solving the update issue without really updating, so ext2 timings aren't known. So, you may want to test this yourself if you're concerned.

-Reece

-- 
Reece Hart, Ph.D.                       rkh@gene.com, http://www.gene.com/
Genentech, Inc.                         650/225-6133 (voice), -5389 (fax)
Bioinformatics and Protein Engineering
1 DNA Way, MS-93                        http://www.in-machina.com/~reece/
South San Francisco, CA  94080-4990     reece@in-machina.com, GPG: 0x25EC91A0

Re: Performance Tuning

From
Bruce Momjian
Date:
Uh, the ext2 developers say it isn't 100% reliable --- at least that is
what I was told.  I don't know any of them personally, but I mentioned it
while I was visiting Red Hat, and they didn't refute it.

Now, the failure window might be quite small, but I have seen it happen
myself, and have heard it from others.

---------------------------------------------------------------------------

Reece Hart wrote:
> On Mon, 2003-08-11 at 15:16, Bruce Momjian wrote:
>
> > That _would_ work if ext2 was a reliable file system --- it is not.
>
>
> Bruce-
>
> I'd like to know your evidence for this. I'm not refuting it, but I'm a
> >7 year linux user (including several clusters, all of which have run
> ext2 or ext3) and keep a fairly close ear to kernel newsgroups,
> announcements, and changelogs. I am aware that there have very
> occasionally been corruption problems, but my understanding is that
> these are fixed (and quickly). In any case, I'd say that your assertion
> is not widely known and I'd appreciate some data or references.
>
> As for PostgreSQL on ext2 and ext3, I recently switched from ext3 to
> ext2 (Stephen Tweedy was insightful to facilitate this backward
> compatibility). I did this because I had a 45M row update on one table
> that was taking inordinate time (killed after 10 hours), even though
> creating the database from backup takes ~4 hours including indexing (see
> pgsql-perform post on 2003/07/22). CPU usage was ~2% on an otherwise
> unloaded, fast, SCSI160 machine. vmstat io suggested that PostgreSQL was
> writing something on the order of 100x as many blocks as being read. My
> untested interpretation was that the update bookkeeping as well as data
> update were all getting journalled, the journal space would fill, get
> sync'd, then repeat. In effect, all blocks were being written TWICE just
> for the journalling, never mind the overhead for PostgreSQL
> transactions. This emphasizes that journals probably work best with
> short burst writes and syncing during lulls rather than sustained
> writes.
>
> I ended up solving the update issue without really updating, so ext2
> timings aren't known. So, you may want to test this yourself if you're
> concerned.
>
> -Reece
>
>
> --
> Reece Hart, Ph.D.                       rkh@gene.com, http://www.gene.com/
> Genentech, Inc.                         650/225-6133 (voice), -5389 (fax)
> Bioinformatics and Protein Engineering
> 1 DNA Way, MS-93                        http://www.in-machina.com/~reece/
> South San Francisco, CA  94080-4990     reece@in-machina.com, GPG: 0x25EC91A0

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Performance Tuning

From
"Christopher Kings-Lynne"
Date:
> Well, yeah.  But given the Linux propensity for introducing major
> features in "minor" releases (and thereby introducing all the
> attendant bugs), I'd think twice about using _any_ Linux feature
> until it's been through a major version (e.g. things introduced in
> 2.4.x won't really be stable until 2.6.x) -- and even there one is
> taking a risk[1].

Dudes, seriously - switch to FreeBSD :P

Chris


Re: Performance Tuning

From
Neil Conway
Date:
On Mon, Aug 11, 2003 at 06:59:30PM -0400, Bruce Momjian wrote:
> Uh, the ext2 developers say it isn't 100% reliable --- at least that is
> what I was told.  I don't know any of them personally, but I mentioned it
> while I was visiting Red Hat, and they didn't refute it.

IMHO, if we're going to say "don't use X on production PostgreSQL
systems", we need to have some better evidence than "no one has
said anything to the contrary, and I heard X is bad". If we can't
produce such evidence, we shouldn't say anything at all, and users
can decide what to use for themselves.

(Not that I'm agreeing or disagreeing about ext2 in particular...)

> > My
> > untested interpretation was that the update bookkeeping as well as data
> > update were all getting journalled, the journal space would fill, get
> > sync'd, then repeat. In effect, all blocks were being written TWICE just
> > for the journalling, never mind the overhead for PostgreSQL
> > transactions.

Journalling may or may not have been the culprit, but I doubt everything
was being written to disk twice:

(a) ext3 does metadata-only journalling by default

(b) PostgreSQL only fsyncs WAL records to disk, not the data itself

-Neil


Re: Performance Tuning

From
Ron Johnson
Date:
On Mon, 2003-08-11 at 19:50, Christopher Kings-Lynne wrote:
> > Well, yeah.  But given the Linux propensity for introducing major
> > features in "minor" releases (and thereby introducing all the
> > attendant bugs), I'd think twice about using _any_ Linux feature
> > until it's been through a major version (e.g. things introduced in
> > 2.4.x won't really be stable until 2.6.x) -- and even there one is
> > taking a risk[1].
>
> Dudes, seriously - switch to FreeBSD :P

But, like, we want a *good* OS... 8-0

--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net           |
| Jefferson, LA  USA                                            |
|                                                               |
| "Man, I'm pretty.  Hoo Hah!"                                  |
|    Johnny Bravo                                               |
+---------------------------------------------------------------+



Re: Performance Tuning

From
Bruce Momjian
Date:
Neil Conway wrote:
> On Mon, Aug 11, 2003 at 06:59:30PM -0400, Bruce Momjian wrote:
> > Uh, the ext2 developers say it isn't 100% reliable --- at least that is
> > what I was told.  I don't know any of them personally, but I mentioned it
> > while I was visiting Red Hat, and they didn't refute it.
>
> IMHO, if we're going to say "don't use X on production PostgreSQL
> systems", we need to have some better evidence than "no one has
> said anything to the contrary, and I heard X is bad". If we can't
> produce such evidence, we shouldn't say anything at all, and users
> can decide what to use for themselves.
>
> (Not that I'm agreeing or disagreeing about ext2 in particular...)

I don't use Linux and was just repeating what I had heard from others,
and read in postings.  I don't have any first-hand experience with ext2
(except for a laptop I borrowed that wouldn't boot after being shut
off), but others on this mailing list have said the same thing.

Here is another email talking about corrupting ext2 file systems:


http://groups.google.com/groups?q=ext2+corrupt+%22power+failure%22&start=10&hl=en&lr=&ie=UTF-8&selm=20021128061318.GE18980%40ursine&rnum=11

From his wording, I assume he is not talking about fsck-correctable
corruption.

From what I remember, the ext2 failure cases were quite small, but known
by the ext2 developers, and considered too large a performance hit to
correct.

> > > My
> > > untested interpretation was that the update bookkeeping as well as data
> > > update were all getting journalled, the journal space would fill, get
> > > sync'd, then repeat. In effect, all blocks were being written TWICE just
> > > for the journalling, never mind the overhead for PostgreSQL
> > > transactions.
>
> Journalling may or may not have been the culprit, but I doubt everything
> was being written to disk twice:
>
> (a) ext3 does metadata-only journalling by default

If that is true, why was I told people have to mount their ext3 file
systems with metadata-only journalling?  Again, I have no experience myself,
but why are people telling me this?

> (b) PostgreSQL only fsyncs WAL records to disk, not the data itself

Right.  WAL recovers the data.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Performance Tuning

From
Neil Conway
Date:
On Tue, Aug 12, 2003 at 12:52:46AM -0400, Bruce Momjian wrote:
> I don't use Linux and was just repeating what I had heard from others,
> and read in postings.  I don't have any first-hand experience with ext2
> (except for a laptop I borrowed that wouldn't boot after being shut
> off), but others on this mailing list have said the same thing.

Right, and I understand the need to answer users asking about
which filesystem to use, but I'd be cautious of bad-mouthing
another OSS project without any hard evidence to back up our
claim (of course if we have such evidence, then fine -- I
just haven't seen it). It would be like $SOME_LARGE_OSS
project saying "Don't use our project with PostgreSQL, as
foo@bar.org had data corruption with PostgreSQL 6.3 on
UnixWare" -- kind of annoying, right?

> > (a) ext3 does metadata-only journalling by default
>
> If that is true, why was I told people have to mount their ext3 file
> systems with metadata-only.  Again, I have no experience myself, but why
> are people telling me this?

Perhaps they were suggesting that people mount ext3 using
data=writeback, rather than the default of data=ordered.

BTW, I've heard from a couple of different people that using
ext3 with data=journal (i.e. enabling journalling of both
data and metadata) actually makes PostgreSQL faster, as
it means that ext3 can skip PostgreSQL's fsync request
since ext3's log is flushed to disk already. I haven't
tested this myself, however.
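For reference, the three ext3 journalling modes under discussion are chosen at mount time; a sketch of the options (verify against your kernel's documentation before relying on them):

```
# ext3 mount options, e.g. in /etc/fstab or via mount -o:
#   data=writeback   journal metadata only (fastest, weakest guarantees)
#   data=ordered     the default: metadata journalled, data flushed first
#   data=journal     journal both data and metadata (the mode reported
#                    above to interact well with PostgreSQL's fsync)
```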

-Neil


Re: Performance Tuning

From
"scott.marlowe"
Date:
On Tue, 12 Aug 2003, Christopher Kings-Lynne wrote:

> > Well, yeah.  But given the Linux propensity for introducing major
> > features in "minor" releases (and thereby introducing all the
> > attendant bugs), I'd think twice about using _any_ Linux feature
> > until it's been through a major version (e.g. things introduced in
> > 2.4.x won't really be stable until 2.6.x) -- and even there one is
> > taking a risk[1].
>
> Dudes, seriously - switch to FreeBSD :P

Yeah, it's nice to have a BUG FREE OS huh? ;^)

And yes, I've used FreeBSD, it's quite good, but I kept getting the
feeling it wasn't quite done.  Especially the installation documentation.


Re: Performance Tuning

From
"scott.marlowe"
Date:
On Mon, 11 Aug 2003, Bruce Momjian wrote:

> scott.marlowe wrote:
> > On Fri, 8 Aug 2003, Andrew Sullivan wrote:
> >
> > > On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:
> > > >
> > > > Redhat puts ext3 on by default. Consider switching to a non-journaling FS
> > > > (ext2?) with the partition that holds your data and WAL.
> > >
> > > I would give you exactly the opposite advice: _never_ use a
> > > non-journalling fs for your data and WAL.  I suppose if you can
> > > afford to lose some transactions, you can do without journalling.
> > > Otherwise, you're just borrowing trouble, near as I can tell.
> >
> > I'd argue that a reliable filesystem (ext2) is still better than a
> > questionable journaling filesystem (ext3 on kernels <2.4.20).
> >
> > This isn't to say don't use journaling, but I would definitely test it
> > under load first to make sure it's not going to lose data or get corrupted.
>
> That _would_ work if ext2 was a reliable file system --- it is not.
>
> This is the problem with Linux file systems --- you have a choice of
> unreliable or journalled, with nothing in between, except using a
> journalling file system and having it journal only metadata.

Nevertheless, on Linux, which is what we use, ext2 is by far more reliable
than ext3 or reiserfs.  In four years of use I've lost zero files to any
of its bugs.  Of course, maybe that's Red Hat patching the kernel for me or
something. :-)  They seem to hire some pretty good hackers.


Re: Performance Tuning

From
"scott.marlowe"
Date:
On Tue, 12 Aug 2003, Neil Conway wrote:

> On Tue, Aug 12, 2003 at 12:52:46AM -0400, Bruce Momjian wrote:
> > I don't use Linux and was just repeating what I had heard from others,
> > and read in postings.  I don't have any first-hand experience with ext2
> > (except for a laptop I borrowed that wouldn't boot after being shut
> > off), but others on this mailing list have said the same thing.
>
> Right, and I understand the need to answer users asking about
> which filesystem to use, but I'd be cautious of bad-mouthing
> another OSS project without any hard evidence to back up our
> claim (of course if we have such evidence, then fine -- I
> just haven't seen it). It would be like $SOME_LARGE_OSS
> project saying "Don't use our project with PostgreSQL, as
> foo@bar.org had data corruption with PostgreSQL 6.3 on
> UnixWare" -- kind of annoying, right?

Wow, you put my thoughts exactly into words for me, thanks Neil.

> > > (a) ext3 does metadata-only journalling by default
> >
> > If that is true, why was I told people have to mount their ext3 file
> > systems with metadata-only.  Again, I have no experience myself, but why
> > are people telling me this?
>
> Perhaps they were suggesting that people mount ext3 using
> data=writeback, rather than the default of data=ordered.
>
> BTW, I've heard from a couple of different people that using
> ext3 with data=journal (i.e. enabling journalling of both
> data and metadata) actually makes PostgreSQL faster, as
> it means that ext3 can skip PostgreSQL's fsync request
> since ext3's log is flushed to disk already. I haven't
> tested this myself, however.

Now that you mention it, that makes sense.  I might have to test ext3 now
that the 2.6 kernel is on the way, i.e. the 2.4 kernel should be settling
down by now.


Re: Performance Tuning

From
"scott.marlowe"
Date:
On 11 Aug 2003, Ron Johnson wrote:

> On Mon, 2003-08-11 at 19:50, Christopher Kings-Lynne wrote:
> > > Well, yeah.  But given the Linux propensity for introducing major
> > > features in "minor" releases (and thereby introducing all the
> > > attendant bugs), I'd think twice about using _any_ Linux feature
> > > until it's been through a major version (e.g. things introduced in
> > > 2.4.x won't really be stable until 2.6.x) -- and even there one is
> > > taking a risk[1].
> >
> > Dudes, seriously - switch to FreeBSD :P
>
> But, like, we want a *good* OS... 8-0

What, like Unixware?  (ducking quickly) (*_*)


Re: Performance Tuning

From
"Shridhar Daithankar"
Date:
On 11 Aug 2003 at 23:42, Ron Johnson wrote:

> On Mon, 2003-08-11 at 19:50, Christopher Kings-Lynne wrote:
> > > Well, yeah.  But given the Linux propensity for introducing major
> > > features in "minor" releases (and thereby introducing all the
> > > attendant bugs), I'd think twice about using _any_ Linux feature
> > > until it's been through a major version (e.g. things introduced in
> > > 2.4.x won't really be stable until 2.6.x) -- and even there one is
> > > taking a risk[1].
> >
> > Dudes, seriously - switch to FreeBSD :P
>
> But, like, we want a *good* OS... 8-0

Joke aside, since postgresql is pretty much reliant on the file system for
basic file functionality, I guess it's time to test Linux 2.6 and compare it.

And don't forget, for large databases, there is still XFS out there, which is
probably the ruler at the upper end..

Bye
 Shridhar

--
Unfair animal names: tsetse fly, bullhead, booby, duck-billed platypus,
sapsucker, Clarence        -- Gary Larson


Re: Perfomance Tuning

From
mixo
Date:
Thanks to everyone who responded. It's a pity that the discussion has gone
the ext2 vs ext3 route. The main reason I asked my original question is
that I am currently importing data into Pg which is about 2.9 Gigs.
Unfortunately, to maintain data integrity, data is inserted into a table
one row at a time. This exercise took ~7 days on the same system with a
slightly different setup (PIII 1.0GHz, 512M RAM -- CPU speed was downgraded
due to several overheating problems which have since been fixed, and RAM
was added for good measure). I have just reloaded the machine and started
the import. So far ~6000 records have been imported, and there are 32000
left.

P.S. Importing the same data into MySQL took ~2 days.

Bjoern Metzdorf wrote:

>>be able to handle at least 8M at a time. The machine has
>>two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
>>a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
>>What would be the recomended setup for good performance
>>considering that the db will have about 15 users for
>>9 hours in a day, and about 10 or so users throughout the day
>>who wont be conistenly using the db.
>>
>>
>
>For 15 users you won't need great tuning at all. Just make sure, that you
>have the right indizes on the tables and that you have good queries (query
>plan).
>
>About the 8Meg blobs, I don't know. Other people on this list may be able to
>give you hints here.
>
>Regards,
>Bjoern
>
>



Re: Perfomance Tuning

From
Bruce Momjian
Date:
OK, I got some hard evidence.  Here is a discussion on the Linux kernel
mailing list with postings from Alan Cox (ac Linux kernels) and Stephen
Tweedie (ext3 author).

    http://www.tux.org/hypermail/linux-kernel/1999week14/subject.html#start

Search for "softupdates and ext2".

Here is the original email in the thread:

    http://www.tux.org/hypermail/linux-kernel/1999week14/0498.html

Summary is at:

    http://www.tux.org/hypermail/linux-kernel/1999week14/0571.html

and conclusion in:

    http://www.tux.org/hypermail/linux-kernel/1999week14/0504.html

I now remember the issue --- ext2 makes all disk changes asynchronously
(unless you mount it with the sync option, which is slow).  This means that the file
system isn't always consistent on disk.

UFS has always synced metadata (file/directory creation) to the disk, so
the disk was always consistent, but it doesn't sync the data to the disk,
for performance reasons.  With soft updates, the metadata writes are
delayed, and written to disk in an order that keeps the file system
consistent.

Is this enough evidence, or should I keep researching?

---------------------------------------------------------------------------

Neil Conway wrote:
> On Tue, Aug 12, 2003 at 12:52:46AM -0400, Bruce Momjian wrote:
> > I don't use Linux and was just repeating what I had heard from others,
> > and read in postings.  I don't have any first-hand experience with ext2
> > (except for a laptop I borrowed that wouldn't boot after being shut
> > off), but others on this mailing list have said the same thing.
>
> Right, and I understand the need to answer users asking about
> which filesystem to use, but I'd be cautious of bad-mouthing
> another OSS project without any hard evidence to back up our
> claim (of course if we have such evidence, then fine -- I
> just haven't seen it). It would be like $SOME_LARGE_OSS
> project saying "Don't use our project with PostgreSQL, as
> foo@bar.org had data corruption with PostgreSQL 6.3 on
> UnixWare" -- kind of annoying, right?
>
> > > (a) ext3 does metadata-only journalling by default
> >
> > If that is true, why was I told people have to mount their ext3 file
> > systems with metadata-only.  Again, I have no experience myself, but why
> > are people telling me this?
>
> Perhaps they were suggesting that people mount ext2 using
> data=writeback, rather than the default of data=ordered.
>
> BTW, I've heard from a couple different people that using
> ext3 with data=journalled (i.e. enabling journalling of both
> data and metadata) actually makes PostgreSQL faster, as
> it means that ext3 can skip PostgreSQL's fsync request
> since ext3's log is flushed to disk already. I haven't
> tested this myself, however.
>
> -Neil
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Perfomance Tuning

From
Bill Moran
Date:
Shridhar Daithankar wrote:
> On 11 Aug 2003 at 23:42, Ron Johnson wrote:
>
>
>>On Mon, 2003-08-11 at 19:50, Christopher Kings-Lynne wrote:
>>
>>>>Well, yeah.  But given the Linux propensity for introducing major
>>>>features in "minor" releases (and thereby introducing all the
>>>>attendant bugs), I'd think twice about using _any_ Linux feature
>>>>until it's been through a major version (e.g. things introduced in
>>>>2.4.x won't really be stable until 2.6.x) -- and even there one is
>>>>taking a risk[1].
>>>
>>>Dudes, seriously - switch to FreeBSD :P
>>
>>But, like, we want a *good* OS... 8-0
>
>
> Joke aside, I guess since postgresql is pretty much reliant on file system for
> basic file functionality, I guess it's time to test Linux 2.6 and compare it.
>
> And don't forget, for large databases, there is still XFS out there which is
> probably the ruler at upper end..

This is going to push the whole thing a little off-topic, but I'm curious to
know the answer.

Has it ever been proposed or attempted to run PostgreSQL without any filesystem
(or any other database for that matter ...)?

Meaning ... just tell it a raw partition to keep the data on, and Postgres would
create its own "filesystem" ... obviously, doing that would allow Postgres to
bypass all the failings of all filesystems and rely entirely upon its own
rules.

Or are modern filesystems advanced enough that doing something like that would
lose more than it would gain?

Just thinking out loud.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: Perfomance Tuning

From
Andrew Sullivan
Date:
On Tue, Aug 12, 2003 at 02:39:19PM -0400, Bill Moran wrote:
> Meaning ... just tell it a raw partition to keep the data on and
> Postgre would create its own "filesystem" ... obviously, doing that
> would allow Postgre to bypass all the failings of all filesystems
> and rely entirely apon its own rules.
>
> Or are modern filesystems advanced enough that doing something like
> that would lose more than it would gain?

The latter, mostly.  This has been debated repeatedly on -hackers.
If you want "raw" access, then you have to implement some other kind
of specialised filesystem of your own.  And you have to have all
sorts of nice tools to cope with the things that (for instance) fsck
handles.  I think the reaction of most developers has been, "Why
reinvent the wheel?"

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Perfomance Tuning

From
"Gregory S. Williamson"
Date:
FWIW, Informix can be run using a "cooked" (Unix) file for storing data, or it uses "raw" disk space and bypasses the
ordinary (high level) UNIX controllers and does its own reads/writes. About 10 times faster and safer. Of course, it may
have taken a lot of programmer time to make that solid. But the performance gains are significant.

Greg W.


-----Original Message-----
From:    Bill Moran [mailto:wmoran@potentialtech.com]
Sent:    Tue 8/12/2003 11:39 AM
To:
Cc:    PgSQL Performance ML
Subject:    Re: [PERFORM] Perfomance Tuning

Shridhar Daithankar wrote:
> On 11 Aug 2003 at 23:42, Ron Johnson wrote:
>
>
>>On Mon, 2003-08-11 at 19:50, Christopher Kings-Lynne wrote:
>>
>>>>Well, yeah.  But given the Linux propensity for introducing major
>>>>features in "minor" releases (and thereby introducing all the
>>>>attendant bugs), I'd think twice about using _any_ Linux feature
>>>>until it's been through a major version (e.g. things introduced in
>>>>2.4.x won't really be stable until 2.6.x) -- and even there one is
>>>>taking a risk[1].
>>>
>>>Dudes, seriously - switch to FreeBSD :P
>>
>>But, like, we want a *good* OS... 8-0
>
>
> Joke aside, I guess since postgresql is pretty much reliant on file system for
> basic file functionality, I guess it's time to test Linux 2.6 and compare it.
>
> And don't forget, for large databases, there is still XFS out there which is
> probably the ruler at upper end..

This is going to push the whole thing a little off-topic, but I'm curious to
know the answer.

Has it ever been proposed or attemped to run PostgreSQL without any filesystem
(or any other database for that matter ...).

Meaning ... just tell it a raw partition to keep the data on and Postgre would
create its own "filesystem" ... obviously, doing that would allow Postgre to
bypass all the failings of all filesystems and rely entirely apon its own
rules.

Or are modern filesystems advanced enough that doing something like that would
lose more than it would gain?

Just thinking out loud.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com






Re: Perfomance Tuning

From
Sean Chittenden
Date:
> > > Well, yeah.  But given the Linux propensity for introducing major
> > > features in "minor" releases (and thereby introducing all the
> > > attendant bugs), I'd think twice about using _any_ Linux feature
> > > until it's been through a major version (e.g. things introduced in
> > > 2.4.x won't really be stable until 2.6.x) -- and even there one is
> > > taking a risk[1].
> >
> > Dudes, seriously - switch to FreeBSD :P
>
> Yeah, it's nice to have a BUG FREE OS huh? ;^)
>
> And yes, I've used FreeBSD, it's quite good, but I kept getting the
> feeling it wasn't quite done.  Especially the installation
> documentation.

While the handbook isn't the same as reading the actual source, and isn't the
only FreeBSD documentation, it certainly is quite good (to the point
that publishers see a small market for FreeBSD books because the
documentation provided by the project is so good), IMHO.

http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/

If anyone on this list has any issues with the documentation, please
take them up with me _privately_ and I will do my best to either
address or correct the problem.

Now, back to our regularly scheduled and on topic programming...  -sc

--
Sean Chittenden
"(PostgreSQL|FreeBSD).org - The Power To Serve"

Re: Filesystems WAS: Perfomance Tuning

From
Josh Berkus
Date:
Greg,

> FWIW, Informix can be run using a "cooked" (Unix) file for storing data or
> it uses "raw" disk space and bypasses the ordinary (high level) UNIX
> controllers and does its own reads/writes. About 10 times faster and safer.
> Of course, itmay have taken a lot of programmer time to make that solid.
> But the performance gains are significant.

Yes, but it's still slower than PostgreSQL on medium-end hardware.  ;-)

This idea has been discussed numerous times on the HACKERS list, and is a
(pretty much) closed issue.   While Oracle and SQL Server use their own
filesystems, PostgreSQL will not because:

1) It would interfere with our cross-platform compatibility.  PostgreSQL runs
on something like 20 OSes.

2) The filesystem projects out there are (mostly) well-staffed and are
constantly advancing using specialized technology and theory.  There's no way
that the PostgreSQL team can do a better job in our "spare time".

3) Development of our "own" filesystem would then require PostgreSQL to create
and maintain a whole hardware compatibility library, and troubleshoot
problems on exotic hardware and weird RAID configurations.

4) A database FS also often causes side-effect problems; for example, one
cannot move or copy a SQL Server partition without destroying it.

Of course, that could all change if some corp with deep pockets steps in and
decides to create a "postgresFS" and funds and staffs the effort 100%.  But
it's unlikely to be a priority for the existing development team any time in
the foreseeable future.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Filesystems WAS: Perfomance Tuning

From
Richard Welty
Date:
On Tue, 12 Aug 2003 13:09:42 -0700 Josh Berkus <josh@agliodbs.com> wrote:
> This idea has been discussed numerous times on the HACKERS list, and is
> a
> (pretty much) closed issue.   While Oracle and SQL Server use their own
> filesystems, PostgreSQL will not because:
...
> 2) The filesystem projects out there are (mostly) well-staffed and are
> constantly advancing using specialized technology and theory.  There's
> no way
> that the PostgreSQL team can do a better job in our "spare time".

i consider this a fair answer, but i have a slightly different question to
ask, inspired by my discussions with a good friend who is a top notch
Informix DBA.

there are advantages to being able to split the database across a slew of
disk drives. if we accept the notion of using the native OS filesystem on
each, it would seem that being able to direct various tables and indices to
specific drives might be a valuable capability. i know that i could go into
/var/lib/pgsql/data/base and fan the contents out, but this is unwieldy and
impractical. has any consideration been given to providing a way to manage
such a deployment?
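[For what it's worth, the manual fan-out amounts to moving a relation's file to another disk and leaving a symlink behind, with the postmaster stopped. Here is a sketch simulated on throwaway /tmp paths; on a real 7.x cluster the file would live at $PGDATA/base/<database-oid>/<table-oid>, and all the names and OIDs below are made up:

```shell
# Simulation only: fake paths stand in for a stopped cluster's data directory.
base=/tmp/fake_pgdata/base/16384      # fake per-database directory
disk2=/tmp/fake_disk2                 # fake second spindle
mkdir -p "$base" "$disk2"
touch "$base/16743"                   # stand-in for a table's data file
mv "$base/16743" "$disk2/16743"       # relocate the file to the other disk
ln -s "$disk2/16743" "$base/16743"    # symlink back so the server still finds it
```

This works, but every such symlink is invisible to the database and has to be tracked by hand, which is exactly why it is unwieldy.]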

or is it the judgement of the hackers community that a monstrous raid-10
array offers comparable performance?

i forget how large the data store on my friend's current project is, but
i'll check. knowing the size and transaction rate he's dealing with might
put a finer point on this discussion.

richard
--
Richard Welty                                         rwelty@averillpark.net
Averill Park Networking                                         518-573-7592
    Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security

Re: Filesystems WAS: Perfomance Tuning

From
Andrew Sullivan
Date:
On Tue, Aug 12, 2003 at 04:31:19PM -0400, Richard Welty wrote:
> impractical. has any consideration been given to providing a way to manage
> such a deployment?

Plenty.  No-one's completed an implementation yet.

> or is it the judgement of the hackers community that a monsterous raid-10
> array offers comparable performance?

It's tough to say, but I _can_ tell you that, so far in my tests,
I've never been able to prove an advantage in separating even the WAL
on a Sun A5200.  That's not a result yet, of course, just a bit of
gossip.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Filesystems WAS: Perfomance Tuning

From
Rod Taylor
Date:
> specific drives might be a valuble capability. i know that i could go into
> /var/lib/pgsql/data/base and fan the contents out, but this is unweildy and
> impractical. has any consideration been given to providing a way to manage
> such a deployment?

The ability to take various database objects and store them in different
locations, sometimes referred to as table spaces, will probably be done
in the future.  There was a substantial group not all that long ago that
was organizing to complete the implementation.

> or is it the judgement of the hackers community that a monsterous raid-10
> array offers comparable performance?

Often performs well enough... But a raid-10 for data, a stripe for
indexes, and a mirror for WAL will offer better performance :)


Re: Filesystems WAS: Perfomance Tuning

From
"Christopher Kings-Lynne"
Date:
> there are advantages to being able to split the database across a slew of
> disk drives. if we accept the notion of using the native OS filesystem on
> each, it would seem that being able to direct various tables and indices
to
> specific drives might be a valuble capability. i know that i could go into
> /var/lib/pgsql/data/base and fan the contents out, but this is unweildy
and
> impractical. has any consideration been given to providing a way to manage
> such a deployment?

We've got a little bunch of us tinkering with a tablespace implementation.
However, it's been stalled for a while now.

Chris


Re: Filesystems WAS: Perfomance Tuning

From
Richard Welty
Date:
On Wed, 13 Aug 2003 09:48:18 +0800 Christopher Kings-Lynne <chriskl@familyhealth.com.au> wrote:
> We've got a little bunch of us tinkering with a tablespace
> implementation.
> However, it's been staller for a while now.

interesting. i'm involved in the very early stages of a startup that is
likely to do a prototype using Java and PostgreSQL.

tablespace and replication are issues that would weigh heavily in a
decision to stick with PostgreSQL after the prototype.

richard
--
Richard Welty                                         rwelty@averillpark.net
Averill Park Networking                                         518-573-7592
    Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security

Re: Filesystems WAS: Perfomance Tuning

From
Bruce Momjian
Date:
I think Gavin Sherry is working on this.  I am CC'ing him.

---------------------------------------------------------------------------

Christopher Kings-Lynne wrote:
> > there are advantages to being able to split the database across a slew of
> > disk drives. if we accept the notion of using the native OS filesystem on
> > each, it would seem that being able to direct various tables and indices
> to
> > specific drives might be a valuble capability. i know that i could go into
> > /var/lib/pgsql/data/base and fan the contents out, but this is unweildy
> and
> > impractical. has any consideration been given to providing a way to manage
> > such a deployment?
>
> We've got a little bunch of us tinkering with a tablespace implementation.
> However, it's been staller for a while now.
>
> Chris
>
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Filesystems WAS: Perfomance Tuning

From
Gavin Sherry
Date:
On Tue, 12 Aug 2003, Bruce Momjian wrote:

>
> I think Gavin Sherry is working on this.  I am CC'ing him.
>
> ---------------------------------------------------------------------------

Yes, I am working on this. I am about 50% of the way through the patch but
have been held up with other work. For those who are interested, it
basically allows:

1) creation of different 'storage' locations. Tables and indexes can be
created in different storage locations. Storage locations can also be
assigned to schemas and databases. Tables and indexes will default to the
schema storage location if STORAGE 'store name' is not provided to CREATE
.... This will cascade to the default database storage location if
the schema was not created with STORAGE 'store name'.

2) the patch will allow different storage locations to have different
rand_cost parameters passed to the planner.

3) the patch *will not* address issues concerning quotas, resource
management, WAL/clog, temp or sort spaces.

Will keep everyone posted if/when I finish.

Thanks,

Gavin


Re: Perfomance Tuning

From
Ron Johnson
Date:
On Tue, 2003-08-12 at 13:39, Bruce Momjian wrote:
> OK, I got some hard evidence.  Here is a discussion on the Linux kernel
> mailing list with postings from Allen Cox (ac Linux kernels) and Stephen
> Tweedie (ext3 author).
>
>     http://www.tux.org/hypermail/linux-kernel/1999week14/subject.html#start
>
> Search for "softupdates and ext2".
>
> Here is the original email in the thread:
>
>     http://www.tux.org/hypermail/linux-kernel/1999week14/0498.html
>
> Summary is at:
>
>     http://www.tux.org/hypermail/linux-kernel/1999week14/0571.html
>
> and conclusion in:
>
>     http://www.tux.org/hypermail/linux-kernel/1999week14/0504.html
>
> I now remember the issue --- ext2 makes all disk changes asynchonously
> (unless you mount it via sync, which is slow).  This means that the file
> system isn't always consistent on disk.
>
> UFS has always sync metadata (file/directory creation) to the disk so
> the disk was always consistent, but doesn't sync the data to the disk,
> for performance reasons.  With soft updates, the metadata writes are
> delayed, and written to disk in an order that keeps the file system
> consistent.
>
> Is this enough evidence, or should I keep researching?

This is all 4 years old, though.  Isn't that why the ext3 "layer" was
created, and filesystems like reiserFS, XFS and (kinda) JFS were added
to Linux?

> ---------------------------------------------------------------------------
>
> Neil Conway wrote:
> > On Tue, Aug 12, 2003 at 12:52:46AM -0400, Bruce Momjian wrote:
> > > I don't use Linux and was just repeating what I had heard from others,
> > > and read in postings.  I don't have any first-hand experience with ext2
> > > (except for a laptop I borrowed that wouldn't boot after being shut
> > > off), but others on this mailing list have said the same thing.
> >
> > Right, and I understand the need to answer users asking about
> > which filesystem to use, but I'd be cautious of bad-mouthing
> > another OSS project without any hard evidence to back up our
> > claim (of course if we have such evidence, then fine -- I
> > just haven't seen it). It would be like $SOME_LARGE_OSS
> > project saying "Don't use our project with PostgreSQL, as
> > foo@bar.org had data corruption with PostgreSQL 6.3 on
> > UnixWare" -- kind of annoying, right?
> >
> > > > (a) ext3 does metadata-only journalling by default
> > >
> > > If that is true, why was I told people have to mount their ext3 file
> > > systems with metadata-only.  Again, I have no experience myself, but why
> > > are people telling me this?
> >
> > Perhaps they were suggesting that people mount ext2 using
> > data=writeback, rather than the default of data=ordered.
> >
> > BTW, I've heard from a couple different people that using
> > ext3 with data=journalled (i.e. enabling journalling of both
> > data and metadata) actually makes PostgreSQL faster, as
> > it means that ext3 can skip PostgreSQL's fsync request
> > since ext3's log is flushed to disk already. I haven't
> > tested this myself, however.
> >
> > -Neil

--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net           |
| Jefferson, LA  USA                                            |
|                                                               |
| "Man, I'm pretty.  Hoo Hah!"                                  |
|    Johnny Bravo                                               |
+---------------------------------------------------------------+



Re: Perfomance Tuning

From
Dennis Björklund
Date:
On Tue, 12 Aug 2003, mixo wrote:

> that I am currently importing data into Pg which is about 2.9 Gigs.
> Unfortunately, to maintain data intergrity, data is inserted into a table
> one row at a time.

So you don't put a number of inserts into one transaction?

If you don't do that, then postgresql will treat each command as a
transaction, and each insert is going to be forced out on disk (returning
when the data is just in some cache is not safe, even if other products
might do that). If the server didn't do this, it would promise the client
that the row had been stored, but then the server could go down and the row
that was only in the cache would be lost. That's much faster, but not what
you expect from a real database.

So, group the inserts in transactions with maybe 1000 commands each. It
will go much faster. It can then cache the rows and in the end just make
sure all 1000 have been written out on disk.
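To make that concrete, here is a sketch that generates the batched form of the load; the table name batch_demo and database test1 are made up, and the generated file would be fed to psql:

```shell
# Wrap N generated INSERTs in one transaction: the server then forces data to
# disk once at COMMIT instead of once per row.
# Hypothetical table "batch_demo"; run the output with: psql test1 -f /tmp/batched.sql
emit_batched() {
    echo "BEGIN;"
    i=1
    while [ "$i" -le "$1" ]; do
        echo "INSERT INTO batch_demo VALUES ($i, 'row $i');"
        i=$((i + 1))
    done
    echo "COMMIT;"
}
emit_batched 1000 > /tmp/batched.sql
```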

There is also a configuration variable that can tell postgresql not to
wait until the insert is out on disk, but that is not recommended if you
value your data.

And last, why does it help integrity to insert data one row at a time?

--
/Dennis


Re: Perfomance Tuning

From
"Christopher Kings-Lynne"
Date:
> So, group the inserts in transactions with maybe 1000 commands each. It
> will go much faster. It can then cache the rows and in the end just make
> sure all 1000 have been written out on disk.

More than that, he should be using COPY - it's 10x faster than even grouped
inserts.
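[A minimal sketch of a COPY-based load, mirroring the form used elsewhere in this thread; database test1 and table batch_demo are hypothetical, and the actual psql invocation is shown as a comment:

```shell
# COPY streams all rows through one command, so there is no per-row
# transaction or round-trip overhead at all.
csv=/tmp/rows.csv
printf '1,alpha\n2,beta\n3,gamma\n' > "$csv"   # toy comma-delimited data
copy_cmd="COPY batch_demo FROM stdin DELIMITER ','"
echo "$copy_cmd"    # load it with: psql test1 -c "$copy_cmd" < "$csv"
```
]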

Chris


Re: Perfomance Tuning

From
Ron Johnson
Date:
On Wed, 2003-08-13 at 01:47, Christopher Kings-Lynne wrote:
> > So, group the inserts in transactions with maybe 1000 commands each. It
> > will go much faster. It can then cache the rows and in the end just make
> > sure all 1000 have been written out on disk.
>
> More than that, he should be using COPY - it's 10x faster than even grouped
> inserts.

I have a table which has a foreign key reference to a properly indexed
table, and needed to load 15GB of uncompressed data into that table.

Since the machine is minimal (60GB 5400RPM IDE HDD, 1GB RAM, 1GHz
Athlon), to save precious disk space, I had the data compressed into
22 files totaling 641MiB.  The records are approximately 275 bytes
in size.

Also, because date transformations needed to be made, I had to 1st
insert into a temp table, and insert from there into the main table.
Thus, in essence, I had to insert each record twice.

So, in 8:45 (that's hours:minutes, not 8 minutes 45 seconds!), it decompressed 641MiB worth of
96% compressed files, inserted 30M rows, and inserted 30M rows again,
while doing foreign key checks to another table.  And the data files
plus database are all on the same disk.

Pretty impressive: 1,920 inserts/second.

for f in ltx_*unl.gz;
do
    psql test1 -c "truncate table t_lane_tx2;" ;
    (zcat $f | sed "s/\"//g" | \
       psql test1 -c "copy t_lane_tx2 from stdin delimiter ',';");
    time psql -a -f sel_into_ltx.sql -d test1      ;
done

--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net           |
| Jefferson, LA  USA                                            |
|                                                               |
| "Man, I'm pretty.  Hoo Hah!"                                  |
|    Johnny Bravo                                               |
+---------------------------------------------------------------+



Re: Perfomance Tuning

From
Christopher Browne
Date:
Martha Stewart called it a Good Thing when gsw@globexplorer.com ("Gregory S. Williamson") wrote:
> FWIW, Informix can be run using a "cooked" (Unix) file for storing
> data or it uses "raw" disk space and bypasses the ordinary (high
> level) UNIX controllers and does its own reads/writes. About 10
> times faster and safer. Of course, itmay have taken a lot of
> programmer time to make that solid. But the performance gains are
> significant.

Are you _certain_ that's still true?  Have you a metric that shows
Informix being 10x faster on a modern system?  That would be quite
surprising...

It may have been true on '80s style UFS implementations, but a couple
of decades have passed, and pretty much any Unix system has new
selections of filesystems that probably aren't so much slower.

It could conceivably be an interesting idea to implement a
block-oriented filesystem where the granularity of files was 8K (or
some such number :-)).

Oracle seems to have done something vaguely like this...
http://otn.oracle.com/tech/linux/open_source.html

But long and short is that the guys implementing OSes have been
putting a LOT of effort into making the potential performance gains of
using "raw" partitions less and less.
--
select 'cbbrowne' || '@' || 'acm.org';
http://www.ntlug.org/~cbbrowne/sap.html
(eq? 'truth 'beauty)  ; to avoid unassigned-var error, since compiled code
                      ; will pick up previous value to var set!-ed,
                      ; the unassigned object.
-- from BBN-CL's cl-parser.scm

Re: Perfomance Tuning

From
Jeff
Date:
On Tue, 12 Aug 2003, Christopher Browne wrote:

> Are you _certain_ that's still true?  Have you a metric that shows
> Informix being 10x faster on a modern system?  That would be quite
> surprising...
>

We were forced (for budget reasons) to switch from raw disk to cooked files
on our informix db. We took a huge hit - about 5-6x slower.  Granted, part
of that was because informix takes the number of spindles, etc. into account
when generating query plans, and because running UPDATE STATISTICS (think
vacuum analyze) on the version we run locks the table exclusively. And it
is unacceptable to have our "main table" unavailable for hours and hours
while the update runs. (For the record: it's an 8cpu sun e4500 running
sol2.6.  The raw disks were on a hitachi fibre array and the cooked files
were on a raid5 (scsi) array. I forget how many spindles were in the raid.
There were 20 raw disks.)

Informix, etc. have spent a lot of time and money working on it.
They also have the advantage of having many paid fulltime
developers who are doing this for a job, not as a weekend hobby
(Compared to the what? 2-3 full time PG developers).

The other advantage (which I hinted to above) with raw disks is being able
to optimize queries to take advantage of it.  Informix is multithreaded
and it will spawn off multiple "readers" to do say, a seq scan (and merge
the results at the end).

So if you have a table across say, 3 disks and you need to do a seq scan
it will spawn three readers to do the read. Result: nice and fast (Yes, It
may not always spawn the three readers, only when it thinks it will be a
good thing to do)

I think for PG the effort would be much better spent on other features...
like replication and whatnot.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/



Re: Perfomance Tuning

From
Tom Lane
Date:
Jeff <threshar@torgo.978.org> writes:
> On Tue, 12 Aug 2003, Christopher Browne wrote:
>> Are you _certain_ that's still true?  Have you a metric that shows
>> Informix being 10x faster on a modern system?  That would be quite
>> surprising...

> We were forced (for budget reason) to switch from raw disk to cooked files
> on our informix db. We took a huge hit - about 5-6x slower.
> [snip]
> The raw disks were on a hitachi fibre array and the cooked files
> were on a raid5 (scsi). Forget how many spindles in the raid.
> There were 20 raw disks)

Seems like you can't know how much of the performance hit was due to the
filesystem change and how much to the hardware change.  But I'd bet 20
disks on fibre array have way more net throughput than a single RAID
array on scsi.

            regards, tom lane

Re: Perfomance Tuning

From
Ron Johnson
Date:
On Wed, 2003-08-13 at 09:37, Tom Lane wrote:
> Jeff <threshar@torgo.978.org> writes:
> > On Tue, 12 Aug 2003, Christopher Browne wrote:
> >> Are you _certain_ that's still true?  Have you a metric that shows
> >> Informix being 10x faster on a modern system?  That would be quite
> >> surprising...
>
> > We were forced (for budget reason) to switch from raw disk to cooked files
> > on our informix db. We took a huge hit - about 5-6x slower.
> > [snip]
> > The raw disks were on a hitachi fibre array and the cooked files
> > were on a raid5 (scsi). Forget how many spindles in the raid.
> > There were 20 raw disks)
>
> Seems like you can't know how much of the performance hit was due to the
> filesystem change and how much to the hardware change.  But I'd bet 20
> disks on fibre array have way more net throughput than a single RAID
> array on scsi.

I wouldn't be surprised either if the fiber array had more cache
than the SCSI controller.

Was/is the Hitachi device a SAN?

--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net           |
| Jefferson, LA  USA                                            |
|                                                               |
| "Man, I'm pretty.  Hoo Hah!"                                  |
|    Johnny Bravo                                               |
+---------------------------------------------------------------+



Re: Perfomance Tuning

From
Bruce Momjian
Date:
Ron Johnson wrote:
> On Tue, 2003-08-12 at 13:39, Bruce Momjian wrote:
> > OK, I got some hard evidence.  Here is a discussion on the Linux kernel
> > mailing list with postings from Alan Cox (ac Linux kernels) and Stephen
> > Tweedie (ext3 author).
> >
> >     http://www.tux.org/hypermail/linux-kernel/1999week14/subject.html#start
> >
> > Search for "softupdates and ext2".
> >
> > Here is the original email in the thread:
> >
> >     http://www.tux.org/hypermail/linux-kernel/1999week14/0498.html
> >
> > Summary is at:
> >
> >     http://www.tux.org/hypermail/linux-kernel/1999week14/0571.html
> >
> > and conclusion in:
> >
> >     http://www.tux.org/hypermail/linux-kernel/1999week14/0504.html
> >
> > I now remember the issue --- ext2 makes all disk changes asynchronously
> > (unless you mount it via sync, which is slow).  This means that the file
> > system isn't always consistent on disk.
> >
> > UFS has always synced metadata (file/directory creation) to the disk, so
> > the disk was always consistent, but it doesn't sync the data to the disk,
> > for performance reasons.  With soft updates, the metadata writes are
> > delayed, and written to disk in an order that keeps the file system
> > consistent.
> >
> > Is this enough evidence, or should I keep researching?
>
> This is all 4 years old, though.  Isn't that why the ext3 "layer" was

Yes, it is four years old, but no one has told me ext2 has changed in
this regard, and seeing that they created ext3 to fix these aspects, I
would think ext2 hasn't changed.

> created, and filesystems like reiserFS, XFS and (kinda) JFS were added
> to Linux?

Yes, it is those ext2 limitations that caused the development of ext3
and the others.  However, they went much further than implementing a
crash-safe file system: they enabled a file system that doesn't need
fsck on crash reboot.  That requires fsyncing both data and metadata
(file creation), which slows down the file system, and PostgreSQL doesn't
need it.

You can mount ext3 and others with data=writeback to fsync only
metadata, but it isn't the default.

I am not sure what the ext3 layer is.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Perfomance Tuning

From
Josh Berkus
Date:
Jeff,

> Informix, etc. have spent a lot of time and money working on it.
> They also have the advantage of having many paid fulltime
> developers who are doing this for a job, not as a weekend hobby
> (Compared to the what? 2-3 full time PG developers).

I think 4-6 full-time, actually, plus about 200 part-time contributors.  Which
adds up to a bloody *lot* of code if you monitor pgsql-patches between
versions.  The only development advantage the commercials have over us is the
ability to engage in large projects (e.g. replication, raw filesystems, etc.)
that are difficult for a distributed network of people.

> The other advantage (which I hinted to above) with raw disks is being able
> to optimize queries to take advantage of it.  Informix is multithreaded
> and it will spawn off multiple "readers" to do say, a seq scan (and merge
> the results at the end).

I like this idea.  Has it ever been discussed for PostgreSQL?  Hmmm .... we'd
need to see some tests demonstrating that this approach was still a technical
advantage given the improvements in RAID  and FS technology since Informix
was designed.

As I have said elsewhere, Informix is probably a poor database to emulate
since they are effectively an old dead-end fork of the Ingres/Postgres code,
and have already been "mined" for most of the improvements they made.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Perfomance Tuning

From
Bruce Momjian
Date:
Josh Berkus wrote:
> Jeff,
>
> > Informix, etc. have spent a lot of time and money working on it.
> > They also have the advantage of having many paid fulltime
> > developers who are doing this for a job, not as a weekend hobby
> > (Compared to the what? 2-3 full time PG developers).
>
> I think 4-6 full-time, actually, plus about 200 part-time contributors.  Which
> adds up to a bloody *lot* of code if you monitor pgsql-patches between
> versions.  The only development advantage the commercials have over us is the
> ability to engage in large projects (e.g. replication, raw filesystems, etc.)
> that are difficult for a distributed network of people.

I think Informix's track record for post-Informix 5.0 releases is poor:

    6.0 aborted release, pretty much withdrawn
    7.0 took 1-2 years to stabilize
    8.0 where was that?
    9.0 confused customers

How much does Informix improve in 6 months?  In 2 years?  How long does
it take to get a bug fixed?

At this point, only the largest corporations can keep up with our
open-source development model.  The other database vendors have already
closed, as did Informix when purchased by IBM.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Perfomance Tuning

From
Christopher Browne
Date:
josh@agliodbs.com (Josh Berkus) writes:
>> The other advantage (which I hinted to above) with raw disks is being able
>> to optimize queries to take advantage of it.  Informix is multithreaded
>> and it will spawn off multiple "readers" to do say, a seq scan (and merge
>> the results at the end).
>
> I like this idea.  Has it ever been discussed for PostgreSQL?  Hmmm
> .... we'd need to see some tests demonstrating that this approach
> was still a technical advantage given the improvements in RAID and
> FS technology since Informix was designed.

Ah, but this approach isn't so much an I/O optimization as it is a CPU
optimization.

If you have some sort of join against a big table, and do a lot of
processing on each component, there might be CPU benefits from the
split:

create table customers (
  id customer_id,
  name character varying
  -- ...other fields; and we're a phone company with 8 million of them
);

create table customer_status (
  customer_id customer_id,
  status status_code
);

create table customer_address (
  customer_id customer_id,
  address_info...
);

And then we are doing:

  select c.id, sum(cs.status), address_label(c.id), balance(c.id)
    from customers c, customer_status cs
   where cs.customer_id = c.id
   group by c.id;

We know there's going to be a SEQ SCAN against customers, because
that's the big table.

If I wanted to finish the query as fast as possible, as things stand
now, and had 4 CPUs, I would run 4 concurrent queries, for 4 ranges of
customers.
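That manual split is easy to sketch. The code below is illustrative only: an in-memory list stands in for the customers table, and each "query" is a plain Python function, whereas in practice each range would be a separate connection running its own `SELECT ... WHERE id >= lo AND id < hi` against the database:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the customers table: (customer_id, status) rows,
# sampled sparsely so the demo stays small.
CUSTOMERS = [(cid, cid % 3) for cid in range(1, 8_000_001, 1000)]

def scan_range(lo, hi):
    """One 'query': aggregate status over customers with lo <= id < hi.
    In real life this would be its own connection and its own WHERE clause."""
    return sum(status for cid, status in CUSTOMERS if lo <= cid < hi)

def parallel_total(n_workers=4, max_id=8_000_001):
    """Split the id range into n_workers slices, run them concurrently,
    and combine the partial aggregates at the end."""
    step = max_id // n_workers + 1
    ranges = [(i * step, (i + 1) * step) for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda r: scan_range(*r), ranges)
    return sum(partials)  # the client-side "merge"
```

The key point is that the ranges are disjoint and cover the whole key space, so the combined result matches a single full scan.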

The Really Cool approach would be for PostgreSQL to dole out customers
across four processors, perhaps throwing a page at a time at each CPU,
where each process would quasi-independently build up their respective
result sets.
--
let name="cbbrowne" and tld="libertyrms.info" in String.concat "@" [name;tld];;
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)

Re: Perfomance Tuning

From
Christopher Browne
Date:
threshar@torgo.978.org (Jeff) writes:
> On Tue, 12 Aug 2003, Christopher Browne wrote:
>> Are you _certain_ that's still true?  Have you a metric that shows
>> Informix being 10x faster on a modern system?  That would be quite
>> surprising...

> We were forced (for budget reasons) to switch from raw disk to cooked
> files on our informix db. We took a huge hit - about 5-6x slower.
> Granted, part of that was because informix takes the number of spindles,
> etc. into account when generating query plans, and because running
> UPDATE STATISTICS (think VACUUM ANALYZE) on the version we run locks
> the table exclusively. And it is unacceptable to have our "main
> table" unavailable for hours and hours while the update runs. (For
> the record: it's an 8-CPU Sun E4500 running Solaris 2.6.  The raw disks
> were on a Hitachi fibre array and the cooked files were on a RAID5
> (SCSI). Forget how many spindles in the raid.  There were 20 raw
> disks.)

Sounds like what you were forced to do was to do TWO things:

 1.  Switch from raw disk to cooked files, and
 2.  Switch from a fibre array to a RAID array

You're attributing the 5-6x slowdown to 1., when it seems likely that
2. is a far more significant multiple.

What with there being TWO big changes that took place that might be
expected to affect performance, it seems odd to attribute a
factor-of-many change to just one aspect of that.

> Informix, etc. have spent a lot of time and money working on it.
> They also have the advantage of having many paid fulltime developers
> who are doing this for a job, not as a weekend hobby (Compared to
> the what? 2-3 full time PG developers).

<flame on>
Sure, and I'm sure the PG developers hardly know _anything_ about
implementing databases, either.
<flame off>

Certainly IBM (who bought Informix) has lots of time and money to
devote to enhancements.  But I think you underestimate the time,
skill, and effort involved with PG work.  It's quite typical for
people to imagine free software projects to basically be free-wheeling
efforts mostly involving guys that still have acne that haven't much
knowledge of the area.  Reality, for the projects that are of some
importance, is staggeringly different from that.  The number of people
with graduate degrees tends to surprise everyone.

The developers may not have time to add frivolous things to the
system, like building sophisticated Java-based GUI installers, XML
processors, or such.  That does, however, improve their focus, and so
PostgreSQL does not suffer from the way Oracle has fifty different
bundlings most of which nobody understands.

> The other advantage (which I hinted to above) with raw disks is
> being able to optimize queries to take advantage of it.  Informix is
> multithreaded and it will spawn off multiple "readers" to do say, a
> seq scan (and merge the results at the end).
>
> So if you have a table across say, 3 disks and you need to do a seq
> scan it will spawn three readers to do the read. Result: nice and
> fast (Yes, It may not always spawn the three readers, only when it
> thinks it will be a good thing to do)

Andrew Sullivan's fairly regular response is that he tried (albeit not
VASTLY extensively) to distinguish between disks when working with
fibre arrays, and he couldn't measure an improvement in shifting WAL
(the OBVIOUS thing to shift) to separate disks.

There's a lot of guesswork as to precisely why that result falls out.

One of the better guesses seems to be that if you've got enough
battery-backed memory cache on the array, that lets updates get pushed
to cache so fast that it doesn't too much matter which disk they hit.

If you've got enough spindles, and build much of the array in a
striped manner, you'll get data splitting across disks without having
to specify any "table options" to force it to happen.

You raise a good point vis-a-vis the thought of spawning multiple
readers; that could conceivably be a useful approach to improve
performance for very large queries.  If you could "stripe" the tables
in some manner so they could be doled out to multiple worker
processes, that could indeed provide some benefits.  If there are
three workers, they might round-robin to grab successive pages from
the table to do their work, and then end with a merge step.
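The round-robin-plus-merge scheme above can be sketched in a few lines. Python is used purely as illustration; the page size, worker count, and per-page work are made up, and a real implementation would be doing this inside the executor:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

PAGE_SIZE = 4               # rows per "page"; a real heap page holds many more
TABLE = list(range(1, 101)) # stand-in for a 100-row table

def pages(table, size):
    """Split the table into successive fixed-size pages."""
    it = iter(table)
    while True:
        page = list(islice(it, size))
        if not page:
            return
        yield page

def worker(my_pages):
    """Each worker scans only the pages doled out to it,
    building a partial result (here: a partial sum)."""
    return sum(row for page in my_pages for row in page)

def parallel_seq_scan(table, n_workers=3):
    all_pages = list(pages(table, PAGE_SIZE))
    # Round-robin: worker i gets pages i, i+n, i+2n, ...
    assignments = [all_pages[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(worker, assignments)
    return sum(partials)  # the final merge step
```

Because every page lands with exactly one worker, the merged result is identical to a single sequential scan regardless of the worker count.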

That's probably a 7.7 change, mind you :-), but once other simpler
approaches to making the engine faster have been exhausted, that's the
sort of thing to look into next.

> I think for PG the effort would be much better spent on other
> features...  like replication and whatnot.

At this point, sure.
--
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/lisp.html
"Using Java  as a general purpose application  development language is
like  going big  game  hunting  armed with  Nerf  weapons."
-- Author Unknown

Re: Perfomance Tuning

From
Ron Johnson
Date:
On Wed, 2003-08-13 at 10:46, Josh Berkus wrote:
> Jeff,
>
[snip]
> > The other advantage (which I hinted to above) with raw disks is being able
> > to optimize queries to take advantage of it.  Informix is multithreaded
> > and it will spawn off multiple "readers" to do say, a seq scan (and merge
> > the results at the end).
>
> I like this idea.  Has it ever been discussed for PostgreSQL?  Hmmm .... we'd
> need to see some tests demonstrating that this approach was still a technical
> advantage given the improvements in RAID  and FS technology since Informix
> was designed.

Wouldn't PG first need horizontal partitioning, and as a precursor to
that, "tablespaces"?

--
+---------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net           |
| Jefferson, LA  USA                                            |
|                                                               |
| "Man, I'm pretty.  Hoo Hah!"                                  |
|    Johnny Bravo                                               |
+---------------------------------------------------------------+



Re: Perfomance Tuning

From
Sean Chittenden
Date:
> Andrew Sullivan's fairly regular response is that he tried (albeit
> not VASTLY extensively) to distinguish between disks when working
> with fibre arrays, and he couldn't measure an improvement in
> shifting WAL (the OBVIOUS thing to shift) to separate disks.

Real quick... the faster the drives, the less important it is to move
WAL onto a different drive.  The slower the drives, the more important
this is... which is why it often isn't necessary (if at all) in large
production environments, where the drives tend to be fast.

-sc

--
Sean Chittenden

Re: Perfomance Tuning

From
Jeff
Date:
On Wed, 13 Aug 2003, Christopher Browne wrote:

> Sounds like what you were forced to do was to do TWO things:
>
>  1.  Switch from raw disk to cooked files, and
>  2.  Switch from a fibre array to a RAID array
>
> You're attributing the 5-6x slowdown to 1., when it seems likely that
> 2. is a far more significant multiple.
>

True.

> <flame on>
> Sure, and I'm sure the PG developers hardly know _anything_ about
> implementing databases, either.
> <flame off>

Oh, I know they are good at it. I deal a lot with informix and PG, and if I
could I'd bring Tom, Bruce, Joe, etc. out for a beer, as I'm *constantly*
fighting informix while our PG box just sits there merrily churning away.
(And god bless "explain analyze" - informix's version is basically boolean:
"I will use an index", "I will use a seq scan". It doesn't even tell you
which index!)

> You raise a good point vis-a-vis the thought of spawning multiple
> readers; that could conceivably be a useful approach to improve
> performance for very large queries.  If you could "stripe" the tables
> in some manner so they could be doled out to multiple worker
> processes, that could indeed provide some benefits.  If there are
> three workers, they might round-robin to grab successive pages from
> the table to do their work, and then end with a merge step.

The way informix does this is twofold:
1. it handles the raw disks, so it knows where table data is
2. it can "partition" tables in a number of ways: round robin,
concatenation, or expression.  (Expression is nifty; it allows you to use a
basic "where" clause to decide where to put data, i.e.

create table foo (
a int,
b int,
c int ) fragment on c > 0 and c < 100 in dbspace1, c > 100 and c < 200 in
dbspace2;

that kind of thing.)
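The routing logic behind expression fragmentation is simple enough to sketch. Everything here is hypothetical (dbspace names, row shape); the second predicate is adjusted to also cover c = 100, which the example above leaves unassigned:

```python
# Each "dbspace" gets a predicate; a row is stored in the first
# fragment whose expression accepts it.
FRAGMENTS = [
    ("dbspace1", lambda row: 0 < row["c"] < 100),
    ("dbspace2", lambda row: 100 <= row["c"] < 200),
]

def route(row, fragments=FRAGMENTS):
    """Pick the dbspace whose expression matches this row."""
    for name, pred in fragments:
        if pred(row):
            return name
    raise ValueError("row matches no fragment: %r" % (row,))

def round_robin(rows, spaces=("dbspace1", "dbspace2")):
    """The simpler alternative: deal rows out to dbspaces in turn."""
    return [spaces[i % len(spaces)] for i, _ in enumerate(rows)]
```

The payoff, as described upthread, is that the planner can skip fragments whose expression contradicts the query's WHERE clause, and readers can work on different fragments in parallel.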
And yeah, I would not expect to see it for a long time.  Without threading
it would be rather difficult to implement... but who knows what the future
will bring us.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/



Re: Perfomance Tuning

From
Jeff
Date:
On Wed, 13 Aug 2003, Bruce Momjian wrote:

> I think Informix's track record for post-Informix 5.0 releases is poor:
>
>     6.0 aborted release, pretty much withdrawn
>     7.0 took 1-2 years to stabilize
>     8.0 where was that?

8.0 never occurred. It went 7.3 -> 9.0.

>     9.0 confused customers

9.0 had some good stuff. 9.4 *FINALLY* removed a lot of limitations (2GB
chunks, etc.).  (9.4 came out a few years after 7.)

> How much does Informix improve in 6 months?  In 2 years?  How long does
> it take to get a bug fixed?

You make me laugh :)

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/



Re: Perfomance Tuning

From
Jeff
Date:
On Wed, 13 Aug 2003, Josh Berkus wrote:

> As I have said elsewhere, Informix is probably a poor database to emulate
> since they are effectively an old dead-end fork of the Ingres/Postgres code,
> and have already been "mined" for most of the improvements they made.
>
With informix 7.0, they rewrote the entire thing from the ground up to
remove a bunch of limitations and build a multithreaded engine,
so it isn't so much an old fork anymore.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/



Re: Perfomance Tuning

From
Christopher Browne
Date:
threshar@torgo.978.org (Jeff) writes:
> On Wed, 13 Aug 2003, Christopher Browne wrote:
>> You raise a good point vis-a-vis the thought of spawning multiple
>> readers; that could conceivably be a useful approach to improve
>> performance for very large queries.  If you could "stripe" the tables
>> in some manner so they could be doled out to multiple worker
>> processes, that could indeed provide some benefits.  If there are
>> three workers, they might round-robin to grab successive pages from
>> the table to do their work, and then end with a merge step.
>
> The way informix does this is two fold:
> 1. it handles the raw disks, it knows where table data is

The thing is, this isn't something where there is guaranteed to be a
permanent _massive_ difference in performance between "raw" and
"cooked."

Traditionally, "handling raw disks" was a big deal because the DBMS
could then decide where to stick the data, possibly down to specifying
what sector of what track of what spindle.  There are four reasons for
this to not be such a big deal anymore:

 1.  Disk drives lie to you.  They don't necessarily provide
     information that even _resembles_ their true geometry.  So the
     best you can get is to be sure that "this block was on drive 4,
     that block was on drive 7."

 2.  On a big system, you're more than likely using hardware RAID,
     where there's further cacheing, and where the disk array may
     not be honest to the DBMS about where the drives actually are.

 3.  The other traditional benefit to "raw" disks was that they
     allowed the DBMS to be _certain_ that data was committed in
     some particular order.  But 1. and 2. provide regrettable
     opportunities for the DBMS' belief to be forlorn.  (With the
     degree to which disk drives lie about things, I have to be a
     bit skeptical of some of the BSD FFS claims which at least
     appear to assume that they _do_ control the disk drive...
     This is NOT reason, by the way, to consider FFS to be, in
     any way, "bad," but rather just that some of the guarantees
     may get stolen by your disk drive...)

 4.  Today's filesystems _aren't_ Grandpa's UFS.  We've got better
     stuff than we had back in the Ultrix days.

> 2. it can "partition" tables in a number of ways: round robin,
> concatenation or expression (Expression is nifty, allows you to use a
> basic "where" clause to decide where to put data.  ie
> create table foo (
> a int,
> b int,
> c int ) fragment on c > 0 and c < 100 in dbspace1, c > 100 and c < 200 in
> dbspace2;
>
> that kind of thing.

I remember thinking this was rather neat when I first saw it.

The "fragment on" part was most interesting at the time, when everyone
else (including filesystem makers) were decrying fragmentation as the
ultimate evil.  In effect, Informix was saying that they would
_improve_ performance through fragmentation...  Sort of like the rash
claim that performance can be improved _without_ resorting to a
threading-based model...

> and yeah, I would not expect to see it for a long time.. Without
> threading it would be rather difficult to implement.. but who knows
> what the future will bring us.

The typical assumption is that threading is a magical talisman that
will bring all sorts of benefits.  There have been enough cases where
PostgreSQL has demonstrated stunning improvements _without_ threading
that I am very skeptical that it is actually necessary.
--
output = reverse("gro.gultn" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/sap.html
Rules of the Evil Overlord #204. "I will hire an entire squad of blind
guards.   Not only  is this  in  keeping with  my status  as an  equal
opportunity employer, but it will  come in handy when the hero becomes
invisible or douses my only light source."
<http://www.eviloverlord.com/>

Re: Perfomance Tuning

From
Christopher Browne
Date:
threshar@torgo.978.org (Jeff) writes:
> On Wed, 13 Aug 2003, Josh Berkus wrote:
>> As I have said elsewhere, Informix is probably a poor database to emulate
>> since they are effectively an old dead-end fork of the Ingres/Postgres code,
>> and have already been "mined" for most of the improvements they made.
>>
> With informix 7.0 they rewrote the entire thing from the ground up
> to remove a bunch of limitations and build a multithreaded engine.
> so it isn't so much an old fork anymore.

No, I think you misunderstand the intent...

The pre-7.0 version was based on Informix's B-Tree libraries, and the
file structuring actually bears a marked resemblance to that of MySQL
(that's an observation; neither necessarily a good nor a bad thing), where
there's a data file for the table, and then a bunch of index files,
named somewhat after the table.

In the 7.0-and-after era, they added in the "old dead-end fork of the
Ingres/Postgres code" to get the "Universal Data Server."

[This is diverging somewhat from "performance;" let's try to resist
extending discussion...]
--
(reverse (concatenate 'string "ofni.smrytrebil" "@" "enworbbc"))
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)

Re: Perfomance Tuning

From
Bruce Momjian
Date:
Christopher Browne wrote:
> threshar@torgo.978.org (Jeff) writes:
> > On Wed, 13 Aug 2003, Josh Berkus wrote:
> >> As I have said elsewhere, Informix is probably a poor database to emulate
> >> since they are effectively an old dead-end fork of the Ingres/Postgres code,
> >> and have already been "mined" for most of the improvements they made.
> >>
> > With informix 7.0 they rewrote the entire thing from the ground up
> > to remove a bunch of limitations and build a multithreaded engine.
> > so it isn't so much an old fork anymore.
>
> No, I think you misunderstand the intent...
>
> The pre-7.0 version was based on Informix's B-Tree libraries, and the
> file structuring actually bears a marked resemblance to that of MySQL
> (that's an observation; neither forcibly a good or a bad thing), where
> there's a data file for the table, and then a bunch of index files,
> named somewhat after the table.
>
> In the 7.0-and-after era, they added in the "old dead-end fork of the
> Ingres/Postgres code" to get the "Universal Data Server."

I think 9.0 was the Ingres/Postgres code, not 7.X.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Perfomance Tuning

From
Bruce Momjian
Date:
Jeff wrote:
> > You raise a good point vis-a-vis the thought of spawning multiple
> > readers; that could conceivably be a useful approach to improve
> > performance for very large queries.  If you could "stripe" the tables
> > in some manner so they could be doled out to multiple worker
> > processes, that could indeed provide some benefits.  If there are
> > three workers, they might round-robin to grab successive pages from
> > the table to do their work, and then end with a merge step.
>
> The way informix does this is twofold:
> 1. it handles the raw disks, it knows where table data is
> 2. it can "partition" tables in a number of ways: round robin,
> concatenation or expression (Expression is nifty, allows you to use a
> basic "where" clause to decide where to put data.  ie
> create table foo (
> a int,
> b int,
> c int ) fragment on c > 0 and c < 100 in dbspace1, c > 100 and c < 200 in
> dbspace2;
>
> that kind of thing.
> and yeah, I would not expect to see it for a long time.. Without threading
> it would be rather difficult to implement.. but who knows what the future
> will bring us.

The big question is whether the added complexity is worth it.  I know
Informix 5 was faster than Informix 7 on single CPU machines for quite a
while.  It might still be true.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073