Thread: ext3

ext3

From
Mage
Date:
          Hello,

Gabor Szima asked us to translate the letter below.

"I read that ext3 writeback mode is recommended for PostgreSQL. I made
some tests.

                                  data=ordered    data=writeback
----------------------------------------------------------------------
restoredb:                        2m16.790s       1m42.367s
UPDATE <tbl1> (17k rows):         9.289s          7.147s
UPDATE <tbl1> (17k rows) (2.):    10.480s         3.778s
VACUUM ANALYZE <tbl1>:            9.364s          0.986s !
VACUUM FULL <tbl1>:               16.071s         2.575s
REINDEX TABLE <tbl1>:             3.815s          1.886s
----------------------------------------------------------------------

It's tempting.
However, I ran some crash tests too. I updated 4 tables simultaneously
and repeatedly for 10 to 120 s, then cut power to the machine (not with
the reset button; I just pulled out the cable).

SEQ  RECOVERY-WARNINGS  VACUUM
-------------------------------
01:  1650               OK
     (WARNING:  invalid page header in block 769 of relation "18800";
      zeroing out page)
02:  3                  FATAL
     (ERROR:  could not access status of transaction 37814272)
     (DETAIL:  could not open file "/data/pgdata/pg_clog/0024":
      No such file or directory)
-------------------------------

I stopped my tests at this point because this is not fit for production
use. The database was corrupted.


With ordered mode I got this:

ext3-noatime,data=ordered:

SEQ  RECOVERY-WARNINGS  VACUUM
------------------------------
01:  0                  OK
02:  0                  OK
03:  0                  OK
04:  0                  W,OK  (relation "<tbl>" page 398 is uninitialized --- fixing)
05:  0                  OK
06:  0                  OK
07:  0                  W,OK  (relation "<tbl>" page 911 is uninitialized --- fixing)
08:  0                  OK
09:  0                  OK
10:  0                  OK
------------------------------

I think that writeback mode records the data first and then the inode,
and that ordered mode does it in the reverse order.  I also think that
the postgres log requires the inode to be recorded correctly, while data
loss is handled by the WAL.

AMD XP2000, 512MB RAM, PostgreSQL 7.4.6 (i686), linux-2.4.28, gcc-3.3.5,
Adaptec 29160, WD Enterprise 4360 (SCSI, SCA-80)

I ran mkfs and initdb before every test, and I also repeated the tests
in reverse order. No quake3 ran in the background.

-Sygma"


Sorry for my English.


       Mage
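
To put the benchmark table above in proportion, here is a minimal Python sketch that converts the posted timings to seconds and computes how many times faster data=writeback was than data=ordered in each test. The numbers are copied from the table; the dictionary keys and the `speedup` helper are illustrative names, not part of the original post.

```python
# Timings copied from the table in the original post, converted to seconds.
# Each entry is (data=ordered, data=writeback).
timings = {
    "restoredb": (2 * 60 + 16.790, 1 * 60 + 42.367),
    "UPDATE (17k rows)": (9.289, 7.147),
    "UPDATE (17k rows) (2.)": (10.480, 3.778),
    "VACUUM ANALYZE": (9.364, 0.986),
    "VACUUM FULL": (16.071, 2.575),
    "REINDEX TABLE": (3.815, 1.886),
}


def speedup(ordered_s, writeback_s):
    """Return how many times faster the writeback run was than the ordered run."""
    return ordered_s / writeback_s


ratios = {name: speedup(o, w) for name, (o, w) in timings.items()}

for name, ratio in ratios.items():
    print(f"{name:<26} {ratio:5.2f}x")
```

These are single runs on one machine, so the ratios only summarize the posted figures; they say nothing about crash safety, which is what the rest of the letter tests.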


Re: ext3

From
Lonni J Friedman
Date:
On Mon, 17 Jan 2005 20:00:46 +0100, Mage <mage@mage.hu> wrote:
>           Hello,
>
> Gabor Szima asked us to translate the letter below.
>
> "I read that ext3 writeback mode is recommended for PostgreSQL. I made
> some tests.
>
>                 data=ordered        data=writeback
> ----------------------------------------------------------------------
> restoredb:             2m16.790s        1m42.367s
> UPDATE <tbl1> (17krows):    9.289s            7.147s
> UPDATE <tbl1> (17krows) (2.):    10.480s            3.778s
> VACUUM ANALYZE <tbl1>:        9.364s            0.986s !
> VACUUM FULL <tbl1>:        16.071s            2.575s
> REINDEX TABLE <tbl1>:        3.815s            1.886s
> ----------------------------------------------------------------------
>
> It's seductive.
> However I made some crash-tests too. Updated 4 tables simultaneously and
> recurring for 10 to 120s, then powered off the machine (without the
> reset button. i just pulled out the cable).

That's an excellent way to fry your PSU and damage your hardware.


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: ext3

From
Tzahi Fadida
Date:
I recommend you don't use ext3 for any database:
http://seclists.org/lists/linux-kernel/2005/Jan/0641.html

Apparently it's still buggy.

Regards,
    tzahi.

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Mage
> Sent: Monday, January 17, 2005 9:01 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] ext3



Re: ext3

From
PFC
Date:

> Gabor Szima asked us to translate the letter below.
>
> "I read that ext3 writeback mode is recommended for PostgreSQL. I made
> some tests.
>
>                 data=ordered        data=writeback
> ----------------------------------------------------------------------
> restoredb:             2m16.790s        1m42.367s
> UPDATE <tbl1> (17krows):    9.289s            7.147s
> UPDATE <tbl1> (17krows) (2.):    10.480s            3.778s
> VACUUM ANALYZE <tbl1>:        9.364s            0.986s !
> VACUUM FULL <tbl1>:        16.071s            2.575s
> REINDEX TABLE <tbl1>:        3.815s            1.886s
> ----------------------------------------------------------------------

    Hum. You might as well run it with fsync disabled for that extra thrill ;)
    You could try the same with reiserfs (hint, hint).

Question on output of VACUUM VERBOSE

From
"Cornelia Boenigk"
Date:
Hi all

I don't understand what these two lines exactly mean.

INFO:  free space map: 490 relations, 13541 pages stored; 34480 total pages needed
DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory

Thanks in advance
Conni


Re: ext3

From
David Garamond
Date:
Tzahi Fadida wrote:
> I recommend you don't use ext3 for any database:
> http://seclists.org/lists/linux-kernel/2005/Jan/0641.html
>
> apparently its still buggy.

So what is the recommended fs under Linux? I don't need the best
speed/throughput, but I prefer not to use ext2 due to long fsck time. I
also tend to avoid reiser3, it has given us a lot of grief in the past. XFS?

Regards,
dave

Re: ext3

From
Rich Shepard
Date:
On Tue, 18 Jan 2005, David Garamond wrote:

> So what is the recommended fs under Linux? I don't need the best
> speed/throughput, but I prefer not to use ext2 due to long fsck time. I
> also tend to avoid reiser3, it has given us many griefs in the past. XFS?

dave,

   I have no large databases here, but I run reiserfs on most systems and
ext3 on my notebook. Over the past couple of years I've had no problems with
either. I've read of folks who don't like one or the other, but locally I
know of no one who's experienced any negative situations with either.

Rich

--
Dr. Richard B. Shepard, President
Applied Ecosystem Services, Inc. (TM)
<http://www.appl-ecosys.com>   Voice: 503-667-4517   Fax: 503-667-8863

Re: ext3

From
"Joshua D. Drake"
Date:
David Garamond wrote:

> Tzahi Fadida wrote:
>
>> I recommend you don't use ext3 for any database:
>> http://seclists.org/lists/linux-kernel/2005/Jan/0641.html
>>
>> apparently its still buggy.
>
>
> So what is the recommended fs under Linux? I don't need the best
> speed/throughput, but I prefer not to use ext2 due to long fsck time.
> I also tend to avoid reiser3, it has given us many griefs in the past.
> XFS?

We have had success with XFS and JFS. XFS seems a little better supported.

Sincerely,

Joshua D. Drake






--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL



Re: ext3

From
"Frank D. Engel, Jr."
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I typically use XFS when given the choice.

On Jan 17, 2005, at 7:52 PM, Rich Shepard wrote:

> On Tue, 18 Jan 2005, David Garamond wrote:
>
>> So what is the recommended fs under Linux? I don't need the best
>> speed/throughput, but I prefer not to use ext2 due to long fsck time.
>> I
>> also tend to avoid reiser3, it has given us many griefs in the past.
>> XFS?
>
> dave,
>
>    I have no large databases here, but I run reiserfs on must systems
> and
> ext3 on my notebook. Over the past couple of years I've had no
> problems with
> either. I've read of folks who don't like one or the other, but
> locally I
> know of no one who's experienced any negative situations with either.
>
> Rich
>
> --
> Dr. Richard B. Shepard, President
> Applied Ecosystem Services, Inc. (TM)
> <http://www.appl-ecosys.com>   Voice: 503-667-4517   Fax: 503-667-8863
>
- -----------------------------------------------------------
Frank D. Engel, Jr.  <fde101@fjrhome.net>

$ ln -s /usr/share/kjvbible /usr/manual
$ true | cat /usr/manual | grep "John 3:16"
John 3:16 For God so loved the world, that he gave his only begotten
Son, that whosoever believeth in him should not perish, but have
everlasting life.
$
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)

iD8DBQFB7GKY7aqtWrR9cZoRArC+AJ9yBCcKWu0hurzvyYgYPgak3bSXSQCfZUK4
RwY2fa38Nco6JUdDdBwvUQ0=
=n3Lt
-----END PGP SIGNATURE-----





Re: ext3

From
Lonni J Friedman
Date:
On Mon, 17 Jan 2005 16:54:45 -0800, Joshua D. Drake
<jd@commandprompt.com> wrote:
> David Garamond wrote:
>
> > Tzahi Fadida wrote:
> >
> >> I recommend you don't use ext3 for any database:
> >> http://seclists.org/lists/linux-kernel/2005/Jan/0641.html
> >>
> >> apparently its still buggy.
> >
> >
> > So what is the recommended fs under Linux? I don't need the best
> > speed/throughput, but I prefer not to use ext2 due to long fsck time.
> > I also tend to avoid reiser3, it has given us many griefs in the past.
> > XFS?
>
> We have had success with XFS and JFS. XFS seems a little better supported.

I'll 2nd (or 3rd?) that vote for XFS.  It's been rock solid for my servers.


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       http://netllama.linux-sxs.org

Re: ext3

From
Jeff Davis
Date:
On Tue, 2005-01-18 at 07:43 +0700, David Garamond wrote:
> Tzahi Fadida wrote:
> > I recommend you don't use ext3 for any database:
> > http://seclists.org/lists/linux-kernel/2005/Jan/0641.html
> >
> > apparently its still buggy.
>
> So what is the recommended fs under Linux? I don't need the best
> speed/throughput, but I prefer not to use ext2 due to long fsck time. I

Wouldn't ext2 also allow the possibility of a missing file? Even though
postgres does WAL, couldn't ext2 forget a file or not record that a new
file has been created?

In other words, does PostgreSQL assume that the filesystem at least
journals the metadata?

Regards,
    Jeff Davis



Re: ext3

From
William Yu
Date:
You may also want to test data=journal for ext3. Most of the time this
is slower, but for databases with logging and for mail servers it can be
faster.



Re: ext3

From
Tom Lane
Date:
Jeff Davis <jdavis-pgsql@empires.org> writes:
> In other words, does PostgreSQL assume that the filesystem at least
> journals the metadata?

Postgres assumes that the filesystem can take care of itself, which we
define as not losing or corrupting successfully-fsynced data.  The
original BSD filesystem designs met this requirement without any
journal; they were just careful about the order in which things got
forced to disk.  It appears that ext3 may not be able to meet this
requirement even with a journal :-(.  But in theory a metadata journal
should be sufficient.  Journaling data writes is redundant, unless maybe
the filesystem substitutes that for the ordinary idea of fsync().

            regards, tom lane
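
To make "successfully-fsynced data" concrete from an application's point of view, here is a minimal Python sketch of a write that returns only after the file contents and the new directory entry have been pushed to disk with fsync(). This is not PostgreSQL's code (PostgreSQL is written in C); `durable_write` and `example.dat` are hypothetical names, and the directory fsync assumes a POSIX system such as Linux.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path; return only after file and directory are fsynced."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to force the file to disk

    # A newly created file also needs its directory entry on disk,
    # so fsync the containing directory as well (POSIX only).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.dat")
    durable_write(target, b"committed")
    with open(target, "rb") as f:
        print(f.read())
```

Whether the data really survives a pulled power cable still depends on the filesystem and the drive honoring fsync(), which is exactly the property being questioned in this thread.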

Re: ext3

From
Tino Wildenhain
Date:
Hi,

On Tuesday, 18.01.2005, at 07:43 +0700, David Garamond wrote:
> Tzahi Fadida wrote:
> > I recommend you don't use ext3 for any database:
> > http://seclists.org/lists/linux-kernel/2005/Jan/0641.html
> >
> > apparently its still buggy.
>
> So what is the recommended fs under Linux? I don't need the best
> speed/throughput, but I prefer not to use ext2 due to long fsck time. I
> also tend to avoid reiser3, it has given us many griefs in the past. XFS?

In my experience, reiser3 dies if the hardware dies, e.g. if your
disk starts trashing blocks. So when you have trustworthy hardware
(a good RAID level), reiserfs works very well. I have not yet tested
XFS on faulty disks, but on RAID it works very well, and it is
somewhat optimized for larger files, as tables and indices
can be.

HTH
Tino


Re: ext3

From
Tino Wildenhain
Date:
On Monday, 17.01.2005, at 17:47 -0800, Jeff Davis wrote:
> On Tue, 2005-01-18 at 07:43 +0700, David Garamond wrote:
> > Tzahi Fadida wrote:
> > > I recommend you don't use ext3 for any database:
> > > http://seclists.org/lists/linux-kernel/2005/Jan/0641.html
> > >
> > > apparently its still buggy.
> >
> > So what is the recommended fs under Linux? I don't need the best
> > speed/throughput, but I prefer not to use ext2 due to long fsck time. I
>
> Wouldn't ext2 also allow the possibility of a missing file? Even though
> postgres does WAL, couldn't ext2 forget a file or not record that a new
> file has been created?
>
> In other words, does PostgreSQL assume that the filesystem at least
> journals the metadata?

Well, postgres needs two things: (1) no data that has already been
written and sync()ed gets lost, and (2) the filesystem is in a
consistent state, which it must be to work at all. To ensure (2),
ext2 must do an fsck, which takes a considerable amount of time on
large partitions.

Regards
Tino


Re: Question on output of VACUUM VERBOSE

From
Thomas F.O'Connell
Date:
I think that INFO gives you information about your current usage and
that DETAIL tells you what is currently set in your configuration.

In this example, the 490 relations fit within the 1000 allocated, but
the 34480 total pages needed exceed the 20000 pages allocated, so you
would want to consider increasing max_fsm_pages in postgresql.conf. In
general, if the values in INFO are larger than the values in DETAIL,
consider increasing max_fsm_relations and max_fsm_pages.

-tfo
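
The INFO-versus-DETAIL comparison described above can be sketched as a small Python script. It parses the two lines quoted in the question and reports which setting, if any, is smaller than what VACUUM says is in use. The function name `fsm_shortfalls` and the regular expressions are illustrative, and treating "total pages needed" as the figure to compare against the allocated pages is an assumption about how to read the INFO line.

```python
import re

# Patterns for the two VACUUM VERBOSE lines quoted in the question.
INFO_RE = re.compile(
    r"free space map: (\d+) relations, (\d+) pages stored; (\d+) total pages needed"
)
DETAIL_RE = re.compile(r"Allocated FSM size: (\d+) relations \+ (\d+) pages")


def fsm_shortfalls(info_line, detail_line):
    """Return the settings whose allocated size is below the reported need."""
    info = INFO_RE.search(info_line)
    detail = DETAIL_RE.search(detail_line)
    if info is None or detail is None:
        raise ValueError("unrecognized VACUUM VERBOSE output")

    relations_used, _pages_stored, pages_needed = map(int, info.groups())
    relations_allocated, pages_allocated = map(int, detail.groups())

    shortfalls = []
    if relations_used > relations_allocated:
        shortfalls.append("max_fsm_relations")
    # Assumption: "total pages needed" is what must fit in the allocated pages.
    if pages_needed > pages_allocated:
        shortfalls.append("max_fsm_pages")
    return shortfalls


info = "INFO:  free space map: 490 relations, 13541 pages stored; 34480 total pages needed"
detail = "DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory"
print(fsm_shortfalls(info, detail))
```

An empty list would mean both allocations cover the reported usage; any name in the list is a candidate for a larger value in postgresql.conf.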

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jan 17, 2005, at 5:14 PM, Cornelia Boenigk wrote:

> Hi all
>
> I don't understand what these two lines exactly mean.
>
> INFO:  free space map: 490 relations, 13541 pages stored; 34480 total
> pages
> needed
> DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 178 kB
> shared
> memory
>
> Thanks in advance
> Conni