Thread: Initial database loading and IDE x SCSI

Initial database loading and IDE x SCSI

From
Date:
Hi,

I would like to know if my supposition is right.

Considering an environment with only one hard disk attached to a server, an
initial load of the database is probably much faster using an IDE/ATA
interface with write-back caching on than using a SCSI interface. That's
because of the SCSI command interface overhead.

The main advantage of SCSI interfaces, the multiuser environment, is lost in
this scenario.

Am I right? Am I missing something here?

Even if I'm right, is there something that could be done to improve SCSI
loading performance in this scenario?

Thanks in advance!

Reimer


Re: Initial database loading and IDE x SCSI

From
Scott Marlowe
Date:
On Fri, 2006-06-02 at 13:25, carlosreimer@terra.com.br wrote:
> Hi,
>
> I would like to know if my supposition is right.
>
> Considering an environment with only one hard disk attached to a server, an
> initial loading of the database probably is much faster using an IDE/ATA
> interface with write-back on than using an SCSI interface. That´s because of
> the SCSI command interface overhead.
>
> Then main advantage of SCSI interfaces, the multiuser environment is lost in
> this scenery.
>
> Am I right? Am I missing something here?
>
> Even if I´m right, is something that could be done too improove SCSI loading
> performance in this scenery?

The answer is yes.  And no.

IDE drives notoriously lie about their cache, so that if you have the
cache enabled, the IDE drive will nominally ack to an fsync before it's
actually written the data.  So, the IDE drive will write faster, but
your data probably won't survive a system crash or power loss during a
write.  If you turn off the cache, then the IDE drive will be much
slower.

SCSI overhead isn't really a big issue during loads because you're
usually writing data at a good clip, and the overhead of SCSI is pretty
small by comparison to how much data you'll be slinging.

However, SCSI drives don't lie about fsync, so the maximum speed of your
output will be limited by the speed at which your machine can fsync the
pg_xlog output.
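
For readers unfamiliar with what "fsync the output" means in practice, here
is a minimal Python sketch of a durable append. It is only an illustration
of the system call involved, not PostgreSQL's actual WAL code; the helper
name durable_append and the file name wal.log are made up for the example.

```python
import os
import tempfile


def durable_append(path, data):
    """Append data and return only after the OS reports it flushed to the device."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push its cache to the drive


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "wal.log")  # hypothetical file name
    durable_append(log_path, b"commit 1\n")
    durable_append(log_path, b"commit 2\n")
    with open(log_path, "rb") as f:
        contents = f.read()

print(contents)
```

The fsync call is the step a drive with a lying write cache acknowledges
early, which is why the same code can look much faster on such a drive.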

For a single disk system, just doing development or a reporting
database, an IDE drive is often just fine.  But under no circumstances
should you put an accounting system on a single drive, especially IDE
with cache turned on.

Re: Initial database loading and IDE x SCSI

From
Mark Lewis
Date:
On Fri, 2006-06-02 at 15:25 -0300, carlosreimer@terra.com.br wrote:
> Hi,
>
> I would like to know if my supposition is right.
>
> Considering an environment with only one hard disk attached to a server, an
> initial loading of the database probably is much faster using an IDE/ATA
> interface with write-back on than using an SCSI interface. That´s because of
> the SCSI command interface overhead.

No, it's because the SCSI drive is honoring the database's request to
make sure the data is safe.

> Then main advantage of SCSI interfaces, the multiuser environment is lost in
> this scenery.
>
> Am I right? Am I missing something here?
>
> Even if I´m right, is something that could be done too improove SCSI loading
> performance in this scenery?

You can perform the initial load in large transactions.  The extra
overhead for ensuring that data is safely written to disk will only be
incurred once per transaction, so try to minimize the number of
transactions.
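
As a rough illustration of batching, the sketch below commits once per
batch of rows instead of once per row. It uses Python's built-in sqlite3
module with an in-memory database purely as a runnable stand-in, not
PostgreSQL; the table name items, the helper load_rows and the batch size
are assumptions made up for the example.

```python
import sqlite3


def load_rows(conn, rows, batch_size):
    """Insert rows, committing once per batch; return the number of commits."""
    commits = 0
    cur = conn.cursor()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cur.executemany("INSERT INTO items (id, name) VALUES (?, ?)", batch)
        conn.commit()  # one durability point per batch, not per row
        commits += 1
    return commits


# In-memory SQLite is only a stand-in so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
rows = [(i, f"item-{i}") for i in range(10_000)]

commits = load_rows(conn, rows, batch_size=1_000)
total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(commits, total)
```

With 10,000 rows and a batch size of 1,000, the load pays the commit cost
10 times rather than 10,000 times.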

You could optionally set fsync=off in postgresql.conf, which means that
the SCSI drive will operate with no more safety than an IDE drive.  But
you should only do that if you're willing to deal with catastrophic data
corruption.  But if this is for a desktop application where you need to
support IDE drives, you'll need to deal with that anyway.

> Thanks in advance!
>
> Reimer
>
>

Re: Initial database loading and IDE x SCSI

From
Tom Lane
Date:
<carlosreimer@terra.com.br> writes:
> I would like to know if my supposition is right.

> Considering an environment with only one hard disk attached to a server, an
> initial loading of the database probably is much faster using an IDE/ATA
> interface with write-back on than using an SCSI interface. That's because of
> the SCSI command interface overhead.

I *seriously* doubt that.

If you see a difference in practice it's likely got more to do with the
SCSI drive not lying about write-complete ...

            regards, tom lane

RES: Initial database loading and IDE x SCSI

From
Date:
> <carlosreimer@terra.com.br> writes:
> > I would like to know if my supposition is right.
>
> > Considering an environment with only one hard disk attached to a server, an
> > initial loading of the database probably is much faster using an IDE/ATA
> > interface with write-back on than using an SCSI interface. That´s because of
> > the SCSI command interface overhead.
>
> I *seriously* doubt that.
>
> If you see a difference in practice it's likely got more to do with the
> SCSI drive not lying about write-complete ...
>

Many thanks for the answers! There are some more things I could not
understand about this issue.

I was considering it, but if you have a lot of write operations, won't
the disk cache fill up quickly?

If it's full, won't the system wait until something can be written to the
disk surface?

If the cache is full almost all the time, isn't it useless?

In this scenario, with the cache full almost all the time, shouldn't IDE
and SCSI write operations have almost the same performance?

Thanks in advance,

Reimer



Re: RES: Initial database loading and IDE x SCSI

From
Mark Lewis
Date:
On Fri, 2006-06-02 at 16:54 -0300, carlosreimer@terra.com.br wrote:
> > <carlosreimer@terra.com.br> writes:
> > > I would like to know if my supposition is right.
> >
> > > Considering an environment with only one hard disk attached to a server, an
> > > initial loading of the database probably is much faster using an IDE/ATA
> > > interface with write-back on than using an SCSI interface. That´s because of
> > > the SCSI command interface overhead.
> >
> > I *seriously* doubt that.
> >
> > If you see a difference in practice it's likely got more to do with the
> > SCSI drive not lying about write-complete ...
> >
>
> Many thanks for the answers! There are some more thinks I could not
> understand about this issue?
>
> I was considering it but if you have a lot of writes operations, will not
> the disk cache full quickly?
>
> If it´s full will not the system wait until something could be write to the
> disk surface?
>
> If you have almost all the time the cache full will it not useless?
>
> Should not, in this scenary, with almost all the time the cache full, IDE
> and SCSI write operations have almost the same performance?
>

This is the ideal case.  However, you only get to that case if you use
large transactions or run with fsync=off or run with a write cache (like
IDE drives, or nice RAID controllers which have a battery-backed cache).

Remember that one of the important qualities of a transaction is that
it's durable, so once you commit it the data is definitely stored on the
disk and one nanosecond later you could power the machine off and it
would still be there.

To achieve that durability guarantee, the system needs to make sure that
if you commit a transaction, the data is actually written to the
physical platters on the hard drive.

This means that if you take the naive approach to importing data (one
row at a time, each in its own transaction), then instead of blasting
data onto the hard drive at maximum speed, the application will wait for
the platter to rotate to the right position, write one row's worth of
data, then wait for the platter to rotate to the right position again
and insert another row, etc.  This approach is very slow.
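
To put a rough number on this, the sketch below assumes a 7200 RPM drive
and that every commit costs about one full platter rotation. Both are
assumptions chosen for illustration, not measurements, and the function
names are made up.

```python
def max_commits_per_second(rpm):
    """Upper bound on commits/s if each commit waits for one full rotation."""
    return rpm / 60.0


def naive_load_seconds(row_count, rpm):
    """Estimated time to load row_count rows, one row per transaction."""
    return row_count / max_commits_per_second(rpm)


rate = max_commits_per_second(7200)           # assumed 7200 RPM drive
seconds = naive_load_seconds(1_000_000, 7200)
print(rate, round(seconds / 3600, 1))         # commits per second, hours
```

Under those assumptions the drive tops out at about 120 commits per second,
so a million single-row transactions would take a little over two hours no
matter how fast the interface is.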

The naive approach works on IDE drives because they don't (usually)
honor the request to write the data immediately, so the drive can fill its
write cache up with several megabytes of data and write it out to the
disk at its leisure.

> Thanks in advance,
>
> Reimer
>

-- Mark

RES: RES: Initial database loading and IDE x SCSI

From
Date:
Many thanks Mark,

I will consider fsync=off only for an initial load, not for normal database operation.

I was just thinking about this hypothetical scenario:
a) a restore database operation
b) fsync off
c) write-back on (IDE)

As I understand it, in this scenario it's normal for the IDE drive to be faster than the SCSI one, OK?

Of course, the database is exposed because of fsync=off, but if you consider only system performance, then it
is true, isn't it?

Thanks,

Reimer



Re: RES: RES: Initial database loading and IDE x SCSI

From
Mark Lewis
Date:
On Fri, 2006-06-02 at 17:37 -0300, carlosreimer@terra.com.br wrote:
> Many thanks Mark,
>
> I will consider fsync=off only to do an initial load, not for a database normal operation.
>

This approach works well.  You just need to remember to shut down the
database and start it back up again with fsync enabled for it to be safe
after the initial load.
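
If you want to script the toggle, something like the following Python
sketch could edit the setting in the text of postgresql.conf before and
after the load. The helper set_fsync is hypothetical, it only rewrites
text, and stopping and starting the server is still up to you.

```python
import re


def set_fsync(conf_text, enabled):
    """Return conf_text with the fsync setting replaced, or appended if absent."""
    line = f"fsync = {'on' if enabled else 'off'}"
    # Match an active or commented-out fsync line, e.g. "#fsync = on".
    pattern = re.compile(r"^[ \t]*#?[ \t]*fsync[ \t]*=.*$", re.MULTILINE)
    if pattern.search(conf_text):
        return pattern.sub(line, conf_text, count=1)
    return conf_text.rstrip("\n") + "\n" + line + "\n"


# Example configuration text, made up for the sketch.
conf = "max_connections = 100\n#fsync = on\n"
during_load = set_fsync(conf, False)        # use this for the initial load
after_load = set_fsync(during_load, True)   # restore before normal operation
print(during_load)
print(after_load)
```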

> I was just thinking about this hipotetical scenario:
> a) a restore database operation
> b) fsync off
> c) write-back on (IDE)
>
> As I could understand, in this sceneraio, it´s normal the IDE drive be faster than the SCSI, ok?
>

If fsync is off, then the IDE drive loses its big advantage, so IDE and
SCSI should be about the same speed.

> Of course, the database is exposed because of the fsync=off, but if you consider only the system performance, then it
> is true. Isn´t it?



> Thanks,
>
> Reimer
>
>

RES: RES: RES: Initial database loading and IDE x SCSI

From
Date:
> > Many thanks Mark,
> >
> > I will consider fsync=off only to do an initial load, not for a database normal operation.
> >
>
> This approach works well.  You just need to remember to shut down the
> database and start it back up again with fsync enabled for it to be safe
> after the initial load.
>
> > I was just thinking about this hipotetical scenario:
> > a) a restore database operation
> > b) fsync off
> > c) write-back on (IDE)
> >
> > As I could understand, in this sceneraio, it´s normal the IDE drive be faster than the SCSI, ok?
> >
>
> If fsync is off, then the IDE drive loses its big advantage, so IDE and
> SCSI should be about the same speed.
>
Sorry, I meant to say fsync on instead of fsync off. But I think I understand now.

With fsync off, performance should be almost the same (SCSI and IDE), and with fsync on
the IDE drive will be faster, but the data is exposed.

Thanks!


Re: RES: Initial database loading and IDE x SCSI

From
Mark Kirkwood
Date:
Mark Lewis wrote:

>
> The naive approach works on IDE drives because they don't (usually)
> honor the request to write the data immediately, so it can fill its
> write cache up with several megabytes of data and write it out to the
> disk at its leisure.
>

FWIW - If you are using MacOS X or Windows, then later SATA (in
particular, not sure about older IDE) will honor the request to write
immediately, even if the disk write cache is enabled.

I believe that Linux 2.6+ and SATA II will also behave this way (I'm
thinking that write barrier support *is* in 2.6 now - however you would
be wise to follow up on the Linux kernel list if you want to be sure!)

In these cases data integrity becomes similar to SCSI. However, unless
you buy SATA drives specifically designed for a server-type workload (e.g.
WD Raptor), ATA/SATA drives tend to fail more quickly if used in this way
(e.g. 24/7, hot/dusty environment, etc.).

Cheers

Mark

Re: RES: Initial database loading and IDE x SCSI

From
Bruce Momjian
Date:
Mark Kirkwood wrote:
> Mark Lewis wrote:
>
> >
> > The naive approach works on IDE drives because they don't (usually)
> > honor the request to write the data immediately, so it can fill its
> > write cache up with several megabytes of data and write it out to the
> > disk at its leisure.
> >
>
> FWIW - If you are using MacOS X or Windows, then later SATA (in
> particular, not sure about older IDE) will honor the request to write
> immediately, even if the disk write cache is enabled.
>
> I believe that Linux 2.6+ and SATA II will also behave this way (I'm
> thinking that write barrier support *is* in 2.6 now - however you would
> be wise to follow up on the Linux kernel list if you want to be sure!)
>
> In these cases data integrity becomes similar to SCSI - however, unless
> you buy SATA specifically designed for a server type workload (e.g WD
> Raptor), then ATA/SATA tend to fail more quickly if used in this way
> (e.g. 24/7, hot/dusty environment etc).

The definitive guide to servers vs. desktop drives is:

  http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: RES: Initial database loading and IDE x SCSI

From
Mark Kirkwood
Date:
Bruce Momjian wrote:

>
> The definitive guide to servers vs. desktop drives is:
>
>   http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf
>

Yeah - very nice paper, well worth a read (in spite of the fact that it
is also Seagate propaganda, supporting their marketing position and
terminology!)

Cheers

Mark