Thread: Performance of COPY for Archive operations

Performance of COPY for Archive operations

From
"Simon Riggs"
Date:
I've spent a while working with PITR functionality on the Win32 port.

I noticed that *it works*, which is always great, but using a COPY command
the archival operation was significantly slower than the writing of the
xlogs themselves.

At one point, I got to being more than 10 xlog files behind with the list
growing steadily, and it took a while to clear the logjam when my test
workloads completed. Not much point having archiving that's actually slower
than the writing of xlog....

IIRC the COPY command isn't the best thing to use for bulk-copying on
Windows, but I can't remember what is better. Anybody?

My tests were conducted on a small test server, but the imbalance between
xlog write/copy is worrying. I have 1 GB RAM, which was nowhere near full
during testing. CPU was extremely low, so I'm guessing COPY has some bad I/O
characteristics.

Of course, I don't expect to be using COPY in production much...but others
will, so I want to sort this out. Feel free to point out the obvious....if
it exists,

Best regards,

Simon Riggs



Re: Performance of COPY for Archive operations

From
Bruce Momjian
Date:
Simon Riggs wrote:
>
> I've spent a while working with PITR functionality on the Win32 port.
>
> I noticed that *it works*, which is always great, but using a COPY command
> the archival operation was significantly slower than the writing of the
> xlogs themselves.
>
> At one point, I got to being more than 10 xlog files behind with the list
> growing steadily, and it took a while to clear the logjam when my test
> workloads completed. Not much point having archiving that's actually slower
> than the writing of xlog....

Why was it slow?  'cp' was slower than the WAL writes?  Seems strange to
me.   Do we have some sleep loop in there that is causing us to read
that directory too slowly?  I didn't think so.

> IIRC the COPY command isn't the best thing to use for bulk-copying on
> Windows, but I can't remember what is better. Anybody?

COPY is the fastest way to get data in and out of PostgreSQL.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Performance of COPY for Archive operations

From
"Simon Riggs"
Date:
> Bruce Momjian wrote
> Simon Riggs wrote:
> >
> > I've spent a while working with PITR functionality on the Win32 port.
> >
> > I noticed that *it works*, which is always great, but using a COPY command
> > the archival operation was significantly slower than the writing of the
> > xlogs themselves.
> >
> > At one point, I got to being more than 10 xlog files behind with the list
> > growing steadily, and it took a while to clear the logjam when my test
> > workloads completed. Not much point having archiving that's actually slower
> > than the writing of xlog....
>
> Why was it slow?  'cp' was slower than the WAL writes?  Seems strange to
> me.   Do we have some sleep loop in there that is causing us to read
> that directory too slowly?  I didn't think so.
>

(Win32 COPY, not cp.)

Yes, it seemed strange, that's why I mention it... nothing like that on
Linux.

When there are multiple files ready for archive, ARCHIVER loops until
they're all done. You're right, it could conceivably be something to do with
directory access speed, but I'm thinking that the NT COPY command itself has
some strangeness.

My test involved writing 1 million records, each larger than 4 kB, to a table
using an INSERT ... SELECT. The server had a single disk, but there's no reason
to expect that head movement on the disk would favour one process over another.
That's probably THE most common setup for people using the Windows version
anyway, so it is important.
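
For a sense of scale, a rough lower bound on the WAL volume of that test can
be sketched as follows. The figures are assumptions, not measurements from the
test: a 16 MB WAL segment size, exactly 4096 bytes per record, and no allowance
for WAL record overhead or full-page images, so the real number of segments
would be higher.

```python
# Rough lower bound on the WAL segments generated by the test described above.
# Assumptions (not from the thread): 16 MB segments, exactly 4096 bytes per
# record, no WAL record overhead and no full-page images.
SEGMENT_BYTES = 16 * 1024 * 1024


def min_segments(rows: int, bytes_per_row: int) -> int:
    """Smallest number of WAL segments that could hold the row data alone."""
    total = rows * bytes_per_row
    return -(-total // SEGMENT_BYTES)  # ceiling division


if __name__ == "__main__":
    # 1 million records of 4096 bytes each need at least 245 segments.
    print(min_segments(1_000_000, 4096))
```

Under those assumptions the archiver has a few hundred 16 MB files to copy
during the run, which is why even a modest difference in speed shows up as a
visible backlog.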

I note also Mark Wong's recent large scale benchmark that showed less than a
1% overhead from archiving.

> > IIRC the COPY command isn't the best thing to use for bulk-copying on
> > Windows, but I can't remember what is better. Anybody?
>
> COPY is the fastest way to get data in and out of PostgreSQL.

Agreed....but I meant copying NT files around using the NT COPY command, not
the PostgreSQL COPY command.

I had some performance issues in '98 related to this - just hoping some
Win32 wiz will educate me...

...


More importantly, can anybody repeat this result? I performed this twice,
with the same results each time.

Thanks,

Best Regards, Simon Riggs



Re: Performance of COPY for Archive operations

From
Bruce Momjian
Date:
I can imagine WAL writing as fast as MS COPY, and I can imagine MS COPY
lagging behind on an I/O bound system.  Remind me, how does the archiver
know a WAL file is full?

Suppose the system is 100% I/O bound.  Archiver can just keep up with
WAL, but if WAL gets a lead, can archiver catch up?  Basically archiver
can never get ahead of WAL but WAL can get ahead of archiver.
Statistically does that cause a consistent lag?  I am not sure.
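
One way to explore that question is a toy simulation. This is only a sketch
under invented assumptions: in each time step WAL completes 0 or 1 segment and
the archiver has capacity for 0 or 1 segment, each with probability 1/2, so
both have the same average speed. Spare archiver capacity is lost when nothing
is waiting, which models "archiver can never get ahead of WAL". The function
name and the step model are hypothetical.

```python
import random
from typing import List


def simulate_backlog(steps: int, seed: int = 0) -> List[int]:
    """Backlog of unarchived segments when WAL and archiver are equally fast on average.

    Toy model: each step WAL produces 0 or 1 segment and the archiver can
    handle 0 or 1 segment, independently and with equal probability.
    The backlog is clamped at zero because the archiver cannot work ahead.
    """
    rng = random.Random(seed)
    backlog = 0
    history = []
    for _ in range(steps):
        produced = rng.randint(0, 1)   # segments completed by WAL this step
        capacity = rng.randint(0, 1)   # segments the archiver could copy this step
        backlog = max(0, backlog + produced - capacity)
        history.append(backlog)
    return history


if __name__ == "__main__":
    h = simulate_backlog(100_000)
    print("max backlog:", max(h), "mean backlog:", sum(h) / len(h))
```

Because the backlog is clamped at zero, random fluctuations can only accumulate
on one side, so in this model the archiver spends time behind WAL even though
its average speed is the same.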


Re: Performance of COPY for Archive operations

From
"Simon Riggs"
Date:
Bruce Momjian wrote:
> I can imagine WAL writing as fast as MS COPY, and I can imagine MS COPY
> lagging behind on an I/O bound system.  Remind me, how does the archiver
> know a WAL file is full?
>
> Suppose the system is 100% I/O bound.  Archiver can just keep up with
> WAL, but if WAL gets a lead, can archiver catch up?  Basically archiver
> can never get ahead of WAL but WAL can get ahead of archiver.
> Statistically does that cause a consistent lag?  I am not sure.
>

Well my thinking on this is:
You're right, the whole thing is I/O bound.
The backend writing WAL was competing with the archiver COPY process which
was both reading and then writing WAL. That means that the backend is
literally doing half the work of the COPY process - as a result, the COPY
process doesn't just lag behind, it gets progressively further and further
behind. The only point at which it can catch up is when the long running
transaction stops.
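
That argument can be put as back-of-envelope arithmetic. The sketch below
assumes, hypothetically, that the single disk's I/O is split evenly between
the backend and the COPY process, that writing a segment costs one unit of
I/O, and that copying it costs two (one read plus one write, with nothing
served from cache). The function name and the cost units are illustrative.

```python
def backlog_after(segments_written: int,
                  backend_io_per_segment: float = 1.0,
                  archiver_io_per_segment: float = 2.0) -> float:
    """Segments still waiting to be archived once the backend has written N.

    Assumes both processes get the same share of disk I/O, so in the time the
    backend spends N * backend_io_per_segment units, the archiver also gets
    that many units and completes only a fraction of the segments.
    """
    io_budget = segments_written * backend_io_per_segment
    archived = io_budget / archiver_io_per_segment
    return segments_written - archived


if __name__ == "__main__":
    # After 20 segments written, the archiver has finished 10 and is 10 behind.
    print(backlog_after(20))
```

With these numbers the backlog grows by one segment for every two written and
only shrinks once the backend stops generating WAL.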

The moral of this story is: don't archive transaction logs to the same disk
that pg_xlog is on. Many will say, "we knew that anyway" - but my feeling is
that many Windows systems are configured very simply, with just one disk or
one volume of RAID disks.

Nothing that surprising there, though I think I would like to put a WARNING
message into the Archiver that triggers if more than CHECKPOINT_SEGMENTS WAL
files are ready to archive at any one time. Though maybe that would cause
more problems than it would solve: "Archiving of transaction logs cannot
keep up with system activity. If this occurs regularly, you should
reconsider your database-disk layout"

Is that a TODO item, or a hotfix for 8.0 beta?
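
Until something like that exists in the server, the same check can be
approximated from outside. The sketch below is an external monitoring script,
not the proposed Archiver change: it counts the `.ready` marker files in
`pg_xlog/archive_status` and compares the count with a threshold. The function
names and the message wording are hypothetical, and the threshold is passed in
by hand rather than read from the server configuration.

```python
from pathlib import Path
from typing import Iterable, Optional


def count_ready(filenames: Iterable[str]) -> int:
    """Count archive_status entries that mark a segment as waiting for archival."""
    return sum(1 for name in filenames if name.endswith(".ready"))


def lag_warning(ready: int, checkpoint_segments: int) -> Optional[str]:
    """Return a warning string if more than checkpoint_segments files are waiting."""
    if ready > checkpoint_segments:
        return ("WARNING: archiving of transaction logs cannot keep up with "
                f"system activity ({ready} files ready, threshold {checkpoint_segments})")
    return None


def check_directory(pg_xlog: str, checkpoint_segments: int) -> Optional[str]:
    """Apply the check to a real pg_xlog directory (path supplied by the caller)."""
    status_dir = Path(pg_xlog) / "archive_status"
    names = [p.name for p in status_dir.iterdir()] if status_dir.is_dir() else []
    return lag_warning(count_ready(names), checkpoint_segments)
```

Run periodically, for example as `check_directory("/path/to/data/pg_xlog", 3)`,
this returns `None` while the archiver keeps up and a message once the backlog
passes the threshold.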

Best Regards, Simon Riggs


Re: Performance of COPY for Archive operations

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> Nothing that surprising there, though I think I would like to put a WARNING
> message into the Archiver that triggers if more than CHECKPOINT_SEGMENTS WAL
> files are ready to archive at any one time. Though maybe that would cause
> more problems than it would solve: "Archiving of transaction logs cannot
> keep up with system activity. If this occurs regularly, you should
> reconsider your database-disk layout"

Can't see the value of this.  The problem will be readily apparent from
growth of the pg_xlog directory --- anyone who doesn't notice that
probably isn't perusing the postmaster log either.  Also, once it starts
to bleat, what's going to make it stop?  Filling the disk with warning
messages won't be a constructive improvement on the situation :-(

            regards, tom lane

Re: Performance of COPY for Archive operations

From
"Simon Riggs"
Date:
> Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > Nothing that surprising there, though I think I would like to put a WARNING
> > message into the Archiver that triggers if more than CHECKPOINT_SEGMENTS WAL
> > files are ready to archive at any one time. Though maybe that would cause
> > more problems than it would solve: "Archiving of transaction logs cannot
> > keep up with system activity. If this occurs regularly, you should
> > reconsider your database-disk layout"
>
> Can't see the value of this.  The problem will be readily apparent from
> growth of the pg_xlog directory --- anyone who doesn't notice that
> probably isn't perusing the postmaster log either.

Hmmm, message levels were a point we differed on previously, IIRC.

Certainly, if the growth happened over a long period, then I'd agree - the
admin should have spotted it.

If the behaviour were more volatile, then the admin might not spot it - the
effects are only shown when the system becomes I/O bound, which might be
regularly at peak loading, but never long enough to notice. I had considered
just such volatility in the design, though with regard to operator induced
behaviour like tape changes or deliberate batching of log files.

The issue is that, by falling behind, the archiver is increasing the
transaction loss window, possibly undermining somewhat the purpose of PITR.

The message shows in the log long after the situation occurred and the space
increase has dissipated. The admin may never look at the logs, agreed, but if
the message isn't there they certainly will never notice. You and I will
know, because when the crash occurs, we'll get a pattern of error messages
we'll recognise, but that's not much help to the admin.

Do we wait for such a crash before we add the hint?

> Also, once it starts
> to bleat, what's going to make it stop?  Filling the disk with warning
> messages won't be a constructive improvement on the situation :-(

Filling the disk with log messages would be pointless, agreed.

If the message appeared as part of the normal archiver cycle, then the
message would only appear once per 2*CHECKPOINT_SEGMENTS "transaction log
archived" and "transaction log recycled" messages. Thus no more likely to
fill up the disk.

Of course, the archiver could always report less frequently, since it keeps
state between cycles.
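
The kind of state-keeping described above might look like the following
sketch. It is not archiver code: the class name, the method name and the
default interval are hypothetical. It reads the paragraph as "at most one
warning per CHECKPOINT_SEGMENTS archived files", each of which already
produces two log messages of its own.

```python
from typing import Optional


class LagReporter:
    """Rate-limits the "cannot keep up" warning using state kept between cycles."""

    def __init__(self, checkpoint_segments: int,
                 min_segments_between_warnings: Optional[int] = None):
        self.threshold = checkpoint_segments
        # Default interval is an assumption: one warning per checkpoint_segments files.
        self.interval = (checkpoint_segments if min_segments_between_warnings is None
                         else min_segments_between_warnings)
        self.since_last_warning: Optional[int] = None  # None means never warned

    def on_segment_archived(self, ready_count: int) -> bool:
        """Call once per archived segment; True means a warning should be logged now."""
        if self.since_last_warning is not None:
            self.since_last_warning += 1
        if ready_count <= self.threshold:
            return False
        if self.since_last_warning is None or self.since_last_warning >= self.interval:
            self.since_last_warning = 0
            return True
        return False
```

Passing a larger `min_segments_between_warnings` makes the report less
frequent, so the warning volume stays a small fraction of the log messages
the archiver already writes.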

I'm not in a rush to add this, I just think it's needed, based upon my
observations on Windows.

Best Regards, Simon Riggs


Re: Performance of COPY for Archive operations

From
Bruce Momjian
Date:
We already have a warning that prints when checkpoints happen too
frequently.   I wonder if we should print a warning if the number of WAL
files reaches double its normal maximum, which is checkpoint_segments*2+1, I
think.
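
As a worked example of that threshold, the sketch below plugs in
checkpoint_segments = 3 and a 16 MB segment size. Both values are assumptions
used for illustration, and the function names are hypothetical.

```python
SEGMENT_MB = 16  # assumed WAL segment size


def normal_max_segments(checkpoint_segments: int) -> int:
    """Normal upper bound on WAL files: checkpoint_segments*2+1."""
    return 2 * checkpoint_segments + 1


def warn_threshold(checkpoint_segments: int) -> int:
    """File count at which the suggested warning would fire: double the normal maximum."""
    return 2 * normal_max_segments(checkpoint_segments)


if __name__ == "__main__":
    cs = 3  # example value
    print(normal_max_segments(cs), "files,", normal_max_segments(cs) * SEGMENT_MB, "MB")
    print(warn_threshold(cs), "files,", warn_threshold(cs) * SEGMENT_MB, "MB")
```

With those values the normal maximum is 7 files (112 MB) and the warning
would fire at 14 files (224 MB).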


Re: Performance of COPY for Archive operations

From
Bruce Momjian
Date:
I guess we will wait to see what problem reports we get from users.


Re: Performance of COPY for Archive operations

From
Reini Urban
Date:
Bruce Momjian wrote:
> I guess we will wait to see what problem reports we get from users.
>
> Bruce Momjian wrote:
>> We already have a warning that prints when checkpoints happen too
>> frequently.   I wonder if we should print a warning if the number of WAL
>> files reaches double its normal maximum, which is checkpoint_segments*2+1, I
>> think.

I can confirm this for Cygwin as well:
checkpoints happen too frequently with the default settings.
--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/