Thread: pgsql: Add URLs for : * Speed WAL recovery by allowing more than one
From: momjian@postgresql.org (Bruce Momjian)
Log Message:
-----------
Add URLs for :

* Speed WAL recovery by allowing more than one page to be prefetched

  This involves having a separate process that can be told which pages
  the recovery process will need in the near future.

> http://archives.postgresql.org/pgsql-general/2007-12/msg00683.php
> http://archives.postgresql.org/pgsql-hackers/2007-12/msg00497.php

Modified Files:
--------------
pgsql/doc:
    TODO (r1.2345 -> r1.2346)
    (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/TODO?r1=1.2345&r2=1.2346)
pgsql/doc/src/FAQ:
    TODO.html (r1.853 -> r1.854)
    (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/FAQ/TODO.html?r1=1.853&r2=1.854)
On Tue, 2008-03-18 at 03:59 +0000, Bruce Momjian wrote:
> Log Message:
> -----------
> Add URLs for :
>
> * Speed WAL recovery by allowing more than one page to be prefetched
>
>   This involves having a separate process that can be told which pages
>   the recovery process will need in the near future.
>
> > http://archives.postgresql.org/pgsql-general/2007-12/msg00683.php
> > http://archives.postgresql.org/pgsql-hackers/2007-12/msg00497.php

This TODO item presumes the solution rather than describing the problem
that needs to be solved. Other solutions have also been proposed, and
AFAIK nothing has been agreed.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Simon Riggs wrote:
> On Tue, 2008-03-18 at 03:59 +0000, Bruce Momjian wrote:
> > * Speed WAL recovery by allowing more than one page to be prefetched
> >
> >   This involves having a separate process that can be told which pages
> >   the recovery process will need in the near future.
> >
> > > http://archives.postgresql.org/pgsql-general/2007-12/msg00683.php
> > > http://archives.postgresql.org/pgsql-hackers/2007-12/msg00497.php
>
> This TODO item presumes the solution rather than describing the problem
> that needs to be solved. Other solutions have also been proposed, and
> AFAIK nothing has been agreed.

The general consensus from the discussion was that multiplexing the I/O
was easier and simpler than trying to multiplex the actual recovery
code. Until you can get agreement on a bolder approach, the TODO
remains unchanged.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Tue, 2008-03-18 at 11:36 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Tue, 2008-03-18 at 03:59 +0000, Bruce Momjian wrote:
> > > * Speed WAL recovery by allowing more than one page to be prefetched
> > >
> > >   This involves having a separate process that can be told which pages
> > >   the recovery process will need in the near future.
> >
> > This TODO item presumes the solution rather than describing the problem
> > that needs to be solved. Other solutions have also been proposed, and
> > AFAIK nothing has been agreed.
>
> The general consensus from the discussion was that multiplexing the I/O
> was easier and simpler than trying to multiplex the actual recovery
> code. Until you can get agreement on a bolder approach, the TODO
> remains unchanged.

That's the opposite of what I had understood from the discussion. There
was clear and direct opposition to what you have put in the TODO.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Simon Riggs wrote:
> On Tue, 2008-03-18 at 11:36 -0400, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > This TODO item presumes the solution rather than describing the
> > > problem that needs to be solved. Other solutions have also been
> > > proposed, and AFAIK nothing has been agreed.
> >
> > The general consensus from the discussion was that multiplexing the I/O
> > was easier and simpler than trying to multiplex the actual recovery
> > code. Until you can get agreement on a bolder approach, the TODO
> > remains unchanged.
>
> That's the opposite of what I had understood from the discussion. There
> was clear and direct opposition to what you have put in the TODO.

Are you reading the same thread I am? See:

    http://archives.postgresql.org/pgsql-hackers/2008-02/msg01301.php

If you think the conclusion was wrong, give me a URL!

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes:
> > > > On Tue, 2008-03-18 at 03:59 +0000, Bruce Momjian wrote:
> > > > > * Speed WAL recovery by allowing more than one page to be prefetched
> > > > >
> > > > >   This involves having a separate process that can be told which pages
> > > > >   the recovery process will need in the near future.
>
> Are you reading the same thread I am? See:
>
>     http://archives.postgresql.org/pgsql-hackers/2008-02/msg01301.php

I don't think there's any consensus for the approach you describe above.
If anything, it seemed the least objectionable form was something
involving posix_fadvise or libaio.

Tom did wave us off from Simon's approach on the basis of it being hard
to test, and Heikki seemed to be agreeing on the basis that it would be
better to reuse infrastructure useful in other cases as well. So I guess
that's some kind of consensus... of two.

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
Gregory Stark wrote:
> "Bruce Momjian" <bruce@momjian.us> writes:
> > Are you reading the same thread I am? See:
> >
> >     http://archives.postgresql.org/pgsql-hackers/2008-02/msg01301.php
>
> I don't think there's any consensus for the approach you describe
> above. If anything, it seemed the least objectionable form was
> something involving posix_fadvise or libaio.
>
> Tom did wave us off from Simon's approach on the basis of it being hard
> to test, and Heikki seemed to be agreeing on the basis that it would be
> better to reuse infrastructure useful in other cases as well. So I
> guess that's some kind of consensus... of two.

Yep, that was my analysis too.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
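[Editorial note: the posix_fadvise() mechanism discussed above can be
sketched as follows. This is a minimal illustration in Python, not
PostgreSQL code; the helper name, block numbering, and throwaway file
are invented, and os.posix_fadvise is available on Unix only.]

```python
import os
import tempfile

BLOCK_SIZE = 8192  # PostgreSQL's default block size


def prefetch_block(fd, blockno):
    """Hint the kernel to start reading one block asynchronously,
    so a later read of the same range finds it in the page cache."""
    os.posix_fadvise(fd, blockno * BLOCK_SIZE, BLOCK_SIZE,
                     os.POSIX_FADV_WILLNEED)


# Demonstrate on a throwaway two-block file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE)
    path = f.name

fd = os.open(path, os.O_RDONLY)
prefetch_block(fd, 1)                        # hint: block 1 needed soon
data = os.pread(fd, BLOCK_SIZE, BLOCK_SIZE)  # the actual read, later
os.close(fd)
os.unlink(path)
```

The appeal of this form is that the hint is fire-and-forget: the
"recovery" side keeps working while the kernel schedules the I/O, with
no extra process to manage.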
From: "Heikki Linnakangas"
Bruce Momjian wrote:
> Gregory Stark wrote:
> > I don't think there's any consensus for the approach you describe
> > above. If anything, it seemed the least objectionable form was
> > something involving posix_fadvise or libaio.

The least objectionable form is probably Aidan Van Dyk's suggestion of
doing it in restore_command, completely outside the backend. It would
need deep knowledge of the WAL format, but other than that, you could
implement it as a pgfoundry project, side-stepping any objections we on
pgsql-hackers might have :-).

> > Tom did wave us off from Simon's approach on the basis of it being
> > hard to test, and Heikki seemed to be agreeing on the basis that it
> > would be better to reuse infrastructure useful in other cases as
> > well. So I guess that's some kind of consensus... of two.
>
> Yep, that was my analysis too.

A "separate process that can be told which pages the recovery process
will need in the future" doesn't imply posix_fadvise or libaio or
anything like that to me. It sounds like a background reader process,
but only one of those is hardly enough. And it doesn't mention the point
Tom raised that we shouldn't invent anything specific to WAL replay.

How about:

* Speed WAL recovery by allowing more than one page to be prefetched

  This should be done utilizing the same infrastructure used for
  prefetching in general, to avoid introducing complex, error-prone code
  into the WAL replay code path, which doesn't get much testing compared
  to the rest of the system.

There's already this TODO item:

> Experiment with multi-threaded backend better resource utilization
>
> This would allow a single query to make use of multiple CPU's or
> multiple I/O channels simultaneously. One idea is to create a
> background reader that can pre-fetch sequential and index scan pages
> needed by other backends. This could be expanded to allow concurrent
> reads from multiple devices in a partitioned table.

This should probably be split into two. Using multiple CPUs for
satisfying one query is quite different from implementing some kind of a
pre-fetching mechanism using posix_fadvise(), libaio, or background
reader processes.

--
Heikki Linnakangas
EnterpriseDB  http://www.enterprisedb.com
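[Editorial note: the "background reader" alternative mentioned above is
a helper that is told which blocks will be needed and reads them ahead
of time to warm the OS cache. A minimal sketch in Python, using a thread
in place of a separate process; the queue protocol, names, and demo file
are invented for illustration and are not PostgreSQL code.]

```python
import os
import queue
import tempfile
import threading

BLOCK_SIZE = 8192  # PostgreSQL's default block size


def background_reader(path, requests):
    """Read each requested block so the OS caches it before the
    main (e.g. recovery) process actually needs it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            blockno = requests.get()
            if blockno is None:  # sentinel: no more requests
                return
            os.pread(fd, BLOCK_SIZE, blockno * BLOCK_SIZE)
    finally:
        os.close(fd)


# Demonstration: the "recovery" side announces upcoming blocks.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(BLOCK_SIZE * 4))  # a four-block file of zeros
    path = f.name

requests = queue.Queue()
reader = threading.Thread(target=background_reader,
                          args=(path, requests))
reader.start()
for blockno in (2, 0, 3):  # blocks the replay loop will need soon
    requests.put(blockno)
requests.put(None)
reader.join()
os.unlink(path)
```

One such reader serializes its reads, which is why a single background
reader is "hardly enough": keeping several spindles busy needs several
readers (or kernel-level async I/O as in the posix_fadvise approach).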
On Tue, 2008-03-18 at 16:56 -0400, Bruce Momjian wrote:
> Gregory Stark wrote:
> > Tom did wave us off from Simon's approach on the basis of it being
> > hard to test, and Heikki seemed to be agreeing on the basis that it
> > would be better to reuse infrastructure useful in other cases as
> > well. So I guess that's some kind of consensus... of two.
>
> Yep, that was my analysis too.

It may surprise you, but I didn't read Tom's words as being against
"Simon's approach". Personally I read them as a generic warning, which I
agreed with. Maybe Tom can straighten that out.

If you know what "my approach" is, that's good, 'cos I'm not sure I do
yet. I said at FOSDEM, two weeks after this thread, that "Multiple slave
processes handle database blocks, based upon hash distribution of
blocks".

We're all agreed that we need to parallelise the work. Somehow. Is it
just the I/O we need to parallelise? Are we sure about that? Nobody has
shown any convincing evidence in favour of, or against, various flavours
of async I/O. In the absence of that, I think the simplest way is normal
I/O with many processes executing it. Maybe I misread the Developer's
FAQ describing why we don't already use async I/O or other "wizz-bang"
features?

I'm optimistic about that actually, but let's see the facts before we
take that decision. So AFAICS I have advocated the less bold approach.

Nobody has even mentioned yet the bgwriter, whether it should be active
during recovery, and its possible role in smoothing restartpointing.

In any case, all I've said here is that we shouldn't put a specific
approach into the TODO. Just state the problem.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
From: Bruce Momjian
Heikki Linnakangas wrote:
> How about:
>
> * Speed WAL recovery by allowing more than one page to be prefetched
>
>   This should be done utilizing the same infrastructure used for
>   prefetching in general, to avoid introducing complex, error-prone
>   code into the WAL replay code path, which doesn't get much testing
>   compared to the rest of the system.

Updated to:

* Speed WAL recovery by allowing more than one page to be prefetched

  This should be done utilizing the same infrastructure used for
  prefetching in general to avoid introducing complex error-prone code
  in WAL replay.

> There's already this TODO item:
>
> > Experiment with multi-threaded backend better resource utilization
> >
> > This would allow a single query to make use of multiple CPU's or
> > multiple I/O channels simultaneously. One idea is to create a
> > background reader that can pre-fetch sequential and index scan pages
> > needed by other backends. This could be expanded to allow concurrent
> > reads from multiple devices in a partitioned table.
>
> This should probably be split into two. Using multiple CPUs for
> satisfying one query is quite different from implementing some kind of
> a pre-fetching mechanism using posix_fadvise(), libaio, or background
> reader processes.

Good idea, items split:

* Experiment with multi-threaded backend better I/O utilization

  This would allow a single query to make use of multiple I/O channels
  simultaneously. One idea is to create a background reader that can
  pre-fetch sequential and index scan pages needed by other backends.
  This could be expanded to allow concurrent reads from multiple devices
  in a partitioned table.

* Experiment with multi-threaded backend better CPU utilization

  This would allow several CPUs to be used for a single query, such as
  for sorting or query execution.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
From: Alvaro Herrera
Simon Riggs wrote:
> In any case, all I've said here is that we shouldn't put a specific
> approach into the TODO. Just state the problem.

Agreed. Perhaps the ideas stated so far can be listed as possible
alternatives for solving the problem, but they shouldn't be the main
body of the TODO item.

--
Alvaro Herrera  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
From: Bruce Momjian
Alvaro Herrera wrote:
> Simon Riggs wrote:
> > In any case, all I've said here is that we shouldn't put a specific
> > approach into the TODO. Just state the problem.
>
> Agreed. Perhaps the ideas stated so far can be listed as possible
> alternatives for solving the problem, but they shouldn't be the main
> body of the TODO item.

We have specific solutions all over the TODO list. The minimally agreed
approach is multi-page fetching. If we can get general agreement that we
should do recovery in multiple streams, it will also be added to the
TODO list.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Simon Riggs <simon@2ndquadrant.com> writes:
> It may surprise you but I didn't read Tom's words as being against
> "Simon's approach". Personally I read them as a generic warning, which
> I agreed with. Maybe Tom can straighten that out.

AFAIR, I just said that I'd find it hard to trust any complex mechanism
that was being used *only* during WAL replay. If we want to invent a
pre-reader process, or aio, or whatever, we should try to get it to be
exercised during normal use as well. We're far more likely to find the
bugs in it that way.

		regards, tom lane