Thread: pgsql: Add URLs for : * Speed WAL recovery by allowing more than one
From: momjian@postgresql.org (Bruce Momjian)
Log Message:
-----------
Add URLs for :

* Speed WAL recovery by allowing more than one page to be prefetched

  This involves having a separate process that can be told which pages
  the recovery process will need in the near future.

> http://archives.postgresql.org/pgsql-general/2007-12/msg00683.php
> http://archives.postgresql.org/pgsql-hackers/2007-12/msg00497.php

Modified Files:
--------------
pgsql/doc:
    TODO (r1.2345 -> r1.2346)
    (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/TODO?r1=1.2345&r2=1.2346)
pgsql/doc/src/FAQ:
    TODO.html (r1.853 -> r1.854)
    (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/FAQ/TODO.html?r1=1.853&r2=1.854)
On Tue, 2008-03-18 at 03:59 +0000, Bruce Momjian wrote:
> Log Message:
> -----------
> Add URLs for :
>
> * Speed WAL recovery by allowing more than one page to be prefetched
>
>   This involves having a separate process that can be told which pages
>   the recovery process will need in the near future.
>
> > http://archives.postgresql.org/pgsql-general/2007-12/msg00683.php
> > http://archives.postgresql.org/pgsql-hackers/2007-12/msg00497.php

This TODO item presumes the solution rather than describing the problem
that needs to be solved. Other solutions have also been proposed, and
AFAIK nothing has been agreed.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Simon Riggs wrote:
> On Tue, 2008-03-18 at 03:59 +0000, Bruce Momjian wrote:
> > * Speed WAL recovery by allowing more than one page to be prefetched
> >
> >   This involves having a separate process that can be told which pages
> >   the recovery process will need in the near future.
> >
> > > http://archives.postgresql.org/pgsql-general/2007-12/msg00683.php
> > > http://archives.postgresql.org/pgsql-hackers/2007-12/msg00497.php
>
> This TODO item presumes the solution rather than describing the problem
> that needs to be solved. Other solutions have also been proposed, and
> AFAIK nothing has been agreed.

The general consensus from the discussion was that multiplexing the I/O
was easier and simpler than trying to multiplex the actual recovery
code. Until you can get agreement on a bolder approach, the TODO
remains unchanged.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Tue, 2008-03-18 at 11:36 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Tue, 2008-03-18 at 03:59 +0000, Bruce Momjian wrote:
> > > * Speed WAL recovery by allowing more than one page to be prefetched
> > >
> > >   This involves having a separate process that can be told which pages
> > >   the recovery process will need in the near future.
> >
> > This TODO item presumes the solution rather than describing the problem
> > that needs to be solved. Other solutions have also been proposed, and
> > AFAIK nothing has been agreed.
>
> The general consensus from the discussion was that multiplexing the I/O
> was easier and simpler than trying to multiplex the actual recovery
> code. Until you can get agreement on a bolder approach, the TODO
> remains unchanged.

That's the opposite of what I had understood from the discussion. There
was clear and direct opposition to what you have put in the TODO.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Simon Riggs wrote:
> On Tue, 2008-03-18 at 11:36 -0400, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > This TODO item presumes the solution rather than describing the
> > > problem that needs to be solved. Other solutions have also been
> > > proposed, and AFAIK nothing has been agreed.
> >
> > The general consensus from the discussion was that multiplexing the I/O
> > was easier and simpler than trying to multiplex the actual recovery
> > code. Until you can get agreement on a bolder approach, the TODO
> > remains unchanged.
>
> That's the opposite of what I had understood from the discussion. There
> was clear and direct opposition to what you have put in the TODO.

Are you reading the same thread I am? See:

    http://archives.postgresql.org/pgsql-hackers/2008-02/msg01301.php

If you think the conclusion was wrong, give me a URL!

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes:
> > > > On Tue, 2008-03-18 at 03:59 +0000, Bruce Momjian wrote:
> > > > > * Speed WAL recovery by allowing more than one page to be prefetched
> > > > >
> > > > >   This involves having a separate process that can be told which pages
> > > > >   the recovery process will need in the near future.
>
> Are you reading the same thread I am? See:
>
>     http://archives.postgresql.org/pgsql-hackers/2008-02/msg01301.php

I don't think there's any consensus for the approach you describe above.
If anything, it seemed the least objectionable form was something
involving posix_fadvise or libaio.

Tom did wave us off from Simon's approach on the basis of it being hard
to test, and Heikki seemed to be agreeing on the basis that it would be
better to reuse infrastructure useful in other cases as well. So I guess
that's some kind of consensus... of two.

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
Gregory Stark wrote:
> "Bruce Momjian" <bruce@momjian.us> writes:
> > Are you reading the same thread I am? See:
> >
> >     http://archives.postgresql.org/pgsql-hackers/2008-02/msg01301.php
>
> I don't think there's any consensus for the approach you describe
> above. If anything, it seemed the least objectionable form was
> something involving posix_fadvise or libaio.
>
> Tom did wave us off from Simon's approach on the basis of it being hard
> to test, and Heikki seemed to be agreeing on the basis that it would be
> better to reuse infrastructure useful in other cases as well. So I
> guess that's some kind of consensus... of two.

Yep, that was my analysis too.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
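[Editorial note: the posix_fadvise() mechanism discussed above can be
sketched as follows. This is a minimal illustration in Python, not
PostgreSQL code; the helper name, block numbering, and throwaway file
are invented, and os.posix_fadvise is available on Unix only.]

```python
import os
import tempfile

BLOCK_SIZE = 8192  # PostgreSQL's default block size


def prefetch_block(fd, blockno):
    """Hint the kernel to start reading one block asynchronously,
    so a later read of the same range finds it in the page cache."""
    os.posix_fadvise(fd, blockno * BLOCK_SIZE, BLOCK_SIZE,
                     os.POSIX_FADV_WILLNEED)


# Demonstrate on a throwaway two-block file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE)
    path = f.name

fd = os.open(path, os.O_RDONLY)
prefetch_block(fd, 1)                        # hint: block 1 needed soon
data = os.pread(fd, BLOCK_SIZE, BLOCK_SIZE)  # the actual read, later
os.close(fd)
os.unlink(path)
```

The appeal of this form is that the hint is fire-and-forget: the
"recovery" side keeps working while the kernel schedules the I/O, with
no extra process to manage.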
From: "Heikki Linnakangas"
Bruce Momjian wrote:
> Gregory Stark wrote:
> > I don't think there's any consensus for the approach you describe
> > above. If anything, it seemed the least objectionable form was
> > something involving posix_fadvise or libaio.

The least objectionable form is probably Aidan Van Dyk's suggestion of
doing it in restore_command, completely outside the backend. It would
need deep knowledge of the WAL format, but other than that, you could
implement it as a pgfoundry project, side-stepping any objections we on
pgsql-hackers might have :-).

> > Tom did wave us off from Simon's approach on the basis of it being
> > hard to test, and Heikki seemed to be agreeing on the basis that it
> > would be better to reuse infrastructure useful in other cases as
> > well. So I guess that's some kind of consensus... of two.
>
> Yep, that was my analysis too.

A "separate process that can be told which pages the recovery process
will need in the future" doesn't imply posix_fadvise or libaio or
anything like that to me. It sounds like a background reader process,
but only one of those is hardly enough. And it doesn't mention the point
Tom raised that we shouldn't invent anything specific to WAL replay.

How about:

* Speed WAL recovery by allowing more than one page to be prefetched

  This should be done utilizing the same infrastructure used for
  prefetching in general, to avoid introducing complex, error-prone code
  into the WAL replay code path, which doesn't get much testing compared
  to the rest of the system.

There's already this TODO item:

> Experiment with multi-threaded backend better resource utilization
>
> This would allow a single query to make use of multiple CPU's or
> multiple I/O channels simultaneously. One idea is to create a
> background reader that can pre-fetch sequential and index scan pages
> needed by other backends. This could be expanded to allow concurrent
> reads from multiple devices in a partitioned table.

This should probably be split into two. Using multiple CPUs for
satisfying one query is quite different from implementing some kind of a
pre-fetching mechanism using posix_fadvise(), libaio, or background
reader processes.

--
Heikki Linnakangas
EnterpriseDB  http://www.enterprisedb.com
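[Editorial note: the "background reader" alternative mentioned above is
a helper that is told which blocks will be needed and reads them ahead
of time to warm the OS cache. A minimal sketch in Python, using a thread
in place of a separate process; the queue protocol, names, and demo file
are invented for illustration and are not PostgreSQL code.]

```python
import os
import queue
import tempfile
import threading

BLOCK_SIZE = 8192  # PostgreSQL's default block size


def background_reader(path, requests):
    """Read each requested block so the OS caches it before the
    main (e.g. recovery) process actually needs it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            blockno = requests.get()
            if blockno is None:  # sentinel: no more requests
                return
            os.pread(fd, BLOCK_SIZE, blockno * BLOCK_SIZE)
    finally:
        os.close(fd)


# Demonstration: the "recovery" side announces upcoming blocks.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(BLOCK_SIZE * 4))  # a four-block file of zeros
    path = f.name

requests = queue.Queue()
reader = threading.Thread(target=background_reader,
                          args=(path, requests))
reader.start()
for blockno in (2, 0, 3):  # blocks the replay loop will need soon
    requests.put(blockno)
requests.put(None)
reader.join()
os.unlink(path)
```

One such reader serializes its reads, which is why a single background
reader is "hardly enough": keeping several spindles busy needs several
readers (or kernel-level async I/O as in the posix_fadvise approach).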
On Tue, 2008-03-18 at 16:56 -0400, Bruce Momjian wrote:
> Gregory Stark wrote:
> > Tom did wave us off from Simon's approach on the basis of it being
> > hard to test, and Heikki seemed to be agreeing on the basis that it
> > would be better to reuse infrastructure useful in other cases as
> > well. So I guess that's some kind of consensus... of two.
>
> Yep, that was my analysis too.

It may surprise you, but I didn't read Tom's words as being against
"Simon's approach". Personally I read them as a generic warning, which I
agreed with. Maybe Tom can straighten that out.

If you know what "my approach" is, that's good, 'cos I'm not sure I do
yet. I said at FOSDEM, two weeks after this thread, that "Multiple slave
processes handle database blocks, based upon hash distribution of
blocks".

We're all agreed that we need to parallelise the work. Somehow. Is it
just the I/O we need to parallelise? Are we sure about that? Nobody has
shown any convincing evidence in favour of, or against, various flavours
of async I/O. In the absence of that, I think the simplest way is normal
I/O with many processes executing it. Maybe I misread the Developer's
FAQ describing why we don't already use async I/O or other "wizz-bang"
features?

I'm optimistic about that actually, but let's see the facts before we
take that decision. So AFAICS I have advocated the less bold approach.

Nobody has even mentioned yet the bgwriter, whether it should be active
during recovery, and its possible role in smoothing restartpointing.

In any case, all I've said here is that we shouldn't put a specific
approach into the TODO. Just state the problem.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
From: Bruce Momjian
Heikki Linnakangas wrote:
> How about:
>
> * Speed WAL recovery by allowing more than one page to be prefetched
>
>   This should be done utilizing the same infrastructure used for
>   prefetching in general, to avoid introducing complex, error-prone
>   code into the WAL replay code path, which doesn't get much testing
>   compared to the rest of the system.

Updated to:

* Speed WAL recovery by allowing more than one page to be prefetched

  This should be done utilizing the same infrastructure used for
  prefetching in general to avoid introducing complex error-prone code
  in WAL replay.

> There's already this TODO item:
>
> > Experiment with multi-threaded backend better resource utilization
> >
> > This would allow a single query to make use of multiple CPU's or
> > multiple I/O channels simultaneously. One idea is to create a
> > background reader that can pre-fetch sequential and index scan pages
> > needed by other backends. This could be expanded to allow concurrent
> > reads from multiple devices in a partitioned table.
>
> This should probably be split into two. Using multiple CPUs for
> satisfying one query is quite different from implementing some kind of
> a pre-fetching mechanism using posix_fadvise(), libaio, or background
> reader processes.

Good idea, items split:

* Experiment with multi-threaded backend better I/O utilization

  This would allow a single query to make use of multiple I/O channels
  simultaneously. One idea is to create a background reader that can
  pre-fetch sequential and index scan pages needed by other backends.
  This could be expanded to allow concurrent reads from multiple devices
  in a partitioned table.

* Experiment with multi-threaded backend better CPU utilization

  This would allow several CPUs to be used for a single query, such as
  for sorting or query execution.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
From: Alvaro Herrera
Simon Riggs wrote:
> In any case, all I've said here is that we shouldn't put a specific
> approach into the TODO. Just state the problem.

Agreed. Perhaps the ideas stated so far can be listed as possible
alternatives for solving the problem, but they shouldn't be the main
body of the TODO item.

--
Alvaro Herrera  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
From: Bruce Momjian
Alvaro Herrera wrote:
> Simon Riggs wrote:
> > In any case, all I've said here is that we shouldn't put a specific
> > approach into the TODO. Just state the problem.
>
> Agreed. Perhaps the ideas stated so far can be listed as possible
> alternatives for solving the problem, but they shouldn't be the main
> body of the TODO item.

We have specific solutions all over the TODO list. The minimally agreed
approach is multi-page fetching. If we can get general agreement that we
should do recovery in multiple streams, it will also be added to the
TODO list.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB  http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Simon Riggs <simon@2ndquadrant.com> writes:
> It may surprise you but I didn't read Tom's words as being against
> "Simon's approach". Personally I read them as a generic warning, which
> I agreed with. Maybe Tom can straighten that out.

AFAIR, I just said that I'd find it hard to trust any complex mechanism
that was being used *only* during WAL replay. If we want to invent a
pre-reader process, or aio, or whatever, we should try to get it to be
exercised during normal use as well. We're far more likely to find the
bugs in it that way.

		regards, tom lane