Thread: Multi CPU Queries - Feedback and/or suggestions wanted!
Hi All,

we would like to start some work on improving the performance of PostgreSQL in a multi-CPU environment. Dano Vojtek is a student at the Faculty of Mathematics and Physics of Charles University in Prague (http://www.mff.cuni.cz) and is going to cover this topic in his master's thesis. He is going to investigate the methods, write down the possibilities, and then implement some of them for PostgreSQL.

We want to come out with a serious proposal for this work after collecting the feedback/opinions and doing a more serious investigation.

Topics that seem to be of interest, most of which were already discussed at the developers meeting in Ottawa, are
1.) parallel sorts
2.) parallel query execution
3.) asynchronous I/O
4.) parallel COPY
5.) parallel pg_dump
6.) using threads for parallel processing

Scaling with an increasing number of CPUs in 1.) and 2.) will hit the I/O bottleneck at some point, and the benefit gained there should be nearly the same as for 3.) - the OS or disk can do a better job scheduling multiple reads from the disk for the same query at the same time.

1.) More merges could be executed on different CPUs. However, one N-way merge on one CPU is probably better than two N/2-way merges on 2 CPUs sharing the work_mem limit between them. This is specific and separate from 2.) or 3.), and if something is implemented here it could probably share just the parallel infrastructure code.

========

2.) Different subtrees (or nodes) of the plan could be executed in parallel on different CPUs, and the results of these subtrees could be requested either synchronously or asynchronously.

========

3.) The simplest possible way is to change the scan nodes so that they send out asynchronous I/O requests for the next blocks before they run out of tuples in the block they are currently going through. The more advanced way would arise just by implementing 2.), which would then lead to different scan nodes being executed on different CPUs at the same time.

========

4.) and 5.) We do not want to focus on these, since there are ongoing projects already.

========

6.) Currently, threads are not used in PostgreSQL (except in some cases on Windows). Generally, using them would bring some problems:

a) different thread implementations on different OSes
b) a crash of the whole process if a problem happens in one thread. Backends are isolated, and a problem in one backend leads to a graceful shutdown of the other backends.
c) synchronization problems

* a) seems to be mostly an implementation concern. Is there any problem with running more threads on any supported OS? Like some scheduling issue where all the threads of the same process end up scheduled on the same CPU? Or something similar?

* b) is fine when using more threads for processing the same query in the same backend - if one crashes, the others could do the graceful shutdown.

* c) does not have to be solved in general, because the work of all the threads will be synchronized and we can predict fairly well which data are being accessed by which thread. Memory allocation has to be made thread safe without hurting performance (is a different memory context for different threads sufficient?). Other common code might need some changes as well. Possibly, the synchronization/critical-section exclusion could be done in the executor and only where needed.

* Using processes instead of threads makes other things more complex:
- sharing objects between processes might need much more coding
- more overhead during execution and synchronization

========

It seems to us that it makes sense to start working on 2) and 3), and we would like to think about using more threads for processing the same query within one backend.

We appreciate feedback, comments and/or suggestions.

Cheers

Julo
Topics that seem to be of interest and most of them were already
discussed at developers meeting in Ottawa are
1.) parallel sorts
2.) parallel query execution
3.) asynchronous I/O
4.) parallel COPY
5.) parallel pg_dump
6.) using threads for parallel processing
2.)
Different subtrees (or nodes) of the plan could be executed in parallel
on different CPUs and the results of this subtrees could be requested
either synchronously or asynchronously.
I don't see why multiple CPUs can't work on the same node of a plan. For instance, consider a node involving a scan with an expensive condition, like UTF-8 string length. If you have four CPUs you can bring to bear, each CPU could take every fourth page, computing the expensive condition for each tuple in that page. The results of the scan can be retired asynchronously to the next node above.
-jwb
There is a problem trying to make Postgres do these things in parallel.
The backend code isn’t thread-safe, so doing a multi-thread implementation requires quite a bit of work.
Using multiple processes has its own problems: The whole way locking works equates one process with one transaction (The proc table is one entry per process). Processes would conflict on locks, deadlocking themselves, as well as many other problems.
It’s all a good idea, but the work is probably far more than you expect.
Async I/O might be easier if you used pthreads, which are mostly portable, but not to all platforms. (Yes, they do work on Windows)
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Jeffrey Baker
Sent: 2008-10-20 22:25
To: Julius Stroffek
Cc: pgsql-hackers@postgresql.org; Dano Vojtek
Subject: Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
On Mon, Oct 20, 2008 at 12:05 PM, Julius Stroffek <Julius.Stroffek@sun.com> wrote:
[...]
Hi Jeffrey,

thank you for the suggestion. Yes, they potentially can; we'll consider this.

Julo

Jeffrey Baker wrote:
> I don't see why multiple CPUs can't work on the same node of a plan. [...]
I can confirm that bringing the Postgres code to a multi-threaded implementation requires quite a bit of ground work. I have been working for a long while with a Postgres 7.* fork that uses pthreads rather than processes. The effort to make all the subsystems thread safe took some time and touched almost every section of the codebase.

I recently spent some time trying to optimize for Chip Multi-Threading systems but focused more on total throughput rather than single query performance. The biggest wins came from changing some coarse-grained locks in the page buffering system to a finer-grained implementation.

I also tried to improve single query performance by splitting index and sequential scans into two threads, one to fault in pages and check tuple visibility and the other for everything else. My success was limited, and it was hard for me to work the proper costing into the query optimizer so that it fired at the right times. One place where multiple threads really helped was index building.

My code is poorly commented and the build system is a mess (I am only building 64-bit SPARC for embedding into another app). However, I am using it in production and the source is available if it's of any help.

http://weaver2.dev.java.net

Myron Scott

On Oct 20, 2008, at 11:28 PM, Chuck McDevitt wrote:
> There is a problem trying to make Postgres do these things in parallel. [...]
On Mon, 2008-10-20 at 21:05 +0200, Julius Stroffek wrote:
> He is going to do some investigation in the methods and
> write down the possibilities and then he is going to implement
> something from that for PostgreSQL.

When will this work be complete? We are days away from completing main work on 8.4, so you won't get much discussion on this for a few months yet. Will it be complete in time for 8.5? Or much earlier even?

Julius, you don't mention what your role is in this. In what sense is Dano's master's thesis a "we" thing?

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Julius Stroffek wrote:
> we would like to start some work on improving the performance of
> PostgreSQL in a multi-CPU environment. [...]
> We want to come out with a serious proposal for this work after
> collecting the feedback/opinions and doing the more serious investigation.

Exciting stuff, and clearly a direction we need to explore.

> Topics that seem to be of interest and most of them were already
> discussed at developers meeting in Ottawa are
> 1.) parallel sorts
> 2.) parallel query execution
> 3.) asynchronous I/O

I think the current plan is to use posix_fadvise() to allow parallel I/O, rather than async I/O, because posix_fadvise() will require fewer code changes.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Thu, 23 Oct 2008, Bruce Momjian wrote:
> I think the current plan is to use posix_fadvise() to allow parallel I/O,
> rather than async I/O because posix_fadvise() will require fewer code
> changes.

These are not necessarily mutually exclusive designs. fadvise works fine on Linux, but as far as I know only async I/O works on Solaris. Linux also has an async I/O library, and it's not clear to me yet whether that might work even better than the fadvise approach.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Thu, Oct 23, 2008 at 4:53 PM, Greg Smith <gsmith@gregsmith.com> wrote:
> These are not necessarily mutually exclusive designs. fadvise works fine on
> Linux, but as far as I know only async I/O works on Solaris. Linux also has
> an async I/O library, and it's not clear to me yet whether that might work
> even better than the fadvise approach.

fadvise is a kludge. While it will help, it still makes us completely reliant on the OS. For performance reasons, we should be supporting a multi-block read directly into shared buffers. IIRC, we currently have support for rings in the buffer pool, which we could read directly into. Though an LRU-based buffer manager design would be better in this case.

--
Jonah H. Harris, Senior DBA
myYearbook.com
Jonah H. Harris wrote:
> fadvise is a kludge. While it will help, it still makes us completely
> reliant on the OS. [...]

True, it is a kludge, but if it gives us 95% of the benefit with 10% of the code, it is a win.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
I couldn't get async I/O to work on Linux. That is, it "worked" but performed the same as reading one block at a time. On Solaris the situation is reversed.

In what way is fadvise a kludge?

greg

On 24 Oct 2008, at 01:44 AM, Bruce Momjian <bruce@momjian.us> wrote:
> True, it is a kludge but if it gives us 95% of the benefit with 10% of
> the code, it is a win. [...]
Greg Stark wrote:
> I couldn't get async I/O to work on Linux. That is, it "worked" but
> performed the same as reading one block at a time. On Solaris the
> situation is reversed.
>
> In what way is fadvise a kludge?

I think he is saying AIO gives us more flexibility, but I am unsure we need it.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Thu, Oct 23, 2008 at 8:44 PM, Bruce Momjian <bruce@momjian.us> wrote:
> True, it is a kludge but if it gives us 95% of the benefit with 10% of
> the code, it is a win.

I'd say, optimistically, maybe 30-45% of the benefit over a proper multi-block read using O_DIRECT.

--
Jonah H. Harris, Senior DBA
myYearbook.com
On Thu, Oct 23, 2008 at 10:36 PM, Greg Stark <greg.stark@enterprisedb.com> wrote:
> I couldn't get async I/O to work on Linux. That is, it "worked" but performed
> the same as reading one block at a time. On Solaris the situation is
> reversed.

Hmm, then obviously you did something wrong, because my tests showed it quite well. Pull the source to iozone or fio.

> In what way is fadvise a kludge?

Non-portable, requires more user-to-system CPU, ... need I go on?

--
Jonah H. Harris, Senior DBA
myYearbook.com
"Jonah H. Harris" <jonah.harris@gmail.com> writes:
> On Thu, Oct 23, 2008 at 10:36 PM, Greg Stark wrote:
>> In what way is fadvise a kludge?
> non-portable, requires more user-to-system CPU, ... need I go on?

I'd be interested to know which of these proposals you claim *is* portable. The single biggest reason to reject 'em all is that they aren't.

regards, tom lane
On Fri, Oct 24, 2008 at 12:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> non-portable, requires more user-to-system CPU, ... need I go on?
>
> I'd be interested to know which of these proposals you claim *is*
> portable. The single biggest reason to reject 'em all is that
> they aren't.

Yes, that was bad wording on my part. What I meant to say was that it's unpredictable. Different OSes and filesystems handle fadvise differently (or not at all), which makes any claim of a performance gain configuration-dependent.

My preferred method, using O_DIRECT and fetching directly into shared buffers, is not without its own issues and challenges. However, by abstracting the multi-block read interface, we could use more optimal calls depending on the OS.

Having done a bit of research and testing in this area (AIO and buffer management), I don't see any easy solution. fadvise will work on some systems and will likely give some gain on them, but won't work for everyone. The alternative is to abstract prefetching and allow platform-specific code, which we rarely do. While we could build an abstract prefetch interface and simply use fadvise for it now (rather than OS-specific code), I don't see an easy win in any case.

--
Jonah H. Harris, Senior DBA
myYearbook.com
On 24 Oct 2008, at 04:31 AM, "Jonah H. Harris" <jonah.harris@gmail.com> wrote:
> Hmm, then obviously you did something wrong, because my tests showed
> it quite well. Pull the source to iozone or fio.

I posted the source, feel free to point out what I did wrong. It did work on Solaris with and without O_DIRECT, so I didn't think it was a bug in my code.

>> In what way is fadvise a kludge?
>
> non-portable, requires more user-to-system CPU, ... need I go on?

Well, it's just as portable; they're both specified by POSIX. Actually, async I/O is in the real-time extensions, so one could argue it's less portable. Also, before posix_fadvise there was plain old fadvise, so it's portable to older platforms too, whereas async I/O isn't.

posix_fadvise does require two syscalls and two trips to the buffer manager. But that doesn't really make it a kludge if the resulting code is cleaner than the async I/O code would be. To use async I/O we would have to pin all the buffers we're reading, which would be quite a lot of code changes.

I did ask for feedback on precisely this point of whether two trips to the buffer manager was a problem. It would have been nice to get the feedback 6 months ago when I posted it, instead of now, two weeks before feature freeze.
Based on what? I did test this and posted the data. The results I posted showed that posix_fadvise on Linux performed nearly as well as async I/O on Solaris on identical hardware. More importantly, it scaled with the number of drives. A 15-drive array gets about 15x the performance of a 1-drive array if enough read-ahead is done. Plus an extra boost if the input wasn't already sorted, which presumably reflects the better I/O ordering.

--
greg

On 24 Oct 2008, at 04:29 AM, "Jonah H. Harris" <jonah.harris@gmail.com> wrote:
> I'd say, optimistically, maybe 30-45% of the benefit over a proper
> multi-block read using O_DIRECT. [...]
We did discuss this in Ottawa, and I believe your comment was "the hint is in the name", referring to posix_fadvise. In any case, both AIO and posix_fadvise are specified by POSIX, so I don't see either as a problem on that front.

I don't think we can ignore any longer that we effectively can't use RAID arrays with Postgres. If you have many concurrent queries or restrict yourself to sequential scans you're OK, but if you're doing data warehousing you're going to be pretty disappointed to see your shiny RAID array performing like a single drive.

greg

On 24 Oct 2008, at 05:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd be interested to know which of these proposals you claim *is*
> portable. The single biggest reason to reject 'em all is that
> they aren't. [...]
Jonah H. Harris wrote:
> fadvise is a kludge.

I don't think it's a kludge at all. posix_fadvise() is a pretty nice and clean interface to hint the kernel what pages you're going to access in the near future. I can't immediately come up with a cleaner interface to do that. Compared to async I/O, it's a helluva lot simpler to add a few posix_fadvise() calls to an application than to switch to a completely different paradigm. And while posix_fadvise() is just a hint, allowing the OS to prioritize accordingly, all async I/O requests look the same.

> While it will help, it still makes us completely
> reliant on the OS.

That's not a bad thing in my opinion. The OS knows the I/O hardware, disk layout, utilization, and so forth, and is in a much better position to do I/O scheduling than a user process. The only advantage a user process has is that it knows better what pages it's going to need, and posix_fadvise() is a good interface to let the user process tell the kernel that.

> IIRC, we currently have support for rings in the buffer pool, which we could read
> directly into.

The rings won't help you a bit. They are just a different way to choose victim buffers.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Jonah H. Harris wrote:
> I'd say, optimistically, maybe 30-45% of the benefit over a proper
> multi-block read using O_DIRECT.

Let's try to focus. We're not talking about using O_DIRECT, we're talking about using asynchronous I/O or posix_fadvise(). And without more details on what you mean by benefit, and under what circumstances, numbers like that are just unmeasurable handwaving.

In terms of getting the RAID array busy, in Greg's tests posix_fadvise() on Linux worked just as well as async I/O works on Solaris. So it doesn't seem like there's any inherent performance advantage in the async I/O interface over posix_fadvise() + read(). There are differences between different OS implementations of the interfaces, but we're developing software for the future, and for a wide range of platforms, and I'm sure operating systems will develop as well. The decision should not be made on what is the fastest interface on a given operating system in 2008.

Async I/O might have a small potential edge in CPU usage, because fewer system calls are needed. However, let me remind you all that we're talking about how to utilize a RAID array to do physical, random I/O as fast as possible. IOW, the bottleneck is I/O, by definition. The CPU efficiency of the kernel interface used to initiate the I/O is insignificant until we reach a large enough random read throughput to saturate the CPU, and even then there are probably more significant CPU savings to be made elsewhere.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, 2008-10-24 at 00:52 -0400, Jonah H. Harris wrote: > While we could build an > abstract prefetch interface and simply use fadvise for it now (rather > than OS-specific code), I don't see an easy win in any case. When building an abstract interface, always use at least two implementations (I guess that would be fadvise on linux and AIO on solaris in this case). You are much more likely to get the interface right this way. -------------- Hannu
On Fri, Oct 24, 2008 at 7:59 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > On Fri, 2008-10-24 at 00:52 -0400, Jonah H. Harris wrote: >> While we could build an >> abstract prefetch interface and simply use fadvise for it now (rather >> than OS-specific code), I don't see an easy win in any case. > > When building an abstract interface, always use at least two > implementations (I guess that would be fadvise on linux and AIO on > solaris in this case). You are much more likely to get the interface > right this way. I agree, I just wasn't sure as to whether Greg's patch supported both methods. -- Jonah H. Harris, Senior DBA myYearbook.com
Jonah H. Harris wrote:
> I agree, I just wasn't sure as to whether Greg's patch supported both methods.

It does not, and probably will not for the near future; we can only hope Solaris supports posix_fadvise() at some point.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
I thought about how to support both and ran into problems that would make the resulting solutions quite complex.

In the libaio view of the world you initiate I/O and either get a callback or call another syscall to test whether it's complete. Either approach has problems for Postgres. If the process that initiated the I/O is in the middle of a long query, it might take a long time, or even never, get back to complete the I/O. The callbacks use threads...

And polling for completion has the problem that another process could be waiting on the I/O and can't issue a read as long as the first process has the buffer locked and I/O in progress. I think AIO makes a lot more sense if you're using threads, so you can start a thread to wait for the I/O to complete.

Actually, I think it might be doable with a lot of work, but I'm worried that it would be a lot of extra complexity even when you're not using it. The current patch doesn't change anything when you're not using it and is actually quite simple.

greg

On 24 Oct 2008, at 03:18 PM, Bruce Momjian <bruce@momjian.us> wrote:
> It does not, and probably will not for the near future; we can only
> hope Solaris supports posix_fadvise() at some point. [...]
Also keep in mind that Solaris is open source these days. If someone wants it, they could always go ahead and add the feature...

greg

On 24 Oct 2008, at 03:18 PM, Bruce Momjian <bruce@momjian.us> wrote:
> It does not, and probably will not for the near future; we can only
> hope Solaris supports posix_fadvise() at some point. [...]
* Greg Stark <greg.stark@enterprisedb.com> [081024 10:48]:
> I thought about how to support both and ran into problems that would
> make the resulting solutions quite complex.
>
> In the libaio view of the world you initiate I/O and either get a
> callback or call another syscall to test whether it's complete. Either
> approach has problems for Postgres. If the process that initiated the
> I/O is in the middle of a long query it might take a long time, or
> never, get back to complete the I/O. The callbacks use threads...
>
> And polling for completion has the problem that another process could
> be waiting on the I/O and can't issue a read as long as the first
> process has the buffer locked and I/O in progress. I think AIO makes a
> lot more sense if you're using threads, so you can start a thread to
> wait for the I/O to complete.
>
> Actually I think it might be doable with a lot of work, but I'm worried
> that it would be a lot of extra complexity even when you're not using
> it. The current patch doesn't change anything when you're not using it
> and is actually quite simple.

In the Solaris async I/O, are you bound to direct I/O? Does the OS page cache still get primed by async reads? If so, how about starting async I/O into a "throwaway" local buffer, treating async I/O the same way as fadvise: a "pre-load the OS page cache so the real read is quick".

Sure, I understand it's not the "perfect model", but I don't see PostgreSQL being refactored enough to have a pure async model happening any time in the near future...

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
Hi Simon,

>> He is going to do some investigation in the methods and write down
>> the possibilities and then he is going to implement something from
>> that for PostgreSQL.
>
> When will this work be complete? We are days away from completing main
> work on 8.4, so you won't get much discussion on this for a few months
> yet. Will it be complete in time for 8.5? Or much earlier even?

The first guess is that the work will be done for 8.6. Dano is supposed to finish the work and defend his thesis in something a bit more than a year.

> Julius, you don't mention what your role is in this. In what sense is
> Dano's master's thesis a "we" thing?

I am Dano's mentor and we are in close contact with Zdenek as well. We would like the project to become a "we" thing as another reason to work on the project. It seems to be better to research some ideas at the beginning and discuss the stuff during development than to just individually write some piece of code which could be published afterwards. Especially when this area seems to be of interest to more people.

Cheers

Julo
>Hi Simon,
>
>He is going to do some investigation in the methods and write down the
>possibilities and then he is going to implement something from that for
>PostgreSQL.
>
>When will this work be complete? We are days away from completing main
>work on 8.4, so you won't get much discussion on this for a few months
>yet. Will it be complete in time for 8.5? Or much earlier even?
>
>The first guess is that the work will be done for 8.6. Dano is supposed
>to finish the work and defend his thesis in something a bit more than a
>year.
>
>Julius, you don't mention what your role is in this. In what sense is
>Dano's master's thesis a "we" thing?
>
>I am Dano's mentor and we are in close contact with Zdenek as well. We
>would like the project to become a "we" thing as another reason to work
>on the project. It seems to be better to research some ideas at the
>beginning and discuss the stuff during development than to just
>individually write some piece of code which could be published
>afterwards. Especially when this area seems to be of interest to more
>people.

Threads are where future performance is going to come from:

General purpose ->
http://www.setup32.com/hardware/cpuchipset/32core-processors-intel-reache.php

GPU ->
http://wwwx.cs.unc.edu/~lastra/Research/GPU_performance.html
http://www.cs.unc.edu/~geom/GPUSORT/results.html

Database engines that want to exploit the ultimate in performance will utilize multiple threads of execution. True, the same thing can be realized with multiple processes, but a process is more expensive than a thread.
Bruce Momjian wrote:
> Greg Stark wrote:
>> I couldn't get async I/O to work on Linux. That is, it "worked" but
>> performed the same as reading one block at a time. On Solaris the
>> situation is reversed.
>>
>> In what way is fadvise a kludge?
>
> I think he is saying AIO gives us more flexibility, but I am unsure we
> need it.

absolutely. posix_fadvise is easy to implement and i would assume that it takes away a lot of "guessing" on the OS internals side. the database usually knows that it is going to read a lot of data in a certain way, and it cannot be a bad idea to give the kernel a hint here. especially synchronized seq scans and so on are real winners here, as you stop confusing the kernel with XX concurrent readers on the same file. this can also be an issue with some controller firmwares and so on.

many thanks,

hans