Thread: PostgreSQL insights: does it use DMA?

PostgreSQL insights: does it use DMA?

From
Antonio Rodriges
Date:
Hello,

Does anyone know whether PostgreSQL uses DMA (Direct Memory Access) in
certain cases to improve networking IO performance?

I mean a "simple" query, one which doesn't require any CPU processing,
for example:
SELECT column_a FROM table_b WHERE date = '2001-10-05'

I need this to devise the best logic for the system with PostgreSQL as
a layer. Certainly I could study PostgreSQL sources or test it with a
simple application but I hope PostgreSQL experts are aware of this
feature.

Thank you.

--
Kind regards,
Antonio Rodriges

Re: PostgreSQL insights: does it use DMA?

From
Scott Marlowe
Date:
On Fri, Sep 9, 2011 at 11:55 AM, Antonio Rodriges <antonio.rrz@gmail.com> wrote:
> Hello,
>
> Does anyone know whether PostgreSQL uses DMA (Direct Memory Access) in
> certain cases to improve networking IO performance?
>
> I mean a "simple" query, one which doesn't require any CPU processing,
> for example:
> SELECT column_a FROM table_b WHERE date = '2001-10-05'
>
> I need this to devise the best logic for the system with PostgreSQL as
> a layer. Certainly I could study PostgreSQL sources or test it with a
> simple application but I hope PostgreSQL experts are aware of this
> feature.

That's all up to your hardware and OS, not PostgreSQL.

Re: PostgreSQL insights: does it use DMA?

From
Antonio Rodriges
Date:
Scott, regardless of operating system support for DMA, an application
may not benefit from it if it doesn't use appropriate system calls.

PostgreSQL is implemented mostly in C, so I do not know whether it
needs to use special system calls for this. It is true, for example,
for Java:
http://www.ibm.com/developerworks/library/j-zerocopy/

2011/9/9 Scott Marlowe <scott.marlowe@gmail.com>:
> That's all up to your hardware and OS, not postgresql
>



--
Kind regards,
Antonio Rodriges

Re: PostgreSQL insights: does it use DMA?

From
pasman pasmański
Date:
Look at the Developer FAQ.


--
------------
pasman

Re: PostgreSQL insights: does it use DMA?

From
"Kevin Grittner"
Date:
Antonio Rodriges <antonio.rrz@gmail.com> wrote:

> PostgreSQL is implemented mostly in C, so I do not know whether it
> needs to use special procedure calls, however this is true, for
> example, for Java
> http://www.ibm.com/developerworks/library/j-zerocopy/

After scanning that I'm inclined to think that the only place that
zerocopy techniques might make sense is in the context of BLOBs.
That's not currently happening, and not on anyone's radar as far as
I know.

-Kevin

Re: PostgreSQL insights: does it use DMA?

From
Craig Ringer
Date:
On 10/09/2011 1:55 AM, Antonio Rodriges wrote:
> Hello,
>
> Does anyone know whether PostgreSQL uses DMA (Direct Memory Access) in
> certain cases to improve networking IO performance?

From what you described in your message it sounds like what you really
want is not DMA, but use of something like the sendfile() system call
(http://www.freebsd.org/cgi/man.cgi?query=sendfile&sektion=2), where
the kernel is told to send data over the network with no further
interaction with the application.
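
A minimal sketch of what a sendfile()-style transfer looks like from an
application, assuming Python's standard socket.sendfile() wrapper (which
uses os.sendfile() where the platform offers it and otherwise falls back
to an ordinary read/send loop). The names PAYLOAD, send_file_zero_copy
and demo are hypothetical, and the socket pair stands in for a real
client connection. Note that the file's bytes go out exactly as stored:

```python
import os
import socket
import tempfile

# Hypothetical file contents used only for this demonstration.
PAYLOAD = b"raw file bytes, sent untransformed\n" * 100

def send_file_zero_copy(path, sock):
    """Send a whole file down a connected socket; returns bytes sent."""
    with open(path, "rb") as f:
        # The application never touches the data; it only names the
        # file and the socket and lets the OS move the bytes.
        return sock.sendfile(f)

def demo():
    """Write PAYLOAD to a temp file, send it, and read it back."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(PAYLOAD)
        path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        sent = send_file_zero_copy(path, sender)
        sender.close()  # signal end-of-stream to the receiver
        received = b""
        while chunk := receiver.recv(65536):
            received += chunk
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)
    return sent, received

sent, received = demo()
```

This only works because the receiver wants the file's bytes verbatim.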

PostgreSQL does not, and cannot, do this for regular query results. The
sample query you posted most certainly DOES require CPU processing
during its execution: Any index being used must be traversed, which
requires logic. Date comparisons must be performed, possibly including
timezone conversions, and non-matching rows must be discarded. If a
sequential scan is being done, a bitmap showing holes in the file may be
consulted to decide where to scan and where to skip, and checks of row
versions to determine visibility must be done. Once the data has been
selected, it must be formatted into proper PostgreSQL v3 network
protocol messages, which involves function calls to data type output
functions among many other things. Only then does the data get written
to a network socket.
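
To make the amount of per-row work concrete, here is a heavily
simplified sketch. It is not PostgreSQL's code: the rows, the "visible"
flag (a stand-in for the row-version visibility check) and the function
names are all assumptions made for illustration. Only the DataRow
layout (a 'D' type byte, an Int32 length, an Int16 column count, then a
length-prefixed value per column) follows the v3 protocol:

```python
import struct
from datetime import date

def data_row(values):
    """Build one v3-protocol DataRow ('D') message from column values."""
    body = struct.pack("!h", len(values))        # number of columns
    for value in values:
        if value is None:
            body += struct.pack("!i", -1)        # NULL: length -1, no bytes
        else:
            # str() is a stand-in for a data type output function.
            text = str(value).encode("utf-8")
            body += struct.pack("!i", len(text)) + text
    # The length field counts itself (4 bytes) but not the type byte.
    return b"D" + struct.pack("!i", len(body) + 4) + body

def simple_select(rows, wanted):
    """Toy version of: SELECT column_a FROM table_b WHERE date = wanted."""
    messages = []
    for row in rows:
        if not row["visible"]:      # stand-in for the visibility check
            continue
        if row["date"] != wanted:   # date comparison; discard non-matches
            continue
        messages.append(data_row([row["column_a"]]))
    return b"".join(messages)

rows = [
    {"column_a": "abc", "date": date(2001, 10, 5), "visible": True},
    {"column_a": "old", "date": date(2001, 10, 5), "visible": False},
    {"column_a": "xyz", "date": date(2002, 1, 1), "visible": True},
]
wire = simple_select(rows, date(2001, 10, 5))
```

Even in this toy form, the bytes that reach the socket are built row by
row and do not exist anywhere on disk in that shape.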

Needless to say, it's not like Pg is just picking a file to open and
doing a loop where it reads from the file and writes to a socket.

That said, PostgreSQL benefits from the DMA the operating system does
when handling system calls. For example, if your network interface
supports DMA buffer access then PostgreSQL will benefit from that.
Similarly, Pg benefits from the kernel-to-hardware DMA for disk I/O etc.
Beyond that I doubt there's much. PostgreSQL's use of shared_buffers for
read data means data will get copied on read, and writes go through the
OS's buffer cache, so there's unlikely to be direct DMA between
PostgreSQL buffers and the disk hardware for example.


Theoretically PostgreSQL could use something like sendfile() for sending
large object (blob) data and bytea data, but to do so it'd have to
change how that data is stored. Currently blob data is stored in (often
compressed) chunks in the pg_largeobject table; each chunk holds up to
2 kB by default (a quarter of the 8 kB block size). This data has to be
assembled and possibly decompressed to be transmitted. Similar things
apply for bytea fields of tables. In addition, the data is usually sent
over the text protocol, which means it has to be encoded to hex or (for
older versions) octal escapes. That encoding is incompatible with the
use of an API like sendfile().
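
As an illustration of why the text encoding rules out sending stored
bytes verbatim, here is a small sketch of the two bytea text output
formats. The function names are hypothetical; the rules follow the
documented "hex" and "escape" formats:

```python
def bytea_hex(data):
    """'hex' output format: a \\x prefix, then two hex digits per byte."""
    return "\\x" + data.hex()

def bytea_escape(data):
    """'escape' output format used by older versions."""
    out = []
    for byte in data:
        if byte == 0x5C:              # backslash is doubled
            out.append("\\\\")
        elif 32 <= byte <= 126:       # printable ASCII is sent as-is
            out.append(chr(byte))
        else:                         # everything else: 3-digit octal escape
            out.append("\\%03o" % byte)
    return "".join(out)

stored = b"\x00\xffA"
as_hex = bytea_hex(stored)
as_escape = bytea_escape(stored)
```

Either way the bytes on the wire differ from the bytes in storage, and
the hex form is roughly twice the size of the original data.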

So, in practice, PostgreSQL can _not_ use the kinds of direct
kernel-level disk-to-network sending you seem to be referring to. That
sort of thing is mostly designed for file servers, web servers, etc.,
where at some point in the process they end up dumping a disk file down
a network socket without transforming the data. Even they don't benefit
from it as much these days because of the wider use of encryption and
compression.

--
Craig Ringer

Re: PostgreSQL insights: does it use DMA?

From
Antonio Rodriges
Date:
Thank you very much, Craig. That's a really insightful and exhaustive
explanation, just what I was hoping for!



--
Kind regards,
Antonio Rodriges