Re: Async IO HTTP server frontend for PostgreSQL - Mailing list pgsql-general

From Dmitriy Igrishin
Subject Re: Async IO HTTP server frontend for PostgreSQL
Date
Msg-id CAAfz9KP+G4KR6GQp2WTwj1nQGZk8gZke1hLFm9Pp_LxJ6=1gYg@mail.gmail.com
Whole thread Raw
In response to Re: Async IO HTTP server frontend for PostgreSQL  (Allan Kamau <kamauallan@gmail.com>)
List pgsql-general
Hello, Allan

2014-09-11 0:29 GMT+04:00 Allan Kamau <kamauallan@gmail.com>:

Dear Dmitriy,

To add on to David's suggestions, Data caching is a difficult task to undertake. Consider an example where your data may not all fit into memory, when you cache these data outside PostgreSQL you would need to look into memory management as well as issues around concurrent population of this cache as well as means to keep the data in the cache fresh in tune with any changes to the data. These are no trivial tasks and the database community has spent years constructing and improving algorithms to do this on behalf of the front end database application developer. Also each time a TCP connection is created, additional compute resources are consumed by the OS as well as the database management server software.

Memory - limited resource. Hence, it's possible to catch "out of memory" everywhere.
By caching in this case I mean flushing the data retrieved by HTTP server from the
database server on the disk during "socket read-ready event" dispatching if the
retrieved data cannot be send to the HTTP-client (socket not write-ready yet). When
the socket to the HTTP-client became "write-ready" the data will be streamed
from cache.
I believe, it's not hard to implement it. And note, there is no need to make such
cache shared. It's just a temporary files with live time of HTTP request dispatching.

A simpler way would be to use connection pooling where a thread of your running application "borrows" a connection from a pool of open connections, executes the SQL command then returns the connection immediately on completion of the SQL command. This will require few concurrent connections (depending of configuration) and let the database do the caching of the data for you. For effective database data caching may need to make adjustments of the PostgreSQL configuration file (postgresql.conf file) as well as the operating system resources configuration. This way the response time of your client application will degrade gracefully with the increase of concurrent client requests.

In practice, this approach works. How well? It works. But with thread/connection pools
your site can be blocked by attacker ("cool-hacker") by initiating many slow
concurrent connections.
Using threads a good only if you're doing a lot of CPU work, rather than
communication work.

For small number of concurrent connections, the speed advantage direct “streaming” solution may have over the traditional connection pooling solution may hardly be noticeable to end user. The easier way to increase response time is to look into PostgreSQL performance tuning as well as investing in faster hardware (mainly the the disk subsystem and more RAM).

Yes, I agree. Good and fast hardware is always good :-). But every hardware
will has limited resources anyway. And I believe, that for many tasks async
solution will help to leverage it to the max.

Regards,

Allan.




On Wed, Sep 10, 2014 at 8:25 PM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:
Hello, Steve

2014-09-10 21:08 GMT+04:00 Steve Atkins <steve@blighty.com>:

On Sep 10, 2014, at 12:16 AM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:

> Hello, David
>
> 2014-09-10 4:31 GMT+04:00 David Boreham <david_list@boreham.org>:
> Hi Dmitriy, are you able to say a little about what's driving your quest for async http-to-pg ?
> I'm curious as to the motivations, and whether they match up with some of my own reasons for wanting to use low-thread-count solutions.
> For many web projects I consider Postgres as a development platform. Thus,
> I prefer to keep the business logic (data integrity trigger functions and
> API functions) in the database. Because of nature of the Web, many concurrent
> clients can request a site and I want to serve maximum possible of them with
> minimal overhead. Also I want to avoid a complex solutions. So, I believe that
> with asynchronous solution it's possible to *stream* the data from the database
> to the maximum number of clients (which possible can request my site over a
> slow connection).

That's going to require you to have one database connection open for each
client. If the client is over a slow connection it'll keep the database connection
open far longer than is needed, (compared to the usual "pull data from the
database as fast as the disks will go, then spoonfeed it out to the slow client"
approach). Requiring a live database backend for every open client connection
doesn't seem like a good idea if you're supporting many slow concurrent clients.
Good point. Thus, some of caching on the HTTP server side should be implemented
then.


--
// Dmitriy.





--
// Dmitriy.

pgsql-general by date:

Previous
From: "Huang, Suya"
Date:
Subject: Re: Decreasing performance in table partitioning
Next
From: Rémi Cura
Date:
Subject: Re: PostgreSQL Portable