Thread: Restricting Postgres

Restricting Postgres

From
Martin Foster
Date:
Is there a way to restrict how much load a PostgreSQL server can take
before dropping queries in order to safeguard the server?    I was
looking at the login.conf(5) man page, and while it allows me to limit
by processor time, this does not seem to fit my specific needs.

Essentially, I am looking for a sort of functionality similar to what
Sendmail and Apache have.   Once the load of the system reaches a
certain defined limit the daemon drops tasks until such a time that it
can resume normal operation.

While not necessarily common on my servers, I have witnessed some fairly
high load averages which may have led to the machine dropping outright.
   Any help on this matter would be appreciated.
--
    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org

Re: Restricting Postgres

From
Andrew Sullivan
Date:
On Tue, Nov 02, 2004 at 11:52:12PM +0000, Martin Foster wrote:
> Is there a way to restrict how much load a PostgreSQL server can take
> before dropping queries in order to safeguard the server?    I was

Well, you could limit the number of concurrent connections, and set
the query timeout to a relatively low level.  What that ought to mean
is that, under heavy load, some queries will abort.
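
For example, a minimal sketch in postgresql.conf (the numbers are only
illustrative, and statement_timeout assumes 7.4 or later):

   max_connections = 100         # hard cap on concurrent backends
   statement_timeout = 30000     # abort statements running longer than 30s (in ms)

Anything past the connection cap is refused outright, and anything that
runs past the timeout is aborted instead of piling up.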

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
When my information changes, I alter my conclusions.  What do you do sir?
        --attr. John Maynard Keynes

Re: Restricting Postgres

From
Simon Riggs
Date:
On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
> Is there a way to restrict how much load a PostgreSQL server can take
> before dropping queries in order to safeguard the server?    I was
> looking at the login.conf (5) man page and while it allows me to limit
> by processor time this seems to not fit my specific needs.
>
> Essentially, I am looking for a sort of functionality similar to what
> Sendmail and Apache have.   Once the load of the system reaches a
> certain defined limit the daemon drops tasks until such a time that it
> can resume normal operation.

Sounds great... could you give more shape to the idea, so people can
comment on it?

What limit? Measured how? Normal operation is what?

Drop what? How to tell?

>
> While not necessarily common on my servers I have witnessed some fairly
> high load averages which may have led to the machine dropping outright.
>    Any help on this matter would be appreciated.

You can limit the number of connections overall?

--
Best Regards, Simon Riggs


Re: Restricting Postgres

From
Martin Foster
Date:
Simon Riggs wrote:
> On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
>
>>Is there a way to restrict how much load a PostgreSQL server can take
>>before dropping queries in order to safeguard the server?    I was
>>looking at the login.conf (5) man page and while it allows me to limit
>>by processor time this seems to not fit my specific needs.
>>
>>Essentially, I am looking for a sort of functionality similar to what
>>Sendmail and Apache have.   Once the load of the system reaches a
>>certain defined limit the daemon drops tasks until such a time that it
>>can resume normal operation.
>
>
> Sounds great... could you give more shape to the idea, so people can
> comment on it?
>
> What limit? Measured how? Normal operation is what?
>
> Drop what? How to tell?
>
>

Let's use the example of Apache: there is the Apache::LoadAvgLimit
mod_perl module, which allows one to limit based on the system load
averages.   Here is an example of the configuration one would find:

   <Location /perl>
     PerlInitHandler Apache::LoadAvgLimit
     PerlSetVar LoadAvgLimit_1 3.00
     PerlSetVar LoadAvgLimit_5 2.00
     PerlSetVar LoadAvgLimit_15 1.50
     PerlSetVar LoadAvgRetryAfter 120
   </Location>

The end state is simple: once the one-minute load average moves above
3.00, the web server will not process the CGI scripts or mod_perl
applications under that directory.  Instead it will return a 503 error
and save the system from being crushed by an ever-increasing load
average.

Only once the load average is below the defined limits will the server
process requests as normal.   This is not necessarily the nicest or
cleanest way of doing things, but it does allow the Apache web server to
prevent a collapse.

There are ways of restricting the size of files, the number of concurrent
processes and even the memory being used by a daemon.  This can be done
through ulimit or the login.conf file if your system supports it.
However, there is no way to restrict based on load averages, only by
processor time, which is ineffective for a perpetually running daemon
like PostgreSQL.

>>While not necessarily common on my servers I have witnessed some fairly
>>high load averages which may have led to the machine dropping outright.
>>   Any help on this matter would be appreciated.
>
>
> You can limit the number of connections overall?
>

Limiting concurrent connections is not always the solution to the
problem.   Problems can occur when there is a major spike in activity
that would be considered abnormal, due to outside conditions.

For example, using Apache::DBI or pgpool the DBMS may be required to
spawn a great many child processes in a short period of time.   This
in turn can cause a major spike in processor load, and if high-demand
queries keep arriving unchecked, the load can literally climb until the
server buckles.

I've seen this behavior before when restarting the web server during
heavy loads.    Apache goes from zero connections to a solid 120,
causing PostgreSQL to spawn that many children in a short order of time
just to keep up with the demand.

PostgreSQL takes a penalty when forking a new backend and accepting a
connection; this consumes resources at every level to accomplish.
  Meanwhile, clients on the web server are hitting the site at an
accelerated rate because of the slowed responses, leading to even more
demand being placed on both machines.

In most cases the processor will be taxed and the load average high
enough to cause a noticeable delay even when using a console; the system
will generally recover, slowly, or in rare cases crash outright.   In
such a circumstance, having the database server refuse queries when the
sanity of the system is at stake might come in handy.

Of course, I am not blaming PostgreSQL; there are probably some
instabilities in the AMD64 port of FreeBSD 5.2.1 for dual-processor
systems that lead to an increased chance of failure instead of recovery.
   However, if there were a way to prevent the process from reaching
those limits, it might avoid the problem altogether.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org

Re: Restricting Postgres

From
John A Meinel
Date:
Martin Foster wrote:
> Simon Riggs wrote:
>
>> On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
[...]

> I've seen this behavior before when restarting the web server during
> heavy loads.    Apache goes from zero connections to a solid 120,
> causing PostgreSQL to spawn that many children in a short order of time
> just to keep up with the demand.
>

But wouldn't limiting the number of concurrent connections do this at
the source?  If you tell it that "You can have at most 20 connections",
you would never have postgres spawn 120 children.
I'm not sure what apache does if it can't get a DB connection, but it
seems exactly like what you want.

Now, if you expected to have 50 clients that all like to just sit on
open connections, you could leave the number of concurrent connections high.

But if your only connections are from the webserver, where all of them are
designed to be short connections, then leave the max low.

The other possibility is having the webserver use connection pooling, so
it uses a few long lived connections. But even then, you could limit it
to something like 10-20, not 120.
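
For reference, plain Apache::DBI persistence already removes the
per-request connection setup cost, though it caches one connection per
Apache child rather than sharing a small pool.  A sketch of a mod_perl
startup.pl (the DSN and credentials are hypothetical):

   use Apache::DBI ();   # must be loaded before DBI and your own modules
   use DBI ();

   # Open one cached connection per child at fork time, so requests
   # reuse it instead of reconnecting every time.
   Apache::DBI->connect_on_init(
       'dbi:Pg:dbname=realms;host=dbhost',
       'www', 'secret',
       { RaiseError => 1, AutoCommit => 1 },
   );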

John
=:->



Re: Restricting Postgres

From
Martin Foster
Date:
John A Meinel wrote:

> Martin Foster wrote:
>
>> Simon Riggs wrote:
>>
>>> On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
>
> [...]
>
>> I've seen this behavior before when restarting the web server during
>> heavy loads.    Apache goes from zero connections to a solid 120,
>> causing PostgreSQL to spawn that many children in a short order of
>> time just to keep up with the demand.
>>
>
> But wouldn't limiting the number of concurrent connections do this at
> the source. If you tell it that "You can at most have 20 connections"
> you would never have postgres spawn 120 children.
> I'm not sure what apache does if it can't get a DB connection, but it
> seems exactly like what you want.
>
> Now, if you expected to have 50 clients that all like to just sit on
> open connections, you could leave the number of concurrent connections
> high.
>
> But if your only connect is from the webserver, where all of them are
> designed to be short connections, then leave the max low.
>
> The other possibility is having the webserver use connection pooling, so
> it uses a few long lived connections. But even then, you could limit it
> to something like 10-20, not 120.
>
> John
> =:->
>

I have a dual-processor system that can support over 150 concurrent
connections handling normal traffic and load.   Now suppose I set up
Apache to spawn all of its children instantly: what will happen is that
as this happens the PostgreSQL server will also receive 150 connection
attempts.

This will spawn 150 children in a short period of time, and as this takes
place clients can connect and start requesting information, never allowing
the machine to settle down to normal traffic.    That spike, once
initiated, can cripple the machine or even the webserver if a deadlocked
transaction is introduced.

On the webserver side, a slowdown in the database means that it
will just get that many more connection attempts queued up from the
clients.  As they keep clicking and hitting reload over and over to get
a page to load, that server starts to buckle, hitting unbelievably high
load averages.

When the above happened once, I lost the ability to type on a console
because of a 60+ load average (OpenBSD) on a single-processor system.
That is the reason Apache now drops a 503 Service Unavailable when loads
get too high.

It's that spike I worry about, and it can happen for whatever reason.  It
could just as easily be triggered by a massive concurrent request for
processing of an expensive query, done in DDoS fashion.   This may not
affect the webserver at all, at least not immediately, but the same
effect can come into play on the database server.

Limiting connections helps, but it's not a silver bullet, and it limits
your ability to support more connections because of that initial spike.
    The penalty for forking a new child is hardly unexpected; even
Apache will show the same effect when restarted at a high-traffic time.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org



Re: Restricting Postgres

From
"Matt Clark"
Date:
> I have a dual processor system that can support over 150 concurrent
> connections handling normal traffic and load.   Now suppose I setup
> Apache to spawn all of it's children instantly, what will
...
> This will spawn 150 children in a short order of time and as
> this takes

"Doctor, it hurts when I do this!"
"Well, don't do that then..."

Sorry, couldn't resist ;-)

Our Apache/PG driven website also needs to be able to deal with occasional
large peaks, so what we do is:

StartServers 15        # Don't create too many children initially
MinSpareServers 10    # Always have at least 10 spares lying around
MaxSpareServers 20    # But no more than 20
MaxClients 150        # Up to 150 - the default 256 is too much for our RAM


So on server restart 15 Apache children are created, then one new child
every second up to a maximum of 150.

Apache's 'ListenBacklog' is around 500 by default, so there's plenty of
scope for queuing inbound requests while we wait for sufficient children to
be spawned.

In addition we (as _every_ high load site should) run Squid as an
accelerator, which dramatically increases the number of client connections
that can be handled.  Across 2 webservers at peak times we've had 50,000
concurrently open http & https client connections to Squid, with 150 Apache
children doing the work that squid can't (i.e. all the dynamic stuff), and
PG (on a separate box of course) whipping through nearly 800 mixed selects,
inserts and updates per second - and then had to restart Apache on one of
the servers for a config change...  Not a problem :-)

One little tip - if you run squid on the same machine as apache, and use a
dual-proc box, then because squid is single-threaded it will _never_ take
more than half the CPU - nicely self balancing in a way.

M


Re: Restricting Postgres

From
Simon Riggs
Date:
On Wed, 2004-11-03 at 21:25, Martin Foster wrote:
> Simon Riggs wrote:
> > On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
> >
> >>Is there a way to restrict how much load a PostgreSQL server can take
> >>before dropping queries in order to safeguard the server?    I was
> >>looking at the login.conf (5) man page and while it allows me to limit
> >>by processor time this seems to not fit my specific needs.
> >>
> >>Essentially, I am looking for a sort of functionality similar to what
> >>Sendmail and Apache have.   Once the load of the system reaches a
> >>certain defined limit the daemon drops tasks until such a time that it
> >>can resume normal operation.
> >
> >
> > Sounds great... could you give more shape to the idea, so people can
> > comment on it?
> >
> > What limit? Measured how? Normal operation is what?
> >
> > Drop what? How to tell?
> >
> >
>
> Let's use the example in Apache, there is the Apache::LoadAvgLimit
> mod_perl module which allows one to limit based on the system load
> averages.   Here is an example of the configuration one would find:
>
>    <Location /perl>
>      PerlInitHandler Apache::LoadAvgLimit
>      PerlSetVar LoadAvgLimit_1 3.00
>      PerlSetVar LoadAvgLimit_5 2.00
>      PerlSetVar LoadAvgLimit_15 1.50
>      PerlSetVar LoadAvgRetryAfter 120
>    </Location>
>
> The end state is simple, once the load average moves above 3.00 for the
> 1 minute average the web server will not process the CGI scripts or
> mod_perl applications under that directory.  Instead it will return a
> 503 error and save the system from being crushed by ever increasing load
> averages.
>
> Only once the load average is below the defined limits will the server
> process requests as normal.   This is not necessarily the nicest or
> cleanest way or doing things, but it does allow the Apache web server to
> prevent a collapse.
>
> There are ways of restricting the size of files, number of concurrent
> processes and even memory being used by a daemon.  This can be done
> through ulimit or the login.conf file if your system supports it.
> However, there is no way to restrict based on load averages, only
> processor time which is ineffective for a perpetually running daemon
> like PostgreSQL has.
>

All workloads are not created equally, so mixing them can be tricky.
This will be better in 8.0 because seq scans don't spoil the cache.

Apache is effectively able to segregate the workloads because each
workload is "in a directory". SQL isn't stored anywhere for PostgreSQL
to say "just those ones please", so defining which statements are in
which workload is the tricky part.

PostgreSQL workload management could look at userid, tables, processor
load (?) and estimated cost to decide what to do.

There is a TODO item on limiting numbers of connections per
userid/group, in addition to the max number of sessions per server.

Perhaps the easiest way would be to have the Apache workloads segregated
by PostgreSQL userid, then limit connections to each.

> For example using Apache::DBI or pgpool the DBMS may be required to
> spawn a great deal of child processed in a short order of time.   This
> in turn can cause a major spike in processor load and if unchecked by
> running as high demand queries the system can literally increase in load
> until the server buckles.
>

That's been nicely covered off by John and Matt in the other threads, so
you're sorted out for now, and it doesn't look like a bug in PostgreSQL.

> Of course, I am not blaming PostgreSQL, there are probably some
> instabilities in the AMD64 port of FreeBSD 5.2.1 for dual processor
> systems that lead to an increased chance of failure instead of recovery.

Good!

--
Best Regards, Simon Riggs


Re: Restricting Postgres

From
Martin Foster
Date:
Matt Clark wrote:

>>I have a dual processor system that can support over 150 concurrent
>>connections handling normal traffic and load.   Now suppose I setup
>>Apache to spawn all of it's children instantly, what will
>
> ...
>
>>This will spawn 150 children in a short order of time and as
>>this takes
>
>
> "Doctor, it hurts when I do this!"
> "Well, don't do that then..."
>
> Sorry, couldn't resist ;-)
>
> Our Apache/PG driven website also needs to be able to deal with occasional
> large peaks, so what we do is:
>
> StartServers 15        # Don't create too many children initially
> MinSpareServers 10    # Always have at least 10 spares lying around
> MaxSpareServers 20    # But no more than 20
> MaxClients 150        # Up to 150 - the default 256 is too much for our
> RAM
>
>
> So on server restart 15 Apache children are created, then one new child
> every second up to a maximum of 150.
>
> Apache's 'ListenBackLog' is around 500 by default, so there's plenty of
> scope for queuing inbound requests while we wait for sufficient children to
> be spawned.
>
> In addition we (as _every_ high load site should) run Squid as an
> accelerator, which dramatically increases the number of client connections
> that can be handled.  Across 2 webservers at peak times we've had 50,000
> concurrently open http & https client connections to Squid, with 150 Apache
> children doing the work that squid can't (i.e. all the dynamic stuff), and
> PG (on a separate box of course) whipping through nearly 800 mixed selects,
> inserts and updates per second - and then had to restart Apache on one of
> the servers for a config change...  Not a problem :-)
>
> One little tip - if you run squid on the same machine as apache, and use a
> dual-proc box, then because squid is single-threaded it will _never_ take
> more than half the CPU - nicely self balancing in a way.
>
> M
>

I've heard of the merits of using Squid as a reverse proxy.
However, well over 99% of my traffic is dynamic, which is why I may be
experiencing behavior that people normally do not expect.

As I have said before in previous threads, the scripts are completely
database driven and at the time the database averaged 65 queries per
second under MySQL before a migration, while the webserver was averaging
2 to 4.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org



Re: Restricting Postgres

From
Martin Foster
Date:
Simon Riggs wrote
>
>
> All workloads are not created equally, so mixing them can be tricky.
> This will be better in 8.0 because seq scans don't spoil the cache.
>
> Apache is effectively able to segregate the workloads because each
> workload is "in a directory". SQL isn't stored anywhere for PostgreSQL
> to say "just those ones please", so defining which statements are in
> which workload is the tricky part.
>
> PostgreSQL workload management could look at userid, tables, processor
> load (?) and estimated cost to decide what to do.
>
> There is a TODO item on limiting numbers of connections per
> userid/group, in addition to the max number of sessions per server.
>
> Perhaps the easiest way would be to have the Apache workloads segregated
> by PostgreSQL userid, then limit connections to each.
>

Apache has a global setting for load average limits; the above was just
a module which extends the capability.  It might also make sense to
have limitations set on schemas, which could be used in a similar way to
Apache directories.

While for most people having the database protect itself against a sudden
surge of high traffic would be undesirable, it can help those who run
dynamically driven sites and get slammed by Slashdot, for example.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org



Re: Restricting Postgres

From
Martin Foster
Date:
Kevin Barnard wrote:
> I am generally interested in a good solution for this.  So far our
> solution has been to increase the hardware to the point of allowing
> 800 connections to the DB.
>
> I don't have the mod loaded for Apache, but we haven't had too many
> problems there.  The site is split pretty good between dynamic and
> non-dynamic, it's largely Flash with several plugins to the DB.
> However we still can and have been slammed and up to point of the 800
> connections.
>
> What I don't get is why not use pgpool?  This should eliminate the
> rapid fire forking of postgres instances in the DB server.  I'm
> assuming your app can safely handle a failure to connect to the DB
> (i.e. exceed number of DB connections).  If not it should be fairly
> simple to send a 503 header when it's unable to get the connection.
>
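
Kevin's 503 fallback is only a few lines in a Perl CGI script.  A sketch
(the DSN and credentials are hypothetical):

   use DBI;

   my $dbh = eval {
       DBI->connect('dbi:Pg:dbname=realms;host=dbhost', 'www', 'secret',
                    { RaiseError => 1, AutoCommit => 1 });
   };
   unless ($dbh) {
       # Too many backends, or the database is down: ask the client to retry.
       print "Status: 503 Service Unavailable\r\n";
       print "Retry-After: 120\r\n";
       print "Content-Type: text/plain\r\n\r\n";
       print "The site is overloaded, please try again in a few minutes.\n";
       exit 0;
   }
   # ... normal page generation continues here ...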

Note that I am not necessarily looking for a PostgreSQL solution to the
matter, just a way to prevent the database from killing off the server
it sits on, by looking at the load averages.

I have attempted to make use of pgpool and have had some very poor
performance.   There were constant error messages, load averages on that
machine seemed to skyrocket, and it just did not seem suited to my needs.

Apache::DBI overall works better for what I require, even if it is not a
pool per se.   Now if pgpool supported variable-rate pooling like
Apache does with its children, it might help to even things out.  Even
then, you'd still get the spike if you have to start the webserver and
database server at or around the same time.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org


Re: Restricting Postgres

From
"Matt Clark"
Date:
> Apache::DBI overall works better to what I require, even if
> it is not a
> pool per sey.   Now if pgpool supported variable rate pooling like
> Apache does with it's children, it might help to even things
> out.  That
> and you'd still get the spike if you have to start the webserver and
> database server at or around the same time.

I still don't quite get it though - you shouldn't be getting more than one
child per second being launched by Apache, so that's only one PG postmaster
per second, which is really a trivial load.  That is unless you have
'StartServers' set high, in which case the 'obvious' answer is to lower it.
Are you launching multiple DB connections per Apache process as well?


Re: Restricting Postgres

From
"Matt Clark"
Date:
> Case in point: A first time visitor hits your home page.  A
> dynamic page is generated (in about 1 second) and served
> (taking 2 more seconds) which contains links to 20 additional

The gain from an accelerator is actually even more than that, as it takes
essentially zero seconds for Apache to return the generated content (which
in the case of a message board could be quite large) to Squid, which can
then feed it slowly to the user, leaving Apache free again to generate
another page.  When serving dialup users large dynamic pages this can be a
_huge_ gain.

I think Martin's pages (dimly recalling another thread) take a pretty long
time to generate though, so he may not see quite such a significant gain.



Re: Restricting Postgres

From
Pierre-Frédéric Caillaud
Date:
    Myself, I like a small Apache with few modules serving static files (no
dynamic content, no db connections), and with a mod_proxy on a special
path directed to another Apache which generates the dynamic pages (few
processes, persistent connections...)
    You get the best of both worlds: static files do not hog DB connections,
and the second Apache sends generated pages very fast to the first, which
then trickles them down to the clients.


>> Case in point: A first time visitor hits your home page.  A
>> dynamic page is generated (in about 1 second) and served
>> (taking 2 more seconds) which contains links to 20 additional
>
> The gain from an accelerator is actually even more that that, as it takes
> essentially zero seconds for Apache to return the generated content
> (which
> in the case of a message board could be quite large) to Squid, which can
> then feed it slowly to the user, leaving Apache free again to generate
> another page.  When serving dialup users large dynamic pages this can be
> a
> _huge_ gain.
>
> I think Martin's pages (dimly recalling another thread) take a pretty
> long
> time to generate though, so he may not see quite such a significant gain.
>
>
>



Re: Restricting Postgres

From
Martin Foster
Date:
Matt Clark wrote:

>>Case in point: A first time visitor hits your home page.  A
>>dynamic page is generated (in about 1 second) and served
>>(taking 2 more seconds) which contains links to 20 additional
>
>
> The gain from an accelerator is actually even more that that, as it takes
> essentially zero seconds for Apache to return the generated content (which
> in the case of a message board could be quite large) to Squid, which can
> then feed it slowly to the user, leaving Apache free again to generate
> another page.  When serving dialup users large dynamic pages this can be a
> _huge_ gain.
>
> I think Martin's pages (dimly recalling another thread) take a pretty long
> time to generate though, so he may not see quite such a significant gain.
>
>

Correct: 75% of all hits are on a script that can take anywhere from
a few seconds to half an hour to complete.    The script essentially
auto-flushes to the browser so users get new information as it arrives,
creating the illusion of on-demand generation.

A squid proxy would probably cause severe problems when dealing with a
script that does not complete its output for a variable amount of time.

As for images, CSS, JavaScript and such, the site makes use of them, but in
the grand scheme of things the amount of traffic they tie up is
inconsequential.   Though I will probably move all of that
onto another server, just to let the main server deal almost exclusively
with dynamic content.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org



Re: Restricting Postgres

From
Martin Foster
Date:
Matt Clark wrote:

>>Apache::DBI overall works better to what I require, even if
>>it is not a
>>pool per sey.   Now if pgpool supported variable rate pooling like
>>Apache does with it's children, it might help to even things
>>out.  That
>>and you'd still get the spike if you have to start the webserver and
>>database server at or around the same time.
>
>
> I still don't quite get it though - you shouldn't be getting more than one
> child per second being launched by Apache, so that's only one PG postmaster
> per second, which is really a trivial load.  That is unless you have
> 'StartServers' set high, in which case the 'obvious' answer is to lower it.
> Are you launching multiple DB connections per Apache process as well?
>

I have StartServers set to a fairly high limit.   However, this would
make little difference overall if I restarted the webservers to load in
new modules during a high-load time.    When I am averaging 145
concurrent connections before a restart, I can expect that many requests
to hit the server once Apache begins to respond.

As a result, it will literally cause a spike on both machines as new
connections are initiated at a high rate.   In my case I don't always
have the luxury of waiting till 0300 just to test a change.

Again, not necessarily looking for a PostgreSQL solution.  I am looking
for a method that would allow the database or the OS itself to protect
the system it's hosted on.    If both the database and the apache server
were on the same machine this type of scenario would be unstable to say
the least.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org



Re: Restricting Postgres

From
"Matt Clark"
Date:
> Correct the 75% of all hits are on a script that can take
> anywhere from
> a few seconds to a half an hour to complete.    The script
> essentially
> auto-flushes to the browser so they get new information as it arrives
> creating the illusion of on demand generation.

This is more like a streaming data server, which is a very different beast
from a webserver, and probably better suited to the job.  Usually either
multithreaded or single-process using select() (just like Squid).  You could
probably build one pretty easily.  Using a 30MB Apache process to serve one
client for half an hour seems like a hell of a waste of RAM.
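
A single-process select() loop really is small.  A rough sketch in Perl
(fetch_new_posts() is a hypothetical stand-in for whatever cheap query
feeds the clients, and real code would also want non-blocking writes):

   use IO::Socket::INET;
   use IO::Select;

   my $listen = IO::Socket::INET->new(LocalPort => 8080, Proto => 'tcp',
                                      Listen => 128, ReuseAddr => 1) or die $!;
   my $sel = IO::Select->new($listen);

   while (1) {
       for my $fh ($sel->can_read(5)) {              # wait up to 5s for activity
           if ($fh == $listen) {                     # new client connecting
               my $client = $listen->accept or next;
               $sel->add($client);
           } elsif (!sysread($fh, my $buf, 4096)) {  # EOF/error: drop the client
               $sel->remove($fh);
               close $fh;
           }
       }
       my $chunk = fetch_new_posts();                # one query serves every client
       next unless defined $chunk && length $chunk;
       for my $fh ($sel->handles) {
           print {$fh} $chunk unless $fh == $listen;
       }
   }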

> A squid proxy would probably cause severe problems when
> dealing with a
> script that does not complete output for a variable rate of time.

No, it's fine, squid gives it to the client as it gets it, but can receive
from the server faster.


Re: Restricting Postgres

From
Pierre-Frédéric Caillaud
Date:
On Thu, 4 Nov 2004 18:20:18 -0000, Matt Clark <matt@ymogen.net> wrote:

>> Correct the 75% of all hits are on a script that can take
>> anywhere from
>> a few seconds to a half an hour to complete.    The script
>> essentially
>> auto-flushes to the browser so they get new information as it arrives
>> creating the illusion of on demand generation.

    Er, do you mean that :

    1- You have a query that runs for half an hour and you spoon feed the
results to the client ?
    (argh)

    2- Your script looks for new data every few seconds, sends a packet, then
sleeps, and loops ?

    If it's 2 I have a readymade solution for you, just ask.

Re: Restricting Postgres

From
Martin Foster
Date:
Matt Clark wrote:
>>Correct the 75% of all hits are on a script that can take
>>anywhere from
>>a few seconds to a half an hour to complete.    The script
>>essentially
>>auto-flushes to the browser so they get new information as it arrives
>>creating the illusion of on demand generation.
>
>
> This is more like a streaming data server, which is a very different beast
> from a webserver, and probably better suited to the job.  Usually either
> multithreaded or single-process using select() (just like Squid).  You could
> probably build one pretty easily.  Using a 30MB Apache process to serve one
> client for half an hour seems like a hell of a waste of RAM.
>

These are CGI scripts at the lowest level, nothing more and nothing
less.  While I could probably embed a small webserver directly into the
perl scripts and run that as a daemon, it would take away the
portability that the scripts currently offer.

This should be my last question on the matter: does squid report the
proper IP address of the client itself?    That's a critical
requirement for the scripts.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org



Re: Restricting Postgres

From
"Steinar H. Gunderson"
Date:
On Thu, Nov 04, 2004 at 03:30:19PM -0500, Martin Foster wrote:
> This should be my last question on the matter, does squid report the
> proper IP address of the client themselves?    That's a critical
> requirement for the scripts.

AFAIK it's in a header; I believe it's called "X-Forwarded-For".  If
you're using caching, your script will obviously be called fewer times than
usual, though, so be careful about relying too much on side effects. :-)
(This is, of course, exactly the same if the client side uses a caching
proxy. Saying anything more is impossible without knowing exactly what you
are doing, though :-) )

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Restricting Postgres

From
Matt Clark
Date:
>     1- You have a query that runs for half an hour and you spoon feed
> the  results to the client ?
>     (argh)
>
>     2- Your script looks for new data every few seconds, sends a
> packet, then  sleeps, and loops ?
>
>     If it's 2 I have a readymade solution for you, just ask.
>
I'm guessing (2) - PG doesn't give the results of a query in a stream.

Re: Restricting Postgres

From
Matt Clark
Date:
> These are CGI scripts at the lowest level, nothing more and nothing
> less.  While I could probably embed a small webserver directly into
> the perl scripts and run that as a daemon, it would take away the
> portability that the scripts currently offer.

If they're CGI *scripts* then they just use the CGI environment, not
Apache, so a daemon that accepts the inbound connections, then compiles
the scripts a-la Apache::Registry, but puts each in a separate thread
would be, er, relatively easy for someone better at multithreaded stuff
than me.

>
> This should be my last question on the matter, does squid report the
> proper IP address of the client themselves?    That's a critical
> requirement for the scripts.
>
In the X-Forwarded-For header.  Not that you can be sure you're seeing
the true client IP anyway if they've gone through an ISP proxy beforehand.
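
In a CGI script that header shows up in the environment, so the usual
fallback is something like this (sketch only; the header can be a
comma-separated chain, in which case the first entry is the original
client):

   my $ip = $ENV{HTTP_X_FORWARDED_FOR} || $ENV{REMOTE_ADDR};
   ($ip) = split /\s*,\s*/, $ip;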



Re: Restricting Postgres

From
Pierre-Frédéric Caillaud
Date:
> I'm guessing (2) - PG doesn't give the results of a query in a stream.

    In 1- I was thinking about a cursor...
    but I think his problem is more like 2-

    In that case one can either code a special purpose server or use the
following hack :

    In your webpage include an iframe with a bit of JavaScript to refresh it
every five seconds. The iframe fetches a page from the server which brings
in the new data in the form of generated JavaScript which writes into the
parent window. Thus, you get a very short request every 5 seconds to fetch
new data, and it is displayed in the client's window very naturally.

    I've used this technique for another application and find it very cool.
It's for selection lists, often you'll see a list of things to be checked
or not, which makes a big form that people forget to submit. Thus I've
replaced the checkboxes with clickable zones which trigger the loading of
a page in a hidden iframe, which does appropriate modifications in the
database, and updates the HTML in the parent page, changing texts here and
there... it feels a bit like it's not a webpage but rather a standard GUI.
Very neat. Changes are recorded without needing a submit button... I
should write a framework for making that easy to do.

    I did not use a frame because frames suck, but iframes are convenient.
Yeah, it does not work with Lynx... it needs JavaScript... but it works
well.

Re: Restricting Postgres

From
Matt Clark
Date:
>
>     In your webpage include an iframe with a Javascript to refresh it
> every  five seconds. The iframe fetches a page from the server which
> brings in  the new data in form of generated JavaScript which writes
> in the parent  window. Thus, you get a very short request every 5
> seconds to fetch new  data, and it is displayed in the client's window
> very naturally.
>
> ...

Yup.  If you go the JS route then you can do even better by using JS to
load data into JS objects in the background and manipulate the page
content directly, no need for even an Iframe.  Ignore the dullards who
have JS turned off - it's essential for modern web apps, and refusing JS
conflicts absolutely with proper semantic markup.

http://developer.apple.com/internet/webcontent/xmlhttpreq.html is a good
starting point.

It's clear that this discussion has moved way away from PG!  Although in
the context of DB backed web apps I guess it remains a bit on-topic...

M

Re: Restricting Postgres

From
Bruno Wolff III
Date:
On Thu, Nov 04, 2004 at 22:37:06 +0000,
  Matt Clark <matt@ymogen.net> wrote:
> >...
>
> Yup.  If you go the JS route then you can do even better by using JS to
> load data into JS objects in the background and manipulate the page
> content directly, no need for even an Iframe.  Ignore the dullards who
> have JS turned off - it's essential for modern web apps, and refusing JS
> conflicts absolutely with proper semantic markup.

Javascript is too powerful to turn on for any random web page. It is only
essential for web pages because people write their web pages to only
work with javascript.

Re: Restricting Postgres

From
Matt Clark
Date:
>Javascript is too powerful to turn on for any random web page. It is only
>essential for web pages because people write their web pages to only
>work with javascript.
>
>
Hmm... I respectfully disagree.  It is so powerful that it is impossible
to ignore when implementing a sophisticated app.  And it is not
dangerous to the user so long as they have a popup blocker.
Commercially, I can ignore the people who turn it off, and I can gain a
huge benefit from knowing that 95% of people have it turned on, because
it gives my users a hugely better experience than the equivalent XHTML
only page (which I deliver, and which works, but which is a fairly
depressing experience compared to the JS enabled version).

It is _amazing_ how much crud you can take out of a page if you let JS
do the dynamic stuff (with CSS still in full control of the styling).
Nice, clean, semantically sensible XHTML, that can be transformed for
multiple devices - it's great.

An example:

<a class="preview_link">/previews/foo.wmv</a>

But we want it to appear in a popup when viewed in certain devices....
Easy - Attach an 'onclick' event handler (or just set the target
attribute) when the device has a suitable screen & media player, but
leave the markup clean for the rest of the world.





Re: Restricting Postgres

From
Pierre-Frédéric Caillaud
Date:
check this marvellous piece of 5 minutes of work :
http://boutiquenumerique.com/test/iframe_feed.html

> Yup.  If you go the JS route then you can do even better by using JS to
> load data into JS objects in the background and manipulate the page
> content directly, no need for even an Iframe.  Ignore the dullards who
> have JS turned off - it's essential for modern web apps, and refusing JS
> conflicts absolutely with proper semantic markup.
>
> http://developer.apple.com/internet/webcontent/xmlhttpreq.html is a good
> starting point.

    Didn't know this existed ! Very, very cool.
    I have to check this out more in depth.

    A note though : you'll have to turn off HTTP persistent connections in
your server (not in your proxy) or you're back to square one.
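
In Apache that is a single directive (assuming the dynamic pages are
served by Apache itself):

   KeepAlive Off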

>
> It's clear that this discussion has moved way away from PG!  Although in
> the context of DB backed web apps I guess in remains a bit on-topic...

    I find it very on-topic as
    - it's a way to help this guy solve his "pg problem" which was in fact a
design problem
    - it's the future of database driven web apps (no more reloading the
whole page !)

    I think in the future there will be a good bit of presentation logic in
the client...

Re: Restricting Postgres

From
Matt Clark
Date:
>
>     A note though : you'll have to turn off HTTP persistent
> connections in  your server (not in your proxy) or youre back to
> square one.
>
I hadn't considered that.  On the client side it would seem to be up to
the client whether to use a persistent connection or not.  If it does,
then yeah, a request every 5 seconds would still just hold open a
server.  One more reason to use a proxy I s'pose.

>>
>> It's clear that this discussion has moved way away from PG!  Although
>> in  the context of DB backed web apps I guess in remains a bit
>> on-topic...
>
>
>     I find it very on-topic as
>     - it's a way to help this guy solve his "pg problem" which was iin
> fact a  design problem
>     - it's the future of database driven web apps (no more reloading
> the  whole page !)
>
>     I think in the future there will be a good bit of presentation
> login in  the client...

Not if Bruno has his way ;-)



Re: Restricting Postgres

From
Matt Clark
Date:

Pierre-Frédéric Caillaud wrote:

>
> check this marvelus piece of 5 minutes of work :
> http://boutiquenumerique.com/test/iframe_feed.html
>
cela m'a fait le sourire :-)

(apologies for bad french)

M



Re: Restricting Postgres

From
Martin Foster
Date:
Matt Clark wrote:

>
>
> Pierre-Frédéric Caillaud wrote:
>
>>
>> check this marvelus piece of 5 minutes of work :
>> http://boutiquenumerique.com/test/iframe_feed.html
>>
> cela m'a fait le sourire :-)
>
> (apologies for bad french)
>
> M
>
>

Javascript is not an option for the scripts; one of the mandates of the
project is to support as many different client setups as possible, and we
have encountered everything from WebTV to the latest Firefox release.
  It's a chat/roleplay community and not everyone will invest in new
equipment.

Now, it would seem to me that there is a trade-off between a JS push
system and a constant, ever-present process.    With the traditional
method as I use it, a client incurs the initial penalty of going
through authentication and pulling the look and feel of the realms and
sites, and simply polls one table from that point on.

On the other hand, you have one user making a call for new posts
every X seconds.   This means every X seconds the penalty for
authentication and design would kick in, increasing the overall load.

The current scripts can also be dynamically adapted to slow things down
based on heavy load, or on quiet realms that bring few posts in.   It's
much harder to expect JavaScript solutions to work perfectly every time
and not be modified by some proxy.

Unfortunately, we are getting way off track.   I'm looking for a way to
protect the PostgreSQL server, either from within PostgreSQL or via some
sort of external script which polls the load average once in a while to
make that determination.
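
A crude version of that external check can live in the scripts themselves,
in the same spirit as Apache::LoadAvgLimit.  A sketch (the uptime parsing
and the 3.00 limit are just assumptions; *BSD prints "load averages:",
Linux "load average:"):

   sub load_too_high {
       my ($limit) = @_;
       my ($one_min) = (`uptime` =~ /load averages?:\s+([\d.]+)/);
       return defined $one_min && $one_min > $limit;
   }

   if (load_too_high(3.00)) {
       print "Status: 503 Service Unavailable\r\nRetry-After: 120\r\n";
       print "Content-Type: text/plain\r\n\r\nServer overloaded, try again shortly.\n";
       exit 0;
   }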

Now is there an administrative command in PostgreSQL that will cause it
to move into some sort of maintenance mode?   For me that could be
exceedingly useful as it would still allow for an admin connection to be
made and run a VACUUM FULL and such.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org

Re: Restricting Postgres

From
Bruno Wolff III
Date:
On Thu, Nov 04, 2004 at 23:32:57 +0000,
  Matt Clark <matt@ymogen.net> wrote:
> >
> >    I think in the future there will be a good bit of presentation
> >login in  the client...
>
> Not if Bruno has his way ;-)

Sure there will, but it will be controlled by the client, perhaps taking
suggestions from the style sheet pointed to by the document.

Running foreign code from random or even semi-random places is a recipe
for becoming a spam server. See examples from Microsoft such as their
spreadsheet and office software. Documents really need to be passive
data, not active code.

If the client and the server have a special trust relationship, then
running code supplied by the server makes sense. So you might use javascript
within a business where the IT department runs the server and the employees
run clients. However, encouraging people to browse the internet with
javascript enabled is a bad idea.

Re: Restricting Postgres

From
"Leeuw van der, Tim"
Date:
To what extent would your problems be solved by having a 2nd server, a
replication system (such as slony-1, but there are others), and some sort
of load-balancer in front of it? The load-balancing could be as simple as
a round-robin DNS server, perhaps...

Then when you need to do maintenance such as a vacuum full, you can
temporarily take 1 server out of the load-balancer (I hope) and do
maintenance, and then the other.
I don't know what that does to replication, but I would venture that
replication systems should be designed to handle a node going offline.

Load balancing could also help to protect against server-overload and 1 server toppling over.

Of course, I don't know to what extent having another piece of hardware is an option, for you.

cheers,

--Tim

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of Martin Foster
Sent: Friday, November 05, 2004 3:50 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Restricting Postgres

[...]

Now is there an administrative command in PostgreSQL that will cause it
to move into some sort of maintenance mode?   For me that could be
exceedingly useful as it would still allow for an admin connection to be
made and run a VACUUM FULL and such.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org


Re: Restricting Postgres

From
"Matthew Nuzum"
Date:
Matt - Very interesting information about squid effectiveness, thanks.

Martin,
You mean your site had no images? No CSS files? No JavaScript files? Nearly
everything is dynamic?

I've found that our CMS spends more time sending a 23KB image to a dial up
user than it does generating and serving dynamic content.

This means that if you have a "light" squid process that caches and serves
your images and static content from its cache, then your apache processes
can truly focus on only the dynamic data.

Case in point: A first time visitor hits your home page.  A dynamic page is
generated (in about 1 second) and served (taking 2 more seconds), which
contains links to 20 additional files (images, styles, etc.).  Then
expensive apache processes are used to serve each of those 20 files, which
takes an additional 14 seconds.  Your precious application server processes
have now spent 14 seconds serving stuff that could have been served by an
upstream cache.

I am all for using upstream caches and SSL accelerators to take the load off
of application servers.  My apache children often take 16 or 20MB of RAM
each.  Why spend all of that on a 1.3KB image?

Just food for thought.  There are people who use proxying in apache to
redirect expensive tasks to other servers that are dedicated to just one
heavy challenge.  In that case you likely do have 99% dynamic content.

Matthew Nuzum        | Makers of "Elite Content Management System"
www.followers.net        | View samples of Elite CMS in action
matt@followers.net    | http://www.followers.net/portfolio/

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Martin Foster

Matt Clark wrote:

> In addition we (as _every_ high load site should) run Squid as an
> accelerator, which dramatically increases the number of client connections
> that can be handled.  Across 2 webservers at peak times we've had 50,000
> concurrently open http & https client connections to Squid, with 150 Apache
> children doing the work that squid can't (i.e. all the dynamic stuff), and
> PG (on a separate box of course) whipping through nearly 800 mixed selects,
> inserts and updates per second - and then had to restart Apache on one of
> the servers for a config change...  Not a problem :-)
>
> One little tip - if you run squid on the same machine as apache, and use a
> dual-proc box, then because squid is single-threaded it will _never_ take
> more than half the CPU - nicely self balancing in a way.
>
> M
>

I've heard of the merits of Squid in the use as a reverse proxy.
However, well over 99% of my traffic is dynamic, hence why I may be
experiencing behavior that people normally do not expect.

As I have said before in previous threads, the scripts are completely
database driven and at the time the database averaged 65 queries per
second under MySQL before a migration, while the webserver was averaging
2 to 4.

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org