Thread: Infrastructure monitoring

Infrastructure monitoring

From
"Jim C. Nasby"
Date:
Search has been down for at least 2 days now, and this certainly isn't
the first time it's happened. There have also been cases of archives
getting stuck, and probably other outages besides those that went on
until someone emailed about it.

Would it be difficult to set up something to monitor these various
services? I know there's at least one OSS tool to do it, though I have
no idea how hard it would be to tie that into the current
infrastructure.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Infrastructure monitoring

From
Josh Berkus
Date:
Jim,

> Search has been down for at least 2 days now, and this certainly isn't
> the first time it's happened. There's also been cases of archives
> getting stuck, and probably other outages besides those that went on
> until someone email'd about it.
>
> Would it be difficult to setup something to monitor these various
> services? I know there's at least one OSS tool to do it, though I have
> no idea how hard it would be to tie that into the current
> infrastructure.

We have an open offer of Hyperic licenses, and they support FreeBSD now.

--Josh

--
__Aglio Database Solutions_______________
Josh Berkus               Consultant
josh@agliodbs.com     www.agliodbs.com
Ph: 415-752-2500    Fax: 415-752-2387
2166 Hayes Suite 200    San Francisco, CA

Re: Infrastructure monitoring

From
"Marc G. Fournier"
Date:
On Fri, 13 Jan 2006, Josh Berkus wrote:

> Jim,
>
>> Search has been down for at least 2 days now, and this certainly isn't
>> the first time it's happened. There's also been cases of archives
>> getting stuck, and probably other outages besides those that went on
>> until someone email'd about it.
>>
>> Would it be difficult to setup something to monitor these various
>> services? I know there's at least one OSS tool to do it, though I have
>> no idea how hard it would be to tie that into the current
>> infrastructure.
>
> We have an open offer of Hyperic licenses, and they support FreeBSD now.

Not to discount the offer ... but, what exactly would that provide us?  We
already monitor the *servers*, it's what is inside of the servers that
needs better monitoring ... knowing nothing about Hyperic, does that
provide something for that?

In the case of the archives, for instance, the problem was a perl process
that for some unknown reason got stuck randomly ... removed that in favor
of an awk script, and it hasn't done it since ... I also redirected cron's
email to scrappy@postgresql.org, so that any errors show up in my mailbox
instead of root's, so I get an hourly reminder that things are running well
...

In the case of search ... John would be better at answering that, but when
he and I talked this past week, he mentioned that he was moving it all
over to two new servers, which I changed the DNS for on Wednesday ...

As I've said above ... physical servers are being monitored, so if anyone
has some ideas on how we can improve "content monitoring", for lack of a
better word, I know I'm all ears ...

Again, if Hyperic can offer something for this, let me know ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: Infrastructure monitoring

From
Guido Barosio
Date:
Actually, it seems to be as easy as issuing a GET to search.postgresql.org.
If a script can handle the HTTP status codes, then alerts can be triggered on failures.

The search failure was due to a 503 error being returned by the server.

Am I wrong?

Though, thinking about content, there is an open-source tool that does a job similar to siteconf (http://www.siteconfidence.com), but I can't remember the name at the moment.

But I understand that the search problem was not a *content* problem in itself.
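
As a rough sketch of that idea, the check below fetches a URL and decides whether to alert based on the status code. The function names (fetch_status, needs_alert) and the choice of 200 as the only healthy code are assumptions for illustration, not anything the search server is known to require.

```python
import urllib.error
import urllib.request

# Status codes treated as "search is healthy". This set is an assumption.
OK_CODES = {200}


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (such as the 503 seen here) arrive as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def needs_alert(status, ok_codes=OK_CODES):
    """True when the status is missing or not in the accepted set."""
    return status not in ok_codes
```

A cron job could call needs_alert(fetch_status("http://search.postgresql.org/")) and send mail when it returns True. Note that this only sees the web frontend, not whatever sits behind it.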
 
G.-

 

Re: Infrastructure monitoring

From
"Magnus Hagander"
Date:
> >> Search has been down for at least 2 days now, and this certainly
> >> isn't the first time it's happened. There's also been cases of
> >> archives getting stuck, and probably other outages besides those that
> >> went on until someone email'd about it.
> >>
> >> Would it be difficult to setup something to monitor these various
> >> services? I know there's at least one OSS tool to do it, though I
> >> have no idea how hard it would be to tie that into the current
> >> infrastructure.
> >
> > We have an open offer of Hyperic licenses, and they support FreeBSD now.
>
> Not to discount the offer ... but, what exactly would that
> provide us?  We already monitor the *servers*, its what is
> inside of the servers that needs better monitoring ...
> knowing nothing about Hyperic, does that provide something for that?

I assume you're talking about the Nagios monitoring? Or are there perhaps even
now multiple sets of monitoring? (Dave has a Nagios installation up at
least).

We could easily extend that to monitor in much more detail. It's just
that someone has to define what we need to monitor. And in either case,
I see no reason we should require commercial software to do it - that's
still going to need the definition of what has to be monitored. Let's
stick to open source when we can...


BTW, we already do content monitoring on the actual website mirrors. If
a mirror does not answer, *or* does not update properly, it will
automatically be removed from the DNS record, and thus get out of
"public view" after 10-30 minutes.

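The mirror rule above can be sketched as a filter over check results. The MirrorStatus record, its field names, and the max_lag parameter are hypothetical; the actual mirror checker and its sync threshold are not shown in this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MirrorStatus:
    hostname: str
    responded: bool                 # did the mirror answer the probe?
    last_sync: Optional[datetime]   # when it last updated, if known


def mirrors_to_publish(mirrors, now, max_lag):
    """Hostnames that answered and are no more than max_lag behind."""
    return [
        m.hostname
        for m in mirrors
        if m.responded
        and m.last_sync is not None
        and now - m.last_sync <= max_lag
    ]
```

Only the hostnames returned would stay in the DNS record; everything else drops out of public view once cached records expire.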

> In the case of the archives, for instance, the problem was a
> perl process that for some unknown reason got stuck randomly
> ... removed that in favor of an awk script, and it hasn't
> done it since ... i also redirected cron's email to
> scrappy@postgresql.org, so that any errors show up in my
> mailbox instead of roots, so I get an hourly reminder that
> things are running well ...

Right. What we could do to easily enhance this is to have the update
script update a timestamp file somewhere on the system when it's done,
and then monitor that file using existing tools (the file should be
accessible through http://archives.postgresql.org/ the same way it is
for the general website). Then you can just define a "can get <nn>
minutes out of sync before we scream" threshold.
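
A minimal sketch of the writer side, assuming the update script can call a small helper as its last step. The write_timestamp name, the Unix-epoch file format, and the file mode are assumptions; the real update script is not shown here.

```python
import os
import tempfile
import time


def write_timestamp(path, now=None):
    """Atomically write the current Unix time (seconds) to path."""
    stamp = int(time.time() if now is None else now)
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the target, so a reader never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(f"{stamp}\n")
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; the web server must read it
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return stamp
```

Calling this only after the update has finished successfully means a stuck or failed run leaves the old timestamp in place, which is exactly what the monitor needs to notice.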


> In the case of search ... John would be better at answering
> that, but when he and I talked this past week, he mentioned
> that he was moving it all over to two new servers, which I
> changed the DNS for on Wednesday ...

What I think would be good in cases like this is just information -
AFAIK nobody on the web team knew the servers were being moved. (I may
be wrong here - I know I didn't know and I also spoke to Dave about it,
but those are the only ones I polled. Anyway, -www should know)

That would also make it possible to do the standard fiddling with DNS
TTLs to make the problem much smaller.
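
The usual TTL approach is to lower the record's TTL ahead of a planned move, so that resolvers stop caching the old address for long. The arithmetic is simple enough to sketch; the function name and the values in the usage note are made up for illustration.

```python
from datetime import datetime, timedelta


def latest_ttl_change(cutover, old_ttl_seconds):
    """Latest moment to publish the lowered TTL before a cutover.

    Resolvers that cached the record just before the TTL was lowered may
    keep it for up to old_ttl_seconds, so the lowered TTL has to be
    published at least that long before the address changes.
    """
    return cutover - timedelta(seconds=old_ttl_seconds)
```

For example, with a hypothetical 24-hour TTL and a cutover at 2006-01-11 12:00, latest_ttl_change(datetime(2006, 1, 11, 12, 0), 86400) gives 2006-01-10 12:00 as the deadline for lowering the TTL.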


//Magnus

Re: Infrastructure monitoring

From
"John Hansen"
Date:
> 
> > In the case of search ... John would be better at answering that, but 
> > when he and I talked this past week, he mentioned that he was moving 
> > it all over to two new servers, which I changed the DNS for on Wednesday ...
> 
> What I think would be good in cases like this is just 
> information - AFAIK nobody on the web team knew the servers 
> were being moved. (I may be wrong here - I know I didn't know 
> and I also spoke to Dave about it, but those are the only 
> ones I polled. Anyway, -www should know)
> 
> That would also make it possible to do the standard fiddling 
> with DNS TTLs to make the problem much smaller.
> 

Right, I should have posted to -www that they were being moved, but forgot in the rush.
It has been under way for some time now, though unofficially, but became urgent due to a crash.

My apologies.


Kind Regards,

John

Re: Infrastructure monitoring

From
"Dave Page"
Date:


-----Original Message-----
From: pgsql-www-owner@postgresql.org on behalf of Marc G. Fournier
Sent: Sat 1/14/2006 2:16 AM
To: Josh Berkus
Cc: John Hansen; pgsql-www@postgresql.org; Jim C. Nasby
Subject: Re: [pgsql-www] Infrastructure monitoring

> As I've said above ... physical servers are being monitored, so if anyone
> has some ideas on how we can improve "content monitoring", for lack of a
> better word, I know I'm all ears ...
>
> Again, if Hyperic can offer something for this, let me know ...

We also monitor a bunch of services using a nagios installation here. With the search though it's not always so easy
because the search engine is tucked away behind a firewall with only the web frontend poking through.

Regards, Dave

Re: Infrastructure monitoring

From
"John Hansen"
Date:
>  In the case of search ... John would be better at answering that, but when
>  he and I talked this past week, he mentioned that he was moving it all
>  over to two new servers, which I changed the DNS for on Wednesday ...

Yes, hardware is being monitored, and I get SMS notifications.
I am however often on the road for 12 hours or more, and am thus unable to respond until I get home.

.... John

Re: Infrastructure monitoring

From
Josh Berkus
Date:
People:

> I assume you talk about the nagios monitoring? Or are there perhaps even
> now multiple sets of monitoring? (Dave has a nagios installation up at
> least).

For those of you who haven't seen Hyperic, think of Nagios with a fancy web UI
including notification management, scheduled tasks, historical reporting, and
specific monitoring tools for PostgreSQL databases and other common
applications. As a comparison, Nagios is to Hyperic as ed is to vi, or as
amanda is to Arkeia.

The one hitch there is that all of this functionality would require that
Hyperic have a server to collect data and run the web interface -- it has
substantial resource consumption.  Possibly Hyperic LLC would supply this
too, in exchange for a case study, but we'd have to ask them.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Infrastructure monitoring

From
Guido Barosio
Date:
I insist because I find it easy (even for a standard Nagios).

The search failure was due to a 503 error being returned by the server.

GET http://search.postgresql.org and expect to receive one of an array of codes for a successful lookup, or else raise an alert.
 
G.-
 
 



Re: Infrastructure monitoring

From
Dave Page
Date:


On 14/1/06 20:17, "Josh Berkus" <josh@agliodbs.com> wrote:

> People:
>
>> I assume you talk about the nagios monitoring? Or are there perhaps even
>> now multiple sets of monitoring? (Dave has a nagios installation up at
>> least).
>
> For those of you who haven't seen Hyperic, think of Nagios with a fancy web UI
> including notification management, scheduled tasks, historical reporting, and
> specific monitioring tools for PostgreSQL databases and other common
> applications.    As a comparison,  Nagios::Hyperic --> ed::vi  or
> amanda::Arkieka

Which is all very nice but what would any of those features give us that we
have any use for?

Regards, Dave.


Re: Infrastructure monitoring

From
"Joshua D. Drake"
Date:
Hello,

I believe the biggest issue is not the monitoring but the fact
that these machines are not managed.

All due respect to John (I believe he does the search) but if he
is often on the road for 12 hours then someone else needs
to be hosting those machines.

The machines need to be hosted by companies that manage
servers. There are several in the PostgreSQL community and
yes CMD is one of them.

I am not trying to take any kudos from anyone or suggest that
they are not doing a bang up job. I am saying that all of the
community's machines should be managed.

Outside of a hardware failure there is zero reason for these
machines to have extended outages unless scheduled.

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/


Re: Infrastructure monitoring

From
Dave Page
Date:



On 14/1/06 20:43, "Guido Barosio" <gbarosio@gmail.com> wrote:

I insist cause I find it easy (even for a standard nagios)

With all due respect, insist all you want but the search engine backend does not run HTTP and is behind a firewall where we cannot monitor it from the outside without punching a hole first. From what John has said though, it is monitored from elsewhere on his private network.

The frontend server is monitored by nagios, but as I knew search was being moved, I was not paying any attention to it.

Regards, Dave.

Re: Infrastructure monitoring

From
Dave Page
Date:


On 14/1/06 20:59, "Joshua D. Drake" <jd@commandprompt.com> wrote:

> Hello,
>
> I believe the biggest issue is not the monitoring but the fact
> that these machines are not managed.
>
> All due respect to John (I believe he does the search) but if he
> is often on the road for 12 hours then someone else needs
> to be hosting those machines.
>
> The machines need to be hosted by companies that manage
> servers. There are several in the PostgreSQL community and
> yes CMD is one of them.
>
> I am not trying to take any kudos from anyone or suggest that
> they are not doing a bang up job. I am saying that all of the
> communities machines should be managed.
>
> Outside of a hardware failure there is zero reason for these
> machines to have extended outages unless scheduled.

Well, John has told me in the past he'd be happy to move the search to more
suitable servers. It's a big database handling a high volume of queries
though so a shared or old machine simply won't do.

I don't know how ASPseek will run on modern hardware, but the
far-less-efficient Mnogosearch became a gibbering wreck on a dual 3GHz Xeon
with 4GB RAM and what iirc was a 147GB RAID1 array.

Do you think Command Prompt might be able to help in this case? John can
give a better idea of the actual requirements of course - the only other
oddity that I can recall is that ASPSeek must be compiled with gcc 2.95 due
to some changes in the hashing functions in the STL in later versions which
cause index bloat.

Regards, Dave.


Re: Infrastructure monitoring

From
"Joshua D. Drake"
Date:
>
> Well john has told me in the past he'd be happy to move the search to more
> suitable servers. It's a big database handling a high volume of queries
> though so a shared or old machine simply won't do.
>
> I don't know how ASPseek will run on modern hardware, but the
> far-less-efficient Mnogosearch became a gibbering wreck on a dual 3GHz Xeon
> with 4GB RAM and what iirc was a 147GB RAID1 array.
>
> Do you think Command Prompt might be able to help in this case? John can
> give a better idea of the actual requirements of course - the only other
> oddity that I can recall is that ASPSeek must be compiled with gcc 2.95 due
> to some changes in the hashing functions in the STL in later versions which
> cause index bloat.
Sure, we could help and we are happy to. John, what types of machines
are we talking about here?

Also, if ASPSeek actually is an issue, we can look into OpenFTS, which I have
used successfully in the past; I also believe it is what pgsql.ru uses.

Sincerely,

Joshua D. Drake



--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/


Re: Infrastructure monitoring

From
"John Hansen"
Date:
Joshua D. Drake [mailto:jd@commandprompt.com] Wrote

> Sure we could help and we are happy to. John what types of
> machines are we talking about here?

You will need an x86-32 machine (preferably SMP) with at least 2 GB of RAM running Linux
(Gentoo is known to work).
ASPseek currently uses 11 GB of disk space for the search.

In addition, you will need some (separate) web servers to run the CGI frontend.
They can be anything, as long as we can get ASPseek to compile on them.

> Also if ASPSeek actually is an issue we can look into OpenFTS
> which I have used successfully in the past, I also believe it
> is what pgsql.ru uses.


Re: Infrastructure monitoring

From
Dave Page
Date:


On 14/1/06 21:24, "Joshua D. Drake" <jd@commandprompt.com> wrote:

> Sure we could help and we are happy to. John what types of machines
> are we talking about here?
>
> Also if ASPSeek actually is an issue we can look into OpenFTS which I have
> used successfully in the past, I also believe it is what pgsql.ru uses.

ASPSeek is only an issue with the STL problem I mentioned, if used with GCC
3.something or above (2.95 should definitely work). Other than that, John's
PG port works like a charm.

The problem with OpenFTS et al. is that there's a lot more to do than simply
install, configure and run. There's a whole heap of packages used on
pgsql.ru, including one for which the website and documentation are entirely
in Russian iirc. From what I remember, the major parts that don't exist are
the indexer (to spider the sites and get them into the website), and a
frontend to actually render the output in HTML.

The ASPSeek code requires only PostgreSQL and a couple of templates.

Regards, Dave.


Re: Infrastructure monitoring

From
Dave Page
Date:


On 14/1/06 21:37, "John Hansen" <john@geeknet.com.au> wrote:

> Joshua D. Drake [mailto:jd@commandprompt.com] Wrote
>
>> Sure we could help and we are happy to. John what types of
>> machines are we talking about here?
>
> You will need an x86-32 (Preferably SMP) with at least 2G of ram running
> linux.
> (Gentoo is known to work).
> ASPseek currently uses 11G of HD space for the search.

What sort of storage are you currently running? How many spindles etc?

Regards, Dave.


Re: Infrastructure monitoring

From
Josh Berkus
Date:
Dave,

> > For those of you who haven't seen Hyperic, think of Nagios with a fancy
> > web UI including notification management, scheduled tasks, historical
> > reporting, and specific monitioring tools for PostgreSQL databases and
> > other common applications.    As a comparison,  Nagios::Hyperic -->
> > ed::vi  or amanda::Arkieka
>
> Which is all very nice but what would any of those features give us that we
> have any use for?

Hey, it's your bag ... if you don't want it, I won't ask about setting it up.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Infrastructure monitoring

From
Dave Page
Date:


On 14/1/06 21:49, "Josh Berkus" <josh@agliodbs.com> wrote:

> Dave,
>
>>> For those of you who haven't seen Hyperic, think of Nagios with a fancy
>>> web UI including notification management, scheduled tasks, historical
>>> reporting, and specific monitioring tools for PostgreSQL databases and
>>> other common applications.    As a comparison,  Nagios::Hyperic -->
>>> ed::vi  or amanda::Arkieka
>>
>> Which is all very nice but what would any of those features give us that we
>> have any use for?
>
> Hey, it's your bag ... if you don't want it, I won't ask about setting it up.

OK, well unless anyone else disagrees - thanks, but no thanks. I honestly
can't see what we would gain.

Regards, Dave.


Re: Infrastructure monitoring

From
"Marc G. Fournier"
Date:
On Sat, 14 Jan 2006, Magnus Hagander wrote:

> for the general website). Then you can just define a "can get <nn>
> minutes out of sync before we scream"..

'k, done ... http://archives.postgresql.org/timestamp ... should be there
in about 40 minutes or so, and is updated hourly ... let's say if it's more
than 6 hours out ... ?
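
A sketch of the checking side with the 6-hour threshold suggested above. It assumes the timestamp file holds a Unix epoch integer (the actual format is not stated in this thread) and uses the conventional Nagios plugin exit codes; the check_timestamp name is hypothetical.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

MAX_AGE_SECONDS = 6 * 60 * 60  # "more than 6 hours out"


def check_timestamp(body, now, max_age=MAX_AGE_SECONDS):
    """Return (exit_code, message) for the fetched timestamp file contents.

    body is the text of the timestamp file; now is the current Unix time.
    """
    try:
        stamp = int(body.strip())
    except ValueError:
        return UNKNOWN, "timestamp file does not contain an integer"
    age = now - stamp
    if age > max_age:
        return CRITICAL, f"archives are {age // 60} minutes out of sync"
    return OK, f"archives updated {age // 60} minutes ago"
```

A wrapper would fetch http://archives.postgresql.org/timestamp, pass the body and the current time to check_timestamp, print the message, and exit with the returned code.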

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: Infrastructure monitoring

From
"John Hansen"
Date:

> -----Original Message-----
> From: Dave Page [mailto:dpage@vale-housing.co.uk] 
> Sent: Sunday, January 15, 2006 8:41 AM
> To: John Hansen; Joshua D. Drake
> Cc: Guido Barosio; Marc G. Fournier; Josh Berkus; 
> pgsql-www@postgresql.org; Jim C. Nasby
> Subject: Re: [pgsql-www] Infrastructure monitoring
> 
> 
> 
> 
> On 14/1/06 21:37, "John Hansen" <john@geeknet.com.au> wrote:
> 
> > Joshua D. Drake [mailto:jd@commandprompt.com] Wrote
> > 
> >> Sure we could help and we are happy to. John what types of 
> machines 
> >> are we talking about here?
> > 
> > You will need an x86-32 (Preferably SMP) with at least 2G of ram 
> > running linux.
> > (Gentoo is known to work).
> > ASPseek currently uses 11G of HD space for the search.
> 
> What sort of storage are you currently running? How many spindles etc?
> 
> Regards, Dave.
> 

4x80G Seagate IDE (PATA ST380011A) in software RAID 0 on 2 x Promise Ultra100 controllers in 66 MHz slots.

The system has another 4x80G Seagate IDE (PATA ST380021A), but they are slower models, in software RAID 5, used for backup.

pgbench reports ~800 tps with: pgbench -i -s50 test; reboot; pgbench -s50 -c25 -t1000 test;
(Reboot done to completely flush any cached data.)

Sustained throughput on the disk subsystem is roughly 130 MB/s in both directions.

LVM on top of software RAID, XFS filesystems.

... John

Re: Infrastructure monitoring

From
"John Hansen"
Date:
> - the only other oddity that I can recall is that ASPSeek must be compiled with
> gcc 2.95 due to some changes in the hashing functions in the STL in
> later versions which cause index bloat.


I've managed to get around that, by removing some compiler optimizations....

... John

Re: Infrastructure monitoring

From
"Dave Page"
Date:


-----Original Message-----
From: John Hansen [mailto:john@geeknet.com.au]
Sent: Sun 1/15/2006 6:03 AM
To: Joshua D. Drake; Dave Page
Cc: Guido Barosio; Marc G. Fournier; Josh Berkus; pgsql-www@postgresql.org; Jim C. Nasby
Subject: RE: [pgsql-www] Infrastructure monitoring

> > - the only other oddity that I can recall is that ASPSeek must be compiled with
> > gcc 2.95 due to some changes in the hashing functions in the STL in
> > later versions which cause index bloat.
>
>
> I've managed to get around that, by removing some compiler optimizations....

Oh, nice work. The only other issue I had when I tried to build it a few days ago stemmed from the box having a too-new
version of autoconf, but that should be easy to resolve.

/D

Re: Infrastructure monitoring

From
"John Hansen"
Date:
> Oh, nice work. The only other issue I had when I tried to
> build it a few days ago stemmed from the box having a too-new
> version of autoconf, but that should be easy to resolve.

Don't you mean automake, autoheader, etc.?
It seems to like -1.4 of those.

Have you attempted running it on a 64-bit box?

... John

Re: Infrastructure monitoring

From
"Joshua D. Drake"
Date:
Hello,

OK, hardware requirements are no sweat. We can put together a 6-drive
SCSI array for the database, and I will put the OS on a couple of IDE
drives with RAID 1.

Why do we "need" a separate server for the CGIs? Are they that hard on
the webserver?

Joshua D. Drake

--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/


Re: Infrastructure monitoring

From
Tino Wildenhain
Date:
Joshua D. Drake schrieb:
> Hello,
> ...
> Why do we "need" a separate server for the cgi's? Are they that hard on
> the webserver?

Maybe not using CGI would help? You can run the webapp and Postgres on the
same server, but putting them on separate servers really helps keep
performance high. But it really depends on the application.

Re: Infrastructure monitoring

From
"John Hansen"
Date:
Joshua D. Drake [mailto:jd@commandprompt.com] 
> O.k. hardware requirements are no sweat. We can put together 
> a 6 drive scsi array for the database and I will put the os 
> on a couple of ide with raid 1.

That's cool, this damp piece of string could use the freed up bandwidth :)
 
> Why do we "need" a separate server for the cgi's? Are they 
> that hard on the webserver?

Not separate, just separate from this server, and it's not a requirement, but my recommendation.
You can add search.postgresql.org as a vhost to any other machine(s).
Most of the time spent on searches is spent executing the CGI (provided, of course, that the backend is fast).

Currently, I have the load spread evenly among 3 webservers (I use a reverse proxy load balancer for this), which at
peak puts the load at somewhere between 3 and 4.

Also, as I was about to mention on IRC, you might want to consider porting the Apache module to Apache 2 or using it on
Apache 1, which is much faster than the CGI.
 

With 4 gig of ram however, you would probably be fine running it all on the one host.

... John


Re: Infrastructure monitoring

From
"Dave Page"
Date:

> -----Original Message-----
> From: John Hansen [mailto:john@geeknet.com.au]
> Sent: 15 January 2006 10:44
> To: Dave Page; Joshua D. Drake
> Cc: Guido Barosio; Marc G. Fournier; Josh Berkus;
> pgsql-www@postgresql.org; Jim C. Nasby
> Subject: RE: [pgsql-www] Infrastructure monitoring
>
> > Oh, nice work. The only other issue I had when I tried to
> > build it a few days ago stemmed from the box having a too-new
> > version of autoconf, but that should be easy to resolve.
>
> Don't you mean automake,autoheader, etc.
> It seems to like -1.4 of those.

Ah.

> Have you attempted running it on a 64bit box?

Nope. You?

Regards, Dave


Re: Infrastructure monitoring

From
"Dave Page"
Date:

> -----Original Message-----
> From: Joshua D. Drake [mailto:jd@commandprompt.com]
> Sent: 15 January 2006 20:47
> To: John Hansen
> Cc: Dave Page; Guido Barosio; Marc G. Fournier; Josh Berkus;
> pgsql-www@postgresql.org; Jim C. Nasby
> Subject: Re: [pgsql-www] Infrastructure monitoring
>
> Hello,
>
> O.k. hardware requirements are no sweat. We can put together a 6 drive
> scsi array for the database and I will put the os on a couple of ide
> with raid 1.

OK, that's great, thanks Joshua. Please set up a couple of user accounts
for John and me when it's ready, and grant us 'sudo su -' permission.

Regards, Dave

Re: Infrastructure monitoring

From
Bruce Momjian
Date:
Josh Berkus wrote:
> People:
>
> > I assume you talk about the nagios monitoring? Or are there perhaps even
> > now multiple sets of monitoring? (Dave has a nagios installation up at
> > least).
>
> For those of you who haven't seen Hyperic, think of Nagios with a fancy web UI

Hyper-nachos?  :-)

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Infrastructure monitoring

From
"Jim C. Nasby"
Date:
On Fri, Jan 13, 2006 at 10:16:59PM -0400, Marc G. Fournier wrote:
> email to scrappy@postgresql.org, so that any errors show up in my mailbox
> instead of roots, so I get an hourly reminder that things are running well

I would definitely recommend that root@ be forwarded to at least one
(preferably more) people that will read it; especially on FBSD there are a
lot of problems that can be easily identified just by reading those
emails. portupgrade -N portaudit is a good idea, too.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Infrastructure monitoring

From
"Jim C. Nasby"
Date:
On Sat, Jan 14, 2006 at 12:16:25PM +0100, Magnus Hagander wrote:
> BTW, we already do content monitoring on the actual website mirrors. If
> a mirror does not answer, *or* does not update properly, it will
> automatically be removed from the DNS record, and thus get out of
> "public view" after 10-30 minutes.

And this is how all the services should work, at least from a monitoring
standpoint. If any public service (any of the websites, search,
archives, email, ftp, etc) goes down, multiple people should get pages.
Along those lines, disk space should also be monitored to make sure
nothing fills up.
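
As an illustration of the disk space point, a check can be built on Python's shutil.disk_usage. The disk_alerts name, the 90% threshold, and the list of paths are assumptions; which filesystems matter on these machines is not stated here.

```python
import shutil


def disk_alerts(paths, threshold=90.0, usage=shutil.disk_usage):
    """Return (path, percent_used) for each path at or above threshold.

    usage defaults to shutil.disk_usage, which returns a tuple with
    total, used and free bytes; it is a parameter so it can be stubbed.
    """
    alerts = []
    for path in paths:
        stats = usage(path)
        percent = 100.0 * stats.used / stats.total
        if percent >= threshold:
            alerts.append((path, percent))
    return alerts
```

Run from cron with something like disk_alerts(["/", "/var"]), any non-empty result could be mailed or paged out. The percentage is computed from used and total bytes, so it can differ slightly from what df reports on filesystems with reserved blocks.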

> What I think would be good in cases like this is just information -
> AFAIK nobody on the web team knew the servers were being moved. (I may
> be wrong here - I know I didn't know and I also spoke to Dave about it,
> but those are the only ones I polled. Anyway, -www should know)

And info is one of the other keys to keeping things running smoothly...
ISTM any changes in service/outages should certainly be posted someplace
where those monitoring things will see them and know what's going on.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461