How to track number of connections and hosts to Postgres cluster

From
Venkat Balaji
Date:
Hello Everyone,

I am working on an alert script to track the number of connections, along with the host IPs, to the Postgres cluster.

1. I need all the host IPs making a connection to the Postgres cluster (even for a fraction of a second).
2. I would also like to track the number of IDLE connections, IDLE IN TRANSACTION connections, and the length of the connections as well.

I plan to make use of pg_stat_activity, and I have also thought of enabling logging of the host IPs in the db server log files, which seems expensive to me (in terms of IO and log file size).

Please let me know if there are any alternatives.

Thanks
Venkat

Re: How to track number of connections and hosts to Postgres cluster

From
Guillaume Lelarge
Date:
On Wed, 2011-08-24 at 13:05 +0530, Venkat Balaji wrote:
> Hello Everyone,
>
> I am working on an alert script to track the number of connections with the
> host IPs to the Postgres cluster.
>
> 1. I need all the host IPs making a connection to Postgres Cluster (even for
> a fraction of second).

You should set log_connections to on.

> 2. I would also want to track number of IDLE connections, IDLE IN
> TRANSACTION connections and length of the connections as well.
>

IDLE and IDLE in transaction are the kind of information you get from
pg_stat_activity.

Length of connections, you can get it with log_disconnections.
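As a sketch of reading those durations back out of the log, here is a hedged Python example. It assumes the stock log_disconnections line layout with no extra log_line_prefix fields; adjust the pattern to your own log format:

```python
import re

# Example log_disconnections line (the layout shifts with log_line_prefix
# and PostgreSQL version -- treat this pattern as an assumption to verify
# against your own logs):
sample = ("LOG:  disconnection: session time: 0:03:25.392 "
          "user=app database=prod host=10.1.2.3 port=51234")

pattern = re.compile(
    r"disconnection: session time: (?P<h>\d+):(?P<m>\d{2}):(?P<s>[\d.]+) "
    r"user=(?P<user>\S+) database=(?P<db>\S+) host=(?P<host>\S+)"
)

def parse_disconnection(line):
    """Return (host, duration_seconds), or None if the line doesn't match."""
    m = pattern.search(line)
    if m is None:
        return None
    seconds = int(m.group("h")) * 3600 + int(m.group("m")) * 60 + float(m.group("s"))
    return m.group("host"), seconds

print(parse_disconnection(sample))
```

A log-reading tool would apply this to each line and aggregate durations per host.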

> I would be making use of pg_stat_activity and also thought of enabling
> logging the host ips in the db server log files which seems to be expensive
> for me (in terms of IO and logfile size).
>

Using pg_stat_activity won't catch really short-lived connections. You
need log_connections for that, and log_disconnections for the duration
of connections. So you'll have to build a tool that gathers some
information from queries on pg_stat_activity, and that can also read
the PostgreSQL log files.
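The polling half of such a tool could look like this minimal Python sketch. The sample rows are made up, and the `state` column it assumes only exists as such on PostgreSQL 9.2+; on older releases the same information is encoded in `current_query` as `<IDLE>` markers:

```python
from collections import Counter

def summarize_states(states):
    """Count connections per state from one pg_stat_activity poll."""
    return Counter(states)

# One polled snapshot (made-up sample data). On a live system the list
# would come from something like: SELECT state FROM pg_stat_activity;
snapshot = ["active", "idle", "idle", "idle in transaction", "active"]
print(summarize_states(snapshot))
```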


--
Guillaume
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com


Re: How to track number of connections and hosts to Postgres cluster

From
Venkat Balaji
Date:
Thanks Guillaume !!

But if I set log_connections and log_disconnections to on, wouldn't Postgres be logging a lot of data?

Will this not be IO intensive? I understand that this is the best way, but I would like to know if there is another way to reduce the IO (maybe through queries to the catalog tables).

Thanks
Venkat

On Wed, Aug 24, 2011 at 1:19 PM, Guillaume Lelarge <guillaume@lelarge.info> wrote:
[quoted text snipped]


Re: How to track number of connections and hosts to Postgres cluster

From
Adarsh Sharma
Date:
pg_stat_activity keeps track of all this information.

select * from pg_stat_activity where datname='databasename';



Venkat Balaji wrote:
[quoted text snipped]



Re: How to track number of connections and hosts to Postgres cluster

From
Venkat Balaji
Date:
But the information vanishes when the application logs off.

I am looking for an alternative: tracking the total number of connections, with the host IPs, through a cron job.

What should the frequency of the cron be?

I know the best approach is using the log_connections and log_disconnections parameters, but the volume of information logged would be too high, and it is also IO intensive.

Thanks
Venkat

On Wed, Aug 24, 2011 at 4:39 PM, Adarsh Sharma <adarsh.sharma@orkash.com> wrote:
pg_stat_activity keeps track of all this information.

select * from pg_stat_activity where datname='databasename';




Venkat Balaji wrote:
[quoted text snipped]




Re: How to track number of connections and hosts to Postgres cluster

From
Guillaume Lelarge
Date:
On Wed, 2011-08-24 at 16:51 +0530, Venkat Balaji wrote:
> But, the information vanishes if the application logs off.
>

That's why you need a tool to track this.

> I am looking for an alternative to track the total amount of the connections
> with the host IPs through a Cron job.
>

If you only want the number of connections, you can use check_postgres.

> What could be the frequency of cron ?
>

I don't think you can go below one minute with cron.

> I know the best is using log_connections and log_disconnections parameters,
> but, information logged would be too high and is also IO intensive.
>

Sure. But if you want connection duration, that's the only way.


--
Guillaume
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com


Re: How to track number of connections and hosts to Postgres cluster

From
Greg Smith
Date:
On 08/24/2011 07:07 AM, Venkat Balaji wrote:
> But, if put log_connections to on and log_disconnections to on
> wouldn't the Postgres be logging in lot of data ?
> Will this not be IO intensive ? I understand that this is the best
> way, but, would want to know if there is an other way to reduce IO (
> may be through queries to catalog tables ).
>

Your requirements include: "I need all the host IPs making a
connection to the Postgres cluster (even for a fraction of a second)."

The only way to do this is to log every connection.  Any other approach
for grabbing the data, such as looking at pg_stat_activity, will
sometimes miss one.

If you're willing to lose a connection sometimes, a cron job that polls
pg_stat_activity and saves a summary of what it finds will normally use
less resources.  But connections that start and end between runs will be
missed.
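A hedged sketch of such a polling job: the function below summarizes one snapshot of (client_addr, state) pairs, standing in for the result of a query like `SELECT client_addr, state FROM pg_stat_activity` (the column names are an assumption to adapt to your PostgreSQL version). A cron job would run it each interval and persist the summary:

```python
from collections import Counter
from datetime import datetime, timezone

def snapshot_summary(rows):
    """Summarize one poll as counts of (client_addr, state) pairs,
    tagged with the time the snapshot was taken."""
    ts = datetime.now(timezone.utc).isoformat()
    return ts, Counter(rows)

# Made-up sample rows; a real job would append each summary to a
# table or file for later analysis.
rows = [("10.0.0.5", "idle"), ("10.0.0.5", "active"), ("10.0.0.9", "idle")]
ts, counts = snapshot_summary(rows)
print(ts, dict(counts))
```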

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


Re: How to track number of connections and hosts to Postgres cluster

From
Greg Spiegelberg
Date:
On Wed, Aug 24, 2011 at 9:33 AM, Greg Smith <greg@2ndquadrant.com> wrote:
On 08/24/2011 07:07 AM, Venkat Balaji wrote:
[quoted text snipped]


I suppose you could use tcpdump on a separate system with a mirrored switch port and have it log TCP SYN and FIN packets on port 5432 to your database server only.  Keeps all I/O off your database server.

    tcpdump -w port5432.log -n "tcp and port 5432 and tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and host IP"
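If tcpdump is instead run in text mode (without `-w`), the capture can be reduced to per-client connection counts with a short script. The line layout matched below is an assumption based on typical `tcpdump -n` output; verify it against your own capture:

```python
import re
from collections import Counter

# Assumed tcpdump -n text line layout, e.g.:
#   12:34:56.789 IP 10.0.0.7.52344 > 10.0.0.1.5432: Flags [S], seq ...
line_re = re.compile(r"IP (\d+\.\d+\.\d+\.\d+)\.\d+ > \S+: Flags \[(\S+)\]")

def count_handshakes(lines):
    """Count initial SYNs (connection attempts) per client IP."""
    syns = Counter()
    for line in lines:
        m = line_re.search(line)
        if m and "S" in m.group(2) and "." not in m.group(2):
            # A bare [S] (no ACK dot) is the client's initial SYN;
            # [S.] is the server's SYN-ACK and is skipped.
            syns[m.group(1)] += 1
    return syns

sample = [
    "12:00:01.0 IP 10.0.0.7.52344 > 10.0.0.1.5432: Flags [S], seq 1",
    "12:00:01.1 IP 10.0.0.1.5432 > 10.0.0.7.52344: Flags [S.], ack 1",
    "12:00:05.0 IP 10.0.0.7.52344 > 10.0.0.1.5432: Flags [F.], seq 2",
]
print(count_handshakes(sample))
```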

HTH.

Greg

Re: How to track number of connections and hosts to Postgres cluster

From
Maciek Sakrejda
Date:
> I suppose you could use tcpdump on a separate system with a mirrored switch
> port and have it log TCP SYN and FIN packets on port 5432 to your database
> server only.  Keeps all I/O off your database server.
>     tcpdump -w port5432.log -n "tcp and port 5432 and tcp[tcpflags] &
> (tcp-syn|tcp-fin) != 0 and host IP"

That's an excellent idea, but note that this will also log
unsuccessful connection attempts (that is, successful TCP connections
that fail PostgreSQL authentication) without much of a way to
distinguish the two, especially if the connections are encrypted.

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: How to track number of connections and hosts to Postgres cluster

From
Scott Marlowe
Date:
On Wed, Aug 24, 2011 at 5:21 AM, Venkat Balaji <venkat.balaji@verse.in> wrote:
> But, the information vanishes if the application logs off.
> I am looking for an alternative to track the total amount of the connections
> with the host IPs through a Cron job.
> What could be the frequency of cron ?
> I know the best is using log_connections and log_disconnections parameters,
> but, information logged would be too high and is also IO intensive.

Really?  Have you tested how much IO it will generate?  My guess is
not that much.  And on a database server it should be a minuscule
amount compared to what your DB is doing the rest of the time.
Eliminating this choice is premature optimization, IMHO.

Re: How to track number of connections and hosts to Postgres cluster

From
MirrorX
Date:
Lately I did something similar on one of our servers, to keep track of active,
idle, and idle-in-transaction connections so as to do some optimization of the
connection pooling, and I didn't notice any serious IO activity there (I had
the cron job run every minute). So IMHO, unless the server is seriously IO
bound at the moment, you won't notice any difference.

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/How-to-track-number-of-connections-and-hosts-to-Postgres-cluster-tp4729546p4732518.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: Re: How to track number of connections and hosts to Postgres cluster

From
Venkat Balaji
Date:
Thanks to all for your very helpful replies !

As Greg Smith rightly said, I faced the problem of missing connections between runs. I even polled at sub-second intervals, but that became too many runs per second, and later I would have to take on the burden of calculating everything from the log.

I did not really measure the IO load while logging was on. I will switch on "log_connections" and "log_disconnections" to log the number of connections and the duration of each connection.

If I notice high IO and huge log generation, then I think Greg Spiegelberg has suggested a good idea: using tcpdump on a different server. I will try this utility and see how it works (I have never used it before). Greg Spiegelberg, please point me to any documentation you have on using "tcpdump".

Thanks again, and sorry for the late reply!

Regards,
Venkat

On Thu, Aug 25, 2011 at 6:02 AM, MirrorX <mirrorx@gmail.com> wrote:
[quoted text snipped]

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Re: How to track number of connections and hosts to Postgres cluster

From
Scott Marlowe
Date:
On Mon, Aug 29, 2011 at 11:55 PM, Venkat Balaji <venkat.balaji@verse.in> wrote:
> If i notice high IO's and huge log generation, then i think Greg Spiegelberg
> has suggested a good idea of using tcpdump on a different server. I would
> use this utility and see how it works (never used it before). Greg
> Spiegelberg, please  help me with any sources of documents you have to use
> "tcpdump".

There's also a lot to be said for dumping to a dedicated local drive
with fsync turned off.  They're logs so you can chance losing them by
putting them on a cheap fast 7200 rpm SATA drive.  If your logs take
up more than a few megs a second then they are coming out really fast.
 Do you know what your log generation rate in bytes/second is?

Re: Re: How to track number of connections and hosts to Postgres cluster

From
Venkat Balaji
Date:
Hi Scott,

Log generation rate -

A 500MB log file is generated in anywhere from 3 minutes (minimum) to 20 minutes (maximum), depending on the database behavior.

I did not understand the "fsync" stuff you mentioned. Please help me understand how fsync is related to log generation, or to logging host IPs in the log file.

Thanks
Venkat

On Tue, Aug 30, 2011 at 12:09 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
[quoted text snipped]

Re: Re: How to track number of connections and hosts to Postgres cluster

From
Scott Marlowe
Date:
On Thu, Sep 1, 2011 at 11:46 PM, Venkat Balaji <venkat.balaji@verse.in> wrote:
> Hi Scott,
> Log generation rate -
> 500MB size of log file is generated within minimum 3 mins to maximum of 20
> mins depending on the database behavior.
> I did not understand the "fsync" stuff you mentioned. Please help me know
> how would fsync is related to log generation or logging host IPs in the log

So you're generating logs at a rate of about 166MB a minute, or roughly
2.8MB/s.  Seagates from the early 90s are faster than that.  Are you
logging more than just connections and disconnections? If you log only
those, what's the rate?
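For the record, that arithmetic is easy to check (taking the worst case from the figures above, a 500MB log file every 3 minutes):

```python
# Worst case from the figures above: a 500MB log file every 3 minutes.
mb = 500
minutes = 3
per_minute = mb / minutes                 # ~166.7 MB/min
per_second = per_minute / 60              # ~2.8 MB/s
per_day_gb = per_minute * 60 * 24 / 1024  # ~234 GB/day, i.e. roughly 240G
print(per_minute, per_second, per_day_gb)
```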

fsync is when the OS tells the disk to write the data and the disk
confirms the write is complete.  It probably doesn't matter here whether
the file system is using a journaling method that's really safe or not,
and you can go with something like ext2, where there's no journaling, and
probably do fine on a dedicated SATA drive, or a pair if you want them
redundant.

The real issue then is what to do with old log files.  Right now
you're creating them at 10G an hour, or 240G a day.  So you'll need
some cron job to go in and delete the old ones.  Still with a 1TB
drive it'll take about 4 days to fill up, so it's not like you're
gonna run out of space in a few minutes or anything.

Since log files are pretty much written sequentially they don't need
the fastest drives ever made.  Most modern 7200RPM 3.5" SATA drives
can write at least at 50 or 60 MB/s on their slowest portions.  Just
rotate them hourly or daily or whatever and process them and delete
them.
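The cleanup cron mentioned above might be sketched like this; the directory layout and the `.log` suffix are assumptions to adapt to your log_directory and log_filename settings:

```python
import os
import time

def delete_old_logs(log_dir, max_age_days=4, suffix=".log"):
    """Delete rotated log files older than max_age_days and return
    the names of the files removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        if (name.endswith(suffix) and os.path.isfile(path)
                and os.path.getmtime(path) < cutoff):
            os.remove(path)
            removed.append(name)
    return sorted(removed)
```

Run from cron (e.g. hourly), this keeps the dedicated log drive from filling up.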

Re: Re: How to track number of connections and hosts to Postgres cluster

From
Venkat Balaji
Date:
Hi Scott,

Yes, we are logging connections and disconnections, with durations as well.

We have a process that rolls the log over at every 500MB, and old log files are deleted after a certain retention period.

Thanks a lot for your help !

Regards,
Venkat

On Fri, Sep 2, 2011 at 12:12 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
[quoted text snipped]

Re: Re: How to track number of connections and hosts to Postgres cluster

From
Merlin Moncure
Date:
On Tue, Aug 30, 2011 at 12:55 AM, Venkat Balaji <venkat.balaji@verse.in> wrote:
> Thanks to all for your very helpful replies !
> As Greg Smith rightly said, i faced a problem of missing connections between
> the runs. I even ran the cron every less than a second, but, still that
> would become too many runs per second and later i need to take the burden of
> calculating every thing from the log.
> I did not really calculate the IO load while the logging is on. I would
> switch on "log_connections" and "log_disconnections" to log the number of
> connections and duration of a connection.

Yet another reason why we need connection and disconnection triggers
(especially the former, since disconnection triggers can be kludged
together with an on_proc_exit/dblink hook).

merlin