Thread: How to track number of connections and hosts to Postgres cluster
Hello Everyone,
I am working on an alert script to track the number of connections to the Postgres cluster, along with the host IPs they come from.
1. I need all the host IPs making a connection to the Postgres cluster (even for a fraction of a second).
2. I would also like to track the number of IDLE connections, IDLE IN TRANSACTION connections, and the length of connections.
I plan to use pg_stat_activity, and I also considered logging the host IPs in the database server log files, but that seems expensive to me (in terms of IO and log file size).
Please let me know if there are any alternatives.
Thanks
Venkat
On Wed, 2011-08-24 at 13:05 +0530, Venkat Balaji wrote:
> Hello Everyone,
>
> I am working on an alert script to track the number of connections with the
> host IPs to the Postgres cluster.
>
> 1. I need all the host IPs making a connection to Postgres Cluster (even for
> a fraction of a second).

You should set log_connections to on.

> 2. I would also want to track number of IDLE connections, IDLE IN
> TRANSACTION connections and length of the connections as well.

IDLE and IDLE IN TRANSACTION are the kind of information you get from pg_stat_activity. The length of connections you can get with log_disconnections.

> I would be making use of pg_stat_activity and also thought of enabling
> logging the host ips in the db server log files which seems to be expensive
> for me (in terms of IO and logfile size).

Using pg_stat_activity won't catch really short connections. You need log_connections for that, and log_disconnections for the duration of connections. So you'll have to build a tool that gathers some information from queries on pg_stat_activity, and that can also read the PostgreSQL log files.

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
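For illustration, a pg_stat_activity query along these lines gives the per-host and per-state counts being discussed (a minimal sketch; the state and client_addr columns assume PostgreSQL 9.2 or later, where pg_stat_activity has a state column; on older releases the idle states show up in current_query as '<IDLE>' and '<IDLE> in transaction' instead):

-- count current sessions per client host and state
SELECT client_addr,
       state,
       count(*) AS sessions
  FROM pg_stat_activity
 GROUP BY client_addr, state
 ORDER BY client_addr, state;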
Thanks Guillaume !!
But if I set log_connections and log_disconnections to on, wouldn't Postgres be logging a lot of data?
Won't this be IO intensive? I understand that this is the best way, but I would like to know if there is another way to reduce IO (maybe through queries to catalog tables).
Thanks
Venkat
pg_stat_activity keeps track of all this information.
select * from pg_stat_activity where datname='databasename';
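For the length of currently open connections, something along these lines should also work (a sketch only; it covers just the sessions that are still connected, since anything that has already disconnected is gone from the view):

-- age of each open session, per client host
SELECT client_addr,
       usename,
       backend_start,
       now() - backend_start AS connection_age
  FROM pg_stat_activity
 ORDER BY backend_start;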
But the information vanishes as soon as the application logs off.
I am looking for an alternative way to track the total number of connections, with their host IPs, through a cron job.
What should the frequency of the cron job be?
I know the best option is the log_connections and log_disconnections parameters, but the amount of information logged would be very high, and it is also IO intensive.
Thanks
Venkat
On Wed, 2011-08-24 at 16:51 +0530, Venkat Balaji wrote:
> But, the information vanishes if the application logs off.

That's why you need a tool to track this.

> I am looking for an alternative to track the total amount of the connections
> with the host IPs through a Cron job.

If you only want the number of connections, you can use check_postgres.

> What could be the frequency of cron ?

I don't think you can go below one second.

> I know the best is using log_connections and log_disconnections parameters,
> but, information logged would be too high and is also IO intensive.

Sure. But if you want connection duration, that's the only way.

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
On 08/24/2011 07:07 AM, Venkat Balaji wrote:
> But, if put log_connections to on and log_disconnections to on
> wouldn't the Postgres be logging in lot of data ?
> Will this not be IO intensive ? I understand that this is the best
> way, but, would want to know if there is an other way to reduce IO (
> may be through queries to catalog tables ).

Your requirements include: "I need all the host IPs making a connection to Postgres Cluster (even for a fraction of second)." The only way to do this is to log every connection. Any other approach for grabbing the data, such as looking at pg_stat_activity, will sometimes miss one.

If you're willing to lose a connection sometimes, a cron job that polls pg_stat_activity and saves a summary of what it finds will normally use less resources. But connections that start and end between runs will be missed.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
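As an illustration of that polling approach, here is a sketch of what a cron job could run each minute (the connection_snapshots table name is purely an example, and the state column again assumes PostgreSQL 9.2 or later):

-- one-time setup: a small history table for the snapshots
CREATE TABLE connection_snapshots (
    snapshot_time timestamptz NOT NULL DEFAULT now(),
    client_addr   inet,
    state         text,
    sessions      integer
);

-- run from cron via psql, e.g. once a minute
INSERT INTO connection_snapshots (client_addr, state, sessions)
SELECT client_addr, state, count(*)
  FROM pg_stat_activity
 GROUP BY client_addr, state;

This still misses anything that connects and disconnects between runs, which is exactly the limitation described above.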
On Wed, Aug 24, 2011 at 9:33 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> The only way to do this is to log every connection. Any other approach for
> grabbing the data, such as looking at pg_stat_activity, will sometimes miss one.
>
> If you're willing to lose a connection sometimes, a cron job that polls
> pg_stat_activity and saves a summary of what it finds will normally use less
> resources. But connections that start and end between runs will be missed.
I suppose you could use tcpdump on a separate system with a mirrored switch port and have it log TCP SYN and FIN packets on port 5432 to your database server only. Keeps all I/O off your database server.
tcpdump -w port5423.log -n "tcp and port 5432 and tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and host IP"
HTH.
Greg
> I suppose you could use tcpdump on a separate system with a mirrored switch
> port and have it log TCP SYN and FIN packets on port 5432 to your database
> server only. Keeps all I/O off your database server.
>
> tcpdump -w port5423.log -n "tcp and port 5432 and tcp[tcpflags] &
> (tcp-syn|tcp-fin) != 0 and host IP"

That's an excellent idea, but note that this will also log unsuccessful connection attempts (that is, successful TCP connections that fail PostgreSQL authentication) without much of a way to distinguish the two, especially if the connections are encrypted.

---
Maciek Sakrejda | System Architect | Truviso
1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com
On Wed, Aug 24, 2011 at 5:21 AM, Venkat Balaji <venkat.balaji@verse.in> wrote:
> But, the information vanishes if the application logs off.
> I am looking for an alternative to track the total amount of the connections
> with the host IPs through a Cron job.
> What could be the frequency of cron ?
> I know the best is using log_connections and log_disconnections parameters,
> but, information logged would be too high and is also IO intensive.

Really? Have you tested how much IO it will generate? My guess is not that much. And on a database server it should be a minuscule amount compared to what your DB is doing the rest of the time. Eliminating this choice is premature optimization IMHO.
Lately I did something similar on one of our servers, to keep track of active, idle, and idle-in-transaction connections so as to do some optimization of the connection pooling, and I didn't notice any serious IO activity there (the cron job ran every minute). So IMHO, unless the server is seriously IO bound at the moment, you won't notice any difference.
Thanks to all for your very helpful replies !
As Greg Smith rightly said, I ran into the problem of missing connections between runs. I even tried running the cron job more often than once a second, but that becomes too many runs per second, and later I would have to take on the burden of calculating everything from the logs.
I did not actually measure the IO load while logging was on. I will switch on "log_connections" and "log_disconnections" to log the number of connections and the duration of each connection.
If I notice high IO and huge log generation, then I think Greg Spileburg's suggestion of using tcpdump on a different server is a good one. I will try this utility and see how it works (I have never used it before). Greg Spileburg, please point me to any documentation you have on using "tcpdump".
Thanks again and sorry for replying late on this !
Regards,
Venkat
On Mon, Aug 29, 2011 at 11:55 PM, Venkat Balaji <venkat.balaji@verse.in> wrote:
> If i notice high IO's and huge log generation, then i think Greg Spileburg
> has suggested a good idea of using tcpdump on a different server. I would
> use this utility and see how it works (never used it before). Greg
> Spileburg, please help me with any sources of documents you have to use
> "tcpdump".

There's also a lot to be said for dumping to a dedicated local drive with fsync turned off. They're logs, so you can chance losing them by putting them on a cheap fast 7200 RPM SATA drive. If your logs take up more than a few megs a second then they are coming out really fast. Do you know what your log generation rate in bytes/second is?
Hi Scott,
Log generation rate -
A 500MB log file is generated in anywhere from 3 minutes to 20 minutes, depending on database activity.
I did not understand the "fsync" point you mentioned. Please help me understand how fsync relates to log generation or to logging host IPs in the log file.
Thanks
Venkat
On Thu, Sep 1, 2011 at 11:46 PM, Venkat Balaji <venkat.balaji@verse.in> wrote:
> Hi Scott,
> Log generation rate -
> 500MB size of log file is generated within minimum 3 mins to maximum of 20
> mins depending on the database behavior.
> I did not understand the "fsync" stuff you mentioned. Please help me know
> how would fsync is related to log generation or logging host IPs in the log

So you're generating logs at a rate of about 166MB a minute, or 2.7MB/s. Seagates from the early 90s are faster than that. Are you logging more than just connections and disconnections? If you log just those, what's the rate?

fsync is when the OS says to write to disk and the disk confirms the write is complete. It probably doesn't matter here whether the file system is using a journaling method that's real safe or not, and you can go to something like ext2 where there's no journaling and probably do fine on a dedicated SATA drive, or a pair if you want them redundant.

The real issue then is what to do with old log files. Right now you're creating them at 10G an hour, or 240G a day. So you'll need some cron job to go in and delete the old ones. Still, with a 1TB drive it'll take about 4 days to fill up, so it's not like you're gonna run out of space in a few minutes or anything.

Since log files are pretty much written sequentially, they don't need the fastest drives ever made. Most modern 7200RPM 3.5" SATA drives can write at least 50 or 60 MB/s on their slowest portions. Just rotate them hourly or daily or whatever, and process and delete them.
Hi Scott,
Yes, we are logging connections and disconnections with duration as well.
We have a process that rolls the log over at every 500MB, and old log files are deleted after a certain retention period.
Thanks a lot for your help !
Regards,
Venkat
On Tue, Aug 30, 2011 at 12:55 AM, Venkat Balaji <venkat.balaji@verse.in> wrote:
> Thanks to all for your very helpful replies !
> As Greg Smith rightly said, i faced a problem of missing connections between
> the runs. I even ran the cron every less than a second, but, still that
> would become too many runs per second and later i need to take the burden of
> calculating every thing from the log.
> I did not really calculate the IO load while the logging is on. I would
> switch on "log_connections" and "log_disconnections" to log the number of
> connections and duration of a connection.

Yet another reason why we need connection and disconnection triggers (especially the former, since disconnection triggers can be kludged with an on_proc_exit/dblink hook).

merlin