Thread: Get the statistics based on the application name and IP address

Get the statistics based on the application name and IP address

From
Ibrar Ahmed
Date:
While working on pg_stat_stements, I got some questions from customers to
have statistics by application and IP address. I know that we are collecting the 
statistics by query id, user id, database id and top-level query. There is no way to
collect the statistics based on IP address and application name. That's possible that
multiple applications issue the same queries with the same user on the same database. We
cannot segregate those queries from which application this query comes. I know we can
this in the log file with log_line_prefix, but I want to see that aggregates like call count based on IP and application
name. I did some POC and had a patch. But before sharing the patch. 

I need to know if there has been any previous discussion about this topic; by the way, 
I did some Googling to find that but failed. 

Thoughts?


--

Ibrar Ahmed. 
Senior Software Engineer, PostgreSQL Consultant.

Re: Get the statistics based on the application name and IP address

From
Julien Rouhaud
Date:
Hi,

On Mon, Aug 08, 2022 at 08:21:06PM +0500, Ibrar Ahmed wrote:
> While working on pg_stat_stements, I got some questions from customers to
> have statistics by application and IP address.
> [...]
> name. I did some POC and had a patch. But before sharing the patch.
>
> I need to know if there has been any previous discussion about this topic;
> by the way,

I don't think there was any discussion on this exactly, but there have been
some related discussions.

This would likely bring 2 problems.  First, for now each entry contains its own
query text in the query file.  There can already be some duplication, which
isn't great, but adding the application_name and/or IP address will make things
way worse, so you would probably need to fix that first.  There has been some
discussion about it recently (1) but more work and benchmarking are needed.

The other problem is the multiplication of entries.  It's a well known
limitation that pg_stat_statements eviction are so costly that it makes it
unusable.  The last numbers I saw about it was ~55% overhead (2).  Adding
application_name or ip address to the key would probably make
pg_stat_statements unusable for anyone who would actually need those metrics.

[1]: https://www.postgresql.org/message-id/flat/604E3199-2DD2-47DD-AC47-774A6F97DCA9%40amazon.com
[2]: https://twitter.com/AndresFreundTec/status/1105585237772263424



Re: Get the statistics based on the application name and IP address

From
Ibrar Ahmed
Date:


On Mon, Aug 8, 2022 at 10:11 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,

On Mon, Aug 08, 2022 at 08:21:06PM +0500, Ibrar Ahmed wrote:
> While working on pg_stat_stements, I got some questions from customers to
> have statistics by application and IP address.
> [...]
> name. I did some POC and had a patch. But before sharing the patch.
>
> I need to know if there has been any previous discussion about this topic;
> by the way,

Thanks for the input. 
I don't think there was any discussion on this exactly, but there have been
some related discussions.

This would likely bring 2 problems. 
 
First, for now each entry contains its own
query text in the query file.  There can already be some duplication, which
isn't great, but adding the application_name and/or IP address will make things
way worse, so you would probably need to fix that first. 
I doubt that makes it worst because these (IP and Application) will be part of
the key, not the query text. But yes, I agree that it will increase the footprint of rows, 
excluding query text.

I am not 100% sure about the query text duplication but will look at that in detail,
if you have more insight, then it will help to solve that.

 
There has been some
discussion about it recently (1) but more work and benchmarking are needed.

The other problem is the multiplication of entries.  It's a well known
limitation that pg_stat_statements eviction are so costly that it makes it
unusable.  The last numbers I saw about it was ~55% overhead (2).  Adding
application_name or ip address to the key would probably make
pg_stat_statements unusable for anyone who would actually need those metrics.

I am sure adding a new item in the key does not affect the performance of evictions of the row, 
as it will not be part of that area.  I am doing some benchmarking and hacking to reduce that and will 
send results with the patch.


[1]: https://www.postgresql.org/message-id/flat/604E3199-2DD2-47DD-AC47-774A6F97DCA9%40amazon.com
[2]: https://twitter.com/AndresFreundTec/status/1105585237772263424


--

Ibrar Ahmed. 
Senior Software Engineer, PostgreSQL Consultant.

Re: Get the statistics based on the application name and IP address

From
Julien Rouhaud
Date:
Hi,

On Wed, Aug 10, 2022 at 10:42:31PM +0500, Ibrar Ahmed wrote:
> On Mon, Aug 8, 2022 at 10:11 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> > First, for now each entry contains its own
> > query text in the query file.  There can already be some duplication, which
> > isn't great, but adding the application_name and/or IP address will make
> > things
> > way worse, so you would probably need to fix that first.
>
> I doubt that makes it worst because these (IP and Application) will be part
> of
> the key, not the query text.

It's because you want to add new elements to the key that it would make it
worse, as the exact same query text will be stored much more often.

> But yes, I agree that it will increase the
> footprint of rows,
> excluding query text.

I don't think that this part should be a concern.

> I am not 100% sure about the query text duplication but will look at that
> in detail,
> if you have more insight, then it will help to solve that.

You can refer to the mentioned thread for the (only?) discussion about that.


> > There has been some
> > discussion about it recently (1) but more work and benchmarking are needed.
> >
> > The other problem is the multiplication of entries.  It's a well known
> > limitation that pg_stat_statements eviction are so costly that it makes it
> > unusable.  The last numbers I saw about it was ~55% overhead (2).  Adding
> > application_name or ip address to the key would probably make
> > pg_stat_statements unusable for anyone who would actually need those
> > metrics.
> >
>
> I am sure adding a new item in the key does not affect the performance of
> evictions of the row,
> as it will not be part of that area.  I am doing some benchmarking and
> hacking to reduce that and will
> send results with the patch.

Sorry if that was unclear.  I didn't meant that adding new items to the key
would make evictions way costlier (although it would have some impact), but
much more frequent.  Adding application_name and the IP to the key can very
easily amplify the number of entries so much that you will either need an
unreasonable value for pg_stat_statements.max (which would likely bring its own
set of problems) if possible at all, or evict entries frequently.