Thread: Manage analytics through tag manager?
We currently use Google Analytics for analysing traffic on the website, and have done for many years. Whilst discussing some ideas to improve the user experience with Jonathan, it became clear to me that ideally we need outbound link tracking, i.e. what link did a user click that took them away from our site. This is useful to know so we can tell, for example, what download option a user ended up choosing, which can better inform us on how to improve the layout of the download pages.
Whilst it is possible to do outbound link tracking directly in Google Analytics, it can be invasive, requiring onclick attributes on every link. It is (in theory) possible to dynamically add those using a script in the base template or similar, but I've never actually been able to get that to work when I've tried.
Instead, I'd like to suggest we change to using Google Tag Manager directly in the site in place of Analytics. Tag Manager uses a couple of similar JS snippets to Analytics so would require minimal changes to the site. However, it can then be used (amongst many other things) to enable Analytics site-wide as it is now, and to automatically send outbound link clicks to Analytics globally or for subsets of pages and target URLs with no further code changes.
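To make "outbound link" concrete, here is a minimal Python sketch of the rule that a Tag Manager trigger or a hand-written script would have to apply: a link counts as outbound if it resolves to a different host than the page it sits on. This is only an illustration of the rule, not how Tag Manager itself works. The class and function names and the example.org / example.com URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def is_outbound(href, page_url):
    """True if href, resolved against page_url, points at a different host."""
    target = urlparse(urljoin(page_url, href))
    if target.scheme not in ("http", "https"):
        return False  # mailto:, javascript: and similar are not site exits
    return target.hostname != urlparse(page_url).hostname


def outbound_links(html, page_url):
    """Return the hrefs in html that would take a visitor off the site."""
    parser = LinkCollector()
    parser.feed(html)
    return [h for h in parser.hrefs if is_outbound(h, page_url)]
```

Under this rule a link to a sibling subdomain also counts as outbound; whether that is wanted is a policy choice, not something the sketch decides.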
Thoughts?
--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Jun 30, 2020 at 1:55 PM Dave Page <dpage@pgadmin.org> wrote:
> [...]
> Instead, I'd like to suggest we change to using Google Tag Manager directly in the site in place of Analytics. Tag Manager uses a couple of similar JS snippets to Analytics so would require minimal changes to the site. However, it can then be used (amongst many other things) to enable Analytics site-wide as it is now, and to automatically send outbound link clicks to Analytics globally or for subsets of pages and target URLs with no further code changes.
Given the number of sites that completely break and fall over when one blocks GTM, I have to ask: I assume this can be done in a way that has zero impact on those who are sensible enough to block it?
On Tue, Jun 30, 2020 at 1:07 PM Magnus Hagander <magnus@hagander.net> wrote:
> Given the number of sites that completely break and fall over when one blocks GTM, I have to ask: I assume this can be done in a way that has zero impact on those who are sensible enough to block it?
I just tested on a couple of sites using it, and blocking didn't seem to affect use of the sites at all.
Dave Page
On 30/6/20 13:55, Dave Page wrote:
> [...]
> Instead, I'd like to suggest we change to using Google Tag Manager directly in the site in place of Analytics. [...]
> Thoughts?
Slightly different topic, but still related: there are growing concerns about the privacy implications of using Google Analytics for tracking. For a community website like postgresql.org, I'd say that taking these privacy concerns into account should be important.
There are many alternatives that provide the same, or similar, knowledge about visits without the drawbacks of GA. Not picking on any particular post, but here's an example listing such alternatives:
https://nts.strzibny.name/privacy-oriented-alternatives-to-google-analytics/
Is this something that could be considered? I'd say that our community has many privacy-focused users and this may better serve them (us).
Álvaro
-- Alvaro Hernandez ----------- OnGres
On Tue, Jun 30, 2020 at 2:22 PM Dave Page <dpage@pgadmin.org> wrote:
> I just tested on a couple of sites using it, and blocking didn't seem to affect use of the sites at all.
That's good.
I'd still say we need a very clear reason for it if we're going to collect more information about our visitors. That is, we need a plan for what we're going to do with the data. If we don't have that, we should not collect it.
(Same applies to Google Analytics of course -- and I think at some point we said we were going to re-evaluate our use of that and see if we already had enough information "on our own" for the things we actually do (since most of the details from GA I don't think we use), but I don't think we ever got around to it..)
On 2020-07-01 03:24, Álvaro Hernández wrote:
<snip>
> Slightly different topic, but still related: there are growing concerns about the privacy implications of using Google Analytics for tracking. For a community website like postgresql.org, I'd say that taking these privacy concerns into account should be important.
> There are many alternatives that provide the same, or similar, knowledge about visits without the drawbacks of GA. Not picking on any particular post, but here's an example listing such alternatives:
> https://nts.strzibny.name/privacy-oriented-alternatives-to-google-analytics/
Good list. :)
With the ones there, I'd recommend *not* using Fathom. We use it for the sqlitebrowser.org website, and (I think) we were one of the larger early public adopters. Literally due to their commitment to OSS. Which they then reneged on, and switched to proprietary. F**ckers. :(
Goat Counter, also on there, seems decent, though I'm personally of two minds about it. The author seems like they're trying hard to get their SaaS business up and running, and it's also OSS at the moment. But in recent private conversation, the author also didn't grok why switching a project from OSS to non-OSS once established isn't going to win any friends. Hopefully that's not a path Goat Counter actually goes down. ;)
+ Justin
On Tue, Jun 30, 2020 at 6:42 PM Magnus Hagander <magnus@hagander.net> wrote:
> I'd still say we need a very clear reason for it if we're going to collect more information about our visitors. That is, we need a plan for what we're going to do with the data. If we don't have that, we should not collect it.
Let's be clear here - we are not, and do not collect information about individual users (unless they sign up for an account of course).
We collect anonymous usage information, free of any form of PII, that tells us things like: the popularity of different pages, so we can gauge what works and doesn't work content-wise; browser/device usage, so we know what to test with; navigation patterns, so we understand how people use our site; and, the bit I think is valuable to add, outbound link usage, so we can understand (in the particular case I'm working on) how popular the different download options are, particularly those for which we have no stats at all because they're hosted at third-party sites.
Most of that cannot be gained through the very limited amount of server logs we have, and even that which can is not meaningful because they are purged very quickly and only kept for a short while for diagnostics because they do contain PII.
--
Dave Page
On 1/7/20 10:07, Dave Page wrote:
> Let's be clear here - we are not, and do not collect information about individual users (unless they sign up for an account of course).
> We collect anonymous usage information that is free of any form of PII [...]
Not sure if with this reply you were also considering what I mentioned in the thread about GA concerns in general, but just in case: the concerns about GA are not about what information we'd collect, but rather about the information that, thanks to us, Google is collecting, because it can be cross-referenced with information from other sites, ads and probably many other sources.
I'm all for having usage and statistics information for the website (as long as it doesn't include PII, of course, and is limited to what is actually used and reasonable), but there are many tools other than GA to do this.
> Most of that cannot be gained through the very limited amount of server logs we have, and even that which can is not meaningful because they are purged very quickly and only kept for a short while for diagnostics because they do contain PII.
Actually, much of the information being used now could probably be gathered from the logs. But anyway, other existing tools are probably better than this.
Álvaro
-- Alvaro Hernandez ----------- OnGres
On Wed, Jul 1, 2020 at 12:18 PM Álvaro Hernández <aht@ongres.com> wrote:
> Not sure if with this reply you were also considering what I mentioned in the thread about GA concerns in general, but just in case: the concerns about GA are not about what information we'd collect, but rather about the information that, thanks to us, Google is collecting [...]
No, I wasn't, because the discussion of what to use for analytics is tangential to the purpose of this thread, which is to suggest a different way to integrate with GA to enable functionality that would be very useful at the moment (the start of which can be seen in the other thread I just started on this list). Proposing alternatives to GA is a valid topic, but not something that's going to be decided and implemented in a day or two.
> Actually, much of the information being used now could probably be gathered from the logs. But anyway, other existing tools are probably better than this.
Not really. There are no access logs on the cache servers, and very limited logs on the backend server (which really only covers non-GET requests).
Dave Page
On 1/7/20 14:40, Dave Page wrote:
> No, I wasn't, because the discussion of what to use for analytics is tangential to the purpose of this thread, which is to suggest a different way to integrate with GA to enable functionality that would be very useful at the moment (the start of which can be seen in the other thread I just started on this list). Proposing alternatives to GA is a valid topic, but not something that's going to be decided and implemented in a day or two.
Sure. While tangential, I thought it would be a good moment to raise this, as the work "to integrate with GA" would be done differently if GA were to be replaced by something that takes more care of the privacy of postgresql.org's visitors. Anyway, my suggestion/idea is there if anyone wants to consider it :)
> Not really. There are no access logs on the cache servers, and very limited logs on the backend server (which really only covers non-GET requests).
I'm not aware of which cache servers are used, but some offer mechanisms to export and access the logs. That might not be the case here.
Álvaro
-- Alvaro Hernandez ----------- OnGres
On Wed, Jul 1, 2020 at 10:07 AM Dave Page <dpage@pgadmin.org> wrote:
> Let's be clear here - we are not, and do not collect information about individual users (unless they sign up for an account of course).
I think that's a matter of definition. Well, *we* don't, but Google might.
> We collect anonymous usage information that is free of any form of PII [...]
> Most of that cannot be gained through the very limited amount of server logs we have, and even that which can is not meaningful because they are purged very quickly and only kept for a short while for diagnostics because they do contain PII.
There's a fair amount of stuff we could get out of those *if we wanted to*. Basically, if the information we want to look at is present in the http requests or responses, we could easily collect metrics on that without involving a third party. The only reason we're not doing that today, is that we have not defined what kind of metrics we actually want.
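As a rough sketch of what collecting such metrics from request data could look like, the Python below counts successful GET requests per path. It assumes request lines in the common log format are available at all (elsewhere in the thread it is noted that the cache servers currently keep no access logs), and the function name and regular expression are illustrative, not anything that exists on the servers today. The client address field is matched but never captured or stored.

```python
import re
from collections import Counter

# Assumption: lines follow the common log format, e.g.
# 203.0.113.5 - - [01/Jul/2020:10:00:00 +0000] "GET /download/ HTTP/1.1" 200 1234
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+'
)


def page_popularity(lines):
    """Count successful GET requests per path, ignoring query strings."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if match.group("method") == "GET" and match.group("status").startswith("2"):
            path = match.group("path").split("?", 1)[0]
            counts[path] += 1
    return counts
```

Feeding it an iterable of log lines returns a `Counter`, so `page_popularity(lines).most_common(10)` would list the ten most requested paths.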
Now don't get me wrong -- I'm not against collecting proper metrics, and using the right tool for it. I'm just saying we shouldn't collect more information than we need (and by "we" I mean neither we nor google on our behalf (or another third party)). So I think we should really start with what we need, and take it from there.
And if what we need cannot be collected from the request data that we already have, then we should certainly look at other solutions, whether they're Google Analytics or Tag Manager or a separate product we install on our own infra etc. I think in particular when you say "outbound link data usage" you mean people clicking links on our site that go to a different site, right? That being requests that never hit our servers, it wouldn't be in our request data. But if that's *all* we care about around them, we could just have a tiny collector dumping that data into a postgres database...
//Magnus
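To make the "tiny collector" idea above concrete, here is a minimal sketch of the client side, not something proposed in the thread. It assumes a hypothetical endpoint `/collect/outbound` on our own infrastructure, and the helper names `isOutboundLink` and `buildClickRecord` are invented for illustration. It records only the page path and the target URL without its query string, so no more is collected than the outbound click itself.

```javascript
// Decide whether a clicked link leaves the site.
// `pageUrl` is the URL of the page the link sits on.
function isOutboundLink(href, pageUrl) {
  let target;
  try {
    // Resolve relative links against the current page.
    target = new URL(href, pageUrl);
  } catch (e) {
    return false; // unparsable href: treat as not outbound
  }
  if (target.protocol !== 'http:' && target.protocol !== 'https:') {
    return false; // mailto:, javascript:, etc. are not page navigations
  }
  return target.hostname !== new URL(pageUrl).hostname;
}

// Build the minimal record a self-hosted collector would store.
// The query string and fragment of the target are deliberately dropped.
function buildClickRecord(href, pageUrl) {
  const target = new URL(href, pageUrl);
  return {
    page: new URL(pageUrl).pathname,
    target: target.origin + target.pathname,
  };
}

// Browser wiring; skipped when no `document` exists (e.g. under Node).
if (typeof document !== 'undefined') {
  document.addEventListener('click', function (event) {
    const link = event.target.closest ? event.target.closest('a[href]') : null;
    if (!link || !isOutboundLink(link.href, window.location.href)) return;
    const body = JSON.stringify(buildClickRecord(link.href, window.location.href));
    // '/collect/outbound' is a hypothetical endpoint, not an existing one.
    if (navigator.sendBeacon) navigator.sendBeacon('/collect/outbound', body);
  });
}
```

A single delegated listener avoids putting onclick attributes on every link. The server side (an endpoint inserting each record into a table) is left out here.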
Hi
On Wed, Jul 1, 2020 at 8:28 PM Magnus Hagander <magnus@hagander.net> wrote:
Now don't get me wrong -- I'm not against collecting proper metrics, and using the right tool for it. I'm just saying we shouldn't collect more information than we need (and by "we" I mean neither we nor google on our behalf (or another third party)). So I think we should really start with what we need, and take it from there.
Right. That's exactly why I started this thread by explaining what I need.
And if what we need cannot be collected from the request data that we already have, then we should certainly look at other solutions, whether they're Google Analytics or Tag Manager or a separate product we install on our own infra etc. I think in particular when you say "outbound link data usage" you mean people clicking links on our site that go to a different site, right? That being requests that never hit our servers, it wouldn't be in our request data.
Yes.
But if that's *all* we care about around them, we could just have a tiny collector dumping that data into a postgres database...
So the options are:
- Change the way we call the system we're already using with a five minute patch, to one that easily allows us to send outbound clicks to the same system we're already using, and then easily turn it off again once we have useful data that can inform our design choices, or
- Build a new mechanism from scratch that will require additional code in both the website frontend and backend, at least one potentially large table in the database, and the associated additional load as pages that currently get served entirely from varnish start making callbacks to the main server. Once we have the data we then have to go and remove all of that setup again (or at the very least, the frontend part), and if we later need additional data from elsewhere on the site (or different data entirely), we have to then modify the site to put the frontend code back in the new place of interest, or rewrite/extend the code we have to log something new.
The former is clearly orders of magnitude easier to implement and allows us the flexibility to measure different areas of the site or even entirely different metrics with just a few clicks, and turn it off again just as easily.
The latter avoids sending the additional click data to Google, which makes little to no difference because they're already getting the regular analytics data anyway, and which almost certainly contains everything that is important to them.
I have no issue with a longer term project to change the analytics to another service that can give us useful information (the nature of which we may not actually know until we come to try to improve a particular area of the site), but for what I'm trying to achieve right now it seems like I'm being asked to jump through a 30cm hoop that's being dangled from the top of a 4 storey building.
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On 2 Jul 2020, at 11:01, Dave Page <dpage@pgadmin.org> wrote:
> ..for what I'm trying to achieve right now it seems like I'm being asked to jump through a 30cm hoop that's being dangled from the top of a 4 storey building.
To be fair, I think the shape and placement of the hoops would've been different if it was made clear(er) from the start that the proposal was to measure outbound traffic for a *limited period*. At least I didn't understand that until this email.
Settling on the future analytics needs for postgresql.org sounds like a topic well suited for an unconference style discussion, maybe we'll end up changing nothing but it's worth discussing.
What we can do to address concerns with tracking in the meantime is to honor the DoNotTrack header in the site template, and only load GA/GTM in case navigator.doNotTrack isn't set. Something like the (completely untested) attached might be all we need? The diff also removes the ability to load GA over http, which we clearly shouldn't allow (and since moving the site to all https we don't anyways IIUC).
cheers ./daniel
Attachment
On Thu, Jul 2, 2020 at 12:56 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 2 Jul 2020, at 11:01, Dave Page <dpage@pgadmin.org> wrote:
> ..for what I'm trying to achieve right now it seems like I'm being asked to jump through a 30cm hoop that's being dangled from the top of a 4 storey building.
To be fair, I think the shape and placement of the hoops would've been
different if it was made clear(er) from the start that the proposal was to
measure outbound traffic for a *limited period*. At least I didn't understand
that until this email.
Sorry, I thought that was clear as I said in my original message that I thought the outbound link tracking would be useful for the project I'm working on. On re-reading I see that my wording on that was somewhat less than awesome.
Settling on the future analytics needs for postgresql.org sounds like a topic
well suited for an unconference style discussion, maybe we'll end up changing
nothing but it's worth discussing.
Agreed.
What we can do to address concerns with tracking in the meantime is to honor
the DoNotTrack header in the site template, and only load GA/GTM in case
navigator.doNotTrack isn't set. Something like the (completely untested)
attached might be all we need? The diff also removes the ability to load GA
over http, which we clearly shouldn't allow (and since moving the site to all
https we don't anyways IIUC).
I think we'd also need to check windows.doNotTrack to cover Microsoft browsers, but yes, I think that will work with GA.
In GTM you can do it in the config (i.e. through the container settings in the management UI). Browsers would still pull the code of course, but then any tracking would be disabled.
Dave Page
On Thu, Jul 2, 2020 at 1:23 PM Dave Page <dpage@pgadmin.org> wrote:
On Thu, Jul 2, 2020 at 12:56 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 2 Jul 2020, at 11:01, Dave Page <dpage@pgadmin.org> wrote:
> ..for what I'm trying to achieve right now it seems like I'm being asked to jump through a 30cm hoop that's being dangled from the top of a 4 storey building.
To be fair, I think the shape and placement of the hoops would've been
different if it was made clear(er) from the start that the proposal was to
measure outbound traffic for a *limited period*. At least I didn't understand
that until this email.
Sorry, I thought that was clear as I said in my original message that I thought the outbound link tracking would be useful for the project I'm working on. On re-reading I see that my wording on that was somewhat less than awesome.
Settling on the future analytics needs for postgresql.org sounds like a topic
well suited for an unconference style discussion, maybe we'll end up changing
nothing but it's worth discussing.
Agreed.
What we can do to address concerns with tracking in the meantime is to honor
the DoNotTrack header in the site template, and only load GA/GTM in case
navigator.doNotTrack isn't set. Something like the (completely untested)
attached might be all we need? The diff also removes the ability to load GA
over http, which we clearly shouldn't allow (and since moving the site to all
https we don't anyways IIUC).
I think we'd also need to check windows.doNotTrack to cover Microsoft browsers, but yes, I think that will work with GA.
Err, that should be window.doNotTrack of course.
In GTM you can do it in the config (i.e. through the container settings in the management UI). Browsers would still pull the code of course, but then any tracking would be disabled.
--
Dave Page
> On 2 Jul 2020, at 14:23, Dave Page <dpage@pgadmin.org> wrote:
> On Thu, Jul 2, 2020 at 12:56 PM Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> wrote:
> What we can do to address concerns with tracking in the meantime is to honor
> the DoNotTrack header in the site template, and only load GA/GTM in case
> navigator.doNotTrack isn't set. Something like the (completely untested)
> attached might be all we need? The diff also removes the ability to load GA
> over http, which we clearly shouldn't allow (and since moving the site to all
> https we don't anyways IIUC).
>
> I think we'd also need to check windows.doNotTrack to cover Microsoft browsers, but yes, I think that will work with GA.
Ah yes, my bad, window.doNotTrack (as you corrected in a subsequent email) is required for IE and Safari.
> In GTM you can do it in the config (i.e. through the container settings in the management UI). Browsers would still pull the code of course, but then any tracking would be disabled.
Right, I still prefer to do it on our side and put us in charge of whether tracking is invoked or not.
cheers ./daniel
On Thu, Jul 2, 2020 at 2:33 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 2 Jul 2020, at 14:23, Dave Page <dpage@pgadmin.org> wrote:
> On Thu, Jul 2, 2020 at 12:56 PM Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> wrote:
> What we can do to address concerns with tracking in the meantime is to honor
> the DoNotTrack header in the site template, and only load GA/GTM in case
> navigator.doNotTrack isn't set. Something like the (completely untested)
> attached might be all we need? The diff also removes the ability to load GA
> over http, which we clearly shouldn't allow (and since moving the site to all
> https we don't anyways IIUC).
>
> I think we'd also need to check windows.doNotTrack to cover Microsoft browsers, but yes, I think that will work with GA.
Ah yes, my bad, window.doNotTrack (as you corrected in a subsequent email) is
required for IE and Safari.
> In GTM you can do it in the config (i.e. through the container settings in the management UI). Browsers would still pull the code of course, but then any tracking would be disabled.
Right, I still prefer to do it on our side and put us in charge of whether
tracking is invoked or not.
+1.
(Both on the turn it took with some more details in general, and specifically on preventing even loading the GA/GTM scripts if the browser is set to do not track)
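The Do Not Track check discussed above can be sketched as a small helper. This is an illustration rather than the patch attached to the thread: the function name `isDoNotTrackEnabled` is made up, and the property list follows the one quoted later in the thread (`navigator.doNotTrack`, `window.doNotTrack`, `navigator.msDoNotTrack`, `window.msDoNotTrack`). The browser objects are passed in as parameters so the logic can be exercised outside a browser.

```javascript
// Returns true when the browser signals Do Not Track.
// `nav` and `win` default to the browser globals when available.
function isDoNotTrackEnabled(nav, win) {
  nav = nav || (typeof navigator !== 'undefined' ? navigator : {});
  win = win || (typeof window !== 'undefined' ? window : {});
  // Different browsers have exposed the setting in different places.
  const dnt = nav.doNotTrack || win.doNotTrack ||
              nav.msDoNotTrack || win.msDoNotTrack;
  // "1" is the usual value; "yes" is treated as set as well, matching
  // the condition quoted later in the thread.
  return dnt === '1' || dnt === 'yes';
}
```

Values such as "0", "unspecified", or a missing property all count as "not set", so the analytics scripts would load only for visitors who have not asked to opt out.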
On Thu, Jul 2, 2020 at 1:50 PM Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Jul 2, 2020 at 2:33 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 2 Jul 2020, at 14:23, Dave Page <dpage@pgadmin.org> wrote:
> On Thu, Jul 2, 2020 at 12:56 PM Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> wrote:
> What we can do to address concerns with tracking in the meantime is to honor
> the DoNotTrack header in the site template, and only load GA/GTM in case
> navigator.doNotTrack isn't set. Something like the (completely untested)
> attached might be all we need? The diff also removes the ability to load GA
> over http, which we clearly shouldn't allow (and since moving the site to all
> https we don't anyways IIUC).
>
> I think we'd also need to check windows.doNotTrack to cover Microsoft browsers, but yes, I think that will work with GA.
Ah yes, my bad, window.doNotTrack (as you corrected in a subsequent email) is
required for IE and Safari.
> In GTM you can do it in the config (i.e. through the container settings in the management UI). Browsers would still pull the code of course, but then any tracking would be disabled.
Right, I still prefer to do it on our side and put us in charge of whether
tracking is invoked or not.
+1.
(Both on the turn it took with some more details in general, and specifically on preventing even loading the GA/GTM scripts if the browser is set to do not track)
Thanks. Can you or Daniel or someone else please give the attached patch a quick look over before I commit it?
Dave Page
Attachment
> On 2 Jul 2020, at 15:38, Dave Page <dpage@pgadmin.org> wrote:
> Can you or Daniel or someone else please give the attached patch a quick look over before I commit it?
Correct me if I'm wrong, but won't this only run tracking for those with DNT set?
+var DNT = navigator.doNotTrack || window.doNotTrack || navigator.msDoNotTrack || window.msDoNotTrack;
+if ((DNT == "1") || (DNT == "yes")) {
+ (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
cheers ./daniel
On Thu, Jul 2, 2020 at 2:38 PM Dave Page <dpage@pgadmin.org> wrote:
On Thu, Jul 2, 2020 at 1:50 PM Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Jul 2, 2020 at 2:33 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 2 Jul 2020, at 14:23, Dave Page <dpage@pgadmin.org> wrote:
> On Thu, Jul 2, 2020 at 12:56 PM Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> wrote:
> What we can do to address concerns with tracking in the meantime is to honor
> the DoNotTrack header in the site template, and only load GA/GTM in case
> navigator.doNotTrack isn't set. Something like the (completely untested)
> attached might be all we need? The diff also removes the ability to load GA
> over http, which we clearly shouldn't allow (and since moving the site to all
> https we don't anyways IIUC).
>
> I think we'd also need to check windows.doNotTrack to cover Microsoft browsers, but yes, I think that will work with GA.
Ah yes, my bad, window.doNotTrack (as you corrected in a subsequent email) is
required for IE and Safari.
> In GTM you can do it in the config (i.e. through the container settings in the management UI). Browsers would still pull the code of course, but then any tracking would be disabled.
Right, I still prefer to do it on our side and put us in charge of whether
tracking is invoked or not.
+1.
(Both on the turn it took with some more details in general, and specifically on preventing even loading the GA/GTM scripts if the browser is set to do not track)
Thanks. Can you or Daniel or someone else please give the attached patch a quick look over before I commit it?
Hold that thought, logic bug detected...
Dave Page
On Thu, Jul 2, 2020 at 2:52 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 2 Jul 2020, at 15:38, Dave Page <dpage@pgadmin.org> wrote:
> Can you or Daniel or someone else please give the attached patch a quick look over before I commit it?
Correct me if I'm wrong, but won't this only run tracking for those with DNT set?
+var DNT = navigator.doNotTrack || window.doNotTrack || navigator.msDoNotTrack || window.msDoNotTrack;
+if ((DNT == "1") || (DNT == "yes")) {
+ (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
Yeah, sorry - brain fart, per my previous email.
Let's try again with the attached.
Dave Page
Attachment
> On 2 Jul 2020, at 15:56, Dave Page <dpage@pgadmin.org> wrote:
> Let's try again with the attached.
LGTM
cheers ./daniel
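The corrected patch itself is not reproduced in the archive, so the following is only a sketch of what the fixed logic presumably amounts to: the loader runs when Do Not Track is *not* set, which is the inverse of the condition Daniel flagged. The function name `loadTagManager` and the container ID `GTM-XXXXXXX` are placeholders. The script URL is always https, in line with dropping the http variant.

```javascript
// Inject the Tag Manager loader only when tracking is allowed.
// `win` and `doc` are window-like and document-like objects;
// `containerId` is a placeholder for the real container ID.
function loadTagManager(win, doc, dntEnabled, containerId) {
  if (dntEnabled) {
    return null; // DNT is set: do not even fetch the script
  }
  // Same bootstrap event the standard GTM snippet pushes.
  win.dataLayer = win.dataLayer || [];
  win.dataLayer.push({ 'gtm.start': new Date().getTime(), event: 'gtm.js' });

  const script = doc.createElement('script');
  script.async = true;
  // Always https; there is no http fallback.
  script.src = 'https://www.googletagmanager.com/gtm.js?id=' +
               encodeURIComponent(containerId);
  doc.head.appendChild(script);
  return script;
}
```

In the site template this would be called once with the result of the Do Not Track check, for example `loadTagManager(window, document, dnt, 'GTM-XXXXXXX')`, where `dnt` is true when the browser signals DNT.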
On Thu, Jul 2, 2020 at 2:59 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> On 2 Jul 2020, at 15:56, Dave Page <dpage@pgadmin.org> wrote:
> Let's try again with the attached.
LGTM
Thanks!
Dave Page