Thread: Resetting spilled txn statistics in pg_stat_replication
Hi all,

Tracking of spilled transactions has been introduced in PG13. These new statistics values, spill_txns, spill_count, and spill_bytes, are cumulative totals, unlike other statistics values in pg_stat_replication. How can we reset these values? We can reset statistics values in other statistics views using pg_stat_reset_shared(), pg_stat_reset(), and so on. It seems to me that the only option to reset the spilled transaction statistics is to restart logical replication, but that is surely a high-cost operation.

It might have been discussed during development, but isn't it worth having a SQL function to reset these statistics?
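For reference, here is roughly what I am looking at; a sketch only (the reset calls shown are the existing ones for other views, and none of them covers these columns):

    -- cumulative spill statistics, per walsender, as of the PG13 tree
    SELECT pid, spill_txns, spill_count, spill_bytes
      FROM pg_stat_replication;

    -- existing reset functions; neither covers the columns above
    SELECT pg_stat_reset();                    -- counters of the current database
    SELECT pg_stat_reset_shared('bgwriter');   -- cluster-wide bgwriter counters

Regards,
--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services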
At Tue, 2 Jun 2020 15:17:36 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > Hi all, > > Tracking of spilled transactions has been introduced to PG13. These > new statistics values, spill_txns, spill_count, and spill_bytes, are > cumulative total values unlike other statistics values in > pg_stat_replication. How can we reset these values? We can reset > statistics values in other statistics views using by > pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that > the only option to reset spilled transactions is to restart logical > replication but it's surely high cost. > > It might have been discussed during development but it's worth having > a SQL function to reset these statistics? Actually, I don't see pg_stat_reset() as very useful except for our regression tests (it might even be rather harmful as a monitoring aid). So I doubt the usefulness of the feature, but having it would make things more consistent. Anyway, I think the most significant point of implementing the feature would be the user interface. Adding pg_stat_replication_reset(pid int) doesn't seem to be a good thing to do... regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020/06/02 16:00, Kyotaro Horiguchi wrote: > At Tue, 2 Jun 2020 15:17:36 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in >> Hi all, >> >> Tracking of spilled transactions has been introduced to PG13. These >> new statistics values, spill_txns, spill_count, and spill_bytes, are >> cumulative total values unlike other statistics values in >> pg_stat_replication. Basically I don't think it's good design to mix dynamic and collected stats in one view. It might be better to separate them into a different, new stats view. It's too late to add a new stats view for v13, though... Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Tue, Jun 2, 2020 at 11:48 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > Hi all, > > Tracking of spilled transactions has been introduced to PG13. These > new statistics values, spill_txns, spill_count, and spill_bytes, are > cumulative total values unlike other statistics values in > pg_stat_replication. How can we reset these values? We can reset > statistics values in other statistics views using by > pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that > the only option to reset spilled transactions is to restart logical > replication but it's surely high cost. > I see your point, but I don't see a pressing need for such a function for PG13. Basically, these counters will be populated only when we have large transactions in the system, so I am not sure how much use there is for such a function. Note that we would need to add an additional stats_reset column to the pg_stat_replication view as well, similar to what we have in pg_stat_archiver and pg_stat_bgwriter. OTOH, I don't see any big reason for not having such a function in PG14.
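For reference, that is the pattern those views already follow; a quick sketch (these columns do exist):

    -- stats_reset records when the cumulative counters were last zeroed
    SELECT archived_count, failed_count, stats_reset FROM pg_stat_archiver;
    SELECT checkpoints_timed, buffers_clean, stats_reset FROM pg_stat_bgwriter;

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com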
On Tue, 2 Jun 2020 at 17:22, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/06/02 16:00, Kyotaro Horiguchi wrote: > > At Tue, 2 Jun 2020 15:17:36 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > >> Hi all, > >> > >> Tracking of spilled transactions has been introduced to PG13. These > >> new statistics values, spill_txns, spill_count, and spill_bytes, are > >> cumulative total values unlike other statistics values in > >> pg_stat_replication. > > Basically I don't think it's good design to mix dynamic and collected > stats in one view. It might be better to separate them into different > new stats view. It's too late to add new stats view for v13, though.... Yeah, actually I had the same impression when studying this feature. Another benefit of having a separate view for such statistics would be that it could also cover logical decoding invoked via the SQL interface. Currently, the reorder buffer always tracks statistics of spilled transactions, but we can see them only when using logical replication.
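For example, decoding invoked via the SQL interface also goes through the reorder buffer and can spill, but those counters are not visible anywhere today (the slot name below is just an example):

    -- logical decoding via the SQL interface; its spill activity is invisible
    SELECT * FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL);

Regards,
--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services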
On Tue, Jun 2, 2020 at 1:52 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/06/02 16:00, Kyotaro Horiguchi wrote: > > At Tue, 2 Jun 2020 15:17:36 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > >> Hi all, > >> > >> Tracking of spilled transactions has been introduced to PG13. These > >> new statistics values, spill_txns, spill_count, and spill_bytes, are > >> cumulative total values unlike other statistics values in > >> pg_stat_replication. > > Basically I don't think it's good design to mix dynamic and collected > stats in one view. It might be better to separate them into different > new stats view. > I think this is worth considering but note that we already have a similar mix in other views like pg_stat_archiver (archived_count and failed_count are dynamic whereas other columns are static). On the one hand, it is good if we have a separate view for such dynamic information but OTOH users need to consult more views for replication information. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, 2 Jun 2020 at 16:00, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Tue, 2 Jun 2020 15:17:36 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > > Hi all, > > > > Tracking of spilled transactions has been introduced to PG13. These > > new statistics values, spill_txns, spill_count, and spill_bytes, are > > cumulative total values unlike other statistics values in > > pg_stat_replication. How can we reset these values? We can reset > > statistics values in other statistics views using by > > pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that > > the only option to reset spilled transactions is to restart logical > > replication but it's surely high cost. > > > > It might have been discussed during development but it's worth having > > a SQL function to reset these statistics? > > Actually, I don't see pg_stat_reset() useful so much except for our > regression test (or might be rather harmful for monitoring aids). So > I doubt the usefulness of the feature, but having it makes things more > consistent. IMO these reset functions are useful for verifications. I often use them before starting performance evaluations. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 2 Jun 2020 at 18:34, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jun 2, 2020 at 11:48 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > Hi all, > > > > Tracking of spilled transactions has been introduced to PG13. These > > new statistics values, spill_txns, spill_count, and spill_bytes, are > > cumulative total values unlike other statistics values in > > pg_stat_replication. How can we reset these values? We can reset > > statistics values in other statistics views using by > > pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that > > the only option to reset spilled transactions is to restart logical > > replication but it's surely high cost. > > > > I see your point but I don't see a pressing need for such a function > for PG13. Basically, these counters will be populated when we have > large transactions in the system so not sure how much is the use case > for such a function. Note that we need to add additional column > stats_reset in pg_stat_replication view as well similar to what we > have in pg_stat_archiver and pg_stat_bgwriter. OTOH, I don't see any > big reason for not having such a function for PG14. Ok. I think the reset function is mostly for evaluations or rare cases. In either case, since it's not an urgent case we can postpone it to PG14 if necessary. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jun 9, 2020 at 9:12 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
> On Tue, 2 Jun 2020 at 18:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jun 2, 2020 at 11:48 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > Hi all,
> > >
> > > Tracking of spilled transactions has been introduced to PG13. These
> > > new statistics values, spill_txns, spill_count, and spill_bytes, are
> > > cumulative total values unlike other statistics values in
> > > pg_stat_replication. How can we reset these values? We can reset
> > > statistics values in other statistics views using by
> > > pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that
> > > the only option to reset spilled transactions is to restart logical
> > > replication but it's surely high cost.
You just have to "bounce" the worker though, right? You don't have to actually restart logical replication, just disconnect and reconnect?
> > I see your point but I don't see a pressing need for such a function
> > for PG13. Basically, these counters will be populated when we have
> > large transactions in the system so not sure how much is the use case
> > for such a function. Note that we need to add additional column
> > stats_reset in pg_stat_replication view as well similar to what we
> > have in pg_stat_archiver and pg_stat_bgwriter. OTOH, I don't see any
> > big reason for not having such a function for PG14.
> Ok. I think the reset function is mostly for evaluations or rare
> cases. In either case, since it's not an urgent case we can postpone
> it to PG14 if necessary.
Reading through this thread, I agree that it's kind of weird to keep cumulative stats mixed with non-cumulative stats. (It always irks me, for example, that we have numbackends in pg_stat_database, which behaves differently from every other column in it.)
However, I don't see how they *are* cumulative. They are only cumulative while the client is connected -- as soon as it disconnects they go away. In that regard, they're more like the pg_stat_progress_xyz views for example.
Which makes it mostly useless for long-term tracking anyway. Because no matter which way you snapshot it, you will lose data.
ISTM the logical place to keep cumulative stats would be pg_replication_slots? (Or go create a pg_stat_replication_slots?) That is, that the logical grouping of these statistics for the long term is the replication slot, not the walsender?
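To make that concrete, something along these lines -- purely hypothetical, nothing like it exists today:

    -- hypothetical: cumulative decoding stats keyed by slot, surviving reconnects
    SELECT slot_name, spill_txns, spill_count, spill_bytes
      FROM pg_stat_replication_slots;    -- invented name, for illustration only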
On Tue, Jun 9, 2020 at 1:54 PM Magnus Hagander <magnus@hagander.net> wrote: > > On Tue, Jun 9, 2020 at 9:12 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> On Tue, 2 Jun 2020 at 18:34, Amit Kapila <amit.kapila16@gmail.com> wrote: >> > >> > I see your point but I don't see a pressing need for such a function >> > for PG13. Basically, these counters will be populated when we have >> > large transactions in the system so not sure how much is the use case >> > for such a function. Note that we need to add additional column >> > stats_reset in pg_stat_replication view as well similar to what we >> > have in pg_stat_archiver and pg_stat_bgwriter. OTOH, I don't see any >> > big reason for not having such a function for PG14. >> >> Ok. I think the reset function is mostly for evaluations or rare >> cases. In either case, since it's not an urgent case we can postpone >> it to PG14 if necessary. > > Reading through this thread, I agree that it's kind of weird to keep cumulative stats mixed with non-cumulative stats. (it always irks me, for example, that we have numbackends in pg_stat_database which behaves different from every other column in it) > > However, I don't see how they *are* cumulative. They are only cumulative while the client is connected -- as soon as it disconnects they go away. In that regard, they're more like the pg_stat_progress_xyz views for example. > > Which makes it mostly useless for long-term tracking anyway. Because no matter which way you snapshot it, you will lose data. > > ISTM the logical places to keep cumulative stats would be pg_replication_slots? (Or go create a pg_stat_replication_slots?) That is, that the logical grouping of these statistics for long-term is the replication slot, not the walsender? I think I see one advantage of displaying these stats at slot level. Currently, we won't be able to see these stats when we use the SQL interface APIs (like pg_logical_slot_get_changes) to decode the WAL, but if we display them at slot level, then we should be able to see them. I would prefer to display them in pg_replication_slots just to avoid creating more views but OTOH, if a new view like pg_stat_replication_slots sounds like a better place for these stats then I am fine with it too. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, 9 Jun 2020 at 17:24, Magnus Hagander <magnus@hagander.net> wrote: > > > > On Tue, Jun 9, 2020 at 9:12 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> On Tue, 2 Jun 2020 at 18:34, Amit Kapila <amit.kapila16@gmail.com> wrote: >> > >> > On Tue, Jun 2, 2020 at 11:48 AM Masahiko Sawada >> > <masahiko.sawada@2ndquadrant.com> wrote: >> > > >> > > Hi all, >> > > >> > > Tracking of spilled transactions has been introduced to PG13. These >> > > new statistics values, spill_txns, spill_count, and spill_bytes, are >> > > cumulative total values unlike other statistics values in >> > > pg_stat_replication. How can we reset these values? We can reset >> > > statistics values in other statistics views using by >> > > pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that >> > > the only option to reset spilled transactions is to restart logical >> > > replication but it's surely high cost. > > > You just have to "bounce" the worker though, right? You don't have to actually restart logical replication, just disconnect and reconnect? Right. > > >> > I see your point but I don't see a pressing need for such a function >> > for PG13. Basically, these counters will be populated when we have >> > large transactions in the system so not sure how much is the use case >> > for such a function. Note that we need to add additional column >> > stats_reset in pg_stat_replication view as well similar to what we >> > have in pg_stat_archiver and pg_stat_bgwriter. OTOH, I don't see any >> > big reason for not having such a function for PG14. >> >> Ok. I think the reset function is mostly for evaluations or rare >> cases. In either case, since it's not an urgent case we can postpone >> it to PG14 if necessary. > > > Reading through this thread, I agree that it's kind of weird to keep cumulative stats mixed with non-cumulative stats. (it always irks me, for example, that we have numbackends in pg_stat_database which behaves different from every other column in it) > > However, I don't see how they *are* cumulative. They are only cumulative while the client is connected -- as soon as it disconnects they go away. In that regard, they're more like the pg_stat_progress_xyz views for example. > > Which makes it mostly useless for long-term tracking anyway. Because no matter which way you snapshot it, you will lose data. > > ISTM the logical places to keep cumulative stats would be pg_replication_slots? (Or go create a pg_stat_replication_slots?) That is, that the logical grouping of these statistics for long-term is the replication slot, not the walsender? I personally prefer to display these values in pg_replication_slots. If we create a new stats view, would it be only for logical replication slots? Or would it display both types of slots, since physical replication slots might have statistics in the future? If we move these values to replication slots, I think we can change the code so that these statistics are managed by replication slots (e.g. the ReplicationSlot struct). Once ReplicationSlot has these values, we can keep them beyond reconnections or multiple calls of SQL interface functions. Of course, these values don’t need to be persisted. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jun 10, 2020 at 9:01 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
> On Tue, 9 Jun 2020 at 17:24, Magnus Hagander <magnus@hagander.net> wrote:
> >
> >
> >
> > On Tue, Jun 9, 2020 at 9:12 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
> >>
> >> On Tue, 2 Jun 2020 at 18:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> >
> >> > On Tue, Jun 2, 2020 at 11:48 AM Masahiko Sawada
> >> > <masahiko.sawada@2ndquadrant.com> wrote:
> >> > >
> >> > > Hi all,
> >> > >
> >> > > Tracking of spilled transactions has been introduced to PG13. These
> >> > > new statistics values, spill_txns, spill_count, and spill_bytes, are
> >> > > cumulative total values unlike other statistics values in
> >> > > pg_stat_replication. How can we reset these values? We can reset
> >> > > statistics values in other statistics views using by
> >> > > pg_stat_reset_shared(), pg_stat_reset() and so on. It seems to me that
> >> > > the only option to reset spilled transactions is to restart logical
> >> > > replication but it's surely high cost.
> >
> >
> > You just have to "bounce" the worker though, right? You don't have to actually restart logical replication, just disconnect and reconnect?
> Right.
> >
> >
> >> > I see your point but I don't see a pressing need for such a function
> >> > for PG13. Basically, these counters will be populated when we have
> >> > large transactions in the system so not sure how much is the use case
> >> > for such a function. Note that we need to add additional column
> >> > stats_reset in pg_stat_replication view as well similar to what we
> >> > have in pg_stat_archiver and pg_stat_bgwriter. OTOH, I don't see any
> >> > big reason for not having such a function for PG14.
> >>
> >> Ok. I think the reset function is mostly for evaluations or rare
> >> cases. In either case, since it's not an urgent case we can postpone
> >> it to PG14 if necessary.
> >
> >
> > Reading through this thread, I agree that it's kind of weird to keep cumulative stats mixed with non-cumulative stats. (it always irks me, for example, that we have numbackends in pg_stat_database which behaves different from every other column in it)
> >
> > However, I don't see how they *are* cumulative. They are only cumulative while the client is connected -- as soon as it disconnects they go away. In that regard, they're more like the pg_stat_progress_xyz views for example.
> >
> > Which makes it mostly useless for long-term tracking anyway. Because no matter which way you snapshot it, you will lose data.
> >
> > ISTM the logical places to keep cumulative stats would be pg_replication_slots? (Or go create a pg_stat_replication_slots?) That is, that the logical grouping of these statistics for long-term is the replication slot, not the walsender?
> I personally prefer to display these values in pg_replication_slots.
> If we create a new stats view, it's only for logical replication
> slots? Or displaying both types of slots as physical replication slots
> might have statistics in the future?
Yeah, I think it's kind of a weird situation. There are already some things in pg_replication_slots that should probably be in a stat_ view, so if we were to create one we would have to move those, and probably needlessly break things for people.
I guess we could have separate views for logical and physical slots since there are things that only one of them will have. But we already have that -- the database for example, and the xmins. Splitting that apart now would be a bigger change, and I don't think it's worth it.
> If we move these values to replication slots, I think we can change
> the code so that these statistics are managed by replication slots
> (e.g. ReplicationSlot struct). Once ReplicationSlot has these values,
> we can keep them beyond reconnections or multiple calls of SQL
> interface functions. Of course, these values don’t need to be
> persisted.
Eh, why should they not be persisted? The comparison would be temp_files and temp_bytes in pg_stat_database, and those *are* persisted.
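For comparison:

    -- temp_files/temp_bytes are cumulative, survive disconnects and restarts,
    -- and are cleared only by an explicit pg_stat_reset()
    SELECT datname, temp_files, temp_bytes, stats_reset FROM pg_stat_database;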
On Thu, Jun 11, 2020 at 12:35 AM Magnus Hagander <magnus@hagander.net> wrote: > > On Wed, Jun 10, 2020 at 9:01 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> > >> If we move these values to replication slots, I think we can change >> the code so that these statistics are managed by replication slots >> (e.g. ReplicationSlot struct). Once ReplicationSlot has these values, >> we can keep them beyond reconnections or multiple calls of SQL >> interface functions. Of course, these values don’t need to be >> persisted. > > > Eh, why should they not be persisted? > Because these stats correspond to a ReorderBuffer (logical decoding), which is allocated fresh after a restart and has no relation to what happened before the restart. Now, thinking about this again, I am not sure these stats are directly related to slots. These are stats for logical decoding, which can be performed either via WALSender or via a decoding plugin (through the APIs). So, why not have them displayed in a new view like pg_stat_logical (or pg_stat_logical_decoding/pg_stat_logical_replication)? In future, we will need to add similar stats for streaming of in-progress transactions as well (see patch 0007-Track-statistics-for-streaming at [1]), so having a separate view for these doesn't sound illogical. > The comparison would be temp_files and temp_bytes in pg_stat_database, and those *are* persisted. I am not able to see a one-to-one mapping of those stats with what is being discussed here. [1] - https://www.postgresql.org/message-id/CAFiTN-vXQx_161WC-a9HvNaF25nwO%3DJJRpRdTtyfGQHbM3Bd1Q%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, 11 Jun 2020 at 12:30, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jun 11, 2020 at 12:35 AM Magnus Hagander <magnus@hagander.net> wrote: > > > > On Wed, Jun 10, 2020 at 9:01 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > >> > > > >> If we move these values to replication slots, I think we can change > >> the code so that these statistics are managed by replication slots > >> (e.g. ReplicationSlot struct). Once ReplicationSlot has these values, > >> we can keep them beyond reconnections or multiple calls of SQL > >> interface functions. Of course, these values don’t need to be > >> persisted. > > > > > > Eh, why should they not be persisted? > > > > Because these stats are corresponding to a ReoderBuffer (logical > decoding) which will be allocated fresh after restart and have no > relation with what has happened before restart. I thought the same. But I now think there is no difference between reconnecting replication and a server restart in terms of the logical decoding context. Even if we persist these values, we might want to reset them after crash recovery, as the stats collector does. > > Now, thinking about this again, I am not sure if these stats are > directly related to slots. These are stats for logical decoding which > can be performed either via WALSender or decoding plugin (via APIs). > So, why not have them displayed in a new view like pg_stat_logical (or > pg_stat_logical_decoding/pg_stat_logical_replication)? In future, we > will need to add similar stats for streaming of in-progress > transactions as well (see patch 0007-Track-statistics-for-streaming at > [1]), so having a separate view for these doesn't sound illogical. > I think we need to decide how long we want these statistics values to remain. That is, if we were to have such a pg_stat_logical view, these values would remain only until logical decoding finished, since I think the view would display only running logical decoding. OTOH, if we were to tie these stats to slots, these values would remain beyond multiple logical decoding SQL API calls. I think one of the main use cases of these statistics is the tuning of logical_decoding_work_mem. This seems similar to a combination of pg_stat_database.temp_files/temp_bytes and work_mem. From this perspective, I guess it's useful for users if these values remain until either the slots are removed or the server crashes. Given that the kinds of logical decoding statistics might grow, having a separate view dedicated to replication slots makes sense to me. As for updating these statistics, if we tie them to logical decoding or to replication slots, we can change the update strategy so that it doesn't depend on the logical decoding plugin implementation. If updating the statistics doesn't affect performance much, it's better to track them regardless of the plugin. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 11 Jun 2020 at 12:30, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Now, thinking about this again, I am not sure if these stats are > > directly related to slots. These are stats for logical decoding which > > can be performed either via WALSender or decoding plugin (via APIs). > > So, why not have them displayed in a new view like pg_stat_logical (or > > pg_stat_logical_decoding/pg_stat_logical_replication)? In future, we > > will need to add similar stats for streaming of in-progress > > transactions as well (see patch 0007-Track-statistics-for-streaming at > > [1]), so having a separate view for these doesn't sound illogical. > > > > I think we need to decide how long we want to remain these statistics > values. That is, if we were to have such pg_stat_logical view, these > values would remain until logical decoding finished since I think the > view would display only running logical decoding. OTOH, if we were to > correspond these stats to slots, these values would remain beyond > multiple logical decoding SQL API calls. > I thought of having these until the process that performs these operations exits. So for a WALSender, the stats will be valid until it is restarted for some reason; when decoding is performed via a backend, the stats will be valid until the corresponding backend exits. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, 11 Jun 2020 at 18:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 11 Jun 2020 at 12:30, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > Now, thinking about this again, I am not sure if these stats are > > > directly related to slots. These are stats for logical decoding which > > > can be performed either via WALSender or decoding plugin (via APIs). > > > So, why not have them displayed in a new view like pg_stat_logical (or > > > pg_stat_logical_decoding/pg_stat_logical_replication)? In future, we > > > will need to add similar stats for streaming of in-progress > > > transactions as well (see patch 0007-Track-statistics-for-streaming at > > > [1]), so having a separate view for these doesn't sound illogical. > > > > > > > I think we need to decide how long we want to remain these statistics > > values. That is, if we were to have such pg_stat_logical view, these > > values would remain until logical decoding finished since I think the > > view would display only running logical decoding. OTOH, if we were to > > correspond these stats to slots, these values would remain beyond > > multiple logical decoding SQL API calls. > > > > I thought of having these till the process that performs these > operations exist. So for WALSender, the stats will be valid till it > is not restarted due to some reason or when performed via backend, the > stats will be valid till the corresponding backend exits. > The number of rows of that view could be up to (max_backends + max_wal_senders). Is that right? What if different backends used the same replication slot one after the other? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jun 11, 2020 at 3:07 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 11 Jun 2020 at 18:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Thu, 11 Jun 2020 at 12:30, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > Now, thinking about this again, I am not sure if these stats are > > > > directly related to slots. These are stats for logical decoding which > > > > can be performed either via WALSender or decoding plugin (via APIs). > > > > So, why not have them displayed in a new view like pg_stat_logical (or > > > > pg_stat_logical_decoding/pg_stat_logical_replication)? In future, we > > > > will need to add similar stats for streaming of in-progress > > > > transactions as well (see patch 0007-Track-statistics-for-streaming at > > > > [1]), so having a separate view for these doesn't sound illogical. > > > > > > > > > > I think we need to decide how long we want to remain these statistics > > > values. That is, if we were to have such pg_stat_logical view, these > > > values would remain until logical decoding finished since I think the > > > view would display only running logical decoding. OTOH, if we were to > > > correspond these stats to slots, these values would remain beyond > > > multiple logical decoding SQL API calls. > > > > > > > I thought of having these till the process that performs these > > operations exist. So for WALSender, the stats will be valid till it > > is not restarted due to some reason or when performed via backend, the > > stats will be valid till the corresponding backend exits. > > > > The number of rows of that view could be up to (max_backends + > max_wal_senders). Is that right? What if different backends used the > same replication slot one after the other? > Yeah, it would be tricky if multiple slots are used by the same backend. We could probably track the number of times decoding has happened by the session, which will probably help us in averaging the spill amount. If we think that the aim is to help users tune logical_decoding_work_mem to avoid frequent spilling or streaming, then how would maintaining them at slot level help? As you said previously, we could track it only for a running logical decoding context, but if we do that then the data will be temporary and the user needs to constantly monitor it to make sense of it, but maybe that is fine. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, 11 Jun 2020 at 20:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jun 11, 2020 at 3:07 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 11 Jun 2020 at 18:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Thu, 11 Jun 2020 at 12:30, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > Now, thinking about this again, I am not sure if these stats are > > > > > directly related to slots. These are stats for logical decoding which > > > > > can be performed either via WALSender or decoding plugin (via APIs). > > > > > So, why not have them displayed in a new view like pg_stat_logical (or > > > > > pg_stat_logical_decoding/pg_stat_logical_replication)? In future, we > > > > > will need to add similar stats for streaming of in-progress > > > > > transactions as well (see patch 0007-Track-statistics-for-streaming at > > > > > [1]), so having a separate view for these doesn't sound illogical. > > > > > > > > > > > > > I think we need to decide how long we want to remain these statistics > > > > values. That is, if we were to have such pg_stat_logical view, these > > > > values would remain until logical decoding finished since I think the > > > > view would display only running logical decoding. OTOH, if we were to > > > > correspond these stats to slots, these values would remain beyond > > > > multiple logical decoding SQL API calls. > > > > > > > > > > I thought of having these till the process that performs these > > > operations exist. So for WALSender, the stats will be valid till it > > > is not restarted due to some reason or when performed via backend, the > > > stats will be valid till the corresponding backend exits. > > > > > > > The number of rows of that view could be up to (max_backends + > > max_wal_senders). Is that right? What if different backends used the > > same replication slot one after the other? > > > > Yeah, it would be tricky if multiple slots are used by the same > backend. We could probably track the number of times decoding has > happened by the session that will probably help us in averaging the > spill amount. If we think that the aim is to help users to tune > logical_decoding_work_mem to avoid frequent spilling or streaming then > how would maintaining at slot level will help? Since the logical decoding intermediate files are written to a per-slot directory, I thought that associating these statistics with replication slots would also be understandable for users. I was thinking of something like a pg_stat_logical_replication_slot view, which shows slot_name and the statistics of only logical replication slots. The view would always show as many rows as there are replication slots, regardless of whether logical decoding is running. I think there is no big difference in how users would use these statistics values between maintaining them at slot level and at logical decoding level. In the logical replication case, since we generally don't support setting a different logical_decoding_work_mem per walsender, every walsender will decode the same WAL stream with the same setting, meaning they will spill intermediate files similarly. Maybe the same is true for the statistics of streaming. So having these statistics per logical replication process might not help as of now. 
> As you said previously > we could track it only for running logical decoding context but if we > do that then data will be temporary and the user needs to constantly > monitor the same to make sense of it but maybe that is fine. Agreed; in general, it's better not to frequently reset cumulative values. I personally would like these statistics to remain valid even after the process executing logical decoding exits (or even after a server restart), and to reset them as needed. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jun 11, 2020 at 7:39 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 11 Jun 2020 at 20:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Jun 11, 2020 at 3:07 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Thu, 11 Jun 2020 at 18:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > On Thu, 11 Jun 2020 at 12:30, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > > > > Now, thinking about this again, I am not sure if these stats are > > > > > > directly related to slots. These are stats for logical decoding which > > > > > > can be performed either via WALSender or decoding plugin (via APIs). > > > > > > So, why not have them displayed in a new view like pg_stat_logical (or > > > > > > pg_stat_logical_decoding/pg_stat_logical_replication)? In future, we > > > > > > will need to add similar stats for streaming of in-progress > > > > > > transactions as well (see patch 0007-Track-statistics-for-streaming at > > > > > > [1]), so having a separate view for these doesn't sound illogical. > > > > > > > > > > > > > > > > I think we need to decide how long we want to remain these statistics > > > > > values. That is, if we were to have such pg_stat_logical view, these > > > > > values would remain until logical decoding finished since I think the > > > > > view would display only running logical decoding. OTOH, if we were to > > > > > correspond these stats to slots, these values would remain beyond > > > > > multiple logical decoding SQL API calls. > > > > > > > > > > > > > I thought of having these till the process that performs these > > > > operations exist. So for WALSender, the stats will be valid till it > > > > is not restarted due to some reason or when performed via backend, the > > > > stats will be valid till the corresponding backend exits. > > > > > > > > > > The number of rows of that view could be up to (max_backends + > > > max_wal_senders). Is that right? What if different backends used the > > > same replication slot one after the other? > > > > > > > Yeah, it would be tricky if multiple slots are used by the same > > backend. We could probably track the number of times decoding has > > happened by the session that will probably help us in averaging the > > spill amount. If we think that the aim is to help users to tune > > logical_decoding_work_mem to avoid frequent spilling or streaming then > > how would maintaining at slot level will help? > > Since the logical decoding intermediate files are written at per slots > directory, I thought that corresponding these statistics to > replication slots is also understandable for users. > What I wanted to know is how it will help users to tune logical_decoding_work_mem. Different backends can process from the same slot, so it is not clear how the user will be able to make any meaning out of those stats. OTOH, it is easier to see how to make meaning of these stats if we display them w.r.t. process. Basically, we have spill_count and spill_bytes which can be used to tune logical_decoding_work_mem, and the activity of spilling also happens at process level, so it sounds like a one-to-one mapping. I am not saying we should rule out maintaining them at slot level, but I am trying to see if we can come up with a clear definition. 
> I was thinking > something like pg_stat_logical_replication_slot view which shows > slot_name and statistics of only logical replication slots. The view > always shows rows as many as existing replication slots regardless of > logical decoding being running. I think there is no big difference in > how users use these statistics values between maintaining at slot > level and at logical decoding level. > > In logical replication case, since we generally don’t support setting > different logical_decoding_work_mem per wal senders, every wal sender > will decode the same WAL stream with the same setting, meaning they > will similarly spill intermediate files. > I am not sure this will be true in every case. We do have slot_advance functionality, so some plugin might use that, and that will lead to different files getting spilled. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2020/06/12 12:21, Amit Kapila wrote: > On Thu, Jun 11, 2020 at 7:39 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: >> >> On Thu, 11 Jun 2020 at 20:02, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> >>> On Thu, Jun 11, 2020 at 3:07 PM Masahiko Sawada >>> <masahiko.sawada@2ndquadrant.com> wrote: >>>> >>>> On Thu, 11 Jun 2020 at 18:11, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>>> >>>>> On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada >>>>> <masahiko.sawada@2ndquadrant.com> wrote: >>>>>> >>>>>> On Thu, 11 Jun 2020 at 12:30, Amit Kapila <amit.kapila16@gmail.com> wrote: >>>>>>> >>>>>>> >>>>>>> Now, thinking about this again, I am not sure if these stats are >>>>>>> directly related to slots. These are stats for logical decoding which >>>>>>> can be performed either via WALSender or decoding plugin (via APIs). >>>>>>> So, why not have them displayed in a new view like pg_stat_logical (or >>>>>>> pg_stat_logical_decoding/pg_stat_logical_replication)? In future, we >>>>>>> will need to add similar stats for streaming of in-progress >>>>>>> transactions as well (see patch 0007-Track-statistics-for-streaming at >>>>>>> [1]), so having a separate view for these doesn't sound illogical. >>>>>>> >>>>>> >>>>>> I think we need to decide how long we want to remain these statistics >>>>>> values. That is, if we were to have such pg_stat_logical view, these >>>>>> values would remain until logical decoding finished since I think the >>>>>> view would display only running logical decoding. OTOH, if we were to >>>>>> correspond these stats to slots, these values would remain beyond >>>>>> multiple logical decoding SQL API calls. >>>>>> >>>>> >>>>> I thought of having these till the process that performs these >>>>> operations exist. So for WALSender, the stats will be valid till it >>>>> is not restarted due to some reason or when performed via backend, the >>>>> stats will be valid till the corresponding backend exits. >>>>> >>>> >>>> The number of rows of that view could be up to (max_backends + >>>> max_wal_senders). Is that right? What if different backends used the >>>> same replication slot one after the other? >>>> >>> >>> Yeah, it would be tricky if multiple slots are used by the same >>> backend. We could probably track the number of times decoding has >>> happened by the session that will probably help us in averaging the >>> spill amount. If we think that the aim is to help users to tune >>> logical_decoding_work_mem to avoid frequent spilling or streaming then >>> how would maintaining at slot level will help? >> >> Since the logical decoding intermediate files are written at per slots >> directory, I thought that corresponding these statistics to >> replication slots is also understandable for users. >> > > What I wanted to know is how will it help users to tune > logical_decoding_work_mem? Different backends can process from the > same slot, so it is not clear how user will be able to make any > meaning out of those stats. OTOH, it is easier to see how to make > meaning of these stats if we display them w.r.t process. Basically, > we have spill_count and spill_size which can be used to tune > logical_decoding_work_mem and also the activity of spilling happens at > process level, so it sounds like one-to-one mapping. I am not telling > to rule out maintaining a slot level but trying to see if we can come > up with a clear definition. 
> >> I was thinking > >> something like pg_stat_logical_replication_slot view which shows > >> slot_name and statistics of only logical replication slots. The view > >> always shows rows as many as existing replication slots regardless of > >> logical decoding being running. I think there is no big difference in > >> how users use these statistics values between maintaining at slot > >> level and at logical decoding level. > >> > >> In logical replication case, since we generally don’t support setting > >> different logical_decoding_work_mem per wal senders, every wal sender > >> will decode the same WAL stream with the same setting, meaning they > >> will similarly spill intermediate files. I think we do support that already. We can create multiple replication users with different logical_decoding_work_mem settings, and each walsender can use the logical_decoding_work_mem configured for its user. No?
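For example (role names made up):

    ALTER ROLE walsender_a SET logical_decoding_work_mem = '64MB';
    ALTER ROLE walsender_b SET logical_decoding_work_mem = '1GB';
    -- each walsender then decodes under the setting of the role it connects as

Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION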
On Fri, 12 Jun 2020 at 12:56, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2020/06/12 12:21, Amit Kapila wrote: > > On Thu, Jun 11, 2020 at 7:39 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > >> > >> On Thu, 11 Jun 2020 at 20:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > >>> > >>> On Thu, Jun 11, 2020 at 3:07 PM Masahiko Sawada > >>> <masahiko.sawada@2ndquadrant.com> wrote: > >>>> > >>>> On Thu, 11 Jun 2020 at 18:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > >>>>> > >>>>> On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada > >>>>> <masahiko.sawada@2ndquadrant.com> wrote: > >>>>>> > >>>>>> On Thu, 11 Jun 2020 at 12:30, Amit Kapila <amit.kapila16@gmail.com> wrote: > >>>>>>> > >>>>>>> > >>>>>>> Now, thinking about this again, I am not sure if these stats are > >>>>>>> directly related to slots. These are stats for logical decoding which > >>>>>>> can be performed either via WALSender or decoding plugin (via APIs). > >>>>>>> So, why not have them displayed in a new view like pg_stat_logical (or > >>>>>>> pg_stat_logical_decoding/pg_stat_logical_replication)? In future, we > >>>>>>> will need to add similar stats for streaming of in-progress > >>>>>>> transactions as well (see patch 0007-Track-statistics-for-streaming at > >>>>>>> [1]), so having a separate view for these doesn't sound illogical. > >>>>>>> > >>>>>> > >>>>>> I think we need to decide how long we want to remain these statistics > >>>>>> values. That is, if we were to have such pg_stat_logical view, these > >>>>>> values would remain until logical decoding finished since I think the > >>>>>> view would display only running logical decoding. OTOH, if we were to > >>>>>> correspond these stats to slots, these values would remain beyond > >>>>>> multiple logical decoding SQL API calls. > >>>>>> > >>>>> > >>>>> I thought of having these till the process that performs these > >>>>> operations exist. So for WALSender, the stats will be valid till it > >>>>> is not restarted due to some reason or when performed via backend, the > >>>>> stats will be valid till the corresponding backend exits. > >>>>> > >>>> > >>>> The number of rows of that view could be up to (max_backends + > >>>> max_wal_senders). Is that right? What if different backends used the > >>>> same replication slot one after the other? > >>>> > >>> > >>> Yeah, it would be tricky if multiple slots are used by the same > >>> backend. We could probably track the number of times decoding has > >>> happened by the session that will probably help us in averaging the > >>> spill amount. If we think that the aim is to help users to tune > >>> logical_decoding_work_mem to avoid frequent spilling or streaming then > >>> how would maintaining at slot level will help? > >> > >> Since the logical decoding intermediate files are written at per slots > >> directory, I thought that corresponding these statistics to > >> replication slots is also understandable for users. > >> > > > > What I wanted to know is how will it help users to tune > > logical_decoding_work_mem? Different backends can process from the > > same slot, so it is not clear how user will be able to make any > > meaning out of those stats. OTOH, it is easier to see how to make > > meaning of these stats if we display them w.r.t process. Basically, > > we have spill_count and spill_size which can be used to tune > > logical_decoding_work_mem and also the activity of spilling happens at > > process level, so it sounds like one-to-one mapping. 
> > I am not telling > > to rule out maintaining a slot level but trying to see if we can come > > up with a clear definition. > > > >> I was thinking > >> something like pg_stat_logical_replication_slot view which shows > >> slot_name and statistics of only logical replication slots. The view > >> always shows rows as many as existing replication slots regardless of > >> logical decoding being running. I think there is no big difference in > >> how users use these statistics values between maintaining at slot > >> level and at logical decoding level. > >> > >> In logical replication case, since we generally don’t support setting > >> different logical_decoding_work_mem per wal senders, every wal sender > >> will decode the same WAL stream with the same setting, meaning they > >> will similarly spill intermediate files. > > I was thinking we support that. We can create multiple replication users > with different logical_decoding_work_mem settings. Also each walsender > can use logical_decoding_work_mem configured in its user. No? > Yes, you're right. I had missed that. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 12 Jun 2020 at 12:21, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jun 11, 2020 at 7:39 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 11 Jun 2020 at 20:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Jun 11, 2020 at 3:07 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Thu, 11 Jun 2020 at 18:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Thu, Jun 11, 2020 at 1:46 PM Masahiko Sawada > > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > On Thu, 11 Jun 2020 at 12:30, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > > Now, thinking about this again, I am not sure if these stats are > > > > > > > directly related to slots. These are stats for logical decoding which > > > > > > > can be performed either via WALSender or decoding plugin (via APIs). > > > > > > > So, why not have them displayed in a new view like pg_stat_logical (or > > > > > > > pg_stat_logical_decoding/pg_stat_logical_replication)? In future, we > > > > > > > will need to add similar stats for streaming of in-progress > > > > > > > transactions as well (see patch 0007-Track-statistics-for-streaming at > > > > > > > [1]), so having a separate view for these doesn't sound illogical. > > > > > > > > > > > > > > > > > > > I think we need to decide how long we want to remain these statistics > > > > > > values. That is, if we were to have such pg_stat_logical view, these > > > > > > values would remain until logical decoding finished since I think the > > > > > > view would display only running logical decoding. OTOH, if we were to > > > > > > correspond these stats to slots, these values would remain beyond > > > > > > multiple logical decoding SQL API calls. > > > > > > > > > > > > > > > > I thought of having these till the process that performs these > > > > > operations exist. So for WALSender, the stats will be valid till it > > > > > is not restarted due to some reason or when performed via backend, the > > > > > stats will be valid till the corresponding backend exits. > > > > > > > > > > > > > The number of rows of that view could be up to (max_backends + > > > > max_wal_senders). Is that right? What if different backends used the > > > > same replication slot one after the other? > > > > > > > > > > Yeah, it would be tricky if multiple slots are used by the same > > > backend. We could probably track the number of times decoding has > > > happened by the session that will probably help us in averaging the > > > spill amount. If we think that the aim is to help users to tune > > > logical_decoding_work_mem to avoid frequent spilling or streaming then > > > how would maintaining at slot level will help? > > > > Since the logical decoding intermediate files are written at per slots > > directory, I thought that corresponding these statistics to > > replication slots is also understandable for users. > > > > What I wanted to know is how will it help users to tune > logical_decoding_work_mem? Different backends can process from the > same slot, so it is not clear how user will be able to make any > meaning out of those stats. I thought that the user needs to constantly monitor them while one process is executing logical decoding, and watch the increments. I might not fully understand, but I guess the same is true for displaying them w.r.t. process. 
Since a process can do logical decoding several times using the same slot with a different setting, the user will need to monitor them several times. > OTOH, it is easier to see how to make > meaning of these stats if we display them w.r.t process. Basically, > we have spill_count and spill_size which can be used to tune > logical_decoding_work_mem and also the activity of spilling happens at > process level, so it sounds like one-to-one mapping. Displaying them w.r.t. process also seems a good idea, but I'm still unclear on what to display and how long these values would remain valid. The view would have the following columns, for example:

* pid
* slot_name
* spill_txns
* spill_count
* spill_bytes
* exec_count
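A rough sketch of the shape I have in mind; the view and column names are only placeholders:

    -- hypothetical per-process view; one row per (pid, slot) being decoded
    SELECT pid, slot_name, spill_txns, spill_count, spill_bytes, exec_count
      FROM pg_stat_logical;    -- does not exist; illustration only

Regards,
--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services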
On Fri, Jun 12, 2020 at 11:20 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 12 Jun 2020 at 12:21, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > Since the logical decoding intermediate files are written at per slots > > > directory, I thought that corresponding these statistics to > > > replication slots is also understandable for users. > > > > > > > What I wanted to know is how will it help users to tune > > logical_decoding_work_mem? Different backends can process from the > > same slot, so it is not clear how user will be able to make any > > meaning out of those stats. > > I thought that the user needs to constantly monitor them during one > process is executing logical decoding and to see the increments. I > might not fully understand but I guess the same is true for displaying > them w.r.t. process. Since a process can do logical decoding several > times using the same slot with a different setting, the user will need > to monitor them several times. > Yeah, I think we might not be able to get an exact measure, but if we divide the total size spilled by exec_count, we will get some rough idea of what logical_decoding_work_mem should be for that particular session. For example, if logical_decoding_work_mem is 100 bytes for a particular backend and the size spilled by that backend is 100 bytes, then I think you can roughly set it to 200 bytes if you want to avoid spilling. Similarly, one can compute its average value over multiple executions. Does this make sense to you? > > OTOH, it is easier to see how to make > > meaning of these stats if we display them w.r.t process. Basically, > > we have spill_count and spill_size which can be used to tune > > logical_decoding_work_mem and also the activity of spilling happens at > > process level, so it sounds like one-to-one mapping. > > Displaying them w.r.t process also seems a good idea but I'm still > unclear what to display and how long these values are valid. > I feel they would be valid for the lifetime of the process if we want to display the values at process level, but I am open to hearing others' (including your) views on this. > > The view > will have the following columns for example? > > * pid > * slot_name > * spill_txns > * spill_count > * spill_bytes > * exec_count > Yeah, these appear to be what I have in mind. Note that we can have multiple entries for the same pid here because of slot_name. There is some value in displaying slot_name, but I am not completely sure that it is a good idea; that said, I am fine with it if you have a reason to include slot_name.
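In SQL terms, the rough calculation I mean would be something like this, against the hypothetical view sketched above (illustration only):

    -- average bytes spilled per decode; a starting point for sizing
    -- logical_decoding_work_mem for that session
    SELECT pid, slot_name,
           spill_bytes / NULLIF(exec_count, 0) AS avg_spill_bytes_per_exec
      FROM pg_stat_logical;    -- hypothetical view, placeholder name

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com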
On Fri, Jun 12, 2020 at 10:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jun 12, 2020 at 11:20 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Fri, 12 Jun 2020 at 12:21, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > Since the logical decoding intermediate files are written at per slots
> > > directory, I thought that corresponding these statistics to
> > > replication slots is also understandable for users.
> > >
> >
> > What I wanted to know is how will it help users to tune
> > logical_decoding_work_mem? Different backends can process from the
> > same slot, so it is not clear how user will be able to make any
> > meaning out of those stats.
>
> I thought that the user needs to constantly monitor them during one
> process is executing logical decoding and to see the increments. I
> might not fully understand but I guess the same is true for displaying
> them w.r.t. process. Since a process can do logical decoding several
> times using the same slot with a different setting, the user will need
> to monitor them several times.
>
> Yeah, I think we might not be able to get an exact measure but if we
> divide total_size spilled by exec_count, we will get some rough idea
> of what should be the logical_decoding_work_mem for that particular
> session. For ex., consider that logical_decoding_work_mem is 100 bytes
> for a particular backend and the size spilled by that backend is 100;
> then I think you can roughly keep it to 200 bytes if you want to avoid
> spilling. Similarly one can compute its average value over multiple
> executions. Does this make sense to you?
The thing that becomes really interesting is to analyze this across time. For example, to identify patterns where it always spills at the same time as certain other things are happening. For that use case, having a "predictable persistence" is important. You may not be able to afford setting logical_decoding_work_mem high enough to cover every possible scenario (if you did, then we would basically not need the spilling..), so you want to track down, in relation to the rest of your application, exactly when and how this is happening.
> > OTOH, it is easier to see how to make
> > meaning of these stats if we display them w.r.t process. Basically,
> > we have spill_count and spill_size which can be used to tune
> > logical_decoding_work_mem and also the activity of spilling happens at
> > process level, so it sounds like one-to-one mapping.
>
> Displaying them w.r.t process also seems a good idea but I'm still
> unclear what to display and how long these values are valid.
>
> I feel it should be till the lifetime of a process if we want to display
> the values at process level, but I am open to hearing others' (including
> your) views on this.
The problem with "lifetime of a process" is that it's not predictable. A replication process might "bounce" for any reason, and it is normally not a problem. But if you suddenly lose your stats when you do that, it starts to matter a lot more. Especially when you don't know if it bounced. (Sure you can look at the backend_start time, but that adds a whole different set of complexities).
> The view
> will have the following columns for example?
>
> * pid
> * slot_name
> * spill_txns
> * spill_count
> * spill_bytes
> * exec_count
>
> Yeah, these appear to be what I have in mind. Note that we can have
> multiple entries for the same pid here because of slotname; there is
> some value in displaying slotname but I am not completely sure if that
> is a good idea; I am fine if you have a reason to include slotname?
Well, it's a general view, so you can always GROUP BY that away at read time if you want?
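As an illustration of reading such a view with GROUP BY, assuming the columns listed above (pid, slot_name, spill_txns, spill_count, spill_bytes, exec_count) in a hypothetical view named pg_stat_logical_decoding:

  -- Collapse the per-pid rows into per-slot totals when reading.
  SELECT slot_name,
         sum(spill_txns)  AS spill_txns,
         sum(spill_count) AS spill_count,
         sum(spill_bytes) AS spill_bytes
    FROM pg_stat_logical_decoding
   GROUP BY slot_name;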
On Fri, Jun 12, 2020 at 6:11 PM Magnus Hagander <magnus@hagander.net> wrote: > > On Fri, Jun 12, 2020 at 10:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> > > > The problem with "lifetime of a process" is that it's not predictable. A replication process might "bounce" for any reason, and it is normally not a problem. But if you suddenly lose your stats when you do that, it starts to matter a lot more. Especially when you don't know if it bounced. (Sure you can look at the backend_start time, but that adds a whole different set of complexities). > It is not clear to me what is a good way to display the stats for a process that has exited or bounced due to whatever reason. OTOH, if we just display per-slot stats, it is difficult to imagine how the user can make any sense out of it or in other words how such stats can be useful to users. > > > > The view >> >> > will have the following columns for example? >> > >> > * pid >> > * slot_name >> > * spill_txns >> > * spill_count >> > * spill_bytes >> > * exec_count >> >> Yeah, these appear to be what I have in mind. Note that we can have >> multiple entries for the same pid here because of slotname; there is >> some value in displaying slotname but I am not completely sure if that >> is a good idea; I am fine if you have a reason to include slotname? > > > Well, it's a general view, so you can always GROUP BY that away at read time if you want? > Okay, that is a valid point. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2020/06/13 14:23, Amit Kapila wrote: > On Fri, Jun 12, 2020 at 6:11 PM Magnus Hagander <magnus@hagander.net> wrote: >> >> On Fri, Jun 12, 2020 at 10:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >>> >> >> >> The problem with "lifetime of a process" is that it's not predictable. A replication process might "bounce" for any reason, and it is normally not a problem. But if you suddenly lose your stats when you do that, it starts to matter a lot more. Especially when you don't know if it bounced. (Sure you can look at the backend_start time, but that adds a whole different set of complexities). >> > > It is not clear to me what is a good way to display the stats for a > process that has exited or bounced due to whatever reason. OTOH, if > we just display per-slot stats, it is difficult to imagine how the > user can make any sense out of it or in other words how such stats can > be useful to users. If we allow users to set logical_decoding_work_mem per slot, maybe the users can tune it directly from the stats? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Sat, Jun 13, 2020 at 5:07 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > On 2020/06/13 14:23, Amit Kapila wrote: > > On Fri, Jun 12, 2020 at 6:11 PM Magnus Hagander <magnus@hagander.net> wrote: > >> > >> On Fri, Jun 12, 2020 at 10:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >>> > >> > >> > >> The problem with "lifetime of a process" is that it's not predictable. A replication process might "bounce" for any reason, and it is normally not a problem. But if you suddenly lose your stats when you do that, it starts to matter a lot more. Especially when you don't know if it bounced. (Sure you can look at the backend_start time, but that adds a whole different set of complexities). > >> > > > > It is not clear to me what is a good way to display the stats for a > > process that has exited or bounced due to whatever reason. OTOH, if > > we just display per-slot stats, it is difficult to imagine how the > > user can make any sense out of it or in other words how such stats can > > be useful to users. > > If we allow users to set logical_decoding_work_mem per slot, > maybe the users can tune it directly from the stats? > How will it behave when the same slot is used from multiple sessions? I think it will be difficult to make sense of the stats for slots unless we also somehow see which process has led to those stats, and if we do so then there won't be much difference w.r.t what we can do now? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, 13 Jun 2020 at 14:23, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jun 12, 2020 at 6:11 PM Magnus Hagander <magnus@hagander.net> wrote: > > > > On Fri, Jun 12, 2020 at 10:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > > > > > > The problem with "lifetime of a process" is that it's not predictable. A replication process might "bounce" for any reason, and it is normally not a problem. But if you suddenly lose your stats when you do that, it starts to matter a lot more. Especially when you don't know if it bounced. (Sure you can look at the backend_start time, but that adds a whole different set of complexities). > > > > It is not clear to me what is a good way to display the stats for a > process that has exited or bounced due to whatever reason. OTOH, if > we just display per-slot stats, it is difficult to imagine how the > user can make any sense out of it or in other words how such stats can > be useful to users. If we have the reset function, the user can reset before doing logical decoding so that the user can use the stats directly. Or I think we can automatically reset the stats when logical decoding is performed with different logical_decoding_work_mem value than the previous one. Either way, since the stats correspond to the logical decoding using the same slot with the same parameter value, the user can use them directly. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jun 17, 2020 at 1:34 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sat, 13 Jun 2020 at 14:23, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jun 12, 2020 at 6:11 PM Magnus Hagander <magnus@hagander.net> wrote: > > > > > > On Fri, Jun 12, 2020 at 10:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > >> > > > > > > > > > The problem with "lifetime of a process" is that it's not predictable. A replication process might "bounce" for any reason, and it is normally not a problem. But if you suddenly lose your stats when you do that, it starts to matter a lot more. Especially when you don't know if it bounced. (Sure you can look at the backend_start time, but that adds a whole different set of complexities). > > > > > > It is not clear to me what is a good way to display the stats for a > > process that has exited or bounced due to whatever reason. OTOH, if > > we just display per-slot stats, it is difficult to imagine how the > > user can make any sense out of it or in other words how such stats can > > be useful to users. > > If we have the reset function, the user can reset before doing logical > decoding so that the user can use the stats directly. Or I think we > can automatically reset the stats when logical decoding is performed > with different logical_decoding_work_mem value than the previous one. > I had written above in the context of persisting these stats. I mean to say if the process has bounced or server has restarted then the previous stats might not make much sense because we were planning to use pid [1], so the stats from process that has exited might not make much sense or do you think that is okay? If we don't want to persist and the lifetime of these stats is till the process is alive then we are fine. [1] - https://www.postgresql.org/message-id/CA%2Bfd4k5nqeFdhpnCULpTh9TR%2B15rHZSbz0SDC6sZhr_v99SeKA%40mail.gmail.com With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, 17 Jun 2020 at 20:14, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jun 17, 2020 at 1:34 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Sat, 13 Jun 2020 at 14:23, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Jun 12, 2020 at 6:11 PM Magnus Hagander <magnus@hagander.net> wrote: > > > > > > > > On Fri, Jun 12, 2020 at 10:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > >> > > > > > > > > > > > > The problem with "lifetime of a process" is that it's not predictable. A replication process might "bounce" for any reason, and it is normally not a problem. But if you suddenly lose your stats when you do that, it starts to matter a lot more. Especially when you don't know if it bounced. (Sure you can look at the backend_start time, but that adds a whole different set of complexities). > > > > > > > > > > It is not clear to me what is a good way to display the stats for a > > > process that has exited or bounced due to whatever reason. OTOH, if > > > we just display per-slot stats, it is difficult to imagine how the > > > user can make any sense out of it or in other words how such stats can > > > be useful to users. > > > > If we have the reset function, the user can reset before doing logical > > decoding so that the user can use the stats directly. Or I think we > > can automatically reset the stats when logical decoding is performed > > with different logical_decoding_work_mem value than the previous one. > > > > I had written above in the context of persisting these stats. I mean > to say if the process has bounced or server has restarted then the > previous stats might not make much sense because we were planning to > use pid [1], so the stats from process that has exited might not make > much sense or do you think that is okay? If we don't want to persist > and the lifetime of these stats is till the process is alive then we > are fine. > Sorry for confusing you. The above my idea is about having the stats per slots. That is, we add spill_txns, spill_count and spill_bytes to pg_replication_slots or a new view pg_stat_logical_replication_slots with some columns: slot_name plus these stats columns and stats_reset. The idea is that the stats values accumulate until either the slot is dropped, the server crashed, the user executes the reset function, or logical decoding is performed with different logical_decoding_work_mem value than the previous time. In other words, the stats values are reset in either case. That way, I think the stats values always correspond to logical decoding using the same slot with the same logical_decoding_work_mem value. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
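For illustration, the per-slot view shape sketched in that mail would be queried like this (pg_stat_logical_replication_slots and its columns are the proposal above, not an existing view):

  -- One row per replication slot; counters are cumulative until one of
  -- the reset conditions listed above occurs.
  SELECT slot_name, spill_txns, spill_count, spill_bytes, stats_reset
    FROM pg_stat_logical_replication_slots;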
On Thu, Jun 18, 2020 at 8:01 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 17 Jun 2020 at 20:14, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I had written above in the context of persisting these stats. I mean > > to say if the process has bounced or server has restarted then the > > previous stats might not make much sense because we were planning to > > use pid [1], so the stats from process that has exited might not make > > much sense or do you think that is okay? If we don't want to persist > > and the lifetime of these stats is till the process is alive then we > > are fine. > > > > Sorry for confusing you. The above my idea is about having the stats > per slots. That is, we add spill_txns, spill_count and spill_bytes to > pg_replication_slots or a new view pg_stat_logical_replication_slots > with some columns: slot_name plus these stats columns and stats_reset. > The idea is that the stats values accumulate until either the slot is > dropped, the server crashed, the user executes the reset function, or > logical decoding is performed with different logical_decoding_work_mem > value than the previous time. In other words, the stats values are > reset in either case. That way, I think the stats values always > correspond to logical decoding using the same slot with the same > logical_decoding_work_mem value. > What if the decoding has been performed by multiple backends using the same slot? In that case, it will be difficult to make the judgment for the value of logical_decoding_work_mem based on stats. It would make sense if we provide a way to set logical_decoding_work_mem for a slot but not sure if that is better than what we have now. What problems do we see in displaying these for each process? I think users might want to see the stats for the exited processes or after server restart but I think both of those are not even possible today. I think the stats are available till the corresponding WALSender process is active. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi, Sorry for neglecting this thread for the last couple days ... In general, I agree it's somewhat unfortunate the stats are reset when the walsender exits. This was mostly fine for tuning of the spilling (change value -> restart -> see stats) but for proper monitoring this is somewhat problematic. I simply considered these fields somewhat similar to lag monitoring, not from the "monitoring" POV. On Thu, Jun 11, 2020 at 11:09:00PM +0900, Masahiko Sawada wrote: > > ... > >Since the logical decoding intermediate files are written at per slots >directory, I thought that corresponding these statistics to >replication slots is also understandable for users. I was thinking >something like pg_stat_logical_replication_slot view which shows >slot_name and statistics of only logical replication slots. The view >always shows rows as many as existing replication slots regardless of >logical decoding being running. I think there is no big difference in >how users use these statistics values between maintaining at slot >level and at logical decoding level. > >In logical replication case, since we generally don’t support setting >different logical_decoding_work_mem per wal senders, every wal sender >will decode the same WAL stream with the same setting, meaning they >will similarly spill intermediate files. Maybe the same is true >statistics of streaming. So having these statistics per logical >replication might not help as of now. > I think the idea to track these stats per replication slot (rather than per walsender) is the right approach. We should extend statistics collector to keep one entry per replication slot and have a new stats view called e.g. pg_stat_replication_slots, which could be reset just like other stats in the collector. I don't quite understand the discussion about different backends using logical_decoding_work_mem - why would this be an issue? Surely we have this exact issue e.g. with tracking index vs. sequential scans and GUCs like random_page_cost. That can change over time too, different backends may use different values, and yet we don't worry about resetting the number of index scans for a table etc. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
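As a sketch of how the collector-backed approach would look to users, following the naming suggested in that mail (both the view and the reset target below are assumptions at this stage, not implemented anywhere):

  -- Cumulative per-slot stats that would survive walsender restarts:
  SELECT slot_name, spill_txns, spill_count, spill_bytes
    FROM pg_stat_replication_slots;

  -- Reset in the same spirit as other collector-kept stats; the exact
  -- reset interface is an open question in this thread.
  SELECT pg_stat_reset_shared('replication_slots');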
On Thu, Jun 18, 2020 at 12:21:17PM +0530, Amit Kapila wrote: >On Thu, Jun 18, 2020 at 8:01 AM Masahiko Sawada ><masahiko.sawada@2ndquadrant.com> wrote: >> >> On Wed, 17 Jun 2020 at 20:14, Amit Kapila <amit.kapila16@gmail.com> wrote: >> > >> > >> > I had written above in the context of persisting these stats. I mean >> > to say if the process has bounced or server has restarted then the >> > previous stats might not make much sense because we were planning to >> > use pid [1], so the stats from process that has exited might not make >> > much sense or do you think that is okay? If we don't want to persist >> > and the lifetime of these stats is till the process is alive then we >> > are fine. >> > >> >> Sorry for confusing you. The above my idea is about having the stats >> per slots. That is, we add spill_txns, spill_count and spill_bytes to >> pg_replication_slots or a new view pg_stat_logical_replication_slots >> with some columns: slot_name plus these stats columns and stats_reset. >> The idea is that the stats values accumulate until either the slot is >> dropped, the server crashed, the user executes the reset function, or >> logical decoding is performed with different logical_decoding_work_mem >> value than the previous time. In other words, the stats values are >> reset in either case. That way, I think the stats values always >> correspond to logical decoding using the same slot with the same >> logical_decoding_work_mem value. >> > >What if the decoding has been performed by multiple backends using the >same slot? In that case, it will be difficult to make the judgment >for the value of logical_decoding_work_mem based on stats. It would >make sense if we provide a way to set logical_decoding_work_mem for a >slot but not sure if that is better than what we have now. > >What problems do we see in displaying these for each process? I think >users might want to see the stats for the exited processes or after >server restart but I think both of those are not even possible today. >I think the stats are available till the corresponding WALSender >process is active. > I don't quite see what the problem is. We're in this exact position with many other stats we track and various GUCs. If you decide to tune the setting for a particular slot, you simply need to be careful which backends decode the slot and what GUC values they used. But I don't think this situation (multiple backends decoding the same slot with different logical_decoding_work_mem values) is very common. In most cases the backends/walsenders will all use the same value. If you change that, you better remember that. I really think we should not be inventing something that automatically resets the stats when someone happens to change the GUC. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Jun 21, 2020 at 3:27 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Thu, Jun 18, 2020 at 12:21:17PM +0530, Amit Kapila wrote: > >On Thu, Jun 18, 2020 at 8:01 AM Masahiko Sawada > ><masahiko.sawada@2ndquadrant.com> wrote: > >> > >> On Wed, 17 Jun 2020 at 20:14, Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > > >> > > >> > I had written above in the context of persisting these stats. I mean > >> > to say if the process has bounced or server has restarted then the > >> > previous stats might not make much sense because we were planning to > >> > use pid [1], so the stats from process that has exited might not make > >> > much sense or do you think that is okay? If we don't want to persist > >> > and the lifetime of these stats is till the process is alive then we > >> > are fine. > >> > > >> > >> Sorry for confusing you. The above my idea is about having the stats > >> per slots. That is, we add spill_txns, spill_count and spill_bytes to > >> pg_replication_slots or a new view pg_stat_logical_replication_slots > >> with some columns: slot_name plus these stats columns and stats_reset. > >> The idea is that the stats values accumulate until either the slot is > >> dropped, the server crashed, the user executes the reset function, or > >> logical decoding is performed with different logical_decoding_work_mem > >> value than the previous time. In other words, the stats values are > >> reset in either case. That way, I think the stats values always > >> correspond to logical decoding using the same slot with the same > >> logical_decoding_work_mem value. > >> > > > >What if the decoding has been performed by multiple backends using the > >same slot? In that case, it will be difficult to make the judgment > >for the value of logical_decoding_work_mem based on stats. It would > >make sense if we provide a way to set logical_decoding_work_mem for a > >slot but not sure if that is better than what we have now. > > > >What problems do we see in displaying these for each process? I think > >users might want to see the stats for the exited processes or after > >server restart but I think both of those are not even possible today. > >I think the stats are available till the corresponding WALSender > >process is active. > > > > I don't quite see what the problem is. We're in this exact position with > many other stats we track and various GUCs. If you decide to tune the > setting for a particular slot, you simply need to be careful which > backends decode the slot and what GUC values they used. > What problem do you if we allow it to display per-process (WALSender or backend)? They are incurred by the WALSender or by backends so displaying them accordingly seems more straightforward and logical to me. As of now, we don't allow it to be set for a slot, so it won't be convenient for the user to tune it per slot. I think we can allow to set it per-slot but not sure if there is any benefit for the same. > I really think we should not be inventing something that automatically > resets the stats when someone happens to change the GUC. > I agree with that. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Jun 22, 2020 at 8:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Jun 21, 2020 at 3:27 AM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > > > On Thu, Jun 18, 2020 at 12:21:17PM +0530, Amit Kapila wrote: > > >On Thu, Jun 18, 2020 at 8:01 AM Masahiko Sawada > > ><masahiko.sawada@2ndquadrant.com> wrote: > > >> > > >> On Wed, 17 Jun 2020 at 20:14, Amit Kapila <amit.kapila16@gmail.com> wrote: > > >> > > > >> > > > >> > I had written above in the context of persisting these stats. I mean > > >> > to say if the process has bounced or server has restarted then the > > >> > previous stats might not make much sense because we were planning to > > >> > use pid [1], so the stats from process that has exited might not make > > >> > much sense or do you think that is okay? If we don't want to persist > > >> > and the lifetime of these stats is till the process is alive then we > > >> > are fine. > > >> > > > >> > > >> Sorry for confusing you. The above my idea is about having the stats > > >> per slots. That is, we add spill_txns, spill_count and spill_bytes to > > >> pg_replication_slots or a new view pg_stat_logical_replication_slots > > >> with some columns: slot_name plus these stats columns and stats_reset. > > >> The idea is that the stats values accumulate until either the slot is > > >> dropped, the server crashed, the user executes the reset function, or > > >> logical decoding is performed with different logical_decoding_work_mem > > >> value than the previous time. In other words, the stats values are > > >> reset in either case. That way, I think the stats values always > > >> correspond to logical decoding using the same slot with the same > > >> logical_decoding_work_mem value. > > >> > > > > > >What if the decoding has been performed by multiple backends using the > > >same slot? In that case, it will be difficult to make the judgment > > >for the value of logical_decoding_work_mem based on stats. It would > > >make sense if we provide a way to set logical_decoding_work_mem for a > > >slot but not sure if that is better than what we have now. > > > > > >What problems do we see in displaying these for each process? I think > > >users might want to see the stats for the exited processes or after > > >server restart but I think both of those are not even possible today. > > >I think the stats are available till the corresponding WALSender > > >process is active. > > > > > > > I don't quite see what the problem is. We're in this exact position with > > many other stats we track and various GUCs. If you decide to tune the > > setting for a particular slot, you simply need to be careful which > > backends decode the slot and what GUC values they used. > > > > What problem do you if we allow it to display per-process (WALSender > or backend)? They are incurred by the WALSender or by backends so > displaying them accordingly seems more straightforward and logical to > me. > > As of now, we don't allow it to be set for a slot, so it won't be > convenient for the user to tune it per slot. I think we can allow to > set it per-slot but not sure if there is any benefit for the same. > If we display stats as discussed in email [1] (pid, slot_name, spill_txns, spill_count, etc.), then we can even find the stats w.r.t each slot. [1] - https://www.postgresql.org/message-id/CA%2Bfd4k5nqeFdhpnCULpTh9TR%2B15rHZSbz0SDC6sZhr_v99SeKA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sun, 21 Jun 2020 at 06:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Thu, Jun 18, 2020 at 12:21:17PM +0530, Amit Kapila wrote: > >On Thu, Jun 18, 2020 at 8:01 AM Masahiko Sawada > ><masahiko.sawada@2ndquadrant.com> wrote: > >> > >> On Wed, 17 Jun 2020 at 20:14, Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > > >> > > >> > I had written above in the context of persisting these stats. I mean > >> > to say if the process has bounced or server has restarted then the > >> > previous stats might not make much sense because we were planning to > >> > use pid [1], so the stats from process that has exited might not make > >> > much sense or do you think that is okay? If we don't want to persist > >> > and the lifetime of these stats is till the process is alive then we > >> > are fine. > >> > > >> > >> Sorry for confusing you. The above my idea is about having the stats > >> per slots. That is, we add spill_txns, spill_count and spill_bytes to > >> pg_replication_slots or a new view pg_stat_logical_replication_slots > >> with some columns: slot_name plus these stats columns and stats_reset. > >> The idea is that the stats values accumulate until either the slot is > >> dropped, the server crashed, the user executes the reset function, or > >> logical decoding is performed with different logical_decoding_work_mem > >> value than the previous time. In other words, the stats values are > >> reset in either case. That way, I think the stats values always > >> correspond to logical decoding using the same slot with the same > >> logical_decoding_work_mem value. > >> > > > >What if the decoding has been performed by multiple backends using the > >same slot? In that case, it will be difficult to make the judgment > >for the value of logical_decoding_work_mem based on stats. It would > >make sense if we provide a way to set logical_decoding_work_mem for a > >slot but not sure if that is better than what we have now. > > I thought that the stats are relevant to what logical_decoding_work_mem value was but not with who performed logical decoding. So even if multiple backends perform logical decoding using the same slot, the user can directly use stats as long as logical_decoding_work_mem value doesn’t change. > >What problems do we see in displaying these for each process? I think > >users might want to see the stats for the exited processes or after > >server restart but I think both of those are not even possible today. > >I think the stats are available till the corresponding WALSender > >process is active. I might want to see the stats for the exited processes or after server restart. But I'm inclined to agree with displaying the stats per process if the stats are displayed on a separate view (e.g. pg_stat_replication_slots). > > > > I don't quite see what the problem is. We're in this exact position with > many other stats we track and various GUCs. If you decide to tune the > setting for a particular slot, you simply need to be careful which > backends decode the slot and what GUC values they used. > > But I don't think this situation (multiple backends decoding the same > slot with different logical_decoding_work_mem values) is very common. In > most cases the backends/walsenders will all use the same value. If you > change that, you better remember that. > > I really think we should not be inventing something that automatically > resets the stats when someone happens to change the GUC. Agreed. 
But what I thought is simpler: storing the logical_decoding_work_mem value along with the stats in a logical replication slot, and resetting the stats if the logical_decoding_work_mem value is different from the stored value when logical decoding is performed. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sun, 21 Jun 2020 at 06:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > > > > > > >What if the decoding has been performed by multiple backends using the > > >same slot? In that case, it will be difficult to make the judgment > > >for the value of logical_decoding_work_mem based on stats. It would > > >make sense if we provide a way to set logical_decoding_work_mem for a > > >slot but not sure if that is better than what we have now. > > > > > I thought that the stats are relevant to what > logical_decoding_work_mem value was but not with who performed logical > decoding. So even if multiple backends perform logical decoding using > the same slot, the user can directly use stats as long as > logical_decoding_work_mem value doesn’t change. > I think if you maintain these stats at the slot level, you probably need to use spinlock or atomic ops in order to update those as slots can be used from multiple backends whereas currently, we don't need that. > > >What problems do we see in displaying these for each process? I think > > >users might want to see the stats for the exited processes or after > > >server restart but I think both of those are not even possible today. > > >I think the stats are available till the corresponding WALSender > > >process is active. > > I might want to see the stats for the exited processes or after server > restart. But I'm inclined to agree with displaying the stats per > process if the stats are displayed on a separate view (e.g. > pg_stat_replication_slots). > Yeah, as told previously, this makes more sense to me. Do you think we should try to write a POC patch using a per-process entry approach and see what difficulties we are facing and does it give the stats in a way we are imagining but OTOH, we can wait for some more to see if there is clear winner approach here? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote: >On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada ><masahiko.sawada@2ndquadrant.com> wrote: >> >> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> > >> > > >> > >What if the decoding has been performed by multiple backends using the >> > >same slot? In that case, it will be difficult to make the judgment >> > >for the value of logical_decoding_work_mem based on stats. It would >> > >make sense if we provide a way to set logical_decoding_work_mem for a >> > >slot but not sure if that is better than what we have now. >> > > >> >> I thought that the stats are relevant to what >> logical_decoding_work_mem value was but not with who performed logical >> decoding. So even if multiple backends perform logical decoding using >> the same slot, the user can directly use stats as long as >> logical_decoding_work_mem value doesn’t change. >> > >I think if you maintain these stats at the slot level, you probably >need to use spinlock or atomic ops in order to update those as slots >can be used from multiple backends whereas currently, we don't need >that. IMHO storing the stats in the slot itself is a bad idea. We have the statistics collector for exactly this purpose, and it's receiving data over UDP without any extra locking etc. > >> > >What problems do we see in displaying these for each process? I think >> > >users might want to see the stats for the exited processes or after >> > >server restart but I think both of those are not even possible today. >> > >I think the stats are available till the corresponding WALSender >> > >process is active. >> >> I might want to see the stats for the exited processes or after server >> restart. But I'm inclined to agree with displaying the stats per >> process if the stats are displayed on a separate view (e.g. >> pg_stat_replication_slots). >> > >Yeah, as told previously, this makes more sense to me. > >Do you think we should try to write a POC patch using a per-process >entry approach and see what difficulties we are facing and does it >give the stats in a way we are imagining but OTOH, we can wait for >some more to see if there is clear winner approach here? > I may be missing something obvious, but I still see no point in tracking per-process stats. We don't have that for other stats, and I'm not sure how common is the scenario when a given slot is decoded by many backends. I'd say vast majority of cases are simply running decoding from a walsender, which may occasionally restart, but I doubt the users are interested in per-pid data - they probably want aggregated data. Can someone explain a plausible scenario for which tracking per-process stats would be needed, and simply computing deltas would not work? How will you know which old PID is which, what will you do when a PID is reused, and so on? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
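To make the delta approach concrete: with slot-keyed cumulative counters, periodic sampling plus subtraction is enough. A minimal sketch, assuming the per-slot view discussed above existed:

  -- Take a snapshot of the cumulative counters.
  CREATE TABLE IF NOT EXISTS spill_samples AS
    SELECT now() AS sampled_at, slot_name, spill_bytes
      FROM pg_stat_replication_slots;  -- hypothetical view

  -- Some time later, take another snapshot.
  INSERT INTO spill_samples
    SELECT now(), slot_name, spill_bytes
      FROM pg_stat_replication_slots;

  -- Growth per slot between any two snapshots.
  SELECT cur.slot_name, cur.spill_bytes - prev.spill_bytes AS bytes_spilled
    FROM spill_samples cur
    JOIN spill_samples prev USING (slot_name)
   WHERE prev.sampled_at < cur.sampled_at;

Keyed by slot name this works across restarts; keyed by pid, the same join would silently pair rows from unrelated processes once a pid is reused, which is exactly the objection raised here.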
On Tue, Jun 23, 2020 at 3:48 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote: > >On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada > ><masahiko.sawada@2ndquadrant.com> wrote: > >> > >> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > >> > > >> > > > >> > >What if the decoding has been performed by multiple backends using the > >> > >same slot? In that case, it will be difficult to make the judgment > >> > >for the value of logical_decoding_work_mem based on stats. It would > >> > >make sense if we provide a way to set logical_decoding_work_mem for a > >> > >slot but not sure if that is better than what we have now. > >> > > > >> > >> I thought that the stats are relevant to what > >> logical_decoding_work_mem value was but not with who performed logical > >> decoding. So even if multiple backends perform logical decoding using > >> the same slot, the user can directly use stats as long as > >> logical_decoding_work_mem value doesn’t change. > >> > > > >I think if you maintain these stats at the slot level, you probably > >need to use spinlock or atomic ops in order to update those as slots > >can be used from multiple backends whereas currently, we don't need > >that. > > IMHO storing the stats in the slot itself is a bad idea. We have the > statistics collector for exactly this purpose, and it's receiving data > over UDP without any extra locking etc. > > > >> > >What problems do we see in displaying these for each process? I think > >> > >users might want to see the stats for the exited processes or after > >> > >server restart but I think both of those are not even possible today. > >> > >I think the stats are available till the corresponding WALSender > >> > >process is active. > >> > >> I might want to see the stats for the exited processes or after server > >> restart. But I'm inclined to agree with displaying the stats per > >> process if the stats are displayed on a separate view (e.g. > >> pg_stat_replication_slots). > >> > > > >Yeah, as told previously, this makes more sense to me. > > > >Do you think we should try to write a POC patch using a per-process > >entry approach and see what difficulties we are facing and does it > >give the stats in a way we are imagining but OTOH, we can wait for > >some more to see if there is clear winner approach here? > > > > I may be missing something obvious, but I still see no point in tracking > per-process stats. We don't have that for other stats, > Won't we display per-process information in pg_stat_replication? These stats are currently displayed in that view and one of the shortcomings was that it won't display these stats when we decode via backend. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 23, 2020 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jun 23, 2020 at 3:48 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > > > On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote: > > >On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada > > ><masahiko.sawada@2ndquadrant.com> wrote: > > >> > > >> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > >> > > > >> > > > > >> > >What if the decoding has been performed by multiple backends using the > > >> > >same slot? In that case, it will be difficult to make the judgment > > >> > >for the value of logical_decoding_work_mem based on stats. It would > > >> > >make sense if we provide a way to set logical_decoding_work_mem for a > > >> > >slot but not sure if that is better than what we have now. > > >> > > > > >> > > >> I thought that the stats are relevant to what > > >> logical_decoding_work_mem value was but not with who performed logical > > >> decoding. So even if multiple backends perform logical decoding using > > >> the same slot, the user can directly use stats as long as > > >> logical_decoding_work_mem value doesn’t change. > > >> Today, I thought about it again, and if we consider the point that logical_decoding_work_mem value doesn’t change much then having the stats at slot-level would also allow computing logical_decoding_work_mem based on stats. Do you think it is a reasonable assumption that users won't change logical_decoding_work_mem for different processes (WALSender, etc.)? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, 25 Jun 2020 at 19:35, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jun 23, 2020 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Jun 23, 2020 at 3:48 PM Tomas Vondra > > <tomas.vondra@2ndquadrant.com> wrote: > > > > > > On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote: > > > >On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada > > > ><masahiko.sawada@2ndquadrant.com> wrote: > > > >> > > > >> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > > >> > > > > >> > > > > > >> > >What if the decoding has been performed by multiple backends using the > > > >> > >same slot? In that case, it will be difficult to make the judgment > > > >> > >for the value of logical_decoding_work_mem based on stats. It would > > > >> > >make sense if we provide a way to set logical_decoding_work_mem for a > > > >> > >slot but not sure if that is better than what we have now. > > > >> > > > > > >> > > > >> I thought that the stats are relevant to what > > > >> logical_decoding_work_mem value was but not with who performed logical > > > >> decoding. So even if multiple backends perform logical decoding using > > > >> the same slot, the user can directly use stats as long as > > > >> logical_decoding_work_mem value doesn’t change. > > > >> > > Today, I thought about it again, and if we consider the point that > logical_decoding_work_mem value doesn’t change much then having the > stats at slot-level would also allow computing > logical_decoding_work_mem based on stats. Do you think it is a > reasonable assumption that users won't change > logical_decoding_work_mem for different processes (WALSender, etc.)? FWIW, if we use logical_decoding_work_mem as a threshold of starting of sending changes to a subscriber, I think there might be use cases where the user wants to set different logical_decoding_work_mem values to different wal senders. For example, setting a lower value to minimize the latency of synchronous logical replication to a near-site whereas setting a large value to minimize the amount of data sent to a far site. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
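For what it's worth, something close to the near-site/far-site split is already expressible without per-slot settings, because logical_decoding_work_mem is an ordinary user-settable GUC and a walsender runs as the role its subscriber connects with. The role names here are made up for the example:

  -- Give each subscriber site its own replication role and setting:
  ALTER ROLE near_site_repl SET logical_decoding_work_mem = '64kB';
  ALTER ROLE far_site_repl  SET logical_decoding_work_mem = '512MB';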
On Fri, Jun 26, 2020 at 11:31 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 25 Jun 2020 at 19:35, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Jun 23, 2020 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Jun 23, 2020 at 3:48 PM Tomas Vondra > > > <tomas.vondra@2ndquadrant.com> wrote: > > > > > > > > On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote: > > > > >On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada > > > > ><masahiko.sawada@2ndquadrant.com> wrote: > > > > >> > > > > >> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > > > >> > > > > > >> > > > > > > >> > >What if the decoding has been performed by multiple backends using the > > > > >> > >same slot? In that case, it will be difficult to make the judgment > > > > >> > >for the value of logical_decoding_work_mem based on stats. It would > > > > >> > >make sense if we provide a way to set logical_decoding_work_mem for a > > > > >> > >slot but not sure if that is better than what we have now. > > > > >> > > > > > > > >> > > > > >> I thought that the stats are relevant to what > > > > >> logical_decoding_work_mem value was but not with who performed logical > > > > >> decoding. So even if multiple backends perform logical decoding using > > > > >> the same slot, the user can directly use stats as long as > > > > >> logical_decoding_work_mem value doesn’t change. > > > > >> > > > > Today, I thought about it again, and if we consider the point that > > logical_decoding_work_mem value doesn’t change much then having the > > stats at slot-level would also allow computing > > logical_decoding_work_mem based on stats. Do you think it is a > > reasonable assumption that users won't change > > logical_decoding_work_mem for different processes (WALSender, etc.)? > > FWIW, if we use logical_decoding_work_mem as a threshold of starting > of sending changes to a subscriber, I think there might be use cases > where the user wants to set different logical_decoding_work_mem values > to different wal senders. For example, setting a lower value to > minimize the latency of synchronous logical replication to a near-site > whereas setting a large value to minimize the amount of data sent to a > far site. > How can setting a large value minimize the amount of data sent? One possibility is if there are a lot of transaction aborts and transactions are not large enough that they cross the logical_decoding_work_mem threshold but such cases shouldn't be many. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 23, 2020 at 12:18 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote:
>On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada
><masahiko.sawada@2ndquadrant.com> wrote:
>>
>> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> >
>> > >
>> > >What if the decoding has been performed by multiple backends using the
>> > >same slot? In that case, it will be difficult to make the judgment
>> > >for the value of logical_decoding_work_mem based on stats. It would
>> > >make sense if we provide a way to set logical_decoding_work_mem for a
>> > >slot but not sure if that is better than what we have now.
>> > >
>>
>> I thought that the stats are relevant to what
>> logical_decoding_work_mem value was but not with who performed logical
>> decoding. So even if multiple backends perform logical decoding using
>> the same slot, the user can directly use stats as long as
>> logical_decoding_work_mem value doesn’t change.
>>
>
>I think if you maintain these stats at the slot level, you probably
>need to use spinlock or atomic ops in order to update those as slots
>can be used from multiple backends whereas currently, we don't need
>that.
> IMHO storing the stats in the slot itself is a bad idea. We have the
> statistics collector for exactly this purpose, and it's receiving data
> over UDP without any extra locking etc.
Yeah, that seems much more appropriate. Of course, where they are exposed is a different question.
>> > >What problems do we see in displaying these for each process? I think
>> > >users might want to see the stats for the exited processes or after
>> > >server restart but I think both of those are not even possible today.
>> > >I think the stats are available till the corresponding WALSender
>> > >process is active.
>>
>> I might want to see the stats for the exited processes or after server
>> restart. But I'm inclined to agree with displaying the stats per
>> process if the stats are displayed on a separate view (e.g.
>> pg_stat_replication_slots).
>>
>
>Yeah, as told previously, this makes more sense to me.
>
>Do you think we should try to write a POC patch using a per-process
>entry approach and see what difficulties we are facing and does it
>give the stats in a way we are imagining but OTOH, we can wait for
>some more to see if there is clear winner approach here?
>
> I may be missing something obvious, but I still see no point in tracking
> per-process stats. We don't have that for other stats, and I'm not sure
> how common is the scenario when a given slot is decoded by many
> backends. I'd say vast majority of cases are simply running decoding
> from a walsender, which may occasionally restart, but I doubt the users
> are interested in per-pid data - they probably want aggregated data.
Well, technically we do -- we have the pg_stat_xact_* views. However, those are only viewable from *inside* the session itself (which can sometimes be quite annoying).
This does somewhat apply in that normal transactions send their stats batches at transaction end. If this is data we'd be interested in viewing inside of that, a more direct exposure would be needed -- such as the way we do with LSNs in pg_stat_replication or whatever.
For long-term monitoring, people definitely want aggregate data I'd say. The "realtime data" if we call it that is in my experience mostly interesting if you want to define alerts etc ("replication standby is too far behind" is alertable through that, whereas things like "total amount of replication traffic over the past hour" is something that's more trend-alertable which is typically handled in a separate system pulling the aggregate stats)
> Can someone explain a plausible scenario for which tracking per-process
> stats would be needed, and simply computing deltas would not work? How
> will you know which old PID is which, what will you do when a PID is
> reused, and so on?
I fail to see that one as well, in a real-world scenario. Maybe if you want to do a one-off point-tuning of one tiny piece of a system? But you will then also need the long-term statistics to follow up on whether what you did was correct anyway...
On Fri, 26 Jun 2020 at 17:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jun 26, 2020 at 11:31 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 25 Jun 2020 at 19:35, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Jun 23, 2020 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Tue, Jun 23, 2020 at 3:48 PM Tomas Vondra > > > > <tomas.vondra@2ndquadrant.com> wrote: > > > > > > > > > > On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote: > > > > > >On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada > > > > > ><masahiko.sawada@2ndquadrant.com> wrote: > > > > > >> > > > > > >> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > > > > >> > > > > > > >> > > > > > > > >> > >What if the decoding has been performed by multiple backends using the > > > > > >> > >same slot? In that case, it will be difficult to make the judgment > > > > > >> > >for the value of logical_decoding_work_mem based on stats. It would > > > > > >> > >make sense if we provide a way to set logical_decoding_work_mem for a > > > > > >> > >slot but not sure if that is better than what we have now. > > > > > >> > > > > > > > > >> > > > > > >> I thought that the stats are relevant to what > > > > > >> logical_decoding_work_mem value was but not with who performed logical > > > > > >> decoding. So even if multiple backends perform logical decoding using > > > > > >> the same slot, the user can directly use stats as long as > > > > > >> logical_decoding_work_mem value doesn’t change. > > > > > >> > > > > > > Today, I thought about it again, and if we consider the point that > > > logical_decoding_work_mem value doesn’t change much then having the > > > stats at slot-level would also allow computing > > > logical_decoding_work_mem based on stats. Do you think it is a > > > reasonable assumption that users won't change > > > logical_decoding_work_mem for different processes (WALSender, etc.)? > > > > FWIW, if we use logical_decoding_work_mem as a threshold of starting > > of sending changes to a subscriber, I think there might be use cases > > where the user wants to set different logical_decoding_work_mem values > > to different wal senders. For example, setting a lower value to > > minimize the latency of synchronous logical replication to a near-site > > whereas setting a large value to minimize the amount of data sent to a > > far site. > > > > How can setting a large value minimize the amount of data sent? > One possibility is if there are a lot of transaction aborts and > transactions are not large enough that they cross > the logical_decoding_work_mem threshold but such cases shouldn't be many. Yeah, this is what I meant. I agree that it would not be a common case that the user sets different values for different processes. Based on that assumption, I also think having the stats at slot-level is a good idea. But I might want to have the reset function. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
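For reference, the reset interface being asked for could be as small as a single function taking the slot name; the function name below is only a sketch of the idea, nothing is implemented at this point:

  -- Hypothetical: clear the spill counters for one slot before a test run.
  SELECT pg_stat_reset_replication_slot('test_slot');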
On Mon, Jun 29, 2020 at 10:26 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 26 Jun 2020 at 17:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jun 26, 2020 at 11:31 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Thu, 25 Jun 2020 at 19:35, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > Today, I thought about it again, and if we consider the point that > > > > logical_decoding_work_mem value doesn’t change much then having the > > > > stats at slot-level would also allow computing > > > > logical_decoding_work_mem based on stats. Do you think it is a > > > > reasonable assumption that users won't change > > > > logical_decoding_work_mem for different processes (WALSender, etc.)? > > > > > > FWIW, if we use logical_decoding_work_mem as a threshold of starting > > > of sending changes to a subscriber, I think there might be use cases > > > where the user wants to set different logical_decoding_work_mem values > > > to different wal senders. For example, setting a lower value to > > > minimize the latency of synchronous logical replication to a near-site > > > whereas setting a large value to minimize the amount of data sent to a > > > far site. > > > > > > > How can setting a large value minimize the amount of data sent? > > One possibility is if there are a lot of transaction aborts and > > transactions are not large enough that they cross > > the logical_decoding_work_mem threshold but such cases shouldn't be many. > > Yeah, this is what I meant. > > I agree that it would not be a common case that the user sets > different values for different processes. Based on that assumption, I > also think having the stats at slot-level is a good idea. > Okay. > But I might > want to have the reset function. > I don't mind but let's first see how the patch for the basic feature looks and what is required to implement it? Are you interested in writing the patch for this work? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, 29 Jun 2020 at 20:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jun 29, 2020 at 10:26 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Fri, 26 Jun 2020 at 17:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Jun 26, 2020 at 11:31 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Thu, 25 Jun 2020 at 19:35, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > Today, I thought about it again, and if we consider the point that > > > > > logical_decoding_work_mem value doesn’t change much then having the > > > > > stats at slot-level would also allow computing > > > > > logical_decoding_work_mem based on stats. Do you think it is a > > > > > reasonable assumption that users won't change > > > > > logical_decoding_work_mem for different processes (WALSender, etc.)? > > > > > > > > FWIW, if we use logical_decoding_work_mem as a threshold of starting > > > > of sending changes to a subscriber, I think there might be use cases > > > > where the user wants to set different logical_decoding_work_mem values > > > > to different wal senders. For example, setting a lower value to > > > > minimize the latency of synchronous logical replication to a near-site > > > > whereas setting a large value to minimize the amount of data sent to a > > > > far site. > > > > > > > > > > How can setting a large value minimize the amount of data sent? > > > One possibility is if there are a lot of transaction aborts and > > > transactions are not large enough that they cross > > > the logical_decoding_work_mem threshold but such cases shouldn't be many. > > > > Yeah, this is what I meant. > > > > I agree that it would not be a common case that the user sets > > different values for different processes. Based on that assumption, I > > also think having the stats at slot-level is a good idea. > > > > Okay. > > > But I might > > want to have the reset function. > > > > I don't mind but let's first see how the patch for the basic feature > looks and what is required to implement it? Are you interested in > writing the patch for this work? Yes, I'll write the draft patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jun 30, 2020 at 6:38 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 29 Jun 2020 at 20:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Jun 29, 2020 at 10:26 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > I agree that it would not be a common case that the user sets > > > different values for different processes. Based on that assumption, I > > > also think having the stats at slot-level is a good idea. > > > > > > > Okay. > > > > > But I might > > > want to have the reset function. > > > > > > > I don't mind but let's first see how the patch for the basic feature > > looks and what is required to implement it? Are you interested in > > writing the patch for this work? > > Yes, I'll write the draft patch. > Great, thanks. One thing we can consider is that instead of storing the stats directly in the slot we can consider sending it to stats collector as suggested by Tomas. Basically that can avoid contention around slots (See discussion in email [1]). I have not evaluated any of the approaches in detail so you can let us know the advantage of one over another. Now, you might be already considering this but I thought it is better to share what I have in mind rather than saying that later once you have the draft patch ready. [1] - https://www.postgresql.org/message-id/20200623101831.it6lzwbm37xwquco%40development -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, 30 Jun 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jun 30, 2020 at 6:38 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 29 Jun 2020 at 20:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Jun 29, 2020 at 10:26 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > I agree that it would not be a common case that the user sets > > > > different values for different processes. Based on that assumption, I > > > > also think having the stats at slot-level is a good idea. > > > > > > > > > > Okay. > > > > > > > But I might > > > > want to have the reset function. > > > > > > > > > > I don't mind but let's first see how the patch for the basic feature > > > looks and what is required to implement it. Are you interested in > > > writing the patch for this work? > > > > Yes, I'll write the draft patch. > > > > Great, thanks. One thing we can consider is that instead of storing > the stats directly in the slot we can send them to the stats > collector as suggested by Tomas. Basically that can avoid contention > around slots (see discussion in email [1]). I have not evaluated either > of the approaches in detail so you can let us know the advantage of > one over the other. Now, you might already be considering this but I > thought it is better to share what I have in mind rather than bringing > it up later once you have the draft patch ready. Thanks! Yes, I'm working on this patch while considering sending the stats to the stats collector. I've attached a PoC patch that implements a simple approach. I'd like to discuss how we collect the replication slot statistics in the stats collector before I bring the patch to completion. In this PoC patch, we have an array of PgStat_ReplSlotStats structs with max_replication_slots entries. The backend and wal sender send the slot statistics to the stats collector when decoding a commit WAL record.

typedef struct PgStat_ReplSlotStats
{
	char			slotname[NAMEDATALEN];
	PgStat_Counter	spill_txns;
	PgStat_Counter	spill_count;
	PgStat_Counter	spill_bytes;
} PgStat_ReplSlotStats;

What I'd like to discuss is the following. Since the unique identifier of replication slots is the name, the process sends slot statistics along with the slot name to the stats collector. I'm concerned about the amount of data sent to the stats collector and the cost of searching the statistics within the statistics array (it’s O(N) where N is max_replication_slots). Since the maximum length of a slot name is NAMEDATALEN (64 bytes by default) and max_replication_slots is unlikely to be a large number, I might be worrying too much, but it seems like it’s better to avoid that if we can do so easily. An idea I came up with is to use the index of slots (i.e., the index of ReplicationSlotCtl->replication_slots[]) as the index of the statistics of the slot in the stats collector. But since the index of slots could change after a restart, we need to synchronize the index of slots in both arrays somehow. So I thought that we could determine the index of the statistics of slots at ReplicationSlotAcquire() or ReplicationSlotCreate(), but that will in turn need to read the stats file while holding ReplicationSlotControlLock to prevent the same index of the statistics being used by a concurrent process creating a slot. I might be missing something though.
Second, as long as the unique identifier is the slot name there is no convenient way to distinguish between old and new replication slots with the same name, so the backend process or wal sender process sends a message to the stats collector to drop the replication slot at ReplicationSlotDropPtr(). This strategy differs from what we do for table, index, and function statistics. It might not be a problem but I’m still thinking about a better way. The new view name is also an open question. I prefer pg_stat_replication_slots, and to add stats of physical replication slots to the same view in the future, rather than a separate view. Aside from the above, this patch will rework most of the changes introduced by commit 9290ad198b1 and introduce a fair amount of new code. I’m concerned whether such changes are acceptable at the time of beta 2. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
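To make the reporting path in the PoC concrete, here is a minimal sketch of what the per-slot message to the stats collector could look like. Only m_slotname, m_drop, and the spill counters are taken from the discussion in this thread; the message-type constant and the sender helper are assumptions for illustration, not the actual patch:

typedef struct PgStat_MsgReplSlot
{
	PgStat_MsgHdr	m_hdr;						/* standard pgstat message header */
	char			m_slotname[NAMEDATALEN];	/* the slot name is the unique key */
	bool			m_drop;						/* true when reporting a slot drop */
	PgStat_Counter	m_spill_txns;
	PgStat_Counter	m_spill_count;
	PgStat_Counter	m_spill_bytes;
} PgStat_MsgReplSlot;

/*
 * Sender side (backend or wal sender): fill the message and hand it to the
 * collector over the usual pgstat UDP socket.  PGSTAT_MTYPE_REPLSLOT is a
 * hypothetical addition to the StatMsgType enum.
 */
static void
report_replslot_stats(const char *slotname, PgStat_Counter spilltxns,
					  PgStat_Counter spillcount, PgStat_Counter spillbytes)
{
	PgStat_MsgReplSlot msg;

	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
	strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
	msg.m_drop = false;
	msg.m_spill_txns = spilltxns;
	msg.m_spill_count = spillcount;
	msg.m_spill_bytes = spillbytes;
	pgstat_send(&msg, sizeof(msg));
}

Since the name is the key, the collector side has to do the O(N) search over its array that the mail above worries about; with the default max_replication_slots of 10 that search is cheap.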
On Thu, Jul 2, 2020 at 1:31 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
Thanks! Yes, I'm working on this patch while considering sending the
stats to the stats collector.
I've attached a PoC patch that implements a simple approach. I'd like to
discuss how we collect the replication slot statistics in the stats
collector before I bring the patch to completion.
I understand the patch is only in the initial stage but I just tried testing it. Using the patch, I enabled logical replication and created two pub/subs (sub1, sub2) for two separate tables (t1, t2). I inserted data into the second table (t2) such that it spilled to disk.
Then when I checked the stats using the new function pg_stat_get_replication_slots(), I saw that the same stats were updated for both slots, when ideally they should have been reflected in the second slot alone.
postgres=# SELECT s.name, s.spill_txns, s.spill_count, s.spill_bytes FROM pg_stat_get_replication_slots() s(name, spill_txns, spill_count, spill_bytes);
 name | spill_txns | spill_count | spill_bytes
------+------------+-------------+-------------
 sub1 |          1 |          20 |  1320000000
 sub2 |          1 |          20 |  1320000000
(2 rows)
I haven't debugged the issue yet; I can if you wish, but I just thought I'd let you know what I found.
thanks,
Ajin Cherian
Fujitsu Australia
On Sat, 4 Jul 2020 at 22:13, Ajin Cherian <itsajin@gmail.com> wrote: > > > > On Thu, Jul 2, 2020 at 1:31 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> >> >> Thanks! Yes, I'm working on this patch while considering sending the >> stats to the stats collector. >> >> I've attached a PoC patch that implements a simple approach. I'd like to >> discuss how we collect the replication slot statistics in the stats >> collector before I bring the patch to completion. >> > > I understand the patch is only in the initial stage but I just tried testing it. Thank you for testing the patch! > Using the patch, I enabled logical replication and created two pub/subs (sub1, sub2) for two separate tables (t1, t2). I inserted data into the second table (t2) such that it spilled to disk. > Then when I checked the stats using the new function pg_stat_get_replication_slots(), I saw that the same stats were updated for both slots, when ideally they should have been reflected in the second slot alone. > > postgres=# SELECT s.name, s.spill_txns, s.spill_count, s.spill_bytes FROM pg_stat_get_replication_slots() s(name, spill_txns, spill_count, spill_bytes); > name | spill_txns | spill_count | spill_bytes > ------+------------+-------------+------------- > sub1 | 1 | 20 | 1320000000 > sub2 | 1 | 20 | 1320000000 > (2 rows) > I think this is because the logical decoding behind those two logical replications decodes all WAL records *before* filtering the specified tables. In logical replication, we decode the whole WAL stream and then pass it to a logical decoding output plugin such as pgoutput. Only then do we filter tables according to the publication. Therefore, even if subscription sub1 is not interested in changes related to table t2, the replication slot sub1 needs to decode the whole WAL stream, resulting in spilling to disk. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
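In code terms, the reason both slots show the same numbers is that spilling happens in the reorderbuffer, before the output plugin applies any publication filter. A simplified sketch of the two stages (the helper names are illustrative; the real logic lives in reorderbuffer.c and pgoutput.c):

/* Stage 1: decoding queues every change, regardless of publication. */
static void
queue_change(ReorderBuffer *rb, ReorderBufferTXN *txn,
			 ReorderBufferChange *change)
{
	add_change_to_txn(txn, change);			/* illustrative helper */

	/* this memory check knows nothing about publications */
	if (rb->size >= logical_decoding_work_mem * 1024L)
		spill_largest_txn_to_disk(rb);		/* spill_* counters bump here */
}

/* Stage 2: only the output plugin skips unpublished relations. */
static void
change_callback(Relation relation, ReorderBufferChange *change)
{
	if (!relation_is_published(relation))	/* illustrative helper */
		return;			/* filtered out, but already decoded and
						 * possibly already spilled */
	send_change_to_subscriber(change);		/* illustrative helper */
}

So the spill counters are genuinely per-slot, not per-table: every slot that decodes the same WAL interval pays the same spilling cost.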
On Thu, Jul 2, 2020 at 9:01 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > I've attached a PoC patch that implements a simple approach. I'd like to > discuss how we collect the replication slot statistics in the stats > collector before I bring the patch to completion. > > In this PoC patch, we have an array of PgStat_ReplSlotStats structs > with max_replication_slots entries. The backend and wal sender > send the slot statistics to the stats collector when decoding a commit > WAL record. > > typedef struct PgStat_ReplSlotStats > { > char slotname[NAMEDATALEN]; > PgStat_Counter spill_txns; > PgStat_Counter spill_count; > PgStat_Counter spill_bytes; > } PgStat_ReplSlotStats; > > What I'd like to discuss is the following. > > Since the unique identifier of replication slots is the name, the > process sends slot statistics along with the slot name to the stats collector. > I'm concerned about the amount of data sent to the stats collector and > the cost of searching the statistics within the statistics array (it’s > O(N) where N is max_replication_slots). Since the maximum length of a > slot name is NAMEDATALEN (64 bytes by default) and max_replication_slots > is unlikely to be a large number, I might be worrying too much, but it seems like > it’s better to avoid that if we can do so easily. An idea I came up > with is to use the index of slots (i.e., the index of > ReplicationSlotCtl->replication_slots[]) as the index of the statistics of the > slot in the stats collector. But since the index of slots could change > after a restart, we need to synchronize the index of slots in both > arrays somehow. So I thought that we could determine the index of the > statistics of slots at ReplicationSlotAcquire() or > ReplicationSlotCreate(), but that will in turn need to read the stats file > while holding ReplicationSlotControlLock to prevent the same index of > the statistics being used by a concurrent process creating a > slot. I might be missing something though. > I don't think we should be bothered about large values of max_replication_slots. The default value is 10 and I am not sure users will select values large enough that we should worry about searching them by name. I think if it turns out to be a problem then we can try to invent something to mitigate it. > Second, as long as the unique identifier is the slot name there is no > convenient way to distinguish between old and new > replication slots with the same name, so the backend process or wal sender process sends > a message to the stats collector to drop the replication slot at > ReplicationSlotDropPtr(). This strategy differs from what we do for > table, index, and function statistics. It might not be a problem but > I’m still thinking about a better way. > Can we rely on message ordering in the transmission mechanism (UDP) for stats? The wiki suggests [1] we can't. If so, then this might not work. > The new view name is also an open question. I prefer > pg_stat_replication_slots, and to add stats of physical replication > slots to the same view in the future, rather than a separate view. > This sounds okay to me. > Aside from the above, this patch will rework most of the changes > introduced by commit 9290ad198b1 and introduce a fair amount of new code. I’m > concerned whether such changes are acceptable at the time of beta 2. > I think it depends on the final patch. My initial thought was that we should do this for PG14 but if you are suggesting removing the changes done by commit 9290ad198b1 then we need to think it over. I can think of the below options:
a. Revert 9290ad198b1 and introduce stats for spilling in PG14. We were anyway having spilling without any work in PG13 but didn’t have stats.
b. Try to get your patch in PG13 if we can, otherwise, revert the feature 9290ad198b1.
c. Get whatever we have in commit 9290ad198b1 for PG13 and additionally have what we are discussing here for PG14. This means that spilled stats at slot level will be available in PG14 via pg_stat_replication_slots and for individual WAL senders they will be available via pg_stat_replication both in PG13 and PG14. Even if we can get your patch in PG13, we can still keep those in pg_stat_replication.
d. Get whatever we have in commit 9290ad198b1 for PG13 and change it for PG14. I don't think this will be a popular approach.
[1] - https://en.wikipedia.org/wiki/User_Datagram_Protocol
-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, 6 Jul 2020 at 20:45, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jul 2, 2020 at 9:01 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > I've attached a PoC patch that implements a simple approach. I'd like to > > discuss how we collect the replication slot statistics in the stats > > collector before I bring the patch to completion. > > > > In this PoC patch, we have an array of PgStat_ReplSlotStats structs > > with max_replication_slots entries. The backend and wal sender > > send the slot statistics to the stats collector when decoding a commit > > WAL record. > > > > typedef struct PgStat_ReplSlotStats > > { > > char slotname[NAMEDATALEN]; > > PgStat_Counter spill_txns; > > PgStat_Counter spill_count; > > PgStat_Counter spill_bytes; > > } PgStat_ReplSlotStats; > > > > What I'd like to discuss is the following. > > > > Since the unique identifier of replication slots is the name, the > > process sends slot statistics along with the slot name to the stats collector. > > I'm concerned about the amount of data sent to the stats collector and > > the cost of searching the statistics within the statistics array (it’s > > O(N) where N is max_replication_slots). Since the maximum length of a > > slot name is NAMEDATALEN (64 bytes by default) and max_replication_slots > > is unlikely to be a large number, I might be worrying too much, but it seems like > > it’s better to avoid that if we can do so easily. An idea I came up > > with is to use the index of slots (i.e., the index of > > ReplicationSlotCtl->replication_slots[]) as the index of the statistics of the > > slot in the stats collector. But since the index of slots could change > > after a restart, we need to synchronize the index of slots in both > > arrays somehow. So I thought that we could determine the index of the > > statistics of slots at ReplicationSlotAcquire() or > > ReplicationSlotCreate(), but that will in turn need to read the stats file > > while holding ReplicationSlotControlLock to prevent the same index of > > the statistics being used by a concurrent process creating a > > slot. I might be missing something though. > > > > I don't think we should be bothered about large values of > max_replication_slots. The default value is 10 and I am not sure > users will select values large enough that we should worry > about searching them by name. I think if it turns out to be a > problem then we can try to invent something to mitigate it. Agreed. > > > Second, as long as the unique identifier is the slot name there is no > > convenient way to distinguish between old and new > > replication slots with the same name, so the backend process or wal sender process sends > > a message to the stats collector to drop the replication slot at > > ReplicationSlotDropPtr(). This strategy differs from what we do for > > table, index, and function statistics. It might not be a problem but > > I’m still thinking about a better way. > > > > Can we rely on message ordering in the transmission mechanism (UDP) > for stats? The wiki suggests [1] we can't. If so, then this might > not work. Yeah, I'm also concerned about this. Another idea would be to have another unique identifier to distinguish old and new replication slots with the same name. For example, a creation timestamp. And then we reclaim the stats of unused slots later, like table and function statistics. On the other hand, if the ordering were to be reversed, we would miss those stats but the next stats report would create the new entry.
If the problem is unlikely to happen in the common case we can live with that. > > The new view name is also an open question. I prefer > > pg_stat_replication_slots, and to add stats of physical replication > > slots to the same view in the future, rather than a separate view. > > > > This sounds okay to me. > > > Aside from the above, this patch will rework most of the changes > > introduced by commit 9290ad198b1 and introduce a fair amount of new code. I’m > > concerned whether such changes are acceptable at the time of beta 2. > > > > I think it depends on the final patch. My initial thought was that we > should do this for PG14 but if you are suggesting removing the changes > done by commit 9290ad198b1 then we need to think it over. I can > think of the below options: > a. Revert 9290ad198b1 and introduce stats for spilling in PG14. We > were anyway having spilling without any work in PG13 but didn’t have > stats. > b. Try to get your patch in PG13 if we can, otherwise, revert the > feature 9290ad198b1. > c. Get whatever we have in commit 9290ad198b1 for PG13 and > additionally have what we are discussing here for PG14. This means > that spilled stats at slot level will be available in PG14 via > pg_stat_replication_slots and for individual WAL senders they will be > available via pg_stat_replication both in PG13 and PG14. Even if we > can get your patch in PG13, we can still keep those in > pg_stat_replication. > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it > for PG14. I don't think this will be a popular approach. I was thinking option (a) or (b). I'm inclined to option (a) since the PoC patch added a certain amount of new code. I agree with you that it depends on the final patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
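For reference, the reclaim idea mentioned above could look roughly like the existing pgstat_vacuum_stat() pass that drops entries for tables and functions that no longer exist; the accessor and report helpers below are assumptions, not existing functions:

/*
 * Hypothetical reclaim pass: walk a snapshot of the collector's slot
 * stats and send a drop message for any entry whose slot no longer
 * exists on this server.
 */
static void
vacuum_replslot_stats(void)
{
	int			nstats;
	PgStat_ReplSlotStats *stats = fetch_replslot_stats(&nstats);	/* assumed */

	for (int i = 0; i < nstats; i++)
	{
		if (!replication_slot_exists(stats[i].slotname))	/* assumed */
			report_replslot_drop(stats[i].slotname);		/* assumed */
	}
}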
On Tue, Jul 7, 2020 at 7:07 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 6 Jul 2020 at 20:45, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > Second, as long as the unique identifier is the slot name there is no > > > convenient way to distinguish between old and new > > > replication slots with the same name, so the backend process or wal sender process sends > > > a message to the stats collector to drop the replication slot at > > > ReplicationSlotDropPtr(). This strategy differs from what we do for > > > table, index, and function statistics. It might not be a problem but > > > I’m still thinking about a better way. > > > > > > > Can we rely on message ordering in the transmission mechanism (UDP) > > for stats? The wiki suggests [1] we can't. If so, then this might > > not work. > > Yeah, I'm also concerned about this. Another idea would be to have > another unique identifier to distinguish old and new replication slots > with the same name. For example, a creation timestamp. And then we > reclaim the stats of unused slots later, like table and function > statistics. > So, we need to have 'creation timestamp' as persistent data for slots to achieve this? I am not sure about adding creation_time as an identifier for this case because users can change the time on their systems, so it might not be a bullet-proof method, but I agree that it can work in general. > On the other hand, if the ordering were to be reversed, we would miss > those stats but the next stats report would create the new entry. If > the problem is unlikely to happen in the common case we can live with > that. > Yeah, that is a valid point and I think otherwise also some UDP packets can be lost so maybe we don't need to worry too much about this. I guess we can add a comment in the code for such a case. > > > > > Aside from the above, this patch will rework most of the changes > > > introduced by commit 9290ad198b1 and introduce a fair amount of new code. I’m > > > concerned whether such changes are acceptable at the time of beta 2. > > > > > > > I think it depends on the final patch. My initial thought was that we > > should do this for PG14 but if you are suggesting removing the changes > > done by commit 9290ad198b1 then we need to think it over. I can > > think of the below options: > > a. Revert 9290ad198b1 and introduce stats for spilling in PG14. We > > were anyway having spilling without any work in PG13 but didn’t have > > stats. > > b. Try to get your patch in PG13 if we can, otherwise, revert the > > feature 9290ad198b1. > > c. Get whatever we have in commit 9290ad198b1 for PG13 and > > additionally have what we are discussing here for PG14. This means > > that spilled stats at slot level will be available in PG14 via > > pg_stat_replication_slots and for individual WAL senders they will be > > available via pg_stat_replication both in PG13 and PG14. Even if we > > can get your patch in PG13, we can still keep those in > > pg_stat_replication. > > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it > > for PG14. I don't think this will be a popular approach. > > I was thinking option (a) or (b). I'm inclined to option (a) since the > PoC patch added a certain amount of new code. I agree with you that > it depends on the final patch. > Magnus, Tomas, others, do you have any suggestions on the above options? Or let us know if you have any other option in mind. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Jul 7, 2020 at 5:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jul 7, 2020 at 7:07 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 6 Jul 2020 at 20:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > Second, as long as the unique identifier is the slot name there is no
> > > convenient way to distinguish between the same name old and new
> > > replication slots, so the backend process or wal sender process sends
> > > a message to the stats collector to drop the replication slot at
> > > ReplicationSlotDropPtr(). This strategy differs from what we do for
> > > table, index, and function statistics. It might not be a problem but
> > > I’m thinking a better way.
> > >
> >
> > Can we rely on message ordering in the transmission mechanism (UDP)
> > for stats? The wiki suggests [1] we can't. If so, then this might
> > not work.
>
> Yeah, I'm also concerned about this. Another idea would be to have
> another unique identifier to distinguish old and new replication slots
> with the same name. For example, creation timestamp. And then we
> reclaim the stats of unused slots later like table and function
> statistics.
>
So, we need to have 'creation timestamp' as persistent data for slots
to achieve this? I am not sure about adding creation_time as an
identifier for this case because users can change the time on their
systems, so it might not be a bullet-proof method, but I agree that it
can work in general.
If we need them to be persistent across time like that, perhaps we simply need to assign oids to replication slots? That might simplify this problem quite a bit?
> On the other hand, if the ordering were to be reversed, we would miss
> those stats but the next stats report would create the new entry. If
> the problem is unlikely to happen in common case we can live with
> that.
>
Yeah, that is a valid point and I think otherwise also some UDP
packets can be lost so maybe we don't need to worry too much about
this. I guess we can add a comment in the code for such a case.
The fact that we may in theory lose some packets over UDP is the main reason we're using UDP in the first place, I believe :) But I believe it's highly unlikely to happen in the real world (and I think on some platforms impossible).
> > > Aside from the above, this patch will change the most of the changes
> > > introduced by commit 9290ad198b1 and introduce new code much. I’m
> > > concerned whether such changes are acceptable at the time of beta 2.
> > >
> >
> > I think it depends on the final patch. My initial thought was that we
> > should do this for PG14 but if you are suggesting removing the changes
> > done by commit 9290ad198b1 then we need to think over it. I could
> > think of below options:
> > a. Revert 9290ad198b1 and introduce stats for spilling in PG14. We
> > were anyway having spilling without any work in PG13 but didn’t have
> > stats.
> > b. Try to get your patch in PG13 if we can, otherwise, revert the
> > feature 9290ad198b1.
> > c. Get whatever we have in commit 9290ad198b1 for PG13 and
> > additionally have what we are discussing here for PG14. This means
> > that spilled stats at slot level will be available in PG14 via
> > pg_stat_replication_slots and for individual WAL senders it will be
> > available via pg_stat_replication both in PG13 and PG14. Even if we
> > can get your patch in PG13, we can still keep those in
> > pg_stat_replication.
> > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it
> > for PG14. I don't think this will be a popular approach.
>
> I was thinking option (a) or (b). I'm inclined to option (a) since the
> PoC patch added a certain amount of new codes. I agree with you that
> it depends on the final patch.
>
Magnus, Tomas, others, do you have any suggestions on the above
options or let us know if you have any other option in mind?
I have a feeling it's far too late for (b) at this time. Regardless of the size of the patch, it feels that this could end up being rushed and not thought through all the way, in which case we may end up in an even worse position.
Much as I would like to have these stats earlier, I'm also leaning towards (a).
//Magnus
On Tue, 7 Jul 2020 at 17:50, Magnus Hagander <magnus@hagander.net> wrote: > > > > On Tue, Jul 7, 2020 at 5:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Tue, Jul 7, 2020 at 7:07 AM Masahiko Sawada >> <masahiko.sawada@2ndquadrant.com> wrote: >> > >> > On Mon, 6 Jul 2020 at 20:45, Amit Kapila <amit.kapila16@gmail.com> wrote: >> > > >> > > > Second, as long as the unique identifier is the slot name there is no >> > > > convenient way to distinguish between old and new >> > > > replication slots with the same name, so the backend process or wal sender process sends >> > > > a message to the stats collector to drop the replication slot at >> > > > ReplicationSlotDropPtr(). This strategy differs from what we do for >> > > > table, index, and function statistics. It might not be a problem but >> > > > I’m still thinking about a better way. >> > > > >> > > >> > > Can we rely on message ordering in the transmission mechanism (UDP) >> > > for stats? The wiki suggests [1] we can't. If so, then this might >> > > not work. >> > >> > Yeah, I'm also concerned about this. Another idea would be to have >> > another unique identifier to distinguish old and new replication slots >> > with the same name. For example, a creation timestamp. And then we >> > reclaim the stats of unused slots later, like table and function >> > statistics. >> > >> >> So, we need to have 'creation timestamp' as persistent data for slots >> to achieve this? I am not sure about adding creation_time as an >> identifier for this case because users can change the time on their >> systems, so it might not be a bullet-proof method, but I agree that it >> can work in general. > > > If we need them to be persistent across time like that, perhaps we simply need to assign oids to replication slots? That might simplify this problem quite a bit? Yeah, I guess assigning oids to replication slots in the same way as oids in system catalogs might not work because a physical replication slot can be created even during recovery. But using a monotonically-increasing integer as the id seems better and more straightforward. This id does not necessarily need to be displayed in the pg_replication_slots view because the user can already use the slot name as a unique identifier. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jul 8, 2020 at 11:28 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 7 Jul 2020 at 17:50, Magnus Hagander <magnus@hagander.net> wrote: > > > > > > > > On Tue, Jul 7, 2020 at 5:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> On Tue, Jul 7, 2020 at 7:07 AM Masahiko Sawada > >> <masahiko.sawada@2ndquadrant.com> wrote: > >> > > >> > On Mon, 6 Jul 2020 at 20:45, Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > > > >> > > > Second, as long as the unique identifier is the slot name there is no > >> > > > convenient way to distinguish between old and new > >> > > > replication slots with the same name, so the backend process or wal sender process sends > >> > > > a message to the stats collector to drop the replication slot at > >> > > > ReplicationSlotDropPtr(). This strategy differs from what we do for > >> > > > table, index, and function statistics. It might not be a problem but > >> > > > I’m still thinking about a better way. > >> > > > > >> > > > >> > > Can we rely on message ordering in the transmission mechanism (UDP) > >> > > for stats? The wiki suggests [1] we can't. If so, then this might > >> > > not work. > >> > > >> > Yeah, I'm also concerned about this. Another idea would be to have > >> > another unique identifier to distinguish old and new replication slots > >> > with the same name. For example, a creation timestamp. And then we > >> > reclaim the stats of unused slots later, like table and function > >> > statistics. > >> > > >> > >> So, we need to have 'creation timestamp' as persistent data for slots > >> to achieve this? I am not sure about adding creation_time as an > >> identifier for this case because users can change the time on their > >> systems, so it might not be a bullet-proof method, but I agree that it > >> can work in general. > > > > > > If we need them to be persistent across time like that, perhaps we simply need to assign oids to replication slots? That might simplify this problem quite a bit? > > Yeah, I guess assigning oids to replication slots in the same way as > oids in system catalogs might not work because a physical replication > slot can be created even during recovery. But using a > monotonically-increasing integer as the id seems better and more > straightforward. > But don't we need to make it WAL-logged as well, similar to what we do in GetNewObjectId? I am wondering whether we really need Oids for slots or whether it is okay to have some approximate stats in boundary cases? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, 8 Jul 2020 at 16:04, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 8, 2020 at 11:28 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 7 Jul 2020 at 17:50, Magnus Hagander <magnus@hagander.net> wrote: > > > > > > > > > > > > On Tue, Jul 7, 2020 at 5:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > >> > > >> On Tue, Jul 7, 2020 at 7:07 AM Masahiko Sawada > > >> <masahiko.sawada@2ndquadrant.com> wrote: > > >> > > > >> > On Mon, 6 Jul 2020 at 20:45, Amit Kapila <amit.kapila16@gmail.com> wrote: > > >> > > > > >> > > > Second, as long as the unique identifier is the slot name there is no > > >> > > > convenient way to distinguish between old and new > > >> > > > replication slots with the same name, so the backend process or wal sender process sends > > >> > > > a message to the stats collector to drop the replication slot at > > >> > > > ReplicationSlotDropPtr(). This strategy differs from what we do for > > >> > > > table, index, and function statistics. It might not be a problem but > > >> > > > I’m still thinking about a better way. > > >> > > > > > >> > > > > >> > > Can we rely on message ordering in the transmission mechanism (UDP) > > >> > > for stats? The wiki suggests [1] we can't. If so, then this might > > >> > > not work. > > >> > > > >> > Yeah, I'm also concerned about this. Another idea would be to have > > >> > another unique identifier to distinguish old and new replication slots > > >> > with the same name. For example, a creation timestamp. And then we > > >> > reclaim the stats of unused slots later, like table and function > > >> > statistics. > > >> > > > >> > > >> So, we need to have 'creation timestamp' as persistent data for slots > > >> to achieve this? I am not sure about adding creation_time as an > > >> identifier for this case because users can change the time on their > > >> systems, so it might not be a bullet-proof method, but I agree that it > > >> can work in general. > > > > > > > > > If we need them to be persistent across time like that, perhaps we simply need to assign oids to replication slots? That might simplify this problem quite a bit? > > > > Yeah, I guess assigning oids to replication slots in the same way as > > oids in system catalogs might not work because a physical replication > > slot can be created even during recovery. But using a > > monotonically-increasing integer as the id seems better and more > > straightforward. > > > > But don't we need to make it WAL-logged as well, similar to what we do > in GetNewObjectId? Yes. I was thinking of assigning (the maximum of the existing slot ids + 1) to a new slot, without WAL logging. > I am wondering whether we really need Oids for slots or > whether it is okay to have some approximate stats in boundary cases? I think that using oids has another benefit: we don't need to send the slot name to the stats collector along with the stats. Since the maximum size of a slot name is NAMEDATALEN and we don't support pgstat messages larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the user wants to increase NAMEDATALEN they might not be able to build. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
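A sketch of that assignment under the slot control lock; slot_id is a hypothetical new field in ReplicationSlot, and, as noted above, persisting the counter across restarts (i.e., WAL logging) is the open question:

static uint64
choose_new_slot_id(void)
{
	uint64		newid = 1;

	LWLockAcquire(ReplicationSlotControlLock, LW_EXCLUSIVE);
	for (int i = 0; i < max_replication_slots; i++)
	{
		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

		/* slot_id is a hypothetical field, not in today's ReplicationSlot */
		if (s->in_use && s->slot_id >= newid)
			newid = s->slot_id + 1;
	}
	LWLockRelease(ReplicationSlotControlLock);

	/*
	 * Caveat: if the slot holding the highest id is dropped, this scheme
	 * hands the same id out again, so on its own it is not collision-free;
	 * that is why WAL-logging a counter (as GetNewObjectId() does for
	 * OIDs) comes up above.
	 */
	return newid;
}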
On Wed, Jul 8, 2020 at 1:14 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 8 Jul 2020 at 16:04, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jul 8, 2020 at 11:28 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > If we need them to be persistent across time like that, perhaps we simply need to assign oids to replication slots?That might simplify this problem quite a bit? > > > > > > Yeah, I guess assigning oids to replication slots in the same way of > > > oids in system catalogs might not work because physical replication > > > slot can be created even during recovery. But using a > > > monotonically-increasing integer as id seems better and straight > > > forward. > > > > > > > But don't we need to make it WAL logged as well similar to what we do > > in GetNewObjectId? > > Yes. I was thinking that assigning (the maximum number of the existing > slot id + 1) to a new slot without WAL logging. > > > I am thinking do we really need Oids for slots or > > is it okay to have some approximate stats in boundary cases? > > I think that using oids has another benefit that we don't need to send > slot name to the stats collector along with the stats. Since the > maximum size of slot name is NAMEDATALEN and we don't support the > pgstat message larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the > user wants to increase NAMEDATALEN they might not be able to build. > I think NAMEDATALEN is used for many other objects as well and I don't think we want to change it in foreseeable future, so that doesn't sound to be a good reason to invent OIDs for slots. OTOH, I do understand it would be better to send OIDs than names for slots but I am just not sure if it is a good idea to invent a new way to generate OIDs (which is different from how we do it for other objects in the system) for this purpose. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Jul 7, 2020 at 2:20 PM Magnus Hagander <magnus@hagander.net> wrote: > > On Tue, Jul 7, 2020 at 5:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> > > I think it depends on the final patch. My initial thought was that we >> > > should do this for PG14 but if you are suggesting removing the changes >> > > done by commit 9290ad198b1 then we need to think it over. I can >> > > think of the below options: >> > > a. Revert 9290ad198b1 and introduce stats for spilling in PG14. We >> > > were anyway having spilling without any work in PG13 but didn’t have >> > > stats. >> > > b. Try to get your patch in PG13 if we can, otherwise, revert the >> > > feature 9290ad198b1. >> > > c. Get whatever we have in commit 9290ad198b1 for PG13 and >> > > additionally have what we are discussing here for PG14. This means >> > > that spilled stats at slot level will be available in PG14 via >> > > pg_stat_replication_slots and for individual WAL senders they will be >> > > available via pg_stat_replication both in PG13 and PG14. Even if we >> > > can get your patch in PG13, we can still keep those in >> > > pg_stat_replication. >> > > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it >> > > for PG14. I don't think this will be a popular approach. >> > >> > I was thinking option (a) or (b). I'm inclined to option (a) since the >> > PoC patch added a certain amount of new code. I agree with you that >> > it depends on the final patch. >> > >> >> Magnus, Tomas, others, do you have any suggestions on the above >> options or let us know if you have any other option in mind? >> > > I have a feeling it's far too late for (b) at this time. Regardless of the size of the patch, it feels that this could end up being rushed and not thought through all the way, in which case we may end up in an even worse position. > > Much as I would like to have these stats earlier, I'm also leaning towards (a). > Fair enough. The attached patch reverts the commits related to these stats. Sawada-San, can you please take a look to see if I have missed anything, apart from the catversion bump which I will do before commit? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Thu, 9 Jul 2020 at 16:09, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jul 7, 2020 at 2:20 PM Magnus Hagander <magnus@hagander.net> wrote: > > > > On Tue, Jul 7, 2020 at 5:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> > > I think it depends on the final patch. My initial thought was that we > >> > > should do this for PG14 but if you are suggesting removing the changes > >> > > done by commit 9290ad198b1 then we need to think it over. I can > >> > > think of the below options: > >> > > a. Revert 9290ad198b1 and introduce stats for spilling in PG14. We > >> > > were anyway having spilling without any work in PG13 but didn’t have > >> > > stats. > >> > > b. Try to get your patch in PG13 if we can, otherwise, revert the > >> > > feature 9290ad198b1. > >> > > c. Get whatever we have in commit 9290ad198b1 for PG13 and > >> > > additionally have what we are discussing here for PG14. This means > >> > > that spilled stats at slot level will be available in PG14 via > >> > > pg_stat_replication_slots and for individual WAL senders they will be > >> > > available via pg_stat_replication both in PG13 and PG14. Even if we > >> > > can get your patch in PG13, we can still keep those in > >> > > pg_stat_replication. > >> > > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it > >> > > for PG14. I don't think this will be a popular approach. > >> > > >> > I was thinking option (a) or (b). I'm inclined to option (a) since the > >> > PoC patch added a certain amount of new code. I agree with you that > >> > it depends on the final patch. > >> > > >> > >> Magnus, Tomas, others, do you have any suggestions on the above > >> options or let us know if you have any other option in mind? > >> > > > > I have a feeling it's far too late for (b) at this time. Regardless of the size of the patch, it feels that this could end up being rushed and not thought through all the way, in which case we may end up in an even worse position. > > > > Much as I would like to have these stats earlier, I'm also leaning towards (a). > > > > Fair enough. The attached patch reverts the commits related to these > stats. Sawada-San, can you please take a look to see if I have missed anything, > apart from the catversion bump which I will do before commit? Thank you for the patch! Do we remove the corresponding line in the release note by another commit? For the rest, the patch looks good to me. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 9 Jul 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 8, 2020 at 1:14 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 8 Jul 2020 at 16:04, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Jul 8, 2020 at 11:28 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > > > If we need them to be persistent across time like that, perhaps we simply need to assign oids to replication slots? That might simplify this problem quite a bit? > > > > > > > > Yeah, I guess assigning oids to replication slots in the same way as > > > > oids in system catalogs might not work because a physical replication > > > > slot can be created even during recovery. But using a > > > > monotonically-increasing integer as the id seems better and more > > > > straightforward. > > > > > > > > > > But don't we need to make it WAL-logged as well, similar to what we do > > > in GetNewObjectId? > > > > Yes. I was thinking of assigning (the maximum of the existing > > slot ids + 1) to a new slot, without WAL logging. > > > > > I am wondering whether we really need Oids for slots or > > > whether it is okay to have some approximate stats in boundary cases? > > > > I think that using oids has another benefit: we don't need to send > > the slot name to the stats collector along with the stats. Since the > > maximum size of a slot name is NAMEDATALEN and we don't support > > pgstat messages larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the > > user wants to increase NAMEDATALEN they might not be able to build. > > > > I think NAMEDATALEN is used for many other objects as well and I don't > think we want to change it in foreseeable future, so that doesn't > sound to be a good reason to invent OIDs for slots. OTOH, I do > understand it would be better to send OIDs than names for slots but I > am just not sure if it is a good idea to invent a new way to generate > OIDs (which is different from how we do it for other objects in the > system) for this purpose. I'm concerned that there might be users who are running a custom PostgreSQL build that increases NAMEDATALEN for some reason. But indeed, I also agree with your concerns. So perhaps we can go with the current PoC patch approach as the first version (i.e., sending a slot drop message to the stats collector). When we need such a unique identifier also for other purposes, we will be able to change this feature so that it uses that identifier for this statistics reporting purpose. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jul 10, 2020 at 7:19 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 9 Jul 2020 at 16:09, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Fair enough. The attached patch reverts the commits related to these > > stats. Sawada-San, can you please once see if I have missed anything > > apart from catversion bump which I will do before commit? > > Thank you for the patch! > > Do we remove the corresponding line in the release note by another > commit? > Yes, I will do that as well. > For the rest, the looks good to me. > Thanks, will wait for a day or so and then push it early next week. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Jul 10, 2020 at 7:23 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 9 Jul 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jul 8, 2020 at 1:14 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > I think that using oids has another benefit: we don't need to send > > > the slot name to the stats collector along with the stats. Since the > > > maximum size of a slot name is NAMEDATALEN and we don't support > > > pgstat messages larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the > > > user wants to increase NAMEDATALEN they might not be able to build. > > > > > > > I think NAMEDATALEN is used for many other objects as well and I don't > > think we want to change it in foreseeable future, so that doesn't > > sound to be a good reason to invent OIDs for slots. OTOH, I do > > understand it would be better to send OIDs than names for slots but I > > am just not sure if it is a good idea to invent a new way to generate > > OIDs (which is different from how we do it for other objects in the > > system) for this purpose. > > I'm concerned that there might be users who are running a custom > PostgreSQL build that increases NAMEDATALEN for some reason. But indeed, I > also agree with your concerns. So perhaps we can go with the current > PoC patch approach as the first version (i.e., sending a slot drop > message to the stats collector). When we need such a unique identifier > also for other purposes, we will be able to change this feature so > that it uses that identifier for this statistics reporting purpose. > Okay, feel free to submit the version atop my revert patch. I think you might want to remove the indexing stuff you have added for faster search as discussed above. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Jul 10, 2020 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jul 10, 2020 at 7:19 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 9 Jul 2020 at 16:09, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > Fair enough. The attached patch reverts the commits related to these > > > stats. Sawada-San, can you please once see if I have missed anything > > > apart from catversion bump which I will do before commit? > > > > Thank you for the patch! > > > > Do we remove the corresponding line in the release note by another > > commit? > > > > Yes, I will do that as well. > > > For the rest, the looks good to me. > > > > Thanks, will wait for a day or so and then push it early next week. > Pushed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Jul 10, 2020 at 2:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jul 10, 2020 at 7:23 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 9 Jul 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Jul 8, 2020 at 1:14 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > I think that using oids has another benefit: we don't need to send > > > > the slot name to the stats collector along with the stats. Since the > > > > maximum size of a slot name is NAMEDATALEN and we don't support > > > > pgstat messages larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the > > > > user wants to increase NAMEDATALEN they might not be able to build. > > > > > > > > > > I think NAMEDATALEN is used for many other objects as well and I don't > > > think we want to change it in foreseeable future, so that doesn't > > > sound to be a good reason to invent OIDs for slots. OTOH, I do > > > understand it would be better to send OIDs than names for slots but I > > > am just not sure if it is a good idea to invent a new way to generate > > > OIDs (which is different from how we do it for other objects in the > > > system) for this purpose. > > > > I'm concerned that there might be users who are running a custom > > PostgreSQL build that increases NAMEDATALEN for some reason. But indeed, I > > also agree with your concerns. So perhaps we can go with the current > > PoC patch approach as the first version (i.e., sending a slot drop > > message to the stats collector). When we need such a unique identifier > > also for other purposes, we will be able to change this feature so > > that it uses that identifier for this statistics reporting purpose. > > > > Okay, feel free to submit the version atop my revert patch. > Attached, please find the rebased version. I have kept prorows as 10 instead of 100 for pg_stat_get_replication_slots because I don't see much reason for keeping the value higher than the default value of max_replication_slots. As we are targeting this patch for PG14, I think we can now add the functionality to reset the stats as well. What do you think? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
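On the reset side, a hedged sketch of what a per-slot SQL-callable reset could look like, again via a message to the collector; the message type, enum value, and function name here are assumptions and the final interface may well differ:

Datum
pg_stat_reset_replication_slot(PG_FUNCTION_ARGS)
{
	char	   *target = text_to_cstring(PG_GETARG_TEXT_PP(0));
	PgStat_MsgResetreplslotcounter msg;		/* hypothetical message type */

	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETREPLSLOTCOUNTER);
	strlcpy(msg.m_slotname, target, NAMEDATALEN);
	pgstat_send(&msg, sizeof(msg));

	PG_RETURN_VOID();
}

The collector would then zero the spill_* counters for the matching entry, leaving the other slots' stats untouched.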
On Thu, 16 Jul 2020 at 15:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jul 10, 2020 at 2:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jul 10, 2020 at 7:23 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Thu, 9 Jul 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Wed, Jul 8, 2020 at 1:14 PM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > > > > I think that using oids has another benefit: we don't need to send > > > > > the slot name to the stats collector along with the stats. Since the > > > > > maximum size of a slot name is NAMEDATALEN and we don't support > > > > > pgstat messages larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the > > > > > user wants to increase NAMEDATALEN they might not be able to build. > > > > > > > > > > > > > I think NAMEDATALEN is used for many other objects as well and I don't > > > > think we want to change it in foreseeable future, so that doesn't > > > > sound to be a good reason to invent OIDs for slots. OTOH, I do > > > > understand it would be better to send OIDs than names for slots but I > > > > am just not sure if it is a good idea to invent a new way to generate > > > > OIDs (which is different from how we do it for other objects in the > > > > system) for this purpose. > > > > > > I'm concerned that there might be users who are running a custom > > > PostgreSQL build that increases NAMEDATALEN for some reason. But indeed, I > > > also agree with your concerns. So perhaps we can go with the current > > > PoC patch approach as the first version (i.e., sending a slot drop > > > message to the stats collector). When we need such a unique identifier > > > also for other purposes, we will be able to change this feature so > > > that it uses that identifier for this statistics reporting purpose. > > > > > > > Okay, feel free to submit the version atop my revert patch. > > > > Attached, please find the rebased version. I have kept prorows as 10 > instead of 100 for pg_stat_get_replication_slots because I don't see > much reason for keeping the value higher than the default value of > max_replication_slots. > Thank you for rebasing the patch! Agreed. > As we are targeting this patch for PG14, I think we can now add the > functionality to reset the stats as well. What do you think? > Yeah, I was also updating the patch while adding the reset functions. However, I'm concerned about the following part:

+static int
+pgstat_replslot_index(const char *name)
+{
+	int	i;
+
+	Assert(nReplSlotStats <= max_replication_slots);
+	for (i = 0; i < nReplSlotStats; i++)
+	{
+		if (strcmp(replSlotStats[i].slotname, name) == 0)
+			return i;	/* found */
+	}
+
+	/* not found, register new slot */
+	memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
+	memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
+	return nReplSlotStats++;
+}

+static void
+pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
+{
+	int	idx;
+
+	idx = pgstat_replslot_index(msg->m_slotname);
+	Assert(idx >= 0 && idx < max_replication_slots);

As long as we cannot rely on message ordering, the above assertion could be false. For example, suppose that there are no unused replication slots and the user: 1. drops the existing slot. 2. creates a new slot. If the stats messages arrive in order of 2 and 1, the above assertion is false or leads to memory corruption when assertions are not enabled.
A possible solution would be to add an in-use flag to PgStat_ReplSlotStats indicating whether the stats entry for a slot is used or not. When receiving a drop message for a slot, the stats collector just marks the corresponding stats as unused. When receiving a stats report for a new slot while there is no unused stats entry, ignore it. What do you think? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
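A sketch of that in-use flag, with the receive path adjusted so that an out-of-order drop can no longer trip the assertion; the extra create_it argument to pgstat_replslot_index() is an illustrative change, not the PoC code:

typedef struct PgStat_ReplSlotStats
{
	char			slotname[NAMEDATALEN];
	bool			in_use;			/* new: false once the slot is dropped */
	PgStat_Counter	spill_txns;
	PgStat_Counter	spill_count;
	PgStat_Counter	spill_bytes;
} PgStat_ReplSlotStats;

static void
pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
{
	/* pass create_it = false for drops, so a late drop can't claim an entry */
	int			idx = pgstat_replslot_index(msg->m_slotname, !msg->m_drop);

	if (idx < 0)
		return;					/* no matching or free entry: ignore
								 * instead of Assert'ing */

	if (msg->m_drop)
		replSlotStats[idx].in_use = false;	/* just mark it unused */
	else
	{
		replSlotStats[idx].in_use = true;
		replSlotStats[idx].spill_txns += msg->m_spill_txns;
		replSlotStats[idx].spill_count += msg->m_spill_count;
		replSlotStats[idx].spill_bytes += msg->m_spill_bytes;
	}
}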
On Thu, Jul 16, 2020 at 1:45 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 16 Jul 2020 at 15:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jul 10, 2020 at 2:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Jul 10, 2020 at 7:23 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Thu, 9 Jul 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Wed, Jul 8, 2020 at 1:14 PM Masahiko Sawada > > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > > > > > > > I think that using oids has another benefit that we don't need to send > > > > > > slot name to the stats collector along with the stats. Since the > > > > > > maximum size of slot name is NAMEDATALEN and we don't support the > > > > > > pgstat message larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the > > > > > > user wants to increase NAMEDATALEN they might not be able to build. > > > > > > > > > > > > > > > > I think NAMEDATALEN is used for many other objects as well and I don't > > > > > think we want to change it in foreseeable future, so that doesn't > > > > > sound to be a good reason to invent OIDs for slots. OTOH, I do > > > > > understand it would be better to send OIDs than names for slots but I > > > > > am just not sure if it is a good idea to invent a new way to generate > > > > > OIDs (which is different from how we do it for other objects in the > > > > > system) for this purpose. > > > > > > > > I'm concerned that there might be users who are using custom > > > > PostgreSQL that increased NAMEDATALEN for some reason. But indeed, I > > > > also agree with your concerns. So perhaps we can go with the current > > > > PoC patch approach as the first version (i.g., sending slot drop > > > > message to stats collector). When we need such a unique identifier > > > > also for other purposes, we will be able to change this feature so > > > > that it uses that identifier for this statistics reporting purpose. > > > > > > > > > > Okay, feel to submit the version atop my revert patch. > > > > > > > Attached, please find the rebased version. I have kept prorows as 10 > > instead of 100 for pg_stat_get_replication_slots because I don't see > > much reason for keeping the value more than the default value of > > max_replication_slots. > > > > Thank you for rebasing the patch! Agreed. > > > As we are targeting this patch for PG14, so I think we can now add the > > functionality to reset the stats as well. What do you think? > > > > Yeah, I was also updating the patch while adding the reset functions. > > However, I'm concerned about the following part: > > +static int > +pgstat_replslot_index(const char *name) > +{ > + int i; > + > + Assert(nReplSlotStats <= max_replication_slots); > + for (i = 0; i < nReplSlotStats; i++) > + { > + if (strcmp(replSlotStats[i].slotname, name) == 0) > + return i; /* found */ > + } > + > + /* not found, register new slot */ > + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats)); > + memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN); > + return nReplSlotStats++; > +} > > +static void > +pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len) > +{ > + int idx; > + > + idx = pgstat_replslot_index(msg->m_slotname); > + Assert(idx >= 0 && idx < max_replication_slots); > > As long as we cannot rely on message ordering, the above assertion > could be false. 
> For example, suppose that there are no unused > replication slots and the user: > > 1. drops the existing slot. > 2. creates a new slot. > > If the stats messages arrive in order of 2 and 1, the above assertion > is false or leads to memory corruption when assertions are not > enabled. > > A possible solution would be to add an in-use flag to > PgStat_ReplSlotStats indicating whether the stats entry for a slot is used or > not. When receiving a drop message for a slot, the stats collector > just marks the corresponding stats as unused. When receiving a stats > report for a new slot while there is no unused stats entry, ignore it. > What do you think? > As of now, you have a boolean flag msg.m_drop to distinguish the drop message but we don't have a similar way to distinguish the 'create' message. What if we have a way to distinguish the 'create' message (we can probably keep some sort of flag to indicate the type of message (create, drop, update)) and then, if a slot with the same name already exists, we ignore such a message. Now, we also need a way to create the entry for a slot for a normal stats update message as well, to accommodate a lost 'create' message. Does that make sense? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
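One way to encode that suggestion, as a variant of the receive function sketched earlier (illustrative only; the helpers are assumed, not existing code): replace the m_drop boolean with a message kind, ignore duplicate creates, and let updates create missing entries so a lost 'create' is harmless:

typedef enum PgStat_ReplSlotMsgKind
{
	REPLSLOT_MSG_CREATE,		/* ignored if the name already exists */
	REPLSLOT_MSG_UPDATE,		/* creates the entry if the create was lost */
	REPLSLOT_MSG_DROP			/* marks the entry unused */
} PgStat_ReplSlotMsgKind;

static void
pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
{
	switch (msg->m_kind)		/* m_kind would replace m_drop */
	{
		case REPLSLOT_MSG_CREATE:
			/* a duplicate or reordered create for an existing name is a no-op */
			if (!replslot_entry_exists(msg->m_slotname))	/* assumed helper */
				replslot_create_entry(msg->m_slotname);		/* assumed helper */
			break;

		case REPLSLOT_MSG_UPDATE:
			/* creates the entry too, to cope with a lost create message */
			replslot_update_entry(msg);						/* assumed helper */
			break;

		case REPLSLOT_MSG_DROP:
			replslot_mark_unused(msg->m_slotname);			/* assumed helper */
			break;
	}
}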
On Thu, 16 Jul 2020 at 18:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jul 16, 2020 at 1:45 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 16 Jul 2020 at 15:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Jul 10, 2020 at 2:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Fri, Jul 10, 2020 at 7:23 AM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > On Thu, 9 Jul 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > On Wed, Jul 8, 2020 at 1:14 PM Masahiko Sawada > > > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > > > > > > > > > > I think that using oids has another benefit that we don't need to send > > > > > > > slot name to the stats collector along with the stats. Since the > > > > > > > maximum size of slot name is NAMEDATALEN and we don't support the > > > > > > > pgstat message larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the > > > > > > > user wants to increase NAMEDATALEN they might not be able to build. > > > > > > > > > > > > > > > > > > > I think NAMEDATALEN is used for many other objects as well and I don't > > > > > > think we want to change it in foreseeable future, so that doesn't > > > > > > sound to be a good reason to invent OIDs for slots. OTOH, I do > > > > > > understand it would be better to send OIDs than names for slots but I > > > > > > am just not sure if it is a good idea to invent a new way to generate > > > > > > OIDs (which is different from how we do it for other objects in the > > > > > > system) for this purpose. > > > > > > > > > > I'm concerned that there might be users who are using custom > > > > > PostgreSQL that increased NAMEDATALEN for some reason. But indeed, I > > > > > also agree with your concerns. So perhaps we can go with the current > > > > > PoC patch approach as the first version (i.g., sending slot drop > > > > > message to stats collector). When we need such a unique identifier > > > > > also for other purposes, we will be able to change this feature so > > > > > that it uses that identifier for this statistics reporting purpose. > > > > > > > > > > > > > Okay, feel to submit the version atop my revert patch. > > > > > > > > > > Attached, please find the rebased version. I have kept prorows as 10 > > > instead of 100 for pg_stat_get_replication_slots because I don't see > > > much reason for keeping the value more than the default value of > > > max_replication_slots. > > > > > > > Thank you for rebasing the patch! Agreed. > > > > > As we are targeting this patch for PG14, so I think we can now add the > > > functionality to reset the stats as well. What do you think? > > > > > > > Yeah, I was also updating the patch while adding the reset functions. 
> > > > However, I'm concerned about the following part: > > > > +static int > > +pgstat_replslot_index(const char *name) > > +{ > > + int i; > > + > > + Assert(nReplSlotStats <= max_replication_slots); > > + for (i = 0; i < nReplSlotStats; i++) > > + { > > + if (strcmp(replSlotStats[i].slotname, name) == 0) > > + return i; /* found */ > > + } > > + > > + /* not found, register new slot */ > > + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats)); > > + memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN); > > + return nReplSlotStats++; > > +} > > > > +static void > > +pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len) > > +{ > > + int idx; > > + > > + idx = pgstat_replslot_index(msg->m_slotname); > > + Assert(idx >= 0 && idx < max_replication_slots); > > > > As long as we cannot rely on message ordering, the above assertion > > could be false. For example, suppose that there are no unused > > replication slots and the user: > > > > 1. drops the existing slot. > > 2. creates a new slot. > > > > If the stats messages arrive in order of 2 and 1, the above assertion > > is false or leads to memory corruption when assertions are not > > enabled. > > > > A possible solution would be to add an in-use flag to > > PgStat_ReplSlotStats indicating whether the stats for the slot are used or > > not. When receiving a drop message for a slot, the stats collector > > just marks the corresponding stats as unused. When receiving the stats > > report for a new slot but there is no unused stats slot, ignore it. > > What do you think? > > > > As of now, you have a boolean flag msg.m_drop to distinguish the drop > message, but we don't have a similar way to distinguish the 'create' > message. What if we have a way to distinguish the 'create' message (we can > probably keep some sort of flag to indicate the type of message > (create, drop, update)) and then, if a slot with the same name > already exists, we ignore such a message? Now, we also need a way to > create the entry for a slot from a normal stats update message as well, > to accommodate the lost 'create' message. Does that make sense? I might be missing your point, but even if we have a 'create' message, the problem can still happen when slots are full and the user drops slot 'slot_a', creates slot 'slot_b', and the messages arrive in the reverse order? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jul 16, 2020 at 4:04 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 16 Jul 2020 at 18:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Jul 16, 2020 at 1:45 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > A possible solution would be to add an in-use flag to > > > PgStat_ReplSlotStats indicating whether the stats for the slot are used or > > > not. When receiving a drop message for a slot, the stats collector > > > just marks the corresponding stats as unused. When receiving the stats > > > report for a new slot but there is no unused stats slot, ignore it. > > > What do you think? > > > > > > > As of now, you have a boolean flag msg.m_drop to distinguish the drop > > message, but we don't have a similar way to distinguish the 'create' > > message. What if we have a way to distinguish the 'create' message (we can > > probably keep some sort of flag to indicate the type of message > > (create, drop, update)) and then, if a slot with the same name > > already exists, we ignore such a message? Now, we also need a way to > > create the entry for a slot from a normal stats update message as well, > > to accommodate the lost 'create' message. Does that make sense? > > I might be missing your point, but even if we have a 'create' message, > the problem can still happen when slots are full and the user drops slot > 'slot_a', creates slot 'slot_b', and the messages arrive in the reverse > order? > In that case also, we should drop the 'create' message of 'slot_b' as we don't have space, but later, when an 'update' message arrives with stats for 'slot_b', we will create the entry. I am also thinking, what if we send only 'update' and 'drop' messages? The message ordering problem can still happen, but we will lose only one 'update' message in that case. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, 16 Jul 2020 at 20:01, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jul 16, 2020 at 4:04 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 16 Jul 2020 at 18:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Jul 16, 2020 at 1:45 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > A possible solution would be to add an in-use flag to > > > > PgStat_ReplSlotStats indicating whether the stats for the slot are used or > > > > not. When receiving a drop message for a slot, the stats collector > > > > just marks the corresponding stats as unused. When receiving the stats > > > > report for a new slot but there is no unused stats slot, ignore it. > > > > What do you think? > > > > > > > > > > As of now, you have a boolean flag msg.m_drop to distinguish the drop > > > message, but we don't have a similar way to distinguish the 'create' > > > message. What if we have a way to distinguish the 'create' message (we can > > > probably keep some sort of flag to indicate the type of message > > > (create, drop, update)) and then, if a slot with the same name > > > already exists, we ignore such a message? Now, we also need a way to > > > create the entry for a slot from a normal stats update message as well, > > > to accommodate the lost 'create' message. Does that make sense? > > > > I might be missing your point, but even if we have a 'create' message, > > the problem can still happen when slots are full and the user drops slot > > 'slot_a', creates slot 'slot_b', and the messages arrive in the reverse > > order? > > > > In that case also, we should drop the 'create' message of 'slot_b' as > we don't have space, but later, when an 'update' message arrives with > stats for 'slot_b', we will create the entry. Agreed. > I am also thinking, > what if we send only 'update' and 'drop' messages? The message ordering > problem can still happen, but we will lose only one 'update' message in that > case. Yes, I think so too. We will lose one 'update' message at most. I've updated the patch so that the stats collector ignores the 'update' message if the slot stats array is already full. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
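A rough sketch of that rule, shaped after the pgstat_replslot_index() fragment quoted above; the create_it parameter matches the later patch, while the exact body here is illustrative rather than the committed code:

/* Illustrative sketch, not the committed code. */
static int
pgstat_replslot_index(const char *name, bool create_it)
{
    int         i;

    Assert(nReplSlotStats <= max_replication_slots);
    for (i = 0; i < nReplSlotStats; i++)
    {
        if (strcmp(replSlotStats[i].slotname, name) == 0)
            return i;           /* found */
    }

    /*
     * Not found.  Register the slot only when asked to and only when there
     * is room.  Returning -1 lets the caller silently discard a stray
     * 'update' message that arrived before the corresponding 'drop'
     * message, instead of overrunning the array.
     */
    if (!create_it || nReplSlotStats == max_replication_slots)
        return -1;

    memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
    memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN);
    return nReplSlotStats++;
}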
On Thu, Jul 23, 2020 at 11:46 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > I've updated the patch so that the stats collector ignores the > 'update' message if the slot stats array is already full. > This patch needs a rebase. I don't see this patch in the CF app. I hope you are still interested in working on this. Review comments: =============== 1. +CREATE VIEW pg_stat_replication_slots AS + SELECT + s.name, + s.spill_txns, + s.spill_count, + s.spill_bytes + FROM pg_stat_get_replication_slots() AS s; The view pg_stat_replication_slots should have a column 'stats_reset' (datatype: timestamp with time zone) as we provide a facitily to reset the slots. A similar column exists in pg_stat_slru as well, so is there a reason of not providing it here? 2. + </para> + </sect2> + + <sect2 id="monitoring-pg-stat-wal-receiver-view"> <title><structname>pg_stat_wal_receiver</structname></title> It is better to keep one empty line between </para> and </sect2> to keep it consistent with the documentation of other views. 3. <primary>pg_stat_reset_replication_slot</primary> + </indexterm> + <function>pg_stat_reset_replication_slot</function> ( <type>text</type> ) + <returnvalue>void</returnvalue> + </para> + <para> + Resets statistics to zero for a single replication slot, or for all + replication slots in the cluster. If the argument is NULL, all counters + shown in the <structname>pg_stat_replication_slots</structname> view for + all replication slots are reset. + </para> I think the information about the parameter for this function is not completely clear. It seems to me that it should be the name of the slot for which we want to reset the stats, if so, let's try to be clear. 4. +pgstat_reset_replslot_counter(const char *name) +{ + PgStat_MsgResetreplslotcounter msg; + + if (pgStatSock == PGINVALID_SOCKET) + return; + + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETREPLSLOTCOUNTER); + if (name) + { + memcpy(&msg.m_slotname, name, NAMEDATALEN); + msg.clearall = false; + } Don't we want to verify here or in the caller of this function whether the passed slot_name is a valid slot or not? For ex. see pgstat_reset_shared_counters where we return an error if the target is not valid. 5. +static void +pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg, + int len) +{ + int i; + int idx = -1; + TimestampTz ts; + + if (!msg->clearall) + { + /* Get the index of replication slot statistics to reset */ + idx = pgstat_replslot_index(msg->m_slotname, false); + + if (idx < 0) + return; /* not found */ Can we add a comment to describe when we don't expect to find the slot here unless there is no way that can happen? 6. +pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg, + int len) { .. + for (i = 0; i < SLRU_NUM_ELEMENTS; i++) .. } I think here we need to traverse till nReplSlotStats, not SLRU_NUM_ELEMENTS. 7. Don't we need to change PGSTAT_FILE_FORMAT_ID for this patch? We can probably do at the end but better to change it now so that it doesn't slip from our mind. 8. @@ -5350,6 +5474,23 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep) break; + /* + * 'R' A PgStat_ReplSlotStats struct describing a replication slot + * follows. + */ + case 'R': + if (fread(&replSlotStats[nReplSlotStats], 1, sizeof(PgStat_ReplSlotStats), fpin) + != sizeof(PgStat_ReplSlotStats)) + { + ereport(pgStatRunningInCollector ? 
LOG : WARNING, + (errmsg("corrupted statistics file \"%s\"", + statfile))); + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats)); + goto done; + } + nReplSlotStats++; + break; Both here and in pgstat_read_db_statsfile_timestamp(), the patch handles the 'R' message after 'D', whereas while writing, 'R' is written before 'D'. So, I think it is better if we keep the order during read the same as during write. 9. While reviewing this patch, I noticed that in pgstat_read_db_statsfile_timestamp(), if we fail to read ArchiverStats or SLRUStats, we return 'false' from this function but OTOH, if we fail to read a 'D' or 'R' message, we will return 'true'. I feel the handling of the 'D' and 'R' messages is fine because once we have read GlobalStats, we can return the stats_timestamp. So the other two should be corrected. I understand that this is not directly related to this patch, but if you agree, we can do this as a separate patch. -- With Regards, Amit Kapila.
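As context for comments 8 and 9, a sketch of the write side that pairs with the 'R' read arm quoted above (shaped after pgstat.c's existing record-writing style; not the verbatim patch):

/* Illustrative sketch of the write side in pgstat_write_statsfiles(). */
for (i = 0; i < nReplSlotStats; i++)
{
    fputc('R', fpout);
    rc = fwrite(&replSlotStats[i], sizeof(PgStat_ReplSlotStats), 1, fpout);
    (void) rc;                  /* we'll check for errors with ferror() */
}

/*
 * Comment 8 asks that the reader consume 'R' records at the same point in
 * the file where the writer emits them, i.e. the same position relative
 * to the per-database 'D' records.
 */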
On Mon, 7 Sep 2020 at 15:24, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jul 23, 2020 at 11:46 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > I've updated the patch so that the stats collector ignores the > > 'update' message if the slot stats array is already full. > > > > This patch needs a rebase. I don't see this patch in the CF app. I > hope you are still interested in working on this. Thank you for reviewing this patch! I'm still going to work on this patch although I might be slow response this month. > > Review comments: > =============== > 1. > +CREATE VIEW pg_stat_replication_slots AS > + SELECT > + s.name, > + s.spill_txns, > + s.spill_count, > + s.spill_bytes > + FROM pg_stat_get_replication_slots() AS s; > > The view pg_stat_replication_slots should have a column 'stats_reset' > (datatype: timestamp with time zone) as we provide a facitily to reset > the slots. A similar column exists in pg_stat_slru as well, so is > there a reason of not providing it here? I had missed adding the column. Fixed. > > 2. > + </para> > + </sect2> > + > + <sect2 id="monitoring-pg-stat-wal-receiver-view"> > <title><structname>pg_stat_wal_receiver</structname></title> > > It is better to keep one empty line between </para> and </sect2> to > keep it consistent with the documentation of other views. Fixed. > > 3. > <primary>pg_stat_reset_replication_slot</primary> > + </indexterm> > + <function>pg_stat_reset_replication_slot</function> ( > <type>text</type> ) > + <returnvalue>void</returnvalue> > + </para> > + <para> > + Resets statistics to zero for a single replication slot, or for all > + replication slots in the cluster. If the argument is NULL, > all counters > + shown in the > <structname>pg_stat_replication_slots</structname> view for > + all replication slots are reset. > + </para> > > I think the information about the parameter for this function is not > completely clear. It seems to me that it should be the name of the > slot for which we want to reset the stats, if so, let's try to be > clear. Fixed. > > 4. > +pgstat_reset_replslot_counter(const char *name) > +{ > + PgStat_MsgResetreplslotcounter msg; > + > + if (pgStatSock == PGINVALID_SOCKET) > + return; > + > + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETREPLSLOTCOUNTER); > + if (name) > + { > + memcpy(&msg.m_slotname, name, NAMEDATALEN); > + msg.clearall = false; > + } > > Don't we want to verify here or in the caller of this function whether > the passed slot_name is a valid slot or not? For ex. see > pgstat_reset_shared_counters where we return an error if the target is > not valid. Agreed. Fixed. > > 5. > +static void > +pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg, > + int len) > +{ > + int i; > + int idx = -1; > + TimestampTz ts; > + > + if (!msg->clearall) > + { > + /* Get the index of replication slot statistics to reset */ > + idx = pgstat_replslot_index(msg->m_slotname, false); > + > + if (idx < 0) > + return; /* not found */ > > Can we add a comment to describe when we don't expect to find the slot > here unless there is no way that can happen? Added. > > 6. > +pgstat_recv_resetreplslotcounter(PgStat_MsgResetreplslotcounter *msg, > + int len) > { > .. > + for (i = 0; i < SLRU_NUM_ELEMENTS; i++) > .. > } > > I think here we need to traverse till nReplSlotStats, not SLRU_NUM_ELEMENTS. Fixed. > > 7. Don't we need to change PGSTAT_FILE_FORMAT_ID for this patch? We > can probably do at the end but better to change it now so that it > doesn't slip from our mind. 
Yes, changed. > > 8. > @@ -5350,6 +5474,23 @@ pgstat_read_statsfiles(Oid onlydb, bool > permanent, bool deep) > > break; > > + /* > + * 'R' A PgStat_ReplSlotStats struct describing a replication slot > + * follows. > + */ > + case 'R': > + if (fread(&replSlotStats[nReplSlotStats], 1, > sizeof(PgStat_ReplSlotStats), fpin) > + != sizeof(PgStat_ReplSlotStats)) > + { > + ereport(pgStatRunningInCollector ? LOG : WARNING, > + (errmsg("corrupted statistics file \"%s\"", > + statfile))); > + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats)); > + goto done; > + } > + nReplSlotStats++; > + break; > > Both here and in pgstat_read_db_statsfile_timestamp(), the patch > handles the 'R' message after 'D', whereas while writing, 'R' is written > before 'D'. So, I think it is better if we keep the order during read > the same as during write. Changed the code so that it writes 'R' after 'D'. > > 9. While reviewing this patch, I noticed that in > pgstat_read_db_statsfile_timestamp(), if we fail to read ArchiverStats > or SLRUStats, we return 'false' from this function but OTOH, if we > fail to read a 'D' or 'R' message, we will return 'true'. I feel the > handling of the 'D' and 'R' messages is fine because once we have read > GlobalStats, we can return the stats_timestamp. So the other two > should be corrected. I understand that this is not directly related to > this patch, but if you agree, we can do this as a separate patch. It seems to make sense to me. We can set *ts and then read both ArchiverStats and SLRUStats so we can return a valid timestamp even if we fail to read. I've attached both patches: the 0001 patch fixes the issue you reported; the 0002 patch incorporates all review comments. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
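On comment 4, a sketch of the kind of existence check the backend could run before sending the reset message; the helper name is hypothetical and the error wording illustrative, though ReplicationSlotCtl and ReplicationSlotControlLock are the existing structures:

/* Hypothetical helper; assumes replication/slot.h context. */
static void
check_replslot_exists(const char *name)
{
    bool        found = false;
    int         i;

    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
    for (i = 0; i < max_replication_slots; i++)
    {
        ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

        if (s->in_use && strcmp(NameStr(s->data.name), name) == 0)
        {
            found = true;
            break;
        }
    }
    LWLockRelease(ReplicationSlotControlLock);

    if (!found)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("replication slot \"%s\" does not exist", name)));
}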
On Tue, Sep 8, 2020 at 7:53 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 7 Sep 2020 at 15:24, Amit Kapila <amit.kapila16@gmail.com> wrote: > > I'm still going to work on this patch although I might be slow > response this month. > This is quite a fast response. Thanks for staying on top of it. > > > > > 9. While reviewing this patch, I noticed that in > > pgstat_read_db_statsfile_timestamp(), if we fail to read ArchiverStats > > or SLRUStats, we return 'false' from this function but OTOH, if we > > fail to read a 'D' or 'R' message, we will return 'true'. I feel the > > handling of the 'D' and 'R' messages is fine because once we have read > > GlobalStats, we can return the stats_timestamp. So the other two > > should be corrected. I understand that this is not directly related to > > this patch, but if you agree, we can do this as a separate patch. > > It seems to make sense to me. We can set *ts and then read both > ArchiverStats and SLRUStats so we can return a valid timestamp even if > we fail to read. > I have started a separate thread for this bug-fix [1] and will continue reviewing this patch. [1] - https://www.postgresql.org/message-id/CAA4eK1J3oTJKyVq6v7K4d3jD%2BvtnruG9fHRib6UuWWsrwAR6Aw%40mail.gmail.com -- With Regards, Amit Kapila.
On Tue, Sep 8, 2020 at 7:53 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 7 Sep 2020 at 15:24, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > This patch needs a rebase. I don't see this patch in the CF app. I > > hope you are still interested in working on this. > > Thank you for reviewing this patch! > > I'm still going to work on this patch although I might be slow > response this month. > Comments on the latest patch: ============================= 1. +CREATE VIEW pg_stat_replication_slots AS + SELECT + s.name, + s.spill_txns, + s.spill_count, + s.spill_bytes, + s.stats_reset + FROM pg_stat_get_replication_slots() AS s; You forgot to update the docs for the new parameter. 2. @@ -5187,6 +5305,12 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep) for (i = 0; i < SLRU_NUM_ELEMENTS; i++) slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp; + /* + * Set the same reset timestamp for all replication slots too. + */ + for (i = 0; i < max_replication_slots; i++) + replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp; + I don't understand why you have removed the above code from the new version of the patch? 3. pgstat_recv_resetreplslotcounter() { .. + ts = GetCurrentTimestamp(); + for (i = 0; i < nReplSlotStats; i++) + { + /* reset entry with the given index, or all entries */ + if (msg->clearall || idx == i) + { + /* reset only counters. Don't clear slot name */ + replSlotStats[i].spill_txns = 0; + replSlotStats[i].spill_count = 0; + replSlotStats[i].spill_bytes = 0; + replSlotStats[i].stat_reset_timestamp = ts; + } + } .. I don't like this coding pattern as in the worst case we need to traverse all the slots to reset a particular slot. This could be okay for a fixed number of elements as we have in SLRU but here it appears quite inefficient. We can move the reset of stats part to a separate function and then invoke it from the place where we need to reset a particular slot and the above place. 4. +pgstat_replslot_index(const char *name, bool create_it) { .. + replSlotStats[nReplSlotStats].stat_reset_timestamp = GetCurrentTimestamp(); .. } Why do we need to set the reset timestamp on the creation of slot entry? 5. @@ -3170,6 +3175,13 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn) spilled++; } + /* update the statistics */ + rb->spillCount += 1; + rb->spillBytes += size; + + /* Don't consider already serialized transactions. */ + rb->spillTxns += rbtxn_is_serialized(txn) ? 0 : 1; We can't increment the spillTxns in the above way because now sometimes we do serialize before streaming and in that case we clear the serialized flag after streaming, see ReorderBufferTruncateTXN. So, the count can go wrong. Another problem is currently the patch call UpdateSpillStats only from begin_cb_wrapper which means it won't consider streaming transactions (streaming transactions that might have spilled). If we consider the streamed case along with it, we can probably keep this counter up-to-date because in the above place we can check if the txn is once serialized or streamed, we shouldn't increment the counter. I think we need to merge Ajin's patch for streaming stats [1] and fix the issue. I have not checked his patch so it might need a rebase and or some changes. 6. 
@@ -322,6 +321,9 @@ ReplicationSlotCreate(const char *name, bool db_specific, /* Let everybody know we've modified this slot */ ConditionVariableBroadcast(&slot->active_cv); + + /* Create statistics entry for the new slot */ + pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0); } .. .. @@ -683,6 +685,18 @@ ReplicationSlotDropPtr(ReplicationSlot *slot) ereport(WARNING, (errmsg("could not remove directory \"%s\"", tmppath))); + /* + * Report to drop the replication slot to stats collector. Since there + * is no guarantee the order of message arrival on an UDP connection, + * it's possible that a message for creating a new slot arrives before a + * message for removing the old slot. We send the drop message while + * holding ReplicationSlotAllocationLock to reduce that possibility. + * If the messages arrived in reverse, we would lose one statistics update + * message. But the next update message will create the statistics for + * the replication slot. + */ + pgstat_report_replslot_drop(NameStr(slot->data.name)); + Similar to drop message, why don't we send the create message while holding the ReplicationSlotAllocationLock? [1] - https://www.postgresql.org/message-id/CAFPTHDZ8RnOovefzB%2BOMoRxLSD404WRLqWBUHe6bWqM5ew1bNA%40mail.gmail.com -- With Regards, Amit Kapila.
On Tue, Sep 8, 2020 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Comments on the latest patch: > ============================= > Apart from the comments I gave yesterday, another thing I was wondering is how to write some tests for this patch. The two ideas I could think of are as follows: 1. One idea was to keep these stats for each WALSender as it was in the commit that we reverted as b074813d48. If we had that then we can query the stats for tests added in commit 58b5ae9d62. I am not sure whether we want to display it in view pg_stat_replication but it would be a really good way to test the streamed and serialized transactions in a predictable manner. 2. Then the second way is to try doing something similar to what we do in src/test/regress/sql/stats.sql I think we should do both if possible. -- With Regards, Amit Kapila.
On Wed, Sep 9, 2020 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Sep 8, 2020 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Comments on the latest patch: > > ============================= > > > > Apart from the comments I gave yesterday, another thing I was > wondering is how to write some tests for this patch. The two ideas I > could think of are as follows: > > 1. One idea was to keep these stats for each WALSender as it was in > the commit that we reverted as b074813d48. If we had that then we can > query the stats for tests added in commit 58b5ae9d62. I am not sure > whether we want to display it in view pg_stat_replication but it would > be a really good way to test the streamed and serialized transactions > in a predictable manner. > > 2. Then the second way is to try doing something similar to what we do > in src/test/regress/sql/stats.sql > > I think we should do both if possible. > I have made a few comment changes on top of your last version. If you are fine with these then include them with the next version of your patch. -- With Regards, Amit Kapila.
Attachment
On Tue, Sep 8, 2020 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Sep 8, 2020 at 7:53 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: I have fixed these review comments in the attached patch. > > Comments on the latest patch: > ============================= > 1. > +CREATE VIEW pg_stat_replication_slots AS > + SELECT > + s.name, > + s.spill_txns, > + s.spill_count, > + s.spill_bytes, > + s.stats_reset > + FROM pg_stat_get_replication_slots() AS s; > > You forgot to update the docs for the new parameter. > Updated the docs for this. > 2. > @@ -5187,6 +5305,12 @@ pgstat_read_statsfiles(Oid onlydb, bool > permanent, bool deep) > for (i = 0; i < SLRU_NUM_ELEMENTS; i++) > slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp; > > + /* > + * Set the same reset timestamp for all replication slots too. > + */ > + for (i = 0; i < max_replication_slots; i++) > + replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp; > + > > I don't understand why you have removed the above code from the new > version of the patch? > Added back. > 3. > pgstat_recv_resetreplslotcounter() > { > .. > + ts = GetCurrentTimestamp(); > + for (i = 0; i < nReplSlotStats; i++) > + { > + /* reset entry with the given index, or all entries */ > + if (msg->clearall || idx == i) > + { > + /* reset only counters. Don't clear slot name */ > + replSlotStats[i].spill_txns = 0; > + replSlotStats[i].spill_count = 0; > + replSlotStats[i].spill_bytes = 0; > + replSlotStats[i].stat_reset_timestamp = ts; > + } > + } > .. > > I don't like this coding pattern as in the worst case we need to > traverse all the slots to reset a particular slot. This could be okay > for a fixed number of elements as we have in SLRU but here it appears > quite inefficient. We can move the reset of stats part to a separate > function and then invoke it from the place where we need to reset a > particular slot and the above place. > Changed the code as per the above idea. > 4. > +pgstat_replslot_index(const char *name, bool create_it) > { > .. > + replSlotStats[nReplSlotStats].stat_reset_timestamp = GetCurrentTimestamp(); > .. > } > > Why do we need to set the reset timestamp on the creation of slot entry? > I don't think we need to show any time if the slot is never reset. Let me know if there is any reason to show it. > 5. > @@ -3170,6 +3175,13 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, > ReorderBufferTXN *txn) > spilled++; > } > > + /* update the statistics */ > + rb->spillCount += 1; > + rb->spillBytes += size; > + > + /* Don't consider already serialized transactions. */ > + rb->spillTxns += rbtxn_is_serialized(txn) ? 0 : 1; > > We can't increment the spillTxns in the above way because now > sometimes we do serialize before streaming and in that case we clear > the serialized flag after streaming, see ReorderBufferTruncateTXN. So, > the count can go wrong. > To fix, this I have added another flag which indicates if we have ever serialized the txn. I couldn't find a better way, do let me know if you can think of a better way to address this comment. > Another problem is currently the patch call > UpdateSpillStats only from begin_cb_wrapper which means it won't > consider streaming transactions (streaming transactions that might > have spilled). > The other problem I see with updating in begin_cb_wrapper is that it will ignore the spilling done for transactions that get aborted. To fix both the issues, I have updated the stats in DecodeCommit and DecodeAbort. > 6. 
> @@ -322,6 +321,9 @@ ReplicationSlotCreate(const char *name, bool db_specific, > > /* Let everybody know we've modified this slot */ > ConditionVariableBroadcast(&slot->active_cv); > + > + /* Create statistics entry for the new slot */ > + pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0); > } > .. > .. > @@ -683,6 +685,18 @@ ReplicationSlotDropPtr(ReplicationSlot *slot) > ereport(WARNING, > (errmsg("could not remove directory \"%s\"", tmppath))); > > + /* > + * Report to drop the replication slot to stats collector. Since there > + * is no guarantee the order of message arrival on an UDP connection, > + * it's possible that a message for creating a new slot arrives before a > + * message for removing the old slot. We send the drop message while > + * holding ReplicationSlotAllocationLock to reduce that possibility. > + * If the messages arrived in reverse, we would lose one statistics update > + * message. But the next update message will create the statistics for > + * the replication slot. > + */ > + pgstat_report_replslot_drop(NameStr(slot->data.name)); > + > > Similar to drop message, why don't we send the create message while > holding the ReplicationSlotAllocationLock? > Updated code to address this comment, basically moved the create message under lock. Apart from the above, (a) fixed one bug in ReorderBufferSerializeTXN() where we were updating the stats even when we have not spilled anything. (b) made changes in pgstat_read_db_statsfile_timestamp to return false when the replication slot entry is corrupt. (c) move the declaration and definitions in pgstat.c to make them consistent with existing code (d) made another couple of cosmetic fixes and changed a few comments (e) Tested the patch by using a guc which allows spilling all the changes. See v4-0001-guc-always-spill Let me know what do you think about the changes? -- With Regards, Amit Kapila.
Attachment
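A condensed sketch of the spill accounting described in the responses to comment 5 and in item (a); the flag names follow the fragments quoted later in the thread, and the surrounding function bodies are elided:

/* Sketch only.  In ReorderBufferSerializeTXN(), once at least one change
 * has actually been written out: */
if (spilled)
{
    /*
     * Count the transaction itself only the first time it is serialized.
     * RBTXN_IS_SERIALIZED is cleared again after streaming, so a second
     * flag remembers that the transaction was serialized at some point.
     */
    if (!rbtxn_is_serialized(txn) && !rbtxn_is_serialized_clear(txn))
        rb->spillTxns += 1;

    rb->spillCount += 1;
    rb->spillBytes += size;
}

/* And in ReorderBufferTruncateTXN(), when streaming clears the
 * serialized state: */
if (rbtxn_is_serialized(txn))
{
    txn->txn_flags &= ~RBTXN_IS_SERIALIZED;
    txn->txn_flags |= RBTXN_IS_SERIALIZED_CLEAR;
}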
On Sat, Sep 19, 2020 at 1:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Sep 8, 2020 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Sep 8, 2020 at 7:53 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > I have fixed these review comments in the attached patch. > > > Apart from the above, > (a) fixed one bug in ReorderBufferSerializeTXN() where we were > updating the stats even when we have not spilled anything. > (b) made changes in pgstat_read_db_statsfile_timestamp to return false > when the replication slot entry is corrupt. > (c) move the declaration and definitions in pgstat.c to make them > consistent with existing code > (d) made another couple of cosmetic fixes and changed a few comments > (e) Tested the patch by using a guc which allows spilling all the > changes. See v4-0001-guc-always-spill > I have found a way to write the test case for this patch. This is based on the idea we used in stats.sql. As of now, I have kept the test as a separate patch. We can decide to commit the test part separately as it is slightly timing dependent but OTOH as it is based on existing logic in stats.sql so there shouldn't be much problem. I have not changed anything apart from the test patch in this version. Note that the first patch is just a debugging kind of tool to test the patch. Thoughts? -- With Regards, Amit Kapila.
Attachment
On Thu, Sep 24, 2020 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Sep 19, 2020 at 1:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Sep 8, 2020 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Sep 8, 2020 at 7:53 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > I have fixed these review comments in the attached patch. > > > > > > Apart from the above, > > (a) fixed one bug in ReorderBufferSerializeTXN() where we were > > updating the stats even when we have not spilled anything. > > (b) made changes in pgstat_read_db_statsfile_timestamp to return false > > when the replication slot entry is corrupt. > > (c) move the declaration and definitions in pgstat.c to make them > > consistent with existing code > > (d) made another couple of cosmetic fixes and changed a few comments > > (e) Tested the patch by using a guc which allows spilling all the > > changes. See v4-0001-guc-always-spill > > > > I have found a way to write the test case for this patch. This is > based on the idea we used in stats.sql. As of now, I have kept the > test as a separate patch. We can decide to commit the test part > separately as it is slightly timing dependent but OTOH as it is based > on existing logic in stats.sql so there shouldn't be much problem. I > have not changed anything apart from the test patch in this version. > Note that the first patch is just a debugging kind of tool to test the > patch. > I have done some more testing of this patch especially for the case where we spill before streaming the transaction and found everything is working as expected. Additionally, I have changed a few more comments and ran pgindent. I am still not very sure whether we want to display physical slots in this view as all the stats are for logical slots but anyway we can add stats w.r.t physical slots in the future. I am fine either way (don't show physical slots in this view or show them but keep stats as 0). Let me know if you have any thoughts on these points, other than that I am happy with the current state of the patch. -- With Regards, Amit Kapila.
Attachment
On Fri, Sep 25, 2020 at 4:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Sep 24, 2020 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Sep 19, 2020 at 1:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Sep 8, 2020 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Tue, Sep 8, 2020 at 7:53 AM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > I have fixed these review comments in the attached patch. > > > > > > > > > Apart from the above, > > > (a) fixed one bug in ReorderBufferSerializeTXN() where we were > > > updating the stats even when we have not spilled anything. > > > (b) made changes in pgstat_read_db_statsfile_timestamp to return false > > > when the replication slot entry is corrupt. > > > (c) move the declaration and definitions in pgstat.c to make them > > > consistent with existing code > > > (d) made another couple of cosmetic fixes and changed a few comments > > > (e) Tested the patch by using a guc which allows spilling all the > > > changes. See v4-0001-guc-always-spill > > > > > > > I have found a way to write the test case for this patch. This is > > based on the idea we used in stats.sql. As of now, I have kept the > > test as a separate patch. We can decide to commit the test part > > separately as it is slightly timing dependent but OTOH as it is based > > on existing logic in stats.sql so there shouldn't be much problem. I > > have not changed anything apart from the test patch in this version. > > Note that the first patch is just a debugging kind of tool to test the > > patch. > > > > I have done some more testing of this patch especially for the case > where we spill before streaming the transaction and found everything > is working as expected. Additionally, I have changed a few more > comments and ran pgindent. I am still not very sure whether we want to > display physical slots in this view as all the stats are for logical > slots but anyway we can add stats w.r.t physical slots in the future. > I am fine either way (don't show physical slots in this view or show > them but keep stats as 0). Let me know if you have any thoughts on > these points, other than that I am happy with the current state of the > patch. IMHO, It will make more sense to only show the logical replication slots in this view. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Sep 30, 2020 at 1:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Sep 25, 2020 at 4:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Sep 24, 2020 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > I have done some more testing of this patch especially for the case > > where we spill before streaming the transaction and found everything > > is working as expected. Additionally, I have changed a few more > > comments and ran pgindent. I am still not very sure whether we want to > > display physical slots in this view as all the stats are for logical > > slots but anyway we can add stats w.r.t physical slots in the future. > > I am fine either way (don't show physical slots in this view or show > > them but keep stats as 0). Let me know if you have any thoughts on > > these points, other than that I am happy with the current state of the > > patch. > > IMHO, It will make more sense to only show the logical replication > slots in this view. > Okay, Sawada-San, others, do you have any opinion on this matter? -- With Regards, Amit Kapila.
On Wed, Sep 30, 2020 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Sep 30, 2020 at 1:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Sep 25, 2020 at 4:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Sep 24, 2020 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I have done some more testing of this patch especially for the case > > > where we spill before streaming the transaction and found everything > > > is working as expected. Additionally, I have changed a few more > > > comments and ran pgindent. I am still not very sure whether we want to > > > display physical slots in this view as all the stats are for logical > > > slots but anyway we can add stats w.r.t physical slots in the future. > > > I am fine either way (don't show physical slots in this view or show > > > them but keep stats as 0). Let me know if you have any thoughts on > > > these points, other than that I am happy with the current state of the > > > patch. > > > > IMHO, It will make more sense to only show the logical replication > > slots in this view. > > > > Okay, Sawada-San, others, do you have any opinion on this matter? I have started looking into this patch; I have a few comments. + Number of times transactions were spilled to disk. Transactions + may get spilled repeatedly, and this counter gets incremented on every + such invocation. + </para></entry> + </row> + + + <para> + Tracking of spilled transactions works only for logical replication. In The number of spaces used after the full stop is not uniform. + /* update the statistics iff we have spilled anything */ + if (spilled) + { + rb->spillCount += 1; + rb->spillBytes += size; + + /* Don't consider already serialized transactions. */ Single-line comments are not uniform: "update the statistics" starts with a small letter and does not end with a full stop, whereas 'Don't consider' starts with a capital and ends with a full stop. + + /* + * We set this flag to indicate if the transaction is ever serialized. + * We need this to accurately update the stats. + */ + txn->txn_flags |= RBTXN_IS_SERIALIZED_CLEAR; I feel we can explain the exact scenario in the comment, i.e., if we stream after spilling, we still need to know that it spilled in the past so that we don't count it again as a new spilled transaction. old slot. We send the drop + * and create messages while holding ReplicationSlotAllocationLock to + * reduce that possibility. If the messages reached in reverse, we would + * lose one statistics update message. But Spacing after the full stop is not uniform. + * Statistics about transactions spilled to disk. + * + * A single transaction may be spilled repeatedly, which is why we keep + * two different counters. For spilling, the transaction counter includes + * both toplevel transactions and subtransactions. + */ + int64 spillCount; /* spill-to-disk invocation counter */ + int64 spillTxns; /* number of transactions spilled to disk */ + int64 spillBytes; /* amount of data spilled to disk */ Can we keep the order as spillTxns, spillCount, spillBytes, because every other place we keep them in that order, so it will look more uniform? Other than that I did not see any problem. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Sep 30, 2020 at 4:34 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Sep 30, 2020 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Sep 30, 2020 at 1:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Sep 25, 2020 at 4:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Thu, Sep 24, 2020 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > I have done some more testing of this patch especially for the case > > > > where we spill before streaming the transaction and found everything > > > > is working as expected. Additionally, I have changed a few more > > > > comments and ran pgindent. I am still not very sure whether we want to > > > > display physical slots in this view as all the stats are for logical > > > > slots but anyway we can add stats w.r.t physical slots in the future. > > > > I am fine either way (don't show physical slots in this view or show > > > > them but keep stats as 0). Let me know if you have any thoughts on > > > > these points, other than that I am happy with the current state of the > > > > patch. > > > > > > IMHO, It will make more sense to only show the logical replication > > > slots in this view. > > > > > > > Okay, Sawada-San, others, do you have any opinion on this matter? > I have changed so that view will only show logical replication slots and adjusted the docs accordingly. I think we can easily extend it to show physical replication slots if required in the future. > I have started looking into this patch, I have a few comments. > > + Number of times transactions were spilled to disk. Transactions > + may get spilled repeatedly, and this counter gets incremented on every > + such invocation. > + </para></entry> > + </row> > + > + > + <para> > + Tracking of spilled transactions works only for logical replication. In > > The number of spaces used after the full stop is not uniform. > I have removed this sentence as it is not required as we only want to show logical slots in the view and for that I have updated the other parts of the doc. > + /* update the statistics iff we have spilled anything */ > + if (spilled) > + { > + rb->spillCount += 1; > + rb->spillBytes += size; > + > + /* Don't consider already serialized transactions. */ > > Single line comments are not uniform, "update the statistics" is > starting is small letter and not ending with the full stop > whereas 'Don't consider' is starting with capital and ending with full stop. > Actually, it doesn't matter in this case but I have changed to keep it consistent. > > + > + /* > + * We set this flag to indicate if the transaction is ever serialized. > + * We need this to accurately update the stats. > + */ > + txn->txn_flags |= RBTXN_IS_SERIALIZED_CLEAR; > > I feel we can explain the exact scenario in the comment, i.e. after > spill if we stream then we still > need to know that it spilled in past so that we don't count this again > as a new spilled transaction. > Okay, I have expanded the comment a bit more to explain this. > old slot. We send the drop > + * and create messages while holding ReplicationSlotAllocationLock to > + * reduce that possibility. If the messages reached in reverse, we would > + * lose one statistics update message. But > > Spacing after the full stop is not uniform. > Changed. > > + * Statistics about transactions spilled to disk. > + * > + * A single transaction may be spilled repeatedly, which is why we keep > + * two different counters. 
For spilling, the transaction counter includes > + * both toplevel transactions and subtransactions. > + */ > + int64 spillCount; /* spill-to-disk invocation counter */ > + int64 spillTxns; /* number of transactions spilled to disk */ > + int64 spillBytes; /* amount of data spilled to disk */ > > Can we keep the order as spillTxns, spillCount, spillBytes, because > every other place we keep them in that order, > so it will look more uniform? > Changed here and at one more place as per this suggestion. > Other than that I did not see any problem. > Thanks for the review. -- With Regards, Amit Kapila.
On Thu, Oct 1, 2020 at 12:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Thanks for the review. > oops, forgot to attach the updated patches, doing now. -- With Regards, Amit Kapila.
Attachment
On Wed, 30 Sep 2020 at 18:10, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Sep 30, 2020 at 1:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Sep 25, 2020 at 4:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Sep 24, 2020 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I have done some more testing of this patch especially for the case > > > where we spill before streaming the transaction and found everything > > > is working as expected. Additionally, I have changed a few more > > > comments and ran pgindent. I am still not very sure whether we want to > > > display physical slots in this view as all the stats are for logical > > > slots but anyway we can add stats w.r.t physical slots in the future. > > > I am fine either way (don't show physical slots in this view or show > > > them but keep stats as 0). Let me know if you have any thoughts on > > > these points, other than that I am happy with the current state of the > > > patch. > > > > IMHO, It will make more sense to only show the logical replication > > slots in this view. > > > Thank you for updating the patch. > Okay, Sawada-San, others, do you have any opinion on this matter? > When we discussed this before, I was thinking that we could have other statistics for physical slots in the same statistics view in the future. Having the view show only logical slots also makes sense to me but I’m concerned a bit that we could break backward compatibility that monitoring tools etc will be affected when the view starts to show physical slots too. If the view shows only logical slots, it also might be worth considering to have separate views for logical slots and physical slots and having pg_stat_logical_replication_slots by this change. Here is my review comment on the v7 patch. + /* + * Set the same reset timestamp for all replication slots too. + */ + for (i = 0; i < max_replication_slots; i++) + replSlotStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp; + You added back the above code but since we clear the timestamps on creation of a new slot they are not shown: + /* Register new slot */ + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats)); + memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN); Looking at other statistics views such as pg_stat_slru, pg_stat_bgwriter, and pg_stat_archiver, they have a valid reset_timestamp value from the beginning. That's why I removed that code and assigned the timestamp when registering a new slot. --- + if (OidIsValid(slot->data.database)) + pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0); I think we can use SlotIsLogical() for this purpose. The same is true when dropping a slot. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
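The SlotIsLogical() suggestion amounts to guarding the two reporting sites, roughly as below; SlotIsLogical() is the existing macro in slot.h, and the calls follow the fragments quoted upthread, but this is a sketch rather than the patch itself:

/* Sketch of the suggested guards; surrounding code elided. */

/* In ReplicationSlotCreate(), while still holding
 * ReplicationSlotAllocationLock: */
if (SlotIsLogical(slot))
    pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0);

/* In ReplicationSlotDropPtr(), likewise report the drop only for
 * logical slots: */
if (SlotIsLogical(slot))
    pgstat_report_replslot_drop(NameStr(slot->data.name));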
On Sat, Oct 3, 2020 at 9:26 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > When we discussed this before, I was thinking that we could have other > statistics for physical slots in the same statistics view in the > future. Having the view show only logical slots also makes sense to me > but I’m concerned a bit that we could break backward compatibility > that monitoring tools etc will be affected when the view starts to > show physical slots too. > I think that would happen anyway as we need to add more columns in view for the physical slots. > If the view shows only logical slots, it also > might be worth considering to have separate views for logical slots > and physical slots and having pg_stat_logical_replication_slots by > this change. > I am not sure at this stage but I think we will add the additional stats for physical slots once we have any in this view itself. I would like to avoid adding separate views if possible. The only reason to omit physical slots at this stage is that we don't have any stats for the same. > Here is my review comment on the v7 patch. > > + /* > + * Set the same reset timestamp for all replication slots too. > + */ > + for (i = 0; i < max_replication_slots; i++) > + replSlotStats[i].stat_reset_timestamp = > globalStats.stat_reset_timestamp; > + > > You added back the above code but since we clear the timestamps on > creation of a new slot they are not shown: > > + /* Register new slot */ > + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats)); > + memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN); > > Looking at other statistics views such as pg_stat_slru, > pg_stat_bgwriter, and pg_stat_archiver, they have a valid > reset_timestamp value from the beginning. That's why I removed that > code and assigned the timestamp when registering a new slot. > Hmm, I don't think it is shown intentionally in those views. I think what is happening in other views is that it has been initialized with some value when we read the stats and then while updating and or writing because we don't change the stat_reset_timestamp, it displays the same value as initialized at the time of read. Now, because in pgstat_replslot_index() we always initialize the replSlotStats it would overwrite any previous value we have set during read and display the stat_reset as empty for replication slots. If we stop initializing the replSlotStats in pgstat_replslot_index() then we will see similar behavior as other views have. So even if we want to change then probably we should stop initialization in pgstat_replslot_index but I don't think that is necessarily better behavior because the description of the parameter doesn't indicate any such thing. > --- > + if (OidIsValid(slot->data.database)) > + pgstat_report_replslot(NameStr(slot->data.name), 0, 0, 0); > > I think we can use SlotIsLogical() for this purpose. The same is true > when dropping a slot. > makes sense, so changed accordingly in the attached patch. -- With Regards, Amit Kapila.
Attachment
On Sat, 3 Oct 2020 at 16:55, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Oct 3, 2020 at 9:26 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > When we discussed this before, I was thinking that we could have other > > statistics for physical slots in the same statistics view in the > > future. Having the view show only logical slots also makes sense to me > > but I’m concerned a bit that we could break backward compatibility > > that monitoring tools etc will be affected when the view starts to > > show physical slots too. > > > > I think that would happen anyway as we need to add more columns in > view for the physical slots. I think it depends; adding more columns to the view might not break tools if the query used in the tool explicitly specifies columns. OTOH if the view starts to show more rows, the tool will need to have the condition to get the same result as before. > > > If the view shows only logical slots, it also > > might be worth considering to have separate views for logical slots > > and physical slots and having pg_stat_logical_replication_slots by > > this change. > > > > I am not sure at this stage but I think we will add the additional > stats for physical slots once we have any in this view itself. I would > like to avoid adding separate views if possible. The only reason to > omit physical slots at this stage is that we don't have any stats for > the same. I also prefer not to have separate views. I'm concerned about the compatibility I explained above but at the same time I agree that it doesn't make sense to show the stats always having nothing. Since given you and Dilip agreed on that, I also agree with that. > > > Here is my review comment on the v7 patch. > > > > + /* > > + * Set the same reset timestamp for all replication slots too. > > + */ > > + for (i = 0; i < max_replication_slots; i++) > > + replSlotStats[i].stat_reset_timestamp = > > globalStats.stat_reset_timestamp; > > + > > > > You added back the above code but since we clear the timestamps on > > creation of a new slot they are not shown: > > > > + /* Register new slot */ > > + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats)); > > + memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN); > > > > Looking at other statistics views such as pg_stat_slru, > > pg_stat_bgwriter, and pg_stat_archiver, they have a valid > > reset_timestamp value from the beginning. That's why I removed that > > code and assigned the timestamp when registering a new slot. > > > > Hmm, I don't think it is shown intentionally in those views. I think > what is happening in other views is that it has been initialized with > some value when we read the stats and then while updating and or > writing because we don't change the stat_reset_timestamp, it displays > the same value as initialized at the time of read. Now, because in > pgstat_replslot_index() we always initialize the replSlotStats it > would overwrite any previous value we have set during read and display > the stat_reset as empty for replication slots. If we stop initializing > the replSlotStats in pgstat_replslot_index() then we will see similar > behavior as other views have. So even if we want to change then > probably we should stop initialization in pgstat_replslot_index but I > don't think that is necessarily better behavior because the > description of the parameter doesn't indicate any such thing. Understood. I agreed that the newly created slot doesn't have reset_timestamp. 
Looking at pg_stat_database, a view whose rows are added dynamically, unlike other stat views, the newly created database doesn't have reset_timestamp. But given that we clear the stats for a slot at pgstat_replslot_index(), why do we need to initialize the reset_timestamp with globalStats.stat_reset_timestamp when reading the stats file? Even if we could not find any slot stats in the stats file, the view won’t show anything. And the reset_timestamp will be cleared when stats for the slot are received. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Oct 5, 2020 at 1:26 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sat, 3 Oct 2020 at 16:55, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Oct 3, 2020 at 9:26 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > When we discussed this before, I was thinking that we could have other > > > statistics for physical slots in the same statistics view in the > > > future. Having the view show only logical slots also makes sense to me > > > but I’m concerned a bit that we could break backward compatibility > > > that monitoring tools etc will be affected when the view starts to > > > show physical slots too. > > > > > > > I think that would happen anyway as we need to add more columns in > > view for the physical slots. > > I think it depends; adding more columns to the view might not break > tools if the query used in the tool explicitly specifies columns. > What if it uses Select * ...? It might not be advisable to assume how the user might fetch data. > OTOH > if the view starts to show more rows, the tool will need to have the > condition to get the same result as before. > > > > > > If the view shows only logical slots, it also > > > might be worth considering to have separate views for logical slots > > > and physical slots and having pg_stat_logical_replication_slots by > > > this change. > > > > > > > I am not sure at this stage but I think we will add the additional > > stats for physical slots once we have any in this view itself. I would > > like to avoid adding separate views if possible. The only reason to > > omit physical slots at this stage is that we don't have any stats for > > the same. > > I also prefer not to have separate views. I'm concerned about the > compatibility I explained above but at the same time I agree that it > doesn't make sense to show the stats always having nothing. Since > given you and Dilip agreed on that, I also agree with that. > Okay. > > > > > Here is my review comment on the v7 patch. > > > > > > + /* > > > + * Set the same reset timestamp for all replication slots too. > > > + */ > > > + for (i = 0; i < max_replication_slots; i++) > > > + replSlotStats[i].stat_reset_timestamp = > > > globalStats.stat_reset_timestamp; > > > + > > > > > > You added back the above code but since we clear the timestamps on > > > creation of a new slot they are not shown: > > > > > > + /* Register new slot */ > > > + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats)); > > > + memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN); > > > > > > Looking at other statistics views such as pg_stat_slru, > > > pg_stat_bgwriter, and pg_stat_archiver, they have a valid > > > reset_timestamp value from the beginning. That's why I removed that > > > code and assigned the timestamp when registering a new slot. > > > > > > > Hmm, I don't think it is shown intentionally in those views. I think > > what is happening in other views is that it has been initialized with > > some value when we read the stats and then while updating and or > > writing because we don't change the stat_reset_timestamp, it displays > > the same value as initialized at the time of read. Now, because in > > pgstat_replslot_index() we always initialize the replSlotStats it > > would overwrite any previous value we have set during read and display > > the stat_reset as empty for replication slots. 
If we stop initializing > > the replSlotStats in pgstat_replslot_index() then we will see similar > > behavior as other views have. So even if we want to change then > > probably we should stop initialization in pgstat_replslot_index but I > > don't think that is necessarily better behavior because the > > description of the parameter doesn't indicate any such thing. > > Understood. I agreed that the newly created slot doesn't have > reset_timestamp. Looking at pg_stat_database, a view whose rows are > added dynamically unlike other stat views, the newly created database > doesn't have reset_timestamp. But given we clear the stats for a slot > at pgstat_replslot_index(), why do we need to initialize the > reset_timestamp with globalStats.stat_reset_timestamp when reading the > stats file? Even if we could not find any slot stats in the stats file > the view won’t show anything. > It was mainly for a code consistency point of view. Also, we will clear the data in pgstat_replslot_index only for new slots, not for existing slots. It might be used when we can't load the statsfile as per comment in code ("Set the current timestamp (will be kept only in case we can't load an existing statsfile)). -- With Regards, Amit Kapila.
On Mon, 5 Oct 2020 at 17:50, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Oct 5, 2020 at 1:26 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Sat, 3 Oct 2020 at 16:55, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Oct 3, 2020 at 9:26 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > When we discussed this before, I was thinking that we could have other > > > > statistics for physical slots in the same statistics view in the > > > > future. Having the view show only logical slots also makes sense to me > > > > but I’m concerned a bit that we could break backward compatibility > > > > that monitoring tools etc will be affected when the view starts to > > > > show physical slots too. > > > > > > > > > > I think that would happen anyway as we need to add more columns in > > > view for the physical slots. > > > > I think it depends; adding more columns to the view might not break > > tools if the query used in the tool explicitly specifies columns. > > > > What if it uses Select * ...? It might not be advisable to assume how > the user might fetch data. > > > OTOH > > if the view starts to show more rows, the tool will need to have the > > condition to get the same result as before. > > > > > > > > > If the view shows only logical slots, it also > > > > might be worth considering to have separate views for logical slots > > > > and physical slots and having pg_stat_logical_replication_slots by > > > > this change. > > > > > > > > > > I am not sure at this stage but I think we will add the additional > > > stats for physical slots once we have any in this view itself. I would > > > like to avoid adding separate views if possible. The only reason to > > > omit physical slots at this stage is that we don't have any stats for > > > the same. > > > > I also prefer not to have separate views. I'm concerned about the > > compatibility I explained above but at the same time I agree that it > > doesn't make sense to show the stats always having nothing. Since > > given you and Dilip agreed on that, I also agree with that. > > > > Okay. > > > > > > > > Here is my review comment on the v7 patch. > > > > > > > > + /* > > > > + * Set the same reset timestamp for all replication slots too. > > > > + */ > > > > + for (i = 0; i < max_replication_slots; i++) > > > > + replSlotStats[i].stat_reset_timestamp = > > > > globalStats.stat_reset_timestamp; > > > > + > > > > > > > > You added back the above code but since we clear the timestamps on > > > > creation of a new slot they are not shown: > > > > > > > > + /* Register new slot */ > > > > + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats)); > > > > + memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN); > > > > > > > > Looking at other statistics views such as pg_stat_slru, > > > > pg_stat_bgwriter, and pg_stat_archiver, they have a valid > > > > reset_timestamp value from the beginning. That's why I removed that > > > > code and assigned the timestamp when registering a new slot. > > > > > > > > > > Hmm, I don't think it is shown intentionally in those views. I think > > > what is happening in other views is that it has been initialized with > > > some value when we read the stats and then while updating and or > > > writing because we don't change the stat_reset_timestamp, it displays > > > the same value as initialized at the time of read. 
Now, because in > > > pgstat_replslot_index() we always initialize the replSlotStats it > > > would overwrite any previous value we have set during read and display > > > the stat_reset as empty for replication slots. If we stop initializing > > > the replSlotStats in pgstat_replslot_index() then we will see similar > > > behavior as other views have. So even if we want to change then > > > probably we should stop initialization in pgstat_replslot_index but I > > > don't think that is necessarily better behavior because the > > > description of the parameter doesn't indicate any such thing. > > > > Understood. I agreed that the newly created slot doesn't have > > reset_timestamp. Looking at pg_stat_database, a view whose rows are > > added dynamically unlike other stat views, the newly created database > > doesn't have reset_timestamp. But given we clear the stats for a slot > > at pgstat_replslot_index(), why do we need to initialize the > > reset_timestamp with globalStats.stat_reset_timestamp when reading the > > stats file? Even if we could not find any slot stats in the stats file > > the view won’t show anything. > > > > It was mainly for a code consistency point of view. Also, we will > clear the data in pgstat_replslot_index only for new slots, not for > existing slots. It might be used when we can't load the statsfile as > per comment in code ("Set the current timestamp (will be kept only in > case we can't load an existing statsfile)). > Understood. Looking at pgstat_reset_replslot_counter() in the v8 patch, even if we pass a physical slot name to pg_stat_reset_replication_slot() a PgStat_MsgResetreplslotcounter is sent to the stats collector. I’m okay with not raising an error but maybe we can have it not to send the message in that case. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 6, 2020 at 9:34 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > Looking at pgstat_reset_replslot_counter() in the v8 patch, even if we > pass a physical slot name to pg_stat_reset_replication_slot() a > PgStat_MsgResetreplslotcounter is sent to the stats collector. I’m > okay with not raising an error but maybe we can have it not to send > the message in that case. > makes sense, so changed accordingly. -- With Regards, Amit Kapila.
Attachment
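For illustration, the behavior settled on above could be exercised roughly like this (a sketch against the patched view; 'physical_slot' is a stand-in name for any physical slot):

-- resets the spill counters tracked for a logical slot
SELECT pg_stat_reset_replication_slot('regression_slot');
-- with the v9 change this is accepted without error, but no reset
-- message is sent to the stats collector, since physical slots carry
-- no decoding stats
SELECT pg_stat_reset_replication_slot('physical_slot');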
On Tue, 6 Oct 2020 at 17:56, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 6, 2020 at 9:34 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > Looking at pgstat_reset_replslot_counter() in the v8 patch, even if we > > pass a physical slot name to pg_stat_reset_replication_slot() a > > PgStat_MsgResetreplslotcounter is sent to the stats collector. I’m > > okay with not raising an error but maybe we can have it not to send > > the message in that case. > > > > makes sense, so changed accordingly. > Thank you for updating the patch! Both patches look good to me. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Oct 7, 2020 at 11:24 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 6 Oct 2020 at 17:56, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Oct 6, 2020 at 9:34 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > Looking at pgstat_reset_replslot_counter() in the v8 patch, even if we > > > pass a physical slot name to pg_stat_reset_replication_slot() a > > > PgStat_MsgResetreplslotcounter is sent to the stats collector. I’m > > > okay with not raising an error but maybe we can have it not to send > > > the message in that case. > > > > > > > makes sense, so changed accordingly. > > > > Thank you for updating the patch! > Thanks, I will push the first one tomorrow unless I see more comments and test-case one later. I think after we are done with this the next step would be to finish the streaming stats work [1]. We probably need to review and add the test case in that patch. If nobody else shows up I will pick it up and complete it. [1] - https://www.postgresql.org/message-id/CAFPTHDZ8RnOovefzB%2BOMoRxLSD404WRLqWBUHe6bWqM5ew1bNA%40mail.gmail.com -- With Regards, Amit Kapila.
On Wed, 7 Oct 2020 at 17:52, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Oct 7, 2020 at 11:24 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 6 Oct 2020 at 17:56, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Oct 6, 2020 at 9:34 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > Looking at pgstat_reset_replslot_counter() in the v8 patch, even if we > > > > pass a physical slot name to pg_stat_reset_replication_slot() a > > > > PgStat_MsgResetreplslotcounter is sent to the stats collector. I’m > > > > okay with not raising an error but maybe we can have it not to send > > > > the message in that case. > > > > > > > > > > makes sense, so changed accordingly. > > > > > > > Thank you for updating the patch! > > > > Thanks, I will push the first one tomorrow unless I see more comments > and test-case one later. I thought we could have a test case for the reset function, what do you think? > I think after we are done with this the next > step would be to finish the streaming stats work [1]. We probably need > to review and add the test case in that patch. If nobody else shows up > I will pick it up and complete it. +1 I can review that patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Oct 8, 2020 at 7:46 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 7 Oct 2020 at 17:52, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Oct 7, 2020 at 11:24 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Tue, 6 Oct 2020 at 17:56, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Tue, Oct 6, 2020 at 9:34 AM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > Looking at pgstat_reset_replslot_counter() in the v8 patch, even if we > > > > > pass a physical slot name to pg_stat_reset_replication_slot() a > > > > > PgStat_MsgResetreplslotcounter is sent to the stats collector. I’m > > > > > okay with not raising an error but maybe we can have it not to send > > > > > the message in that case. > > > > > > > > > > > > > makes sense, so changed accordingly. > > > > > > > > > > Thank you for updating the patch! > > > > > > > Thanks, I will push the first one tomorrow unless I see more comments > > and test-case one later. > > I thought we could have a test case for the reset function, what do you think? > We can write if we want but there are few things we need to do for that like maybe a new function like wait_for_spill_stats which will check if the counters have become zero. Then probably call a reset function, call a new wait function, and then again check stats to ensure they are reset to 0. We can't write any advanced test which means reset the existing stats perform some tests and again check stats because *slot_get_changes() function can start from the previous WAL for which we have covered the stats. We might write that if we can somehow track the WAL positions from the previous test. I am not sure if we want to go there. > > I think after we are done with this the next > > step would be to finish the streaming stats work [1]. We probably need > > to review and add the test case in that patch. If nobody else shows up > > I will pick it up and complete it. > > +1 > I can review that patch. > Thanks! -- With Regards, Amit Kapila.
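A polling helper of the kind described above could look roughly like this (illustrative only; the view and slot names follow the patch, the loop bounds are arbitrary):

CREATE FUNCTION wait_for_spill_stats(check_reset bool) RETURNS void AS $$
DECLARE
  updated bool;
BEGIN
  -- poll for up to ~30 seconds, in 100ms steps
  FOR i IN 1 .. 300 LOOP
    SELECT coalesce(CASE WHEN check_reset THEN spill_txns = 0
                         ELSE spill_txns > 0 END, false)
      INTO updated
      FROM pg_stat_replication_slots
     WHERE name = 'regression_slot';
    EXIT WHEN updated;
    PERFORM pg_sleep(0.1);
  END LOOP;
END
$$ LANGUAGE plpgsql;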
On Thu, 8 Oct 2020 at 14:10, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 7:46 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 7 Oct 2020 at 17:52, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Oct 7, 2020 at 11:24 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Tue, 6 Oct 2020 at 17:56, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Tue, Oct 6, 2020 at 9:34 AM Masahiko Sawada > > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > Looking at pgstat_reset_replslot_counter() in the v8 patch, even if we > > > > > > pass a physical slot name to pg_stat_reset_replication_slot() a > > > > > > PgStat_MsgResetreplslotcounter is sent to the stats collector. I’m > > > > > > okay with not raising an error but maybe we can have it not to send > > > > > > the message in that case. > > > > > > > > > > > > > > > > makes sense, so changed accordingly. > > > > > > > > > > > > > Thank you for updating the patch! > > > > > > > > > > Thanks, I will push the first one tomorrow unless I see more comments > > > and test-case one later. > > > > I thought we could have a test case for the reset function, what do you think? > > > > We can write if we want but there are few things we need to do for > that like maybe a new function like wait_for_spill_stats which will > check if the counters have become zero. Then probably call a reset > function, call a new wait function, and then again check stats to > ensure they are reset to 0. Yes. > We can't write any advanced test which means reset the existing stats > perform some tests and again check stats because *slot_get_changes() > function can start from the previous WAL for which we have covered the > stats. We might write that if we can somehow track the WAL positions > from the previous test. I am not sure if we want to go there. Can we use pg_logical_slot_peek_changes() instead to decode the same transactions multiple times? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Oct 8, 2020 at 1:55 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 8 Oct 2020 at 14:10, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > We can write if we want but there are few things we need to do for > > that like maybe a new function like wait_for_spill_stats which will > > check if the counters have become zero. Then probably call a reset > > function, call a new wait function, and then again check stats to > > ensure they are reset to 0. > > Yes. > I am not sure if it is worth but probably it is not a bad idea especially if we extend the existing tests based on your below idea? > > We can't write any advanced test which means reset the existing stats > > perform some tests and again check stats because *slot_get_changes() > > function can start from the previous WAL for which we have covered the > > stats. We might write that if we can somehow track the WAL positions > > from the previous test. I am not sure if we want to go there. > > Can we use pg_logical_slot_peek_changes() instead to decode the same > transactions multiple times? > I think this will do the trick. If we want to go there then I suggest we can have a separate regression test file in test_decoding with name as decoding_stats, stats, or something like that. We can later add the tests related to streaming stats in that file as well. -- With Regards, Amit Kapila.
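The trick works because pg_logical_slot_peek_changes() does not advance the slot's position, so the same transaction can be decoded (and therefore spilled) any number of times, e.g. (sketch; 'skip-empty-xacts' is an existing test_decoding option):

-- decode the spilled transaction without consuming it
SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
-- a second peek decodes, and spills, exactly the same changes again
SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');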
On Thu, Oct 8, 2020 at 7:46 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 7 Oct 2020 at 17:52, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > I think after we are done with this the next > > step would be to finish the streaming stats work [1]. We probably need > > to review and add the test case in that patch. If nobody else shows up > > I will pick it up and complete it. > > +1 > I can review that patch. > I have rebased the stream stats patch and made minor modifications. I haven't done a detailed review but one thing that I think is not correct is: @@ -3496,10 +3499,18 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn) txn->snapshot_now = NULL; } + + rb->streamCount += 1; + rb->streamBytes += txn->total_size; + + /* Don't consider already streamed transaction. */ + rb->streamTxns += (rbtxn_is_streamed(txn)) ? 0 : 1; + /* Process and send the changes to output plugin. */ ReorderBufferProcessTXN(rb, txn, InvalidXLogRecPtr, snapshot_now, command_id, true); I think we should update the stream stats after ReorderBufferProcessTXN rather than before because any error in ReorderBufferProcessTXN can lead to an unnecessary update of stats. But OTOH, the txn flags, and other data can be changed after ReorderBufferProcessTXN so we need to save them in a temporary variable before calling the function. -- With Regards, Amit Kapila.
Attachment
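Once these counters land, the view would presumably expose them next to the spill ones; a monitoring query might then look as below (the stream_* column names are guessed from the patch's streamTxns/streamCount/streamBytes fields, not confirmed):

SELECT name, spill_txns, spill_count, spill_bytes,
       stream_txns, stream_count, stream_bytes
  FROM pg_stat_replication_slots;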
On Thu, 8 Oct 2020 at 17:59, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 1:55 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 8 Oct 2020 at 14:10, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > We can write if we want but there are few things we need to do for > > > that like maybe a new function like wait_for_spill_stats which will > > > check if the counters have become zero. Then probably call a reset > > > function, call a new wait function, and then again check stats to > > > ensure they are reset to 0. > > > > Yes. > > > > I am not sure if it is worth but probably it is not a bad idea > especially if we extend the existing tests based on your below idea? > > > > We can't write any advanced test which means reset the existing stats > > > perform some tests and again check stats because *slot_get_changes() > > > function can start from the previous WAL for which we have covered the > > > stats. We might write that if we can somehow track the WAL positions > > > from the previous test. I am not sure if we want to go there. > > > > Can we use pg_logical_slot_peek_changes() instead to decode the same > > transactions multiple times? > > > > I think this will do the trick. If we want to go there then I suggest > we can have a separate regression test file in test_decoding with name > as decoding_stats, stats, or something like that. We can later add the > tests related to streaming stats in that file as well. > Agreed. I've updated the patch. Please review it. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Thu, 8 Oct 2020 at 22:57, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 7:46 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 7 Oct 2020 at 17:52, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > I think after we are done with this the next > > > step would be to finish the streaming stats work [1]. We probably need > > > to review and add the test case in that patch. If nobody else shows up > > > I will pick it up and complete it. > > > > +1 > > I can review that patch. > > > > I have rebased the stream stats patch and made minor modifications. I > haven't done a detailed review but one thing that I think is not > correct is: > @@ -3496,10 +3499,18 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, > ReorderBufferTXN *txn) > txn->snapshot_now = NULL; > } > > + > + rb->streamCount += 1; > + rb->streamBytes += txn->total_size; > + > + /* Don't consider already streamed transaction. */ > + rb->streamTxns += (rbtxn_is_streamed(txn)) ? 0 : 1; > + > /* Process and send the changes to output plugin. */ > ReorderBufferProcessTXN(rb, txn, InvalidXLogRecPtr, snapshot_now, > command_id, true); > > I think we should update the stream stats after > ReorderBufferProcessTXN rather than before because any error in > ReorderBufferProcessTXN can lead to an unnecessary update of stats. > But OTOH, the txn flags, and other data can be changed after > ReorderBufferProcessTXN so we need to save them in a temporary > variable before calling the function. Thank you for updating the patch! I've not looked at the patch in-depth yet but RBTXN_IS_STREAMED could be cleared after ReorderBUfferProcessTXN()? BTW maybe it's better to start a new thread for this patch as the title is no longer relevant. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Resetting spilled txn statistics in pg_stat_replication
From: "Shinoda, Noriyoshi (PN Japan A&PS Delivery)"
Hi, thank you for the awesome feature. As it may have been discussed, I think the 'name' column in pg_stat_replication_slots is more consistent with the columnname and data type matched to the pg_replication_slots catalog. The attached patch changes the name and data type of the 'name' column to slot_name and 'name' type, respectively. Also, the macro name PG_STAT_GET_..._CLOS has been changed to PG_STAT_GET_..._COLS. Regards, Noriyoshi Shinoda -----Original Message----- From: Masahiko Sawada [mailto:masahiko.sawada@2ndquadrant.com] Sent: Monday, October 12, 2020 3:22 PM To: Amit Kapila <amit.kapila16@gmail.com> Cc: Dilip Kumar <dilipbalaut@gmail.com>; Magnus Hagander <magnus@hagander.net>; Tomas Vondra <tomas.vondra@2ndquadrant.com>;PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>; Ajin Cherian <itsajin@gmail.com> Subject: Re: Resetting spilled txn statistics in pg_stat_replication On Thu, 8 Oct 2020 at 22:57, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 8, 2020 at 7:46 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 7 Oct 2020 at 17:52, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > I think after we are done with this the next step would be to > > > finish the streaming stats work [1]. We probably need to review > > > and add the test case in that patch. If nobody else shows up I > > > will pick it up and complete it. > > > > +1 > > I can review that patch. > > > > I have rebased the stream stats patch and made minor modifications. I > haven't done a detailed review but one thing that I think is not > correct is: > @@ -3496,10 +3499,18 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, > ReorderBufferTXN *txn) > txn->snapshot_now = NULL; > } > > + > + rb->streamCount += 1; > + rb->streamBytes += txn->total_size; > + > + /* Don't consider already streamed transaction. */ > + rb->streamTxns += (rbtxn_is_streamed(txn)) ? 0 : 1; > + > /* Process and send the changes to output plugin. */ > ReorderBufferProcessTXN(rb, txn, InvalidXLogRecPtr, snapshot_now, > command_id, true); > > I think we should update the stream stats after > ReorderBufferProcessTXN rather than before because any error in > ReorderBufferProcessTXN can lead to an unnecessary update of stats. > But OTOH, the txn flags, and other data can be changed after > ReorderBufferProcessTXN so we need to save them in a temporary > variable before calling the function. Thank you for updating the patch! I've not looked at the patch in-depth yet but RBTXN_IS_STREAMED could be cleared after ReorderBUfferProcessTXN()? BTW maybe it's better to start a new thread for this patch as the title is no longer relevant. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Mon, Oct 12, 2020 at 10:59 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 8 Oct 2020 at 17:59, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Oct 8, 2020 at 1:55 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Thu, 8 Oct 2020 at 14:10, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > We can write if we want but there are few things we need to do for > > > > that like maybe a new function like wait_for_spill_stats which will > > > > check if the counters have become zero. Then probably call a reset > > > > function, call a new wait function, and then again check stats to > > > > ensure they are reset to 0. > > > > > > Yes. > > > > > > > I am not sure if it is worth but probably it is not a bad idea > > especially if we extend the existing tests based on your below idea? > > > > > > We can't write any advanced test which means reset the existing stats > > > > perform some tests and again check stats because *slot_get_changes() > > > > function can start from the previous WAL for which we have covered the > > > > stats. We might write that if we can somehow track the WAL positions > > > > from the previous test. I am not sure if we want to go there. > > > > > > Can we use pg_logical_slot_peek_changes() instead to decode the same > > > transactions multiple times? > > > > > > > I think this will do the trick. If we want to go there then I suggest > > we can have a separate regression test file in test_decoding with name > > as decoding_stats, stats, or something like that. We can later add the > > tests related to streaming stats in that file as well. > > > > Agreed. > > I've updated the patch. Please review it. > Few comments: ============= 1. +-- function to wait for counters to advance +CREATE FUNCTION wait_for_spill_stats(check_reset bool) RETURNS void AS $$ Can we rename this function to wait_for_decode_stats? I am thinking we can later reuse this function for streaming stats as well by passing the additional parameter 'stream bool'. 2. let's drop the table added by this test and regression_slot at the end of the test. -- With Regards, Amit Kapila.
On Mon, 12 Oct 2020 at 18:29, Shinoda, Noriyoshi (PN Japan A&PS Delivery) <noriyoshi.shinoda@hpe.com> wrote: > > Hi, thank you for the awesome feature. > Thank you for reporting! > As it may have been discussed, I think the 'name' column in pg_stat_replication_slots is more consistent with the columnname and data type matched to the pg_replication_slots catalog. > The attached patch changes the name and data type of the 'name' column to slot_name and 'name' type, respectively. It seems a good idea to me. In other system views, we use the name data type for object name. When I wrote the first patch, I borrowed the code for pg_stat_slru which uses text data for the name but I think it's an oversight. > Also, the macro name PG_STAT_GET_..._CLOS has been changed to PG_STAT_GET_..._COLS. Good catch! Here is my comments on the patch: --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -798,7 +798,7 @@ CREATE VIEW pg_stat_replication AS CREATE VIEW pg_stat_replication_slots AS SELECT - s.name, + s.name AS slot_name, s.spill_txns, s.spill_count, s.spill_bytes, I think we should modify 'proargnames' of pg_stat_get_replication_slots() in pg_proc.dat instead. --- --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -7094,7 +7094,7 @@ pgstat_replslot_index(const char *name, bool create_it) Assert(nReplSlotStats <= max_replication_slots); for (i = 0; i < nReplSlotStats; i++) { - if (strcmp(replSlotStats[i].slotname, name) == 0) + if (strcmp(replSlotStats[i].slotname.data, name) == 0) return i; /* found */ } @@ -7107,7 +7107,7 @@ pgstat_replslot_index(const char *name, bool create_it) /* Register new slot */ memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats)); - memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN); + memcpy(&replSlotStats[nReplSlotStats].slotname.data, name, NAMEDATALEN); return nReplSlotStats++; } We can use NameStr() instead. --- Perhaps we need to update the regression test as well. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
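With the rename in place, the stats view would join cleanly against pg_replication_slots on a common column name, e.g. (illustrative only):

SELECT s.slot_name, s.slot_type, st.spill_txns, st.spill_count, st.spill_bytes
  FROM pg_replication_slots s
  JOIN pg_stat_replication_slots st USING (slot_name);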
On Mon, Oct 12, 2020 at 11:52 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 8 Oct 2020 at 22:57, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Oct 8, 2020 at 7:46 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > I have rebased the stream stats patch and made minor modifications. I > > haven't done a detailed review but one thing that I think is not > > correct is: > > @@ -3496,10 +3499,18 @@ ReorderBufferStreamTXN(ReorderBuffer *rb, > > ReorderBufferTXN *txn) > > txn->snapshot_now = NULL; > > } > > > > + > > + rb->streamCount += 1; > > + rb->streamBytes += txn->total_size; > > + > > + /* Don't consider already streamed transaction. */ > > + rb->streamTxns += (rbtxn_is_streamed(txn)) ? 0 : 1; > > + > > /* Process and send the changes to output plugin. */ > > ReorderBufferProcessTXN(rb, txn, InvalidXLogRecPtr, snapshot_now, > > command_id, true); > > > > I think we should update the stream stats after > > ReorderBufferProcessTXN rather than before because any error in > > ReorderBufferProcessTXN can lead to an unnecessary update of stats. > > But OTOH, the txn flags, and other data can be changed after > > ReorderBufferProcessTXN so we need to save them in a temporary > > variable before calling the function. > > Thank you for updating the patch! > > I've not looked at the patch in-depth yet but RBTXN_IS_STREAMED could > be cleared after ReorderBUfferProcessTXN()? > I think you mean to say RBTXN_IS_STREAMED could be *set* after ReorderBUfferProcessTXN(). We need to set it for txn and subtxns and currently, it is being done with other things in ReorderBufferTruncateTXN so not sure if it is a good idea to do this separately. > BTW maybe it's better to start a new thread for this patch as the > title is no longer relevant. > Yeah, that makes sense. I'll do that while posting a new version of the patch. -- With Regards, Amit Kapila.
RE: Resetting spilled txn statistics in pg_stat_replication
From: "Shinoda, Noriyoshi (PN Japan A&PS Delivery)"
Sawada-san, Thank you your comments. The attached patch reflects the comment. I also made a fix for the regression test. Regards, Noriyoshi Shinoda -----Original Message----- From: Masahiko Sawada [mailto:masahiko.sawada@2ndquadrant.com] Sent: Monday, October 12, 2020 8:12 PM To: Shinoda, Noriyoshi (PN Japan A&PS Delivery) <noriyoshi.shinoda@hpe.com> Cc: Amit Kapila <amit.kapila16@gmail.com>; Dilip Kumar <dilipbalaut@gmail.com>; Magnus Hagander <magnus@hagander.net>; TomasVondra <tomas.vondra@2ndquadrant.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>; Ajin Cherian <itsajin@gmail.com> Subject: Re: Resetting spilled txn statistics in pg_stat_replication On Mon, 12 Oct 2020 at 18:29, Shinoda, Noriyoshi (PN Japan A&PS Delivery) <noriyoshi.shinoda@hpe.com> wrote: > > Hi, thank you for the awesome feature. > Thank you for reporting! > As it may have been discussed, I think the 'name' column in pg_stat_replication_slots is more consistent with the columnname and data type matched to the pg_replication_slots catalog. > The attached patch changes the name and data type of the 'name' column to slot_name and 'name' type, respectively. It seems a good idea to me. In other system views, we use the name data type for object name. When I wrote the first patch,I borrowed the code for pg_stat_slru which uses text data for the name but I think it's an oversight. > Also, the macro name PG_STAT_GET_..._CLOS has been changed to PG_STAT_GET_..._COLS. Good catch! Here is my comments on the patch: --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -798,7 +798,7 @@ CREATE VIEW pg_stat_replication AS CREATE VIEW pg_stat_replication_slots AS SELECT - s.name, + s.name AS slot_name, s.spill_txns, s.spill_count, s.spill_bytes, I think we should modify 'proargnames' of pg_stat_get_replication_slots() in pg_proc.dat instead. --- --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -7094,7 +7094,7 @@ pgstat_replslot_index(const char *name, bool create_it) Assert(nReplSlotStats <= max_replication_slots); for (i = 0; i < nReplSlotStats; i++) { - if (strcmp(replSlotStats[i].slotname, name) == 0) + if (strcmp(replSlotStats[i].slotname.data, name) == 0) return i; /* found */ } @@ -7107,7 +7107,7 @@ pgstat_replslot_index(const char *name, bool create_it) /* Register new slot */ memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats)); - memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN); + memcpy(&replSlotStats[nReplSlotStats].slotname.data, name, + NAMEDATALEN); return nReplSlotStats++; } We can use NameStr() instead. --- Perhaps we need to update the regression test as well. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Mon, 12 Oct 2020 at 19:24, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Oct 12, 2020 at 10:59 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 8 Oct 2020 at 17:59, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Oct 8, 2020 at 1:55 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Thu, 8 Oct 2020 at 14:10, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > We can write if we want but there are few things we need to do for > > > > > that like maybe a new function like wait_for_spill_stats which will > > > > > check if the counters have become zero. Then probably call a reset > > > > > function, call a new wait function, and then again check stats to > > > > > ensure they are reset to 0. > > > > > > > > Yes. > > > > > > > > > > I am not sure if it is worth but probably it is not a bad idea > > > especially if we extend the existing tests based on your below idea? > > > > > > > > We can't write any advanced test which means reset the existing stats > > > > > perform some tests and again check stats because *slot_get_changes() > > > > > function can start from the previous WAL for which we have covered the > > > > > stats. We might write that if we can somehow track the WAL positions > > > > > from the previous test. I am not sure if we want to go there. > > > > > > > > Can we use pg_logical_slot_peek_changes() instead to decode the same > > > > transactions multiple times? > > > > > > > > > > I think this will do the trick. If we want to go there then I suggest > > > we can have a separate regression test file in test_decoding with name > > > as decoding_stats, stats, or something like that. We can later add the > > > tests related to streaming stats in that file as well. > > > > > > > Agreed. > > > > I've updated the patch. Please review it. > > > > Few comments: > ============= Thank you for your review. > 1. > +-- function to wait for counters to advance > +CREATE FUNCTION wait_for_spill_stats(check_reset bool) RETURNS void AS $$ > > Can we rename this function to wait_for_decode_stats? I am thinking we > can later reuse this function for streaming stats as well by passing > the additional parameter 'stream bool'. +1. Fixed. > > 2. let's drop the table added by this test and regression_slot at the > end of the test. Fixed. Attached the updated version patch. Please review it. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
(Please avoid top-posting) On Mon, 12 Oct 2020 at 23:45, Shinoda, Noriyoshi (PN Japan A&PS Delivery) <noriyoshi.shinoda@hpe.com> wrote: > > Sawada-san, Thank you your comments. > > The attached patch reflects the comment. > I also made a fix for the regression test. > > Regards, > Noriyoshi Shinoda > > -----Original Message----- > From: Masahiko Sawada [mailto:masahiko.sawada@2ndquadrant.com] > Sent: Monday, October 12, 2020 8:12 PM > To: Shinoda, Noriyoshi (PN Japan A&PS Delivery) <noriyoshi.shinoda@hpe.com> > Cc: Amit Kapila <amit.kapila16@gmail.com>; Dilip Kumar <dilipbalaut@gmail.com>; Magnus Hagander <magnus@hagander.net>;Tomas Vondra <tomas.vondra@2ndquadrant.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>;Ajin Cherian <itsajin@gmail.com> > Subject: Re: Resetting spilled txn statistics in pg_stat_replication > > On Mon, 12 Oct 2020 at 18:29, Shinoda, Noriyoshi (PN Japan A&PS > Delivery) <noriyoshi.shinoda@hpe.com> wrote: > > > > Hi, thank you for the awesome feature. > > > > Thank you for reporting! > > > As it may have been discussed, I think the 'name' column in pg_stat_replication_slots is more consistent with the columnname and data type matched to the pg_replication_slots catalog. > > The attached patch changes the name and data type of the 'name' column to slot_name and 'name' type, respectively. > > It seems a good idea to me. In other system views, we use the name data type for object name. When I wrote the first patch,I borrowed the code for pg_stat_slru which uses text data for the name but I think it's an oversight. Hmm, my above observation is wrong. All other statistics use text data type and internally use char[NAMEDATALEN]. So I think renaming to 'slot_name' would be a good idea but probably we don’t need to change the internally used data type. For the data type of slot_name of pg_stat_replication_slots view, given that the doc says the following[1], I think we can keep it too as this view is not a system catalog. What do you think? 8.3. Character Types: The name type exists only for the storage of identifiers in the internal system catalogs [1] https://www.postgresql.org/docs/devel/datatype-character.html Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
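The declared types being compared here can be checked straight from the catalogs (a quick illustrative query, assuming the patched view is installed):

SELECT a.attrelid::regclass, a.attname, a.atttypid::regtype
  FROM pg_attribute a
 WHERE a.attrelid IN ('pg_replication_slots'::regclass,
                      'pg_stat_replication_slots'::regclass)
   AND a.attname LIKE '%name%';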
On Tue, Oct 13, 2020 at 4:54 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > Attached the updated version patch. Please review it. > I have pushed this but it failed in one of the BF. See https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2020-10-13%2003%3A07%3A25 The failure is shown below and I am analyzing it. See, if you can provide any insights. @@ -58,7 +58,7 @@ SELECT name, spill_txns, spill_count FROM pg_stat_replication_slots; name | spill_txns | spill_count -----------------+------------+------------- - regression_slot | 1 | 12 + regression_slot | 1 | 10 (1 row) -- reset the slot stats, and wait for stats collector to reset @@ -96,7 +96,7 @@ SELECT name, spill_txns, spill_count FROM pg_stat_replication_slots; name | spill_txns | spill_count -----------------+------------+------------- - regression_slot | 1 | 12 + regression_slot | 1 | 10 (1 row) -- With Regards, Amit Kapila.
On Tue, Oct 13, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 13, 2020 at 4:54 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > Attached the updated version patch. Please review it. > > > > I have pushed this but it failed in one of the BF. See > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2020-10-13%2003%3A07%3A25 > > The failure is shown below and I am analyzing it. See, if you can > provide any insights. > > @@ -58,7 +58,7 @@ > SELECT name, spill_txns, spill_count FROM pg_stat_replication_slots; > name | spill_txns | spill_count > -----------------+------------+------------- > - regression_slot | 1 | 12 > + regression_slot | 1 | 10 > (1 row) > > -- reset the slot stats, and wait for stats collector to reset > @@ -96,7 +96,7 @@ > SELECT name, spill_txns, spill_count FROM pg_stat_replication_slots; > name | spill_txns | spill_count > -----------------+------------+------------- > - regression_slot | 1 | 12 > + regression_slot | 1 | 10 > (1 row) > The reason for this problem could be that there is some transaction (say by autovacuum) which happened interleaved with this transaction and committed before this one. Now during DecodeCommit of this background transaction, we will send the stats accumulated by that time which could lead to such a problem. -- With Regards, Amit Kapila.
On Tue, Oct 13, 2020 at 9:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 13, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Oct 13, 2020 at 4:54 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > Attached the updated version patch. Please review it. > > > > > > > I have pushed this but it failed in one of the BF. See > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2020-10-13%2003%3A07%3A25 > > > > The failure is shown below and I am analyzing it. See, if you can > > provide any insights. > > > > @@ -58,7 +58,7 @@ > > SELECT name, spill_txns, spill_count FROM pg_stat_replication_slots; > > name | spill_txns | spill_count > > -----------------+------------+------------- > > - regression_slot | 1 | 12 > > + regression_slot | 1 | 10 > > (1 row) > > > > -- reset the slot stats, and wait for stats collector to reset > > @@ -96,7 +96,7 @@ > > SELECT name, spill_txns, spill_count FROM pg_stat_replication_slots; > > name | spill_txns | spill_count > > -----------------+------------+------------- > > - regression_slot | 1 | 12 > > + regression_slot | 1 | 10 > > (1 row) > > > > The reason for this problem could be that there is some transaction > (say by autovacuum) which happened interleaved with this transaction > and committed before this one. Now during DecodeCommit of this > background transaction, we will send the stats accumulated by that > time which could lead to such a problem. > If this theory is correct then I think we can't rely on the 'spill_count' value, what do you think? -- With Regards, Amit Kapila.
Amit Kapila <amit.kapila16@gmail.com> writes: > I have pushed this but it failed in one of the BF. See > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2020-10-13%2003%3A07%3A25 > The failure is shown below and I am analyzing it. See, if you can > provide any insights. It's not very clear what spill_count actually counts (and the documentation sure does nothing to clarify that), but if it has anything to do with WAL volume, the explanation might be that florican is 32-bit. All the animals that have passed that test so far are 64-bit. > The reason for this problem could be that there is some transaction > (say by autovacuum) which happened interleaved with this transaction > and committed before this one. I can believe that idea too, but would it not have resulted in a diff in spill_txns as well? In short, I'm not real convinced that a stable result is possible in this test. Maybe you should just test for spill_txns and spill_count being positive. regards, tom lane
On Tue, Oct 13, 2020 at 9:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > > I have pushed this but it failed in one of the BF. See > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2020-10-13%2003%3A07%3A25 > > The failure is shown below and I am analyzing it. See, if you can > > provide any insights. > > It's not very clear what spill_count actually counts (and the > documentation sure does nothing to clarify that), but if it has anything > to do with WAL volume, the explanation might be that florican is 32-bit. > All the animals that have passed that test so far are 64-bit. > It is based on the size of the change. In this case, it is the size of the tuples inserted. See ReorderBufferChangeSize() know how we compute the size of each change. Once the total_size for changes crosses logical_decoding_work_mem (64kB) in this case, we will spill. So 'spill_count' is the number of times the size of changes in that transaction crossed the threshold and which lead to a spill of the corresponding changes. > > The reason for this problem could be that there is some transaction > > (say by autovacuum) which happened interleaved with this transaction > > and committed before this one. > > I can believe that idea too, but would it not have resulted in a > diff in spill_txns as well? > We count that 'spill_txns' once for a transaction that is ever spilled. I think the 'spill_txns' wouldn't vary for this particular test even if the autovacuum transaction happens-before the main transaction of the test because in that case, wait_for_decode_stats won't finish until it sees the main transaction ('spill_txns' won't be positive by that time) > In short, I'm not real convinced that a stable result is possible in this > test. Maybe you should just test for spill_txns and spill_count being > positive. > Yeah, that seems like the best we can do here. -- With Regards, Amit Kapila.
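Since spilling is driven purely by logical_decoding_work_mem, it can be provoked deliberately when experimenting, e.g. (sketch; 64kB is the GUC's minimum value):

SET logical_decoding_work_mem = '64kB';
-- any transaction whose accumulated decoded changes exceed 64kB will
-- now be spilled to disk and reflected in spill_count/spill_bytes
SELECT name, spill_txns, spill_count, spill_bytes FROM pg_stat_replication_slots;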
Amit Kapila <amit.kapila16@gmail.com> writes: > On Tue, Oct 13, 2020 at 9:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It's not very clear what spill_count actually counts (and the >> documentation sure does nothing to clarify that), but if it has anything >> to do with WAL volume, the explanation might be that florican is 32-bit. >> All the animals that have passed that test so far are 64-bit. prairiedog just failed in not-quite-the-same way, which reinforces the idea that this test is dependent on MAXALIGN, which determines physical tuple size. (I just checked the buildfarm, and the four active members that report MAXALIGN 4 during configure are florican, lapwing, locust, and prairiedog. Not sure about the MSVC critters though.) The spill_count number is different though, so it seems that that may not be the whole story. > It is based on the size of the change. In this case, it is the size of > the tuples inserted. See ReorderBufferChangeSize() know how we compute > the size of each change. I know I can go read the source code, but most users will not want to. Is the documentation in monitoring.sgml really sufficient? If we can't explain this with more precision, is it really a number we want to expose at all? regards, tom lane
On Tue, Oct 13, 2020 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > > On Tue, Oct 13, 2020 at 9:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> It's not very clear what spill_count actually counts (and the > >> documentation sure does nothing to clarify that), but if it has anything > >> to do with WAL volume, the explanation might be that florican is 32-bit. > >> All the animals that have passed that test so far are 64-bit. > > prairiedog just failed in not-quite-the-same way, which reinforces the > idea that this test is dependent on MAXALIGN, which determines physical > tuple size. (I just checked the buildfarm, and the four active members > that report MAXALIGN 4 during configure are florican, lapwing, locust, > and prairiedog. Not sure about the MSVC critters though.) The > spill_count number is different though, so it seems that that may not > be the whole story. > It is possible that MAXALIGN stuff is playing a role here and or the background transaction stuff. I think if we go with the idea of testing spill_txns and spill_count being positive then the results will be stable. I'll write a patch for that. > > It is based on the size of the change. In this case, it is the size of > > the tuples inserted. See ReorderBufferChangeSize() know how we compute > > the size of each change. > > I know I can go read the source code, but most users will not want to. > Is the documentation in monitoring.sgml really sufficient? If we can't > explain this with more precision, is it really a number we want to expose > at all? > This counter is important to give users an idea about the amount of I/O we incur during decoding and to tune logical_decoding_work_mem GUC. So, I would prefer to improve the documentation for this variable. -- With Regards, Amit Kapila.
I wrote: > prairiedog just failed in not-quite-the-same way, which reinforces the > idea that this test is dependent on MAXALIGN, which determines physical > tuple size. (I just checked the buildfarm, and the four active members > that report MAXALIGN 4 during configure are florican, lapwing, locust, > and prairiedog. Not sure about the MSVC critters though.) The > spill_count number is different though, so it seems that that may not > be the whole story. Oh, and here comes lapwing: - regression_slot | 1 | 12 + regression_slot | 1 | 10 So if it weren't that prairiedog showed 11 not 10, we'd have a nice neat it-depends-on-MAXALIGN theory. As is, I'm not sure what all is affecting it, though MAXALIGN sure seems to be a component. (locust seems to be AWOL at the moment, so I'm not holding my breath for that one to report in.) regards, tom lane
On Tue, Oct 13, 2020 at 10:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 13, 2020 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Amit Kapila <amit.kapila16@gmail.com> writes: > > > On Tue, Oct 13, 2020 at 9:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >> It's not very clear what spill_count actually counts (and the > > >> documentation sure does nothing to clarify that), but if it has anything > > >> to do with WAL volume, the explanation might be that florican is 32-bit. > > >> All the animals that have passed that test so far are 64-bit. > > > > prairiedog just failed in not-quite-the-same way, which reinforces the > > idea that this test is dependent on MAXALIGN, which determines physical > > tuple size. (I just checked the buildfarm, and the four active members > > that report MAXALIGN 4 during configure are florican, lapwing, locust, > > and prairiedog. Not sure about the MSVC critters though.) The > > spill_count number is different though, so it seems that that may not > > be the whole story. > > > > It is possible that MAXALIGN stuff is playing a role here and or the > background transaction stuff. I think if we go with the idea of > testing spill_txns and spill_count being positive then the results > will be stable. I'll write a patch for that. > Please find the attached patch for the same. Additionally, I have skipped empty xacts during decoding as background autovacuum transactions can impact that count as well. I have done some minimal testing with this. I'll do some more. -- With Regards, Amit Kapila.
Attachment
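The stabilized assertions then reduce to something along these lines (a sketch based on the description above):

-- exact counts vary with MAXALIGN and with interleaved (autovacuum)
-- transactions, so only check that spilling happened at all
SELECT name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count
  FROM pg_stat_replication_slots;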
Amit Kapila <amit.kapila16@gmail.com> writes: >> It is possible that MAXALIGN stuff is playing a role here and or the >> background transaction stuff. I think if we go with the idea of >> testing spill_txns and spill_count being positive then the results >> will be stable. I'll write a patch for that. Here's our first failure on a MAXALIGN-8 machine: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-10-13%2005%3A00%3A08 So this is just plain not stable. It is odd though. I can easily think of mechanisms that would cause the WAL volume to occasionally be *more* than the "typical" case. What would cause it to be *less*, if MAXALIGN is ruled out? regards, tom lane
On Tue, Oct 13, 2020 at 11:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > >> It is possible that MAXALIGN stuff is playing a role here and or the > >> background transaction stuff. I think if we go with the idea of > >> testing spill_txns and spill_count being positive then the results > >> will be stable. I'll write a patch for that. > > Here's our first failure on a MAXALIGN-8 machine: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-10-13%2005%3A00%3A08 > > So this is just plain not stable. It is odd though. I can > easily think of mechanisms that would cause the WAL volume > to occasionally be *more* than the "typical" case. What > would cause it to be *less*, if MAXALIGN is ruled out? > The original theory I have given above [1] which is an interleaved autovacumm transaction. Let me try to explain in a bit more detail. Say when transaction T-1 is performing Insert ('INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i);') a parallel autovacuum transaction occurs. The problem as seen in buildfarm will happen when autovacuum transaction happens after 80% or more of the Insert is done. In such a situation we will start decoding 'Insert' first and need to spill multiple times due to the amount of changes (more than threshold logical_decoding_work_mem) and then before we encounter Commit of transaction that performed Insert (and probably some more changes from that transaction) we will encounter a small transaction (autovacuum transaction). The decode of that small transaction will send the stats collected till now which will lead to the problem shown in buildfarm. [1] - https://www.postgresql.org/message-id/CAA4eK1Jo0U1oSJyxrdA7i-bOOTh0Hue-NQqdG-CEqwGtDZPjyw%40mail.gmail.com -- With Regards, Amit Kapila.
On Tue, 13 Oct 2020 at 14:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 13, 2020 at 11:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Amit Kapila <amit.kapila16@gmail.com> writes: > > >> It is possible that MAXALIGN stuff is playing a role here and or the > > >> background transaction stuff. I think if we go with the idea of > > >> testing spill_txns and spill_count being positive then the results > > >> will be stable. I'll write a patch for that. > > > > Here's our first failure on a MAXALIGN-8 machine: > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-10-13%2005%3A00%3A08 > > > > So this is just plain not stable. It is odd though. I can > > easily think of mechanisms that would cause the WAL volume > > to occasionally be *more* than the "typical" case. What > > would cause it to be *less*, if MAXALIGN is ruled out? > > > > The original theory I have given above [1] which is an interleaved > autovacumm transaction. Let me try to explain in a bit more detail. > Say when transaction T-1 is performing Insert ('INSERT INTO stats_test > SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) > g(i);') a parallel autovacuum transaction occurs. The problem as seen > in buildfarm will happen when autovacuum transaction happens after 80% > or more of the Insert is done. > > In such a situation we will start decoding 'Insert' first and need to > spill multiple times due to the amount of changes (more than threshold > logical_decoding_work_mem) and then before we encounter Commit of > transaction that performed Insert (and probably some more changes from > that transaction) we will encounter a small transaction (autovacuum > transaction). The decode of that small transaction will send the > stats collected till now which will lead to the problem shown in > buildfarm. That seems a possible scenario. I think probably this also explains the reason why spill_count slightly varied and spill_txns was still 1. The spill_count value depends on how much the process spilled out transactions before encountering the commit of an autovacuum transaction. Since we have the spill statistics per reorder buffer, not per transactions, it's possible. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 13, 2020 at 11:49 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 13 Oct 2020 at 14:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Oct 13, 2020 at 11:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > Amit Kapila <amit.kapila16@gmail.com> writes: > > > >> It is possible that MAXALIGN stuff is playing a role here and or the > > > >> background transaction stuff. I think if we go with the idea of > > > >> testing spill_txns and spill_count being positive then the results > > > >> will be stable. I'll write a patch for that. > > > > > > Here's our first failure on a MAXALIGN-8 machine: > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-10-13%2005%3A00%3A08 > > > > > > So this is just plain not stable. It is odd though. I can > > > easily think of mechanisms that would cause the WAL volume > > > to occasionally be *more* than the "typical" case. What > > > would cause it to be *less*, if MAXALIGN is ruled out? > > > > > > > The original theory I have given above [1] which is an interleaved > > autovacumm transaction. Let me try to explain in a bit more detail. > > Say when transaction T-1 is performing Insert ('INSERT INTO stats_test > > SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) > > g(i);') a parallel autovacuum transaction occurs. The problem as seen > > in buildfarm will happen when autovacuum transaction happens after 80% > > or more of the Insert is done. > > > > In such a situation we will start decoding 'Insert' first and need to > > spill multiple times due to the amount of changes (more than threshold > > logical_decoding_work_mem) and then before we encounter Commit of > > transaction that performed Insert (and probably some more changes from > > that transaction) we will encounter a small transaction (autovacuum > > transaction). The decode of that small transaction will send the > > stats collected till now which will lead to the problem shown in > > buildfarm. > > That seems a possible scenario. > > I think probably this also explains the reason why spill_count > slightly varied and spill_txns was still 1. The spill_count value > depends on how much the process spilled out transactions before > encountering the commit of an autovacuum transaction. Since we have > the spill statistics per reorder buffer, not per transactions, it's > possible. > Okay, here is an updated version (changed some comments) of the patch I posted some time back. What do you think? I have tested this on both Windows and Linux environments. I think it is a bit tricky to reproduce the exact scenario, so if you are fine with it, we can push this and check; or let me know if you have any better idea. -- With Regards, Amit Kapila.
Attachment
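The check the patch moves to is presumably of roughly this shape (a sketch, not the committed test text): by asserting only that spilling happened at all, an interleaved small transaction that flushes partial stats early can no longer change the result.

    SELECT slot_name,
           spill_txns  > 0 AS spill_txns,
           spill_count > 0 AS spill_count
      FROM pg_stat_replication_slots;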
On Tue, 13 Oct 2020 at 15:27, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 13, 2020 at 11:49 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 13 Oct 2020 at 14:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Oct 13, 2020 at 11:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > > > Amit Kapila <amit.kapila16@gmail.com> writes: > > > > >> It is possible that MAXALIGN stuff is playing a role here and or the > > > > >> background transaction stuff. I think if we go with the idea of > > > > >> testing spill_txns and spill_count being positive then the results > > > > >> will be stable. I'll write a patch for that. > > > > > > > > Here's our first failure on a MAXALIGN-8 machine: > > > > > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-10-13%2005%3A00%3A08 > > > > > > > > So this is just plain not stable. It is odd though. I can > > > > easily think of mechanisms that would cause the WAL volume > > > > to occasionally be *more* than the "typical" case. What > > > > would cause it to be *less*, if MAXALIGN is ruled out? > > > > > > > > > > The original theory I have given above [1] which is an interleaved > > > autovacumm transaction. Let me try to explain in a bit more detail. > > > Say when transaction T-1 is performing Insert ('INSERT INTO stats_test > > > SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) > > > g(i);') a parallel autovacuum transaction occurs. The problem as seen > > > in buildfarm will happen when autovacuum transaction happens after 80% > > > or more of the Insert is done. > > > > > > In such a situation we will start decoding 'Insert' first and need to > > > spill multiple times due to the amount of changes (more than threshold > > > logical_decoding_work_mem) and then before we encounter Commit of > > > transaction that performed Insert (and probably some more changes from > > > that transaction) we will encounter a small transaction (autovacuum > > > transaction). The decode of that small transaction will send the > > > stats collected till now which will lead to the problem shown in > > > buildfarm. > > > > That seems a possible scenario. > > > > I think probably this also explains the reason why spill_count > > slightly varied and spill_txns was still 1. The spill_count value > > depends on how much the process spilled out transactions before > > encountering the commit of an autovacuum transaction. Since we have > > the spill statistics per reorder buffer, not per transactions, it's > > possible. > > > > Okay, here is an updated version (changed some comments) of the patch > I posted some time back. What do you think? I have tested this on both > Windows and Linux environments. I think it is a bit tricky to > reproduce the exact scenario so if you are fine we can push this and > check or let me know if you any better idea? I agree with checking that spill_txns and spill_count are positive. I thought we could reduce the number of tuples to insert by half. It would help reduce the likelihood of other transactions interfering and would speed up the test (currently, the stats.sql test takes almost 1 sec in my environment). But it might lead to another problem, like logical decoding not spilling the transaction in some environments. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 13, 2020 at 12:17 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 13 Oct 2020 at 15:27, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Oct 13, 2020 at 11:49 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Tue, 13 Oct 2020 at 14:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > The original theory I have given above [1] which is an interleaved > > > > autovacumm transaction. Let me try to explain in a bit more detail. > > > > Say when transaction T-1 is performing Insert ('INSERT INTO stats_test > > > > SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) > > > > g(i);') a parallel autovacuum transaction occurs. The problem as seen > > > > in buildfarm will happen when autovacuum transaction happens after 80% > > > > or more of the Insert is done. > > > > > > > > In such a situation we will start decoding 'Insert' first and need to > > > > spill multiple times due to the amount of changes (more than threshold > > > > logical_decoding_work_mem) and then before we encounter Commit of > > > > transaction that performed Insert (and probably some more changes from > > > > that transaction) we will encounter a small transaction (autovacuum > > > > transaction). The decode of that small transaction will send the > > > > stats collected till now which will lead to the problem shown in > > > > buildfarm. > > > > > > That seems a possible scenario. > > > > > > I think probably this also explains the reason why spill_count > > > slightly varied and spill_txns was still 1. The spill_count value > > > depends on how much the process spilled out transactions before > > > encountering the commit of an autovacuum transaction. Since we have > > > the spill statistics per reorder buffer, not per transactions, it's > > > possible. > > > > > > > Okay, here is an updated version (changed some comments) of the patch > > I posted some time back. What do you think? I have tested this on both > > Windows and Linux environments. I think it is a bit tricky to > > reproduce the exact scenario so if you are fine we can push this and > > check or let me know if you any better idea? > > I agree to check if the spill_counts and spill_txns are positive. > I am able to reproduce this problem via a debugger. Basically, execute the Insert mentioned above from one of the psql sessions and in ExecInsert() stop the execution once 'estate->es_processed > 4000' and then from another psql terminal execute some DDL which will be ignored but will any try to decode commit. Then perform 'continue' in the first session. This will lead to inconsistent stats values depending upon when the DDL is performed. I'll push the patch as I am more confident now. > I > thought we can reduce the number of tuples to insert to the half. It > would help to reduce the likelihood of other transactions interfere > and speed up the test (currently, the stats.sql test takes almost 1 > sec in my environment). But it might lead to another problem like the > logical decoding doesn't spill out the transaction on some > environment. > Yeah, and in other cases, in spill.sql we are using the same amount of data to test spilling. -- With Regards, Amit Kapila.
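For reference, the reproduction just described might be driven like this; the pause is done in a debugger (e.g. a conditional breakpoint in ExecInsert() on estate->es_processed > 4000), and repro_dummy is a hypothetical name for a DDL change the test_decoding plugin ignores:

    -- Session 1 (backend paused by the debugger mid-insert):
    INSERT INTO stats_test
      SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i);

    -- Session 2, while session 1 is paused: a small transaction whose changes
    -- the plugin skips, but whose commit still gets decoded.
    CREATE TABLE repro_dummy(i int);

    -- Resume session 1 ('continue' in the debugger), then decode. The stats
    -- sent when the DDL's commit is decoded reflect only the spills done so
    -- far, so the final counters depend on when the DDL happened.
    SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');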
On Tue, Oct 13, 2020 at 12:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 13, 2020 at 12:17 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > I agree to check if the spill_counts and spill_txns are positive. > > > > I am able to reproduce this problem via debugger. Basically, execute > the Insert mentioned above from one the psql sessions and in > ExecInsert() stop the execution once 'estate->es_processed > 4000' and > then from another psql terminal execute some DDL which will be ignored > but will any try to decode commit. /will any try to decode commit./we will anyway try to decode commit for this DDL transaction when decoding changes via pg_logical_slot_peek_changes > Then perform 'continue' in the > first session. This will lead to inconsistent stats value depending > upon at what time DDL is performed. I'll push the patch as I am more > confident now. > -- With Regards, Amit Kapila.
On Tue, 13 Oct 2020 at 16:12, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 13, 2020 at 12:17 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 13 Oct 2020 at 15:27, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Oct 13, 2020 at 11:49 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Tue, 13 Oct 2020 at 14:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > The original theory I have given above [1] which is an interleaved > > > > > autovacumm transaction. Let me try to explain in a bit more detail. > > > > > Say when transaction T-1 is performing Insert ('INSERT INTO stats_test > > > > > SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) > > > > > g(i);') a parallel autovacuum transaction occurs. The problem as seen > > > > > in buildfarm will happen when autovacuum transaction happens after 80% > > > > > or more of the Insert is done. > > > > > > > > > > In such a situation we will start decoding 'Insert' first and need to > > > > > spill multiple times due to the amount of changes (more than threshold > > > > > logical_decoding_work_mem) and then before we encounter Commit of > > > > > transaction that performed Insert (and probably some more changes from > > > > > that transaction) we will encounter a small transaction (autovacuum > > > > > transaction). The decode of that small transaction will send the > > > > > stats collected till now which will lead to the problem shown in > > > > > buildfarm. > > > > > > > > That seems a possible scenario. > > > > > > > > I think probably this also explains the reason why spill_count > > > > slightly varied and spill_txns was still 1. The spill_count value > > > > depends on how much the process spilled out transactions before > > > > encountering the commit of an autovacuum transaction. Since we have > > > > the spill statistics per reorder buffer, not per transactions, it's > > > > possible. > > > > > > > > > > Okay, here is an updated version (changed some comments) of the patch > > > I posted some time back. What do you think? I have tested this on both > > > Windows and Linux environments. I think it is a bit tricky to > > > reproduce the exact scenario so if you are fine we can push this and > > > check or let me know if you any better idea? > > > > I agree to check if the spill_counts and spill_txns are positive. > > > > I am able to reproduce this problem via debugger. Basically, execute > the Insert mentioned above from one the psql sessions and in > ExecInsert() stop the execution once 'estate->es_processed > 4000' and > then from another psql terminal execute some DDL which will be ignored > but will any try to decode commit. Then perform 'continue' in the > first session. This will lead to inconsistent stats value depending > upon at what time DDL is performed. Thanks! I'm also able to reproduce this in a similar way and have confirmed the patch fixes it. > I'll push the patch as I am more > confident now. +1. Let's check how the tests are going to be. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 13, 2020 at 12:51 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > > > I'll push the patch as I am more > > confident now. > > +1. Let's check how the tests are going to be. > Okay, thanks. I have pushed it. -- With Regards, Amit Kapila.
Amit Kapila <amit.kapila16@gmail.com> writes: > I am able to reproduce this problem via debugger. Basically, execute > the Insert mentioned above from one the psql sessions and in > ExecInsert() stop the execution once 'estate->es_processed > 4000' and > then from another psql terminal execute some DDL which will be ignored > but will any try to decode commit. Then perform 'continue' in the > first session. This will lead to inconsistent stats value depending > upon at what time DDL is performed. I'll push the patch as I am more > confident now. So ... doesn't this mean that if the concurrent transaction commits very shortly after our query starts, decoding might stop without having ever spilled at all? IOW, I'm afraid that the revised test can still fail, just at a frequency circa one-twelfth of before. I'm also somewhat suspicious of this explanation because it doesn't seem to account for the clear experimental evidence that 32-bit machines were more prone to failure than 64-bit. regards, tom lane
On Tue, Oct 13, 2020 at 7:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > > I am able to reproduce this problem via debugger. Basically, execute > > the Insert mentioned above from one the psql sessions and in > > ExecInsert() stop the execution once 'estate->es_processed > 4000' and > > then from another psql terminal execute some DDL which will be ignored > > but will any try to decode commit. Then perform 'continue' in the > > first session. This will lead to inconsistent stats value depending > > upon at what time DDL is performed. I'll push the patch as I am more > > confident now. > > So ... doesn't this mean that if the concurrent transaction commits very > shortly after our query starts, decoding might stop without having ever > spilled at all? > I am assuming that by "our query starts" you refer to the Insert statement used in the test. No, the decoding (and required spilling) will still happen. It is only that we will try to send the stats accumulated by that time, which will be zero. And later, when the decoding of our Insert transaction is finished, it will again send the updated stats. It should work fine in this or similar scenarios. -- With Regards, Amit Kapila.
Thanks for your comment. > 8.3. Character Types: > The name type exists only for the storage of identifiers in the internal system catalogs I didn't know the policy about data types. Thank you. But I think the column names should match pg_replication_slots. The attached patch changes only the column names and macros. Regards, Noriyoshi Shinoda -----Original Message----- From: Masahiko Sawada [mailto:masahiko.sawada@2ndquadrant.com] Sent: Tuesday, October 13, 2020 9:11 AM To: Shinoda, Noriyoshi (PN Japan A&PS Delivery) <noriyoshi.shinoda@hpe.com> Cc: Amit Kapila <amit.kapila16@gmail.com>; Dilip Kumar <dilipbalaut@gmail.com>; Magnus Hagander <magnus@hagander.net>; Tomas Vondra <tomas.vondra@2ndquadrant.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>; Ajin Cherian <itsajin@gmail.com> Subject: Re: Resetting spilled txn statistics in pg_stat_replication (Please avoid top-posting) On Mon, 12 Oct 2020 at 23:45, Shinoda, Noriyoshi (PN Japan A&PS Delivery) <noriyoshi.shinoda@hpe.com> wrote: > > Sawada-san, Thank you your comments. > > The attached patch reflects the comment. > I also made a fix for the regression test. > > Regards, > Noriyoshi Shinoda > > -----Original Message----- > From: Masahiko Sawada [mailto:masahiko.sawada@2ndquadrant.com] > Sent: Monday, October 12, 2020 8:12 PM > To: Shinoda, Noriyoshi (PN Japan A&PS Delivery) > <noriyoshi.shinoda@hpe.com> > Cc: Amit Kapila <amit.kapila16@gmail.com>; Dilip Kumar > <dilipbalaut@gmail.com>; Magnus Hagander <magnus@hagander.net>; Tomas > Vondra <tomas.vondra@2ndquadrant.com>; PostgreSQL Hackers > <pgsql-hackers@lists.postgresql.org>; Ajin Cherian <itsajin@gmail.com> > Subject: Re: Resetting spilled txn statistics in pg_stat_replication > > On Mon, 12 Oct 2020 at 18:29, Shinoda, Noriyoshi (PN Japan A&PS > Delivery) <noriyoshi.shinoda@hpe.com> wrote: > > > > Hi, thank you for the awesome feature. > > > > Thank you for reporting! > > > As it may have been discussed, I think the 'name' column in pg_stat_replication_slots is more consistent with the column name and data type matched to the pg_replication_slots catalog. > > The attached patch changes the name and data type of the 'name' column to slot_name and 'name' type, respectively. > > It seems a good idea to me. In other system views, we use the name data type for object name. When I wrote the first patch, I borrowed the code for pg_stat_slru which uses text data for the name but I think it's an oversight. Hmm, my above observation is wrong. All other statistics use text data type and internally use char[NAMEDATALEN]. So I think renaming to 'slot_name' would be a good idea but probably we don’t need to change the internally used data type. For the data type of slot_name of pg_stat_replication_slots view, given that the doc says the following [1], I think we can keep it too as this view is not a system catalog. What do you think? 8.3. Character Types: The name type exists only for the storage of identifiers in the internal system catalogs [1] https://www.postgresql.org/docs/devel/datatype-character.html Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Wed, 14 Oct 2020 at 12:03, Shinoda, Noriyoshi (PN Japan A&PS Delivery) <noriyoshi.shinoda@hpe.com> wrote: > > Thanks for your comment. > > > 8.3. Character Types: > > The name type exists only for the storage of identifiers in the internal system catalogs > > I didn't know the policy about data types. Thank you. > But I think the column names should match pg_replication_slots. > The attached patch changes only the column names and macros. > Thank you for updating the patch! The patch changes the column name from 'name' to 'slot_name' and fixes a typo. That's reasonable to me. Amit, what do you think? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 13, 2020 at 5:41 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 12 Oct 2020 at 23:45, Shinoda, Noriyoshi (PN Japan A&PS > Delivery) <noriyoshi.shinoda@hpe.com> wrote: > > > > > > > As it may have been discussed, I think the 'name' column in pg_stat_replication_slots is more consistent with the column name and data type matched to the pg_replication_slots catalog. > > > The attached patch changes the name and data type of the 'name' column to slot_name and 'name' type, respectively. > > > > It seems a good idea to me. In other system views, we use the name data type for object name. When I wrote the first patch, I borrowed the code for pg_stat_slru which uses text data for the name but I think it's an oversight. > > Hmm, my above observation is wrong. All other statistics use text data > type and internally use char[NAMEDATALEN]. > AFAICS, we use name data-type in many other similar stats views like pg_stat_subscription, pg_statio_all_sequences, pg_stat_user_functions, pg_stat_all_tables. So, shouldn't we be consistent with those views? -- With Regards, Amit Kapila.
Amit-san, Sawada-san, Thank you for your comment. > AFAICS, we use name data-type in many other similar stats views like pg_stat_subscription, pg_statio_all_sequences, pg_stat_user_functions, > pg_stat_all_tables. So, shouldn't we consistent with those views? I checked the data type used for the statistics view identity column. 'Name' type columns are used in many views. If there is no problem with PostgreSQL standard, I would like to change both the data type and the column name. - name type pg_stat_activity.datname pg_stat_replication.usename pg_stat_subscription.subname pg_stat_database.datname pg_stat_database_conflicts.datname pg_stat_all_tables.schemaname/.relname pg_stat_all_indexes.schemaname/.relname/.indexrelname pg_statio_all_tables.schemaname/.relname pg_statio_all_indexes.schemaname/.relname/.indexname pg_statio_all_sequences.schemaname/.relname pg_stat_user_functions.schemaname/.funcname - text type pg_stat_replication_slots.name pg_stat_slru.name pg_backend_memory_contexts.name The attached patch makes the following changes. - column name: name to slot_name - data type: text to name - macro: ... CLOS to ... COLS Regards, Noriyoshi Shinoda -----Original Message----- From: Amit Kapila [mailto:amit.kapila16@gmail.com] Sent: Thursday, October 15, 2020 5:52 PM To: Masahiko Sawada <masahiko.sawada@2ndquadrant.com> Cc: Shinoda, Noriyoshi (PN Japan A&PS Delivery) <noriyoshi.shinoda@hpe.com>; Dilip Kumar <dilipbalaut@gmail.com>; Magnus Hagander <magnus@hagander.net>; Tomas Vondra <tomas.vondra@2ndquadrant.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>; Ajin Cherian <itsajin@gmail.com> Subject: Re: Resetting spilled txn statistics in pg_stat_replication On Tue, Oct 13, 2020 at 5:41 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 12 Oct 2020 at 23:45, Shinoda, Noriyoshi (PN Japan A&PS > Delivery) <noriyoshi.shinoda@hpe.com> wrote: > > > > > > > As it may have been discussed, I think the 'name' column in pg_stat_replication_slots is more consistent with the column name and data type matched to the pg_replication_slots catalog. > > > The attached patch changes the name and data type of the 'name' column to slot_name and 'name' type, respectively. > > > > It seems a good idea to me. In other system views, we use the name data type for object name. When I wrote the first patch, I borrowed the code for pg_stat_slru which uses text data for the name but I think it's an oversight. > > Hmm, my above observation is wrong. All other statistics use text data > type and internally use char[NAMEDATALEN]. > AFAICS, we use name data-type in many other similar stats views like pg_stat_subscription, pg_statio_all_sequences, pg_stat_user_functions, pg_stat_all_tables. So, shouldn't we consistent with those views? -- With Regards, Amit Kapila.
Attachment
On Thu, 15 Oct 2020 at 17:51, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 13, 2020 at 5:41 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 12 Oct 2020 at 23:45, Shinoda, Noriyoshi (PN Japan A&PS > > Delivery) <noriyoshi.shinoda@hpe.com> wrote: > > > > > > > > > > As it may have been discussed, I think the 'name' column in pg_stat_replication_slots is more consistent with the column name and data type matched to the pg_replication_slots catalog. > > > > The attached patch changes the name and data type of the 'name' column to slot_name and 'name' type, respectively. > > > > > > It seems a good idea to me. In other system views, we use the name data type for object name. When I wrote the first patch, I borrowed the code for pg_stat_slru which uses text data for the name but I think it's an oversight. > > > > Hmm, my above observation is wrong. All other statistics use text data > > type and internally use char[NAMEDATALEN]. > > > > AFAICS, we use name data-type in many other similar stats views like > pg_stat_subscription, pg_statio_all_sequences, pg_stat_user_functions, > pg_stat_all_tables. So, shouldn't we consistent with those views? > Yes, they have the name data type column, but it actually comes from system catalogs. For instance, here is the view definition of pg_stat_subscription: SELECT su.oid AS subid, su.subname, st.pid, st.relid, st.received_lsn, st.last_msg_send_time, st.last_msg_receipt_time, st.latest_end_lsn, st.latest_end_time FROM pg_subscription su LEFT JOIN pg_stat_get_subscription(NULL::oid) st(subid, relid, pid, received_lsn, last_msg_send_time, last_msg_receipt_time, latest_end_lsn, latest_end_time) ON st.subid = su.oid; This view uses the subscription name from the pg_subscription system catalog. AFAICS no string data managed by the stats collector uses the name data type. It uses char[NAMEDATALEN] instead. And since I found the description about the name data type in the doc, I thought it's better to have it as text. Probably since pg_stat_replication_slots (pg_stat_get_replication_slots()) is our first view that doesn’t use a system catalog to get the object name, I'm okay with changing to the name data type, but if we do that it's better to update the description in the doc as well. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
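By contrast, the new view takes everything, including the slot name, directly from the stats function, with no catalog join; a sketch of its shape, with the rename under discussion applied (the committed column list may differ):

    CREATE VIEW pg_stat_replication_slots AS
        SELECT s.slot_name,
               s.spill_txns,
               s.spill_count,
               s.spill_bytes,
               s.stats_reset
          FROM pg_stat_get_replication_slots() AS s;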
On Mon, Oct 19, 2020 at 9:04 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 15 Oct 2020 at 17:51, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > AFAICS, we use name data-type in many other similar stats views like > > pg_stat_subscription, pg_statio_all_sequences, pg_stat_user_functions, > > pg_stat_all_tables. So, shouldn't we consistent with those views? > > > > Yes, they has the name data type column but it actually comes from > system catalogs. For instance, here is the view definition of > pg_stat_subscription: > > SELECT su.oid AS subid, > su.subname, > st.pid, > st.relid, > st.received_lsn, > st.last_msg_send_time, > st.last_msg_receipt_time, > st.latest_end_lsn, > st.latest_end_time > FROM pg_subscription su > LEFT JOIN pg_stat_get_subscription(NULL::oid) st(subid, relid, > pid, received_lsn, last_msg_send_time, last_msg_receipt_time, latest_end_lsn, latest_end_time) ON st.subid = su.oid; > > This view uses the subscription name from pg_subscription system > catalog. AFAICS no string data managed by the stats collector use name > data type. It uses char[NAMEDATALEN] instead. And since I found the > description about name data type in the doc, I thought it's better to > have it as text. > Okay, I see the merit of keeping slot_name as 'text' type. I was trying to see if we can be consistent, but I guess that is not so straight-forward: if we go with all (stat) view definitions consistently displaying/using the 'name' datatype for strings, then ideally we need to change other places as well. Also, I see that pg_stat_wal_receiver uses slot_name as 'text', so displaying the same as 'name' in pg_stat_replication_slots might not be ideal. So, let's stick with 'text' data type for slot_name, which means we should go ahead with the v3 version of the patch [1] posted by Shinoda-San, right? [1] - https://www.postgresql.org/message-id/TU4PR8401MB115297EF936A7675A5644151EE050%40TU4PR8401MB1152.NAMPRD84.PROD.OUTLOOK.COM -- With Regards, Amit Kapila.
On Mon, 19 Oct 2020 at 14:24, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Oct 19, 2020 at 9:04 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 15 Oct 2020 at 17:51, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > AFAICS, we use name data-type in many other similar stats views like > > > pg_stat_subscription, pg_statio_all_sequences, pg_stat_user_functions, > > > pg_stat_all_tables. So, shouldn't we consistent with those views? > > > > > > > Yes, they has the name data type column but it actually comes from > > system catalogs. For instance, here is the view definition of > > pg_stat_subscription: > > > > SELECT su.oid AS subid, > > su.subname, > > st.pid, > > st.relid, > > st.received_lsn, > > st.last_msg_send_time, > > st.last_msg_receipt_time, > > st.latest_end_lsn, > > st.latest_end_time > > FROM pg_subscription su > > LEFT JOIN pg_stat_get_subscription(NULL::oid) st(subid, relid, > > pid, received_lsn, last_msg_send_time, last_msg_receipt_time, latest > > _end_lsn, latest_end_time) ON st.subid = su.oid; > > > > This view uses the subscription name from pg_subscription system > > catalog. AFAICS no string data managed by the stats collector use name > > data type. It uses char[NAMEDATALEN] instead. And since I found the > > description about name data type in the doc, I thought it's better to > > have it as text. > > > > Okay, I see the merit of keeping slot_name as 'text' type. I was > trying to see if we can be consistent but I guess that is not so > straight-forward, if we go with all (stat) view definitions to > consistently display/use 'name' datatype for strings then ideally we > need to change at other places as well. Also, I see that > pg_stat_wal_receiver uses slot_name as 'text', so displaying the same > as 'name' in pg_stat_replication_slots might not be ideal. Oh, I didn't realize pg_stat_wal_receiver uses text data type. > So, let's > stick with 'text' data type for slot_name which means we should > go-ahead with the v3 version of the patch [1] posted by Shinoda-San, > right? Right. +1 Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Oct 19, 2020 at 1:20 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 19 Oct 2020 at 14:24, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > So, let's > > stick with 'text' data type for slot_name which means we should > > go-ahead with the v3 version of the patch [1] posted by Shinoda-San, > > right? > > Right. +1 > I have pushed the patch after updating the test_decoding/sql/stats. The corresponding changes in the test were missing. -- With Regards, Amit Kapila.
On Tue, 20 Oct 2020 at 20:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Oct 19, 2020 at 1:20 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 19 Oct 2020 at 14:24, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > So, let's > > > stick with 'text' data type for slot_name which means we should > > > go-ahead with the v3 version of the patch [1] posted by Shinoda-San, > > > right? > > > > Right. +1 > > > > I have pushed the patch after updating the test_decoding/sql/stats. > The corresponding changes in the test were missing. Thank you! Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> I have pushed the patch after updating the test_decoding/sql/stats. > The corresponding changes in the test were missing. Thank you very much for your help! Regards, Noriyoshi Shinoda -----Original Message----- From: Masahiko Sawada [mailto:masahiko.sawada@2ndquadrant.com] Sent: Tuesday, October 20, 2020 9:24 PM To: Amit Kapila <amit.kapila16@gmail.com> Cc: Shinoda, Noriyoshi (PN Japan A&PS Delivery) <noriyoshi.shinoda@hpe.com>; Dilip Kumar <dilipbalaut@gmail.com>; Magnus Hagander <magnus@hagander.net>; Tomas Vondra <tomas.vondra@2ndquadrant.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>; Ajin Cherian <itsajin@gmail.com> Subject: Re: Resetting spilled txn statistics in pg_stat_replication On Tue, 20 Oct 2020 at 20:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Oct 19, 2020 at 1:20 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 19 Oct 2020 at 14:24, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > So, let's > > > stick with 'text' data type for slot_name which means we should > > > go-ahead with the v3 version of the patch [1] posted by > > > Shinoda-San, right? > > > > Right. +1 > > > > I have pushed the patch after updating the test_decoding/sql/stats. > The corresponding changes in the test were missing. Thank you! Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 13, 2020 at 10:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 13, 2020 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > I know I can go read the source code, but most users will not want to. > > Is the documentation in monitoring.sgml really sufficient? If we can't > > explain this with more precision, is it really a number we want to expose > > at all? > > > > This counter is important to give users an idea about the amount of > I/O we incur during decoding and to tune logical_decoding_work_mem > GUC. So, I would prefer to improve the documentation for this > variable. > I have modified the description of spill_count and spill_txns to make things clear. Any suggestions? -- With Regards, Amit Kapila.
Attachment
On Wed, 21 Oct 2020 at 12:56, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 13, 2020 at 10:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Oct 13, 2020 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > > > > I know I can go read the source code, but most users will not want to. > > > Is the documentation in monitoring.sgml really sufficient? If we can't > > > explain this with more precision, is it really a number we want to expose > > > at all? > > > > > > > This counter is important to give users an idea about the amount of > > I/O we incur during decoding and to tune logical_decoding_work_mem > > GUC. So, I would prefer to improve the documentation for this > > variable. > > > > I have modified the description of spill_count and spill_txns to make > things clear. Any suggestions? Thank you for the patch. - logical decoding exceeds <literal>logical_decoding_work_mem</literal>. The - counter gets incremented both for toplevel transactions and - subtransactions. + logical decoding of changes from WAL for this exceeds + <literal>logical_decoding_work_mem</literal>. The counter gets + incremented both for toplevel transactions and subtransactions. What is the word "this" in the above change referring to? How about something like: Number of transactions spilled to disk after the memory used by logical decoding of changes from WAL exceeding logical_decoding_work_mem. The counter gets incremented both for toplevel transactions and subtransactions. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Oct 22, 2020 at 4:09 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 21 Oct 2020 at 12:56, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Oct 13, 2020 at 10:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Oct 13, 2020 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > > > > > > > I know I can go read the source code, but most users will not want to. > > > > Is the documentation in monitoring.sgml really sufficient? If we can't > > > > explain this with more precision, is it really a number we want to expose > > > > at all? > > > > > > > > > > This counter is important to give users an idea about the amount of > > > I/O we incur during decoding and to tune logical_decoding_work_mem > > > GUC. So, I would prefer to improve the documentation for this > > > variable. > > > > > > > I have modified the description of spill_count and spill_txns to make > > things clear. Any suggestions? > > Thank you for the patch. > > - logical decoding exceeds > <literal>logical_decoding_work_mem</literal>. The > - counter gets incremented both for toplevel transactions and > - subtransactions. > + logical decoding of changes from WAL for this exceeds > + <literal>logical_decoding_work_mem</literal>. The counter gets > + incremented both for toplevel transactions and subtransactions. > > What is the word "this" in the above change referring to? > 'slot'. The word *slot* is missing in the sentence. > How about > something like: > > Number of transactions spilled to disk after the memory used by > logical decoding of changes from WAL exceeding > logical_decoding_work_mem. The counter gets incremented both for > toplevel transactions and subtransactions. > /exceeding/exceeds. I am fine with your proposed text as well but if you like the above after correction that would be better because it would be closer to the spill_count description. -- With Regards, Amit Kapila.
On Thu, 22 Oct 2020 at 20:34, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 22, 2020 at 4:09 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 21 Oct 2020 at 12:56, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Oct 13, 2020 at 10:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Tue, Oct 13, 2020 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > > > > > > > > > > I know I can go read the source code, but most users will not want to. > > > > > Is the documentation in monitoring.sgml really sufficient? If we can't > > > > > explain this with more precision, is it really a number we want to expose > > > > > at all? > > > > > > > > > > > > > This counter is important to give users an idea about the amount of > > > > I/O we incur during decoding and to tune logical_decoding_work_mem > > > > GUC. So, I would prefer to improve the documentation for this > > > > variable. > > > > > > > > > > I have modified the description of spill_count and spill_txns to make > > > things clear. Any suggestions? > > > > Thank you for the patch. > > > > - logical decoding exceeds > > <literal>logical_decoding_work_mem</literal>. The > > - counter gets incremented both for toplevel transactions and > > - subtransactions. > > + logical decoding of changes from WAL for this exceeds > > + <literal>logical_decoding_work_mem</literal>. The counter gets > > + incremented both for toplevel transactions and subtransactions. > > > > What is the word "this" in the above change referring to? > > > > 'slot'. The word *slot* is missing in the sentence. > > > How about > > something like: > > > > > Number of transactions spilled to disk after the memory used by > > logical decoding of changes from WAL exceeding > > logical_decoding_work_mem. The counter gets incremented both for > > toplevel transactions and subtransactions. > > > > /exceeding/exceeds. I am fine with your proposed text as well but if > you like the above after correction that would be better because it > would be more close to spill_count description. yeah, I agree with the correction. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Oct 23, 2020 at 7:42 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 22 Oct 2020 at 20:34, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I have modified the description of spill_count and spill_txns to make > > > > things clear. Any suggestions? > > > > > > Thank you for the patch. > > > > > > - logical decoding exceeds > > > <literal>logical_decoding_work_mem</literal>. The > > > - counter gets incremented both for toplevel transactions and > > > - subtransactions. > > > + logical decoding of changes from WAL for this exceeds > > > + <literal>logical_decoding_work_mem</literal>. The counter gets > > > + incremented both for toplevel transactions and subtransactions. > > > > > > What is the word "this" in the above change referring to? > > > > > > > 'slot'. The word *slot* is missing in the sentence. > > > > > How about > > > something like: > > > > > > > > Number of transactions spilled to disk after the memory used by > > > logical decoding of changes from WAL exceeding > > > logical_decoding_work_mem. The counter gets incremented both for > > > toplevel transactions and subtransactions. > > > > > > > /exceeding/exceeds. I am fine with your proposed text as well but if > > you like the above after correction that would be better because it > > would be more close to spill_count description. > > yeah, I agree with the correction. > Okay, thanks, attached is an updated patch. I'll push this early next week unless you or someone else has any comments/suggestions. -- With Regards, Amit Kapila.
Attachment
On Fri, Oct 23, 2020 at 8:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 23, 2020 at 7:42 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 22 Oct 2020 at 20:34, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > I have modified the description of spill_count and spill_txns to make > > > > > things clear. Any suggestions? > > > > > > > > Thank you for the patch. > > > > > > > > - logical decoding exceeds > > > > <literal>logical_decoding_work_mem</literal>. The > > > > - counter gets incremented both for toplevel transactions and > > > > - subtransactions. > > > > + logical decoding of changes from WAL for this exceeds > > > > + <literal>logical_decoding_work_mem</literal>. The counter gets > > > > + incremented both for toplevel transactions and subtransactions. > > > > > > > > What is the word "this" in the above change referring to? > > > > > > > > > > 'slot'. The word *slot* is missing in the sentence. > > > > > > > How about > > > > something like: > > > > > > > > > > > Number of transactions spilled to disk after the memory used by > > > > logical decoding of changes from WAL exceeding > > > > logical_decoding_work_mem. The counter gets incremented both for > > > > toplevel transactions and subtransactions. > > > > > > > > > > /exceeding/exceeds. I am fine with your proposed text as well but if > > > you like the above after correction that would be better because it > > > would be more close to spill_count description. > > > > yeah, I agree with the correction. > > > > Okay, thanks, attached is an updated patch. > While updating the streaming stats patch, it occurred to me that we can write a better description of spill_bytes as well. Attached contains the update to the spill_bytes description. -- With Regards, Amit Kapila.
Attachment
On Fri, Oct 23, 2020 at 10:45:34AM +0530, Amit Kapila wrote: > On Fri, Oct 23, 2020 at 8:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Oct 23, 2020 at 7:42 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Thu, 22 Oct 2020 at 20:34, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > I have modified the description of spill_count and spill_txns to make > > > > > > things clear. Any suggestions? > > > > > > > > > > Thank you for the patch. > > > > > > > > > > - logical decoding exceeds > > > > > <literal>logical_decoding_work_mem</literal>. The > > > > > - counter gets incremented both for toplevel transactions and > > > > > - subtransactions. > > > > > + logical decoding of changes from WAL for this exceeds > > > > > + <literal>logical_decoding_work_mem</literal>. The counter gets > > > > > + incremented both for toplevel transactions and subtransactions. > > > > > > > > > > What is the word "this" in the above change referring to? > > > > > > > > > > > > > > 'slot'. The word *slot* is missing in the sentence. > > > > > > > > > > How about > > > > > something like: > > > > > > > > > > > > > > Number of transactions spilled to disk after the memory used by > > > > > logical decoding of changes from WAL exceeding > > > > > logical_decoding_work_mem. The counter gets incremented both for > > > > > toplevel transactions and subtransactions. > > > > > > > > > > > > > > /exceeding/exceeds. I am fine with your proposed text as well but if > > > > you like the above after correction that would be better because it > > > > would be more close to spill_count description. > > > > > > yeah, I agree with the correction. > > > > > > > Okay, thanks, attached is an updated patch. > > > > While updating the streaming stats patch, it occurred to me that we > can write a better description spill_bytes as well. Attached contains > the update to spill_bytes description. + This and other spill + counters can be used to gauge the I/O occurred during logical decoding + and accordingly can tune <literal>logical_decoding_work_mem</literal>. "gauge the IO occurred" is wrong. Either: I/O *which* occurred, or I/O occurring, or occurs. "can tune" should say "allow tuning". Like: + This and other spill + counters can be used to gauge the I/O which occurs during logical decoding + and accordingly allow tuning of <literal>logical_decoding_work_mem</literal>. - Number of times transactions were spilled to disk. Transactions + Number of times transactions were spilled to disk while performing + decoding of changes from WAL for this slot. Transactions What about: "..while decoding changes.." (remove "performing" and "of"). -- Justin
On Tue, Oct 27, 2020 at 8:51 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Fri, Oct 23, 2020 at 10:45:34AM +0530, Amit Kapila wrote: > > On Fri, Oct 23, 2020 at 8:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > While updating the streaming stats patch, it occurred to me that we > > can write a better description spill_bytes as well. Attached contains > > the update to spill_bytes description. > > + This and other spill > + counters can be used to gauge the I/O occurred during logical decoding > + and accordingly can tune <literal>logical_decoding_work_mem</literal>. > > "gauge the IO occurred" is wrong. > Either: I/O *which* occured, or I/O occurring, or occurs. > > "can tune" should say "allow tuning". > > Like: > + This and other spill > + counters can be used to gauge the I/O which occurs during logical decoding > + and accordingly allow tuning of <literal>logical_decoding_work_mem</literal>. > > - Number of times transactions were spilled to disk. Transactions > + Number of times transactions were spilled to disk while performing > + decoding of changes from WAL for this slot. Transactions > > What about: "..while decoding changes.." (remove "performing" and "of"). > All of your suggestions sound good to me. Find the patch attached to update the docs accordingly. -- With Regards, Amit Kapila.
Attachment
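As a usage sketch matching this wording, the counters might be read and acted on like so (pg_size_pretty() just makes spill_bytes readable; the 256MB figure is an arbitrary example, not a recommendation):

    -- Gauge decoding I/O per slot.
    SELECT slot_name,
           spill_txns,
           spill_count,
           pg_size_pretty(spill_bytes) AS spilled
      FROM pg_stat_replication_slots;

    -- If spilling dominates decoding I/O, give decoding more memory.
    ALTER SYSTEM SET logical_decoding_work_mem = '256MB';
    SELECT pg_reload_conf();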
On Wed, Oct 28, 2020 at 5:26 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Tue, Oct 27, 2020 at 06:19:42PM -0500, Justin Pryzby wrote: > >On Tue, Oct 27, 2020 at 09:17:43AM +0530, Amit Kapila wrote: > >> On Tue, Oct 27, 2020 at 8:51 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > >> > > >> > On Fri, Oct 23, 2020 at 10:45:34AM +0530, Amit Kapila wrote: > >> > > On Fri, Oct 23, 2020 at 8:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > > > >> > > While updating the streaming stats patch, it occurred to me that we > >> > > can write a better description spill_bytes as well. Attached contains > >> > > the update to spill_bytes description. > >> > > >> > + This and other spill > >> > + counters can be used to gauge the I/O occurred during logical decoding > >> > + and accordingly can tune <literal>logical_decoding_work_mem</literal>. > >> > > >> > "gauge the IO occurred" is wrong. > >> > Either: I/O *which* occured, or I/O occurring, or occurs. > >> > > >> > "can tune" should say "allow tuning". > >> > > >> > Like: > >> > + This and other spill > >> > + counters can be used to gauge the I/O which occurs during logical decoding > >> > + and accordingly allow tuning of <literal>logical_decoding_work_mem</literal>. > >> > > >> > - Number of times transactions were spilled to disk. Transactions > >> > + Number of times transactions were spilled to disk while performing > >> > + decoding of changes from WAL for this slot. Transactions > >> > > >> > What about: "..while decoding changes.." (remove "performing" and "of"). > >> > > >> > >> All of your suggestions sound good to me. Find the patch attached to > >> update the docs accordingly. > > > >@@ -2628,8 +2627,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i > > <para> > > Amount of decoded transaction data spilled to disk while performing > > decoding of changes from WAL for this slot. This and other spill > >- counters can be used to gauge the I/O occurred during logical decoding > >- and accordingly can tune <literal>logical_decoding_work_mem</literal>. > >+ counters can be used to gauge the I/O which occurred during logical > >+ decoding and accordingly allow tuning <literal>logical_decoding_work_mem</literal>. > > > >Now that I look again, maybe remove "accordingly" ? > > > > Yeah, the 'accordingly' seems rather unnecessary here. Let's remove it. > Removed and pushed. -- With Regards, Amit Kapila.