Thread: "Transaction local" statistics are incorrect at speed

"Transaction local" statistics are incorrect at speed

From
Tom Lane
Date:
My Salesforce colleague Teja Mupparti found an interesting bug.
Consider the following example:

drop table if exists test;
create table test(i int);

insert into test values(1);

select pg_sleep(1);

begin;
insert into test values(2);
insert into test values(3);
select pg_stat_get_xact_tuples_inserted('test'::regclass);
commit;

select pg_sleep(1);

begin;
insert into test values(4);
insert into test values(5);
select pg_stat_get_xact_tuples_inserted('test'::regclass);
commit;

If you do this by hand, or with the above script verbatim, the
pg_stat_get_xact_tuples_inserted() calls both report "2", which
is what you'd expect: the counts are supposed to reflect rows
inserted in the current transaction.

However, if you take out the pg_sleep calls, you get entirely
different results, and soon realize that the counts are including
the previous transactions!

The reason for this is that pgstat_report_stat() includes a delay
check, such that it doesn't ship off statistics counts to the collector
unless at least 500 ms have elapsed since the last report.  Without
the sleeps, the later transactions execute while the previous
transactions' counts are still being held locally, *and those counts
get included into the reported totals*.

This seems like a pretty clear bug to me; does anyone want to argue
that it isn't?

In the case of pg_stat_get_xact_tuples_inserted and a couple of other
routines, it would be entirely trivial to fix: just ignore
tabentry->t_counts.t_tuples_inserted (which is the count held over from
previous transactions) and only total the trans->tuples_inserted counters.
However, for a number of other counters such as blocks_fetched, we don't
store transaction-local counts separately, and would have to start doing
so if we wanted to make these functions work as documented.

Thoughts?  I have other things to do right now than fix this myself.
        regards, tom lane



Re: "Transaction local" statistics are incorrect at speed

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> This seems like a pretty clear bug to me; does anyone want to argue
> that it isn't?

I'd agree that it's a bug.

> In the case of pg_stat_get_xact_tuples_inserted and a couple of other
> routines, it would be entirely trivial to fix: just ignore
> tabentry->t_counts.t_tuples_inserted (which is the count held over from
> previous transactions) and only total the trans->tuples_inserted counters.
> However, for a number of other counters such as blocks_fetched, we don't
> store transaction-local counts separately, and would have to start doing
> so if we wanted to make these functions work as documented.

I've been thinking for a while that we need more of this (storing and
aggregating statistics data locally first) or we're never going to be
able to add more information like the stddev, etc, that's been
requested.  I'd really like to see us collecting a great deal more
regarding stats and that's not going to happen unless we can cache and
aggregate locally first.

> Thoughts?  I have other things to do right now than fix this myself.

While a bug, I don't feel that it's a particularly high priority one.
What worries me more is the risk of breaking things if we do backpatch
a fix that postpones updating some information that used to be
reflected immediately.  It seems like we'd need/want some kind of
generalized structure for this too, which would be even more work and
which has zero chance for 9.4, meaning that it'll need to be fixed one
way now and then whacked around in 9.5 (assuming someone has the time,
etc..).

Not a good situation. :/
Thanks,
    Stephen