Thread: Access statistics

Access statistics

From
Jan Wieck
Date:
Here it comes.

    The views created for now are:

    pg_stat_activity

        Current connections to the entire installation, including
        the actual query strings.

    pg_stat_database

        Per database IO summary.

    pg_stat_{all|sys|user}_tables

        Per table scan statistics.

    pg_statio_{all|sys|user}_tables

        Per table buffer cache statistics.

    pg_stat_{all|sys|user}_indexes

        Per index scan statistics.

    pg_statio_{all|sys|user}_indexes

        Per index buffer cache statistics.

    pg_statio_{all|sys|user}_sequences

        Per sequence buffer cache statistics.

    Of course, an initdb is required (so far this patch doesn't
    bump catversion).

    The rules regression test is broken due to the new views from
    initdb.

    The existing functionality to reset stats isn't callable yet.
    Should become a utility command someday.

    Collecting   must   be  configurable  per  database  (catalog
    change).

    Due to all the changes that have happened (ever tried to
    apply a 130K patch after a pgindent run?) I must verify
    everything this patch touches once more.

    Have fun.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: Access statistics

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> Here it comes.

Hm.  I really don't like the way you've uglified the heap_fetch
interface; can't that be done better?  It's just plain bizarre that
heap_fetch is now responsible for counting stats for index scans.
And surely this bit of code is wrong:

          *userbuf = buffer;
+
+         if (iscan != NULL)
+             pgstat_count_heap_fetch(&iscan->xs_pgstat_info);
+         else
+             pgstat_count_heap_fetch(&relation->pgstat_info);
      }

You can't count heapscan info via the relcache entry --- what if there's
more than one heapscan in progress on the same rel?  What if the
relcache entry gets rebuilt by SI invalidation?  I don't think the stats
stuff should touch the relcache at all.

There's been talk in the past of trying to unify the heapscan and
indexscan APIs, so that we wouldn't need ugliness like the two sets of
coding in AttrDefaultFetch and similar places.  Maybe it's time to bite
that bullet, if it'd provide a cleaner place to put the stats calls.

Also, I do not like the pgStatPmPipe mechanism for detecting postmaster
shutdown.  That eats two file descriptors in every backend, which is a
pretty substantial waste of resources.  Why not just have the postmaster
send a signal to the stats collector when it's time to shut down?
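
A minimal sketch of the signal-based shutdown suggested here, assuming a POSIX system. The names shutdown_requested, handle_shutdown, install_shutdown_handler and collector_loop are hypothetical and are not taken from the patch; the loop body is only a placeholder for receiving one stats message.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Set by the signal handler, polled by the collector loop. */
static volatile sig_atomic_t shutdown_requested = 0;

/* Handler does nothing but record the request (async-signal-safe). */
static void
handle_shutdown(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/* Install the handler for the given signal; returns true on success. */
static bool
install_shutdown_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_shutdown;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL) == 0;
}

/*
 * Skeleton of a collector loop: handle up to max_iterations messages,
 * stopping early once a shutdown has been requested.
 * Returns the number of (placeholder) messages handled.
 */
static int
collector_loop(int max_iterations)
{
    int         processed = 0;

    while (!shutdown_requested && processed < max_iterations)
        processed++;            /* placeholder for one stats message */
    return processed;
}
```

With this shape the backends need no extra file descriptors; the postmaster would only have to know the collector's pid and send it the chosen signal at shutdown.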

BTW, PG_FUNCTION_INFO_V1() is not needed for built-in functions.

            regards, tom lane

Re: Access statistics

From
Jan Wieck
Date:
Tom Lane wrote:
> Jan Wieck <JanWieck@Yahoo.com> writes:
> > Here it comes.
>
> Hm.  I really don't like the way you've uglified the heap_fetch
> interface; can't that be done better?  It's just plain bizarre that
> heap_fetch is now responsible for counting stats for index scans.
> And surely this bit of code is wrong:
>
>         *userbuf = buffer;
> +
> +       if (iscan != NULL)
> +            pgstat_count_heap_fetch(&iscan->xs_pgstat_info);
> +       else
> +            pgstat_count_heap_fetch(&relation->pgstat_info);
>    }
>
> You can't count heapscan info via the relcache entry --- what if there's
> more than one heapscan in progress on the same rel?  What if the
> relcache entry gets rebuilt by SI invalidation?  I don't think the stats
> stuff should touch the relcache at all.

    The counting ends up in one and the same global slot of the
    tabstat messages. These slots are collected per frontend
    command and are identified by the table's OID. Does it matter
    how often the pointer to that slot is updated to the same
    value?
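
    To illustrate the argument, here is a minimal sketch, not the
    patch's actual code. TabStatSlot, get_slot and count_fetch are
    hypothetical names, and the fixed-size array stands in for the
    per-command tabstat message buffer. Any number of scans, or a
    rebuilt relcache entry, resolve to the same slot because the
    lookup key is the table's OID.

```c
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

/* Hypothetical per-table counter slot; field names are illustrative. */
typedef struct TabStatSlot
{
    Oid         table_oid;
    long        tuples_fetched;
} TabStatSlot;

#define MAX_SLOTS 64

static TabStatSlot slots[MAX_SLOTS];
static int  nslots = 0;

/* Return the slot for table_oid, creating it on first use; NULL if full. */
static TabStatSlot *
get_slot(Oid table_oid)
{
    for (int i = 0; i < nslots; i++)
        if (slots[i].table_oid == table_oid)
            return &slots[i];
    if (nslots >= MAX_SLOTS)
        return NULL;
    slots[nslots].table_oid = table_oid;
    slots[nslots].tuples_fetched = 0;
    return &slots[nslots++];
}

/*
 * Count one fetch through the caller's cached pointer. If the pointer
 * was lost (set to NULL), it is resolved again and yields the same slot.
 */
static void
count_fetch(TabStatSlot **info, Oid table_oid)
{
    if (*info == NULL)
        *info = get_slot(table_oid);
    if (*info != NULL)
        (*info)->tuples_fetched++;
}
```

    Two scans on the same table, each holding its own pointer,
    increment the same counter, and resetting a pointer to NULL
    loses nothing because the next call finds the slot again by OID.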

>
> There's been talk in the past of trying to unify the heapscan and
> indexscan APIs, so that we wouldn't need ugliness like the two sets of
> coding in AttrDefaultFetch and similar places.  Maybe it's time to bite
> that bullet, if it'd provide a cleaner place to put the stats calls.

    That's exactly the problem. Only heap_fetch() knows whether
    the tid passed in is visible, so right now the counting can
    be done either there or in every place where heap_fetch()
    is called.

>
> Also, I do not like the pgStatPmPipe mechanism for detecting postmaster
> shutdown.  That eats two file descriptors in every backend, which is a
> pretty substantial waste of resources.  Why not just have the postmaster
> send a signal to the stats collector when it's time to shut down?

    Don't kill -9 the postmaster :-)

    Well, we could do the signal *plus* having the child check
    from time to time with a zero kill whether the postmaster
    has died, and terminate itself in that case.
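
    A minimal sketch of such a zero-kill check, assuming a POSIX
    system. process_exists is a hypothetical helper name, not
    something from the patch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Return true if a process with the given pid still exists.
 * kill() with signal 0 performs only the existence and permission
 * checks; no signal is actually delivered.
 */
static bool
process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return true;
    /* EPERM: the process exists, but we are not allowed to signal it. */
    return errno == EPERM;
}
```

    The child would remember the postmaster's pid at startup and
    call process_exists() on that saved pid from time to time.
    Checking getppid() afterwards would not work, since an orphaned
    process gets a new parent. The check cannot detect a pid that
    has been reused by an unrelated process.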

>
> BTW, PG_FUNCTION_INFO_V1() is not needed for built-in functions.

    That survived somehow from the time when they resided in a
    loadable module.


Jan


--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #





Re: Access statistics

From
Tom Lane
Date:
Jan Wieck <JanWieck@yahoo.com> writes:
>> You can't count heapscan info via the relcache entry --- what if there's
>> more than one heapscan in progress on the same rel?  What if the
>> relcache entry gets rebuilt by SI invalidation?  I don't think the stats
>> stuff should touch the relcache at all.

>     The counting ends up in one and the same global slot of the
>     tabstat messages. These slots are collected per frontend
>     command and are identified by the table's OID. Does it matter
>     how often the pointer to that slot is updated to the same
>     value?

In that case, forget about the fields in the heapscan and indexscan
structs.  You can easily keep both the heap and index access counts
in (or referenced by) the relcache entry, no?  It just bothers me
that the treatment of heap and index counts is so inconsistent.
One or the other is wrong, or at least inefficient.

BTW, does it really matter whether we are counting an index-driven
or heap-driven fetch?  Seems to me that if you count indextuple and
heaptuple fetches, you can figure out what's going on without having
to have heap_fetch know the difference.

>> Also, I do not like the pgStatPmPipe mechanism for detecting postmaster
>> shutdown.  That eats two file descriptors in every backend, which is a
>> pretty substantial waste of resources.  Why not just have the postmaster
>> send a signal to the stats collector when it's time to shut down?

>     Don't kill -9 the postmaster :-)

What's your point?  If the postmaster crashes, whether the stats
collector goes away without help seems like the least of your worries.
You'll end up killing backends manually anyhow, so why not the stats
process too?  (For that matter, do you care if the stats process doesn't
die?  Once there is no one left to send messages to it, it'll never do
anything, no?)

            regards, tom lane

Re: Access statistics

From
Jan Wieck
Date:
Tom Lane wrote:
> Jan Wieck <JanWieck@yahoo.com> writes:
> >> You can't count heapscan info via the relcache entry --- what if there's
> >> more than one heapscan in progress on the same rel?  What if the
> >> relcache entry gets rebuilt by SI invalidation?  I don't think the stats
> >> stuff should touch the relcache at all.
>
> >     The counting ends up in one and the same global slot of the
> >     tabstat messages. These slots are collected per frontend
> >     command and are identified by the table's OID. Does it matter
> >     how often the pointer to that slot is updated to the same
> >     value?
>
> In that case, forget about the fields in the heapscan and indexscan
> structs.  You can easily keep both the heap and index access counts
> in (or referenced by) the relcache entry, no?  It just bothers me
> that the treatment of heap and index counts is so inconsistent.
> One or the other is wrong, or at least inefficient.

    Heap tuples fetched due to an index scan are counted as
    tuples_fetched on the index's entry. Only heap_fetch() calls
    not done due to an index scan count towards the
    tuples_fetched count for the table (e.g. tuples refetched for
    deferred trigger invocation and the like).

>
> BTW, does it really matter whether we are counting an index-driven
> or heap-driven fetch?  Seems to me that if you count indextuple and
> heaptuple fetches, you can figure out what's going on without having
> to have heap_fetch know the difference.
>
> >> Also, I do not like the pgStatPmPipe mechanism for detecting postmaster
> >> shutdown.  That eats two file descriptors in every backend, which is a
> >> pretty substantial waste of resources.  Why not just have the postmaster
> >> send a signal to the stats collector when it's time to shut down?
>
> >     Don't kill -9 the postmaster :-)
>
> What's your point?  If the postmaster crashes, whether the stats
> collector goes away without help seems like the least of your worries.
> You'll end up killing backends manually anyhow, so why not the stats
> process too?  (For that matter, do you care if the stats process doesn't
> die?  Once there is no one left to send messages to it, it'll never do
> anything, no?)

    Except for generating another FAQ. We haven't been able to
    stop people from killing the postmaster up to now. Why do you
    think they'll ever learn? They are human ...


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


