Thread: Re: [GENERAL] autoanalyze criteria

Re: [GENERAL] autoanalyze criteria

From
Jeff Janes
Date:
Moved discussion from General To Hackers.

On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta <s.andreatta@synedra.com> wrote:

On 02/23/2013 05:10 PM, Jeff Janes wrote:

Sorry, I got tunnel vision about the how the threshold was computed, and forgot about the thing it was compared to.  There is a "secret" data point in the stats collector called changes_since_analyze.  This is not exposed in the pg_stat_user_tables.  But I think it should be as I often have wanted to see it.



Sounds like a very good idea to me - any way I could help to make such a thing happen?


It should be fairly easy to implement because the other columns are already there to show you the way, and if you want to try your hand at hacking pgsql it would be a good introduction to doing so.

Look at each instance in the code of n_dead_dup and pg_stat_get_dead_tuples, and those are the places where changes_since_analyze also need to be addressed, in an analogous manner (assuming it is isn't already there.)

git grep 'n_dead_tup'

It looks like we would need to add an SQL function to retrieve the data, then incorporate that function into the view definitions that make up the pg_stat_user_tables etc. views.  and of course update the regression test and the documentation.

Other than implementing it, we would need to convince other hackers that this is desirable to have.  I'm not sure how hard that would be.  I've looked in the archives to see if this idea was already considered but rejected, but I don't see any indication that it was previously considered.

(http://www.postgresql.org/message-id/4823.1262132964@sss.pgh.pa.us).

Cheers,

Jeff

Re: [GENERAL] autoanalyze criteria

From
Stefan Andreatta
Date:

On 02/23/2013 09:30 PM, Jeff Janes wrote:
Moved discussion from General To Hackers.

On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta <s.andreatta@synedra.com> wrote:

On 02/23/2013 05:10 PM, Jeff Janes wrote:

Sorry, I got tunnel vision about the how the threshold was computed, and forgot about the thing it was compared to.  There is a "secret" data point in the stats collector called changes_since_analyze.  This is not exposed in the pg_stat_user_tables.  But I think it should be as I often have wanted to see it.



Sounds like a very good idea to me - any way I could help to make such a thing happen?


It should be fairly easy to implement because the other columns are already there to show you the way, and if you want to try your hand at hacking pgsql it would be a good introduction to doing so.

Look at each instance in the code of n_dead_dup and pg_stat_get_dead_tuples, and those are the places where changes_since_analyze also need to be addressed, in an analogous manner (assuming it is isn't already there.)

git grep 'n_dead_tup'

It looks like we would need to add an SQL function to retrieve the data, then incorporate that function into the view definitions that make up the pg_stat_user_tables etc. views.  and of course update the regression test and the documentation.

Other than implementing it, we would need to convince other hackers that this is desirable to have.  I'm not sure how hard that would be.  I've looked in the archives to see if this idea was already considered but rejected, but I don't see any indication that it was previously considered.

(http://www.postgresql.org/message-id/4823.1262132964@sss.pgh.pa.us).

Cheers,

Jeff

Not being a developer, I am afraid, I will not be going to implement it myself - nor would anybody wish so ;-)

I also searched the archives, but the closest I found is a discussion on the Admin List starting here:
http://www.postgresql.org/message-id/626919622.7634700.1351695913466.JavaMail.root@alaloop.com

On the other hand, there is quite a lot of discussion about making autoanalyze more (or less) aggressive - which seems a difficult task to me, when you cannot even check what's triggering your autoanalyze.

Anybody else interested?

Regards,
Stefan

Re: [GENERAL] autoanalyze criteria

From
Mark Kirkwood
Date:
On 24/02/13 10:12, Stefan Andreatta wrote:
>
> On 02/23/2013 09:30 PM, Jeff Janes wrote:
>> Moved discussion from General To Hackers.
>>
>> On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta
>> <s.andreatta@synedra.com <mailto:s.andreatta@synedra.com>> wrote:
>>
>>
>>     On 02/23/2013 05:10 PM, Jeff Janes wrote:
>>>
>>>     Sorry, I got tunnel vision about the how the threshold was
>>>     computed, and forgot about the thing it was compared to.  There
>>>     is a "secret" data point in the stats collector
>>>     called changes_since_analyze.  This is not exposed in the
>>>     pg_stat_user_tables.  But I think it should be as I often have
>>>     wanted to see it.
>>>
>>>
>>
>>     Sounds like a very good idea to me - any way I could help to make
>>     such a thing happen?
>>
>>
>>
>> It should be fairly easy to implement because the other columns are
>> already there to show you the way, and if you want to try your hand at
>> hacking pgsql it would be a good introduction to doing so.
>>
>> Look at each instance in the code of n_dead_dup and
>> pg_stat_get_dead_tuples, and those are the places where
>> changes_since_analyze also need to be addressed, in an analogous
>> manner (assuming it is isn't already there.)
>>
>> git grep 'n_dead_tup'
>>
>> It looks like we would need to add an SQL function to retrieve the
>> data, then incorporate that function into the view definitions that
>> make up the pg_stat_user_tables etc. views. and of course update the
>> regression test and the documentation.
>>
>> Other than implementing it, we would need to convince other hackers
>> that this is desirable to have.  I'm not sure how hard that would be.
>> I've looked in the archives to see if this idea was already considered
>> but rejected, but I don't see any indication that it was previously
>> considered.
>>
>> (http://www.postgresql.org/message-id/4823.1262132964@sss.pgh.pa.us).
>>
>> Cheers,
>>
>> Jeff
>
> Not being a developer, I am afraid, I will not be going to implement it
> myself - nor would anybody wish so ;-)
>
> I also searched the archives, but the closest I found is a discussion on
> the Admin List starting here:
> http://www.postgresql.org/message-id/626919622.7634700.1351695913466.JavaMail.root@alaloop.com
>
> On the other hand, there is quite a lot of discussion about making
> autoanalyze more (or less) aggressive - which seems a difficult task to
> me, when you cannot even check what's triggering your autoanalyze.
>
> Anybody else interested?
>

I was asked about this exact thing the other day - it would be very nice 
to have the information visible. I may take a look at doing it (I've 
done some hacking on the stats system previously). However don't let 
that put anyone else off - as I'll have to find the time to start :-)

Regards

Mark




Re: [GENERAL] autoanalyze criteria

From
Mark Kirkwood
Date:
On 24/02/13 10:51, Mark Kirkwood wrote:
> On 24/02/13 10:12, Stefan Andreatta wrote:
>>
>> On 02/23/2013 09:30 PM, Jeff Janes wrote:
>>> Moved discussion from General To Hackers.
>>>
>>> On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta
>>> <s.andreatta@synedra.com <mailto:s.andreatta@synedra.com>> wrote:
>>>
>>>
>>>     On 02/23/2013 05:10 PM, Jeff Janes wrote:
>>>>
>>>>     Sorry, I got tunnel vision about the how the threshold was
>>>>     computed, and forgot about the thing it was compared to.  There
>>>>     is a "secret" data point in the stats collector
>>>>     called changes_since_analyze.  This is not exposed in the
>>>>     pg_stat_user_tables.  But I think it should be as I often have
>>>>     wanted to see it.
>>>>
>>>>
>>>
>>>     Sounds like a very good idea to me - any way I could help to make
>>>     such a thing happen?
>>>
>>>
>>>
>>> It should be fairly easy to implement because the other columns are
>>> already there to show you the way, and if you want to try your hand at
>>> hacking pgsql it would be a good introduction to doing so.
>>>
>>> Look at each instance in the code of n_dead_dup and
>>> pg_stat_get_dead_tuples, and those are the places where
>>> changes_since_analyze also need to be addressed, in an analogous
>>> manner (assuming it is isn't already there.)
>>>
>>> git grep 'n_dead_tup'
>>>
>>> It looks like we would need to add an SQL function to retrieve the
>>> data, then incorporate that function into the view definitions that
>>> make up the pg_stat_user_tables etc. views. and of course update the
>>> regression test and the documentation.
>>>
>>> Other than implementing it, we would need to convince other hackers
>>> that this is desirable to have.  I'm not sure how hard that would be.
>>> I've looked in the archives to see if this idea was already considered
>>> but rejected, but I don't see any indication that it was previously
>>> considered.
>>>
>>> (http://www.postgresql.org/message-id/4823.1262132964@sss.pgh.pa.us).
>>>
>>> Cheers,
>>>
>>> Jeff
>>
>> Not being a developer, I am afraid, I will not be going to implement it
>> myself - nor would anybody wish so ;-)
>>
>> I also searched the archives, but the closest I found is a discussion on
>> the Admin List starting here:
>> http://www.postgresql.org/message-id/626919622.7634700.1351695913466.JavaMail.root@alaloop.com
>>
>>
>> On the other hand, there is quite a lot of discussion about making
>> autoanalyze more (or less) aggressive - which seems a difficult task to
>> me, when you cannot even check what's triggering your autoanalyze.
>>
>> Anybody else interested?
>>
>
> I was asked about this exact thing the other day - it would be very
> nice to have the information visible. I may take a look at doing it
> (I've done some hacking on the stats system previously). However don't
> let that put anyone else off - as I'll have to find the time to start :-)
>
>
>

I happened to be looking at the whole autovacuum/analyze setup in
another context - which reminded me about volunteering to take a look at
a patch for adding changes_since_analyze. So with probably impeccably
poor timing (smack in the middle of 9.3 beta), here is a patch that does
that (so it is probably an early 9.4 addition).

I've called the column "n_changes_since_analyze" - I can sense that
there might be discussion about how to maybe shorten that :-) , and
added a doc line for the view + updated the regression test expected input.

Regards

Mark


Attachment

Re: [GENERAL] autoanalyze criteria

From
Magnus Hagander
Date:
On Wed, May 15, 2013 at 2:33 AM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:
> On 24/02/13 10:51, Mark Kirkwood wrote:
>>
>> On 24/02/13 10:12, Stefan Andreatta wrote:
>>>
>>>
>>> On 02/23/2013 09:30 PM, Jeff Janes wrote:
>>>>
>>>> Moved discussion from General To Hackers.
>>>>
>>>> On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta
>>>> <s.andreatta@synedra.com <mailto:s.andreatta@synedra.com>> wrote:
>>>>
>>>>
>>>>     On 02/23/2013 05:10 PM, Jeff Janes wrote:
>>>>>
>>>>>
>>>>>     Sorry, I got tunnel vision about the how the threshold was
>>>>>     computed, and forgot about the thing it was compared to.  There
>>>>>     is a "secret" data point in the stats collector
>>>>>     called changes_since_analyze.  This is not exposed in the
>>>>>     pg_stat_user_tables.  But I think it should be as I often have
>>>>>     wanted to see it.
>>>>>
>>>>>
>>>>
>>>>     Sounds like a very good idea to me - any way I could help to make
>>>>     such a thing happen?
>>>>
>>>>
>>>>
>>>> It should be fairly easy to implement because the other columns are
>>>> already there to show you the way, and if you want to try your hand at
>>>> hacking pgsql it would be a good introduction to doing so.
>>>>
>>>> Look at each instance in the code of n_dead_dup and
>>>> pg_stat_get_dead_tuples, and those are the places where
>>>> changes_since_analyze also need to be addressed, in an analogous
>>>> manner (assuming it is isn't already there.)
>>>>
>>>> git grep 'n_dead_tup'
>>>>
>>>> It looks like we would need to add an SQL function to retrieve the
>>>> data, then incorporate that function into the view definitions that
>>>> make up the pg_stat_user_tables etc. views. and of course update the
>>>> regression test and the documentation.
>>>>
>>>> Other than implementing it, we would need to convince other hackers
>>>> that this is desirable to have.  I'm not sure how hard that would be.
>>>> I've looked in the archives to see if this idea was already considered
>>>> but rejected, but I don't see any indication that it was previously
>>>> considered.
>>>>
>>>> (http://www.postgresql.org/message-id/4823.1262132964@sss.pgh.pa.us).
>>>>
>>>> Cheers,
>>>>
>>>> Jeff
>>>
>>>
>>> Not being a developer, I am afraid, I will not be going to implement it
>>> myself - nor would anybody wish so ;-)
>>>
>>> I also searched the archives, but the closest I found is a discussion on
>>> the Admin List starting here:
>>>
>>> http://www.postgresql.org/message-id/626919622.7634700.1351695913466.JavaMail.root@alaloop.com
>>>
>>> On the other hand, there is quite a lot of discussion about making
>>> autoanalyze more (or less) aggressive - which seems a difficult task to
>>> me, when you cannot even check what's triggering your autoanalyze.
>>>
>>> Anybody else interested?
>>>
>>
>> I was asked about this exact thing the other day - it would be very nice
>> to have the information visible. I may take a look at doing it (I've done
>> some hacking on the stats system previously). However don't let that put
>> anyone else off - as I'll have to find the time to start :-)
>>
>>
>>
>
> I happened to be looking at the whole autovacuum/analyze setup in another
> context - which reminded me about volunteering to take a look at a patch for
> adding changes_since_analyze. So with probably impeccably poor timing (smack
> in the middle of 9.3 beta), here is a patch that does that (so it is
> probably an early 9.4 addition).
>
> I've called the column "n_changes_since_analyze" - I can sense that there
> might be discussion about how to maybe shorten that :-) , and added a doc
> line for the view + updated the regression test expected input.

Applied, with the changs suggested by Laurenz Albe in his review.

Thanks!


--Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/