Thread: [HACKERS] mat views stats

[HACKERS] mat views stats

From
Jim Mlodgenski
Date:
I've come across a number of times where the statistics on materialized views become stale producing bad plans. It turns out that autovacuum only touches a materialized view when it is first created and ignores it on a refresh. When you have a materialized view like yesterdays_sales the data in the materialized view turns over every day. 

Attached is a patch to trigger autovacuum based on a matview refresh along with a system view pg_stat_all_matviews to show information more meaningful for materialized views.

-- Jim


Attachment

Re: [HACKERS] mat views stats

From
Peter Eisentraut
Date:
On 2/20/17 10:06, Jim Mlodgenski wrote:
> I've come across a number of times where the statistics on materialized
> views become stale producing bad plans. It turns out that autovacuum
> only touches a materialized view when it is first created and ignores it
> on a refresh. When you have a materialized view like yesterdays_sales
> the data in the materialized view turns over every day. 

That sounds like a bug.

> Attached is a patch to trigger autovacuum based on a matview refresh
> along with a system view pg_stat_all_matviews to show information more
> meaningful for materialized views.

It might be easier to include materialized views into pg_stat_*_tables.

I think these should be two separate patches.  We might want to
backpatch the first one.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] mat views stats

From
Jim Nasby
Date:
On 2/21/17 4:22 PM, Peter Eisentraut wrote:
>> Attached is a patch to trigger autovacuum based on a matview refresh
>> along with a system view pg_stat_all_matviews to show information more
>> meaningful for materialized views.
> It might be easier to include materialized views into pg_stat_*_tables.

Certainly easier, but I don't think it'd be better. Matviews really 
aren't the same thing as tables. Off-hand (without reviewing the patch), 
update and delete counts certainly wouldn't make any sense. "Insert" 
counts might, in as much as it's how many rows have been added by 
refreshes. You'd want a refresh count too.

> I think these should be two separate patches.  We might want to
> backpatch the first one.

+1; definitely sounds like a bug to me.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] mat views stats

From
Jim Mlodgenski
Date:


On Wed, Feb 22, 2017 at 12:43 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 2/21/17 4:22 PM, Peter Eisentraut wrote:
Attached is a patch to trigger autovacuum based on a matview refresh
along with a system view pg_stat_all_matviews to show information more
meaningful for materialized views.
It might be easier to include materialized views into pg_stat_*_tables.

Certainly easier, but I don't think it'd be better. Matviews really aren't the same thing as tables. Off-hand (without reviewing the patch), update and delete counts certainly wouldn't make any sense. "Insert" counts might, in as much as it's how many rows have been added by refreshes. You'd want a refresh count too.

Matviews already show up in the pg_stat_*_tables and the patch does leverage the existing pg_stat_*_tables underlying structure, but it creates more meaningful pg_stat_*_matviews leaving out things like insert and update counts.  
 


I think these should be two separate patches.  We might want to
backpatch the first one.

I was originally thinking 2 patches, but I couldn't think of a way to trigger the analyze reliably without adding a refresh count or sending bogus stats. We can certainly send a stats message containing the number of rows inserted by the refresh, but are we going to also send the number of deletes as well? Consider a matview that has month to date data. At the end of the month, there will be about 30n live tuples. The next day on the new month, there will be n inserts with the stats thinking there are 30n live tuples which is below the analyze scale factor.  We want to analyze the matview on the first of the day of the new month, but it wouldn't be triggered for a few days. We can have REFRESH also track live tuples, but it was quickly becoming a slippery slope of changing behavior for a back patch. Maybe that's OK and we can go down that road. 

We can back patch some documentation about the existing refresh behavior with autovacuum.



Re: [HACKERS] mat views stats

From
Peter Eisentraut
Date:
On 2/22/17 06:31, Jim Mlodgenski wrote:
> Matviews already show up in the pg_stat_*_tables and the patch does
> leverage the existing pg_stat_*_tables underlying structure, but it
> creates more meaningful pg_stat_*_matviews leaving out things like
> insert and update counts.  

But fields like seq_scans and last_analyze are then redundant between
the *_tables view and the *_matviews view.  Maybe it would make more
sense to introduce a new view like you propose and not show them in
*_tables anymore?

> I was originally thinking 2 patches, but I couldn't think of a way to
> trigger the analyze reliably without adding a refresh count or sending
> bogus stats. We can certainly send a stats message containing the number
> of rows inserted by the refresh, but are we going to also send the
> number of deletes as well? Consider a matview that has month to date
> data. At the end of the month, there will be about 30n live tuples. The
> next day on the new month, there will be n inserts with the stats
> thinking there are 30n live tuples which is below the analyze scale
> factor.  We want to analyze the matview on the first of the day of the
> new month, but it wouldn't be triggered for a few days. We can have
> REFRESH also track live tuples, but it was quickly becoming a slippery
> slope of changing behavior for a back patch. Maybe that's OK and we can
> go down that road.

For those not reading the patch, it introduces a new reloption
autovacuum_analyze_refresh_threshold that determines when to autoanalyze
a materialized view.

What behavior would we like by default?  Refreshing a materialized view
is a pretty expensive operation, so I think scheduling an analyze quite
aggressively right afterwards is often what you want.

I think sending a stats message with the number of inserted rows could
make sense.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] mat views stats

From
Jim Nasby
Date:
On 2/22/17 7:56 AM, Peter Eisentraut wrote:
> What behavior would we like by default?  Refreshing a materialized view
> is a pretty expensive operation, so I think scheduling an analyze quite
> aggressively right afterwards is often what you want.
>
> I think sending a stats message with the number of inserted rows could
> make sense.

+1 on both counts. And if sane analyze behavior does depend on the stats 
changes then there's no real advantage to a separate patch.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] mat views stats

From
Robert Haas
Date:
On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Certainly easier, but I don't think it'd be better. Matviews really aren't
> the same thing as tables. Off-hand (without reviewing the patch), update and
> delete counts certainly wouldn't make any sense. "Insert" counts might, in
> as much as it's how many rows have been added by refreshes. You'd want a
> refresh count too.

Regular REFRESH truncates the view and repopulates it, but REFRESH
CONCURRENTLY does inserts, updates, and deletes as needed to adjust
the contents.  So I think all the same counters that make sense for
regular tables are also sensible here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] mat views stats

From
Jim Mlodgenski
Date:


On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Certainly easier, but I don't think it'd be better. Matviews really aren't
> the same thing as tables. Off-hand (without reviewing the patch), update and
> delete counts certainly wouldn't make any sense. "Insert" counts might, in
> as much as it's how many rows have been added by refreshes. You'd want a
> refresh count too.

Regular REFRESH truncates the view and repopulates it, but REFRESH
CONCURRENTLY does inserts, updates, and deletes as needed to adjust
the contents.  So I think all the same counters that make sense for
regular tables are also sensible here.


After digging into things further, just making refresh report the stats for what is it basically doing simplifies and solves it and it is something we can back patch if that the consensus. See the attached patch.


Attachment

Re: [HACKERS] mat views stats

From
Michael Paquier
Date:
On Thu, Mar 2, 2017 at 7:20 AM, Jim Mlodgenski <jimmy76@gmail.com> wrote:
>
>
> On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
>> wrote:
>> > Certainly easier, but I don't think it'd be better. Matviews really
>> > aren't
>> > the same thing as tables. Off-hand (without reviewing the patch), update
>> > and
>> > delete counts certainly wouldn't make any sense. "Insert" counts might,
>> > in
>> > as much as it's how many rows have been added by refreshes. You'd want a
>> > refresh count too.
>>
>> Regular REFRESH truncates the view and repopulates it, but REFRESH
>> CONCURRENTLY does inserts, updates, and deletes as needed to adjust
>> the contrs that make sense for
>> regular tables are also sensible here.
>>
>
> After digging into things further, just making refresh report the stats for
> what is it basically doing simplifies and solves it and it is something we
> can back patch if that the consensus. See the attached patch.

This is unhappy:
$ git diff master --check
src/backend/commands/matview.c:155: indent with spaces.
+        uint64          processed = 0;

+                /*
+                 * Send the stats to mimic what we are essentially doing.
+                 * A truncate and insert
+                 */
This sentence is unfinished.

There is also no need to report the number of inserts if WITH NO DATA is used.
-- 
Michael



Re: [HACKERS] mat views stats

From
Jim Mlodgenski
Date:


On Wed, Mar 1, 2017 at 8:39 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Mar 2, 2017 at 7:20 AM, Jim Mlodgenski <jimmy76@gmail.com> wrote:
>
>
> On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
>> wrote:
>> > Certainly easier, but I don't think it'd be better. Matviews really
>> > aren't
>> > the same thing as tables. Off-hand (without reviewing the patch), update
>> > and
>> > delete counts certainly wouldn't make any sense. "Insert" counts might,
>> > in
>> > as much as it's how many rows have been added by refreshes. You'd want a
>> > refresh count too.
>>
>> Regular REFRESH truncates the view and repopulates it, but REFRESH
>> CONCURRENTLY does inserts, updates, and deletes as needed to adjust
>> the contrs that make sense for
>> regular tables are also sensible here.
>>
>
> After digging into things further, just making refresh report the stats for
> what is it basically doing simplifies and solves it and it is something we
> can back patch if that the consensus. See the attached patch.

This is unhappy:
$ git diff master --check
src/backend/commands/matview.c:155: indent with spaces.
+        uint64          processed = 0;

+                /*
+                 * Send the stats to mimic what we are essentially doing.
+                 * A truncate and insert
+                 */
This sentence is unfinished.

There is also no need to report the number of inserts if WITH NO DATA is used.


Here is the cleaned up patch

Attachment

Re: [HACKERS] mat views stats

From
Tom Lane
Date:
Jim Mlodgenski <jimmy76@gmail.com> writes:
> After digging into things further, just making refresh report the stats
> for what is it basically doing simplifies and solves it and it is
> something we can back patch if that the consensus. See the attached
> patch.

I've pushed this into HEAD with one non-cosmetic change: the patch tried
to pass a uint64 tuple count to pgstat_count_heap_insert(), whose argument
was only declared as "int".  This would go seriously wrong with matviews
having more than INT_MAX rows, which hardly seems out of the question,
so I changed pgstat_count_heap_insert() to take int64 instead.

I don't think we can make that change in the back branches though.
It seems too likely that third-party code might be calling 
pgstat_count_heap_insert().

We could possibly kluge around this to produce a safe-to-back-patch
fix by doing something like
pgstat_count_heap_insert(matviewRel, (int) Min(processed, INT_MAX));

But that seems pretty ugly.  Given the lack of previous reports, I'm
personally content to leave this unfixed in the back branches.

Comments?
        regards, tom lane



Re: [HACKERS] mat views stats

From
Jim Mlodgenski
Date:

But that seems pretty ugly.  Given the lack of previous reports, I'm
personally content to leave this unfixed in the back branches.

Comments?

Most instances of this I've seen out in the field have worked around this by just running analyze in the scheduled jobs after the refresh  so we're probably good not back patching.