Thread: postgres materialized view refresh performance

postgres materialized view refresh performance

From
Ayub M
Date:

There is a table t which is used in a mview mv, this is the only table in the mview definition.

create table t (c1 int, ..., c10 int);
-- there is a pk on say c1 column
create materialized view mv as select c1, c2...c10 from t;
---there is a unique index on say c5 and bunch of other indexes on the mview.

The reason there is a mview created instead of using table t, is that that the table gets truncated and reloaded every couple of hours and we don't want users to see an empty table at any point of time that's why mview is being used.

Using "refresh materialized view concurrently", this mview is being used by APIs and end users.

Couple of questions I have -

  1. Whenever mview refresh concurrently happens, does pg create another set of table and indexes and switch it with the orig? If no, then does it update the existing data?
  2. If the usage of mview is pretty heavy does it impact the performance of the refresh process? Vice-versa, if the refresh is going on does the performance of mview by users take a hit?
  3. The mview gets refreshed in a couple of mins sometimes and sometimes it takes hours. When it runs for longer, there are no locks and no resource shortage, the number of recs in the base table is 6m (7.5gb) which is not huge so why does it take so long to refresh the mview?
  4. Does mview need vacuum/analyze/reindex?

Re: postgres materialized view refresh performance

From
Philip Semanchuk
Date:

> On Oct 22, 2020, at 3:53 PM, Ayub M <hiayub@gmail.com> wrote:
>
> There is a table t which is used in a mview mv, this is the only table in the mview definition.
>
> create table t (c1 int, ..., c10 int
> );
>
> -- there is a pk on say c1 column
> create materialized view mv as select c1, c2...c10 from
>  t;
>
> ---there is a unique index on say c5 and bunch of other indexes on the mview.
> The reason there is a mview created instead of using table t, is that that the table gets truncated and reloaded
everycouple of hours and we don't want users to see an empty table at any point of time that's why mview is being used. 
>
> Using "refresh materialized view concurrently", this mview is being used by APIs and end users.
>
> Couple of questions I have -

Hi Ayub,
I’m not an expert on the subject; I hope you’ll get an answer from someone who is. :-) Until then, my answers might
help.

>     • Whenever mview refresh concurrently happens, does pg create another set of table and indexes and switch it with
theorig? If no, then does it update the existing data? 


My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT, UPDATE,
andDELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create another table
andthen swap it. 


>     • The mview gets refreshed in a couple of mins sometimes and sometimes it takes hours. When it runs for longer,
thereare no locks and no resource shortage, the number of recs in the base table is 6m (7.5gb) which is not huge so why
doesit take so long to refresh the mview? 

Does the run time correlate with the number of changes being made?


>     • Does mview need vacuum/analyze/reindex?


My understanding is that when CONCURRENTLY is specified, yes it does need vacuuming, because of the aforementioned
implementationof REFRESH as a series of INSERT, UPDATE, and DELETE statements.  

In our situation, we have large views that are refreshed once per week. We want to ensure that the view is in the best
possibleshape for users, so we create the view with autovacuum_enabled = false and then run an explicit vacuum/analyze
stepimmediately after the refresh rather than leaving it to chance. 

Cheers
Philip





Re: postgres materialized view refresh performance

From
Ravi Krishna
Date:
> My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT,
UPDATE,
> and DELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create another
tableand 
> then swap it.

The INSERTS/UPDATE/DELETE happens only for the difference.  PG first creates a new temp table and then compares it with
the MV and detects the difference.  That is why for CONCURRENTLY, a unique index is required on the MV.


Re: postgres materialized view refresh performance

From
Philip Semanchuk
Date:

> On Oct 23, 2020, at 9:52 AM, Ravi Krishna <sravikrishna@mail.com> wrote:
>
>> My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT,
UPDATE,
>> and DELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create another
tableand 
>> then swap it.
>
> The INSERTS/UPDATE/DELETE happens only for the difference.  PG first creates a new temp table and then compares it
with
> the MV and detects the difference.  That is why for CONCURRENTLY, a unique index is required on the MV.

Yes, thank you, that’s what I understand too but I expressed it very poorly.




Re: postgres materialized view refresh performance

From
Ayub M
Date:
Thank you both.

As for the mview refresh taking long --
  • The mview gets refreshed in a couple of mins sometimes and sometimes it takes hours. When it runs for longer, there are no locks and no resource shortage, the number of recs in the base table is 6m (7.5gb) which is not huge so why does it take so long to refresh the mview?

Does the run time correlate with the number of changes being made?  

-- Almost the same number of records are present in the base table (6 million records). The base table gets truncated and reloaded everytime with almost the same number of records. 

And the mview is a simple select from this one base table. 

The mview has around 10 indexes, 1 unique and 9 non-unique indexes. 

Population of the base tables takes about 2 mins, using "insert into select from table", but when the mview is created for the first time it takes 16 minutes. Even when I remove all but one unique index it takes about 7 minutes. Any clue as to why it is taking longer than the create of the base table (which is 2 mins).

On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk <philip@americanefficient.com> wrote:


> On Oct 23, 2020, at 9:52 AM, Ravi Krishna <sravikrishna@mail.com> wrote:
>
>> My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT, UPDATE,
>> and DELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create another table and
>> then swap it.
>
> The INSERTS/UPDATE/DELETE happens only for the difference.  PG first creates a new temp table and then compares it with
> the MV and detects the difference.  That is why for CONCURRENTLY, a unique index is required on the MV.

Yes, thank you, that’s what I understand too but I expressed it very poorly.



--
Regards,
Ayub

Re: postgres materialized view refresh performance

From
Philip Semanchuk
Date:

> On Oct 25, 2020, at 10:52 PM, Ayub M <hiayub@gmail.com> wrote:
>
> Thank you both.
>
> As for the mview refresh taking long --
>   • The mview gets refreshed in a couple of mins sometimes and sometimes it takes hours. When it runs for longer,
thereare no locks and no resource shortage, the number of recs in the base table is 6m (7.5gb) which is not huge so why
doesit take so long to refresh the mview? 
>
> Does the run time correlate with the number of changes being made?
>
> -- Almost the same number of records are present in the base table (6 million records). The base table gets truncated
andreloaded everytime with almost the same number of records.  
>
> And the mview is a simple select from this one base table.
>
> The mview has around 10 indexes, 1 unique and 9 non-unique indexes.
>
> Population of the base tables takes about 2 mins, using "insert into select from table", but when the mview is
createdfor the first time it takes 16 minutes. Even when I remove all but one unique index it takes about 7 minutes.
Anyclue as to why it is taking longer than the create of the base table (which is 2 mins). 

Do you know if it’s executing a different plan when it takes a long time? auto_explain can help with that.



>
> On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk <philip@americanefficient.com> wrote:
>
>
> > On Oct 23, 2020, at 9:52 AM, Ravi Krishna <sravikrishna@mail.com> wrote:
> >
> >> My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT,
UPDATE,
> >> and DELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create another
tableand 
> >> then swap it.
> >
> > The INSERTS/UPDATE/DELETE happens only for the difference.  PG first creates a new temp table and then compares it
with
> > the MV and detects the difference.  That is why for CONCURRENTLY, a unique index is required on the MV.
>
> Yes, thank you, that’s what I understand too but I expressed it very poorly.
>
>
>
> --
> Regards,
> Ayub




Re: postgres materialized view refresh performance

From
Ayub M
Date:
It's a simple sequential scan plan of one line, just reading the base table sequentially. 

On Mon, Oct 26, 2020, 9:21 AM Philip Semanchuk <philip@americanefficient.com> wrote:


> On Oct 25, 2020, at 10:52 PM, Ayub M <hiayub@gmail.com> wrote:
>
> Thank you both.
>
> As for the mview refresh taking long --
>   • The mview gets refreshed in a couple of mins sometimes and sometimes it takes hours. When it runs for longer, there are no locks and no resource shortage, the number of recs in the base table is 6m (7.5gb) which is not huge so why does it take so long to refresh the mview?
>
> Does the run time correlate with the number of changes being made? 
>
> -- Almost the same number of records are present in the base table (6 million records). The base table gets truncated and reloaded everytime with almost the same number of records.
>
> And the mview is a simple select from this one base table.
>
> The mview has around 10 indexes, 1 unique and 9 non-unique indexes.
>
> Population of the base tables takes about 2 mins, using "insert into select from table", but when the mview is created for the first time it takes 16 minutes. Even when I remove all but one unique index it takes about 7 minutes. Any clue as to why it is taking longer than the create of the base table (which is 2 mins).

Do you know if it’s executing a different plan when it takes a long time? auto_explain can help with that.



>
> On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk <philip@americanefficient.com> wrote:
>
>
> > On Oct 23, 2020, at 9:52 AM, Ravi Krishna <sravikrishna@mail.com> wrote:
> >
> >> My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT, UPDATE,
> >> and DELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create another table and
> >> then swap it.
> >
> > The INSERTS/UPDATE/DELETE happens only for the difference.  PG first creates a new temp table and then compares it with
> > the MV and detects the difference.  That is why for CONCURRENTLY, a unique index is required on the MV.
>
> Yes, thank you, that’s what I understand too but I expressed it very poorly.
>
>
>
> --
> Regards,
> Ayub

Re: postgres materialized view refresh performance

From
Philip Semanchuk
Date:

> On Oct 26, 2020, at 10:45 AM, Ayub M <hiayub@gmail.com> wrote:
>
> It's a simple sequential scan plan of one line, just reading the base table sequentially.

Well, unless I have misunderstood you, the materialized view is basically just "select * from some_other_table”, the
numberof records in the source table is ~6m and doesn’t change much, there are no locking delays and no resource
shortages,but sometimes the refresh takes minutes, and sometimes hours. There’s something missing from the story here. 

Some things to try or check on —
 - activity (CPU, disk, memory) during the period when the mat view is refreshing
 - each time after you refresh the mat view, vacuum it
 - even better, if you can afford a brief lock on reads, run a vacuum full instead of just regular vacuum
 - if possible, at the same time as you create the problematic mat view, run a similar process that writes to a
differentmat view (tmp_throwaway_mat_view) without the CONCURRENTLY keyword and see if it behaves similarly.  



>
> On Mon, Oct 26, 2020, 9:21 AM Philip Semanchuk <philip@americanefficient.com> wrote:
>
>
> > On Oct 25, 2020, at 10:52 PM, Ayub M <hiayub@gmail.com> wrote:
> >
> > Thank you both.
> >
> > As for the mview refresh taking long --
> >   • The mview gets refreshed in a couple of mins sometimes and sometimes it takes hours. When it runs for longer,
thereare no locks and no resource shortage, the number of recs in the base table is 6m (7.5gb) which is not huge so why
doesit take so long to refresh the mview? 
> >
> > Does the run time correlate with the number of changes being made?
> >
> > -- Almost the same number of records are present in the base table (6 million records). The base table gets
truncatedand reloaded everytime with almost the same number of records.  
> >
> > And the mview is a simple select from this one base table.
> >
> > The mview has around 10 indexes, 1 unique and 9 non-unique indexes.
> >
> > Population of the base tables takes about 2 mins, using "insert into select from table", but when the mview is
createdfor the first time it takes 16 minutes. Even when I remove all but one unique index it takes about 7 minutes.
Anyclue as to why it is taking longer than the create of the base table (which is 2 mins). 
>
> Do you know if it’s executing a different plan when it takes a long time? auto_explain can help with that.
>
>
>
> >
> > On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk <philip@americanefficient.com> wrote:
> >
> >
> > > On Oct 23, 2020, at 9:52 AM, Ravi Krishna <sravikrishna@mail.com> wrote:
> > >
> > >> My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT,
UPDATE,
> > >> and DELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create
anothertable and 
> > >> then swap it.
> > >
> > > The INSERTS/UPDATE/DELETE happens only for the difference.  PG first creates a new temp table and then compares
itwith 
> > > the MV and detects the difference.  That is why for CONCURRENTLY, a unique index is required on the MV.
> >
> > Yes, thank you, that’s what I understand too but I expressed it very poorly.
> >
> >
> >
> > --
> > Regards,
> > Ayub
>