Thread: postgres materialized view refresh performance
There is a table t which is used in a mview mv, this is the only table in the mview definition.
create table t (c1 int, ..., c10 int);
-- there is a pk on say c1 column
create materialized view mv as select c1, c2...c10 from t;
---there is a unique index on say c5 and bunch of other indexes on the mview.
The reason there is a mview created instead of using table t, is that that the table gets truncated and reloaded every couple of hours and we don't want users to see an empty table at any point of time that's why mview is being used.
Using "refresh materialized view concurrently", this mview is being used by APIs and end users.
Couple of questions I have -
- Whenever mview refresh concurrently happens, does pg create another set of table and indexes and switch it with the orig? If no, then does it update the existing data?
- If the usage of mview is pretty heavy does it impact the performance of the refresh process? Vice-versa, if the refresh is going on does the performance of mview by users take a hit?
- The mview gets refreshed in a couple of mins sometimes and sometimes it takes hours. When it runs for longer, there are no locks and no resource shortage, the number of recs in the base table is 6m (7.5gb) which is not huge so why does it take so long to refresh the mview?
- Does mview need vacuum/analyze/reindex?
> On Oct 22, 2020, at 3:53 PM, Ayub M <hiayub@gmail.com> wrote: > > There is a table t which is used in a mview mv, this is the only table in the mview definition. > > create table t (c1 int, ..., c10 int > ); > > -- there is a pk on say c1 column > create materialized view mv as select c1, c2...c10 from > t; > > ---there is a unique index on say c5 and bunch of other indexes on the mview. > The reason there is a mview created instead of using table t, is that that the table gets truncated and reloaded everycouple of hours and we don't want users to see an empty table at any point of time that's why mview is being used. > > Using "refresh materialized view concurrently", this mview is being used by APIs and end users. > > Couple of questions I have - Hi Ayub, I’m not an expert on the subject; I hope you’ll get an answer from someone who is. :-) Until then, my answers might help. > • Whenever mview refresh concurrently happens, does pg create another set of table and indexes and switch it with theorig? If no, then does it update the existing data? My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT, UPDATE, andDELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create another table andthen swap it. > • The mview gets refreshed in a couple of mins sometimes and sometimes it takes hours. When it runs for longer, thereare no locks and no resource shortage, the number of recs in the base table is 6m (7.5gb) which is not huge so why doesit take so long to refresh the mview? Does the run time correlate with the number of changes being made? > • Does mview need vacuum/analyze/reindex? My understanding is that when CONCURRENTLY is specified, yes it does need vacuuming, because of the aforementioned implementationof REFRESH as a series of INSERT, UPDATE, and DELETE statements. In our situation, we have large views that are refreshed once per week. We want to ensure that the view is in the best possibleshape for users, so we create the view with autovacuum_enabled = false and then run an explicit vacuum/analyze stepimmediately after the refresh rather than leaving it to chance. Cheers Philip
> My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT, UPDATE, > and DELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create another tableand > then swap it. The INSERTS/UPDATE/DELETE happens only for the difference. PG first creates a new temp table and then compares it with the MV and detects the difference. That is why for CONCURRENTLY, a unique index is required on the MV.
> On Oct 23, 2020, at 9:52 AM, Ravi Krishna <sravikrishna@mail.com> wrote: > >> My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT, UPDATE, >> and DELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create another tableand >> then swap it. > > The INSERTS/UPDATE/DELETE happens only for the difference. PG first creates a new temp table and then compares it with > the MV and detects the difference. That is why for CONCURRENTLY, a unique index is required on the MV. Yes, thank you, that’s what I understand too but I expressed it very poorly.
Thank you both.
As for the mview refresh taking long --
• The mview gets refreshed in a couple of mins sometimes and sometimes it takes hours. When it runs for longer, there are no locks and no resource shortage, the number of recs in the base table is 6m (7.5gb) which is not huge so why does it take so long to refresh the mview?
Does the run time correlate with the number of changes being made?
Does the run time correlate with the number of changes being made?
-- Almost the same number of records are present in the base table (6 million records). The base table gets truncated and reloaded everytime with almost the same number of records.
And the mview is a simple select from this one base table.
The mview has around 10 indexes, 1 unique and 9 non-unique indexes.
Population of the base tables takes about 2 mins, using "insert into select from table", but when the mview is created for the first time it takes 16 minutes. Even when I remove all but one unique index it takes about 7 minutes. Any clue as to why it is taking longer than the create of the base table (which is 2 mins).
On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk <philip@americanefficient.com> wrote:
> On Oct 23, 2020, at 9:52 AM, Ravi Krishna <sravikrishna@mail.com> wrote:
>
>> My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT, UPDATE,
>> and DELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create another table and
>> then swap it.
>
> The INSERTS/UPDATE/DELETE happens only for the difference. PG first creates a new temp table and then compares it with
> the MV and detects the difference. That is why for CONCURRENTLY, a unique index is required on the MV.
Yes, thank you, that’s what I understand too but I expressed it very poorly.
Regards,
Ayub
> On Oct 25, 2020, at 10:52 PM, Ayub M <hiayub@gmail.com> wrote: > > Thank you both. > > As for the mview refresh taking long -- > • The mview gets refreshed in a couple of mins sometimes and sometimes it takes hours. When it runs for longer, thereare no locks and no resource shortage, the number of recs in the base table is 6m (7.5gb) which is not huge so why doesit take so long to refresh the mview? > > Does the run time correlate with the number of changes being made? > > -- Almost the same number of records are present in the base table (6 million records). The base table gets truncated andreloaded everytime with almost the same number of records. > > And the mview is a simple select from this one base table. > > The mview has around 10 indexes, 1 unique and 9 non-unique indexes. > > Population of the base tables takes about 2 mins, using "insert into select from table", but when the mview is createdfor the first time it takes 16 minutes. Even when I remove all but one unique index it takes about 7 minutes. Anyclue as to why it is taking longer than the create of the base table (which is 2 mins). Do you know if it’s executing a different plan when it takes a long time? auto_explain can help with that. > > On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk <philip@americanefficient.com> wrote: > > > > On Oct 23, 2020, at 9:52 AM, Ravi Krishna <sravikrishna@mail.com> wrote: > > > >> My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT, UPDATE, > >> and DELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create another tableand > >> then swap it. > > > > The INSERTS/UPDATE/DELETE happens only for the difference. PG first creates a new temp table and then compares it with > > the MV and detects the difference. That is why for CONCURRENTLY, a unique index is required on the MV. > > Yes, thank you, that’s what I understand too but I expressed it very poorly. > > > > -- > Regards, > Ayub
It's a simple sequential scan plan of one line, just reading the base table sequentially.
On Mon, Oct 26, 2020, 9:21 AM Philip Semanchuk <philip@americanefficient.com> wrote:
> On Oct 25, 2020, at 10:52 PM, Ayub M <hiayub@gmail.com> wrote:
>
> Thank you both.
>
> As for the mview refresh taking long --
> • The mview gets refreshed in a couple of mins sometimes and sometimes it takes hours. When it runs for longer, there are no locks and no resource shortage, the number of recs in the base table is 6m (7.5gb) which is not huge so why does it take so long to refresh the mview?
>
> Does the run time correlate with the number of changes being made?
>
> -- Almost the same number of records are present in the base table (6 million records). The base table gets truncated and reloaded everytime with almost the same number of records.
>
> And the mview is a simple select from this one base table.
>
> The mview has around 10 indexes, 1 unique and 9 non-unique indexes.
>
> Population of the base tables takes about 2 mins, using "insert into select from table", but when the mview is created for the first time it takes 16 minutes. Even when I remove all but one unique index it takes about 7 minutes. Any clue as to why it is taking longer than the create of the base table (which is 2 mins).
Do you know if it’s executing a different plan when it takes a long time? auto_explain can help with that.
>
> On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk <philip@americanefficient.com> wrote:
>
>
> > On Oct 23, 2020, at 9:52 AM, Ravi Krishna <sravikrishna@mail.com> wrote:
> >
> >> My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT, UPDATE,
> >> and DELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create another table and
> >> then swap it.
> >
> > The INSERTS/UPDATE/DELETE happens only for the difference. PG first creates a new temp table and then compares it with
> > the MV and detects the difference. That is why for CONCURRENTLY, a unique index is required on the MV.
>
> Yes, thank you, that’s what I understand too but I expressed it very poorly.
>
>
>
> --
> Regards,
> Ayub
> On Oct 26, 2020, at 10:45 AM, Ayub M <hiayub@gmail.com> wrote: > > It's a simple sequential scan plan of one line, just reading the base table sequentially. Well, unless I have misunderstood you, the materialized view is basically just "select * from some_other_table”, the numberof records in the source table is ~6m and doesn’t change much, there are no locking delays and no resource shortages,but sometimes the refresh takes minutes, and sometimes hours. There’s something missing from the story here. Some things to try or check on — - activity (CPU, disk, memory) during the period when the mat view is refreshing - each time after you refresh the mat view, vacuum it - even better, if you can afford a brief lock on reads, run a vacuum full instead of just regular vacuum - if possible, at the same time as you create the problematic mat view, run a similar process that writes to a differentmat view (tmp_throwaway_mat_view) without the CONCURRENTLY keyword and see if it behaves similarly. > > On Mon, Oct 26, 2020, 9:21 AM Philip Semanchuk <philip@americanefficient.com> wrote: > > > > On Oct 25, 2020, at 10:52 PM, Ayub M <hiayub@gmail.com> wrote: > > > > Thank you both. > > > > As for the mview refresh taking long -- > > • The mview gets refreshed in a couple of mins sometimes and sometimes it takes hours. When it runs for longer, thereare no locks and no resource shortage, the number of recs in the base table is 6m (7.5gb) which is not huge so why doesit take so long to refresh the mview? > > > > Does the run time correlate with the number of changes being made? > > > > -- Almost the same number of records are present in the base table (6 million records). The base table gets truncatedand reloaded everytime with almost the same number of records. > > > > And the mview is a simple select from this one base table. > > > > The mview has around 10 indexes, 1 unique and 9 non-unique indexes. > > > > Population of the base tables takes about 2 mins, using "insert into select from table", but when the mview is createdfor the first time it takes 16 minutes. Even when I remove all but one unique index it takes about 7 minutes. Anyclue as to why it is taking longer than the create of the base table (which is 2 mins). > > Do you know if it’s executing a different plan when it takes a long time? auto_explain can help with that. > > > > > > > On Fri, Oct 23, 2020 at 10:53 AM Philip Semanchuk <philip@americanefficient.com> wrote: > > > > > > > On Oct 23, 2020, at 9:52 AM, Ravi Krishna <sravikrishna@mail.com> wrote: > > > > > >> My understanding is that when CONCURRENTLY is specified, Postgres implements the refresh as a series of INSERT, UPDATE, > > >> and DELETE statements on the existing view. So the answer to your question is no, Postgres doesn’t create anothertable and > > >> then swap it. > > > > > > The INSERTS/UPDATE/DELETE happens only for the difference. PG first creates a new temp table and then compares itwith > > > the MV and detects the difference. That is why for CONCURRENTLY, a unique index is required on the MV. > > > > Yes, thank you, that’s what I understand too but I expressed it very poorly. > > > > > > > > -- > > Regards, > > Ayub >