Thread: MVIEW refresh consistently faster then insert ... select

MVIEW refresh consistently faster then insert ... select

From
Thomas Kellerer
Date:
Hello,

I have a table that is an aggregation of another table.
This aggregation reduces an input of ~14 million rows to ~4 million rows.

So far I have used a truncate/insert approach for this:

    truncate table stock;
    insert into stock (product_id, warehouse_id, reserved_provisional, reserved, available, non_deliverable)
    select product_id, warehouse_id, sum(reserved_provisional), sum(reserved), sum(available), sum(non_deliverable)
    from staging.stock_data
    group by product_id, warehouse_id;

The table has a primary key on (product_id, warehouse_id), all columns are integer columns.
The refresh takes approx 2 minutes (fastest was 1:40) on our development server (CentOS, Postgres 9.5.0)

However, when I create a materialized view:

    create materialized view mv_stock
    as
    select product_id,
           warehouse_id,
           sum(reserved_provisional) as reserved_provisional,
           sum(reserved) as reserved,
           sum(available) as available,
           sum(non_deliverable) as non_deliverable
    from staging.stock_data
    group by product_id, warehouse_id;

    create unique index pk_mv_stock on mv_stock (product_id, warehouse_id);

Subsequent refreshs using "REFRESH MATERIALIZED VIEW mv_stock" are consistently much faster: between 40 seconds and 1
minute

I have run both refreshs about 10 times now, so caching effects should not be there.

My question is: what is refresh mview doing differently then a plain insert ... select that it makes that so much
faster? 

The select itself takes about 25 seconds. It is using an external merge on disk, which can be removed by increasing
work_mem(the select then goes down to 12 seconds, but that doesn't change much in the runtime of the refreshs).  

The 2 minutes are perfectly acceptable, I'm just curious why refreshing the mview would be so much faster as the work
theyare doing should be exactly the same.  

Thomas






Re: MVIEW refresh consistently faster then insert ... select

From
Kevin Grittner
Date:
On Wed, May 4, 2016 at 1:46 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:

> I have a table that is an aggregation of another table.
> This aggregation reduces an input of ~14 million rows to ~4
> million rows.

> The refresh takes approx 2 minutes (fastest was 1:40) on our
> development server (CentOS, Postgres 9.5.0)

> However, when I create a materialized view:

> Subsequent refreshs using "REFRESH MATERIALIZED VIEW mv_stock"
> are consistently much faster: between 40 seconds and 1 minute
>
> I have run both refreshs about 10 times now, so caching effects
> should not be there.
>
> My question is: what is refresh mview doing differently then a
> plain insert ... select that it makes that so much faster?

Just to confirm, is this with or without the CONCURRENTLY keyword
on the REFRESH command?

If *without*, I would guess the difference is probably in creating
the index "from scratch" with sort and load versus retail insertion
of index entries.  You could approximate this by dropping the index
before the TRUNCATE and INSERT and creating it again after it is
loaded.

If *with*, I would guess that it is because most of the work is
done in temporary files and workspace, with just the delta applied
to the table and index in permanent storage.

It's hard to guess which way will be faster for the use case you
describe -- it will probably depend on what percentage of rows
remain unchanged on each REFRESH.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: MVIEW refresh consistently faster then insert ... select

From
Thomas Kellerer
Date:
Kevin Grittner schrieb am 04.05.2016 um 09:06:
> On Wed, May 4, 2016 at 1:46 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
>
>> I have a table that is an aggregation of another table.
>> This aggregation reduces an input of ~14 million rows to ~4
>> million rows.
>
>> The refresh takes approx 2 minutes (fastest was 1:40) on our
>> development server (CentOS, Postgres 9.5.0)
>
>> However, when I create a materialized view:
>
>> Subsequent refreshs using "REFRESH MATERIALIZED VIEW mv_stock"
>> are consistently much faster: between 40 seconds and 1 minute
>>
>> I have run both refreshs about 10 times now, so caching effects
>> should not be there.
>>
>> My question is: what is refresh mview doing differently then a
>> plain insert ... select that it makes that so much faster?
>
> If *without*, I would guess the difference is probably in creating
> the index "from scratch" with sort and load versus retail insertion
> of index entries.  You could approximate this by dropping the index
> before the TRUNCATE and INSERT and creating it again after it is
> loaded.

It's without.

But your suggestion got me looking in the right direction ;)

I completely forgot that the table has two foreign keys that I did not create on the mview.

When I remove those, then both solutions are equally fast.

Sorry for the noise.

Thomas