Thread: MVIEW refresh consistently faster then insert ... select
Hello, I have a table that is an aggregation of another table. This aggregation reduces an input of ~14 million rows to ~4 million rows. So far I have used a truncate/insert approach for this: truncate table stock; insert into stock (product_id, warehouse_id, reserved_provisional, reserved, available, non_deliverable) select product_id, warehouse_id, sum(reserved_provisional), sum(reserved), sum(available), sum(non_deliverable) from staging.stock_data group by product_id, warehouse_id; The table has a primary key on (product_id, warehouse_id), all columns are integer columns. The refresh takes approx 2 minutes (fastest was 1:40) on our development server (CentOS, Postgres 9.5.0) However, when I create a materialized view: create materialized view mv_stock as select product_id, warehouse_id, sum(reserved_provisional) as reserved_provisional, sum(reserved) as reserved, sum(available) as available, sum(non_deliverable) as non_deliverable from staging.stock_data group by product_id, warehouse_id; create unique index pk_mv_stock on mv_stock (product_id, warehouse_id); Subsequent refreshs using "REFRESH MATERIALIZED VIEW mv_stock" are consistently much faster: between 40 seconds and 1 minute I have run both refreshs about 10 times now, so caching effects should not be there. My question is: what is refresh mview doing differently then a plain insert ... select that it makes that so much faster? The select itself takes about 25 seconds. It is using an external merge on disk, which can be removed by increasing work_mem(the select then goes down to 12 seconds, but that doesn't change much in the runtime of the refreshs). The 2 minutes are perfectly acceptable, I'm just curious why refreshing the mview would be so much faster as the work theyare doing should be exactly the same. Thomas
On Wed, May 4, 2016 at 1:46 AM, Thomas Kellerer <spam_eater@gmx.net> wrote: > I have a table that is an aggregation of another table. > This aggregation reduces an input of ~14 million rows to ~4 > million rows. > The refresh takes approx 2 minutes (fastest was 1:40) on our > development server (CentOS, Postgres 9.5.0) > However, when I create a materialized view: > Subsequent refreshs using "REFRESH MATERIALIZED VIEW mv_stock" > are consistently much faster: between 40 seconds and 1 minute > > I have run both refreshs about 10 times now, so caching effects > should not be there. > > My question is: what is refresh mview doing differently then a > plain insert ... select that it makes that so much faster? Just to confirm, is this with or without the CONCURRENTLY keyword on the REFRESH command? If *without*, I would guess the difference is probably in creating the index "from scratch" with sort and load versus retail insertion of index entries. You could approximate this by dropping the index before the TRUNCATE and INSERT and creating it again after it is loaded. If *with*, I would guess that it is because most of the work is done in temporary files and workspace, with just the delta applied to the table and index in permanent storage. It's hard to guess which way will be faster for the use case you describe -- it will probably depend on what percentage of rows remain unchanged on each REFRESH. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner schrieb am 04.05.2016 um 09:06: > On Wed, May 4, 2016 at 1:46 AM, Thomas Kellerer <spam_eater@gmx.net> wrote: > >> I have a table that is an aggregation of another table. >> This aggregation reduces an input of ~14 million rows to ~4 >> million rows. > >> The refresh takes approx 2 minutes (fastest was 1:40) on our >> development server (CentOS, Postgres 9.5.0) > >> However, when I create a materialized view: > >> Subsequent refreshs using "REFRESH MATERIALIZED VIEW mv_stock" >> are consistently much faster: between 40 seconds and 1 minute >> >> I have run both refreshs about 10 times now, so caching effects >> should not be there. >> >> My question is: what is refresh mview doing differently then a >> plain insert ... select that it makes that so much faster? > > If *without*, I would guess the difference is probably in creating > the index "from scratch" with sort and load versus retail insertion > of index entries. You could approximate this by dropping the index > before the TRUNCATE and INSERT and creating it again after it is > loaded. It's without. But your suggestion got me looking in the right direction ;) I completely forgot that the table has two foreign keys that I did not create on the mview. When I remove those, then both solutions are equally fast. Sorry for the noise. Thomas