Re: Implementing Incremental View Maintenance - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Implementing Incremental View Maintenance
Date
Msg-id 024ba056-6b4f-b26c-af17-297c1bbca65a@postgrespro.ru
Whole thread Raw
In response to Re: Implementing Incremental View Maintenance  (Yugo NAGATA <nagata@sraoss.co.jp>)
Responses Re: Implementing Incremental View Maintenance  (legrand legrand <legrand_legrand@hotmail.com>)
Re: Implementing Incremental View Maintenance  (Tatsuo Ishii <ishii@sraoss.co.jp>)
Re: Implementing Incremental View Maintenance  (Yugo NAGATA <nagata@sraoss.co.jp>)
List pgsql-hackers

On 05.10.2020 12:16, Yugo NAGATA wrote:
> Hi,
>
> Attached is the rebased patch (v18) to add support for Incremental
> Materialized View Maintenance (IVM). It is able to be applied to
> current latest master branch.
>

Thank you very much for this work.
I consider incremental materialized views as "reincarnation" of OLAP 
hypercubes.
There are two approaches of making OLAP queries faster:
1. speed up query execution (using JIT, columnar store, vector 
operations and parallel execution)
2. precalculate requested data

Incremental materialize views make it possible to implement second 
approach. But how competitive it is?
I do not know current limitations of incremental materialized views, but 
I checked that basic OLAP functionality:
JOIN+GROUP_BY+AGGREGATION is supported.

The patch is not applied to the current master because makeFuncCall 
prototype is changed,
I fixed it by adding COAERCE_CALL_EXPLICIT.
Then I did the following simple test:

1. Create pgbench database with scale 100.
pgbench speed at my desktop is about 10k TPS:

pgbench -M prepared -N -c 10 -j 4 -T 30 -P 1 postgres
tps = 10194.951827 (including connections establishing)

2. Then I created incremental materialized view:

create incremental materialized view teller_sums as select 
t.tid,sum(abalance) from pgbench_accounts a join pgbench_tellers t on 
a.bid=t.bid group by t.tid;
SELECT 1000
Time: 20805.230 ms (00:20.805)

20 second is reasonable time, comparable with time of database 
initialization.

Then obviously we see advantages of precalculated aggregates:

postgres=# select * from teller_sums where tid=1;
  tid |  sum
-----+--------
    1 | -96427
(1 row)

Time: 0.871 ms
postgres=# select t.tid,sum(abalance) from pgbench_accounts a join 
pgbench_tellers t on a.bid=t.bid group by t.tid having t.tid=1
;
  tid |  sum
-----+--------
    1 | -96427
(1 row)

Time: 915.508 ms

Amazing. Almost 1000 times difference!

3. Run pgbench once again:

Ooops! Now TPS are much lower:

tps = 141.767347 (including connections establishing)

Speed of updates is reduced more than 70 times!
Looks like we loose parallelism because almost the same result I get 
with just one connection.

4. Finally let's create one more view (it is reasonable to expect that 
analytics will run many different queries and so need multiple views).

create incremental materialized view teller_avgs as select 
t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on 
a.bid=t.bid group by t.tid;

It is great that not only simple aggregates like SUM are supported, but 
also AVG.
But insertion speed now is reduced twice - 72TPS.

I tried to make some profiling but didn't see something unusual:

   16.41%  postgres  postgres            [.] ExecInterpExpr
    8.78%  postgres  postgres            [.] slot_deform_heap_tuple
    3.23%  postgres  postgres            [.] ExecMaterial
    2.71%  postgres  postgres            [.] AllocSetCheck
    2.33%  postgres  postgres            [.] AllocSetAlloc
    2.29%  postgres  postgres            [.] slot_getsomeattrs_int
    2.26%  postgres  postgres            [.] ExecNestLoop
    2.11%  postgres  postgres            [.] MemoryContextReset
    1.98%  postgres  postgres            [.] tts_minimal_store_tuple
    1.87%  postgres  postgres            [.] heap_compute_data_size
    1.78%  postgres  postgres            [.] fill_val
    1.56%  postgres  postgres            [.] tuplestore_gettuple
    1.44%  postgres  postgres            [.] sentinel_ok
    1.35%  postgres  postgres            [.] heap_fill_tuple
    1.27%  postgres  postgres            [.] tuplestore_gettupleslot
    1.17%  postgres  postgres            [.] ExecQual
    1.14%  postgres  postgres            [.] tts_minimal_clear
    1.13%  postgres  postgres            [.] CheckOpSlotCompatibility
    1.10%  postgres  postgres            [.] base_yyparse
    1.10%  postgres  postgres            [.] heapgetpage
    1.04%  postgres  postgres            [.] heap_form_minimal_tuple
    1.00%  postgres  postgres            [.] slot_getsomeattrs

So good news is that incremental materialized views really work.
And bad news is that maintenance overhead is too large which 
significantly restrict applicability of this approach.
Certainly in case of dominated read-only workload such materialized 
views can significantly improve performance.
But unfortunately my dream that them allow to combine OLAP+OLPT is not 
currently realized.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




pgsql-hackers by date:

Previous
From: Tim.Colles@ed.ac.uk
Date:
Subject: RE: POC: postgres_fdw insert batching
Next
From: Magnus Hagander
Date:
Subject: Re: Prevent printing "next step instructions" in initdb and pg_upgrade