Re: Implementing Incremental View Maintenance - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Implementing Incremental View Maintenance
Date
Msg-id 89729da8-9042-7ea0-95af-e415df6da14d@postgrespro.ru
In response to Re: Implementing Incremental View Maintenance  (Tatsuo Ishii <ishii@sraoss.co.jp>)
Responses Re: Implementing Incremental View Maintenance  (Yugo NAGATA <nagata@sraoss.co.jp>)
List pgsql-hackers

On 12.11.2020 14:53, Tatsuo Ishii wrote:
>> 1. Create a pgbench database with scale 100.
>> pgbench speed on my desktop is about 10k TPS:
>>
>> pgbench -M prepared -N -c 10 -j 4 -T 30 -P 1 postgres
>> tps = 10194.951827 (including connections establishing)
>>
>> 2. Then I created an incremental materialized view:
>>
>> create incremental materialized view teller_sums as select
>> t.tid,sum(abalance) from pgbench_accounts a join pgbench_tellers t on
>> a.bid=t.bid group by t.tid;
>> SELECT 1000
>> Time: 20805.230 ms (00:20.805)
>>
>> 20 seconds is a reasonable time, comparable with the time of database
>> initialization.
>>
>> Then obviously we see advantages of precalculated aggregates:
>>
>> postgres=# select * from teller_sums where tid=1;
>>   tid |  sum
>> -----+--------
>>     1 | -96427
>> (1 row)
>>
>> Time: 0.871 ms
>> postgres=# select t.tid,sum(abalance) from pgbench_accounts a join
>> pgbench_tellers t on a.bid=t.bid group by t.tid having t.tid=1
>> ;
>>   tid |  sum
>> -----+--------
>>     1 | -96427
>> (1 row)
>>
>> Time: 915.508 ms
>>
>> Amazing. Almost 1000 times difference!
>>
>> 3. Run pgbench once again:
>>
>> Ooops! Now TPS are much lower:
>>
>> tps = 141.767347 (including connections establishing)
>>
>> The speed of updates is reduced more than 70 times!
>> It looks like we lose parallelism, because I get almost the same result
>> with just one connection.
> How much TPS do you get if you execute pgbench -c 1 without the
> incremental materialized view defined? If it's around 141 then we
> could surely confirm that the major bottleneck is locking contention.
>

My desktop has just 4 physical cores, so performance with one connection 
is about 2k TPS:

pgbench -M prepared -N -c 1 -T 60 -P 1 postgres
tps = 1949.233532 (including connections establishing)

So there is still a large gap (~14 times) between insert speed with and
without the incremental view.
I did more investigation and found out that one of the reasons for the bad
performance in this case is the lack of an index on the materialized view,
so each update has to perform a sequential scan through 1000 elements.

Well, creating proper indexes for a table is certainly the responsibility
of the DBA.
But users may not think of a materialized view as a normal table, so the
idea that an index has to be explicitly created on the materialized view
is not so obvious.
On the other hand, the materialized view implementation knows which index
is needed to perform an efficient incremental update.
I wonder if it could create such an index itself implicitly, or at least
produce a notice proposing to create it.
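
For reference, the index I added manually is nothing more than a plain
btree on the grouping column; the index name below is just an example:

create index teller_sums_tid_idx on teller_sums (tid);
-- lets the incremental update locate the affected row by tid
-- instead of scanning all 1000 rows of the view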

In any case, after creating an index on the tid column of the materialized
view, pgbench speed increased from 141 to 331 TPS (more than two times).
That is with a single connection. But if I run pgbench with 10 connections,
performance is even slightly lower: 289 TPS.

I looked through your patch for the exclusive table locks and found this
fragment in matview.c:

    /*
     * Wait for concurrent transactions which update this materialized view
     * at READ COMMITTED. This is needed to see changes committed in other
     * transactions. No wait and raise an error at REPEATABLE READ or
     * SERIALIZABLE to prevent update anomalies of matviews.
     * XXX: dead-lock is possible here.
     */
    if (!IsolationUsesXactSnapshot())
        LockRelationOid(matviewOid, ExclusiveLock);
    else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock))

I replaced it with RowExclusiveLock and ... got 1437 TPS with 10 connections.
That is still about 7 times slower than the performance without the
incremental view, but the gap is not so dramatic any more. And it seems
clear that this exclusive lock on the matview is the real showstopper for
concurrent updates.
I do not know which race conditions and anomalies we might get if we replace
the table-level lock with a row-level lock here.
But I think this problem should be addressed in any case: single-client
update is a very rare scenario.
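
For what it's worth, the contention itself is easy to observe from another
session while pgbench is running; this is just the standard pg_locks view,
not part of the patch, and the matview name is from the example above:

select l.pid, l.mode, l.granted
from pg_locks l
join pg_class c on c.oid = l.relation
where c.relname = 'teller_sums';
-- with the ExclusiveLock from the fragment above, only one writer at a time
-- shows granted = t; the rest wait, which matches the loss of parallelism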

I have attached to this mail a profile of the pgbench workload with the
incremental view defined (with the index).
Maybe you will find it useful.


One more disappointing observation about materialized views (this time
non-incremental).
Creating a non-incremental materialized view takes about 18 seconds:

postgres=# create materialized view teller_avgs as select 
t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on 
a.bid=t.bid group by t.tid;
SELECT 1000
Time: 17795.395 ms (00:17.795)

But refreshing such a view takes 55 seconds:

postgres=# refresh materialized view teller_avgs;
REFRESH MATERIALIZED VIEW
Time: 55500.381 ms (00:55.500)

And the refresh time doesn't depend on the amount of updates since the last
refresh: I got almost the same time when I ran pgbench for one minute before
the refresh and when two refreshes were performed one after another.

Adding an index doesn't help much in this case, and a concurrent refresh is
even slower:

postgres=# refresh materialized view concurrently teller_avgs;
REFRESH MATERIALIZED VIEW
Time: 56981.772 ms (00:56.982)

So it seems to be more efficient to drop and recreate the materialized view
rather than to refresh it, at least in this case.
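
For completeness, the drop-and-recreate alternative is simply the following
(wrapping it in a transaction is my own addition, so that readers never see
the view as missing; the view definition is the one from above):

begin;
drop materialized view teller_avgs;
create materialized view teller_avgs as
    select t.tid, avg(abalance)
    from pgbench_accounts a
    join pgbench_tellers t on a.bid = t.bid
    group by t.tid;
commit;
-- like a plain refresh, this still blocks concurrent readers of teller_avgs
-- while it runs, but per the timings above it takes ~18s instead of ~55s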

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

