Few remarks on JIT , parallel query execution and columnar store... - Mailing list pgsql-hackers

From	Konstantin Knizhnik
Subject	Few remarks on JIT , parallel query execution and columnar store...
Date	October 11, 2018 10:04:48
Msg-id	f57e183c-3cfe-fdd5-7be2-8e1456711067@postgrespro.ru
List	pgsql-hackers

Recently I had to estimate the performance of a select with multiple search conditions, each with poor selectivity.
This is clearly an OLAP-style query, and I was interested in understanding the role of the different PostgreSQL optimization options.

So the table is the following:

create table t(pk bigint primary key, val1 double precision, val2 double precision, val3 double precision);
insert into t select s.pk, random() as val1, random() as val2, random() as val3 from generate_series(1,10000000) as s(pk);

I ran the following query on a standard desktop with a quad-core CPU and 16 GB of RAM, with shared_buffers adjusted to fit the whole database:

select count(*) from t where val1>=0.5 and val2<=0.5 and val3 between 0.2 and 0.6;

Results are the following:

JIT	Parallel workers	Time (msec)
off	0	773
off	8	216
on	0	650
on	8	254

So without parallelism JIT provides some speed improvement (773 vs. 650 msec), but with parallel execution the effect of JIT is negative (216 vs. 254 msec).
Most likely this is because JIT code generation time (30 msec) is comparable with the execution time.

Conclusion: for a sequential scan of 10 million records, JIT is not able to improve on the best result (parallel execution without JIT).
Let's increase the number of records 10 times.
Now the results are the following:

JIT	Parallel workers	Time (msec)
off	0	7848
off	8	2063
on	0	6301
on	8	1648

So now JIT is faster for both sequential and parallel execution.
But it is still not the fastest way to process this query with Postgres.
Let's try my extension VOPS (https://github.com/postgrespro/vops):
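The effect of JIT in the two tables above can be checked with a few lines of arithmetic. This is only a sketch: the timings are copied from the tables, while the labels "10M rows" and "100M rows" and the helper name jit_speedup are introduced here for illustration.

```python
# Timings in msec copied from the two result tables: (jit, workers) -> time.
timings = {
    "10M rows": {("off", 0): 773, ("off", 8): 216, ("on", 0): 650, ("on", 8): 254},
    "100M rows": {("off", 0): 7848, ("off", 8): 2063, ("on", 0): 6301, ("on", 8): 1648},
}


def jit_speedup(times, workers):
    """Time without JIT divided by time with JIT; a value above 1 means JIT helps."""
    return times[("off", workers)] / times[("on", workers)]


# Speedup ratio for every (data set, worker count) combination.
jit_ratios = {
    (label, workers): jit_speedup(times, workers)
    for label, times in timings.items()
    for workers in (0, 8)
}

for (label, workers), ratio in jit_ratios.items():
    verdict = "helps" if ratio > 1 else "hurts"
    print(f"{label}, {workers} workers: JIT speedup x{ratio:.2f} ({verdict})")
```

The only ratio below 1 is the parallel run on 10 million records, which matches the observation that JIT only hurts when total execution time is short.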

Parallel workers	Time (msec)
0	1447
2	623
4	494
8	491

So VOPS is more than 3 times faster than JIT, and it looks like it could provide even better results for a larger number of records:
as you can see, increasing the number of workers from 2 to 4 improves performance by only about 25-30% rather than two times.
It looks like the overhead of starting a parallel worker is too large, and for query execution times under 1 second it has a noticeable impact on total performance.
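The scaling figures can be reproduced directly from the reported timings. A minimal sketch, assuming the VOPS runs used the same 100-million-record table as the second JIT table; the names vops, best_jit and speedup are introduced here for illustration only.

```python
# VOPS timings in msec from the table above: parallel workers -> time.
vops = {0: 1447, 2: 623, 4: 494, 8: 491}

# Best JIT result on the larger table (JIT on, 8 workers), in msec.
best_jit = 1648


def speedup(slow, fast):
    """How many times faster the 'fast' timing is than the 'slow' one."""
    return slow / fast


vs_jit = speedup(best_jit, vops[8])      # VOPS with 8 workers vs. best JIT run
step_2_to_4 = speedup(vops[2], vops[4])  # gain from doubling 2 -> 4 workers
step_4_to_8 = speedup(vops[4], vops[8])  # gain from doubling 4 -> 8 workers

print(f"VOPS vs JIT:     x{vs_jit:.2f}")
print(f"2 -> 4 workers:  x{step_2_to_4:.2f}")
print(f"4 -> 8 workers:  x{step_4_to_8:.2f}")
```

Doubling the workers from 2 to 4 gives far less than a 2x gain, and going from 4 to 8 gives almost nothing, which is consistent with a fixed per-worker startup cost dominating at sub-second execution times.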

In another prototype DBMS of mine, with vertical data representation and multithreaded execution, this query takes 195 msec.
So there is still scope for improvement :)

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
