Few remarks on JIT , parallel query execution and columnar store... - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Few remarks on JIT , parallel query execution and columnar store...
Date
Msg-id f57e183c-3cfe-fdd5-7be2-8e1456711067@postgrespro.ru
List pgsql-hackers
Recently I had to estimate the performance of a select with multiple search conditions of poor selectivity.
This is clearly an OLAP-style query, and I was interested in understanding the role of the different PostgreSQL optimization options.

The table is defined as follows:

create table t(pk bigint primary key, val1 double precision, val2 double precision, val3 double precision);
insert into t select s as pk, random() as val1, random() as val2, random() as val3 from generate_series(1,10000000) s;

I ran the following query on a standard desktop with a quad-core CPU and 16 GB of RAM, with shared_buffers set large enough to hold the whole database:

select count(*) from t where val1>=0.5 and val2<=0.5 and val3 between 0.2 and 0.6;
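"Bad selectivity" can be made concrete. Assuming val1, val2 and val3 are independent and uniformly distributed on [0, 1), as random() produces, the three predicates pass 50%, 50% and 40% of the rows, so together they select about 0.5 × 0.5 × 0.4 = 10% of the table (roughly 1 million of the 10 million rows). No single condition is selective enough for an index to help, so every row has to be scanned. A minimal Python sketch checking this estimate by simulation (the sample size and seed are arbitrary):

```python
import random

def estimate_selectivity(n_rows: int, seed: int = 42) -> float:
    """Fraction of uniformly random rows that satisfy the query's WHERE clause."""
    rng = random.Random(seed)
    matched = 0
    for _ in range(n_rows):
        val1, val2, val3 = rng.random(), rng.random(), rng.random()
        # Same predicates as in the SQL query: val1>=0.5, val2<=0.5, val3 between 0.2 and 0.6
        if val1 >= 0.5 and val2 <= 0.5 and 0.2 <= val3 <= 0.6:
            matched += 1
    return matched / n_rows

# Analytic value for independent uniform columns:
# P(val1 >= 0.5) * P(val2 <= 0.5) * P(0.2 <= val3 <= 0.6)
ANALYTIC = 0.5 * 0.5 * 0.4

if __name__ == "__main__":
    est = estimate_selectivity(200_000)
    print(f"simulated: {est:.4f}, analytic: {ANALYTIC:.4f}")
    print(f"expected matches out of 10M rows: {ANALYTIC * 10_000_000:,.0f}")
```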

The results are:

JIT   Parallel workers   Time (ms)
off   0                  773
off   8                  216
on    0                  650
on    8                  254

So without parallelism JIT gives some speedup, but with parallel execution its effect is negative.
Most likely this is because the JIT compilation time (30 msec) is comparable with the execution time.
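The arithmetic behind this explanation, as a small Python sketch over the timings in the table above (in milliseconds): without workers JIT saves 123 ms and the 30 ms of compilation is only about 4% of the non-JIT run time; with 8 workers the non-JIT run takes just 216 ms, so the same compilation cost is about 14% of it, and the JIT run ends up 38 ms slower. The dictionary layout and function name are illustrative only.

```python
# Timings (ms) from the 10-million-row table: (jit, parallel workers) -> time
TIMINGS_10M = {
    ("off", 0): 773,
    ("off", 8): 216,
    ("on", 0): 650,
    ("on", 8): 254,
}
JIT_COMPILE_MS = 30  # JIT generation time reported in the text

def jit_delta(timings: dict, workers: int) -> int:
    """Change in run time (ms) caused by enabling JIT; negative means JIT helped."""
    return timings[("on", workers)] - timings[("off", workers)]

for workers in (0, 8):
    delta = jit_delta(TIMINGS_10M, workers)
    # Share of the non-JIT run time that the compilation step alone would take
    share = JIT_COMPILE_MS / TIMINGS_10M[("off", workers)]
    print(f"workers={workers}: JIT delta {delta:+d} ms, "
          f"compile time = {share:.0%} of non-JIT run time")
```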

Conclusion: for a sequential scan of 10 million records, JIT does not improve on the best result, which is obtained with parallel execution and JIT off.
Let's increase the number of records tenfold, to 100 million.
Now the results are:

JIT   Parallel workers   Time (ms)
off   0                  7848
off   8                  2063
on    0                  6301
on    8                  1648

Now JIT is faster for both sequential and parallel execution.
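One way to see why the picture changed is to fit a crude two-point linear model, T = fixed + per-row cost, to the 8-worker timings at 10M and 100M rows. This assumes run time grows linearly with table size, which two points cannot confirm, and it ignores caching and other non-linear effects, so treat the numbers as a rough sketch only. Under that model JIT adds roughly 90 ms of fixed cost but cuts the per-row cost by about a quarter, which would put the break-even point somewhere around 17-18 million rows. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LinearCost:
    fixed_ms: float     # cost that does not depend on table size
    per_unit_ms: float  # cost per 10 million rows

def fit_two_points(t_10m: float, t_100m: float) -> LinearCost:
    """Fit T(N) = fixed + per_unit * N through N=1 (10M rows) and N=10 (100M rows)."""
    per_unit = (t_100m - t_10m) / 9
    return LinearCost(fixed_ms=t_10m - per_unit, per_unit_ms=per_unit)

# 8-worker timings (ms) from the 10M-row and 100M-row tables
no_jit = fit_two_points(216, 2063)
with_jit = fit_two_points(254, 1648)

# Table size (in 10M-row units) at which both models predict the same time
break_even = (with_jit.fixed_ms - no_jit.fixed_ms) / (no_jit.per_unit_ms - with_jit.per_unit_ms)

print(f"no JIT:   fixed {no_jit.fixed_ms:.0f} ms, {no_jit.per_unit_ms:.0f} ms per 10M rows")
print(f"with JIT: fixed {with_jit.fixed_ms:.0f} ms, {with_jit.per_unit_ms:.0f} ms per 10M rows")
print(f"break-even at about {break_even * 10:.1f} million rows")
```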
But this is still not the fastest way to process this query with Postgres.
Let's try my extension VOPS (https://github.com/postgrespro/vops):

Parallel workers   Time (ms)
0                  1447
2                  623
4                  494
8                  491


So VOPS is more than 3 times faster than JIT, and it looks like it could do even better with a larger number of records:
as you can see, increasing the number of workers from 2 to 4 improves performance by only about 30%, not twofold.
It seems the overhead of starting parallel workers is too large, and for queries that run in under a second it has a noticeable impact on total performance.
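A short Python sketch computing the scaling from the VOPS table above: 2 workers give about 2.3x over the single-process run, 4 workers about 2.9x, and 8 workers barely more; the step from 2 to 4 workers is a gain of about 26%. When reading these numbers, keep in mind that the test machine has four cores and that PostgreSQL's leader process also takes part in a parallel scan by default (parallel_leader_participation), so the 4- and 8-worker runs have more processes than cores, which may also limit scaling.

```python
# VOPS timings (ms): parallel workers -> time
VOPS_MS = {0: 1447, 2: 623, 4: 494, 8: 491}

def speedup(times: dict, workers: int) -> float:
    """Speedup relative to the single-process (0 workers) run."""
    return times[0] / times[workers]

for w in (2, 4, 8):
    print(f"{w} workers: {speedup(VOPS_MS, w):.2f}x vs 0 workers")

# Relative gain from doubling the workers from 2 to 4 (ideal doubling = 100%)
gain_2_to_4 = VOPS_MS[2] / VOPS_MS[4] - 1
print(f"2 -> 4 workers: {gain_2_to_4:.0%} faster")
```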

In another prototype DBMS of mine, with vertical (columnar) data representation and multithreaded execution, this query takes 195 msec.
So there is still room for improvement :)
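Both VOPS and the columnar prototype avoid evaluating the WHERE clause one tuple at a time. As a rough illustration of that general idea, and not the actual VOPS code, the following Python sketch stores each column separately and evaluates the query's predicates over tiles of 64 values: each predicate yields a bitmask per tile, the masks are ANDed, and the set bits are counted. The tile size and helper names are assumptions. In Python this is no faster than the row-by-row loop; in a C engine the benefit comes from tight loops over contiguous arrays and from amortizing per-tuple executor overhead over a whole tile.

```python
import random
from typing import List, Sequence, Tuple

TILE_SIZE = 64  # assumption: one 64-bit mask per tile

def tile_mask(values: Sequence[float], lo: float, hi: float) -> int:
    """Bitmask with bit i set when lo <= values[i] <= hi."""
    mask = 0
    for i, v in enumerate(values):
        if lo <= v <= hi:
            mask |= 1 << i
    return mask

def count_columnar(val1: List[float], val2: List[float], val3: List[float]) -> int:
    """Evaluate the WHERE clause column by column, one tile at a time."""
    total = 0
    for start in range(0, len(val1), TILE_SIZE):
        end = start + TILE_SIZE
        m1 = tile_mask(val1[start:end], 0.5, float("inf"))   # val1 >= 0.5
        m2 = tile_mask(val2[start:end], float("-inf"), 0.5)  # val2 <= 0.5
        m3 = tile_mask(val3[start:end], 0.2, 0.6)            # val3 between 0.2 and 0.6
        total += bin(m1 & m2 & m3).count("1")               # rows passing all three
    return total

def count_row_at_a_time(rows: List[Tuple[float, float, float]]) -> int:
    """Reference: evaluate the whole WHERE clause row by row."""
    return sum(1 for v1, v2, v3 in rows if v1 >= 0.5 and v2 <= 0.5 and 0.2 <= v3 <= 0.6)

if __name__ == "__main__":
    rng = random.Random(1)
    n = 10_000
    val1 = [rng.random() for _ in range(n)]
    val2 = [rng.random() for _ in range(n)]
    val3 = [rng.random() for _ in range(n)]
    print(count_columnar(val1, val2, val3), count_row_at_a_time(list(zip(val1, val2, val3))))
```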

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
