Re: Vitesse DB call for testing - Mailing list pgsql-hackers

From CK Tan
Subject Re: Vitesse DB call for testing
Msg-id 5CFE0CA1-E5CC-4CD1-9D0B-8D72143D81C2@vitessedata.com
In response to Re: Vitesse DB call for testing  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
Merlin, glad you tried it.

We take the query plan exactly as given by the planner and decide whether to JIT or to punt depending on the cost. If we
punt, it goes back to the pg executor. If we JIT and cannot proceed (usually because of some operators we haven't
implemented yet), we again punt. Once we are able to generate the code, there is no going back; we call into LLVM to
obtain the function entry point and run it to completion. The 3% improvement you see in OLTP tests is definitely noise.
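
To make the control flow above concrete, here is a minimal sketch in Python. It is not our actual code: the names (`Plan`, `generate_code`, `run_in_pg_executor`), the cost threshold, and the operator list are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical values: the thread gives neither the real cost cutoff
# nor the list of operators the code generator supports.
JIT_COST_THRESHOLD = 10000.0
SUPPORTED_OPERATORS = {"SeqScan", "Filter", "Agg"}


class UnsupportedOperator(Exception):
    """Code generation met an operator it cannot compile."""


@dataclass
class Plan:
    total_cost: float      # planner's cost estimate
    operators: List[str]   # plan node types, taken as-is from the planner


def run_in_pg_executor(plan: Plan) -> str:
    # Stand-in for handing the unchanged plan to the stock executor.
    return "pg executor"


def generate_code(plan: Plan) -> Callable[[], str]:
    # Stand-in for code generation; fails on anything unsupported.
    for op in plan.operators:
        if op not in SUPPORTED_OPERATORS:
            raise UnsupportedOperator(op)
    return lambda: "jit"   # stand-in for the compiled function entry point


def execute(plan: Plan) -> str:
    if plan.total_cost < JIT_COST_THRESHOLD:
        return run_in_pg_executor(plan)   # first punt: not worth compiling
    try:
        entry_point = generate_code(plan)
    except UnsupportedOperator:
        return run_in_pg_executor(plan)   # second punt: cannot compile
    return entry_point()                  # no going back from here
```

With these assumed values, `execute(Plan(50.0, ["SeqScan"]))` takes the first punt, a plan containing an operator outside the supported set takes the second, and only a costly plan made entirely of supported operators reaches the compiled path.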

The bigint sum,avg,count case in the example you tried has a specific optimization: we use int128 to accumulate the
bigint values instead of numeric, as pg does. Hence the big speedup. Try the same query on int4 to see the improvement
when both pg and vitessedb are using int4 in the execution.
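
A rough illustration of why a 128-bit accumulator is enough for bigint, again in Python. This only simulates fixed-width two's-complement arithmetic with hypothetical helpers (`wrap_signed`, `sum_fixed_width`); it is not the generated code, which would use native integers.

```python
INT64_MAX = 2**63 - 1


def wrap_signed(value: int, bits: int) -> int:
    """Reduce value to a signed two's-complement integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def sum_fixed_width(values, bits: int) -> int:
    """Sum values in an accumulator that wraps at the given width."""
    acc = 0
    for v in values:
        acc = wrap_signed(acc + v, bits)
    return acc


values = [INT64_MAX, INT64_MAX]
wrapped = sum_fixed_width(values, 64)    # a 64-bit accumulator overflows
exact = sum_fixed_width(values, 128)     # a 128-bit accumulator stays exact
```

Two maximal bigints already overflow a 64-bit accumulator, while a 128-bit one holds the exact sum. Since each bigint has magnitude at most 2^63, it would take on the order of 2^64 rows to overflow 128 bits, so the accumulator can stay a fixed-width integer rather than a variable-length numeric.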

The speedup is really noticeable when the data type is non-varlena. In the varlena cases, we still call into pg
routines most of the time. Again, try the sum,avg,count query on numeric, and you will see what I mean.

Also, we don't support UDFs at the moment, so all queries involving UDFs get sent to the pg executor.

On your question about the 32k page size, the rationale is that some of our customers could be interested in a data
warehouse on pg. A 32k page size is a big win when all you do is seqscan all day long.

We are looking for bug reports at this stage, and for stress tests run by people without our own biases. Tests on real
data in a non-prod setting, on queries that are highly CPU bound, would be ideal.

Thanks,
-cktan

> On Oct 17, 2014, at 6:43 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>
>> On Fri, Oct 17, 2014 at 8:14 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>> On Fri, Oct 17, 2014 at 7:32 AM, CK Tan <cktan@vitessedata.com> wrote:
>>> Hi everyone,
>>>
>>> Vitesse DB 9.3.5.S is Postgres 9.3.5 with a LLVM-JIT query executor
>>> designed for compute intensive OLAP workload. We have gotten it to a
>>> reasonable state and would like to open it up to the pg hackers
>>> community for testing and suggestions.
>>>
>>> Vitesse DB offers
>>> -- JIT Compilation for compute-intensive queries
>>> -- CSV parsing with SSE instructions
>>> -- 100% binary compatibility with PG9.3.5.
>>>
>>> Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X faster.
>>>
>>> Our TPCH 1GB benchmark results is also available at
>>> http://vitessedata.com/benchmark/ .
>>>
>>> Please direct any questions by email to cktan@vitessedata.com .
>>
>> You offer a binary with 32k block size...what's the rationale for that?
>
> (sorry for the double post)
>
> OK, I downloaded the ubuntu binary and ran your benchmarks (after
> making some minor .conf tweaks like disabling SSL).  I then ran your
> benchmark (after fixing the typo) of the count/sum/avg test -- *and
> noticed a 95% reduction in runtime performance* which is really quite
> amazing IMNSHO.  I also ran a select only test on small scale factor
> pgbench and didn't see any regression there -- in fact you beat stock
> by ~ 3% (although this could be measurement noise).   So now you've
> got my attention.  So, if you don't mind, quit being coy and explain
> how the software works and all the neat things it does and doesn't do.
>
> merlin


