Thread: Vitesse DB call for testing

Vitesse DB call for testing

From

CK Tan

Date:

17 October 2014, 12:32:19

Hi everyone,

Vitesse DB 9.3.5.S is Postgres 9.3.5 with an LLVM-JIT query executor
designed for compute-intensive OLAP workloads. We have gotten it to a
reasonable state and would like to open it up to the pg hackers
community for testing and suggestions.

Vitesse DB offers
-- JIT Compilation for compute-intensive queries
-- CSV parsing with SSE instructions
-- 100% binary compatibility with PG9.3.5.

Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X faster.

Our TPCH 1GB benchmark results are also available at
http://vitessedata.com/benchmark/ .

Please direct any questions by email to cktan@vitessedata.com .

Thank you for your help.

--
CK Tan
Vitesse Data, Inc.

Re: Vitesse DB call for testing

From

Andres Freund

Date:

17 October 2014, 12:38:01

Hi,


On 2014-10-17 05:32:13 -0700, CK Tan wrote:
> Vitesse DB 9.3.5.S is Postgres 9.3.5 with a LLVM-JIT query executor
> designed for compute intensive OLAP workload. We have gotten it to a
> reasonable state and would like to open it up to the pg hackers
> community for testing and suggestions.
> 
> Vitesse DB offers
> -- JIT Compilation for compute-intensive queries
> -- CSV parsing with SSE instructions
> -- 100% binary compatibility with PG9.3.5.
> 
> Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X faster.

How are these modifications licensed?

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Vitesse DB call for testing

From

Merlin Moncure

Date:

17 October 2014, 13:14:28

On Fri, Oct 17, 2014 at 7:32 AM, CK Tan <cktan@vitessedata.com> wrote:
> Hi everyone,
>
> Vitesse DB 9.3.5.S is Postgres 9.3.5 with a LLVM-JIT query executor
> designed for compute intensive OLAP workload. We have gotten it to a
> reasonable state and would like to open it up to the pg hackers
> community for testing and suggestions.
>
> Vitesse DB offers
> -- JIT Compilation for compute-intensive queries
> -- CSV parsing with SSE instructions
> -- 100% binary compatibility with PG9.3.5.
>
> Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X faster.
>
> Our TPCH 1GB benchmark results is also available at
> http://vitessedata.com/benchmark/ .
>
> Please direct any questions by email to cktan@vitessedata.com .

You offer a binary with 32k block size...what's the rationale for that?

merlin

Re: Vitesse DB call for testing

From

Merlin Moncure

Date:

17 October 2014, 13:43:35

On Fri, Oct 17, 2014 at 8:14 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Fri, Oct 17, 2014 at 7:32 AM, CK Tan <cktan@vitessedata.com> wrote:
>> Hi everyone,
>>
>> Vitesse DB 9.3.5.S is Postgres 9.3.5 with a LLVM-JIT query executor
>> designed for compute intensive OLAP workload. We have gotten it to a
>> reasonable state and would like to open it up to the pg hackers
>> community for testing and suggestions.
>>
>> Vitesse DB offers
>> -- JIT Compilation for compute-intensive queries
>> -- CSV parsing with SSE instructions
>> -- 100% binary compatibility with PG9.3.5.
>>
>> Our results show CSV imports run up to 2X faster, and TPCH Q1 runs 8X faster.
>>
>> Our TPCH 1GB benchmark results is also available at
>> http://vitessedata.com/benchmark/ .
>>
>> Please direct any questions by email to cktan@vitessedata.com .
>
> You offer a binary with 32k block size...what's the rationale for that?

(sorry for the double post)

OK, I downloaded the ubuntu binary and ran your benchmarks (after
making some minor .conf tweaks like disabling SSL).  I then ran your
benchmark (after fixing the typo) of the count/sum/avg test -- *and
noticed a 95% reduction in runtime* which is really quite
amazing IMNSHO.  I also ran a select only test on small scale factor
pgbench and didn't see any regression there -- in fact you beat stock
by ~ 3% (although this could be measurement noise).   So now you've
got my attention.  So, if you don't mind, quit being coy and explain
how the software works and all the neat things it does and doesn't do.

merlin

Re: Vitesse DB call for testing

From

CK Tan

Date:

17 October 2014, 15:47:23

Merlin, glad you tried it.

We take the query plan exactly as given by the planner and decide whether to JIT or to punt depending on the cost. If we
punt, it goes back to the pg executor. If we JIT and cannot proceed (usually because of some operators we haven't
implemented yet), we again punt. Once we are able to generate the code, there is no going back; we call into LLVM to
obtain the function entry point and run it to completion. The 3% improvement you see in OLTP tests is definitely noise.
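
The JIT-or-punt flow described above can be sketched roughly as follows. This is only an illustration: the struct, the threshold value, and every function name are hypothetical and are not taken from Vitesse DB.

```c
#include <stdbool.h>

/* Hypothetical summary of a plan; a real planner node carries far more. */
typedef struct PlanSketch {
    double total_cost;        /* planner's cost estimate for the plan */
    bool   all_ops_supported; /* false if any operator has no JIT implementation */
} PlanSketch;

typedef enum { EXEC_STOCK, EXEC_JIT } ExecPath;

/* Assumed cutoff: below this cost, compiling is not expected to pay off. */
static const double JIT_COST_THRESHOLD = 100000.0;

/* Stand-in for code generation: fails when an operator is unsupported. */
static bool try_codegen(const PlanSketch *plan)
{
    return plan->all_ops_supported;
}

/* Decide which executor runs the plan. Both "punt" cases fall back to the
 * stock executor; once code generation succeeds the choice is final. */
static ExecPath choose_executor(const PlanSketch *plan)
{
    if (plan->total_cost < JIT_COST_THRESHOLD)
        return EXEC_STOCK;   /* punt: too cheap to be worth compiling */
    if (!try_codegen(plan))
        return EXEC_STOCK;   /* punt: some operator is not implemented */
    return EXEC_JIT;         /* run the generated code to completion */
}
```

In this sketch a cheap OLTP-style query never reaches code generation at all, which is consistent with the small pgbench difference being noise.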

The bigint sum,avg,count case in the example you tried has some optimization. We use int128 to accumulate the bigint
instead of numeric as in pg. Hence the big speedup. Try the same query on int4 for the improvement where both pg and
vitessedb are using int4 in the execution.
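
As a minimal sketch of the int128 accumulator idea (assuming a compiler with the GCC/Clang `__int128` extension; the type and function names are made up for illustration and are not Vitesse DB's or Postgres's):

```c
#include <stddef.h>
#include <stdint.h>

/* Transition state for sum/avg over bigint values. A 64-bit sum can
 * overflow after only two inputs, so the running sum is kept in 128 bits. */
typedef struct Int8AggState {
    __int128 sum;   /* wide running sum */
    int64_t  count; /* number of inputs seen */
} Int8AggState;

/* Per-row transition: one integer add and one increment. */
static void agg_add(Int8AggState *st, int64_t v)
{
    st->sum += v;
    st->count++;
}

/* Final function for avg; returns 0.0 for an empty input in this sketch. */
static double agg_avg(const Int8AggState *st)
{
    return st->count ? (double) st->sum / (double) st->count : 0.0;
}

/* Convenience driver: aggregate an array of bigint values. */
static Int8AggState sum_bigints(const int64_t *vals, size_t n)
{
    Int8AggState st = {0, 0};
    for (size_t i = 0; i < n; i++)
        agg_add(&st, vals[i]);
    return st;
}
```

The per-row work is a single native add, whereas an arbitrary-precision accumulator has to do variable-length arithmetic on every row.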

The speedup is really noticeable when the data type is non-varlena. In the varlena cases, we still call into pg
routines most of the time. Again, try the sum,avg,count query on numeric, and you will see what I mean.

Also, we don't support UDFs at the moment, so all queries involving UDFs get sent to the pg executor.

On your question about the 32k page size, the rationale is that some of our customers could be interested in a data
warehouse on pg. A 32k page size is a big win when all you do is seqscan all day long.

We are looking for bug reports at this stage and some stress tests done without our own prejudices. Some tests on real
data in a non-prod setting on queries that are highly CPU bound would be ideal.

Thanks,
-cktan


Re: Vitesse DB call for testing

From

Merlin Moncure

Date:

17 October 2014, 16:00:48

On Fri, Oct 17, 2014 at 10:47 AM, CK Tan <cktan@vitessedata.com> wrote:
> Merlin, glad you tried it.
>
> We take the query plan exactly as given by the planner, decide whether to
> JIT or to punt depending on the cost. If we punt, it goes back to pg
> executor. If we JIT, and if we could not proceed (usually of some operators
> we haven't implemented yet), we again punt. Once we were able to generate
> the code, there is no going back; we call into LLVM to obtain the function
> entry point, and run it to completion. The 3% improvement you see in OLTP
> tests is definitely noise.
>
> The bigint sum,avg,count case in the example you tried has some
> optimization. We use int128 to accumulate the bigint instead of numeric in
> pg. Hence the big speed up. Try the same query on int4 for the improvement
> where both pg and vitessedb are using int4 in the execution.
>
> The speed up is really noticeable when the data type is nonvarlena. In the
> varlena cases, we still call into pg routines most of the times. Again, try
> the sum,avg,count query on numeric, and you will see what I mean.
>
> Also, we don't support UDF at the moment. So all queries involving UDF gets
> sent to pg executor.
>
> On your question of 32k page size, the rational is that some of our
> customers could be interested in a data warehouse on pg. 32k page size is a
> big win when all you do is seqscan all day long.
>
> We are looking for bug reports at these stage and some stress tests done
> without our own prejudices. Some test on real data in non prod setting on
> queries that are highly CPU bound would be ideal.

One thing that I noticed is that when slamming your benchmark query
via pgbench, resident memory consumption was really aggressive and
would have taken down the server had I not spuriously stopped the
test.  Memory consumption did return to baseline after that so I
figured some type of llvm memory management games were going on.  This
isn't really a problem for most OLAP workloads but it's something to
be aware of.

Via 'perf top' on stock postgres, you see the usual suspects: palloc,
hash_search, etc.   On your build though HeapTupleSatisfiesMVCC zooms
right to the top of the stack which is pretty interesting...the
executor you've built is very lean and mean for sure.  A drop-in
optimization engine with little to no schema/sql changes is pretty neat
-- your primary competitor here is going to be column-organized
table solutions to olap-type problems.

merlin

Re: Vitesse DB call for testing

From

Tom Lane

Date:

17 October 2014, 17:12:38

CK Tan <cktan@vitessedata.com> writes:
> The bigint sum,avg,count case in the example you tried has some
> optimization. We use int128 to accumulate the bigint instead of numeric in
> pg. Hence the big speed up. Try the same query on int4 for the improvement
> where both pg and vitessedb are using int4 in the execution.

Well, that's pretty much cheating: it's too hard to disentangle what's
coming from JIT vs what's coming from using a different accumulator
datatype.  If we wanted to depend on having int128 available we could
get that speedup with a couple hours' work.

But what exactly are you "compiling" here?  I trust not the actual data
accesses; that seems far too complicated to try to inline.
        regards, tom lane

Fwd: Vitesse DB call for testing

From

Feng Tian

Date:

17 October 2014, 18:08:54

Hi, Tom,

Sorry for double post to you.

Feng

---------- Forwarded message ----------
From: Feng Tian <ftian@vitessedata.com>
Date: Fri, Oct 17, 2014 at 10:29 AM
Subject: Re: [HACKERS] Vitesse DB call for testing
To: Tom Lane <tgl@sss.pgh.pa.us>

Hi, Tom,

I agree that using int128 in stock postgres will speed things up too. On the other hand, that is only one part of the equation. For example, if you look at TPCH Q1, the int128 "cheating" does not kick in at all, but we are 8x faster.

I am not sure what you mean by "actual data access". Data is still in stock postgres format on disk. We did indeed JIT all data field accesses (deform tuple). To put things in perspective, I just timed select count(*) and select count(l_orderkey) from tpch1.lineitem; our code is bottlenecked by memory bandwidth and the difference is pretty much invisible.
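
To illustrate what compiling field access can mean, here is a toy sketch that contrasts a generic, descriptor-driven accessor with a schema-specialized one. The row layout, names, and helper are hypothetical; real Postgres tuples also carry headers, null bitmaps, alignment padding, and varlena fields, none of which are modeled here.

```c
#include <stdint.h>
#include <string.h>

/* Toy row layout, packed without padding: (int32 a, int64 b, int32 c). */
static const int ROW_WIDTHS[3] = {4, 8, 4};

/* Helper for building a toy row in a caller-supplied 16-byte buffer. */
static void make_row(uint8_t *row, int32_t a, int64_t b, int32_t c)
{
    memcpy(row, &a, sizeof a);
    memcpy(row + 4, &b, sizeof b);
    memcpy(row + 12, &c, sizeof c);
}

/* Generic path: walk the width descriptor for every row to find the
 * column offset, then branch on the column width. */
static int64_t generic_get(const uint8_t *row, const int *widths, int col)
{
    size_t off = 0;
    for (int i = 0; i < col; i++)
        off += (size_t) widths[i];

    if (widths[col] == 4) {
        int32_t v;
        memcpy(&v, row + off, sizeof v);
        return v;
    } else {
        int64_t v;
        memcpy(&v, row + off, sizeof v);
        return v;
    }
}

/* Specialized path: the kind of code a JIT could emit once the schema is
 * known. Offset and width are constants, so no loop or branch remains. */
static int64_t specialized_get_col2(const uint8_t *row)
{
    int32_t v;
    memcpy(&v, row + 12, sizeof v);
    return v;
}
```

Both accessors return the same value for column c; the specialized one simply has nothing left to interpret at run time.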

Thanks,

Feng

ftian=# set vdb_jit = 0;
SET
Time: 0.155 ms
ftian=# select count(*) from tpch1.lineitem;
  count
---------
 6001215
(1 row)

Time: 688.658 ms
ftian=# select count(*) from tpch1.lineitem;
  count
---------
 6001215
(1 row)

Time: 690.753 ms
ftian=# select count(l_orderkey) from tpch1.lineitem;
  count
---------
 6001215
(1 row)

Time: 819.452 ms
ftian=# set vdb_jit = 1;
SET
Time: 0.167 ms
ftian=# select count(*) from tpch1.lineitem;
  count
---------
 6001215
(1 row)

Time: 203.543 ms
ftian=# select count(l_orderkey) from tpch1.lineitem;
  count
---------
 6001215
(1 row)

Time: 202.253 ms
ftian=#

On Fri, Oct 17, 2014 at 10:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

CK Tan <cktan@vitessedata.com> writes:
> The bigint sum,avg,count case in the example you tried has some optimization. We use int128 to accumulate the bigint instead of numeric in pg. Hence the big speed up. Try the same query on int4 for the improvement where both pg and vitessedb are using int4 in the execution.

Well, that's pretty much cheating: it's too hard to disentangle what's
coming from JIT vs what's coming from using a different accumulator
datatype. If we wanted to depend on having int128 available we could
get that speedup with a couple hours' work.

But what exactly are you "compiling" here? I trust not the actual data
accesses; that seems far too complicated to try to inline.

regards, tom lane


Re: Vitesse DB call for testing

From

Josh Berkus

Date:

17 October 2014, 18:13:27

CK,

Before we go any further on this, how is Vitesse currently licensed?
Last time we talked, it was still proprietary.  If it's not being
open-sourced, we likely need to take discussion off this list.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Vitesse DB call for testing

From

Peter Geoghegan

Date:

17 October 2014, 18:22:07

On Fri, Oct 17, 2014 at 11:08 AM, Feng Tian <ftian@vitessedata.com> wrote:
> I agree using that using int128 in stock postgres will speed up things too.
> On the other hand, that is only one part of the equation.   For example, if
> you look at TPCH Q1, the int128 "cheating" does not kick in at all, but we
> are 8x faster.

I'm curious about how the numbers look when stock Postgres is built
with the same page size as your fork. You didn't mention whether or
not your Postgres numbers came from a standard build.

-- 
Peter Geoghegan

Re: Vitesse DB call for testing

From

Andres Freund

Date:

17 October 2014, 18:25:11

On 2014-10-17 13:12:27 -0400, Tom Lane wrote:
> Well, that's pretty much cheating: it's too hard to disentangle what's
> coming from JIT vs what's coming from using a different accumulator
> datatype.  If we wanted to depend on having int128 available we could
> get that speedup with a couple hours' work.

I think doing that when configure detects int128 would make a great deal
of sense. It's not like we'd save a great deal of complicated code by
removing the existing accumulator... We'd still have to return a
numeric, but that's likely lost in the noise cost wise.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Vitesse DB call for testing

From

Merlin Moncure

Date:

17 October 2014, 18:29:17

On Fri, Oct 17, 2014 at 1:21 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Oct 17, 2014 at 11:08 AM, Feng Tian <ftian@vitessedata.com> wrote:
>> I agree using that using int128 in stock postgres will speed up things too.
>> On the other hand, that is only one part of the equation.   For example, if
>> you look at TPCH Q1, the int128 "cheating" does not kick in at all, but we
>> are 8x faster.
>
> I'm curious about how the numbers look when stock Postgres is built
> with the same page size as your fork. You didn't mention whether or
> not your Postgres numbers came from a standard build.

I downloaded the 8kb variant.

vitesse (median of 3):
postgres=# select count(*), sum(i*i), avg(i) from t;
  count  │        sum         │         avg
─────────┼────────────────────┼─────────────────────
 1000000 │ 333333833333500000 │ 500000.500000000000
(1 row)

Time: 39.197 ms

stock (median of 3):
postgres=# select count(*), sum(i*i), avg(i) from t;
  count  │        sum         │         avg
─────────┼────────────────────┼─────────────────────
 1000000 │ 333333833333500000 │ 500000.500000000000
(1 row)

Time: 667.362 ms

(stock int4 ops)
postgres=# select sum(1::int4) from t;
   sum
─────────
 1000000
(1 row)

Time: 75.265 ms

What I'm wondering is how complex the hooks are that tie the
technology in.   Unless a bsd licensed patch materializes, the
conversation (beyond the initial wow! factor) should probably focus
on possible integration points and/or implementation of the technology
in core in a general way.

merlin

Re: Vitesse DB call for testing

From

Tom Lane

Date:

17 October 2014, 18:36:10

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-10-17 13:12:27 -0400, Tom Lane wrote:
>> Well, that's pretty much cheating: it's too hard to disentangle what's
>> coming from JIT vs what's coming from using a different accumulator
>> datatype.  If we wanted to depend on having int128 available we could
>> get that speedup with a couple hours' work.

> I think doing that when configure detects int128 would make a great deal
> of sense.

Yeah, I was wondering about that myself: use int128 if available,
else fall back on existing code path.
        regards, tom lane
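
A rough sketch of the "use int128 if available, else fall back" idea follows. It is an assumption-laden illustration: the macro and function names are invented, the check uses the compiler's predefined `__SIZEOF_INT128__` rather than a configure test, and the fallback here is a hand-rolled two-word adder standing in for the existing code path, which in stock Postgres is the numeric accumulator.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__SIZEOF_INT128__)
/* Fast path: the compiler provides a native 128-bit integer. */
#define SKETCH_HAVE_INT128 1
typedef __int128 sum_acc;
#else
/* Fallback stand-in: a 128-bit two's-complement value in two halves. */
#define SKETCH_HAVE_INT128 0
typedef struct { uint64_t lo; int64_t hi; } sum_acc;
#endif

static sum_acc acc_zero(void)
{
#if SKETCH_HAVE_INT128
    return 0;
#else
    sum_acc z = {0, 0};
    return z;
#endif
}

/* Add one bigint value to the accumulator. */
static void acc_add(sum_acc *acc, int64_t v)
{
#if SKETCH_HAVE_INT128
    *acc += v;
#else
    uint64_t lo = acc->lo + (uint64_t) v;   /* low word, wraps mod 2^64 */
    int64_t carry = lo < (uint64_t) v;      /* 1 if the low word wrapped */
    acc->lo = lo;
    acc->hi += (v < 0 ? -1 : 0) + carry;    /* sign extension plus carry */
#endif
}

/* True when the accumulated sum would still fit in a plain bigint. */
static bool acc_fits_int64(const sum_acc *acc)
{
#if SKETCH_HAVE_INT128
    return *acc >= INT64_MIN && *acc <= INT64_MAX;
#else
    return (acc->hi == 0 && acc->lo <= (uint64_t) INT64_MAX) ||
           (acc->hi == -1 && acc->lo > (uint64_t) INT64_MAX);
#endif
}
```

Callers only see `acc_zero`, `acc_add`, and `acc_fits_int64`, so the choice of representation stays a compile-time detail. The final result would still have to be converted to numeric for the SQL-level return type, as noted earlier in the thread.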

Re: Vitesse DB call for testing

From

CK Tan

Date:

17 October 2014, 18:52:38

Happy to contribute to that decision :-)


On Fri, Oct 17, 2014 at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2014-10-17 13:12:27 -0400, Tom Lane wrote:
>>> Well, that's pretty much cheating: it's too hard to disentangle what's
>>> coming from JIT vs what's coming from using a different accumulator
>>> datatype.  If we wanted to depend on having int128 available we could
>>> get that speedup with a couple hours' work.
>
>> I think doing that when configure detects int128 would make a great deal
>> of sense.
>
> Yeah, I was wondering about that myself: use int128 if available,
> else fall back on existing code path.
>
>                         regards, tom lane

Re: Vitesse DB call for testing

From

David Gould

Date:

18 October 2014, 03:40:25

On Fri, 17 Oct 2014 13:12:27 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> CK Tan <cktan@vitessedata.com> writes:
> > The bigint sum,avg,count case in the example you tried has some
> > optimization. We use int128 to accumulate the bigint instead of numeric
> > in pg. Hence the big speed up. Try the same query on int4 for the
> > improvement where both pg and vitessedb are using int4 in the execution.
> 
> Well, that's pretty much cheating: it's too hard to disentangle what's
> coming from JIT vs what's coming from using a different accumulator
> datatype.  If we wanted to depend on having int128 available we could
> get that speedup with a couple hours' work.
> 
> But what exactly are you "compiling" here?  I trust not the actual data
> accesses; that seems far too complicated to try to inline.
> 
>             regards, tom lane
> 
> 

I don't have any inside knowledge, but from the presentation given at the
recent SFPUG followed by a bit of google-fu I think these papers are
relevant:
 http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
 http://sites.computer.org/debull/A14mar/p3.pdf

-dg

-- 
David Gould              510 282 0869         daveg@sonic.net
If simplicity worked, the world would be overrun with insects.

Re: Vitesse DB call for testing

From

CK Tan

Date:

18 October 2014, 04:46:17

Indeed! A big part of our implementation is based on the Neumann
paper. There are also a few other papers that influenced our
implementation:

A. Ailamaki, D. DeWitt, M. Hill, D. Wood. DBMSs On A Modern Processor:
Where Does Time Go?

Peter Boncz, Marcin Zukowski, Niels Nes. MonetDB/X100:
Hyper-Pipelining Query Execution

M. Zukowski et al. Super-Scalar RAM-CPU Cache Compression

Of course, we need to adapt a lot of the design to Postgres to make
something that could stand up harmoniously with the Postgres system,
and also to take care that we would be able to merge easily with
future versions of Postgres -- the implementation needs to be as
non-invasive as possible.

Regards,
-cktan

On Fri, Oct 17, 2014 at 8:40 PM, David Gould <daveg@sonic.net> wrote:
> On Fri, 17 Oct 2014 13:12:27 -0400
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> CK Tan <cktan@vitessedata.com> writes:
>> > The bigint sum,avg,count case in the example you tried has some
>> > optimization. We use int128 to accumulate the bigint instead of numeric
>> > in pg. Hence the big speed up. Try the same query on int4 for the
>> > improvement where both pg and vitessedb are using int4 in the execution.
>>
>> Well, that's pretty much cheating: it's too hard to disentangle what's
>> coming from JIT vs what's coming from using a different accumulator
>> datatype.  If we wanted to depend on having int128 available we could
>> get that speedup with a couple hours' work.
>>
>> But what exactly are you "compiling" here?  I trust not the actual data
>> accesses; that seems far too complicated to try to inline.
>>
>>                       regards, tom lane
>>
>>
>
> I don't have any inside knowledge, but from the presentation given at the
> recent SFPUG followed by a bit of google-fu I think these papers are
> relevant:
>
>   http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
>   http://sites.computer.org/debull/A14mar/p3.pdf
>
> -dg
>
> --
> David Gould              510 282 0869         daveg@sonic.net
> If simplicity worked, the world would be overrun with insects.

Re: Vitesse DB call for testing

From

Mark Kirkwood

Date:

18 October 2014, 05:55:20

On 18/10/14 07:13, Josh Berkus wrote:
> CK,
>
> Before we go any further on this, how is Vitesse currently licensed?
> last time we talked it was still proprietary.  If it's not being
> open-sourced, we likely need to take discussion off this list.
>

+1

Guys, you need to 'fess up on the licensing!

Regards

Mark

Re: Vitesse DB call for testing

From

CK Tan

Date:

18 October 2014, 07:11:59

Hi Mark,

Vitesse DB won't be open-sourced, or it would have been a contrib
module to postgres. We should take further discussions off this list.
People should contact me directly if there are any questions.

Thanks,
cktan@vitessedata.com

On Fri, Oct 17, 2014 at 10:55 PM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:
> On 18/10/14 07:13, Josh Berkus wrote:
>>
>> CK,
>>
>> Before we go any further on this, how is Vitesse currently licensed?
>> last time we talked it was still proprietary.  If it's not being
>> open-sourced, we likely need to take discussion off this list.
>>
>
> +1
>
> Guys, you need to 'fess up on the licensing!
>
> Regards
>
> Mark