Thread: AXLE Plans for 9.5 and 9.6

AXLE Plans for 9.5 and 9.6

From

Simon Riggs

Date:

21 April 2014, 22:41:35

I've discussed 2ndQuadrant's involvement in the AXLE project a few
times publicly, but never on this mailing list. The project relates to
innovation and improvement in Business Intelligence for systems based
upon PostgreSQL in the range of 10-100TB.

Our work will span the 9.5 and 9.6 cycles. We're looking to make
measurable improvements in a number of cases; one of those is TPC-H,
since it's a publicly accessible benchmark, another is a more private
benchmark on healthcare data. In brief, this means speeding up the
performance of large queries, data loading and looking at very large
systems issues.

Some of the areas of R&D are definitely on the roadmap, others are more
flexible. Some of this is in progress, other stuff is not even at the
design stage - yet, just a few paragraphs along the lines of "we will
look at these topics". If we have room, it's possible we may
accommodate other topics; this is not carte blanche, but the reason
for posting here is so people know we will take input, following the
normal community process. Detailed in-person discussions at PGCon are
expected and the Wiki pages will be updated for each aspect.

BI-related Indexing
* MinMax indexes
* Bitmap indexes

Large Systems
* Freeze avoidance
* Storage management issues for very large systems

Storage Efficiency
* Compression
* Column Orientation

Optimisation
* Bulk loading speed improvements
* Bulk FK evaluation
* Executor tuning for very large queries

Query tuning
* Approximate queries, sampling
* Materialized Views

...and possibly some other aspects.

2ndQuadrant is also assisting other researchers on GPU and FPGA
topics, which may also yield work of interest to the PostgreSQL project.

Couple of points: The project is time limited, so if work gets pushed
back beyond that then we'll lose the opportunity to contribute. Please
support our work with timely objections, assistance in defining the
path forwards and limiting the scope to something that avoids wasting
this opportunity. Further funding is possible if we don't squander
this. We are being funded to make best efforts to contribute to open
source PostgreSQL, not pay-for-commit.

AXLE is funded by the EU under FP7 Grant Agreement 318633. Further
details are available here http://www.axleproject.eu/

(There are also other 2ndQuadrant development projects in progress,
this is just one of the larger ones).

Best Regards

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: AXLE Plans for 9.5 and 9.6

From

Josh Berkus

Date:

21 April 2014, 23:24:16

On 04/21/2014 03:41 PM, Simon Riggs wrote:
> Storage Efficiency
> * Compression
> * Column Orientation

You might look at turning this:

http://citusdata.github.io/cstore_fdw/

... into a more integrated part of Postgres.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: AXLE Plans for 9.5 and 9.6

From

Jov

Date:

22 April 2014, 09:43:05

what about runtime code generation using LLVM?

http://blog.cloudera.com/blog/2013/02/inside-cloudera-impala-runtime-code-generation/

http://llvm.org/devmtg/2013-11/slides/Wanderman-Milne-Cloudera.pdf

Jov

blog: http://amutu.com/blog

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: AXLE Plans for 9.5 and 9.6

From

Simon Riggs

Date:

22 April 2014, 12:04:48

On 22 April 2014 00:24, Josh Berkus <josh@agliodbs.com> wrote:
> On 04/21/2014 03:41 PM, Simon Riggs wrote:
>> Storage Efficiency
>> * Compression
>> * Column Orientation
>
> You might look at turning this:
>
> http://citusdata.github.io/cstore_fdw/
>
> ... into a more integrated part of Postgres.

Of course I'm aware of that work - credit to them. Certainly, many
people feel that it is now time to do as you suggest and include
column store features within PostgreSQL.

As to turning it into a more integrated part of Postgres, we have a
few problems there

1. cstore_fdw code has an incompatible licence

2. I don't think FDWs are the right place for complex new
architectures such as column store, massively parallel processing or
sharding. The fact that it is probably the best place to implement it
in user space doesn't mean it transfers well into core code. That's a
shame and I don't know what to do about it, because it would be nice
to simply ask for change of licence and then integrate it, but it
seems more work than that (to me).

cstore_fdw uses ORC, which interestingly stores "lightweight index"
values that look exactly like MinMax indexes, so at least PostgreSQL
should be getting that soon.
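
To illustrate why per-range minimum and maximum values are useful, here is a minimal sketch in Python. It is not the ORC file format and not the MinMax index code under development; the names RangeSummary, build_summaries and scan_between are hypothetical, and a plain list stands in for a table whose values are roughly ordered on disk (for example, a timestamp column).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RangeSummary:
    start: int  # index of the first row in the range
    end: int    # one past the last row in the range
    lo: int     # minimum value seen in the range
    hi: int     # maximum value seen in the range


def build_summaries(values: Sequence[int], rows_per_range: int) -> List[RangeSummary]:
    """Record only the min and max for each fixed-size run of rows."""
    summaries = []
    for start in range(0, len(values), rows_per_range):
        chunk = values[start:start + rows_per_range]
        summaries.append(RangeSummary(start, start + len(chunk), min(chunk), max(chunk)))
    return summaries


def scan_between(values: Sequence[int], summaries: List[RangeSummary],
                 lo: int, hi: int) -> Tuple[List[int], int]:
    """Return the matching values and the number of ranges actually read."""
    matches: List[int] = []
    ranges_read = 0
    for s in summaries:
        if s.hi < lo or s.lo > hi:
            continue  # the summary proves no row in this range can match
        ranges_read += 1
        matches.extend(v for v in values[s.start:s.end] if lo <= v <= hi)
    return matches, ranges_read
```

With 1000 ordered values summarised in ranges of 100 rows, a query for values between 250 and 260 reads one range instead of ten. The benefit depends on the data being correlated with its physical order; on randomly ordered data most ranges would overlap the predicate and nothing would be skipped.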

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: AXLE Plans for 9.5 and 9.6

From

Simon Riggs

Date:

22 April 2014, 12:09:46

On 22 April 2014 10:42, Jov <amutu@amutu.com> wrote:

> what about runtime code generation using LLVM?
> http://blog.cloudera.com/blog/2013/02/inside-cloudera-impala-runtime-code-generation/
> http://llvm.org/devmtg/2013-11/slides/Wanderman-Milne-Cloudera.pdf

Those techniques have been in use for at least 20 years on various platforms.

The main issue PostgreSQL faces is supporting many platforms and
compilers, while at the same time supporting extensible data types.

I believe there is some research work into run-time compilation in
progress, but that seems unlikely to make it into Postgres core.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: AXLE Plans for 9.5 and 9.6

From

Hannu Krosing

Date:

22 April 2014, 12:10:39

On 04/22/2014 01:24 AM, Josh Berkus wrote:
> On 04/21/2014 03:41 PM, Simon Riggs wrote:
>> Storage Efficiency
>> * Compression
>> * Column Orientation
> You might look at turning this:
>
> http://citusdata.github.io/cstore_fdw/
>
> ... into a more integrated part of Postgres.
What would be of more general usefulness is probably
better planning and better performance of the FDW interface.

So instead of integrating one specific FDW it would make
sense to improve PostgreSQL so that it can use (properly written)
FDWs at native speeds.

Regards

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

Re: AXLE Plans for 9.5 and 9.6

From

"MauMau"

Date:

22 April 2014, 12:12:40

From: "Simon Riggs" <simon@2ndQuadrant.com>
> Some of the areas of R&D are definitely on the roadmap, others are more
> flexible. Some of this is in progress, other stuff is not even at the
> design stage - yet, just a few paragraphs along the lines of "we will
> look at these topics". If we have room, it's possible we may
> accommodate other topics; this is not carte blanche, but the reason
> for posting here is so people know we will take input, following the
> normal community process. Detailed in-person discussions at PGCon are
> expected and the Wiki pages will be updated for each aspect.
>
> BI-related Indexing
> * MinMax indexes
> * Bitmap indexes
>
> Large Systems
> * Freeze avoidance
> * Storage management issues for very large systems
>
> Storage Efficiency
> * Compression
> * Column Orientation
>
> Optimisation
> * Bulk loading speed improvements
> * Bulk FK evaluation
> * Executor tuning for very large queries
>
> Query tuning
> * Approximate queries, sampling
> * Materialized Views

Great!  I'm looking forward to seeing PostgreSQL evolve as an analytics 
database for data warehousing.  Is there any reason why in-memory database 
and MPP are not included?

Are you planning to include the above features in 9.5 and 9.6?  Are you 
recommending other developers not implement these features to avoid 
duplication of work with AXLE?

Regards
MauMau

Re: AXLE Plans for 9.5 and 9.6

From

Hannu Krosing

Date:

22 April 2014, 12:30:40

On 04/22/2014 02:04 PM, Simon Riggs wrote:
> On 22 April 2014 00:24, Josh Berkus <josh@agliodbs.com> wrote:
>> On 04/21/2014 03:41 PM, Simon Riggs wrote:
>>> Storage Efficiency
>>> * Compression
>>> * Column Orientation
>> You might look at turning this:
>>
>> http://citusdata.github.io/cstore_fdw/
>>
>> ... into a more integrated part of Postgres.
> Of course I'm aware of that work - credit to them. Certainly, many
> people feel that it is now time to do as you suggest and include
> column store features within PostgreSQL.
>
> As to turning it into a more integrated part of Postgres, we have a
> few problems there
>
> 1. cstore_fdw code has an incompatible licence
>
> 2. I don't think FDWs are the right place for complex new
> architectures such as column store, massively parallel processing or
> sharding. 
I agree that FDW is not an end-all solution for all these, but it is a
reasonable starting point and it just might be that the extra things
needed could be added to our FDW API instead of sewing it directly
into backend guts.


I recently tried to implement sharding at FDW level and the main
problem I ran into was a missing join type for efficiently using it
for certain queries.

The specific use case was queries of form

select l.*, r.*
from remotetable r
join localtable l
on l.key1 = r.id and l.n = N;

PostgreSQL offered only two options:

1) full scan on remote table

2) single id=$ selects

neither of which is what is actually needed, as the first performs badly
if there are more than a few rows in the remote table and the second performs
badly if l.n = N returns more than a few rows.

When I manually rewrote the query to

select l.*, r.*
from remotetable r
join localtable l
on l.key1 = r.id and l.n = N
where r.id = ANY(ARRAY(select key1 from localtable
where n = N));

it ran really well.

Unfortunately this is not something that PostgreSQL considers by itself
while optimising.
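
The difference between the three strategies can be sketched in Python by counting what crosses the wire. This is only an illustration under stated assumptions: RemoteTable and the join_* functions are hypothetical stand-ins, not the FDW API, and the "remote" table is just an in-memory dictionary.

```python
from typing import Dict, Iterable, List, Tuple


class RemoteTable:
    """Hypothetical stand-in for a table behind an FDW; counts traffic."""

    def __init__(self, rows: Dict[int, str]):
        self._rows = rows
        self.round_trips = 0
        self.rows_shipped = 0

    def full_scan(self) -> List[Tuple[int, str]]:
        self.round_trips += 1
        self.rows_shipped += len(self._rows)
        return list(self._rows.items())

    def fetch_one(self, key: int) -> List[Tuple[int, str]]:
        self.round_trips += 1
        found = [(key, self._rows[key])] if key in self._rows else []
        self.rows_shipped += len(found)
        return found

    def fetch_any(self, keys: Iterable[int]) -> List[Tuple[int, str]]:
        # One request carrying the whole key list, like r.id = ANY(ARRAY(...)).
        self.round_trips += 1
        found = [(k, self._rows[k]) for k in sorted(set(keys)) if k in self._rows]
        self.rows_shipped += len(found)
        return found


def join_full_scan(local_rows, remote, n):
    """Option 1: ship the entire remote table, then join locally."""
    remote_by_id = dict(remote.full_scan())
    return [(row, remote_by_id[row["key1"]]) for row in local_rows
            if row["n"] == n and row["key1"] in remote_by_id]


def join_per_key(local_rows, remote, n):
    """Option 2: one remote request per qualifying local row."""
    out = []
    for row in local_rows:
        if row["n"] == n:
            out.extend((row, payload) for _, payload in remote.fetch_one(row["key1"]))
    return out


def join_batched(local_rows, remote, n):
    """The manual rewrite: collect local keys first, fetch them in one request."""
    wanted = [row for row in local_rows if row["n"] == n]
    remote_by_id = dict(remote.fetch_any(row["key1"] for row in wanted))
    return [(row, remote_by_id[row["key1"]]) for row in wanted
            if row["key1"] in remote_by_id]
```

All three return the same join result. With 1000 remote rows and 100 qualifying local rows, the full scan ships 1000 rows in one request, the per-key plan ships 100 rows in 100 requests, and the batched plan ships 100 rows in one request.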

BTW, this kind of optimisation should also be a win for really large IN
queries if we could have an indexed IN which would not start each lookup
from the index root, but rather would sort the IN contents and do an index
merge via skipping from the current position.
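
A minimal sketch of that idea in Python, assuming a sorted list as a stand-in for the leaf level of a B-tree index. The function names are hypothetical and this is not how PostgreSQL executes IN lists; it only contrasts restarting every search from the beginning with moving forward from the current position.

```python
from bisect import bisect_left
from typing import List, Sequence


def lookup_from_root(index: Sequence[int], wanted: Sequence[int]) -> List[int]:
    """Baseline: every key is searched for across the whole index."""
    hits = []
    for key in wanted:
        pos = bisect_left(index, key)
        if pos < len(index) and index[pos] == key:
            hits.append(key)
    return hits


def lookup_sorted_merge(index: Sequence[int], wanted: Sequence[int]) -> List[int]:
    """Sort the IN list once, then only ever move forward through the index."""
    hits = []
    pos = 0
    for key in sorted(set(wanted)):
        # The search starts at the current position, not at the beginning.
        pos = bisect_left(index, key, pos)
        if pos == len(index):
            break  # every remaining key is larger than anything indexed
        if index[pos] == key:
            hits.append(key)
    return hits
```

Because both inputs are sorted, each search only has to look at the part of the index that lies beyond the previous match, and the loop can stop as soon as the index is exhausted.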


Cheers

Re: AXLE Plans for 9.5 and 9.6

From

Andrew Dunstan

Date:

22 April 2014, 13:29:44

On 04/22/2014 08:15 AM, MauMau wrote:
>
>
> Are you planning to include the above features in 9.5 and 9.6? Are you 
> recommending other developers not implement these features to avoid 
> duplication of work with AXLE?
>
>

Without pointing any fingers, I should note that I have learned the hard 
way to take such recommendations with a grain of salt. More than once I 
have been stopped from working on something because someone else said 
they were, only for nothing to appear, and in the interests of full 
disclosure I can think of two significant instances when I have been 
similarly guilty, although the most serious of those has since been 
rectified by someone else.

cheers

andrew

Re: AXLE Plans for 9.5 and 9.6

From

Andrew Dunstan

Date:

22 April 2014, 13:39:23

On 04/22/2014 08:04 AM, Simon Riggs wrote:
> On 22 April 2014 00:24, Josh Berkus <josh@agliodbs.com> wrote:
>> On 04/21/2014 03:41 PM, Simon Riggs wrote:
>>> Storage Efficiency
>>> * Compression
>>> * Column Orientation
>> You might look at turning this:
>>
>> http://citusdata.github.io/cstore_fdw/
>>
>> ... into a more integrated part of Postgres.
> Of course I'm aware of that work - credit to them. Certainly, many
> people feel that it is now time to do as you suggest and include
> column store features within PostgreSQL.
>
> As to turning it into a more integrated part of Postgres, we have a
> few problems there
>
> 1. cstore_fdw code has an incompatible licence
>
> 2. I don't think FDWs are the right place for complex new
> architectures such as column store, massively parallel processing or
> sharding. The fact that it is probably the best place to implement it
> in user space doesn't mean it transfers well into core code. That's a
> shame and I don't know what to do about it, because it would be nice
> to simply ask for change of licence and then integrate it, but it
> seems more work than that (to me).
>

I agree, and indeed that was something like my first reaction to hearing 
about this development - FDW seems like a very odd way to handle this. 
But the notion of builtin columnar storage suggests to me that we really 
need first to tackle how various storage engines might be incorporated 
into Postgres. I know this has been a bugbear for many years, but maybe 
now with serious proposals for alternative storage engines on the 
horizon we can no longer afford to put off the evil day when we grapple 
with it.

cheers

andrew

Re: AXLE Plans for 9.5 and 9.6

From

Simon Riggs

Date:

22 April 2014, 13:45:50

On 22 April 2014 13:15, MauMau <maumau307@gmail.com> wrote:

> Great!  I'm looking forward to seeing PostgreSQL evolve as an analytics
> database for data warehousing.  Is there any reason why in-memory database
> and MPP are not included?

Those ideas are valid; the features are bounded by resource
constraints of time and money, as well as by technical skills/
capacities of my fellow developers. My analysis has been that
implementing parallelism has a lower benefit/cost ratio than other
features, as well as requiring more expensive servers (for MPP). I
expect MPP to be an eventual end goal from the BDR project.

> Are you planning to include the above features in 9.5 and 9.6?

Yes

> Are you
> recommending other developers not implement these features to avoid
> duplication of work with AXLE?

This was more to draw attention to the work so that all interested
parties can participate in producing something useful.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: AXLE Plans for 9.5 and 9.6

From

Stephen Frost

Date:

22 April 2014, 14:16:44

* Andrew Dunstan (andrew@dunslane.net) wrote:
> I agree, and indeed that was something like my first reaction to
> hearing about this development - FDW seems like a very odd way to
> handle this. But the notion of builtin columnar storage suggests to
> me that we really need first to tackle how various storage engines
> might be incorporated into Postgres. I know this has been a bugbear
> for many years, but maybe now with serious proposals for alternative
> storage engines on the horizon we can no longer afford to put off
> the evil day when we grapple with it.

Agreed, and it goes beyond just columnar stores- I could see IOTs being
implemented using this notion of a different 'storage engine', but
calling it a 'storage engine' makes it sound like we want to change how
we access files and I don't think we really want to change that but
rather come up with a way to have an alternative heap..  Columnar or
IOTs would still be page-based and go through shared buffers, etc, I'd
think..
Thanks,
    Stephen

Re: AXLE Plans for 9.5 and 9.6

From

Josh Berkus

Date:

22 April 2014, 17:03:12

On 04/22/2014 06:39 AM, Andrew Dunstan wrote:
> I agree, and indeed that was something like my first reaction to hearing
> about this development - FDW seems like a very odd way to handle this.
> But the notion of builtin columnar storage suggests to me that we really
> need first to tackle how various storage engines might be incorporated
> into Postgres. I know this has been a bugbear for many years, but maybe
> now with serious proposals for alternative storage engines on the
> horizon we can no longer afford to put off the evil day when we grapple
> with it.

Yes.  *IF* PostgreSQL already supported alternate storage, then the
Citus folks might have released their CStore as a storage plugin instead
of an FDW.  However, if they'd waited for pluggable storage, they'd
still be waiting.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: AXLE Plans for 9.5 and 9.6

From

Pavel Stehule

Date:

22 April 2014, 17:30:24

2014-04-22 19:02 GMT+02:00 Josh Berkus <josh@agliodbs.com>:

> On 04/22/2014 06:39 AM, Andrew Dunstan wrote:
>> I agree, and indeed that was something like my first reaction to hearing
>> about this development - FDW seems like a very odd way to handle this.
>> But the notion of builtin columnar storage suggests to me that we really
>> need first to tackle how various storage engines might be incorporated
>> into Postgres. I know this has been a bugbear for many years, but maybe
>> now with serious proposals for alternative storage engines on the
>> horizon we can no longer afford to put off the evil day when we grapple
>> with it.
>
> Yes. *IF* PostgreSQL already supported alternate storage, then the
> Citus folks might have released their CStore as a storage plugin instead
> of an FDW. However, if they'd waited for pluggable storage, they'd
> still be waiting.

I am sceptical. From what I know about OLAP column store databases, they need a very different planner, so just an engine or storage layer is not enough. Vectorwise has been trying to merge Ingres with the Monet engine for more than four years, and still has some issues.

Our extensibility is probably a major barrier to fast OLAP. I see the most realistic way forward as supporting better partitioning and moving in the direction of higher parallelism and distribution, and maybe map/reduce support.

At GoodData we successfully use Postgres for BI projects up to 20G with fast response. The biggest pain points are the missing MERGE, the missing fault-tolerant COPY, IO-expensive updates of large tables with lots of indexes, and the missing simple massive partitioning. On the other hand, Postgres works perfectly on thousands of databases with thousands of tables, without errors, and is extremely simple to deploy in a cloud environment.

Regards

Pavel
