Thread: AXLE Plans for 9.5 and 9.6
I've discussed 2ndQuadrant's involvement in the AXLE project a few times publicly, but never on this mailing list. The project relates to innovation and improvement in Business Intelligence for systems based upon PostgreSQL in the range of 10-100TB.

Our work will span the 9.5 and 9.6 cycles. We're looking to make measurable improvements in a number of cases; one of those is TPC-H, since it's a publicly accessible benchmark, another is a more private benchmark on healthcare data. In brief, this means speeding up the performance of large queries, data loading and looking at very large systems issues.

Some of the areas of R&D are definitely on the roadmap, others are more flexible. Some of this is in progress, other stuff is not even at the design stage - yet, just a few paragraphs along the lines of "we will look at these topics". If we have room, it's possible we may accommodate other topics; this is not carte blanche, but the reason for posting here is so people know we will take input, following the normal community process. Detailed in-person discussions at PGCon are expected and the Wiki pages will be updated for each aspect.

BI-related Indexing
* MinMax indexes
* Bitmap indexes

Large Systems
* Freeze avoidance
* Storage management issues for very large systems

Storage Efficiency
* Compression
* Column Orientation

Optimisation
* Bulk loading speed improvements
* Bulk FK evaluation
* Executor tuning for very large queries

Query tuning
* Approximate queries, sampling
* Materialized Views

...and possibly some other aspects.

2ndQuadrant is also assisting other researchers on GPU and FPGA topics, which may also yield work of interest to the PostgreSQL project.

Couple of points: The project is time limited, so if work gets pushed back beyond that then we'll lose the opportunity to contribute. Please support our work with timely objections, assistance in defining the path forwards and limiting the scope to something that avoids wasting this opportunity. Further funding is possible if we don't squander this. We are being funded to make best efforts to contribute to open source PostgreSQL, not pay-for-commit.

AXLE is funded by the EU under FP7 Grant Agreement 318633. Further details are available here http://www.axleproject.eu/

(There are also other 2ndQuadrant development projects in progress, this is just one of the larger ones).

Best Regards
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 04/21/2014 03:41 PM, Simon Riggs wrote:
> Storage Efficiency
> * Compression
> * Column Orientation

You might look at turning this:

http://citusdata.github.io/cstore_fdw/

... into a more integrated part of Postgres.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
what about runtime code generation using LLVM?
Jov
blog: http://amutu.com/blog
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 22 April 2014 00:24, Josh Berkus <josh@agliodbs.com> wrote:
> On 04/21/2014 03:41 PM, Simon Riggs wrote:
>> Storage Efficiency
>> * Compression
>> * Column Orientation
>
> You might look at turning this:
>
> http://citusdata.github.io/cstore_fdw/
>
> ... into a more integrated part of Postgres.

Of course I'm aware of that work - credit to them. Certainly, many people feel that it is now time to do as you suggest and include column store features within PostgreSQL.

As to turning it into a more integrated part of Postgres, we have a few problems there:

1. cstore_fdw code has an incompatible licence

2. I don't think FDWs are the right place for complex new architectures such as column store, massively parallel processing or sharding. The fact that it is probably the best place to implement it in user space doesn't mean it transfers well into core code. That's a shame and I don't know what to do about it, because it would be nice to simply ask for change of licence and then integrate it, but it seems more work than that (to me).

cstore_fdw uses ORC, which interestingly stores "lightweight index" values that look exactly like MinMax indexes, so at least PostgreSQL should be getting that soon.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
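The "lightweight index" / MinMax idea mentioned in the message above can be pictured with a small toy sketch. This is only an illustration of the general technique (keep a min and max per block of rows and skip blocks whose range cannot match); it is not the ORC format or any PostgreSQL code, and the names `build_minmax`, `range_scan` and `BLOCK_SIZE` are made up for the example.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # hypothetical number of rows per block


def build_minmax(values: List[int], block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return one (min, max) summary per block of rows."""
    summaries = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        summaries.append((min(block), max(block)))
    return summaries


def range_scan(values: List[int], summaries: List[Tuple[int, int]],
               lo: int, hi: int, block_size: int = BLOCK_SIZE) -> Tuple[List[int], int]:
    """Return rows with lo <= value <= hi and the number of blocks actually read."""
    hits: List[int] = []
    blocks_read = 0
    for block_no, (bmin, bmax) in enumerate(summaries):
        if bmax < lo or bmin > hi:
            continue  # the summary proves no row in this block can match
        blocks_read += 1
        start = block_no * block_size
        hits.extend(v for v in values[start:start + block_size] if lo <= v <= hi)
    return hits, blocks_read


# Values that are roughly ordered on disk (e.g. timestamps) benefit most.
column = [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]
summaries = build_minmax(column)
rows, blocks_read = range_scan(column, summaries, 11, 12)
```

In this toy data set the range query touches one block out of three. The summaries stay tiny compared with a full index, which is presumably why the approach is attractive at the 10-100TB scale discussed in this thread.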
On 22 April 2014 10:42, Jov <amutu@amutu.com> wrote:
> what about runtime code generation using LLVM?
> http://blog.cloudera.com/blog/2013/02/inside-cloudera-impala-runtime-code-generation/
> http://llvm.org/devmtg/2013-11/slides/Wanderman-Milne-Cloudera.pdf

Those techniques have been in use for at least 20 years on various platforms. The main issue PostgreSQL faces is supporting many platforms and compilers, while at the same time supporting extensible data types. I believe there is some research work into run-time compilation in progress, but that seems unlikely to make it into Postgres core.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 04/22/2014 01:24 AM, Josh Berkus wrote:
> On 04/21/2014 03:41 PM, Simon Riggs wrote:
>> Storage Efficiency
>> * Compression
>> * Column Orientation
> You might look at turning this:
>
> http://citusdata.github.io/cstore_fdw/
>
> ... into a more integrated part of Postgres.

What would be of more general usefulness is probably better planning and better performance of the FDW interface.

So instead of integrating one specific FDW it would make sense to improve PostgreSQL so that it can use (properly written) FDWs at native speeds.

Regards

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
From: "Simon Riggs" <simon@2ndQuadrant.com>
> Some of areas of R&D are definitely on the roadmap, others are more
> flexible. Some of this is in progress, other stuff is not even at the
> design stage - yet, just a few paragraphs along the lines of "we will
> look at these topics". If we have room, its possible we may
> accommodate other topics; this is not carte blanche, but the reason
> for posting here is so people know we will take input, following the
> normal community process. Detailed in-person discussions at PGCon are
> expected and the Wiki pages will be updated for each aspect.
>
> BI-related Indexing
> * MinMax indexes
> * Bitmap indexes
>
> Large Systems
> * Freeze avoidance
> * Storage management issues for very large systems
>
> Storage Efficiency
> * Compression
> * Column Orientation
>
> Optimisation
> * Bulk loading speed improvements
> * Bulk FK evaluation
> * Executor tuning for very large queries
>
> Query tuning
> * Approximate queries, sampling
> * Materialized Views

Great! I'm looking forward to seeing PostgreSQL evolve as an analytics database for data warehousing. Is there any reason why in-memory database and MPP is not included?

Are you planning to include the above features in 9.5 and 9.6? Are you recommending other developers not implement these features to avoid duplication of work with AXLE?

Regards
MauMau
On 04/22/2014 02:04 PM, Simon Riggs wrote:
> On 22 April 2014 00:24, Josh Berkus <josh@agliodbs.com> wrote:
>> On 04/21/2014 03:41 PM, Simon Riggs wrote:
>>> Storage Efficiency
>>> * Compression
>>> * Column Orientation
>> You might look at turning this:
>>
>> http://citusdata.github.io/cstore_fdw/
>>
>> ... into a more integrated part of Postgres.
> Of course I'm aware of that work - credit to them. Certainly, many
> people feel that it is now time to do as you suggest and include
> column store features within PostgreSQL.
>
> As to turning it into a more integrated part of Postgres, we have a
> few problems there
>
> 1. cstore_fdw code has an incompatible licence
>
> 2. I don't think FDWs are the right place for complex new
> architectures such as column store, massively parallel processing or
> sharding.

I agree that FDW is not an end-all solution for all these, but it is a reasonable starting point, and it just might be that the extra things needed could be added to our FDW API instead of sewing it directly into backend guts.

I recently tried to implement sharding at FDW level and the main problem I ran into was a missing join type for efficiently using it for certain queries.

The specific use case was queries of the form

    select l.*, r.*
      from remotetable r
      join localtable l on l.key1 = r.id and l.n = N;

PostgreSQL offered only two options:

1) full scan on remote table
2) single id=$ selects

Neither of these is what is actually needed: the first performs badly if there are more than a few rows in the remote table, and the second performs badly if l.n = N returns more than a few rows.

When I manually rewrote the query to

    select l.*, r.*
      from remotetable r
      join localtable l on l.key1 = r.id and l.n = N
     where r.id = ANY(ARRAY(select key1 from localtable where n = N));

it ran really well. Unfortunately this is not something that PostgreSQL considers by itself while optimising.
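The difference between the two strategies the planner offered and the manual rewrite can be sketched with a small simulation. Everything here is hypothetical: the dictionaries stand in for `remotetable` and `localtable`, and the functions only count round trips and rows shipped; nothing below calls PostgreSQL or the FDW API.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the two tables in the example above.
remote_table: Dict[int, str] = {i: f"remote-{i}" for i in range(1000)}  # id -> payload
local_table: List[Tuple[int, int]] = [(i, i % 10) for i in range(0, 1000, 5)]  # (key1, n)

Result = Tuple[List[Tuple[int, str]], int, int]  # (rows, round trips, remote rows shipped)


def full_scan_join(n: int) -> Result:
    """Option 1: fetch every remote row in one request, then join locally."""
    fetched = list(remote_table.items())
    wanted = {key1 for key1, local_n in local_table if local_n == n}
    rows = [(rid, payload) for rid, payload in fetched if rid in wanted]
    return rows, 1, len(fetched)


def per_key_join(n: int) -> Result:
    """Option 2: one remote 'id = $1' lookup per matching local row."""
    rows = []
    round_trips = 0
    for key1, local_n in local_table:
        if local_n == n:
            round_trips += 1
            if key1 in remote_table:
                rows.append((key1, remote_table[key1]))
    return rows, round_trips, len(rows)


def batched_join(n: int) -> Result:
    """Manual rewrite: gather the keys first, then send one 'id = ANY(keys)' request."""
    keys = [key1 for key1, local_n in local_table if local_n == n]
    rows = [(k, remote_table[k]) for k in keys if k in remote_table]
    return rows, 1, len(rows)


scan_rows, scan_trips, scan_shipped = full_scan_join(5)
key_rows, key_trips, key_shipped = per_key_join(5)
batch_rows, batch_trips, batch_shipped = batched_join(5)
```

All three return the same rows, but the full scan ships every remote row, the per-key variant pays one round trip per local match, and the batched variant needs a single round trip that ships only the matching rows.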
BTW, this kind of optimisation should also be a win for really large IN queries if we could have an indexed IN which would not start each lookup from the index root, but rather would sort the IN contents and do an index merge via skipping from the current position.

Cheers
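A minimal sketch of that "sort the IN list and skip forward" idea, assuming a plain sorted Python list as a stand-in for the index (a real B-tree would skip between leaf pages instead). The function name `in_lookup_merge` is invented for this illustration and does not correspond to anything in PostgreSQL.

```python
from bisect import bisect_left
from typing import List


def in_lookup_merge(index_keys: List[int], in_values: List[int]) -> List[int]:
    """Return the IN-list values present in a sorted index, scanning forward only."""
    found = []
    pos = 0  # current position in the index; never moves backwards
    for value in sorted(set(in_values)):
        # Search only the part of the index at or after the current position,
        # instead of restarting every lookup from the beginning (the "root").
        pos = bisect_left(index_keys, value, pos)
        if pos == len(index_keys):
            break  # every remaining IN value is larger than the last key
        if index_keys[pos] == value:
            found.append(value)
    return found


index_keys = list(range(0, 100, 3))  # sorted keys: 0, 3, 6, ..., 99
matches = in_lookup_merge(index_keys, [42, 7, 99, 3, 42, 1000])
```

Because both inputs are visited in sorted order, each lookup resumes where the previous one stopped, so the search range shrinks as the IN list is consumed.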
On 04/22/2014 08:15 AM, MauMau wrote:
> Are you planning to include the above features in 9.5 and 9.6? Are you
> recommending other developers not implement these features to avoid
> duplication of work with AXLE?

Without pointing any fingers, I should note that I have learned the hard way to take such recommendations with a grain of salt. More than once I have been stopped from working on something because someone else said they were, only for nothing to appear, and in the interests of full disclosure I can think of two significant instances when I have been similarly guilty, although the most serious of those has since been rectified by someone else.

cheers

andrew
On 04/22/2014 08:04 AM, Simon Riggs wrote:
> On 22 April 2014 00:24, Josh Berkus <josh@agliodbs.com> wrote:
>> On 04/21/2014 03:41 PM, Simon Riggs wrote:
>>> Storage Efficiency
>>> * Compression
>>> * Column Orientation
>> You might look at turning this:
>>
>> http://citusdata.github.io/cstore_fdw/
>>
>> ... into a more integrated part of Postgres.
> Of course I'm aware of that work - credit to them. Certainly, many
> people feel that it is now time to do as you suggest and include
> column store features within PostgreSQL.
>
> As to turning it into a more integrated part of Postgres, we have a
> few problems there
>
> 1. cstore_fdw code has an incompatible licence
>
> 2. I don't think FDWs are the right place for complex new
> architectures such as column store, massively parallel processing or
> sharding. The fact that it is probably the best place to implement it
> in user space doesn't mean it transfers well into core code. That's a
> shame and I don't know what to do about it, because it would be nice
> to simply ask for change of licence and then integrate it, but it
> seems more work than that (to me).

I agree, and indeed that was something like my first reaction to hearing about this development - FDW seems like a very odd way to handle this. But the notion of builtin columnar storage suggests to me that we really need first to tackle how various storage engines might be incorporated into Postgres. I know this has been a bugbear for many years, but maybe now with serious proposals for alternative storage engines on the horizon we can no longer afford to put off the evil day when we grapple with it.

cheers

andrew
On 22 April 2014 13:15, MauMau <maumau307@gmail.com> wrote:
> Great! I'm looking forward to seeing PostgreSQL evolve as an analytics
> database for data warehousing. Is there any reason why in-memory database
> and MPP is not included?

Those ideas are valid; the features are bounded by resource constraints of time and money, as well as by the technical skills/capacities of my fellow developers.

My analysis has been that implementing parallelism has a lower benefit/cost ratio than other features, as well as requiring more expensive servers (for MPP). I expect MPP to be an eventual end goal from the BDR project.

> Are you planning to include the above features in 9.5 and 9.6?

Yes.

> Are you
> recommending other developers not implement these features to avoid
> duplication of work with AXLE?

This was more to draw attention to the work so that all interested parties can participate in producing something useful.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Andrew Dunstan (andrew@dunslane.net) wrote:
> I agree, and indeed that was something like my first reaction to
> hearing about this development - FDW seems like a very odd way to
> handle this. But the notion of builtin columnar storage suggests to
> me that we really need first to tackle how various storage engines
> might be incorporated into Postgres. I know this has been a bugbear
> for many years, but maybe now with serious proposals for alternative
> storage engines on the horizon we can no longer afford to put off
> the evil day when we grapple with it.

Agreed, and it goes beyond just columnar stores - I could see IOTs being implemented using this notion of a different 'storage engine', but calling it a 'storage engine' makes it sound like we want to change how we access files, and I don't think we really want to change that but rather come up with a way to have an alternative heap. Columnar or IOTs would still be page-based and go through shared buffers, etc., I'd think.

Thanks,

Stephen
On 04/22/2014 06:39 AM, Andrew Dunstan wrote:
> I agree, and indeed that was something like my first reaction to hearing
> about this development - FDW seems like a very odd way to handle this.
> But the notion of builtin columnar storage suggests to me that we really
> need first to tackle how various storage engines might be incorporated
> into Postgres. I know this has been a bugbear for many years, but maybe
> now with serious proposals for alternative storage engines on the
> horizon we can no longer afford to put off the evil day when we grapple
> with it.

Yes. *IF* PostgreSQL already supported alternate storage, then the Citus folks might have released their CStore as a storage plugin instead of an FDW. However, if they'd waited for pluggable storage, they'd still be waiting.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
2014-04-22 19:02 GMT+02:00 Josh Berkus <josh@agliodbs.com>:
On 04/22/2014 06:39 AM, Andrew Dunstan wrote:
> I agree, and indeed that was something like my first reaction to hearing
> about this development - FDW seems like a very odd way to handle this.
> But the notion of builtin columnar storage suggests to me that we really
> need first to tackle how various storage engines might be incorporated
> into Postgres. I know this has been a bugbear for many years, but maybe
> now with serious proposals for alternative storage engines on the
> horizon we can no longer afford to put off the evil day when we grapple
> with it.
Yes. *IF* PostgreSQL already supported alternate storage, then the
Citus folks might have released their CStore as a storage plugin instead
of an FDW. However, if they'd waited for pluggable storage, they'd
still be waiting.
I am sceptical. From what I know about OLAP column-store databases, they need a very different planner, so a new engine or storage layer alone is not enough. VectorWise has been trying to merge Ingres with the Monet engine for more than four years and still has some issues.
Our extensibility is probably a major barrier to fast OLAP. The most realistic way forward that I see is to support better partitioning and to move in the direction of higher parallelism and distribution, and maybe map/reduce support.
At GoodData we successfully use Postgres for BI projects up to 20G with fast response times. The biggest pain points are the missing MERGE, the missing fault-tolerant COPY, I/O-expensive updates of large tables with many indexes, and the lack of simple massive partitioning. On the other hand, Postgres works perfectly on thousands of databases with thousands of tables, without errors, and is terribly simple to deploy in a cloud environment.
Regards
Pavel