Thread: Sponsoring enterprise features
Hi folks, Is there any pre-existing protocol for a company to pay for specific features to be added to PostgreSQL? I've gotten full executive buy-in to the idea that it would be far cheaper to sponsor and pay for people to develop the enterprise features we need in Postgres than to do an Oracle migration to get those same features (which would cost unholy amounts of money that we don't want to spend for our installation). All that said, I don't know if this is a feasible plan, or what the makeup is of the developers currently working on Postgres. As a practical matter, we do not have the time or people to take on this project in-house. Our company is interested in sponsoring a push to get enterprise-level scalability features into PostgreSQL, things like partitioning and organized heaps. As a practical business matter, Oracle is an option but one of last resort that we (and I) would prefer to avoid if at all possible. We see an obvious long-term benefit in making Postgres do what we need it to do rather than buying gobs of Oracle licenses. Are other people/companies already doing this, either officially or unofficially, and what is the general protocol for going about doing this? Cheers, -James Rogers jamesr@best.com
Mr. Rogers, > Is there any pre-existing protocol for a company to pay for specific > features to be added to PostgreSQL? > Are other people/companies already doing this, either officially or > unofficially, and what is the general protocol for going about doing > this? Other companies are doing this, and we often depend on corporate support for new major features. The general approach is to hire or contract a major contributor to make the changes you want and win approval in the community for them. Probably a few companies and/or individuals have already contacted you. If not, you are welcome to contact me. -- -Josh Berkus Aglio Database Solutions San Francisco
On Tue, 2003-11-18 at 14:33, James Rogers wrote: > Hi folks, > > Is there any pre-existing protocol for a company to pay for specific > features to be added to PostgreSQL? There are several people who do this type of work (Neil, Joe, David, the folks at Command Prompt Inc., etc.). Personally, I think the best way is simply to make a post on -hackers with a description of what you want to accomplish with a call for estimates and proposals. Ensure you require the feature to be applied to the main code line so it will be maintained for future releases. I say a description of what you want to accomplish because certain features are not as useful on PostgreSQL as they are on other databases (data partitioning being one of them, due to the ability to use partial indexes) so you may not achieve what you are expecting.
Rod Taylor <pg@rbt.ca> writes: > Personally, I think the best way is simply to make a post on -hackers > with a description of what you want to accomplish with a call for > estimates and proposals. ... > I say a description of what you want to accomplish because certain > features are not as useful on PostgreSQL as they are other databases > (data partitioning being one of them, due to the ability to use partial > indexes) so you may not achieve what you are expecting. Right. You can in any case get a great deal of free advice by starting a pghackers discussion ;-) It should be noted that "because Oracle does it that way" is a guaranteed nonstarter as a rationale for any Postgres feature proposal. There are enough differences between Postgres and Oracle that you will need to do significant investigation before assuming that an Oracle- based feature design is appropriate for Postgres. Aside from technical differences, we have fundamentally different priorities --- one of which is simplicity of administration. You'll get no buyin on proposals that tend to create Oracle-like difficulties of installation and tuning. regards, tom lane
On Thu, 2003-11-20 at 22:20, Tom Lane wrote: > It should be noted that "because Oracle does it that way" is a > guaranteed nonstarter as a rationale for any Postgres feature proposal. A method of doing something is not a "feature"; making something possible that couldn't be done before is a "feature". I don't really care how Oracle does something, though I am cognizant of *why* Oracle does something. s/Oracle/DB2/, and little changes. > There are enough differences between Postgres and Oracle that you will > need to do significant investigation before assuming that an Oracle- > based feature design is appropriate for Postgres. Aside from technical > differences, we have fundamentally different priorities --- one of which > is simplicity of administration. You'll get no buyin on proposals that > tend to create Oracle-like difficulties of installation and tuning. I'm not sure what Oracle has to do with any of this. If I wanted to use Oracle, I would buy Oracle. The thing is, I'm intimately familiar with Oracle and there are a lot of things I despise about Oracle as a consequence of this familiarity. The features I'm talking about can be added to any reasonable database engine, and are generically supported features (or "enterprise" add-ons) in virtually all large commercial databases. As I stated previously, I/we are interested in adding features for managing very large tables and working sets, and making Postgres scale in general for these kinds of databases (currently, it does not). These kinds of features will be important for enterprise users, particularly ones interested in migrating from Oracle/DB2/SQLServer/etc, and would be invisible to people that don't need them. This is a matter of adding important functionality that can be supported by any reasonable database engine. In a nutshell, the features on my short list are all about heap management (e.g. partitioning). 
This is really important when databases reach a certain size, but something for which Postgres has almost no support. From a large-scale enterprise database standpoint, heap management is almost as important a capability as replication. Replication is being aggressively worked on; heap management is not, and so we are interested in making sure this part gets developed. When PostgreSQL has this, there will be little reason for anyone to use the big commercial database-du-jour. I don't care how it's implemented specifically, just as long as it is in there, and there is no technical reason that it couldn't be implemented per previous discussions. I've gotten the green light (and many responses from people interested in doing it) to start writing up RFQs for specific features, which I will post to the pg-hackers list. It is all stuff previously determined to be doable within the current PostgreSQL framework, and just requiring some work that my company is willing to help pay for. Cheers, -James Rogers jamesr@best.com
James, > I'm not sure what Oracle has to do with any of this. If I wanted to use > Oracle, I would buy Oracle. Good. Your original post appeared to propose carbon-copying a number of features from Oracle -- I didn't necessarily read it that way, but several other people did, including some of the developers you will want to recruit. You are not the first person to put forward some of these ideas, and your predecessors tended to follow the line "but if Oracle does it that way, it must be good." So it's a knee-jerk reaction thing. AFAIK, Tom was just warning you that your feature proposals need to be backed by arguments/evidence that they will actually improve performance or expand capabilities for PostgreSQL in some meaningful way. You seem prepared to do that, so we shouldn't have any disagreements in that way. > In a nutshell, the features on my short list are all about heap > management (e.g. partitioning). This is really important when databases > reach a certain size, but something for which Postgres has almost no > support. heap management == table partitioning? I'm a little unclear, personally, about what can be accomplished through table partitioning that we can't currently do through partial indexes and inherited tables, especially after Gavin finishes his tablespaces patch (btw, Gavin could use sponsorship on that one, I think). Can you make your case to me/the list? So far, the only arguments we've gotten on this list have been of the "Oracle does it that way" variety so it'd be interesting to see something concrete. Now, query partitioning is something I think everyone is interested in, and would very much like to see someone implement. > I've gotten the green light (and many responses from people interested > in doing it) to start writing up RFQs for specific features, which I > will post to the pg-hackers list. 
It is all stuff previously determined > to be doable within the current PostgreSQL framework, and just requiring > some work that my company is willing to help pay for. Cool. I look forward to seeing the fruits of this effort. From my perspective, the other "oracle-killer" that we're missing includes some isolated but difficult improvements in the Query Planner necessary to pass the TPC benchmarks. I'd be happy to discuss those as well, if you like. -- Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > I'm a little unclear, personally, about what can be accomplished through table > partitioning that we can't currently do through partial indexes and inherited > tables, especially after Gavin finishes his tablespaces patch (btw, Gavin > could use sponsorship on that one, I think). Can you make your case to > me/the list? So far, the only arguments we've gotten on this list have been > of the "Oracle does it that way" variety so it'd be interesting to see > something concrete. Well probably everyone who wants it is saying things of the form "they were useful with Oracle because...". Which isn't the same thing as "Oracle does it this way". I don't particularly care how partitioned tables are *implemented*, only the net effect. You can think of them as an abstraction over inherited tables that let the database guarantee your data integrity and offer query optimizations in a way it cannot if you build it by hand. I know for us they were an absolute godsend. The main advantages over a single monolithic table even with partial indexes are: 1) Being able to load and unload parts of the table quickly. Adding and removing a partition is basically a DDL operation, not DML. It doesn't have to visit every tuple and mark it deleted or added. It just has to add or remove the entire partition to the structure. Partitioned tables are frequently used for aging out old data. The common example is of having a partition per month and keeping 3-12 months of data. We had a more extreme case where we had one partition per day and kept 21 days of data. When we implemented partitioned tables the time to archive and delete the old data went from taking most of the night and killing production performance to effectively instantaneous and we were able to run it at peak time. 2) Being able to do a sequential scan of a partition. Sequential scans are faster than index scans. Sometimes much faster. 
Partial indexes are nice but when they cover 10-20% of your table scanning them is much slower than a sequential scan of a partition. As for inherited tables. Well, I would expect a partitioned tables scheme to be implemented using inherited tables or just using views. You could jury-rig it today using these tools, it would just be very awkward and fragile. The original Oracle implementation in Oracle 7 was implemented much the same way using views. They were a complete hack and required lots of manual tweaking though. The point of partitioned tables is a) The database ensures tuples go into the correct partition. If you used a manually constructed view or inherited tables you would always run the risk of inserting into the wrong partition which would break your data integrity. b) The database automatically optimizes queries to query the correct partitions. It detects clauses in the query much like partial indexes so you don't have to tweak every query by hand and the database can skip clauses that match the partition clause exactly. Also this is a prime opportunity for the database to introduce parallel queries because each partition can be accessed independently. -- greg
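The two behaviours described above (the table itself routing each tuple to its partition, and old data being aged out by dropping whole partitions rather than deleting row by row) can be sketched in a few lines of Python. This is only a toy in-memory model under stated assumptions: the class name DailyPartitionedTable and its methods are hypothetical, nothing here is PostgreSQL or Oracle code, and a dict of lists stands in for on-disk partitions.

```python
from datetime import date, timedelta


class DailyPartitionedTable:
    """Toy model (hypothetical): one list of rows per day, keyed by date."""

    def __init__(self):
        self.partitions = {}  # date -> list of (date, payload) rows

    def insert(self, day, payload):
        # The table, not the caller, chooses the partition from the key,
        # so a row cannot end up in the wrong partition.
        self.partitions.setdefault(day, []).append((day, payload))

    def drop_older_than(self, cutoff):
        # Aging out data removes whole partitions; no row is visited.
        old_days = [day for day in self.partitions if day < cutoff]
        for day in old_days:
            del self.partitions[day]
        return len(old_days)

    def scan(self, first_day, last_day):
        # Sequentially scan only the partitions inside the requested range.
        for day in sorted(self.partitions):
            if first_day <= day <= last_day:
                yield from self.partitions[day]


# Load 30 days of data, three rows per day.
table = DailyPartitionedTable()
start = date(2003, 11, 1)
for offset in range(30):
    for n in range(3):
        table.insert(start + timedelta(days=offset), f"event-{n}")

# Keep the newest 21 days by dropping everything before day 9.
dropped = table.drop_older_than(start + timedelta(days=9))
rows = list(table.scan(start + timedelta(days=9), start + timedelta(days=10)))
```

The cost of drop_older_than grows with the number of partitions, not the number of rows, which is the DDL-versus-DML distinction made in point 1.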
On Sat, Nov 22, 2003 at 11:54:45AM -0800, Josh Berkus wrote: > > In a nutshell, the features on my short list are all about heap > > management (e.g. partitioning). This is really important when databases > > reach a certain size, but something for which Postgres has almost no > > support. > > heap management == table partitioning? It was just an example. Another example could be a B-Tree sorted heap (like an index, but with the whole tuple at the leaf pages instead of a pointer to the heap). There was a thread about this some time ago. It seemed a good idea ... -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "To have more, one must desire less"
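To make the "whole tuple at the leaf" idea concrete, here is a rough Python sketch. It is an assumption-laden toy: a flat sorted list stands in for the B-Tree leaf level, and the SortedHeap class and its methods are hypothetical names, not anything from the PostgreSQL source. The point is only that a key-range scan reads one contiguous run of full rows, with no second lookup into a separate heap.

```python
import bisect


class SortedHeap:
    """Toy sorted heap (hypothetical): rows are stored in key order."""

    def __init__(self):
        self._keys = []  # sorted keys, parallel to self._rows
        self._rows = []  # full tuples stored in place, in key order

    def insert(self, key, row):
        # Find the slot that keeps both lists sorted by key.
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._rows.insert(pos, row)

    def range_scan(self, low, high):
        # Rows with low <= key <= high form one contiguous slice.
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return self._rows[start:stop]


heap = SortedHeap()
for key in [42, 7, 19, 3, 25]:
    heap.insert(key, {"id": key, "name": f"row-{key}"})

found = [row["id"] for row in heap.range_scan(5, 25)]
```

A real implementation would split the leaf level into pages and keep internal nodes above it; the list insert here is linear and is used only to keep the sketch short.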
Main needs partitioning is useful for:
- partition elimination for queries (e.g. seq scans only scan relevant partitions)
- deleting/detaching huge parts of a table in seconds
- attaching huge parts to a table in seconds (that may have been loaded with a fast loading utility (e.g. loading without index, prebuilding indexes, attaching table + prebuilt partitioned indexes))
- achieving [heap and index] per page data locality (for better cache rates)
- allowing partial restores (for defect disks) while the rest of the db is still online
- in pg, allowing partial vacuums (only partitions that see changes)

People needing those features usually have data with more than 10-50 Gb per partition.

> I'm a little unclear, personally, about what can be accomplished through table
> partitioning that we can't currently do through partial indexes and inherited
> tables, especially after Gavin finishes his tablespaces patch

Well, sure the goal needs to be to make use of what already exists, but a few things are still missing, e.g.:
- unique indexes, that span the hierarchy (and do not contain the partitioning column[s])
- partition elimination (imho we should use check constraints for that)
- physical backups :-)
- tablespaces ?

Note, that these would all be useful for non partitioning use cases also.

Andreas
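As an illustration of the suggestion to drive partition elimination from check constraints, the following Python sketch models each partition's constraint as a half-open range on the partitioning column and keeps only the partitions whose range can overlap the query's range. Everything here is hypothetical (the Partition class, the eliminate function, the table names and the integer date encoding); it shows the overlap test only and says nothing about how a real planner would implement it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    low: int   # inclusive lower bound from the (hypothetical) check constraint
    high: int  # exclusive upper bound from the same constraint


def eliminate(partitions, query_low, query_high):
    """Keep partitions whose range overlaps the query range [query_low, query_high)."""
    return [p for p in partitions if p.low < query_high and query_low < p.high]


# Dates encoded as YYYYMMDD integers, one partition per month.
partitions = [
    Partition("sales_2003_09", 20030901, 20031001),
    Partition("sales_2003_10", 20031001, 20031101),
    Partition("sales_2003_11", 20031101, 20031201),
]

# A query for 15 Oct up to (not including) 5 Nov needs only two partitions.
survivors = eliminate(partitions, 20031015, 20031105)
names = [p.name for p in survivors]
```

Any partition that fails the overlap test provably holds no matching rows, so it can be skipped without being scanned.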
Zeugswetter Andreas SB SD wrote on Mon, 24.11.2003 at 13:16: > Main needs partitioning is useful for: > - partition elimination for queries (e.g. seq scans only scan relevant partitions) > - deleting/detaching huge parts of a table in seconds > - attaching huge parts to a table in seconds (that may have been loaded with > a fast loading utility (e.g. loading without index, prebuilding indexes, > attaching table + prebuilt partitioned indexes)) > - achieving [heap and index] per page data locality (for better cache rates) > - allowing partial restores (for defect disks) while the rest of the db is still online > - in pg, allowing partial vacuums (only partitions that see changes) > > People needing those features usually have data with more than 10-50 Gb per > partition. > > > I'm a little unclear, personally, about what can be accomplished through table > > partitioning that we can't currently do through partial indexes and inherited > > tables, especially after Gavin finishes his tablespaces patch Partial indexes don't solve the basic clustering problem (need for reading too many pages). Using inherited tables for this somehow seems conceptually wrong, though this may solve the clustering problem. Also, indexes don't currently cover child tables, but this may be thought of as an implementation deficiency. But some mechanisms used for inheritance could likely be used as a basis for partitioning. > Well, sure the goal needs to be to make use of what already exists, > but a few things are still missing, e.g.: > - unique indexes, that span the hierarchy (and do not contain the partitioning column[s]) We kind of have these - a unique index can nicely be built over a table stored in several 1GB files. Now we must just make sure that seqscans know about the gaps if files are not full 1GB in size. BTW, does VACUUM FULL shrink individual files or will it move tuples between files when shrinking the storage ? 
Are interfaces to sparse files portable enough that we could use these ? > - partition elimination (imho we should use check constraints for that) or partial indexes or smart clustering which can make use of several partitions depending on the position of the tuple in the index(es) used for clustering. > - physical backups :-) Do you mean read-only partitions which can come and go? > - tablespaces ? Tablespaces should be an orthogonal feature, mainly about efficient storage. We would like to spread partitions of the same table over several tablespaces. > Note, that these would all be useful for non partitioning > use cases also. True. --------------- Hannu