Thread: Inheritance efficiency
Hi all. I'm wondering how efficient the inheritance can be. I'm using the constraint exclusion feature and for each child table (maybe but one) I have a proper CHECK constraint. How efficient can the query planner be in choosing the right child tables in the case of, say, thousands of them? Would the selection process behave linearly, logarithmically or what? And now it comes to my mind the same question for partial indexes. That is, if I had a lot (really a lot) of small partial indexes over a very large table, how efficient can the query planner be in selecting the right indexes? Will these answers change in v9? -- Vincenzo Romano NotOrAnd Information Technologies NON QVIETIS MARIBVS NAVTA PERITVS
Vincenzo Romano wrote: > Hi all. > > I'm wondering how efficient the inheritance can be. > I'm using the constraint exclusion feature and for each child table > (maybe but one) I have a proper CHECK constraint. > How efficient can the query planner be in choosing the right child > tables in the case of, say, thousands of them? > Would the selection process behave linearly, logarithmically or what? It is fine for dozens of child tables, but not thousands; it does need improvement. > And now it comes to my mind the same question for partial indexes. > That is, if I had a lot (really a lot) of small partial indexes over a > very large table, how efficient can the query planner be > in selecting the right indexes? > > Will these answers change in v9? No. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
2010/4/26 Bruce Momjian <bruce@momjian.us>: > Vincenzo Romano wrote: >> Hi all. >> >> I'm wondering how efficient the inheritance can be. >> I'm using the constraint exclusion feature and for each child table >> (maybe but one) I have a proper CHECK constraint. >> How efficient can the query planner be in choosing the right child >> tables in the case of, say, thousands of them? >> Would the selection process behave linearly, logarithmically or what? > > It is fine for dozens of child tables, but not thousands; it does need > improvement. This sounds like "linear" algorithms. Doesn't it? >> And now it comes to my mind the same question for partial indexes. >> That is, if I had a lot (really a lot) of small partial indexes over a >> very large table, how efficient can the query planner be >> in selecting the right indexes? No info about this point (partial indexes)? Is also this geared with linear algorithms ? -- Vincenzo Romano NotOrAnd Information Technologies NON QVIETIS MARIBVS NAVTA PERITVS
2010/4/26 Vincenzo Romano <vincenzo.romano@notorand.it>: > 2010/4/26 Bruce Momjian <bruce@momjian.us>: >> Vincenzo Romano wrote: >>> Hi all. >>> >>> I'm wondering how efficient the inheritance can be. >>> I'm using the constraint exclusion feature and for each child table >>> (maybe but one) I have a proper CHECK constraint. >>> How efficient can the query planner be in choosing the right child >>> tables in the case of, say, thousands of them? >>> Would the selection process behave linearly, logarithmically or what? >> >> It is fine for dozens of child tables, but not thousands; it does need >> improvement. > > This sounds like "linear" algorithms. Doesn't it? > >>> And now it comes to my mind the same question for partial indexes. >>> That is, if I had a lot (really a lot) of small partial indexes over a >>> very large table, how efficient can the query planner be >>> in selecting the right indexes? > > No info about this point (partial indexes)? > Is also this geared with linear algorithms ? Should I move to an "enterprise grade" version of PostgreSQL? -- Vincenzo Romano NotOrAnd Information Technologies NON QVIETIS MARIBVS NAVTA PERITVS
On Thu, Apr 29, 2010 at 11:29:36AM +0200, Vincenzo Romano wrote: > > No info about this point (partial indexes)? > > Is also this geared with linear algorithms ? > > Should I move to an "enterprise grade" version of PostgreSQL? The enterprise grade version of PostgreSQL is the community version. Proprietary forks exist, but they don't fix this kind of problem. :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
2010/4/30 David Fetter <david@fetter.org>: > On Thu, Apr 29, 2010 at 11:29:36AM +0200, Vincenzo Romano wrote: >> > No info about this point (partial indexes)? >> > Is also this geared with linear algorithms ? >> >> Should I move to an "enterprise grade" version of PostgreSQL? > > The enterprise grade version of PostgreSQL is the community version. > > Proprietary forks exist, but they don't fix this kind of problem. :) Hmmm ... I think this is the kind of problems that keeps PostgreSQL away from the "enterprise grade" world. The ability to cope with thousands of DB objects like (child-)tables, indexes, functions and so on with O(1) or at least O(log(n)) complexity is among the key points. For example, the Linux kernel made the big jump with server hardware thanks also to the O(1) schedulers. In this specific case, if you think about "inheritance for partitioning" and you stick with the example idea of "one partition per month", then the current solution is more than OK. In the real world, that is not really the general case, especially in the "enterprise grade" world, where maybe you partition with both a time stamp and another column, like product code ranges and prefixes ... Is there any planning about this improvement? > > Cheers, > David. > -- > David Fetter <david@fetter.org> http://fetter.org/ > Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter > Skype: davidfetter XMPP: david.fetter@gmail.com > iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate > -- Vincenzo Romano NotOrAnd Information Technologies cel. +39 339 8083886 | gtalk. vincenzo.romano@notorand.it fix. +39 0823 454163 | skype. notorand.it fax. +39 02 700506964 | msn. notorand.it NON QVIETIS MARIBVS NAVTA PERITVS
2010/4/30 Vincenzo Romano <vincenzo.romano@notorand.it>: > 2010/4/30 David Fetter <david@fetter.org>: >> On Thu, Apr 29, 2010 at 11:29:36AM +0200, Vincenzo Romano wrote: >>> > No info about this point (partial indexes)? >>> > Is also this geared with linear algorithms ? >>> >>> Should I move to an "enterprise grade" version of PostgreSQL? >> >> The enterprise grade version of PostgreSQL is the community version. >> >> Proprietary forks exist, but they don't fix this kind of problem. :) > > Hmmm ... I think this is the kind of problems that keeps PostgreSQL away > from the "enterprise grade" world. > The ability to cope with thousands of DB objects like (child-)tables, > indexes, functions and so on with > O(1) or at least O(log(n)) complexity is among the key points. > > For example, the Linux kernel made the big jump with server hardware > thanks also to the O(1) schedulers. > > In this specific case, if you think about "inheritance for > partitioning" and you stick with the example idea of "one partition > per month", then the current solution is more than OK. > In the real world, that is not really the general case, especially in > the "enterprise grade" world, where maybe you partition with both a > time stamp and another column, like product code ranges and prefixes > ... > > Is there any planning about this improvement? Could it be possible to just make some changes (adding indexes) to the information schema to gain this enterprise gradeness? -- Vincenzo Romano NotOrAnd Information Technologies NON QVIETIS MARIBVS NAVTA PERITVS
On Fri, Apr 30, 2010 at 08:44:26AM +0200, Vincenzo Romano wrote: > >>> Should I move to an "enterprise grade" version of PostgreSQL? > >> > >> The enterprise grade version of PostgreSQL is the community > >> version. > >> > >> Proprietary forks exist, but they don't fix this kind of problem. > >> :) > > > > Hmmm ... I think this is the kind of problems that keeps > > PostgreSQL away from the "enterprise grade" world. The ability to > > cope with thousands of DB objects like (child-)tables, indexes, > > functions and so on with O(1) or at least O(log(n)) complexity is > > among the key points. > > > > For example, the Linux kernel made the big jump with server > > hardware thanks also to the O(1) schedulers. > > > > In this specific case, if you think about "inheritance for > > partitioning" and you stick with the example idea of "one > > partition per month", then the current solution is more than OK. > > In the real world, that is not really the general case, especially > > in the "enterprise grade" world, where maybe you partition with > > both a time stamp and another column, like product code ranges and > > prefixes ... > > > > Is there any planning about this improvement? > > Could it be possible to just make some changes (adding indexes) to > the information schema to gain this enterprise gradeness? Your assertion that PostgreSQL is not "enterprise grade" is simply false. For years, it has been and continues to be used as the basis of extremely large mission-critical systems. That said, if you wish to make changes, or propose that some be made, please feel free to do so after 9.0 comes out. In the mean time, please test 9.0beta1 along with any ensuing betas and release candidates, and report back the results of the aforementioned testing. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Vincenzo Romano wrote: > In this specific case, if you think about "inheritance for > partitioning" and you stick with the example idea of "one partition > per month", then the current solution is more than OK. > In the real world, that is not really the general case, especially in > the "enterprise grade" world, where maybe you partition with both a > time stamp and another column, like product code ranges and prefixes > ... > > Is there any planning about this improvement? Of course. People is always looking to make improvements in many areas. There are very few things that people consider to be "more than OK". The partitioning features are among those being more examined for possibly improvements. This does *not* mean that PostgreSQL doesn't serve mission critical systems already, on enterprises large and small, some of them on very large systems. What you see in these lists (people describing "partition by month" schemes) are not necessarily the most complex setups out there. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2010/4/30 Alvaro Herrera <alvherre@commandprompt.com>: > Vincenzo Romano wrote: > >> In this specific case, if you think about "inheritance for >> partitioning" and you stick with the example idea of "one partition >> per month", then the current solution is more than OK. >> In the real world, that is not really the general case, especially in >> the "enterprise grade" world, where maybe you partition with both a >> time stamp and another column, like product code ranges and prefixes >> ... >> >> Is there any planning about this improvement? > > Of course. People is always looking to make improvements in many areas. > There are very few things that people consider to be "more than OK". > The partitioning features are among those being more examined for > possibly improvements. > > This does *not* mean that PostgreSQL doesn't serve mission critical > systems already, on enterprises large and small, some of them on very > large systems. What you see in these lists (people describing > "partition by month" schemes) are not necessarily the most complex > setups out there. Hi. I've nerver meant to say that PG is not mission critical! I argued that O(n) stuff will keep it away from "enterprise grade" applications. I've been told earlier that "It is fine for dozens of child tables, but not thousands; it does need improvement." This is not enterprise grade. And the same could go for (a large number of) partial indexes. Any idea here? Infact I have in mind also a different approach to partitioning which could be useful (under certain constraints, of course). Instead of partitioning the table itself, you can partition the indexes. The data can still be in a single table (for the sake of some FKs for example). Just the indexes get "partitioned"· But, of course, a lot depends on whether the selection of the right indexes (among thousands) is effective or not. > > -- > Alvaro Herrera http://www.CommandPrompt.com/ > PostgreSQL Replication, Consulting, Custom Development, 24x7 support > -- Vincenzo Romano NotOrAnd Information Technologies cel. +39 339 8083886 | gtalk. vincenzo.romano@notorand.it fix. +39 0823 454163 | skype. notorand.it fax. +39 02 700506964 | msn. notorand.it NON QVIETIS MARIBVS NAVTA PERITVS
Vincenzo Romano wrote: > This is not enterprise grade. "Enterprise grade" is nothing but a buzzword. Oh, it's also a moving target. We've been not enterprise grade for years, always one feature behind (and strangely, the one lacking feature is always the one of interest to the complainant). -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote: > Vincenzo Romano wrote: > > > This is not enterprise grade. > > "Enterprise grade" is nothing but a buzzword. Oh, it's also a moving > target. We've been not enterprise grade for years, always one feature > behind (and strangely, the one lacking feature is always the one of > interest to the complainant). We do have this enhancement coming in Postgres 9.0: Add an index on pg_inherits.inhparent, and use it to avoid seqscans in find_inheritance_children(). This is a complete no-op in databases without any inheritance. In databases where there are just a few entries in pg_inherits, it could conceivably be a small loss. However, in databases with many inheritance parents, it can be a big win. However, I don't think this going to help a lot for partitioning because the cost is mostly checking the CHECK constraints, not finding the table's children. Like all Postgres missing features, we just need someone with time to volunteer to research and fix it. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Fri, Apr 30, 2010 at 00:19, Vincenzo Romano <vincenzo.romano@notorand.it> wrote: > For example, the Linux kernel made the big jump with server hardware > thanks also to the O(1) schedulers. <flamebait> Uhh linux has not had a O(1) scheduler since 2.6.23, its supposedly O(log n) now. =) </flamebait>
Vincenzo Romano wrote: > I argued that O(n) stuff will keep it away from "enterprise grade" applications. > I've been told earlier that "It is fine for dozens of child tables, > but not thousands; > it does need improvement." > This is not enterprise grade Enterprise grade doesn't mean anything. Partitioning designs that require thousands of child tables to work right are fundamentally misdesigned anyway, so there is no reason for any of the contributors to the project to work on improving support for them. There are far too many obvious improvements that could be made to PostgreSQL, ones that will benefit vastly more people, to divert resources toward something you shouldn't be dong anyway like that. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
2010/5/1 Greg Smith <greg@2ndquadrant.com>: > Vincenzo Romano wrote: >> >> I argued that O(n) stuff will keep it away from "enterprise grade" >> applications. >> I've been told earlier that "It is fine for dozens of child tables, >> but not thousands; >> it does need improvement." >> This is not enterprise grade > > Enterprise grade doesn't mean anything. Partitioning designs that require > thousands of child tables to work right are fundamentally misdesigned > anyway, so there is no reason for any of the contributors to the project to > work on improving support for them. There are far too many obvious > improvements that could be made to PostgreSQL, ones that will benefit vastly > more people, to divert resources toward something you shouldn't be dong > anyway like that. While I can agree that "Enterprise grade" is a buzzword, it does mean something: "very large amount of data" among other. There's no "fundamentally good design", but only a design which takes limitations and constraints into account. I just say that sublinear algorithms allow better handling for growing numbers of objects. -- Vincenzo Romano NotOrAnd Information Technologies cel. +39 339 8083886 | gtalk. vincenzo.romano@notorand.it fix. +39 0823 454163 | skype. notorand.it fax. +39 02 700506964 | msn. notorand.it NON QVIETIS MARIBVS NAVTA PERITVS
Vincenzo Romano wrote: > While I can agree that "Enterprise grade" is a buzzword, it does mean > something: "very large amount of data" among other. > http://thedailywtf.com/Articles/Bitten_by_the_Enterprise_Bug.aspx It's quite straighforward to get PostgreSQL up and running with many terabytes of data, so long as you respect the design trade-offs in some options. What you can't do is say those are wrong and reject alternative implementation suggestions just because they're not "enterprise". Whenever anyone uses that word at me, I mentally replace it with "super duper", and > There's no "fundamentally good design", but only a design which takes > limitations and constraints into account. > You mean like taking into account the fact that partitioning performance has an unavoidable trade-off, where you have to balance the query optimizer overhead of supporting many partitions against the improvement from splitting data into smaller pieces? -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
2010/5/1 Greg Smith <greg@2ndquadrant.com>: > Vincenzo Romano wrote: >> >> While I can agree that "Enterprise grade" is a buzzword, it does mean >> something: "very large amount of data" among other. >> > > http://thedailywtf.com/Articles/Bitten_by_the_Enterprise_Bug.aspx > > It's quite straighforward to get PostgreSQL up and running with many > terabytes of data, so long as you respect the design trade-offs in some > options. What you can't do is say those are wrong and reject alternative > implementation suggestions just because they're not "enterprise". Whenever > anyone uses that word at me, I mentally replace it with "super duper", and > >> There's no "fundamentally good design", but only a design which takes >> limitations and constraints into account. >> > > You mean like taking into account the fact that partitioning performance has > an unavoidable trade-off, where you have to balance the query optimizer > overhead of supporting many partitions against the improvement from > splitting data into smaller pieces? Or taking into account that some parts of the engine are not scalable. Almost all current RDBMS can cope with terabytes. Almost none (if any) can cope with large number of partial indexes (provided they support them) or child tables or table level constraints and so on. This is a fact as far as I've read so far. Then we can discuss about the enterprise grade, the fault tolerance and whatever else buzzword pops up in our minds. These ones maybe not. -- Vincenzo Romano NotOrAnd Information Technologies NON QVIETIS MARIBVS NAVTA PERITVS
Greg Smith wrote: > Enterprise grade doesn't mean anything. Partitioning designs that > require thousands of child tables to work right are fundamentally > misdesigned anyway, so there is no reason for any of the contributors > to the project to work on improving support for them. There are far > too many obvious improvements that could be made to PostgreSQL, ones > that will benefit vastly more people, to divert resources toward > something you shouldn't be dong anyway like that. > my sql developer, who's been doing oracle for 15+ years, says postgres' partitioning is flawed from his perspective because if you have a prepared statement like.. SELECT fields FROM partitioned_table WHERE primarykey = $1; it doesn't optimize this very well and ends up looking at all the sub-table indicies. ir you instead execute the statement SELECT fields FROM parritioned_table WHERE primarykey = constant; he says the planner will go straight to the correct partition. i haven't confirmed this for myself.
2010/5/1 John R Pierce <pierce@hogranch.com>: > Greg Smith wrote: >> >> Enterprise grade doesn't mean anything. Partitioning designs that require >> thousands of child tables to work right are fundamentally misdesigned >> anyway, so there is no reason for any of the contributors to the project to >> work on improving support for them. There are far too many obvious >> improvements that could be made to PostgreSQL, ones that will benefit vastly >> more people, to divert resources toward something you shouldn't be dong >> anyway like that. >> > > my sql developer, who's been doing oracle for 15+ years, says postgres' > partitioning is flawed from his perspective because if you have a prepared > statement like.. > > SELECT fields FROM partitioned_table WHERE primarykey = $1; > > it doesn't optimize this very well and ends up looking at all the sub-table > indicies. ir you instead execute the statement > > SELECT fields FROM parritioned_table WHERE primarykey = constant; > > he says the planner will go straight to the correct partition. > > i haven't confirmed this for myself. It has nothing to do with partitionning but how the planner works. Even if the use case remain correct.... > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- Cédric Villemain
On 1 May 2010, at 5:33, John R Pierce wrote: > Greg Smith wrote: > my sql developer, who's been doing oracle for 15+ years, says postgres' partitioning is flawed from his perspective becauseif you have a prepared statement like.. > > SELECT fields FROM partitioned_table WHERE primarykey = $1; > > it doesn't optimize this very well and ends up looking at all the sub-table indicies. Yes it would, for a very logical reason. A prepared statement is nothing but a stored query plan - its benefits are mostly that you can skip the query planning stepbefore performing a query, which helps queries that are performed very frequently in a short time or that take a longtime planning. But skipping the query planner also has a drawback; the planner has to make a general assumption about what kind of datayou'll be querying. It can't vary the query plan depending on what data you're querying for. If someone is writing a query on a partitioned table and wants to rely on constraint exclusion and they're trying to usea prepared statement then they don't understand what prepared statements are. You could argue that some logic could be added to the handling of prepared statements to insert query-subplans dependingon what data you use for your parameters, but then you're moving back in the direction of unprepared statements(namely invoking the query planner). It would help cases like this one, but it would hurt all other prepared statements.It would at the least add a parse tree back into the queries path, which would be a fairly simplistic one in thecase of table partitioning, but would get fairly complex for prepared statements involving more parameters - so much sothat the benefit of using a prepared statement (not spending time planning the query) would get reduced significantly. It's possible that Oracle implemented something like this, but as you see it's not necessarily an improvement. In practice people either query the correct table partition directly or do not use a prepared statement. > ir you instead execute the statement > > SELECT fields FROM parritioned_table WHERE primarykey = constant; > > he says the planner will go straight to the correct partition. > > i haven't confirmed this for myself. Alban Hertroys -- Screwing up is an excellent way to attach something to the ceiling. !DSPAM:737,4bdc08fc10416246414315!
On 1 May 2010, at 12:56, Alban Hertroys wrote: > You could argue that some logic could be added to the handling of prepared statements to insert query-subplans dependingon what data you use for your parameters, but then you're moving back in the direction of unprepared statements(namely invoking the query planner). It would help cases like this one, but it would hurt all other prepared statements.It would at the least add a parse tree back into the queries path, which would be a fairly simplistic one in thecase of table partitioning, but would get fairly complex for prepared statements involving more parameters - so much sothat the benefit of using a prepared statement (not spending time planning the query) would get reduced significantly. And of course it would add time for planning the query-tree to the creation of the prepared statement - which could be significantcompared to the time people expect to save by not invoking the planner on later invocations of the same query.That said, the more frequent the query is executed the less that hurts performance, while it doesn't really matterfor queries that are executed infrequently. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4bdc0ba010411331128920!