Thread: Fragmentation project
Hello people, I'm thinking in project and implement "data fragmentation" based on catalogs inside pgcluster as my university final project. I would like suggestions and would be happy if anyone help me to define it. Sorry if this is the incorrect list...I had no answer on "pgcluster-general"... Thanks, Gustavo.
Gustavo Tonini wrote: > I'm thinking in project and implement "data fragmentation" based on > catalogs inside > pgcluster as my university final project. I would like suggestions and > would be happy if anyone help me to define it. Sorry if this is the > incorrect list...I had no answer on "pgcluster-general"... This is the correct list. I'm not sure what you mean by data fragmentation, but we do support horizontal partitioning by table constraints. And toasting is like vertical partitioning. What exactly are you thinking of implementing? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Well, I'm thinking in define (maybe via SQL) a set of servers as a cluster and make the fragmentation rules based on "select clauses", storing this "configuration" in a specific catalog in global schema. For example: when a record is inserted in a server which not store this fragment (no rule matches), it will be automatically moved do the correct server (the server who matches that record), transparently to the user. This process involves transaction management and a lot more validations. A propose is: create partition <name> on <site> as <select clause> This create a replication in this site with the rules of the select clause. We can provide, for example, users create a foreign key in a site referencing a table who stored in other(s). In this case, the foreign key will be created in global schema but the validations of the local schema's would be managed by DRDBMS. These are my ideas. Is this supported today? Gustavo. P.S.: sorry by the English mistakes... On 4/19/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote: > Gustavo Tonini wrote: > > I'm thinking in project and implement "data fragmentation" based on > > catalogs inside > > pgcluster as my university final project. I would like suggestions and > > would be happy if anyone help me to define it. Sorry if this is the > > incorrect list...I had no answer on "pgcluster-general"... > > This is the correct list. > > I'm not sure what you mean by data fragmentation, but we do support > horizontal partitioning by table constraints. And toasting is like > vertical partitioning. > > What exactly are you thinking of implementing? > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com >
Gustavo Tonini wrote: > Well, I'm thinking in define (maybe via SQL) a set of servers as a > cluster and make the fragmentation rules based on "select clauses", > storing this "configuration" in a specific catalog in global schema. > For example: when a record is inserted in a server which not store > this fragment (no rule matches), it will be automatically moved do the > correct server (the server who matches that record), transparently to > the user. This process involves transaction management and a lot more > validations. > > A propose is: > > create partition <name> on <site> as <select clause> > > This create a replication in this site with the rules of the select clause. > > We can provide, for example, users create a foreign key in a site > referencing a table who stored in other(s). In this case, the foreign > key will be created in global schema but the validations of the local > schema's would be managed by DRDBMS. > > These are my ideas. Is this supported today? Oh, you're talking about distributing partitions across different nodes and parallelizing queries. No, we don't do that today. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 4/23/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote: > Gustavo Tonini wrote: > > Well, I'm thinking in define (maybe via SQL) a set of servers as a > > cluster and make the fragmentation rules based on "select clauses", > > storing this "configuration" in a specific catalog in global schema. > > For example: when a record is inserted in a server which not store > > this fragment (no rule matches), it will be automatically moved do the > > correct server (the server who matches that record), transparently to > > the user. This process involves transaction management and a lot more > > validations. > > > > A propose is: > > > > create partition <name> on <site> as <select clause> > > > > This create a replication in this site with the rules of the select clause. > > > > We can provide, for example, users create a foreign key in a site > > referencing a table who stored in other(s). In this case, the foreign > > key will be created in global schema but the validations of the local > > schema's would be managed by DRDBMS. > > > > These are my ideas. Is this supported today? > > Oh, you're talking about distributing partitions across different nodes > and parallelizing queries. No, we don't do that today. Yes.This is the goal. Well, I will try it. I'll send the project reports to this list. Comments will be valuable. Desire me good luck... Thanks, Gustavo.
Gustavo, > > Oh, you're talking about distributing partitions across different nodes > > and parallelizing queries. No, we don't do that today. > > Yes.This is the goal. Well, I will try it. I'll send the project > reports to this list. Comments will be valuable. Desire me good > luck... You might join/look at the PgPoolII project, which is working on parallel query amoung other things. -- Josh Berkus PostgreSQL @ Sun San Francisco
On 4/23/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote: > Oh, you're talking about distributing partitions across different nodes > and parallelizing queries. No, we don't do that today. PL/Proxy actually works like that, only in smaller scope - for function calls only. General solution that partitions free-form SQL will be non-trivial... -- marko
Josh, On 4/23/07, Josh Berkus <josh@agliodbs.com> wrote: > Gustavo, > > > > Oh, you're talking about distributing partitions across different nodes > > > and parallelizing queries. No, we don't do that today. > > > > Yes.This is the goal. Well, I will try it. I'll send the project > > reports to this list. Comments will be valuable. Desire me good > > luck... > > You might join/look at the PgPoolII project, which is working on parallel > query amoung other things. > The pgpool is an interesting approach to this, but I think that the funcionality of inserting a record at a backend which will be "redirectioned" to other and verifying deadlocks under network demands in acquiring locks on the referenced records/tables in several hosts. Then, IMO, this may be implemented inside dbms. How Marko wrote, this is a non-trivial solution... Really, It could be improved on pgpool to be a process coordinator, but will need some changes in backend too. This is a non trivial implementation, but there are several users waiting for an effective solution for data distributing in a cluster. These users actually buy commercial solutions or build themselves one. Gustavo.
Marko, On 4/24/07, Marko Kreen <markokr@gmail.com> wrote: > On 4/23/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote: > > Oh, you're talking about distributing partitions across different nodes > > and parallelizing queries. No, we don't do that today. > > PL/Proxy actually works like that, only in smaller scope - > for function calls only. > I think that proposed funcionalities cannot be implemented in a PL scope... Gustavo. > General solution that partitions free-form SQL > will be non-trivial... >
Gustavo, > The pgpool is an interesting approach to this, but I think that the > funcionality of inserting a record at a backend which will be > "redirectioned" to other and verifying deadlocks under network demands > in acquiring locks on the referenced records/tables in several hosts. > Then, IMO, this may be implemented inside dbms. How Marko wrote, this > is a non-trivial solution... > Really, It could be improved on pgpool to be a process coordinator, > but will need some changes in backend too. > This is a non trivial implementation, but there are several users > waiting for an effective solution for data distributing in a cluster. > These users actually buy commercial solutions or build themselves one. Yeah, I was just thinking that if you start from scratch you're not likely to get very far ... you might look at some of the existing partial solutions (pgPoolII, PostgreSQLForest, ExtenDB, etc.) and see if you can improve them. --Josh
On 4/25/07, Gustavo Tonini <gustavotonini@gmail.com> wrote: > On 4/24/07, Marko Kreen <markokr@gmail.com> wrote: > > On 4/23/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote: > > > Oh, you're talking about distributing partitions across different nodes > > > and parallelizing queries. No, we don't do that today. > > > > PL/Proxy actually works like that, only in smaller scope - > > for function calls only. > > > > I think that proposed funcionalities cannot be implemented in a PL scope... Oh, sure. PL/proxy just proves if you can adhere to specific coding-style - all db access goes via functions - you can solve the problem today, with simple tools. You dont need even PL/proxy for that, any PL that can do connections (plpython, plperl, ..) can be used for proxy functions. PL/proxy just makes it so much easier. Function based DB API can give additional benefits: - easy upgradeablility - easy to change db structure without apps knowing - easy to monitor/audit And ofcourse: - if database hits hardware limits, you can replace all functions with PLproxy functions and spread data over several partitions. I think most OLTP apps can be designed around db-functions, with OLAP it will be bit harder, but there it is less critical also. -- marko