Thread: Dynamic data model, locks and performance
Hello people of the PostgreSQL world!
I am wondering if PostgreSQL would be a great choice for my database needs.
I would like to create a db with a dynamic data model. I would like to be able to add tables and columns to existing tables while other queries are running.
Will PostgreSQL be able to manage the locks correctly? Am I pushing PostgreSQL too far by wanting to do something like this? If so, what do you suggest instead?
Also, I would like to be able to have millions of tables in my database. As far as I know, PostgreSQL uses one file per table. So is there a performance problem related to mapping that many files onto the file system? Is PostgreSQL able to handle this kind of load?
--
A+
-------------
Pierre
On 29/07/10 07:06, Pierre Thibault wrote:
> Hello people of the PostgreSQL world!
>
> I am wondering if PostgreSQL would be a great choice for my database needs.
>
> I would like to create a db with a dynamic data model. I would like to be
> able to add tables and columns to existing tables while other queries
> are running.
>
> Will PostgreSQL be able to manage the locks correctly? Am I pushing
> PostgreSQL too far by wanting to do something like this? If so, what do
> you suggest instead?
>
> Also, I would like to be able to have millions of tables in my database.
> As far as I know, PostgreSQL uses one file per table. So is there a
> performance problem related to mapping that many files onto the file
> system? Is PostgreSQL able to handle this kind of load?

From what you say, I really doubt that PostgreSQL will be well suited to your needs. You might be able to make it work, but I doubt it'll be great. At heart Pg is a relational database, and you want to do things like dynamic table structures that relational databases are typically not great at.

People here might be able to suggest alternate approaches if you can explain *why* you want to do what you want, what your goals are, etc. Pg might be quite suitable if you can rethink how you're doing things a little - or it might just be a bad choice. I doubt anyone can make any useful recommendations without a more complete explanation of what you're trying to achieve and why you want to do what you have described.

Re Pg-specific stuff, you should look into hstore. It might offer a way to avoid the need for dynamic table re-structuring.

--
Craig Ringer
Tech-related writing: http://soapyfrogs.blogspot.com/
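[For illustration, the hstore approach Craig suggests might look something like the sketch below. The `ad` table and key names are invented for the example, and `CREATE EXTENSION` is the modern installation method; this is a sketch, not tested advice.]

```sql
-- Sketch only: table and key names are invented.
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE ad (
    id         serial PRIMARY KEY,
    model_name text NOT NULL,   -- which user-defined model this ad uses
    attrs      hstore           -- the user-defined fields, as key/value pairs
);

INSERT INTO ad (model_name, attrs)
VALUES ('car', 'color => red, year => "2008"');

-- Search on a user-defined field without any ALTER TABLE:
SELECT id FROM ad
WHERE model_name = 'car' AND attrs -> 'color' = 'red';

-- A GIN index on attrs supports containment queries:
CREATE INDEX ad_attrs_idx ON ad USING gin (attrs);
SELECT id FROM ad WHERE attrs @> 'color => red';
```

The trade-off: all hstore values are stored as text, so typed comparisons (e.g. numeric ranges on `year`) need casts or expression indexes.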
2010/7/28 Craig Ringer <craig@postnewspapers.com.au>
Thank you Craig,
Yes, I was not very specific. I have an idea in mind. Everyone knows about Craigslist and the hype about NoSQL databases. They seem to be cool for doing full-text searches over a lot of data. But what about doing more formal searches based on specific criteria like dates, numbers, or other kinds of values in specific fields? This is where an SQL database shines.
What I would like to do is enable users to create their own data model. Enable them to create a model and make it evolve. For example, it would be cool to create a model to represent car ads. Then buyers could search ads using that model with specific criteria, like the color of the car they want.
I thought about using a table where each row would represent a field in the dynamic model. The row would contain a column for each possible supported data type, but only one of them would be used. This would waste a lot of space and it would also be hard to query efficiently, but I would not have to change the schema as users create new models. Otherwise, I would create a standard data model and apply the changes as users update their models.
I would also like to support inheritance, so we can have a simple model for regular ads and more specific models with more fields, for car ads for example.
I am having a really hard time finding out how to implement my idea efficiently. So the advice of experienced database developers is what I am looking for.
I thought about using maps for the entities of my domain model. hstore seems to be an interesting avenue.
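[As an aside, the inheritance part of this maps directly onto PostgreSQL's table inheritance. A sketch with invented table names; note that indexes and most constraints defined on the parent are not inherited by child tables.]

```sql
-- Sketch only: table names are invented.
CREATE TABLE ad (
    id     serial PRIMARY KEY,
    title  text NOT NULL,
    posted timestamptz DEFAULT now()
);

-- car_ad gets every column of ad plus its own fields:
CREATE TABLE car_ad (
    color text,
    year  integer
) INHERITS (ad);

INSERT INTO car_ad (title, color, year)
VALUES ('2008 sedan, low mileage', 'red', 2008);

-- A query against the parent scans child tables too by default:
SELECT id, title FROM ad;
```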
--
A+
-------------
Pierre
On Wed, Jul 28, 2010 at 10:15 PM, Pierre Thibault <pierre.thibault1@gmail.com> wrote:
>
> What I would like to do is enable users to create their own data model. Enable them to create a model and make
> it evolve. For example, it would be cool to create a model to represent car ads. Then, the buyers can search ads
> using the previous model with specific criteria like the color of the car they want.
>

Sounds like a standard parts explosion or component / subcomponent hierarchy? The most general form has two pieces:

1) a tree model of some form. I like the set / subset model that Joe Celko popularized (search the archives for other references to it from me);

2) an extensible set of containers to hang off the tree. Here I like to use a strongly typed version of the Entity Attribute Value (EAV) type model.

Those two pieces on their own are so general that you can't really scale the results very far. However, if you've got some specific goals in mind I might be able to provide some hints. If you're an IEEE member I can point you at a presentation I did on the basics (I think)...

--
Peter Hunsberger
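[A minimal sketch of the strongly typed EAV idea Peter describes: one value column per supported type, with a discriminator and a CHECK constraint providing the "strong typing". All table and column names here are invented for illustration.]

```sql
-- Sketch only: one possible strongly typed EAV layout; all names invented.
CREATE TABLE entity (
    id    serial PRIMARY KEY,
    model text NOT NULL             -- which user-defined model this entity belongs to
);

CREATE TABLE attribute_value (
    entity_id  integer NOT NULL REFERENCES entity(id),
    attr_name  text    NOT NULL,
    value_type text    NOT NULL,    -- 'text' | 'int' | 'date' | ...
    text_val   text,
    int_val    integer,
    date_val   date,
    PRIMARY KEY (entity_id, attr_name),
    -- enforce that exactly the column matching value_type is set
    CHECK (
        (value_type = 'text' AND text_val IS NOT NULL AND int_val IS NULL AND date_val IS NULL) OR
        (value_type = 'int'  AND int_val  IS NOT NULL AND text_val IS NULL AND date_val IS NULL) OR
        (value_type = 'date' AND date_val IS NOT NULL AND text_val IS NULL AND int_val IS NULL)
    )
);

-- "red cars from 2008 or later": each search criterion costs a join back to attribute_value
SELECT e.id
FROM entity e
JOIN attribute_value c ON c.entity_id = e.id AND c.attr_name = 'color' AND c.text_val = 'red'
JOIN attribute_value y ON y.entity_id = e.id AND y.attr_name = 'year'  AND y.int_val >= 2008
WHERE e.model = 'car';
```

Each additional criterion is another self-join, which is the scaling problem Peter and Craig both allude to.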
On 29/07/10 11:15, Pierre Thibault wrote:
> What I would like to do is enable users to create their own data model.

Then, really, SQL databases aren't wonderful for your needs. You can use them for dynamic, user-defined schema, but you'll always be swimming up hill.

> I thought about using a table where each row would represent a field in
> the dynamic model. The row would contain a column for each possible data
> type supported but only one of them would be used.

Google "EAV". This has known problems. It works, but it's kind of like bolting floats onto a car and trying to use it as a boat. Not very nice.

Really, I'd be researching dynamic schema databases, object stores, key/value set stores, etc. Such things are outside my personal experience, though, and I can't really advise you much on technologies. Beware of the latest No-SQL fad platforms, though; you might land up being better off with something older and more stable even if it's less fashionable than CouchDB or whatever is currently "in" today.

--
Craig Ringer
Tech-related writing: http://soapyfrogs.blogspot.com/
Pierre Thibault wrote:
> Hello people of the PostgreSQL world!
>
> I am wondering if PostgreSQL would be a great choice for my database needs.
>
> I would like to create a db with a dynamic data model. I would like to
> be able to add tables and columns to existing tables while other
> queries are running.

It sounds a bit like the (commercial) ChronicDB add-on that was on the announce list a while ago, http://chronicdb.com/

regards,
Yeb Havinga
On 29/07/10 22:36, Pierre Thibault wrote:
> Why so? This is something expected by a database used in a continuous
> integration environment. Maybe I did not express myself very well. Users
> are not changing their models all the time. They create new models, which
> means creating new tables, and from time to time they add new fields to existing
> tables just like developers do when they update a schema for new application
> functionality.

Ah, OK. I read your initial mail as implying much more frequent changes, especially combined with "millions" of tables.

> In my last job, I was working with continuous database integration. We
> created DDL scripts to add new fields and tables live on a SQL Server database
> in production. Most scripts were executed during the night to reduce the
> impact on the db. In practice, this may mean that a running query will have
> to wait maybe half a second to get a lock because of such an update. Usually,
> not really more than that. Can I expect similar performance with
> PostgreSQL?

With a few caveats, yes.

The main one: for columns you want to be NOT NULL, you should add new columns as nullable. Then UPDATE the new column to hold any desired default, before issuing an

ALTER TABLE ... ALTER COLUMN ... SET NOT NULL.

That's because an ALTER TABLE to add a nullable column doesn't have to rewrite the table. An ALTER TABLE to add a NOT NULL column with a default has to immediately rewrite the table to add the default to every record. This is slow, and during this operation ALTER TABLE holds an exclusive lock.

By contrast, if you ALTER TABLE to add a nullable column (brief exclusive lock), UPDATE (a long, much lower-order lock that doesn't conflict with SELECT, INSERT, or UPDATE to unaffected rows) and then finally ALTER TABLE again to add the constraint (a further brief lock), you have greatly reduced lock times.

>> Really, I'd be researching dynamic schema databases, object stores,
>> key/value set stores, etc. Such things are outside my personal
>> experience, though, and I can't really advise you much on technologies.
>> Beware of the latest No-SQL fad platforms, though; you might land up
>> being better off with something older and more stable even if it's less
>> fashionable than CouchDB or whatever is currently "in" today.
>
> Maybe, but, as I said, using a SQL database with the one-table-per-class-hierarchy
> strategy seems to be the way to go for me. I'll take a look at these
> options too.

Just beware of huge table counts. People have reported issues on the list with truly huge numbers of tables. It's not something that turns up in most regular relational designs, and there are a few things in Pg (like, AFAIK, autovacuum's scanning of tables to vacuum) that scale linearly with table counts.

I'm sure it's workable, it just might not be ideal.

--
Craig Ringer
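[Written out, the three-step pattern Craig describes looks like the following; the `ad` table and `priority` column are invented for the example.]

```sql
-- 1. Brief exclusive lock only: adding a nullable column needs no table rewrite.
ALTER TABLE ad ADD COLUMN priority integer;

-- 2. Backfill under ordinary row locks; concurrent SELECTs and writes to
--    other rows are not blocked. (Can be done in batches to keep
--    transactions short on big tables.)
UPDATE ad SET priority = 0 WHERE priority IS NULL;

-- 3. Another brief lock to add the default for future rows and the constraint.
ALTER TABLE ad ALTER COLUMN priority SET DEFAULT 0;
ALTER TABLE ad ALTER COLUMN priority SET NOT NULL;
```

Note that SET NOT NULL still scans the table to verify the constraint, but it only scans; it does not rewrite every row the way ADD COLUMN ... NOT NULL DEFAULT does.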
On 07/29/2010 05:15 AM, Pierre Thibault wrote:
You might want to take a look at TopicMaps, http://www.topicmaps.org/
One of the best topic-map engines is open source: http://www.ontopia.net/
It uses, if you want, an RDBMS (PostgreSQL is fully supported) underneath.
--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / CTO
OfficeNet AS, Rosenholmveien 25, 1414 Trollåsen, NORWAY
Tlf: +47 24 15 38 90 | Fax: +47 24 15 38 91 | Mobile: +47 909 56 963
"The most difficult thing in the world is to know how to do a thing and to watch somebody else doing it wrong, without comment."
On 07/30/10 1:08 PM, Andreas Joseph Krogh wrote:
> You might want to take a look at TopicMaps, http://www.topicmaps.org/
> One of the greatest topicmap-engines is opensource:
> http://www.ontopia.net/

Huh, visited both those sites, and I still have no idea what TopicMaps actually are.
On 07/30/2010 10:20 PM, John R Pierce wrote:
> Huh, visited both those sites, and I still have no idea what TopicMaps
> actually are.

http://www.ontopia.net/topicmaps/materials/tao.html

--
Andreas Joseph Krogh <andreak@officenet.no>
On Thu, 2010-07-29 at 23:22 +0800, Craig Ringer wrote:
> The main one: For columns you want to be NOT NULL, you should add new
> columns as nullable. Then UPDATE the new column to hold any desired
> default, before issuing an
>
> ALTER TABLE ... ALTER COLUMN ... SET NOT NULL.

This example is certainly a workable situation. However, it does require understanding the constraints of an ALTER TABLE statement and manually developing appropriate scripts. The update model offered by ChronicDB accounts for schema changes of considerable complexity, such as merging fields, partitioning, renaming, or moving fields from one table to another, as well as for reversing schema changes live if a bug is discovered in the application.