Thread: Dynamic data model, locks and performance

Dynamic data model, locks and performance

From: Pierre Thibault

Hello people of the Postgresql world!

I am wondering if PostgreSQL would be a great choice for my database needs.

I would like to create a database with a dynamic data model. I would like to be able to add tables, and add columns to existing tables, while other queries are running.

Will PostgreSQL be able to manage the locks correctly? Am I pushing PostgreSQL too far by wanting to do something like this? If so, what do you suggest instead?

Also, I would like to be able to have millions of tables in my database. As far as I know, PostgreSQL uses one file per table, so does that raise performance problems with mapping that many files onto the file system? Is PostgreSQL able to handle this kind of load?

--
A+

-------------
Pierre

Re: Dynamic data model, locks and performance

From: Craig Ringer

On 29/07/10 07:06, Pierre Thibault wrote:
> Hello people of the PostgreSQL world!
>
> I am wondering if PostgreSQL would be a great choice for my database needs.
>
> I would like to create a database with a dynamic data model. I would
> like to be able to add tables, and add columns to existing tables,
> while other queries are running.
>
> Will PostgreSQL be able to manage the locks correctly? Am I pushing
> PostgreSQL too far by wanting to do something like this? If so, what do
> you suggest instead?
>
> Also, I would like to be able to have millions of tables in my database.
> As far as I know, PostgreSQL uses one file per table, so does that raise
> performance problems with mapping that many files onto the file system?
> Is PostgreSQL able to handle this kind of load?

From what you say, I really doubt that PostgreSQL will be well suited to
your needs. You might be able to make it work, but I doubt it'll be
great. At heart Pg is a relational database, and you want to do things
like dynamic table structures that relational databases are typically
not great at.

People here might be able to suggest alternate approaches if you can
explain *why* you want to do what you want, what your goals are, etc. Pg
might be quite suitable if you can rethink how you're doing things a
little - or it might just be a bad choice.

I doubt anyone can make any useful recommendations without a more
complete explanation of what you're trying to achieve and why you want
to do what you have described.

Re Pg-specific stuff, you should look into hstore. It might offer a way
to avoid the need for dynamic table re-structuring.
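
For instance, a minimal sketch of what that could look like (the table, keys, and values here are hypothetical):

    -- hstore keeps free-form key/value pairs in one column, so
    -- per-model attributes need no DDL changes.
    CREATE EXTENSION hstore;  -- 9.1+; older releases load contrib/hstore.sql

    CREATE TABLE ad (
        id         serial PRIMARY KEY,
        model_name text NOT NULL,
        attrs      hstore
    );

    INSERT INTO ad (model_name, attrs)
    VALUES ('car', 'color => red, year => "2008"');

    -- Find red cars; a GIN index on attrs makes @> lookups cheap.
    CREATE INDEX ad_attrs_idx ON ad USING gin (attrs);
    SELECT * FROM ad WHERE attrs @> 'color => red';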

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Re: Dynamic data model, locks and performance

From: Pierre Thibault

2010/7/28 Craig Ringer <craig@postnewspapers.com.au>
> On 29/07/10 07:06, Pierre Thibault wrote:
> I doubt anyone can make any useful recommendations without a more
> complete explanation of what you're trying to achieve and why you want
> to do what you have described.

Thank you Craig,

Yes, I was not very specific. I have an idea in mind. Everyone knows about Craigslist and the hype around NoSQL databases. They seem to be cool for doing full-text searches over a lot of data. But what about doing more formal searches based on specific criteria like dates, numbers or other kinds of values on specific fields? This is where an SQL database shines.

What I would like to do is enable users to create their own data model: enable them to create a model and make it evolve. For example, it would be cool to create a model to represent car ads. Then buyers can search ads using that model with specific criteria, like the color of the car they want.

I thought about using a table where each row would represent a field in the dynamic model. The row would contain a column for each possible data type supported, but only one of them would be used. This would waste a lot of space and it would also be hard to query efficiently, but I would not have to change the schema as users create new models. Otherwise, I would create a standard data model and apply the changes as users update their models.
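
Roughly, that single-table idea might look like this (a sketch; the names are made up):

    -- One row per field value; one column per supported type,
    -- only one of which is non-NULL in any given row.
    CREATE TABLE field_value (
        entity_id  bigint NOT NULL,
        field_name text   NOT NULL,
        text_val   text,
        int_val    bigint,
        num_val    numeric,
        date_val   date,
        bool_val   boolean,
        PRIMARY KEY (entity_id, field_name)
    );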

I would also like to support inheritance, so we can have a simple model for regular ads and more specific models with more fields, for car ads for example.

I am having a really hard time figuring out how to implement my idea efficiently, so the advice of experienced database developers is what I am looking for.
 

> Re Pg-specific stuff, you should look into hstore. It might offer a way
> to avoid the need for dynamic table re-structuring.

I thought about using maps for the entities of my domain model. hstore seems to be an interesting avenue.


--
A+

-------------
Pierre

Re: Dynamic data model, locks and performance

From: Peter Hunsberger

On Wed, Jul 28, 2010 at 10:15 PM, Pierre Thibault
<pierre.thibault1@gmail.com> wrote:
>
> What I would like to do is enable users to create their own data model:
> enable them to create a model and make it evolve. For example, it would
> be cool to create a model to represent car ads. Then buyers can search
> ads using that model with specific criteria, like the color of the car
> they want.
>

Sounds like a standard parts explosion or component / subcomponent
hierarchy?  The most general form has two pieces:

1) a tree model of some form. I like the set / subset model that Joe
Celko popularized (search the archives for other references to it from
me);

2) an extensible set of containers to hang off the tree.  Here I like
to use a strongly typed version of the Entity Attribute Value (EAV)
type model (a rough sketch of both pieces follows).
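
A minimal sketch of the two pieces (all table and column names here are hypothetical):

    -- 1) Celko-style nested-set tree: each node stores left/right
    --    bounds, and a descendant's bounds fall inside its ancestor's.
    CREATE TABLE category (
        id   serial PRIMARY KEY,
        name text NOT NULL,
        lft  int  NOT NULL,
        rgt  int  NOT NULL
    );

    -- 2) Strongly typed EAV: the attribute definition fixes the type,
    --    and each value lands in the matching typed column.
    CREATE TABLE attribute (
        id          serial PRIMARY KEY,
        category_id int  NOT NULL REFERENCES category(id),
        name        text NOT NULL,
        data_type   text NOT NULL CHECK (data_type IN ('text', 'int', 'date'))
    );

    CREATE TABLE attribute_value (
        entity_id bigint NOT NULL,
        attr_id   int    NOT NULL REFERENCES attribute(id),
        text_val  text,
        int_val   bigint,
        date_val  date
    );

    -- All descendants of a node sit between its bounds.
    SELECT child.name
    FROM category child, category parent
    WHERE parent.name = 'vehicles'
      AND child.lft BETWEEN parent.lft AND parent.rgt;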

Those two pieces on their own are so general that you can't really
scale the results very far.  However, if you've got some specific
goals in mind I might be able to provide some hints.  If you're an
IEEE member I can point you at a presentation I did on the basics (I
think)...

--
Peter Hunsberger

Re: Dynamic data model, locks and performance

From: Craig Ringer

On 29/07/10 11:15, Pierre Thibault wrote:

> What I would like to do is enable users to create their own data model.

Then, really, SQL databases aren't wonderful for your needs. You can use
them for dynamic, user-defined schemas, but you'll always be swimming
uphill.

> I thought about using a table where each row would represent a field in
> the dynamic model. The row would contain a column for each possible data
> type supported, but only one of them would be used.

Google "EAV"

This has known problems. It works, but it's kind of like bolting floats
onto a car and trying to use it as a boat. Not very nice.

Really, I'd be researching dynamic schema databases, object stores,
key/value set stores, etc. Such things are outside my personal
experience, though, and I can't really advise you much on technologies.
Beware of the latest No-SQL fad platforms, though; you might land up
being better off with something older and more stable even if it's less
fashionable than CouchDB or whatever is currently "in" today.

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Re: Dynamic data model, locks and performance

From: Yeb Havinga

Pierre Thibault wrote:
> Hello people of the PostgreSQL world!
>
> I am wondering if PostgreSQL would be a great choice for my database needs.
>
> I would like to create a database with a dynamic data model. I would
> like to be able to add tables, and add columns to existing tables,
> while other queries are running.
It sounds a bit like the (commercial) ChronicDB add-on that was on the
announce list a while ago, http://chronicdb.com/

regards,
Yeb Havinga


Re: Dynamic data model, locks and performance

From: Craig Ringer

On 29/07/10 22:36, Pierre Thibault wrote:

> Why so? This is something expected of a database used in a continuous
> integration environment. Maybe I did not express myself very well. Users
> are not changing their models all the time. They create new models, which
> means creating new tables, and from time to time they add new fields to
> existing tables, just like developers do when they update a schema for new
> application functionality.

Ah, OK. I read your initial mail as implying much more frequent changes,
especially combined with "millions" of tables.

> In my last job, I was working with continuous database integration. We
> created DDL scripts to add new fields and tables live on a SQL Server
> database in production. Most scripts were executed during the night to
> reduce the impact on the db. In practice, this might mean that a running
> query would have to wait maybe half a second to get a lock because of
> such an update. Usually not really more than that. Can I expect similar
> performance with PostgreSQL?

With a few caveats, yes.

The main one: For columns you want to be NOT NULL, you should add new
columns as nullable. Then UPDATE the new column to hold any desired
default, before issuing an

ALTER TABLE ... ALTER COLUMN ... SET NOT NULL.

That's because an ALTER TABLE to add a nullable column doesn't have to
rewrite the table. An ALTER TABLE to add a NOT NULL column has to
immediately rewrite the table to add the default to every record. This
is slow, and during this operation ALTER TABLE holds an exclusive lock.

By contrast, if you ALTER TABLE to add a nullable column (brief
exclusive lock), UPDATE the rows (a long-held but much weaker lock that
doesn't conflict with SELECT, INSERT, or UPDATE on unaffected rows), and
then finally ALTER TABLE again to add the constraint (a further brief
lock), you have greatly reduced lock times.
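
Concretely, the sequence might look like this (a sketch; the table and column names are hypothetical):

    -- 1. Brief exclusive lock, no table rewrite: the new column is
    --    nullable and has no default.
    ALTER TABLE ads ADD COLUMN color text;

    -- 2. Backfill the desired default; a long-running but much weaker
    --    lock than ALTER TABLE's.
    UPDATE ads SET color = 'unknown' WHERE color IS NULL;

    -- 3. Another brief lock to add the constraint.
    ALTER TABLE ads ALTER COLUMN color SET NOT NULL;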

>> Really, I'd be researching dynamic schema databases, object stores,
>> key/value set stores, etc. Such things are outside my personal
>> experience, though, and I can't really advise you much on technologies.
>> Beware of the latest No-SQL fad platforms, though; you might land up
>> being better off with something older and more stable even if it's less
>> fashionable than CouchDB or whatever is currently "in" today.
>>
> Maybe, but, as I said, using an SQL database with the one-table-per-class-hierarchy
> strategy seems to be the way to go for me. I'll take a look at these
> options too.

Just beware of huge table counts. People have reported issues on the
list with truly huge numbers of tables. It's not something that turns up
in most regular relational designs, and there are a few things in Pg
(like, AFAIK, autovacuum's scanning of tables to vacuum) that scale
linearly with table counts.

I'm sure it's workable, it just might not be ideal.

--
Craig Ringer

Re: Dynamic data model, locks and performance

From: Andreas Joseph Krogh

On 07/29/2010 05:15 AM, Pierre Thibault wrote:
> What I would like to do is enable users to create their own data model:
> enable them to create a model and make it evolve.
> [...]
> I thought about using maps for the entities of my domain model. hstore
> seems to be an interesting avenue.

You might want to take a look at TopicMaps, http://www.topicmaps.org/
One of the greatest topicmap engines is open source: http://www.ontopia.net/
It can use an RDBMS underneath if you want (PostgreSQL is fully supported).
-- 
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / CTO
------------------------+---------------------------------------------+
OfficeNet AS            | The most difficult thing in the world is to |
Rosenholmveien 25       | know how to do a thing and to watch         |
1414 Trollåsen          | somebody else doing it wrong, without       |
NORWAY                  | comment.                                    |
                        |                                             |
Tlf:    +47 24 15 38 90 |                                             |
Fax:    +47 24 15 38 91 |                                             |
Mobile: +47 909  56 963 |                                             |
------------------------+---------------------------------------------+

Re: Dynamic data model, locks and performance

From: John R Pierce

On 07/30/10 1:08 PM, Andreas Joseph Krogh wrote:
> You might want to take a look at TopicMaps, http://www.topicmaps.org/
> One of the greatest topicmap-engines is opensource:
> http://www.ontopia.net/

Huh, visited both those sites, and I still have no idea what TopicMaps
actually are.

Re: Dynamic data model, locks and performance

From: Andreas Joseph Krogh

On 07/30/2010 10:20 PM, John R Pierce wrote:
>  On 07/30/10 1:08 PM, Andreas Joseph Krogh wrote:
>> You might want to take a look at TopicMaps, http://www.topicmaps.org/
>> One of the greatest topicmap-engines is opensource:
>> http://www.ontopia.net/
>
> Huh, visited both those sites, and I still have no idea what TopicMaps
> actually are.

http://www.ontopia.net/topicmaps/materials/tao.html

--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / CTO
------------------------+---------------------------------------------+
OfficeNet AS            | The most difficult thing in the world is to |
Rosenholmveien 25       | know how to do a thing and to watch         |
1414 Trollåsen          | somebody else doing it wrong, without       |
NORWAY                  | comment.                                    |
                         |                                             |
Tlf:    +47 24 15 38 90 |                                             |
Fax:    +47 24 15 38 91 |                                             |
Mobile: +47 909  56 963 |                                             |
------------------------+---------------------------------------------+


Re: Dynamic data model, locks and performance

From: ChronicDB Community Team

This example is certainly workable. However, it does require
understanding the constraints of an ALTER TABLE statement and manually
developing appropriate scripts.

The update model offered by ChronicDB accounts for schema changes of
considerable complexity, such as merging fields, partitioning, renaming,
or moving fields from one table to another, as well as for reversing
schema changes live if a bug is discovered in the application.

On Thu, 2010-07-29 at 23:22 +0800, Craig Ringer wrote:
> [...]
> The main one: For columns you want to be NOT NULL, you should add new
> columns as nullable. Then UPDATE the new column to hold any desired
> default, before issuing an
>
> ALTER TABLE ... ALTER COLUMN ... SET NOT NULL.
> [...]