Thread: DB design: How to store object properties?

DB design: How to store object properties?

From
"Maxim Khitrov"
Date:
Greetings everyone,

I'm fairly new to PostgreSQL, but I'm currently doing some research on
tools to be used for an upcoming project and could really use your
help with a possible database design.

So the scenario is this. We have two projects starting that will deal
heavily with mapping spatial regions. One of the reasons I'm looking
at PostgreSQL is the PostGIS extension that may help us in dealing
with all the geometry. The first project will be mapping the outside
environment. Think along the lines of Google Maps, but intended for
use by pedestrians rather than cars. The other project, one that I'll
be managing, will be doing the same thing but with interiors of
buildings. For now, the two projects will be separate, but eventually
the plan is to merge them and actually allow the system to plan paths
that incorporate outside and interior maps.

That's the project description in a nutshell. What I'm doing now is
trying to figure out the best way to store all the spatial
information. I want to have a foundational layer that is as simple as
possible, and that can later be extended for any special needs.

The current plan is to do all the mapping via a graph structure using
nodes/vertices and edges. A node will be defined simply in terms of a
GUID (UUID in PostgreSQL). Each node will have a unique id, but no
other "mandatory" information. An edge is defined by the ids of its
two end points. This setup will be common to both projects, and the
use of UUIDs will ensure that any database merges in the future will
not result in conflicts.

Now here's the tricky bit. We have to be able to associate an
arbitrary number of properties with each node and edge. So an example
of a node property would be that node's position defined by a
latitude/longitude pair. We will define all the possible properties
between our two projects, but just because a property is valid,
doesn't mean that it will be present in or be relevant to every node
in the system. For instance, nodes located inside of a building will
likely have their position defined by something other than latitude
and longitude, since capturing that data with a GPS will not be
possible.

The simplest design would be to create two tables, one for nodes
another for edges, and create a column for every possible property.
This, however, is huge waste of space, since there will not be a
single node or edge that will make use of all the defined properties.
There may be hundreds of properties, but each node may use ten on
average. That's the question - how do you represent this information
in the database in a space-efficient manner. We still have to be able
to search for specific nodes given their properties. For example –
find all the nodes located within radius z of coordinate x,y. PostGIS
will handle the special bit, but there has to be an efficient way of
accessing this information/property for each node that has it.

One other possibility I thought of was defining a master node table
that would contain just the node ids. Then for each property I would
create a separate table with the id column referencing the master.
Here we're still wasting space for the extra storage of UUIDs, and if
I want to run a query that selects nodes with several different
properties, it would require a long list of JOINs.

I looked at table inheritance that PostgreSQL offers, but it turned
out to be something different from what I was hoping for. What I need
is some sort of data inheritance based on a common unique key, but I
don't think that's possible. Same thing with partitioning. If there
was a way to partition the columns (with data) of a single large
table, then that would be an ideal solution. But in the absence of
that feature, what design would you recommend?

Thanks,
Max

Re: DB design: How to store object properties?

From
"Douglas McNaught"
Date:
On 2/17/08, Maxim Khitrov <mkhitrov@gmail.com> wrote:
> The simplest design would be to create two tables, one for nodes
> another for edges, and create a column for every possible property.
> This, however, is huge waste of space, since there will not be a
> single node or edge that will make use of all the defined properties.
> There may be hundreds of properties, but each node may use ten on
> average. That's the question - how do you represent this information
> in the database in a space-efficient manner. We still have to be able
> to search for specific nodes given their properties. For example –
> find all the nodes located within radius z of coordinate x,y. PostGIS
> will handle the special bit, but there has to be an efficient way of
> accessing this information/property for each node that has it.

If you have a lot of columns but most of them are NULL for a given
record, the tuple size won't actually be that big.  PG stores a bitmap
in each tuple telling which columns are NULL, and then only stores the
non-NULL column vales.  So this approach may end up being reasonably
efficient for you.

-Doug

Re: DB design: How to store object properties?

From
"Alexander Staubo"
Date:
On 2/17/08, Maxim Khitrov <mkhitrov@gmail.com> wrote:
> So the scenario is this. We have two projects starting that will deal
> heavily with mapping spatial regions. One of the reasons I'm looking
> at PostgreSQL is the PostGIS extension that may help us in dealing
> with all the geometry.

From your description, PostGIS sounds like it would work great.

> Now here's the tricky bit. We have to be able to associate an
> arbitrary number of properties with each node and edge.
[snip]

This is actually a fairly common scenario. As you point out, the
problem case is impractical to implement in PostgreSQL (though the
relational model is not to blame), and you have enumerated some of the
possible solutions. There are a few other ways to tackle the problem,
though.

While dividing the properties into subtables either by property or by
type (eg., string_properties, int_properties -- which may result in
fever tables) results in many outer joins, you could alleviate the
need to express these joins so often by creating a view that selects
from all the tables.

You can then create rules which turn insert statements on this view
into insert statements on the various subtables. There's the question
of how efficient this will be, but it simplifies the data model
conceptually in a way that does not go against PostgreSQL's grain.

I would also look at the different types of nodes required by your
application. You say that defining a single table containing columns
for all possible properties will result in too many columns. Could you
instead isolate the types of nodes you need (outdoor nodes, indoor
nodes, and so on) and create tables for those, each of which have only
the properties that apply to those nodes?

Again you would end up with a bunch of joins, but perhaps fewer than
in the one-table-per-property scenario.

Alexander.

Re: DB design: How to store object properties?

From
Oleg Bartunov
Date:
Max,

we use contrib/hstore specially designed for such kind of problem. It's a
sort of perl's hash, where you can store all specific properties.
In that way, your table will looks like

create table objects ( id integer, x real, y real, ....,  props hstore)

Here '...' designates other mandatory fields you want to be searched.

Oleg
On Sun, 17 Feb 2008, Maxim Khitrov wrote:

> Greetings everyone,
>
> I'm fairly new to PostgreSQL, but I'm currently doing some research on
> tools to be used for an upcoming project and could really use your
> help with a possible database design.
>
> So the scenario is this. We have two projects starting that will deal
> heavily with mapping spatial regions. One of the reasons I'm looking
> at PostgreSQL is the PostGIS extension that may help us in dealing
> with all the geometry. The first project will be mapping the outside
> environment. Think along the lines of Google Maps, but intended for
> use by pedestrians rather than cars. The other project, one that I'll
> be managing, will be doing the same thing but with interiors of
> buildings. For now, the two projects will be separate, but eventually
> the plan is to merge them and actually allow the system to plan paths
> that incorporate outside and interior maps.
>
> That's the project description in a nutshell. What I'm doing now is
> trying to figure out the best way to store all the spatial
> information. I want to have a foundational layer that is as simple as
> possible, and that can later be extended for any special needs.
>
> The current plan is to do all the mapping via a graph structure using
> nodes/vertices and edges. A node will be defined simply in terms of a
> GUID (UUID in PostgreSQL). Each node will have a unique id, but no
> other "mandatory" information. An edge is defined by the ids of its
> two end points. This setup will be common to both projects, and the
> use of UUIDs will ensure that any database merges in the future will
> not result in conflicts.
>
> Now here's the tricky bit. We have to be able to associate an
> arbitrary number of properties with each node and edge. So an example
> of a node property would be that node's position defined by a
> latitude/longitude pair. We will define all the possible properties
> between our two projects, but just because a property is valid,
> doesn't mean that it will be present in or be relevant to every node
> in the system. For instance, nodes located inside of a building will
> likely have their position defined by something other than latitude
> and longitude, since capturing that data with a GPS will not be
> possible.
>
> The simplest design would be to create two tables, one for nodes
> another for edges, and create a column for every possible property.
> This, however, is huge waste of space, since there will not be a
> single node or edge that will make use of all the defined properties.
> There may be hundreds of properties, but each node may use ten on
> average. That's the question - how do you represent this information
> in the database in a space-efficient manner. We still have to be able
> to search for specific nodes given their properties. For example ЪЪ
> find all the nodes located within radius z of coordinate x,y. PostGIS
> will handle the special bit, but there has to be an efficient way of
> accessing this information/property for each node that has it.
>
> One other possibility I thought of was defining a master node table
> that would contain just the node ids. Then for each property I would
> create a separate table with the id column referencing the master.
> Here we're still wasting space for the extra storage of UUIDs, and if
> I want to run a query that selects nodes with several different
> properties, it would require a long list of JOINs.
>
> I looked at table inheritance that PostgreSQL offers, but it turned
> out to be something different from what I was hoping for. What I need
> is some sort of data inheritance based on a common unique key, but I
> don't think that's possible. Same thing with partitioning. If there
> was a way to partition the columns (with data) of a single large
> table, then that would be an ideal solution. But in the absence of
> that feature, what design would you recommend?
>
> Thanks,
> Max
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83