Thread: PostgreSQL as a triple store

PostgreSQL as a triple store

From
Jimmy Thrasibule
Date:

Hello,

I'm looking into gathering information about geographical locations:

- Coordinates
- Place name
- Pictures
- etc.

Since a place can be anything, a specific place may  need any kind of data type to be described.

I'm therefore looking into using the same method as the semantic Web and trying to describe a place with triples.

1. The "triples" table will reference a subject, attribute and value.
2. The "attributes" table will have an attribute name, a type and mauve a table name.
3. For each attribute type, a new table is created to store the values.
4. A "resources" table list all the available resources (location, picture, ...).

Using this, I can easily add a new resource and describe it by adding new triples. Every item is identified using UUIDs as primary key.

This seems too simple to be true so I falling back to you in case you see any pitfalls here.

The "triples" table will grow very fast in this setup and may become a bottleneck. However, I don't see any other way to store information about something as wide as places around the world.

Regards,
Jimmy

Re: PostgreSQL as a triple store

From
Chris Travers
Date:
Is there a reason why hstore or json is not an option?  That may work a lot better than this approach.

Another approach I have had is a set of additional catalog tables and dynamically adding/removing columns from an extended attributes table.



On Tue, Aug 12, 2014 at 6:19 AM, Jimmy Thrasibule <thrasibule.jimmy@gmail.com> wrote:

Hello,

I'm looking into gathering information about geographical locations:

- Coordinates
- Place name
- Pictures
- etc.

Since a place can be anything, a specific place may  need any kind of data type to be described.

I'm therefore looking into using the same method as the semantic Web and trying to describe a place with triples.

1. The "triples" table will reference a subject, attribute and value.
2. The "attributes" table will have an attribute name, a type and mauve a table name.
3. For each attribute type, a new table is created to store the values.
4. A "resources" table list all the available resources (location, picture, ...).

Using this, I can easily add a new resource and describe it by adding new triples. Every item is identified using UUIDs as primary key.

This seems too simple to be true so I falling back to you in case you see any pitfalls here.

The "triples" table will grow very fast in this setup and may become a bottleneck. However, I don't see any other way to store information about something as wide as places around the world.

Regards,
Jimmy




--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: PostgreSQL as a triple store

From
Jonas Xie
Date:
We are currently working on in-database semantic reasoning.
However, if you do not need any T-Box or A-Box reasoning, maybe JSON is
a better choice in your case.

Am 12.08.2014 15:19, schrieb Jimmy Thrasibule:
>
> Hello,
>
> I'm looking into gathering information about geographical locations:
>
> - Coordinates
> - Place name
> - Pictures
> - etc.
>
> Since a place can be anything, a specific place may need any kind of
> data type to be described.
>
> I'm therefore looking into using the same method as the semantic Web
> and trying to describe a place with triples.
>
> 1. The "triples" table will reference a subject, attribute and value.
> 2. The "attributes" table will have an attribute name, a type and
> mauve a table name.
> 3. For each attribute type, a new table is created to store the values.
> 4. A "resources" table list all the available resources (location,
> picture, ...).
>
> Using this, I can easily add a new resource and describe it by adding
> new triples. Every item is identified using UUIDs as primary key.
>
> This seems too simple to be true so I falling back to you in case you
> see any pitfalls here.
>
> The "triples" table will grow very fast in this setup and may become a
> bottleneck. However, I don't see any other way to store information
> about something as wide as places around the world.
>
> Regards,
> Jimmy
>



Re: PostgreSQL as a triple store

From
Jimmy Thrasibule
Date:
> Is there a reason why hstore or json is not an option?  That may work a lot
> better than this approach.

I don't want to move away from SQL common features so I can have a
test environment using SQLite and deploy on PostgreSQL. This approach
looks elegant and simple to me. Using a new table per attribute type
or even per attribute for the values, I can easily index them for
quick lookups.


Re: PostgreSQL as a triple store

From
Raymond O'Donnell
Date:
On 12/08/2014 15:57, Jimmy Thrasibule wrote:
>> Is there a reason why hstore or json is not an option?  That may work a lot
>> better than this approach.
>
> I don't want to move away from SQL common features so I can have a
> test environment using SQLite and deploy on PostgreSQL. This approach
> looks elegant and simple to me. Using a new table per attribute type
> or even per attribute for the values, I can easily index them for
> quick lookups.

What is your test environment? Postgres works quite happily on my rather
low-powered Windows 7 laptop, and so I have all the bells and whistles
available for development.

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Re: PostgreSQL as a triple store

From
Bill Moran
Date:
On Tue, 12 Aug 2014 16:57:32 +0200
Jimmy Thrasibule <thrasibule.jimmy@gmail.com> wrote:

> > Is there a reason why hstore or json is not an option?  That may work a lot
> > better than this approach.
>
> I don't want to move away from SQL common features so I can have a
> test environment using SQLite and deploy on PostgreSQL. This approach
> looks elegant and simple to me. Using a new table per attribute type
> or even per attribute for the values, I can easily index them for
> quick lookups.

Couple of things to keep in mind:
* SQLLite and PostgreSQL are not 100% compatable.  It's cheap and easy to
set up PostgreSQL for testing/development, so it seems silly (to me) to test
on something that might behave differently than the production environment.

* Any setup where the application can execute DDL is a potential security
concern.  If the code can manipulate tables then a bug in the code can
allow an accidental or malicious user to quickly and easily destroy data.

--
Bill Moran
I need your help to succeed:
http://gamesbybill.com


Re: PostgreSQL as a triple store

From
John R Pierce
Date:
On 8/12/2014 6:19 AM, Jimmy Thrasibule wrote:
>
> I'm looking into gathering information about geographical locations:
>
> - Coordinates
> - Place name
> - Pictures
> - etc.
>
> Since a place can be anything, a specific place may need any kind of
> data type to be described.
>
> I'm therefore looking into using the same method as the semantic Web
> and trying to describe a place with triples.
>
> 1. The "triples" table will reference a subject, attribute and value.
> 2. The "attributes" table will have an attribute name, a type and
> mauve a table name.
> 3. For each attribute type, a new table is created to store the values.
> 4. A "resources" table list all the available resources (location,
> picture, ...).
>
> Using this, I can easily add a new resource and describe it by adding
> new triples. Every item is identified using UUIDs as primary key.
>

that is a variation of an anti-pattern known as EAV
(Entity-Attribute-Value).

its impossible to implement well-performing queries with this as you'll
need to make several sequential queries each time, since table names
can't be 'dynamic' in a sql query by design. (query triples join
attributes, case on the attribute type, query appropriate attribute
table and possibly resources)

http://mikesmithers.wordpress.com/2013/12/22/the-anti-pattern-eavil-database-design/


--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: PostgreSQL as a triple store

From
Chris Travers
Date:



On Tue, Aug 12, 2014 at 8:33 AM, Bill Moran <wmoran@potentialtech.com> wrote:
On Tue, 12 Aug 2014 16:57:32 +0200
Jimmy Thrasibule <thrasibule.jimmy@gmail.com> wrote:

> > Is there a reason why hstore or json is not an option?  That may work a lot
> > better than this approach.
>
> I don't want to move away from SQL common features so I can have a
> test environment using SQLite and deploy on PostgreSQL. This approach
> looks elegant and simple to me. Using a new table per attribute type
> or even per attribute for the values, I can easily index them for
> quick lookups.

Couple of things to keep in mind:
* SQLLite and PostgreSQL are not 100% compatable.  It's cheap and easy to
set up PostgreSQL for testing/development, so it seems silly (to me) to test
on something that might behave differently than the production environment.

* Any setup where the application can execute DDL is a potential security
concern.  If the code can manipulate tables then a bug in the code can
allow an accidental or malicious user to quickly and easily destroy data.

This is a good point I probably should have mentioned specifically.  Making my solution secure means restricting access to the catalog and catalog to DDL features to a very small subset of trusted users (ideally not through a standard application connection).  In this case, one might as well restrict it to those who are already db superusers unless there is a clear need to broaden it.

BTW, a specific case in point...  DDL can't be parameterized since it doesn't have a query plan.  This means you are doing string concatenation to create your ddl queries.  If you aren't careful someone can add an attribute like:

'; DROP TABLE things; --

In practice this means a great deal of manual review and restriction on which users can access this feature.

Best Wishes,
Chris Travers 

--
Bill Moran
I need your help to succeed:
http://gamesbybill.com



--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: PostgreSQL as a triple store

From
Seref Arikan
Date:
Hi Jimmy,
I think you're going to need to perform experiments and decide if the flexibility you get is worth the performance you're going to lose.
As you can see from John R Pierce's response it is only matter of time before someone (correctly) warns you about the performance issues with EAV derivatives, but then the conversation becomes too abstract, at least in my opinion.
Why? Because we don't know how much data you're going to process with this design, with what kind of hardware.

I've developed solutions where I've used EAV along with a more relational design, they were delivered on time, performed at the expected/acceptable level, so no problems. Then again, I've tried to use EAV bluntly in a healthcare related project and it died on me quite quickly when the data grew.

I always use on demand Amazon instances to test performance of the db design to test performance using automatically generated data. It is a very good way of testing your design as data grows and I suggest you consider the same approach.

Also, your requirements sounds related to some projects that use PostGis, have you taken a look at that?

Best regards
Seref


 


On Tue, Aug 12, 2014 at 2:19 PM, Jimmy Thrasibule <thrasibule.jimmy@gmail.com> wrote:

Hello,

I'm looking into gathering information about geographical locations:

- Coordinates
- Place name
- Pictures
- etc.

Since a place can be anything, a specific place may  need any kind of data type to be described.

I'm therefore looking into using the same method as the semantic Web and trying to describe a place with triples.

1. The "triples" table will reference a subject, attribute and value.
2. The "attributes" table will have an attribute name, a type and mauve a table name.
3. For each attribute type, a new table is created to store the values.
4. A "resources" table list all the available resources (location, picture, ...).

Using this, I can easily add a new resource and describe it by adding new triples. Every item is identified using UUIDs as primary key.

This seems too simple to be true so I falling back to you in case you see any pitfalls here.

The "triples" table will grow very fast in this setup and may become a bottleneck. However, I don't see any other way to store information about something as wide as places around the world.

Regards,
Jimmy


Re: PostgreSQL as a triple store

From
Chris Travers
Date:



On Wed, Aug 13, 2014 at 12:43 AM, Seref Arikan <serefarikan@gmail.com> wrote:
Hi Jimmy,
I think you're going to need to perform experiments and decide if the flexibility you get is worth the performance you're going to lose.
As you can see from John R Pierce's response it is only matter of time before someone (correctly) warns you about the performance issues with EAV derivatives, but then the conversation becomes too abstract, at least in my opinion.
Why? Because we don't know how much data you're going to process with this design, with what kind of hardware.

We also don't know how the data will be processed.

Where EAV dies a screaming horrible death is when you try to search on multiple attributes.  If this data is just never going to be searched on attributes, EAV isn't that bad (I use it for some things where that is clearly the case). 
 
--
Best Wishes,
Chris Travers

Re: PostgreSQL as a triple store

From
Jimmy Thrasibule
Date:

> I think you're going to need to perform experiments and decide if the flexibility you get is worth the performance you're going to lose.
> As you can see from John R Pierce's response it is only matter of time before someone (correctly) warns you about the performance issues with EAV derivatives, but then the conversation becomes too abstract, at least in my opinion.
> Why? Because we don't know how much data you're going to process with this design, with what kind of hardware.

My data model is not completely ready yet and is still being worked on. But yes,  I think I have first to think more about how I will be using the data before looking at a storage model.

I will indeed run some experiments before but using this model, I was planning to but some complexity in the code.

> Also, your requirements sounds related to some projects that use PostGis, have you taken a look at that?

Never heard of it, I'll give it a look.

---
Jimmy