Thread: Storing computed values

Storing computed values

From
Colin Wetherbee
Date:
Greetings.

I have a question regarding storing computed values.  Essentially, it's
a question about caching, and I'm willing to implement a cache on the
client side or using pg_memcache, but I wonder if I can do this in a
PostgreSQL table, instead?

The background is that I'm using PostGIS to store thousands of points
across the globe.  Once in a while, I want to connect a subset of these
points using an algorithm I have in a PL/Perl procedure.  Depending on
where the endpoints are located, this algorithm takes between 10 ms and
40 ms to run for each pair of points.  Hundreds of these points are
connected and then displayed over the web, which, essentially, takes a
really long time.  Some connections are calculated very frequently, and
some connections will probably never be calculated, though it's
difficult to determine which connections fit those descriptions.

The locations of these points sometimes change, and, relatively
infrequently, new points are added.  Even more infrequently, some points
are deleted.

I would like to have a table that contains a connection for each
distinct pair of points (point1 to point2 is the same as point2 to
point1).  This table would then be automatically updated every time a
modification is made to the reference table.  If my calculation is
correct, the new table would contain 3,654,456 rows using the current
data set.

I realize I could use a TRIGGER to keep the connections table fresh, and
perhaps that's also a solution.

But, really, I'm just wondering if PostgreSQL has some automated,
built-in facility for situations like this?

Thanks.

Colin

Re: Storing computed values

From
"Richard Broersma"
Date:
On Mon, Apr 21, 2008 at 10:34 AM, Colin Wetherbee <cww@denterprises.org> wrote:

> I would like to have a table that contains a connection for each distinct
> pair of points (point1 to point2 is the same as point2 to point1).  This
> table would then be automatically updated every time a modification is made
> to the reference table.  If my calculation is correct, the new table would
> contain 3,654,456 rows using the current data set.
>
> I realize I could use a TRIGGER to keep the connections table fresh, and
> perhaps that's also a solution.
>
> But, really, I'm just wondering if PostgreSQL has some automated, built-in
> facility for situations like this?

Would a functional index do this for you?  Perhaps, you wouldn't even
need a table is you store these computed values in an index instead.


--
Regards,
Richard Broersma Jr.

Re: Storing computed values

From
Colin Wetherbee
Date:
Richard Broersma wrote:
> On Mon, Apr 21, 2008 at 10:34 AM, Colin Wetherbee <cww@denterprises.org> wrote:
>
>> I would like to have a table that contains a connection for each distinct
>> pair of points (point1 to point2 is the same as point2 to point1).  This
>> table would then be automatically updated every time a modification is made
>> to the reference table.  If my calculation is correct, the new table would
>> contain 3,654,456 rows using the current data set.
>>
>> I realize I could use a TRIGGER to keep the connections table fresh, and
>> perhaps that's also a solution.
>>
>> But, really, I'm just wondering if PostgreSQL has some automated, built-in
>> facility for situations like this?
>
> Would a functional index do this for you?  Perhaps, you wouldn't even
> need a table is you store these computed values in an index instead.

I'm not sure, as I've never used one before.  I think I briefly
considered it a while back and decided it wouldn't do what I want
because I don't know the value of the connection before-hand.  Perhaps
you can steer me in the right direction.

Let's say my points table looks like this:

point_id | location
---------+----------
        1 | 010100000000... <-- some PostGIS geometry string
        2 | 010100000000...

And, my foo table, which contains data pertaining to these connections,
looks like this:

id | point_id_start | point_id_end
---+----------------+--------------
  1 |              1 |            2

And, let's say my function is connect(location1, location2).

Right now, in order to get my connection, I'm using something like:

SELECT connect(p_start.location, p_end.location)
FROM foo
JOIN points AS p_start ON foo.point_id_start = points.point_id
JOIN points AS p_end   ON foo.point_id_end   = points.point_id
WHERE foo.id = 8192;

I would like to be able to retrieve that connection without using the
connect() procedure.  How would I be able to take advantage of a
functional index in this context?

As I mentioned above, I don't know the result of connect() before the
query; that's what I'm trying to compute, not what I'm trying to search
against.

Thanks.

Colin

Re: Storing computed values

From
Colin Wetherbee
Date:
Colin Wetherbee wrote:
> SELECT connect(p_start.location, p_end.location)
> FROM foo
> JOIN points AS p_start ON foo.point_id_start = points.point_id
> JOIN points AS p_end   ON foo.point_id_end   = points.point_id
> WHERE foo.id = 8192;

As I didn't test this code, my syntax was slightly incorrect.

JOIN ... ON ... = p_start.point_id
JOIN ... ON ... = p_end.point_id

Colin

Re: Storing computed values

From
"Richard Broersma"
Date:
On Mon, Apr 21, 2008 at 11:02 AM, Colin Wetherbee <cww@denterprises.org> wrote:

> Let's say my points table looks like this:
>
> point_id | location
> ---------+----------
>       1 | 010100000000... <-- some PostGIS geometry string
>       2 | 010100000000...
>
> And, my foo table, which contains data pertaining to these connections,
> looks like this:
>
> id | point_id_start | point_id_end
> ---+----------------+--------------
>  1 |              1 |            2
>
> And, let's say my function is connect(location1, location2).
>


> I would like to be able to retrieve that connection without using the
> connect() procedure.  How would I be able to take advantage of a functional
> index in this context?


I am not sure what kind of constraints you have on your points table
for location.  It location is unique, this might be an example where
you can solve your problem if you use a natural foreign key in foo
instead of a surrogate key.

Then you could just create an index:

CREATE INDEX ON Foo Connect( location1, location2);

--
Regards,
Richard Broersma Jr.

Re: Storing computed values

From
Colin Wetherbee
Date:
Richard Broersma wrote:
> On Mon, Apr 21, 2008 at 11:02 AM, Colin Wetherbee <cww@denterprises.org> wrote:
>
>> Let's say my points table looks like this:
>>
>> point_id | location
>> ---------+----------
>>       1 | 010100000000... <-- some PostGIS geometry string
>>       2 | 010100000000...
>>
>> And, my foo table, which contains data pertaining to these connections,
>> looks like this:
>>
>> id | point_id_start | point_id_end
>> ---+----------------+--------------
>>  1 |              1 |            2
>>
>> And, let's say my function is connect(location1, location2).
>
>> I would like to be able to retrieve that connection without using the
>> connect() procedure.  How would I be able to take advantage of a functional
>> index in this context?
>
> I am not sure what kind of constraints you have on your points table
> for location.  It location is unique, this might be an example where
> you can solve your problem if you use a natural foreign key in foo
> instead of a surrogate key.

Yes, every location is currently unique, and I can't think of a
situation where I would have useful duplicates.

> Then you could just create an index:
>
> CREATE INDEX ON Foo Connect( location1, location2);

My impression of functional indexes is that they're useful mostly in
WHERE queries, like the following.

SELECT foo, bar, baz FROM some_table WHERE lower(foo) = 'qux';

In this case, the index would be created on lower(foo).

How would I get the value of the functional index out of the index in my
case?

Colin


Re: Storing computed values

From
"Richard Broersma"
Date:
On Mon, Apr 21, 2008 at 11:19 AM, Colin Wetherbee <cww@denterprises.org> wrote:

> My impression of functional indexes is that they're useful mostly in WHERE
> queries, like the following.
>
> SELECT foo, bar, baz FROM some_table WHERE lower(foo) = 'qux';
>
> In this case, the index would be created on lower(foo).
>
> How would I get the value of the functional index out of the index in my
> case?

If this is the case, and you use Natural Foreign Keys, you only need
to select from your foo table:

SELECT Connect( location1, location2)
  FROM Foo
 WHERE location1 = ...
        OR location2 = ...
        OR connect(location1,location2) = ...; --this is where the
functional index would be useful


--
Regards,
Richard Broersma Jr.

Re: Storing computed values

From
Colin Wetherbee
Date:
Colin Wetherbee wrote:
> Richard Broersma wrote:
>> On Mon, Apr 21, 2008 at 11:02 AM, Colin Wetherbee
>> <cww@denterprises.org> wrote:
>>
>>> Let's say my points table looks like this:
>>>
>>> point_id | location
>>> ---------+----------
>>>       1 | 010100000000... <-- some PostGIS geometry string
>>>       2 | 010100000000...
>>>
>>> And, my foo table, which contains data pertaining to these connections,
>>> looks like this:
>>>
>>> id | point_id_start | point_id_end
>>> ---+----------------+--------------
>>>  1 |              1 |            2
>>>
>>> And, let's say my function is connect(location1, location2).
>>
>>> I would like to be able to retrieve that connection without using the
>>> connect() procedure.  How would I be able to take advantage of a
>>> functional
>>> index in this context?
>>
>> I am not sure what kind of constraints you have on your points table
>> for location.  It location is unique, this might be an example where
>> you can solve your problem if you use a natural foreign key in foo
>> instead of a surrogate key.
>
> Yes, every location is currently unique, and I can't think of a
> situation where I would have useful duplicates.
>
>> Then you could just create an index:
>>
>> CREATE INDEX ON Foo Connect( location1, location2);
>
> My impression of functional indexes is that they're useful mostly in
> WHERE queries, like the following.
>
> SELECT foo, bar, baz FROM some_table WHERE lower(foo) = 'qux';
>
> In this case, the index would be created on lower(foo).
>
> How would I get the value of the functional index out of the index in my
> case?

I think I now see where you're going with this, but this makes my
problem somewhat more interesting.

I can imagine two ways of solving this.

First, an INDEX with a JOIN... (I can see Tom Lane laughing at me now.)

CREATE INDEX foo_connect_idx
ON foo connect(p_start.location, p_end.location)
JOIN points AS p_start ON foo.point_id_start = p_start.point_id
JOIN points AS p_end ON foo.point_id_end = p_end.point_id;

Just in case this might work, I checked the documentation and found no
mention of JOIN anywhere in the INDEX sections.

Second, bury the JOIN part in my PL/Perl function and use spi_ functions
to retrieve my actual PostGIS locations so the process is transparent to
CREATE INDEX.

Hmm.

Colin

Re: Storing computed values

From
Colin Wetherbee
Date:
Richard Broersma wrote:
> On Mon, Apr 21, 2008 at 11:19 AM, Colin Wetherbee <cww@denterprises.org> wrote:
>
>> My impression of functional indexes is that they're useful mostly in WHERE
>> queries, like the following.
>>
>> SELECT foo, bar, baz FROM some_table WHERE lower(foo) = 'qux';
>>
>> In this case, the index would be created on lower(foo).
>>
>> How would I get the value of the functional index out of the index in my
>> case?
>
> If this is the case, and you use Natural Foreign Keys, you only need
> to select from your foo table:
>
> SELECT Connect( location1, location2)
>   FROM Foo
>  WHERE location1 = ...
>         OR location2 = ...
>         OR connect(location1,location2) = ...; --this is where the
> functional index would be useful

Right, but the problem I see is that my locations are not actually
stored in foo.  Since many rows of foo can reference the same location,
the locations are stored in a separate table and, in fact, are
referenced by foreign keys (SERIAL/INTEGER types) already.

Colin


Re: Storing computed values

From
"Richard Broersma"
Date:
On Mon, Apr 21, 2008 at 12:24 PM, Colin Wetherbee <cww@denterprises.org> wrote:

> Right, but the problem I see is that my locations are not actually stored in
> foo.  Since many rows of foo can reference the same location, the locations
> are stored in a separate table and, in fact, are referenced by foreign keys
> (SERIAL/INTEGER types) already.

I see, I haven't used PostGIS yet, so I am not sure how it is used.
However, I am curious about the advantages or the intent of having
duplicate points in a table that requires the surrogate ID in order to
be distinguished these duplicate point.  I only ask since I am not
familiar with postGIS (perhaps this is the way it is intended to be
used.)   To me this seems to be a bit de-normalized, unless there is
something that differentiates or somehow makes these points unique
these points.


If you constrained your data to only allowed unique points in your
points table, your problem would be solved in regards to caching,
since you could simple use natural foreign keys to get the connections
data that you want.

--
Regards,
Richard Broersma Jr.

Re: Storing computed values

From
Colin Wetherbee
Date:
Richard Broersma wrote:
> On Mon, Apr 21, 2008 at 12:24 PM, Colin Wetherbee
> <cww@denterprises.org> wrote:
>
>> Right, but the problem I see is that my locations are not actually
>> stored in foo.  Since many rows of foo can reference the same
>> location, the locations are stored in a separate table and, in
>> fact, are referenced by foreign keys (SERIAL/INTEGER types)
>> already.
>
> I see, I haven't used PostGIS yet, so I am not sure how it is used.
> However, I am curious about the advantages or the intent of having
> duplicate points in a table that requires the surrogate ID in order
> to be distinguished these duplicate point.  I only ask since I am not
>  familiar with postGIS (perhaps this is the way it is intended to be
> used.)   To me this seems to be a bit de-normalized, unless there is
> something that differentiates or somehow makes these points unique
> these points.

In the example we're tossing around, these PostGIS points could be
considered to be integers or anything else.  I just mentioned that they
were PostGIS points because then it's easier to visualize a procedure
that "connects" them.

The surrogate key exists because that's what users see, and it's a cheap
way for me to validate user input.  For example, if a user wants to
connect point_id 123 and point_id 456, that's great, but if a user is
connecting arbitrary longitudes and latitudes to others, that's not so
great.  It wouldn't destroy anything, but it's not really how this
application is intended to be used.

OTOH, I could keep the point_id in the points table and just not use it
as the surrogate key for the main table anymore.

> If you constrained your data to only allowed unique points in your
> points table, your problem would be solved in regards to caching,
> since you could simple use natural foreign keys to get the
> connections data that you want.

Indeed.

I'll have to ponder natural foreign keys for a while before I make a
decision on this.

Thank you very much for your input.

Colin