Thread: High-Concurrency GiST in PostgreSQL
Hello. This is my first post, so feedback on style and choice of venue is especially welcome.
I am a regular but not especially expert user of a variety of databases, including PostgreSQL.
I have only modest experience with spatial databases.
I have a new project[1] in which GiST could be very useful, provided I can achieve high concurrency. Some empirical evidence suggests that R* would be a good starting point, and after reading "High-Concurrency Locking in R-Trees" [2], I went looking for an implementation of R-link trees extended to R*. So I was very interested to read Hellerstein et al., who wrote [3]:
    "High concurrency, recoverability, and degree-3 consistency are critical factors in a full-fledged database system. We are considering extending the results of Kornacker and Banks for R-trees [KB95] to our implementation of GiSTs."
Since this information may be somewhat dated, and GiST has obviously come a long way in PostgreSQL, I am looking for current information and advice on the state of concurrency in GiST in PostgreSQL. If someone has already done an R*-link tree, then that could really help me. (I can wish, no?)
Thanks for reading and thanks for advice or pointers.
Carlos
[1] It's not a GIS project, but it has some similarities (a rough sketch of the schema I have in mind follows this list):
(a) I need to manage anywhere from about 1,000 to 10 million three-dimensional "boxes".
(b) The distributions of sizes, aspect ratios, and locations in R3 are all unknown a priori and may change during execution under insert/delete.
(c) Queries may arrive asynchronously and at a high rate from hundreds (or more?) of compute nodes.
(d) Successive queries from any node, viewed as a time-sequence, may have very low (or at best sporadic) spatial correlation -- lots of page jumps.
(e) R* will be advantageous over R, but Priority R is probably not especially useful since turnover may be greater than 20% during a "job."
(f) I would like to avoid the complications of distributed databases, again because of the high turnover.
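For concreteness, the schema I have in mind is no more than something like the sketch below, using the contrib 'cube' type for the 3-D boxes (all names are placeholders, and the per-box "state" column is just a stand-in for whatever the compute nodes feed back):

  CREATE EXTENSION cube;  -- n-dimensional boxes with GiST index support

  CREATE TABLE boxes (
      id     bigserial PRIMARY KEY,
      extent cube NOT NULL,          -- a 3-D box: cube(ARRAY[x1,y1,z1], ARRAY[x2,y2,z2])
      state  double precision        -- the value the compute nodes feed back
  );

  CREATE INDEX boxes_extent_gist ON boxes USING gist (extent);

The GiST index on 'extent' is where the concurrency question above really lives.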
[2] Marcel Kornacker and Douglas Banks. High-Concurrency Locking in R-Trees. (1995)
[3] Hellerstein, Naughton, and Pfeffer. Generalized Search Trees for Database Systems. (1995)
On 12/5/2011 12:31 PM, C. Mundi wrote:
> Hello. This is my first post, so feedback on style and choice of venue is especially welcome.
>
> I am a regular but not especially expert user of a variety of databases, including PostgreSQL. I have only modest experience with spatial databases.
>
> I have a new project[1] in which GiST could be very useful, provided I can achieve high concurrency. <SNIP>

Concurrency here can mean different things. One application hitting PG which then uses multiple threads? (Not currently possible.) One app with multiple threads, each having a database connection? (Which is really the same as) multiple apps, each having a database connection?

PG limits one database connection to one CPU. Multiple connections will use multiple CPUs.

Or, by concurrency, do you mean non-blocking? And if you mean non-blocking, is that for reads, writes, or both? In PG you can do non-blocking, multiple-connection (i.e. multiple-CPU) reads as much as you want. Extending to indexes: many connections can read a GiST index at the same time.

Is that what you need?

-Andy
On Mon, Dec 5, 2011 at 12:26 PM, Andy Colson <andy@squeakycode.net> wrote:
Thanks, Andy. You're quite right of course. I'm thinking of concurrent clients. Lots of them. I envision thousands of actors (which could be threads within a process or separate processes) behaving as clients, each with its own connection to a single-threaded database server. So concurrency here simply means that the database has to be able to handle a lot of "simultaneous" connections coming at it fast and asynchronously. (I'm prepared to deploy a thin queuing layer if it turns out that I saturate the server.) The compute nodes are doing heavy physics which depends on spatially organized data, and it is very difficult to predict which rows an actor will need next. (In fact, knowing that would presuppose that the problem to be solved could be factored at great savings in computation.)
So what I really need is minimal locking, as in [Kornacker and Banks 1995]. The whole database can be pre-loaded before the start of the calculation. Now about 80% of the data in the database will never change during a run. But about 20% will change via "feedback" from the compute nodes. And the nature of the problem is that we do not know a priori which data is in the 80% and which is in the 20%. If we did, we could split the database to ensure no block-on-write impact on reads for the 80%. Alas, we have to assume that reads and writes are mixed with statistics yet unknown.
The 20% that changes does so spatially, not just in content. This can lead to the need to rebalance on inserts. And since splitting nodes in "naive" R* trees is kind of expensive [1], I am wondering to what extent the "sibling links" approach described by Kornacker and Banks has -- as anticipated by Hellerstein et al. -- already been implemented in GiST in PostgreSQL. If it has, then I win just by using the 'cube' contrib extension. Hellerstein notes that sibling pointers have long been in common use even in B-trees, so I am optimistic for GiST.
So that's my concern. I'm doing roughly 80% reads, which are all non-blocking, with 20% writes mixed in, and I need to avoid writes blocking queries that do not need to traverse branches affected by the write.
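Concretely, against the toy schema sketched in my first message (all names and numbers are placeholders), the 80% are overlap probes like the first statement below, and the 20% are feedback writes that can move or reshape a box, like the second:

  -- read: which boxes overlap this probe region?
  SELECT id, state
    FROM boxes
   WHERE extent && cube(ARRAY[0.0, 0.0, 0.0], ARRAY[1.0, 1.0, 1.0]);

  -- write: feedback from a compute node that moves/reshapes one box
  UPDATE boxes
     SET extent = cube(ARRAY[0.5, 0.5, 0.5], ARRAY[2.5, 2.5, 2.5]),
         state  = 42.0
   WHERE id = 12345;

The question is purely about how often the second kind of statement makes the first kind wait.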
Thanks!
Carlos
[1] Even if inserts were not potentially expensive for the database, the prospect that an insert triggered by one compute node could occasionally cause *all* the compute nodes to stall when not logically necessary is horrifying.
On 12/05/11 1:34 PM, C. Mundi wrote:
> So that's my concern. I'm doing roughly 80% reads, which are all non-blocking, with 20% writes mixed in, and I need to avoid writes blocking queries that do not need to traverse branches affected by the write.

Postgres does no blocking on inserts/updates. The commonest lock is when you're doing a transaction and need to select something prior to updating it; then you use SELECT ... FOR UPDATE, which locks just the rows you're going to update so no one else can update them (but other clients can still read the existing value prior to your COMMIT).

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
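To spell that pattern out against the toy "boxes" table sketched earlier in the thread (a sketch only; the table, columns, and id are placeholders, not anything from the actual project):

  BEGIN;
  -- lock only this row; other sessions can still read its last committed value
  SELECT extent, state FROM boxes WHERE id = 12345 FOR UPDATE;
  -- ... decide on the new geometry ...
  UPDATE boxes SET extent = cube(ARRAY[0.5, 0.5, 0.5], ARRAY[2.5, 2.5, 2.5]) WHERE id = 12345;
  COMMIT;  -- the new value becomes visible to other sessions only here

The row lock taken by FOR UPDATE blocks only other writers of that row; plain readers never wait on it.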
On 12/05/11 1:34 PM, C. Mundi wrote:
> Thanks, Andy. You're quite right of course. I'm thinking of concurrent clients. Lots of them. I envision thousands of actors (which could be threads within a process or separate processes) behaving as clients, each with its own connection to a single-threaded database server. So concurrency here simply means that the database has to be able to handle a lot of "simultaneous" connections coming at it fast and asynchronously. (I'm prepared to deploy a thin queuing layer if it turns out that I saturate the server.) The compute nodes are doing heavy physics which depends on spatially organized data, and it is very difficult to predict which rows an actor will need next. (In fact, knowing that would presuppose that the problem to be solved could be factored at great savings in computation.)

For optimal performance, you typically don't want more active queries than the number of CPU hardware threads if you're running purely from cache, and maybe that number plus the number of physical disk drives if you're doing lots of disk IO.

I'm running OLTP-style operations on a dual 6-core 'Sandy Bridge' Xeon: that is 12 cores of 2 threads each, or 24 total hardware threads, and the server has 25 2.5" SAS drives, so we find staying around 50 active queries is the sweet spot. We manage this by using a message queueing system in which we can dynamically tune the worker thread count, and those worker threads make the actual database connections. That way thousands of actual clients can lob requests into the messaging system (AMQP, JMS, etc. would be suitable candidates, but we have our own in-house system for legacy reasons), and 50 worker processes take these requests, handle them, and reply.

--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast
On 12/5/2011 3:41 PM, John R Pierce wrote:
> On 12/05/11 1:34 PM, C. Mundi wrote:
>> So that's my concern. I'm doing roughly 80% reads, which are all non-blocking, with 20% writes mixed in, and I need to avoid writes blocking queries that do not need to traverse branches affected by the write.
>
> Postgres does no blocking on inserts/updates. The commonest lock is when you're doing a transaction and need to select something prior to updating it; then you use SELECT ... FOR UPDATE, which locks just the rows you're going to update so no one else can update them (but other clients can still read the existing value prior to your COMMIT).

As an addition to this: reads and writes won't block each other, but you'll need to watch the overlap if it's a problem. There are many ways to go about it depending on what you want (transaction isolation levels, locking, etc.). In general, I think it might look like:

connection1:
  start transaction
  select * from table where the_geom && POINT(a b)

connection2:
  start transaction
  update table set the_geom = POLYGON(a b c d) where rowid = 5;

connection1: (in the same transaction it started above)
  select the_geom from table where rowid = 5;
  -- gets the original geom, NOT the one from connection2!

There are transaction options for read committed, read uncommitted, etc. I don't rightly understand them all, but it sounds like you'll want to.

> traverse branches affected by the write

I assume that's a reference to building an underlying tree structure. You won't need to worry about it. On the other hand, if that's a reference to some geo-boxing thing where one row is included in another and you need to update multiple rows -- and I'm starting to confuse myself -- then you might have a problem.

Also, as John points out, you'll want a connection pooler. I've heard good things about pgPool. It'll also spread reads across multiple computers in case you need a faster response (writes go to all computers, reads round-robin).

-Andy
Agreed on the importance of understanding the transaction modes.
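If I read the documentation correctly, the default READ COMMITTED mode takes a fresh snapshot for every statement, while REPEATABLE READ pins a single snapshot for the whole transaction (and READ UNCOMMITTED is treated the same as READ COMMITTED, so dirty reads never happen). For my 80% readers, something like this untested sketch against the toy 'boxes' table would give a set of mutually consistent reads:

  BEGIN ISOLATION LEVEL REPEATABLE READ;
  -- every statement from here to COMMIT sees the same snapshot,
  -- regardless of what other sessions commit in the meantime
  SELECT count(*) FROM boxes WHERE extent && cube(ARRAY[0, 0, 0], ARRAY[1, 1, 1]);
  -- ... more reads, all consistent with each other ...
  COMMIT;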
I was specifically pointing to the potential latency of reads blocked while nodes are split on insert during rebalancing. But as Paul points out, Postgres does Ang/Tan splits. While less optimal than R* splits, Ang/Tan is faster as I recall, so it might not be so bad overall.
And I appreciate the tip to look at pgPool, which I didn't know about and will read up on.
Thanks,
Carlos
On 12/5/2011 12:31 PM, C. Mundi wrote:
> Hello. This is my first post, so feedback on style and choice of venue is especially welcome.

If I might add a personal request: way down the road, after you get things working, would you mind dropping by and letting us know what happened? Did PG work for you or not? Good, bad, etc. We get to hear the problems, but hardly ever the solutions.

-Andy