Thread: Accelerating aggregates
Stop me if you've heard this before.

I'm looking at fast calculation of aggregates (sum(), max(), count()) across
large tables, or across fairly simply defined subsets of those tables.

Let's say that, for a given aggregate function on a given table (with a given
where clause, perhaps), each postgres process maintains a state variable
(stype, in aggregate terms) and there's also a single state variable
available to all backends via shared memory.

Each time a transaction starts, the process initialises its local state
variable to the initcond of the aggregate. Each time a row is inserted into
the table, the local state variable is updated using the aggregate update
function. Each time a row is removed, the local state variable is either
updated or invalidated, using a "reverse-update" function, again specific to
the aggregate.

If the transaction is rolled back, the local state variable is thrown away.
If the transaction is committed and the local state variable has been
invalidated, then the global state variable is invalidated; otherwise the
global state variable is updated using a state merge function, specific to
the aggregate.

Then, at any point, the correct value of the aggregate function, taking into
account transactions that committed before the current transaction started,
can be found by merging the global state variable and the local one. If the
global state variable isn't valid (because it's been explicitly invalidated,
or the system has just started up, or caching of the value was just enabled)
then it can be calculated by running the real aggregate function against the
table.

By defining the right functions this approach could significantly accelerate
many aggregate queries, including count(), min(), max(), avg(), and also
simple 'count() where <something>' or 'sum() where <something>' calculations.

count()
  update():          local_stype = local_stype + 1
  reverse_update():  local_stype = local_stype - 1
  merge():           global_stype = global_stype + local_stype

max()
  update(X):         local_stype = max(local_stype, X)
  reverse_update(X): if X >= local_stype then invalidate, else do nothing
  merge():           global_stype = max(global_stype, local_stype)

This would be fairly cheap on updates, no worse than a cheap trigger, and
either very cheap to read or at worst no more expensive than calculating the
aggregate fully.

It's a fairly nasty kludge, and probably a bad idea, but do you think it's
feasible (ignoring the whole nested transaction issue, for now)?

Cheers,
  Steve
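The stype / initcond terminology above comes straight from PostgreSQL's
CREATE AGGREGATE. As a point of reference, a minimal sketch of a user-defined
aggregate in that vocabulary (the name my_sum is made up for illustration;
int4pl is the built-in function behind integer '+'):

    CREATE AGGREGATE my_sum (
        BASETYPE = int4,    -- input type of the aggregate
        SFUNC    = int4pl,  -- the "update" function: new_state = state + value
        STYPE    = int4,    -- the state variable, i.e. the stype
        INITCOND = '0'      -- the initial state a transaction would start from
    );

What the proposal adds beyond this are the reverse-update and merge functions
plus the shared-memory global state; the sketch is only meant to anchor the
terminology.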
Steve Atkins wrote:
> Stop me if you've heard this before.
>
> I'm looking at fast calculation of aggregates (sum(), max(), count())
> across large tables, or across fairly simply defined subsets of those
> tables.
>
> Let's say that, for a given aggregate function on a given table (with a
> given where clause, perhaps), each postgres process maintains a state
> variable (stype, in aggregate terms) and there's also a single state
> variable available to all backends via shared memory.
>
> Each time a transaction starts, the process initialises its local state
> variable to the initcond of the aggregate. Each time a row is inserted
> into the table, the local state variable is updated using the
> aggregate update function. Each time a row is removed, the local
> state variable is either updated or invalidated, using a
> "reverse-update" function, again specific to the aggregate.
>
> If the transaction is rolled back, the local state variable is
> thrown away. If the transaction is committed and the local state
> variable has been invalidated, then the global state variable is
> invalidated; otherwise the global state variable is updated using
> a state merge function, specific to the aggregate.

Isn't this going to have visibility issues wrt other backends? How do I
know which transactions have updated the global, which haven't, and which
I should currently be seeing?

I'm not sure that there is a solution simpler than the "insert +1/-1
into a summary table" approach that gets discussed.

--
  Richard Huxton
  Archonet Ltd
[snip]

I've been harping on this problem myself for the last couple of days. A
summary table with frequent vacuums is your best bet for the existing
versions of PostgreSQL. It is, IMHO, suboptimal, but a workable solution
depending on the expected database load.

Right now I am exploring the possibility of writing a set of msession
functions for PostgreSQL. The msession project is a high speed, high volume
session manager for PHP based websites. Using PostgreSQL's recent ability to
return sets of rows from functions, it should be possible to build msession
functionality into PostgreSQL that allows really FAST and really temporary
variables that, while syntactically different, behave much like simple
tables.
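For reference, a rough sketch of that summary-table approach for count() on a
hypothetical table big_table (all object names here are made up for
illustration, and plpgsql is assumed to be installed): every insert or delete
records a +1/-1 delta row, reading the count becomes a sum() over a small
table, and the deltas are periodically collapsed and vacuumed.

    CREATE TABLE big_table_count_deltas (delta integer NOT NULL);

    CREATE FUNCTION count_delta_trig() RETURNS trigger AS '
    BEGIN
        IF TG_OP = ''INSERT'' THEN
            INSERT INTO big_table_count_deltas VALUES (1);
        ELSE  -- DELETE
            INSERT INTO big_table_count_deltas VALUES (-1);
        END IF;
        RETURN NULL;  -- return value is ignored for AFTER triggers
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER big_table_count_maint
        AFTER INSERT OR DELETE ON big_table
        FOR EACH ROW EXECUTE PROCEDURE count_delta_trig();

    -- Reading the count:
    SELECT coalesce(sum(delta), 0) FROM big_table_count_deltas;

    -- Periodic maintenance: collapse the deltas to a single row, then vacuum.
    BEGIN;
    LOCK TABLE big_table_count_deltas IN EXCLUSIVE MODE;
    CREATE TEMP TABLE collapsed AS
        SELECT coalesce(sum(delta), 0)::integer AS delta
        FROM big_table_count_deltas;
    DELETE FROM big_table_count_deltas;
    INSERT INTO big_table_count_deltas SELECT delta FROM collapsed;
    DROP TABLE collapsed;
    COMMIT;
    VACUUM big_table_count_deltas;

The per-row trigger and the dead tuples it generates are the "slow and
painful" part complained about below; the frequent vacuums keep the delta
table from bloating.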
On Fri, Jun 11, 2004 at 09:27:07AM +0100, Richard Huxton wrote:
> > If the transaction is rolled back, the local state variable is
> > thrown away. If the transaction is committed and the local state
> > variable has been invalidated, then the global state variable is
> > invalidated; otherwise the global state variable is updated using
> > a state merge function, specific to the aggregate.
>
> Isn't this going to have visibility issues wrt other backends? How do I
> know which transactions have updated the global, which haven't, and which
> I should currently be seeing?

The global is only updated at transaction commit. So, if you take a local
snapshot of the global at the beginning of your transaction, then the
visible changes at any point are those from transactions that committed
before your transaction started. That's well-defined, at least, and appears
to be pretty much the same as the standard read committed isolation level.

> I'm not sure that there is a solution simpler than the "insert +1/-1
> into a summary table" approach that gets discussed.

That's fairly slow and painful.

Cheers,
  Steve
Steve Atkins <steve@blighty.com> writes:

> So, if you take a local snapshot of the global at the beginning of your
> transaction, then the visible changes at any point are those from
> transactions that committed before your transaction started. That's
> well-defined, at least, and appears to be pretty much the same as the
> standard read committed isolation level.

No, read committed would see any other updates that have been committed
since the start of your transaction.

For some linear aggregates you could start with the initcond, apply all the
local updates, and then, whenever you have to read the actual value, merge
in the global variable as it stands at that time. But not all aggregates can
be handled that way. I think all the standard ones could be, though: sum(),
count(), stddev(), etc.

--
greg
On Fri, Jun 11, 2004 at 12:17:57PM -0400, Greg Stark wrote:
> Steve Atkins <steve@blighty.com> writes:
>
> > So, if you take a local snapshot of the global at the beginning of your
> > transaction, then the visible changes at any point are those from
> > transactions that committed before your transaction started. That's
> > well-defined, at least, and appears to be pretty much the same as the
> > standard read committed isolation level.
>
> No, read committed would see any other updates that have been committed
> since the start of your transaction.

Uhm... only updates within the current transaction. So if you merge the
global state and the local state, that's exactly what you'll see.

> For some linear aggregates you could start with the initcond, apply all
> the local updates, and then, whenever you have to read the actual value,
> merge in the global variable as it stands at that time. But not all
> aggregates can be handled that way. I think all the standard ones could
> be, though: sum(), count(), stddev(), etc.

I think all the standard ones can (anything with an associative update
function, if I remember my math correctly).

And my thought was not that this would be a neato transparent optimization
that the parser would use directly in all cases, rather that it would be a
hack explicitly set up by the DBA for those specific cases where they need
it.

Cheers,
  Steve
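The associativity being relied on here can be spot-checked directly in SQL.
A throwaway illustration against a hypothetical table t with a non-null
integer column x: aggregating disjoint chunks and then merging the partial
results gives the same answer as aggregating the whole table, which is
exactly what allows a local state to be folded into a global one.

    -- count(): merging partial states is just addition
    -- (assumes x is NOT NULL, so the two WHERE clauses partition the table)
    SELECT (SELECT count(*) FROM t WHERE x < 100)
         + (SELECT count(*) FROM t WHERE x >= 100) AS merged,
           (SELECT count(*) FROM t)                AS direct;

    -- sum(): likewise
    SELECT coalesce((SELECT sum(x) FROM t WHERE x < 100), 0)
         + coalesce((SELECT sum(x) FROM t WHERE x >= 100), 0) AS merged,
           (SELECT coalesce(sum(x), 0) FROM t)                AS direct;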
Steve Atkins <steve@blighty.com> writes:

> Uhm... only updates within the current transaction. So if you merge the
> global state and the local state, that's exactly what you'll see.

The only way this would work is if at every SetQuerySnapshot() you copy
*all* of the global variables as part of the snapshot. You'd have to copy
them all, since you don't know which ones you'll need for the next query.
To avoid race conditions, you'd need to lock out transaction commits while
you are doing this copying.

I think there are also race conditions involved in transaction commit, since
there's no way to make the update of the global state atomic with the actual
transaction commit ... unless perhaps you want to hold a lock on the global
state area while committing.

All in all, I think the overhead of this scheme would be enormous. It
implies significant costs during every transaction start and commit, whether
or not that transaction is getting any benefit.

			regards, tom lane
On Fri, Jun 11, 2004 at 01:49:18PM -0400, Tom Lane wrote:
> Steve Atkins <steve@blighty.com> writes:
> > Uhm... only updates within the current transaction. So if you merge the
> > global state and the local state, that's exactly what you'll see.
>
> The only way this would work is if at every SetQuerySnapshot() you copy
> *all* of the global variables as part of the snapshot. You'd have to copy
> them all, since you don't know which ones you'll need for the next query.
> To avoid race conditions, you'd need to lock out transaction commits while
> you are doing this copying.

Yup, though that's going to be acquire lock, memcpy, release lock, and
there's unlikely to be more than a few hundred bytes of state.

> I think there are also race conditions involved in transaction commit,
> since there's no way to make the update of the global state atomic with
> the actual transaction commit ... unless perhaps you want to hold a lock
> on the global state area while committing.

Yeah, that's the implementation detail that's going to really kill the idea
in most cases.

> All in all, I think the overhead of this scheme would be enormous. It
> implies significant costs during every transaction start and commit,
> whether or not that transaction is getting any benefit.

I think you're right, but it was interesting to consider briefly.

Cheers,
  Steve
Steve Atkins <steve@blighty.com> writes:

> On Fri, Jun 11, 2004 at 12:17:57PM -0400, Greg Stark wrote:
>
> > No, read committed would see any other updates that have been committed
> > since the start of your transaction.
>
> Uhm... only updates within the current transaction.

No, "read committed" refers to being able to read any updates that are
committed, even if they were committed after the start of your transaction.
For example, with two concurrent sessions:

[session 1]
db=> begin;
BEGIN

[session 2]
db=> begin;
BEGIN

[session 1]
db=> insert into test values (1);
INSERT 6725927 1
db=> select * from test;
 a
---
 1
(1 row)

[session 2]
db=> select * from test;
 a
---
(0 rows)

[session 1]
db=> commit;
COMMIT

[session 2]
db=> select * from test;
 a
---
 1
(1 row)

--
greg
I am getting the following error:

error: parse error in expression
error: /usr/src/redhat/SPECS/postgresql-7.4.2-1PGDG.spec:98: parseExpressionBoolean returns -1
error: Package has no %description: postgresql

when I execute:

rpmbuild --rebuild --define 'build9x 1' --define 'tcldevel 0' \
    --define 'perl 0' --define 'tcl 0' --define 'tkpkg 0' \
    --define 'test 0' --define 'newintarray 1' --define 'kerberos 0' \
    -vv SRPMS/postgresql-7.4.2-1PGDG.src.rpm

Dave

--
Dave Cramer
519 939 0336
ICQ # 14675561
Dave Cramer <pg@fastcrypt.com> writes: > I am getting the following error: > error: parse error in expression What does this have to do with accelerating aggregates? Please don't start new threads by responding to existing threads. -- greg