Thread: MVCC and Implications for (Near) Real-Time Application
Hi experts,
I have a (near) real-time application in which inserts into the database need to be visible to queries from other threads with minimal delay. The inserts are triggered by real-time events and are therefore asynchronous (i.e., many performance tips I have read concerning batch inserts or COPY do not apply here, since these events cannot be predicted or batched), and the inserted data need to be available to other threads within a couple of seconds (for example, an inserted row that only becomes visible to other query threads 5 or more seconds after the insert is not acceptable). The delay should be under 2 seconds at most; sub-1-second would be great.
My questions are: (1) Does the MVCC architecture introduce significant delays between an insert by one thread and its visibility to other threads? (I am unclear about how multiple versions are "collapsed" or reconciled, and about which version different query threads see.) (2) Are there any available benchmarks that can measure this delay? (3) Which configuration parameters are relevant to reducing this delay?
Thanks for your patience with my ignorance of MVCC (still learning more about it),
Steve
On Oct 25, 2010, at 2:46 PM, Steve Wong wrote:

> [original message quoted in full; snipped]

There is no way to know without testing whether your hardware, OS, database schema, and database load can meet your demands. However, there is no technical reason why PostgreSQL could not meet your timing goals: MVCC does not inherently introduce delays, although the PostgreSQL implementation requires a cleanup process (VACUUM), which can introduce latency. If you find that your current architecture is not up to the task, consider using LISTEN/NOTIFY with a payload (new in 9.0), which we are using for a similar "live-update" system.

Cheers,
M
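For the archives, a minimal sketch of what the LISTEN/NOTIFY approach can look like. The "events" table, its "id" column, and the "event_inserted" channel name are placeholders; pg_notify() and NOTIFY payloads require 9.0 or later:

    -- Placeholder names throughout; needs PostgreSQL 9.0+ for payloads.
    CREATE OR REPLACE FUNCTION notify_event_inserted() RETURNS trigger AS $$
    BEGIN
        PERFORM pg_notify('event_inserted', NEW.id::text);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER events_notify
        AFTER INSERT ON events
        FOR EACH ROW EXECUTE PROCEDURE notify_event_inserted();

    -- Each reader connection subscribes once and is then handed the id of
    -- every newly committed row, with no polling:
    LISTEN event_inserted;

Notifications are delivered only when the inserting transaction commits, so listeners never see ids of uncommitted rows.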
Steve Wong <powerpchead@yahoo.com> wrote:

> (1) Does the MVCC architecture introduce significant delays
> between insert by a thread and visibility by other threads (I am
> unclear about how multiple versions are "collapsed" or reconciled,
> as well as how different query threads are seeing which version)?

As soon as the inserting transaction commits, the inserted row is visible to new snapshots. If you are in an explicit transaction, the commit will have occurred before the return from the COMMIT request; otherwise it will have completed before the return from the INSERT request.

You will get a new snapshot for every statement at READ COMMITTED (or lower) transaction isolation. You will get a new snapshot for each database transaction at higher isolation levels.

-Kevin
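To make the snapshot rules concrete, a sketch with two sessions (the "events" table and its columns are placeholders):

    -- Session A (autocommit): the row is visible to new snapshots as soon
    -- as the INSERT returns.
    INSERT INTO events (payload) VALUES ('tick');

    -- Session B at READ COMMITTED (the default): each statement takes a
    -- fresh snapshot, so this sees A's row right after A's commit.
    SELECT count(*) FROM events;

    -- Session B at a higher isolation level: the snapshot is fixed at the
    -- first query of the transaction, so rows committed by A afterwards
    -- stay invisible until B starts a new transaction.
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT count(*) FROM events;   -- snapshot taken here
    -- ... session A inserts and commits another row ...
    SELECT count(*) FROM events;   -- still the old count
    COMMIT;

So for a low-latency reader, short READ COMMITTED transactions (or autocommit queries) are what you want.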
> My questions are: (1) Does the MVCC architecture introduce significant
> delays between insert by a thread and visibility by other threads

As others have said, once committed it is immediately visible to all.

> (2) Are there any available benchmarks that can measure this delay?

Since you will not be batching INSERTs, you will use one INSERT per transaction. If you use autocommit mode, that's it. If you don't, you will get a few extra network round trips after the INSERT to send the COMMIT. One INSERT is usually extremely fast unless you're short on RAM and the indexes that need updating need some disk seeking.

Anyway, doing lots of INSERTs each in its own transaction is usually very low-throughput, because at each COMMIT Postgres must be sure that all the data is actually written to the hard disks. So, depending on the speed of your hard disks, each COMMIT can take up to 10-20 milliseconds. On a 7200 rpm hard disk, it is absolutely impossible to do more than 7200 commits per minute if you want to be sure each time that the data really is written to the hard disk, unless:

- you use several threads (one disk write can group several commits from different connections; see the config file docs), or
- you turn off synchronous_commit; in this case commit is instantaneous, but if your server loses power or crashes, the last few seconds of data may be lost (database integrity is still guaranteed, though), or
- you use a battery-backed cache on your RAID controller, in which case "written to the hard disks" is replaced by "written to battery-backed RAM", which is a lot faster.

If you don't use a battery-backed cache, place the xlog on a different RAID 1 array than the tables/indexes; this allows commits of xlog records (which are the time-critical part) to proceed smoothly and not be disturbed by other I/O on the indexes/tables. Also consider tuning your bgwriter and checkpoints, after experimentation under realistic load conditions.

So, when you benchmark your application, if you get disappointing results, think about this...
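A sketch of the knobs discussed above, plus a crude way to measure insert-to-visibility latency yourself. All names and values are illustrative, not recommendations; the "events" table and its columns are placeholders:

    -- Per session or transaction, only on the real-time insert path:
    -- COMMIT returns without waiting for the WAL fsync. A crash can lose
    -- the last fraction of a second of transactions, but integrity holds.
    SET synchronous_commit = off;

    -- postgresql.conf, illustrative values only (defaults differ):
    --   commit_delay = 10                    # usec to wait so one fsync
    --   commit_siblings = 5                  #   can cover several commits
    --   checkpoint_completion_target = 0.9   # spread out checkpoint I/O
    --   bgwriter_delay = 100ms               # background writer pacing

    -- Crude visibility probe: the writer stamps each row with
    -- clock_timestamp(); a reader polls and reports the observed lag.
    -- Writer (autocommit):
    INSERT INTO events (payload, inserted_at) VALUES ('tick', clock_timestamp());
    -- Reader, run repeatedly:
    SELECT max(clock_timestamp() - inserted_at) AS visibility_lag
      FROM events
     WHERE inserted_at > now() - interval '5 seconds';

Note the probe only measures commit-to-read lag on a single server; run it under your real insert load, since an idle system will always look fast.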