Thread: set autovacuum=off
I have a database where I virtually never delete and almost never do updates. (The updates might change in the future, but for now it's okay to assume they never happen.) As such, it seems like it might be worth it to set autovacuum=off, or at least make it so vacuuming hardly ever occurs. Actually, the latter is probably the more robust solution, though I don't know how to do that (hence my writing to this list). I did try turning autovacuum off but got:
ERROR: parameter "autovacuum" cannot be changed now
SQL state: 55P02
Not sure what, if anything, I can do about that.
Thanks,
-Alessandro
On 22 February 2012 23:50, Alessandro Gagliardi <alessandro@path.com> wrote:
> I have a database where I virtually never delete and almost never do
> updates. (The updates might change in the future, but for now it's okay to
> assume they never happen.) As such, it seems like it might be worth it to
> set autovacuum=off, or at least make it so vacuuming hardly ever occurs.
> I did try turning autovacuum off but got:
>
> ERROR: parameter "autovacuum" cannot be changed now
> SQL state: 55P02

Autovacuum is controlled by how much of a table has changed, so if a table never changes, it never gets vacuumed (the exception being a forced vacuum freeze to mitigate the transaction-ID wrap-around issue). The settings which control this are autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor. Therefore it isn't necessary to disable autovacuum.

But if you are adamant about disabling it, you need to change it in your postgresql.conf file and restart the server.

--
Thom
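For reference on the thresholds Thom names: autovacuum vacuums a table once its dead tuples exceed roughly autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. A minimal sketch for eyeballing how close each table is to that trigger point (it ignores per-table overrides and is only an approximation):

-- dead tuples vs. the approximate point at which autovacuum would fire
SELECT s.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::integer
         + current_setting('autovacuum_vacuum_scale_factor')::float8
           * c.reltuples AS approx_vacuum_trigger_point
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC;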
On 2/23/2012 6:34 AM, Thom Brown wrote:
> Autovacuum is controlled by how much of a table has changed, so if a
> table never changes, it never gets vacuumed (the exception being a forced
> vacuum freeze to mitigate the transaction-ID wrap-around issue). The
> settings which control this are autovacuum_vacuum_threshold and
> autovacuum_vacuum_scale_factor. Therefore it isn't necessary to disable
> autovacuum.
>
> But if you are adamant about disabling it, you need to change it in
> your postgresql.conf file and restart the server.

Agreed, don't disable autovacuum. It's not that demanding, and if you do need it and forget to run it, it might cause you more problems.

I have a db that's on a VM that doesn't get hit very much. I've noticed IO is a little busy (we are talking a small fraction of a percent) but still more than I thought should be happening on a db with next to no usage. I found setting autovacuum_naptime = 6min made the IO all but vanish. And if I ever get a wild hair and blow some stuff away, the db will clean up after me.

-Andy
I should have been more clear. I virtually never delete or do updates, but I insert a lot. So the table does change quite a bit, but only in one direction.
I was unable to disable autovacuum universally (due to the cant_change_runtime_param error) but I was able to disable it on individual tables. Still, I know this is heavy-handed and suboptimal. I tried SET autovacuum_naptime='6min' but got the same 55P02 error. Should/can I set that per table?
I did look at autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor but couldn't make sense out of them. (Besides, I'd probably get the same 55P02 error if I tried to change them.)
On Thu, Feb 23, 2012 at 7:18 AM, Andy Colson <andy@squeakycode.net> wrote:
> Agreed, don't disable autovacuum. It's not that demanding, and if you do
> need it and forget to run it, it might cause you more problems.
>
> I have a db that's on a VM that doesn't get hit very much. I've noticed IO
> is a little busy but still more than I thought should be happening on a db
> with next to no usage. I found setting autovacuum_naptime = 6min made the
> IO all but vanish. And if I ever get a wild hair and blow some stuff away,
> the db will clean up after me.
>
> -Andy
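On the per-table question raised above: autovacuum_naptime is a server-wide setting and cannot be set per table, but the on/off switch and the thresholds can be, via storage parameters. A minimal sketch (the table name is only illustrative, borrowed from later in the thread, and the numbers are arbitrary):

-- disable autovacuum for a single table only...
ALTER TABLE locations SET (autovacuum_enabled = false);

-- ...or, less drastically, just raise that table's thresholds
ALTER TABLE locations SET (
    autovacuum_vacuum_scale_factor  = 0.4,
    autovacuum_analyze_scale_factor = 0.2
);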
On 23 February 2012 17:35, Alessandro Gagliardi <alessandro@path.com> wrote:
> I should have been more clear. I virtually never delete or do updates, but I
> insert a lot. So the table does change quite a bit, but only in one
> direction.

The same thing applies. VACUUM cleans up dead tuples, which INSERTs don't create; only DELETEs and UPDATEs do.

--
Thom
Hm. Okay, so just to be perfectly clear: my database, with all its INSERTs but no DELETEs or UPDATEs, should not be vacuuming anyway, so disabling auto-vacuum is redundant (and possibly hazardous).
FWIW, I did notice a speed increase after disabling auto-vacuum on several of my tables, though that could have been a coincidence. Is there any way that these tables could have been getting vacuumed (or something else) despite the fact that they are not receiving updates or deletes? Or must that have been a coincidence?
While we're on the subject, I welcome any pointers with regard to tuning a database that is being used in this way. Any cache sizes I should be messing with? Etc.
Thank you,
-Alessandro
On Thu, Feb 23, 2012 at 9:45 AM, Thom Brown <thom@linux.com> wrote:
> The same thing applies. VACUUM cleans up dead tuples, which INSERTs don't
> create; only DELETEs and UPDATEs do.
On 02/23/2012 09:35 AM, Alessandro Gagliardi wrote:
> I was unable to disable autovacuum universally (due to the
> cant_change_runtime_param error) but I was able to disable it on individual
> tables. ... I tried SET autovacuum_naptime='6min' but got the same 55P02
> error. Should/can I set that per table?
See:
http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html
The documentation has information like "This parameter can only be set in the postgresql.conf file or on the server command line." that will tell you in advance which settings will fail when you attempt to set them through SQL statements.
But autovacuum is pretty smart about not vacuuming tables until reasonably necessary. And beware that autovacuum is also controlling when to analyze a table. Mass inserts are probably changing the characteristics of your table such that it needs to be analyzed to allow the planner to properly optimize your queries.
Have you identified that vacuum is actually causing a problem? If not, I'd leave it alone. The system tables have a lot of information on table vacuuming and analyzing:
select
relname,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze,
vacuum_count,
autovacuum_count,
analyze_count,
autoanalyze_count
from
pg_stat_user_tables;
Cheers,
Steve
On Thu, Feb 23, 2012 at 10:01 AM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
> The documentation has information like "This parameter can only be set in the postgresql.conf file or on the server command line." that will tell you in advance which settings will fail when you attempt to set them through SQL statements.
Ah. I missed that. Sorry for asking stupid questions.
> But autovacuum is pretty smart about not vacuuming tables until reasonably necessary. And beware that autovacuum is also controlling when to analyze a table. Mass inserts are probably changing the characteristics of your table such that it needs to be analyzed to allow the planner to properly optimize your queries.
Okay, that makes more sense to me; because the stats would be changing quickly and so while vacuuming may not be necessary, analyzing would be. At the same time, I can't afford to analyze if it's causing my inserts to take over 50 ms. Something else I should add: if my selects are slow, that's annoying; but if my inserts are slow, that could be disastrous. Does analyze increase the efficiency of inserts or just selects? (I assumed the latter.) Obviously, I will need to analyze sometimes, but perhaps not nearly as often as postgres would predict under the circumstances.
> Have you identified that vacuum is actually causing a problem? If not, I'd leave it alone. The system tables have a lot of information on table vacuuming and analyzing:
Not indubitably, but circumstantially, I did notice that significantly fewer of my commits were taking over 50 ms after I set autovacuum_enabled=off on many of my tables. Unfortunately, it was not an isolated experiment, so I can't really be sure. At the same time, I'm hesitant to turn it back on until I'm sure it either didn't make a difference or I've got a better plan for how to deal with this.
> select
>   relname,
>   last_vacuum,
>   last_autovacuum,
>   last_analyze,
>   last_autoanalyze,
>   vacuum_count,
>   autovacuum_count,
>   analyze_count,
>   autoanalyze_count
> from
>   pg_stat_user_tables;
Apparently the last four columns don't exist in my database. As for the first four, that is somewhat illuminating. It looks like the last_autovacuum that occurred on any of my tables was late Monday evening (almost two days before I set autovacuum_enabled=off). The last_autoanalyze on one of the tables where I set autovacuum_enabled=off was yesterday at 10:30, several hours before I disabled auto-vacuum. (I've had others since then on tables where I didn't disable auto-vacuum.) It looks like disabling auto-vacuum also disabled auto-analyze (did it?) but it also looks like that might not have been the continuous problem I thought it was.
So if it's not auto-vacuuming that's making my inserts so slow, what is it? I'm batching my inserts (that didn't seem to help at all actually, but maybe because I had already turned off synchronous_commit anyway). I've gotten rid of a bunch of indices (especially those with low cardinality; I did that around the same time as disabling auto-vacuum, so that could account for the coincidental speed-up). I'm not sure what else I could be doing wrong. It's definitely better than it was a few days ago, but I still see "LOG: duration: 77.315 ms statement: COMMIT" every minute or two.
Thank you,
-Alessandro
On Thu, Feb 23, 2012 at 10:38 AM, Alessandro Gagliardi <alessandro@path.com> wrote:
> So if it's not auto-vacuuming that's making my inserts so slow, what is it?
> I'm batching my inserts (that didn't seem to help at all actually, but maybe
> because I had already turned off synchronous_commit anyway). I've gotten rid
> of a bunch of indices... I'm not sure what else I could be doing wrong. It's
> definitely better than it was a few days ago, but I still see "LOG:
> duration: 77.315 ms statement: COMMIT" every minute or two.

Have you considered that you may have lock contention? Sampling pg_locks may be illuminating; based on your description the lock contention would be intermittent, so I wouldn't trust an n=1 test.

-p

--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt." -- Kurt Vonnegut
On 2/23/2012 12:38 PM, Alessandro Gagliardi wrote:
> Does analyze increase the efficiency of inserts or just selects? (I
> assumed the latter.) Obviously, I will need to analyze sometimes, but
> perhaps not nearly as often as postgres would predict under the
> circumstances.

That depends on whether you have triggers that are doing selects. But in general you are correct, analyze won't help inserts.

checkpoint_segments can help insert speed; what do you have that set to?

Also, how you insert can make things faster too (plain insert vs. prepared statements vs. COPY).

Also, if you have too many indexes on a table that can cause things to slow down.

Your IO layer needs to be fast too. Have you watched vmstat and iostat?

Have you read up on synchronous_commit?

-Andy
On 02/23/2012 10:38 AM, Alessandro Gagliardi wrote:
> Ah. I missed that. Sorry for asking stupid questions.

No problem and not stupid. With the manual running to hundreds of pages, plus information on wikis and mailing-list histories spanning hundreds of thousands of messages, sometimes knowing where to look is 90% of the battle.

> ...if my selects are slow, that's annoying; but if my inserts are slow, that
> could be disastrous...

You need to rethink things a bit. Databases can fail in all sorts of ways and can slow down during bursts of activity, data dumps, etc. You may need to investigate some form of intermediate buffering.

> ...Apparently the last four columns don't exist in my database. As for the
> first four, that is somewhat illuminating....

Then you are not running a current version of PostgreSQL, so the first step to performance enhancement is to upgrade. (As a general rule - there are occasionally specific cases where performance decreases.)

> So if it's not auto-vacuuming that's making my inserts so slow, what is it?
> I'm batching my inserts (that didn't seem to help at all actually, but maybe
> because I had already turned off synchronous_commit anyway).

How are you batching them? Into a temp table that is copied to the main table? As a bunch of insert statements within a single connection (saves probably considerable time due to eliminating multiple connection setups)? With one PREPARE and multiple EXECUTEs (to save repeated planning time - I'm not sure this will buy you much for simple inserts, though)? With COPY (much faster as many records are inserted in a single statement, but if one fails, all fail)?

And what is the 50ms limit? Is that an average? Since you are batching, it doesn't sound like you need every statement to complete in 50ms. There is always a tradeoff between overall maximum throughput and maximum allowed latency.

> I've gotten rid of a bunch of indices (especially those with low
> cardinality; I did that around the same time as disabling auto-vacuum, so
> that could account for the coincidental speed-up).

Yes, inserts require the indexes to be updated, so they can slow down inserts and updates.

> I'm not sure what else I could be doing wrong. It's definitely better than
> it was a few days ago, but I still see "LOG: duration: 77.315 ms statement:
> COMMIT" every minute or two.

That's a huge topic ranging from hardware (CPU speed, RAM, spindle-count, disk-type, battery-backed write caching), OS (you *are* running on some sort of *nix, right?), OS tuning, PG tuning, etc. Fortunately the biggest benefit comes from some basic tuning.

I recommend you abandon this thread as it presupposes a now seemingly incorrect cause of the problem and start a new one titled something like "Tuning for high insert rate" where you describe the problem you want to solve. See http://wiki.postgresql.org/wiki/Guide_to_reporting_problems for a good guide to the information that will be helpful in diagnosis.

Cheers,
Steve
I'm unable to make sense of pg_locks. The vast majority are locktype='transactionid', mode='ExclusiveLock', granted=t. There are some 'relation' locks with mode='RowExclusiveLock' and fewer with 'AccessShareLock'. I have no idea what I should be looking for here.
On Thu, Feb 23, 2012 at 10:42 AM, Peter van Hardenberg <pvh@pvh.ca> wrote:
> Have you considered that you may have lock contention? Sampling
> pg_locks may be illuminating; based on your description the lock
> contention would be intermittent, so I wouldn't trust an n=1 test.
On Thu, Feb 23, 2012 at 11:07 AM, Andy Colson <andy@squeakycode.net> wrote:
> That depends on whether you have triggers that are doing selects. But in
> general you are correct, analyze won't help inserts.
I do have some, actually. I have a couple trigger functions like:
CREATE OR REPLACE FUNCTION locations_quiet_unique_violation()
  RETURNS trigger AS
$BODY$
BEGIN
  -- skip the insert (instead of raising a unique violation) if a row with
  -- this geohash already exists
  IF EXISTS (SELECT 1 FROM public.locations WHERE geohash = NEW.geohash) THEN
    RETURN NULL;
  ELSE
    RETURN NEW;
  END IF;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
that are triggered thusly:
CREATE TRIGGER locations_check_unique_violation
BEFORE INSERT
ON locations
FOR EACH ROW
EXECUTE PROCEDURE locations_quiet_unique_violation();
I left auto-vacuum enabled for those tables.
> checkpoint_segments can help insert speed; what do you have that set to?
40. Checking http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server it looks like setting that as high as 256 would not necessarily be unreasonable. What do you think?
> Also, how you insert can make things faster too (plain insert vs. prepared statements vs. COPY).
I'm doing this all with INSERT. Is COPY that much faster? I don't know anything about prepared.
> Also, if you have too many indexes on a table that can cause things to slow down.
Yeah, got that. I removed a bunch. I'd rather not remove what's left unless I have to.
> Your IO layer needs to be fast too. Have you watched vmstat and iostat?
I don't know if I have access to vmstat and iostat. Heroku is hosting this for me on AWS.
> Have you read up on synchronous_commit?
Only a tiny bit. A couple people suggested disabling it since my database is being hosted on AWS so I did that. It seems a bit risky but perhaps worth it.
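For what it's worth, synchronous_commit does not have to be disabled server-wide; it can be turned off per session or per transaction, which limits the risk to the statements that opt in. A minimal sketch (the table and value are only illustrative):

BEGIN;
SET LOCAL synchronous_commit TO off;  -- applies to this transaction only
INSERT INTO locations (geohash) VALUES ('9q8yy');
COMMIT;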
On 2/23/2012 2:40 PM, Alessandro Gagliardi wrote:
>> checkpoint_segments can help insert speed; what do you have that set to?
>
> 40. Checking http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
> it looks like setting that as high as 256 would not necessarily be
> unreasonable. What do you think?

I'd say go slow. Try a little bit and see if it helps. I don't actually have high insert rate problems, so I don't actually know from experience.

>> Also, how you insert can make things faster too (plain insert vs. prepared statements vs. COPY).
>
> I'm doing this all with INSERT. Is COPY that much faster? I don't know
> anything about prepared.

If you can batch multiple records then COPY is the fastest method. (Of course your triggers might be the cause of the slowness and not insert speed.)

Depending on the language you are using to insert records, you can prepare a query and only send the arguments vs. sending the entire sql statement every time. In pseudo-perl code I'd:

my $q = $db->prepare('insert into table(col1, col2) values ($1, $2)');
$q->execute('one', 'two');
$q->execute('three', 'four');
$q->execute('five', 'six');

This is faster because the "insert..." is only sent over the wire and parsed once. Then only the arguments are sent for each execute.

Speed-wise, I think it'll go:
1) slowest: individual insert statements
2) prepared statements
3) fastest: COPY

Again, assuming the triggers are not the bottleneck. Have you run an insert by hand with 'EXPLAIN ANALYZE'?

>> Have you read up on synchronous_commit?
>
> Only a tiny bit. A couple people suggested disabling it since my database is
> being hosted on AWS so I did that. It seems a bit risky but perhaps worth it.

I would think they are running on battery-backed IO, with boxes on UPS, so I'd guess it's pretty safe. It would also depend on your commit size. If you are batching a million records into one commit, you might lose all of them.

-Andy
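At the SQL level, the prepared-statement idea Andy sketches in pseudo-Perl looks roughly like this (a sketch; geohash is the only column of locations mentioned in the thread, so everything else is omitted):

-- parse and plan the statement once per session...
PREPARE locations_insert (text) AS
    INSERT INTO locations (geohash) VALUES ($1);

-- ...then send only the arguments for each row
EXECUTE locations_insert('9q8yy');
EXECUTE locations_insert('9q8yz');

DEALLOCATE locations_insert;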
On Thu, Feb 23, 2012 at 11:26 AM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
> You need to rethink things a bit. Databases can fail in all sorts of ways and can slow down during bursts of activity, data dumps, etc. You may need to investigate some form of intermediate buffering.
Currently my "buffer" (such as it is) is Kestrel which queues up INSERTs and then executes them one at a time. This keeps the rest of the app from being held back, but it becomes a problem when the queue fills up faster than it can drain. For one particularly heavy logger, I tried writing it all to an unconstrained table with the idea that I would copy that table (using INSERT . . . SELECT . . .) into another table with constraints, reducing the data in the process (deduping and such). Problem was, even my constraint-less table wasn't fast enough. Perhaps buffering to a local file and then using COPY would do the trick.
> > ...Apparently the last four columns don't exist in my database. As for the
> > first four, that is somewhat illuminating....
>
> Then you are not running a current version of PostgreSQL, so the first step
> to performance enhancement is to upgrade. (As a general rule - there are
> occasionally specific cases where performance decreases.)
We're using 9.0.6. Peter, how do you feel about upgrading? :)
> How are you batching them? Into a temp table that is copied to the main table? As a bunch of insert statements within a single connection (saves probably considerable time due to eliminating multiple connection setups)? With one PREPARE and multiple EXECUTEs (to save repeated planning time - I'm not sure this will buy you much for simple inserts, though)? With COPY (much faster as many records are inserted in a single statement, but if one fails, all fail)?
The second one (a bunch of insert statements within a single connection). As I mentioned above, I was going to try the temp table thing, but that wasn't fast enough. COPY might be my next attempt.
> And what is the 50ms limit? Is that an average? Since you are batching, it doesn't sound like you need every statement to complete in 50ms. There is always a tradeoff between overall maximum throughput and maximum allowed latency.
No, not average. I want to be able to do 100-200 INSERTs per second (90% of those would go to one of two tables, the other 10% would go to any of a couple dozen tables). If 1% of my INSERTs take 100 ms, then the other 99% must take no more than 9 ms to complete.
...actually, it occurs to me that since I'm now committing batches of 1000, a 100ms latency per commit wouldn't be bad at all! I'll have to look into that.... (Either that or my batching isn't working like I thought it was.)
> I recommend you abandon this thread as it presupposes a now seemingly incorrect cause of the problem and start a new one titled something like "Tuning for high insert rate" where you describe the problem you want to solve. See http://wiki.postgresql.org/wiki/Guide_to_reporting_problems for a good guide to the information that will be helpful in diagnosis.
I'll leave the title as is since I think simply renaming this message would cause more confusion than it would prevent. But this gives me something to chew on and when I need to return to this topic, I'll do just that.
Thanks,
-Alessandro
On 02/23/2012 01:07 PM, Alessandro Gagliardi wrote:
> The second one (a bunch of insert statements within a single
> connection). As I mentioned above, I was going to try the temp table
> thing, but that wasn't fast enough. COPY might be my next attempt.

insert into...; insert into...; insert into...; ...

is really (ignoring statement preparation time):

begin; insert into...; commit;
begin; insert into...; commit;
begin; insert into...; commit;

It's possible that you might get a nice boost by wrapping the inserts into a transaction:

begin;
insert into...;
insert into...;
insert into...;
...
commit;

This only requires all that disk-intensive stuff that protects your data once at the end instead of 1000 times for your batch of 1000.

COPY is even better. I just ran a quick test by restoring a table on my desktop hacking db (untuned, few-years-old PC, single SATA disk, modest RAM and lots of resource competition). The 22+ million rows restored in 282 seconds, which is a rate somewhat north of 78,000 records/second, or about 0.013 ms/record.

You may want to eliminate that trigger, which only seems to exist to silence errors from uniqueness violations, and instead copy the incoming data into a temp table, then move the data with a variant of:

INSERT INTO main_table (SELECT ... FROM incoming_table WHERE NOT EXISTS (SELECT 1 FROM main_table WHERE ...))

Cheers,
Steve
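A sketch of the staging-table variant Steve describes, reusing the locations/geohash names from earlier in the thread; the COPY source and the DISTINCT ON dedup are illustrative assumptions, not something from the original messages:

-- load the raw batch into a temp table (e.g. via psql's \copy or the
-- driver's COPY support, since the server can't read client-side files)
CREATE TEMP TABLE incoming_locations (LIKE locations INCLUDING DEFAULTS);
COPY incoming_locations FROM STDIN WITH CSV;

-- then move only the rows that aren't already present, deduplicating
-- within the batch at the same time
INSERT INTO locations
SELECT DISTINCT ON (geohash) i.*
FROM incoming_locations i
WHERE NOT EXISTS (
    SELECT 1 FROM locations l WHERE l.geohash = i.geohash
);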
On Thu, Feb 23, 2012 at 1:37 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
> It's possible that you might get a nice boost by wrapping the inserts into a transaction:
>
> begin;
> insert into...;
> insert into...;
> insert into...;
> ...
> commit;
>
> This only requires all that disk-intensive stuff that protects your data once at the end instead of 1000 times for your batch of 1000.
I think that is essentially what I am doing. I'm using psycopg2 in a python script that runs continuously on a queue. It opens a connection and creates a cursor when it begins. It then passes that cursor into a function along with the data (read off the queue) that needs to be inserted. I run cur.execute("SAVEPOINT insert_savepoint;") followed by cur.execute(q) (where q is the insert statement). If there's an error I run cur.execute("ROLLBACK TO SAVEPOINT insert_savepoint;") otherwise I increment a counter. Once the counter exceeds 999, I run conn.commit() and reset the counter. I believe that psycopg2 is essentially doing what you are suggesting. The fact that the data does not appear in the database until conn.commit() tells me that it's not committing anything until then.
> COPY is even better. I just ran a quick test by restoring a table on my desktop hacking db (untuned, few-years-old PC, single SATA disk, modest RAM and lots of resource competition). The 22+ million rows restored in 282 seconds, which is a rate somewhat north of 78,000 records/second, or about 0.013 ms/record.
I'll try that. Of course, the fact that the database is stored in AWS complicates matters. Regardless, it sounds like COPY should be considerably faster.
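For clarity, the psycopg2 loop described above boils down to roughly this statement stream on the server side (values are illustrative); the work is only made durable at the final COMMIT:

BEGIN;                                      -- opened implicitly by psycopg2
SAVEPOINT insert_savepoint;
INSERT INTO locations (geohash) VALUES ('9q8yy');
-- on error: ROLLBACK TO SAVEPOINT insert_savepoint;
SAVEPOINT insert_savepoint;
INSERT INTO locations (geohash) VALUES ('9q8yz');
-- ... repeated for up to 1000 rows ...
COMMIT;                                     -- conn.commit()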
On Thu, Feb 23, 2012 at 1:11 PM, Peter van Hardenberg <pvh@heroku.com> wrote:
> My hunch is still that your issue is lock contention.
How would I check that? I tried looking at pg_locks but I don't know what to look for.
> We have many customers who do much more than this throughput, though
> I'm not sure what level of resourcing you're current at. You might
> consider experimenting with a larger system if you're having
> performance problems.
Heh. I thought you might say that. :) It's definitely worth considering, but as you might expect, I want to exhaust other options first. For customers who do much more (or even comparable) throughput, can you tell me how big of a system they require?
Also, as per Andy's suggestion, I'd like to try doubling checkpoint_segments. However, it appears that that is one of those variables that I cannot change from pgAdmin. I don't suppose there's any way to change this without rebooting the database?
On Thu, Feb 23, 2012 at 1:07 PM, Alessandro Gagliardi <alessandro@path.com> wrote:
> We're using 9.0.6. Peter, how do you feel about upgrading? :)

9.1's in beta; we're working on writing an upgrade system before calling it GA, but it works fine. Feel free.

My hunch is still that your issue is lock contention.

> No, not average. I want to be able to do 100-200 INSERTs per second (90% of
> those would go to one of two tables, the other 10% would go to any of a
> couple dozen tables). If 1% of my INSERTs take 100 ms, then the other 99%
> must take no more than 9 ms to complete.
>
> ...actually, it occurs to me that since I'm now committing batches of 1000,
> a 100ms latency per commit wouldn't be bad at all! I'll have to look into
> that.... (Either that or my batching isn't working like I thought it was.)

We have many customers who do much more than this throughput, though I'm not sure what level of resourcing you're current at. You might consider experimenting with a larger system if you're having performance problems.

Peter
Yup, there is: the parameter checkpoint_segments does not require a restart of the server, just a reload. :)

On Fri, Feb 24, 2012 at 12:54 AM, Alessandro Gagliardi <alessandro@path.com> wrote:
> Also, as per Andy's suggestion, I'd like to try doubling
> checkpoint_segments. However, it appears that that is one of those
> variables that I cannot change from pgAdmin. I don't suppose there's any
> way to change this without rebooting the database?

You might also consider increasing the wal_buffers value if it's still at the default (64KB).

BR
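Whether a given parameter needs only a reload or a full restart can be checked from pg_settings: the context column reads 'sighup' for reload-only settings and 'postmaster' for ones that require a restart. A quick sketch:

SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN ('checkpoint_segments', 'wal_buffers',
               'synchronous_commit', 'autovacuum_naptime');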
On Thu, Feb 23, 2012 at 3:28 PM, Alessandro Gagliardi <alessandro@path.com> wrote:
> I'm unable to make sense of pg_locks. The vast majority are
> locktype='transactionid', mode='ExclusiveLock', granted=t. There are some
> 'relation' locks with mode='RowExclusiveLock' and fewer with
> 'AccessShareLock'. I have no idea what I should be looking for here.

If you have lock contention, you'll see locks with granted='f', at least from time to time. Those are the ones you want to worry about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
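A minimal sketch for spotting those ungranted locks; any row this returns is a backend waiting on a lock:

SELECT locktype, relation::regclass AS relation, mode, pid
FROM pg_locks
WHERE NOT granted;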