Thread: vacuum performance on insert
Hi,

I'm curious -- does "vacuum analyze table1" (for example) improve the performance of "insert into table1 ..."? I understand that VACUUM ANALYZE helps queries -- SELECT and so on -- but I'm not sure about INSERT.

Specifically, I'm doing the following:

1. delete records ...
2. insert records ...

If I add a "vacuum analyze" between these two steps, will it help the performance of the insert? More importantly, if so, why?

Thanks,
Sean
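For concreteness, the pattern in question looks roughly like this (the table name, columns, and row counts are invented purely for illustration):

    -- hypothetical table, just to show the sequence of steps
    CREATE TABLE table1 (id integer, payload text);

    INSERT INTO table1
    SELECT g, 'row ' || g::text FROM generate_series(1, 1000) AS g;

    -- step 1: delete records
    DELETE FROM table1 WHERE id <= 1000;

    -- the step being asked about
    VACUUM ANALYZE table1;

    -- step 2: insert records
    INSERT INTO table1
    SELECT g, 'row ' || g::text FROM generate_series(1001, 2000) AS g;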
Sean Chen <zyschen@gmail.com> wrote:

> 1. delete records ...
> 2. insert records ...
>
> If I add a "vacuum analyze" between these two steps, will it help
> the performance of the insert?

Assuming there are no long-running transactions which would still be able to see the deleted rows, a VACUUM between those statements would allow the INSERT to re-use the space previously occupied by the deleted rows, rather than possibly needing to allocate new space from the OS.

-Kevin
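One way to check the "no long-running transactions" assumption above is to look at pg_stat_activity before the VACUUM. A minimal sketch, using only columns that are present across recent versions (other columns vary by release):

    -- any session with an old xact_start may still be able to see the
    -- deleted rows, which keeps VACUUM from reclaiming their space
    SELECT datname, usename, xact_start, now() - xact_start AS xact_age
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start;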
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Sean Chen <zyschen@gmail.com> wrote: >> 1, delete records ... >> 2, insert records ... >> >> if I add "vacuum analyze" in-between this two steps, will it help >> on the performance on the insert? > Assuming there are no long-running transactions which would still be > able to see the deleted rows, a VACUUM between those statements > would allow the INSERT to re-use the space previously occupied by > the deleted rows, rather than possibly needing to allocate new space > from the OS. But on the other side of the coin, the ANALYZE step is probably not very helpful there. Better to do that after you've loaded the new data. regards, tom lane
Hi, thank you for the reply.

I ran a number of tests to try to make sense of this. Whether I ran with or without the vacuum, the disk I/O counters, cache counters, etc. gathered from the pg_stat views for the insertions were pretty much the same (a representative query against those views is sketched below the quoted message). So I don't see the vacuum reducing disk I/O operations.

Now, from what you mentioned below, do you know what the cost is of postgres requesting new disk space from the OS? I'm seeing a big performance difference with the vacuum, but I need proof that it's the request for new space that was the problem, not disk I/O, etc., since I would think the disk could be expensive as well.

Thanks,
Sean

On Thu, Aug 5, 2010 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Sean Chen <zyschen@gmail.com> wrote:
>>> 1. delete records ...
>>> 2. insert records ...
>>>
>>> If I add a "vacuum analyze" between these two steps, will it help
>>> the performance of the insert?
>
>> Assuming there are no long-running transactions which would still be
>> able to see the deleted rows, a VACUUM between those statements
>> would allow the INSERT to re-use the space previously occupied by
>> the deleted rows, rather than possibly needing to allocate new space
>> from the OS.
>
> But on the other side of the coin, the ANALYZE step is probably not very
> helpful there. Better to do that after you've loaded the new data.
>
> 			regards, tom lane
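A representative query for those counters: per-table block I/O lives in pg_statio_user_tables, so one can snapshot it before the INSERT, snapshot it again afterwards, and diff the two (hypothetical table1 again):

    -- run once before and once after the INSERT, then compare
    SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
    FROM pg_statio_user_tables
    WHERE relname = 'table1';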
Sean Chen <zyschen@gmail.com> wrote:

> Now, from what you mentioned below, do you know what the cost is of
> postgres requesting new disk space from the OS?

Depending on your OS and its version, your file system, your mount options, and your disk subsystem (and its firmware revision), there could be various effects -- the one most likely to be biting you is write barriers. When you allocate additional space from the OS, and it extends a file or creates a new file, there might be a write barrier to ensure that the file system catalog entries are persisted. This could cause all writes (and possibly even reads) to pause until the data is written to disk.

That's just a guess, of course. If you have a profiler you can run, you might be able to pin it down with that.

-Kevin
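A rough check that doesn't need a profiler: watch the relation's on-disk size across the load. If a prior VACUUM lets the INSERT reuse freed pages, the size should stay roughly flat; if the table file is being extended, it will grow (hypothetical table1 once more):

    SELECT pg_relation_size('table1') AS bytes_before;

    -- run the DELETE / VACUUM / INSERT sequence here

    SELECT pg_relation_size('table1') AS bytes_after;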