Thread: understanding the interaction with delete/select/vacuum
Hello all,

say for example I have a larger table T with 26 million rows, one
million associated with each letter of the alphabet.

I have a long running process which does a 'SELECT ID FROM T'. The
results are being streamed to the client using a fetch size limit. This
process will take 26 hours to run. It turns out that all the "C" and
"P" rows are going to be deleted by the time the SELECT gets to them.

Several hours into this process, after the "C" rows have been deleted in
a separate transaction but before we've gotten to the "P" rows, a
vacuum is begun on table T.

What happens?

Will the one million "C" rows be freed and made available for reuse, or
will their visibility to the initial SELECT statement cause the vacuum
to skip over them?

Thanks!

-- Alan
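[Editor's note: the streamed read described above corresponds to a server-side cursor, which is roughly what a client driver does under the hood when a fetch size is set. A minimal sketch in SQL, with illustrative table, column, and cursor names:]

    -- one long transaction: a single snapshot of T is held for the whole run
    BEGIN;
    DECLARE t_ids NO SCROLL CURSOR FOR SELECT id FROM t;
    -- the client's "fetch size limit" becomes repeated FETCH calls;
    -- repeat this until FETCH returns zero rows
    FETCH FORWARD 10000 FROM t_ids;
    COMMIT;

[Because the cursor lives inside one transaction, the snapshot, and hence the continued visibility of the deleted "C" rows, lasts until the COMMIT.]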
my understanding of how postgres works may be flawed, but the way I do
understand it is that each process gets its own 'copy' of the database
to work with... which is why, for example, you won't be able to run a
vacuum if you've got an ongoing connection to the database. I think.

On Mon, 29 Aug 2005 15:11:14 -0400, Alan Stange <stange@rentec.com> wrote:

> Hello all,
>
> say for example I have a larger table T with 26 million rows, one
> million associated with each letter of the alphabet.
>
> I have a long running process which does a 'SELECT ID FROM T'. The
> results are being streamed to the client using a fetch size limit. This
> process will take 26 hours to run. It turns out that all the "C" and
> "P" rows are going to be deleted by the time the SELECT gets to them.
>
> Several hours into this process, after the "C" rows have been deleted in
> a separate transaction but before we've gotten to the "P" rows, a
> vacuum is begun on table T.
>
> What happens?
>
> Will the one million "C" rows be freed and made available for reuse, or
> will their visibility to the initial SELECT statement cause the vacuum
> to skip over them?
>
> Thanks!
>
> -- Alan

-- 
Oren Mazor // Developer, Sysadmin, Explorer
GPG Key: http://www.grepthemonkey.org/secure
"Ut sementem feceris, ita metes"
Alan Stange <stange@rentec.com> writes:
> I have a long running process which does a 'SELECT ID FROM T'. The
> results are being streamed to the client using a fetch size limit. This
> process will take 26 hours to run. It turns out that all the "C" and
> "P" rows are going to be deleted by the time the SELECT gets to them.

> Several hours into this process, after the "C" rows have been deleted in
> a separate transaction but before we've gotten to the "P" rows, a
> vacuum is begun on table T.

> What happens?

VACUUM can't remove any rows that are still potentially visible to any
open transaction ... so those rows will stay. It's best to avoid having
single transactions that take 26 hours to run --- there are a lot of
other inefficiencies that will show up in such a situation. Can you
break the long-running process into shorter transactions?

			regards, tom lane
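[Editor's note: one way to act on that suggestion is keyset pagination, with each chunk in its own short transaction. A sketch, assuming id is an indexed key; names and the :last_seen_id placeholder are illustrative:]

    -- each chunk is its own transaction, so no snapshot is held open for hours
    BEGIN;
    SELECT id FROM t WHERE id > :last_seen_id ORDER BY id LIMIT 100000;
    COMMIT;
    -- record the largest id returned, repeat with it as the new :last_seen_id,
    -- and stop when the SELECT returns no rows

[The trade-off: rows deleted between chunks simply stop appearing in later chunks, so the read is no longer one consistent snapshot, but VACUUM is free to reclaim them as soon as each short transaction ends.]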
Tom Lane wrote:
> Alan Stange <stange@rentec.com> writes:
>> I have a long running process which does a 'SELECT ID FROM T'. The
>> results are being streamed to the client using a fetch size limit. This
>> process will take 26 hours to run. It turns out that all the "C" and
>> "P" rows are going to be deleted by the time the SELECT gets to them.
>>
>> Several hours into this process, after the "C" rows have been deleted in
>> a separate transaction but before we've gotten to the "P" rows, a
>> vacuum is begun on table T.
>>
>> What happens?
>
> VACUUM can't remove any rows that are still potentially visible to any
> open transaction ... so those rows will stay. It's best to avoid having
> single transactions that take 26 hours to run --- there are a lot of
> other inefficiencies that will show up in such a situation.

Thanks. Is there a variation of the isolation rules that would achieve
my desired goal: have the deleted rows be vacuumed even though the
SELECT still has them in its visibility? Or is this just the wrong
direction to go in?

> Can you break the long-running process into shorter transactions?

That's what I'm working on now. I've reworked the SQL command so that
the deletes involved don't take hours to run but instead happen in 10K
row chunks. Next I'm going to rework the select to work in O(100K) row
chunks.

Is there a reason why the open() calls for a vacuum don't use O_DIRECT,
thus possibly preventing the IO from flushing lots of data from memory?
I was going to hack something up for the WAL files for 8.1, but I found
that O_DIRECT is now used when using open_sync for the WAL files.

Finally, why O_RDWR for the WAL files and not O_WRONLY? I was under the
impression that the files are only written to by the usual postgresql
server processes.

Thanks much!

-- Alan
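[Editor's note: a sketch of the chunked delete described above; the "letter" column is hypothetical, and PostgreSQL has no DELETE ... LIMIT, hence the subselect:]

    -- delete 10K rows per statement, committing between chunks so a later
    -- VACUUM can reclaim each batch without waiting on one long transaction
    DELETE FROM t
     WHERE id IN (SELECT id FROM t WHERE letter = 'C' LIMIT 10000);
    -- repeat until the DELETE reports 0 rows affected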