Thread: understanding the interaction with delete/select/vacuum

From: Alan Stange
Hello all,

Say, for example, I have a large table T with 26 million rows, one
million associated with each letter of the alphabet.

I have a long-running process which does a 'SELECT ID FROM T'.  The
results are being streamed to the client using a fetch size limit.  This
process will take 26 hours to run.  It turns out that all the "C" and
"P" rows will have been deleted by the time the SELECT gets to them.

Several hours into this process, after the "C" rows have been deleted in
a separate transaction but we haven't yet gotten to the "P" rows, a
vacuum is begun on table T.


What happens?

Will the 1 million "C" rows be freed and made available for reuse, or
will their visibility to the initial SELECT statement cause the vacuum
to skip over them?

Thanks!

-- Alan

Re: understanding the interaction with delete/select/vacuum

From: "Oren Mazor"
My understanding of how postgres works may be flawed, but the way I
understand it is that each process gets its own 'copy' of the database to
work with...

which is why, for example, you won't be able to run a vacuum if you've got
an ongoing connection to the database.

I think.

On Mon, 29 Aug 2005 15:11:14 -0400, Alan Stange <stange@rentec.com> wrote:

> Hello all,
>
> say, for example, I have a large table T with 26 million rows, one
> million associated with each letter of the alphabet.
>
> I have a long-running process which does a 'SELECT ID FROM T'.  The
> results are being streamed to the client using a fetch size limit.  This
> process will take 26 hours to run.  It turns out that all the "C" and
> "P" rows will have been deleted by the time the SELECT gets to them.
>
> Several hours into this process, after the "C" rows have been deleted in
> a separate transaction but we haven't yet gotten to the "P" rows, a
> vacuum is begun on table T.
>
>
> What happens?
>
> Will the 1 million "C" rows be freed and made available for reuse or
> will their visibility with the initial SELECT statement cause the vacuum
> to skip over them?
>
> Thanks!
>
> -- Alan
>



--
Oren Mazor // Developer, Sysadmin, Explorer
GPG Key: http://www.grepthemonkey.org/secure
"Ut sementem feceris, ita metes"

Re: understanding the interaction with delete/select/vacuum

From: Tom Lane
Alan Stange <stange@rentec.com> writes:
> I have a long-running process which does a 'SELECT ID FROM T'.  The
> results are being streamed to the client using a fetch size limit.  This
> process will take 26 hours to run.  It turns out that all the "C" and
> "P" rows will have been deleted by the time the SELECT gets to them.

> Several hours into this process, after the "C" rows have been deleted in
> a separate transaction but we haven't yet gotten to the "P" rows, a
> vacuum is begun on table T.

> What happens?

VACUUM can't remove any rows that are still potentially visible to any
open transaction ... so those rows will stay.  It's best to avoid having
single transactions that take 26 hours to run --- there are a lot of
other inefficiencies that will show up in such a situation.  Can you
break the long-running process into shorter transactions?
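The rule described here can be sketched in a few lines. This is a deliberate simplification, not PostgreSQL's actual code, and the transaction IDs are invented for illustration:

```python
# Simplified sketch of VACUUM's removability rule (not PostgreSQL source):
# a deleted row version can be reclaimed only if the transaction that
# deleted it finished before every transaction that is still open.

def removable(tuple_xmax, oldest_running_xid):
    """True if no open transaction could still see this dead tuple."""
    return tuple_xmax is not None and tuple_xmax < oldest_running_xid

# The 26-hour SELECT (say XID 100) holds the "oldest running XID" down,
# so the "C" rows deleted later by XID 500 cannot be reclaimed:
print(removable(500, oldest_running_xid=100))  # False -- VACUUM skips them

# Once the SELECT's transaction ends, the horizon advances past 500:
print(removable(500, oldest_running_xid=501))  # True -- space is reclaimable
```

This is why shortening the long transaction, rather than tuning VACUUM, is the fix: the dead rows become reclaimable as soon as no open snapshot can still see them.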

            regards, tom lane

Re: understanding the interaction with delete/select/vacuum

From: Alan Stange
Tom Lane wrote:
> Alan Stange <stange@rentec.com> writes:
>
>> I have a long-running process which does a 'SELECT ID FROM T'.  The
>> results are being streamed to the client using a fetch size limit.  This
>> process will take 26 hours to run.  It turns out that all the "C" and
>> "P" rows will have been deleted by the time the SELECT gets to them.
>>
>> Several hours into this process, after the "C" rows have been deleted in
>> a separate transaction but we haven't yet gotten to the "P" rows, a
>> vacuum is begun on table T.
>>
>> What happens?
>
> VACUUM can't remove any rows that are still potentially visible to any
> open transaction ... so those rows will stay.  It's best to avoid having
> single transactions that take 26 hours to run --- there are a lot of
> other inefficiencies that will show up in such a situation.
Thanks.

Is there a variation of the isolation rules that would achieve my
desired goal: have the deleted rows be vacuumed even though the SELECT
can still see them?  Or is this just the wrong direction to go in?

> Can you
> break the long-running process into shorter transactions?
>
That's what I'm working on now.  I've reworked the SQL so that the
deletes involved don't take hours to run but instead happen in 10K-row
chunks.  Next I'll rework the SELECT to work in O(100K)-row chunks.
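The chunked-delete approach can be sketched like this. Again this uses the stdlib sqlite3 module (and sqlite's rowid) purely so the example is self-contained; on PostgreSQL each batch would run as its own short transaction, e.g. a DELETE driven by a keyed subquery, letting VACUUM reclaim space between batches:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, letter TEXT)")
# 25,000 "C" rows to delete, 1,000 "A" rows to keep.
conn.executemany("INSERT INTO t (letter) VALUES (?)",
                 [("C",)] * 25000 + [("A",)] * 1000)
conn.commit()

BATCH = 10_000
while True:
    # Delete one batch of "C" rows, then commit, so each batch is a
    # short transaction rather than one hours-long delete.
    cur = conn.execute(
        "DELETE FROM t WHERE rowid IN "
        "(SELECT rowid FROM t WHERE letter = 'C' LIMIT ?)", (BATCH,))
    conn.commit()
    if cur.rowcount == 0:
        break

remaining = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(remaining)  # 1000 -- only the "A" rows are left
```

Because no single transaction stays open for long, the dead-row horizon keeps advancing and vacuum can do its job behind each committed batch.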


Is there a reason why the open() calls for a vacuum don't use O_DIRECT,
thus possibly preventing the I/O from flushing lots of data from
memory?  I was going to hack something up for the WAL files for 8.1,
but I found that O_DIRECT is now used when using open_sync for the WAL
files.  Finally, why O_RDWR for the WAL files and not O_WRONLY?  I was
under the impression that the files are only written to by the usual
PostgreSQL server processes.
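On the O_RDWR vs. O_WRONLY point, the practical difference is only whether the same descriptor can also be read back: O_RDWR is needed if the process ever reads the file through the descriptor it writes with. This doesn't answer why PostgreSQL chose O_RDWR, but the distinction itself can be demonstrated with plain file flags, nothing PostgreSQL-specific:

```python
import os
import tempfile

fd0, path = tempfile.mkstemp()
os.close(fd0)

# A write-only descriptor: writes succeed, but any read back through
# the same descriptor fails with EBADF.
fd = os.open(path, os.O_WRONLY)
os.write(fd, b"wal record\n")
try:
    os.pread(fd, 4, 0)
    read_ok = True
except OSError:
    read_ok = False
os.close(fd)
os.unlink(path)
print(read_ok)  # False -- reading a write-only descriptor fails
```

So O_WRONLY would suffice only if the server never needed to read a WAL page back through the descriptor it writes with.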

Thanks much!

-- Alan