Thread: sequential scans that pick up only deleted records do not honor query cancel or timeout

sequential scans that pick up only deleted records do not honor query cancel or timeout

From

Merlin Moncure

Date:

22 May 2012, 14:14:22

Basically, $subject says it all.  It's pretty easy to reproduce:
delete all the records from a large table and execute any sequentially
scanning query before autocvacuum comes around and cleans the table
up; the query will be uncancellable.  This can result in fairly
pathological behavior in i/o constrained systems because the query
will bog itself down writing out hint bits for minutes or hours
without any way to cancel or effective i/o throttling (unlike vacuum).

IMO, this should be backpatched, and is likely fixed by injecting an
interrupts check at a strategic location.  But where? I was thinking
in heapgetpage() but here are no checks elsehwere in heapam.c which is
a red flag.


merlin

Re: sequential scans that pick up only deleted records do not honor query cancel or timeout

From

Tom Lane

Date:

22 May 2012, 18:08:50

Merlin Moncure <mmoncure@gmail.com> writes:
> Basically, $subject says it all.  It's pretty easy to reproduce:
> delete all the records from a large table and execute any sequentially
> scanning query before autocvacuum comes around and cleans the table
> up; the query will be uncancellable.  This can result in fairly
> pathological behavior in i/o constrained systems because the query
> will bog itself down writing out hint bits for minutes or hours
> without any way to cancel or effective i/o throttling (unlike vacuum).

> IMO, this should be backpatched, and is likely fixed by injecting an
> interrupts check at a strategic location.  But where? I was thinking
> in heapgetpage() but here are no checks elsehwere in heapam.c which is
> a red flag.

heapgetpage() seems like the most reasonable place to me, as there we'll
only be making the check once per page not once per tuple.

            regards, tom lane

Re: sequential scans that pick up only deleted records do not honor query cancel or timeout

From

Merlin Moncure

Date:

22 May 2012, 19:39:19

On Tue, May 22, 2012 at 4:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> Basically, $subject says it all. =A0It's pretty easy to reproduce:
>> delete all the records from a large table and execute any sequentially
>> scanning query before autocvacuum comes around and cleans the table
>> up; the query will be uncancellable. =A0This can result in fairly
>> pathological behavior in i/o constrained systems because the query
>> will bog itself down writing out hint bits for minutes or hours
>> without any way to cancel or effective i/o throttling (unlike vacuum).
>
>> IMO, this should be backpatched, and is likely fixed by injecting an
>> interrupts check at a strategic location. =A0But where? I was thinking
>> in heapgetpage() but here are no checks elsehwere in heapam.c which is
>> a red flag.
>
> heapgetpage() seems like the most reasonable place to me, as there we'll
> only be making the check once per page not once per tuple.

ok. this fixes the issue:

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/hea=
pam.c
new file mode 100644
index 0d6fe3f..acef385
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*************** heapgetpage(HeapScanDesc scan, BlockNumb
*** 287,292 ****
--- 287,299 ----

    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

+   /*
+    * We have to check for signals here because a long series of
+    * pages containing nothing but deleted tuples can cause control
+    * to remain in the scan loop for an unbounded amount of time.
+    */
+   CHECK_FOR_INTERRUPTS();
+
    Assert(ntup <=3D MaxHeapTuplesPerPage);
    scan->rs_ntuples =3D ntup;
  }

merlin

Re: sequential scans that pick up only deleted records do not honor query cancel or timeout

From

Tom Lane

Date:

22 May 2012, 20:13:28

Merlin Moncure <mmoncure@gmail.com> writes:
> On Tue, May 22, 2012 at 4:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> heapgetpage() seems like the most reasonable place to me, as there we'll
>> only be making the check once per page not once per tuple.

> ok. this fixes the issue:

Well, actually it needs to be a bit earlier than that or it won't stop
non-pageatatime scans (cf the return at line 230 in HEAD).  I was
thinking to place it at the point where we hold no buffer pin, just to
save a couple cycles in error cleanup:


diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0d6fe3f0acd38a87af759bd3d5196da504202486..0c67156390a6a1bf1fee0bb39b96e0da8cd55dbb 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*************** heapgetpage(HeapScanDesc scan, BlockNumb
*** 222,227 ****
--- 222,234 ----
          scan->rs_cbuf = InvalidBuffer;
      }

+     /*
+      * Be sure to check for interrupts at least once per page.  Checks at
+      * higher code levels won't be able to stop a seqscan that encounters
+      * many pages' worth of consecutive dead tuples.
+      */
+     CHECK_FOR_INTERRUPTS();
+
      /* read page using selected strategy */
      scan->rs_cbuf = ReadBufferExtended(scan->rs_rd, MAIN_FORKNUM, page,
                                         RBM_NORMAL, scan->rs_strategy);


But thanks for verifying that a check in this function does fix the
issue for you.

            regards, tom lane