Thread: should check interrupts in BuildRelationExtStatistics ?

should check interrupts in BuildRelationExtStatistics ?

From
Justin Pryzby
Date:
Restarting a large instance took twice as long as I expected due to not
checking interrupts in (at least) statext_ndistinct_build.  Long enough that I
attached (and was able to attach) a debugger to verify, which I think is too
long.  I think it could cause issues for an high-availability cluster or other
script if it takes too long to shut down.

The tables being auto-analyzed have 9 exteneded stats objects, each with stats
target=10.  7 of those are (ndistinct) stats on 4 simple columns plus 1
expression (5 total).  And the other 2 stats objects are expressional stats
(necessarily on a single expression).



Re: should check interrupts in BuildRelationExtStatistics ?

From
Michael Paquier
Date:
On Sun, May 08, 2022 at 07:01:08PM -0500, Justin Pryzby wrote:
> Restarting a large instance took twice as long as I expected due to not
> checking interrupts in (at least) statext_ndistinct_build.  Long enough that I
> attached (and was able to attach) a debugger to verify, which I think is too
> long.  I think it could cause issues for an high-availability cluster or other
> script if it takes too long to shut down.

Hmm.  That's annoying.

> The tables being auto-analyzed have 9 exteneded stats objects, each with stats
> target=10.  7 of those are (ndistinct) stats on 4 simple columns plus 1
> expression (5 total).  And the other 2 stats objects are expressional stats
> (necessarily on a single expression).

How long can the backend remain unresponsive?  I don't think that
anybody would object to the addition of some CHECK_FOR_INTERRUPTS() in
areas where it would be efficient to make the shutdown quicker, but
we need to think carefully about the places where we'd want to add
these.
--
Michael

Attachment

Re: should check interrupts in BuildRelationExtStatistics ?

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> How long can the backend remain unresponsive?  I don't think that
> anybody would object to the addition of some CHECK_FOR_INTERRUPTS() in
> areas where it would be efficient to make the shutdown quicker, but
> we need to think carefully about the places where we'd want to add
> these.

CHECK_FOR_INTERRUPTS is really quite cheap, just a test-and-branch.
I wouldn't put it in a *very* tight loop, but one test per row
processed while gathering stats is unlikely to be a problem.

            regards, tom lane



Re: should check interrupts in BuildRelationExtStatistics ?

From
Robert Haas
Date:
On Sun, May 8, 2022 at 11:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > How long can the backend remain unresponsive?  I don't think that
> > anybody would object to the addition of some CHECK_FOR_INTERRUPTS() in
> > areas where it would be efficient to make the shutdown quicker, but
> > we need to think carefully about the places where we'd want to add
> > these.
>
> CHECK_FOR_INTERRUPTS is really quite cheap, just a test-and-branch.
> I wouldn't put it in a *very* tight loop, but one test per row
> processed while gathering stats is unlikely to be a problem.

+1. If we're finding things stalling that would be fixed by adding
CHECK_FOR_INTERRUPTS(), we should generally just add it. In the
unlikely event that this causes a performance problem, we can try to
figure out some other solution, but not responding to interrupts isn't
the right way to economize.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: should check interrupts in BuildRelationExtStatistics ?

From
Justin Pryzby
Date:
On Mon, May 09, 2022 at 09:11:37AM -0400, Robert Haas wrote:
> On Sun, May 8, 2022 at 11:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Michael Paquier <michael@paquier.xyz> writes:
> > > How long can the backend remain unresponsive?  I don't think that
> > > anybody would object to the addition of some CHECK_FOR_INTERRUPTS() in
> > > areas where it would be efficient to make the shutdown quicker, but
> > > we need to think carefully about the places where we'd want to add
> > > these.
> >
> > CHECK_FOR_INTERRUPTS is really quite cheap, just a test-and-branch.
> > I wouldn't put it in a *very* tight loop, but one test per row
> > processed while gathering stats is unlikely to be a problem.
> 
> +1. If we're finding things stalling that would be fixed by adding
> CHECK_FOR_INTERRUPTS(), we should generally just add it. In the
> unlikely event that this causes a performance problem, we can try to
> figure out some other solution, but not responding to interrupts isn't
> the right way to economize.

Reproduce the problem for ndistinct and dependencies like:

DROP TABLE t; CREATE TABLE t AS SELECT i A,1+i B,2+i C,3+i D,4+i E,5+i F, now() AS ts FROM generate_series(1.0,
99999.0)i;VACUUM t;
 
DROP STATISTICS stxx; CREATE STATISTICS stxx (ndistinct) ON mod(a,14),mod(b,15),mod(c,16),mod(d,17),mod(e,18),mod(f,19)
FROMt;
 
ANALYZE VERBOSE t;

Maybe this should actually call vacuum_delay_point(), like
compute_scalar_stats().  For MCV, there seems to be no issue, since those
functions are being called (but only for expressional stats).  But maybe I've
just failed to make a large enough, non-expressional MCV list for the problem
to be apparent.

The patch is WIP, but whatever we end up with should probably be backpatched at
least to v14, where expressional indexes were introduced, since they're likely
to have more columns, and are slower to compute.

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index e8f71567b4e..e5538dcd4e1 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -19,6 +19,7 @@
 #include "catalog/pg_statistic_ext.h"
 #include "catalog/pg_statistic_ext_data.h"
 #include "lib/stringinfo.h"
+#include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/nodes.h"
 #include "nodes/pathnodes.h"
@@ -383,6 +384,8 @@ statext_dependencies_build(StatsBuildData *data)
             MVDependency *d;
             MemoryContext oldcxt;
 
+            CHECK_FOR_INTERRUPTS();
+
             /* release memory used by dependency degree calculation */
             oldcxt = MemoryContextSwitchTo(cxt);
 
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index dd67b19b6fa..9db1d0325cd 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -269,6 +269,8 @@ statext_mcv_build(StatsBuildData *data, double totalrows, int stattarget)
         SortItem  **freqs;
         int           *nfreqs;
 
+        // CHECK_FOR_INTERRUPTS();
+
         /* used to search values */
         tmp = (MultiSortSupport) palloc(offsetof(MultiSortSupportData, ssup)
                                         + sizeof(SortSupportData));
diff --git a/src/backend/statistics/mvdistinct.c b/src/backend/statistics/mvdistinct.c
index 6ade5eff78c..3b739ab7ca0 100644
--- a/src/backend/statistics/mvdistinct.c
+++ b/src/backend/statistics/mvdistinct.c
@@ -29,6 +29,7 @@
 #include "catalog/pg_statistic_ext.h"
 #include "catalog/pg_statistic_ext_data.h"
 #include "lib/stringinfo.h"
+#include "miscadmin.h"
 #include "statistics/extended_stats_internal.h"
 #include "statistics/statistics.h"
 #include "utils/fmgrprotos.h"
@@ -114,6 +115,8 @@ statext_ndistinct_build(double totalrows, StatsBuildData *data)
             MVNDistinctItem *item = &result->items[itemcnt];
             int            j;
 
+            CHECK_FOR_INTERRUPTS();
+
             item->attributes = palloc(sizeof(AttrNumber) * k);
             item->nattributes = k;
 



Re: should check interrupts in BuildRelationExtStatistics ?

From
Justin Pryzby
Date:
On Fri, Jun 03, 2022 at 10:28:37AM -0500, Justin Pryzby wrote:
> Maybe this should actually call vacuum_delay_point(), like
> compute_scalar_stats().

I think vacuum_delay_point() would be wrong for these cases, since they don't
call "fetchfunc()", like the other places which use vacuum_delay_point.

> For MCV, there seems to be no issue, since those
> functions are being called (but only for expressional stats).  But maybe I've
> just failed to make a large enough, non-expressional MCV list for the problem
> to be apparent.

I reproduced the issue with MCV like this:

DROP TABLE IF EXISTS t; CREATE TABLE t AS SELECT a::text,b::text,c::text,d::text,e::text,f::text,g::text FROM
generate_series(1000001,1000006)a,generate_series(1000001,1000006)b,generate_series(1000001,1000006)c,generate_series(1000001,1000006)d,generate_series(1000001,1000006)e,generate_series(1000001,1000006)f,generate_series(1000001,1000006)g,generate_series(1000001,1000006)h;
VACUUMt; 
 
DROP STATISTICS IF EXISTS stxx; CREATE STATISTICS stxx (mcv) ON a,b,c,d,e,f FROM t; ALTER STATISTICS stxx SET
STATISTICS9999; ANALYZE VERBOSE t;
 

This is slow (25 seconds) inside qsort:

(gdb) bt
#0  __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:1020
#1  0x00005653d8686fac in varstrfastcmp_locale (a1p=0x5653dce67d54 "1000004~", len1=7, a2p=0x5653e895ffa4 "1000004~",
len2=7,ssup=ssup@entry=0x5653d98c37b8) at varlena.c:2444
 
#2  0x00005653d8687161 in varlenafastcmp_locale (x=94918188367184, y=94918384418720, ssup=0x5653d98c37b8) at
varlena.c:2270
#3  0x00005653d85134d8 in ApplySortComparator (ssup=0x5653d98c37b8, isNull2=<optimized out>, datum2=<optimized out>,
isNull1=<optimizedout>, datum1=<optimized out>) at ../../../src/include/utils/sortsupport.h:224
 
#4  multi_sort_compare (a=0x7fa587b44e58, b=0x7fa5875f0dd0, arg=0x5653d98c37b0) at extended_stats.c:903
#5  0x00005653d8712eed in qsort_arg (data=data@entry=0x7fa5875f0050, n=<optimized out>, n@entry=1679616,
element_size=element_size@entry=24,compare=compare@entry=0x5653d8513483 <multi_sort_compare>, 
 
    arg=arg@entry=0x5653d98c37b0) at ../../src/include/lib/sort_template.h:349
#6  0x00005653d851415f in build_sorted_items (data=data@entry=0x7fa58f2e1050, nitems=nitems@entry=0x7ffe4f764e5c,
mss=mss@entry=0x5653d98c37b0,numattrs=6, attnums=0x7fa58f2e1078) at extended_stats.c:1134
 
#7  0x00005653d8515d84 in statext_mcv_build (data=data@entry=0x7fa58f2e1050, totalrows=totalrows@entry=1679616,
stattarget=stattarget@entry=9999)at mcv.c:204
 
#8  0x00005653d8513819 in BuildRelationExtStatistics (onerel=onerel@entry=0x7fa5b26ef658, inh=inh@entry=false,
totalrows=1679616,numrows=numrows@entry=1679616, rows=rows@entry=0x7fa5a4103050, natts=natts@entry=7, 
 
    vacattrstats=vacattrstats@entry=0x5653d98b76b0) at extended_stats.c:213

The fix seems to be to CHECK_FOR_INTERRUPTS() within multi_sort_compare().
That would supercede the other two CHECK_FOR_INTERRUPTS I'd proposed, and
handle mcv, depends, and ndistinct all at once.

Does that sound right ?

For MCV, there's also ~0.6sec spent in build_column_frequencies(), which (if
needed) would be addressed by adding CFI in sort_item_compare.



Re: should check interrupts in BuildRelationExtStatistics ?

From
Michael Paquier
Date:
On Sat, Jun 04, 2022 at 08:42:33PM -0500, Justin Pryzby wrote:
> The fix seems to be to CHECK_FOR_INTERRUPTS() within multi_sort_compare().
> That would supercede the other two CHECK_FOR_INTERRUPTS I'd proposed, and
> handle mcv, depends, and ndistinct all at once.

Hmm.  I have to admit that adding a CFI() in multi_sort_compare()
stresses me a bit as it is dependent on the number of rows involved,
and it can be used as a qsort routine.
--
Michael

Attachment

Re: should check interrupts in BuildRelationExtStatistics ?

From
Justin Pryzby
Date:
On Mon, Jun 06, 2022 at 04:23:34PM +0900, Michael Paquier wrote:
> On Sat, Jun 04, 2022 at 08:42:33PM -0500, Justin Pryzby wrote:
> > The fix seems to be to CHECK_FOR_INTERRUPTS() within multi_sort_compare().
> > That would supercede the other two CHECK_FOR_INTERRUPTS I'd proposed, and
> > handle mcv, depends, and ndistinct all at once.
> 
> Hmm.  I have to admit that adding a CFI() in multi_sort_compare()
> stresses me a bit as it is dependent on the number of rows involved,
> and it can be used as a qsort routine.

That's exactly the problem for which I showed a backtrace - it took 10s of
seconds to do qsort, which is (uh) a human timescale and too long to be
unresponsive, even if I create on a table with many rows a stats object with a
lot of columns and a high stats target.

Attachment

Re: should check interrupts in BuildRelationExtStatistics ?

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Mon, Jun 06, 2022 at 04:23:34PM +0900, Michael Paquier wrote:
>> Hmm.  I have to admit that adding a CFI() in multi_sort_compare()
>> stresses me a bit as it is dependent on the number of rows involved,
>> and it can be used as a qsort routine.

> That's exactly the problem for which I showed a backtrace - it took 10s of
> seconds to do qsort, which is (uh) a human timescale and too long to be
> unresponsive, even if I create on a table with many rows a stats object with a
> lot of columns and a high stats target.

Hmm.  On my machine, the example last shown upthread takes about 9
seconds, which I agree is a mighty long time to be unresponsive
--- but it appears that fully half of that elapses before we
reach multi_sort_compare for the first time.  The first half of
the ANALYZE run does seem to contain some CFI calls, but they
are not exactly thick on the ground there either.  So I'm feeling
like this patch isn't ambitious enough.

I tried interrupting at a random point and then stepping, and
look what I hit after just a couple of steps:

(gdb) s
qsort_arg (data=data@entry=0x13161410, n=<optimized out>, n@entry=1679616,
    element_size=element_size@entry=16,
    compare=compare@entry=0x649450 <compare_scalars>,
    arg=arg@entry=0x7ffec539c0f0) at ../../src/include/lib/sort_template.h:353
353                             if (r == 0)
(gdb)
358                             pc -= ST_POINTER_STEP;
(gdb)
359                             DO_CHECK_FOR_INTERRUPTS();

That, um, piqued my interest.  After a bit of digging,
I modestly propose the attached.  I'm not sure if it's
okay to back-patch this, because maybe someone out there
is relying on qsort() to be incapable of throwing an error
--- but it'd solve the problem at hand and a bunch of other
issues of the same ilk.

            regards, tom lane

diff --git a/src/port/qsort.c b/src/port/qsort.c
index 7879e6cd56..34b99aac08 100644
--- a/src/port/qsort.c
+++ b/src/port/qsort.c
@@ -2,7 +2,12 @@
  *    qsort.c: standard quicksort algorithm
  */
 
-#include "c.h"
+#ifndef FRONTEND
+#include "postgres.h"
+#include "miscadmin.h"
+#else
+#include "postgres_fe.h"
+#endif
 
 #define ST_SORT pg_qsort
 #define ST_ELEMENT_TYPE_VOID
@@ -10,6 +15,11 @@
 #define ST_SCOPE
 #define ST_DECLARE
 #define ST_DEFINE
+
+#ifndef FRONTEND
+#define ST_CHECK_FOR_INTERRUPTS
+#endif
+
 #include "lib/sort_template.h"
 
 /*
diff --git a/src/port/qsort_arg.c b/src/port/qsort_arg.c
index fa7e11a3b8..685ab32913 100644
--- a/src/port/qsort_arg.c
+++ b/src/port/qsort_arg.c
@@ -2,7 +2,12 @@
  *    qsort_arg.c: qsort with a passthrough "void *" argument
  */
 
-#include "c.h"
+#ifndef FRONTEND
+#include "postgres.h"
+#include "miscadmin.h"
+#else
+#include "postgres_fe.h"
+#endif
 
 #define ST_SORT qsort_arg
 #define ST_ELEMENT_TYPE_VOID
@@ -11,4 +16,9 @@
 #define ST_COMPARE_ARG_TYPE void
 #define ST_SCOPE
 #define ST_DEFINE
+
+#ifndef FRONTEND
+#define ST_CHECK_FOR_INTERRUPTS
+#endif
+
 #include "lib/sort_template.h"

Re: should check interrupts in BuildRelationExtStatistics ?

From
Justin Pryzby
Date:
On Fri, Jul 01, 2022 at 07:19:11PM -0400, Tom Lane wrote:
> I tried interrupting at a random point and then stepping, and
> look what I hit after just a couple of steps:

I'd come up with the trick of setting
  SET backtrace_functions='ProcessInterrupts';

> That, um, piqued my interest.  After a bit of digging,
> I modestly propose the attached.  I'm not sure if it's
> okay to back-patch this, because maybe someone out there
> is relying on qsort() to be incapable of throwing an error
> --- but it'd solve the problem at hand and a bunch of other
> issues of the same ilk.

Confirmed this fixes the 3 types of extended stats, at least for the example
cases I made.

If it's okay to backpatch, v14 seems adequate since the problem is more
prominent with expressional statistics (I'm sure that's why I hit it) ...
otherwise v15 would do.

-- 
Justin



Re: should check interrupts in BuildRelationExtStatistics ?

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Fri, Jul 01, 2022 at 07:19:11PM -0400, Tom Lane wrote:
>> That, um, piqued my interest.  After a bit of digging,
>> I modestly propose the attached.  I'm not sure if it's
>> okay to back-patch this, because maybe someone out there
>> is relying on qsort() to be incapable of throwing an error
>> --- but it'd solve the problem at hand and a bunch of other
>> issues of the same ilk.

> Confirmed this fixes the 3 types of extended stats, at least for the example
> cases I made.

> If it's okay to backpatch, v14 seems adequate since the problem is more
> prominent with expressional statistics (I'm sure that's why I hit it) ...
> otherwise v15 would do.

After thinking for awhile, my inclination is to apply my patch in
HEAD and yours in v15/v14.  We have a year to find out if there's
any problem with the more invasive check if we do it in HEAD;
but there's a lot less margin for error in v14, and not that
much in v15 either.

            regards, tom lane



Re: should check interrupts in BuildRelationExtStatistics ?

From
Andres Freund
Date:
Hi,

On 2022-07-01 19:19:11 -0400, Tom Lane wrote:
> That, um, piqued my interest.  After a bit of digging,
> I modestly propose the attached.  I'm not sure if it's
> okay to back-patch this, because maybe someone out there
> is relying on qsort() to be incapable of throwing an error
> --- but it'd solve the problem at hand and a bunch of other
> issues of the same ilk.

I'm worried about this. Interrupting random qsorts all over seems like it
could end up corrupting state. We do things like qsort in building snapshots
etc. Are we confident that we handle interrupts reliably in all those places?

Greetings,

Andres Freund



Re: should check interrupts in BuildRelationExtStatistics ?

From
Tom Lane
Date:
I wrote:
>>> I modestly propose the attached.  I'm not sure if it's
>>> okay to back-patch this, because maybe someone out there
>>> is relying on qsort() to be incapable of throwing an error

I thought I'd better try to check that, and I soon found several
places that *are* relying on qsort() to not be any smarter than the
libc version.  Notably, guc.c qsorts its persistent-state GUC array
during add_guc_variable --- an interrupt there would leave the GUC
data structure in an inconsistent state.  There are lesser hazards,
typically memory leaks, elsewhere.  So I fear this can't fly as-is.

Nonetheless, it'd be a good idea to use an interruptible sort in
as many places as we can.  If we don't mind YA copy of the qsort
code, I'd be inclined to propose inventing qsort_interruptible
which is like qsort_arg but also does CHECK_FOR_INTERRUPTS.
(There seems no reason to support a non-arg version, since you
can just pass NULL.)  Or we could redefine qsort_arg as allowing
interrupts, but that might still carry some surprises for existing
code.

            regards, tom lane



Re: should check interrupts in BuildRelationExtStatistics ?

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I'm worried about this. Interrupting random qsorts all over seems like it
> could end up corrupting state. We do things like qsort in building snapshots
> etc. Are we confident that we handle interrupts reliably in all those places?

Nope ... see my followup just now.

            regards, tom lane



Re: should check interrupts in BuildRelationExtStatistics ?

From
Michael Paquier
Date:
On Tue, Jul 05, 2022 at 07:37:03PM -0400, Tom Lane wrote:
> Nonetheless, it'd be a good idea to use an interruptible sort in
> as many places as we can.  If we don't mind YA copy of the qsort
> code, I'd be inclined to propose inventing qsort_interruptible
> which is like qsort_arg but also does CHECK_FOR_INTERRUPTS.
> (There seems no reason to support a non-arg version, since you
> can just pass NULL.)  Or we could redefine qsort_arg as allowing
> interrupts, but that might still carry some surprises for existing
> code.

Agreed to use a new and different API for this purpose.  It seems like
a tool where one could write some new code and use an interruptible
qsort() thinking that it is fine, still a preperly-documented API has
clear benefits.
--
Michael

Attachment

Re: should check interrupts in BuildRelationExtStatistics ?

From
Thomas Munro
Date:
On Wed, Jul 6, 2022 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> qsort_interruptible

+1

FWIW compute_scalar_stats() was already noted as hot and perhaps a
candidate for specialisation (with more study needed), but
qsort_interruptible() seems like a sane use of ~4KB of text to me.



Re: should check interrupts in BuildRelationExtStatistics ?

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Wed, Jul 6, 2022 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> qsort_interruptible

> +1

So here's a patch that does it that way.  I first meant to put the
new file into src/port/, but after remembering that that directory
has no backend-only functions, I went with src/backend/utils/sort/
instead.

For the moment I contented myself with changing qsort[_arg] calls
that occur during statistics collection.  We have a lot more of course,
but many of them can be expected to not be dealing with much data,
and in some cases we might want some closer analysis to be sure there's
no performance hit.  So I'm inclined to not worry too much about the
remaining qsort calls until somebody complains.

This could be back-patched to v14 without much worry, I should think.

            regards, tom lane

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 07613c38b5..a7966fff83 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -100,7 +100,7 @@ static VacAttrStats *examine_attribute(Relation onerel, int attnum,
 static int    acquire_sample_rows(Relation onerel, int elevel,
                                 HeapTuple *rows, int targrows,
                                 double *totalrows, double *totaldeadrows);
-static int    compare_rows(const void *a, const void *b);
+static int    compare_rows(const void *a, const void *b, void *arg);
 static int    acquire_inherited_sample_rows(Relation onerel, int elevel,
                                           HeapTuple *rows, int targrows,
                                           double *totalrows, double *totaldeadrows);
@@ -1306,7 +1306,8 @@ acquire_sample_rows(Relation onerel, int elevel,
      * tuples are already sorted.
      */
     if (numrows == targrows)
-        qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows);
+        qsort_interruptible((void *) rows, numrows, sizeof(HeapTuple),
+                            compare_rows, NULL);

     /*
      * Estimate total numbers of live and dead rows in relation, extrapolating
@@ -1342,10 +1343,10 @@ acquire_sample_rows(Relation onerel, int elevel,
 }

 /*
- * qsort comparator for sorting rows[] array
+ * Comparator for sorting rows[] array
  */
 static int
-compare_rows(const void *a, const void *b)
+compare_rows(const void *a, const void *b, void *arg)
 {
     HeapTuple    ha = *(const HeapTuple *) a;
     HeapTuple    hb = *(const HeapTuple *) b;
@@ -1836,7 +1837,7 @@ static void compute_scalar_stats(VacAttrStatsP stats,
                                  int samplerows,
                                  double totalrows);
 static int    compare_scalars(const void *a, const void *b, void *arg);
-static int    compare_mcvs(const void *a, const void *b);
+static int    compare_mcvs(const void *a, const void *b, void *arg);
 static int    analyze_mcv_list(int *mcv_counts,
                              int num_mcv,
                              double stadistinct,
@@ -2473,8 +2474,8 @@ compute_scalar_stats(VacAttrStatsP stats,
         /* Sort the collected values */
         cxt.ssup = &ssup;
         cxt.tupnoLink = tupnoLink;
-        qsort_arg((void *) values, values_cnt, sizeof(ScalarItem),
-                  compare_scalars, (void *) &cxt);
+        qsort_interruptible((void *) values, values_cnt, sizeof(ScalarItem),
+                            compare_scalars, (void *) &cxt);

         /*
          * Now scan the values in order, find the most common ones, and also
@@ -2718,8 +2719,8 @@ compute_scalar_stats(VacAttrStatsP stats,
                         deltafrac;

             /* Sort the MCV items into position order to speed next loop */
-            qsort((void *) track, num_mcv,
-                  sizeof(ScalarMCVItem), compare_mcvs);
+            qsort_interruptible((void *) track, num_mcv, sizeof(ScalarMCVItem),
+                                compare_mcvs, NULL);

             /*
              * Collapse out the MCV items from the values[] array.
@@ -2882,7 +2883,7 @@ compute_scalar_stats(VacAttrStatsP stats,
 }

 /*
- * qsort_arg comparator for sorting ScalarItems
+ * Comparator for sorting ScalarItems
  *
  * Aside from sorting the items, we update the tupnoLink[] array
  * whenever two ScalarItems are found to contain equal datums.  The array
@@ -2919,10 +2920,10 @@ compare_scalars(const void *a, const void *b, void *arg)
 }

 /*
- * qsort comparator for sorting ScalarMCVItems by position
+ * Comparator for sorting ScalarMCVItems by position
  */
 static int
-compare_mcvs(const void *a, const void *b)
+compare_mcvs(const void *a, const void *b, void *arg)
 {
     int            da = ((const ScalarMCVItem *) a)->first;
     int            db = ((const ScalarMCVItem *) b)->first;
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 7c02fb279f..d2aa8d0ca3 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1129,8 +1129,8 @@ build_sorted_items(StatsBuildData *data, int *nitems,
     }

     /* do the sort, using the multi-sort */
-    qsort_arg((void *) items, nrows, sizeof(SortItem),
-              multi_sort_compare, mss);
+    qsort_interruptible((void *) items, nrows, sizeof(SortItem),
+                        multi_sort_compare, mss);

     return items;
 }
diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 1ef3034428..f10642df4f 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -404,7 +404,7 @@ count_distinct_groups(int numrows, SortItem *items, MultiSortSupport mss)
  *        order.
  */
 static int
-compare_sort_item_count(const void *a, const void *b)
+compare_sort_item_count(const void *a, const void *b, void *arg)
 {
     SortItem   *ia = (SortItem *) a;
     SortItem   *ib = (SortItem *) b;
@@ -457,8 +457,8 @@ build_distinct_groups(int numrows, SortItem *items, MultiSortSupport mss,
     Assert(j + 1 == ngroups);

     /* Sort the distinct groups by frequency (in descending order). */
-    pg_qsort((void *) groups, ngroups, sizeof(SortItem),
-             compare_sort_item_count);
+    qsort_interruptible((void *) groups, ngroups, sizeof(SortItem),
+                        compare_sort_item_count, NULL);

     *ndistinct = ngroups;
     return groups;
@@ -528,8 +528,8 @@ build_column_frequencies(SortItem *groups, int ngroups,
         }

         /* sort the values, deduplicate */
-        qsort_arg((void *) result[dim], ngroups, sizeof(SortItem),
-                  sort_item_compare, ssup);
+        qsort_interruptible((void *) result[dim], ngroups, sizeof(SortItem),
+                            sort_item_compare, ssup);

         /*
          * Identify distinct values, compute frequency (there might be
@@ -696,8 +696,8 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)

         PrepareSortSupportFromOrderingOp(typentry->lt_opr, &ssup[dim]);

-        qsort_arg(values[dim], counts[dim], sizeof(Datum),
-                  compare_scalars_simple, &ssup[dim]);
+        qsort_interruptible(values[dim], counts[dim], sizeof(Datum),
+                            compare_scalars_simple, &ssup[dim]);

         /*
          * Walk through the array and eliminate duplicate values, but keep the
diff --git a/src/backend/statistics/mvdistinct.c b/src/backend/statistics/mvdistinct.c
index 6ade5eff78..9b216af517 100644
--- a/src/backend/statistics/mvdistinct.c
+++ b/src/backend/statistics/mvdistinct.c
@@ -489,8 +489,8 @@ ndistinct_for_combination(double totalrows, StatsBuildData *data,
     }

     /* We can sort the array now ... */
-    qsort_arg((void *) items, numrows, sizeof(SortItem),
-              multi_sort_compare, mss);
+    qsort_interruptible((void *) items, numrows, sizeof(SortItem),
+                        multi_sort_compare, mss);

     /* ... and count the number of distinct combinations */

diff --git a/src/backend/tsearch/ts_typanalyze.c b/src/backend/tsearch/ts_typanalyze.c
index 3e1a61b56d..e771a7cd62 100644
--- a/src/backend/tsearch/ts_typanalyze.c
+++ b/src/backend/tsearch/ts_typanalyze.c
@@ -44,8 +44,10 @@ static void prune_lexemes_hashtable(HTAB *lexemes_tab, int b_current);
 static uint32 lexeme_hash(const void *key, Size keysize);
 static int    lexeme_match(const void *key1, const void *key2, Size keysize);
 static int    lexeme_compare(const void *key1, const void *key2);
-static int    trackitem_compare_frequencies_desc(const void *e1, const void *e2);
-static int    trackitem_compare_lexemes(const void *e1, const void *e2);
+static int    trackitem_compare_frequencies_desc(const void *e1, const void *e2,
+                                               void *arg);
+static int    trackitem_compare_lexemes(const void *e1, const void *e2,
+                                      void *arg);


 /*
@@ -347,8 +349,8 @@ compute_tsvector_stats(VacAttrStats *stats,
          */
         if (num_mcelem < track_len)
         {
-            qsort(sort_table, track_len, sizeof(TrackItem *),
-                  trackitem_compare_frequencies_desc);
+            qsort_interruptible(sort_table, track_len, sizeof(TrackItem *),
+                                trackitem_compare_frequencies_desc, NULL);
             /* reset minfreq to the smallest frequency we're keeping */
             minfreq = sort_table[num_mcelem - 1]->frequency;
         }
@@ -376,8 +378,8 @@ compute_tsvector_stats(VacAttrStats *stats,
              * presorted we can employ binary search for that.  See
              * ts_selfuncs.c for a real usage scenario.
              */
-            qsort(sort_table, num_mcelem, sizeof(TrackItem *),
-                  trackitem_compare_lexemes);
+            qsort_interruptible(sort_table, num_mcelem, sizeof(TrackItem *),
+                                trackitem_compare_lexemes, NULL);

             /* Must copy the target values into anl_context */
             old_context = MemoryContextSwitchTo(stats->anl_context);
@@ -510,10 +512,10 @@ lexeme_compare(const void *key1, const void *key2)
 }

 /*
- *    qsort() comparator for sorting TrackItems on frequencies (descending sort)
+ *    Comparator for sorting TrackItems on frequencies (descending sort)
  */
 static int
-trackitem_compare_frequencies_desc(const void *e1, const void *e2)
+trackitem_compare_frequencies_desc(const void *e1, const void *e2, void *arg)
 {
     const TrackItem *const *t1 = (const TrackItem *const *) e1;
     const TrackItem *const *t2 = (const TrackItem *const *) e2;
@@ -522,10 +524,10 @@ trackitem_compare_frequencies_desc(const void *e1, const void *e2)
 }

 /*
- *    qsort() comparator for sorting TrackItems on lexemes
+ *    Comparator for sorting TrackItems on lexemes
  */
 static int
-trackitem_compare_lexemes(const void *e1, const void *e2)
+trackitem_compare_lexemes(const void *e1, const void *e2, void *arg)
 {
     const TrackItem *const *t1 = (const TrackItem *const *) e1;
     const TrackItem *const *t2 = (const TrackItem *const *) e2;
diff --git a/src/backend/utils/adt/array_typanalyze.c b/src/backend/utils/adt/array_typanalyze.c
index ab1d69b2f4..1964cedd93 100644
--- a/src/backend/utils/adt/array_typanalyze.c
+++ b/src/backend/utils/adt/array_typanalyze.c
@@ -86,9 +86,9 @@ static void prune_element_hashtable(HTAB *elements_tab, int b_current);
 static uint32 element_hash(const void *key, Size keysize);
 static int    element_match(const void *key1, const void *key2, Size keysize);
 static int    element_compare(const void *key1, const void *key2);
-static int    trackitem_compare_frequencies_desc(const void *e1, const void *e2);
-static int    trackitem_compare_element(const void *e1, const void *e2);
-static int    countitem_compare_count(const void *e1, const void *e2);
+static int    trackitem_compare_frequencies_desc(const void *e1, const void *e2, void *arg);
+static int    trackitem_compare_element(const void *e1, const void *e2, void *arg);
+static int    countitem_compare_count(const void *e1, const void *e2, void *arg);


 /*
@@ -502,8 +502,8 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
          */
         if (num_mcelem < track_len)
         {
-            qsort(sort_table, track_len, sizeof(TrackItem *),
-                  trackitem_compare_frequencies_desc);
+            qsort_interruptible(sort_table, track_len, sizeof(TrackItem *),
+                                trackitem_compare_frequencies_desc, NULL);
             /* reset minfreq to the smallest frequency we're keeping */
             minfreq = sort_table[num_mcelem - 1]->frequency;
         }
@@ -522,8 +522,8 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
              * the element type's default comparison function.  This permits
              * fast binary searches in selectivity estimation functions.
              */
-            qsort(sort_table, num_mcelem, sizeof(TrackItem *),
-                  trackitem_compare_element);
+            qsort_interruptible(sort_table, num_mcelem, sizeof(TrackItem *),
+                                trackitem_compare_element, NULL);

             /* Must copy the target values into anl_context */
             old_context = MemoryContextSwitchTo(stats->anl_context);
@@ -599,8 +599,9 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
             {
                 sorted_count_items[j++] = count_item;
             }
-            qsort(sorted_count_items, count_items_count,
-                  sizeof(DECountItem *), countitem_compare_count);
+            qsort_interruptible(sorted_count_items, count_items_count,
+                                sizeof(DECountItem *),
+                                countitem_compare_count, NULL);

             /*
              * Prepare to fill stanumbers with the histogram, followed by the
@@ -751,10 +752,10 @@ element_compare(const void *key1, const void *key2)
 }

 /*
- * qsort() comparator for sorting TrackItems by frequencies (descending sort)
+ * Comparator for sorting TrackItems by frequencies (descending sort)
  */
 static int
-trackitem_compare_frequencies_desc(const void *e1, const void *e2)
+trackitem_compare_frequencies_desc(const void *e1, const void *e2, void *arg)
 {
     const TrackItem *const *t1 = (const TrackItem *const *) e1;
     const TrackItem *const *t2 = (const TrackItem *const *) e2;
@@ -763,10 +764,10 @@ trackitem_compare_frequencies_desc(const void *e1, const void *e2)
 }

 /*
- * qsort() comparator for sorting TrackItems by element values
+ * Comparator for sorting TrackItems by element values
  */
 static int
-trackitem_compare_element(const void *e1, const void *e2)
+trackitem_compare_element(const void *e1, const void *e2, void *arg)
 {
     const TrackItem *const *t1 = (const TrackItem *const *) e1;
     const TrackItem *const *t2 = (const TrackItem *const *) e2;
@@ -775,10 +776,10 @@ trackitem_compare_element(const void *e1, const void *e2)
 }

 /*
- * qsort() comparator for sorting DECountItems by count
+ * Comparator for sorting DECountItems by count
  */
 static int
-countitem_compare_count(const void *e1, const void *e2)
+countitem_compare_count(const void *e1, const void *e2, void *arg)
 {
     const DECountItem *const *t1 = (const DECountItem *const *) e1;
     const DECountItem *const *t2 = (const DECountItem *const *) e2;
diff --git a/src/backend/utils/adt/rangetypes_typanalyze.c b/src/backend/utils/adt/rangetypes_typanalyze.c
index 2b77e03449..2043d3f98b 100644
--- a/src/backend/utils/adt/rangetypes_typanalyze.c
+++ b/src/backend/utils/adt/rangetypes_typanalyze.c
@@ -32,7 +32,7 @@
 #include "utils/rangetypes.h"
 #include "utils/multirangetypes.h"

-static int    float8_qsort_cmp(const void *a1, const void *a2);
+static int    float8_qsort_cmp(const void *a1, const void *a2, void *arg);
 static int    range_bound_qsort_cmp(const void *a1, const void *a2, void *arg);
 static void compute_range_stats(VacAttrStats *stats,
                                 AnalyzeAttrFetchFunc fetchfunc, int samplerows,
@@ -93,7 +93,7 @@ multirange_typanalyze(PG_FUNCTION_ARGS)
  * Comparison function for sorting float8s, used for range lengths.
  */
 static int
-float8_qsort_cmp(const void *a1, const void *a2)
+float8_qsort_cmp(const void *a1, const void *a2, void *arg)
 {
     const float8 *f1 = (const float8 *) a1;
     const float8 *f2 = (const float8 *) a2;
@@ -280,10 +280,10 @@ compute_range_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
         if (non_empty_cnt >= 2)
         {
             /* Sort bound values */
-            qsort_arg(lowers, non_empty_cnt, sizeof(RangeBound),
-                      range_bound_qsort_cmp, typcache);
-            qsort_arg(uppers, non_empty_cnt, sizeof(RangeBound),
-                      range_bound_qsort_cmp, typcache);
+            qsort_interruptible(lowers, non_empty_cnt, sizeof(RangeBound),
+                                range_bound_qsort_cmp, typcache);
+            qsort_interruptible(uppers, non_empty_cnt, sizeof(RangeBound),
+                                range_bound_qsort_cmp, typcache);

             num_hist = non_empty_cnt;
             if (num_hist > num_bins)
@@ -345,7 +345,8 @@ compute_range_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
              * Ascending sort of range lengths for further filling of
              * histogram
              */
-            qsort(lengths, non_empty_cnt, sizeof(float8), float8_qsort_cmp);
+            qsort_interruptible(lengths, non_empty_cnt, sizeof(float8),
+                                float8_qsort_cmp, NULL);

             num_hist = non_empty_cnt;
             if (num_hist > num_bins)
diff --git a/src/backend/utils/sort/Makefile b/src/backend/utils/sort/Makefile
index 26f65fcaf7..2c31fd453d 100644
--- a/src/backend/utils/sort/Makefile
+++ b/src/backend/utils/sort/Makefile
@@ -16,6 +16,7 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)

 OBJS = \
     logtape.o \
+    qsort_interruptible.o \
     sharedtuplestore.o \
     sortsupport.o \
     tuplesort.o \
diff --git a/src/backend/utils/sort/qsort_interruptible.c b/src/backend/utils/sort/qsort_interruptible.c
new file mode 100644
index 0000000000..f179b25624
--- /dev/null
+++ b/src/backend/utils/sort/qsort_interruptible.c
@@ -0,0 +1,16 @@
+/*
+ *    qsort_interruptible.c: qsort_arg that includes CHECK_FOR_INTERRUPTS
+ */
+
+#include "postgres.h"
+#include "miscadmin.h"
+
+#define ST_SORT qsort_interruptible
+#define ST_ELEMENT_TYPE_VOID
+#define ST_COMPARATOR_TYPE_NAME qsort_arg_comparator
+#define ST_COMPARE_RUNTIME_POINTER
+#define ST_COMPARE_ARG_TYPE void
+#define ST_SCOPE
+#define ST_DEFINE
+#define ST_CHECK_FOR_INTERRUPTS
+#include "lib/sort_template.h"
diff --git a/src/include/port.h b/src/include/port.h
index 12c05b5d9f..e647f62b77 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -499,6 +499,9 @@ typedef int (*qsort_arg_comparator) (const void *a, const void *b, void *arg);
 extern void qsort_arg(void *base, size_t nel, size_t elsize,
                       qsort_arg_comparator cmp, void *arg);

+extern void qsort_interruptible(void *base, size_t nel, size_t elsize,
+                                qsort_arg_comparator cmp, void *arg);
+
 extern void *bsearch_arg(const void *key, const void *base,
                          size_t nmemb, size_t size,
                          int (*compar) (const void *, const void *, void *),

Re: should check interrupts in BuildRelationExtStatistics ?

From
Tom Lane
Date:
I wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> On Wed, Jul 6, 2022 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> qsort_interruptible

>> +1

> So here's a patch that does it that way.

Hearing no comments, pushed.

            regards, tom lane



Re: should check interrupts in BuildRelationExtStatistics ?

From
Zhihong Yu
Date:


On Tue, Jul 12, 2022 at 1:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> On Wed, Jul 6, 2022 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> qsort_interruptible

>> +1

> So here's a patch that does it that way.

Hearing no comments, pushed.

                        regards, tom lane


Hi,
Looking at the files under src/backend/utils/sort/, looks like license header is missing from qsort_interruptible.c

Please consider the patch which adds license header to qsort_interruptible.c

Cheers
Attachment

Re: should check interrupts in BuildRelationExtStatistics ?

From
Tom Lane
Date:
Zhihong Yu <zyu@yugabyte.com> writes:
> Looking at the files under src/backend/utils/sort/, looks like license
> header is missing from qsort_interruptible.c

[ shrug ... ] qsort.c and qsort_arg.c don't have that either.

            regards, tom lane