Thread: should check interrupts in BuildRelationExtStatistics ?
Restarting a large instance took twice as long as I expected due to not checking interrupts in (at least) statext_ndistinct_build. Long enough that I attached (and was able to attach) a debugger to verify, which I think is too long. I think it could cause issues for an high-availability cluster or other script if it takes too long to shut down. The tables being auto-analyzed have 9 exteneded stats objects, each with stats target=10. 7 of those are (ndistinct) stats on 4 simple columns plus 1 expression (5 total). And the other 2 stats objects are expressional stats (necessarily on a single expression).
On Sun, May 08, 2022 at 07:01:08PM -0500, Justin Pryzby wrote: > Restarting a large instance took twice as long as I expected due to not > checking interrupts in (at least) statext_ndistinct_build. Long enough that I > attached (and was able to attach) a debugger to verify, which I think is too > long. I think it could cause issues for an high-availability cluster or other > script if it takes too long to shut down. Hmm. That's annoying. > The tables being auto-analyzed have 9 exteneded stats objects, each with stats > target=10. 7 of those are (ndistinct) stats on 4 simple columns plus 1 > expression (5 total). And the other 2 stats objects are expressional stats > (necessarily on a single expression). How long can the backend remain unresponsive? I don't think that anybody would object to the addition of some CHECK_FOR_INTERRUPTS() in areas where it would be efficient to make the shutdown quicker, but we need to think carefully about the places where we'd want to add these. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > How long can the backend remain unresponsive? I don't think that > anybody would object to the addition of some CHECK_FOR_INTERRUPTS() in > areas where it would be efficient to make the shutdown quicker, but > we need to think carefully about the places where we'd want to add > these. CHECK_FOR_INTERRUPTS is really quite cheap, just a test-and-branch. I wouldn't put it in a *very* tight loop, but one test per row processed while gathering stats is unlikely to be a problem. regards, tom lane
On Sun, May 8, 2022 at 11:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael@paquier.xyz> writes: > > How long can the backend remain unresponsive? I don't think that > > anybody would object to the addition of some CHECK_FOR_INTERRUPTS() in > > areas where it would be efficient to make the shutdown quicker, but > > we need to think carefully about the places where we'd want to add > > these. > > CHECK_FOR_INTERRUPTS is really quite cheap, just a test-and-branch. > I wouldn't put it in a *very* tight loop, but one test per row > processed while gathering stats is unlikely to be a problem. +1. If we're finding things stalling that would be fixed by adding CHECK_FOR_INTERRUPTS(), we should generally just add it. In the unlikely event that this causes a performance problem, we can try to figure out some other solution, but not responding to interrupts isn't the right way to economize. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, May 09, 2022 at 09:11:37AM -0400, Robert Haas wrote: > On Sun, May 8, 2022 at 11:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Michael Paquier <michael@paquier.xyz> writes: > > > How long can the backend remain unresponsive? I don't think that > > > anybody would object to the addition of some CHECK_FOR_INTERRUPTS() in > > > areas where it would be efficient to make the shutdown quicker, but > > > we need to think carefully about the places where we'd want to add > > > these. > > > > CHECK_FOR_INTERRUPTS is really quite cheap, just a test-and-branch. > > I wouldn't put it in a *very* tight loop, but one test per row > > processed while gathering stats is unlikely to be a problem. > > +1. If we're finding things stalling that would be fixed by adding > CHECK_FOR_INTERRUPTS(), we should generally just add it. In the > unlikely event that this causes a performance problem, we can try to > figure out some other solution, but not responding to interrupts isn't > the right way to economize. Reproduce the problem for ndistinct and dependencies like: DROP TABLE t; CREATE TABLE t AS SELECT i A,1+i B,2+i C,3+i D,4+i E,5+i F, now() AS ts FROM generate_series(1.0, 99999.0)i;VACUUM t; DROP STATISTICS stxx; CREATE STATISTICS stxx (ndistinct) ON mod(a,14),mod(b,15),mod(c,16),mod(d,17),mod(e,18),mod(f,19) FROMt; ANALYZE VERBOSE t; Maybe this should actually call vacuum_delay_point(), like compute_scalar_stats(). For MCV, there seems to be no issue, since those functions are being called (but only for expressional stats). But maybe I've just failed to make a large enough, non-expressional MCV list for the problem to be apparent. The patch is WIP, but whatever we end up with should probably be backpatched at least to v14, where expressional indexes were introduced, since they're likely to have more columns, and are slower to compute. diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c index e8f71567b4e..e5538dcd4e1 100644 --- a/src/backend/statistics/dependencies.c +++ b/src/backend/statistics/dependencies.c @@ -19,6 +19,7 @@ #include "catalog/pg_statistic_ext.h" #include "catalog/pg_statistic_ext_data.h" #include "lib/stringinfo.h" +#include "miscadmin.h" #include "nodes/nodeFuncs.h" #include "nodes/nodes.h" #include "nodes/pathnodes.h" @@ -383,6 +384,8 @@ statext_dependencies_build(StatsBuildData *data) MVDependency *d; MemoryContext oldcxt; + CHECK_FOR_INTERRUPTS(); + /* release memory used by dependency degree calculation */ oldcxt = MemoryContextSwitchTo(cxt); diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c index dd67b19b6fa..9db1d0325cd 100644 --- a/src/backend/statistics/mcv.c +++ b/src/backend/statistics/mcv.c @@ -269,6 +269,8 @@ statext_mcv_build(StatsBuildData *data, double totalrows, int stattarget) SortItem **freqs; int *nfreqs; + // CHECK_FOR_INTERRUPTS(); + /* used to search values */ tmp = (MultiSortSupport) palloc(offsetof(MultiSortSupportData, ssup) + sizeof(SortSupportData)); diff --git a/src/backend/statistics/mvdistinct.c b/src/backend/statistics/mvdistinct.c index 6ade5eff78c..3b739ab7ca0 100644 --- a/src/backend/statistics/mvdistinct.c +++ b/src/backend/statistics/mvdistinct.c @@ -29,6 +29,7 @@ #include "catalog/pg_statistic_ext.h" #include "catalog/pg_statistic_ext_data.h" #include "lib/stringinfo.h" +#include "miscadmin.h" #include "statistics/extended_stats_internal.h" #include "statistics/statistics.h" #include "utils/fmgrprotos.h" @@ -114,6 +115,8 @@ statext_ndistinct_build(double totalrows, StatsBuildData *data) MVNDistinctItem *item = &result->items[itemcnt]; int j; + CHECK_FOR_INTERRUPTS(); + item->attributes = palloc(sizeof(AttrNumber) * k); item->nattributes = k;
On Fri, Jun 03, 2022 at 10:28:37AM -0500, Justin Pryzby wrote: > Maybe this should actually call vacuum_delay_point(), like > compute_scalar_stats(). I think vacuum_delay_point() would be wrong for these cases, since they don't call "fetchfunc()", like the other places which use vacuum_delay_point. > For MCV, there seems to be no issue, since those > functions are being called (but only for expressional stats). But maybe I've > just failed to make a large enough, non-expressional MCV list for the problem > to be apparent. I reproduced the issue with MCV like this: DROP TABLE IF EXISTS t; CREATE TABLE t AS SELECT a::text,b::text,c::text,d::text,e::text,f::text,g::text FROM generate_series(1000001,1000006)a,generate_series(1000001,1000006)b,generate_series(1000001,1000006)c,generate_series(1000001,1000006)d,generate_series(1000001,1000006)e,generate_series(1000001,1000006)f,generate_series(1000001,1000006)g,generate_series(1000001,1000006)h; VACUUMt; DROP STATISTICS IF EXISTS stxx; CREATE STATISTICS stxx (mcv) ON a,b,c,d,e,f FROM t; ALTER STATISTICS stxx SET STATISTICS9999; ANALYZE VERBOSE t; This is slow (25 seconds) inside qsort: (gdb) bt #0 __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:1020 #1 0x00005653d8686fac in varstrfastcmp_locale (a1p=0x5653dce67d54 "1000004~", len1=7, a2p=0x5653e895ffa4 "1000004~", len2=7,ssup=ssup@entry=0x5653d98c37b8) at varlena.c:2444 #2 0x00005653d8687161 in varlenafastcmp_locale (x=94918188367184, y=94918384418720, ssup=0x5653d98c37b8) at varlena.c:2270 #3 0x00005653d85134d8 in ApplySortComparator (ssup=0x5653d98c37b8, isNull2=<optimized out>, datum2=<optimized out>, isNull1=<optimizedout>, datum1=<optimized out>) at ../../../src/include/utils/sortsupport.h:224 #4 multi_sort_compare (a=0x7fa587b44e58, b=0x7fa5875f0dd0, arg=0x5653d98c37b0) at extended_stats.c:903 #5 0x00005653d8712eed in qsort_arg (data=data@entry=0x7fa5875f0050, n=<optimized out>, n@entry=1679616, element_size=element_size@entry=24,compare=compare@entry=0x5653d8513483 <multi_sort_compare>, arg=arg@entry=0x5653d98c37b0) at ../../src/include/lib/sort_template.h:349 #6 0x00005653d851415f in build_sorted_items (data=data@entry=0x7fa58f2e1050, nitems=nitems@entry=0x7ffe4f764e5c, mss=mss@entry=0x5653d98c37b0,numattrs=6, attnums=0x7fa58f2e1078) at extended_stats.c:1134 #7 0x00005653d8515d84 in statext_mcv_build (data=data@entry=0x7fa58f2e1050, totalrows=totalrows@entry=1679616, stattarget=stattarget@entry=9999)at mcv.c:204 #8 0x00005653d8513819 in BuildRelationExtStatistics (onerel=onerel@entry=0x7fa5b26ef658, inh=inh@entry=false, totalrows=1679616,numrows=numrows@entry=1679616, rows=rows@entry=0x7fa5a4103050, natts=natts@entry=7, vacattrstats=vacattrstats@entry=0x5653d98b76b0) at extended_stats.c:213 The fix seems to be to CHECK_FOR_INTERRUPTS() within multi_sort_compare(). That would supercede the other two CHECK_FOR_INTERRUPTS I'd proposed, and handle mcv, depends, and ndistinct all at once. Does that sound right ? For MCV, there's also ~0.6sec spent in build_column_frequencies(), which (if needed) would be addressed by adding CFI in sort_item_compare.
On Sat, Jun 04, 2022 at 08:42:33PM -0500, Justin Pryzby wrote: > The fix seems to be to CHECK_FOR_INTERRUPTS() within multi_sort_compare(). > That would supercede the other two CHECK_FOR_INTERRUPTS I'd proposed, and > handle mcv, depends, and ndistinct all at once. Hmm. I have to admit that adding a CFI() in multi_sort_compare() stresses me a bit as it is dependent on the number of rows involved, and it can be used as a qsort routine. -- Michael
Attachment
On Mon, Jun 06, 2022 at 04:23:34PM +0900, Michael Paquier wrote: > On Sat, Jun 04, 2022 at 08:42:33PM -0500, Justin Pryzby wrote: > > The fix seems to be to CHECK_FOR_INTERRUPTS() within multi_sort_compare(). > > That would supercede the other two CHECK_FOR_INTERRUPTS I'd proposed, and > > handle mcv, depends, and ndistinct all at once. > > Hmm. I have to admit that adding a CFI() in multi_sort_compare() > stresses me a bit as it is dependent on the number of rows involved, > and it can be used as a qsort routine. That's exactly the problem for which I showed a backtrace - it took 10s of seconds to do qsort, which is (uh) a human timescale and too long to be unresponsive, even if I create on a table with many rows a stats object with a lot of columns and a high stats target.
Attachment
Justin Pryzby <pryzby@telsasoft.com> writes: > On Mon, Jun 06, 2022 at 04:23:34PM +0900, Michael Paquier wrote: >> Hmm. I have to admit that adding a CFI() in multi_sort_compare() >> stresses me a bit as it is dependent on the number of rows involved, >> and it can be used as a qsort routine. > That's exactly the problem for which I showed a backtrace - it took 10s of > seconds to do qsort, which is (uh) a human timescale and too long to be > unresponsive, even if I create on a table with many rows a stats object with a > lot of columns and a high stats target. Hmm. On my machine, the example last shown upthread takes about 9 seconds, which I agree is a mighty long time to be unresponsive --- but it appears that fully half of that elapses before we reach multi_sort_compare for the first time. The first half of the ANALYZE run does seem to contain some CFI calls, but they are not exactly thick on the ground there either. So I'm feeling like this patch isn't ambitious enough. I tried interrupting at a random point and then stepping, and look what I hit after just a couple of steps: (gdb) s qsort_arg (data=data@entry=0x13161410, n=<optimized out>, n@entry=1679616, element_size=element_size@entry=16, compare=compare@entry=0x649450 <compare_scalars>, arg=arg@entry=0x7ffec539c0f0) at ../../src/include/lib/sort_template.h:353 353 if (r == 0) (gdb) 358 pc -= ST_POINTER_STEP; (gdb) 359 DO_CHECK_FOR_INTERRUPTS(); That, um, piqued my interest. After a bit of digging, I modestly propose the attached. I'm not sure if it's okay to back-patch this, because maybe someone out there is relying on qsort() to be incapable of throwing an error --- but it'd solve the problem at hand and a bunch of other issues of the same ilk. regards, tom lane diff --git a/src/port/qsort.c b/src/port/qsort.c index 7879e6cd56..34b99aac08 100644 --- a/src/port/qsort.c +++ b/src/port/qsort.c @@ -2,7 +2,12 @@ * qsort.c: standard quicksort algorithm */ -#include "c.h" +#ifndef FRONTEND +#include "postgres.h" +#include "miscadmin.h" +#else +#include "postgres_fe.h" +#endif #define ST_SORT pg_qsort #define ST_ELEMENT_TYPE_VOID @@ -10,6 +15,11 @@ #define ST_SCOPE #define ST_DECLARE #define ST_DEFINE + +#ifndef FRONTEND +#define ST_CHECK_FOR_INTERRUPTS +#endif + #include "lib/sort_template.h" /* diff --git a/src/port/qsort_arg.c b/src/port/qsort_arg.c index fa7e11a3b8..685ab32913 100644 --- a/src/port/qsort_arg.c +++ b/src/port/qsort_arg.c @@ -2,7 +2,12 @@ * qsort_arg.c: qsort with a passthrough "void *" argument */ -#include "c.h" +#ifndef FRONTEND +#include "postgres.h" +#include "miscadmin.h" +#else +#include "postgres_fe.h" +#endif #define ST_SORT qsort_arg #define ST_ELEMENT_TYPE_VOID @@ -11,4 +16,9 @@ #define ST_COMPARE_ARG_TYPE void #define ST_SCOPE #define ST_DEFINE + +#ifndef FRONTEND +#define ST_CHECK_FOR_INTERRUPTS +#endif + #include "lib/sort_template.h"
On Fri, Jul 01, 2022 at 07:19:11PM -0400, Tom Lane wrote: > I tried interrupting at a random point and then stepping, and > look what I hit after just a couple of steps: I'd come up with the trick of setting SET backtrace_functions='ProcessInterrupts'; > That, um, piqued my interest. After a bit of digging, > I modestly propose the attached. I'm not sure if it's > okay to back-patch this, because maybe someone out there > is relying on qsort() to be incapable of throwing an error > --- but it'd solve the problem at hand and a bunch of other > issues of the same ilk. Confirmed this fixes the 3 types of extended stats, at least for the example cases I made. If it's okay to backpatch, v14 seems adequate since the problem is more prominent with expressional statistics (I'm sure that's why I hit it) ... otherwise v15 would do. -- Justin
Justin Pryzby <pryzby@telsasoft.com> writes: > On Fri, Jul 01, 2022 at 07:19:11PM -0400, Tom Lane wrote: >> That, um, piqued my interest. After a bit of digging, >> I modestly propose the attached. I'm not sure if it's >> okay to back-patch this, because maybe someone out there >> is relying on qsort() to be incapable of throwing an error >> --- but it'd solve the problem at hand and a bunch of other >> issues of the same ilk. > Confirmed this fixes the 3 types of extended stats, at least for the example > cases I made. > If it's okay to backpatch, v14 seems adequate since the problem is more > prominent with expressional statistics (I'm sure that's why I hit it) ... > otherwise v15 would do. After thinking for awhile, my inclination is to apply my patch in HEAD and yours in v15/v14. We have a year to find out if there's any problem with the more invasive check if we do it in HEAD; but there's a lot less margin for error in v14, and not that much in v15 either. regards, tom lane
Hi, On 2022-07-01 19:19:11 -0400, Tom Lane wrote: > That, um, piqued my interest. After a bit of digging, > I modestly propose the attached. I'm not sure if it's > okay to back-patch this, because maybe someone out there > is relying on qsort() to be incapable of throwing an error > --- but it'd solve the problem at hand and a bunch of other > issues of the same ilk. I'm worried about this. Interrupting random qsorts all over seems like it could end up corrupting state. We do things like qsort in building snapshots etc. Are we confident that we handle interrupts reliably in all those places? Greetings, Andres Freund
I wrote: >>> I modestly propose the attached. I'm not sure if it's >>> okay to back-patch this, because maybe someone out there >>> is relying on qsort() to be incapable of throwing an error I thought I'd better try to check that, and I soon found several places that *are* relying on qsort() to not be any smarter than the libc version. Notably, guc.c qsorts its persistent-state GUC array during add_guc_variable --- an interrupt there would leave the GUC data structure in an inconsistent state. There are lesser hazards, typically memory leaks, elsewhere. So I fear this can't fly as-is. Nonetheless, it'd be a good idea to use an interruptible sort in as many places as we can. If we don't mind YA copy of the qsort code, I'd be inclined to propose inventing qsort_interruptible which is like qsort_arg but also does CHECK_FOR_INTERRUPTS. (There seems no reason to support a non-arg version, since you can just pass NULL.) Or we could redefine qsort_arg as allowing interrupts, but that might still carry some surprises for existing code. regards, tom lane
Andres Freund <andres@anarazel.de> writes: > I'm worried about this. Interrupting random qsorts all over seems like it > could end up corrupting state. We do things like qsort in building snapshots > etc. Are we confident that we handle interrupts reliably in all those places? Nope ... see my followup just now. regards, tom lane
On Tue, Jul 05, 2022 at 07:37:03PM -0400, Tom Lane wrote: > Nonetheless, it'd be a good idea to use an interruptible sort in > as many places as we can. If we don't mind YA copy of the qsort > code, I'd be inclined to propose inventing qsort_interruptible > which is like qsort_arg but also does CHECK_FOR_INTERRUPTS. > (There seems no reason to support a non-arg version, since you > can just pass NULL.) Or we could redefine qsort_arg as allowing > interrupts, but that might still carry some surprises for existing > code. Agreed to use a new and different API for this purpose. It seems like a tool where one could write some new code and use an interruptible qsort() thinking that it is fine, still a preperly-documented API has clear benefits. -- Michael
Attachment
On Wed, Jul 6, 2022 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > qsort_interruptible +1 FWIW compute_scalar_stats() was already noted as hot and perhaps a candidate for specialisation (with more study needed), but qsort_interruptible() seems like a sane use of ~4KB of text to me.
Thomas Munro <thomas.munro@gmail.com> writes: > On Wed, Jul 6, 2022 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> qsort_interruptible > +1 So here's a patch that does it that way. I first meant to put the new file into src/port/, but after remembering that that directory has no backend-only functions, I went with src/backend/utils/sort/ instead. For the moment I contented myself with changing qsort[_arg] calls that occur during statistics collection. We have a lot more of course, but many of them can be expected to not be dealing with much data, and in some cases we might want some closer analysis to be sure there's no performance hit. So I'm inclined to not worry too much about the remaining qsort calls until somebody complains. This could be back-patched to v14 without much worry, I should think. regards, tom lane diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 07613c38b5..a7966fff83 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -100,7 +100,7 @@ static VacAttrStats *examine_attribute(Relation onerel, int attnum, static int acquire_sample_rows(Relation onerel, int elevel, HeapTuple *rows, int targrows, double *totalrows, double *totaldeadrows); -static int compare_rows(const void *a, const void *b); +static int compare_rows(const void *a, const void *b, void *arg); static int acquire_inherited_sample_rows(Relation onerel, int elevel, HeapTuple *rows, int targrows, double *totalrows, double *totaldeadrows); @@ -1306,7 +1306,8 @@ acquire_sample_rows(Relation onerel, int elevel, * tuples are already sorted. */ if (numrows == targrows) - qsort((void *) rows, numrows, sizeof(HeapTuple), compare_rows); + qsort_interruptible((void *) rows, numrows, sizeof(HeapTuple), + compare_rows, NULL); /* * Estimate total numbers of live and dead rows in relation, extrapolating @@ -1342,10 +1343,10 @@ acquire_sample_rows(Relation onerel, int elevel, } /* - * qsort comparator for sorting rows[] array + * Comparator for sorting rows[] array */ static int -compare_rows(const void *a, const void *b) +compare_rows(const void *a, const void *b, void *arg) { HeapTuple ha = *(const HeapTuple *) a; HeapTuple hb = *(const HeapTuple *) b; @@ -1836,7 +1837,7 @@ static void compute_scalar_stats(VacAttrStatsP stats, int samplerows, double totalrows); static int compare_scalars(const void *a, const void *b, void *arg); -static int compare_mcvs(const void *a, const void *b); +static int compare_mcvs(const void *a, const void *b, void *arg); static int analyze_mcv_list(int *mcv_counts, int num_mcv, double stadistinct, @@ -2473,8 +2474,8 @@ compute_scalar_stats(VacAttrStatsP stats, /* Sort the collected values */ cxt.ssup = &ssup; cxt.tupnoLink = tupnoLink; - qsort_arg((void *) values, values_cnt, sizeof(ScalarItem), - compare_scalars, (void *) &cxt); + qsort_interruptible((void *) values, values_cnt, sizeof(ScalarItem), + compare_scalars, (void *) &cxt); /* * Now scan the values in order, find the most common ones, and also @@ -2718,8 +2719,8 @@ compute_scalar_stats(VacAttrStatsP stats, deltafrac; /* Sort the MCV items into position order to speed next loop */ - qsort((void *) track, num_mcv, - sizeof(ScalarMCVItem), compare_mcvs); + qsort_interruptible((void *) track, num_mcv, sizeof(ScalarMCVItem), + compare_mcvs, NULL); /* * Collapse out the MCV items from the values[] array. @@ -2882,7 +2883,7 @@ compute_scalar_stats(VacAttrStatsP stats, } /* - * qsort_arg comparator for sorting ScalarItems + * Comparator for sorting ScalarItems * * Aside from sorting the items, we update the tupnoLink[] array * whenever two ScalarItems are found to contain equal datums. The array @@ -2919,10 +2920,10 @@ compare_scalars(const void *a, const void *b, void *arg) } /* - * qsort comparator for sorting ScalarMCVItems by position + * Comparator for sorting ScalarMCVItems by position */ static int -compare_mcvs(const void *a, const void *b) +compare_mcvs(const void *a, const void *b, void *arg) { int da = ((const ScalarMCVItem *) a)->first; int db = ((const ScalarMCVItem *) b)->first; diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c index 7c02fb279f..d2aa8d0ca3 100644 --- a/src/backend/statistics/extended_stats.c +++ b/src/backend/statistics/extended_stats.c @@ -1129,8 +1129,8 @@ build_sorted_items(StatsBuildData *data, int *nitems, } /* do the sort, using the multi-sort */ - qsort_arg((void *) items, nrows, sizeof(SortItem), - multi_sort_compare, mss); + qsort_interruptible((void *) items, nrows, sizeof(SortItem), + multi_sort_compare, mss); return items; } diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c index 1ef3034428..f10642df4f 100644 --- a/src/backend/statistics/mcv.c +++ b/src/backend/statistics/mcv.c @@ -404,7 +404,7 @@ count_distinct_groups(int numrows, SortItem *items, MultiSortSupport mss) * order. */ static int -compare_sort_item_count(const void *a, const void *b) +compare_sort_item_count(const void *a, const void *b, void *arg) { SortItem *ia = (SortItem *) a; SortItem *ib = (SortItem *) b; @@ -457,8 +457,8 @@ build_distinct_groups(int numrows, SortItem *items, MultiSortSupport mss, Assert(j + 1 == ngroups); /* Sort the distinct groups by frequency (in descending order). */ - pg_qsort((void *) groups, ngroups, sizeof(SortItem), - compare_sort_item_count); + qsort_interruptible((void *) groups, ngroups, sizeof(SortItem), + compare_sort_item_count, NULL); *ndistinct = ngroups; return groups; @@ -528,8 +528,8 @@ build_column_frequencies(SortItem *groups, int ngroups, } /* sort the values, deduplicate */ - qsort_arg((void *) result[dim], ngroups, sizeof(SortItem), - sort_item_compare, ssup); + qsort_interruptible((void *) result[dim], ngroups, sizeof(SortItem), + sort_item_compare, ssup); /* * Identify distinct values, compute frequency (there might be @@ -696,8 +696,8 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats) PrepareSortSupportFromOrderingOp(typentry->lt_opr, &ssup[dim]); - qsort_arg(values[dim], counts[dim], sizeof(Datum), - compare_scalars_simple, &ssup[dim]); + qsort_interruptible(values[dim], counts[dim], sizeof(Datum), + compare_scalars_simple, &ssup[dim]); /* * Walk through the array and eliminate duplicate values, but keep the diff --git a/src/backend/statistics/mvdistinct.c b/src/backend/statistics/mvdistinct.c index 6ade5eff78..9b216af517 100644 --- a/src/backend/statistics/mvdistinct.c +++ b/src/backend/statistics/mvdistinct.c @@ -489,8 +489,8 @@ ndistinct_for_combination(double totalrows, StatsBuildData *data, } /* We can sort the array now ... */ - qsort_arg((void *) items, numrows, sizeof(SortItem), - multi_sort_compare, mss); + qsort_interruptible((void *) items, numrows, sizeof(SortItem), + multi_sort_compare, mss); /* ... and count the number of distinct combinations */ diff --git a/src/backend/tsearch/ts_typanalyze.c b/src/backend/tsearch/ts_typanalyze.c index 3e1a61b56d..e771a7cd62 100644 --- a/src/backend/tsearch/ts_typanalyze.c +++ b/src/backend/tsearch/ts_typanalyze.c @@ -44,8 +44,10 @@ static void prune_lexemes_hashtable(HTAB *lexemes_tab, int b_current); static uint32 lexeme_hash(const void *key, Size keysize); static int lexeme_match(const void *key1, const void *key2, Size keysize); static int lexeme_compare(const void *key1, const void *key2); -static int trackitem_compare_frequencies_desc(const void *e1, const void *e2); -static int trackitem_compare_lexemes(const void *e1, const void *e2); +static int trackitem_compare_frequencies_desc(const void *e1, const void *e2, + void *arg); +static int trackitem_compare_lexemes(const void *e1, const void *e2, + void *arg); /* @@ -347,8 +349,8 @@ compute_tsvector_stats(VacAttrStats *stats, */ if (num_mcelem < track_len) { - qsort(sort_table, track_len, sizeof(TrackItem *), - trackitem_compare_frequencies_desc); + qsort_interruptible(sort_table, track_len, sizeof(TrackItem *), + trackitem_compare_frequencies_desc, NULL); /* reset minfreq to the smallest frequency we're keeping */ minfreq = sort_table[num_mcelem - 1]->frequency; } @@ -376,8 +378,8 @@ compute_tsvector_stats(VacAttrStats *stats, * presorted we can employ binary search for that. See * ts_selfuncs.c for a real usage scenario. */ - qsort(sort_table, num_mcelem, sizeof(TrackItem *), - trackitem_compare_lexemes); + qsort_interruptible(sort_table, num_mcelem, sizeof(TrackItem *), + trackitem_compare_lexemes, NULL); /* Must copy the target values into anl_context */ old_context = MemoryContextSwitchTo(stats->anl_context); @@ -510,10 +512,10 @@ lexeme_compare(const void *key1, const void *key2) } /* - * qsort() comparator for sorting TrackItems on frequencies (descending sort) + * Comparator for sorting TrackItems on frequencies (descending sort) */ static int -trackitem_compare_frequencies_desc(const void *e1, const void *e2) +trackitem_compare_frequencies_desc(const void *e1, const void *e2, void *arg) { const TrackItem *const *t1 = (const TrackItem *const *) e1; const TrackItem *const *t2 = (const TrackItem *const *) e2; @@ -522,10 +524,10 @@ trackitem_compare_frequencies_desc(const void *e1, const void *e2) } /* - * qsort() comparator for sorting TrackItems on lexemes + * Comparator for sorting TrackItems on lexemes */ static int -trackitem_compare_lexemes(const void *e1, const void *e2) +trackitem_compare_lexemes(const void *e1, const void *e2, void *arg) { const TrackItem *const *t1 = (const TrackItem *const *) e1; const TrackItem *const *t2 = (const TrackItem *const *) e2; diff --git a/src/backend/utils/adt/array_typanalyze.c b/src/backend/utils/adt/array_typanalyze.c index ab1d69b2f4..1964cedd93 100644 --- a/src/backend/utils/adt/array_typanalyze.c +++ b/src/backend/utils/adt/array_typanalyze.c @@ -86,9 +86,9 @@ static void prune_element_hashtable(HTAB *elements_tab, int b_current); static uint32 element_hash(const void *key, Size keysize); static int element_match(const void *key1, const void *key2, Size keysize); static int element_compare(const void *key1, const void *key2); -static int trackitem_compare_frequencies_desc(const void *e1, const void *e2); -static int trackitem_compare_element(const void *e1, const void *e2); -static int countitem_compare_count(const void *e1, const void *e2); +static int trackitem_compare_frequencies_desc(const void *e1, const void *e2, void *arg); +static int trackitem_compare_element(const void *e1, const void *e2, void *arg); +static int countitem_compare_count(const void *e1, const void *e2, void *arg); /* @@ -502,8 +502,8 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc, */ if (num_mcelem < track_len) { - qsort(sort_table, track_len, sizeof(TrackItem *), - trackitem_compare_frequencies_desc); + qsort_interruptible(sort_table, track_len, sizeof(TrackItem *), + trackitem_compare_frequencies_desc, NULL); /* reset minfreq to the smallest frequency we're keeping */ minfreq = sort_table[num_mcelem - 1]->frequency; } @@ -522,8 +522,8 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc, * the element type's default comparison function. This permits * fast binary searches in selectivity estimation functions. */ - qsort(sort_table, num_mcelem, sizeof(TrackItem *), - trackitem_compare_element); + qsort_interruptible(sort_table, num_mcelem, sizeof(TrackItem *), + trackitem_compare_element, NULL); /* Must copy the target values into anl_context */ old_context = MemoryContextSwitchTo(stats->anl_context); @@ -599,8 +599,9 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc, { sorted_count_items[j++] = count_item; } - qsort(sorted_count_items, count_items_count, - sizeof(DECountItem *), countitem_compare_count); + qsort_interruptible(sorted_count_items, count_items_count, + sizeof(DECountItem *), + countitem_compare_count, NULL); /* * Prepare to fill stanumbers with the histogram, followed by the @@ -751,10 +752,10 @@ element_compare(const void *key1, const void *key2) } /* - * qsort() comparator for sorting TrackItems by frequencies (descending sort) + * Comparator for sorting TrackItems by frequencies (descending sort) */ static int -trackitem_compare_frequencies_desc(const void *e1, const void *e2) +trackitem_compare_frequencies_desc(const void *e1, const void *e2, void *arg) { const TrackItem *const *t1 = (const TrackItem *const *) e1; const TrackItem *const *t2 = (const TrackItem *const *) e2; @@ -763,10 +764,10 @@ trackitem_compare_frequencies_desc(const void *e1, const void *e2) } /* - * qsort() comparator for sorting TrackItems by element values + * Comparator for sorting TrackItems by element values */ static int -trackitem_compare_element(const void *e1, const void *e2) +trackitem_compare_element(const void *e1, const void *e2, void *arg) { const TrackItem *const *t1 = (const TrackItem *const *) e1; const TrackItem *const *t2 = (const TrackItem *const *) e2; @@ -775,10 +776,10 @@ trackitem_compare_element(const void *e1, const void *e2) } /* - * qsort() comparator for sorting DECountItems by count + * Comparator for sorting DECountItems by count */ static int -countitem_compare_count(const void *e1, const void *e2) +countitem_compare_count(const void *e1, const void *e2, void *arg) { const DECountItem *const *t1 = (const DECountItem *const *) e1; const DECountItem *const *t2 = (const DECountItem *const *) e2; diff --git a/src/backend/utils/adt/rangetypes_typanalyze.c b/src/backend/utils/adt/rangetypes_typanalyze.c index 2b77e03449..2043d3f98b 100644 --- a/src/backend/utils/adt/rangetypes_typanalyze.c +++ b/src/backend/utils/adt/rangetypes_typanalyze.c @@ -32,7 +32,7 @@ #include "utils/rangetypes.h" #include "utils/multirangetypes.h" -static int float8_qsort_cmp(const void *a1, const void *a2); +static int float8_qsort_cmp(const void *a1, const void *a2, void *arg); static int range_bound_qsort_cmp(const void *a1, const void *a2, void *arg); static void compute_range_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc, int samplerows, @@ -93,7 +93,7 @@ multirange_typanalyze(PG_FUNCTION_ARGS) * Comparison function for sorting float8s, used for range lengths. */ static int -float8_qsort_cmp(const void *a1, const void *a2) +float8_qsort_cmp(const void *a1, const void *a2, void *arg) { const float8 *f1 = (const float8 *) a1; const float8 *f2 = (const float8 *) a2; @@ -280,10 +280,10 @@ compute_range_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc, if (non_empty_cnt >= 2) { /* Sort bound values */ - qsort_arg(lowers, non_empty_cnt, sizeof(RangeBound), - range_bound_qsort_cmp, typcache); - qsort_arg(uppers, non_empty_cnt, sizeof(RangeBound), - range_bound_qsort_cmp, typcache); + qsort_interruptible(lowers, non_empty_cnt, sizeof(RangeBound), + range_bound_qsort_cmp, typcache); + qsort_interruptible(uppers, non_empty_cnt, sizeof(RangeBound), + range_bound_qsort_cmp, typcache); num_hist = non_empty_cnt; if (num_hist > num_bins) @@ -345,7 +345,8 @@ compute_range_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc, * Ascending sort of range lengths for further filling of * histogram */ - qsort(lengths, non_empty_cnt, sizeof(float8), float8_qsort_cmp); + qsort_interruptible(lengths, non_empty_cnt, sizeof(float8), + float8_qsort_cmp, NULL); num_hist = non_empty_cnt; if (num_hist > num_bins) diff --git a/src/backend/utils/sort/Makefile b/src/backend/utils/sort/Makefile index 26f65fcaf7..2c31fd453d 100644 --- a/src/backend/utils/sort/Makefile +++ b/src/backend/utils/sort/Makefile @@ -16,6 +16,7 @@ override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS) OBJS = \ logtape.o \ + qsort_interruptible.o \ sharedtuplestore.o \ sortsupport.o \ tuplesort.o \ diff --git a/src/backend/utils/sort/qsort_interruptible.c b/src/backend/utils/sort/qsort_interruptible.c new file mode 100644 index 0000000000..f179b25624 --- /dev/null +++ b/src/backend/utils/sort/qsort_interruptible.c @@ -0,0 +1,16 @@ +/* + * qsort_interruptible.c: qsort_arg that includes CHECK_FOR_INTERRUPTS + */ + +#include "postgres.h" +#include "miscadmin.h" + +#define ST_SORT qsort_interruptible +#define ST_ELEMENT_TYPE_VOID +#define ST_COMPARATOR_TYPE_NAME qsort_arg_comparator +#define ST_COMPARE_RUNTIME_POINTER +#define ST_COMPARE_ARG_TYPE void +#define ST_SCOPE +#define ST_DEFINE +#define ST_CHECK_FOR_INTERRUPTS +#include "lib/sort_template.h" diff --git a/src/include/port.h b/src/include/port.h index 12c05b5d9f..e647f62b77 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -499,6 +499,9 @@ typedef int (*qsort_arg_comparator) (const void *a, const void *b, void *arg); extern void qsort_arg(void *base, size_t nel, size_t elsize, qsort_arg_comparator cmp, void *arg); +extern void qsort_interruptible(void *base, size_t nel, size_t elsize, + qsort_arg_comparator cmp, void *arg); + extern void *bsearch_arg(const void *key, const void *base, size_t nmemb, size_t size, int (*compar) (const void *, const void *, void *),
I wrote: > Thomas Munro <thomas.munro@gmail.com> writes: >> On Wed, Jul 6, 2022 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> qsort_interruptible >> +1 > So here's a patch that does it that way. Hearing no comments, pushed. regards, tom lane
On Tue, Jul 12, 2022 at 1:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> On Wed, Jul 6, 2022 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> qsort_interruptible
>> +1
> So here's a patch that does it that way.
Hearing no comments, pushed.
regards, tom lane
Hi,
Looking at the files under src/backend/utils/sort/, looks like license header is missing from qsort_interruptible.c
Please consider the patch which adds license header to qsort_interruptible.c
Cheers
Attachment
Zhihong Yu <zyu@yugabyte.com> writes: > Looking at the files under src/backend/utils/sort/, looks like license > header is missing from qsort_interruptible.c [ shrug ... ] qsort.c and qsort_arg.c don't have that either. regards, tom lane