Thread: putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

From
"Charles Duffy"
Date:
Hi,

We came up with this patch in response to a problem reported to us by a
client. They had a query which took an unacceptably long time to respond
to a cancel request (SIGINT). The client uses 8.1.4, so the patch is
against that.

Their work_mem setting was rather large (1000000). We determined that when it
received SIGINT, the backend was always inside qsort(), so it wouldn't
call ProcessInterrupts() again until it finished this large in-memory
sort. Upon entering tuplesort_performsort(), state->memtupcount was
29247.

The patch puts a CHECK_FOR_INTERRUPTS in qsort_comparetup. This solves
their problem, at the cost of checking InterruptPending a lot. The
"unlikely()" gcc directive might help there a bit.

I'm not sure if this patch has general applicability - but it seems to
solve the problem for our client. Does anyone think it might introduce
any new problems?

Thanks,


--
Charles Duffy

Attachment

Re: putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

From
"Qingqing Zhou"
Date:
""Charles Duffy"" <charles.duffy@gmail.com> wrote
>
> We came up with this patch in response to a problem reported to us by a
> client. They had a query which took an unacceptably long time to respond
> to a cancel request (SIGINT). The client uses 8.1.4, so the patch is
> against that.
>
How long is that "unacceptably long time"?

> Their work_mem setting was rather large (1000000). We determined that when
it
> received SIGINT, the backend was always inside qsort(), so it wouldn't
> call ProcessInterrupts() again until it finished this large in-memory
> sort. Upon entering tuplesort_performsort(), state->memtupcount was
> 29247.
>

I agree that we may need to consider to let qsort() check interrupts, but
the problem here is that 29247 doesn't look like a big number so I can't see
why your patch solved the problem, unless the qsort_comparetup() function of
the data type eats too many circles or the cpu is too slow.  I just did a
test to invoke a qsort on an "integer" field of a table with 5 million rows,
and sent a SIGINT, the delay is 7 or 8 seconds. I suspect there are some
other places doesn't check interrupts -- what's your query plan?

Regards,
Qingqing



Re: putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

From
"Charles Duffy"
Date:
Qingqing,

On 7/12/06, Qingqing Zhou <zhouqq@cs.toronto.edu> wrote:

> >
> How long is that "unacceptably long time"?
>

30 seconds.

> the problem here is that 29247 doesn't look like a big number so I can't see
> why your patch solved the problem, unless the qsort_comparetup() function of
> the data type eats too many circles or the cpu is too slow.

Well...when exhibiting the problem, execution stays inside qsort() for
the entire 'delay period' - I don't think its off in some other recess
which is also lacking in interrupt flag checking.

As for the CPU - the machine is a 4 way Opteron, and otherwise
performs well. It does seem odd that the in-memory sort takes as long
as it does - the plan may suggest a reason.

And - the patched version doesn't exhibit the problem.

> I just did a
> test to invoke a qsort on an "integer" field of a table with 5 million rows,
> and sent a SIGINT, the delay is 7 or 8 seconds. I suspect there are some
> other places doesn't check interrupts -- what's your query plan?

Plan attached.

Thanks,

Charles Duffy

Attachment

Re: putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

From
Tom Lane
Date:
"Charles Duffy" <charles.duffy@gmail.com> writes:
> On 7/12/06, Qingqing Zhou <zhouqq@cs.toronto.edu> wrote:
>> the problem here is that 29247 doesn't look like a big number so I can't see
>> why your patch solved the problem, unless the qsort_comparetup() function of
>> the data type eats too many circles or the cpu is too slow.

> Well...when exhibiting the problem, execution stays inside qsort() for
> the entire 'delay period' - I don't think its off in some other recess
> which is also lacking in interrupt flag checking.
> As for the CPU - the machine is a 4 way Opteron, and otherwise
> performs well. It does seem odd that the in-memory sort takes as long
> as it does - the plan may suggest a reason.

Well, there's something awfully fishy here.  Compare the two lower-level
sorts in your EXPLAIN output:

  ->  Sort  (cost=2909.44..2944.94 rows=14201 width=320) (actual time=78196.698..78239.799 rows=29247 loops=1)
        Sort Key: record, commr1, envr1, docin, creat, flati, flato, doc, docst, vlord, vl0, vl1, vl2, vl3, vl4, vl5,
vl6,vl7, vl8, vl9 
        ->  Append  (cost=0.00..1930.02 rows=14201 width=320) (actual time=0.052..396.577 rows=29247 loops=1)

  ->  Sort  (cost=403.42..403.59 rows=71 width=320) (actual time=318.727..339.305 rows=10932 loops=1)
        Sort Key: record, commr1, envr1, docin, creat, flati, flato, doc, docst, vlord, vl0, vl1, vl2, vl3, vl4, vl5,
vl6,vl7, vl8, vl9 
        ->  Append  (cost=78.88..401.23 rows=71 width=320) (actual time=5.197..192.868 rows=10932 loops=1)

The first one took 77843.222 msec to sort 29247 rows, the second took
146.437 msec to sort 10932 rows with what appears to be the same sort
key.  One of these things is not like the other ...

What are the data types of the sort columns?  Is there anything much
different in the statistics of the two subqueries --- for example,
maybe one produces a lot of output that only differs in the rightmost
sort columns, where the other has keys that always differ in a leading
column?

I forget if you are running 8.1 or not, but if you are, turning on
trace_sort might produce useful information.

            regards, tom lane

Re: putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

From
Tom Lane
Date:
"Charles Duffy" <charles.duffy@gmail.com> writes:
> [ CHECK_FOR_INTERRUPTS inside qsort comparison routine ]

It occurs to me that there's a nonzero risk of problems here, because if
the interrupt occurs qsort() will lose control.  I'm wondering whether
there are any implementations of qsort() that allocate memory and then
release it before returning.  If so, interrupting the sort would lead to
memory leaked permanently (till backend exit anyway).

We might have to just tolerate this, but if it occurs on a lot of
platforms I'd have second thoughts about applying the patch.  Anyone
familiar with the internals of glibc's qsort, in particular?

            regards, tom lane

Re: [HACKERS] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

From
Peter Eisentraut
Date:
Tom Lane wrote:
> We might have to just tolerate this, but if it occurs on a lot of
> platforms I'd have second thoughts about applying the patch.  Anyone
> familiar with the internals of glibc's qsort, in particular?

Doesn't look like it's allocating any nonlocal memory:

http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1.12&content-type=text/x-cvsweb-markup&cvsroot=glibc

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: [HACKERS] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> We might have to just tolerate this, but if it occurs on a lot of
>> platforms I'd have second thoughts about applying the patch.  Anyone
>> familiar with the internals of glibc's qsort, in particular?

> Doesn't look like it's allocating any nonlocal memory:

> http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1.12&content-type=text/x-cvsweb-markup&cvsroot=glibc

But this file defines _quicksort() not qsort().  I was under the
impression that the latter is actually a mergesort in glibc ...

            regards, tom lane

Re: [HACKERS] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

From
Peter Eisentraut
Date:
Tom Lane wrote:
> > Doesn't look like it's allocating any nonlocal memory:
> >
> > http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/qsort.c?rev=1.
> >12&content-type=text/x-cvsweb-markup&cvsroot=glibc
>
> But this file defines _quicksort() not qsort().  I was under the
> impression that the latter is actually a mergesort in glibc ...

The merge sort is here:

http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c?rev=1.21&content-type=text/x-cvsweb-markup&cvsroot=glibc

It uses alloca, so we're good here.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: [HACKERS] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> The merge sort is here:

> http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c?rev=1.21&content-type=text/x-cvsweb-markup&cvsroot=glibc

> It uses alloca, so we're good here.

Uh ... but it also uses malloc, and potentially a honkin' big malloc at
that (up to a quarter of physical RAM).  So I'm worried again.

Anyway, Qingqing's question still needs to be answered: how can a sort
of under 30k items take so long?

            regards, tom lane

Re: [HACKERS] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

From
"Charles Duffy"
Date:
On 7/15/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anyway, Qingqing's question still needs to be answered: how can a sort
> of under 30k items take so long?
>

It happens because (as previously suggested by Tom) the dataset for
the 'short' (~10k rows, .3 sec) sort has no rows whose leftmost fields
evaluate to 'equal' when passed to the qsort compare function. The
'long' sort, (~30k rows, 78 sec) has plenty of rows whose first 6
columns all evaluate as 'equal' when the rows are compared.

For the 'long' data, the compare moves on rightward until it
encounters 'flato', which is a TEXT column with an average length of
7.5k characters (with some rows up to 400k). The first 6 columns are
mostly INTEGER, so compares on them are relatively inexpensive. All
the expensive compares on 'flato' account for the disproportionate
difference in sort times, relative to the number of rows in each set.

As for the potential for memory leaks - thinking about it.

Thanks,

Charles Duffy.

> Peter Eisentraut <peter_e@gmx.net> writes:
> > The merge sort is here:
>
> >
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/stdlib/msort.c?rev=1.21&content-type=text/x-cvsweb-markup&cvsroot=glibc
>
> > It uses alloca, so we're good here.
>
> Uh ... but it also uses malloc, and potentially a honkin' big malloc at
> that (up to a quarter of physical RAM).  So I'm worried again.
>
> Anyway, Qingqing's question still needs to be answered: how can a sort
> of under 30k items take so long?
>
>                         regards, tom lane
>

Attachment

Re: [HACKERS] putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

From
Tom Lane
Date:
"Charles Duffy" <charles.duffy@gmail.com> writes:
> ... For the 'long' data, the compare moves on rightward until it
> encounters 'flato', which is a TEXT column with an average length of
> 7.5k characters (with some rows up to 400k). The first 6 columns are
> mostly INTEGER, so compares on them are relatively inexpensive. All
> the expensive compares on 'flato' account for the disproportionate
> difference in sort times, relative to the number of rows in each set.

Yeah, and it's not just that it's text either.  At those sizes, all
the values will be toasted, which means each compare is paying the
price of fetching multiple rows from the toast table.  And decompressing
them too, no doubt.  These costs are most likely swamping the actual
strcoll() (not that that's not bad enough compared to int4cmp).

We could probably tweak the sorting code to forcibly detoast sort keys
before beginning the sort, but I'm not entirely convinced that would be
a win: reading and writing enormous sort keys won't be cheap either.

Meanwhile, for a cheap solution: do you really need to sort on flato
at all?  Maybe sorting on substr(flato,1,100) would be good enough?

            regards, tom lane

Re: [HACKERS] putting CHECK_FOR_INTERRUPTS in

From
Bruce Momjian
Date:
Are we done with the sort interrupt issue mentioned in the subject line,
and the issue outlined below?

---------------------------------------------------------------------------

Tom Lane wrote:
> "Charles Duffy" <charles.duffy@gmail.com> writes:
> > ... For the 'long' data, the compare moves on rightward until it
> > encounters 'flato', which is a TEXT column with an average length of
> > 7.5k characters (with some rows up to 400k). The first 6 columns are
> > mostly INTEGER, so compares on them are relatively inexpensive. All
> > the expensive compares on 'flato' account for the disproportionate
> > difference in sort times, relative to the number of rows in each set.
>
> Yeah, and it's not just that it's text either.  At those sizes, all
> the values will be toasted, which means each compare is paying the
> price of fetching multiple rows from the toast table.  And decompressing
> them too, no doubt.  These costs are most likely swamping the actual
> strcoll() (not that that's not bad enough compared to int4cmp).
>
> We could probably tweak the sorting code to forcibly detoast sort keys
> before beginning the sort, but I'm not entirely convinced that would be
> a win: reading and writing enormous sort keys won't be cheap either.
>
> Meanwhile, for a cheap solution: do you really need to sort on flato
> at all?  Maybe sorting on substr(flato,1,100) would be good enough?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] putting CHECK_FOR_INTERRUPTS in

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Are we done with the sort interrupt issue mentioned in the subject line,
> and the issue outlined below?

I'm inclined not to apply the proposed patch (adding
CHECK_FOR_INTERRUPTS) because of the risk of memory leakage inside
qsort.  OTOH you could argue that there's an unfixable risk of memory
leakage there anyway, because it's always possible that the invoked
datatype comparison routine exits with elog(ERROR) for some reason,
or even contains a CHECK_FOR_INTERRUPTS call itself.  Comments?

As for the question of whether we should try to detoast sort keys before
sorting, I'd suggest adding that to TODO.  Investigating whether this
would be a good idea will take more time than we have for 8.2, so it's
gonna have to wait for a future cycle.

            regards, tom lane

Re: [HACKERS] putting CHECK_FOR_INTERRUPTS in

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Are we done with the sort interrupt issue mentioned in the subject line,
> > and the issue outlined below?
>
> I'm inclined not to apply the proposed patch (adding
> CHECK_FOR_INTERRUPTS) because of the risk of memory leakage inside
> qsort.  OTOH you could argue that there's an unfixable risk of memory
> leakage there anyway, because it's always possible that the invoked
> datatype comparison routine exits with elog(ERROR) for some reason,
> or even contains a CHECK_FOR_INTERRUPTS call itself.  Comments?

OK, we do check somewhere during sorting, I assume.  I can't imagine a
single qsort() call taking all that long because we do them in batches
anyway.

> As for the question of whether we should try to detoast sort keys before
> sorting, I'd suggest adding that to TODO.  Investigating whether this
> would be a good idea will take more time than we have for 8.2, so it's
> gonna have to wait for a future cycle.

Added to TODO:

    * Consider detoasting keys before sorting

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: putting CHECK_FOR_INTERRUPTS in qsort_comparetup()

From
Tom Lane
Date:
"Charles Duffy" <charles.duffy@gmail.com> writes:
> The patch puts a CHECK_FOR_INTERRUPTS in qsort_comparetup.

Just to close out this thread: this is now done for 8.2, per discussion
here:
http://archives.postgresql.org/pgsql-hackers/2006-10/msg00144.php

            regards, tom lane