Thread: [PATCH] Use correct types and limits for PL/Perl SPI query results

[PATCH] Use correct types and limits for PL/Perl SPI query results

From
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Date:
Hi hackers,

Commit 23a27b039d94ba359286694831eafe03cd970eef changed the type of
numbers-of-tuples-processed counters to uint64 and adjusted various PLs
to cope with this.  I noticed the PL/Perl changes did not take full
advantage of what Perl is capable of handling, so here's a patch that
improves that.

1) Perl's integers are at least pointer-sized and either signed or
   unsigned, so can potentially hold up to 2⁶⁴-1. Floating point numbers
   can also be larger than double (up to 128 bits), allowing for exact
   representation of integers beyond 2⁵³.  Hence, adjust the setting of
   the "processed" hash item to use UV_MAX for the limit and (NV) or
   (UV) for the casts.

2) Perl 5.20 and later use SSize_t for array indices, so can cope with
   up to SSize_t_MAX items in an array (if you have the memory).

3) To be able to actually return such result sets from spi_exec_query()
   (see the sketch just below), I had to change the repalloc() in
   spi_printtup() to repalloc_huge().
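
To make that concrete, here's a minimal PL/Perl sketch of the kind of
call that benefits (the function name and query are invented for the
example, not part of the patch).  With the three changes, {processed}
comes back as an exact UV up to 2⁶⁴-1, and {rows} can grow to
SSize_t_MAX entries on Perl 5.20 and later instead of erroring out at
INT_MAX:

    CREATE FUNCTION plperl_row_count(n bigint) RETURNS text AS $$
        my ($n) = @_;
        # spi_exec_query() pulls the whole result set into memory; with
        # the patch, {processed} is an exact UV (an NV only beyond
        # 2**64-1) and {rows} can hold up to SSize_t_MAX elements on
        # Perl 5.20 and later.
        my $rv = spi_exec_query(
            "SELECT i FROM generate_series(1, $n) AS s(i)");
        my $nrows = scalar @{ $rv->{rows} };
        return "status=$rv->{status} processed=$rv->{processed} rows=$nrows";
    $$ LANGUAGE plperl;

Naturally you also need the memory to hold such a result set all at
once, which is why the docs already steer people towards the cursor
interface for big results.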

--
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen

From f6d38f1a5c7d6135dd5f580aa8ed38bb74162a0f Mon Sep 17 00:00:00 2001
From: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Date: Sun, 13 Mar 2016 01:21:43 +0000
Subject: [PATCH] Use correct types and limits for PL/Perl SPI query results

Perl's integers are pointer-sized, so can hold more than INT_MAX on LP64
platforms, and come in both signed (IV) and unsigned (UV).  Floating
point values (NV) may also be larger than double.

Since Perl 5.19.4, array indices are SSize_t instead of I32, so allow up
to SSize_t_MAX on those versions.  The limit is imposed not just by
av_extend's argument type but by all the array-handling code, so remove
the speculative comment.

Finally, change spi_printtup() to use repalloc_huge() to be able to
return a tuple table of more than 1GiB.
---
 src/backend/executor/spi.c |  2 +-
 src/pl/plperl/plperl.c     | 13 ++++---------
 src/pl/plperl/plperl.h     |  7 +++++++
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 3d04c23..fd94179 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1800,7 +1800,7 @@ spi_printtup(TupleTableSlot *slot, DestReceiver *self)
         /* Double the size of the pointer array */
         tuptable->free = tuptable->alloced;
         tuptable->alloced += tuptable->free;
-        tuptable->vals = (HeapTuple *) repalloc(tuptable->vals,
+        tuptable->vals = (HeapTuple *) repalloc_huge(tuptable->vals,
                                       tuptable->alloced * sizeof(HeapTuple));
     }

diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 269f7f3..d405179 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -3104,9 +3104,9 @@ plperl_spi_execute_fetch_result(SPITupleTable *tuptable, uint64 processed,
     hv_store_string(result, "status",
                     cstr2sv(SPI_result_code_string(status)));
     hv_store_string(result, "processed",
-                    (processed > (uint64) INT_MAX) ?
-                    newSVnv((double) processed) :
-                    newSViv((int) processed));
+                    (processed > (uint64) UV_MAX) ?
+                    newSVnv((NV) processed) :
+                    newSVuv((UV) processed));

     if (status > 0 && tuptable)
     {
@@ -3114,12 +3114,7 @@ plperl_spi_execute_fetch_result(SPITupleTable *tuptable, uint64 processed,
         SV           *row;
         uint64        i;

-        /*
-         * av_extend's 2nd argument is declared I32.  It's possible we could
-         * nonetheless push more than INT_MAX elements into a Perl array, but
-         * let's just fail instead of trying.
-         */
-        if (processed > (uint64) INT_MAX)
+        if (processed > AV_SIZE_MAX)
             ereport(ERROR,
                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
             errmsg("query result has too many rows to fit in a Perl array")));
diff --git a/src/pl/plperl/plperl.h b/src/pl/plperl/plperl.h
index 9179bbd..2f376ee 100644
--- a/src/pl/plperl/plperl.h
+++ b/src/pl/plperl/plperl.h
@@ -100,4 +100,11 @@ void        plperl_spi_freeplan(char *);
 void        plperl_spi_cursor_close(char *);
 char       *plperl_sv_to_literal(SV *, char *);

+/* Perl 5.19.4 changed array indices from I32 to SSize_t */
+#if PERL_BCDVERSION >= 0x5019004
+#define AV_SIZE_MAX ((uint64) SSize_t_MAX)
+#else
+#define AV_SIZE_MAX ((uint64) I32_MAX)
+#endif
+
 #endif   /* PL_PERL_H */
--
2.7.0


Re: [PATCH] Use correct types and limits for PL/Perl SPI query results

From
Tom Lane
Date:
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
> 1) Perl's integers are at least pointer-sized and either signed or
>    unsigned, so can potentially hold up to 2⁶⁴-1. Floating point numbers
>    can also be larger than double (up to 128 bits), allowing for exact
>    representation of integers beyond 2⁵³.  Hence, adjust the setting of
>    the "processed" hash item to use UV_MAX for the limit and (NV) or
>    (UV) for the casts.

I thought about using UV where feasible, but it was not clear to me
whether unsigned numbers behave semantically differently from signed ones
in Perl.  If they do, the change you suggest would create a small semantic
change in the behavior of code using spi_exec_query.  I'm not sure it's
worth taking any risk of that, considering we already discourage people
from using spi_exec_query for large results.

> 2) Perl 5.20 and later use SSize_t for array indices, so can cope with
>    up to SSize_t_MAX items in an array (if you have the memory).

I don't much like having such hard-wired knowledge about Perl versions
in our code.  Is there a way to identify this change other than
#if PERL_BCDVERSION >= 0x5019004 ?

> 3) To be able to actually return such result sets from spi_exec_query(),
>    I had to change the repalloc() in spi_printtup() to repalloc_huge().

Hmm, good point.  I wonder if there are any hazards to doing that.
        regards, tom lane



Re: [PATCH] Use correct types and limits for PL/Perl SPI query results

From
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> ilmari@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>> 1) Perl's integers are at least pointer-sized and either signed or
>>    unsigned, so can potentially hold up to 2⁶⁴-1. Floating point numbers
>>    can also be larger than double (up to 128 bits), allowing for exact
>>    representation of integers beyond 2⁵³.  Hence, adjust the setting of
>>    the "processed" hash item to use UV_MAX for the limit and (NV) or
>>    (UV) for the casts.
>
> I thought about using UV where feasible, but it was not clear to me
> whether unsigned numbers behave semantically differently from signed ones
> in Perl.  If they do, the change you suggest would create a small semantic
> change in the behavior of code using spi_exec_query.  I'm not sure it's
> worth taking any risk of that, considering we already discourage people
> from using spi_exec_query for large results.

IVs and UVs are semantically equivalent, and Perl will automatically
convert between them (and NVs) as necessary, i.e. when crossing
IV_MAX/UV_MAX/IV_MIN.
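
For example, on an ordinary 64-bit Perl (nothing patch-specific here,
just stock behaviour; the comments describe what I'd expect to see):

    use strict;
    use warnings;

    my $uv = 18446744073709551615;    # 2**64-1, parsed and held as a UV
    print "$uv\n";                    # printed exactly, no rounding
    print "above IV_MAX\n"
        if $uv > 9223372036854775807; # mixed UV/IV comparison just works
    my $nv = $uv + 1;                 # no longer fits in a UV ...
    print "$nv\n";                    # ... so Perl quietly upgrades to an NV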

>> 2) Perl 5.20 and later use SSize_t for array indices, so can cope with
>>    up to SSize_t_MAX items in an array (if you have the memory).
>
> I don't much like having such hard-wired knowledge about Perl versions
> in our code.  Is there a way to identify this change other than
> #if PERL_BCDVERSION >= 0x5019004 ?

There isn't, unfortunately.  I could propose adding e.g. an AVidx_t type
and matching _MAX define to Perl, but that wouldn't be available until
5.26 (May 2017) at the earliest, since the full code freeze for this
May's 5.24 release is next week.

>> 3) To be able to actually return such result sets from spi_exec_query(),
>>    I had to change the repalloc() in spi_printtup() to repalloc_huge().
>
> Hmm, good point.  I wonder if there are any hazards to doing that.

One hazard would be that people not heeding the warning in
spi_exec_query will get a visit from the OOM killer (or death by swap)
instead of a friendly "invalid memory alloc request size".
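
For what it's worth, anyone who genuinely needs to walk a result set
that size can already do so with flat memory use via the cursor
interface the documentation recommends.  Rough sketch (function name
and query invented for the example):

    CREATE FUNCTION count_rows_cursor(n bigint) RETURNS text AS $$
        my ($n) = @_;
        my $count = 0;
        # spi_query()/spi_fetchrow() return one row at a time, so memory
        # use stays flat no matter how many rows the query produces.
        my $cursor = spi_query(
            "SELECT i FROM generate_series(1, $n) AS s(i)");
        while (defined(my $row = spi_fetchrow($cursor))) {
            $count++;
        }
        return "fetched $count rows";
    $$ LANGUAGE plperl;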

>             regards, tom lane

ilmari

-- 
- Twitter seems more influential [than blogs] in the 'gets reported in the mainstream press' sense at least.
  - Matt McLeod

- That'd be because the content of a tweet is easier to condense down to a mainstream media article.
  - Calle Dybedahl



Re: [PATCH] Use correct types and limits for PL/Perl SPI query results

From
Tom Lane
Date:
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I thought about using UV where feasible, but it was not clear to me
>> whether unsigned numbers behave semantically differently from signed ones
>> in Perl.  If they do, the change you suggest would create a small semantic
>> change in the behavior of code using spi_exec_query.  I'm not sure it's
>> worth taking any risk of that, considering we already discourage people
>> from using spi_exec_query for large results.

> IVs and UVs are semantically equivalent, and Perl will automatically
> convert between them (and NVs) as necessary, i.e. when crossing
> IV_MAX/UV_MAX/IV_MIN.

OK, thanks for the authoritative statement.

>> I don't much like having such hard-wired knowledge about Perl versions
>> in our code.  Is there a way to identify this change other than
>> #if PERL_BCDVERSION >= 0x5019004 ?

> There isn't, unfortunately.

Sigh ... it was worth asking anyway.

>>> 3) To be able to actually return such result sets from spi_exec_query(),
>>> I had to change the repalloc() in spi_printtup() to repalloc_huge().

>> Hmm, good point.  I wonder if there are any hazards to doing that.

> One hazard would be that people not heeding the warning in
> spi_exec_query will get a visit from the OOM killer (or death by swap)
> instead of a friendly "invalid memory alloc request size".

Yeah.  But there are plenty of other ways to drive a backend into swap
hell, and the whole point of this change is to eliminate arbitrary
boundaries on query result size.  So let's do it.

Thanks for the patch and discussion!
        regards, tom lane