Thread: count_nulls(VARIADIC "any")
Hi,

I'd like to suggest $SUBJECT for inclusion in Postgres 9.6. I'm sure everyone would've found it useful at some point in their lives, and the fact that it can't be properly implemented in any language other than C I think speaks for the fact that we as a project should provide it.

A quick and dirty proof of concept (patch attached):

=# select count_nulls(null::int, null::text, 17, 'bar');
 count_nulls
-------------
           2
(1 row)

Its natural habitat would be CHECK constraints, e.g:

CHECK (count_nulls(a,b,c) IN (0, 3))

Will finish this up for the next CF, unless someone wants to tell me how stupid this idea is before that.


.m
Attachment
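As an aside on the "any language other than C" point: a plain-SQL sketch is possible, but only when all arguments share a type, which is exactly the restriction VARIADIC "any" avoids. The function name below is illustrative, not part of the patch:

-- Rough sketch: works only for same-typed arguments, unlike the proposed C version.
CREATE FUNCTION count_nulls_sql(VARIADIC vals anyarray) RETURNS integer
LANGUAGE sql IMMUTABLE AS $$
    SELECT count(*)::integer FROM unnest($1) AS u WHERE u IS NULL
$$;

SELECT count_nulls_sql(1, NULL, 3);                    -- 1
-- count_nulls_sql(null::int, null::text, 17, 'bar') would fail here:
-- anyarray cannot resolve a common type for mixed arguments.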
On Wed, Aug 12, 2015 at 6:18 PM, Marko Tiikkaja <marko@joh.to> wrote:
> Will finish this up for the next CF, unless someone wants to tell me how
> stupid this idea is before that.

I'm kind of puzzled what kind of schema would need this.

-- 
greg
Greg Stark wrote:
> On Wed, Aug 12, 2015 at 6:18 PM, Marko Tiikkaja <marko@joh.to> wrote:
> > Will finish this up for the next CF, unless someone wants to tell me how
> > stupid this idea is before that.
>
> I'm kind of puzzled what kind of schema would need this.

I've seen cases where you want some entity to be one of several possible types. In those cases you have nullable FKs in separate columns, and only one of them can be non-null.

The name count_nulls() suggests an aggregate function to me, though.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi
2015-08-12 19:18 GMT+02:00 Marko Tiikkaja <marko@joh.to>:
Hi,
I'd like to suggest $SUBJECT for inclusion in Postgres 9.6. I'm sure everyone would've found it useful at some point in their lives, and the fact that it can't be properly implemented in any language other than C I think speaks for the fact that we as a project should provide it.
A quick and dirty proof of concept (patch attached):
=# select count_nulls(null::int, null::text, 17, 'bar');
count_nulls
-------------
2
(1 row)
Its natural habitat would be CHECK constraints, e.g:
CHECK (count_nulls(a,b,c) IN (0, 3))
Will finish this up for the next CF, unless someone wants to tell me how stupid this idea is before that.
It is not a bad idea
+1
+1
Pavel
.m
On Wed, Aug 12, 2015 at 10:30 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> The name count_nulls() suggest an aggregate function to me, though.

I thought the same.

-- 
Peter Geoghegan
On 2015-08-12 7:23 PM, Greg Stark wrote:
> On Wed, Aug 12, 2015 at 6:18 PM, Marko Tiikkaja <marko@joh.to> wrote:
>> Will finish this up for the next CF, unless someone wants to tell me how
>> stupid this idea is before that.
>
> I'm kind of puzzled what kind of schema would need this.

The first example I could find from our schema was specifying the URL for a Remote Procedure Call. You can either specify a single request URI, or you can specify the pieces: protocol, host, port, path. So the constraints look roughly like this:

CHECK ((fulluri IS NULL) <> (protocol IS NULL)),
CHECK ((protocol IS NULL) = ALL(ARRAY[host IS NULL, port IS NULL, path IS NULL]))

Obviously the second one would be much prettier with count_nulls().

The other example is an OOP inheritance-like schema where an object could be one of any X number of types. You could write that:

CHECK ((a IS NULL)::int + (b IS NULL)::int + (c IS NULL)::int = 1)

or just:

CHECK (count_nulls(a,b,c) = 1)

The first example could be redesigned with three tables, but that seems like a cure worse than the disease.


.m
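To make that concrete, a sketch of such a table with the second constraint rewritten using the proposed function (column names are taken from the message above; the table name is invented):

CREATE TABLE rpc_endpoint (          -- table name invented for illustration
    fulluri  text,
    protocol text,
    host     text,
    port     int,
    path     text,
    -- exactly one of "full URI" and "the pieces" is used:
    CHECK ((fulluri IS NULL) <> (protocol IS NULL)),
    -- the pieces are either all present or all absent:
    CHECK (count_nulls(protocol, host, port, path) IN (0, 4))
);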
2015-08-12 19:32 GMT+02:00 Peter Geoghegan <pg@heroku.com>:
On Wed, Aug 12, 2015 at 10:30 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> The name count_nulls() suggest an aggregate function to me, though.
I thought the same.
maybe nulls_count ?
we have regr_count already
Regards
Pavel
On 2015-08-12 7:35 PM, Pavel Stehule wrote:
> maybe nulls_count ?
>
> we have regr_count already

But that's an aggregate as well..


.m
2015-08-12 19:37 GMT+02:00 Marko Tiikkaja <marko@joh.to>:
On 2015-08-12 7:35 PM, Pavel Stehule wrote:

maybe nulls_count ?
we have regr_count already
But that's an aggregate as well..
my mistake
Pavel
.m
Peter Geoghegan <pg@heroku.com> writes:
> On Wed, Aug 12, 2015 at 10:30 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> The name count_nulls() suggest an aggregate function to me, though.

> I thought the same.

Ditto. I'd be fine with this if we can come up with a name that doesn't sound like an aggregate. The best I can do offhand is "number_of_nulls()", which doesn't seem very pretty.

            regards, tom lane
Peter Geoghegan <pg@heroku.com> writes:
> On Wed, Aug 12, 2015 at 10:30 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> The name count_nulls() suggest an aggregate function to me, though.
> I thought the same.
Ditto. I'd be fine with this if we can come up with a name that
doesn't sound like an aggregate. The best I can do offhand is
"number_of_nulls()", which doesn't seem very pretty.
nulls_in(a, b, c) IN (0, 3) - yes the repetition is not great...
David J.
On Thu, Aug 13, 2015 at 2:19 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
Peter Geoghegan <pg@heroku.com> writes:
> On Wed, Aug 12, 2015 at 10:30 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> The name count_nulls() suggest an aggregate function to me, though.
> I thought the same.
Ditto. I'd be fine with this if we can come up with a name that
doesn't sound like an aggregate. The best I can do offhand is
"number_of_nulls()", which doesn't seem very pretty.nulls_in(a, b, c) IN (0, 3) - yes the repetition is not great...
How about these:
nulls_rank() (the analogy being 0 <= "rank" <= "set size")
nnulls()
or just
nulls() (this one might be a bit confusing due to existing NULLS LAST/FIRST syntax, though)
--
Alex
On 8/13/15 9:18 AM, Shulgin, Oleksandr wrote:
> nnulls()

I think I'd prefer num_nulls() over that.


.m
2015-08-13 9:21 GMT+02:00 Marko Tiikkaja <marko@joh.to>:
On 8/13/15 9:18 AM, Shulgin, Oleksandr wrote:

nnulls()
I think I'd prefer num_nulls() over that.
can be
what about similar twin function num_nonulls()?
Pavel
.m
On Thu, Aug 13, 2015 at 12:55 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
+1
2015-08-13 9:21 GMT+02:00 Marko Tiikkaja <marko@joh.to>:

On 8/13/15 9:18 AM, Shulgin, Oleksandr wrote:

nnulls()

I think I'd prefer num_nulls() over that.

can be

what about similar twin function num_nonulls()?
+1
On Thu, Aug 13, 2015 at 9:25 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2015-08-13 9:21 GMT+02:00 Marko Tiikkaja <marko@joh.to>:

On 8/13/15 9:18 AM, Shulgin, Oleksandr wrote:

nnulls()

I think I'd prefer num_nulls() over that.

can be

what about similar twin function num_nonulls()?
Yes. But I can't see any precedent for naming it like num_*... And if anything, should it be num_nonnulls() then?
2015-08-13 9:47 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Thu, Aug 13, 2015 at 9:25 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2015-08-13 9:21 GMT+02:00 Marko Tiikkaja <marko@joh.to>:

On 8/13/15 9:18 AM, Shulgin, Oleksandr wrote:

nnulls()

I think I'd prefer num_nulls() over that.

can be

what about similar twin function num_nonulls()?

Yes. But I'm can't see any precedent for naming it like num_*... And if anything, should it be num_nonnulls() then?
It is a detail - it depends on the final naming convention.
Hello,

Here's a patch implementing this under the name num_nulls(). For January's CF, of course.


.m
Attachment
Marko Tiikkaja <marko@joh.to> writes:
> Here's a patch implementing this under the name num_nulls(). For
> January's CF, of course.

What's this do that "count(*) - count(x)" doesn't?

            regards, tom lane
On 2015-11-21 06:06, Tom Lane wrote:
> Marko Tiikkaja <marko@joh.to> writes:
>> Here's a patch implementing this under the name num_nulls(). For
>> January's CF, of course.
>
> What's this do that "count(*) - count(x)" doesn't?

This is sort of a lateral version of count(x); the input is a list of expressions rather than an expression executed over a bunch of input rows.


.m
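A small illustration of that distinction, assuming a hypothetical table t(a, b, c):

-- Aggregate: NULLs of one expression over many rows.
SELECT count(*) - count(a) AS nulls_in_a FROM t;

-- Proposed function: NULLs over many expressions within each row.
SELECT num_nulls(a, b, c) AS nulls_in_row FROM t;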
On 2015-11-21 06:02, I wrote:
> Here's a patch implementing this under the name num_nulls(). For
> January's CF, of course.

I forgot to update some references in the documentation. Fixed in v3, attached.


.m
Attachment
On 11/20/15 11:12 PM, Marko Tiikkaja wrote:
> On 2015-11-21 06:02, I wrote:
>> Here's a patch implementing this under the name num_nulls(). For
>> January's CF, of course.
>
> I forgot to update the some references in the documentation. Fixed in
> v3, attached.

I thought there was going to be a not-null equivalent as well? I've definitely wanted both variations in the past.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 2015-11-21 06:52, Jim Nasby wrote:
> On 11/20/15 11:12 PM, Marko Tiikkaja wrote:
>> On 2015-11-21 06:02, I wrote:
>>> Here's a patch implementing this under the name num_nulls(). For
>>> January's CF, of course.
>>
>> I forgot to update the some references in the documentation. Fixed in
>> v3, attached.
>
> I thought there was going to be a not-null equivalent as well? I've
> definitely wanted both variations in the past.

I'm not sure that's necessary. It's quite simple to implement yourself using the int - int operator.


.m
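For a known argument count, that subtraction is just (a sketch; a hypothetical table t(a, b, c) is assumed):

-- Non-null count derived from num_nulls() and the argument count:
SELECT 3 - num_nulls(a, b, c) AS num_notnulls FROM t;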
On 11/20/15 11:55 PM, Marko Tiikkaja wrote:
> On 2015-11-21 06:52, Jim Nasby wrote:
>> On 11/20/15 11:12 PM, Marko Tiikkaja wrote:
>>> On 2015-11-21 06:02, I wrote:
>>>> Here's a patch implementing this under the name num_nulls(). For
>>>> January's CF, of course.
>>>
>>> I forgot to update the some references in the documentation. Fixed in
>>> v3, attached.
>>
>> I thought there was going to be a not-null equivalent as well? I've
>> definitely wanted both variations in the past.
>
> I'm not sure that's necessary. It's quite simple to implement yourself
> using the int - int operator.

Only if you know how many columns there already are.

Or does this not work if you hand it a row?

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 2015-11-22 18:29, Jim Nasby wrote:
> Only if you know how many columns there already are.
>
> Or does this not work if you hand it a row?

It "works" in the sense that it tells you whether the row is NULL or not. I.e. the answer will always be 0 or 1.


.m
On 11/22/15 11:34 AM, Marko Tiikkaja wrote:
> On 2015-11-22 18:29, Jim Nasby wrote:
>> Only if you know how many columns there already are.
>>
>> Or does this not work if you hand it a row?
>
> It "works" in the sense that it tells you whether the row is NULL or
> not. I.e. the answer will always be 0 or 1.

Hrm, I was hoping something like count_nulls(complex_type.*) would work.

I guess one could always create a wrapper function that does count_not_nulls() anyway.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 2015-11-22 21:17, Jim Nasby wrote:
> On 11/22/15 11:34 AM, Marko Tiikkaja wrote:
>> On 2015-11-22 18:29, Jim Nasby wrote:
>>> Only if you know how many columns there already are.
>>>
>>> Or does this not work if you hand it a row?
>>
>> It "works" in the sense that it tells you whether the row is NULL or
>> not. I.e. the answer will always be 0 or 1.
>
> Hrm, I was hoping something like count_nulls(complex_type.*) would work.

Nope:

=# select num_nulls((f).*) from (select '(1,2,3)'::foo) ss(f);
ERROR:  row expansion via "*" is not supported here


.m
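If the goal is a per-row NULL count without listing the columns, one workaround (a sketch, not part of the patch) is to expand the row through jsonb instead of relying on "*" expansion:

-- Assuming a composite type like the "foo" above, e.g. CREATE TYPE foo AS (a int, b int, c int):
SELECT count(*) AS null_fields
FROM jsonb_each(to_jsonb('(1,,3)'::foo)) AS kv(key, value)
WHERE kv.value = 'null'::jsonb;      -- counts the empty (NULL) field b, giving 1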
Hi

2015-08-12 19:18 GMT+02:00 Marko Tiikkaja <marko@joh.to>:

Hi,

I'd like to suggest $SUBJECT for inclusion in Postgres 9.6. I'm sure everyone would've found it useful at some point in their lives, and the fact that it can't be properly implemented in any language other than C I think speaks for the fact that we as a project should provide it.

A quick and dirty proof of concept (patch attached):

=# select count_nulls(null::int, null::text, 17, 'bar');
 count_nulls
-------------
           2
(1 row)

Its natural habitat would be CHECK constraints, e.g:

CHECK (count_nulls(a,b,c) IN (0, 3))

Will finish this up for the next CF, unless someone wants to tell me how stupid this idea is before that.

.m

I am sending updated version - support num_nulls and num_notnulls

Regards

Pavel
2016-01-03 21:37 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

2015-08-12 19:18 GMT+02:00 Marko Tiikkaja <marko@joh.to>:

Hi,
I'd like to suggest $SUBJECT for inclusion in Postgres 9.6. I'm sure everyone would've found it useful at some point in their lives, and the fact that it can't be properly implemented in any language other than C I think speaks for the fact that we as a project should provide it.
A quick and dirty proof of concept (patch attached):
=# select count_nulls(null::int, null::text, 17, 'bar');
count_nulls
-------------
2
(1 row)
Its natural habitat would be CHECK constraints, e.g:
CHECK (count_nulls(a,b,c) IN (0, 3))
Will finish this up for the next CF, unless someone wants to tell me how stupid this idea is before that.
.m

I am sending updated version - support num_nulls and num_notnulls
and patch
Regards

Pavel
Attachment
On 1/3/16 2:37 PM, Pavel Stehule wrote:
> + /* num_nulls(VARIADIC NULL) is defined as NULL */
> + if (PG_ARGISNULL(0))
> + return false;

Could you add to the comment explaining why that's the desired behavior?

> + /*
> + * Non-null argument had better be an array. We assume that any call
> + * context that could let get_fn_expr_variadic return true will have
> + * checked that a VARIADIC-labeled parameter actually is an array. So
> + * it should be okay to just Assert that it's an array rather than
> + * doing a full-fledged error check.
> + */
> + Assert(OidIsValid(get_base_element_type(get_fn_expr_argtype(fcinfo->flinfo, 0))));

Erm... is that really the way to verify that what you have is an array? ISTM there should be a macro for that somewhere...

> + /* OK, safe to fetch the array value */
> + arr = PG_GETARG_ARRAYTYPE_P(0);
> +
> + ndims = ARR_NDIM(arr);
> + dims = ARR_DIMS(arr);
> + nitems = ArrayGetNItems(ndims, dims);
> +
> + bitmap = ARR_NULLBITMAP(arr);
> + if (bitmap)
> + {
> + bitmask = 1;
> +
> + for (i = 0; i < nitems; i++)
> + {
> + if ((*bitmap & bitmask) == 0)
> + count++;
> +
> + bitmask <<= 1;
> + if (bitmask == 0x100)
> + {
> + bitmap++;
> + bitmask = 1;

For brevity and example sake it'd probably be better to just use the normal iterator, unless there's a serious speed difference?

In the unit test, I'd personally prefer just building a table with the test cases and the expected NULL/NOT NULL results, at least for all the calls that would fit that paradigm. That should significantly reduce the size of the test. Not a huge deal though...

Also, I don't think anything is testing multiples of whatever value... how 'bout change the generate_series CASE statement to >40 instead of <>40?

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
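For what it's worth, a sketch of what such a table-driven test could look like (table name and values invented here; restricted to the same-typed, VARIADIC-array form of the call):

CREATE TEMP TABLE num_nulls_cases (args int[], expected int);
INSERT INTO num_nulls_cases VALUES
    (ARRAY[1, 2, 3],            0),
    (ARRAY[1, NULL, 3],         1),
    (ARRAY[NULL, NULL]::int[],  2);

-- Any row returned here is a failing case.
SELECT * FROM num_nulls_cases
WHERE num_nulls(VARIADIC args) IS DISTINCT FROM expected;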
Hi
2016-01-03 22:49 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/3/16 2:37 PM, Pavel Stehule wrote:

+ /* num_nulls(VARIADIC NULL) is defined as NULL */
+ if (PG_ARGISNULL(0))
+ return false;
Could you add to the comment explaining why that's the desired behavior?
This case should be different from num_nulls(VARIADIC ARRAY[..]) - this situation is really equivalent to missing data, and NULL is the correct answer. It may not seem terribly clean for num_nulls, but it is cleaner for num_notnulls. What's more, it is consistent with other variadic functions in Postgres: see concat_internal and text_format.
+ /*
+ * Non-null argument had better be an array. We assume that any call
+ * context that could let get_fn_expr_variadic return true will have
+ * checked that a VARIADIC-labeled parameter actually is an array. So
+ * it should be okay to just Assert that it's an array rather than
+ * doing a full-fledged error check.
+ */
+ Assert(OidIsValid(get_base_element_type(get_fn_expr_argtype(fcinfo->flinfo, 0))));
Erm... is that really the way to verify that what you have is an array? ISTM there should be a macro for that somewhere...
Really, it is; it is used in more places. Although I am not against adding a macro, I don't think it is necessary - the macro would not be much shorter than this text.
+ /* OK, safe to fetch the array value */
+ arr = PG_GETARG_ARRAYTYPE_P(0);
+
+ ndims = ARR_NDIM(arr);
+ dims = ARR_DIMS(arr);
+ nitems = ArrayGetNItems(ndims, dims);
+
+ bitmap = ARR_NULLBITMAP(arr);
+ if (bitmap)
+ {
+ bitmask = 1;
+
+ for (i = 0; i < nitems; i++)
+ {
+ if ((*bitmap & bitmask) == 0)
+ count++;
+
+ bitmask <<= 1;
+ if (bitmask == 0x100)
+ {
+ bitmap++;
+ bitmask = 1;
For brevity and example sake it'd probably be better to just use the normal iterator, unless there's a serious speed difference?
The iterator does some memory allocations and some accesses to the type cache. Almost all of the iterator's work is useless for this case. This code was developed by Marko, but I agree with the design. Using the iterator would be a big gun for this case. I didn't do any performance checks, but the difference should be measurable for any varlena arrays.
Regards
Pavel
In the unit test, I'd personally prefer just building a table with the test cases and the expected NULL/NOT NULL results, at least for all the calls that would fit that paradigm. That should significantly reduce the size of the test. Not a huge deal though...
Also, I don't think anything is testing multiples of whatever value... how 'bout change the generate_series CASE statement to >40 instead of <>40?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
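In SQL terms, the distinction between a NULL variadic array and an array that merely contains NULLs would look roughly like this (a sketch, assuming the semantics described above):

SELECT num_nulls(VARIADIC ARRAY[1, NULL, 3]);   -- 1: one NULL element in a known list
SELECT num_nulls(VARIADIC NULL::int[]);         -- NULL: the argument list itself is unknown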
On 1/3/16 10:23 PM, Pavel Stehule wrote:
> Hi
>
> 2016-01-03 22:49 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com
> <mailto:Jim.Nasby@bluetreble.com>>:
>
> On 1/3/16 2:37 PM, Pavel Stehule wrote:
>
> + /* num_nulls(VARIADIC NULL) is defined as NULL */
> + if (PG_ARGISNULL(0))
> + return false;
>
> Could you add to the comment explaining why that's the desired behavior?
>
> This case should be different than num_nulls(VARIADIC ARRAY[..]) - this
> situation is really equivalent of missing data and NULL is correct
> answer. It should not be too clean in num_nulls, but when it is cleaner
> for num_notnulls. And more, it is consistent with other variadic
> functions in Postgres: see concat_internal and text_format.

Makes sense, now that you explain it. Which is why I'm thinking it'd be good to add that explanation to the comment... ;)

> Assert(OidIsValid(get_base_element_type(get_fn_expr_argtype(fcinfo->flinfo, 0))));
>
> Erm... is that really the way to verify that what you have is an
> array? ISTM there should be a macro for that somewhere...
>
> really, it is. It is used more time. Although I am not against some
> macro, I don't think so it is necessary. The macro should not be too
> shorter than this text.

Well, if there's other stuff doing that... would be nice to refactor that though.

> For brevity and example sake it'd probably be better to just use the
> normal iterator, unless there's a serious speed difference?
>
> The iterator does some memory allocations and some access to type cache.
> Almost all work of iterator is useless for this case. This code is
> developed by Marko, but I agree with this design. Using the iterator is
> big gun for this case. I didn't any performance checks, but it should be
> measurable for any varlena arrays.

Makes sense then.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Hi
2016-01-04 5:49 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/3/16 10:23 PM, Pavel Stehule wrote:

Hi
2016-01-03 22:49 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com
<mailto:Jim.Nasby@bluetreble.com>>:
On 1/3/16 2:37 PM, Pavel Stehule wrote:
+ /* num_nulls(VARIADIC NULL) is defined as NULL */
+ if (PG_ARGISNULL(0))
+ return false;
Could you add to the comment explaining why that's the desired behavior?
This case should be different than num_nulls(VARIADIC ARRAY[..]) - this
situation is really equivalent of missing data and NULL is correct
answer. It should not be too clean in num_nulls, but when it is cleaner
for num_notnulls. And more, it is consistent with other variadic
functions in Postgres: see concat_internal and text_format.
Makes sense, now that you explain it. Which is why I'm thinking it'd be good to add that explanation to the comment... ;)

Assert(OidIsValid(get_base_element_type(get_fn_expr_argtype(fcinfo->flinfo, 0))));
Erm... is that really the way to verify that what you have is an
array? ISTM there should be a macro for that somewhere...
really, it is. It is used more time. Although I am not against some
macro, I don't think so it is necessary. The macro should not be too
shorter than this text.
Well, if there's other stuff doing that... would be nice to refactor that though.

For brevity and example sake it'd probably be better to just use the
normal iterator, unless there's a serious speed difference?
The iterator does some memory allocations and some access to type cache.
Almost all work of iterator is useless for this case. This code is
developed by Marko, but I agree with this design. Using the iterator is
big gun for this case. I didn't any performance checks, but it should be
measurable for any varlena arrays.
Makes sense then.
+ enhanced comment
+ rewritten regress tests
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Attachment
On 03/01/16 22:49, Jim Nasby wrote:
> In the unit test, I'd personally prefer just building a table with the
> test cases and the expected NULL/NOT NULL results, at least for all the
> calls that would fit that paradigm. That should significantly reduce the
> size of the test. Not a huge deal though...

I don't really see the point. "The size of the test" doesn't seem like a worthwhile optimization target, unless the test scripts are somehow really unnecessarily large.

Further, if you were developing code related to this, previously you could just copy-paste the defective test case in order to easily reproduce a problem. But now suddenly you need a ton of different setup.

I don't expect to really have a say in this, but I think the tests are now worse than they were before.


.m
2016-01-12 17:27 GMT+01:00 Marko Tiikkaja <marko@joh.to>:
On 03/01/16 22:49, Jim Nasby wrote:

In the unit test, I'd personally prefer just building a table with the
test cases and the expected NULL/NOT NULL results, at least for all the
calls that would fit that paradigm. That should significantly reduce the
size of the test. Not a huge deal though...
I don't really see the point. "The size of the test" doesn't seem like a worthwhile optimization target, unless the test scripts are somehow really unnecessarily large.
Further, if you were developing code related to this, previously you could just copy-paste the defective test case in order to easily reproduce a problem. But now suddenly you need a ton of different setup.
I don't expect to really have a say in this, but I think the tests are now worse than they were before.
The form of the regression tests is not a very significant issue. Jim's design is a little more transparent, Marko's is maybe a little more practical. Both make sense in my opinion, and neither has a significant advantage over the other.
Regards
Pavel
.m
Hi
2016-01-17 8:43 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2016-01-12 17:27 GMT+01:00 Marko Tiikkaja <marko@joh.to>:

On 03/01/16 22:49, Jim Nasby wrote:

In the unit test, I'd personally prefer just building a table with the
test cases and the expected NULL/NOT NULL results, at least for all the
calls that would fit that paradigm. That should significantly reduce the
size of the test. Not a huge deal though...
I don't really see the point. "The size of the test" doesn't seem like a worthwhile optimization target, unless the test scripts are somehow really unnecessarily large.
Further, if you were developing code related to this, previously you could just copy-paste the defective test case in order to easily reproduce a problem. But now suddenly you need a ton of different setup.
I don't expect to really have a say in this, but I think the tests are now worse than they were before.

the form of regress tests is not pretty significant issue. Jim's design is little bit transparent, Marko's is maybe little bit practical. Both has sense from my opinion, and any hasn't significant advantage against other.
Is there any possible agreement on how these tests should be designed?

It is a simple patch with simple regression tests, so there is no reason for a long wait.
Regards
Pavel
Regards

Pavel
.m
On 1/21/16 1:48 PM, Pavel Stehule wrote:
> the form of regress tests is not pretty significant issue. Jim's
> design is little bit transparent, Marko's is maybe little bit
> practical. Both has sense from my opinion, and any hasn't
> significant advantage against other.
>
> any possible agreement, how these tests should be designed?
>
> simple patch, simple regress tests, so there are no reason for long waiting.

I don't really see how individual tests are more practical (you can still cut and paste a table...), but since there's no strong consensus either way I'd say it's up to you as author.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
2016-01-22 13:34 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/21/16 1:48 PM, Pavel Stehule wrote:

the form of regress tests is not pretty significant issue. Jim's
design is little bit transparent, Marko's is maybe little bit
practical. Both has sense from my opinion, and any hasn't
significant advantage against other.
any possible agreement, how these tests should be designed?
simple patch, simple regress tests, so there are no reason for long waiting.
I don't really see how individual tests are more practical (you can still cut and paste a table...), but since there's no strong consensus either way I'd say it's up to you as author.
Marko is the author of this patch, so - Marko, please send the final version of this patch
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 25/01/16 19:57, Pavel Stehule wrote:
> Marco is a author of this patch, so - Marco, please, send final version of
> this patch

I don't really care about the tests. Can we not use the v5 patch already in the thread? As far as I could tell there were no reviewer's comments on it anymore.


.m
2016-01-26 11:42 GMT+01:00 Marko Tiikkaja <marko@joh.to>:
On 25/01/16 19:57, Pavel Stehule wrote:

Marco is a author of this patch, so - Marco, please, send final version of
this patch
I don't really care about the tests. Can we not use the v5 patch already in the thread? As far as I could tell there were no reviewer's comments on it anymore.
It was not my request, but I counted Jim as a second reviewer.

So I'll bring back the original regression tests. If I remember well, there are no other objections, so I'll mark this version as ready for committer.

1. the patch is rebased against master
2. no warnings or errors during compilation
3. all tests pass
4. the code is simple, without side effects or possible negative performance impacts
6. there were no objections against the design
7. iteration over the null bitmap is used in more places in our code, but this is a new special case. We don't need to iterate over the array elements, so we should not use the existing array iterators.

I'll mark this patch as ready for committer
Regards
Pavel
.m
Attachment
Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ num_nulls_v6.patch ]

I started looking through this. It seems generally okay, but I'm not very pleased with the function name "num_notnulls". I think it would be better as "num_nonnulls", as I see Oleksandr suggested already.

            regards, tom lane
I wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> [ num_nulls_v6.patch ]

> I started looking through this. It seems generally okay, but I'm not
> very pleased with the function name "num_notnulls". I think it would
> be better as "num_nonnulls", as I see Oleksandr suggested already.

Not hearing any complaints, I pushed it with that change and some other cosmetic adjustments.

            regards, tom lane
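With the committed names, usage should look roughly like this (both functions take VARIADIC "any", so argument types can be mixed):

SELECT num_nulls(1, NULL, 'two'), num_nonnulls(1, NULL, 'two');
-- expected: 1 and 2, respectively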
On 5 Feb 2016 at 1:33, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> Pavel Stehule <pavel.stehule@gmail.com> writes:
> > [ num_nulls_v6.patch ]
>
> I started looking through this. It seems generally okay, but I'm not
> very pleased with the function name "num_notnulls". I think it would
> be better as "num_nonnulls", as I see Oleksandr suggested already.

I have no problem with it.

Regards

Pavel

> regards, tom lane
On 2016-02-05 05:06, Tom Lane wrote:
> I wrote:
>> Pavel Stehule <pavel.stehule@gmail.com> writes:
>>> [ num_nulls_v6.patch ]
>
>> I started looking through this. It seems generally okay, but I'm not
>> very pleased with the function name "num_notnulls". I think it would
>> be better as "num_nonnulls", as I see Oleksandr suggested already.
>
> Not hearing any complaints, I pushed it with that change and some other
> cosmetic adjustments.

Thanks Tom and Pavel and everyone who provided feedback.


.m
On Fri, Feb 5, 2016 at 5:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Pavel Stehule <pavel.stehule@gmail.com> writes:
>>> [ num_nulls_v6.patch ]
>
>> I started looking through this. It seems generally okay, but I'm not
>> very pleased with the function name "num_notnulls". I think it would
>> be better as "num_nonnulls", as I see Oleksandr suggested already.
>
> Not hearing any complaints, I pushed it with that change and some other
> cosmetic adjustments.

Would num_values be a better name than num_nonnulls?

-- 
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Would num_values be a better name than num_nonnulls?

If "value" is a term that excludes null values, it's news to me.

            regards, tom lane
On Tue, Feb 9, 2016 at 9:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> Would num_values be a better name than num_nonnulls?
>
> If "value" is a term that excludes null values, it's news to me.

Ah, right, I was thinking of null as the absence of a value. But in fact it is a special value that indicates the absence of a "data value". And num_data_values doesn't sound great.

-- 
Thomas Munro
http://www.enterprisedb.com