Re: count_nulls(VARIADIC "any") - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: count_nulls(VARIADIC "any")
Date
Msg-id CAFj8pRBkMU=DNdBamyeKqVeydJMhGOmL0N1q-aEfHqbd_JPkwA@mail.gmail.com
Whole thread Raw
In response to Re: count_nulls(VARIADIC "any")  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: count_nulls(VARIADIC "any")  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
Hi

2016-01-03 22:49 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/3/16 2:37 PM, Pavel Stehule wrote:
+               /* num_nulls(VARIADIC NULL) is defined as NULL */
+               if (PG_ARGISNULL(0))
+                       return false;

Could you add to the comment explaining why that's the desired behavior?

This case should be different than num_nulls(VARIADIC ARRAY[..]) - this situation is really equivalent of missing data and NULL is correct answer. It should not be too clean in num_nulls, but when it is cleaner for num_notnulls. And more, it is consistent with other variadic functions in Postgres: see concat_internal and text_format.
 

+               /*
+                * Non-null argument had better be an array.  We assume that any call
+                * context that could let get_fn_expr_variadic return true will have
+                * checked that a VARIADIC-labeled parameter actually is an array.  So
+                * it should be okay to just Assert that it's an array rather than
+                * doing a full-fledged error check.
+                */
+               Assert(OidIsValid(get_base_element_type(get_fn_expr_argtype(fcinfo->flinfo, 0))));

Erm... is that really the way to verify that what you have is an array? ISTM there should be a macro for that somewhere...

really, it is. It is used more time. Although I am not against some macro, I don't think so it is necessary. The macro should not be too shorter than this text.
 

+               /* OK, safe to fetch the array value */
+               arr = PG_GETARG_ARRAYTYPE_P(0);
+
+               ndims = ARR_NDIM(arr);
+               dims = ARR_DIMS(arr);
+               nitems = ArrayGetNItems(ndims, dims);
+
+               bitmap = ARR_NULLBITMAP(arr);
+               if (bitmap)
+               {
+                       bitmask = 1;
+
+                       for (i = 0; i < nitems; i++)
+                       {
+                               if ((*bitmap & bitmask) == 0)
+                                       count++;
+
+                               bitmask <<= 1;
+                               if (bitmask == 0x100)
+                               {
+                                       bitmap++;
+                                       bitmask = 1;

For brevity and example sake it'd probably be better to just use the normal iterator, unless there's a serious speed difference?

The iterator does some memory allocations and some access to type cache. Almost all work of iterator is useless for this case. This code is developed by Marko, but I agree with this design. Using the iterator is big gun for this case. I didn't any performance checks, but it should be measurable  for any varlena arrays.
 
Regards

Pavel


In the unit test, I'd personally prefer just building a table with the test cases and the expected NULL/NOT NULL results, at least for all the calls that would fit that paradigm. That should significantly reduce the size of the test. Not a huge deal though...

Also, I don't think anything is testing multiples of whatever value... how 'bout change the generate_series CASE statement to >40 instead of <>40?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: Re: 9.5 BLOCKER: regrole and regnamespace and quotes
Next
From: Tom Lane
Date:
Subject: Re: 9.5 BLOCKER: regrole and regnamespace and quotes