Re: proposal - function string_to_table - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: proposal - function string_to_table
Date
Msg-id CAFj8pRD+W3YcNm72+VbMXH1dzi0m4e1BHppfpwjiDP6nVVM8og@mail.gmail.com
Whole thread Raw
In response to Re: proposal - function string_to_table  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: proposal - function string_to_table  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
Hi

čt 20. 8. 2020 v 4:07 odesílatel Peter Smith <smithpb2250@gmail.com> napsal:
Hi.

I have been looking at the patch: string_to_table-20200706-2.patch

Below are some review comments for your consideration.

====

COMMENT func.sgml (style)

+       <para>
+        splits string into table using supplied delimiter and
+        optional null string.
+       </para>

The format style of the short description is inconsistent with the
other functions.
e.g. Should start with Capital letter.
e.g. Should tag the parameter names properly

Something like:
<para>
Splits <parameter>string</parameter> into a table
using supplied <parameter>delimiter</parameter>
and optional null string <parameter>nullstr</parameter>.
</para>


done
 
====

COMMENT func.sgml (what does nullstr do)

The description does not sufficiently describe the purpose/behaviour
of the nullstr.

e.g. Firstly I thought that it meant if 2 consecutive delimiters were
encountered it would substitute this string as the row value. But it
is doing the opposite of what I guessed - if the extracted row value
is the same as nullstr then a NULL row is inserted instead.


done
 
====

COMMENT func.sgml (wrong sample output)

+<programlisting>xx
+yy,
+zz</programlisting>

This output is incorrect for the sample given. There is no "yy," in
the output because there is a 'yy' nullstr substitution.

Should be:
---
xx
NULL
zz
---

fixed


====

COMMENT func.sgml (related to regexp_split_to_table)

Because this new function is similar to the existing
regexp_split_to_table, perhaps they should cross-reference each other
so a reader of this documentation is made aware of the alternative
function?

I wrote new sentence with ref
 

====

COMMENT (test cases)

It is impossible to tell difference in the output between empty
strings and nulls currently, so maybe you can change all the tests to
have a form like below so they can be validated properly:

# select v, v IS NULL as "is null" from
string_to_table('a,b,*,c,d,',',','*') g(v);
 v | is null
---+---------
 a | f
 b | f
   | t
 c | f
 d | f
   | f
(6 rows)

or maybe like this is even easier:

# select quote_nullable(string_to_table('a|*||c|d|','|','*'));
 quote_nullable
----------------
 'a'
 NULL
 ''
 'c'
 'd'
 ''
(6 rows)

I prefer the first variant, it is clean. It is good idea, done


Something similar was already proposed before [1] but that never got
put into the test code.
[1] https://www.postgresql.org/message-id/CAFj8pRDSzDYmaS06dfMXBfbr8x%2B3xjDJxA5kbL3h8%2BeOGoRUcA%40mail.gmail.com

====

COMMENT (test cases)

There are multiple combinations of the parameters to this function and
MANY different results depending on different values they can take, so
the existing tests only cover a small sample.

I have attached a lot more test scenarios that you may want to include
for better test coverage. Everything seemed to work as expected.

ok, merged


PSA test-cases.pdf

====

COMMENT (accum_result)

+ Datum values[1];
+ bool nulls[1];
+
+ values[0] = PointerGetDatum(result_text);
+ nulls[0] = is_null;

Why not use variables instead of arrays with only 1 element?

Technically it is equivalent,  but I think so using one element array is more correct, because function heap_form_tuple expects an array. Sure in C language there is no difference between pointer to value or pointer to array, but minimally the name of the argument "values" implies so argument is an array.

This pattern is used more times in Postgres. You can find a fragments where although we know so array has only one field, still we works with array

misc.c
hash.c
execTuples.c

but I can this code simplify little bit - I can use function tuplestore_putvalues(tupstore, tupdesc, values, nulls);

I see, so this code can be reduced more, and I don't need local variables, but I prefer to be consistent with other parts, and I feel better if I pass an array where the array is expected.

This is not extra important, and I can it change, just I think this variant is cleaner little bit



====

COMMENT (text_to_array_internal)

+ if (!tstate.astate)
+ PG_RETURN_ARRAYTYPE_P(construct_empty_array(TEXTOID));

Maybe the condition is more readable when expressed as:
if (tstate.astate == NULL)


done


new patch attached

Thank you for precious review

Regards

Pavel

====

Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment

pgsql-hackers by date:

Previous
From: Ashwin Agrawal
Date:
Subject: Re: language cleanups in code and docs
Next
From: Roger Pack
Date:
Subject: Re: deb repo doesn't have latest. or possible to update web page?