On Thu, Feb 22, 2024 at 04:42:37PM -0300, Ranier Vilela wrote:
> Can you share exactly script used to create a table?
Stressing the internals of array_out() for the area of the patch is
not that difficult, as we want to quote each element that's returned
in output.
The trick is to have the following to stress the second quoting loop a
maximum:
- a high number of rows.
- a high number of items in the arrays.
- a *minimum* number of characters in each element of the array, with
characters that require quoting.
The best test case I can think of to demonstrate the patch would be
something like that (adjust rows and elts as you see fit):
-- Number of rows
\set rows 6
-- Number of elements
\set elts 4
create table tab as
with data as (
select array_agg(a) as array
from (
select '{'::text
from generate_series(1, :elts) as int(a)) as index(a))
select data.array from data, generate_series(1,:rows);
Then I get:
array
-------------------
{"{","{","{","{"}
{"{","{","{","{"}
{"{","{","{","{"}
{"{","{","{","{"}
{"{","{","{","{"}
{"{","{","{","{"}
(6 rows)
With "\set rows 100000" and "\set elts 10000", giving 100MB of data
with 100k rows with 10k elements each, I get for HEAD when data is in
shared buffers:
=# copy tab to '/dev/null';
COPY 100000
Time: 48620.927 ms (00:48.621)
And with v3:
=# copy tab to '/dev/null';
COPY 100000
Time: 47993.183 ms (00:47.993)
Profiles don't fundamentally change much, array_out() gets a 30.76% ->
29.72% in self runtime, with what looks like a limited impact to me.
With 1k rows and 1M elements, COPY TO gets reduced from 54338.436 ms
to 54129.978 ms, and a 29.51% -> 29.12% increase (looks like noise).
Perhaps I've hit some noise while running this set of tests, but the
impact of the proposed patch looks very limited to me. If you have a
better set of tests and/or ideas, feel free of course.
--
Michael