Re: Modifying update_attstats of analyze.c for C Strings - Mailing list pgsql-hackers

From Ashoke
Subject Re: Modifying update_attstats of analyze.c for C Strings
Date
Msg-id CALpszJO5ax+yVcjU3t_A3S94gM+s4u+SYjLZO5Ydf+VGkZPhDg@mail.gmail.com
In response to Modifying update_attstats of analyze.c for C Strings  (Ashoke <s.ashoke@gmail.com>)
Responses Re: Modifying update_attstats of analyze.c for C Strings  (Ashoke <s.ashoke@gmail.com>)
List pgsql-hackers
As a follow-up question, 

I found some varchar columns whose histogram_bounds are not surrounded by double quotes (" ") even with the default implementation.
Ex: c_name column of the Customer table

I also found histogram_bounds in which only some of the strings are surrounded by double quotes and others are not.
Ex: c_address column of the Customer table

Why are there such inconsistencies? How is this determined?
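
For reference, the PostgreSQL documentation on array output says that an element is double-quoted only if it is an empty string, contains curly braces, delimiter characters, double quotes, backslashes, or white space, or matches the word NULL. The sketch below is a simplified standalone rendering of that documented rule in C, not the actual array_out code. The function name element_needs_quotes is made up for illustration, and the delimiter is passed in by the caller (a comma in the cases discussed here).

```c
#include <stdbool.h>
#include <strings.h>            /* strcasecmp (POSIX) */

/*
 * Hypothetical helper: returns true if the text form of an array element
 * would need double quotes under the documented array output rule.
 * This is an illustration only, not PostgreSQL source code.
 */
bool
element_needs_quotes(const char *s, char delim)
{
    const char *p;

    /* An empty string is always quoted. */
    if (s[0] == '\0')
        return true;

    /* A value spelled like NULL (in any case) is quoted to tell it apart. */
    if (strcasecmp(s, "NULL") == 0)
        return true;

    for (p = s; *p != '\0'; p++)
    {
        /* Characters that have meaning in the array text syntax. */
        if (*p == '"' || *p == '\\' || *p == '{' || *p == '}' || *p == delim)
            return true;

        /* Any white space, including trailing padding. */
        if (*p == ' ' || *p == '\t' || *p == '\n' ||
            *p == '\r' || *p == '\v' || *p == '\f')
            return true;
    }

    return false;
}
```

Under this rule, element_needs_quotes("ALGERIA       ", ',') is true because of the blank padding, while element_needs_quotes("ALGERIA", ',') and element_needs_quotes("Customer#000000001", ',') are false. A value containing a space or a comma would be quoted, which would be consistent with a column showing a mix of quoted and unquoted entries.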

Thank you.


On Tue, Jul 8, 2014 at 10:52 AM, Ashoke <s.ashoke@gmail.com> wrote:
Hi,

I am trying to implement functionality similar to ANALYZE, but it needs to use different values for the MCV/histogram bounds when the column under consideration is varchar (C strings). The values will be valid and are stored in inp->str[][]. I have written a function dummy_update_attstats with the following changes. Everything else remains the same as in update_attstats of ~/src/backend/commands/analyze.c

---
{
    ArrayType  *arry;

    if (strcmp(col_type, "varchar") == 0)
        arry = construct_array(stats->stavalues[k],
                               stats->numvalues[k],
                               CSTRINGOID,
                               -2,
                               false,
                               'c');
    else
        arry = construct_array(stats->stavalues[k],
                               stats->numvalues[k],
                               stats->statypid[k],
                               stats->statyplen[k],
                               stats->statypbyval[k],
                               stats->statypalign[k]);

    values[i++] = PointerGetDatum(arry); /* stavaluesN */
}
---

and I update the hist_values in the appropriate function as:
---
if (strcmp(col_type, "varchar") == 0)
    hist_values[i] = datumCopy(CStringGetDatum(inp->str[i][j]),
                               false,
                               -2);
---


My issue is: when I use my approach for strings, the MCV/histogram_bounds values in pg_stats don't have double quotes (" ") surrounding each string. That is,

If the normal update_attstats is used, histogram_bounds for TPCH nation(n_name) are: "ALGERIA       ","ARGENTINA    ",...
If I use dummy_update_attstats as above, histogram_bounds for TPCH nation(n_name) are: ALGERIA,ARGENTINA,...

This becomes an issue if a string contains ',' (commas), as for example in the n_comment column of the nation table.

Could someone point out the problem and suggest a solution?

Thank you.

--
Regards,
Ashoke



--
Regards,
Ashoke



