Re: Doing better at HINTing an appropriate column within errorMissingColumn() - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date
Msg-id CAB7nPqRMOuVyVaHhCJ7WvF-tr1=0JcBiOpD0Su66kw9D8W-B+g@mail.gmail.com
Whole thread Raw
In response to Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
Responses Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Peter Geoghegan <pg@heroku.com>)
Re: Doing better at HINTing an appropriate column within errorMissingColumn()  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Sat, Jul 5, 2014 at 12:46 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> At 2014-07-02 15:51:08 -0700, pg@heroku.com wrote:
>>
>> Attached revision factors in everyone's concerns here, I think.
>
> Is anyone planning to review Peter's revised patch?

I have been doing some functional tests, and looked quickly at the
code to understand what it does:
1) Compiles without warnings, passes regression tests
2) Checking process goes through all the existing columns of a
relation even a difference of 1 with some other column(s) has already
been found. As we try to limit the number of hints returned, this
seems like a waste of resources.
3) distanceName could be improved, by for example having some checks
on the string lengths of target and source columns, and immediately
reject the match if for example the length of the source string is the
double/half of the length of target.
4) This is not nice, could it be possible to remove the stuff from varlena.c?
+/* Expand each Levenshtein distance variant */
+#include "levenshtein.c"
+#define LEVENSHTEIN_LESS_EQUAL
+#include "levenshtein.c"
+#undef LEVENSHTEIN_LESS_EQUAL
Part of the same comment: only varstr_leven_less_equal is used to
calculate the distance, should we really move varstr_leven to core?
This clearly needs to be reworked as not just a copy-paste of the
things in fuzzystrmatch.
The flag LEVENSHTEIN_LESS_EQUAL should be let within fuzzystrmatch I think.
5) Do we want hints on system columns as well? For example here we
could get tableoid as column hint:
=# select tablepid from foo;
ERROR:  42703: column "tablepid" does not exist
LINE 1: select tablepid from foo;              ^
LOCATION:  errorMissingColumn, parse_relation.c:3123
Time: 0.425 ms
6) Sometimes no hints are returned... Even in simple cases like this one:
=# create table foo (aa int, bb int);
CREATE TABLE
=# select ab from foo;
ERROR:  42703: column "ab" does not exist
LINE 1: select ab from foo;              ^
LOCATION:  errorMissingColumn, parse_relation.c:3123
7) Performance penalty with a table with 1600 columns:
=# CREATE FUNCTION create_long_table(tabname text, columns int)
RETURNS void
LANGUAGE plpgsql
as $$
declare first_col bool = true; count int; query text;
begin query := 'CREATE TABLE ' || tabname || ' ('; for count in 0..columns loop   query := query || 'col' || count ||
'int';   if count <> columns then     query := query || ', ';   end if; end loop; query := query || ')'; execute
query;
end;
$$;
=# SELECT create_long_table('aa', 1599);create_long_table
-------------------

(1 row)
Then tested queries like that: SELECT col888a FROM aa;
Patched version: 2.100ms~2.200ms
master branch (6048896): 0.956 ms~0.990 ms
So the performance impact seems limited.

Regards,
-- 
Michael



pgsql-hackers by date:

Previous
From: "Tomas Vondra"
Date:
Subject: Re: tweaking NTUP_PER_BUCKET
Next
From: Kohei KaiGai
Date:
Subject: Re: RLS Design