Thread: Hash join operator question

Hash join operator question

From
"Paolo Tavalazzi"
Date:
I'd like to understand if it is possible to find a solution to the problem that we have on ours DB in production.
 
I make an example simplified in order to explain itself better:
 
We have 2 table :
 

TABLE vendor (
 group TEXT,
 client TEXT,
 vdr_venue_code CHAR(8),
 vdr_location_code CHAR(8)
)
 

TABLE venue (
 title TEXT,
 date timestamp,
 .......
 code CHAR(8),
 location CHAR(8)
)
 

For being able to couple some tuples of vendor with all the tuple of venue
that are inserted from other systems ,we have used a default character "*",
so that whichever is venue.code it comes coupled to the vendor tuple thet have vdr_venue_code = "*".
 
For this reason base query base will be:
 
SELECT group FRON vendor,venue WHERE vdr_venue_code in (venue.code,"*") and vdr_location_code in (venue.location,"*") and  venue.data < .... and ....;
 

This type of query does not allow  planner  to use HASH JOIN slowing down the query for great amounts of data.
Therefore we have tried to create an operator and a function that they supported the hash and they resolved this case:
 
 
 
strcmp_left_default(PG_FUNCTION_ARGS)
{
  text     *str   = PG_GETARG_TEXT_P(0);
  text     *cmp  = PG_GETARG_TEXT_P(1);
  char *my_str = NULL;
  char *my_cmp = NULL;
  bool            result;
 
  textInChar(&my_str,str);
  textInChar(&my_cmp,cmp);
 
  result = (strcmp(my_str, my_cmp) == 0 ||
           strcmp(my_str, "*") == 0);
 
  if (my_str != NULL)
    pfree(my_str);
 
  if (my_cmp != NULL)
    pfree(my_cmp);
 
  PG_FREE_IF_COPY(str, 0);
  PG_FREE_IF_COPY(cmp, 1);
 
  PG_RETURN_BOOL(result);
}
 
 
 
  CREATE OPERATOR ==* (
    PROCEDURE = strcmp_left_default,
    LEFTARG = text,
    RIGHTARG = text,
    COMMUTATOR = OPERATOR(*==),
    HASHES,
    RESTRICT = eqsel,
     JOIN = eqjoinsel,
    SORT1 = <,
    SORT2 = <,
    LTCMP = <,
    GTCMP = >
);
 
 
 
CREATE OPERATOR *== (
    PROCEDURE = strcmp_right_default,
    LEFTARG = text,
    RIGHTARG = text,
    COMMUTATOR = OPERATOR(==*),
    HASHES,
    RESTRICT = eqsel,
    JOIN = eqjoinsel,
    SORT1 = <,
    SORT2 = <,
    LTCMP = <,
    GTCMP = >
);
 

CREATE OPERATOR CLASS text_default_ops
    FOR TYPE text USING btree AS
    OPERATOR 3 ==*(text,text) ,
    FUNCTION 1 bttextcmp(text,text);
 

CREATE OPERATOR CLASS text_default_ops
    FOR TYPE text USING hash AS
    OPERATOR 1 ==*(text,text) ,
    FUNCTION 1 hashtext(text);
 

For being able to have query of the type :
 
SELECT group FROM vendor,venue WHERE vendor.vdr_venue_code ==* venue.code and  vendor.vdr_location_code ==* venue.location and ...;
 
 
 
Effectively  it comes used the hash join clause,but this cannot be the solution,because my operator come used after the creation of the buckets
of the hash, so  bucket that do not have correspondence on the key (vdr_venue_code,code) does not come considers, even if has default value "*".
 
The situation does not change also using one our various function, different from hashtext, for the creation of the hashtable,
in fact I cannot force the comparison between a value of venue.code with a different bucket value where the default value "*" resides.
 

My feeling is that having to only confront the value key venue_code with a variable value and a constant value "*",
it can be possible to create an operator that it manages this type of query using a hashjoin clause.
Is it possible??