Hash join operator question - Mailing list pgsql-general
From | Paolo Tavalazzi |
---|---|
Subject | Hash join operator question |
Date | |
Msg-id | 000801c55f92$9051a1a0$0f0b640a@pppji7vdtyklcr Whole thread Raw |
List | pgsql-general |
I'd like to understand if it is possible to find a solution to the problem that we have on ours DB in production.
I make an example simplified in order to explain itself better:
We have 2 table :
TABLE vendor (
group TEXT,
client TEXT,
vdr_venue_code CHAR(8),
vdr_location_code CHAR(8)
)
TABLE venue (
title TEXT,
date timestamp,
.......
code CHAR(8),
location CHAR(8)
)
For being able to couple some tuples of vendor with all the tuple of venue
that are inserted from other systems ,we have used a default character "*",
so that whichever is venue.code it comes coupled to the vendor tuple thet have vdr_venue_code = "*".
For this reason base query base will be:
SELECT group FRON vendor,venue WHERE vdr_venue_code in (venue.code,"*") and vdr_location_code in (venue.location,"*") and venue.data < .... and ....;
This type of query does not allow planner to use HASH JOIN slowing down the query for great amounts of data.
Therefore we have tried to create an operator and a function that they supported the hash and they resolved this case:
strcmp_left_default(PG_FUNCTION_ARGS)
{
text *str = PG_GETARG_TEXT_P(0);
text *cmp = PG_GETARG_TEXT_P(1);
char *my_str = NULL;
char *my_cmp = NULL;
bool result;
{
text *str = PG_GETARG_TEXT_P(0);
text *cmp = PG_GETARG_TEXT_P(1);
char *my_str = NULL;
char *my_cmp = NULL;
bool result;
textInChar(&my_str,str);
textInChar(&my_cmp,cmp);
textInChar(&my_cmp,cmp);
result = (strcmp(my_str, my_cmp) == 0 ||
strcmp(my_str, "*") == 0);
strcmp(my_str, "*") == 0);
if (my_str != NULL)
pfree(my_str);
pfree(my_str);
if (my_cmp != NULL)
pfree(my_cmp);
pfree(my_cmp);
PG_FREE_IF_COPY(str, 0);
PG_FREE_IF_COPY(cmp, 1);
PG_FREE_IF_COPY(cmp, 1);
PG_RETURN_BOOL(result);
}
}
CREATE OPERATOR ==* (
PROCEDURE = strcmp_left_default,
LEFTARG = text,
RIGHTARG = text,
COMMUTATOR = OPERATOR(*==),
HASHES,
RESTRICT = eqsel,
JOIN = eqjoinsel,
SORT1 = <,
SORT2 = <,
LTCMP = <,
GTCMP = >
);
PROCEDURE = strcmp_left_default,
LEFTARG = text,
RIGHTARG = text,
COMMUTATOR = OPERATOR(*==),
HASHES,
RESTRICT = eqsel,
JOIN = eqjoinsel,
SORT1 = <,
SORT2 = <,
LTCMP = <,
GTCMP = >
);
CREATE OPERATOR *== (
PROCEDURE = strcmp_right_default,
LEFTARG = text,
RIGHTARG = text,
COMMUTATOR = OPERATOR(==*),
HASHES,
RESTRICT = eqsel,
JOIN = eqjoinsel,
SORT1 = <,
SORT2 = <,
LTCMP = <,
GTCMP = >
);
PROCEDURE = strcmp_right_default,
LEFTARG = text,
RIGHTARG = text,
COMMUTATOR = OPERATOR(==*),
HASHES,
RESTRICT = eqsel,
JOIN = eqjoinsel,
SORT1 = <,
SORT2 = <,
LTCMP = <,
GTCMP = >
);
CREATE OPERATOR CLASS text_default_ops
FOR TYPE text USING btree AS
OPERATOR 3 ==*(text,text) ,
FUNCTION 1 bttextcmp(text,text);
CREATE OPERATOR CLASS text_default_ops
FOR TYPE text USING hash AS
OPERATOR 1 ==*(text,text) ,
FUNCTION 1 hashtext(text);
For being able to have query of the type :
SELECT group FROM vendor,venue WHERE vendor.vdr_venue_code ==* venue.code and vendor.vdr_location_code ==* venue.location and ...;
Effectively it comes used the hash join clause,but this cannot be the solution,because my operator come used after the creation of the buckets
of the hash, so bucket that do not have correspondence on the key (vdr_venue_code,code) does not come considers, even if has default value "*".
of the hash, so bucket that do not have correspondence on the key (vdr_venue_code,code) does not come considers, even if has default value "*".
The situation does not change also using one our various function, different from hashtext, for the creation of the hashtable,
in fact I cannot force the comparison between a value of venue.code with a different bucket value where the default value "*" resides.
in fact I cannot force the comparison between a value of venue.code with a different bucket value where the default value "*" resides.
My feeling is that having to only confront the value key venue_code with a variable value and a constant value "*",
it can be possible to create an operator that it manages this type of query using a hashjoin clause.
Is it possible??
pgsql-general by date: