Thread: Re: [GENERAL] Join works in 7.3.6, fails in 7.4.2

Re: [GENERAL] Join works in 7.3.6, fails in 7.4.2

From
Tom Lane
Date:
I wrote:
> Michael Fuhr <mike@fuhr.org> writes:
>> I have a query that works in 7.3.6 but not in 7.4.2 unless I turn
>> off enable_hashjoin.  I'm joining a table of network interfaces and
>> a table of networks so I can find additional info about a particular
>> interface's network.

> Hmm.  The inet = operator is marked hashable in 7.4 but not in 7.3 ...
> I wonder if that is a mistake?

Digging further, I find that indeed this seems to be a mistake.  CIDR
and INET values that have the same address and masklen compare as equal
according to network_eq(), but they will not hash the same because
there's a flag identifying whether a given value is considered CIDR or
INET.  And what the network() function returns is marked as a CIDR.
It's a bit surprising that your hash join produces any matches at all...

I believe I got misled on this because there is a hash index operator
class for inet; at one point during the 7.4 cycle I went around and
cleaned up cases where the equality operator's canhash flag was
inconsistent with the set of hash index opclasses.  Arguably the hash
opclass is broken, although in practice people probably don't notice the
failure since a given column is likely to contain either all inet or all
cidr values.  (And of course it's entirely likely that there *aren't*
any people using the inet hash opclass, period...)

I can think of a number of possible fixes:

1. Mark inet = as not hashjoinable.  We'd probably want to remove the
   inet hash opclass too.

2. Redefine inet = so that CIDR and INET values are never considered
   equal, thus eliminating the unused field.  This could be back-patched
   into 7.4 but otherwise seems to have little to recommend it.  It
   would certainly not help solve Michael's problem.

3. Provide a specialized hash method for type inet that ignores the
   iptype field.

#3 seems the most desirable going forward, but is probably impractical
to back-patch into 7.4.*, so I'm not sure what to do about the problem
in that branch.  Given the relatively low incidence of the problem,
maybe it's okay to just clear the oprcanhash flag in future 7.4.*
releases.  This would not fix the problem for existing installations
(unless they initdb) but any complainers could be told how to adjust
their catalogs manually.

Can anyone think of any other approaches?

            regards, tom lane

Re: [GENERAL] Join works in 7.3.6, fails in 7.4.2

From
Bruce Momjian
Date:
This has not been fixed yet, right?

---------------------------------------------------------------------------

Tom Lane wrote:
> I wrote:
> > Michael Fuhr <mike@fuhr.org> writes:
> >> I have a query that works in 7.3.6 but not in 7.4.2 unless I turn
> >> off enable_hashjoin.  I'm joining a table of network interfaces and
> >> a table of networks so I can find additional info about a particular
> >> interface's network.
>
> > Hmm.  The inet = operator is marked hashable in 7.4 but not in 7.3 ...
> > I wonder if that is a mistake?
>
> Digging further, I find that indeed this seems to be a mistake.  CIDR
> and INET values that have the same address and masklen compare as equal
> according to network_eq(), but they will not hash the same because
> there's a flag identifying whether a given value is considered CIDR or
> INET.  And what the network() function returns is marked as a CIDR.
> It's a bit surprising that your hash join produces any matches at all...
>
> I believe I got misled on this because there is a hash index operator
> class for inet; at one point during the 7.4 cycle I went around and
> cleaned up cases where the equality operator's canhash flag was
> inconsistent with the set of hash index opclasses.  Arguably the hash
> opclass is broken, although in practice people probably don't notice the
> failure since a given column is likely to contain either all inet or all
> cidr values.  (And of course it's entirely likely that there *aren't*
> any people using the inet hash opclass, period...)
>
> I can think of a number of possible fixes:
>
> 1. Mark inet = as not hashjoinable.  We'd probably want to remove the
>    inet hash opclass too.
>
> 2. Redefine inet = so that CIDR and INET values are never considered
>    equal, thus eliminating the unused field.  This could be back-patched
>    into 7.4 but otherwise seems to have little to recommend it.  It
>    would certainly not help solve Michael's problem.
>
> 3. Provide a specialized hash method for type inet that ignores the
>    iptype field.
>
> #3 seems the most desirable going forward, but is probably impractical
> to back-patch into 7.4.*, so I'm not sure what to do about the problem
> in that branch.  Given the relatively low incidence of the problem,
> maybe it's okay to just clear the oprcanhash flag in future 7.4.*
> releases.  This would not fix the problem for existing installations
> (unless they initdb) but any complainers could be told how to adjust
> their catalogs manually.
>
> Can anyone think of any other approaches?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [GENERAL] Join works in 7.3.6, fails in 7.4.2

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>>> Hmm.  The inet = operator is marked hashable in 7.4 but not in 7.3 ...
>>> I wonder if that is a mistake?
>>
>> Digging further, I find that indeed this seems to be a mistake.

> This has not been fixed yet, right?

Right, it's still on the to-do list.  I think it's a must-fix item for
7.5, but a sufficiently localized one that I haven't worried too much
about getting it done early.

            regards, tom lane