Re: Using contains operator on arrays that have duplicate elements: feedback request. - Mailing list pgsql-general

From Seref Arikan
Subject Re: Using contains operator on arrays that have duplicate elements: feedback request.
Date
Msg-id CA+4ThdqD-wV8NNTaQeAymUJbsJjQY_2BqoC3eLhhuRw5NccdWw@mail.gmail.com
Whole thread Raw
In response to Re: Using contains operator on arrays that have duplicate elements: feedback request.  ("David Johnston" <polobo@yahoo.com>)
List pgsql-general
Thanks a lot David,
You are right, I want to left to be a superset (or equal at the limit) of the right set, but taking cardinalities of elements into account. Please do not spend time on this, since it appears I can't keep the indexing advantage of array operation under these circumstances.

Actually, the sets are created via concatenating ids that represent node types in a tree. I wanted to develop a solution using arrays, but maybe I should look into ltree, which may help me do what I want to do without losing the advantage of indexing.

Or I can implement some other logic in the business layer that ensures that duplicate elements are eliminated before inserts and queries with a replacement strategy that changes second and further occurrences of node ids on both left and right side of the array operation. This may help me keep array based approach. I am not sure how ltree would scale in terms of performance to a few billion rows.

Best regards
Seref


On Wed, Aug 8, 2012 at 3:45 PM, David Johnston <polobo@yahoo.com> wrote:
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Seref Arikan
Sent: Wednesday, August 08, 2012 10:35 AM
To: PG-General Mailing List
Subject: [GENERAL] Using contains operator on arrays that have duplicate
elements: feedback request.

Greetings,
I've been trying to use contains operator on an array such as {3,4,9,4,5}
My problem is, I'd like to get rows that has two 4s in them, and contains
operator seems to work separately for each member of the right operand.
Therefore

select  '{3,4,9,4,5}'::int[] @> '{4,4,4}'

returns true. I'm trying to implement the semantics of the operation so that
the set {4,4,4} is a single parameter to a subset query, which should return
false in the case above.

Is there any way I can force the contains operation to consider cardinality
of unique members of the set ? Arrays look like to simplest way of
performing set operations on a group of values, but is there anything else
that can be an alternative to them?

Best regards
Seref

============================================================

The only thing I can think of is to create a function that unnest(s) the two
supplied arrays, groups on the value and performs a count, then LEFT JOINs
the two grouped relations - limited by a count comparison - and returns
TRUE/FALSE depending on whether there are any NULLs on the right side of the
join.

I may try a stab and an example of this later if it is still an open
question but hopefully this will help you.

I do not believe you are going to be able to change the behavior of
"contains" in the way that you want.

You might also try performing an intersection and then confirm that the
result is the same as the compare-to array; it seems that you want the
left-side to be a super-set of the right-side in the example provided.

Apologies if I get some of the left/right/compare-to terminology mixed-up.
I hope that at least the intent is clear.

HTH

David J.



pgsql-general by date:

Previous
From: "David Johnston"
Date:
Subject: Re: Using contains operator on arrays that have duplicate elements: feedback request.
Next
From: "Kevin Grittner"
Date:
Subject: Re: File system level backup