filter duplicates by priority - Mailing list pgsql-general

From Clark Slater
Subject filter duplicates by priority
Date
Msg-id 52376.96.236.139.176.1247580252.squirrel@webmail8.pair.com
Whole thread Raw
Responses Re: filter duplicates by priority
Re: filter duplicates by priority
Re: filter duplicates by priority
Re: filter duplicates by priority
List pgsql-general
Hello-

I am trying to use DISTINCT ON to filter out *potential* duplicate values
from a set of sub queries.  There are certain cases where there can be
repetitive part numbers that are priced differently.  I'm trying to start
with the full list, ordered by priority, and then remove any repeats that
have a lesser priority.

SELECT DISTINCT ON (part_number) * FROM (
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
) AS filter_duplicates ORDER BY priority,part_number

The above statement does not work because if I ORDER BY
priority,part_number then I have to DISTINCT ON (priority,part_number).
But DISTINCT ON (priority, part_number) does not remove the repeated rows
because the same part_number with a different priority becomes a distinct
tuple.

Any suggestions are appreciated.

--------------------------------------

A more detailed explanation of my problem follows:

I am working on an e-commerce system that has different lists of products
which contain many of the same products, at different prices.  When a user
searches for a certain set of part numbers, I would like the resulting
products (and prices) to come from one of the lists, according to the
list's priority.  Each user can have a different set of lists and
priorities.

Table: product_lists
  id |             name             | priority |  user_id
-----+------------------------------+----------+----------
  5  |  General List of Products    |     2    |     23
  3  |  Different List of Products  |     3    |     23
 150 |  Customer-Specific Products  |     1    |     23

Table: products
 product_list_id | part_number |  price
-----------------+-------------+--------
        3        |   92298A    | 123.38
        5        |   92298A    | 111.04
        3        |   C39207    |  78.38
       150       |   C39207    |  67.93

Below is a simplified example of the structure of the query I am working
with.  I realize that in this case, I could re-factor all of this into one
statement, but each sub-query in the real case has a more complex set of
joins that determines the price.  The pricing joins from one sub-query to
the next vary, so a collection of sub-queries seemed to be a logical
solution.  Some part numbers are found in only one of the lists, while
other part numbers are repeated across lists at different prices.

This is what I would *like* to say:

SELECT DISTINCT ON (part_number) * FROM (

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id=product_lists.id
AND product_list_id=150
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

UNION ALL

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=5
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

UNION ALL

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=3
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

) AS filter_duplicates ORDER BY priority,part_number

I need to ORDER BY priority so that, in the case of duplicates, the
product from the desired list is returned first.  Then the purpose of
DISTINCT ON is to filter out any duplicate part numbers that have a lesser
priority.  But, the above statement fails because the DISTINCT ON
expression must match the leftmost ORDER BY expression.  However,
inserting the priority into the DISTINCT ON expression means that all of
the resulting tuples are unique, even though the part_number is the same.


pgsql-general by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Best practices for moving UTF8 databases
Next
From: Dan Armbrust
Date:
Subject: Re: Checkpoint Tuning Question