[GENERAL] improvements/feedback sought for a working query that looks a bit ugly and might be inefficient - Mailing list pgsql-general

From jonathan vanasco
Subject [GENERAL] improvements/feedback sought for a working query that looks a bit ugly and might be inefficient
Date
Msg-id 72883A7C-E8CF-409E-BAB9-6F60F0B3BD5D@2xlp.com
Whole thread Raw
Responses Re: [GENERAL] improvements/feedback sought for a working query thatlooks a bit ugly and might be inefficient
List pgsql-general
Everything here works fine - but after a handful of product iterations & production adjustments, a query that handles a
"taskqueue" across a few tables looks a bit ugly. 

I'm wondering if anyone can see obvious improvements.

There are 3 tables:
    upstream_provider
    task
    task_queue

Originally we needed to select 50 items off the top of the queue at a time.
Then we needed to set a max of 5 tasks per upstream provider (It's a cheap way to handle throttling).
The table is quite large, so a limit of the last 1000 items drastically improved performance.
The query got ugly when we needed to add a "priority" toggle to the queue -- basically to mark things as "process
ASAP".  
The only way I could figure out how to do that, was to add a sort -- on "is_priority DESC NULLS LAST".
My concern is that the sort needs to happen 3x --
    in the subselect for 1000 items
    in the partition for row numbering
    in the final sort


If anyone has a moment to look this over and suggest anything, I'd be very thankful.  A working reproduction is below.


SQL---
--------------

CREATE TABLE upstream_provider (id SERIAL PRIMARY KEY,
                                name VARCHAR(32),
                                is_paused BOOLEAN DEFAULT NULL
                                );
CREATE TABLE task (id SERIAL PRIMARY KEY,
                   upstream_provider_id INT NOT NULL REFERENCES upstream_provider(id),
                   name VARCHAR(32)
                   );
CREATE TABLE task_queue (id SERIAL PRIMARY KEY,
                         task_id INT NOT NULL REFERENCES task(id),
                         upstream_provider_id INT NOT NULL REFERENCES upstream_provider(id),  # only here because it
eliminatesexpensive joins elsewhere 
                         processing_status BOOLEAN DEFAULT NULL,
                         is_priority BOOLEAN DEFAULT NULL
                         );

SELECT  partition1.*
      , task.*
FROM (SELECT  window1.*
            , row_number() OVER (PARTITION BY window1.upstream_provider_id
                                 ORDER BY window1.is_priority DESC NULLS LAST,
                                          window1.task_queue_id
                                 ) AS rownum
      FROM (SELECT   qu.id AS task_queue_id
                   , qu.upstream_provider_id
                   , qu.task_id
                   , qu.is_priority
            FROM
                task_queue qu
            JOIN
                upstream_provider ON qu.upstream_provider_id = upstream_provider.id
            WHERE (qu.processing_status IS NULL)
              AND (upstream_provider.is_paused IS NOT TRUE)
            ORDER BY is_priority DESC NULLS LAST,
                     qu.id DESC
            LIMIT 1000
            ) window1
      ) partition1
JOIN task ON partition1.task_id = task.id
WHERE partition1.rownum < 5
ORDER BY is_priority DESC NULLS LAST,
         task_queue_id DESC
LIMIT
    50
;

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: [GENERAL] sorry, too many clients already error
Next
From: "David G. Johnston"
Date:
Subject: Re: [GENERAL] improvements/feedback sought for a working query thatlooks a bit ugly and might be inefficient