Re: Tuning a query with ORDER BY and LIMIT - Mailing list pgsql-general

From Michael van der Kolff
Subject Re: Tuning a query with ORDER BY and LIMIT
Date
Msg-id CAFBbO2TqqLUBO8qAocqG0bEAUfHcx+azpqBNR1Mx-R2uQY8+=A@mail.gmail.com
In response to Tuning a query with ORDER BY and LIMIT  ("Dirschel, Steve" <steve.dirschel@thomsonreuters.com>)
List pgsql-general
What do you see when you remove the LIMIT clause? It may be possible to rewrite this using ROW_NUMBER.
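
A rough sketch of that kind of ROW_NUMBER rewrite, assuming the same table and filter values as in the query quoted below (untested here, so treat it as a starting point rather than a drop-in replacement):

select *
from (
    -- rank the matching rows by completed_datetime, newest first
    select we.*,
           row_number() over (order by completed_datetime desc) as rn
    from workflow_execution we
    where workflow_id = 14560
      and status = 'COMPLETED'
      and result in ('SUCCEEDED','REEXECUTED','ABORTED',
                     'DISCONTINUED','FAILED','PARTIAL_SUCCESS')
) ranked
where rn <= 50      -- keep only the 50 most recent
order by rn;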

--Michael

On Thu, Jun 23, 2022 at 5:39 AM Dirschel, Steve <steve.dirschel@thomsonreuters.com> wrote:

I am fairly new to tuning Postgres queries.  I have a long background tuning Oracle queries.

 

Postgres version 10.11

 

Here is the DDL for the index the query is using:

create index workflow_execution_initial_ui_tabs
    on workflow_execution (workflow_id asc, status asc, result asc, completed_datetime desc);

explain (analyze, verbose, costs, buffers, timing, summary, hashes)
select * from workflow_execution
where workflow_id = 14560 and
      status = 'COMPLETED' and
      result in ('SUCCEEDED','REEXECUTED','ABORTED','DISCONTINUED','FAILED','PARTIAL_SUCCESS')
order by completed_datetime desc limit 50;

--

Limit  (cost=56394.91..56395.04 rows=50 width=1676) (actual time=3400.608..3400.622 rows=50 loops=1)
  Output: execution_id, state_machine_id, workflow_id, started_datetime, completed_datetime, status, execution_context_s3_arn, ol_version, created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, correlation_id, result, state_machine_execution_arn, created_by_id, updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id, notification_trigger_workflow_id, acquired_object_name, subscription_guid
  Buffers: shared hit=142368
  ->  Sort  (cost=56394.91..56432.71 rows=15118 width=1676) (actual time=3400.607..3400.615 rows=50 loops=1)
        Output: execution_id, state_machine_id, workflow_id, started_datetime, completed_datetime, status, execution_context_s3_arn, ol_version, created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, correlation_id, result, state_machine_execution_arn, created_by_id, updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id, notification_trigger_workflow_id, acquired_object_name, subscription_guid
        Sort Key: workflow_execution.completed_datetime DESC
        Sort Method: top-N heapsort  Memory: 125kB
        Buffers: shared hit=142368
        ->  Index Scan using workflow_execution_initial_ui_tabs on workflow.workflow_execution  (cost=0.69..55892.70 rows=15118 width=1676) (actual time=0.038..2258.579 rows=2634718 loops=1)
              Output: execution_id, state_machine_id, workflow_id, started_datetime, completed_datetime, status, execution_context_s3_arn, ol_version, created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, correlation_id, result, state_machine_execution_arn, created_by_id, updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id, notification_trigger_workflow_id, acquired_object_name, subscription_guid
              Index Cond: ((workflow_execution.workflow_id = 14560) AND ((workflow_execution.status)::text = 'COMPLETED'::text) AND ((workflow_execution.result)::text = ANY ('{SUCCEEDED,REEXECUTED,ABORTED,DISCONTINUED,FAILED,PARTIAL_SUCCESS}'::text[])))
              Buffers: shared hit=142368
Planning time: 0.217 ms
Execution time: 3400.656 ms

With Oracle, for a query like this, since the index is on the 3 columns in the WHERE clause and the ORDER BY column is in the 4th position, Oracle would be able to scan that index and stop as soon as it finds the first 50 matching rows.  But as you can see above, Postgres finds 2,634,718 rows matching the WHERE clause, sorts them, and then returns the first 50 rows.
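
For illustration (this index is hypothetical, not part of the schema above), the kind of index that generally allows such an early-stopping plan in Postgres puts the sort column directly after the equality-tested columns, with result applied as a filter:

-- hypothetical index, for illustration only; not part of the original schema
create index workflow_execution_completed_idx
    on workflow_execution (workflow_id, status, completed_datetime desc);

With an index shaped like that, the planner can in principle read rows already in completed_datetime order and stop once 50 rows pass the result filter, instead of sorting everything first.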

 

I was questioning whether the result IN clause was causing the issue, so I ran the query with result = and saw the same behavior:

 

explain (analyze, verbose, costs, buffers, timing, summary, hashes)
select * from workflow_execution
where workflow_id = 14560 and
      status = 'COMPLETED' and
      result = 'SUCCEEDED'
order by completed_datetime desc limit 50;

Limit  (cost=54268.09..54268.22 rows=50 width=1676) (actual time=3372.453..3372.467 rows=50 loops=1)
  Output: execution_id, state_machine_id, workflow_id, started_datetime, completed_datetime, status, execution_context_s3_arn, ol_version, created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, correlation_id, result, state_machine_execution_arn, created_by_id, updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id, notification_trigger_workflow_id, acquired_object_name, subscription_guid
  Buffers: shared hit=140313
  ->  Sort  (cost=54268.09..54304.46 rows=14547 width=1676) (actual time=3372.452..3372.460 rows=50 loops=1)
        Output: execution_id, state_machine_id, workflow_id, started_datetime, completed_datetime, status, execution_context_s3_arn, ol_version, created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, correlation_id, result, state_machine_execution_arn, created_by_id, updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id, notification_trigger_workflow_id, acquired_object_name, subscription_guid
        Sort Key: workflow_execution.completed_datetime DESC
        Sort Method: top-N heapsort  Memory: 125kB
        Buffers: shared hit=140313
        ->  Index Scan using workflow_execution_initial_ui_tabs on workflow.workflow_execution  (cost=0.69..53784.85 rows=14547 width=1676) (actual time=0.034..2238.867 rows=2616747 loops=1)
              Output: execution_id, state_machine_id, workflow_id, started_datetime, completed_datetime, status, execution_context_s3_arn, ol_version, created_datetime, updated_datetime, deleted_millis, acquisition_channel_id, correlation_id, result, state_machine_execution_arn, created_by_id, updated_by_id, acquired_gcs_s3_object, sqs_trigger_id, trigger_message, acquired_gcs_s3_object_uuid, api_trigger_id, scheduled_trigger_id, notification_trigger_workflow_id, acquired_object_name, subscription_guid
              Index Cond: ((workflow_execution.workflow_id = 14560) AND ((workflow_execution.status)::text = 'COMPLETED'::text) AND ((workflow_execution.result)::text = 'SUCCEEDED'::text))
              Buffers: shared hit=140313
Planning time: 0.264 ms
Execution time: 3372.511 ms

Is Postgres unable to optimize the query the way Oracle does?  Or is it possible, but we are running on too old a version?

 

Thanks in advance for any input.

 

Steve

 

 

