Thread: Assistance with Query Optimisation?

Assistance with Query Optimisation?

From
Shaun Johnston
Date:
Hi

Apologies in advance for the verbosity of my explanation for this
problem, but I think it's all pertinent.

I have a fairly simple query which postgresql's query planner seems to
be interpreting / optimising in interesting ways:

Query:

SELECT *
FROM account_transaction
WHERE account_id = 10783
ORDER BY account_transaction_timestamp asc
OFFSET 0
LIMIT 100

Table "public.account_transaction"

(I have stripped the default values etc for the sake of formatting)

            Column             |           Type           |
-------------------------------+--------------------------+
 account_transaction_id        | bigint                   |
 account_transaction_timestamp | timestamp with time zone |
 account_id                    | integer                  |
 other_account_transaction_id  | bigint                   |
 transaction_reason            | text                     |
 transaction_amount            | numeric(15,2)            |
 transaction_exchange_rate     | numeric(20,10)           |
 transaction_base_amount       | numeric(15,2)            |
 transaction_locked_until      | timestamp with time zone |
 transaction_approved          | boolean                  |
Indexes:
    "account_transaction_pkey" PRIMARY KEY, btree (account_transaction_id),
     tablespace "indexes"
    "account_transaction_ai" btree (account_id), tablespace "indexes"
    "account_transaction_timestamp" btree (account_transaction_timestamp),
     tablespace "indexes"
    "account_transaction_tlu" btree (transaction_locked_until),
     tablespace "indexes"
Foreign-key constraints:
    "$1" FOREIGN KEY (account_id) REFERENCES
      account(account_id)
    "$2" FOREIGN KEY (other_account_transaction_id) REFERENCES
      account_transaction(account_transaction_id) ON UPDATE CASCADE

Query Plans:

With Sort and Limit
---------------------------------------------------------------------------------
                                    QUERY
PLAN
---------------------------------------------------------------------------------
 Limit  (cost=0.00..12297.59 rows=100 width=120)
 (actual time=23.537..275476.496 rows=100 loops=1)
   ->  Index Scan Backward using account_transaction_timestamp on
       account_transaction
       (cost=0.00..640704.23 rows=5210 width=120)
       (actual time=23.529..275475.781 rows=100 loops=1)
         Filter: (account_id = 10783)
 Total runtime: 275476.944 ms


With Sort but no Limit
--------------------------------------------------------------------------------
                                    QUERY
PLAN
--------------------------------------------------------------------------------
 Sort  (cost=18777.41..18790.43 rows=5210 width=120)
 (actual time=1081.226..1082.170 rows=308 loops=1)
   Sort Key: account_transaction_timestamp
   ->  Index Scan using account_transaction_ai on account_transaction
       (cost=0.00..18455.77 rows=5210 width=120)
       (actual time=47.731..1070.788 rows=308 loops=1)
         Index Cond: (account_id = 10783)
 Total runtime: 1083.182 ms

With Limit but no Sort
---------------------------------------------------------------------------------
                                    QUERY
PLAN
---------------------------------------------------------------------------------
 Limit  (cost=0.00..354.24 rows=100 width=120)
 (actual time=0.029..1.070 rows=100 loops=1)
   ->  Index Scan using account_transaction_ai on account_transaction
       (cost=0.00..18455.77 rows=5210 width=120)
       (actual time=0.022..0.467 rows=100 loops=1)
         Index Cond: (account_id = 10783)
 Total runtime: 1.422 ms

With Limit and Sort, but sorted by transaction_base_amount
---------------------------------------------------------------------------------
                                    QUERY
PLAN
---------------------------------------------------------------------------------
 Limit  (cost=18777.41..18777.66 rows=100 width=120)
 (actual time=55.294..56.221 rows=100 loops=1)
   ->  Sort  (cost=18777.41..18790.43 rows=5210 width=120)
       (actual time=55.285..55.600 rows=100 loops=1)
         Sort Key: transaction_base_amount
         ->  Index Scan using account_transaction_ai on
account_transaction
             (cost=0.00..18455.77 rows=5210 width=120)
             (actual time=0.057..53.187 rows=308 loops=1)
               Index Cond: (account_id = 10783)
 Total runtime: 56.597 ms

The table currently contains about 10 million records.

If I sort by account_transaction_timestamp then limit, the planner
performs a backward index scan based on account_transaction_timestamp
then limits.

If I sort but don't limit, it performs a forward scan on the table, then
a sort on the results.

If I limit but don't sort, it performs the forward scan then limits.

If I limit and sort, but sort by transaction_base_amount instead, it
performs a forward index scan, then sorts the results, then limits - in
stark contrast to sorting and limiting using account_transaction_timestamp.

As shown, there is a vast difference in efficiency between sorting and
limiting by account_transaction_timestamp, and doing the same using
transaction_base_amount (in favour of transaction_base_amount by a
factor of about 4860).  This seems to be due to the way the planner is
optimising the query.

Could this optimisation peculiarity be due to the combination of indexed
columns in the query?  Also, Is there a way I can 'force' the planner to
perform an 'index scan -> sort -> limit' or even better an 'index scan
-> limit -> sort'?

Any pointers / assistance appreciated.

Kind Regards,

Shaun Johnston

Re: Assistance with Query Optimisation?

From
"Isak Hansen"
Date:
On 3/1/07, Shaun Johnston <shaunj@benon.com> wrote:
> Hi
>
> Apologies in advance for the verbosity of my explanation for this
> problem, but I think it's all pertinent.
>
> I have a fairly simple query which postgresql's query planner seems to
> be interpreting / optimising in interesting ways:
>
> Query:
>
> SELECT *
> FROM account_transaction
> WHERE account_id = 10783
> ORDER BY account_transaction_timestamp asc
> OFFSET 0
> LIMIT 100
>
> Table "public.account_transaction"
>
> (I have stripped the default values etc for the sake of formatting)
>
>             Column             |           Type           |
> -------------------------------+--------------------------+
>  account_transaction_id        | bigint                   |
>  account_transaction_timestamp | timestamp with time zone |
>  account_id                    | integer                  |
>  other_account_transaction_id  | bigint                   |
>  transaction_reason            | text                     |
>  transaction_amount            | numeric(15,2)            |
>  transaction_exchange_rate     | numeric(20,10)           |
>  transaction_base_amount       | numeric(15,2)            |
>  transaction_locked_until      | timestamp with time zone |
>  transaction_approved          | boolean                  |
> Indexes:
>     "account_transaction_pkey" PRIMARY KEY, btree (account_transaction_id),
>      tablespace "indexes"
>     "account_transaction_ai" btree (account_id), tablespace "indexes"
>     "account_transaction_timestamp" btree (account_transaction_timestamp),
>      tablespace "indexes"
>     "account_transaction_tlu" btree (transaction_locked_until),
>      tablespace "indexes"
> Foreign-key constraints:
>     "$1" FOREIGN KEY (account_id) REFERENCES
>       account(account_id)
>     "$2" FOREIGN KEY (other_account_transaction_id) REFERENCES
>       account_transaction(account_transaction_id) ON UPDATE CASCADE
>
> Query Plans:
>
> With Sort and Limit
> ---------------------------------------------------------------------------------
>                                     QUERY
> PLAN
> ---------------------------------------------------------------------------------
>  Limit  (cost=0.00..12297.59 rows=100 width=120)
>  (actual time=23.537..275476.496 rows=100 loops=1)
>    ->  Index Scan Backward using account_transaction_timestamp on
>        account_transaction
>        (cost=0.00..640704.23 rows=5210 width=120)
>        (actual time=23.529..275475.781 rows=100 loops=1)
>          Filter: (account_id = 10783)
>  Total runtime: 275476.944 ms
>
>
> With Sort but no Limit
> --------------------------------------------------------------------------------
>                                     QUERY
> PLAN
> --------------------------------------------------------------------------------
>  Sort  (cost=18777.41..18790.43 rows=5210 width=120)
>  (actual time=1081.226..1082.170 rows=308 loops=1)
>    Sort Key: account_transaction_timestamp
>    ->  Index Scan using account_transaction_ai on account_transaction
>        (cost=0.00..18455.77 rows=5210 width=120)
>        (actual time=47.731..1070.788 rows=308 loops=1)
>          Index Cond: (account_id = 10783)
>  Total runtime: 1083.182 ms
>
> With Limit but no Sort
> ---------------------------------------------------------------------------------
>                                     QUERY
> PLAN
> ---------------------------------------------------------------------------------
>  Limit  (cost=0.00..354.24 rows=100 width=120)
>  (actual time=0.029..1.070 rows=100 loops=1)
>    ->  Index Scan using account_transaction_ai on account_transaction
>        (cost=0.00..18455.77 rows=5210 width=120)
>        (actual time=0.022..0.467 rows=100 loops=1)
>          Index Cond: (account_id = 10783)
>  Total runtime: 1.422 ms
>
> With Limit and Sort, but sorted by transaction_base_amount
> ---------------------------------------------------------------------------------
>                                     QUERY
> PLAN
> ---------------------------------------------------------------------------------
>  Limit  (cost=18777.41..18777.66 rows=100 width=120)
>  (actual time=55.294..56.221 rows=100 loops=1)
>    ->  Sort  (cost=18777.41..18790.43 rows=5210 width=120)
>        (actual time=55.285..55.600 rows=100 loops=1)
>          Sort Key: transaction_base_amount
>          ->  Index Scan using account_transaction_ai on
> account_transaction
>              (cost=0.00..18455.77 rows=5210 width=120)
>              (actual time=0.057..53.187 rows=308 loops=1)
>                Index Cond: (account_id = 10783)
>  Total runtime: 56.597 ms
>
> The table currently contains about 10 million records.
>
> If I sort by account_transaction_timestamp then limit, the planner
> performs a backward index scan based on account_transaction_timestamp
> then limits.
>
> If I sort but don't limit, it performs a forward scan on the table, then
> a sort on the results.
>
> If I limit but don't sort, it performs the forward scan then limits.
>
> If I limit and sort, but sort by transaction_base_amount instead, it
> performs a forward index scan, then sorts the results, then limits - in
> stark contrast to sorting and limiting using account_transaction_timestamp.
>
> As shown, there is a vast difference in efficiency between sorting and
> limiting by account_transaction_timestamp, and doing the same using
> transaction_base_amount (in favour of transaction_base_amount by a
> factor of about 4860).  This seems to be due to the way the planner is
> optimising the query.
>
> Could this optimisation peculiarity be due to the combination of indexed
> columns in the query?  Also, Is there a way I can 'force' the planner to
> perform an 'index scan -> sort -> limit' or even better an 'index scan
> -> limit -> sort'?
>
> Any pointers / assistance appreciated.
>

Starting off with a disclaimer, just to get that out of the way; I'm a
programmer; no DBA, have limited experience with Postgres, and my
terminology is probably way off.


First thing I notice is that your query plans seem to only use one index.

Postgres should be able to combine the timestamp and account_id
indexes in that first query, if the optimizer thought there was a
point in doing so?

Did you analyze the db recently? The optimizer won't perform too well
with incorrect statistics.

If that doesn't help, you could perhaps add account_id to your timestamp index.


I have no idea how far the postgres planner goes in restructuring your
queries, but using explicit subqueries, correlated vs. non-correlated,
may give it a good hint about what you're trying to do.


HTH,
Isak


> Kind Regards,
>
> Shaun Johnston
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>

Re: Assistance with Query Optimisation?

From
Shaun Johnston
Date:

Isak Hansen wrote:
> First thing I notice is that your query plans seem to only use one index.
>
> Postgres should be able to combine the timestamp and account_id
> indexes in that first query, if the optimizer thought there was a
> point in doing so?
>
Absolutely brilliant - this solved my issue, brought the query time down
to about 2 seconds initially, then about 500 milliseconds thereafter.
Thank you very much for this tip.