Re: Assistance with Query Optimisation? - Mailing list pgsql-general

From Isak Hansen
Subject Re: Assistance with Query Optimisation?
Date
Msg-id 6b9e1eb20703010452m6fa97914he629df06e01a5601@mail.gmail.com
In response to Assistance with Query Optimisation?  (Shaun Johnston <shaunj@benon.com>)
Responses Re: Assistance with Query Optimisation?
List pgsql-general
On 3/1/07, Shaun Johnston <shaunj@benon.com> wrote:
> Hi
>
> Apologies in advance for the verbosity of my explanation for this
> problem, but I think it's all pertinent.
>
> I have a fairly simple query which postgresql's query planner seems to
> be interpreting / optimising in interesting ways:
>
> Query:
>
> SELECT *
> FROM account_transaction
> WHERE account_id = 10783
> ORDER BY account_transaction_timestamp asc
> OFFSET 0
> LIMIT 100
>
> Table "public.account_transaction"
>
> (I have stripped the default values etc for the sake of formatting)
>
>             Column             |           Type           |
> -------------------------------+--------------------------+
>  account_transaction_id        | bigint                   |
>  account_transaction_timestamp | timestamp with time zone |
>  account_id                    | integer                  |
>  other_account_transaction_id  | bigint                   |
>  transaction_reason            | text                     |
>  transaction_amount            | numeric(15,2)            |
>  transaction_exchange_rate     | numeric(20,10)           |
>  transaction_base_amount       | numeric(15,2)            |
>  transaction_locked_until      | timestamp with time zone |
>  transaction_approved          | boolean                  |
> Indexes:
>     "account_transaction_pkey" PRIMARY KEY, btree (account_transaction_id),
>      tablespace "indexes"
>     "account_transaction_ai" btree (account_id), tablespace "indexes"
>     "account_transaction_timestamp" btree (account_transaction_timestamp),
>      tablespace "indexes"
>     "account_transaction_tlu" btree (transaction_locked_until),
>      tablespace "indexes"
> Foreign-key constraints:
>     "$1" FOREIGN KEY (account_id) REFERENCES
>       account(account_id)
>     "$2" FOREIGN KEY (other_account_transaction_id) REFERENCES
>       account_transaction(account_transaction_id) ON UPDATE CASCADE
>
> Query Plans:
>
> With Sort and Limit
> ---------------------------------------------------------------------------------
>                                     QUERY PLAN
> ---------------------------------------------------------------------------------
>  Limit  (cost=0.00..12297.59 rows=100 width=120)
>  (actual time=23.537..275476.496 rows=100 loops=1)
>    ->  Index Scan Backward using account_transaction_timestamp on
>        account_transaction
>        (cost=0.00..640704.23 rows=5210 width=120)
>        (actual time=23.529..275475.781 rows=100 loops=1)
>          Filter: (account_id = 10783)
>  Total runtime: 275476.944 ms
>
>
> With Sort but no Limit
> --------------------------------------------------------------------------------
>                                     QUERY PLAN
> --------------------------------------------------------------------------------
>  Sort  (cost=18777.41..18790.43 rows=5210 width=120)
>  (actual time=1081.226..1082.170 rows=308 loops=1)
>    Sort Key: account_transaction_timestamp
>    ->  Index Scan using account_transaction_ai on account_transaction
>        (cost=0.00..18455.77 rows=5210 width=120)
>        (actual time=47.731..1070.788 rows=308 loops=1)
>          Index Cond: (account_id = 10783)
>  Total runtime: 1083.182 ms
>
> With Limit but no Sort
> ---------------------------------------------------------------------------------
>                                     QUERY PLAN
> ---------------------------------------------------------------------------------
>  Limit  (cost=0.00..354.24 rows=100 width=120)
>  (actual time=0.029..1.070 rows=100 loops=1)
>    ->  Index Scan using account_transaction_ai on account_transaction
>        (cost=0.00..18455.77 rows=5210 width=120)
>        (actual time=0.022..0.467 rows=100 loops=1)
>          Index Cond: (account_id = 10783)
>  Total runtime: 1.422 ms
>
> With Limit and Sort, but sorted by transaction_base_amount
> ---------------------------------------------------------------------------------
>                                     QUERY PLAN
> ---------------------------------------------------------------------------------
>  Limit  (cost=18777.41..18777.66 rows=100 width=120)
>  (actual time=55.294..56.221 rows=100 loops=1)
>    ->  Sort  (cost=18777.41..18790.43 rows=5210 width=120)
>        (actual time=55.285..55.600 rows=100 loops=1)
>          Sort Key: transaction_base_amount
>          ->  Index Scan using account_transaction_ai on
>              account_transaction
>              (cost=0.00..18455.77 rows=5210 width=120)
>              (actual time=0.057..53.187 rows=308 loops=1)
>                Index Cond: (account_id = 10783)
>  Total runtime: 56.597 ms
>
> The table currently contains about 10 million records.
>
> If I sort by account_transaction_timestamp then limit, the planner
> performs a backward index scan based on account_transaction_timestamp
> then limits.
>
> If I sort but don't limit, it performs a forward scan on the table, then
> a sort on the results.
>
> If I limit but don't sort, it performs the forward scan then limits.
>
> If I limit and sort, but sort by transaction_base_amount instead, it
> performs a forward index scan, then sorts the results, then limits - in
> stark contrast to sorting and limiting using account_transaction_timestamp.
>
> As shown, there is a vast difference in efficiency between sorting and
> limiting by account_transaction_timestamp, and doing the same using
> transaction_base_amount (in favour of transaction_base_amount by a
> factor of about 4860).  This seems to be due to the way the planner is
> optimising the query.
>
> Could this optimisation peculiarity be due to the combination of indexed
> columns in the query?  Also, is there a way I can 'force' the planner to
> perform an 'index scan -> sort -> limit', or even better an 'index scan
> -> limit -> sort'?
>
> Any pointers / assistance appreciated.
>

Starting off with a disclaimer, just to get that out of the way: I'm a
programmer, not a DBA; I have limited experience with Postgres, and my
terminology is probably way off.


First thing I notice is that each of your query plans uses only one index.

Postgres should be able to combine the timestamp and account_id
indexes in that first query, if the optimizer thought there was a
point in doing so?

Did you analyze the db recently? The optimizer won't perform too well
with incorrect statistics.
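
If not, running ANALYZE is cheap and might be all it takes. A sketch
against your table (the SET STATISTICS step is optional, and the value
100 is just an illustration):

```sql
-- Refresh the planner's statistics for this table.
ANALYZE account_transaction;

-- If the row-count estimate for a single account_id is still far off,
-- raising the statistics target for that column and re-analyzing
-- gives the planner a more detailed value distribution to work with.
ALTER TABLE account_transaction
    ALTER COLUMN account_id SET STATISTICS 100;
ANALYZE account_transaction;
```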

If that doesn't help, you could perhaps add account_id to your timestamp index.
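
Concretely, a two-column index with account_id first and the timestamp
second should let the planner find one account's rows already in
timestamp order, so it can stop after 100 rows. The index name is my
invention; the tablespace just follows your existing schema:

```sql
-- Equality column first, sort column second: an index scan on this
-- can satisfy "WHERE account_id = ? ORDER BY timestamp LIMIT 100"
-- without a separate sort step.
CREATE INDEX account_transaction_ai_ts
    ON account_transaction (account_id, account_transaction_timestamp)
    TABLESPACE indexes;
```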


I have no idea how far the postgres planner goes in restructuring your
queries, but using explicit subqueries, correlated vs. non-correlated,
may give it a good hint about what you're trying to do.
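
For example, something along these lines isolates the account_id filter
in a non-correlated subquery, so only the matching rows are sorted and
limited. Whether the planner preserves that shape rather than flattening
it is version-dependent, so treat this as a sketch:

```sql
SELECT *
FROM (SELECT *
      FROM account_transaction
      WHERE account_id = 10783
      OFFSET 0) AS t   -- OFFSET 0 is a common trick to keep the
                       -- planner from flattening the subquery
ORDER BY account_transaction_timestamp ASC
LIMIT 100;
```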


HTH,
Isak


> Kind Regards,
>
> Shaun Johnston
>
