Thread: Postgres not willing to use an index?
I have a table, like this: CREATE TABLE transactions ( transaction_id integer NOT NULL DEFAULT nextval('transactions_seq'::regclass), transaction_type integer NOT NULL, transaction_client_id integer NOT NULL, transaction_destination_id integer NOT NULL, transaction_operator_id integer NOT NULL, transaction_application_id integer NOT NULL, transaction_application_service character varying NOT NULL, transaction_quantity integer NOT NULL, transaction_time_commit timestamp with time zone NOT NULL, transaction_time_received timestamp with time zone NOT NULL, transaction_gateway_id character(36) NOT NULL, transaction_payment_amount integer NOT NULL DEFAULT 0, CONSTRAINT transactions_pk PRIMARY KEY (transaction_id), CONSTRAINT transactions_uq__gateway_id UNIQUE (transaction_gateway_id) ) WITH (OIDS=FALSE); Now, all the _type, client_id, destination_id, operator_id, and application_id are foreigen-keyed to coresponding tables. There are no indices on those columns. Besides PK and uq-constraint indices I have this index: CREATE INDEX transactions_idx__client_data ON transactions USING btree (transaction_client_id, transaction_destination_id, transaction_operator_id, transaction_application_id, transaction_time_commit) The table_count is like this: jura=# select count(*) from transactions; count ---------- 13751457 (1 row) There are roughly 500.000 - 600.000 transactions for each month. There are also transactions from past two years in the table. I often SELECT data from the table for specified time period - usualy from begining to the end of the month, like this: SELECT <some-columns> FROM transactions WHERE transaction_time_commit BETWEEN '2009-01-01' AND '2009-01-31 23:59:59'; The problem is that postgres is never using an index: jura=# explain analyze select * from transactions where transaction_time_commit between '2009-01-01' and '2009-01-31 23:59:59'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on transactions (cost=0.00..416865.85 rows=593713 width=91) (actual time=4.067..3918.629 rows=525051 loops=1) Filter: ((transaction_time_commit >= '2009-01-01 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <= '2009-01-31 23:59:59+01'::timestamp with time zone)) Total runtime: 4026.404 ms (3 rows) Time: 4068.521 ms If I force it not to use sequential scans, it is using index, with benefits of shorter execution time: jura=# set enable_seqscan to false; SET Time: 0.103 ms jura=# explain analyze select * from transactions where transaction_time_commit between '2009-01-01' and '2009-01-31 23:59:59'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on transactions (cost=410369.98..629869.67 rows=593713 width=91) (actual time=1060.569..1280.500 rows=525051 loops=1) Recheck Cond: ((transaction_time_commit >= '2009-01-01 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <= '2009-01-31 23:59:59+01'::timestamp with time zone)) -> Bitmap Index Scan on transactions_idx__client_data (cost=0.00..410221.55 rows=593713 width=0) (actual time=1058.992..1058.992 rows=525051 loops=1) Index Cond: ((transaction_time_commit >= '2009-01-01 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <= '2009-01-31 23:59:59+01'::timestamp with time zone)) Total runtime: 1388.882 ms (5 rows) Time: 1396.737 ms Now, I found interesting is that if I create index just on transaction_time_commit column (and I leave transactions_idx__client_data index), then postgres is using that new index. Also, if I change idx__client_data index like this (first I drop it, and then I create new one): CREATE INDEX transactions_idx__client_data ON transactions USING btree (transaction_client_id, transaction_destination_id, transaction_time_commit); then postgres is using that index: jura=# explain analyze select * from transactions where transaction_time_commit between '2009-01-01' and '2009-01-31 23:59:59'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on transactions (cost=349473.37..568973.06 rows=593713 width=91) (actual time=949.224..1128.848 rows=525051 loops=1) Recheck Cond: ((transaction_time_commit >= '2009-01-01 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <= '2009-01-31 23:59:59+01'::timestamp with time zone)) -> Bitmap Index Scan on transactions_idx__client_data (cost=0.00..349324.94 rows=593713 width=0) (actual time=947.678..947.678 rows=525051 loops=1) Index Cond: ((transaction_time_commit >= '2009-01-01 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <= '2009-01-31 23:59:59+01'::timestamp with time zone)) Total runtime: 1234.989 ms (5 rows) Time: 1235.727 ms Now, I have many 'selects' on the transactions table (still, not as many as inserts), mostly filtered on transaction_time, client_id, destination_id and application_id, but there is fair amount of 'selects' filtered only on transaction_time. Now, shall I keep the original index and add another one on just transaction_time (there is, I guess, overhead of maintaining two indices), or shall I remove transaction_time from original index, and create another one? And, is it normal for postgres to 'ignore' the transaction_time column in original index? This is the postgres version I'm using: jura=# select version(); version ---------------------------------------------------------------------------------------------------------------- PostgreSQL 8.3.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21) (1 row) Mike
On Fri, Feb 6, 2009 at 3:43 PM, Mario Splivalo <mario.splivalo@megafon.hr> wrote: > Besides PK and uq-constraint indices I have this index: > > CREATE INDEX transactions_idx__client_data ON transactions > USING btree (transaction_client_id, transaction_destination_id, > transaction_operator_id, transaction_application_id, > transaction_time_commit) I think it is because it isn't just a simple index, and for some reason planner decides - that going through every transaction_application_id, etc, etc just to find right transaction_time_commit isn't worth the hassle. Try using few more indexes, on less columns. Also, if I may - I don't think it is quite usefull to have column names that include table name in every single one of them, makes things so much less readable. -- GJ
Mario Splivalo <mario.splivalo@megafon.hr> writes: > Besides PK and uq-constraint indices I have this index: > CREATE INDEX transactions_idx__client_data ON transactions > USING btree (transaction_client_id, transaction_destination_id, > transaction_operator_id, transaction_application_id, > transaction_time_commit) > SELECT <some-columns> FROM transactions WHERE transaction_time_commit > BETWEEN '2009-01-01' AND '2009-01-31 23:59:59'; > The problem is that postgres is never using an index: Hardly surprising --- a search on the index's lowest-order column would require scanning practically all of the index. (If you think about the ordering of the index entries you'll see why.) If this is a typical query then you need a separate index on transaction_time_commit. The fine manual goes into some detail about how to design indexes; http://www.postgresql.org/docs/8.3/static/indexes.html particularly 11.3, 11.5. regards, tom lane
On Fri, Feb 6, 2009 at 11:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Mario Splivalo <mario.splivalo@megafon.hr> writes: >> Besides PK and uq-constraint indices I have this index: > >> CREATE INDEX transactions_idx__client_data ON transactions >> USING btree (transaction_client_id, transaction_destination_id, >> transaction_operator_id, transaction_application_id, >> transaction_time_commit) > >> SELECT <some-columns> FROM transactions WHERE transaction_time_commit >> BETWEEN '2009-01-01' AND '2009-01-31 23:59:59'; >> The problem is that postgres is never using an index: > > Hardly surprising --- a search on the index's lowest-order column would > require scanning practically all of the index. (If you think about the > ordering of the index entries you'll see why.) If this is a typical > query then you need a separate index on transaction_time_commit. > > The fine manual goes into some detail about how to design indexes; > http://www.postgresql.org/docs/8.3/static/indexes.html > particularly 11.3, 11.5. What's weird about this example is that when he sets enable_seqscan to off, the bitmap index scan plan is actually substantially faster, even though it in fact does scan nearly the entire heap. I don't understand how it can be faster to scan the index and the heap than to just scan the heap. ...Robert
>>> Robert Haas <robertmhaas@gmail.com> wrote: > What's weird about this example is that when he sets enable_seqscan to > off, the bitmap index scan plan is actually substantially faster, even > though it in fact does scan nearly the entire heap. I don't > understand how it can be faster to scan the index and the heap than to > just scan the heap. It's cached in the second test, maybe? -Kevin
On Fri, Feb 6, 2009 at 12:41 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >>>> Robert Haas <robertmhaas@gmail.com> wrote: >> What's weird about this example is that when he sets enable_seqscan to >> off, the bitmap index scan plan is actually substantially faster, even >> though it in fact does scan nearly the entire heap. I don't >> understand how it can be faster to scan the index and the heap than to >> just scan the heap. > > It's cached in the second test, maybe? I gather that the results were repeatable, but perhaps Mario could double-check that? ...Robert
Tom Lane wrote: > > Hardly surprising --- a search on the index's lowest-order column would > require scanning practically all of the index. (If you think about the > ordering of the index entries you'll see why.) If this is a typical > query then you need a separate index on transaction_time_commit. Yes, actually I just moved transaction_time_commit column to the begining of the index, since, most of the time I run queries based on transaction_time_commit and then transaction_client_id and transaction_destination_id. > The fine manual goes into some detail about how to design indexes; > http://www.postgresql.org/docs/8.3/static/indexes.html > particularly 11.3, 11.5. I see it now. I read the manual concerning CREATE INDEX command, and there is no mention of multicolumn indices, did not notice Note that points to Chapter 11. Mike
Robert Haas wrote: > On Fri, Feb 6, 2009 at 12:41 PM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: >>>>> Robert Haas <robertmhaas@gmail.com> wrote: >>> What's weird about this example is that when he sets enable_seqscan to >>> off, the bitmap index scan plan is actually substantially faster, even >>> though it in fact does scan nearly the entire heap. I don't >>> understand how it can be faster to scan the index and the heap than to >>> just scan the heap. >> It's cached in the second test, maybe? > > I gather that the results were repeatable, but perhaps Mario could > double-check that? I think that it is always cached - the machine has 4GB of RAM, and i'm just using it for testing. Now, I think that the cache is used because there is no I/O wait when I run the queries (if you have any suggestion on how to check cache usage, since I have no idea): jura=# set enable_seqscan to true; SET jura=# explain analyze select * from transactions where transaction_time_commit between '2008-01-01' and '2008-01-31 23:59:59'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on transactions (cost=0.00..418365.68 rows=759775 width=91) (actual time=928.342..3788.232 rows=722176 loops=1) Filter: ((transaction_time_commit >= '2008-01-01 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <= '2008-01-31 23:59:59+01'::timestamp with time zone)) Total runtime: 3936.744 ms (3 rows) jura=# set enable_seqscan to false; SET jura=# explain analyze select * from transactions where transaction_time_commit between '2008-01-01' and '2008-01-31 23:59:59'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on transactions (cost=428882.89..651630.52 rows=759775 width=91) (actual time=1358.040..1633.867 rows=722176 loops=1) Recheck Cond: ((transaction_time_commit >= '2008-01-01 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <= '2008-01-31 23:59:59+01'::timestamp with time zone)) -> Bitmap Index Scan on transactions_idx__client_data (cost=0.00..428692.95 rows=759775 width=0) (actual time=1354.485..1354.485 rows=722176 loops=1) Index Cond: ((transaction_time_commit >= '2008-01-01 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <= '2008-01-31 23:59:59+01'::timestamp with time zone)) Total runtime: 1778.938 ms (5 rows) Now, transactions_idx__client_data index has transaction_time_commit as the last column in index. When I 'recreate' the database, and run the queries again, first run which uses sequential scan is around 10 seconds, heavy I/O, any subsequent query run takes cca 3900 msecs, as shown above. When I say 'disable seqscan', planner uses Bitmap Index Scan, as shown above, just that the first query takes around 25 seconds to run, with heavy I/O. Any subsequent query runs take somewhat less than 2 seconds, as shown above. I'm not sure on what to do to minimize the impact of the OS-cache, apart from taking RAM modules out of the machine - if you have any suggestions I'll try to apply them. On production database I changed the index so that 'transaction_time_commit' is the first column, and now I don't get any sequential scans on transactions table when only WHERE condition is on transaction_time_commit. Mike
Mario Splivalo wrote: > Robert Haas wrote: > jura=# set enable_seqscan to false; > SET > jura=# explain analyze select * from transactions where > transaction_time_commit between '2008-01-01' and '2008-01-31 23:59:59'; > > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Bitmap Heap Scan on transactions (cost=428882.89..651630.52 > rows=759775 width=91) (actual time=1358.040..1633.867 rows=722176 loops=1) > Recheck Cond: ((transaction_time_commit >= '2008-01-01 > 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <= > '2008-01-31 23:59:59+01'::timestamp with time zone)) > -> Bitmap Index Scan on transactions_idx__client_data > (cost=0.00..428692.95 rows=759775 width=0) (actual > time=1354.485..1354.485 rows=722176 loops=1) > Index Cond: ((transaction_time_commit >= '2008-01-01 > 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <= > '2008-01-31 23:59:59+01'::timestamp with time zone)) > Total runtime: 1778.938 ms > (5 rows) > > I neglected to paste this 'explain analyze', when I changed the index so that 'transaction_time_commit' is first column in the index: jura=# explain analyze select * from transactions where transaction_time_commit between '2009-01-01' and '2009-01-31 23:59:59'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on transactions (cost=7550.51..233419.58 rows=250880 width=91) (actual time=95.139..280.008 rows=525051 loops=1) Recheck Cond: ((transaction_time_commit >= '2009-01-01 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <= '2009-01-31 23:59:59+01'::timestamp with time zone)) -> Bitmap Index Scan on transactions_idx__client_data (cost=0.00..7487.79 rows=250880 width=0) (actual time=93.382..93.382 rows=525051 loops=1) Index Cond: ((transaction_time_commit >= '2009-01-01 00:00:00+01'::timestamp with time zone) AND (transaction_time_commit <= '2009-01-31 23:59:59+01'::timestamp with time zone)) Total runtime: 386.665 ms (5 rows) Thank you, Tom! Mike