Re: Improve Seq scan performance - Mailing list pgsql-performance

From: Vladimir Sitnikov
Subject: Re: Improve Seq scan performance
Date:
Msg-id: 1d709ecc0811100018o5058377bl8fd7b2ddd7cba96a@mail.gmail.com
In response to: Re: Improve Seq scan performance (Craig Ringer <craig@postnewspapers.com.au>)
Responses: Re: Improve Seq scan performance
List: pgsql-performance


> Maybe there's some hybrid type possible where you can scan the index to find large table regions that are known /not/ to contain tuples of interest and seek over them in your scan. I wouldn't know, really, but it sounds like it'd probably be more I/O than a pure seq scan (given the reading of the index too) unless the table had the values of interest rather neatly clustered. It'd also surely use more memory and CPU time processing the whole index to find table regions without values of interest.
>
> Is that what you meant, though?
Not exactly. I mean the following: there are cases when an index scan, even over non-clustered values, is a complete win. Basically, it is a win whenever the number of returned rows is relatively small, no matter whether that is due to selectivity or due to a LIMIT clause.
The test case that I have provided creates a 667-page table and a 30-page index, so a complete scan of the index is 22 times cheaper in terms of I/O.
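
For reference, here is a minimal sketch that reproduces roughly those proportions. The original test script is not quoted in this message, so the row count and padding width below are my assumptions; only the names seq_test, i, and i_ix come from the thread:

-- Hypothetical reconstruction: 10000 short keys plus ~500 bytes of padding
-- per row gives a heap of roughly 667 pages and a b-tree of roughly 30.
create table seq_test (i text, pad text);
insert into seq_test
select n::text, repeat('x', 500)
from generate_series(1, 10000) n;
create index i_ix on seq_test (i);
vacuum analyze seq_test;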

Suppose you want to find all the values that contain '%123%'. Currently PostgreSQL will do a seq scan, while the better option might be (and it is) to loop through all the items in the index (that costs 30 I/Os), find the entries that truly contain '123' (it will find 20 of them), and then do 20 I/Os to check tuple visibility against the heap. That is 50 I/Os versus 667 for the seq scan.
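
The page counts behind that arithmetic can be checked straight from the catalogs (assuming the sketch above; relpages is refreshed by ANALYZE):

select relname, relpages
from pg_class
where relname in ('seq_test', 'i_ix');
-- expected: seq_test ~ 667, i_ix ~ 30

select count(*) from seq_test where i like '%123%';
-- expected: 20 matching rows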

> A b-tree index cannot be used on a LIKE query with a leading wildcard. See the FAQ.
Unfortunately, it is true. I would love to improve that particular case.
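
The limitation is easy to see with EXPLAIN (a sketch against the setup above; plan output omitted here): a plain b-tree cannot derive a start/stop key from a leading wildcard, so the planner gets no Index Cond and prefers a seq scan, while an anchored pattern can use the index.

-- Leading wildcard: no Index Cond is possible, the plan shows a Seq Scan.
explain select * from seq_test where i like '%123%';
-- Anchored pattern: indexable in the C locale (or with text_pattern_ops).
explain select * from seq_test where i like '123%';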

> In addition, if your database is not in the C locale you can't use an ordinary index for LIKE queries. See the FAQ. You need to create a text_pattern_ops index instead:
>
> create index i_ix_txt on seq_test(i text_pattern_ops);
Good catch. However, that does not change the results: PostgreSQL performs the same 2529 I/Os for an index scan on '%123%', for some unknown reason.

set enable_seqscan=off
-- The index scan reads 2529 pages for some reason. I would expect 30 (index
-- size) + 20 (number of matching entries) = 50 pages maximum, which is 10
-- times better than the seq scan.
Index Scan using i_ix on seq_test  (cost=0.00..1643.74 rows=356 width=508)
(actual time=0.334..16.746 rows=20 loops=1 read_shared=2529(2529)
read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
  Filter: (i ~~ '%123%'::text)
Total runtime: 16.863 ms

> I think it's reading the whole index, because it can't do a prefix search if there's a leading wildcard. I'm a bit confused, though, since I thought in this case it couldn't actually execute the query w/o a sequential scan, and would just use one irrespective of the enable_seqscan param. That's what happens here.
Please follow the case carefully: the index is only 30 pages long. Why is PostgreSQL doing 2529 I/Os? It drives me crazy.
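
One way to see where those 2529 reads actually land is to split them into heap and index blocks with the standard statistics views. A diagnostic sketch, assuming the seq_test setup above and an otherwise idle database (pg_stat_reset() zeroes the counters for the current database):

select pg_stat_reset();
set enable_seqscan = off;
select count(*) from seq_test where i like '%123%';
select relname, heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
from pg_statio_user_tables
where relname = 'seq_test';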


Regards,
Vladimir Sitnikov
