Thread: Strange (?) Index behavior?
Hey people, long while since I posted here, but I'm having an index issue that looks on the surface to be a little strange. I have a text field that I'm trying to query on in a table with millions of rows. Stupid I know, but a fairly common stupid thing to try to do.

For some reason it's a requirement that partial wildcard searches are done on this field, such as:

SELECT ... WHERE field LIKE 'A%'

I thought an interesting way to do this would be to simply create partial indexes for each letter on that field, and it works when the query matches the WHERE clause in the index exactly, like above. The problem is thus: say I have an index..

CREATE INDEX column_idx_a ON table (column) WHERE column LIKE 'A%';

It seems to me that a query saying "SELECT column FROM table WHERE column LIKE 'AA%';" should be just as fast, or very close to, the first case above. However, EXPLAIN tells me that this query is not using the index above, which is what's not making sense to me.

Does the planner not realize that 'AA%' will always fall between 'A%' and 'B', and thus that using the index would be the best way to go, or am I missing something else that's preventing this from working?
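To make the situation concrete, here's a minimal sketch of the setup (all names here are made up for illustration):

-- Minimal illustration of the behavior described above (names made up).
CREATE TABLE testtab (field text NOT NULL);
CREATE INDEX testtab_idx_a ON testtab (field) WHERE field LIKE 'A%';

-- Matches the index predicate exactly: the planner uses testtab_idx_a.
EXPLAIN SELECT field FROM testtab WHERE field LIKE 'A%';

-- Logically implied by the predicate ('AA%' is a subset of 'A%'), but
-- not an exact match, so the partial index is not chosen.
EXPLAIN SELECT field FROM testtab WHERE field LIKE 'AA%';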
> For some reason it's a requirement that partial wildcard
> searches are done on this field, such as "SELECT ... WHERE
> field LIKE 'A%'"
>
> I thought an interesting way to do this would be to simply
> create partial indexes for each letter on that field, and it
> works when the query matches the WHERE clause in the index
> exactly like above. The problem is thus:

I thought PG could use an ordinary index for 'like' conditions with just a terminating '%'?

My other thought is that like 'A%' should grab about 1/26th of the table anyway (if the initial character distribution is random), and so a sequential scan might be the best plan anyway...

M
> It seems to me that a query saying "SELECT column FROM table WHERE
> column LIKE 'AA%';" should be just as fast or very close to the first
> case up above. However, explain tells me that this query is not using
> the index above, which is what's not making sense to me.

It looks for an exact expression match, and doesn't know about values which are equal.

You can provide both clauses:

WHERE column LIKE 'A%' AND column LIKE 'AA%';
On Fri, 05 Nov 2004 10:07:38 -0500, Rod Taylor <pg@rbt.ca> wrote:
> > It seems to me that a query saying "SELECT column FROM table WHERE
> > column LIKE 'AA%';" should be just as fast or very close to the first
> > case up above. However, explain tells me that this query is not using
> > the index above, which is what's not making sense to me.
>
> It looks for an exact expression match, and doesn't know about values
> which are equal.
>
> You can provide both clauses.
>
> WHERE column LIKE 'A%' AND column LIKE 'AA%';

I see. That's not really optimal either, however, as you can probably see already.. adding AB, AC, AD...AZ is likely to be pretty bogus, and at the least is time consuming.

Matt Clark was right that it will use a standard index, which is in fact what it's doing right now in the "SELECT column WHERE column LIKE 'AA%';" case.. however, as I said, the table has millions of rows -- currently about 76 million, so even a full index scan is fairly slow. The machine isn't all that hot performance-wise either, a simple dual 800 P3 with a single 47GB Seagate SCSI. The only redeeming factor is that it has 2GB of memory, which I'm trying to make the most of with these indexes.

So assuming this partial index situation isn't going to change (it seems like it would be a fairly simple fix for someone that knows the pg code, however), I'm wondering if a subselect may speed things up any, so I'm going to investigate that next. Perhaps..

SELECT column FROM
  (SELECT column FROM table WHERE column LIKE 'A%') AS sq
WHERE column LIKE 'AA%';

The query planner thinks this will be pretty fast indeed, and does use the index I am after.

OS is, of course, FreeBSD.
On Fri, 5 Nov 2004 10:32:43 -0500, Allen Landsidel <alandsidel@gmail.com> wrote:
> On Fri, 05 Nov 2004 10:07:38 -0500, Rod Taylor <pg@rbt.ca> wrote:
> > It looks for an exact expression match, and doesn't know about values
> > which are equal.
> >
> > You can provide both clauses.
> >
> > WHERE column LIKE 'A%' AND column LIKE 'AA%';
>
> I see. That's not really optimal either, however, as you can probably
> see already.. adding AB, AC, AD...AZ is likely to be pretty bogus, and
> at the least is time consuming.

I see now that you mean to add that to the SELECT clause and not the index; my mistake.

> Perhaps..
>
> SELECT column FROM
>   (SELECT column FROM table WHERE column LIKE 'A%') AS sq
> WHERE column LIKE 'AA%';
>
> The query planner thinks this will be pretty fast indeed, and does use
> the index I am after.

This was indeed pretty fast. About 7 seconds, as was modifying the WHERE as suggested above.

-Allen
On Fri, Nov 05, 2004 at 09:39:16 -0500, Allen Landsidel <alandsidel@gmail.com> wrote:
>
> For some reason it's a requirement that partial wildcard searches are
> done on this field, such as "SELECT ... WHERE field LIKE 'A%'"
>
> I thought an interesting way to do this would be to simply create
> partial indexes for each letter on that field, and it works when the
> query matches the WHERE clause in the index exactly like above. The
> problem is thus:

That may not help much except for prefixes that have a below-average number of occurrences. If you are going to be selecting 1/26 of the records, you are probably going to do about as well with a sequential scan as an index scan.

Just having a normal index on the column will work if the database locale is C. In 7.4 you can create an index usable by LIKE even if the database locale isn't C, but I don't remember the exact syntax. You will be better off having just one index rather than 26 partial indexes.
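The syntax being half-remembered here is most likely the pattern operator classes added in 7.4; a minimal sketch, with placeholder table/column names:

-- An index declared with text_pattern_ops can be used for LIKE 'foo%'
-- prefix searches even when the database locale is not C.
-- (For varchar columns the opclass is varchar_pattern_ops.)
CREATE INDEX idx_field_pattern ON mytable (field text_pattern_ops);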
On Fri, 5 Nov 2004 11:51:59 -0600, Bruno Wolff III <bruno@wolff.to> wrote:
> On Fri, Nov 05, 2004 at 09:39:16 -0500,
>   Allen Landsidel <alandsidel@gmail.com> wrote:
> >
> > For some reason it's a requirement that partial wildcard searches are
> > done on this field, such as "SELECT ... WHERE field LIKE 'A%'"
> >
> > I thought an interesting way to do this would be to simply create
> > partial indexes for each letter on that field, and it works when the
> > query matches the WHERE clause in the index exactly like above. The
> > problem is thus:
>
> That may not help much except for prefixes that have a below-average
> number of occurrences. If you are going to be selecting 1/26 of the
> records, you are probably going to do about as well with a sequential
> scan as an index scan.

The thing isn't that I want 1/26th of the records, since the distribution is not exactly equal among different letters. More importantly, there are about 76 million rows currently, and for some reason I am being told by the people with the pointy hair that a query like "select foo,bar from table where foo like 'abc%';" is not an uncommon type of query to run. I don't know why it's common, and to be honest, I'm afraid to ask. ;)

With that many rows, and a normal index on the field, postgres figures the best option for say "I%" is not an index scan, but a sequential scan on the table, with a filter -- quite obviously this is slow as heck, and yes, I've run analyze several times and in fact have the vacuum analyze automated. With the partial index the index scan is used and the cost drops from 0..2million to 0..9000 -- a vast improvement.

So I'm going to go with the partial indexes, and have a total of 36 of them -- A-Z and 0-9.

> Just having a normal index on the column will work if the database locale
> is C. In 7.4 you can create an index usable by LIKE even if the database
> locale isn't C, but I don't remember the exact syntax. You will be better
> off having just one index rather than 26 partial indexes.

I haven't written a line of C in years, and it was never my strong suit, so despite all my years doing development and sysadminning, the locale stuff is still something of a mystery to me. The locale though is C, the default, and the field will for the time being at least be storing only ascii strings -- no unicode, other character sets, or anything funky like that.

-Allen
> With that many rows, and a normal index on the field,
> postgres figures the best option for say "I%" is not an index
> scan, but a sequential scan on the table, with a filter --
> quite obviously this is slow as heck, and yes, I've run
> analyze several times and in fact have the vacuum analyze automated.

Ah, so "like 'I%'" uses a very slow seq scan, but "like 'ABC%'" uses an ordinary index OK? If so then...

The planner would usually assume (from what Tom usually says) that 1/26 selectivity isn't worth doing an index scan for, but in your case it's wrong (maybe because the rows are very big?).

You may be able to get the planner to go for an index scan on "like 'I%'" by tweaking the foo_cost variables in postgresql.conf.

Or you could have the app rewrite "like 'I%'" to "like 'IA%' or like 'IB%' ...", or do that as a stored proc.

> With the partial index the index scan is used and the cost
> drops from 0..2million to 0..9000 -- a vast improvement.

So there are really only 9000 rows out of 76 million starting with 'I'? How about combining some techniques - you could create an index on the first two chars of the field (should be selective enough to give an index scan), select from that, and select the actual data with the like clause.

CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
CREATE INDEX idx_all ON table (field);

SELECT field FROM
  (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE') AS approx
WHERE field LIKE 'DE%';

Any good?
Allen Landsidel <alandsidel@gmail.com> writes:
> With that many rows, and a normal index on the field, postgres figures
> the best option for say "I%" is not an index scan, but a sequential
> scan on the table, with a filter -- quite obviously this is slow as
> heck, and yes, I've run analyze several times and in fact have the
> vacuum analyze automated.

> With the partial index the index scan is used and the cost drops from
> 0..2million to 0..9000 -- a vast improvement.

Hmm.  This suggests to me that you're using a non-C locale and so a plain index *can't* be used for a LIKE query.  Can you force it to use an indexscan by setting enable_seqscan = false?  If not then you've got a locale problem.  As someone else pointed out, this can be worked around by creating an index with the right operator class.

			regards, tom lane
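A quick sketch of the test being suggested (enable_seqscan is a per-session setting, so nothing else running on the box is affected; the table and column names here follow the examples later in the thread):

-- Temporarily discourage sequential scans for this session only,
-- then see whether the plain index can be used for the LIKE at all.
SET enable_seqscan = false;
EXPLAIN SELECT test_name FROM test WHERE test_name LIKE 'A%';
SET enable_seqscan = true;   -- restore the default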
On Fri, 05 Nov 2004 14:57:40 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm.  This suggests to me that you're using a non-C locale and so a
> plain index *can't* be used for a LIKE query.  Can you force it to use
> an indexscan by setting enable_seqscan = false?  If not then you've got
> a locale problem.  As someone else pointed out, this can be worked
> around by creating an index with the right operator class.

Tom, disabling seqscan does cause it to use the index. With seqscan enabled however, "AB%" will use the index, but "A%" will not.

The estimated cost for the query is much higher without the partial indexes than it is with them, and the actual runtime of the query is definitely longer without the partial indexes.

The locale is set in the postgresql.conf file as per default, with..

# These settings are initialized by initdb -- they may be changed
lc_messages = 'C'    # locale for system error message strings
lc_monetary = 'C'    # locale for monetary formatting
lc_numeric = 'C'     # locale for number formatting
lc_time = 'C'        # locale for time formatting

-Allen
On Fri, 5 Nov 2004 18:34:23 -0000, Matt Clark <matt@ymogen.net> wrote:
> > With that many rows, and a normal index on the field,
> > postgres figures the best option for say "I%" is not an index
> > scan, but a sequential scan on the table, with a filter --
> > quite obviously this is slow as heck, and yes, I've run
> > analyze several times and in fact have the vacuum analyze automated.
>
> Ah, so "like 'I%'" uses a very slow seq scan, but "like 'ABC%'" uses an
> ordinary index OK? If so then...

That is correct.

> The planner would usually assume (from what Tom usually says) that 1/26
> selectivity isn't worth doing an index scan for, but in your case it's wrong
> (maybe because the rows are very big?)

The rows aren't big; it's a text field, a few ints, and a few timestamps. That's all. The text field is the one we're querying on here, and lengthwise it's typically not over 32 chars.

> You may be able to get the planner to go for an index scan on "like 'I%'" by
> tweaking the foo_cost variables in postgresql.conf

That's true, but I'd rather not; there are times when the seqscan will have a faster net result (for other queries) and I'd rather not have them suffer.

> Or you could have the app rewrite "like 'I%'" to "like 'IA%' or like 'IB%'
> ... ", or do that as a stored proc.

Holy cow. Yeah, that seems a little outrageous. It would be cleaner looking in "\d table" than having all these indexes, at the cost of having one very ugly query.

> > With the partial index the index scan is used and the cost
> > drops from 0..2million to 0..9000 -- a vast improvement.
>
> So there are really only 9000 rows out of 76 million starting with 'I'? How
> about combining some techniques - you could create an index on the first two
> chars of the field (should be selective enough to give an index scan),
> select from that, and select the actual data with the like clause.

I was talking about the cost, not the number of rows. About 74,000 rows are returned, but the query only takes about 8 seconds to run -- with the partial index in place.

> CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
> CREATE INDEX idx_all ON table (field);
> SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE')
> AS approx WHERE field LIKE 'DE%';

That looks like a pretty slick way to create an index; I didn't know there was such a way to do it.. but it appears that this will not work with queries where the WHERE clause wants to find substrings longer than 2 characters.

I will give it a try and see how it goes, though I think I'm fairly "settled" on creating all the other indexes, unless there is some specific reason I shouldn't -- they are used in all cases where the substring is >= 1 character, so long as I make sure the first where clause (or inner select in a subquery) is the most ambiguous from an index standpoint.

Going back to the initial problem -- having only one large, complete index on the table (no partial indexes), the query "SELECT field FROM table WHERE field LIKE 'A%';" does not use the index. The query "SELECT field FROM table WHERE field LIKE 'AB%';" however, does use the single large index if it exists.
Adding the partial index "CREATE INDEX idx_table_substrfield_A ON table (field) WHERE field LIKE 'A%';" causes all queries with substrings of any length to do index scans, provided I issue the query as:

SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AB%';
 -- or even --
SELECT field FROM table WHERE field LIKE 'A%';

The latter query, without the partial index described, does a sequential scan on the table itself instead of an index scan.

-Allen
Allen Landsidel <alandsidel@gmail.com> writes:
> With seqscan enabled however, "AB%" will use the index, but "A%" will not.

> The estimated cost for the query is much higher without the partial
> indexes than it is with them, and the actual runtime of the query is
> definitely longer without the partial indexes.

OK.  This suggests that the planner is drastically misestimating the selectivity of the 'A%' clause, which seems odd to me since in principle it could get that fairly well from the ANALYZE histogram.  But it could well be that you need to increase the resolution of the histogram --- see ALTER TABLE SET STATISTICS.

Did you ever show us EXPLAIN ANALYZE results for this query?

			regards, tom lane
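For reference, a sketch of the histogram change being pointed at, using the test/test_name names from the EXPLAIN output that follows; 1000 is an arbitrary illustrative target (the default in this era was 10):

-- Raise the per-column statistics target, then re-gather statistics so
-- the planner's histogram for the 'A%' selectivity estimate gets finer.
ALTER TABLE test ALTER COLUMN test_name SET STATISTICS 1000;
ANALYZE test;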
On Fri, 05 Nov 2004 16:08:56 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Allen Landsidel <alandsidel@gmail.com> writes:
> > With seqscan enabled however, "AB%" will use the index, but "A%" will not.
>
> > The estimated cost for the query is much higher without the partial
> > indexes than it is with them, and the actual runtime of the query is
> > definitely longer without the partial indexes.
>
> OK.  This suggests that the planner is drastically misestimating
> the selectivity of the 'A%' clause, which seems odd to me since in
> principle it could get that fairly well from the ANALYZE histogram.
> But it could well be that you need to increase the resolution of the
> histogram --- see ALTER TABLE SET STATISTICS.

I will look into this.

> Did you ever show us EXPLAIN ANALYZE results for this query?

No, I didn't. I am running it now without the partial index to give you the results, but it (the 'A%' problem query) has been running pretty much since I got this message (an hour ago) and is still not finished. The EXPLAIN results without the ANALYZE will have to suffice until it's done; then I can re-add the index and run it again, so you have both to compare.

First two queries run where both the main index and the 'A%' index exist:

-- QUERY 1
search=# explain
search-# SELECT test_name FROM test WHERE test_name LIKE 'A%';
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Index Scan using test_name_idx_a on "test"  (cost=0.00..8605.88 rows=391208 width=20)
   Index Cond: ((test_name >= 'A'::text) AND (test_name < 'B'::text))
   Filter: (test_name ~~ 'A%'::text)
(3 rows)

Time: 16.507 ms

-- QUERY 2
search=# explain
search-# SELECT test_name FROM test WHERE test_name LIKE 'A%' AND test_name LIKE 'AB%';
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Index Scan using test_name_idx_a on "test"  (cost=0.00..113.79 rows=28 width=20)
   Index Cond: ((test_name >= 'A'::text) AND (test_name < 'B'::text) AND (test_name >= 'AB'::text) AND (test_name < 'AC'::text))
   Filter: ((test_name ~~ 'A%'::text) AND (test_name ~~ 'AB%'::text))
(3 rows)

Time: 3.197 ms

Ok, now the same two queries after a DROP INDEX test_name_idx_a;

search=# explain
search-# SELECT test_name FROM test WHERE test_name LIKE 'A%';
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Index Scan using test_name_unique on "test"  (cost=0.00..1568918.66 rows=391208 width=20)
   Index Cond: ((test_name >= 'A'::text) AND (test_name < 'B'::text))
   Filter: (test_name ~~ 'A%'::text)
(3 rows)

Time: 2.470 ms

search=# explain
search-# SELECT test_name FROM test WHERE test_name LIKE 'AB%';
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Index Scan using test_name_unique on "test"  (cost=0.00..20379.49 rows=5081 width=20)
   Index Cond: ((test_name >= 'AB'::text) AND (test_name < 'AC'::text))
   Filter: (test_name ~~ 'AB%'::text)
(3 rows)

Time: 2.489 ms

------------------

Copying just the costs you can see the vast difference...
 Index Scan using test_name_unique on "test"  (cost=0.00..1568918.66 rows=391208 width=20)
 Index Scan using test_name_unique on "test"  (cost=0.00..20379.49 rows=5081 width=20)

vs

 Index Scan using test_name_idx_a on "test"  (cost=0.00..8605.88 rows=391208 width=20)
 Index Scan using test_name_idx_a on "test"  (cost=0.00..113.79 rows=28 width=20)

Lastly, no, neither of these row guesstimates is correct.. I'll get back and tell you how much they're off by, if it's important, once this query is done.

The odd thing is it used the index scan here each time -- that has not always been the case with the main unique index; it's trying to make a liar out of me, heh. I'm used to the estimates and plan changing from one vacuum analyze to the next, even without any inserts or updates between.. the index scan is always used, however, when I have the partial indexes in place, and something like..

CREATE TEMP TABLE t1 AS
  SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AA%';

runs in 6-8 seconds as well, with a bit under 100k records.

-Allen
>> So there are really only 9000 rows out of 76 million starting with 'I'? How
>> about combining some techniques - you could create an index on the first two
>> chars of the field (should be selective enough to give an index scan),
>> select from that, and select the actual data with the like clause.
>
> I was talking about the cost, not the number of rows. About 74,000
> rows are returned, but the query only takes about 8 seconds to run --
> with the partial index in place.

Well, 74000/76000000 ~= 0.1%, way less than 1/26, so no surprise that an indexscan is better, and also no surprise that the planner can't know that I is such an uncommon initial char.

>> CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
>> CREATE INDEX idx_all ON table (field);
>> SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE')
>> AS approx WHERE field LIKE 'DE%';
>
> That looks like a pretty slick way to create an index; I didn't know
> there was such a way to do it.. but it appears that this will not work
> with queries where the WHERE clause wants to find substrings longer
> than 2 characters.

I don't see why not; it just uses the functional index to grab the 1/(ascii_chars^2) of the rows that are of obvious interest, and then uses the standard index to filter that set.. Where it won't work is where you just want one initial char! Which is why I suggested the silly query rewrite...

> Going back to the initial problem -- having only one large, complete
> index on the table (no partial indexes), the query "SELECT field FROM
> table WHERE field LIKE 'A%';" does not use the index. The query
> "SELECT field FROM table WHERE field LIKE 'AB%';" however, does use
> the single large index if it exists.

If you were planning the query, what would you do? Assuming we're talking about A-Z as possible first chars, and assuming we don't know the distribution of those chars, then we have to assume 1/26 probability of each char, so a seq scan makes sense. Whereas like 'JK%' should only pull 1/500 rows.

> Adding the partial index "CREATE INDEX idx_table_substrfield_A ON
> table (field) WHERE field LIKE 'A%';" causes all queries with
> substrings of any length to do index scans, provided I issue the query
> as:
>
> SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AB%';
>  -- or even --
> SELECT field FROM table WHERE field LIKE 'A%';
>
> The latter query, without the partial index described, does a
> sequential scan on the table itself instead of an index scan.

Yes, because (I assume, Tom will no doubt clarify/correct) by creating the partial indices you create a lot more information about the distribution of the first char - either that, or the planner simply always uses an exactly matching partial index if available.

I _think_ that creating 26 partial indexes on '?%' is essentially the same thing as creating one functional index on substr(field,1,1), just messier, unless the partial indexes cause the planner to do something special...

M
Matt Clark <matt@ymogen.net> writes:
> Well, 74000/76000000 ~= 0.1%, way less than 1/26, so no surprise that an
> indexscan is better, and also no surprise that the planner can't know
> that I is such an uncommon initial char.

But it *can* know that, at least given adequate ANALYZE statistics.  I'm pretty convinced that the basic answer to Allen's problem is to increase the histogram size.  How large he needs to make it is not clear --- obviously his data distribution is not uniform, but I don't have a fix on how badly non-uniform.

			regards, tom lane
On Fri, 05 Nov 2004 23:04:23 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Matt Clark <matt@ymogen.net> writes:
> > Well, 74000/76000000 ~= 0.1%, way less than 1/26, so no surprise that an
> > indexscan is better, and also no surprise that the planner can't know
> > that I is such an uncommon initial char.
>
> But it *can* know that, at least given adequate ANALYZE statistics.
> I'm pretty convinced that the basic answer to Allen's problem is to
> increase the histogram size.  How large he needs to make it is not
> clear --- obviously his data distribution is not uniform, but I don't
> have a fix on how badly non-uniform.

Tom, just an update: it's now 2am.. several hours since I started that EXPLAIN ANALYZE and it still hasn't finished, so I've aborted it. I will do the example with the more precise substring instead to illustrate the performance differences, both with and without the substring index, and report back here.

I'm also interested in something someone else posted, namely that the 36 indexes I have, "A%" through "Z%" and "0%" through "9%", could be replaced with a single index like:

CREATE INDEX idx_table_field_substr ON table (substr(field, 1, 1));

I'm wondering, histogram and other information aside, will this function as well (or better) than creating all the individual indexes?
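For what it's worth, a minimal sketch of how that single-index alternative would be used (names follow the testtable/sname schema below; note that, at least in this era of PG, the query has to repeat the indexed expression for the planner to consider the index):

-- One expression index instead of 36 partial indexes.
CREATE INDEX idx_sname_first ON testtable (substr(sname, 1, 1));

-- The WHERE clause repeats the indexed expression; the LIKE
-- condition then filters the matched rows further.
SELECT sname FROM testtable
WHERE substr(sname, 1, 1) = 'A' AND sname LIKE 'AB%';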
Ok, you thought maybe this thread died or got abandoned in the face of all the senseless trolling and spam going on.. you were wrong.. ;)

I thought though I'd start over trying to explain what's going on. I've gone through some dumps, and recreation of the database with some different filesystem options and whatnot, and starting over fresh here's the situation.

First, the structure.

CREATE TABLE testtable (
  nid serial UNIQUE NOT NULL,
  sname text NOT NULL,
  iother int4
);

CREATE UNIQUE INDEX idx_sname_unique ON testtable (sname);

-----

With the above, the query "SELECT sname FROM testtable WHERE sname LIKE 'A%';" DOES use an index scan on idx_sname_unique -- sometimes. Other times, the planner thinks a sequential scan would be better.

The index is large. There are over 70 million rows in this table. The estimated cost and so forth from EXPLAIN on the above query is way off as well, but I expect that to be the case considering the size of the table -- perhaps there is a tunable in the statistics-gathering backend to fix this?

My goal, obviously, was to make queries of the above type, as well as more refined ones such as "... LIKE 'AB%';", faster. With this goal in mind, I thought that creating several indexes (36 of them) would speed things up -- one index per alphanumeric start character, via..

CREATE INDEX idx_sname_suba ON testtable (sname) WHERE sname LIKE 'A%';
CREATE INDEX idx_sname_subb ON testtable (sname) WHERE sname LIKE 'B%';
...
CREATE INDEX idx_sname_subz ON testtable (sname) WHERE sname LIKE 'Z%';

(also including 0..9)

I've wracked my brain trying to come up with other ways of doing this, including partitioning the table, and trying the suggestions here such as "substr(1,1)" in the index creation instead of creating many distinct indexes. None of these seems to speed up the queries enough to make them "acceptable" when it comes to runtimes.

My data from before was somehow in error.. not sure why. At this point, using one index vs. the other, the runtimes are about the same.

search=# explain analyze
search-# SELECT sname FROM
search-#   (SELECT sname FROM testtable WHERE sname LIKE 'A%') AS subq
search-# WHERE sname LIKE 'AA%';
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using sname_a on "testtable"  (cost=0.00..189.41 rows=47 width=20) (actual time=16.219..547053.251 rows=74612 loops=1)
   Index Cond: ((sname >= 'A'::text) AND (sname < 'B'::text) AND (sname >= 'AA'::text) AND (sname < 'AB'::text))
   Filter: ((sname ~~ 'A%'::text) AND (sname ~~ 'AA%'::text))
 Total runtime: 547454.939 ms
(4 rows)

Time: 547458.216 ms

search=# explain analyze
search-# SELECT sname FROM testtable WHERE sname LIKE 'AA%';
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using sname_unique on "testtable"  (cost=0.00..34453.74 rows=8620 width=20) (actual time=77.004..537065.079 rows=74612 loops=1)
   Index Cond: ((sname >= 'AA'::text) AND (sname < 'AB'::text))
   Filter: (sname ~~ 'AA%'::text)
 Total runtime: 537477.737 ms
(4 rows)

Time: 537480.571 ms
Allen,

> Ok, you thought maybe this thread died or got abandoned in the face of
> all the senseless trolling and spam going on.. you were wrong.. ;)
>
> I thought though I'd start over trying to explain what's going on.
> I've gone through some dumps, and recreation of the database with some
> different filesystem options and whatnot, and starting over fresh
> here's the situation.

I can't find the beginning of this thread. What's your sort_mem? Shared_buffers?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
On Thu, 11 Nov 2004 10:52:43 -0800, Josh Berkus <josh@agliodbs.com> wrote:
> Allen,
>
> > Ok, you thought maybe this thread died or got abandoned in the face of
> > all the senseless trolling and spam going on.. you were wrong.. ;)
> >
> > I thought though I'd start over trying to explain what's going on.
> > I've gone through some dumps, and recreation of the database with some
> > different filesystem options and whatnot, and starting over fresh
> > here's the situation.
>
> I can't find the beginning of this thread. What's your sort_mem?
> Shared_buffers?

Currently sort_mem is 64MB and shared_buffers is 256MB. The box is a dual 800 with 2GB physical, running FreeBSD 4.10-STABLE, single U2W SCSI hdd.
Allen Landsidel <alandsidel@gmail.com> writes:

>  Index Scan using sname_unique on "testtable"  (cost=0.00..34453.74
>  rows=8620 width=20) (actual time=77.004..537065.079 rows=74612
>  loops=1)
>    Index Cond: ((sname >= 'AA'::text) AND (sname < 'AB'::text))
>    Filter: (sname ~~ 'AA%'::text)
>  Total runtime: 537477.737 ms
> (4 rows)
>
> Time: 537480.571 ms

Nothing you're going to do to the query is going to come up with a more effective plan than this. It's using the index after all. It's never going to be lightning fast because it has to process 75k rows.

However 75k rows shouldn't be taking nearly 10 minutes. It should be taking about 10 seconds.

The 77ms before finding the first record is a bit suspicious. Have you vacuumed this table regularly? Try a VACUUM FULL VERBOSE, and send the results. You might try to REINDEX it as well, though I doubt that would help.

Actually you might consider clustering the table on sname_unique. That would accomplish the same thing as the VACUUM FULL command and also speed up the index scan. And the optimizer knows (if you analyze afterwards) it so it should be more likely to pick the index scan. But currently you have to rerun cluster periodically.

--
greg
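A sketch of the clustering step being described, in the CLUSTER syntax of this era (index name first) and with the index/table names from the EXPLAIN output above; the ANALYZE afterwards lets the planner know about the new physical ordering:

-- Rewrite the table in sname_unique order (takes an exclusive lock
-- and needs scratch space for the copy), then refresh statistics.
CLUSTER sname_unique ON testtable;
ANALYZE testtable;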
On 11 Nov 2004 15:49:46 -0500, Greg Stark <gsstark@mit.edu> wrote:
> Allen Landsidel <alandsidel@gmail.com> writes:
>
> >  Index Scan using sname_unique on "testtable"  (cost=0.00..34453.74
> >  rows=8620 width=20) (actual time=77.004..537065.079 rows=74612
> >  loops=1)
> >    Index Cond: ((sname >= 'AA'::text) AND (sname < 'AB'::text))
> >    Filter: (sname ~~ 'AA%'::text)
> >  Total runtime: 537477.737 ms
> > (4 rows)
> >
> > Time: 537480.571 ms
>
> Nothing you're going to do to the query is going to come up with a more
> effective plan than this. It's using the index after all. It's never going to
> be lightning fast because it has to process 75k rows.
>
> However 75k rows shouldn't be taking nearly 10 minutes. It should be taking
> about 10 seconds.

That's my feeling as well. I thought the index was to blame because it will be quite large, possibly large enough to not fit in memory nor be quickly bursted up.

> The 77ms before finding the first record is a bit suspicious. Have you
> vacuumed this table regularly? Try a VACUUM FULL VERBOSE, and send the
> results. You might try to REINDEX it as well, though I doubt that would help.

This table is *brand spanking new* for lack of a better term. I have the data for it in a CSV. I loaded the CSV up, which takes a bit, then created the indexes, did a vacuum analyze verbose, and then posted the results above. I don't think running vacuum more times is going to change things, at least not without tweaking config settings that affect vacuum. Not a single row has been inserted or altered since the initial load.. it's just a test.

I can't give vacuum stats right now because the thing is reloading (again) with different newfs settings -- something I figure I have the time to fiddle with now, and seldom do at other times. These numbers though don't change much between 8K on up to 64K 'cluster' sizes. I'm trying it now with 8K page sizes, with 8K "minimum fragment" sizes. Should speed things up a tiny bit, but not enough to really affect this query.

Do you still see a need to have the output from the vacuum?

> Actually you might consider clustering the table on sname_unique. That would
> accomplish the same thing as the VACUUM FULL command and also speed up the
> index scan. And the optimizer knows (if you analyze afterwards) it so it
> should be more likely to pick the index scan. But currently you have to rerun
> cluster periodically.

Clustering is really unworkable in this situation. It would work now, in this limited test case, but using it if this were to go into production is unrealistic. It would have to happen fairly often, since this table is updated frequently, which will break the clustering quickly with MVCC.

Running it often.. well.. it has 70M+ rows, and the entire table is copied, reordered, and rewritten.. so that's a lot of 'scratch space' needed. Finally, clustering locks the table..

Something I'd already considered but quickly ruled out because of these reasons..

More ideas are welcome though. ;)

-Allen
Allen Landsidel <alandsidel@gmail.com> writes:
> Clustering is really unworkable in this situation.

Nonetheless, please do it in your test scenario, so we can see if it has any effect or not.

The speed you're getting works out to about 7.2 msec/row (537477 ms / 74612 rows), which would be about right if every single row fetch caused a disk seek, which seems improbable unless the table is just huge compared to your available RAM.

			regards, tom lane
>>>  Index Scan using sname_unique on "testtable"  (cost=0.00..34453.74
>>>  rows=8620 width=20) (actual time=77.004..537065.079 rows=74612
>>>  loops=1)
>>>    Index Cond: ((sname >= 'AA'::text) AND (sname < 'AB'::text))
>>>    Filter: (sname ~~ 'AA%'::text)
>>>  Total runtime: 537477.737 ms
>>> (4 rows)
>>>
>>> Time: 537480.571 ms
>>
>> Nothing you're going to do to the query is going to come up with a more
>> effective plan than this. It's using the index after all. It's never going to
>> be lightning fast because it has to process 75k rows.
>>
>> However 75k rows shouldn't be taking nearly 10 minutes. It should be taking
>> about 10 seconds.

I am confused about this statement. I have a table with 1.77 million rows that I use GiST indexes on (TSearch) and I can pull out of it in less than 2 seconds. Are you saying it should be taking 10 seconds because of the type of plan? 10 seconds seems like an awful long time for this.

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
On Thu, 11 Nov 2004 16:41:51 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Allen Landsidel <alandsidel@gmail.com> writes:
> > Clustering is really unworkable in this situation.
>
> Nonetheless, please do it in your test scenario, so we can see if it has
> any effect or not.

It did not, not enough to measure anyway, which does strike me as pretty odd.. Here's what I've got, after the cluster. Note that this is also on a new filesystem; as I said, I have been taking the chance to experiment. The other two results were from a filesystem with 64KB block size, 8KB fragment size. This one is 8KB and 8KB.

search=# explain analyze
search-# SELECT sname FROM testtable WHERE sname LIKE 'AA%';
                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using sname_unique on "testtable"  (cost=0.00..642138.83 rows=160399 width=20) (actual time=0.088..514438.470 rows=74612 loops=1)
   Index Cond: ((sname >= 'AA'::text) AND (sname < 'AB'::text))
   Filter: (sname ~~ 'AA%'::text)
 Total runtime: 514818.837 ms
(4 rows)

Time: 514821.993 ms

> The speed you're getting works out to about 7.2 msec/row (537477 ms /
> 74612 rows), which would be about right if every single row fetch
> caused a disk seek, which seems improbable unless the table is just
> huge compared to your available RAM.

The CSV for the table is "huge" but not compared to RAM. The dump of the database in native/binary format is ~1GB; the database currently has only this table and the system stuff.

The time to fetch the first row was much faster with the cluster in place, but after that, it's pretty much the same. 537s vs. 515s.
Allen Landsidel <alandsidel@gmail.com> writes:
> On Thu, 11 Nov 2004 16:41:51 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Allen Landsidel <alandsidel@gmail.com> writes:
> > > Clustering is really unworkable in this situation.
> >
> > Nonetheless, please do it in your test scenario, so we can see if it has
> > any effect or not.
>
> It did not, not enough to measure anyway, which does strike me as
> pretty odd.

Me too.  Maybe we are barking up the wrong tree entirely, because I really expected to see a significant change.

Lets start from first principles.  While you are running this query, what sort of output do you get from "vmstat 1"?  I'm wondering if it's I/O bound or CPU bound ...

			regards, tom lane
On Fri, 12 Nov 2004 17:35:00 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Me too.  Maybe we are barking up the wrong tree entirely, because I
> really expected to see a significant change.
>
> Lets start from first principles.  While you are running this query,
> what sort of output do you get from "vmstat 1"?  I'm wondering if it's
> I/O bound or CPU bound ...

I am running systat -vmstat 1 constantly on the box.. it's almost always I/O bound.. and the numbers are far lower than what I expect them to be, under 1MB/s. bonnie++ shows decent scores so.. I'm not sure what's going on.

[allen@dbtest01 /mnt_db/work#] bonnie++ -d /mnt_db/work -c 2 -u nobody
Using uid:65534, gid:65534.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.

Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   2     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
dbtest01.distr 300M   100  98 17426  21 17125  18   197  98 182178  99  2027  42
Latency             96208us   594ms   472ms 56751us 15691us  3710ms

Version 1.93c       ------Sequential Create------ --------Random Create--------
dbtest01.distribute -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 12932  90 +++++ +++ 20035  98 11912  91 +++++ +++ 13074  93
Latency             26691us    268us 18789us 26755us 13586us 25039us

1.93c,1.93c,dbtest01.distributedmail.com,2,1100269160,300M,,100,98,17426,21,17125,18,197,98,182178,99,2027,42,16,,,,,12932,90,+++++,+++,20035,98,11912,91,+++++,+++,13074,93,96208us,594ms,472ms,56751us,15691us,3710ms,26691us,268us,18789us,26755us,13586us,25039us

Looking at these numbers, obviously things could be a bit quicker, but it doesn't look slow enough to my eyes or experience to account for what I'm seeing with the query performance.. During the query, swap doesn't get touched, the CPUs are mostly idle, but the disk activity seems to be maxed at under 1MB/s, 100% busy.

To refresh and extend..

The box is FreeBSD 4.10-STABLE, dual 800MHz PIII's, 2GB of memory.

Relevant kernel options:

maxusers 512
...
options SYSVSHM
options SHMMAXPGS=262144
options SHMSEG=512
options SHMMNI=512
options SYSVSEM
options SEMMNI=512
options SEMMNS=1024
options SEMMNU=512
options SEMMAP=512
...

nothing custom going on in /etc/sysctl.conf

Filesystem is..

/dev/da1s1e on /mnt_db (ufs, local, noatime, soft-updates)

And, from my postgresql.conf..

shared_buffers = 32768    # min 16, at least max_connections*2, 8KB each
sort_mem = 65536          # min 64, size in KB
vacuum_mem = 65536        # min 1024, size in KB
...
max_fsm_pages = 20000     # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 1000  # min 100, ~50 bytes each
...
stats_start_collector = true
stats_command_string = true
stats_block_level = false
stats_row_level = true
stats_reset_on_server_start = true

Thanks for helping me out with this, Tom, and everyone else.

I suppose it's possible that something could be physically wrong with the drive, but I'm not seeing anything in syslog. I'm going to poke around with camcontrol looking for any bad sectors / remapped stuff while I wait for replies.

-Allen