Thread: indexing words slow
> Hi,
>
> I have done a little more testing, and the performance bottleneck seems
> definitely to be memory related. Note that it does not really seem to be
> dependent on buffer settings, but really on disk caches.
>
> Additional info: the index on this table is around 155 MB.
>
> Now, if I do a count(*) on '^rol', after the second query, this takes
> around 1 second, and returns 2528.
>
> On the other hand, if I do a count(*) on '^ric', this consistently takes
> around 1:30 mins, no matter how often I run it. This returns 7866.
>
> A search on count(*) of '^lling' and '^tones' takes around 2.5 secs
> after running it several times.

Wow, this makes no sense to me. How can this happen? 'rol' is fast, but
'ric', which returns 3 times as many rows, takes 1.5 minutes no matter how
often it is run? And 64MB is a lot of memory.

Does profiling show that 'rol' and 'ric' take the same amount of CPU time
in the backend? Is EXPLAIN identical for these two? If so, there must be
something different about the I/O demands of the two queries, but I can't
really understand what that difference could be. If I had to take a guess,
it is that 'ric' is somehow causing a cache flush for each row it
retrieves, while 'rol' fits in the cache.

Several million rows is a good-sized table, but certainly not too big for
PostgreSQL. I am sure people have larger tables.

I am kind of tempted to test it on my machine here. I have BSD/OS with a
PP200 and 64MB RAM. Can you send me something I can recreate here?
Perhaps put it on our ftp site?

> Running different queries in between affects these times.
>
> My computer has 64 MB of RAM, and I'm running X (Linux 2.0.30).
>
> So, it seems to me that with this amount of memory, my system is usable
> only for smaller tables (i.e. no 550,000 rows in the main table, and no
> 4,500,000 rows in the 'index' table). If I want better performance for
> this setup, I need faster disks and more (how much more?) memory. Wish I
> could test this somehow.... Maybe I can ask my sysop at the university
> if I may test this on the dual PPro with 256 MB, but I don't think
> so....
>
> Maarten

--
Bruce Momjian                   | 830 Blythe Avenue
maillist@candle.pha.pa.us       | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup.    | (610) 853-3000(h)
On Wed, 11 Mar 1998, Bruce Momjian wrote:

> > A search on count(*) of '^lling' and '^tones' takes around 2.5 secs
> > after running it several times.

Btw, to make things clearer on what I mean with '^lling' and '^tones', I
really mean "'^lling' *AND* '^tones'", i.e. a join :) Actually pretty
good, don't ya think? :)

> Wow, this makes no sense to me. How can this happen? 'rol' is fast, but
> 'ric' which returns 3 times as many rows takes 1.5 minutes, no matter
> how often it is run? And 64MB is a lot of memory.
>
> Does profiling show that 'rol' and 'ric' take the same amount of CPU
> time in the backend? Is EXPLAIN identical for these two?

Profiling shows different times (I guess you're talking about the
cumulative time for the query?). 'rol' takes 0.68 secs cumulative, 'ric'
takes 1.69 secs cumulative. The EXPLAINs are identical.

> I am kind of tempted to test it on my machine here. I have BSD/OS with
> a PP200 and 64MB RAM. Can you send me something I can recreate here?
> Perhaps put it on our ftp site?

I *can* put something up through http on my university machine, I guess.
It'll be a pg_dump of that table. Note however that this is generated
from a commercial database, so please promise me to delete it after
testing :) Don't wanna get into any legal trouble (ah what, who's gonna
find out :). I'll prepare a mail for you (i.e. Bruce) either tonight or
tomorrow morning (CET), with the source and a 'psql script' to set things
up, as a tar file. The data itself you'll have to get through the web; it
is much too large to mail (over 10 MB with gzip -9).

Maarten

_____________________________________________________________________________
| TU Delft, The Netherlands, Faculty of Information Technology and Systems |
|               Department of Electrical Engineering                       |
|          Computer Architecture and Digital Technique section             |
|                      M.Boekhold@et.tudelft.nl                            |
-----------------------------------------------------------------------------
Maarten Boekhold wrote:
> btw. to make things clearer on what I mean with '^lling' and '^tones',
> I really mean "'^lling' *AND* '^tones'", ie. a join :) actually pretty
> good don't ya think? :)

This sounds like an unsatisfiable query; perhaps if the database figured
this out, it could return zero rows without even hitting the index. If
the first item matched, the first character is an 'l'; if the second
matches, a 't'. It can't be both an 'l' and a 't'!

We could also do this for some other queries, like:

    select ... from ... where a<b and b<c and c<d

If a and d are parameters to the query (constants), and d<a, then there
can never be a row returned.

Ocie
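Neither shortcut existed in the planner at the time; the idea can be sketched outside the database. Below is a toy Python model of the two cases above (the function names are made up for illustration, not anything in Postgres):

```python
# Toy model (not the Postgres planner) of the shortcut suggested above.

# Case 1: two anchored prefix patterns on the SAME column can only both
# match if one prefix is a prefix of the other.
def prefixes_compatible(p1, p2):
    return p1.startswith(p2) or p2.startswith(p1)

# '^lling' and '^tones' on one column: no string starts with both.
assert not prefixes_compatible("lling", "tones")
# '^rol' and '^rolling': every 'rolling' match is also a 'rol' match.
assert prefixes_compatible("rol", "rolling")

# Case 2: "a < b and b < c and c < d" implies a < d, so if the
# constants a and d satisfy d <= a, no row can ever be returned.
def chain_satisfiable(a_const, d_const):
    return a_const < d_const

assert not chain_satisfiable(10, 5)   # contradictory: return zero rows
assert chain_satisfiable(5, 10)       # possibly satisfiable
```

As the next message notes, the real query uses two instances of the table, so the contradiction does not actually apply there.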
> > btw. to make things clearer on what I mean with '^lling' and
> > '^tones', I really mean "'^lling' *AND* '^tones'", ie. a join :)
> > actually pretty good don't ya think? :)
>
> This sounds like an unsatisfyable query, perhaps if the database
> figured this out, it could return zero rows without even hitting the
> index. If the first item matched, the first character is an 'l', if
> the second matches, a 't'. It can't be both an 'l' and a 't'!

He has two instances of the table, and is looking for index entries with
both words. We have only been sharing part of the conversation.

--
Bruce Momjian
On Wed, 11 Mar 1998, ocie@paracel.com wrote:

> This sounds like an unsatisfyable query, perhaps if the database
> figured this out, it could return zero rows without even hitting the
> index. If the first item matched, the first character is an 'l', if
> the second matches, a 't'. It can't be both an 'l' and a 't'!

OK, I wasn't clear enough:

    select count(*) from table t1, table t2
    where t1.string ~ '^lling' and t2.string ~ '^tones';

Of course this can be done (note that this is a table with 4,500,000
rows).

Maarten
> > > A search on count(*) of '^lling' and '^tones' takes around 2.5 secs
> > > after running it several times.

OK, I have the artist_fti table with 4.5 million rows, with an index
artist_fti_idx on artist_fti.string. I don't have a 'product' table
because I don't have the disk space, but that is not really a problem for
testing.

I used the product table to populate the artist_fti, then deleted the
table so I could create the artist_fti_idx index. Single table, no join.

I am running on BSD/OS 3.1 PP200 with SCSI Ultra Barracuda disks.

I am seeing the same thing you are. 'lling' and 'tones' take 11-22
seconds on the first few runs, then return almost immediately. If I do
another query and come back to the first one, I have the same slowness,
with the disk light being on almost constantly. Certain queries seem to
lend themselves to speeding up, while 'rol' never seems to get really
fast.

I have to conclude that because of the way this table is created by
slicing words, its layout is almost random. The table is 272MB, and with
8k/page, that is 34,000 pages. If we are going for 2,500 rows, odds are
that each row is in a separate page. So, to do the query, we are
retrieving 2,500 8k pages, or 20MB of random data on the disk. How long
does it take to issue 2,500 disk requests that are scattered all over the
disk? Probably 11-22 seconds.

My OS only lets me have 400 8k buffers, or 3.2MB of buffer. As we are
transferring 2,500 8k pages or 20MB of data, it is no surprise that the
buffer cache doesn't help much. Sometimes the data is grouped together on
the disk, and that is why some queries are fast, but often it is not; and
certainly in a situation where you are looking for two words to appear on
the same row, they are not on adjacent pages.

Just started running CLUSTER, and it appears to be taking forever. Does
not seem to be using the disk, but a lot of CPU. It appears to be caught
in an endless loop. I can attach to a running process, so when I attach
to the backend, I see:

---------------------------------------------------------------------------
$ gdb /u/pg/bin/postgres 29755
GDB 4.16 (i386-unknown-bsdi3.0), Copyright 1996 Free Software Foundation, Inc...
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
/usr/local/src/pgsql/pgsql/src/backend/commands/29755: No such file or directory.
Attaching to program `/u/pg/bin/postgres', process 29755
0x94999 in WaitIO (buf=0x4c0ccc8, spinlock=0) at bufmgr.c:1107
1107            S_LOCK(&(buf->io_in_progress_lock));
---------------------------------------------------------------------------

Stepping to the next statement shows it is just hung here. The backtrace
shows:

---------------------------------------------------------------------------
#0  0x9499b in WaitIO (buf=0x4c0ccc8, spinlock=0) at bufmgr.c:1107
#1  0x94109 in BufferAlloc (reln=0x19df90, blockNum=11324,
    foundPtr=0xefbfb903 "\001\030����=\t", bufferLockHeld=0) at bufmgr.c:400
#2  0x93e91 in ReadBufferWithBufferLock (reln=0x19df90, blockNum=11324,
    bufferLockHeld=0) at bufmgr.c:255
#3  0x93dee in ReadBuffer (reln=0x19df90, blockNum=11324) at bufmgr.c:174
#4  0xb28f in heap_fetch (relation=0x19df90, seeself=0 '\000', tid=0x1b2256,
    b=0xefbfb974) at heapam.c:1050
#5  0x303d5 in rebuildheap (OIDNewHeap=11466400, OIDOldHeap=6424406,
    OIDOldIndex=11466368) at cluster.c:357
#6  0x30111 in cluster (oldrelname=0x112790 "artist_fti",
    oldindexname=0x10cdd0 "artist_fti_idx") at cluster.c:160
#7  0xa29c1 in ProcessUtility (parsetree=0x1127b0, dest=Remote)
    at utility.c:626
---------------------------------------------------------------------------

Nothing else is happening on my machine, so it must be locked against
itself. I will check into this. CLUSTER may be misbehaving on your
machine too.

--
Bruce Momjian
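The page and transfer numbers in the report above check out arithmetically; the random-I/O rates used below are assumed figures for disks of that era, not measurements from the thread:

```python
# Back-of-the-envelope check of the I/O arithmetic in the report above.
# The random-I/O rates are assumptions (a late-90s SCSI disk handled
# very roughly 100-200 random requests per second).

TABLE_BYTES = 272 * 1024 * 1024    # 272MB table
PAGE_BYTES = 8 * 1024              # 8k pages
MATCHING_ROWS = 2500               # ~1 row per page, per the message

pages = TABLE_BYTES // PAGE_BYTES
assert pages == 34816              # "34,000 pages", rounded

random_read_mb = MATCHING_ROWS * PAGE_BYTES / (1024 * 1024)
assert round(random_read_mb) == 20 # "20MB of random data"

# 2,500 scattered requests at 120-220 requests/sec:
fast = MATCHING_ROWS / 220
slow = MATCHING_ROWS / 120
print(round(fast), "-", round(slow), "seconds")  # ~11-21 seconds
```

The assumed rates land right on the 11-22 second range observed, which supports the seek-bound explanation.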
> Just started running CLUSTER, and it appears to be taking forever. Does
> not seem to be using the disk, but a lot of CPU. It appears to be
> caught in an endless loop.

OK, I have an answer for you.

First, the CLUSTER problem I was having was some fluke, probably
something else that got stuck somewhere. Not a bug.

Second, CLUSTER takes forever because it is moving all over the disk
retrieving each row in order.

Third, I should have been able to solve this for you sooner. The issues
of slowness you are seeing are the exact issues I dealt with five years
ago when I designed this fragmentation system. I was so excited that you
could develop triggers to slice the words that I had forgotten the other
issues. OK, here it is.

As part of this fragmentation job, EVERY NIGHT I re-slice the user
strings and dump them into a flat file. I then load them into Ingres as a
heap table, then modify the table to ISAM. You may ask, why every night?
Well, the problem is I load the data at 100% fill-factor, and ISAM
rapidly becomes fragmented with insertions/deletions. Btree does not
become fragmented, but the problem there is that btree use was very slow.
I believe this is because ISAM is basically a SORTED HEAP with an index,
so everything is close and packed tight. Btree doesn't seem to do that as
well. It is more spread out.

Users complain about the nightly re-index, but I tell them, "Hey, you are
searching for fragments of words in 200k entries in 6 seconds. I can
design it without the night job, but your searches all day will take much
longer." That stops the conversation.

So, to your solution. CLUSTER is too slow. The disk is going crazy moving
single rows into the temp table. I recommend doing a COPY of artist_fti
to a flat file, doing a Unix 'sort' on the flat file, then re-loading the
data into artist_fti, then putting the index on the table and running
vacuum.

I have done this, and now all searches are instantaneous THE FIRST TIME,
and every time.

With this change, I am anxious to hear how fast you can now do your
multi-word searches. Daily changes will not really impact performance
because they are a small part of the total search, but this process of
COPY/sort/reload/reindex will need to be done on a regular basis to
maintain performance.

--
Bruce Momjian
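A toy simulation of why the sorted reload wins: the same 2,500 matching rows go from thousands of scattered pages to a couple dozen contiguous ones. The page counts come from the thread; the rows-per-page figure is an assumption:

```python
import random

random.seed(0)

PAGES = 34000        # table size from the thread
MATCHES = 2500       # rows matching one word fragment
ROWS_PER_PAGE = 128  # assumed: ~64-byte rows in an 8k page

# Unsorted layout: each matching row lands on an effectively random
# page, so nearly every match costs its own random 8k read.
scattered = {random.randrange(PAGES) for _ in range(MATCHES)}
assert len(scattered) > 2300

# After COPY out / Unix sort / reload, rows sharing a fragment prefix
# sit next to each other, so the matches pack into consecutive pages.
packed = MATCHES // ROWS_PER_PAGE + 1
assert packed == 20

print(len(scattered), "pages touched before sorting,", packed, "after")
```

Two orders of magnitude fewer pages, and sequential ones at that, which is why the sorted table answers "instantaneously the first time".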
Bruce Momjian writes:
> Just started running CLUSTER, and it appears to be taking forever. Does
> not seem to be using the disk, but a lot of CPU. It appears to be
> caught in an endless loop.
...
> Attaching to program `/u/pg/bin/postgres', process 29755
> 0x94999 in WaitIO (buf=0x4c0ccc8, spinlock=0) at bufmgr.c:1107
> 1107            S_LOCK(&(buf->io_in_progress_lock));
>
> Stepping to the next statement shows it is just hung here.
> The backtrace shows:
>
> #0  0x9499b in WaitIO (buf=0x4c0ccc8, spinlock=0) at bufmgr.c:1107
> #1  0x94109 in BufferAlloc (reln=0x19df90, blockNum=11324,
>     foundPtr=0xefbfb903 "\001\030����=\t", bufferLockHeld=0) at bufmgr.c:400
> #2  0x93e91 in ReadBufferWithBufferLock (reln=0x19df90, blockNum=11324,
>     bufferLockHeld=0) at bufmgr.c:255
> #3  0x93dee in ReadBuffer (reln=0x19df90, blockNum=11324) at bufmgr.c:174
> #4  0xb28f in heap_fetch (relation=0x19df90, seeself=0 '\000', tid=0x1b2256,
>     b=0xefbfb974) at heapam.c:1050
> #5  0x303d5 in rebuildheap (OIDNewHeap=11466400, OIDOldHeap=6424406,
>     OIDOldIndex=11466368) at cluster.c:357

This is all just speculation from memory, as I do not have the code in
front of me, but here is what I think might be happening:

heap_fetch
 -> ReadBuffer
 -> ReadBufferWithBufferLock
    -- trying to fetch the next row, needed to read a new page
 -> BufferAlloc
    -- to read a new page, we need a buffer, so grab one
    -- or possibly we found the buffer we were looking for
 -> WaitIO
    -- but the target buffer was already being read or written
    -- (by another process???)
 -> S_LOCK(&(buf->io_in_progress_lock))
    -- so spin on the buffer io_in_progress lock, waiting for the I/O
    -- to complete and clear the lock

Apparently the I/O completion is never seen, so the lock is never
cleared, and this will wait forever. Since the system is stuck at this
point, somebody dropped the ball on clearing the io_in_progress_lock
spinlock. I can't tell without looking at the code (and probably not even
then) whether this is I/O initiated by the stuck process and just
forgotten about, or whether some other process was involved.

Were there any other active backends while this was running? If so, did
any of them exit abnormally (i.e. without cleaning up)?

Is the stuck process holding any other spinlocks (like the bufmgr
spinlock, for instance)? (There is an array of held spinlocks in the Proc
structure that gets set by SpinAcquire to record spinlocks owned by the
current process.) If so, is it possible that another process is somehow
deadlocked with this one? E.g.:

    A has bufmgr spinlock, wants io_in_progress_lock spinlock
    B has io_in_progress_lock spinlock, wants bufmgr spinlock

I hope this is useful...

-dg

ps: what is CLUSTER anyhow? I have a guess, but...

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 - I realize now that irony has no place in business communications.
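The A/B cycle above is the classic two-lock deadlock, and the textbook cure is to impose one global acquisition order. A generic threading sketch of that fix (plain Python threads, not the actual Postgres spinlock code):

```python
import threading

# Stand-ins for the two spinlocks in David's scenario.
bufmgr = threading.Lock()
io_in_progress = threading.Lock()

# One global order for everyone: bufmgr first, io_in_progress second.
# If both A and B follow it, the cycle "A holds X wants Y / B holds Y
# wants X" can never form.
LOCK_ORDER = (bufmgr, io_in_progress)

finished = []

def backend(name):
    for lock in LOCK_ORDER:          # always acquire in global order
        lock.acquire()
    try:
        finished.append(name)        # critical section stand-in
    finally:
        for lock in reversed(LOCK_ORDER):
            lock.release()

threads = [threading.Thread(target=backend, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(finished))   # both backends complete: no deadlock
```

Whether the real bug was a lock-order cycle or a lost I/O-completion wakeup is exactly the open question in the message above; this only illustrates the cycle case.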
> So, to your solution. CLUSTER is too slow. The disk is going crazy
> moving single rows into the temp table. I recommend doing a COPY of
> artist_fti to a flat file, doing a Unix 'sort' on the flat file, then
> re-loading the data into artist_fti, then putting the index on the
> table and running vacuum.

One more piece of good news. The reason CLUSTER was so slow is that you
loaded massive unordered amounts of data into the system. Once you do the
COPY out/reload, subsequent CLUSTERs will run very quickly, because 99%
of the data is already ordered. Only the new/changed data is unordered,
so you should be able to run CLUSTER rapidly from then on to keep good
performance.

I think a user module allowing this word-fragment searching will be a big
hit with users.

--
Bruce Momjian
> I have to conclude that because of the way this table is created by
> slicing words, its layout is almost random. The table is 272MB, and
> with 8k/page, that is 34,000 pages. If we are going for 2,500 rows,
> odds are that each row is in a separate page. So, to do the query, we
> are retrieving 2,500 8k pages, or 20MB of random data on the disk. How
> long does it take to issue 2,500 disk requests that are scattered all
> over the disk. Probably 11-22 seconds.

Thanx Bruce, this is a good report :) It confirms what I thought myself
too, btw.

> Just started running CLUSTER, and it appears to be taking forever. Does
> not seem to be using the disk, but a lot of CPU. It appears to be
> caught in an endless loop.

Note that CLUSTER actually *did* something on my system. It created a
temp table, which was populated, but very very slowly. I also had disk
activity, but maybe you don't notice much of it because of your SCSI
disks....

Maarten