Thread: psql is very slow
Hi all,
I have psql 7.3.4, Apache 2.0.40, and Perl v5.8.0. The database is pretty big (the dump is about 100 MB).
But all the operations are very, very slow.
Is there any way to make PostgreSQL quicker? (Don't tell me to cut down the database :))
OS: Linux, kernel 2.4.20-gentoo-r7
RAM: 512 MB
CPU: i686 AMD Athlon(tm) XP 1800+
Thanks.
Regards,
Michal
On 2/22/06 10:23 AM, "Michal Merta" <michal.merta@gmail.com> wrote:

> I have a psql 7.3.4, apache 2.0.40, perl v5.8.0. Database is pretty big,
> (dump is about 100Megs).
> But all the operations are very, very slow.
> Is any possibility to make postgresql more quick? (don't tell me to cut the
> database :))

I assume that by "psql" you mean postgresql? You'll probably have to be more specific about what you are doing, what your database is like and how it is used, what you want to change, and what kinds of time you are talking about.

Sean
On 22.02.2006 at 16:23:16 +0100, Michal Merta wrote:

> Hi all,
>
> I have a psql 7.3.4, apache 2.0.40, perl v5.8.0. Database is pretty big,
> (dump is about 100Megs).
> But all the operations are very, very slow.

- 7.3 is very, very old; newer versions are much faster!
- Run VACUUM!
- Use EXPLAIN.
- Tell us your slow queries.

Are you from Germany? We have a German mailing list too.

HTH, Andreas
--
Andreas Kretschmer (contact: see header)
Heynitz: 035242/47215, D1: 0160/7141639
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net
=== Schollglas Unternehmensgruppe ===
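Andreas's checklist translates directly into commands. A minimal sketch, with a hypothetical table and query standing in for the poster's actual workload:

```sql
-- Reclaim dead tuples and refresh the planner's statistics:
VACUUM ANALYZE;

-- Show the actual plan and timings for a slow query
-- (table and predicate are hypothetical placeholders):
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
```

On a 7.3-era install with no routine vacuuming, VACUUM ANALYZE alone often accounts for most of the "everything is slow" symptom.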
PostgreSQL 8.1.3

I'm trying to collect some hard numbers to show just how much it degrades and over how long a time interval. All I have now is anecdotal evidence, and I was hoping to save myself some downtime by seeking advice early.

I have a search table which I use for partial-match text searches:

CREATE TABLE search
(
  id int4 NOT NULL DEFAULT nextval('search_id_seq'::regclass),
  item_id int8 NOT NULL,
  search_vector ltree NOT NULL,
  CONSTRAINT search_id_pkey PRIMARY KEY (id),
  CONSTRAINT search_item_id_fkey FOREIGN KEY (item_id)
      REFERENCES items (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE
)
WITH OIDS;

CREATE INDEX lsearch_vector_idx
  ON search
  USING gist
  (search_vector);

I have some triggers that insert rows into the search table as rows are inserted into "items".

I implemented this yesterday, and the immediate effect was a fantastic return time for partial text searches in the sub-second range. By today, these queries sometimes take 10 minutes... There are about 134000 rows in the table.

The table gets analyzed nightly. Should the frequency be higher? About 1000 rows are added a day, only about 30 or so are removed, and nothing is ever updated. There's not that much turnover.

The search vectors are built like this: for a string "Hello World" the ltree is created like 'h.e.l.l.o.w.o.r.l.d'. If I wanted to find all rows with "orl" in them I would construct an lquery like '*.o.r.l.*' and use the "~" operator in the WHERE clause. I would link to the table "items" by the item_id.

What could be making this go so wrong? Is there a better way to accomplish my task?

CG

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
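The scheme described above can be sketched in a couple of queries. This assumes the contrib/ltree module is installed in the database; the join shape is inferred from the description, not copied from a real query in the thread:

```sql
-- "Hello World" is stored as the ltree label path 'h.e.l.l.o.w.o.r.l.d'.
-- An lquery with wildcards on both sides matches the substring "orl":
SELECT 'h.e.l.l.o.w.o.r.l.d'::ltree ~ '*.o.r.l.*'::lquery AS matches_orl;
-- matches_orl is true

-- A substring search for "orl", joined back to items via item_id:
SELECT i.*
  FROM search s
  JOIN items i ON i.id = s.item_id
 WHERE s.search_vector ~ '*.o.r.l.*'::lquery;
```

The GiST index on search_vector is what makes the `~` predicate indexable at all; a plain btree cannot serve lquery matches.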
On Fri, 2006-02-24 at 11:02, CG wrote:
> PostgreSQL 8.1.3
>
> I'm trying to collect some hard numbers to show just how much it degrades and
> over how long a time interval.
<snip>
> What could be making this go so wrong? Is there a better way to accomplish my
> task?

Are you vacuuming regularly, are your fsm settings high enough, and what does vacuum verbose say?
Re: ltree + gist index performance degrades significantly over a night
From: Martijn van Oosterhout
On Fri, Feb 24, 2006 at 09:02:04AM -0800, CG wrote:
> PostgreSQL 8.1.3
>
> I'm trying to collect some hard numbers to show just how much it degrades and
> over how long a time interval.
>
> All I have now is anecdotal evidence, and I was hoping to save myself some
> downtime by seeking advice early.
<snip>
> I implimented this yesterday, and the immediate effect was a fantastic return
> time for partial text searches in the sub-second range. By today, these queries
> take 10 minutes sometimes... There are about 134000 rows in the table.
>
> The table gets analyzed nightly. Should the frequency be more? There are about
> 1000 rows added a day, only about 30 or so rows removed, and nothing is ever
> updated. There's not that much turnover.

That's very odd. Like the other person said, do you vacuum and analyse? But my question is: is it using the index? What does EXPLAIN / EXPLAIN ANALYZE tell you?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
--- Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> Are you vacuuming regularly, are your fsm settings high enough, and what
> does vacuum verbose say?

Autovacuum is running, but I do a nightly vacuum analyze. When I just do a vacuum analyze on the table I get:

data=# vacuum analyze verbose search;
INFO:  vacuuming "search"
INFO:  index "search_id_pkey" now contains 1344661 row versions in 5134 pages
DETAIL:  9 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.09s/0.07u sec elapsed 4.91 sec.
INFO:  index "search_vector_idx" now contains 1344672 row versions in 47725 pages
DETAIL:  9 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.77s/0.37u sec elapsed 407.55 sec.
INFO:  index "search_item_id_idx" now contains 1344690 row versions in 6652 pages
DETAIL:  9 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.24s/0.08u sec elapsed 45.62 sec.
INFO:  "search": removed 9 row versions in 2 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.05 sec.
INFO:  "letter_search": found 9 removable, 1344661 nonremovable row versions in 33984 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 141 unused item pointers.
0 pages are entirely empty.
CPU 2.41s/0.62u sec elapsed 483.06 sec.
INFO:  vacuuming "pg_toast.pg_toast_174918394"
INFO:  index "pg_toast_174918394_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  "pg_toast_174918394": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  analyzing "search"
INFO:  "search": scanned 3000 of 33984 pages, containing 119035 live rows and 0 dead rows; 3000 rows in sample, 1348428 estimated total rows
VACUUM

max_fsm_pages = 60000        # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 2000     # min 100, ~70 bytes each
--- Martijn van Oosterhout <kleptog@svana.org> wrote:
> That's very odd. Like the other person said, do you vacuum and analyse?
> But my question is: is it using the index? What does EXPLAIN / EXPLAIN
> ANALYZE tell you?

data=# explain select * from search where search_vector ~ '*.o.r.l.*'::lquery;
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=53.71..4566.65 rows=1345 width=161)
   Recheck Cond: (search_vector ~ '*.o.r.l.*'::lquery)
   ->  Bitmap Index Scan on search_vector_idx  (cost=0.00..53.71 rows=1345 width=0)
         Index Cond: (search_vector ~ '*.o.r.l.*'::lquery)
(4 rows)

data=# explain analyze select * from search where search_vector ~ '*.o.r.l.*'::lquery;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=53.71..4566.65 rows=1345 width=161) (actual time=183684.156..196997.278 rows=1655 loops=1)
   Recheck Cond: (search_vector ~ '*.o.r.l.*'::lquery)
   ->  Bitmap Index Scan on search_vector_idx  (cost=0.00..53.71 rows=1345 width=0) (actual time=183683.857..183683.857 rows=1655 loops=1)
         Index Cond: (search_vector ~ '*.o.r.l.*'::lquery)
 Total runtime: 197000.061 ms
(5 rows)

I appreciate you taking the time to help me out. Thank you all.
CG <cgg007@yahoo.com> writes:
> INFO: index "search_vector_idx" now contains 1344672 row versions in 47725 pages
> INFO: "letter_search": found 9 removable, 1344661 nonremovable row versions in 33984 pages

Yikes ... the index is bigger than the table! Seems like you've not been vacuuming enough, or else gist has got a major bloat problem. If you REINDEX, does the index get materially smaller?

regards, tom lane
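Tom's suggestion can be verified with a before/after size check. A sketch against this 8.1 setup (relation names taken from the vacuum output; note relpages is only as fresh as the last VACUUM or ANALYZE):

```sql
-- Heap vs. index size in 8kB pages, before the rebuild:
SELECT relname, relpages
  FROM pg_class
 WHERE relname IN ('search', 'search_vector_idx');

-- Rebuild the GiST index from scratch:
REINDEX INDEX search_vector_idx;

-- Re-run the size query. A materially smaller relpages for the index
-- would point to bloat rather than an inherent GiST problem.
```

REINDEX takes an exclusive lock on the table in this era of PostgreSQL, so it would need a maintenance window.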
Re: ltree + gist index performance degrades significantly over a night
From: Martijn van Oosterhout
On Fri, Feb 24, 2006 at 09:44:37AM -0800, CG wrote:
> --- Martijn van Oosterhout <kleptog@svana.org> wrote:
> > That's very odd. Like the other person said, do you vacuum and analyse?
> > But my question is: is it using the index? What does EXPLAIN / EXPLAIN
> > ANALYZE tell you?
>
> data=# explain select * from search where search_vector ~ '*.o.r.l.*'::lquery;
<snip>
> data=# explain analyze select * from search where search_vector ~
> '*.o.r.l.*'::lquery;
<snip>
> Total runtime: 197000.061 ms

Ouch! The index is obviously not very good in this case. Unfortunately it's not clear where the slowdown is. You'd probably need to recompile postgresql with profiling to find exactly where it's going wrong.

Quick test though: if you disable the index (set enable_indexscan=off) so it does a seqscan, is it faster or slower? By how much? i.e. is it helping much.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
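Spelled out, Martijn's quick test looks like this. One assumption beyond his message: the plan shown earlier was a Bitmap Index/Heap Scan, so on 8.1 enable_bitmapscan likely has to be disabled as well to actually force a sequential scan:

```sql
-- Force a seqscan for comparison; both scan types are disabled because
-- the earlier EXPLAIN showed a bitmap scan, not a plain index scan.
SET enable_indexscan = off;
SET enable_bitmapscan = off;

EXPLAIN ANALYZE
SELECT * FROM search WHERE search_vector ~ '*.o.r.l.*'::lquery;

-- Restore the planner defaults:
RESET enable_indexscan;
RESET enable_bitmapscan;
```

Comparing the two "Total runtime" figures shows whether the index is helping at all, and by how much.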
Oops! michal.merta@gmail.com ("Michal Merta") was seen spray-painting on a wall:
> I have a psql 7.3.4, apache 2.0.40, perl v5.8.0. Database is pretty
> big, (dump is about 100Megs).
>
> But all the operations are very, very slow.
>
> Is any possibility to make postgresql more quick? (don't tell me to
> cut the database :))

Well, have you read the manual and followed the maintenance directions that are recommended?

You're running a rather old version; 7.3 was neat enough stuff a couple of years ago, but there have been several substantially improved major versions since then. An upgrade would seem wise.

--
let name="cbbrowne" and tld="gmail.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/slony.html
When marriage is outlawed, only outlaws will have inlaws.
On Fri, Feb 24, 2006 at 09:02:04AM -0800, CG wrote:
> I have a search table which I use for partial-match text searches:
<snip>
> For a string "Hello World" the ltree is created like 'h.e.l.l.o.w.o.r.l.d' ...
> If I wanted to find all rows with "orl" in them i would construct an lquery
> like '*.o.r.l.*' and use the "~" operator in the where clause. I would link to
> the table "items" by the item_id ...

Is there some reason you can't use tsearch2? I suspect it would probably work better; if nothing else you'd probably get better support since a lot more people use it.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
Tsearch2 searches for whole words, and is designed with language in mind, yes? I'm looking for consecutive characters in words or serial numbers, etc. As for support, the same guys who wrote Tsearch2 wrote ltree. Can't go wrong there!

Here's the solution to this problem: as usual, operator error. :(

For some reason I thought it would be a good idea to cluster the table on the item_id index... What in the world was I thinking? When I clustered the search table on the search_vector index (which makes the most sense, yes?) it seemed to bring actual performance in-line with the type of performance I imagined that I would receive.

I could probably get even better performance out of the table, at the cost of a significant increase in table and index size, by chopping up the columns into smaller chunks.

"Hello World" would yield

'h.e.l.l.o.w.o.r.l.d'
'e.l.l.o.w.o.r.l.d'
'l.l.o.w.o.r.l.d'
'l.o.w.o.r.l.d'
'o.w.o.r.l.d'
'w.o.r.l.d'
'o.r.l.d'
'r.l.d'

and using a wildcard search "search_vector ~ 'o.r.l.*'" would jump right to the vectors which start with "o.r.l" ...

Thanks for all the responses! They did get my head pointed in the right direction.

CG

--- "Jim C. Nasby" <jnasby@pervasive.com> wrote:
> On Fri, Feb 24, 2006 at 09:02:04AM -0800, CG wrote:
> > I have a search table which I use for partial-match text searches:
> <snip>
> > For a string "Hello World" the ltree is created like 'h.e.l.l.o.w.o.r.l.d' ...
> > If I wanted to find all rows with "orl" in them i would construct an lquery
> > like '*.o.r.l.*' and use the "~" operator in the where clause. I would link to
> > the table "items" by the item_id ...
>
> Is there some reason you can't use tsearch2? I suspect it would probably
> work better; if nothing else you'd probably get better support since a
> lot more people use it.
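The fix CG describes, clustering the table on its GiST index, looks like this on 8.1, where CLUSTER takes the index name first. The index name below follows the CREATE INDEX statement earlier in the thread (the vacuum output also shows a "search_vector_idx"; which name is current is unclear):

```sql
-- CLUSTER physically rewrites the heap in index order, so rows with
-- similar search_vectors land on nearby pages and bitmap heap scans
-- touch far fewer pages. (8.1 syntax: CLUSTER indexname ON tablename.)
CLUSTER lsearch_vector_idx ON search;

-- Refresh statistics after the physical reorder:
ANALYZE search;
```

CLUSTER is a one-shot reorder; rows inserted afterwards are not kept in index order, so on a table gaining ~1000 rows a day it may need to be repeated periodically.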
Re: ltree + gist index performance degrades significantly over a night
From: Bernhard Weisshuhn
On Mon, Feb 27, 2006 at 09:14:40AM -0800, CG <cgg007@yahoo.com> wrote:
> I could probably get even better performance out of the table, at the cost of a
> significant increase in table and index size, by chopping up the columns into
> smaller chunks.
>
> "Hello World" would yield
>
> 'h.e.l.l.o.w.o.r.l.d'
> 'e.l.l.o.w.o.r.l.d'
> 'l.l.o.w.o.r.l.d'
> 'l.o.w.o.r.l.d'
> 'o.w.o.r.l.d'
> 'w.o.r.l.d'
> 'o.r.l.d'
> 'r.l.d'
>
> and using a wildcard search "search_vector ~ 'o.r.l.*'" would jump right to the
> vectors which start with "o.r.l" ...

But with this approach you'd be fine with a normal varchar_ops btree index for text fields and searching using "like 'world%'", wouldn't you? Or is the ltree approach more efficient?

I'm not trying to be smart-assed, it's a naive question, since I'm looking for an efficient substring search solution in postgresql myself.

regards,
bkw
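Bernhard's btree alternative might be sketched as follows. The side table and all names here are hypothetical; one detail beyond his message is text_pattern_ops, which lets the planner use a btree for LIKE 'prefix%' even when the database locale is not C:

```sql
-- Hypothetical suffix table: one row per stored fragment of an item.
CREATE TABLE search_txt (
    item_id  int8 NOT NULL,
    fragment text NOT NULL
);

-- text_pattern_ops makes the index usable for anchored LIKE patterns
-- regardless of locale:
CREATE INDEX search_txt_fragment_idx
    ON search_txt (fragment text_pattern_ops);

-- Anchored prefix search; this can use the index:
SELECT item_id FROM search_txt WHERE fragment LIKE 'world%';
```

An unanchored pattern such as LIKE '%world%' cannot use this index, which is exactly why the suffix-expansion trick in the quoted message is needed.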
That would do the job, wouldn't it? :)

I don't think it's a naive question at all. It's quite a good question, and the solution you suggest is a good option to have, and would probably work better than the single-vector ltree index for simple substring matching. In my case, the ltree+gist index table actually contains more pages than the table of data itself. I'd need to see if the space required for the varchar+btree tables is comparable, better, or worse than the ltree+gist tables with regards to size. Now that I think about it, building substrings out of ltree nodes would be incredible overkill compared to the effectiveness of the varchar+btree.

The extra advantages of ltree are the ability to match and extract nodes in a path based not only on contents but also proximity, and to aggregate on those characteristics. In my case this might be good for serial numbers where each digit or grouping of digits has a special value which would be used to aggregate on.

The ltree method was suggested to me a while back when I was frustrated with the performance of "like '%something%'" ...

--- Bernhard Weisshuhn <bkw@weisshuhn.de> wrote:
> On Mon, Feb 27, 2006 at 09:14:40AM -0800, CG <cgg007@yahoo.com> wrote:
>
> > I could probably get even better performance out of the table, at the cost of a
> > significant increase in table and index size, by chopping up the columns into
> > smaller chunks.
> >
> > "Hello World" would yield
> >
> > 'h.e.l.l.o.w.o.r.l.d'
> > 'e.l.l.o.w.o.r.l.d'
> > 'l.l.o.w.o.r.l.d'
> > 'l.o.w.o.r.l.d'
> > 'o.w.o.r.l.d'
> > 'w.o.r.l.d'
> > 'o.r.l.d'
> > 'r.l.d'
> >
> > and using a wildcard search "search_vector ~ 'o.r.l.*'" would jump right to the
> > vectors which start with "o.r.l" ...
>
> But with this approch you'd be fine with a normal varchar_ops btree index
> for textfields and searching using "like 'world%'", wouldn't you?
> Or is the ltree approch more efficient?
>
> I'm not trying to be smart-assed, it's a naive question, since I'm
> looking for an efficient substring search solution in postgresql myself.
>
> regards,
> bkw
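The chop-into-suffixes idea quoted above can be sketched as a set-returning function. Everything here is a hypothetical illustration (function name, normalization rule, and the 3-character floor are assumptions, not code from the thread); indexing each emitted suffix turns any substring search into an anchored prefix search on some suffix:

```sql
CREATE OR REPLACE FUNCTION suffixes(p_text text)
RETURNS SETOF text AS $$
DECLARE
    -- Normalize: strip non-alphanumerics and lowercase,
    -- e.g. 'Hello World' -> 'helloworld'.
    s text := lower(regexp_replace(p_text, '[^A-Za-z0-9]', '', 'g'));
BEGIN
    WHILE length(s) >= 3 LOOP   -- assumed 3-char minimum fragment
        RETURN NEXT s;
        s := substr(s, 2);      -- drop the first character
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- SELECT * FROM suffixes('Hello World');
-- yields helloworld, elloworld, lloworld, ... down to rld
```

A trigger on the items table could call this to populate the fragment table on each insert, which mirrors the trigger-based setup described elsewhere in the thread.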
Re: ltree + gist index performance degrades significantly over a night
From: Bernhard Weisshuhn
On Mon, Feb 27, 2006 at 10:27:20AM -0800, CG <cgg007@yahoo.com> wrote:
> [...] I'd need to see if the space required for the varchar+btree tables are
> comparible, better, or worse than the ltree+gist tables with regards to size.

Please test this. I'm guessing (hoping, actually) that having bazillions of combinations of 26 (or so) characters (ltree labels) might consume less space than having bazillions of substrings in the database.

Or maybe some clever combination of both approaches?

If you find out something interesting, please let me know.

regards,
bkw
--- Bernhard Weisshuhn <bkw@weisshuhn.de> wrote:
> On Mon, Feb 27, 2006 at 10:27:20AM -0800, CG <cgg007@yahoo.com> wrote:
>
> > [...] I'd need to see if the space required for the varchar+btree tables are
> > comparible, better, or worse than the ltree+gist tables with regards to size.
>
> Please test this, I'm guessing (hoping actually) that having bazillions of
> combinations of 26 (or so) characters (ltree labels) might be consuming
> less space than having bazillions of substrings in the database.
>
> Or maybe some clever combination of both approaches?
>
> If you find out something interesting, please let me know.

Performance using varchar+btree, breaking the string up into distinct letter groups >= 3 chars, is slightly better. The size of the varchar search-vector table is much bigger. Most of my fields are about 15-25 characters in length; expect even bigger tables for longer fields. The size of the btree index is less. The time to bootstrap the data into the tables was significantly longer. I used two triggers: one that normalized the search field before insert, and another that inserted a breakdown row after the insert row. There's a recursive effect built in to get down to the smallest unique element.

I'm sticking with ltree and setting up a vacuum analyze on a cron to keep the searches snappy.

Hope that helps you with your project!

CG