Thread: psql is very slow

psql is very slow

From
"Michal Merta"
Date:
Hi all,
 
I have psql 7.3.4, Apache 2.0.40, Perl v5.8.0. The database is pretty big (the dump is about 100 MB).
But all the operations are very, very slow.
Is there any way to make postgresql quicker? (Don't tell me to cut the database :))
 
OS version: 2.4.20-gentoo-r7
RAM: 512MB
CPU:i686 AMD Athlon(tm) XP 1800+
 
Thanks.
Regards,
Michal

Re: psql is very slow

From
Sean Davis
Date:


On 2/22/06 10:23 AM, "Michal Merta" <michal.merta@gmail.com> wrote:

> I have a psql 7.3.4, apache 2.0.40, perl  v5.8.0. Database is pretty big,
> (dump is about 100Megs).
> But all the operations are very, very slow.
> Is any possibility to make postgresql more quick? (don't tell me to cut the
> database :))

I assume that by "psql" you mean postgresql?

You'll probably have to be more specific about what you are doing, what your
database is like and how it is used, what you want to change, and what kinds
of time you are talking about.

Sean



Re: psql is very slow

From
"A. Kretschmer"
Date:
On 22.02.2006 at 16:23:16 +0100, Michal Merta wrote the following:
> Hi all,
>
> I have a psql 7.3.4, apache 2.0.40, perl  v5.8.0. Database is pretty big,
> (dump is about 100Megs).
> But all the operations are very, very slow.

- 7.3 is very, very old; newer versions are much faster!
- run VACUUM!
- use EXPLAIN
- tell us your slow queries

Are you from Germany? We have a German mailing list, too.


HTH, Andreas
--
Andreas Kretschmer    (Kontakt: siehe Header)
Heynitz:  035242/47215,      D1: 0160/7141639
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net
 ===    Schollglas Unternehmensgruppe    ===

ltree + gist index performance degrades significantly over a night

From
CG
Date:
PostgreSQL 8.1.3

I'm trying to collect some hard numbers to show just how much it degrades and
over how long a time interval.

All I have now is anecdotal evidence, and I was hoping to save myself some
downtime by seeking advice early.

I have a search table which I use for partial-match text searches:

CREATE TABLE search
(
  id int4 NOT NULL DEFAULT nextval('search_id_seq'::regclass),
  item_id int8 NOT NULL,
  search_vector ltree NOT NULL,
  CONSTRAINT search_id_pkey PRIMARY KEY (id),
  CONSTRAINT search_item_id_fkey FOREIGN KEY (item_id)
      REFERENCES items (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE
)
WITH OIDS;

CREATE INDEX lsearch_vector_idx
  ON search
  USING gist
  (search_vector);

I have some triggers that insert rows into the search table as rows are
inserted into "items".

I implemented this yesterday, and the immediate effect was a fantastic return
time for partial text searches, in the sub-second range. By today, these queries
sometimes take 10 minutes... There are about 134000 rows in the table.

The table gets analyzed nightly. Should that be more frequent? About 1000
rows are added a day, only about 30 or so rows removed, and nothing is ever
updated. There's not that much turnover.

The search vectors are built like this:

For a string "Hello World" the ltree is created like 'h.e.l.l.o.w.o.r.l.d' ...
If I wanted to find all rows with "orl" in them, I would construct an lquery
like '*.o.r.l.*' and use the "~" operator in the WHERE clause. I would link to
the table "items" by the item_id ...
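The transformation described above can be sketched in a few lines of Python. (The helper names `to_ltree` and `to_lquery` are illustrative, not taken from the thread; the actual production code was presumably a trigger in the database.)

```python
import re

def to_ltree(text):
    """Lowercase, drop non-alphanumerics, and join each remaining
    character with '.' so "Hello World" becomes 'h.e.l.l.o.w.o.r.l.d'."""
    chars = re.sub(r'[^a-z0-9]', '', text.lower())
    return '.'.join(chars)

def to_lquery(fragment):
    """Wrap a search fragment in wildcards: "orl" -> '*.o.r.l.*'."""
    return '*.' + to_ltree(fragment) + '.*'

print(to_ltree("Hello World"))   # h.e.l.l.o.w.o.r.l.d
print(to_lquery("orl"))          # *.o.r.l.*
```

The resulting lquery would then be used as in `SELECT * FROM search WHERE search_vector ~ '*.o.r.l.*'::lquery`.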

What could be making this go so wrong? Is there a better way to accomplish my
task?

CG

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: ltree + gist index performance degrades

From
Scott Marlowe
Date:
On Fri, 2006-02-24 at 11:02, CG wrote:
> PostgreSQL 8.1.3
>
> I'm trying to collect some hard numbers to show just how much it degrades and
> over how long a time interval.
>
> All I have now is anecdotal evidence, and I was hoping to save myself some
> downtime by seeking advice early.
>
> I have a search table which I use for partial-match text searches:
>
> CREATE TABLE search
> (
>   id int4 NOT NULL DEFAULT nextval('search_id_seq'::regclass),
>   item_id int8 NOT NULL,
>   search_vector ltree NOT NULL,
>   CONSTRAINT search_id_pkey PRIMARY KEY (id),
>   CONSTRAINT search_item_id_fkey FOREIGN KEY (item_id)
>       REFERENCES items (id) MATCH SIMPLE
>       ON UPDATE CASCADE ON DELETE CASCADE
> )
> WITH OIDS;
>
> CREATE INDEX lsearch_vector_idx
>   ON search
>   USING gist
>   (search_vector);
>
> I have some triggers that insert rows into the search table as rows are
> inserted into "items".
>
> I implemented this yesterday, and the immediate effect was a fantastic return
> time for partial text searches in the sub-second range. By today, these queries
> take 10 minutes sometimes... There are about 134000 rows in the table.
>
> The table gets analyzed nightly. Should the frequency be more? There are about
> 1000 rows added a day, only about 30 or so rows removed, and nothing is ever
> updated. There's not that much turnover.
>
> The search vectors are built like this:
>
> For a string "Hello World" the ltree is created like 'h.e.l.l.o.w.o.r.l.d' ...
> If I wanted to find all rows with "orl" in them i would construct an lquery
> like '*.o.r.l.*' and use the "~" operator in the where clause. I would link to
> the table "items" by the item_id ...
>
> What could be making this go so wrong? Is there a better way to accomplish my
> task?

Are you vacuuming regularly, are your fsm settings high enough, and what
does vacuum verbose say?

Re: ltree + gist index performance degrades significantly over a night

From
Martijn van Oosterhout
Date:
On Fri, Feb 24, 2006 at 09:02:04AM -0800, CG wrote:
> PostgreSQL 8.1.3
>
> I'm trying to collect some hard numbers to show just how much it degrades and
> over how long a time interval.
>
> All I have now is anecdotal evidence, and I was hoping to save myself some
> downtime by seeking advice early.

<snip>

> I implemented this yesterday, and the immediate effect was a fantastic return
> time for partial text searches in the sub-second range. By today, these queries
> take 10 minutes sometimes... There are about 134000 rows in the table.
>
> The table gets analyzed nightly. Should the frequency be more? There are about
> 1000 rows added a day, only about 30 or so rows removed, and nothing is ever
> updated. There's not that much turnover.

That's very odd. Like the other person said, do you vacuum and analyse?
But my question is: is it using the index? What does EXPLAIN / EXPLAIN
ANALYZE tell you?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

--- Scott Marlowe <smarlowe@g2switchworks.com> wrote:

> Are you vacuuming regularly, are your fsm settings high enough, and what
> does vacuum verbose say?

Autovacuum is running, but I do a nightly vacuum analyze. When I just do a
vacuum analyze on the table I get:

data=# vacuum analyze verbose search;
INFO:  vacuuming "search"
INFO:  index "search_id_pkey" now contains 1344661 row versions in 5134 pages
DETAIL:  9 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.09s/0.07u sec elapsed 4.91 sec.
INFO:  index "search_vector_idx" now contains 1344672 row versions in 47725
pages
DETAIL:  9 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.77s/0.37u sec elapsed 407.55 sec.
INFO:  index "search_item_id_idx" now contains 1344690 row versions in 6652
pages
DETAIL:  9 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.24s/0.08u sec elapsed 45.62 sec.
INFO:  "search": removed 9 row versions in 2 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.05 sec.
INFO:  "letter_search": found 9 removable, 1344661 nonremovable row versions in
33984 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 141 unused item pointers.
0 pages are entirely empty.
CPU 2.41s/0.62u sec elapsed 483.06 sec.
INFO:  vacuuming "pg_toast.pg_toast_174918394"
INFO:  index "pg_toast_174918394_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  "pg_toast_174918394": found 0 removable, 0 nonremovable row versions in
0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  analyzing "search"
INFO:  "search": scanned 3000 of 33984 pages, containing 119035 live rows and 0
dead rows; 3000 rows in sample, 1348428 estimated total rows
VACUUM

max_fsm_pages = 60000               # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 2000            # min 100, ~70 bytes each






--- Martijn van Oosterhout <kleptog@svana.org> wrote:

> That's very odd. Like the other person said, do you vacuum and analyse?
> But my question is: is it using the index? What does EXPLAIN / EXPLAIN
> ANALYZE tell you?

data=# explain select * from search where search_vector ~ '*.o.r.l.*'::lquery;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=53.71..4566.65 rows=1345 width=161)
   Recheck Cond: (search_vector ~ '*.o.r.l.*'::lquery)
   ->  Bitmap Index Scan on search_vector_idx  (cost=0.00..53.71 rows=1345
width=0)
         Index Cond: (search_vector ~ '*.o.r.l.*'::lquery)
(4 rows)

data=# explain analyze select * from search where search_vector ~
'*.o.r.l.*'::lquery;
                                                                    QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on search  (cost=53.71..4566.65 rows=1345 width=161) (actual
time=183684.156..196997.278 rows=1655 loops=1)
   Recheck Cond: (search_vector ~ '*.o.r.l.*'::lquery)
   ->  Bitmap Index Scan on search_vector_idx  (cost=0.00..53.71 rows=1345
width=0) (actual time=183683.857..183683.857 rows=1655 loops=1)
         Index Cond: (search_vector ~ '*.o.r.l.*'::lquery)
 Total runtime: 197000.061 ms
(5 rows)

I appreciate you taking the time to help me out. Thank you all.



Re: ltree + gist index performance degrades significantly over a night

From
Tom Lane
Date:
CG <cgg007@yahoo.com> writes:
> INFO:  index "search_vector_idx" now contains 1344672 row versions in 47725 pages
> INFO:  "letter_search": found 9 removable, 1344661 nonremovable row versions in 33984 pages

Yikes ... the index is bigger than the table!  Seems like you've not
been vacuuming enough, or else gist has got a major bloat problem.
If you REINDEX, does the index get materially smaller?

            regards, tom lane

Re: ltree + gist index performance degrades significantly over a night

From
Martijn van Oosterhout
Date:
On Fri, Feb 24, 2006 at 09:44:37AM -0800, CG wrote:
>
>
> --- Martijn van Oosterhout <kleptog@svana.org> wrote:
>
> > That's very odd. Like the other person said, do you vacuum and analyse?
> > But my question is: is it using the index? What does EXPLAIN / EXPLAIN
> > ANALYZE tell you?
>
> data=# explain select * from search where search_vector ~ '*.o.r.l.*'::lquery;

<snip>

> data=# explain analyze select * from search where search_vector ~
> '*.o.r.l.*'::lquery;

<snip>
>  Total runtime: 197000.061 ms

Ouch! The index is obviously not very good in this case. Unfortunately
it's not clear where the slowdown is. You'd probably need to recompile
postgresql with profiling to find exactly where it's going wrong.

A quick test, though: if you disable the index (set enable_indexscan = off)
so it does a seqscan, is it faster or slower? By how much? I.e., is the
index helping much?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


Re: psql is very slow

From
Christopher Browne
Date:
Oops! michal.merta@gmail.com ("Michal Merta") was seen spray-painting on a wall:
> I have a psql 7.3.4, apache 2.0.40, perl  v5.8.0. Database is pretty
> big, (dump is about 100Megs).
>
> But all the operations are very, very slow.
>
> Is any possibility to make postgresql more quick? (don't tell me to
> cut the database :))

Well, have you read the manual and followed the recommended maintenance
directions?

You're running a rather old version; 7.3 was neat enough stuff a
couple of years ago, but there have been several substantially
improved major versions since then.  An upgrade would seem wise.
--
let name="cbbrowne" and tld="gmail.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/slony.html
When marriage is outlawed, only outlaws will have inlaws.

Re: ltree + gist index performance degrades significantly over a night

From
"Jim C. Nasby"
Date:
On Fri, Feb 24, 2006 at 09:02:04AM -0800, CG wrote:
> I have a search table which I use for partial-match text searches:
<snip>
> For a string "Hello World" the ltree is created like 'h.e.l.l.o.w.o.r.l.d' ...
> If I wanted to find all rows with "orl" in them i would construct an lquery
> like '*.o.r.l.*' and use the "~" operator in the where clause. I would link to
> the table "items" by the item_id ...

Is there some reason you can't use tsearch2? I suspect it would probably
work better; if nothing else you'd probably get better support since a
lot more people use it.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Tsearch2 searches for whole words, and is designed with language in mind, yes?

I'm looking for consecutive characters in words or serial numbers, etc.

As for support, the same guys who wrote Tsearch2 wrote ltree. Can't go wrong
there!

Here's the solution to this problem: As usual, operator error. :(

For some reason I thought it would be a good idea to cluster the table on the
item_id index... What in the world was I thinking? When I clustered the search
table on the search_vector index (which makes the most sense, yes?), it brought
actual performance in line with the kind of performance I had imagined I
would get.

I could probably get even better performance out of the table, at the cost of a
significant increase in table and index size, by chopping up the columns into
smaller chunks.

"Hello World" would yield

'h.e.l.l.o.w.o.r.l.d'
'e.l.l.o.w.o.r.l.d'
'l.l.o.w.o.r.l.d'
'l.o.w.o.r.l.d'
'o.w.o.r.l.d'
'w.o.r.l.d'
'o.r.l.d'
'r.l.d'

and using a wildcard search "search_vector ~ 'o.r.l.*'" would jump right to the
vectors which start with "o.r.l" ...
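The suffix-chopping idea above is easy to sketch. (This is my own illustrative Python, under the assumption that a suffix shorter than the minimum searchable fragment, three characters in this example, is not worth storing; the function name `suffix_vectors` is hypothetical.)

```python
def suffix_vectors(text, min_len=3):
    """Generate the dotted ltree suffixes of a string, longest first,
    stopping once a suffix would be shorter than min_len characters."""
    chars = [c for c in text.lower() if c.isalnum()]
    return ['.'.join(chars[i:]) for i in range(len(chars) - min_len + 1)]

for v in suffix_vectors("Hello World"):
    print(v)
# h.e.l.l.o.w.o.r.l.d
# e.l.l.o.w.o.r.l.d
# ...
# r.l.d
```

Each suffix would be inserted as its own search row, so a query like `search_vector ~ 'o.r.l.*'` is anchored at the start of a vector instead of needing a leading wildcard.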

Thanks for all the responses! They did get my head pointed in the right
direction.

CG

--- "Jim C. Nasby" <jnasby@pervasive.com> wrote:

> On Fri, Feb 24, 2006 at 09:02:04AM -0800, CG wrote:
> > I have a search table which I use for partial-match text searches:
> <snip>
> > For a string "Hello World" the ltree is created like 'h.e.l.l.o.w.o.r.l.d'
> ...
> > If I wanted to find all rows with "orl" in them i would construct an lquery
> > like '*.o.r.l.*' and use the "~" operator in the where clause. I would link
> to
> > the table "items" by the item_id ...
>
> Is there some reason you can't use tsearch2? I suspect it would probably
> work better; if nothing else you'd probably get better support since a
> lot more people use it.
> --
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
>




Re: ltree + gist index performance degrades significantly over a night

From
Bernhard Weisshuhn
Date:
On Mon, Feb 27, 2006 at 09:14:40AM -0800, CG <cgg007@yahoo.com> wrote:

> I could probably get even better performance out of the table, at the cost of a
> significant increase in table and index size, by chopping up the columns into
> smaller chunks.
>
> "Hello World" would yield
>
> 'h.e.l.l.o.w.o.r.l.d'
> 'e.l.l.o.w.o.r.l.d'
> 'l.l.o.w.o.r.l.d'
> 'l.o.w.o.r.l.d'
> 'o.w.o.r.l.d'
> 'w.o.r.l.d'
> 'o.r.l.d'
> 'r.l.d'
>
> and using a wildcard search "search_vector ~ 'o.r.l.*'" would jump right to the
> vectors which start with "o.r.l" ...

But with this approach you'd be fine with a normal varchar_ops btree index
on text fields, searching with "like 'world%'", wouldn't you?
Or is the ltree approach more efficient?

I'm not trying to be smart-assed, it's a naive question, since I'm
looking for an efficient substring search solution in postgresql myself.

regards,
bkw


That would do the job, wouldn't it? :)

I don't think it's a naive question at all. It's quite a good question, and the
solution you suggest is a good option to have; it would probably work better
than the single-vector ltree index for simple substring matching. In my case,
the ltree+gist index actually contains more pages than the table of data
itself. I'd need to see whether the space required for the varchar+btree tables
is comparable to, better, or worse than the ltree+gist tables with regards to size.
Now that I think about it, building substrings out of ltree nodes would be
incredible overkill compared to the effectiveness of varchar+btree.

The extra advantages of ltree are the ability to match and extract nodes in a
path based not only on contents but also proximity, and aggregate on those
characteristics.

In my case this might be good for serial numbers where each digit or grouping
of digits have special values which would be used to aggregate on.

The ltree method was suggested to me a while back when I was frustrated with
the performance of "like '%something%'" ...



--- Bernhard Weisshuhn <bkw@weisshuhn.de> wrote:

> On Mon, Feb 27, 2006 at 09:14:40AM -0800, CG <cgg007@yahoo.com> wrote:
>
> > I could probably get even better performance out of the table, at the cost
> of a
> > significant increase in table and index size, by chopping up the columns
> into
> > smaller chunks.
> >
> > "Hello World" would yield
> >
> > 'h.e.l.l.o.w.o.r.l.d'
> > 'e.l.l.o.w.o.r.l.d'
> > 'l.l.o.w.o.r.l.d'
> > 'l.o.w.o.r.l.d'
> > 'o.w.o.r.l.d'
> > 'w.o.r.l.d'
> > 'o.r.l.d'
> > 'r.l.d'
> >
> > and using a wildcard search "search_vector ~ 'o.r.l.*'" would jump right to
> the
> > vectors which start with "o.r.l" ...
>
> But with this approch you'd be fine with a normal varchar_ops btree index
> for textfields and searching using "like 'world%'", wouldn't you?
> Or is the ltree approch more efficient?
>
> I'm not trying to be smart-assed, it's a naive question, since I'm
> looking for an efficient substring search solution in postgresql myself.
>
> regards,
> bkw
>
>



Re: ltree + gist index performance degrades significantly over a night

From
Bernhard Weisshuhn
Date:
On Mon, Feb 27, 2006 at 10:27:20AM -0800, CG <cgg007@yahoo.com> wrote:

> [...] I'd need to see whether the space required for the varchar+btree tables is
> comparable to, better, or worse than the ltree+gist tables with regards to size.

Please test this; I'm guessing (hoping, actually) that having bazillions of
combinations of 26 (or so) characters (ltree labels) might consume
less space than having bazillions of substrings in the database.

Or maybe some clever combination of both approaches?

If you find out something interesting, please let me know.

regards,
bkw



--- Bernhard Weisshuhn <bkw@weisshuhn.de> wrote:

> On Mon, Feb 27, 2006 at 10:27:20AM -0800, CG <cgg007@yahoo.com> wrote:
>
> > [...] I'd need to see if the space required for the varchar+btree tables
> are
> > comparible, better, or worse than the ltree+gist tables with regards to
> size.
>
> Please test this, I'm guessing (hoping actually) that having bazillions of
> combinations of 26 (or so) characters (ltree labels) might be consuming
> less space than having bazillions of substings in the database.
>
> Or maybe some clever combination of both approaches?
>
> If you find out something interesting, please let me know.

Performance using varchar+btree, breaking the string up into distinct letter
groups >= 3 chars, is slightly better. The size of the varchar search vector
table is much bigger. Most of my fields are about 15-25 characters in length;
expect even bigger tables for longer fields. The size of the btree index is
smaller. The time to bootstrap the data into the tables was significantly longer.
I used two triggers: one that normalized the search field before insert, and
another that inserted a breakdown row after the insert row. There's a recursive
effect built in to get down to the smallest unique element.
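For comparison, the varchar variant of the breakdown amounts to storing every distinct plain-text suffix of length >= 3, so that any interior fragment becomes the *prefix* of some stored row and a btree index can satisfy `LIKE 'frag%'`. A minimal sketch (the function name `varchar_suffixes` is my own, not from the thread):

```python
def varchar_suffixes(text, min_len=3):
    """Distinct plain-text suffixes of length >= min_len. Storing each
    one as a row lets a btree index answer LIKE 'frag%' for any
    interior fragment of at least min_len characters."""
    s = ''.join(c for c in text.lower() if c.isalnum())
    return sorted({s[i:] for i in range(len(s) - min_len + 1)})

rows = varchar_suffixes("Hello World")
# a search for "orl" becomes: WHERE suffix LIKE 'orl%'
assert any(r.startswith('orl') for r in rows)
```

This also makes the size trade-off concrete: a 20-character field yields 18 suffix rows averaging about 11 characters each, versus one ltree row per field.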

I'm sticking with ltree and setting up a VACUUM ANALYZE in cron to keep the
searches snappy. Hope that helps you with your project!

CG
