Thread: Postmaster processes taking all the CPU
.ExternalClass P {padding:0px;} .ExternalClass {font-size:10pt;font-family:Tahoma;} Hi,
I have this server that I use as db database. It's decent box Ubuntu, 2GB, AMD Barton 2.8Gb L2 2Mb. DB version is 7.4.7 - that version was the only one available at that time. I have it for about 2 years in similar configuration. Lately I've notices that a pack of postmaster(4-22) processes ran by postgres user are taking over almost all the CPU. If I restart or reboot it calms down but picks up again after a while. I do use psqladmin to vaccum the database but I have nothing schedule to run. Data is added on a daily basis and is under 100 records a day.
My plan is to enable all posible logging and take it from there. I checked syslog there is no error there, I know that too many files open can be a problem. I see none of those or similar. Also I don't see any problem into the syslog. I run a web app that has a connection pool of 25 connection and what appears to happen is that those connection are not finishing or they are very, very slow. Too me it looks that they are waiting on something. Is there a way to see what each postmaster runs or do ?
What's the best approach in this case ? Anyone dealt with similar cases, what was the culprit ?
Thanks,
MC
Anyone ?
From: mcmoisei@hotmail.com
To: pgsql-general@postgresql.org
Subject: [GENERAL] Postmaster processes taking all the CPU
Date: Fri, 8 Jun 2007 13:23:00 -0500.ExternalClass P {padding:0px;} .ExternalClass {font-size:10pt;font-family:Tahoma;} .ExternalClass P {padding:0px;} .ExternalClass {font-size:10pt;font-family:Tahoma;} Hi,
I have this server that I use as db server. It's decent box Ubuntu, 2GB, AMD Barton 2.8Gb L2 2Mb. DB version is 7.4.7 - that version was the only one available at that time. I have it for about 2 years in similar configuration. Lately I've notices that a pack of postmaster(4-22) processes ran by postgres user are taking over almost all the CPU. If I restart or reboot it calms down but picks up again after a while. I do use psqladmin to vaccum the database but I have nothing schedule to run. Data is added on a daily basis and is under 100 records a day.
My plan is to enable all posible logging and take it from there. I checked syslog there is no error there, I know that too many files open can be a problem. I see none of those or similar. Also I don't see any problem into the syslog. I run a web app that has a connection pool of 25 connection and what appears to happen is that those connection are not finishing or they are very, very slow. Too me it looks that they are waiting on something. Is there a way to see what each postmaster runs or do ?
What's the best approach in this case ? Anyone dealt with similar cases, what was the culprit ?
Thanks,
MC
On Fri, Jun 08, 2007 at 03:20:28PM -0500, MC Moisei wrote: > > pack of postmaster(4-22) processes ran by postgres user are taking > over almost all the CPU. What else is the box doing? If it doesn't have any other work to do, why shouldn't postgres use the CPU time? (This is a way of saying, "You didn't tell us anything that would allow us to help.") A -- Andrew Sullivan | ajs@crankycanuck.ca In the future this spectacle of the middle classes shocking the avant- garde will probably become the textbook definition of Postmodernism. --Brad Holland
Have you done a full vacuum and not just a reqular vacuum? - Ericson Smith Developer http://www.funadvice.com On 6/8/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote: > On Fri, Jun 08, 2007 at 03:20:28PM -0500, MC Moisei wrote: > > > > pack of postmaster(4-22) processes ran by postgres user are taking > > over almost all the CPU. > > What else is the box doing? If it doesn't have any other work to do, > why shouldn't postgres use the CPU time? (This is a way of saying, > "You didn't tell us anything that would allow us to help.") > > A > > -- > Andrew Sullivan | ajs@crankycanuck.ca > In the future this spectacle of the middle classes shocking the avant- > garde will probably become the textbook definition of Postmodernism. > --Brad Holland > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org/ >
.ExternalClass P {padding:0px;} .ExternalClass {font-size:10pt;font-family:Tahoma;} I'm not sure I understand the question. What else runs on it ?
I have an Apache that fronts a Tomcat (Java Enterprise App Server). In tomcat I only run this application that has a connection pool of 30 connections(if I remember correctly).
Once the application starts to open connections it looks that the each postmaster associated with the connection is not exiting as fast as was before. I can follow up with a ps -aux capture if you think that's helpful. Till yesterday all was working smoothly for about 2 years. It looks like the postmasters are not finishing of if they do takes a good while to finish. Also I've seen that the swap increases. I never use to have swap used. I don't have space problems not errors in the syslog.
Am I running out of memory and all gets delayed by the swap handling ? I have the feeling that I spin around my tail. So these processes are taking all the CPU and memory and they hold for too long just doing a select. The traffic didn't increase by any means so one can say that causes the problem - at one point it sustained 4 times more traffic without problems.
Hope this provide more insight.
MC
> Date: Fri, 8 Jun 2007 16:35:40 -0400
> From: ajs@crankycanuck.ca
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Postmaster processes taking all the CPU
>
> On Fri, Jun 08, 2007 at 03:20:28PM -0500, MC Moisei wrote:
> >
> > pack of postmaster(4-22) processes ran by postgres user are taking
> > over almost all the CPU.
>
> What else is the box doing? If it doesn't have any other work to do,
> why shouldn't postgres use the CPU time? (This is a way of saying,
> "You didn't tell us anything that would allow us to help.")
>
> A
>
> --
> Andrew Sullivan | ajs@crankycanuck.ca
> In the future this spectacle of the middle classes shocking the avant-
> garde will probably become the textbook definition of Postmodernism.
> --Brad Holland
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org/
I did that remotely, thru the psqladmin. How do I do it from that box ?
> Date: Fri, 8 Jun 2007 16:41:57 -0400
> From: esconsult1@gmail.com
> Subject: Re: [GENERAL] Postmaster processes taking all the CPU
> CC: pgsql-general@postgresql.org
>
> Have you done a full vacuum and not just a reqular vacuum?
>
> - Ericson Smith
> Developer
> http://www.funadvice.com
>
> On 6/8/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
> > On Fri, Jun 08, 2007 at 03:20:28PM -0500, MC Moisei wrote:
> > >
> > > pack of postmaster(4-22) processes ran by postgres user are taking
> > > over almost all the CPU.
> >
> > What else is the box doing? If it doesn't have any other work to do,
> > why shouldn't postgres use the CPU time? (This is a way of saying,
> > "You didn't tell us anything that would allow us to help.")
> >
> > A
> >
> > --
> > Andrew Sullivan | ajs@crankycanuck.ca
> > In the future this spectacle of the middle classes shocking the avant-
> > garde will probably become the textbook definition of Postmodernism.
> > --Brad Holland
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Have you searched our list archives?
> >
> > http://archives.postgresql.org/
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
I mean, have you run a VACUUM FULL VERBOSE ANALYZE; lately? SInce you're constantly inserting stuff (are you updating too?), if you have have not analyzed recently, then the planner will give you crappy queries. Also, if you're updating that table frequently, lots of dead tuples will remain in there if you don't do a VACUUM FULL regularly. Reminds me of when I just started using Postgresql around 5 years ago, and we had a tiny table with 10 records that was updated every few seconds or so with a status. Over a fairly short period of time, things got super slow and we did no know why. Doing a VACUUM VERBOSE showed tens of thousands of records after just a couple of days. VACUUM FULL cleaned that out right quick. You might also want to do a dump and restore as well to prevent transaction wraparound. - Ericson Smith Developer http://www.funadvice.com On 6/8/07, MC Moisei <mcmoisei@hotmail.com> wrote: > > > > I'm not sure I understand the question. What else runs on it ? > > I have an Apache that fronts a Tomcat (Java Enterprise App Server). In > tomcat I only run this application that has a connection pool of 30 > connections(if I remember correctly). > > Once the application starts to open connections it looks that the each > postmaster associated with the connection is not exiting as fast as was > before. I can follow up with a ps -aux capture if you think that's helpful. > Till yesterday all was working smoothly for about 2 years. It looks like the > postmasters are not finishing of if they do takes a good while to finish. > Also I've seen that the swap increases. I never use to have swap used. I > don't have space problems not errors in the syslog. > > Am I running out of memory and all gets delayed by the swap handling ? I > have the feeling that I spin around my tail. So these processes are taking > all the CPU and memory and they hold for too long just doing a select. The > traffic didn't increase by any means so one can say that causes the problem > - at one point it sustained 4 times more traffic without problems. > > Hope this provide more insight. > > MC > > > > > > ________________________________ > > Date: Fri, 8 Jun 2007 16:35:40 -0400 > > From: ajs@crankycanuck.ca > > To: pgsql-general@postgresql.org > > Subject: Re: [GENERAL] Postmaster processes taking all the CPU > > > > On Fri, Jun 08, 2007 at 03:20:28PM -0500, MC Moisei wrote: > > > > > > pack of postmaster(4-22) processes ran by postgres user are taking > > > over almost all the CPU. > > > > What else is the box doing? If it doesn't have any other work to do, > > why shouldn't postgres use the CPU time? (This is a way of saying, > > "You didn't tell us anything that would allow us to help.") > > > > A > > > > -- > > Andrew Sullivan | ajs@crankycanuck.ca > > In the future this spectacle of the middle classes shocking the avant- > > garde will probably become the textbook definition of Postmodernism. > > --Brad Holland > > > > ---------------------------(end of > broadcast)--------------------------- > > TIP 4: Have you searched our list archives? > > > > http://archives.postgresql.org/ >
First, your mail is coming through really garbled. Maybe you need to add some linebreaks or something? Anyway On Fri, Jun 08, 2007 at 03:58:40PM -0500, MC Moisei wrote: > > I'm not sure I understand the question. What else runs on it ?I > have an Apache that fronts a Tomcat (Java Enterprise App Server). > In tomcat I only run this application that has a connection pool of > 30 connections(if I remember correctly).Once the application starts All on the same box? And maybe you better check exactly how it's configured if you want support, eh? > to finish. Also I've seen that the swap increases. I never use to > have swap used. I don't have space problems not errors in the If you're into swap, that suggests you are running out of memory. That'd explain just about everything. Have you tuned postgres so that it can use more memory than you actually have? After two years, I'd expect the data to be larger, which might mean you have reached some threshold where an optimisation you made that wasn't actually right is now really wrong. If you're swapping, the CPU time is probably going to bringing some data back in from disk (i.e. it's actually in OS calls). A -- Andrew Sullivan | ajs@crankycanuck.ca I remember when computers were frustrating because they *did* exactly what you told them to. That actually seems sort of quaint now. --J.D. Baldwin
Hello,
if vacuuming does not help, you may also want to log all queries running more than x milliseconds to help localize the problem.
see log_min_duration_statement=xxxx #ms in postgresql.conf
(i didn't check its avaibility in older version)
HTH,
Marc
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of MC Moisei
Sent: Friday, June 08, 2007 11:11 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Postmaster processes taking all the CPU
I did that remotely, thru the psqladmin. How do I do it from that box ?
> Date: Fri, 8 Jun 2007 16:41:57 -0400
> From: esconsult1@gmail.com
> Subject: Re: [GENERAL] Postmaster processes taking all the CPU
> CC: pgsql-general@postgresql.org
>
> Have you done a full vacuum and not just a reqular vacuum?
>
> - Ericson Smith
> Developer
> http://www.funadvice.com
>
> On 6/8/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
> > On Fri, Jun 08, 2007 at 03:20:28PM -0500, MC Moisei wrote:
> > >
> > > pack of postmaster(4-22) processes ran by postgres user are taking
> > > over almost all the CPU.
> >
> > What else is the box doing? If it doesn't have any other work to do,
> > why shouldn't postgres use the CPU time? (This is a way of saying,
> > "You didn't tell us anything that would allow us to help.")
> >
> > A
> >
> > --
> > Andrew Sullivan | ajs@crankycanuck.ca
> > In the future this spectacle of the middle classes shocking the avant-
> > garde will probably become the textbook definition of Postmodernism.
> > --Brad Holland
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Have you searched our list archives?
> >
> > http://archives.postgresql.org/
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
On Fri, Jun 08, 2007 at 05:11:44PM -0400, Ericson Smith wrote: > > Also, if you're updating that table frequently, lots of dead tuples > will remain in there if you don't do a VACUUM FULL regularly. No, they won't. No well-tuned postgres installation has needed VACUUM FULL in a long time. VACUUM FULL is there to save you if you've overshot the regular VACUUM capabilities. This is all explained in the manual. > You might also want to do a dump and restore as well to prevent > transaction wraparound. You better not need this any more, either, if you're doing regular VACUUM. That hasn't been a problem since 7.2. A -- Andrew Sullivan | ajs@crankycanuck.ca When my information changes, I alter my conclusions. What do you do sir? --attr. John Maynard Keynes
Yes all the connection are coming from within the box so no network latency.
Well, isn't the swap can be because too many process postmaster are requiring more memory. I will reproduce it and I'd try post a memory and processes footprint.
The reason I said I feel like spinning around the tail is that if "something" delays my many postmasters would automatically feel my 2GB off RAM and take over the CPU and eventually swap.
Thanks for getting back to me, much appreciated.
MC
Well, isn't the swap can be because too many process postmaster are requiring more memory. I will reproduce it and I'd try post a memory and processes footprint.
The reason I said I feel like spinning around the tail is that if "something" delays my many postmasters would automatically feel my 2GB off RAM and take over the CPU and eventually swap.
Thanks for getting back to me, much appreciated.
MC
> Date: Fri, 8 Jun 2007 17:55:26 -0400
> From: ajs@crankycanuck.ca
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Postmaster processes taking all the CPU
>
> First, your mail is coming through really garbled. Maybe you need to
> add some linebreaks or something? Anyway
>
> On Fri, Jun 08, 2007 at 03:58:40PM -0500, MC Moisei wrote:
> >
> > I'm not sure I understand the question. What else runs on it ?I
> > have an Apache that fronts a Tomcat (Java Enterprise App Server).
> > In tomcat I only run this application that has a connection pool of
> > 30 connections(if I remember correctly).Once the application starts
>
> All on the same box? And maybe you better check exactly how it's
> configured if you want support, eh?
>
> > to finish. Also I've seen that the swap increases. I never use to
> > have swap used. I don't have space problems not errors in the
>
> If you're into swap, that suggests you are running out of memory.
> That'd explain just about everything. Have you tuned postgres so
> that it can use more memory than you actually have? After two years,
> I'd expect the data to be larger, which might mean you have reached
> some threshold where an optimisation you made that wasn't actually
> right is now really wrong. If you're swapping, the CPU time is
> probably going to bringing some data back in from disk (i.e. it's
> actually in OS calls).
>
> A
>
> --
> Andrew Sullivan | ajs@crankycanuck.ca
> I remember when computers were frustrating because they *did* exactly what
> you told them to. That actually seems sort of quaint now.
> --J.D. Baldwin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
On Fri, Jun 08, 2007 at 05:08:26PM -0500, MC Moisei wrote: > Yes all the connection are coming from within the box so no network > latency.Well, isn't the swap can be because too many process > postmaster are requiring more memory. But why are they requring more memory? Do you maybe have (e.g.) work_mem set too high, and that's what is causing your problem? Or shared buffers too big? This is a common error, and on a smaller set of data, it won't hurt; but when the data gets to a point, you lose. A -- Andrew Sullivan | ajs@crankycanuck.ca A certain description of men are for getting out of debt, yet are against all taxes for raising money to pay it off. --Alexander Hamilton
MC Moisei wrote: > > I'm not sure I understand the question. What else runs on it ?I have an Apache that fronts a Tomcat (Java Enterprise AppServer). In tomcat I only run this application that has a connection pool of 30 connections(if I remember correctly).Oncethe application starts to open connections it looks that the each postmaster associated with the connectionis not exiting as fast as was before. I can follow up with a ps -aux capture if you think that's helpful. Tillyesterday all was working smoothly for about 2 years. It looks like the postmasters are not finishing of if they do takesa good while to finish. Also I've seen that the swap increases. I never use to have swap used. I don't have space problemsnot errors in the syslog.Am I running out of memory and all gets delayed by the swap handling ? I have the feelingthat I spin around my tail. So these processes are taking all the CPU and memory and they hold for too long just doinga select. The traffic didn't increase by any me ans so one can say that causes the problem - at one point it sustained 4 times more traffic without problems.Hope this providemore insight.MC> Date: Fri, 8 Jun 2007 16:35:40 -0400> From: ajs@crankycanuck.ca> To: pgsql-general@postgresql.org>Subject: Re: [GENERAL] Postmaster processes taking all the CPU> > On Fri, Jun 08, 2007 at 03:20:28PM-0500, MC Moisei wrote:> > > > pack of postmaster(4-22) processes ran by postgres user are taking> > over almostall the CPU. > > What else is the box doing? If it doesn't have any other work to do,> why shouldn't postgres usethe CPU time? (This is a way of saying,> "You didn't tell us anything that would allow us to help.")> > A> > -- > AndrewSullivan | ajs@crankycanuck.ca> In the future this spectacle of the middle classes shocking the avant-> garde willprobably become the textbook definition of Postmodernism. > --Brad Holland> > ---------------------------(endof broadcast)---------------------------> TIP 4: Have you searched our list archives?> > http://archives.postgresql.org/ > Could you please send your messages as plain text? Your mail client is doing something foul to the plain text alternative incorporated in the multi-part message. It's almost unreadable this way. -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
I promised that I will get back to the group with the reason. Well, of course was a query :). I do use a search engine file system based(lucene) that will take any desired entity saved into the database and find the primary keys and then do a
select * from entity where id is in (:ids)
If I get too many matches(3000-4000)... that will delay my postmaster and that postmaster associated with the query would take 10-15 minutes to process that query. So, now I limit that to 500, anything bigger than that will ask user to refine the query.
However this whole investigation made me observe some things. On my server box 7.4.7 I have some queries that are executing pretty slow 1.2-1.5secs it's a lot for a query that goes thru 5000 records. On my local environment 8.1 the same queries on similar table size executes much faster like 200-400ms. Do you know if this is a known issue or my dev box is better than my server box? I do have indexes on those fields I have criteria and order by on, I did run the reindex and I did a full vacuum. Anything else I need to do or I just need to go ahead an do an upgrade to 8.2 ?
Another thing I try to figure it out is the postgresql.conf setting. I wanted only to log the statements that are taking more than 500ms, I enabled that in my conf file but that alone won't track the statements. It looks I need pt make log_statement=true but that will track all the statements. Is there anything I miss here ?
Please let me know what you think.
MC
Ps.
I heard people complaining about my posting format. I use the hotmail web interface and the way they send the message is beyond my control ;-|
select * from entity where id is in (:ids)
If I get too many matches(3000-4000)... that will delay my postmaster and that postmaster associated with the query would take 10-15 minutes to process that query. So, now I limit that to 500, anything bigger than that will ask user to refine the query.
However this whole investigation made me observe some things. On my server box 7.4.7 I have some queries that are executing pretty slow 1.2-1.5secs it's a lot for a query that goes thru 5000 records. On my local environment 8.1 the same queries on similar table size executes much faster like 200-400ms. Do you know if this is a known issue or my dev box is better than my server box? I do have indexes on those fields I have criteria and order by on, I did run the reindex and I did a full vacuum. Anything else I need to do or I just need to go ahead an do an upgrade to 8.2 ?
Another thing I try to figure it out is the postgresql.conf setting. I wanted only to log the statements that are taking more than 500ms, I enabled that in my conf file but that alone won't track the statements. It looks I need pt make log_statement=true but that will track all the statements. Is there anything I miss here ?
Please let me know what you think.
MC
Ps.
I heard people complaining about my posting format. I use the hotmail web interface and the way they send the message is beyond my control ;-|
> Date: Fri, 8 Jun 2007 18:13:02 -0400
> From: ajs@crankycanuck.ca
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Postmaster processes taking all the CPU
>
> On Fri, Jun 08, 2007 at 05:08:26PM -0500, MC Moisei wrote:
> > Yes all the connection are coming from within the box so no network
> > latency.Well, isn't the swap can be because too many process
> > postmaster are requiring more memory.
>
> But why are they requring more memory? Do you maybe have (e.g.)
> work_mem set too high, and that's what is causing your problem? Or
> shared buffers too big? This is a common error, and on a smaller set
> of data, it won't hurt; but when the data gets to a point, you lose.
>
> A
>
> --
> Andrew Sullivan | ajs@crankycanuck.ca
> A certain description of men are for getting out of debt, yet are
> against all taxes for raising money to pay it off.
> --Alexander Hamilton
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
> I promised that I will get back to the group with the reason. Well, of > course was a query :). I do use a search engine file system > based(lucene) that will take any desired entity saved into the database > and find the primary keys and then do a select * from entity where id is > in (:ids)If I get too many matches(3000-4000)... that will delay my > postmaster and that postmaster associated with the query would take > 10-15 minutes to process that query. So, now I limit that to 500, > anything bigger than that will ask user to refine the query.However this > whole investigation made me observe some things. On my server box 7.4.7 > I have some queries that are executing pretty slow 1.2-1.5secs it's a > lot for a query that goes thru 5000 records. On my local environment 8.1 > the same queries on similar table size executes much faster like > 200-400ms. Do you know if this is a known issue or my dev box is better > than my server box? PG developers are not Microsoft, hence, every version of PG gets meaner and faster instead of slower and more bloated. The speedup you see might simply be due to developer competence... Anyway, if you use Lucene for full text search (I didn't try it, I tried Xapian which seems similar) : Users don't care about searches returning 5000 results ; they usually only look at the first pages. It's the sorting that is important, and this should be done by Lucene itself (sorting on phrase relevance, for instance, is a search engine job). So, you are right in limiting the results to 500, it could even be lower. The role of the search engine is to sort the good results at the top of the list so users are happy. Do you really display 500 results ? If you paginate the results, the users will probably never go past page3. Fetching a lot of results is therefore a waste of database power : just fetch the ids from Lucene and grab the results from Postgres, but only for the page you are going to display. If you use Postgres to do the sorting (example : lucene matches the records and you want to order them by a field which is not stored in Lucene but in Postgres) obviously this advice does not apply.
Thanks for replying!
I will try to upgrade to the latest on my prod box. Any easy way to do that ? Can I have two releases in the same time 7.4.7 and 8.2.3 ? This is a phased approach so it may take a little while to do it and I want to have 7.4.7 till I'd be able to switch to the 8.2.3( on a diff. port number)
On the speedup of the queries I doubt it's a developoer thing since on my dev box (8.1) the same queries are way faster.
On the lucene thing I do let the Lucene provide the best matches and I take those ids and feed the my query with...
select * from entity where id is in (:ids)
So if the entity scores more in Lucene search it will show higher, works pretty well. I can limit it to less first 100 result. I only display 5 at the time and I use a paginated approach, since you asked me.
Thanks again,
MC
I will try to upgrade to the latest on my prod box. Any easy way to do that ? Can I have two releases in the same time 7.4.7 and 8.2.3 ? This is a phased approach so it may take a little while to do it and I want to have 7.4.7 till I'd be able to switch to the 8.2.3( on a diff. port number)
On the speedup of the queries I doubt it's a developoer thing since on my dev box (8.1) the same queries are way faster.
On the lucene thing I do let the Lucene provide the best matches and I take those ids and feed the my query with...
select * from entity where id is in (:ids)
So if the entity scores more in Lucene search it will show higher, works pretty well. I can limit it to less first 100 result. I only display 5 at the time and I use a paginated approach, since you asked me.
Thanks again,
MC
> Date: Fri, 15 Jun 2007 20:03:08 +0200
> To: mcmoisei@hotmail.com; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Postmaster processes taking all the CPU
> From: lists@peufeu.com
>
>
> > I promised that I will get back to the group with the reason. Well, of
> > course was a query :). I do use a search engine file system
> > based(lucene) that will take any desired entity saved into the database
> > and find the primary keys and then do a select * from entity where id is
> > in (:ids)If I get too many matches(3000-4000)... that will delay my
> > postmaster and that postmaster associated with the query would take
> > 10-15 minutes to process that query. So, now I limit that to 500,
> > anything bigger than that will ask user to refine the query.However this
> > whole investigation made me observe some things. On my server box 7.4.7
> > I have some queries that are executing pretty slow 1.2-1.5secs it's a
> > lot for a query that goes thru 5000 records. On my local environment 8.1
> > the same queries on similar table size executes much faster like
> > 200-400ms. Do you know if this is a known issue or my dev box is better
> > than my server box?
>
> PG developers are not Microsoft, hence, every version of PG gets meaner
> and faster instead of slower and more bloated.
> The speedup you see might simply be due to developer competence...
>
> Anyway, if you use Lucene for full text search (I didn't try it, I tried
> Xapian which seems similar) :
>
> Users don't care about searches returning 5000 results ; they usually
> only look at the first pages. It's the sorting that is important, and this
> should be done by Lucene itself (sorting on phrase relevance, for
> instance, is a search engine job). So, you are right in limiting the
> results to 500, it could even be lower. The role of the search engine is
> to sort the good results at the top of the list so users are happy.
>
> Do you really display 500 results ? If you paginate the results, the
> users will probably never go past page3. Fetching a lot of results is
> therefore a waste of database power : just fetch the ids from Lucene and
> grab the results from Postgres, but only for the page you are going to
> display.
>
> If you use Postgres to do the sorting (example : lucene matches the
> records and you want to order them by a field which is not stored in
> Lucene but in Postgres) obviously this advice does not apply.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings