Thread: Very slow queries on 8.1
I'm porting an application from Sybase and I've noticed that similar application functions take 2 to 3 times longer on postgres than they used to on the same machine running under Sybase. I've tried changing various "performance tuning" parameters, such as shared_buffers, effective_cache_size, etc, but there's little or no effect. I'm beginning to think there's a deeper root cause to the slowness.

Right now, I'm working on a test case that involves a table with ~360k rows called "nb.sigs". My sample query is:

    select * from nb.sigs where signum > 250000

With no index, explain says this query costs 11341. After CREATE INDEX on the signum field, along with an ANALYZE for nb.sigs, the query costs 3456 and takes around 4 seconds to return the first row. This seems extremely slow to me, but I can't figure out what I might be doing wrong. Any ideas?

(If necessary, I can write an entire script that creates and populates a table and then give my performance on that sample for someone else to check against.)
What is the data type for "signum"?

"David Rysdam" <drysdam@ll.mit.edu> wrote in message news:437CA40A.8020507@ll.mit.edu...
> Right now, I'm working on a test case that involves a table with ~360k
> rows called "nb.sigs". My sample query is:
>
> select * from nb.sigs where signum > 250000
>
> With no index, explain says this query costs 11341. After CREATE INDEX on
> the signum field, along with an ANALYZE for nb.sigs, the query costs 3456
> and takes around 4 seconds to return the first row.
David Rysdam <drysdam@ll.mit.edu> writes:
> Right now, I'm working on a test case that involves a table with ~360k
> rows called "nb.sigs". My sample query is:
> select * from nb.sigs where signum > 250000
> With no index, explain says this query costs 11341. After CREATE INDEX
> on the signum field, along with an ANALYZE for nb.sigs, the query costs
> 3456 and takes around 4 seconds to return the first row. This seems
> extremely slow to me, but I can't figure out what I might be doing
> wrong. Any ideas?

How many rows does that actually return, and what client interface are you fetching it with? libpq, at least, likes to fetch the entire query result before it gives it to you --- so you're talking about 4 sec to get all the rows, not only the first one. That might be reasonable if you're fetching 100k rows via an indexscan...

    regards, tom lane
int4, not null, and the index is unique. I even tried clustering on it, to no avail.

codeWarrior wrote:
> What is the data type for "signum"?
Tom Lane wrote:
> How many rows does that actually return, and what client interface are
> you fetching it with? libpq, at least, likes to fetch the entire query
> result before it gives it to you --- so you're talking about 4 sec to
> get all the rows, not only the first one. That might be reasonable if
> you're fetching 100k rows via an indexscan...

Right, it's about 100k rows and it is through libpq (pgadmin in this case, but my app uses libpq from pgtcl). Is there a way to tell libpq to not do what it "likes" and do what I need instead? I didn't see anything in the docs, but I didn't look very hard.
David Rysdam <drysdam@ll.mit.edu> writes:
> Right, it's about 100k rows and it is through libpq (pgadmin in this
> case, but my app uses libpq from pgtcl). Is there a way to tell libpq
> to not do what it "likes" and do what I need instead?

The only way ATM is to declare a cursor on the query and then FETCH whatever amount seems reasonable at a time.

There is support in the on-the-wire protocol for partial fetches from ordinary queries (ie, without the notational overhead of creating a cursor), but libpq doesn't expose that at the moment. There's a thread right now in pgsql-interfaces about adding such a feature to libpq ...

    regards, tom lane
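A minimal libpq sketch of the cursor-plus-FETCH approach described above. The table and predicate come from the thread; the connection string, cursor name, batch size, and error handling are illustrative assumptions, not anything posted by the participants:

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        /* Connection string is an assumption; adjust for the real environment. */
        PGconn *conn = PQconnectdb("dbname=mydb");
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* Cursors must live inside a transaction block. */
        PQclear(PQexec(conn, "BEGIN"));
        PQclear(PQexec(conn,
            "DECLARE sigs_cur CURSOR FOR "
            "SELECT * FROM nb.sigs WHERE signum > 250000"));

        for (;;) {
            /* Fetch 1000 rows at a time instead of the whole ~100k-row result. */
            PGresult *res = PQexec(conn, "FETCH 1000 FROM sigs_cur");
            if (PQresultStatus(res) != PGRES_TUPLES_OK) {
                fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
                PQclear(res);
                break;
            }
            int nrows = PQntuples(res);
            for (int i = 0; i < nrows; i++) {
                /* Process the first column of each row (placeholder). */
                printf("%s\n", PQgetvalue(res, i, 0));
            }
            PQclear(res);
            if (nrows == 0)          /* cursor exhausted */
                break;
        }

        PQclear(PQexec(conn, "CLOSE sigs_cur"));
        PQclear(PQexec(conn, "END"));
        PQfinish(conn);
        return 0;
    }

The first FETCH returns almost immediately, which is the "quick access to the first row" behaviour being asked about; total throughput may be somewhat lower than a single full-result query.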
On Thu, Nov 17, 2005 at 11:31:27 -0500, David Rysdam <drysdam@ll.mit.edu> wrote:
> Right, it's about 100k rows and it is through libpq (pgadmin in this
> case, but my app uses libpq from pgtcl). Is there a way to tell libpq
> to not do what it "likes" and do what I need instead? I didn't see
> anything in the docs, but I didn't look very hard.

You could use a cursor. That will bias the plan toward fast-start plans, which might give you lower throughput if you normally plan to fetch all of the rows, but will give you quicker access to the first row.
On Thu, Nov 17, 2005 at 11:31:27AM -0500, David Rysdam wrote:
> Right, it's about 100k rows and it is through libpq (pgadmin in this
> case, but my app uses libpq from pgtcl). Is there a way to tell libpq
> to not do what it "likes" and do what I need instead? I didn't see
> anything in the docs, but I didn't look very hard.

Use the async interface. There you submit the query and retrieve rows as they come in. It's a bit trickier to program, but it can be done.

psql doesn't do this, though; it's not clear how it could anyway, given the way it formats.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
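A rough sketch of the asynchronous libpq calls being referred to, with the caveat that PQgetResult still hands back one complete PGresult per statement, so this keeps the application from blocking while it waits rather than streaming rows one at a time. The connection string is an assumption:

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        /* Connection string is an assumption; adjust for the real environment. */
        PGconn *conn = PQconnectdb("dbname=mydb");
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* Submit the query without blocking the application. */
        if (!PQsendQuery(conn, "SELECT * FROM nb.sigs WHERE signum > 250000")) {
            fprintf(stderr, "PQsendQuery failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* Poll for completion; the application can do other work here,
         * or select()/poll() on PQsocket(conn) instead of spinning. */
        while (PQisBusy(conn)) {
            if (!PQconsumeInput(conn)) {
                fprintf(stderr, "PQconsumeInput failed: %s", PQerrorMessage(conn));
                PQfinish(conn);
                return 1;
            }
        }

        /* PQgetResult returns NULL once the query's results are exhausted. */
        PGresult *res;
        while ((res = PQgetResult(conn)) != NULL) {
            if (PQresultStatus(res) == PGRES_TUPLES_OK)
                printf("%d rows received\n", PQntuples(res));
            PQclear(res);
        }

        PQfinish(conn);
        return 0;
    }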
Martijn van Oosterhout wrote:
> Use the async interface. There you submit the query and retrieve rows
> as they come in. It's a bit trickier to program but it can be done.
>
> psql doesn't do this though, it's not clear how it could anyway, given
> the way it formats.

I'm experimenting with the async interface right now. Hopefully it will fit in well. It's OK if psql/pgadmin don't do it, as long as the app does.
Bruno Wolff III wrote:
> You could use a cursor. That will bias the plan toward fast-start plans, which
> might give you lower throughput if you normally plan to fetch all of
> the rows, but will give you quicker access to the first row.

That is exactly what I want, but is it possible to use a cursor from pgtcl?
On Thu, Nov 17, 2005 at 10:38:50AM -0500, David Rysdam wrote:
> I'm porting an application from Sybase and I've noticed that similar
> application functions take 2 to 3 times longer on postgres than they
> used to on the same machine running under Sybase. I've tried changing
> various "performance tuning" parameters, such as shared_buffers,
> effective_cache_size, etc but there's little or no effect.

What's your OS and hardware environment, and what are your non-default PostgreSQL settings?

> Right now, I'm working on a test case that involves a table with ~360k
> rows called "nb.sigs". My sample query is:
>
> select * from nb.sigs where signum > 250000
>
> With no index, explain says this query costs 11341. After CREATE INDEX
> on the signum field, along with an ANALYZE for nb.sigs, the query costs
> 3456 and takes around 4 seconds to return the first row.

Please post the EXPLAIN ANALYZE output for the query -- that'll show us the query plan, the accuracy of the planner's row count estimate, and how long the query takes to execute on the server. It might also be useful to see the table definition and the output of the following query:

    SELECT null_frac, n_distinct, correlation
    FROM pg_stats
    WHERE schemaname = 'nb' AND tablename = 'sigs' AND attname = 'signum';

What client interface are you using? If the query returns a lot of rows then you might benefit from using a cursor to fetch the result set in chunks; otherwise the client library is probably waiting for the entire result set to arrive before returning any rows to you. If the result set is large then you can experience performance problems due to a shortage of real memory.

How volatile is the data, and how common are queries based on signum? You might benefit from clustering on the signum index.

> (If necessary, I can write an entire script that creates and populates a
> table and then give my performance on that sample for someone else to
> check against.)

If it's a short script that populates the table with canned data then go ahead and post it.

--
Michael Fuhr
Michael Fuhr wrote:
> What's your OS and hardware environment and what are your non-default
> PostgreSQL settings?

Fedora Core 2, dual 2.8 GHz, 2 GB RAM.

    shared_buffers = 10000
    effective_cache_size = 100000

> Please post the EXPLAIN ANALYZE output for the query -- that'll
> show us the query plan, the accuracy of the planner's row count
> estimate, and how long the query takes to execute on the server.
> It might also be useful to see the table definition and the output
> of the following query:
>
> SELECT null_frac, n_distinct, correlation
> FROM pg_stats
> WHERE schemaname = 'nb' AND tablename = 'sigs' AND attname = 'signum';

The first things would be problematic to supply, since they are actually on a computer that doesn't have access to the Internet or to the machine I'm writing this on. As for the query:

    Row  null_frac  n_distinct  correlation
    1    0          -1          1

> What client interface are you using? If the query returns a lot
> of rows then you might benefit from using a cursor to fetch the
> result set in chunks; otherwise the client library is probably
> waiting for the entire result set to arrive before returning any
> rows to you.

That does seem to be the problem. I've never worked with cursors, so I'll have to see if I can fit our DB module into that mold.