Thread: Postgres slowdown on large table joins
I'm having a problem here. I'm using Postgres 7.0.3 on a FreeBSD 4.2-RELEASE machine... it's a Pentium II/450 w/ 128MB of RAM (not nearly enough, but there'll be an upgrade soon). Anyway, I have a data table, which currently has around 146,000 entries, though it will grow to a few million eventually. There are also config and prefs tables, which have 4-5 rows each. When I execute the following command:

SELECT c.unit_id,c.name,c.auxenable,c.bias,c.feedback,c.gain,c.igain,
       c.mode,c.reverse,c.setpoint,c.switch1,c.switch2,c.timeout,
       c.valvetype,d.active,d.drive_1_s,d.drive_1_f,d.drive_2_s,
       d.drive_2_f,d.mval,d.m4val,d.sw1,d.sw2,d.cycle,d.itemp,
       d.error,d.aval,d.ts,c.ts,p.degree,c.outputa,c.outputb,
       c.outputc,c.rawtemp
FROM config c, data d, prefs p
WHERE c.conf_id = '4'
  AND d.conf_id = c.conf_id
  AND p.conf_id = c.conf_id
ORDER BY d.ts DESC
LIMIT 1

...it takes an astounding 50 seconds to complete, and CPU usage goes to about 85%. Now, a simple...

SELECT *
FROM data
ORDER BY ts DESC
LIMIT 1

...takes about 16-26 seconds - still sloooow, but not as bad as with the table join. What's really causing the slowdown? Should I just execute the command differently? I'm trying to get the latest data in all three tables.

Once the server has 768MB+ of RAM, is it possible to load the entire table into memory? That should speed things up considerably.

Thanks,

--
David Edmondson <david@jlc.net>
GMU/FA d-(--) s+: a18>? C++++$ UB++++$ P+>+++++ L- E--- W++ N- o K-> w-- O?
M-(--) V? PS+ PE+ Y? PGP t 5 X R+ tv-->! b DI+++ D+ G(--) e>* h!>+ r++ y+>++
ICQ: 79043921 AIM: AbsintheXL #music,#hellven on irc.esper.net
Can you EXPLAIN that query and send us the results (the query plan)? That should tell a whole lot.

-Mitch

----- Original Message -----
From: "Dave Edmondson" <david@jlc.net>
To: <pgsql-general@postgresql.org>
Sent: Friday, February 16, 2001 1:32 PM
Subject: Postgres slowdown on large table joins
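For anyone following the thread, EXPLAIN is simply prefixed to the query under investigation. A minimal sketch using the query from the original post (the column list is abbreviated here for readability):

```sql
-- Prefix EXPLAIN to a query to print the planner's chosen plan and
-- cost estimates; the query itself is not executed:
EXPLAIN
SELECT c.unit_id, d.ts, p.degree
FROM config c, data d, prefs p
WHERE c.conf_id = '4'
  AND d.conf_id = c.conf_id
  AND p.conf_id = c.conf_id
ORDER BY d.ts DESC
LIMIT 1;
```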
I didn't see you mention whether you had VACUUM ANALYZEd your database. That alone will cause a huge boost in performance. You also didn't mention indexes, but since they're SQL as opposed to db-specific, I will assume that you know about those...

Greg

----- Original Message -----
From: "Dave Edmondson" <david@jlc.net>
To: <pgsql-general@postgresql.org>
Sent: Friday, February 16, 2001 1:32 PM
Subject: Postgres slowdown on large table joins
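As a sketch, the two maintenance steps Greg is pointing at look like this (the index names are illustrative, not taken from the thread):

```sql
-- Index the join columns (index names are hypothetical):
CREATE INDEX data_conf_id_idx  ON data  (conf_id);
CREATE INDEX prefs_conf_id_idx ON prefs (conf_id);

-- Rebuild the planner's statistics so it can cost the new indexes:
VACUUM ANALYZE data;
VACUUM ANALYZE prefs;
```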
Ack! I just timed it at 74 seconds. Added two indexes; here's the query plan... it doesn't seem to be using the indexes at all. I'm sure I'm doing something wrong here...

NOTICE:  QUERY PLAN:

Sort  (cost=6707.62..6707.62 rows=10596 width=170)
  ->  Merge Join  (cost=1.34..5492.29 rows=10596 width=170)
        ->  Nested Loop  (cost=0.00..4943.38 rows=36493 width=154)
              ->  Index Scan using config_pkey on config c  (cost=0.00..2.01 rows=1 width=113)
              ->  Seq Scan on data d  (cost=0.00..3116.72 rows=145972 width=41)
        ->  Sort  (cost=1.34..1.34 rows=12 width=16)
              ->  Seq Scan on prefs p  (cost=0.00..1.12 rows=12 width=16)

EXPLAIN

I actually didn't know anything about indexes before now. Know of anywhere with a good explanation of them? The Postgres user manual wasn't too helpful; it just explained the syntax of the CREATE INDEX command.

Actual commands I entered:

create index data_index ON data using btree (conf_id);
create index prefs_index ON prefs using btree (conf_id);

On Fri, Feb 16, 2001 at 03:27:36PM -0500, Mitch Vincent wrote:
> OK, well, an index on data.conf_id and prefs.conf_id will help.
>
> After you make the indexes do a VACUUM ANALYZE.
>
> Then send me that query plan :-)
>
> The join is hurting you, but that's to be expected with a larger table;
> there isn't much we can do there..
>
> The indexes should speed things up a lot though.. Let me know..
>
> A link to some info on EXPLAIN:
> http://postgresql.readysetnet.com/users-lounge/docs/7.0/user/c4884.htm
>
> -Mitch
>
> > There you go...
> > I'll be searching around for how to interpret that.
> > (I've never done an EXPLAIN before now...)

--
David Edmondson <david@jlc.net>
Dave Edmondson writes:
> Added two indexes, here's the query plan... it doesn't seem to be using the
> indexes at all. I'm sure I'm doing something wrong here...
>
> NOTICE:  QUERY PLAN:
>
> Sort  (cost=6707.62..6707.62 rows=10596 width=170)
>   ->  Merge Join  (cost=1.34..5492.29 rows=10596 width=170)
>         ->  Nested Loop  (cost=0.00..4943.38 rows=36493 width=154)
>               ->  Index Scan using config_pkey on config c  (cost=0.00..2.01 rows=1 width=113)
>               ->  Seq Scan on data d  (cost=0.00..3116.72 rows=145972 width=41)
>         ->  Sort  (cost=1.34..1.34 rows=12 width=16)
>               ->  Seq Scan on prefs p  (cost=0.00..1.12 rows=12 width=16)

I don't see anything blatantly wrong with this. The Nested Loop can obviously(?) not use two index scans, and the Seq Scan under the Sort is okay since it only retrieves 12 rows. (Unless those 12 rows are a wild misguess.)

> I actually didn't know anything about indexes before now. Know of anywhere
> with a good explanation of them? The Postgres user manual wasn't too
> helpful.

There's this:
http://www.postgresql.org/users-lounge/docs/7.0/postgres/indices.htm
but it could probably use some work.

--
Peter Eisentraut   peter_e@gmx.net   http://yi.org/peter-e/
On Mon, Feb 19, 2001 at 12:22:11PM -0500, Tom Lane wrote:
> Dave Edmondson <david@jlc.net> writes:
> > Ack! I just timed it at 74 seconds.
> > Added two indexes, here's the query plan... it doesn't seem to be using the
> > indexes at all. I'm sure I'm doing something wrong here...
>
> Have you done a VACUUM ANALYZE on these tables since filling them?
> Are the estimated row counts in the plan anywhere near reality? The
> rows=145972 for data looks particularly fishy ... how many rows of data
> do you think will match the conf_id extracted from config?
>
> 			regards, tom lane

Yes. I ran VACUUM ANALYZE after creating the indexes. (Actually, I VACUUM the database twice a day.) The data table literally has 145972 rows, and 145971 will match conf_id 4... which is why I'm getting the feeling that an index on conf_id in data won't actually solve the problem (after reading about what indexes actually do). I think I might just have to split the one table join into separate queries, and ignore any entries before a certain date/time... guess I'll just have to get creative for that part. :)

--
David Edmondson <david@jlc.net>
Dave Edmondson <david@jlc.net> writes:
> Yes. I ran VACUUM ANALYZE after creating the indexes. (Actually, I VACUUM
> the database twice a day.) The data table literally has 145972 rows, and
> 145971 will match conf_id 4...

Hm. In that case the seqscan on data looks pretty reasonable ... not sure if you can improve on this much, except by restructuring the tables. How many rows does the query actually produce, anyway? It might be that most of the time is going into sorting and delivering the result rows.

			regards, tom lane
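A quick way to answer Tom's question without shipping any rows to the client is to run the same join as a count (a sketch, reusing the table and column names from the thread):

```sql
-- Count the rows the problem join produces, skipping the sort,
-- the LIMIT, and the wide column list:
SELECT count(*)
FROM config c, data d, prefs p
WHERE c.conf_id = '4'
  AND d.conf_id = c.conf_id
  AND p.conf_id = c.conf_id;
```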
> > Yes. I ran VACUUM ANALYZE after creating the indexes. (Actually, I VACUUM
> > the database twice a day.) The data table literally has 145972 rows, and
> > 145971 will match conf_id 4...
>
> Hm. In that case the seqscan on data looks pretty reasonable ... not
> sure if you can improve on this much, except by restructuring the tables.
> How many rows does the query actually produce, anyway? It might be that
> most of the time is going into sorting and delivering the result rows.

All I'm really trying to get is the latest row with a conf_id of 4... I'm not sure if there's an easier way to do this, but it seems a bit ridiculous to read in almost 146000 rows to return 1. :(

--
David Edmondson <david@jlc.net>
* Dave Edmondson <david@jlc.net> [010219 14:40]:
> All I'm really trying to get is the latest row with a conf_id of 4... I'm
> not sure if there's an easier way to do this, but it seems a bit ridiculous
> to read in almost 146000 rows to return 1. :(

Is there a timestamp or date/time column in the row? If so, index THAT.

LER

--
Larry Rosenman                      http://www.lerctr.org/~ler
Phone: +1 972-414-9812                  E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
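Larry's suggestion as a sketch (the index name is illustrative, and whether the 7.0 planner actually picks the index for this plan depends on its cost estimates):

```sql
-- Index the timestamp column so a "latest row" lookup can walk the
-- btree backward instead of sorting the whole table
-- (index name is hypothetical):
CREATE INDEX data_ts_idx ON data (ts);
VACUUM ANALYZE data;

-- The "latest row" query the index is meant to serve:
SELECT * FROM data ORDER BY ts DESC LIMIT 1;
```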
On Mon, Feb 19, 2001 at 08:34:47PM -0600, Larry Rosenman wrote:
> Is there a timestamp or date/time column in the row? If so, index
> THAT.
>
> LER

Actually, I just did that yesterday... now that I finally understand indexes. Thanks anyway.

--
David Edmondson <david@jlc.net>
On Fri, Feb 16, 2001 at 01:32:13PM -0500, Dave Edmondson wrote:
> SELECT *
> FROM data
> ORDER BY ts desc
> LIMIT 1
>
> ...takes about 16-26 seconds - still sloooow, but not as bad as with the
> table join. What's really causing the slowdown? ...should I just execute
> the command differently? I'm trying to get the latest data in all three
> tables.

How long does it take to run

SELECT MAX(ts) FROM data;

That should definitely be much faster than sorting. If it is faster, try this:

SELECT c.unit_id,c.name,c.auxenable,c.bias,c.feedback,c.gain,c.igain,
       c.mode,c.reverse,c.setpoint,c.switch1,c.switch2,c.timeout,
       c.valvetype,d.active,d.drive_1_s,d.drive_1_f,d.drive_2_s,
       d.drive_2_f,d.mval,d.m4val,d.sw1,d.sw2,d.cycle,d.itemp,
       d.error,d.aval,d.ts,c.ts,p.degree,c.outputa,c.outputb,
       c.outputc,c.rawtemp
FROM config c, data d, prefs p
WHERE c.conf_id = '4'
  AND d.conf_id = c.conf_id
  AND p.conf_id = c.conf_id
  AND d.ts = (SELECT max(dd.ts) FROM data dd WHERE dd.conf_id = '4');

If data.ts values are not unique for the given conf_id, you can get more than one row from this query, but then you can LIMIT it again.

I think indexes could be useful for finding min/max, but PostgreSQL doesn't seem to use them for that work. It would be easy with simple queries like 'SELECT max(ts) FROM data'. It's harder for more complex queries, say:

SELECT max(ts) FROM data WHERE conf_id = '4'

There it wouldn't suffice to find the rightmost btree element, but if we had a two-column index on (conf_id, ts)...? Search for the rightmost ('4', X)? Am I missing something, or would this be possible?

greetings to all postgresql developers and users!

tom

--
.signature: Too many levels of symbolic links
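The two-column index tom is describing would be created like this (a sketch; the index name is hypothetical, and whether the 7.0 planner exploits it for max() is exactly the open question he raises - the subselect form of the query can still use it as an ordinary equality index on conf_id):

```sql
-- Two-column btree: for a fixed conf_id, the newest ts is the
-- rightmost matching entry (index name is hypothetical):
CREATE INDEX data_conf_ts_idx ON data (conf_id, ts);
VACUUM ANALYZE data;

-- The subselect form of the query, restricted to one conf_id:
SELECT d.*
FROM data d
WHERE d.conf_id = '4'
  AND d.ts = (SELECT max(dd.ts) FROM data dd WHERE dd.conf_id = '4');
```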