Thread: 500 rows = 1 min / 2.5k rows = 20 min / 6K rows = 2 hours and still running
Hi, I'd appreciate it if someone could offer some pointers on this.

PG 8.2.4, 1.4 GHz Centrino (single core), 1.5 GB RAM, 5400 rpm laptop HD.

3 main tables which have already been selected "out" into separate tables (using create table foo as select * from foo_main where x = y). These test tables contain only a very small subset of the main tables' data (max 1k to 10k rows vs 1.5 million to 7 million rows in the main tables). Table definitions and the actual query are attached (names have been altered to protect the innocent).

I've played around with some tweaking of the postgresql.conf settings per guidance from jdavis (in IRC) without much (any) improvement. Also tried re-writing the queries to NOT use subselects (per depesz, also in IRC), which likewise yielded nothing spectacular.

The only thing I noticed was that when the subqueries combine more than 3 tables, PG will choke. With only 3 joined tables per subquery, the results come out fast, even for 6K rows.

But if the subqueries (which by themselves execute fast and return results in 1 to 10 seconds) were run independently, placed into temp tables, and then finally joined together using a query such as

select a.a, b.b, c.c from a inner join b on (x = x) left outer join c on (x = y)

then it would also be fast.

Settings tried:

work_mem = 8MB / 32MB / 128MB (32MB default in my setup)
effective_cache_size = 128MB / 500MB (500 default)
shared_buffers = 200MB
geqo_threshold = 5 (default 12)
geqo_effort = 2 (default 5)
random_page_cost = 8.0 (default 4)
maintenance_work_mem = 64MB
join_collapse_limit = 1/8/15 (8 default)
from_collapse_limit = 1/8/15 (8 default)
enable_nestloop = f (on by default)

Based on current performance, even with a small number of rows in the individual tables (max 20k), I can't get a result out in 2 hours (> 3 tables joined per subquery), which is making me rethink PG's usefulness.

BTW, I also tried the 8.2.4 CVS STABLE branch.
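For reference, a minimal sketch of the temp-table workaround described above. The table and column names are borrowed from the plans later in the thread, and the column lists and ttype sets are assumptions, not the poster's actual query; the point is only the shape of the approach (materialise each fast subquery, ANALYZE the intermediates, then join them):

CREATE TEMP TABLE part_a AS
SELECT trh.id, trh.hid, trh.start_timestamp, trh.pber_2, ts.seq_date, ts.seq_time
FROM   trh
JOIN   ts ON ts.id = trh.id
         AND ts.ttype = trh.ttype
         AND ts.start_timestamp = trh.start_timestamp
WHERE  trh.ttype IN (35, 75, 703, 740, 764);

CREATE TEMP TABLE part_b AS
SELECT trh.id, trh.hid, trh.start_timestamp, trh.pber_3, ts.seq_date, ts.seq_time
FROM   trh
JOIN   ts ON ts.id = trh.id
         AND ts.ttype = trh.ttype
         AND ts.start_timestamp = trh.start_timestamp
WHERE  trh.ttype IN (69, 178, 198, 704, 757, 741, 765);

-- Give the planner real statistics on the small intermediates before joining them.
ANALYZE part_a;
ANALYZE part_b;

SELECT a.id, a.hid, a.pber_2, b.pber_3
FROM   part_a a
LEFT JOIN part_b b
       ON  b.id = a.id
       AND b.hid = a.hid
       AND b.seq_date = a.seq_date
       AND b.seq_time = a.seq_time;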
Sorry, I sent this as I was about to go to bed, and the explain analyse of the query with 4 tables joined per subquery has since finished. So, attaching it.

On Wed, 2007-09-12 at 00:57 +0800, El-Lotso wrote:
> Hi,
>
> I'd appreciate it if someone could offer some pointers on this.
> [full quote of the previous message snipped]
El-Lotso <el.lotso@gmail.com> writes:
> sorry.. I sent this as I was about to go to bed and the explain analyse
> of the query w/ 4 tables joined per subquery came out.

It's those factor-of-1000 misestimates of the join sizes that are killing you, eg this one:

>  ->  Hash Join  (cost=249.61..512.56 rows=1 width=87) (actual time=15.139..32.858 rows=969 loops=1)
>        Hash Cond: (((test_db.ts.id)::text = (test_db.d.id)::text) AND (test_db.ts.start_timestamp = test_db.trd.start_timestamp) AND (test_db.ts.ttype = test_db.trd.ttype))
>        ->  Seq Scan on ts  (cost=0.00..226.44 rows=3244 width=40) (actual time=0.135..6.916 rows=3244 loops=1)
>        ->  Hash  (cost=235.00..235.00 rows=835 width=47) (actual time=14.933..14.933 rows=1016 loops=1)

The single-row-result estimate persuades it to use a nestloop at the next level up, and then when the output is actually 969 rows, that means 969 executions of the other side of the upper join.

The two input size estimates are reasonably close to reality, so the problem seems to be in the estimate of selectivity of the join condition. First off, do you have up-to-date statistics for all the columns being joined here? It might be that increasing the statistics targets for those columns would help.

But what I'm a bit worried about is the idea that the join conditions are correlated or even outright redundant; the planner will not know that, and will make an unrealistic estimate of their combined selectivity. If that's the case, you might need to redesign the table schema to eliminate the redundancy before you'll get good plans.

			regards, tom lane
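For reference, raising the per-column statistics target that Tom mentions looks roughly like this (table and column names taken from the plans above; 1000 is simply the maximum target on 8.2):

-- Raise the statistics target on the join columns, then re-analyze so the
-- planner has finer-grained distributions to work with.
ALTER TABLE ts  ALTER COLUMN id              SET STATISTICS 1000;
ALTER TABLE ts  ALTER COLUMN ttype           SET STATISTICS 1000;
ALTER TABLE ts  ALTER COLUMN start_timestamp SET STATISTICS 1000;
ALTER TABLE trh ALTER COLUMN id              SET STATISTICS 1000;
ALTER TABLE trh ALTER COLUMN ttype           SET STATISTICS 1000;
ALTER TABLE trh ALTER COLUMN start_timestamp SET STATISTICS 1000;
ANALYZE ts;
ANALYZE trh;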
On Tue, 2007-09-11 at 14:23 -0400, Tom Lane wrote:
> The single-row-result estimate persuades it to use a nestloop at the
> next level up, and then when the output is actually 969 rows, that
> means 969 executions of the other side of the upper join.

Yep, that's consistent with the larger result sets: more rows = more loops.

> The two input size estimates are reasonably close to reality, so
> the problem seems to be in the estimate of selectivity of the
> join condition. First off, do you have up-to-date statistics
> for all the columns being joined here? It might be that
> increasing the statistics targets for those columns would help.

I've already upped the stats target to 1000, reindexed, vacuumed, analysed etc., but nothing has basically changed.

The issue here is mainly that for each id there are between 2 and 8 hid values, e.g.:

table d    seq: 1234567 / code: CED89
table trh  seq: 123456    hid: 0/1/2/3/4/5/6/7

and the problem is also compounded by the different ttypes available, which is what forces the use of the subqueries. At the end of the day, this is the output desired:

ID      | HID
===============
1234567 | 0
1234567 | 1
1234567 | 2
1234567 | 3
1234567 | 4
1234567 | 5
1234567 | 6
1234567 | 7

The d table has the unique id whereas the other tables hold all the subsets. Like a family tree: it starts at 2 (mom/pop), then goes to the children, plus the children's grandchildren (pair 1) and the children's grandchildren (pair 2). d to trh is a one-to-many relationship.

> But what I'm a bit worried about is the idea that the join
> conditions are correlated or even outright redundant; the
> planner will not know that, and will make an unrealistic
> estimate of their combined selectivity. If that's the
> case, you might need to redesign the table schema to
> eliminate the redundancy before you'll get good plans.

I'm not sure I understand (actually, I don't) the above comment. I've already converted them from subqueries to actual joins (collapsed it) and still no dice.

BTW, this same schema runs fine on SQL Server (which is what I'm pulling data from and pumping into PG).

I'm downgrading to 8.1.9 to see if it helps too.

Appreciate any pointers at all.
On Wed, 2007-09-12 at 10:15 +0800, El-Lotso wrote:
> I'm downgrading to 8.1.9 to see if it helps too.

Nope, doesn't help at all. The ratio of estimated to actual rows at the nested loop and hash joins is still about 1 to 500. This plan is slightly different in that PG is choosing seq scans:

Nested Loop Left Join  (cost=2604.28..4135.15 rows=1 width=59) (actual time=249.973..15778.157 rows=528 loops=1)
  Join Filter: ((("inner".id)::text = ("outer".id)::text) AND ("inner".hid = "outer".hid) AND ("inner".seq_time = "outer".seq_time) AND ("inner".seq_date = "outer".seq_date))
  ->  Nested Loop Left Join  (cost=1400.08..2766.23 rows=1 width=67) (actual time=168.375..8002.573 rows=528 loops=1)
        Join Filter: ((("inner".id)::text = ("outer".id)::text) AND ("inner".hid = "outer".hid) AND ("inner".seq_time = "outer".seq_time) AND ("inner".seq_date = "outer".seq_date))
        ->  Hash Join  (cost=127.25..1328.68 rows=1 width=59) (actual time=74.195..84.855 rows=528 loops=1)
              Hash Cond: ((("outer".id)::text = ("inner".id)::text) AND ("outer".ttype = "inner".ttype) AND ("outer".start_timestamp = "inner".start_timestamp))
              ->  Seq Scan on trh  (cost=0.00..1060.18 rows=9416 width=36) (actual time=0.022..53.830 rows=9416 loops=1)
                    Filter: ((ttype = 35) OR (ttype = 75) OR (ttype = 703) OR (ttype = 740) OR (ttype = 764))
              ->  Hash  (cost=125.53..125.53 rows=230 width=63) (actual time=12.487..12.487 rows=192 loops=1)
                    ->  Hash Join  (cost=18.69..125.53 rows=230 width=63) (actual time=11.043..12.007 rows=192 loops=1)
                          Hash Cond: (("outer".id)::text = ("inner".id)::text)
                          ->  Seq Scan on ts  (cost=0.00..87.36 rows=3436 width=40) (actual time=0.003..5.436 rows=3436 loops=1)
                          ->  Hash  (cost=18.57..18.57 rows=48 width=23) (actual time=0.876..0.876 rows=48 loops=1)
                                ->  Seq Scan on d  (cost=0.00..18.57 rows=48 width=23) (actual time=0.019..0.771 rows=48 loops=1)
                                      Filter: ((record_update_date_time >= '2007-08-20 00:00:00'::timestamp without time zone) AND (record_update_date_time <= '2007-09-08 00:00:00'::timestamp without time zone) AND ((code)::text = 'HUA75'::text))
        ->  Hash Join  (cost=1272.83..1437.52 rows=1 width=61) (actual time=11.784..14.216 rows=504 loops=528)
              Hash Cond: ((("outer".id)::text = ("inner".id)::text) AND ("outer".ttype = "inner".ttype) AND ("outer".start_timestamp = "inner".start_timestamp))
              ->  Seq Scan on ts  (cost=0.00..87.36 rows=3436 width=40) (actual time=0.003..5.744 rows=3436 loops=528)
              ->  Hash  (cost=1268.29..1268.29 rows=606 width=59) (actual time=82.783..82.783 rows=504 loops=1)
                    ->  Hash Join  (cost=18.69..1268.29 rows=606 width=59) (actual time=76.454..81.515 rows=504 loops=1)
                          Hash Cond: (("outer".id)::text = ("inner".id)::text)
                          ->  Seq Scan on trh  (cost=0.00..1198.22 rows=9064 width=36) (actual time=0.051..66.555 rows=9064 loops=1)
                                Filter: ((ttype = 69) OR (ttype = 178) OR (ttype = 198) OR (ttype = 704) OR (ttype = 757) OR (ttype = 741) OR (ttype = 765))
                          ->  Hash  (cost=18.57..18.57 rows=48 width=23) (actual time=0.863..0.863 rows=48 loops=1)
                                ->  Seq Scan on d  (cost=0.00..18.57 rows=48 width=23) (actual time=0.019..0.761 rows=48 loops=1)
                                      Filter: ((record_update_date_time >= '2007-08-20 00:00:00'::timestamp without time zone) AND (record_update_date_time <= '2007-09-08 00:00:00'::timestamp without time zone) AND ((code)::text = 'HUA75'::text))
  ->  Hash Join  (cost=1204.20..1368.89 rows=1 width=61) (actual time=11.498..13.941 rows=504 loops=528)
        Hash Cond: ((("outer".id)::text = ("inner".id)::text) AND ("outer".ttype = "inner".ttype) AND ("outer".start_timestamp = "inner".start_timestamp))
        ->  Seq Scan on ts  (cost=0.00..87.36 rows=3436 width=40) (actual time=0.003..5.593 rows=3436 loops=528)
        ->  Hash  (cost=1199.62..1199.62 rows=610 width=59) (actual time=70.186..70.186 rows=504 loops=1)
              ->  Hash Join  (cost=18.69..1199.62 rows=610 width=59) (actual time=64.270..68.886 rows=504 loops=1)
                    Hash Cond: (("outer".id)::text = ("inner".id)::text)
                    ->  Seq Scan on trh  (cost=0.00..1129.20 rows=9128 width=36) (actual time=0.020..54.050 rows=9128 loops=1)
                          Filter: ((ttype = 177) OR (ttype = 197) OR (ttype = 705) OR (ttype = 742) OR (ttype = 758) OR (ttype = 766))
                    ->  Hash  (cost=18.57..18.57 rows=48 width=23) (actual time=1.100..1.100 rows=48 loops=1)
                          ->  Seq Scan on d  (cost=0.00..18.57 rows=48 width=23) (actual time=0.019..0.994 rows=48 loops=1)
                                Filter: ((record_update_date_time >= '2007-08-20 00:00:00'::timestamp without time zone) AND (record_update_date_time <= '2007-09-08 00:00:00'::timestamp without time zone) AND ((code)::text = 'HUA75'::text))
Total runtime: 15779.769 ms

Am I screwed? Is a schema redesign really a necessity? That would be a real pain given the rewrite of _all_ the queries, and I couldn't maintain compatibility in the front-end app between SQL Server and PG.
On Wed, 2007-09-12 at 10:15 +0800, El-Lotso wrote:
> On Tue, 2007-09-11 at 14:23 -0400, Tom Lane wrote:
> > The single-row-result estimate persuades it to use a nestloop at the
> > next level up, and then when the output is actually 969 rows, that
> > means 969 executions of the other side of the upper join.
>
> Yep, that's consistent with the larger result sets: more rows = more loops.

I'm on the verge of giving up... the schema seems simple and yet there are so many issues with it. Perhaps it's the layout of the data, I don't know. But given the ordering/normalisation of the data and the one-to-many relationship of some tables, this is giving the planner a headache (and me a bulge on the head from knocking it against the wall).

I've tried multiple variations: subqueries, not using subqueries, not joining the table (but including it as a subquery, which gets rewritten to a join anyway), exists/not exists, all to no avail.

PG is fast, yes, even with all the nested loops, for up to 48K of results (within 4 minutes), but as soon as I put it into an inner join / left join / multiple temporary (memory) tables, e.g.

select a.a, b.b, c.c
from (select x, y, z from zz) a
inner join b on a.a = b.a
left join (select x, a, z from xx)

then it will choke. I'm really at my wits' end here.
El-Lotso skrev:

> I'm on the verge of giving up... the schema seems simple and yet there
> are so many issues with it. Perhaps it's the layout of the data, I don't
> know. But given the ordering/normalisation of the data and the one-to-many
> relationship of some tables, this is giving the planner a headache
> (and me a bulge on the head from knocking it against the wall)

I think you should look more at the db design, and less at rewriting the query. Here are some observations:

- Your table structure is quite hard to understand (possibly because you have changed the names). If you want help on improving it, you will need to explain the data to us, and possibly post some sample data.

- You seem to be lacking constraints on the tables. My guess is that (id, ttype, start_timestamp) is unique in both trh and ts, but I cannot tell (and neither can the query planner). Foreign key constraints might help as well. These would also help others to understand your data, and suggest reformulations of your queries.

- Another guess is that the ttype sets (177, 197, 705, 742, 758, 766) and (69, 178, 198, 704, 757, 741, 765) are actually indicating some other property, a common "type" of record, and that only one of each will be present for an (id, start_timestamp) combination. This may be related to the repeating-fields issue, if a certain ttype indicates that we are interested in a certain pber_x field (and possibly that the others are empty).

- You have what look like repeating fields (pber_x, fval_x, index_x) in your tables. Fixing this might not improve your query, but might be a good idea for other reasons.

- seq_date and seq_time seem like they may be redundant. Are they different casts of the same data?

All speculation. Hope it helps,

Nis
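As a sketch of the constraints Nis is guessing at, assuming the uniqueness actually holds and that every id in trh/ts exists in d (constraint names are made up; d would also need a primary key or unique constraint on id for the foreign keys to be accepted):

-- Only valid if (id, ttype, start_timestamp) really is unique in each table.
ALTER TABLE trh ADD CONSTRAINT trh_id_ttype_ts_key UNIQUE (id, ttype, start_timestamp);
ALTER TABLE ts  ADD CONSTRAINT ts_id_ttype_ts_key  UNIQUE (id, ttype, start_timestamp);

-- Tie the detail tables back to d, assuming d(id) is unique.
ALTER TABLE trh ADD CONSTRAINT trh_id_fkey FOREIGN KEY (id) REFERENCES d (id);
ALTER TABLE ts  ADD CONSTRAINT ts_id_fkey  FOREIGN KEY (id) REFERENCES d (id);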
El-Lotso <el.lotso@gmail.com> writes:
> I'm really at my wits' end here.

Try to merge the multiple join keys into one, somehow. I'm not sure why the planner is overestimating the selectivity of the combined join conditions, but that's basically where your problem is coming from.

A truly brute-force solution would be "set enable_nestloop = off", but this is likely to screw performance for other queries.

			regards, tom lane
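One way to limit the blast radius of that brute-force switch is to flip it only around the problem query, for example (a sketch; the SELECT is a placeholder for the real query):

BEGIN;
SET LOCAL enable_nestloop = off;  -- reverts automatically at end of transaction
-- run the problem query here, e.g.:
SELECT 1;  -- placeholder for the real report query
COMMIT;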
On Wed, 2007-09-12 at 10:41 -0400, Tom Lane wrote:
> El-Lotso <el.lotso@gmail.com> writes:
> > I'm really at my wits' end here.
>
> Try to merge the multiple join keys into one, somehow. I'm not sure why
> the planner is overestimating the selectivity of the combined join
> conditions, but that's basically where your problem is coming from.

I've tried merging them together. What previously was

INNER JOIN TS ON TS.ID = TRH.ID
             AND TS.TTYPE = TRH.TTYPE
             AND TS.START_TIMESTAMP = TRH.START_TIMESTAMP

has become

INNER JOIN TS ON ts.id_ttype_startstamp = trh.id_ttype_startstamp

where id_ttype_startstamp = (id || '-' || ttype || '-' || start_timestamp).

It's working somewhat better, but everything is not as rosy as it should be, as the planner is still over/under-estimating the number of rows.

Original:
Nested Loop Left Join  (cost=10612.48..24857.20 rows=1 width=61) (actual time=1177.626..462856.007 rows=750 loops=1)

With merged join conditions:
Hash Join  (cost=41823.94..45889.49 rows=6101 width=61) (actual time=3019.609..3037.692 rows=750 loops=1)
  Hash Cond: (trd.trd_join_key = ts.ts_join_key)

Merged join using the main table (3 - 5 million rows):
Hash Left Join  (cost=80846.38..121112.36 rows=25 width=244) (actual time=5088.437..5457.269 rows=750 loops=1)

Note that it still doesn't really help that much: the estimated rows are still way off the actual number of rows. On one of the queries, where the hid field has a subset of 8 values, it's even worse, and it seems like the merged condition doesn't help at all. I'm still trying to merge more join conditions to see if it helps.

> A truly brute-force solution would be "set enable_nestloop = off"
> but this is likely to screw performance for other queries.

I've also tried this. It's not helping much, actually. As mentioned previously, this is a one-to-many relationship, and somehow PG just doesn't take that into account.

I'm still not having much luck here. (Playing with a subset of the main table's data _does_ show some promise, but when querying the main table with 3 million rows, everything grinds to a halt.)
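For completeness, one way the merged key described above could be materialised and indexed (a sketch only; the column and index names are made up, and an expression index over the concatenation would be an alternative to a stored column):

-- Store the concatenated key once, index it, and join on the single column
-- instead of three separate conditions.
ALTER TABLE ts  ADD COLUMN id_ttype_startstamp text;
ALTER TABLE trh ADD COLUMN id_ttype_startstamp text;

UPDATE ts  SET id_ttype_startstamp = id || '-' || ttype || '-' || start_timestamp;
UPDATE trh SET id_ttype_startstamp = id || '-' || ttype || '-' || start_timestamp;

CREATE INDEX ts_merged_key_idx  ON ts  (id_ttype_startstamp);
CREATE INDEX trh_merged_key_idx ON trh (id_ttype_startstamp);
ANALYZE ts;
ANALYZE trh;

-- The join then collapses to a single equality condition:
-- ... INNER JOIN ts ON ts.id_ttype_startstamp = trh.id_ttype_startstamp ...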
On Wed, 2007-09-12 at 15:14 +0200, Nis Jørgensen wrote:
> I think you should look more at the db design, and less at rewriting the
> query. Here are some observations:

I can't help much with the design per se. So...

> - Your table structure is quite hard to understand (possibly because you
>   have changed the names). If you want help on improving it, you will
>   need to explain the data to us, and possibly post some sample data.

If anyone is willing, I can send some sample data to you off-list.

In the trh table, hid is a subset of data for a particular id, e.g.:

PARENT : CHILD 1
PARENT : CHILD 2
PARENT : CHILD 3
PARENT : CHILD 4

uniqueid = merged fields from id / index1 / index2 / start_timestamp (in epoch) / phase_id / ttype, which is unique within each table (but not across ALL the tables).

> - You seem to be lacking constraints on the tables. My guess is that
>   (id, ttype, start_timestamp) is unique in both trh and ts, but I cannot
>   tell (and neither can the query planner). Foreign key constraints might
>   help as well.

AFAICT, there are no foreign key constraints in the original DB design (and I'm not even sure how to begin the FK design based on this original design). The unique_id is as above:

TRH/TRD uniqueid = merged fields from id / index1 / index2 / start_timestamp (in epoch) / phase_id / ttype
TS uniqueid      = merged fields from id / start_timestamp (in epoch) / ttype

The problem with this is that the set of fields over which they are unique differs across the tables, so the unique_id is only unique within that one table and acts as a primary key so that no dupes exist in that one table.

> - Another guess is that the ttype sets (177, 197, 705, 742, 758, 766) and
>   (69, 178, 198, 704, 757, 741, 765) are actually indicating some other
>   property, a common "type" of record, and that only one of each will be
>   present for an (id, start_timestamp) combination. This may be related to
>   the repeating-fields issue, if a certain ttype indicates that we are
>   interested in a certain pber_x field (and possibly that the others are
>   empty).

Yes, e.g.:

id     | hid | ttype | start_timestamp  | pber_2 | pber_3 | pber_4
PARENT | 0   | 764   | 2007-07-01 00:00 | 4000   | null   | null
PARENT | 0   | 765   | 2007-07-01 00:00 | null   | 9000   | null
PARENT | 0   | 766   | 2007-07-01 00:00 | null   | null   | 7999
PARENT | 1   | 764   | 2007-07-01 00:00 | 4550   | null   | null
PARENT | 1   | 765   | 2007-07-01 00:00 | null   | 9220   | null
PARENT | 1   | 766   | 2007-07-01 00:00 | null   | null   | 6669

The subqueries are just there to pick out the fields with values and drop the nulls, so that we end up with:

id     | hid | start_timestamp  | pber_2 | pber_3 | pber_4
PARENT | 0   | 2007-07-01 00:00 | 4000   | 9000   | 7999
PARENT | 1   | 2007-07-01 00:00 | 4550   | 9220   | 6669

which is basically just joining a table to itself. There is a caveat, though: pber_3 and pber_4 can only be joined together based on the seq_date/seq_time in the ts table, hence the query

JOIN1.id = join2.id and join1.seq_date = join2.seq_date etc.,
but the problem is confounded by the fact that there are numerous hid values for each id.

> - You have what look like repeating fields (pber_x, fval_x, index_x) in
>   your tables. Fixing this might not improve your query, but might be a
>   good idea for other reasons.

It's being looked at by another team, to collapse this into something like:

ttype | pber
764   | 500
765   | 600
766   | 700

so that there are fewer columns and no null fields. But the query will remain the same.

> - seq_date and seq_time seem like they may be redundant. Are they
>   different casts of the same data?

No. They're used to join the pber_2/3/4 fields together, as the events may happen anywhere from a few hours to days apart, but each will be uniquely identified by the seq_date/seq_time, e.g.:

id     | pber_2 | seq_date            | seq_time
PARENT | 400    | 2007-07-01 00:00:00 | 1980-01-01 20:00:00
PARENT | 410    | 2007-07-10 00:00:00 | 1980-01-01 22:00:00

id     | pber_3 | seq_date            | seq_time
PARENT | 900    | 2007-07-01 00:00:00 | 1980-01-01 20:00:00
PARENT | 100    | 2007-07-10 00:00:00 | 1980-01-01 22:00:00

id     | pber_4 | seq_date            | seq_time
PARENT | 10000  | 2007-07-01 00:00:00 | 1980-01-01 20:00:00
PARENT | 999    | 2007-07-10 00:00:00 | 1980-01-01 22:00:00

So the correct values for the fields when joined together will be of the form:

id     | start_timestamp  | seq_date            | seq_time            | pber_2 | pber_3 | pber_4
PARENT | 2007-07-01 00:00 | 2007-07-01 00:00:00 | 1980-01-01 20:00:00 | 400    | 900    | 10000
PARENT | 2007-07-01 00:00 | 2007-07-10 00:00:00 | 1980-01-01 22:00:00 | 410    | 100    | 999

(repeating for each hid subset value)

> All speculation. Hope it helps

Anything would help. I'm more or less willing to try anything to make things faster, else this project is going to the toilet.
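As an aside on the row-collapsing being described here, the same pivot can be written with conditional aggregation rather than a self-join. A sketch only, not the poster's actual query: it assumes the pber_x columns live in trh, that seq_date/seq_time come from ts via the usual three-column join, and that each (id, hid, seq_date, seq_time) group carries at most one non-null value per pber_x column:

-- Collapse the per-ttype rows into one row per group.
SELECT trh.id,
       trh.hid,
       ts.seq_date,
       ts.seq_time,
       max(trh.pber_2) AS pber_2,
       max(trh.pber_3) AS pber_3,
       max(trh.pber_4) AS pber_4
FROM   trh
JOIN   ts ON ts.id = trh.id
         AND ts.ttype = trh.ttype
         AND ts.start_timestamp = trh.start_timestamp
GROUP  BY trh.id, trh.hid, ts.seq_date, ts.seq_time;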