Thread: another query optimization question
Hi, its me again. As far as we tested postgresql ist fast, very fast compared to the other db system we test and are using currently. We are now testing some test databases on Postgres. We use one function which simply calculates a difference between two values and checks if on value is 0, so something like this: declare diff integer; begin if $1 > $2 then diff := $1 -$2; return diff * diff; else return 0; end if; end; Language for this function is plpgsql executing a select like this: select sum(job_property_difference(t0.int_value, t1.int_value)) as rank from job_property t0, job_property t1 where t0.id_job_profile = 911 and t0.id_job_attribute = t1.id_job_attribute and t1.id_job_profile in (select id_job_profile from unemployed) and t1.id_job_profile <> 911; results in a query plan result: QUERY PLAN ------------------------------------------------------------------------ ----------------------------------------------------------- Aggregate (cost=70521.28..70521.28 rows=1 width=8) -> Merge Join (cost=66272.11..70158.29 rows=145194 width=8) Merge Cond: ("outer".id_job_attribute = "inner".id_job_attribute) -> Sort (cost=31.53..32.44 rows=366 width=8) Sort Key: t0.id_job_attribute -> Index Scan using job_property__id_job_profile__fk_index on job_property t0 (cost=0.00..15.95 rows=366 width=8) Index Cond: (id_job_profile = 911) -> Sort (cost=66240.58..67456.79 rows=486483 width=8) Sort Key: t1.id_job_attribute -> Hash IN Join (cost=34.08..20287.32 rows=486483 width=8) Hash Cond: ("outer".id_job_profile = "inner".id_job_profile) -> Seq Scan on job_property t1 (cost=0.00..12597.89 rows=558106 width=12) Filter: (id_job_profile <> 911) -> Hash (cost=31.46..31.46 rows=1046 width=4) -> Seq Scan on unemployed (cost=0.00..31.46 rows=1046 width=4) (21 rows) This takes about 1minute, 45 seconds on a test database with about 31.882 job_profile and 8.483.005 job_property records. The final solution will have about 1.000.000 job_profile records and, well ... about 266.074.901 so we wonder what options we have in order to improve this select. Should we rewrite the function (and others) in C? Turning off seqscans makes it slower which might be because psql is hopping between the index and the row values back and forth as a lot of rows are involved. Any hint how to speed up this would be great. regards David
On Jan 30, 2004, at 11:00 AM, David Teran wrote: > > executing a select like this: > > select > sum(job_property_difference(t0.int_value, t1.int_value)) as rank > from > job_property t0, > job_property t1 > where > t0.id_job_profile = 911 > and t0.id_job_attribute = t1.id_job_attribute > and t1.id_job_profile in (select id_job_profile from unemployed) > and t1.id_job_profile <> 911; > > results in a query plan result: > What's the goal here? Are you trying to rank by how the int_value relates to each other? I'd like to know more about what kind of result you're trying to achieve. -- PC Drew
On Fri, 30 Jan 2004, David Teran wrote: > select > sum(job_property_difference(t0.int_value, t1.int_value)) as rank > from > job_property t0, > job_property t1 > where > t0.id_job_profile = 911 > and t0.id_job_attribute = t1.id_job_attribute > and t1.id_job_profile in (select id_job_profile from unemployed) > and t1.id_job_profile <> 911; > > results in a query plan result: Can we see explain analyze output for the query, it'll give more information about actual time and row counts than plain explain.
Hi, On 30.01.2004, at 19:10, Stephan Szabo wrote: > > On Fri, 30 Jan 2004, David Teran wrote: > >> select >> sum(job_property_difference(t0.int_value, t1.int_value)) as rank >> from >> job_property t0, >> job_property t1 >> where >> t0.id_job_profile = 911 >> and t0.id_job_attribute = t1.id_job_attribute >> and t1.id_job_profile in (select id_job_profile from unemployed) >> and t1.id_job_profile <> 911; >> >> results in a query plan result: > > Can we see explain analyze output for the query, it'll give more > information about actual time and row counts than plain explain. > sure, here it is comes. What we need to achieve is: we have different job_profiles, each profile has multiple values. For one given profile we need the ' sum of the distance of every value in the given profile and every other profile'. The result is usually grouped by the profile id but to make the query easier i removed this, it does not cost too much time and it turned out that this query here uses most of the time. thanks, David QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ -------------------------------- Aggregate (cost=2689349.81..2689349.81 rows=1 width=8) (actual time=100487.423..100487.423 rows=1 loops=1) -> Merge Join (cost=2451266.53..2655338.83 rows=13604393 width=8) (actual time=82899.466..-2371037.726 rows=2091599 loops=1) Merge Cond: ("outer".id_job_attribute = "inner".id_job_attribute) -> Sort (cost=97.43..100.63 rows=1281 width=8) (actual time=3.937..4.031 rows=163 loops=1) Sort Key: t0.id_job_attribute -> Index Scan using job_property__id_job_profile__fk_index on job_property t0 (cost=0.00..31.31 rows=1281 width=8) (actual time=1.343..3.766 rows=163 loops=1) Index Cond: (id_job_profile = 911) -> Sort (cost=2451169.10..2483246.47 rows=12830947 width=8) (actual time=82891.076..-529619.213 rows=4187378 loops=1) Sort Key: t1.id_job_attribute -> Hash IN Join (cost=507.32..439065.37 rows=12830947 width=8) (actual time=61.943..1874640.807 rows=4187378 loops=1) Hash Cond: ("outer".id_job_profile = "inner".id_job_profile) -> Seq Scan on job_property t1 (cost=0.00..246093.84 rows=12830947 width=12) (actual time=0.136..19101.796 rows=8482533 loops=1) Filter: (id_job_profile <> 911) -> Hash (cost=467.46..467.46 rows=15946 width=4) (actual time=61.313..61.313 rows=0 loops=1) -> Seq Scan on unemployed (cost=0.00..467.46 rows=15946 width=4) (actual time=0.157..50.842 rows=15960 loops=1) Total runtime: 103769.592 ms
David Teran <david.teran@cluster9.com> writes: > [ query plan ] I got a little distracted by the bizarre actual-time values shown for some of the query steps: > -> Merge Join (cost=2451266.53..2655338.83 rows=13604393 width=8) > (actual time=82899.466..-2371037.726 rows=2091599 loops=1) > -> Sort (cost=2451169.10..2483246.47 rows=12830947 width=8) > (actual time=82891.076..-529619.213 rows=4187378 loops=1) > -> Hash IN Join (cost=507.32..439065.37 rows=12830947 > width=8) (actual time=61.943..1874640.807 rows=4187378 loops=1) The hash-join total time is obviously wrong seeing that the total runtime is only about 100000 msec, and the negative values for the other two are even more obviously wrong. I recall that we saw similar symptoms once before, and I thought we'd fixed it, but I didn't find any relevant mentions in the CVS logs. What version are you running, exactly? Could you repeat the EXPLAIN with client_min_messages set to 'debug2', and see if you see any messages about InstrStartTimer or InstrStopNode? regards, tom lane
On Friday 30 January 2004 19:19, Tom Lane wrote: > > The hash-join total time is obviously wrong seeing that the total > runtime is only about 100000 msec, and the negative values for the other > two are even more obviously wrong. > > I recall that we saw similar symptoms once before, and I thought we'd > fixed it, but I didn't find any relevant mentions in the CVS logs. You're not thinking of the -ve durations on the auto-vacuum gizmo? Something to do with system clock accuracy IIRC. -- Richard Huxton Archonet Ltd
HI Tom. > I got a little distracted by the bizarre actual-time values shown for > some of the query steps: > >> -> Merge Join (cost=2451266.53..2655338.83 rows=13604393 >> width=8) >> (actual time=82899.466..-2371037.726 rows=2091599 loops=1) > >> -> Sort (cost=2451169.10..2483246.47 rows=12830947 >> width=8) >> (actual time=82891.076..-529619.213 rows=4187378 loops=1) > >> -> Hash IN Join (cost=507.32..439065.37 >> rows=12830947 >> width=8) (actual time=61.943..1874640.807 rows=4187378 loops=1) > > The hash-join total time is obviously wrong seeing that the total > runtime is only about 100000 msec, and the negative values for the > other > two are even more obviously wrong. > > I recall that we saw similar symptoms once before, and I thought we'd > fixed it, but I didn't find any relevant mentions in the CVS logs. > > What version are you running, exactly? Could you repeat the EXPLAIN > with client_min_messages set to 'debug2', and see if you see any > messages about InstrStartTimer or InstrStopNode? > 7.4.1, build from sourcecode. Running on MacOS X Server 10.3.2, dual G5 with 3.5 GB RAM I have set client_min_messages in postgresql.conf to debug2 but i see nothing. Is this going to the normal logfile? Must i activate anything else? regards David
David Teran <david.teran@cluster9.com> writes: >> I recall that we saw similar symptoms once before, and I thought we'd >> fixed it, but I didn't find any relevant mentions in the CVS logs. >> >> What version are you running, exactly? > 7.4.1, build from sourcecode. Running on MacOS X Server 10.3.2, dual G5 > with 3.5 GB RAM Interesting. I have recollected where we saw this before: http://archives.postgresql.org/pgsql-hackers/2003-11/msg01528.php Apparently gettimeofday() has a glitch on some BSD releases. OS X is a BSD derivative and it's not so surprising if it has it too. May I suggest that you try the test program given here: http://archives.postgresql.org/pgsql-hackers/2003-11/msg01546.php and file a bug report with Apple if it shows any out-of-order results? I am fairly certain that I tried that test program when I devised it on my own OS X machine, and couldn't get a failure. Maybe it depends on your hardware (hm, could dual CPUs be the key factor)? regards, tom lane
Hi Tim, you are right: > Interesting. I have recollected where we saw this before: > http://archives.postgresql.org/pgsql-hackers/2003-11/msg01528.php > Apparently gettimeofday() has a glitch on some BSD releases. OS X is > a BSD derivative and it's not so surprising if it has it too. > > May I suggest that you try the test program given here: > http://archives.postgresql.org/pgsql-hackers/2003-11/msg01546.php > and file a bug report with Apple if it shows any out-of-order results? > > I am fairly certain that I tried that test program when I devised it > on my own OS X machine, and couldn't get a failure. Maybe it depends > on your hardware (hm, could dual CPUs be the key factor)? p:~ david$ ./a.out bogus tv_usec: 1075544305 -615731632, prev 1075544304 349672 out of order tv_sec: 1075544304 349759, prev 1075544305 -615731632 out of order tv_usec: 1075544305 339904, prev 1075544305 339905 bogus tv_usec: 1075544307 -615731811, prev 1075544306 349493 out of order tv_sec: 1075544306 349498, prev 1075544307 -615731811 out of order tv_usec: 1075544307 339442, prev 1075544307 339443 out of order tv_usec: 1075544308 339351, prev 1075544308 339352 This is a part of the output. Whats -very- interesting: Apple provides a little tool that can enable / disable the l2 cache ... one CPU of a dual CPU system on the fly. When i start the testapp with two CPU's enabled i get this output here, when i turn off one CPU while the app is still running the messages disappear as long as one CPU is turned off. Reactivating the CPU again produces new error messages. I checked the app on a single G4, no errors and i checked the app on a dual G4, -not- G5 and also no error messages. Do you remember where one can find a patch? Maybe its something one can fix because parts of the OS from Apple are 'open source'. Do you know if this bug makes a system unusable with PostgresSQL? Regards David
David Teran <david.teran@cluster9.com> writes: > Apple provides a little tool that can enable / disable the l2 cache ... > one CPU of a dual CPU system on the fly. When i start the testapp with > two CPU's enabled i get this output here, when i turn off one CPU while > the app is still running the messages disappear as long as one CPU is > turned off. Reactivating the CPU again produces new error messages. Ah-hah, so the gettimeofday bug *is* linked to multiple CPUs. Marc, were the machines you saw it on all multi-CPU? > Do you remember where one can find a patch? File a bug report with Apple. > Do you know if this bug makes a system unusable with PostgresSQL? Not at all. The only bad effect we've noted is that EXPLAIN results are sometimes wacky. In theory you could sometimes see the result of now() being off by a second or two, but no one's reported seeing that. regards, tom lane
On Sat, 31 Jan 2004, Tom Lane wrote: > David Teran <david.teran@cluster9.com> writes: > > Apple provides a little tool that can enable / disable the l2 cache ... > > one CPU of a dual CPU system on the fly. When i start the testapp with > > two CPU's enabled i get this output here, when i turn off one CPU while > > the app is still running the messages disappear as long as one CPU is > > turned off. Reactivating the CPU again produces new error messages. > > Ah-hah, so the gettimeofday bug *is* linked to multiple CPUs. Marc, > were the machines you saw it on all multi-CPU? I'm not sure ... I thought I ran it on my P4 here in the office and saw it too, albeit not near as frequently ... but, in FreeBSD's case, it is a "design issue" ... there are two different functions, once that is kinda fuzzy (but fast), and the other that is designed to be exact, but at a performance loss ... or was it the same function, but a 'sysctl' variable that changes the state? Can't remember which, but it is by design on FreeBSD ... and, if we're talking about Apple, the same most likely applies, as its based on the same kernel ... Back of my mind, I *think* it was these sysctl variables: kern.timecounter.method: 0 kern.timecounter.hardware: i8254 ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Hi, > I'm not sure ... I thought I ran it on my P4 here in the office and > saw it > too, albeit not near as frequently ... but, in FreeBSD's case, it is a > "design issue" ... there are two different functions, once that is > kinda > fuzzy (but fast), and the other that is designed to be exact, but at a > performance loss ... or was it the same function, but a 'sysctl' > variable > that changes the state? > > Can't remember which, but it is by design on FreeBSD ... and, if we're > talking about Apple, the same most likely applies, as its based on the > same kernel ... > > Back of my mind, I *think* it was these sysctl variables: > > kern.timecounter.method: 0 > kern.timecounter.hardware: i8254 > I will try to check this on my system. But here another hint, maybe more interesting for Apple though: The bug does -not- occur if another process uses a lot of CPU time. We encoded a quicktime movie into mpeg2 and while this was using about 90% and while encoding the vcd i wanted to show the bug to a friend and it did not work. But besides this, is there any chance that we can optimize our initial performance problem ;-) regards David