Thread: Major performance problem after upgrade from 8.3 to 8.4
Hello,

I just upgraded with pg_dump/restore from PostgreSQL 8.3.11 to 8.4.4 but I'm
having major performance problems with a query with many left joins. Problem
is that costs are now very, very, very high (was ok in 8.3). Analyze has
been done. Indexes are of course there.

->  Merge Left Join
    (cost=1750660.22..4273805884876845789194861338991916289697885665127154313046252183850255795798561612107149662486528.00
    rows=238233578115856634454073334945297075430094545596765511255148896328828230572227215727052643001958400
    width=16)
      Merge Cond: (l.id = d2000903.fk_id)

Details with execution plan can be found at:
http://www.wiesinger.com/tmp/pg_perf_84.txt

I know that the data model is key/value pairs but it worked well in 8.3.
I need this flexibility.

Any ideas?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
On Mon, Aug 30, 2010 at 12:20 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
> Hello,
>
> I just upgraded with pg_dump/restore from PostgreSQL 8.3.11 to 8.4.4 but I'm
> having major performance problems with a query with many left joins. Problem
> is that costs are now very, very, very high (was ok in 8.3). Analyze has
> been done. Indexes are of course there.
>
> ->  Merge Left Join
> (cost=1750660.22..4273805884876845789194861338991916289697885665127154313046252183850255795798561612107149662486528.00
> rows=238233578115856634454073334945297075430094545596765511255148896328828230572227215727052643001958400
> width=16)
>       Merge Cond: (l.id = d2000903.fk_id)

Wow!  Other than an incredibly high cost AND row estimate, was the
query plan the same on 8.3 or different?

> Details with execution plan can be found at:
> http://www.wiesinger.com/tmp/pg_perf_84.txt

What's up with the "(actual time=.. rows= loops=)" in the explain analyze?

> I know that the data model is key/value pairs but it worked well in 8.3.
> I need this flexibility.
>
> Any ideas?

Not really.  I would like an explain analyze from both 8.3 and 8.4.
Are they tuned the same, things like work mem and default stats target?
Gerhard Wiesinger <lists@wiesinger.com> wrote:

> I know that the data model is key/value pairs but it worked well in 8.3.
> I need this flexibility.
>
> Any ideas?

If I understand the query correctly, it's a pivot table, right?

If yes, and if I were you, I would try to rewrite this query to
something like:

select
  timestamp,
  sum(case when keyid = 1 then value else 0 end) as Raumsolltemperatur,
  ...
from
  log
group by
  timestamp;

Assuming you can read a German text:
http://www.pg-forum.de/h-ufig-gestellte-fragen-faq/4067-faq-zeilen-zu-spalten.html

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
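For reference, applied to the schema that is posted in full later in this thread (log, log_details, key_description), the suggested pivot would look roughly like this. This is a sketch only; it resolves key names through key_description instead of hard-coded keyids, and the column list is shortened:

  -- Sketch: pivot the key/value rows into columns; table and column names
  -- are taken from the schema posted further down in this thread.
  SELECT l.datetime,
         sum(CASE WHEN kd.description = 'Raumsolltemperatur'
                  THEN d.value END) AS Raumsolltemperatur,
         sum(CASE WHEN kd.description = 'Raumtemperatur'
                  THEN d.value END) AS Raumtemperatur
  FROM log l
  JOIN log_details d      ON d.fk_id = l.id
  JOIN key_description kd ON kd.keyid = d.fk_keyid
  GROUP BY l.datetime;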
On Mon, 30 Aug 2010, Scott Marlowe wrote:

> On Mon, Aug 30, 2010 at 12:20 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
>> Hello,
>>
>> I just upgraded with pg_dump/restore from PostgreSQL 8.3.11 to 8.4.4 but I'm
>> having major performance problems with a query with many left joins. Problem
>> is that costs are now very, very, very high (was ok in 8.3). Analyze has
>> been done. Indexes are of course there.
>>
>> ->  Merge Left Join
>> (cost=1750660.22..4273805884876845789194861338991916289697885665127154313046252183850255795798561612107149662486528.00
>> rows=238233578115856634454073334945297075430094545596765511255148896328828230572227215727052643001958400
>> width=16)
>>       Merge Cond: (l.id = d2000903.fk_id)
>
> Wow!  Other than an incredibly high cost AND row estimate, was the
> query plan the same on 8.3 or different?
>
>> Details with execution plan can be found at:
>> http://www.wiesinger.com/tmp/pg_perf_84.txt
>
> What's up with the "(actual time=.. rows= loops=)" in the explain
> analyze?

What do you mean exactly? Missing?
I did it not with psql but with a GUI program.

>> I know that the data model is key/value pairs but it worked well in 8.3.
>> I need this flexibility.
>>
>> Any ideas?
>
> Not really.  I would like an explain analyze from both 8.3 and 8.4.
> Are they tuned the same, things like work mem and default stats target?

I don't have an 8.3 version running anymore. But I have an OLD version of
a nearly identical query plan (the sort was missing due to performance issues
and is done now in a view; maybe some more JOINs were added too, but all
that has no impact on the basic principle of the query plan):
http://www.wiesinger.com/tmp/pg_perf.txt

Tuning: Yes, on the same machine with the same parameters (manual diff of the
old config, and the parameters added back manually).

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
On Mon, Aug 30, 2010 at 1:25 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
> On Mon, 30 Aug 2010, Scott Marlowe wrote:
>
>> On Mon, Aug 30, 2010 at 12:20 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
>>>
>>> Hello,
>>>
>>> I just upgraded with pg_dump/restore from PostgreSQL 8.3.11 to 8.4.4 but I'm
>>> having major performance problems with a query with many left joins. Problem
>>> is that costs are now very, very, very high (was ok in 8.3). Analyze has
>>> been done. Indexes are of course there.
>>>
>>> ->  Merge Left Join
>>> (cost=1750660.22..4273805884876845789194861338991916289697885665127154313046252183850255795798561612107149662486528.00
>>> rows=238233578115856634454073334945297075430094545596765511255148896328828230572227215727052643001958400
>>> width=16)
>>>       Merge Cond: (l.id = d2000903.fk_id)
>>
>> Wow!  Other than an incredibly high cost AND row estimate, was the
>> query plan the same on 8.3 or different?
>>
>>> Details with execution plan can be found at:
>>> http://www.wiesinger.com/tmp/pg_perf_84.txt
>>
>> What's up with the "(actual time=.. rows= loops=)" in the explain
>> analyze?
>
> What do you mean exactly? Missing?
> I did it not with psql but with a GUI program.

Never mind, that was an artifact at http://explain.depesz.com/s/KyU, not
your fault.  Sorry.

>>> I know that the data model is key/value pairs but it worked well in 8.3.
>>> I need this flexibility.
>>>
>>> Any ideas?
>>
>> Not really.  I would like an explain analyze from both 8.3 and 8.4.
>> Are they tuned the same, things like work mem and default stats target?
>
> I don't have an 8.3 version running anymore. But I have an OLD version of
> a nearly identical query plan (the sort was missing due to performance issues
> and is done now in a view; maybe some more JOINs were added too, but all
> that has no impact on the basic principle of the query plan):
> http://www.wiesinger.com/tmp/pg_perf.txt
>
> Tuning: Yes, on the same machine with the same parameters (manual diff of the
> old config, and the parameters added back manually).

How long does the query take to run in 8.4?  Do you have an explain
analyze of that?  I'm still thinking that some change in the query
planner might be seeing all those left joins and coming up with some
non-linear value for row estimation.  What's default stats target set
to in that db?

--
To understand recursion, one must first understand recursion.
Hello

2010/8/30 Andreas Kretschmer <akretschmer@spamfence.net>:
> Gerhard Wiesinger <lists@wiesinger.com> wrote:
>
>> I know that the data model is key/value pairs but it worked well in 8.3.
>> I need this flexibility.
>>
>> Any ideas?
>
> If I understand the query correctly, it's a pivot table, right?

no - it's just an EAV table on very large data :(

Regards

Pavel Stehule

> If yes, and if I were you, I would try to rewrite this query to
> something like:
>
> select
>   timestamp,
>   sum(case when keyid = 1 then value else 0 end) as Raumsolltemperatur,
>   ...
> from
>   log
> group by
>   timestamp;
>
> Assuming you can read a German text:
> http://www.pg-forum.de/h-ufig-gestellte-fragen-faq/4067-faq-zeilen-zu-spalten.html
>
> Andreas
> --
> Really, I'm not out to destroy Microsoft. That will just be a completely
> unintentional side effect. (Linus Torvalds)
> "If I was god, I would recompile penguin with --enable-fly." (unknown)
> Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
On Mon, 30 Aug 2010, Andreas Kretschmer wrote:

> Gerhard Wiesinger <lists@wiesinger.com> wrote:
>
>> I know that the data model is key/value pairs but it worked well in 8.3.
>> I need this flexibility.
>>
>> Any ideas?
>
> If I understand the query correctly, it's a pivot table, right?

The view flattens the key/value structure (for extensible technical
measurement values) into a flat structure with many columns. The point of
the view is that I get a flat structure and then easy-to-write queries for
aggregation. The query itself is just an aggregating query over that flat
structure.

Any ideas for better optimizations?

> If yes, and if I were you, I would try to rewrite this query to
> something like:
>
> select
>   timestamp,
>   sum(case when keyid = 1 then value else 0 end) as Raumsolltemperatur,
>   ...
> from
>   log
> group by
>   timestamp;

I will try that. But what I don't understand: the query was really fast in
8.3, even with a 24-hour timeframe.

> Assuming you can read a German text:
> http://www.pg-forum.de/h-ufig-gestellte-fragen-faq/4067-faq-zeilen-zu-spalten.html

Hopefully yes :-) after that Fedora/PostgreSQL nightmare update night (the
parallel Fedora upgrade stalled the whole machine, since that query from
Nagios was executed, I guess, hundreds of times once it got very slow in
8.4). The machine was pingable, but nothing else worked, even on the
console :-( RAID rebuild and all the other stuff :-( I had planned the
upgrade, but I didn't expect problems with query plan instability :-(

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
On Mon, 30 Aug 2010, Scott Marlowe wrote:

> On Mon, Aug 30, 2010 at 1:25 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
>> On Mon, 30 Aug 2010, Scott Marlowe wrote:
>>
>>> On Mon, Aug 30, 2010 at 12:20 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I just upgraded with pg_dump/restore from PostgreSQL 8.3.11 to 8.4.4 but I'm
>>>> having major performance problems with a query with many left joins. Problem
>>>> is that costs are now very, very, very high (was ok in 8.3). Analyze has
>>>> been done. Indexes are of course there.
>>>>
>>>> ->  Merge Left Join
>>>> (cost=1750660.22..4273805884876845789194861338991916289697885665127154313046252183850255795798561612107149662486528.00
>>>> rows=238233578115856634454073334945297075430094545596765511255148896328828230572227215727052643001958400
>>>> width=16)
>>>>       Merge Cond: (l.id = d2000903.fk_id)
>>>
>>> Wow!  Other than an incredibly high cost AND row estimate, was the
>>> query plan the same on 8.3 or different?
>>>
>>>> Details with execution plan can be found at:
>>>> http://www.wiesinger.com/tmp/pg_perf_84.txt
>>>
>>> What's up with the "(actual time=.. rows= loops=)" in the explain
>>> analyze?
>>
>> What do you mean exactly? Missing?
>> I did it not with psql but with a GUI program.
>
> Never mind, that was an artifact at http://explain.depesz.com/s/KyU, not
> your fault.  Sorry.
>
>>>> I know that the data model is key/value pairs but it worked well in 8.3.
>>>> I need this flexibility.
>>>>
>>>> Any ideas?
>>>
>>> Not really.  I would like an explain analyze from both 8.3 and 8.4.
>>> Are they tuned the same, things like work mem and default stats
>>> target?
>>
>> I don't have an 8.3 version running anymore. But I have an OLD version of
>> a nearly identical query plan (the sort was missing due to performance issues
>> and is done now in a view; maybe some more JOINs were added too, but all
>> that has no impact on the basic principle of the query plan):
>> http://www.wiesinger.com/tmp/pg_perf.txt
>>
>> Tuning: Yes, on the same machine with the same parameters (manual diff of the
>> old config, and the parameters added back manually).
>
> How long does the query take to run in 8.4?  Do you have an explain
> analyze of that?  I'm still thinking that some change in the query
> planner might be seeing all those left joins and coming up with some
> non-linear value for row estimation.  What's default stats target set
> to in that db?

In the config, default values:

#default_statistics_target = 100        # range 1-10000

How can I find that out?

Ciao,
Gerhard

--
http://www.wiesinger.com/
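The live setting can be checked from any session; a commented-out postgresql.conf line just means the built-in default is in effect. A quick check, with per-column overrides read from the catalog (log_details assumed as the table of interest):

  -- Show the active value of the setting:
  SHOW default_statistics_target;

  -- Any per-column statistics-target overrides (-1 = use the default):
  SELECT attname, attstattarget
  FROM pg_attribute
  WHERE attrelid = 'log_details'::regclass AND attnum > 0;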
On Mon, 30 Aug 2010, Pavel Stehule wrote:

> Hello
>
> 2010/8/30 Andreas Kretschmer <akretschmer@spamfence.net>:
>> Gerhard Wiesinger <lists@wiesinger.com> wrote:
>>
>>> I know that the data model is key/value pairs but it worked well in 8.3.
>>> I need this flexibility.
>>>
>>> Any ideas?
>>
>> If I understand the query correctly, it's a pivot table, right?
>
> no - it's just an EAV table on very large data :(

Yes, it is an EAV table, but the queried window is comparatively small
(max. 1 day out of years, typically 5 minutes out of years).

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
2010/8/30 Gerhard Wiesinger <lists@wiesinger.com>:
> On Mon, 30 Aug 2010, Pavel Stehule wrote:
>
>> Hello
>>
>> 2010/8/30 Andreas Kretschmer <akretschmer@spamfence.net>:
>>>
>>> Gerhard Wiesinger <lists@wiesinger.com> wrote:
>>>
>>>> I know that the data model is key/value pairs but it worked well in 8.3.
>>>> I need this flexibility.
>>>>
>>>> Any ideas?
>>>
>>> If I understand the query correctly, it's a pivot table, right?
>>
>> no - it's just an EAV table on very large data :(
>
> Yes, it is an EAV table, but the queried window is comparatively small
> (max. 1 day out of years, typically 5 minutes out of years).

it is irrelevant - there are repeated seq scans - so you need
partitioning or a classic table - maybe materialized views can help

Pavel
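Note that 8.4 has no built-in materialized views (CREATE MATERIALIZED VIEW only arrived in 9.3), so this would mean a manually refreshed table. A minimal sketch, assuming the flattening view is the log_entries view discussed later in this thread, and with log_entries_mat as a hypothetical name:

  -- Hand-rolled "materialized view": snapshot the flattening view into a
  -- real table and rebuild it periodically (e.g. from cron).
  CREATE TABLE log_entries_mat AS SELECT * FROM log_entries;
  CREATE INDEX i_log_entries_mat_datetime ON log_entries_mat (datetime);

  -- Full refresh (an incremental refresh would need triggers on log and
  -- log_details instead):
  TRUNCATE log_entries_mat;
  INSERT INTO log_entries_mat SELECT * FROM log_entries;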
On Mon, 30 Aug 2010, Pavel Stehule wrote:

> 2010/8/30 Gerhard Wiesinger <lists@wiesinger.com>:
>> On Mon, 30 Aug 2010, Pavel Stehule wrote:
>>
>>> Hello
>>>
>>> 2010/8/30 Andreas Kretschmer <akretschmer@spamfence.net>:
>>>>
>>>> Gerhard Wiesinger <lists@wiesinger.com> wrote:
>>>>
>>>>> I know that the data model is key/value pairs but it worked well in 8.3.
>>>>> I need this flexibility.
>>>>>
>>>>> Any ideas?
>>>>
>>>> If I understand the query correctly, it's a pivot table, right?
>>>
>>> no - it's just an EAV table on very large data :(
>>
>> Yes, it is an EAV table, but the queried window is comparatively small
>> (max. 1 day out of years, typically 5 minutes out of years).
>
> it is irrelevant - there are repeated seq scans - so you need
> partitioning or a classic table - maybe materialized views can help

I know the drawbacks of an EAV design, but I don't want to discuss that. I
want to discuss the major performance decrease from PostgreSQL 8.3
(performance was ok) to PostgreSQL 8.4 (performance is NOT ok).

Any further ideas how I can track this down?
Can someone explain the difference in query plan from an optimizer point
of view?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
Gerhard Wiesinger <lists@wiesinger.com> writes:
> I know the drawbacks of an EAV design, but I don't want to discuss that. I
> want to discuss the major performance decrease from PostgreSQL 8.3
> (performance was ok) to PostgreSQL 8.4 (performance is NOT ok).

> Any further ideas how I can track this down?
> Can someone explain the difference in query plan from an optimizer point
> of view?

Since you haven't shown us the 8.3 plan, it's kind of hard to speculate ;-)

One thing that jumped out at me was that 8.4 appears to be expecting
multiple matches in each of the left-joined tables, which is why the
total rowcount estimate balloons so fast.  I rather imagine that you are
expecting at most one match in reality, else the query isn't going to
behave nicely.  Is this correct?  Are you *sure* you analyzed all these
tables?  And if that is how the data looks, where is the actual
performance problem?  A bad rowcount estimate isn't in itself going
to kill you.

FWIW, in a similar albeit toy example, I don't see any difference
between the 8.3 and 8.4 plans or cost estimates.

			regards, tom lane
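One way to double-check the "did you analyze" question: the statistics collector records the last (auto)analyze time per table. A quick check, assuming the table names used elsewhere in this thread:

  -- Last manual and automatic analyze per table:
  SELECT relname, last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
  WHERE relname IN ('log', 'log_details', 'key_description');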
On Mon, 30 Aug 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> I know the drawbacks of an EAV design, but I don't want to discuss that. I
>> want to discuss the major performance decrease from PostgreSQL 8.3
>> (performance was ok) to PostgreSQL 8.4 (performance is NOT ok).
>
>> Any further ideas how I can track this down?
>> Can someone explain the difference in query plan from an optimizer point
>> of view?
>
> Since you haven't shown us the 8.3 plan, it's kind of hard to speculate ;-)
>
> One thing that jumped out at me was that 8.4 appears to be expecting
> multiple matches in each of the left-joined tables, which is why the
> total rowcount estimate balloons so fast.  I rather imagine that you are
> expecting at most one match in reality, else the query isn't going to
> behave nicely.  Is this correct?  Are you *sure* you analyzed all these
> tables?  And if that is how the data looks, where is the actual
> performance problem?  A bad rowcount estimate isn't in itself going
> to kill you.
>
> FWIW, in a similar albeit toy example, I don't see any difference
> between the 8.3 and 8.4 plans or cost estimates.

Yes, I'm expecting only one match in reality, and I think PostgreSQL should
also know that from the table definitions and constraints. Long answer
below. The query doesn't "end" (i.e. finish) in PostgreSQL.

From the definition:

CREATE TABLE value_types (
  valuetypeid bigint PRIMARY KEY,
  description varchar(256) NOT NULL -- e.g. 'float', 'integer', 'boolean'
);

CREATE TABLE key_description (
  keyid bigint PRIMARY KEY,
  description varchar(256) NOT NULL UNIQUE,
  fk_valuetypeid bigint NOT NULL,
  unit varchar(256) NOT NULL, -- e.g. '°C'
  FOREIGN KEY(fk_valuetypeid) REFERENCES value_types(valuetypeid) ON DELETE RESTRICT
);
-- ALTER TABLE key_description DROP CONSTRAINT c_key_description_description;
-- ALTER TABLE key_description ADD CONSTRAINT c_key_description_description UNIQUE(description);

CREATE TABLE log (
  id bigserial PRIMARY KEY,
  datetime timestamp with time zone NOT NULL,
  tdate date NOT NULL,
  ttime time with time zone NOT NULL
);

CREATE TABLE log_details (
  fk_id bigint NOT NULL,
  fk_keyid bigint NOT NULL,
  value double precision NOT NULL,
  FOREIGN KEY (fk_id) REFERENCES log(id) ON DELETE CASCADE,
  FOREIGN KEY (fk_keyid) REFERENCES key_description(keyid) ON DELETE RESTRICT,
  CONSTRAINT unique_key_and_id UNIQUE(fk_id, fk_keyid)
);

Therefore keyid is unique, and e.g. d1.fk_keyid is unique. With the
constraint on log_details, and d1.fk_keyid being unique, fk_id is unique
for a given d1.fk_keyid.

BTW: I still have the old data directory in /var/lib/pgsql-old. Is a fast
setup of the old version on a different TCP port possible, to compare query plans?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
Gerhard Wiesinger <lists@wiesinger.com> writes:
> BTW: I still have the old data directory in /var/lib/pgsql-old. Is a fast
> setup of the old version on a different TCP port possible, to compare query plans?

You'll need to reinstall the old executables.  If you put the new
executables in the same directories, it's not going to be easy to
run both in parallel.  If you didn't, then you just need to start
the old postmaster using a different port number.
On Mon, 30 Aug 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> BTW: I still have the old data directory in /var/lib/pgsql-old. Is a fast
>> setup of the old version on a different TCP port possible, to compare query plans?
>
> You'll need to reinstall the old executables.  If you put the new
> executables in the same directories, it's not going to be easy to
> run both in parallel.  If you didn't, then you just need to start
> the old postmaster using a different port number.

I tried to get 8.3.11 ready again:

# Problems with timezone on startup (Default not found)
./configure --with-system-tzdata=/usr/share
cp ./src/backend/postgres /bin/postgres-8.3.11
su -l postgres -c "/bin/postgres-8.3.11 -p 54321 -D /var/lib/pgsql-old/data &" >> /var/lib/pgsql-old/pgstartup.log 2>&1 < /dev/null

Problem is that PostgreSQL doesn't listen, takes a lot of CPU, and also
disk I/O. 8.3 was shut down cleanly. 8.4 runs in parallel. Are there any
problems with shared buffer conflicts?

  PID USER      PR  NI  VIRT SWAP  RES CODE DATA  SHR S %CPU %MEM    TIME+  COMMAND
 6997 postgres  20   0  113m 112m 1236 3688 1064  712 D 38.7  0.0   0:45.43 /bin/postgres-8.3.11 -p 54321 -D /var/lib/pgsql-old/data

Any ideas?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
Gerhard Wiesinger <lists@wiesinger.com> writes:
> Problem is that PostgreSQL doesn't listen, takes a lot of CPU, and also
> disk I/O. 8.3 was shut down cleanly.

Hm, you sure about that?  What's in the postmaster log?

			regards, tom lane
On Thu, 2 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> Problem is that PostgreSQL doesn't listen, takes a lot of CPU, and also
>> disk I/O. 8.3 was shut down cleanly.
>
> Hm, you sure about that?  What's in the postmaster log?

That's the strange thing: I don't have anything in the stdout/stderr log,
nor in the pg_log directory, nor even in syslog.

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
On Thu, 2 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> Problem is that PostgreSQL doesn't listen, takes a lot of CPU, and also
>> disk I/O. 8.3 was shut down cleanly.
>
> Hm, you sure about that?  What's in the postmaster log?

BTW: Do I need a different postgres user with a different home directory?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
Gerhard Wiesinger <lists@wiesinger.com> writes:
> On Thu, 2 Sep 2010, Tom Lane wrote:
>> Hm, you sure about that?  What's in the postmaster log?

> That's the strange thing: I don't have anything in the stdout/stderr log,
> nor in the pg_log directory, nor even in syslog.

Not even in that pgstartup.log file you sent stderr to?

I have seen cases before where Postgres couldn't log anything because of
SELinux.  If this is a Red Hat based system, look in the kernel log for
AVC messages.  If you see any, then SELinux is probably blocking things
because of the nonstandard directory locations.  Turning it off
temporarily would be the easiest fix, though relabeling the files would
be a better one.

			regards, tom lane
On Fri, 3 Sep 2010, Tom Lane wrote:

> Not even in that pgstartup.log file you sent stderr to?
>
> I have seen cases before where Postgres couldn't log anything because of
> SELinux.  If this is a Red Hat based system, look in the kernel log for
> AVC messages.  If you see any, then SELinux is probably blocking things
> because of the nonstandard directory locations.  Turning it off
> temporarily would be the easiest fix, though relabeling the files would
> be a better one.

SELinux is already disabled:

cat /etc/selinux/config | grep -v "^#" | grep -v "^$"
SELINUXTYPE=targeted
SELINUX=disabled

Yes, the redirected log file is also empty. The kernel log is empty as well.
I tried redirecting to a different, new file; on startup I get nothing, and
after killing the process I get:

2010-09-03 16:35:39.177 GMT [2149] @/: LOG:  could not stat "/usr/share/doc/xalan-j2-manual-2.7.0/apidocs": No such file or directory

Any ideas?

BTW: Shared memory can't be an issue?

Ciao,
Gerhard

--
http://www.wiesinger.com/
Gerhard Wiesinger <lists@wiesinger.com> writes:
> On Fri, 3 Sep 2010, Tom Lane wrote:
>> Not even in that pgstartup.log file you sent stderr to?

> Yes, the redirected log file is also empty. The kernel log is empty as well.

Huh.  Try strace'ing the process to see what it's doing.

> BTW: Shared memory can't be an issue?

If you're not getting any log messages at all, you've got worse problems
than shared memory.

			regards, tom lane
On Fri, 3 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> On Fri, 3 Sep 2010, Tom Lane wrote:
>>> Not even in that pgstartup.log file you sent stderr to?
>
>> Yes, the redirected log file is also empty. The kernel log is empty as well.
>
> Huh.  Try strace'ing the process to see what it's doing.

It tries to find something in /usr/share/ ...
Ok, I think that comes from the compile time options:
./configure --with-system-tzdata=/usr/share

Previously I had problems with the timezone on startup (Default not found),
so I tried to set the directory. Maybe the timezone thing (it looks for the
Default timezone) is easier to fix ...

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
Gerhard Wiesinger <lists@wiesinger.com> writes:
> On Fri, 3 Sep 2010, Tom Lane wrote:
>> Huh.  Try strace'ing the process to see what it's doing.

> It tries to find something in /usr/share/ ...
> Ok, I think that comes from the compile time options:
> ./configure --with-system-tzdata=/usr/share

Doh.  I hadn't looked closely at that.  Probably you want
/usr/share/zoneinfo --- at least that's what the Red Hat RPMs use.

			regards, tom lane
On Fri, 3 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> On Fri, 3 Sep 2010, Tom Lane wrote:
>>> Huh.  Try strace'ing the process to see what it's doing.
>
>> It tries to find something in /usr/share/ ...
>> Ok, I think that comes from the compile time options:
>> ./configure --with-system-tzdata=/usr/share
>
> Doh.  I hadn't looked closely at that.  Probably you want
> /usr/share/zoneinfo --- at least that's what the Red Hat RPMs use.

I had tried these even before I wrote to the mailing list, without success:

./configure
./configure --with-system-tzdata=/usr/share/pgsql/timezonesets
./configure --with-system-tzdata=/usr/share/pgsql
./configure --with-system-tzdata=/usr/share

Now I tried, also without success:

./configure --with-system-tzdata=/usr/share/zoneinfo

With the last one I also get:

2010-09-03 19:51:29.079 CEST [27916] @/: FATAL:  invalid value for parameter "timezone_abbreviations": "Default"

The correct directory would be:

ls -l /usr/share/pgsql/timezonesets/Default
-rw-r--r-- 1 root root 29602 2010-05-17 20:07 /usr/share/pgsql/timezonesets/Default

The file looks good, too.

Any ideas?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
Gerhard Wiesinger <lists@wiesinger.com> writes:
> On Fri, 3 Sep 2010, Tom Lane wrote:
>> Doh.  I hadn't looked closely at that.  Probably you want
>> /usr/share/zoneinfo --- at least that's what the Red Hat RPMs use.

> I had tried these even before I wrote to the mailing list, without success:
> ./configure

I'd definitely suggest leaving out the --with-system-tzdata option
altogether if you're not certain it works.

> With the last one I also get:
> 2010-09-03 19:51:29.079 CEST [27916] @/: FATAL:  invalid value for parameter "timezone_abbreviations": "Default"

This is a different problem; the --with-system-tzdata option wouldn't
affect that.

I think what may be happening here is that a postgres executable expects
to find itself in a full installation tree, ie if it's in /someplace/bin
then the timezone files are in /someplace/share, etc.  Did you do a full
"make install" after building, or did you just copy the postgres
executable?

			regards, tom lane
On Fri, 3 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> On Fri, 3 Sep 2010, Tom Lane wrote:
>>> Doh.  I hadn't looked closely at that.  Probably you want
>>> /usr/share/zoneinfo --- at least that's what the Red Hat RPMs use.
>
>> I had tried these even before I wrote to the mailing list, without success:
>> ./configure
>
> I'd definitely suggest leaving out the --with-system-tzdata option
> altogether if you're not certain it works.

OK.

>> With the last one I also get:
>> 2010-09-03 19:51:29.079 CEST [27916] @/: FATAL:  invalid value for parameter "timezone_abbreviations": "Default"
>
> This is a different problem; the --with-system-tzdata option wouldn't
> affect that.
>
> I think what may be happening here is that a postgres executable expects
> to find itself in a full installation tree, ie if it's in /someplace/bin
> then the timezone files are in /someplace/share, etc.  Did you do a full
> "make install" after building, or did you just copy the postgres
> executable?

I just copied it, as discussed in the original mail, so that make install
doesn't kill the 8.4 production RPM version:

cp ./src/backend/postgres /bin/postgres-8.3.11
cd /bin
ln -s /usr/share/pgsql/timezonesets share
cd tarballdir
./configure
make
cp ./src/backend/postgres /bin/postgres-8.3.11

2010-09-03 18:24:53.936 GMT [11753] @/: LOG:  could not open directory "/share/timezone": No such file or directory
2010-09-03 18:24:53.936 GMT [11753] @/: LOG:  could not open directory "/share/timezone": No such file or directory
2010-09-03 18:24:53.936 GMT [11753] @/: LOG:  could not open directory "/share/timezone": No such file or directory
2010-09-03 18:24:53.936 GMT [11753] @/: LOG:  could not open directory "/share/timezone": No such file or directory
2010-09-03 18:24:53.936 GMT [11753] @/: LOG:  could not open directory "/share/timezone": No such file or directory
2010-09-03 20:24:53.936 CEST [11753] @/: FATAL:  invalid value for parameter "timezone_abbreviations": "Default"

I had previously made the strace, and that's why I added the option to
configure to get the right directory.

Any further idea where I should copy the binary, or any option or file copy
for the time zone files?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
Gerhard Wiesinger <lists@wiesinger.com> writes:
> On Fri, 3 Sep 2010, Tom Lane wrote:
>> I think what may be happening here is that a postgres executable expects
>> to find itself in a full installation tree, ie if it's in /someplace/bin
>> then the timezone files are in /someplace/share, etc.  Did you do a full
>> "make install" after building, or did you just copy the postgres
>> executable?

> I just copied it, as discussed in the original mail, so that make install
> doesn't kill the 8.4 production RPM version:
> cp ./src/backend/postgres /bin/postgres-8.3.11

Definitely not going to work.  Instead, configure with --prefix set
to /someplace/harmless, make, make install, execute from
/someplace/harmless/bin/.

			regards, tom lane
On Fri, 3 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> On Fri, 3 Sep 2010, Tom Lane wrote:
>>> I think what may be happening here is that a postgres executable expects
>>> to find itself in a full installation tree, ie if it's in /someplace/bin
>>> then the timezone files are in /someplace/share, etc.  Did you do a full
>>> "make install" after building, or did you just copy the postgres
>>> executable?
>
>> I just copied it, as discussed in the original mail, so that make install
>> doesn't kill the 8.4 production RPM version:
>> cp ./src/backend/postgres /bin/postgres-8.3.11
>
> Definitely not going to work.  Instead, configure with --prefix set
> to /someplace/harmless, make, make install, execute from
> /someplace/harmless/bin/.

Thanks, Tom, your support for PostgreSQL is really very, very good.

Install:

./configure --prefix /opt/postgres-8.3
make
make install
su -l postgres -c "/opt/postgres-8.3/bin/postgres -p 54321 -D /var/lib/pgsql-old/data &" >> /var/lib/pgsql-old/pgstartup.log 2>&1 < /dev/null

Back to the original problem:

8.3 query plans: http://www.wiesinger.com/tmp/pg_perf_83_new.txt
8.4 query plans: http://www.wiesinger.com/tmp/pg_perf_84.txt

The main difference, as I see it:

8.3:
->  Nested Loop Left Join  (cost=0.00..1195433.19 rows=67 width=16)

8.4:
->  Merge Left Join  (cost=1750660.22..4273805884876845789194861338991916289697885665127154313046252183850255795798561612107149662486528.00 rows=238233578115856634454073334945297075430094545596765511255148896328828230572227215727052643001958400 width=16)

Any ideas why? How to fix it?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
Gerhard Wiesinger <lists@wiesinger.com> writes:
> Back to the original problem:

Finally ;-)

> 8.3 query plans: http://www.wiesinger.com/tmp/pg_perf_83_new.txt
> 8.4 query plans: http://www.wiesinger.com/tmp/pg_perf_84.txt

Hmm.  The 8.3 plan is indeed assuming that the number of rows will stay
constant as we bubble up through the join levels, but AFAICS this is
simply wrong:

->  Nested Loop Left Join  (cost=0.00..38028.89 rows=67 width=8)
      ->  Nested Loop Left Join  (cost=0.00..25399.46 rows=67 width=8)
            ->  Nested Loop Left Join  (cost=0.00..12770.04 rows=67 width=8)
                  ->  Index Scan using i_log_unique on log l  (cost=0.00..140.61 rows=67 width=8)
                        Index Cond: (datetime >= (now() - '00:01:00'::interval))
                  ->  Index Scan using unique_key_and_id on log_details d7  (cost=0.00..187.39 rows=89 width=8)
                        Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
            ->  Index Scan using unique_key_and_id on log_details d6  (cost=0.00..187.39 rows=89 width=8)
                  Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
      ->  Index Scan using unique_key_and_id on log_details d5  (cost=0.00..187.39 rows=89 width=8)
            Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))

If the log_details indexscans are expected to produce 89 rows per
execution, then surely the join size should go up 89x at each level,
because the join steps themselves don't eliminate anything.

In 8.4 the arithmetic is at least self-consistent:

->  Nested Loop Left Join  (cost=0.00..505256.95 rows=57630 width=8)
      ->  Nested Loop Left Join  (cost=0.00..294671.96 rows=6059 width=8)
            ->  Nested Loop Left Join  (cost=0.00..272532.55 rows=637 width=8)
                  ->  Index Scan using log_pkey on log l  (cost=0.00..270203.92 rows=67 width=8)
                        Filter: (datetime >= (now() - '00:01:00'::interval))
                  ->  Index Scan using unique_key_and_id on log_details d7  (cost=0.00..34.63 rows=10 width=8)
                        Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
            ->  Index Scan using unique_key_and_id on log_details d6  (cost=0.00..34.63 rows=10 width=8)
                  Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
      ->  Index Scan using unique_key_and_id on log_details d5  (cost=0.00..34.63 rows=10 width=8)
            Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))

The rowcount estimates are apparently a shade less than 10, but they get
rounded off in the display.

I believe the reason for this change is that 8.4's join estimation code
was rewritten so that it wasn't completely bogus for outer joins.  8.3
might have been getting the right answer, but it was for the wrong
reasons.

So the real question to be answered here is why doesn't it think that
each of the unique_key_and_id indexscans produces just a single row, as
you indicated was the case.  The 8.4 estimate is already a factor of
almost 10 closer to reality than 8.3's, but you need another factor of
10.  You might find that increasing the statistics target for the
log_details table helps.

			regards, tom lane
On Fri, 3 Sep 2010, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> 8.3 query plans: http://www.wiesinger.com/tmp/pg_perf_83_new.txt
>> 8.4 query plans: http://www.wiesinger.com/tmp/pg_perf_84.txt
>
> Hmm.  The 8.3 plan is indeed assuming that the number of rows will stay
> constant as we bubble up through the join levels, but AFAICS this is
> simply wrong:
>
> ->  Nested Loop Left Join  (cost=0.00..38028.89 rows=67 width=8)
>       ->  Nested Loop Left Join  (cost=0.00..25399.46 rows=67 width=8)
>             ->  Nested Loop Left Join  (cost=0.00..12770.04 rows=67 width=8)
>                   ->  Index Scan using i_log_unique on log l  (cost=0.00..140.61 rows=67 width=8)
>                         Index Cond: (datetime >= (now() - '00:01:00'::interval))
>                   ->  Index Scan using unique_key_and_id on log_details d7  (cost=0.00..187.39 rows=89 width=8)
>                         Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
>             ->  Index Scan using unique_key_and_id on log_details d6  (cost=0.00..187.39 rows=89 width=8)
>                   Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
>       ->  Index Scan using unique_key_and_id on log_details d5  (cost=0.00..187.39 rows=89 width=8)
>             Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))
>
> If the log_details indexscans are expected to produce 89 rows per
> execution, then surely the join size should go up 89x at each level,
> because the join steps themselves don't eliminate anything.
>
> In 8.4 the arithmetic is at least self-consistent:
>
> ->  Nested Loop Left Join  (cost=0.00..505256.95 rows=57630 width=8)
>       ->  Nested Loop Left Join  (cost=0.00..294671.96 rows=6059 width=8)
>             ->  Nested Loop Left Join  (cost=0.00..272532.55 rows=637 width=8)
>                   ->  Index Scan using log_pkey on log l  (cost=0.00..270203.92 rows=67 width=8)
>                         Filter: (datetime >= (now() - '00:01:00'::interval))
>                   ->  Index Scan using unique_key_and_id on log_details d7  (cost=0.00..34.63 rows=10 width=8)
>                         Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
>             ->  Index Scan using unique_key_and_id on log_details d6  (cost=0.00..34.63 rows=10 width=8)
>                   Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
>       ->  Index Scan using unique_key_and_id on log_details d5  (cost=0.00..34.63 rows=10 width=8)
>             Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))
>
> The rowcount estimates are apparently a shade less than 10, but they get
> rounded off in the display.
>
> I believe the reason for this change is that 8.4's join estimation code
> was rewritten so that it wasn't completely bogus for outer joins.  8.3
> might have been getting the right answer, but it was for the wrong
> reasons.
>
> So the real question to be answered here is why doesn't it think that
> each of the unique_key_and_id indexscans produces just a single row, as
> you indicated was the case.  The 8.4 estimate is already a factor of
> almost 10 closer to reality than 8.3's, but you need another factor of
> 10.  You might find that increasing the statistics target for the
> log_details table helps.

Ok, Tom, I tried different things (more details are below):
1.) Setting the statistics target to 1000 and 10000 (without success), still a merge join
2.) Tried to add an index on description to help the planner with uniqueness (without success)
3.) Forced the planner to use nested loop joins (SUCCESS):
    SET enable_hashjoin=false; SET enable_mergejoin=false;
    (BTW: How do I use such settings in Java, PHP and Perl? Is there a command available?)

Open questions:
Why does the planner not choose nested loop joins, which should be optimal in this situation?
Does the planner value: a.) UNIQUENESS b.) UNIQUENESS and NOT NULLs?
Any ideas for improvement of the planner?

Details:

-- CREATE UNIQUE INDEX unique_key_and_id ON log_details USING btree (fk_id, fk_keyid)
-- 1000 and 10000 didn't help toward a better query plan (Nested Loop Left Join); still Merge Left Join
-- Sample with:
-- ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 10000;
-- ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 10000;
-- ANALYZE VERBOSE log_details;
-- Still Merge Join:
-- ->  Merge Left Join  (cost=9102353.88..83786934.25 rows=2726186787 width=16)
--       Merge Cond: (l.id = d2000902.fk_id)
--       ->  Merge Left Join  (cost=8926835.18..40288402.09 rows=972687282 width=24)
--             Merge Cond: (l.id = d2000904.fk_id)

-- Default values again
ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 100;
ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 100;
ANALYZE VERBOSE log_details;

-- Tried to add WITHOUT SUCCESS (so that the planner could know that description is NOT NULL and UNIQUE)
DROP INDEX IF EXISTS i_key_description_desc;
CREATE UNIQUE INDEX i_key_description_desc ON key_description (description);
-- Therefore the planner should know: keyid is NOT NULL and UNIQUE and there is only one result:
--   (SELECT keyid FROM key_description WHERE description = 'Raumsolltemperatur')
-- Therefore from the constraint the planner should know that fk_id is NOT NULL and UNIQUE:
--   CONSTRAINT unique_key_and_id UNIQUE(fk_id, fk_keyid):
-- LEFT JOIN log_details d1 ON l.id = d1.fk_id AND
--   d1.fk_keyid = (SELECT keyid FROM key_description WHERE description = 'Raumsolltemperatur')
-- Does the planner value all those UNIQUEnesses and NOT NULLs?

-- Again back to the 8.3 query plan, which is fast (319 ms):
SET enable_hashjoin=false;
SET enable_mergejoin=false;
-- ->  Nested Loop Left Join  (cost=0.00..22820970510.45 rows=2727492136 width=16)
--       ->  Nested Loop Left Join  (cost=0.00..12810087616.29 rows=973121653 width=24)
--             ->  Nested Loop Left Join  (cost=0.00..9238379092.22 rows=347192844 width=24)

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/
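On the "how do I set this from Java/PHP/Perl" question: these planner settings are ordinary SQL statements, so any client library can send them like any other statement before the query runs. A sketch; SET LOCAL limits the effect to one transaction so other queries keep the defaults (log_entries is the view from this thread):

  -- Session-wide (stays in effect until the connection closes):
  SET enable_hashjoin = off;
  SET enable_mergejoin = off;

  -- Or scoped to a single transaction:
  BEGIN;
  SET LOCAL enable_hashjoin = off;
  SET LOCAL enable_mergejoin = off;
  SELECT * FROM log_entries WHERE datetime > now() - INTERVAL '1 hour';
  COMMIT;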
Hello,

Any news or ideas regarding this issue?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

On Sat, 4 Sep 2010, Gerhard Wiesinger wrote:

> On Fri, 3 Sep 2010, Tom Lane wrote:
>
>> Gerhard Wiesinger <lists@wiesinger.com> writes:
>>> 8.3 query plans: http://www.wiesinger.com/tmp/pg_perf_83_new.txt
>>> 8.4 query plans: http://www.wiesinger.com/tmp/pg_perf_84.txt
>>
>> Hmm.  The 8.3 plan is indeed assuming that the number of rows will stay
>> constant as we bubble up through the join levels, but AFAICS this is
>> simply wrong:
>>
>> ->  Nested Loop Left Join  (cost=0.00..38028.89 rows=67 width=8)
>>       ->  Nested Loop Left Join  (cost=0.00..25399.46 rows=67 width=8)
>>             ->  Nested Loop Left Join  (cost=0.00..12770.04 rows=67 width=8)
>>                   ->  Index Scan using i_log_unique on log l  (cost=0.00..140.61 rows=67 width=8)
>>                         Index Cond: (datetime >= (now() - '00:01:00'::interval))
>>                   ->  Index Scan using unique_key_and_id on log_details d7  (cost=0.00..187.39 rows=89 width=8)
>>                         Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
>>             ->  Index Scan using unique_key_and_id on log_details d6  (cost=0.00..187.39 rows=89 width=8)
>>                   Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
>>       ->  Index Scan using unique_key_and_id on log_details d5  (cost=0.00..187.39 rows=89 width=8)
>>             Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))
>>
>> If the log_details indexscans are expected to produce 89 rows per
>> execution, then surely the join size should go up 89x at each level,
>> because the join steps themselves don't eliminate anything.
>>
>> In 8.4 the arithmetic is at least self-consistent:
>>
>> ->  Nested Loop Left Join  (cost=0.00..505256.95 rows=57630 width=8)
>>       ->  Nested Loop Left Join  (cost=0.00..294671.96 rows=6059 width=8)
>>             ->  Nested Loop Left Join  (cost=0.00..272532.55 rows=637 width=8)
>>                   ->  Index Scan using log_pkey on log l  (cost=0.00..270203.92 rows=67 width=8)
>>                         Filter: (datetime >= (now() - '00:01:00'::interval))
>>                   ->  Index Scan using unique_key_and_id on log_details d7  (cost=0.00..34.63 rows=10 width=8)
>>                         Index Cond: ((l.id = d7.fk_id) AND (d7.fk_keyid = $6))
>>             ->  Index Scan using unique_key_and_id on log_details d6  (cost=0.00..34.63 rows=10 width=8)
>>                   Index Cond: ((l.id = d6.fk_id) AND (d6.fk_keyid = $5))
>>       ->  Index Scan using unique_key_and_id on log_details d5  (cost=0.00..34.63 rows=10 width=8)
>>             Index Cond: ((l.id = d5.fk_id) AND (d5.fk_keyid = $4))
>>
>> The rowcount estimates are apparently a shade less than 10, but they get
>> rounded off in the display.
>>
>> I believe the reason for this change is that 8.4's join estimation code
>> was rewritten so that it wasn't completely bogus for outer joins.  8.3
>> might have been getting the right answer, but it was for the wrong
>> reasons.
>>
>> So the real question to be answered here is why doesn't it think that
>> each of the unique_key_and_id indexscans produces just a single row, as
>> you indicated was the case.  The 8.4 estimate is already a factor of
>> almost 10 closer to reality than 8.3's, but you need another factor of
>> 10.  You might find that increasing the statistics target for the
>> log_details table helps.
>
> Ok, Tom, I tried different things (more details are below):
> 1.) Setting the statistics target to 1000 and 10000 (without success), still a merge join
> 2.) Tried to add an index on description to help the planner with uniqueness (without success)
> 3.) Forced the planner to use nested loop joins (SUCCESS):
>     SET enable_hashjoin=false; SET enable_mergejoin=false;
>     (BTW: How do I use such settings in Java, PHP and Perl? Is there a command available?)
>
> Open questions:
> Why does the planner not choose nested loop joins, which should be optimal in this situation?
> Does the planner value: a.) UNIQUENESS b.) UNIQUENESS and NOT NULLs?
> Any ideas for improvement of the planner?
>
> Details:
> -- CREATE UNIQUE INDEX unique_key_and_id ON log_details USING btree (fk_id, fk_keyid)
> -- 1000 and 10000 didn't help toward a better query plan (Nested Loop Left Join); still Merge Left Join
> -- Sample with:
> -- ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 10000;
> -- ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 10000;
> -- ANALYZE VERBOSE log_details;
> -- Still Merge Join:
> -- ->  Merge Left Join  (cost=9102353.88..83786934.25 rows=2726186787 width=16)
> --       Merge Cond: (l.id = d2000902.fk_id)
> --       ->  Merge Left Join  (cost=8926835.18..40288402.09 rows=972687282 width=24)
> --             Merge Cond: (l.id = d2000904.fk_id)
>
> -- Default values again
> ALTER TABLE log_details ALTER COLUMN fk_id SET STATISTICS 100;
> ALTER TABLE log_details ALTER COLUMN fk_keyid SET STATISTICS 100;
> ANALYZE VERBOSE log_details;
>
> -- Tried to add WITHOUT SUCCESS (so that the planner could know that description is NOT NULL and UNIQUE)
> DROP INDEX IF EXISTS i_key_description_desc;
> CREATE UNIQUE INDEX i_key_description_desc ON key_description (description);
> -- Therefore the planner should know: keyid is NOT NULL and UNIQUE and there is only one result:
> --   (SELECT keyid FROM key_description WHERE description = 'Raumsolltemperatur')
> -- Therefore from the constraint the planner should know that fk_id is NOT NULL and UNIQUE:
> --   CONSTRAINT unique_key_and_id UNIQUE(fk_id, fk_keyid):
> -- LEFT JOIN log_details d1 ON l.id = d1.fk_id AND
> --   d1.fk_keyid = (SELECT keyid FROM key_description WHERE description = 'Raumsolltemperatur')
> -- Does the planner value all those UNIQUEnesses and NOT NULLs?
>
> -- Again back to the 8.3 query plan, which is fast (319 ms):
> SET enable_hashjoin=false;
> SET enable_mergejoin=false;
> -- ->  Nested Loop Left Join  (cost=0.00..22820970510.45 rows=2727492136 width=16)
> --       ->  Nested Loop Left Join  (cost=0.00..12810087616.29 rows=973121653 width=24)
> --             ->  Nested Loop Left Join  (cost=0.00..9238379092.22 rows=347192844 width=24)
>
> Thnx.
>
> Ciao,
> Gerhard
>
> --
> http://www.wiesinger.com/
On Mon, Sep 13, 2010 at 2:39 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
> Hello,
>
> Any news or ideas regarding this issue?

hm.  is retooling the query an option?  specifically, can you try converting

CREATE OR REPLACE VIEW log_entries AS
SELECT
  l.id AS id,
  l.datetime AS datetime,
  l.tdate AS tdate,
  l.ttime AS ttime,
  d1.value AS Raumsolltemperatur,
  [...]
FROM
  log l
LEFT JOIN log_details d1 ON l.id = d1.fk_id AND
  d1.fk_keyid = (SELECT keyid FROM key_description WHERE description =
  'Raumsolltemperatur')
[...]

to

CREATE OR REPLACE VIEW log_entries AS
SELECT
  l.id AS id,
  l.datetime AS datetime,
  l.tdate AS tdate,
  l.ttime AS ttime,
  (select value from log_details ld join key_description kd on
  ld.fk_keyid = kd.keyid where ld.fk_id = l.id and description =
  'Raumsolltemperatur') AS Raumsolltemperatur,
  [...]

(I am not 100% sure I have my head around your query, but I think I do.)
This should get you a guaranteed (although not necessarily 'the best')
plan, with each returned view column being treated independently of
the others (is that what you want?).  Also, if schema changes are under
consideration, you could collapse log_details/key_description, using a
natural key, and cut out one of the joins.  I can't speak to some of the
more complex planner issues at play, but your query absolutely screams
optimization at the SQL level.

What I am 100% sure of, is that you can get better performance if you
do a little out of the box thinking here...

merlin
Hello Merlin,

Seems to be a feasible approach. One problem might be that, when multiple
rows are returned, they are not ordered correctly in each subselect. Any
idea how to solve that?

e.g.
Raumsolltemperatur | Raumisttemperatur
Value from time 1  | Value from time 2
Value from time 2  | Value from time 1

but should be
Raumsolltemperatur | Raumisttemperatur
Value from time 1  | Value from time 1
Value from time 2  | Value from time 2

But that might be solvable by first selecting keys from the log_details
table and then joining again.

I will try it in the evening; I have to think about it in detail.

But thank you for the new approach and opening the mind :-)

Ciao,
Gerhard

--
http://www.wiesinger.com/

On Mon, 13 Sep 2010, Merlin Moncure wrote:

> On Mon, Sep 13, 2010 at 2:39 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
>> Hello,
>>
>> Any news or ideas regarding this issue?
>
> hm.  is retooling the query an option?  specifically, can you try converting
>
> CREATE OR REPLACE VIEW log_entries AS
> SELECT
>   l.id AS id,
>   l.datetime AS datetime,
>   l.tdate AS tdate,
>   l.ttime AS ttime,
>   d1.value AS Raumsolltemperatur,
>   [...]
> FROM
>   log l
> LEFT JOIN log_details d1 ON l.id = d1.fk_id AND
>   d1.fk_keyid = (SELECT keyid FROM key_description WHERE description =
>   'Raumsolltemperatur')
> [...]
>
> to
>
> CREATE OR REPLACE VIEW log_entries AS
> SELECT
>   l.id AS id,
>   l.datetime AS datetime,
>   l.tdate AS tdate,
>   l.ttime AS ttime,
>   (select value from log_details ld join key_description kd on
>   ld.fk_keyid = kd.keyid where ld.fk_id = l.id and description =
>   'Raumsolltemperatur') AS Raumsolltemperatur,
>   [...]
>
> (I am not 100% sure I have my head around your query, but I think I do.)
> This should get you a guaranteed (although not necessarily 'the best')
> plan, with each returned view column being treated independently of
> the others (is that what you want?).  Also, if schema changes are under
> consideration, you could collapse log_details/key_description, using a
> natural key, and cut out one of the joins.  I can't speak to some of the
> more complex planner issues at play, but your query absolutely screams
> optimization at the SQL level.
>
> What I am 100% sure of, is that you can get better performance if you
> do a little out of the box thinking here...
>
> merlin
On Tue, Sep 14, 2010 at 2:07 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
> Hello Merlin,
>
> Seems to be a feasible approach. One problem might be that, when multiple
> rows are returned, they are not ordered correctly in each subselect. Any
> idea how to solve that?
>
> e.g.
> Raumsolltemperatur | Raumisttemperatur
> Value from time 1  | Value from time 2
> Value from time 2  | Value from time 1
>
> but should be
> Raumsolltemperatur | Raumisttemperatur
> Value from time 1  | Value from time 1
> Value from time 2  | Value from time 2
>
> But that might be solvable by first selecting keys from the log_details
> table and then joining again.
>
> I will try it in the evening; I have to think about it in detail.
>
> But thank you for the new approach and opening the mind :-)

Using a subquery in that style, select (<subquery>), ..., is limited to
results that return 1 row, 1 column.  I assumed that was the case...if
it isn't in your view, you can always attempt arrays:

CREATE OR REPLACE VIEW log_entries AS
SELECT
  l.id AS id,
  l.datetime AS datetime,
  l.tdate AS tdate,
  l.ttime AS ttime,
  array(select value from log_details ld join key_description kd on
  ld.fk_keyid = kd.keyid where ld.fk_id = l.id and description =
  'Raumsolltemperatur' order by XYZ) AS Raumsolltemperatur,
  [...]

arrays might raise the bar somewhat in terms of dealing with the
returned data, or they might work great.  some experimentation is in
order.

XYZ being the ordering condition you want.  If that isn't available
inside the join then we need to think about this some more.  We could
probably help more if you could describe the schema in a little more
detail.  This is solvable.

merlin
On Tue, 14 Sep 2010, Merlin Moncure wrote:

> On Tue, Sep 14, 2010 at 2:07 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
>> Hello Merlin,
>>
>> Seems to be a feasible approach. One problem might be that when
>> multiple rows are returned, they are not ordered correctly in each
>> subselect. Any idea how to solve that?
>>
>> e.g.
>> Raumsolltemperatur | Raumisttemperatur
>> Value from time 1  | Value from time 2
>> Value from time 2  | Value from time 1
>>
>> but should be
>> Raumsolltemperatur | Raumisttemperatur
>> Value from time 1  | Value from time 1
>> Value from time 2  | Value from time 2
>>
>> But that might be solvable by first selecting the keys from the
>> log_details table and then joining again.
>>
>> I will try it in the evening, and I have to think about it in detail.
>>
>> But thank you for the new approach and for opening the mind :-)
>
> Using a subquery in that style -- select (<subquery>), ... -- is limited to
> results that return 1 row, 1 column. I assumed that was the case... if
> it isn't in your view, you can always attempt arrays:
>
> CREATE OR REPLACE VIEW log_entries AS
> SELECT
>   l.id AS id,
>   l.datetime AS datetime,
>   l.tdate AS tdate,
>   l.ttime AS ttime,
>   array(select value from log_details ld join key_description kd on
>     ld.fk_keyid = kd.keyid where ld.fk_id = l.id and description =
>     'Raumsolltemperatur' order by XYZ) AS Raumsolltemperatur,
>   [...]
>
> arrays might raise the bar somewhat in terms of dealing with the
> returned data, or they might work great. some experimentation is in
> order.
>
> XYZ being the ordering condition you want. If that isn't available
> inside the join then we need to think about this some more. We could
> probably help more if you could describe the schema in a little more
> detail. This is solvable.

Of course, the subquery is limited to a result set returning 1 row and
1 column. Order is also of course preserved because of the join.

Further, I think I found a perfect query plan for the EAV pattern.

First I tried your suggestion, but there were some limitations with
O(n^2) effort (e.g. nested loops with loops=12586 and also index scans
with loops=12586):

CREATE OR REPLACE VIEW log_entries_test AS
SELECT
  l.id AS id,
  l.datetime AS datetime,
  l.tdate AS tdate,
  l.ttime AS ttime,
  (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid = kd.keyid
   WHERE l.id = d.fk_id AND kd.description = 'Raumsolltemperatur') AS Raumsolltemperatur,
  (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid = kd.keyid
   WHERE l.id = d.fk_id AND kd.description = 'Raumtemperatur') AS Raumtemperatur,
  (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid = kd.keyid
   WHERE l.id = d.fk_id AND kd.description = 'Kesselsolltemperatur') AS Kesselsolltemperatur,
  (SELECT value FROM log_details d JOIN key_description kd ON d.fk_keyid = kd.keyid
   WHERE l.id = d.fk_id AND kd.description = 'Kesseltemperatur') AS Kesseltemperatur,
  ....
FROM
  log l
;

--------------------------------------------------------------------------------------------------------------------
EXPLAIN ANALYZE SELECT * FROM log_entries_test
WHERE datetime > now() - INTERVAL '10 days' ORDER BY datetime DESC;
--------------------------------------------------------------------------------------------------------------------
Index Scan Backward using i_log_unique on log l  (cost=0.00..140820.12 rows=69 width=32) (actual time=2.848..22812.331 rows=12586 loops=1)
  Index Cond: (datetime > (now() - '10 days'::interval))
  SubPlan 1
    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual time=0.007..0.018 rows=1 loops=12586)
          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1 width=8) (actual time=0.003..0.013 rows=1 loops=12586)
                Filter: ((description)::text = 'Raumsolltemperatur'::text)
          ->  Index Scan using unique_key_and_id on log_details d  (cost=0.00..17.60 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=12586)
                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
  SubPlan 2
    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual time=0.006..0.017 rows=1 loops=12586)
          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1 width=8) (actual time=0.003..0.013 rows=1 loops=12586)
                Filter: ((description)::text = 'Raumtemperatur'::text)
          ->  Index Scan using unique_key_and_id on log_details d  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1 loops=12586)
                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
  SubPlan 3
    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual time=0.005..0.017 rows=1 loops=12586)
          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1 width=8) (actual time=0.002..0.013 rows=1 loops=12586)
                Filter: ((description)::text = 'Kesselsolltemperatur'::text)
          ->  Index Scan using unique_key_and_id on log_details d  (cost=0.00..17.60 rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=12586)
                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
  SubPlan 4
    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual time=0.006..0.017 rows=1 loops=12586)
          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1 width=8) (actual time=0.002..0.013 rows=1 loops=12586)
                Filter: ((description)::text = 'Kesseltemperatur'::text)
          ->  Index Scan using unique_key_and_id on log_details d  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1 loops=12586)
                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
  SubPlan 5
    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual time=0.005..0.017 rows=1 loops=12586)
          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1 width=8) (actual time=0.002..0.014 rows=1 loops=12586)
                Filter: ((description)::text = 'Speichersolltemperatur'::text)
          ->  Index Scan using unique_key_and_id on log_details d  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1 loops=12586)
                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
  SubPlan 6
    ->  Nested Loop  (cost=0.00..19.99 rows=1 width=8) (actual time=0.006..0.017 rows=1 loops=12586)
          ->  Seq Scan on key_description kd  (cost=0.00..2.38 rows=1 width=8) (actual time=0.003..0.013 rows=1 loops=12586)
                Filter: ((description)::text = 'Speichertemperatur'::text)
          ->  Index Scan using unique_key_and_id on log_details d  (cost=0.00..17.60 rows=1 width=16) (actual time=0.002..0.003 rows=1 loops=12586)
                Index Cond: (($0 = d.fk_id) AND (d.fk_keyid = kd.keyid))
--------------------------------------------------------------------------------------------------------------------

Therefore I optimized the query further, which can be done in the
following way with another subquery, yielding IMHO a perfect query
plan. The subselects on key_description also avoid being executed
again for each result row (the InitPlans run with loops=1 instead of
loops=12586):

CREATE OR REPLACE VIEW log_entries_test AS
SELECT
  l.id AS id,
  l.datetime AS datetime,
  l.tdate AS tdate,
  l.ttime AS ttime,
  (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
    (SELECT keyid FROM key_description WHERE description = 'Raumsolltemperatur')) AS Raumsolltemperatur,
  (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
    (SELECT keyid FROM key_description WHERE description = 'Raumtemperatur')) AS Raumtemperatur,
  (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
    (SELECT keyid FROM key_description WHERE description = 'Kesselsolltemperatur')) AS Kesselsolltemperatur,
  (SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
    (SELECT keyid FROM key_description WHERE description = 'Kesseltemperatur')) AS Kesseltemperatur,
  ...
FROM
  log l
;

--------------------------------------------------------------------------------------------------------------------
EXPLAIN ANALYZE SELECT * FROM log_entries_test
WHERE datetime > now() - INTERVAL '10 days' ORDER BY datetime DESC;
--------------------------------------------------------------------------------------------------------------------
Index Scan Backward using i_log_unique on log l  (cost=0.00..140603.99 rows=69 width=32) (actual time=2.588..5602.899 rows=12586 loops=1)
  Index Cond: (datetime > (now() - '10 days'::interval))
  SubPlan 2
    ->  Index Scan using unique_key_and_id on log_details d  (cost=2.38..19.97 rows=1 width=8) (actual time=0.010..0.011 rows=1 loops=12586)
          Index Cond: (($1 = fk_id) AND (fk_keyid = $0))
          InitPlan 1 (returns $0)
            ->  Seq Scan on key_description  (cost=0.00..2.38 rows=1 width=8) (actual time=0.015..0.066 rows=1 loops=1)
                  Filter: ((description)::text = 'Raumsolltemperatur'::text)
  SubPlan 4
    ->  Index Scan using unique_key_and_id on log_details d  (cost=2.38..19.97 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=12586)
          Index Cond: (($1 = fk_id) AND (fk_keyid = $2))
          InitPlan 3 (returns $2)
            ->  Seq Scan on key_description  (cost=0.00..2.38 rows=1 width=8) (actual time=0.009..0.020 rows=1 loops=1)
                  Filter: ((description)::text = 'Raumtemperatur'::text)
  SubPlan 6
    ->  Index Scan using unique_key_and_id on log_details d  (cost=2.38..19.97 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=12586)
          Index Cond: (($1 = fk_id) AND (fk_keyid = $3))
          InitPlan 5 (returns $3)
            ->  Seq Scan on key_description  (cost=0.00..2.38 rows=1 width=8) (actual time=0.005..0.017 rows=1 loops=1)
                  Filter: ((description)::text = 'Kesselsolltemperatur'::text)
--------------------------------------------------------------------------------------------------------------------

BTW: Schema data is in the links discussed in the thread.

Thnx to all for helping me.

Ciao,
Gerhard

--
http://www.wiesinger.com/
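For readers without the linked schema files: the table shapes implied by the views and plans above would look roughly like the sketch below. This is reconstructed only from the identifiers appearing in the plans; every type and constraint here is an assumption, not the actual schema.

CREATE TABLE log (
    id       bigint PRIMARY KEY,
    datetime timestamp NOT NULL,
    tdate    date,
    ttime    time
);

-- the plans scan "i_log_unique" with a condition on datetime;
-- assumed here to be a unique index on that column
CREATE UNIQUE INDEX i_log_unique ON log (datetime);

CREATE TABLE key_description (
    keyid       bigint PRIMARY KEY,
    description varchar NOT NULL UNIQUE
);

CREATE TABLE log_details (
    fk_id    bigint NOT NULL REFERENCES log (id),
    fk_keyid bigint NOT NULL REFERENCES key_description (keyid),
    value    double precision,
    -- backs the "Index Scan using unique_key_and_id" in the subplans
    CONSTRAINT unique_key_and_id UNIQUE (fk_id, fk_keyid)
);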
On Tue, Sep 14, 2010 at 3:59 PM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
> On Tue, 14 Sep 2010, Merlin Moncure wrote:
>> [...]
>
> Of course, the subquery is limited to a result set returning 1 row and
> 1 column. Order is also of course preserved because of the join.
>
> Further, I think I found a perfect query plan for the EAV pattern.
>
> [view definitions and EXPLAIN ANALYZE output quoted in full -- snipped,
> see the previous message]
>
> BTW: Schema data is in the links discussed in the thread.
>
> Thnx to all for helping me.

np -- this felt particularly satisfying for some reason. btw, I think
you have some more low-hanging optimization fruit. I think (although
it would certainly have to be tested) that hiding your attribute
description under keyid is buying you nothing but headaches. If you
used natural-key style, making description the primary key of
key_description (or unique), and had log_details carry a description
column that directly references that column, your subquery:

(
  SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
  (
    SELECT keyid FROM key_description WHERE description = 'Kesselsolltemperatur'
  )
) AS Kesselsolltemperatur,

would look like this:

(
  SELECT value FROM log_details d
  WHERE l.id = d.fk_id AND d.description = 'Kesselsolltemperatur'
) AS Kesselsolltemperatur,

your index on log_details(fk_id, description) is of course fatter, but
quite precise... it does require rebuilding your entire dataset,
however. food for thought.

merlin
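For concreteness, here is a minimal sketch of the DDL that Merlin's natural-key variant implies, assuming the table shapes sketched earlier in the thread; names, types, and constraints are assumptions, not a tested design:

-- the attribute name itself becomes the key ...
CREATE TABLE key_description (
    description varchar PRIMARY KEY
);

-- ... and is stored directly in the detail rows
CREATE TABLE log_details (
    fk_id       bigint NOT NULL REFERENCES log (id),
    description varchar NOT NULL REFERENCES key_description (description),
    value       double precision
);

-- the "fatter, but quite precise" index: one direct probe per (row, attribute)
CREATE UNIQUE INDEX log_details_fk_id_description ON log_details (fk_id, description);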
On Tue, 14 Sep 2010, Merlin Moncure wrote:

> np -- this felt particularly satisfying for some reason. btw, I think
> you have some more low-hanging optimization fruit. I think (although
> it would certainly have to be tested) that hiding your attribute
> description under keyid is buying you nothing but headaches. If you
> used natural-key style, making description the primary key of
> key_description (or unique), and had log_details carry a description
> column that directly references that column, your subquery:
>
> [...]
>
> your index on log_details(fk_id, description) is of course fatter, but
> quite precise... it does require rebuilding your entire dataset,
> however. food for thought.

I think your suggestion might be slower, because WHERE clauses and
possible JOINs on a BIGINT are much faster (especially when a lot of
data is queried) than on a VARCHAR. With the latest query plan,
key_description is only queried once per subselect, which is perfect.
I've also chosen that indirection so that I can change a description
without changing too much in the data model and without touching all
data rows when refactoring.

@Tom: Are you thinking about planner enhancements for situations like
this, where JOINs are "converted" to subselects?

BTW: I had a small bug in the queries and in the code: one description
was wrong (one space too much: 'Meldung F4  2. Zeile' =>
'Meldung F4 2. Zeile'). With this indirect data model that is very easy
to change: change the view and change one line of code. With your
suggested data model I would have to update millions of rows ...

Ciao,
Gerhard

--
http://www.wiesinger.com/
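Spelled out in SQL terms, Gerhard's refactoring argument against the two variants looks like this (hypothetical statements, reusing the corrected description from his example):

-- surrogate-key model: the fix touches exactly one row
UPDATE key_description
   SET description = 'Meldung F4 2. Zeile'
 WHERE description = 'Meldung F4  2. Zeile';

-- natural-key model: every affected detail row must be rewritten,
-- and the foreign key would need ON UPDATE CASCADE or a two-step update
UPDATE log_details
   SET description = 'Meldung F4 2. Zeile'
 WHERE description = 'Meldung F4  2. Zeile';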
On Wed, Sep 15, 2010 at 2:32 AM, Gerhard Wiesinger <lists@wiesinger.com> wrote:
> On Tue, 14 Sep 2010, Merlin Moncure wrote:
>> [...]
>
> I think your suggestion might be slower, because WHERE clauses and
> possible JOINs on a BIGINT are much faster (especially when a lot of
> data is queried) than on a VARCHAR. With the latest query plan,
> key_description is only queried once per subselect, which is perfect.
> I've also chosen that indirection so that I can change a description
> without changing too much in the data model and without touching all
> data rows when refactoring.

You're not joining -- you're filtering (and your assumption that bigint
is always going to be faster is quite debatable, depending on
circumstances). The join is skipped because of the key (yes, it's a
cheap lookup, but with 50 columns each doing it, nothing is cheap).

merlin
On Wed, 15 Sep 2010, Merlin Moncure wrote:

> [...]
>
> You're not joining -- you're filtering (and your assumption that bigint
> is always going to be faster is quite debatable, depending on
> circumstances). The join is skipped because of the key (yes, it's a
> cheap lookup, but with 50 columns each doing it, nothing is cheap).

I know that I'm not JOINing in that case -- as discussed, I meant
possible JOINs in other query scenarios.

BTW: The latest query plan is also optimal in that only the columns of
the view that are actually used are evaluated. With the fully joined
version, all columns were evaluated even when they were dropped from
the result set, e.g.:

SELECT col1, col2 FROM view1; -- equivalent to SELECT * FROM view1, as col1 and col2 are all the columns in that view
SELECT col1 FROM view1;       -- less effort with subselects when fewer columns are needed; joins have the same "full view" effort here

Ciao,
Gerhard

--
http://www.wiesinger.com/
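A quick way to check the column-pruning behavior Gerhard describes (a hypothetical session against the subselect-based view from earlier in the thread): the plan should contain subplans only for the columns actually referenced.

-- one referenced column: expect a single SubPlan/InitPlan pair
EXPLAIN SELECT Raumsolltemperatur
  FROM log_entries_test
 WHERE datetime > now() - INTERVAL '10 days';

-- two referenced columns: expect two SubPlan/InitPlan pairs
EXPLAIN SELECT Raumsolltemperatur, Kesseltemperatur
  FROM log_entries_test
 WHERE datetime > now() - INTERVAL '10 days';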
Hi,

I had a similar problem with many left joins. Reading about planner
optimization, I edited postgresql.conf, uncommented the line
join_collapse_limit = 8, and set it to 1, which disables collapsing of
explicit JOINs. My query was taking 2000 s in 8.4 while the same query
took 2 ms in 8.3. Now it's working fast in 8.4.

Best regards,
Marc

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Major-performance-problem-after-upgrade-from-8-3-to-8-4-tp2796390p3329435.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
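For anyone who wants to try Marc's workaround without editing postgresql.conf: join_collapse_limit can also be set per session or per transaction, which makes it easy to compare plans for a single query first (the SELECT below is a placeholder; substitute your own query):

-- session-wide: the planner keeps the explicit JOIN order as written
SET join_collapse_limit = 1;

-- or scoped to one transaction only
BEGIN;
SET LOCAL join_collapse_limit = 1;
EXPLAIN ANALYZE SELECT * FROM log_entries WHERE datetime > now() - INTERVAL '10 days';
COMMIT;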