Thread: Pgsql (and mysql) benchmark on T2000/Solaris and some profiling
Hi List,

In the past few weeks we have been developing a read-heavy MySQL benchmark, to get an alternative take on CPU/platform performance. Not really to have a look at how fast MySQL can be.

This benchmark runs on MySQL 4.1.x, 5.0.x and 5.1.x. It is modelled after our website's production database, and the load generated on it is modelled after a simplified version of our visitor behaviour.

Long story short: we think the test is a nice example of the relatively lightweight, read-heavy web applications out there, and therefore decided to have a go at PostgreSQL as well. Of course the queries and indexes have been adjusted to (to the best of our knowledge) best suit PostgreSQL, while maintaining the same output to the application/interface layer. While the initial structure only got PostgreSQL to about half the performance of MySQL 4.1.x, the current version of our PostgreSQL benchmark has quite similar results to MySQL 4.1.x. Both are quite a bit slower than MySQL 5.0.x, which I think is about 30-40% faster.

Since the results from those benchmarks are not yet public (they will be put together in a story at our website), I won't go into too much detail about this benchmark.

Currently we're having a look at a Sun T2000, and will be looking at other machines as well in the future. We are running the Sun release of PostgreSQL 8.1.3 on that T2000 now, but are looking at compiling the CVS HEAD version (for its index root cache) somewhere this week.

My guess is there are a few people on this list who are interested in some dtrace results taken during our benchmarks on that T2000. Although my knowledge of both Solaris and DTrace is very limited, I already took some samples of the system and user calls. I used Jignesh Shah's scripts for that:
http://blogs.sun.com/roller/page/jkshah?entry=profiling_postgresql_using_dtrace_on

You can find the samples here:
http://achelois.tweakers.net/~acm/pgsql-t2000/syscall.log
http://achelois.tweakers.net/~acm/pgsql-t2000/usrcall.log

And I also did the memcpy scripts, here:
http://achelois.tweakers.net/~acm/pgsql-t2000/memcpysize.log
http://achelois.tweakers.net/~acm/pgsql-t2000/memcpystack.log
(this last log is 3.5MB)

If anyone is interested in some more dtrace results, let me know (and tell me what commands to run ;-) ).

Best regards,

Arjen
"Arjen van der Meijden" <acmmailing@tweakers.net> wrote > > Long story short, we think the test is a nice example of the relatively > lightweight, read-heavy webapplications out there and therefore decided > to have a go at postgresql as well. > Some sort of web query behavior is quite optimized in MySQL. For example, the query below is runing very fast due to the query result cache implementation in MySQL. Loop N times SELECT * FROM A WHERE i = 1; End loop. > You can find the samples here: > http://achelois.tweakers.net/~acm/pgsql-t2000/syscall.log > http://achelois.tweakers.net/~acm/pgsql-t2000/usrcall.log > IMHO, without knowing the exact queries you sent, these logs are not very useful :-(. I would suggest you compare the queries in pair and then post their dtrace/timing results here (just like the previous Firebird vs. PostgreSQL comparison did). Regards, Qingqing
Qingqing Zhou wrote:
> Some web query behavior is quite optimized in MySQL. For example, the
> query below runs very fast due to the query result cache implementation
> in MySQL:
>
>    Loop N times
>      SELECT * FROM A WHERE i = 1;
>    End loop.

Yeah, I know. But our queries get random parameters for identifiers and the like, so it's not just a few queries getting executed lots of times. In a run for which I just logged all queries, almost 42k distinct queries were executed out of 128k in total (it may actually be more random than real life). Besides that, they are not such extremely simple queries as your example. Most join at least two tables, while the rest often join three to five.

But I agree, MySQL has a big advantage with its query result cache. That makes the current performance of PostgreSQL even more impressive in this situation, since the query cache was enabled for the 4.1.x run as well.

> IMHO, without knowing the exact queries you sent, these logs are not very
> useful :-(. I would suggest you compare the queries in pairs and then post
> their dtrace/timing results here (just like the previous Firebird vs.
> PostgreSQL comparison did).

Well, I'm bound to some privacy and copyright laws, but I'll see if I can show some example plans of at least the top few queries later today (the top two are respectively 27% and 21% of the total time). But those top queries aren't the only ones run during the benchmarks or in the production environment, nor are they run exclusively at any given time. So the overall load picture should be useful too, shouldn't it?

Best regards,

Arjen
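To make that contrast concrete, a minimal sketch (the table and parameter values here are hypothetical, not taken from the actual benchmark). MySQL's query result cache only matches statements whose text repeats byte-for-byte:

   -- Cache-friendly: the second execution is a cache hit.
   SELECT * FROM A WHERE i = 1;
   SELECT * FROM A WHERE i = 1;

   -- Cache-hostile: each distinct parameter value is a separate cache
   -- entry, so a randomized workload sees few hits.
   SELECT * FROM A WHERE i = 17;
   SELECT * FROM A WHERE i = 4211;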
Hi Arjen,

Looking at your syscall and usrcall outputs, it looks like:

* You are spending too much time in semsys, which means you have too many
  connections and they are contending to get a lock, which is potentially
  the WAL log lock.

* llseek is high, which means you can obviously gain a bit with the right
  file system/files tuning by caching them right.

Have you set the Solaris values for the T2000 tuned for PostgreSQL? Check out the tunables from the following URL:

http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp

Try especially the /etc/system and postgresql.conf changes and see if that changes/improves your performance.

Regards,
Jignesh
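For context, the postgresql.conf side of such tuning guides typically touches a handful of settings; a rough sketch with illustrative values only (not Sun's actual recommendations; note that in 8.1 shared_buffers and effective_cache_size count 8 kB buffers/pages rather than memory sizes, and work_mem is in kB):

   shared_buffers = 20000         # ~160 MB of shared buffer cache
   work_mem = 2048                # per sort/hash operation, in kB
   wal_buffers = 128
   checkpoint_segments = 64       # fewer, larger checkpoints
   effective_cache_size = 200000  # pages the OS is expected to cache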
Hi Jignesh,

Jignesh K. Shah wrote:
> Looking at your syscall and usrcall outputs, it looks like:
>
> * You are spending too much time in semsys, which means you have too many
>   connections and they are contending to get a lock, which is potentially
>   the WAL log lock.
>
> * llseek is high, which means you can obviously gain a bit with the right
>   file system/files tuning by caching them right.
>
> Have you set the Solaris values for the T2000 tuned for PostgreSQL?

Not particularly. We got a "special T2000 Solaris dvd" from your colleagues here in the Netherlands and installed that (actually one of your colleagues did), whereby all the "better default" /etc/system settings are supposed to be set. I haven't really checked that they are, since two of your colleagues have been working on it for the MySQL version of the benchmark, and I assumed they'd have verified that.

> Check out the tunables from the following URL:
>
> http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp
>
> Try especially the /etc/system and postgresql.conf changes and see if that
> changes/improves your performance.

I will see that those tunables are verified to be set.

I am a bit surprised, though, about your remarks, since they'd point at the I/O being in the way. But we only have about 600k/sec I/O according to vmstat, and the database easily fits in memory. In total I logged about 500k queries, of which only 70k were altering queries, and almost all of those were inserts into log tables which aren't actively read in this benchmark.

But I'll give it a try.

Best regards,

Arjen
Hi Arjen,

Can you send me my colleagues' names in a private email?

One of the drawbacks of the syscall.d script is that it reports relative timings: if system CPU usage is very low, it still shows what portion of that small total is associated with each call. So even if you have, say, 1% system time, it says that most of it was I/O related or semsys related. iostat output with the -c option to include CPU times helps put it in the right perspective.

Also do check the tunables mentioned and make sure they are set.

Regards,
Jignesh
Hi Jignesh,

The settings from that 'special T2000 dvd' differed from the recommended settings on the website you provided. But I don't see much difference in performance with any of the adjustments; it appears to be more or less the same.

Here are a few iostat lines, by the way:

       sd0            sd1            sd2            nfs1          cpu
  kps tps serv   kps tps serv   kps tps serv   kps tps serv   us sy wt id
    7   1   12   958  50   35     0   0    7     0   0    0   13  1  0 85
    0   0    0  2353 296    3     0   0    0     0   0    0   92  7  0  1
    0   0    0  2062 326    2     0   0    0     0   0    0   93  7  0  0
    1   1    1  1575 350    0     0   0    0     0   0    0   92  7  0  1
    0   0    0  1628 362    0     0   0    0     0   0    0   92  8  0  1

It appears to be doing a little less kps/tps on sd1 when I restore my own postgresql.conf settings (default WAL/checkpoint settings, 20k buffers, 2k work_mem).

Is it possible to trace the stacks for semsys, like the memcpy traces, or are those of no interest here?

Best regards,

Arjen
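A trace along those lines should be possible; a minimal sketch, assuming the backend processes are named "postgres", that aggregates the user stacks leading into the semsys system call:

   dtrace -n 'syscall::semsys:entry /execname == "postgres"/ { @[ustack()] = count(); }'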
Your user time is way too high for a T2000... If you have a 6-core machine with 24 threads, that iostat output says all 24 threads are busy.

The best way to debug this is to use

   prstat -amL

(or, if you are dumping it to a file, prstat -amLc > prstat.txt)

and find the pids with high user CPU time, and then use usrcall.d on a few of those pids.

Also, how many database connections do you have, and what's the type of query run by each connection?

-Jignesh
We have the 4-core machine. However, these numbers are taken during a benchmark, not a normal workload, so the output should display the system working fully ;)

So it's postgres doing a lot of work, and you already had a look at the usrcall output for that.

The benchmark just tries to do the queries for "random page visits". This totals up to some 80 different queries being executed, with mostly random parameters. The workload is generated using PHP, so there are no connection pools, nor prepared statements.

The queries vary, but all are relatively lightweight queries with fewer than 6 or 7 joinable tables. Almost all queries can use indexes. Most tables are under a few MB of data, although there are a few larger than that. Most records are relatively small, consisting mostly of numbers (ids and such).

The results presented here were with 25 concurrent connections.

Best regards,

Arjen
I have seen MemoryContextSwitchTo taking time before. However, I am not sure why it would take so much CPU time; maybe that function does not work efficiently on Solaris? Also, I do not have much idea about slot_getattr. Anybody else?

(Another option is to use "collect -p $pid" experiments to gather the data and figure out which instructions are causing the high CPU usage.)

Maybe the Sun engineers out there can help out.

-Jignesh
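Another low-cost first pass is a DTrace profile that samples user-mode program counters and aggregates them by function; a minimal sketch, again assuming the backends are named "postgres":

   dtrace -n 'profile-997 /execname == "postgres" && arg1/ { @[ufunc(arg1)] = count(); }'

Functions such as MemoryContextSwitchTo and slot_getattr showing up at the top of that aggregation would corroborate the usrcall results.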
Here's a "corner case" that might interest someone. It tripped up one of our programmers.

We have a table with > 10 million rows. The ID column is indexed, and the table has been vacuumed/analyzed. Compare these two queries:

   select * from tbl where id >= 10000000 limit 1;
   select * from tbl where id >= 10000000 order by id limit 1;

The first takes 4 seconds and uses a full table scan. The second takes 32 msec and uses the index. Details are below.

I understand why the planner makes the choices it does: the "id >= 10000000" isn't very selective, and under normal circumstances a full table scan is probably the right choice. But the "limit 1" apparently doesn't alter the planner's strategy at all. We were surprised by this.

Adding the "order by" was a simple solution.

Craig


pg=> explain analyze select url, url_digest from url_queue where priority >= 10000000 limit 1;
                                         QUERY PLAN
------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.65 rows=1 width=108) (actual time=4036.113..4036.117 rows=1 loops=1)
   ->  Seq Scan on url_queue  (cost=0.00..391254.35 rows=606176 width=108) (actual time=4036.101..4036.101 rows=1 loops=1)
         Filter: (priority >= 10000000)
 Total runtime: 4036.200 ms
(4 rows)

pg=> explain analyze select url, url_digest from url_queue where priority >= 10000000 order by priority limit 1;
                                       QUERY PLAN
--------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.38 rows=1 width=112) (actual time=32.445..32.448 rows=1 loops=1)
   ->  Index Scan using url_queue_priority on url_queue  (cost=0.00..1440200.41 rows=606176 width=112) (actual time=32.434..32.434 rows=1 loops=1)
         Index Cond: (priority >= 10000000)
 Total runtime: 32.566 ms
Please don't reply to previous messages to start new threads. This makes it harder to find stuff in the archives and may keep people from noticing your message.

On Wed, May 17, 2006 at 08:54:52 -0700,
  "Craig A. James" <cjames@modgraph-usa.com> wrote:
> We have a table with > 10 million rows. The ID column is indexed, and the
> table has been vacuumed/analyzed. Compare these two queries:
>
>    select * from tbl where id >= 10000000 limit 1;
>    select * from tbl where id >= 10000000 order by id limit 1;
>
> The first takes 4 seconds and uses a full table scan. The second takes
> 32 msec and uses the index. Details are below.

I suspect it wasn't intended to be a full table scan, but rather a sequential scan until it found a matching row. If the data in the table is ordered by id, this strategy may not work out well, whereas if the data is randomly ordered it would be expected to find a match quickly.

Have you analyzed the table recently? If the planner has bad stats on the table, that is going to make it more likely to choose a bad plan.
On Wed, 2006-05-17 at 08:54 -0700, Craig A. James wrote:
> We have a table with > 10 million rows. The ID column is indexed, and the
> table has been vacuumed/analyzed. Compare these two queries:
>
>    select * from tbl where id >= 10000000 limit 1;
>    select * from tbl where id >= 10000000 order by id limit 1;
>
> The first takes 4 seconds and uses a full table scan. The second takes
> 32 msec and uses the index.

The rows are not randomly distributed, so the SeqScan takes longer to find 1 row than the index scan.

-- 
Simon Riggs
EnterpriseDB          http://www.enterprisedb.com
Bruno Wolff III <bruno@wolff.to> writes:
> I suspect it wasn't intended to be a full table scan, but rather a
> sequential scan until it found a matching row. If the data in the table
> is ordered by id, this strategy may not work out well, whereas if the
> data is randomly ordered it would be expected to find a match quickly.

Right. You can see from the differential in the estimates for the SeqScan and the Limit nodes that the planner is not expecting the seqscan to run to completion, but rather to find a matching row quite quickly.

There is not anything in there that considers whether the table's physical order is so nonrandom that the search will take much longer than it would given uniform distribution. It might be possible to do something with the correlation statistic in simple cases ...

			regards, tom lane
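For reference, the statistic Tom mentions is exposed per column in the pg_stats view: a correlation near +1 or -1 means the physical row order closely tracks the column's sort order, while a value near 0 means random placement. A quick check for the table from the example:

   SELECT tablename, attname, correlation
   FROM pg_stats
   WHERE tablename = 'url_queue' AND attname = 'priority';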
Tom Lane wrote:
> There is not anything in there that considers whether the table's
> physical order is so nonrandom that the search will take much longer
> than it would given uniform distribution. It might be possible to do
> something with the correlation statistic in simple cases ...

In this case, the rows are not random at all; in fact they're inserted from a sequence, and then rows are deleted as they are processed. If the planner is hoping for random physical distribution, this particular case is exactly wrong.

Craig
On Wed, May 17, 2006 at 08:54:52AM -0700, Craig A. James wrote:
> We have a table with > 10 million rows. The ID column is indexed, and the
> table has been vacuumed/analyzed. Compare these two queries:
>
>    select * from tbl where id >= 10000000 limit 1;
>    select * from tbl where id >= 10000000 order by id limit 1;
>
> I understand why the planner makes the choices it does: the
> "id >= 10000000" isn't very selective, and under normal circumstances a
> full table scan is probably the right choice. But the "limit 1"
> apparently doesn't alter the planner's strategy at all. We were
> surprised by this.

Is it really not very selective? If there are 10000000 rows in the table, and id starts at 1 with very few gaps, then >= 10000000 should actually be very selective...

Also, I hope you understand there's a big difference between a limit query that does and doesn't have an order by.

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
* Jignesh K. Shah:

> * llseek is high, which means you can obviously gain a bit with the right
>   file system/files tuning by caching them right.

It might also make sense to switch from lseek-read/write to pread/pwrite. It shouldn't be too hard to hack this into the virtual file descriptor module.
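A minimal sketch of that substitution (illustrative code only, not the actual virtual file descriptor module; the function names are made up):

   #include <unistd.h>

   /* Before: a read at a given offset costs two system calls. */
   static ssize_t read_with_seek(int fd, void *buf, size_t len, off_t offset)
   {
       if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
           return -1;
       return read(fd, buf, len);
   }

   /* After: pread combines both into one call and leaves the file
    * offset unchanged; pwrite is the analogous write-side call. */
   static ssize_t read_at(int fd, void *buf, size_t len, off_t offset)
   {
       return pread(fd, buf, len, offset);
   }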