Thread: Pgsql (and mysql) benchmark on T2000/Solaris and some profiling
Hi List,

In the past few weeks we have been developing a read-heavy MySQL benchmark, to get an alternative take on CPU/platform performance. Not really to have a look at how fast MySQL can be.

This benchmark runs on MySQL 4.1.x, 5.0.x and 5.1.x. It is modelled after our website's production database, and the load generated on it is modelled after a simplified version of our visitor behaviour.

Long story short: we think the test is a nice example of the relatively lightweight, read-heavy web applications out there, and therefore decided to have a go at PostgreSQL as well. Of course the queries and indexes have been adjusted to (to the best of our knowledge) best suit PostgreSQL, while maintaining the same output to the application/interface layer. While the initial structure only got PostgreSQL to about half the performance of MySQL 4.1.x, the current version of our PostgreSQL benchmark has quite similar results to MySQL 4.1.x. Both are quite a bit slower than MySQL 5.0.x, which I think is about 30-40% faster.

Since the results from those benchmarks are not yet public (they will be put together in a story at our website), I won't go into too much detail about this benchmark.

Currently we're having a look at a Sun T2000, and will be looking at other machines as well in the future. We are running the Sun release of PostgreSQL 8.1.3 on that T2000 now, but are looking at compiling the CVS HEAD version (for its index root cache) somewhere this week.

My guess is there are a few people on this list who are interested in some dtrace results taken during our benchmarks on that T2000. Although my knowledge of both Solaris and DTrace is very limited, I already took some samples of the system and user calls. I used Jignesh Shah's scripts for that:
http://blogs.sun.com/roller/page/jkshah?entry=profiling_postgresql_using_dtrace_on

You can find the samples here:
http://achelois.tweakers.net/~acm/pgsql-t2000/syscall.log
http://achelois.tweakers.net/~acm/pgsql-t2000/usrcall.log

And I also did the memcpy scripts, here:
http://achelois.tweakers.net/~acm/pgsql-t2000/memcpysize.log
http://achelois.tweakers.net/~acm/pgsql-t2000/memcpystack.log
(this last log is 3.5MB)

If anyone is interested in some more dtrace results, let me know (and tell me what commands to run ;-) ).

Best regards,

Arjen
"Arjen van der Meijden" <acmmailing@tweakers.net> wrote > > Long story short, we think the test is a nice example of the relatively > lightweight, read-heavy webapplications out there and therefore decided > to have a go at postgresql as well. > Some sort of web query behavior is quite optimized in MySQL. For example, the query below is runing very fast due to the query result cache implementation in MySQL. Loop N times SELECT * FROM A WHERE i = 1; End loop. > You can find the samples here: > http://achelois.tweakers.net/~acm/pgsql-t2000/syscall.log > http://achelois.tweakers.net/~acm/pgsql-t2000/usrcall.log > IMHO, without knowing the exact queries you sent, these logs are not very useful :-(. I would suggest you compare the queries in pair and then post their dtrace/timing results here (just like the previous Firebird vs. PostgreSQL comparison did). Regards, Qingqing
Qingqing Zhou wrote:
> Some web query behavior is quite optimized in MySQL. For example, the
> query below runs very fast due to the query result cache implementation
> in MySQL:
>
>    Loop N times
>      SELECT * FROM A WHERE i = 1;
>    End loop.

Yeah, I know. But our queries get random parameters for identifiers and the like, so it's not just a few queries getting executed lots of times. In a run for which I just logged all queries, almost 42k distinct queries were executed out of 128k in total (it may actually be more random than real life). Besides that, they are not such extremely simple queries as your example. Most join at least two tables, while the rest often join three to five.

But I agree, MySQL has a big advantage with its query result cache. That makes the current performance of PostgreSQL even more impressive in this situation, since the query cache was enabled for the 4.1.x run as well.

> IMHO, without knowing the exact queries you sent, these logs are not very
> useful :-(. I would suggest you compare the queries in pairs and then post
> their dtrace/timing results here (just like the previous Firebird vs.
> PostgreSQL comparison did).

Well, I'm bound to some privacy and copyright laws, but I'll see if I can show some example plans of at least the top few queries later today (the top two are respectively 27% and 21% of the total time). But those top queries aren't the only ones run during the benchmarks or in the production environment, nor are they run exclusively at any given time. So the overall load picture should be useful too, shouldn't it?

Best regards,

Arjen
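To make that contrast concrete, a minimal sketch (the table and parameter values here are hypothetical, not taken from the actual benchmark). MySQL's query result cache only matches statements whose text repeats byte-for-byte:

   -- Cache-friendly: the second execution is a cache hit.
   SELECT * FROM A WHERE i = 1;
   SELECT * FROM A WHERE i = 1;

   -- Cache-hostile: each distinct parameter value is a separate cache
   -- entry, so a randomized workload sees few hits.
   SELECT * FROM A WHERE i = 17;
   SELECT * FROM A WHERE i = 4211;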
Hi Arjen,

Looking at your syscall and usrcall outputs, it looks like:

* You are spending too much time in semsys, which means you have too many
  connections and they are contending to get a lock, which is potentially
  the WAL log lock.

* llseek is high, which means you can obviously gain a bit with the right
  file system/files tuning by caching them right.

Have you set the Solaris values for the T2000 tuned for PostgreSQL? Check out the tunables from the following URL:

http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp

Try especially the /etc/system and postgresql.conf changes and see if that changes/improves your performance.

Regards,
Jignesh
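For context, the postgresql.conf side of such tuning guides typically touches a handful of settings; a rough sketch with illustrative values only (not Sun's actual recommendations; note that in 8.1 shared_buffers and effective_cache_size count 8 kB buffers/pages rather than memory sizes, and work_mem is in kB):

   shared_buffers = 20000         # ~160 MB of shared buffer cache
   work_mem = 2048                # per sort/hash operation, in kB
   wal_buffers = 128
   checkpoint_segments = 64       # fewer, larger checkpoints
   effective_cache_size = 200000  # pages the OS is expected to cache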
Hi Jignesh,

Jignesh K. Shah wrote:
> Looking at your syscall and usrcall outputs, it looks like:
>
> * You are spending too much time in semsys, which means you have too many
>   connections and they are contending to get a lock, which is potentially
>   the WAL log lock.
>
> * llseek is high, which means you can obviously gain a bit with the right
>   file system/files tuning by caching them right.
>
> Have you set the Solaris values for the T2000 tuned for PostgreSQL?

Not particularly. We got a "special T2000 Solaris dvd" from your colleagues here in the Netherlands and installed that (actually one of your colleagues did), whereby all the "better default" /etc/system settings are supposed to be set. I haven't really checked that they are, since two of your colleagues have been working on it for the MySQL version of the benchmark, and I assumed they'd have verified that.

> Check out the tunables from the following URL:
>
> http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp
>
> Try especially the /etc/system and postgresql.conf changes and see if that
> changes/improves your performance.

I will see that those tunables are verified to be set.

I am a bit surprised, though, about your remarks, since they'd point at the I/O being in the way. But we only have about 600k/sec I/O according to vmstat, and the database easily fits in memory. In total I logged about 500k queries, of which only 70k were altering queries, and almost all of those were inserts into log tables which aren't actively read in this benchmark.

But I'll give it a try.

Best regards,

Arjen
Hi Arjen,

Can you send me my colleagues' names in a private email?

One of the drawbacks of the syscall.d script is that it reports relative timings: if system CPU usage is very low, it still shows what portion of that small total is associated with each call. So even if you have, say, 1% system time, it says that most of it was I/O related or semsys related. iostat output with the -c option to include CPU times helps put it in the right perspective.

Also do check the tunables mentioned and make sure they are set.

Regards,
Jignesh
Hi Jignesh,

The settings from that 'special T2000 dvd' differed from the recommended settings on the website you provided. But I don't see much difference in performance with any of the adjustments; it appears to be more or less the same.

Here are a few iostat lines, by the way:

       sd0            sd1            sd2            nfs1          cpu
  kps tps serv   kps tps serv   kps tps serv   kps tps serv   us sy wt id
    7   1   12   958  50   35     0   0    7     0   0    0   13  1  0 85
    0   0    0  2353 296    3     0   0    0     0   0    0   92  7  0  1
    0   0    0  2062 326    2     0   0    0     0   0    0   93  7  0  0
    1   1    1  1575 350    0     0   0    0     0   0    0   92  7  0  1
    0   0    0  1628 362    0     0   0    0     0   0    0   92  8  0  1

It appears to be doing a little less kps/tps on sd1 when I restore my own postgresql.conf settings (default WAL/checkpoint settings, 20k buffers, 2k work_mem).

Is it possible to trace the stacks for semsys, like the memcpy traces, or are those of no interest here?

Best regards,

Arjen
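A trace along those lines should be possible; a minimal sketch, assuming the backend processes are named "postgres", that aggregates the user stacks leading into the semsys system call:

   dtrace -n 'syscall::semsys:entry /execname == "postgres"/ { @[ustack()] = count(); }'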
Your user time is way too high for a T2000... If you have a 6-core machine with 24 threads, that iostat output says all 24 threads are busy.

The best way to debug this is to use

   prstat -amL

(or, if you are dumping it to a file, prstat -amLc > prstat.txt)

and find the pids with high user CPU time, and then use usrcall.d on a few of those pids.

Also, how many database connections do you have, and what's the type of query run by each connection?

-Jignesh
We have the 4-core machine. However, these numbers are taken during a benchmark, not a normal workload, so the output should display the system working fully ;)

So it's postgres doing a lot of work, and you already had a look at the usrcall output for that.

The benchmark just tries to do the queries for "random page visits". This totals up to some 80 different queries being executed, with mostly random parameters. The workload is generated using PHP, so there are no connection pools, nor prepared statements.

The queries vary, but all are relatively lightweight queries with fewer than 6 or 7 joinable tables. Almost all queries can use indexes. Most tables are under a few MB of data, although there are a few larger than that. Most records are relatively small, consisting mostly of numbers (ids and such).

The results presented here were with 25 concurrent connections.

Best regards,

Arjen
I have seen MemoryContextSwitchTo taking time before. However, I am not sure why it would take so much CPU time; maybe that function does not work efficiently on Solaris? Also, I do not have much idea about slot_getattr. Anybody else?

(Another option is to use "collect -p $pid" experiments to gather the data and figure out which instructions are causing the high CPU usage.)

Maybe the Sun engineers out there can help out.

-Jignesh
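Another low-cost first pass is a DTrace profile that samples user-mode program counters and aggregates them by function; a minimal sketch, again assuming the backends are named "postgres":

   dtrace -n 'profile-997 /execname == "postgres" && arg1/ { @[ufunc(arg1)] = count(); }'

Functions such as MemoryContextSwitchTo and slot_getattr showing up at the top of that aggregation would corroborate the usrcall results.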
Here's a "corner case" that might interest someone. It tripped up one of our programmers.

We have a table with > 10 million rows. The ID column is indexed, and the table has been vacuumed/analyzed. Compare these two queries:

   select * from tbl where id >= 10000000 limit 1;
   select * from tbl where id >= 10000000 order by id limit 1;

The first takes 4 seconds and uses a full table scan. The second takes 32 msec and uses the index. Details are below.

I understand why the planner makes the choices it does: the "id >= 10000000" isn't very selective, and under normal circumstances a full table scan is probably the right choice. But the "limit 1" apparently doesn't alter the planner's strategy at all. We were surprised by this.

Adding the "order by" was a simple solution.

Craig


pg=> explain analyze select url, url_digest from url_queue where priority >= 10000000 limit 1;
                                         QUERY PLAN
------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.65 rows=1 width=108) (actual time=4036.113..4036.117 rows=1 loops=1)
   ->  Seq Scan on url_queue  (cost=0.00..391254.35 rows=606176 width=108) (actual time=4036.101..4036.101 rows=1 loops=1)
         Filter: (priority >= 10000000)
 Total runtime: 4036.200 ms
(4 rows)

pg=> explain analyze select url, url_digest from url_queue where priority >= 10000000 order by priority limit 1;
                                       QUERY PLAN
--------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.38 rows=1 width=112) (actual time=32.445..32.448 rows=1 loops=1)
   ->  Index Scan using url_queue_priority on url_queue  (cost=0.00..1440200.41 rows=606176 width=112) (actual time=32.434..32.434 rows=1 loops=1)
         Index Cond: (priority >= 10000000)
 Total runtime: 32.566 ms
Please don't reply to previous messages to start new threads. This makes it harder to find stuff in the archives and may keep people from noticing your message.

On Wed, May 17, 2006 at 08:54:52 -0700,
  "Craig A. James" <cjames@modgraph-usa.com> wrote:
> We have a table with > 10 million rows. The ID column is indexed, and the
> table has been vacuumed/analyzed. Compare these two queries:
>
>    select * from tbl where id >= 10000000 limit 1;
>    select * from tbl where id >= 10000000 order by id limit 1;
>
> The first takes 4 seconds and uses a full table scan. The second takes
> 32 msec and uses the index. Details are below.

I suspect it wasn't intended to be a full table scan, but rather a sequential scan until it found a matching row. If the data in the table is ordered by id, this strategy may not work out well, whereas if the data is randomly ordered it would be expected to find a match quickly.

Have you analyzed the table recently? If the planner has bad stats on the table, that is going to make it more likely to choose a bad plan.
On Wed, 2006-05-17 at 08:54 -0700, Craig A. James wrote:
> We have a table with > 10 million rows. The ID column is indexed, and the
> table has been vacuumed/analyzed. Compare these two queries:
>
>    select * from tbl where id >= 10000000 limit 1;
>    select * from tbl where id >= 10000000 order by id limit 1;
>
> The first takes 4 seconds and uses a full table scan. The second takes
> 32 msec and uses the index.

The rows are not randomly distributed, so the SeqScan takes longer to find 1 row than the index scan.

-- 
Simon Riggs
EnterpriseDB          http://www.enterprisedb.com
Bruno Wolff III <bruno@wolff.to> writes:
> I suspect it wasn't intended to be a full table scan, but rather a
> sequential scan until it found a matching row. If the data in the table
> is ordered by id, this strategy may not work out well, whereas if the
> data is randomly ordered it would be expected to find a match quickly.

Right. You can see from the differential in the estimates for the SeqScan and the Limit nodes that the planner is not expecting the seqscan to run to completion, but rather to find a matching row quite quickly.

There is not anything in there that considers whether the table's physical order is so nonrandom that the search will take much longer than it would given uniform distribution. It might be possible to do something with the correlation statistic in simple cases ...

			regards, tom lane
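For reference, the statistic Tom mentions is exposed per column in the pg_stats view: a correlation near +1 or -1 means the physical row order closely tracks the column's sort order, while a value near 0 means random placement. A quick check for the table from the example:

   SELECT tablename, attname, correlation
   FROM pg_stats
   WHERE tablename = 'url_queue' AND attname = 'priority';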
Tom Lane wrote:
> There is not anything in there that considers whether the table's
> physical order is so nonrandom that the search will take much longer
> than it would given uniform distribution. It might be possible to do
> something with the correlation statistic in simple cases ...

In this case, the rows are not random at all; in fact they're inserted from a sequence, and then rows are deleted as they are processed. If the planner is hoping for random physical distribution, this particular case is exactly wrong.

Craig
On Wed, May 17, 2006 at 08:54:52AM -0700, Craig A. James wrote:
> We have a table with > 10 million rows. The ID column is indexed, and the
> table has been vacuumed/analyzed. Compare these two queries:
>
>    select * from tbl where id >= 10000000 limit 1;
>    select * from tbl where id >= 10000000 order by id limit 1;
>
> I understand why the planner makes the choices it does: the
> "id >= 10000000" isn't very selective, and under normal circumstances a
> full table scan is probably the right choice. But the "limit 1"
> apparently doesn't alter the planner's strategy at all. We were
> surprised by this.

Is it really not very selective? If there are 10000000 rows in the table, and id starts at 1 with very few gaps, then >= 10000000 should actually be very selective...

Also, I hope you understand there's a big difference between a limit query that does and doesn't have an order by.

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com      work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
* Jignesh K. Shah:

> * llseek is high, which means you can obviously gain a bit with the right
>   file system/files tuning by caching them right.

It might also make sense to switch from lseek-read/write to pread/pwrite. It shouldn't be too hard to hack this into the virtual file descriptor module.
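A minimal sketch of that substitution (illustrative code only, not the actual virtual file descriptor module; the function names are made up):

   #include <unistd.h>

   /* Before: a read at a given offset costs two system calls. */
   static ssize_t read_with_seek(int fd, void *buf, size_t len, off_t offset)
   {
       if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
           return -1;
       return read(fd, buf, len);
   }

   /* After: pread combines both into one call and leaves the file
    * offset unchanged; pwrite is the analogous write-side call. */
   static ssize_t read_at(int fd, void *buf, size_t len, off_t offset)
   {
       return pread(fd, buf, len, offset);
   }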