Thread: Performance Question

Performance Question

From
- -
Date:
I've been searching for performance metrics and tweaks for a few weeks now. I'm trying to determine whether the time it takes to process my queries is reasonable, and I'm having a difficult time determining that. I know Postgres performance is very dependent on hardware and settings, and I understand how difficult it is to tackle. However, I was wondering if I could get some feedback based on my results, please.

The database is running on a dual-core 2GHz Opteron processor with 8GB of RAM. The drives are 10K RPM 146GB drives in RAID 5 (I've read RAID 5 is bad for Postgres, but moving the database to another server didn't change performance at all). Some of the key parameters from postgresql.conf are:

max_connections = 100
shared_buffers = 16MB
work_mem = 64MB
everything else is set to the default

One of my tables has 660,000 records and doing a SELECT * from that table (without any joins or sorts) takes 72 seconds. Ordering the table based on 3 columns almost doubles that time to an average of 123 seconds. To me, those numbers are crazy slow and I don't understand why the queries are taking so long. The tables are UTF-8 encoded and contain a mix of languages (English, Spanish, etc.). I'm running the query from pgadmin3 on a remote host. The server has nothing else running on it except the database.

As a test I tried splitting up the data across a number of other tables. I ran 10 queries (to correspond with the 10 tables) with a UNION ALL to join the results together. This was even slower, taking an average of 103 seconds to complete the generic select all query.

I'm convinced something is wrong, I just can't pinpoint where it is. I can provide any other information necessary. If anyone has any suggestions it would be greatly appreciated.

Re: Performance Question

From
"J Sisson"
Date:
There are a few things you didn't mention...

First off, what is the context this database is being used in?  Is it the backend for a web server?  Data warehouse?  Etc?

Second, you didn't mention the use of indexes.  Do you have any indexes on the table in question, and if so, does EXPLAIN ANALYZE show the planner utilizing the index(es)?
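
To make that concrete, a quick check might look like the following (the table and column names are made up, since the schema wasn't posted):

```sql
-- Hypothetical schema: replace mytable/col1..col3 with the real names.
CREATE INDEX mytable_sort_idx ON mytable (col1, col2, col3);

-- Does the planner use it for the ordered query, and where does the time go?
EXPLAIN ANALYZE SELECT * FROM mytable ORDER BY col1, col2, col3;
```

Note that for an unfiltered SELECT * a sequential scan is often the right plan anyway; the interesting part of the EXPLAIN ANALYZE output is where the time is actually being spent.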

Third, you have 8 GB of RAM on a dedicated machine.  Consider upping the memory settings in postgresql.conf.  For instance, on my data warehouse machines (8 GB RAM each) I have shared_buffers set to almost 2 GB and effective_cache_size set to nearly 5.5 GB.  (This is dependent on how you're utilizing this database, so don't blindly set these values!)
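
For reference, those settings in postgresql.conf would look something like the lines below; treat the numbers as illustrative starting points for a dedicated 8 GB machine, not as a recommendation:

```
shared_buffers = 2GB            # memory actually allocated for the buffer cache
effective_cache_size = 5500MB   # planner's estimate of OS + PG cache; not an allocation
```

shared_buffers requires a server restart to change; effective_cache_size does not.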

Last, you didn't mention what RAID level the other server you tested this on was running.

On Wed, Nov 12, 2008 at 10:27 AM, - - <themanatuf@yahoo.com> wrote:
I've been searching for performance metrics and tweaks for a few weeks now. I'm trying to determine if the length of time to process my queries is accurate or not and I'm having a difficult time determining that. I know postgres performance is very dependent on hardware and settings and I understand how difficult it is to tackle. However, I was wondering if I could get some feedback based on my results please.

The database is running on a dual-core 2GHz Opteron processor with 8GB of RAM. The drives are 10K RPM 146GB drives in RAID 5 (I've read RAID 5 is bad for Postgres, but moving the database to another server didn't change performance at all). Some of the key parameters from postgresql.conf are:

max_connections = 100
shared_buffers = 16MB
work_mem = 64MB
everything else is set to the default

One of my tables has 660,000 records and doing a SELECT * from that table (without any joins or sorts) takes 72 seconds. Ordering the table based on 3 columns almost doubles that time to an average of 123 seconds. To me, those numbers are crazy slow and I don't understand why the queries are taking so long. The tables are UTF-8 encode and contain a mix of languages (English, Spanish, etc). I'm running the query from pgadmin3 on a remote host. The server has nothing else running on it except the database.

As a test I tried splitting up the data across a number of other tables. I ran 10 queries (to correspond with the 10 tables) with a UNION ALL to join the results together. This was even slower, taking an average of 103 seconds to complete the generic select all query.

I'm convinced something is wrong, I just can't pinpoint where it is. I can provide any other information necessary. If anyone has any suggestions it would be greatly appreciated.




--
Computers are like air conditioners...
They quit working when you open Windows.

Re: Performance Question

From
tv@fuzzy.cz
Date:
> max_connections = 100
> shared_buffers = 16MB
> work_mem = 64MB
> everything else is set to the default

OK, but what about effective_cache_size for example?

Anyway, we need more information about the table itself - the number of
rows is nice, but it does not say how large the table is. The rows might
be small (say 100B each) or large (say several kilobytes), affecting the
amount of data to be read.

We need to know the structure of the table, and the output of the
following commands:

ANALYZE table;
SELECT relpages, reltuples FROM pg_class WHERE relname = 'table';
EXPLAIN SELECT * FROM table;
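
The physical size can also be read directly; pg_relation_size and pg_size_pretty are built-in functions (substitute the real table name for 'table'):

```sql
SELECT pg_size_pretty(pg_relation_size('table')) AS table_size;
```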

>
> One of my tables has 660,000 records and doing a SELECT * from that table
> (without any joins or sorts) takes 72 seconds. Ordering the table based on
> 3 columns almost doubles that time to an average of 123 seconds. To me,
> those numbers are crazy slow and I don't understand why the queries are
> taking so long. The tables are UTF-8 encode and contain a mix of languages
> (English, Spanish, etc). I'm running the query from pgadmin3 on a remote
> host. The server has nothing else running on it except the database.
>
> As a test I tried splitting up the data across a number of other tables. I
> ran 10 queries (to correspond with the 10 tables) with a UNION ALL to join
> the results together. This was even slower, taking an average of 103
> seconds to complete the generic select all query.

Well, splitting the tables just to read all of them won't help. It will
make the problem even worse, due to the necessary processing (UNION ALL).

regards
Tomas


Re: Performance Question

From
"Fernando Hevia"
Date:
Increase shared_buffers to 1024MB, set effective_cache_size to 6000MB, and test again.
To speed up sort operations, increase work_mem until you notice an improvement.
Experiment with different values for those settings.
 


From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On behalf of - -
Sent: Wednesday, November 12, 2008 14:28
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Performance Question

I've been searching for performance metrics and tweaks for a few weeks now. I'm trying to determine if the length of time to process my queries is accurate or not and I'm having a difficult time determining that. I know postgres performance is very dependent on hardware and settings and I understand how difficult it is to tackle. However, I was wondering if I could get some feedback based on my results please.

The database is running on a dual-core 2GHz Opteron processor with 8GB of RAM. The drives are 10K RPM 146GB drives in RAID 5 (I've read RAID 5 is bad for Postgres, but moving the database to another server didn't change performance at all). Some of the key parameters from postgresql.conf are:

max_connections = 100
shared_buffers = 16MB
work_mem = 64MB
everything else is set to the default

One of my tables has 660,000 records and doing a SELECT * from that table (without any joins or sorts) takes 72 seconds. Ordering the table based on 3 columns almost doubles that time to an average of 123 seconds. To me, those numbers are crazy slow and I don't understand why the queries are taking so long. The tables are UTF-8 encode and contain a mix of languages (English, Spanish, etc). I'm running the query from pgadmin3 on a remote host. The server has nothing else running on it except the database.

As a test I tried splitting up the data across a number of other tables. I ran 10 queries (to correspond with the 10 tables) with a UNION ALL to join the results together. This was even slower, taking an average of 103 seconds to complete the generic select all query.

I'm convinced something is wrong, I just can't pinpoint where it is. I can provide any other information necessary. If anyone has any suggestions it would be greatly appreciated.

Re: Performance Question

From
Tom Lane
Date:
- - <themanatuf@yahoo.com> writes:
> One of my tables has 660,000 records and doing a SELECT * from that table (without any joins or sorts) takes 72
> seconds. Ordering the table based on 3 columns almost doubles that time to an average of 123 seconds. To me, those
> numbers are crazy slow and I don't understand why the queries are taking so long. The tables are UTF-8 encode and
> contain a mix of languages (English, Spanish, etc). I'm running the query from pgadmin3 on a remote host. The server has
> nothing else running on it except the database.

pgadmin has got its own performance issues with large select results.
Are you sure the bulk of the time isn't being spent on the client side?
Watching top or vmstat on both machines would probably tell much.

            regards, tom lane

Re: Performance Question

From
"Scott Marlowe"
Date:
On Wed, Nov 12, 2008 at 9:27 AM, - - <themanatuf@yahoo.com> wrote:
> I've been searching for performance metrics and tweaks for a few weeks now.
> I'm trying to determine if the length of time to process my queries is
> accurate or not and I'm having a difficult time determining that. I know
> postgres performance is very dependent on hardware and settings and I
> understand how difficult it is to tackle. However, I was wondering if I
> could get some feedback based on my results please.
>
> The database is running on a dual-core 2GHz Opteron processor with 8GB of
> RAM. The drives are 10K RPM 146GB drives in RAID 5 (I've read RAID 5 is bad
> for Postgres, but moving the database to another server didn't change
> performance at all). Some of the key parameters from postgresql.conf are:

I'm not sure what you mean.  Did you move it to another server with a
single drive?  A 100 drive RAID-10 array with a battery backed caching
controller?  There's a lot of possibility in "another server".

>
> max_connections = 100
> shared_buffers = 16MB

WAY low.  Try 512MB to 2GB on a machine that big.

> work_mem = 64MB

Acceptable.  For 100 clients, if each did a sort you'd need 6.4GB of
free RAM, but since the chances of all 100 clients doing a sort that
big at the same time are small, you're probably safe.

>
> One of my tables has 660,000 records and doing a SELECT * from that table
> (without any joins or sorts) takes 72 seconds. Ordering the table based on 3
> columns almost doubles that time to an average of 123 seconds. To me, those

How wide is this table?  If it's got 300 columns, then it's gonna be a
lot slower than if it has 10 columns.

Try running your query like this:

\timing
select count(*) from (my big query goes here) as a;

and see how long it takes.  This will remove the network effect of
transferring the data.  If that runs fast enough, then the real
problem is that your client is waiting til it gets all the data to
display it.

Re: Performance Question

From
"Dave Page"
Date:
On Wed, Nov 12, 2008 at 8:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> - - <themanatuf@yahoo.com> writes:
>> One of my tables has 660,000 records and doing a SELECT * from that table (without any joins or sorts) takes 72
>> seconds. Ordering the table based on 3 columns almost doubles that time to an average of 123 seconds. To me, those
>> numbers are crazy slow and I don't understand why the queries are taking so long. The tables are UTF-8 encode and
>> contain a mix of languages (English, Spanish, etc). I'm running the query from pgadmin3 on a remote host. The server has
>> nothing else running on it except the database.
>
> pgadmin has got its own performance issues with large select results.

They were fixed a couple of years ago. We're essentially at the mercy
of libpq now.


--
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

Re: Performance Question

From
"Merlin Moncure"
Date:
On Wed, Nov 12, 2008 at 11:27 AM, - - <themanatuf@yahoo.com> wrote:
> I've been searching for performance metrics and tweaks for a few weeks now.
> I'm trying to determine if the length of time to process my queries is
> accurate or not and I'm having a difficult time determining that. I know
> postgres performance is very dependent on hardware and settings and I
> understand how difficult it is to tackle. However, I was wondering if I
> could get some feedback based on my results please.
>
> The database is running on a dual-core 2GHz Opteron processor with 8GB of
> RAM. The drives are 10K RPM 146GB drives in RAID 5 (I've read RAID 5 is bad
> for Postgres, but moving the database to another server didn't change
> performance at all). Some of the key parameters from postgresql.conf are:
>
> max_connections = 100
> shared_buffers = 16MB
> work_mem = 64MB
> everything else is set to the default
>
> One of my tables has 660,000 records and doing a SELECT * from that table
> (without any joins or sorts) takes 72 seconds. Ordering the table based on 3
> columns almost doubles that time to an average of 123 seconds. To me, those
> numbers are crazy slow and I don't understand why the queries are taking so
> long. The tables are UTF-8 encode and contain a mix of languages (English,
> Spanish, etc). I'm running the query from pgadmin3 on a remote host. The
> server has nothing else running on it except the database.
>
> As a test I tried splitting up the data across a number of other tables. I
> ran 10 queries (to correspond with the 10 tables) with a UNION ALL to join
> the results together. This was even slower, taking an average of 103 seconds
> to complete the generic select all query.
>
> I'm convinced something is wrong, I just can't pinpoint where it is. I can
> provide any other information necessary. If anyone has any suggestions it
> would be greatly appreciated.

Maybe there are a lot of dead rows?  Do a
VACUUM VERBOSE;

That performance is quite slow unless the rows are really big (you
have huge text or bytea columns).  What is the average row size in
bytes?  Try running the following command as a benchmark:

select generate_series(1,500000);

On my iMac that takes about 600 ms.
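
One rough way to answer the average-row-size question, reusing the pg_class statistics mentioned earlier in the thread ('mytable' is a placeholder; run ANALYZE first so reltuples is current):

```sql
-- Rough average row width: on-disk table size divided by the estimated row count.
-- Includes per-row overhead and page padding, so it slightly overstates data width.
SELECT pg_relation_size('mytable') / nullif(reltuples, 0)::bigint AS avg_row_bytes
FROM pg_class
WHERE relname = 'mytable';
```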

merlin

Re: Performance Question

From
PFC
Date:
> I've been searching for performance metrics and tweaks for a few weeks
> now. I'm trying to determine if the length of time to process my queries
> is accurate or not and I'm having a difficult time determining that. I
> know postgres performance is very dependent on hardware and settings and
> I understand how difficult it is to tackle. However, I was wondering if
> I could get some feedback based on my results please.

    Well, the simplest thing is to measure the time it takes to process a
query, but:

    - EXPLAIN ANALYZE will always report a longer time than reality,
because instrumenting the query takes time. For instance, EXPLAIN ANALYZE
on a count(*) query can spend more time measuring how many times the
"count" aggregate is called and how much time is spent in it than it
spends actually computing the aggregate... This is because measuring each
call to "count" (syscalls...) takes much longer than incrementing the
count itself.
    This is not a problem as long as you are aware of it, and the
information provided by EXPLAIN ANALYZE is very valuable.

    - Using \timing in psql is also a good way to examine queries, but if
your query returns lots of results, the time it takes for the client to
process those results will mess with your measurements. In this case a
simple SELECT sum(1) FROM (your query) can provide less polluted
timings. Remember you are not that interested in client load: you can
always add more webservers, but adding more database servers is a lot more
difficult.

    - You can add some query logging to your application (always a good idea
IMHO). For instance, the administrator (you) could see a list of queries
at the bottom of the page with the time it takes to run them. In that
case, keep in mind that any load will add randomness to these measurements.
For instance, when you hit F5 in your browser, if the webserver and
database run on the same machine as the browser, the browser's CPU usage
can make one of your queries appear to take up to half a second... even if
it takes, in reality, half a millisecond... So, average.
    You could push the idea further. Sometimes I log the parameterized query
(without args), the args separately, and the query time, so I can get
average timings for things like "SELECT stuff FROM table WHERE column=$1"
rather than a zillion separate queries that differ only in their parameters.
Such logging can destroy your performance, though; use with care.

    OF COURSE YOU SHOULD MEASURE WHAT IS RELEVANT, that is, queries that your
application uses.

> The database is running on a dual-core 2GHz Opteron processor with 8GB
> of RAM.

    8GB. 64-bit, I presume?

> The drives are 10K RPM 146GB drives in RAID 5 (I've read RAID 5 is bad
> for Postgres, but moving the database to another server didn't change
> performance at all).

    RAID5 = good for reads, and large writes.
    RAID5 = hell for small random writes.
    Depends on your load...

> shared_buffers = 16MB

    That's a bit small IMHO. (try 2 GB).

> work_mem = 64MB
> everything else is set to the default
>
> One of my tables has 660,000 records and doing a SELECT * from that
> table (without any joins or sorts) takes 72 seconds.

    Well, sure, but why would you do such a thing? I mean, I don't know your
row size, but say it is 2 KB: you just used 1.5 GB of RAM on the client
and on the server, plus of course the cost of transferring all this data
over your network connection. If client and server are on the same machine,
you just zapped 3 GB of RAM. I hope you don't do too many of those
concurrently...
    This is never going to be fast, and it is never going to be a good
performance metric.

    If you need to pull 600,000 rows from a table, use a CURSOR, and pull
them in batches of, say, 1000.
    Then you will use 600 times less RAM. I hope you have gigabit ethernet,
though. Network and disk IO will be your main bottleneck.
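
A minimal sketch of that cursor pattern (the table and cursor names are placeholders):

```sql
BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM mytable;
FETCH 1000 FROM big_cur;   -- process this batch in the client
-- ...repeat FETCH 1000 until it returns no rows...
CLOSE big_cur;
COMMIT;
```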

    If you don't need to pull 600,000 rows from a table, well then, don't do
it.

    If you're using a client app to display the results, well, how long does
it take to display 600,000 rows in a GUI box?...

> Ordering the table based on 3 columns almost doubles that time to an
> average of 123 seconds.

    Same as above, if your rows are small, say 100 bytes, you're sorting 66
megabytes, which would easily be done in RAM, but you specified work_mem
too small, so it is done on disk, with several passes. If your rows are
large, well you're facing a multi gigabyte disksort with only 64 MB of
working memory, so it's really going to take lots of passes.
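
One way to test that theory is to raise work_mem for a single session and compare plans; recent versions' EXPLAIN ANALYZE report the sort method used (the SET below is session-local, and the table/column names are placeholders):

```sql
SET work_mem = '512MB';  -- affects only this session, not postgresql.conf
EXPLAIN ANALYZE SELECT * FROM mytable ORDER BY col1, col2, col3;
-- Look for "Sort Method: quicksort  Memory: ..." (in RAM)
-- versus "Sort Method: external merge  Disk: ..." (spilled to disk).
```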

    If you often need to pull 600,000 rows from a table in a specific order,
create an index on those columns, use a CURSOR, and pull the rows in
batches of, say, 1000.
    If you seldom need to, don't create an index, but do use a CURSOR, and
pull the rows in batches of, say, 1000.
    If you don't need to pull 600,000 rows from a table in a specific order,
well then, don't do it.

> To me, those numbers are crazy slow and I don't understand why the
> queries are taking so long. The tables are UTF-8 encode and contain a
> mix of languages (English, Spanish, etc). I'm running the query from
> pgadmin3 on a remote host. The server has nothing else running on it
> except the database.

    OK, I presume you are sorting Unicode strings (which is also slower than
binary comparison), so in this case you should really try to minimize the
number of string comparisons, which means using a much larger work_mem.

> I'm convinced something is wrong, I just can't pinpoint where it is. I
> can provide any other information necessary. If anyone has any
> suggestions it would be greatly appreciated.

    Well, the big questions are:

    - do you need to run this query often?
    - what do you use it for?
    - how many bytes does it weigh?

    Until you answer those, it is difficult to help...