Thread: scale up (postgresql vs mssql)

scale up (postgresql vs mssql)

From
Eyal Wilde
Date:
Hi,

I had a stored procedure in MS SQL Server. This stored procedure gets a parameter (account-id), does about 20 queries, fills some temporary tables, and finally returns a few result sets. The stored procedure was converted to a stored function in PostgreSQL (9.1). The result sets are returned using refcursors. The stored function is, logically, almost identical to the MS SQL stored procedure.  A LOT of work has been done to bring PostgreSQL close to MS SQL's speed (preparing temp tables in advance, running ANALYZE at specific places inside the stored function to hint the optimizer that the temp tables have very few records, thus eliminating unnecessary and expensive hash joins, and more). After all that, the stored function runs at a reasonable speed (normally ~60 milliseconds).

Now, I ran a test that simulates 20 simultaneous clients asking for account-ids at random. As soon as a client gets a result, it immediately asks for another one. The test lasts 5 seconds.  I use a connection pool (with the Tomcat web server). The pool automatically grows to ~20 connections (as expected). The result: PostgreSQL completes ~60 account-ids, whereas MS SQL completes ~330. PostgreSQL shows that each account-id took about 400-1000 msec, which is much slower than the ~60 msec of a single execution.

In a single execution PostgreSQL may be less than twice as slow as MS SQL, but with 20 simultaneous clients it's about 6 times worse. Why is that?

The hardware is one 4-core Xeon with 8GB of RAM. The database is just a few GB in size. CentOS 6.2.

Do you think the fact that PostgreSQL uses a process per connection (instead of multithreading) is an inherent weakness of PostgreSQL with regard to scaling up?
Would it be better to limit the number of connections to something like 4, so that executions don't interrupt each other?

Thanks in advance for any help!

Re: scale up (postgresql vs mssql)

From
Andrew Dunstan
Date:

On 04/11/2012 06:11 PM, Eyal Wilde wrote:
> [...]


I doubt that the process-per-connection has much effect, especially on
Linux where process creation is extremely cheap, and you're using a
connection pooler anyway. The server is pretty modest, though. If you
can add enough RAM that you can fit the whole db into Postgres shared
buffers you might find things run a whole lot better. You should show us
your memory settings, among other things - especially shared_buffers,
temp_buffers and work_mem.
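
One quick way to pull those (and all the other non-default settings) is something along the lines of the query on the wiki's Server Configuration page:

```sql
-- show every setting that differs from its built-in default
SELECT name, current_setting(name), source
  FROM pg_settings
 WHERE source NOT IN ('default', 'override');
```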

cheers

andrew



Re: scale up (postgresql vs mssql)

From
"Kevin Grittner"
Date:
Eyal Wilde <eyal@impactsoft.co.il> wrote:

> now, i run a test that simulates 20 simultaneous clients, asking
> for "account-id" randomly. once a client get a result, it
> immediately asks for another one. the test last 5 seconds.  i use
> a connection pool (with Tomcat web-server). the pool is
> automatically increased to ~20 connections (as expected). the
> result is postgresql dose ~60 "account-id"s, whereas ms-sql dose
> ~330 "account-id"s. postgresql shows that each "account-id" took
> about 400-1000 msec ,which is so much slower than the ~60 msec of
> a single execution.

> the hardware is one 4-core xeon. 8GB of ram. the database size is
> just a few GB's. centos-6.2.
>
> do you think the fact that postgresql use a process per connection
> (instead of multi-threading) is inherently a weakness of
> postgrsql, regarding scale-up?

I doubt that has much to do with anything.

> would it be better to limit the number of connections to something
> like 4, so that executions don't interrupt each other?

The point where a lot of workloads hit optimal performance is with
the number of active connections limited to ((core count * 2) +
effective spindle count).  Determining "effective spindle count" can
be tricky (for example, it is zero in a fully-cached read-only
workload), so it takes more information than you've given us to know
exactly where the optimal point might be; but if it's a single
drive, then if you have 4 cores (not 2 cores with hyperthreading)
you might want to limit your connection pool to somewhere in the 8
to 10 range.  You generally should configure a connection pool to be
transaction based: a request to start a transaction while all
connections are busy causes the request to queue, and completion of
a transaction pulls a request from the queue, if available.  I'm
pretty sure that Tomcat's pool supports this.
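
For illustration only -- the resource name, URL, and credentials below are placeholders, not anything from your setup -- a Tomcat DBCP-style pool capped per the formula above might be configured in context.xml like:

```xml
<!-- maxActive caps the pool at (4 cores * 2) + ~1 effective spindle -->
<Resource name="jdbc/mydb" auth="Container" type="javax.sql.DataSource"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://localhost:5432/mydb"
          username="app" password="secret"
          maxActive="9" maxIdle="9" maxWait="10000"/>
```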

Could you describe your disk system, and show us the result of
running the query at:

http://wiki.postgresql.org/wiki/Server_Configuration

-Kevin

Re: scale up (postgresql vs mssql)

From
Claudio Freire
Date:
On Wed, Apr 11, 2012 at 7:11 PM, Eyal Wilde <eyal@impactsoft.co.il> wrote:
> in a single execution postgresql may be less the twice slower than ms-sql,
> but in 20 simultaneous clients, it's about 6 times worse. why is that?
>
> the hardware is one 4-core xeon. 8GB of ram. the database size is just a few
> GB's. centos-6.2.
>
> do you think the fact that postgresql use a process per connection (instead
> of multi-threading) is inherently a weakness of postgrsql, regarding
> scale-up?
> would it be better to limit the number of connections to something like 4,
> so that executions don't interrupt each other?

What about posting some details on the tables, the 20 queries, the temp table?

I'm thinking creating so many temp tables may be hurting pgsql more
than mssql. You might want to try unlogged temp tables, which more
closely resemble mssql temp tables.

Re: scale up (postgresql vs mssql)

From
Steve Crawford
Date:
On 04/13/2012 08:04 AM, Claudio Freire wrote:
> ...You might want to try unlogged temp tables, which more
> closely resemble mssql temp tables.
>
If they are permanent tables used for temporary storage then making them
unlogged may be beneficial. But actual temporary tables *are* unlogged
and attempting to create an unlogged temporary table will raise an error.
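
A minimal illustration of the distinction (table names invented):

```sql
-- A temporary table: session-local, and never WAL-logged anyway.
CREATE TEMPORARY TABLE scratch_tmp (id int);

-- An unlogged *permanent* table: visible to all sessions, skips WAL.
CREATE UNLOGGED TABLE scratch_perm (id int);

-- Asking for both is rejected; TEMPORARY and UNLOGGED are mutually
-- exclusive alternatives in the CREATE TABLE grammar:
CREATE UNLOGGED TEMPORARY TABLE scratch_bad (id int);  -- raises an error
```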

Cheers,
Steve


Re: scale up (postgresql vs mssql)

From
Claudio Freire
Date:
On Fri, Apr 13, 2012 at 1:36 PM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:
>>
> If they are permanent tables used for temporary storage then making them
> unlogged may be beneficial. But actual temporary tables *are* unlogged and
> attempting to create an unlogged temporary table will raise an error.

Interesting, yes, I was wondering why PG didn't make temp tables
unlogged by default.

Then, I guess, the docs[0] have to mention it. Especially due to the
error condition. Right?

[0] http://www.postgresql.org/docs/9.1/static/sql-createtable.html

Re: scale up (postgresql vs mssql)

From
Steve Crawford
Date:
On 04/13/2012 09:43 AM, Claudio Freire wrote:
> On Fri, Apr 13, 2012 at 1:36 PM, Steve Crawford
> <scrawford@pinpointresearch.com>  wrote:
>> If they are permanent tables used for temporary storage then making them
>> unlogged may be beneficial. But actual temporary tables *are* unlogged and
>> attempting to create an unlogged temporary table will raise an error.
> Interesting, yes, I was wondering why PG didn't make temp tables
> unlogged by default.
>
> Then, I guess, the docs[0] have to mention it. Especially due to the
> error condition. Right?
>
> [0] http://www.postgresql.org/docs/9.1/static/sql-createtable.html
>
Well, the fact that temporary and unlogged cannot be simultaneously
specified *is* documented:

CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF
NOT EXISTS ] table_name

But it would probably be worth adding a note under the description of
temporary tables that they are, in fact, unlogged.

Cheers,
Steve


Re: scale up (postgresql vs mssql)

From
Claudio Freire
Date:
On Fri, Apr 13, 2012 at 2:49 PM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:
> Well, the fact that temporary and unlogged cannot be simultaneously
> specified *is* documented:
>
> CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT
> EXISTS ] table_name
>
> But it would probably be worth adding a note under the description of
> temporary tables that they are, in fact, unlogged.

Yes, it was quite subtle, but you're right. I should've read the
syntax more closely.

Re: scale up (postgresql vs mssql)

From
Merlin Moncure
Date:
On Wed, Apr 11, 2012 at 5:11 PM, Eyal Wilde <eyal@impactsoft.co.il> wrote:
> [...]

let's see the procedure.  I bet that the temp tables are the issue
here -- while they speed things up for a single user, the i/o stacks
up under high concurrency (you are also writing to the system
catalogs, which is not good).

I'm sure we can get it fast but it's hard to do that without seeing the code.

merlin

Re: scale up (postgresql vs mssql)

From
Eyal Wilde
Date:
hi,

Thanks a lot to all of you for your help.

(I'm sorry, I did not know how to reply to a specific message.)

I found that the best number of active connections is indeed 8-10. With 8-10 active connections, PostgreSQL did ~170 account-ids. This is still only half of what MS SQL did, but it now makes sense, considering that MS SQL runs close to twice as fast.

I "played" with work_mem, shared_buffers, and temp_buffers. I ran the tests with both of the following configurations, but no significant difference was found.

Thanks again for any more help.


"version";"PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit"
"bytea_output";"escape"
"client_encoding";"UNICODE"
"lc_collate";"en_US.UTF-8"
"lc_ctype";"en_US.UTF-8"
"listen_addresses";"*"
"log_destination";"stderr"
"log_line_prefix";"%t "
"logging_collector";"on"
"max_connections";"100"
"max_stack_depth";"2MB"
"server_encoding";"UTF8"
"shared_buffers";"32MB"
"TimeZone";"Israel"
"wal_buffers";"1MB"



"version";"PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit"
"bytea_output";"escape"
"client_encoding";"UNICODE"
"lc_collate";"en_US.UTF-8"
"lc_ctype";"en_US.UTF-8"
"listen_addresses";"*"
"log_destination";"stderr"
"log_line_prefix";"%t "
"logging_collector";"on"
"max_connections";"100"
"max_stack_depth";"2MB"
"port";"5432"
"server_encoding";"UTF8"
"shared_buffers";"3GB"
"temp_buffers";"64MB"
"TimeZone";"Israel"
"wal_buffers";"16MB"
"work_mem";"20MB"

Re: scale up (postgresql vs mssql)

From
Andy Colson
Date:
On 4/15/2012 7:43 AM, Eyal Wilde wrote:
> [...]

We'd need to know if you are CPU bound or IO bound before we can help.
Watch "vmstat 2" while the server is busy (and maybe post a few rows).


> i had a stored procedure in ms-sql server. this stored procedure gets a parameter (account-id), dose about 20
> queries, fills some temporary tables, and finally, returns a few result-sets. this stored procedure converted to stored
> function in postgresql (9.1). the result-sets are being returned using refcursors. this stored function is logically,
> almost identical to the ms-sql stored procedure.

I think that may be a problem.  Treating PG like it's MSSQL won't
work well, I'd bet.  Things that work well in one database may not
work well in another.

Instead of temp tables, have you tried derived tables?  Instead of:

insert into temptable select * from stuff;
select * from temptable;

try something like:

select * from (
   select * from stuff
) as subq

Another thing you might try to remove temp tables is to use views.
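
For the view variant, a minimal sketch (the table and view names here are invented):

```sql
-- A view is inlined into the calling query, so the planner sees the
-- whole thing at once instead of a pre-materialized temp table.
CREATE VIEW stuff_view AS
    SELECT * FROM stuff WHERE active;

SELECT * FROM stuff_view WHERE recid = 42;
```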

I don't know if it'll be faster, I'm just guessing.  Pulling out
individual parts and running "explain analyze" on them will help you
find the slow ones.  Finding which is faster (temp tables, derived
tables, or views) might help you decide what needs to be re-written.

Also, I'm not sure how well PG does "return multiple refcursors".
There may be a lot of round trips from client to server to fetch the
next one.  How hard would it be to redo your single procedure that
returns a bunch of refcursors into multiple procedures, each
returning one result set?

Or maybe it's something you can speed-test, to see if it would even
make a difference.

-Andy

Re: scale up (postgresql vs mssql)

From
Richard Huxton
Date:
On 15/04/12 13:43, Eyal Wilde wrote:
>
> "version";"PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc
> (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit"

You've probably checked this, but if not, it's worth making sure your
disk I/O is roughly equivalent for the two operating systems. It
might be poor drivers on the CentOS system.

Do you have two equivalent machines, or are you dual-booting?

--
   Richard Huxton
   Archonet Ltd

Re: scale up (postgresql vs mssql)

From
Eyal Wilde
Date:
Hi all,

I ran vmstat during the test:

[yb@centos08 ~]$ vmstat 1 15
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 6131400 160556 1115792    0    0     1    12   22   17  0  0 100  0  0
 0  0      0 6131124 160556 1115800    0    0     0   532  540  360  1  0 99  0  0
 5  1      0 6127852 160556 1116048    0    0     0  3352 1613 1162 18  1 80  1  0
 7  0      0 6122984 160556 1117312    0    0     0 14608 5408 3703 86  7  6  1  0
 8  0      0 6121372 160556 1117968    0    0     0 13424 5434 3741 86  7  5  2  0
 7  1      0 6120504 160556 1118952    0    0     0 13616 5296 3546 86  7  5  2  0
 7  0      0 6119528 160572 1119728    0    0     0 13836 5494 3597 86  7  4  2  0
 6  1      0 6118744 160572 1120408    0    0     0 15296 5552 3869 89  8  3  1  0
 2  0      0 6118620 160572 1120288    0    0     0 13792 4548 3054 63  6 25  6  0
 0  0      0 6118620 160572 1120392    0    0     0  3552 1090  716  8  1 88  3  0
 0  0      0 6118736 160572 1120392    0    0     0  1136  787  498  1  0 98  1  0
 0  0      0 6118868 160580 1120400    0    0     0    28  348  324  1  0 99  0  0
 0  0      0 6118992 160580 1120440    0    0     0   380  405  347  1  0 99  1  0
 0  0      0 6118868 160580 1120440    0    0     0  1544  468  320  1  0 100  0  0
 0  0      0 6118720 160580 1120440    0    0     0     0  382  335  0  0 99  0  0


The temp tables normally hold no more than 10 rows. They are created in advance; we don't drop them, we use ON COMMIT DELETE ROWS. I believe temp tables live in RAM, so no disk I/O, right? And also: no writing to the system catalogs, right?

About returning multiple refcursors: we checked this issue in the past, and concluded that returning many small refcursors (all with the same structure) is faster than returning 1 big refcursor. Does that sound weird (maybe it's worth more tests)? That's why we took that path.

About having multiple procedures each returning one result set: that's too much code rewriting on the web server side.

The disk system is a built-in Intel fake-RAID, configured as RAID 0. I dual-boot, so both Windows and CentOS are on the same hardware.

Thanks again for any more help.

Re: scale up (postgresql vs mssql)

From
Andy Colson
Date:
On 4/18/2012 2:32 AM, Eyal Wilde wrote:
> hi all,
>
> i ran vmstat during the test :
>
> [yb@centos08 ~]$ vmstat 1 15
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  2  0      0 6118620 160572 1120288    0    0     0 13792 4548 3054 63  6 25  6  0
> the temp-tables normally don't populate more then 10 rows. they are
> being created in advanced. we don't drop them, we use ON COMMIT DELETE
> ROWS. i believe temp-tables are in the RAM, so no disk-i/o, right? and
> also: no writing to the system catalogs, right?

Temp tables are not 100% RAM; they might spill to disk. The vmstat
shows there is disk I/O.  The BO column (blocks out) shows you are
writing to disk.  And you have wait time (which means one or more of
the CPUs is stalled, waiting for disk).

I don't know if the disk I/O is because of the temp tables (I've never
used them myself) or something else (stats, vacuum, logs, other SQL, etc.).

I'd bet, though, that a derived table would be faster than "create temp
table...; insert into temp .... ; select .. from temp;"

Of course it may not be that much faster... and it might require a lot
of code change.  Might be worth a quick benchmark though.

>
> about returning multiple refcursors, we checked this issue in the past,
> and we concluded that returning many small refcursors (all have the same
> structure), is faster than returning 1 big refcursor. dose it sound
> wired (maybe it worth more tests)?  that's why we took that path.
>

No, if you tried it out, I'd stick with what you have.  I've never used
them myself, so I was just wondering aloud.

-Andy

Re: scale up (postgresql vs mssql)

From
Merlin Moncure
Date:
On Wed, Apr 18, 2012 at 2:32 AM, Eyal Wilde <eyal@impactsoft.co.il> wrote:
> [...]

no chance of seeing the code or a reasonable reproduction?

merlin

Re: scale up (postgresql vs mssql)

From
Eyal Wilde
Date:
Hi all,

Well, I wondered why there was such a high rate of bo (blocks out). The procedure is practically read-only during the whole test. It's not strictly read-only, because under a certain condition there can be writes to a certain table, but that condition cannot be met during this test.

So, I created a ramdisk:
mkfs -q /dev/ram2 100000
mkdir -p /ram4
mount /dev/ram2 /ram4
df -H | grep /ram4
 
and then:
CREATE TABLESPACE pgram4 OWNER postgres LOCATION '/ram4';

And in postgresql.conf I configured:
temp_tablespaces = 'pgram4'

Now, I believe, all the temp tables were in RAM.
vmstat showed:
r b swpd free buff cache si so bi bo in cs us sy id wa st
6 0 0 5916720 69488 1202668 0 0 0 3386 1902 1765 25 3 72 1 0
9 1 0 5907728 69532 1204212 0 0 0 1392 5375 4510 88 8 3 1 0
7 0 0 5886584 69672 1205096 0 0 0 1472 5278 4520 88 10 2 0 0
8 0 0 5877384 69688 1206384 0 0 0 1364 5312 4522 89 8 2 1 0
8 0 0 5869332 69748 1207188 0 0 0 1296 5285 4437 88 8 3 1 0
6 1 0 5854404 69852 1208776 0 0 0 2955 5333 4518 88 9 2 0 0

10 times less bo (blocks out)
5 times less wa (percentage of CPU time spent waiting for I/O)
2 times less b (processes waiting in the I/O queue)

The overall test result was (just?) ~15% better...

When I created the ramdisk with mkfs.ext4 (instead of the default ext2), the performance was the same (~15% better), but the vmstat output looked much the same as before (without the ramdisk)!? Why is that?

As I mentioned, the procedure is practically read-only. Shouldn't I expect bo (blocks out) to be ~0? After forcing the temp tables into RAM, what other causes could there be for blocks out?

I see no point pasting the whole procedure here, since it's very long. The general course of the procedure is:
create the temp tables if they don't already exist (in practice, they do exist)
do a lot of: insert into temp-table select from table
and: insert into temp-table select from table join temp-table ...
after the inserts are finished: analyze temp-table (this was the only way to get the optimizer to behave properly)
finally, open refcursors of select from temp-tables
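
In rough outline -- with all the real names replaced by invented ones, and many steps omitted -- the function's shape is something like:

```sql
CREATE OR REPLACE FUNCTION get_account_data(p_account_id int)
RETURNS SETOF refcursor AS $$
DECLARE
    c1 refcursor;
BEGIN
    -- created once per session; ON COMMIT DELETE ROWS empties it between calls
    CREATE TEMP TABLE IF NOT EXISTS tmp_values (recid int, value text)
        ON COMMIT DELETE ROWS;

    INSERT INTO tmp_values
        SELECT recid, f1 FROM table1 WHERE account_id = p_account_id;
    ANALYZE tmp_values;          -- tell the planner the table really is tiny

    OPEN c1 FOR SELECT * FROM tmp_values;
    RETURN NEXT c1;              -- one refcursor per result set
END;
$$ LANGUAGE plpgsql;
```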

Thanks again.

Re: scale up (postgresql vs mssql)

From
Merlin Moncure
Date:
On Sun, Apr 29, 2012 at 8:21 AM, Eyal Wilde <eyal@impactsoft.co.il> wrote:
> [...]

i/o writes from read queries can be caused by a couple of things:
*) sorts, and other 'spill to disk' features of large queries
*) hint bits (which is what I think is happening in your case):

the first time a tuple is touched after its controlling transaction
is committed, the transaction's state (committed or aborted) is saved
on the tuple itself to optimize subsequent accesses.  For most
workloads this is barely noticeable, but it can show up if you're
moving a lot of records around per transaction.

merlin

Re: scale up (postgresql vs mssql)

From
Eyal Wilde
Date:
Guess what:

After reducing bo (blocks out) to ~10% by using a ramdisk (improving overall performance by ~15-20%), I have now managed to reduce it to ~3% by... removing the "analyze temp-table" statements.
It also:
reduced b (processes waiting for I/O) to zero
reduced wa (percentage of CPU time spent waiting for I/O) to zero
and reduced id (CPU idle time percent) by a factor of 4.

r b swpd free buff cache si so bi bo in cs us sy id wa st
8 0 0 6144048 237472 485640 0 0 0 40 4380 3237 79 5 16 0 0
8 0 0 6138216 238032 485736 0 0 0 40 4741 3506 93 7 0 0 0
8 0 0 6125256 238276 486484 0 0 0 2709 4801 3447 92 7 1 0 0
7 0 0 6119400 238376 485792 0 0 0 32 4854 4311 93 6 1 0 0
5 0 0 6105624 238476 486172 0 0 0 364 4783 3430 92 7 1 0 0
5 0 0 6092956 238536 485384 0 0 0 416 4954 3652 91 8 2 0 0


Unfortunately, this time there was no significant performance gain. :(

I'm afraid there are now certain statements that do not use an optimal query plan. These statements look like:
insert into temp-table1 (value) select table1.f1 from table1 join temp-table2 on table1.recid=temp-table2.recid where table1.f2 in (x,y,z);
temp-table2 never contains more than 10 records.
There is an index on table1: (recid, f2).
Previous tests showed that the query optimizer normally chose a hash join (i.e., ignoring the index), but once we did "analyze temp-table2;" the index was used. I read somewhere that the optimizer's assumption is that every temp table contains 1k records; I believe that is the reason for the bad plan. We tried "set enable_hashjoin=false;", but it did not seem to have any effect inside the function (although it did work independently). What can we do about that?

Another thing I found: a sequence on a temp table is stored in the current tablespace, not in temp_tablespaces. Would you consider this a bug?
Anyway, I found a way to avoid using any sequences on the temp tables, but this did not change the bo (blocks out) figures.

Merlin,
about the hint bits: I read this article: http://wiki.postgresql.org/wiki/Hint_Bits
As far as I understand, this is not the case here, because I ran the test many times with no DML operations at all in between, so I believe the hint bits are already set on most of the tuples.

Thanks again for any more help.

Re: scale up (postgresql vs mssql)

From
Merlin Moncure
Date:
On Thu, May 3, 2012 at 12:07 PM, Eyal Wilde <eyal@impactsoft.co.il> wrote:
> [...]
> I'm afraid there are now certain statements that do not use an optimal
> query plan. These statements look like:
> insert into temp-table1 (value) select table1.f1 from table1 join
> temp-table2 on table1.recid=temp-table2.recid where table1.f2 in (x,y,z);
> temp-table2 never contains more than 10 records.
> there is an index on table1: recid,f2
> Previous tests showed that the query optimizer normally chose a
> hash join (i.e., ignoring the index), but once we ran "ANALYZE temp-table2;",
> the index was used. I read somewhere that the optimizer assumes
> every temp table contains 1k records; I believe that is the reason for
> the bad plan. We tried "SET enable_hashjoin = false;", but it did not
> seem to take effect inside a function (although it did work on its own).
> What can we do about that?

let's see the query plan...when you turned it off, did it go faster?
put your suspicious plans here: http://explain.depesz.com/
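For instance (hypothetical names mirroring the statement you posted), capture the plan once before and once after the ANALYZE and post both:

```sql
-- Hypothetical names mirroring the posted statement; run this before
-- and after "ANALYZE temp_table2;" and compare the two plans.
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO temp_table1 (value)
SELECT table1.f1
FROM table1
JOIN temp_table2 ON table1.recid = temp_table2.recid
WHERE table1.f2 IN (1, 2, 3);
```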

> another thing i found is that a sequence on a temp-table is being stored on
> the current tablespace, and not on the temp_tablespace. would you consider
> this as a bug?
> anyway, i found a way to not using any sequences on the temp-tables. but
> this did not change the bo (blocks-out) figures.
>
> merlin,
> About the hint bits: I read this
> article: http://wiki.postgresql.org/wiki/Hint_Bits
> As far as I understand, that is not the case here, because I ran the test
> many times with no DML operations at all in between, so I believe
> the hint bits are already set in most of the tuples.

yeah.  well, your query was an insert? that would naturally result in
blocks out.

merlin

Re: scale up (postgresql vs mssql)

From
Robert Klemme
Date:
On Fri, May 4, 2012 at 3:04 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

> let's see the query plan...when you turned it off, did it go faster?
> put your suspicious plans here: http://explain.depesz.com/

I suggest posting three plans:

1. insert into temp table
2. access to temp table before analyze
3. access to temp table after analyze

Maybe getting rid of the temp table (e.g. using a view or even an
inline view) is even better than optimizing temp table access.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: scale up (postgresql vs mssql)

From
Merlin Moncure
Date:
On Wed, May 9, 2012 at 2:11 AM, Robert Klemme
<shortcutter@googlemail.com> wrote:
> On Fri, May 4, 2012 at 3:04 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>
>> let's see the query plan...when you turned it off, did it go faster?
>> put your suspicious plans here: http://explain.depesz.com/
>
> I suggest posting three plans:
>
> 1. insert into temp table
> 2. access to temp table before analyze
> 3. access to temp table after analyze
>
> Maybe getting rid of the temp table (e.g. using a view or even an
> inline view) is even better than optimizing temp table access.

yeah -- perhaps a CTE might work as well.
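Something along these lines, with hypothetical names based on the statement posted earlier, where the small row set is built inline instead of being materialized in a temp table first:

```sql
-- Hypothetical sketch: replace the 10-row temp table with a CTE so the
-- planner sees the small row set directly, with no ANALYZE step needed.
WITH small_set AS (
    SELECT recid FROM source_table WHERE account_id = 42
)
INSERT INTO temp_table1 (value)
SELECT t1.f1
FROM table1 t1
JOIN small_set s ON t1.recid = s.recid
WHERE t1.f2 IN (1, 2, 3);
```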

merlin

Re: scale up (postgresql vs mssql)

From
Eyal Wilde
Date:
Hi, all.

This is an obligation from the past:
http://archives.postgresql.org/pgsql-performance/2012-05/msg00017.php

The same test that did ~230 results is now doing ~700 results. That is, BTW, even better than MS-SQL.

The ultimate solution for that problem was NOT to use "ON COMMIT DELETE ROWS" for the temporary tables; instead, we just do "DELETE FROM temp_table1".

Doing "TRUNCATE temp_table1" is definitely the worst case (~100 results in the same test). This is something we had known for a long time, which is why we used "ON COMMIT DELETE ROWS", but eventually that turned out to be far from the best option.
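In other words (a sketch with a hypothetical table definition):

```sql
-- Create the temp table once per pooled session, WITHOUT the
-- ON COMMIT DELETE ROWS clause (hypothetical definition):
CREATE TEMPORARY TABLE temp_table1 (value integer);

-- At the start of each call, clear any leftovers explicitly instead:
DELETE FROM temp_table1;
-- (TRUNCATE temp_table1 turned out to be by far the slowest option.)
```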

Another minor issue: when configuring temp_tablespaces = 'other_tablespace', the sequences of the temporary tables remain in the 'main_tablespace'.

I hope that will help make Postgres even better :)

Re: scale up (postgresql vs mssql)

From
Andy Colson
Date:
On 6/20/2012 1:01 AM, Eyal Wilde wrote:
> Hi, all.
>
> this is an obligation from the past:
> http://archives.postgresql.org/pgsql-performance/2012-05/msg00017.php
>
> The same test that did ~230 results is now doing ~700 results. That
> is, BTW, even better than MS-SQL.
>
> The ultimate solution for that problem was NOT to use "ON COMMIT
> DELETE ROWS" for the temporary tables; instead, we just do "DELETE FROM
> temp_table1".
>
> Doing "TRUNCATE temp_table1" is definitely the worst case (~100 results
> in the same test). This is something we had known for a long time, which is
> why we used "ON COMMIT DELETE ROWS", but eventually that turned out to be
> far from the best option.
>
> Another minor issue: when configuring
>   temp_tablespaces = 'other_tablespace', the sequences of the temporary
> tables remain in the 'main_tablespace'.
>
> I hope that will help make Postgres even better :)
>

Did you ever try re-writing some of the temp table usage to use
subselects/views/CTEs/etc.?

-Andy


Re: scale up (postgresql vs mssql)

From
Merlin Moncure
Date:
On Wed, Jun 20, 2012 at 8:43 AM, Andy Colson <andy@squeakycode.net> wrote:
>> this is an obligation from the past:
>> http://archives.postgresql.org/pgsql-performance/2012-05/msg00017.php
>>
>> The same test that did ~230 results is now doing ~700 results. That
>> is, BTW, even better than MS-SQL.
>>
>> The ultimate solution for that problem was NOT to use "ON COMMIT
>> DELETE ROWS" for the temporary tables; instead, we just do "DELETE FROM
>> temp_table1".
>>
>> Doing "TRUNCATE temp_table1" is definitely the worst case (~100 results
>> in the same test). This is something we had known for a long time, which is
>> why we used "ON COMMIT DELETE ROWS", but eventually that turned out to be
>> far from the best option.
>>
>> Another minor issue: when configuring
>>  temp_tablespaces = 'other_tablespace', the sequences of the temporary
>> tables remain in the 'main_tablespace'.
>>
>> I hope that will help make Postgres even better :)
>>
>
> Did you ever try re-writing some of the temp table usage to use
> subselects/views/CTEs/etc.?

Yeah -- especially a CTE.  But, assuming you really do need to keep a
temp table organized and you want absolutely minimum latency in the
function that manipulates it, you can use a nifty trick: organize the
table around txid_current().

CREATE UNLOGGED TABLE Cache (
    txid BIGINT DEFAULT txid_current(),
    a TEXT,
    b TEXT
);
CREATE INDEX ON Cache (txid);
-- or --
CREATE INDEX ON Cache (txid, a);  -- if a is the lookup key, etc.

When you insert into the table, let the default capture the current
txid, and make sure that all queries filter the table on txid and that
all indexes are left-prefixed on txid.

Why do this? Because the record delete operations can now be delegated
to an external process.  At any time, a scheduled process can do:
DELETE FROM Cache;

This is not guaranteed to be faster, but it probably will be.
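Usage might look like this (hypothetical columns, following the table definition above):

```sql
-- Each transaction's rows are stamped with its own txid via the DEFAULT:
INSERT INTO Cache (a, b) VALUES ('key1', 'value1');

-- Queries in the same transaction filter on txid, so they see only
-- their own rows and the left-prefixed index can be used:
SELECT a, b FROM Cache WHERE txid = txid_current() AND a = 'key1';

-- Later, an external scheduled job reclaims everything in one shot:
DELETE FROM Cache;
```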

merlin