Thread: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

From
Robert Bernabe
Date:
Hi All,
    I've been tasked to evaluate PG as a possible replacement of our MS SQL 2000 solution. Our solution is 100% stored procedure/function centric. It's a report generation system whose sole task is to produce text files filled with processed data that is post-processed by a secondary system. Basically options are selected via a web interface and all these parameters are passed unto the stored procedure and then the stored procedure would run and in the process call other stored procedures until eventually a single formatted text file is produced.
    I decided on Fedora Core 7 and the 8.3 Beta release of Enterprise DB PostgreSQL. I decided to port 1 stored procedure plus it's several support stored procedures into pl/pgsql from T-SQL and compare the performance by measuring how long each system takes to produce the text file. For this test,  the output to the text file was discarded and the stored procedure/function would end once the final temporary table is filled with the information that is eventually dumped into the text file.

Windows 2000 Professional + MSDE (/MS SQL) Box    vs.   FC7 + EnterpriseDB PG Box

Note that both boxes have EXACTLY the same hardware (not VMWARE or anything)
AMD X2 3800
2 G RAM DDR 400
80 G Barracuda Sata

The data was copied to the Linux box and checked lightly for consistency versus the windows box (number of tables / columns and records) and they all match. After data transfer to the Linux Box, I ran REINDEX and ANALYZE.

For the actual run the following tables were used and I'm displaying the results of analyze.

INFO:  analyzing "public.AreaDefn"
INFO:  "AreaDefn": scanned 15 of 15 pages, containing 2293 live rows and 0 dead rows; 2293 rows in sample, 2293 estimated total rows
INFO:  analyzing "public.AreaDefn2"
INFO:  "AreaDefn2": scanned 30 of 30 pages, containing 3439 live rows and 0 dead rows; 3000 rows in sample, 3439 estimated total rows
INFO:  analyzing "public.Areas"
INFO:  "Areas": scanned 2 of 2 pages, containing 164 live rows and 0 dead rows; 164 rows in sample, 164 estimated total rows
INFO:  analyzing "public.Brands"
INFO:  "Brands": scanned 1 of 1 pages, containing 11 live rows and 0 dead rows; 11 rows in sample, 11 estimated total rows
INFO:  analyzing "public.Categories"
INFO:  "Categories": scanned 1 of 1 pages, containing 26 live rows and 0 dead rows; 26 rows in sample, 26 estimated total rows
INFO:  analyzing "public.CategoryDefn"
INFO:  "CategoryDefn": scanned 1 of 1 pages, containing 133 live rows and 0 dead rows; 133 rows in sample, 133 estimated total rows
INFO:  analyzing "public.CategoryDefn2"
INFO:  "CategoryDefn2": scanned 2 of 2 pages, containing 211 live rows and 0 dead rows; 211 rows in sample, 211 estimated total rows
INFO:  analyzing "public.CategorySets"
INFO:  "CategorySets": scanned 1 of 1 pages, containing 3 live rows and 0 dead rows; 3 rows in sample, 3 estimated total rows
INFO:  analyzing "public.CATemplateGroup"
INFO:  analyzing "public.Channels"
INFO:  "Channels": scanned 1 of 1 pages, containing 7 live rows and 0 dead rows; 7 rows in sample, 7 estimated total rows
INFO:  analyzing "public.ClientCodes"
INFO:  analyzing "public.Clients"
INFO:  "Clients": scanned 7 of 7 pages, containing 366 live rows and 0 dead rows; 366 rows in sample, 366 estimated total rows
INFO:  analyzing "public.Customers"
INFO:  "Customers": scanned 2 of 2 pages, containing 129 live rows and 0 dead rows; 129 rows in sample, 129 estimated total rows
NFO:  analyzing "public.Databases"
INFO:  "Databases": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total rows
INFO:  analyzing "public.DataSources"
INFO:  "DataSources": scanned 1 of 1 pages, containing 8 live rows and 0 dead rows; 8 rows in sample, 8 estimated total rows
INFO:  analyzing "public.DateToWeekConversion"
INFO:  "DateToWeekConversion": scanned 4 of 4 pages, containing 371 live rows and 0 dead rows; 371 rows in sample, 371 estimated total rows
INFO:  analyzing "public.Divisions"
INFO:  "Divisions": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total rows
INFO:  analyzing "public.MetricTable"
INFO:  "MetricTable": scanned 1 of 1 pages, containing 48 live rows and 0 dead rows; 48 rows in sample, 48 estimated total rows
INFO:  analyzing "public.Offtake"
INFO:  "Offtake": scanned 3000 of 13824 pages, containing 141000 live rows and 0 dead rows; 3000 rows in sample, 649728 estimated total rows
INFO:  analyzing "public.SKUs"
INFO:  "SKUs": scanned 3 of 3 pages, containing 73 live rows and 0 dead rows; 73 rows in sample, 73 estimated total rows
INFO:  analyzing "public.SMSDefaults"
INFO:  "SMSDefaults": scanned 1 of 1 pages, containing 43 live rows and 0 dead rows; 43 rows in sample, 43 estimated total rows
INFO:  analyzing "public.StandardPeriods"
INFO:  "StandardPeriods": scanned 1 of 1 pages, containing 8 live rows and 0 dead rows; 8 rows in sample, 8 estimated total rows
INFO:  analyzing "public.StandardUnits"
INFO:  "StandardUnits": scanned 1 of 1 pages, containing 9 live rows and 0 dead rows; 9 rows in sample, 9 estimated total rows
INFO:  analyzing "public.SubDataSources"
INFO:  "SubDataSources": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO:  analyzing "public.VolumeUnitDefn"
INFO:  "VolumeUnitDefn": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO:  analyzing "public.VolumeUnits"
INFO:  "VolumeUnits": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows

as you can see the biggest one only has 600k records.

Here are the settings used for postgresql.conf ( will just list those that were modified)
#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -
#shared_buffers = 32MB
shared_buffers = 128MB            # min 128kB or max_connections*16kB
                    # (change requires restart)
#temp_buffers = 8MB            # min 800kB
temp_buffers = 32MB
#max_prepared_transactions = 5        # can be 0 or more
                    # (change requires restart)
#max_prepared_transactions = 20       
# Note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
#work_mem = 1MB                # min 64kB
work_mem = 2MB
#maintenance_work_mem = 16MB        # min 1MB
maintenance_work_mem = 32MB
#max_stack_depth = 2MB            # min 100kB


Everything else was kept as is (given by the installer).

/etc/sysctl.conf  settings below :
kernel.shmmax = 1536000000
kernel.msgmni = 1024
fs.file-max = 8192
kernel.sem = "250 32000 32 1024"

The main stored function has approximately 1200 lines of pl/pgsql code. While it is running it calls 7 other support stored functions plus a small view.

The test basically was run two ways :

a) for the linux box, we used PG Admin III to run the core stored function and in the windows box we used query analyzer.

b) Created a small vb program that just calls the stored function for both boxes.

    Since I'm a total newbie in PG, I was expecting dismal results in the initial run since our port of the code would not be optimized for PG and sure enough I got them.

Windows 2k Pro + MSDE - 4 seconds
FC7 + postgresql-8.3-beta3 - 77 seconds

    thinking that maybe the GUI of FC7 is hampering the load, I decided to run it again using runlevel 3. The result was approximately a 1-2 % gain on the FC7 but that is insignificant compared to the discrepancy I'm measuring so I decided to just use the w/GUI results. We noticed that the CPU for the linux box was tapped out (at least one of the cores) nearly 100% for the duration of the process. There was plenty of ram available and there was little to no disk access during the entire run.

I decided to further normalize the test and make the OS constant.

Windows 2k Pro + MSDE - 4 seconds
Windows 2k Pro +  postgresql-8.3-beta3 - 54 seconds

Turns out that for our code, running PG in windows is faster significantly. This was a surprise coming from all the research on the internet but it does jive with the results of this guy :

http://channel9.msdn.com/ShowPost.aspx?PostID=179064

Note that this guy used MS SQL 2005. Which we have proved to be 2-3 times slower than MS SQL 2000 and hence our research into other options. :)

Anyways I'd like to break up this request/begging for help into two parts.

1) First would be settings of postgresql.conf. Did I do it correctly? The sample data is so small....I based my settings on the recommendations researched for data centers.

2) Code optimization which I plan to start in another email thread since the discussions there would be more detailed.

Would it also make sense to optimize (as far as possible) everything (including the code) for windows first? The target here would be a linux OS but since the discrepancy is so big...the unified Windows OS might be a good place to start for now.

Many Thanks in advance.

Regards



Re: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

From
"Usama Dar"
Date:


On Dec 5, 2007 1:13 PM, Robert Bernabe <robert_bernabe@yahoo.com> wrote:
Anyways I'd like to break up this request/begging for help into two parts.

1) First would be settings of postgresql.conf. Did I do it correctly? The sample data is so small....I based my settings on the recommendations researched for data centers.

i think this would  mainly depend on what do your stored procedures do, are they writing stuff to the tables, or reading most of the time or something else? i would imagine with small dataset the postgresql.conf settings should be ok, but more can be told after looking at the code.



2) Code optimization which I plan to start in another email thread since the discussions there would be more detailed.

i think that might be a better starting point.  What are you trying to do and how.
 


Would it also make sense to optimize (as far as possible) everything (including the code) for windows first? The target here would be a linux OS but since the discrepancy is so big...the unified Windows OS might be a good place to start for now.

Sure, but i am not able to comprehend how the pl/pgsql could contain code which can tuned OS wise, i would think that any optimization you would do there in the stored code would apply to all platforms.


 


Many Thanks in advance.

Regards






--
Usama Munir Dar http://linkedin.com/in/usamadar
Consultant Architect
Cell:+92 321 5020666
Skype: usamadar

Re: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

From
Bill Moran
Date:
In response to Robert Bernabe <robert_bernabe@yahoo.com>:

> Hi All,
>     I've been tasked to evaluate PG as a possible replacement of our
> MS SQL 2000 solution. Our solution is 100% stored procedure/function
> centric.

I've trimmed 99% of your email out, because it's not relevant to my
answer.

Fact is, it's pretty much impossible for anyone to give specific help
because you've obviously got a large, complex operation going on here,
and have not provided any real details.  The reality is that we'd
probably have to see your code to give any specific help.

However, I can help you with an approach to fixing it.  Based on your
description of the problem, I would guess that there are some differences
in best practices between MSSQL and PG that are what's hurting your
application once it's ported to PG.  Basically, you just need to isolate
them and adjust.

I recommend enabling full query logging with timing on the PG server.
In the postgresql.conf file, set the following:
log_min_duration_statement = 0

Note that this will result in a LOT of log information being written,
which will invariably make the application run even slower on PG, but
for tuning purposes it's invaluable as it will log every SQL statement
issued with the time it took to run.

From there, look for the low-hanging fruit.  I recommend running your
tests a few times, then running the logs through pgFouine:
http://pgfouine.projects.postgresql.org/

Once you've identified the queries that are taking the most time, start
adjusting the queries and/or the DB schema to improve the timing.  In
my experience, you'll usually find 2 or 3 queries that are slowing the
thing down, and the performance will come up to spec once they're
rewritten (or appropriate indexes added, or whatever)  EXPLAIN can
be your friend once you've found problematic queries.

Another piece of broadly useful advice is to install the pgbuffercache
addon and monitor shared_buffer usage to see if you've got enough.  Also
useful is monitoring the various statistics in the pg_stat_database
table.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

From
"Merlin Moncure"
Date:
On Dec 5, 2007 3:13 AM, Robert Bernabe <robert_bernabe@yahoo.com> wrote:
> Would it also make sense to optimize (as far as possible) everything
> (including the code) for windows first? The target here would be a linux OS
> but since the discrepancy is so big...the unified Windows OS might be a good
> place to start for now.

speaking in very general terms, postgresql should be competitive with
ms sql in this type of application.  there are a few things here and
there you have to watch out for...for example select count(*) from
table is slower on pg.  another common thing is certain query forms
that you have to watch out for...but these issues are often addressed
with small adjustments.

the key here is to isolate specific things in your procedure that are
underperforming and to determine the answer why.  to get the most
benefit from this list, try and post some particulars along with some
'explain analyze' results.

merlin

Re: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

From
"Scott Marlowe"
Date:
On Dec 5, 2007 2:13 AM, Robert Bernabe <robert_bernabe@yahoo.com> wrote:
>
> Hi All,
>     I've been tasked to evaluate PG as a possible replacement of our MS SQL
> 2000 solution. Our solution is 100% stored procedure/function centric. It's
> a report generation system whose sole task is to produce text files filled
> with processed data that is post-processed by a secondary system. Basically
> options are selected via a web interface and all these parameters are passed
> unto the stored procedure and then the stored procedure would run and in the
> process call other stored procedures until eventually a single formatted
> text file is produced.
>     I decided on Fedora Core 7 and the 8.3 Beta release of Enterprise DB
> PostgreSQL. I decided to port 1 stored procedure plus it's several support
> stored procedures into pl/pgsql from T-SQL and compare the performance by

Noble, but if you're a postgresql beginner, you might want to take a
pass on running beta code.  You might be hitting a corner case,
performance wise, and never know it.

A few pointers.
1: Up your shared_buffers to 512M or so.
2: Up work_mem to 16M

Now, use the poor man's debugging tool for your stored procs, raise notice


create or replace function testfunc() returns int as $$
DECLARE
    tm text;
    cnt int;
BEGIN
    select timeofday() into tm;
   RAISE NOTICE 'Time is now %',tm;
   select count(*) into cnt from accounts;
   select timeofday() into tm;
   RAISE NOTICE 'Time is now %',tm;
   RETURN 0;
END;
$$ language plpgsql;

Once you've found what's running slow, narrow it down to a specific part.

Re: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

From
Mark Cave-Ayland
Date:
On Wed, 2007-12-05 at 00:13 -0800, Robert Bernabe wrote:
> Hi All,
>     I've been tasked to evaluate PG as a possible replacement of our
> MS SQL 2000 solution. Our solution is 100% stored procedure/function
> centric. It's a report generation system whose sole task is to produce
> text files filled with processed data that is post-processed by a
> secondary system. Basically options are selected via a web interface
> and all these parameters are passed unto the stored procedure and then
> the stored procedure would run and in the process call other stored
> procedures until eventually a single formatted text file is produced.
>     I decided on Fedora Core 7 and the 8.3 Beta release of Enterprise
> DB PostgreSQL. I decided to port 1 stored procedure plus it's several
> support stored procedures into pl/pgsql from T-SQL and compare the
> performance by measuring how long each system takes to produce the
> text file. For this test,  the output to the text file was discarded
> and the stored procedure/function would end once the final temporary
> table is filled with the information that is eventually dumped into
> the text file.
>
> Windows 2000 Professional + MSDE (/MS SQL) Box    vs.   FC7 +
> EnterpriseDB PG Box
>
> Note that both boxes have EXACTLY the same hardware (not VMWARE or
> anything)
> AMD X2 3800
> 2 G RAM DDR 400
> 80 G Barracuda Sata
>
> The data was copied to the Linux box and checked lightly for
> consistency versus the windows box (number of tables / columns and
> records) and they all match. After data transfer to the Linux Box, I
> ran REINDEX and ANALYZE.
>
> For the actual run the following tables were used and I'm displaying
> the results of analyze.
>
> INFO:  analyzing "public.AreaDefn"
> INFO:  "AreaDefn": scanned 15 of 15 pages, containing 2293 live rows
> and 0 dead rows; 2293 rows in sample, 2293 estimated total rows
> INFO:  analyzing "public.AreaDefn2"
> INFO:  "AreaDefn2": scanned 30 of 30 pages, containing 3439 live rows
> and 0 dead rows; 3000 rows in sample, 3439 estimated total rows
> INFO:  analyzing "public.Areas"
> INFO:  "Areas": scanned 2 of 2 pages, containing 164 live rows and 0
> dead rows; 164 rows in sample, 164 estimated total rows
> INFO:  analyzing "public.Brands"
> INFO:  "Brands": scanned 1 of 1 pages, containing 11 live rows and 0
> dead rows; 11 rows in sample, 11 estimated total rows
> INFO:  analyzing "public.Categories"
> INFO:  "Categories": scanned 1 of 1 pages, containing 26 live rows and
> 0 dead rows; 26 rows in sample, 26 estimated total rows
> INFO:  analyzing "public.CategoryDefn"
> INFO:  "CategoryDefn": scanned 1 of 1 pages, containing 133 live rows
> and 0 dead rows; 133 rows in sample, 133 estimated total rows
> INFO:  analyzing "public.CategoryDefn2"
> INFO:  "CategoryDefn2": scanned 2 of 2 pages, containing 211 live rows
> and 0 dead rows; 211 rows in sample, 211 estimated total rows
> INFO:  analyzing "public.CategorySets"
> INFO:  "CategorySets": scanned 1 of 1 pages, containing 3 live rows
> and 0 dead rows; 3 rows in sample, 3 estimated total rows
> INFO:  analyzing "public.CATemplateGroup"
> INFO:  analyzing "public.Channels"
> INFO:  "Channels": scanned 1 of 1 pages, containing 7 live rows and 0
> dead rows; 7 rows in sample, 7 estimated total rows
> INFO:  analyzing "public.ClientCodes"
> INFO:  analyzing "public.Clients"
> INFO:  "Clients": scanned 7 of 7 pages, containing 366 live rows and 0
> dead rows; 366 rows in sample, 366 estimated total rows
> INFO:  analyzing "public.Customers"
> INFO:  "Customers": scanned 2 of 2 pages, containing 129 live rows and
> 0 dead rows; 129 rows in sample, 129 estimated total rows
> NFO:  analyzing "public.Databases"
> INFO:  "Databases": scanned 1 of 1 pages, containing 1 live rows and 0
> dead rows; 1 rows in sample, 1 estimated total rows
> INFO:  analyzing "public.DataSources"
> INFO:  "DataSources": scanned 1 of 1 pages, containing 8 live rows and
> 0 dead rows; 8 rows in sample, 8 estimated total rows
> INFO:  analyzing "public.DateToWeekConversion"
> INFO:  "DateToWeekConversion": scanned 4 of 4 pages, containing 371
> live rows and 0 dead rows; 371 rows in sample, 371 estimated total
> rows
> INFO:  analyzing "public.Divisions"
> INFO:  "Divisions": scanned 1 of 1 pages, containing 1 live rows and 0
> dead rows; 1 rows in sample, 1 estimated total rows
> INFO:  analyzing "public.MetricTable"
> INFO:  "MetricTable": scanned 1 of 1 pages, containing 48 live rows
> and 0 dead rows; 48 rows in sample, 48 estimated total rows
> INFO:  analyzing "public.Offtake"
> INFO:  "Offtake": scanned 3000 of 13824 pages, containing 141000 live
> rows and 0 dead rows; 3000 rows in sample, 649728 estimated total rows
> INFO:  analyzing "public.SKUs"
> INFO:  "SKUs": scanned 3 of 3 pages, containing 73 live rows and 0
> dead rows; 73 rows in sample, 73 estimated total rows
> INFO:  analyzing "public.SMSDefaults"
> INFO:  "SMSDefaults": scanned 1 of 1 pages, containing 43 live rows
> and 0 dead rows; 43 rows in sample, 43 estimated total rows
> INFO:  analyzing "public.StandardPeriods"
> INFO:  "StandardPeriods": scanned 1 of 1 pages, containing 8 live rows
> and 0 dead rows; 8 rows in sample, 8 estimated total rows
> INFO:  analyzing "public.StandardUnits"
> INFO:  "StandardUnits": scanned 1 of 1 pages, containing 9 live rows
> and 0 dead rows; 9 rows in sample, 9 estimated total rows
> INFO:  analyzing "public.SubDataSources"
> INFO:  "SubDataSources": scanned 0 of 0 pages, containing 0 live rows
> and 0 dead rows; 0 rows in sample, 0 estimated total rows
> INFO:  analyzing "public.VolumeUnitDefn"
> INFO:  "VolumeUnitDefn": scanned 0 of 0 pages, containing 0 live rows
> and 0 dead rows; 0 rows in sample, 0 estimated total rows
> INFO:  analyzing "public.VolumeUnits"
> INFO:  "VolumeUnits": scanned 0 of 0 pages, containing 0 live rows and
> 0 dead rows; 0 rows in sample, 0 estimated total rows
>
> as you can see the biggest one only has 600k records.
>
> Here are the settings used for postgresql.conf ( will just list those
> that were modified)
> #---------------------------------------------------------------------------
> # RESOURCE USAGE (except WAL)
> #---------------------------------------------------------------------------
>
> # - Memory -
> #shared_buffers = 32MB
> shared_buffers = 128MB            # min 128kB or max_connections*16kB
>                     # (change requires restart)
> #temp_buffers = 8MB            # min 800kB
> temp_buffers = 32MB
> #max_prepared_transactions = 5        # can be 0 or more
>                     # (change requires restart)
> #max_prepared_transactions = 20
> # Note: increasing max_prepared_transactions costs ~600 bytes of
> shared memory
> # per transaction slot, plus lock space (see
> max_locks_per_transaction).
> #work_mem = 1MB                # min 64kB
> work_mem = 2MB
> #maintenance_work_mem = 16MB        # min 1MB
> maintenance_work_mem = 32MB
> #max_stack_depth = 2MB            # min 100kB
>
>
> Everything else was kept as is (given by the installer).
>
> /etc/sysctl.conf  settings below :
> kernel.shmmax = 1536000000
> kernel.msgmni = 1024
> fs.file-max = 8192
> kernel.sem = "250 32000 32 1024"
>
> The main stored function has approximately 1200 lines of pl/pgsql
> code. While it is running it calls 7 other support stored functions
> plus a small view.
>
> The test basically was run two ways :
>
> a) for the linux box, we used PG Admin III to run the core stored
> function and in the windows box we used query analyzer.
>
> b) Created a small vb program that just calls the stored function for
> both boxes.
>
>     Since I'm a total newbie in PG, I was expecting dismal results in
> the initial run since our port of the code would not be optimized for
> PG and sure enough I got them.
>
> Windows 2k Pro + MSDE - 4 seconds
> FC7 + postgresql-8.3-beta3 - 77 seconds
>
>     thinking that maybe the GUI of FC7 is hampering the load, I
> decided to run it again using runlevel 3. The result was approximately
> a 1-2 % gain on the FC7 but that is insignificant compared to the
> discrepancy I'm measuring so I decided to just use the w/GUI results.
> We noticed that the CPU for the linux box was tapped out (at least one
> of the cores) nearly 100% for the duration of the process. There was
> plenty of ram available and there was little to no disk access during
> the entire run.
>
> I decided to further normalize the test and make the OS constant.
>
> Windows 2k Pro + MSDE - 4 seconds
> Windows 2k Pro +  postgresql-8.3-beta3 - 54 seconds
>
> Turns out that for our code, running PG in windows is faster
> significantly. This was a surprise coming from all the research on the
> internet but it does jive with the results of this guy :
>
> http://channel9.msdn.com/ShowPost.aspx?PostID=179064
>
> Note that this guy used MS SQL 2005. Which we have proved to be 2-3
> times slower than MS SQL 2000 and hence our research into other
> options. :)
>
> Anyways I'd like to break up this request/begging for help into two
> parts.
>
> 1) First would be settings of postgresql.conf. Did I do it correctly?
> The sample data is so small....I based my settings on the
> recommendations researched for data centers.
>
> 2) Code optimization which I plan to start in another email thread
> since the discussions there would be more detailed.
>
> Would it also make sense to optimize (as far as possible) everything
> (including the code) for windows first? The target here would be a
> linux OS but since the discrepancy is so big...the unified Windows OS
> might be a good place to start for now.
>
> Many Thanks in advance.
>
> Regards


Hi Robert,

Assuming that you've transferred across all relevant indices, the
biggest gotcha I've found from porting stored procedures is forgetting
to mark them STABLE or IMMUTABLE where relevant (see
http://www.postgresql.org/docs/8.3/static/sql-createfunction.html for
more details). Without these function markers, PostgreSQL assumes that
the functions are VOLATILE which severely restricts their ability to be
optimised by the planner.

BTW you mention both EnterpriseDB PostgreSQL 8.3 beta and just
PostgreSQL 8.3 beta in the text above. Both of these are different -
which one are you actually using?


Kind regards,

Mark.

--
ILande - Open Source Consultancy
http://www.ilande.co.uk



Re: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

From
Dave Page
Date:
Mark Cave-Ayland wrote:
> BTW you mention both EnterpriseDB PostgreSQL 8.3 beta and just
> PostgreSQL 8.3 beta in the text above. Both of these are different -
> which one are you actually using?

No they're not. EnterpriseDB Postgres ships entirely standard binaries -
in fact, the Windows build uses the exact same binaries I build for the
community installer.

EnterpriseDB Postgres is essentially a packaging and bundling project in
which the aim is to provide consistent and easy to use installers for
Windows, Mac and Linux that allow users to get started with Postgres,
Slony, PostGIS, pgAdmin, phpPgAdmin etc...

EnterpriseDB Advanced Server is the entirely different product - thats
the one that includes the Oracle compatibility and replication/migration
tools etc.

Regards, Dave.

Re: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

From
Mark Cave-Ayland
Date:
On Thu, 2007-12-06 at 08:50 +0000, Dave Page wrote:

> EnterpriseDB Postgres is essentially a packaging and bundling project in
> which the aim is to provide consistent and easy to use installers for
> Windows, Mac and Linux that allow users to get started with Postgres,
> Slony, PostGIS, pgAdmin, phpPgAdmin etc...
>
> EnterpriseDB Advanced Server is the entirely different product - thats
> the one that includes the Oracle compatibility and replication/migration
> tools etc.

Ah indeed - my mistake for not realising EnterpriseDB Postgres was a
different product from EnterpriseDB Advanced Server
(postgres.enterprisedb.com turned out to be quite an enlightening read).


ATB,

Mark.

--
ILande - Open Source Consultancy
http://www.ilande.co.uk



Re: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

From
Robert Bernabe
Date:
Hi All,
    Thanks for all the help here. Sorry for the late update but we've found our problem and fixed it already. Prior to looking at the translated code more  intently, I wanted to make sure that our environmental settings were acceptable and the various emails from members have confirmed that...

In a nutshell it seems that MS SQL allows bad T-SQL code by optimizing and ignoring redundant/useless from and where clauses in an update statement whereas plpgsql will execute exactly what the code is asking it to do...

We had several update instances in the T-SQL code that looked like this :

update "_tbl_tmp2"
set "LongBackPeriod" = (select count ("EPeriod") from "_tbl_tmp1" where "_tbl_tmp1"."Row" = "_tbl_tmp2"."Row");
--------------------------------------------------
from "_tbl_tmp2" tmp2, "_tbl_tmp1" tmp1
where tmp2."Row" = tmp1."Row";
---------------------------------------------------

In T-SQL, the performance is the same whether the last two lines are there or not...

In plpgsql, this is not the case the from and where clauses are not necessary and probably creates an internal (rather useless and time consuming) inner join in plpgsql which accounts for the original performance issue.

I'm happy (actually ecstatic) to report that Win2kPro + PG performance is slightly faster than  Win2kPro + MSSQL/MSDE.

Linux(FC7) + PG 8.x performance seems to be 3x faster than Win2KPro + MSSQL/MSDE for our stored functions.

Thanks for all the help! Am a believer now. :)


Re: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

From
Dave Page
Date:
Robert Bernabe wrote:
> I'm happy (actually ecstatic) to report that Win2kPro + PG performance
> is slightly faster than  Win2kPro + MSSQL/MSDE.
>
> Linux(FC7) + PG 8.x performance seems to be 3x faster than Win2KPro +
> MSSQL/MSDE for our stored functions.
>
> Thanks for all the help! Am a believer now. :)

That's great news Robert - thanks for sharing!

Regards, Dave.


Re: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

From
Heikki Linnakangas
Date:
Robert Bernabe wrote:
> In a nutshell it seems that MS SQL allows bad T-SQL code by optimizing and ignoring redundant/useless from and where
clausesin an update statement whereas plpgsql will execute exactly what the code is asking it to do... 
>
> We had several update instances in the T-SQL code that looked like this :
>
> update "_tbl_tmp2"
> set "LongBackPeriod" = (select count ("EPeriod") from "_tbl_tmp1" where "_tbl_tmp1"."Row" = "_tbl_tmp2"."Row");
> --------------------------------------------------
> from "_tbl_tmp2" tmp2, "_tbl_tmp1" tmp1
> where tmp2."Row" = tmp1."Row";
> ---------------------------------------------------

Those lines are not totally useless from DB point of view. If there is
no rows that match the join, the WHERE clause will be false, and no rows
will be updated. So I'm sure MS SQL doesn't ignore those lines, but does
use a more clever plan. Perhaps it stops processing the join as soon as
is finds a match, while we perform the whole join, for example.

> In T-SQL, the performance is the same whether the last two lines are there or not...
>
> In plpgsql, this is not the case the from and where clauses are not necessary and probably creates an internal
(ratheruseless and time consuming) inner join in plpgsql which accounts for the original performance issue. 

You can check the access plan with EXPLAIN.

> I'm happy (actually ecstatic) to report that Win2kPro + PG performance is slightly faster than  Win2kPro +
MSSQL/MSDE. 
>
> Linux(FC7) + PG 8.x performance seems to be 3x faster than Win2KPro + MSSQL/MSDE for our stored functions.
>
> Thanks for all the help! Am a believer now. :)

Nice to hear :).

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Evaluation of PG performance vs MSDE/MSSQL 2000 (not 2005)

From
"Kevin Grittner"
Date:
>>> On Tue, Dec 18, 2007 at  3:23 AM, in message
<4767917E.9050206@enterprisedb.com>, Heikki Linnakangas
<heikki@enterprisedb.com> wrote:
> Robert Bernabe wrote:
>> In a nutshell it seems that MS SQL allows bad T-SQL code by optimizing and
>> ignoring redundant/useless from and where clauses in an update statement
>> whereas plpgsql will execute exactly what the code is asking it to do...
>>
>> We had several update instances in the T-SQL code that looked like this :
>>
>> update "_tbl_tmp2"
>> set "LongBackPeriod" = (select count ("EPeriod") from "_tbl_tmp1" where
> "_tbl_tmp1"."Row" = "_tbl_tmp2"."Row");
>> --------------------------------------------------
>> from "_tbl_tmp2" tmp2, "_tbl_tmp1" tmp1
>> where tmp2."Row" = tmp1."Row";
>> ---------------------------------------------------
>
> I'm sure MS SQL doesn't ignore those lines, but does
> use a more clever plan.

Actually, this is what happens in the absence of a standard --
allowing a FROM clause on an UPDATE statement is an extension to
the standard.  MS SQL Server and PostgreSQL have both added such an
extension with identical syntax and differing semantics.  MS SQL
Server allows you to declare the updated table in the FROM clause
so that you can alias it; the first reference to the updated table
in the FROM clause is not taken as a separate reference, so the
above is interpreted exactly the same as:

update "_tbl_tmp2"
set "LongBackPeriod" = (select count ("EPeriod") from "_tbl_tmp1" where
_tbl_tmp1"."Row" = "_tbl_tmp2"."Row")
from "_tbl_tmp1" tmp1
where "_tbl_tmp2"."Row" = tmp1."Row"

PostgreSQL sees tmp2 as a second, independent reference to the
updated table.  This can be another big "gotcha" in migration.

-Kevin