Re: Query Performance SQL Server vs. Postgresql - Mailing list pgsql-performance

From Pavel Stehule
Subject Re: Query Performance SQL Server vs. Postgresql
Date
Msg-id AANLkTikHPwzNuohB29x2ZCU6X8VQpvAiMXaFBMYfVO5v@mail.gmail.com
Whole thread Raw
In response to  (Humair Mohammed <humairm@hotmail.com>)
List pgsql-performance
Hello

I don't know. I checked similar query on similar dataset (size), and I
have a times about 3 sec on much more worse notebook. So problem can
be in disk IO operation speed - maybe in access to TOASTed value.

2010/11/21 Humair Mohammed <humairm@hotmail.com>:
> I am running 64-bit Windows 7 Enterprise with 8G RAM and a Dual Core 2.67
> Ghz Intel CPU. Both the SQL 2008 R2 and Postgresql are installed on the same
> machine. The DISTINCT FROM instead of the COALESCE does not help much. I ran
> 2 further tests with work_mem modifications (please note memory usage is
> quite low 650kb, so I am not sure if the work_mem is a factor):

it's has a little bit different meaning. work_mem is just limit, so
"memory usage" must not be great than work_mem ever. if then pg
increase "butches" number - store data to blocks on disk. Higher
work_mem ~ less butches. So ideal is 1 butches.

Regards

Pavel Stehule

> First, I modified the work_mem setting to 1GB (reloaded config) from the
> default 1MB and I see a response time of 33 seconds. Results below from
> EXPLAIN ANALYZE:
> "Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual
> time=26742.343..33274.317 rows=3163 loops=1)"
> "  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
> (pg.question)::text))"
> "  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
> <> (COALESCE(pg.response, 'MISSING'::character varying))::text)"
> "  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
> (actual time=0.055..40.710 rows=93496 loops=1)"
> "  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual
> time=25603.460..25603.460 rows=251212 loops=1)"
> "        Buckets: 1024  Batches: 64  Memory Usage: 650kB"
> "        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
> width=134) (actual time=0.050..120.269 rows=251212 loops=1)"
> "Total runtime: 33275.028 ms"
>
> Second, I modified the work_mem setting to 2GB (reloaded config) and I see a
> response time of 38 seconds. Results below from EXPLAIN ANALYZE:
> "Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual
> time=26574.459..38406.422 rows=3163 loops=1)"
> "  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
> (pg.question)::text))"
> "  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
> <> (COALESCE(pg.response, 'MISSING'::character varying))::text)"
> "  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
> (actual time=0.067..37.938 rows=93496 loops=1)"
> "  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual
> time=26426.127..26426.127 rows=251212 loops=1)"
> "        Buckets: 1024  Batches: 64  Memory Usage: 650kB"
> "        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
> width=134) (actual time=0.038..115.319 rows=251212 loops=1)"
> "Total runtime: 38406.927 ms"
>
> By no means I am trying to compare the 2 products. When I noticed the slow
> behavior of COALESCE I tried it on SQL Server. And since they are running on
> the same machine my comment regarding apples to apples. It is possible that
> this is not an apples to apples comparison other than the fact that it is
> running on the same machine.
>
>> From: pavel.stehule@gmail.com
>> Date: Thu, 18 Nov 2010 07:14:24 +0100
>> Subject: Re: Query Performance SQL Server vs. Postgresql
>> To: humairm@hotmail.com
>> CC: pgsql-performance@postgresql.org
>>
>> Hello,
>>
>> there should be a problem in a statistic, they are out of reality.
>> Please, try to use a DISTINCT OF operator now - maybe a statistic will
>> be better. Next - try to increase a work_mem. Hash join is
>> untypically slow in your comp.
>>
>> Regards
>>
>> Pavel Stehule
>>
>> 2010/11/17 Humair Mohammed <humairm@hotmail.com>:
>> > Yes strange indeed, I did rerun ANALYZE and VACCUM. Took 70 seconds to
>> > rerun
>> > the query. Results from EXPLAIN ANALYZE below:
>> > "Hash Join  (cost=16212.30..52586.43 rows=92869 width=17) (actual
>> > time=43200.223..49502.874 rows=3163 loops=1)"
>> > "  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text
>> > =
>> > (pg.question)::text))"
>> > "  Join Filter: ((COALESCE(pb.response, 'MISSING'::character
>> > varying))::text
>> > <> (COALESCE(pg.response, 'MISSING'::character varying))::text)"
>> > "  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496
>> > width=134)
>> > (actual time=0.009..48.200 rows=93496 loops=1)"
>> > "  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134) (actual
>> > time=42919.453..42919.453 rows=251212 loops=1)"
>> > "        Buckets: 1024  Batches: 64  Memory Usage: 650kB"
>> > "        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
>> > width=134) (actual time=0.119..173.019 rows=251212 loops=1)"
>> > "Total runtime: 49503.450 ms"
>> >
>> >> From: pavel.stehule@gmail.com
>> >> Date: Wed, 17 Nov 2010 05:47:51 +0100
>> >> Subject: Re: Query Performance SQL Server vs. Postgresql
>> >> To: humairm@hotmail.com
>> >> CC: pgsql-performance@postgresql.org
>> >>
>> >> 2010/11/17 Humair Mohammed <humairm@hotmail.com>:
>> >> >
>> >> > There are no indexes on the tables either in SQL Server or Postgresql
>> >> > -
>> >> > I am
>> >> > comparing apples to apples here. I ran ANALYZE on the postgresql
>> >> > tables,
>> >> > after that query performance times are still high 42 seconds with
>> >> > COALESCE
>> >> > and 35 seconds with IS DISTINCT FROM.
>> >> > Here is the execution plan from Postgresql for qurey - select pb.id
>> >> > from
>> >> > pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question
>> >> > =
>> >> > pg.question and coalesce(pb.response,'MISSING') <>
>> >> > coalesce(pg.response,'MISSING')
>> >> > Execution Time: 42 seconds
>> >> > "Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)"
>> >> > "  Hash Cond: (((pb.id)::text = (pg.id)::text) AND
>> >> > ((pb.question)::text
>> >> > =
>> >> > (pg.question)::text))"
>> >> > "  Join Filter: ((COALESCE(pb.response, 'MISSING'::character
>> >> > varying))::text
>> >> > <> (COALESCE(pg.response, 'MISSING'::character varying))::text)"
>> >> > "  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496
>> >> > width=134)"
>> >> > "  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134)"
>> >> > "        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12
>> >> > rows=251212
>> >> > width=134)"
>> >>
>> >> this is little bit strange - did you ANALYZE and VACUUM?
>> >>
>> >> please send result of EXPLAIN ANALYZE
>> >>
>> >> Pavel
>> >>
>> >> >
>> >> > And here is the execution plan from SQL Server for query - select
>> >> > pb.id
>> >> > from
>> >> > pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question
>> >> > =
>> >> > pg.question and isnull(pb.response,'ISNULL')<>
>> >> >  isnull(pg.response,'ISNULL')
>> >> > Execution Time: < 1 second
>> >> > Cost: 1%  |--Parallelism(Gather Streams)
>> >> > Cost: 31%       |--Hash Match(Inner Join, HASH:([pb].[ID],
>> >> > [pb].[Question])=([pg].[ID], [pg].[Question]),
>> >> > RESIDUAL:([master].[dbo].[pivotbad].[ID] as
>> >> > [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND
>> >> > [master].[dbo].[pivotbad].[Question] as
>> >> > [pb].[Question]=[master].[dbo].[pivotgood].[Question] as
>> >> > [pg].[Question]
>> >> > AND
>> >> > [Expr1006]<>[Expr1007]))
>> >> >     Cost: 0%  |--Bitmap(HASH:([pb].[ID], [pb].[Question]),
>> >> > DEFINE:([Bitmap1008]))
>> >> >             Cost: 0%    |--Compute
>> >> > Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response]
>> >> > as
>> >> > [pb].[Response],'ISNULL')))
>> >> >             Cost:  6%   |--Parallelism(Repartition Streams, Hash
>> >> > Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question]))
>> >> >             Cost: 12%  |--Table
>> >> > Scan(OBJECT:([master].[dbo].[pivotbad]
>> >> > AS
>> >> > [pb]))
>> >> >             Cost: 0% |--Compute
>> >> >
>> >> > Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response]
>> >> > as
>> >> > [pg].[Response],'ISNULL')))
>> >> >                 Cost: 17% |--Parallelism(Repartition Streams, Hash
>> >> > Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question]))
>> >> >                     Cost: 33% |--Table
>> >> > Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]),
>> >> > WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as
>> >> > [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))
>> >> >
>> >> >
>> >> >
>> >> >> From: pavel.stehule@gmail.com
>> >> >> Date: Tue, 16 Nov 2010 08:12:03 +0100
>> >> >> Subject: Re: [PERFORM]
>> >> >> To: humairm@hotmail.com
>> >> >> CC: pgsql-performance@postgresql.org
>> >> >>
>> >> >> 2010/11/15 Humair Mohammed <humairm@hotmail.com>:
>> >> >> > I have 2 tables with a 200,000 rows of data 3 character/string
>> >> >> > columns
>> >> >> > ID,
>> >> >> > Question and Response. The query below compares the data between
>> >> >> > the
>> >> >> > 2
>> >> >> > tables based on ID and Question and if the Response does not match
>> >> >> > between
>> >> >> > the left table and the right table it identifies the ID's where
>> >> >> > there
>> >> >> > is
>> >> >> > a
>> >> >> > mismatch. Running the query in SQL Server 2008 using the ISNULL
>> >> >> > function
>> >> >> > take a few milliseconds. Running the same query in Postgresql
>> >> >> > takes
>> >> >> > over
>> >> >> > 70
>> >> >> > seconds. The 2 queries are below:
>> >> >> > SQL Server 2008 R2 Query
>> >> >> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id
>> >> >> > and
>> >> >> > t1.question = t2.question and isnull(t1.response,'ISNULL') <>
>> >> >> > isnull(t2.response,'ISNULL')
>> >> >>
>> >> >> > Postgres 9.1 Query
>> >> >> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id
>> >> >> > and
>> >> >> > t1.question = t2.question and coalesce(t1.response,'ISNULL') <>
>> >> >> > coalesce(t2.response,'ISNULL')
>> >> >> > What gives?
>> >> >>
>> >> >> I think, so must problem can be in ugly predicate
>> >> >> coalesce(t1.response,'ISNULL') <>
>> >> >> > coalesce(t2.response,'ISNULL')
>> >> >>
>> >> >> try use a IS DISTINCT OF operator
>> >> >>
>> >> >> ... AND t1.response IS DISTINCT t2.response
>> >> >>
>> >> >> Regards
>> >> >>
>> >> >> Pavel Stehule
>> >> >>
>> >> >> p.s. don't use a coalesce in WHERE clause if it is possible.
>> >> >
>> >
>

pgsql-performance by date:

Previous
From: Bob Lunney
Date:
Subject: Re: best db schema for time series data?
Next
From: goran
Date:
Subject: Should changing offset in LIMIT change query plan (at all/so early)?