
Re: Query Performance SQL Server vs. Postgresql - Mailing list pgsql-performance

From	Humair Mohammed
Subject	Re: Query Performance SQL Server vs. Postgresql
Date	November 17, 2010 19:03:44
Msg-id	COL115-W3987D9D7DE908C5C47792DA8380@phx.gbl
In response to	Re: (Pavel Stehule <pavel.stehule@gmail.com>)
List	pgsql-performance


There are no indexes on the tables in either SQL Server or Postgresql, so I am comparing apples to apples here. I ran ANALYZE on the Postgresql tables; after that, query times are still high: 42 seconds with COALESCE and 35 seconds with IS DISTINCT FROM.
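For readers comparing the two predicate forms timed above, here is a minimal sketch of why they are interchangeable for this query. It is an illustration only, not something from the thread: it uses Python's built-in sqlite3 module with four made-up rows, and SQLite's null-safe `IS NOT` operator stands in for PostgreSQL's `IS DISTINCT FROM`. The helper name `mismatched_ids` is hypothetical. It says nothing about performance, only about which rows each predicate returns.

```python
import sqlite3

# Made-up sample data: ids 2 and 3 have a NULL response in pivotbad.
SETUP = """
CREATE TABLE pivotbad  (id TEXT, question TEXT, response TEXT);
CREATE TABLE pivotgood (id TEXT, question TEXT, response TEXT);
INSERT INTO pivotbad  VALUES ('1','q1','yes'), ('2','q1',NULL),
                             ('3','q1',NULL),  ('4','q1','no');
INSERT INTO pivotgood VALUES ('1','q1','yes'), ('2','q1','no'),
                             ('3','q1',NULL),  ('4','q1','yes');
"""

# The three predicate variants under comparison.
COALESCE_FORM = "coalesce(pb.response,'MISSING') <> coalesce(pg.response,'MISSING')"
NULL_SAFE_FORM = "pb.response IS NOT pg.response"  # SQLite stand-in for IS DISTINCT FROM
PLAIN_FORM = "pb.response <> pg.response"          # not NULL-safe


def mismatched_ids(predicate):
    """Run the join from the thread with the given response predicate."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        rows = conn.execute(
            "SELECT pb.id FROM pivotbad pb INNER JOIN pivotgood pg "
            "ON pb.id = pg.id AND pb.question = pg.question AND "
            + predicate + " ORDER BY pb.id"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    for name, pred in [("coalesce", COALESCE_FORM),
                       ("null-safe", NULL_SAFE_FORM),
                       ("plain <>", PLAIN_FORM)]:
        print(name, mismatched_ids(pred))
```

The COALESCE form and the null-safe form flag the same ids, while a plain `<>` silently drops the row where only one side is NULL. One caveat: the COALESCE form would treat a real response equal to the sentinel string as if it were NULL, which the null-safe comparison does not.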

Here is the execution plan from Postgresql for the query - select pb.id from pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = pg.question and coalesce(pb.response,'MISSING') <> coalesce(pg.response,'MISSING')

Execution Time: 42 seconds

Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)
  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134)
        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134)

And here is the execution plan from SQL Server for query - select pb.id from pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = pg.question and isnull(pb.response,'ISNULL')<> isnull(pg.response,'ISNULL')

Execution Time: < 1 second

Cost: 1% |--Parallelism(Gather Streams)

Cost: 31% |--Hash Match(Inner Join, HASH:([pb].[ID], [pb].[Question])=([pg].[ID], [pg].[Question]), RESIDUAL:([master].[dbo].[pivotbad].[ID] as [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND [master].[dbo].[pivotbad].[Question] as [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question] AND [Expr1006]<>[Expr1007]))

Cost: 0% |--Bitmap(HASH:([pb].[ID], [pb].[Question]), DEFINE:([Bitmap1008]))

Cost: 0% |--Compute Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as [pb].[Response],'ISNULL')))

Cost: 6% |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question]))

Cost: 12% |--Table Scan(OBJECT:([master].[dbo].[pivotbad] AS [pb]))

Cost: 0% |--Compute Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] as [pg].[Response],'ISNULL')))

Cost: 17% |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question]))

Cost: 33% |--Table Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]), WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))

> From: pavel.stehule@gmail.com
> Date: Tue, 16 Nov 2010 08:12:03 +0100
> Subject: Re: [PERFORM]
> To: humairm@hotmail.com
> CC: pgsql-performance@postgresql.org
>
> 2010/11/15 Humair Mohammed <humairm@hotmail.com>:
> > I have 2 tables with 200,000 rows of data and 3 character/string columns: ID,
> > Question and Response. The query below compares the data between the 2
> > tables based on ID and Question, and if the Response does not match between
> > the left table and the right table it identifies the IDs where there is a
> > mismatch. Running the query in SQL Server 2008 using the ISNULL function
> > takes a few milliseconds. Running the same query in Postgresql takes over 70
> > seconds. The 2 queries are below:
> > SQL Server 2008 R2 Query
> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
> > t1.question = t2.question and isnull(t1.response,'ISNULL') <>
> > isnull(t2.response,'ISNULL')
>
> > Postgres 9.1 Query
> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
> > t1.question = t2.question and coalesce(t1.response,'ISNULL') <>
> > coalesce(t2.response,'ISNULL')
> > What gives?
>
> I think the main problem may be the ugly predicate
> coalesce(t1.response,'ISNULL') <> coalesce(t2.response,'ISNULL')
>
> Try using the IS DISTINCT FROM operator:
>
> ... AND t1.response IS DISTINCT FROM t2.response
>
> Regards
>
> Pavel Stehule
>
> P.S. Don't use coalesce in a WHERE clause if you can avoid it.
