Re: Query Performance SQL Server vs. Postgresql - Mailing list pgsql-performance
From | Humair Mohammed |
---|---|
Subject | Re: Query Performance SQL Server vs. Postgresql |
Date | |
Msg-id | COL115-W3987D9D7DE908C5C47792DA8380@phx.gbl Whole thread Raw |
In response to | Re: (Pavel Stehule <pavel.stehule@gmail.com>) |
List | pgsql-performance |
There are no indexes on the tables either in SQL Server or Postgresql - I am comparing apples to apples here. I ran ANALYZE on the postgresql tables, after that query performance times are still high 42 seconds with COALESCE and 35 seconds with IS DISTINCT FROM.
Here is the execution plan from Postgresql for qurey - select pb.id from pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = pg.question and coalesce(pb.response,'MISSING') <> coalesce(pg.response,'MISSING')
Execution Time: 42 seconds
"Hash Join (cost=16212.30..48854.24 rows=93477 width=17)"
" Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))"
" Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)"
" -> Seq Scan on pivotbad pb (cost=0.00..2804.96 rows=93496 width=134)"
" -> Hash (cost=7537.12..7537.12 rows=251212 width=134)"
" -> Seq Scan on pivotgood pg (cost=0.00..7537.12 rows=251212 width=134)"
And here is the execution plan from SQL Server for query - select pb.id from pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = pg.question and isnull(pb.response,'ISNULL')<> isnull(pg.response,'ISNULL')
Execution Time: < 1 second
Cost: 1% |--Parallelism(Gather Streams)
Cost: 31% |--Hash Match(Inner Join, HASH:([pb].[ID], [pb].[Question])=([pg].[ID], [pg].[Question]), RESIDUAL:([master].[dbo].[pivotbad].[ID] as [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND [master].[dbo].[pivotbad].[Question] as [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question] AND [Expr1006]<>[Expr1007]))
Cost: 0% |--Bitmap(HASH:([pb].[ID], [pb].[Question]), DEFINE:([Bitmap1008]))
Cost: 0% |--Compute Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as [pb].[Response],'ISNULL')))
Cost: 6% |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question]))
Cost: 12% |--Table Scan(OBJECT:([master].[dbo].[pivotbad] AS [pb]))
Cost: 0% |--Compute Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] as [pg].[Response],'ISNULL')))
Cost: 17% |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question]))
Cost: 33% |--Table Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]), WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))
> From: pavel.stehule@gmail.com
> Date: Tue, 16 Nov 2010 08:12:03 +0100
> Subject: Re: [PERFORM]
> To: humairm@hotmail.com
> CC: pgsql-performance@postgresql.org
>
> 2010/11/15 Humair Mohammed <humairm@hotmail.com>:
> > I have 2 tables with a 200,000 rows of data 3 character/string columns ID,
> > Question and Response. The query below compares the data between the 2
> > tables based on ID and Question and if the Response does not match between
> > the left table and the right table it identifies the ID's where there is a
> > mismatch. Running the query in SQL Server 2008 using the ISNULL function
> > take a few milliseconds. Running the same query in Postgresql takes over 70
> > seconds. The 2 queries are below:
> > SQL Server 2008 R2 Query
> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
> > t1.question = t2.question and isnull(t1.response,'ISNULL') <>
> > isnull(t2.response,'ISNULL')
>
> > Postgres 9.1 Query
> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
> > t1.question = t2.question and coalesce(t1.response,'ISNULL') <>
> > coalesce(t2.response,'ISNULL')
> > What gives?
>
> I think, so must problem can be in ugly predicate
> coalesce(t1.response,'ISNULL') <>
> > coalesce(t2.response,'ISNULL')
>
> try use a IS DISTINCT OF operator
>
> ... AND t1.response IS DISTINCT t2.response
>
> Regards
>
> Pavel Stehule
>
> p.s. don't use a coalesce in WHERE clause if it is possible.
pgsql-performance by date: