Thread: SELECT is faster on SQL Server

SELECT is faster on SQL Server

From
Frank Millman
Date:
Hi all

This may be a non-issue, and I don't want to waste your time. But 
perhaps someone can have a look to see if there is anything obvious I 
have missed.

I am writing a cross-platform accounting app, and I test using Sql 
Server on Windows 10 and PostgreSql on Fedora 31. Performance is usually 
very similar, with a slight edge to PostgreSql. Now I have a SELECT 
which runs over twice as fast on Sql Server compared to PostgreSql.

This is the table definition -

                                          Table "prop.ar_totals"
      Column      |     Type      | Collation | Nullable 
|                    Default
-----------------+---------------+-----------+----------+------------------------------------------------
  row_id          | integer       |           | not null | 
nextval('prop.ar_totals_row_id_seq'::regclass)
  created_id      | integer       |           |          | 0
  deleted_id      | integer       |           |          | 0
  ledger_row_id   | integer       |           |          |
  location_row_id | integer       |           |          |
  function_row_id | integer       |           |          |
  source_code_id  | integer       |           |          |
  tran_date       | date          |           |          |
  tran_day        | numeric(21,2) |           |          | 0
  tran_tot        | numeric(21,2) |           |          | 0
Indexes:
     "ar_totals_pkey" PRIMARY KEY, btree (row_id)
     "_ar_totals" UNIQUE, btree (ledger_row_id NULLS FIRST, 
location_row_id NULLS FIRST, function_row_id NULLS FIRST, source_code_id 
NULLS FIRST, tran_date NULLS FIRST) WHERE deleted_id = 0
     "ar_tots_cover" btree (ledger_row_id NULLS FIRST, location_row_id 
NULLS FIRST, function_row_id NULLS FIRST, source_code_id NULLS FIRST, 
tran_date DESC NULLS LAST, tran_day NULLS FIRST, tran_tot NULLS FIRST) 
WHERE deleted_id = 0

This is the SELECT -

SELECT
     '2018-03-01' AS op_date, '2018-03-31' AS cl_date,
     cl_bal.source_code_id, op_bal.op_tot, cl_bal.cl_tot
FROM (
     SELECT a.source_code_id, SUM(a.tran_tot) AS cl_tot FROM (
         SELECT source_code_id, tran_tot,
         ROW_NUMBER() OVER (PARTITION BY
         ledger_row_id, location_row_id, function_row_id, source_code_id
         ORDER BY tran_date DESC) row_num
         FROM prop.ar_totals WHERE deleted_id = 0 AND tran_date <= 
'2018-03-31'
             AND ledger_row_id = 1
         ) AS a
     WHERE a.row_num = 1
     GROUP BY a.source_code_id
     ) as cl_bal
LEFT JOIN (
     SELECT a.source_code_id, SUM(a.tran_tot) AS op_tot FROM (
         SELECT source_code_id, tran_tot,
         ROW_NUMBER() OVER (PARTITION BY
         ledger_row_id, location_row_id, function_row_id, source_code_id
         ORDER BY tran_date DESC) row_num
         FROM prop.ar_totals WHERE deleted_id = 0 AND tran_date < 
'2018-03-01'
             AND ledger_row_id = 1
         ) AS a
     WHERE a.row_num = 1
     GROUP BY a.source_code_id
     ) as op_bal
ON op_bal.source_code_id = cl_bal.source_code_id

This is the EXPLAIN -

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop Left Join  (cost=5.66..5.74 rows=1 width=132)
    Join Filter: (a_1.source_code_id = a.source_code_id)
    ->  GroupAggregate  (cost=3.65..3.67 rows=1 width=36)
          Group Key: a.source_code_id
          ->  Sort  (cost=3.65..3.65 rows=1 width=10)
                Sort Key: a.source_code_id
                ->  Subquery Scan on a  (cost=2.36..3.64 rows=1 width=10)
                      Filter: (a.row_num = 1)
                      ->  WindowAgg  (cost=2.36..3.24 rows=32 width=34)
                            ->  Sort  (cost=2.36..2.44 rows=32 width=26)
                                  Sort Key: ar_totals.location_row_id, 
ar_totals.function_row_id, ar_totals.source_code_id, ar_totals.tran_date 
DESC
                                  ->  Seq Scan on ar_totals 
(cost=0.00..1.56 rows=32 width=26)
                                        Filter: ((tran_date <= 
'2018-03-31'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
    ->  GroupAggregate  (cost=2.01..2.03 rows=1 width=36)
          Group Key: a_1.source_code_id
          ->  Sort  (cost=2.01..2.02 rows=1 width=10)
                Sort Key: a_1.source_code_id
                ->  Subquery Scan on a_1  (cost=1.68..2.00 rows=1 width=10)
                      Filter: (a_1.row_num = 1)
                      ->  WindowAgg  (cost=1.68..1.90 rows=8 width=34)
                            ->  Sort  (cost=1.68..1.70 rows=8 width=26)
                                  Sort Key: ar_totals_1.location_row_id, 
ar_totals_1.function_row_id, ar_totals_1.source_code_id, 
ar_totals_1.tran_date DESC
                                  ->  Seq Scan on ar_totals ar_totals_1  
(cost=0.00..1.56 rows=8 width=26)
                                        Filter: ((tran_date < 
'2018-03-01'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
(24 rows)

Maybe SQL Server has a way of optimising this, and there is nothing more 
I can do. I can live with that. But I just thought I would ask the question.

Thanks for any advice.

Frank Millman





Re: SELECT is faster on SQL Server

From
Thomas Kellerer
Date:
Frank Millman schrieb am 19.03.2021 um 09:19:
> This may be a non-issue, and I don't want to waste your time. But perhaps someone can have a look to see if there is
anythingobvious I have missed. 
>
> I am writing a cross-platform accounting app, and I test using Sql
> Server on Windows 10 and PostgreSql on Fedora 31. Performance is
> usually very similar, with a slight edge to PostgreSql. Now I have a
> SELECT which runs over twice as fast on Sql Server compared to
> PostgreSql.
>

Can you change the SELECT statement?

Very often "distinct on ()" is faster in Postgres compared to the equivalent solution using window functions

Something along the lines (for the first derived table):

SELECT ...
FROM (
    SELECT a.source_code_id, SUM(a.tran_tot) AS cl_tot
    FROM (
        SELECT distinct on (location_row_id, function_row_id, source_code_id) source_code_id, tran_tot
        FROM prop.ar_totals
        WHERE deleted_id = 0
          AND tran_date <= '2018-03-31'
          AND ledger_row_id = 1
        ORDER BY location_row_id, function_row_id, source_code_id, tran_date DESC
    ) AS a
    GROUP BY a.source_code_id
) as cl_bal
...

Thomas



Re: SELECT is faster on SQL Server

From
Frank Millman
Date:
On 2021-03-19 10:29 AM, Thomas Kellerer wrote:
> Frank Millman schrieb am 19.03.2021 um 09:19:
>> This may be a non-issue, and I don't want to waste your time. But perhaps someone can have a look to see if there is
anythingobvious I have missed.
 
>>
>> I am writing a cross-platform accounting app, and I test using Sql
>> Server on Windows 10 and PostgreSql on Fedora 31. Performance is
>> usually very similar, with a slight edge to PostgreSql. Now I have a
>> SELECT which runs over twice as fast on Sql Server compared to
>> PostgreSql.
>>
> Can you change the SELECT statement?
>
> Very often "distinct on ()" is faster in Postgres compared to the equivalent solution using window functions
>
> Something along the lines (for the first derived table):
>
> SELECT ...
> FROM (
>      SELECT a.source_code_id, SUM(a.tran_tot) AS cl_tot
>      FROM (
>          SELECT distinct on (location_row_id, function_row_id, source_code_id) source_code_id, tran_tot
>          FROM prop.ar_totals
>          WHERE deleted_id = 0
>            AND tran_date <= '2018-03-31'
>            AND ledger_row_id = 1
>          ORDER BY location_row_id, function_row_id, source_code_id, tran_date DESC
>      ) AS a
>      GROUP BY a.source_code_id
> ) as cl_bal
> ...
Thanks, Thomas

I tried that, and it ran about 10% faster. Every little helps, but SQL 
Server appears to have some secret sauce!

Frank





Re: SELECT is faster on SQL Server

From
Pavel Stehule
Date:


pá 19. 3. 2021 v 9:53 odesílatel Frank Millman <frank@chagford.com> napsal:

On 2021-03-19 10:29 AM, Thomas Kellerer wrote:
> Frank Millman schrieb am 19.03.2021 um 09:19:
>> This may be a non-issue, and I don't want to waste your time. But perhaps someone can have a look to see if there is anything obvious I have missed.
>>
>> I am writing a cross-platform accounting app, and I test using Sql
>> Server on Windows 10 and PostgreSql on Fedora 31. Performance is
>> usually very similar, with a slight edge to PostgreSql. Now I have a
>> SELECT which runs over twice as fast on Sql Server compared to
>> PostgreSql.
>>
> Can you change the SELECT statement?
>
> Very often "distinct on ()" is faster in Postgres compared to the equivalent solution using window functions
>
> Something along the lines (for the first derived table):
>
> SELECT ...
> FROM (
>      SELECT a.source_code_id, SUM(a.tran_tot) AS cl_tot
>      FROM (
>          SELECT distinct on (location_row_id, function_row_id, source_code_id) source_code_id, tran_tot
>          FROM prop.ar_totals
>          WHERE deleted_id = 0
>            AND tran_date <= '2018-03-31'
>            AND ledger_row_id = 1
>          ORDER BY location_row_id, function_row_id, source_code_id, tran_date DESC
>      ) AS a
>      GROUP BY a.source_code_id
> ) as cl_bal
> ...
Thanks, Thomas

I tried that, and it ran about 10% faster. Every little helps, but SQL
Server appears to have some secret sauce!

can you send a result of EXPLAIN ANALYZE?

Pavel


Frank




Re: SELECT is faster on SQL Server

From
Thomas Kellerer
Date:
Frank Millman schrieb am 19.03.2021 um 09:52:
>>> I am writing a cross-platform accounting app, and I test using Sql
>>> Server on Windows 10 and PostgreSql on Fedora 31. Performance is
>>> usually very similar, with a slight edge to PostgreSql. Now I have a
>>> SELECT which runs over twice as fast on Sql Server compared to
>>> PostgreSql.
>>>
>> Can you change the SELECT statement?
>>
>> Very often "distinct on ()" is faster in Postgres compared to the equivalent solution using window functions
>
> Thanks, Thomas
>
> I tried that, and it ran about 10% faster. Every little helps, but SQL Server appears to have some secret sauce!

The two derived tables (cl_bal, op_bal) seem to be doing exactly the same thing - at least I can't spot a difference.

If that is correct, you can move them into a common table expression - maybe detecting that is SQL Server's secret
sauce.

    with totals as (
       SELECT a.source_code_id, SUM(a.tran_tot) AS total
       FROM (
           SELECT distinct on (location_row_id, function_row_id, source_code_id) source_code_id, tran_tot
           FROM prop.ar_totals
           WHERE deleted_id = 0
             AND tran_date <= '2018-03-31'
             AND ledger_row_id = 1
           ORDER BY location_row_id, function_row_id, source_code_id, tran_date DESC
       ) AS a
       GROUP BY a.source_code_id
    )
    select
        '2018-03-01' AS op_date, '2018-03-31' AS cl_date,
        cl_bal.source_code_id, op_bal.total as op_tot, cl_bal.total.cl_tot
    FROM totals as cl_bal
      LEFT JOIN totals as op_bal ON op_bal.source_code_id = cl_bal.source_code_id;



Re: SELECT is faster on SQL Server

From
Frank Millman
Date:
On 2021-03-19 11:04 AM, Thomas Kellerer wrote:
> Frank Millman schrieb am 19.03.2021 um 09:52:
>>>> I am writing a cross-platform accounting app, and I test using Sql
>>>> Server on Windows 10 and PostgreSql on Fedora 31. Performance is
>>>> usually very similar, with a slight edge to PostgreSql. Now I have a
>>>> SELECT which runs over twice as fast on Sql Server compared to
>>>> PostgreSql.
>>>>
>>> Can you change the SELECT statement?
>>>
>>> Very often "distinct on ()" is faster in Postgres compared to the equivalent solution using window functions
>> Thanks, Thomas
>>
>> I tried that, and it ran about 10% faster. Every little helps, but SQL Server appears to have some secret sauce!
> The two derived tables (cl_bal, op_bal) seem to be doing exactly the same thing - at least I can't spot a
difference.
>
> If that is correct, you can move them into a common table expression - maybe detecting that is SQL Server's secret
sauce.
>
>      with totals as (
>         SELECT a.source_code_id, SUM(a.tran_tot) AS total
>         FROM (
>             SELECT distinct on (location_row_id, function_row_id, source_code_id) source_code_id, tran_tot
>             FROM prop.ar_totals
>             WHERE deleted_id = 0
>               AND tran_date <= '2018-03-31'
>               AND ledger_row_id = 1
>             ORDER BY location_row_id, function_row_id, source_code_id, tran_date DESC
>         ) AS a
>         GROUP BY a.source_code_id
>      )
>      select
>          '2018-03-01' AS op_date, '2018-03-31' AS cl_date,
>          cl_bal.source_code_id, op_bal.total as op_tot, cl_bal.total.cl_tot
>      FROM totals as cl_bal
>        LEFT JOIN totals as op_bal ON op_bal.source_code_id = cl_bal.source_code_id;
>
There is a difference.

cl_bal selects WHERE tran_date <= '2018-03-31'.

op_bal selects WHERE tran_date < '2018-03-01'.

The second one could be written as WHERE tran_date <= '2018-02-28', but 
I don't think that would make any difference.

Frank





Re: SELECT is faster on SQL Server

From
Frank Millman
Date:


On 2021-03-19 10:56 AM, Pavel Stehule wrote:


pá 19. 3. 2021 v 9:53 odesílatel Frank Millman <frank@chagford.com> napsal:

On 2021-03-19 10:29 AM, Thomas Kellerer wrote:
> Frank Millman schrieb am 19.03.2021 um 09:19:
>> This may be a non-issue, and I don't want to waste your time. But perhaps someone can have a look to see if there is anything obvious I have missed.
>>
>> I am writing a cross-platform accounting app, and I test using Sql
>> Server on Windows 10 and PostgreSql on Fedora 31. Performance is
>> usually very similar, with a slight edge to PostgreSql. Now I have a
>> SELECT which runs over twice as fast on Sql Server compared to
>> PostgreSql.
>>
> Can you change the SELECT statement?
>
> Very often "distinct on ()" is faster in Postgres compared to the equivalent solution using window functions
>
> Something along the lines (for the first derived table):
>
> SELECT ...
> FROM (
>      SELECT a.source_code_id, SUM(a.tran_tot) AS cl_tot
>      FROM (
>          SELECT distinct on (location_row_id, function_row_id, source_code_id) source_code_id, tran_tot
>          FROM prop.ar_totals
>          WHERE deleted_id = 0
>            AND tran_date <= '2018-03-31'
>            AND ledger_row_id = 1
>          ORDER BY location_row_id, function_row_id, source_code_id, tran_date DESC
>      ) AS a
>      GROUP BY a.source_code_id
> ) as cl_bal
> ...
Thanks, Thomas

I tried that, and it ran about 10% faster. Every little helps, but SQL
Server appears to have some secret sauce!

can you send a result of EXPLAIN ANALYZE?

                                                                         QUERY PLAN                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=5.66..5.74 rows=1 width=132) (actual time=0.213..0.248 rows=5 loops=1)
   Join Filter: (a_1.source_code_id = a.source_code_id)
   Rows Removed by Join Filter: 4
   ->  GroupAggregate  (cost=3.65..3.67 rows=1 width=36) (actual time=0.144..0.157 rows=5 loops=1)
         Group Key: a.source_code_id
         ->  Sort  (cost=3.65..3.65 rows=1 width=10) (actual time=0.131..0.135 rows=29 loops=1)
               Sort Key: a.source_code_id
               Sort Method: quicksort  Memory: 26kB
               ->  Subquery Scan on a  (cost=2.36..3.64 rows=1 width=10) (actual time=0.063..0.116 rows=29 loops=1)
                     Filter: (a.row_num = 1)
                     Rows Removed by Filter: 3
                     ->  WindowAgg  (cost=2.36..3.24 rows=32 width=34) (actual time=0.062..0.107 rows=32 loops=1)
                           ->  Sort  (cost=2.36..2.44 rows=32 width=26) (actual time=0.054..0.059 rows=32 loops=1)
                                 Sort Key: ar_totals.location_row_id, ar_totals.function_row_id, ar_totals.source_code_id, ar_totals.tran_date DESC
                                 Sort Method: quicksort  Memory: 27kB
                                 ->  Seq Scan on ar_totals  (cost=0.00..1.56 rows=32 width=26) (actual time=0.014..0.028 rows=32 loops=1)
                                       Filter: ((tran_date <= '2018-03-31'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
   ->  GroupAggregate  (cost=2.01..2.03 rows=1 width=36) (actual time=0.017..0.017 rows=1 loops=5)
         Group Key: a_1.source_code_id
         ->  Sort  (cost=2.01..2.02 rows=1 width=10) (actual time=0.012..0.013 rows=8 loops=5)
               Sort Key: a_1.source_code_id
               Sort Method: quicksort  Memory: 25kB
               ->  Subquery Scan on a_1  (cost=1.68..2.00 rows=1 width=10) (actual time=0.032..0.047 rows=8 loops=1)
                     Filter: (a_1.row_num = 1)
                     ->  WindowAgg  (cost=1.68..1.90 rows=8 width=34) (actual time=0.031..0.043 rows=8 loops=1)
                           ->  Sort  (cost=1.68..1.70 rows=8 width=26) (actual time=0.023..0.024 rows=8 loops=1)
                                 Sort Key: ar_totals_1.location_row_id, ar_totals_1.function_row_id, ar_totals_1.source_code_id, ar_totals_1.tran_date DESC
                                 Sort Method: quicksort  Memory: 25kB
                                 ->  Seq Scan on ar_totals ar_totals_1  (cost=0.00..1.56 rows=8 width=26) (actual time=0.006..0.013 rows=8 loops=1)
                                       Filter: ((tran_date < '2018-03-01'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
                                       Rows Removed by Filter: 24
 Planning Time: 0.479 ms
 Execution Time: 0.344 ms
(33 rows)



Re: SELECT is faster on SQL Server

From
Pavel Stehule
Date:


pá 19. 3. 2021 v 10:22 odesílatel Frank Millman <frank@chagford.com> napsal:


On 2021-03-19 10:56 AM, Pavel Stehule wrote:


pá 19. 3. 2021 v 9:53 odesílatel Frank Millman <frank@chagford.com> napsal:

On 2021-03-19 10:29 AM, Thomas Kellerer wrote:
> Frank Millman schrieb am 19.03.2021 um 09:19:
>> This may be a non-issue, and I don't want to waste your time. But perhaps someone can have a look to see if there is anything obvious I have missed.
>>
>> I am writing a cross-platform accounting app, and I test using Sql
>> Server on Windows 10 and PostgreSql on Fedora 31. Performance is
>> usually very similar, with a slight edge to PostgreSql. Now I have a
>> SELECT which runs over twice as fast on Sql Server compared to
>> PostgreSql.
>>
> Can you change the SELECT statement?
>
> Very often "distinct on ()" is faster in Postgres compared to the equivalent solution using window functions
>
> Something along the lines (for the first derived table):
>
> SELECT ...
> FROM (
>      SELECT a.source_code_id, SUM(a.tran_tot) AS cl_tot
>      FROM (
>          SELECT distinct on (location_row_id, function_row_id, source_code_id) source_code_id, tran_tot
>          FROM prop.ar_totals
>          WHERE deleted_id = 0
>            AND tran_date <= '2018-03-31'
>            AND ledger_row_id = 1
>          ORDER BY location_row_id, function_row_id, source_code_id, tran_date DESC
>      ) AS a
>      GROUP BY a.source_code_id
> ) as cl_bal
> ...
Thanks, Thomas

I tried that, and it ran about 10% faster. Every little helps, but SQL
Server appears to have some secret sauce!

can you send a result of EXPLAIN ANALYZE?

                                                                         QUERY PLAN                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=5.66..5.74 rows=1 width=132) (actual time=0.213..0.248 rows=5 loops=1)
   Join Filter: (a_1.source_code_id = a.source_code_id)
   Rows Removed by Join Filter: 4
   ->  GroupAggregate  (cost=3.65..3.67 rows=1 width=36) (actual time=0.144..0.157 rows=5 loops=1)
         Group Key: a.source_code_id
         ->  Sort  (cost=3.65..3.65 rows=1 width=10) (actual time=0.131..0.135 rows=29 loops=1)
               Sort Key: a.source_code_id
               Sort Method: quicksort  Memory: 26kB
               ->  Subquery Scan on a  (cost=2.36..3.64 rows=1 width=10) (actual time=0.063..0.116 rows=29 loops=1)
                     Filter: (a.row_num = 1)
                     Rows Removed by Filter: 3
                     ->  WindowAgg  (cost=2.36..3.24 rows=32 width=34) (actual time=0.062..0.107 rows=32 loops=1)
                           ->  Sort  (cost=2.36..2.44 rows=32 width=26) (actual time=0.054..0.059 rows=32 loops=1)
                                 Sort Key: ar_totals.location_row_id, ar_totals.function_row_id, ar_totals.source_code_id, ar_totals.tran_date DESC
                                 Sort Method: quicksort  Memory: 27kB
                                 ->  Seq Scan on ar_totals  (cost=0.00..1.56 rows=32 width=26) (actual time=0.014..0.028 rows=32 loops=1)
                                       Filter: ((tran_date <= '2018-03-31'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
   ->  GroupAggregate  (cost=2.01..2.03 rows=1 width=36) (actual time=0.017..0.017 rows=1 loops=5)
         Group Key: a_1.source_code_id
         ->  Sort  (cost=2.01..2.02 rows=1 width=10) (actual time=0.012..0.013 rows=8 loops=5)
               Sort Key: a_1.source_code_id
               Sort Method: quicksort  Memory: 25kB
               ->  Subquery Scan on a_1  (cost=1.68..2.00 rows=1 width=10) (actual time=0.032..0.047 rows=8 loops=1)
                     Filter: (a_1.row_num = 1)
                     ->  WindowAgg  (cost=1.68..1.90 rows=8 width=34) (actual time=0.031..0.043 rows=8 loops=1)
                           ->  Sort  (cost=1.68..1.70 rows=8 width=26) (actual time=0.023..0.024 rows=8 loops=1)
                                 Sort Key: ar_totals_1.location_row_id, ar_totals_1.function_row_id, ar_totals_1.source_code_id, ar_totals_1.tran_date DESC
                                 Sort Method: quicksort  Memory: 25kB
                                 ->  Seq Scan on ar_totals ar_totals_1  (cost=0.00..1.56 rows=8 width=26) (actual time=0.006..0.013 rows=8 loops=1)
                                       Filter: ((tran_date < '2018-03-01'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
                                       Rows Removed by Filter: 24
 Planning Time: 0.479 ms
 Execution Time: 0.344 ms
(33 rows)


In this query the most slow operation is query planning. You try to do tests on almost empty tables. This has no practical sense. You should test queries on tables with size similar to production size.

The times less 1 ms has significant variance, and are not comparable.

Regards

Pavel


 

Re: SELECT is faster on SQL Server

From
Frank Millman
Date:


On 2021-03-19 12:00 PM, Pavel Stehule wrote:

In this query the most slow operation is query planning. You try to do tests on almost empty tables. This has no practical sense. You should test queries on tables with size similar to production size.

Sorry about that. I hope this one is better. Same query, different data set.

                                                                         QUERY PLAN                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=1401.00..1401.12 rows=1 width=132) (actual time=3.595..3.611 rows=5 loops=1)
   Merge Cond: (a.source_code_id = a_1.source_code_id)
   ->  GroupAggregate  (cost=673.16..673.18 rows=1 width=36) (actual time=1.101..1.108 rows=5 loops=1)
         Group Key: a.source_code_id
         ->  Sort  (cost=673.16..673.16 rows=1 width=12) (actual time=1.092..1.093 rows=5 loops=1)
               Sort Key: a.source_code_id
               Sort Method: quicksort  Memory: 25kB
               ->  Subquery Scan on a  (cost=670.67..673.15 rows=1 width=12) (actual time=1.008..1.086 rows=5 loops=1)
                     Filter: (a.row_num = 1)
                     Rows Removed by Filter: 59
                     ->  WindowAgg  (cost=670.67..672.37 rows=62 width=36) (actual time=1.006..1.076 rows=64 loops=1)
                           ->  Sort  (cost=670.67..670.82 rows=62 width=28) (actual time=0.996..1.004 rows=64 loops=1)
                                 Sort Key: ar_totals.location_row_id, ar_totals.function_row_id, ar_totals.source_code_id, ar_totals.tran_date DESC
                                 Sort Method: quicksort  Memory: 30kB
                                 ->  Seq Scan on ar_totals  (cost=0.00..668.82 rows=62 width=28) (actual time=0.012..0.933 rows=64 loops=1)
                                       Filter: ((tran_date <= '2015-04-30'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
                                       Rows Removed by Filter: 840
   ->  GroupAggregate  (cost=727.85..727.89 rows=2 width=36) (actual time=2.490..2.495 rows=5 loops=1)
         Group Key: a_1.source_code_id
         ->  Sort  (cost=727.85..727.85 rows=3 width=12) (actual time=2.485..2.485 rows=5 loops=1)
               Sort Key: a_1.source_code_id
               Sort Method: quicksort  Memory: 25kB
               ->  Subquery Scan on a_1  (cost=700.70..727.82 rows=3 width=12) (actual time=1.684..2.479 rows=5 loops=1)
                     Filter: (a_1.row_num = 1)
                     Rows Removed by Filter: 674
                     ->  WindowAgg  (cost=700.70..719.35 rows=678 width=36) (actual time=1.682..2.397 rows=679 loops=1)
                           ->  Sort  (cost=700.70..702.40 rows=678 width=28) (actual time=1.676..1.758 rows=679 loops=1)
                                 Sort Key: ar_totals_1.location_row_id, ar_totals_1.function_row_id, ar_totals_1.source_code_id, ar_totals_1.tran_date DESC
                                 Sort Method: quicksort  Memory: 78kB
                                 ->  Seq Scan on ar_totals ar_totals_1  (cost=0.00..668.82 rows=678 width=28) (actual time=0.007..0.836 rows=679 loops=1)
                                       Filter: ((tran_date < '2015-09-01'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
                                       Rows Removed by Filter: 225
 Planning Time: 0.496 ms
 Execution Time: 3.695 ms
(34 rows)


Re: SELECT is faster on SQL Server

From
hubert depesz lubaczewski
Date:
On Fri, Mar 19, 2021 at 12:58:10PM +0200, Frank Millman wrote:
> On 2021-03-19 12:00 PM, Pavel Stehule wrote:
> 
>   In this query the most slow operation is query planning. You try to do tests on almost empty tables. This has no
practicalsense.
 
>   You should test queries on tables with size similar to production size.
> 
> Sorry about that. I hope this one is better. Same query, different data set.

For starters, I'm not really sure it makes sense to optimize a query
that runs in 3.5 miliseconds!

Having said that, after putting the plan on explain.depesz.com, I got:
https://explain.depesz.com/s/xZel

Which shows that ~ 50% of time was spent in scan on ar_totals and
sorting it.

You seem to have some really weird indexed on ar_totals created (mixed
of nulls ordering).

Why don't you start with simple:
create index q on ar_totals (ledger_row_id, tran_date) where deleted_id = 0;

But, again - either you're overthinking performance of a query that can
run over 200 times per second on single core, or you're testing it with
different data than the one that is really a problem.

Best regards,

depesz




Re: SELECT is faster on SQL Server

From
Pavel Stehule
Date:


pá 19. 3. 2021 v 11:58 odesílatel Frank Millman <frank@chagford.com> napsal:


On 2021-03-19 12:00 PM, Pavel Stehule wrote:

In this query the most slow operation is query planning. You try to do tests on almost empty tables. This has no practical sense. You should test queries on tables with size similar to production size.

Sorry about that. I hope this one is better. Same query, different data set.

                                                                         QUERY PLAN                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=1401.00..1401.12 rows=1 width=132) (actual time=3.595..3.611 rows=5 loops=1)
   Merge Cond: (a.source_code_id = a_1.source_code_id)
   ->  GroupAggregate  (cost=673.16..673.18 rows=1 width=36) (actual time=1.101..1.108 rows=5 loops=1)
         Group Key: a.source_code_id
         ->  Sort  (cost=673.16..673.16 rows=1 width=12) (actual time=1.092..1.093 rows=5 loops=1)
               Sort Key: a.source_code_id
               Sort Method: quicksort  Memory: 25kB
               ->  Subquery Scan on a  (cost=670.67..673.15 rows=1 width=12) (actual time=1.008..1.086 rows=5 loops=1)
                     Filter: (a.row_num = 1)
                     Rows Removed by Filter: 59
                     ->  WindowAgg  (cost=670.67..672.37 rows=62 width=36) (actual time=1.006..1.076 rows=64 loops=1)
                           ->  Sort  (cost=670.67..670.82 rows=62 width=28) (actual time=0.996..1.004 rows=64 loops=1)
                                 Sort Key: ar_totals.location_row_id, ar_totals.function_row_id, ar_totals.source_code_id, ar_totals.tran_date DESC
                                 Sort Method: quicksort  Memory: 30kB
                                 ->  Seq Scan on ar_totals  (cost=0.00..668.82 rows=62 width=28) (actual time=0.012..0.933 rows=64 loops=1)
                                       Filter: ((tran_date <= '2015-04-30'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
                                       Rows Removed by Filter: 840
   ->  GroupAggregate  (cost=727.85..727.89 rows=2 width=36) (actual time=2.490..2.495 rows=5 loops=1)
         Group Key: a_1.source_code_id
         ->  Sort  (cost=727.85..727.85 rows=3 width=12) (actual time=2.485..2.485 rows=5 loops=1)
               Sort Key: a_1.source_code_id
               Sort Method: quicksort  Memory: 25kB
               ->  Subquery Scan on a_1  (cost=700.70..727.82 rows=3 width=12) (actual time=1.684..2.479 rows=5 loops=1)
                     Filter: (a_1.row_num = 1)
                     Rows Removed by Filter: 674
                     ->  WindowAgg  (cost=700.70..719.35 rows=678 width=36) (actual time=1.682..2.397 rows=679 loops=1)
                           ->  Sort  (cost=700.70..702.40 rows=678 width=28) (actual time=1.676..1.758 rows=679 loops=1)
                                 Sort Key: ar_totals_1.location_row_id, ar_totals_1.function_row_id, ar_totals_1.source_code_id, ar_totals_1.tran_date DESC
                                 Sort Method: quicksort  Memory: 78kB
                                 ->  Seq Scan on ar_totals ar_totals_1  (cost=0.00..668.82 rows=678 width=28) (actual time=0.007..0.836 rows=679 loops=1)
                                       Filter: ((tran_date < '2015-09-01'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
                                       Rows Removed by Filter: 225
 Planning Time: 0.496 ms
 Execution Time: 3.695 ms
(34 rows)


The most slow operation here is seq scan and sort of  ar_totals, but still the 4ms query is pretty fast. Maybe MSSQL server can read data faster. Did you run VACUUM on your table?

MSSQL has a more simple data format - so maybe seq scan can be faster.


 

Re: SELECT is faster on SQL Server

From
Frank Millman
Date:
On 2021-03-19 12:58 PM, Frank Millman wrote:
> QUERY PLAN
>
>
------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Merge Left Join  (cost=1401.00..1401.12 rows=1 width=132) (actual 
> time=3.595..3.611 rows=5 loops=1)
>    Merge Cond: (a.source_code_id = a_1.source_code_id)
>    ->  GroupAggregate  (cost=673.16..673.18 rows=1 width=36) (actual 
> time=1.101..1.108 rows=5 loops=1)
>          Group Key: a.source_code_id
>          ->  Sort  (cost=673.16..673.16 rows=1 width=12) (actual 
> time=1.092..1.093 rows=5 loops=1)
>                Sort Key: a.source_code_id
>                Sort Method: quicksort  Memory: 25kB
>                ->  Subquery Scan on a  (cost=670.67..673.15 rows=1 
> width=12) (actual time=1.008..1.086 rows=5 loops=1)
>                      Filter: (a.row_num = 1)
>                      Rows Removed by Filter: 59
>                      ->  WindowAgg  (cost=670.67..672.37 rows=62 
> width=36) (actual time=1.006..1.076 rows=64 loops=1)
>                            ->  Sort  (cost=670.67..670.82 rows=62 
> width=28) (actual time=0.996..1.004 rows=64 loops=1)
>                                  Sort Key: ar_totals.location_row_id, 
> ar_totals.function_row_id, ar_totals.source_code_id, 
> ar_totals.tran_date DESC
>                                  Sort Method: quicksort  Memory: 30kB
>                                  ->  Seq Scan on ar_totals 
> (cost=0.00..668.82 rows=62 width=28) (actual time=0.012..0.933 rows=64 
> loops=1)
>                                        Filter: ((tran_date <= 
> '2015-04-30'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
>                                        Rows Removed by Filter: 840
>    ->  GroupAggregate  (cost=727.85..727.89 rows=2 width=36) (actual 
> time=2.490..2.495 rows=5 loops=1)
>          Group Key: a_1.source_code_id
>          ->  Sort  (cost=727.85..727.85 rows=3 width=12) (actual 
> time=2.485..2.485 rows=5 loops=1)
>                Sort Key: a_1.source_code_id
>                Sort Method: quicksort  Memory: 25kB
>                ->  Subquery Scan on a_1  (cost=700.70..727.82 rows=3 
> width=12) (actual time=1.684..2.479 rows=5 loops=1)
>                      Filter: (a_1.row_num = 1)
>                      Rows Removed by Filter: 674
>                      ->  WindowAgg  (cost=700.70..719.35 rows=678 
> width=36) (actual time=1.682..2.397 rows=679 loops=1)
>                            ->  Sort  (cost=700.70..702.40 rows=678 
> width=28) (actual time=1.676..1.758 rows=679 loops=1)
>                                  Sort Key: 
> ar_totals_1.location_row_id, ar_totals_1.function_row_id, 
> ar_totals_1.source_code_id, ar_totals_1.tran_date DESC
>                                  Sort Method: quicksort  Memory: 78kB
>                                  ->  Seq Scan on ar_totals 
> ar_totals_1  (cost=0.00..668.82 rows=678 width=28) (actual 
> time=0.007..0.836 rows=679 loops=1)
>                                        Filter: ((tran_date < 
> '2015-09-01'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
>                                        Rows Removed by Filter: 225
>  Planning Time: 0.496 ms
>  Execution Time: 3.695 ms
> (34 rows)
>

@Pavel & depesz

Thanks for the replies. I am now focusing on the index. I tried dropping 
the index 'ar_tots_cover', and then adding back the index columns one at 
a time. Adding 'tran_date desc' made a small difference. Adding 
'tran_day' and 'tran_tot' made a big difference. This changed the index 
into a 'covering' index, and this is reflected in the new EXPLAIN 
ANALYSE (see below).

Execution of my main query has improved from 50ms to 33ms. Sql Server 
takes 25ms, but this is much better than it was.

However, the bizarre thing is that I have simply restored the index to 
what it was in the first place. If you look at the table definition in 
my original message you can see that all the columns were included in 
the index. But the query did not use it as a covering index. Now the 
EXPLAIN ANALYSE clearly shows 'Index Only Scan using ar_tots_cover'. I 
have no idea what changed.

Here is the new EXPLAIN ANALYSE -

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Merge Left Join  (cost=161.39..161.51 rows=1 width=132) (actual 
time=1.566..1.581 rows=5 loops=1)
    Merge Cond: (a.source_code_id = a_1.source_code_id)
    ->  GroupAggregate  (cost=50.27..50.29 rows=1 width=36) (actual 
time=0.226..0.232 rows=5 loops=1)
          Group Key: a.source_code_id
          ->  Sort  (cost=50.27..50.28 rows=1 width=12) (actual 
time=0.217..0.218 rows=5 loops=1)
                Sort Key: a.source_code_id
                Sort Method: quicksort  Memory: 25kB
                ->  Subquery Scan on a  (cost=47.78..50.26 rows=1 
width=12) (actual time=0.135..0.212 rows=5 loops=1)
                      Filter: (a.row_num = 1)
                      Rows Removed by Filter: 59
                      ->  WindowAgg  (cost=47.78..49.49 rows=62 
width=36) (actual time=0.133..0.202 rows=64 loops=1)
                            ->  Sort  (cost=47.78..47.94 rows=62 
width=28) (actual time=0.124..0.132 rows=64 loops=1)
                                  Sort Key: ar_totals.location_row_id, 
ar_totals.function_row_id, ar_totals.source_code_id, ar_totals.tran_date 
DESC
                                  Sort Method: quicksort  Memory: 30kB
                                  ->  Index Only Scan using 
ar_tots_cover on ar_totals  (cost=0.28..45.93 rows=62 width=28) (actual 
time=0.038..0.094 rows=64 loops=1)
                                        Index Cond: ((ledger_row_id = 1) 
AND (tran_date <= '2015-04-30'::date))
                                        Heap Fetches: 0
    ->  GroupAggregate  (cost=111.12..111.17 rows=2 width=36) (actual 
time=1.337..1.342 rows=5 loops=1)
          Group Key: a_1.source_code_id
          ->  Sort  (cost=111.12..111.13 rows=3 width=12) (actual 
time=1.333..1.334 rows=5 loops=1)
                Sort Key: a_1.source_code_id
                Sort Method: quicksort  Memory: 25kB
                ->  Subquery Scan on a_1  (cost=83.98..111.10 rows=3 
width=12) (actual time=0.538..1.328 rows=5 loops=1)
                      Filter: (a_1.row_num = 1)
                      Rows Removed by Filter: 674
                      ->  WindowAgg  (cost=83.98..102.62 rows=678 
width=36) (actual time=0.537..1.248 rows=679 loops=1)
                            ->  Sort  (cost=83.98..85.67 rows=678 
width=28) (actual time=0.531..0.613 rows=679 loops=1)
                                  Sort Key: ar_totals_1.location_row_id, 
ar_totals_1.function_row_id, ar_totals_1.source_code_id, 
ar_totals_1.tran_date DESC
                                  Sort Method: quicksort  Memory: 78kB
                                  ->  Index Only Scan using 
ar_tots_cover on ar_totals ar_totals_1  (cost=0.28..52.09 rows=678 
width=28) (actual time=0.017..0.270 rows=679 loops=1)
                                        Index Cond: ((ledger_row_id = 1) 
AND (tran_date < '2015-09-01'::date))
                                        Heap Fetches: 0
  Planning Time: 0.504 ms
  Execution Time: 1.673 ms
(34 rows)

I am happy to leave this here. Thanks for all the assistance.

Frank





Re: SELECT is faster on SQL Server

From
Tom Lane
Date:
Frank Millman <frank@chagford.com> writes:
> However, the bizarre thing is that I have simply restored the index to 
> what it was in the first place. If you look at the table definition in 
> my original message you can see that all the columns were included in 
> the index. But the query did not use it as a covering index. Now the 
> EXPLAIN ANALYSE clearly shows 'Index Only Scan using ar_tots_cover'. I 
> have no idea what changed.

VACUUM, maybe?  Even if there's a covering index, the planner is not
likely to prefer an index-only scan unless it thinks that most of the
table's pages are known all-visible.  If they're not, most of the
rows will require heap probes anyway to check row visibility, meaning
that the "index-only" scan's performance degrades to about that of a
regular indexscan.

In this example, since you're fetching such a large fraction of the
table (which the planner is accurately estimating), there's not a lot
of daylight between the estimated costs of seqscan and index-only
scan to begin with.  I'm not surprised that it'd prefer the former
if the table isn't recently vacuumed.

            regards, tom lane



Re: SELECT is faster on SQL Server

From
Jehan-Guillaume de Rorthais
Date:
On Fri, 19 Mar 2021 14:28:27 +0200
Frank Millman <frank@chagford.com> wrote:

> [...]
> Execution of my main query has improved from 50ms to 33ms. Sql Server
> takes 25ms, but this is much better than it was.
>
> [...]
>
> Here is the new EXPLAIN ANALYSE -
>
>                            QUERY PLAN
> --------------------------------------------------------------------
>   Merge Left Join  (...) (actual time=1.566..1.581 rows=5 loops=1)

1.581ms to output the very last row of this plan. This is in contradiction with
the 33ms you are referencing above.

What do I miss here? Maybe your 33ms comes yet from another set of data? Could
you share an explain analyze actually showing this 33ms total execution time?



Re: SELECT is faster on SQL Server

From
Thomas Kellerer
Date:
Frank Millman schrieb am 19.03.2021 um 10:16:
>>>> Very often "distinct on ()" is faster in Postgres compared to the equivalent solution using window functions
>> The two derived tables (cl_bal, op_bal) seem to be doing exactly the same thing - at least I can't spot a
difference.
>>
>> If that is correct, you can move them into a common table expression - maybe detecting that is SQL Server's secret
sauce.
>>
> There is a difference.
>
> cl_bal selects WHERE tran_date <= '2018-03-31'.
>
> op_bal selects WHERE tran_date < '2018-03-01'.
>
> The second one could be written as WHERE tran_date <= '2018-02-28', but I don't think that would make any difference.

I knew I overlooked something ;)

But as one is a true subset of the other, I think you can merge that into a single SELECT statement:

     select '2018-03-01' AS op_date,
            '2018-03-31' AS cl_date,
            a.source_code_id,
            sum(a.tran_tot) AS cl_tot,
            sum(a.tran_tot) filter (where tran_date < '2018-03-01') AS op_tot
     FROM (
        SELECT distinct on (location_row_id, function_row_id, source_code_id) source_code_id, tran_tot, tran_date
        FROM prop.ar_totals
        WHERE deleted_id = 0
          AND tran_date <= '2018-03-31'
          AND ledger_row_id = 1
        ORDER BY location_row_id, function_row_id, source_code_id, tran_date DESC
     ) AS a
     GROUP BY a.source_code_id




Re: SELECT is faster on SQL Server

From
Frank Millman
Date:
On 2021-03-19 4:38 PM, Tom Lane wrote:
> Frank Millman <frank@chagford.com> writes:
>> However, the bizarre thing is that I have simply restored the index to
>> what it was in the first place. If you look at the table definition in
>> my original message you can see that all the columns were included in
>> the index. But the query did not use it as a covering index. Now the
>> EXPLAIN ANALYSE clearly shows 'Index Only Scan using ar_tots_cover'. I
>> have no idea what changed.
> VACUUM, maybe?  Even if there's a covering index, the planner is not
> likely to prefer an index-only scan unless it thinks that most of the
> table's pages are known all-visible.  If they're not, most of the
> rows will require heap probes anyway to check row visibility, meaning
> that the "index-only" scan's performance degrades to about that of a
> regular indexscan.
>
> In this example, since you're fetching such a large fraction of the
> table (which the planner is accurately estimating), there's not a lot
> of daylight between the estimated costs of seqscan and index-only
> scan to begin with.  I'm not surprised that it'd prefer the former
> if the table isn't recently vacuumed.

It is possible. I know that I *did* vacuum. But I also ran a program to 
generate a few hundred additional rows, and I cannot remember if I ran 
the vacuum before or after that.

Frank





Re: SELECT is faster on SQL Server

From
Frank Millman
Date:
On 2021-03-19 5:32 PM, Jehan-Guillaume de Rorthais wrote:
> On Fri, 19 Mar 2021 14:28:27 +0200
> Frank Millman <frank@chagford.com> wrote:
>
>> [...]
>> Execution of my main query has improved from 50ms to 33ms. Sql Server
>> takes 25ms, but this is much better than it was.
>>
>> [...]
>>
>> Here is the new EXPLAIN ANALYSE -
>>
>>                             QUERY PLAN
>> --------------------------------------------------------------------
>>    Merge Left Join  (...) (actual time=1.566..1.581 rows=5 loops=1)
> 1.581ms to output the very last row of this plan. This is in contradiction with
> the 33ms you are referencing above.
>
> What do I miss here? Maybe your 33ms comes yet from another set of data? Could
> you share an explain analyze actually showing this 33ms total execution time?

Sorry, I should have explained.

The query I showed selects data for a single month. The 'real' query 
repeats this 12 times, each with different dates, and combines the 
results using UNION ALL. This was the timing mentioned above.

BTW, I know that I can improve this by setting up the dates in a CTE and 
using JOIN LATERAL. I am avoiding this as it is not supported by SQL 
Server or sqlite3, and I am trying to stick to one code base for all 
databases. But I will look into it further.

Frank





Re: SELECT is faster on SQL Server

From
Frank Millman
Date:
On 2021-03-19 7:11 PM, Thomas Kellerer wrote:
> Frank Millman schrieb am 19.03.2021 um 10:16:
>>
>> cl_bal selects WHERE tran_date <= '2018-03-31'.
>>
>> op_bal selects WHERE tran_date < '2018-03-01'.
>>
>> The second one could be written as WHERE tran_date <= '2018-02-28', 
>> but I don't think that would make any difference.
>
> I knew I overlooked something ;)
>
> But as one is a true subset of the other, I think you can merge that 
> into a single SELECT statement:
>
>     select '2018-03-01' AS op_date,
>            '2018-03-31' AS cl_date,
>            a.source_code_id,
>            sum(a.tran_tot) AS cl_tot,
>            sum(a.tran_tot) filter (where tran_date < '2018-03-01') AS 
> op_tot
>     FROM (
>        SELECT distinct on (location_row_id, function_row_id, 
> source_code_id) source_code_id, tran_tot, tran_date
>        FROM prop.ar_totals
>        WHERE deleted_id = 0
>          AND tran_date <= '2018-03-31'
>          AND ledger_row_id = 1
>        ORDER BY location_row_id, function_row_id, source_code_id, 
> tran_date DESC
>     ) AS a
>     GROUP BY a.source_code_id
>
Thanks very much Thomas - I did not know about FILTER.

But it does not quite work. If the SELECT does find a row where the max 
tran_date is <= '2018-03-31' it correctly includes it in 'cl_tot'. But 
the filter returns nothing for 'op_tot' because there is no 
corresponding row where tran_date < '2018-03-01'.

But I have learned something new, so thanks for that.

Frank