Thread: Very slow Query compared to Oracle / SQL - Server

Very slow Query compared to Oracle / SQL - Server

From
Semen Yefimenko
Date:
Hi there,

I've recently been involved in migrating our old system to SQL Server and then PostgreSQL. Everything has been working fine so far, but after running our tests on Postgres we saw a very slow query on a large table in our database.
I have tried asking on other platforms, but no one has been able to give me a satisfying answer.

Postgres version: PostgreSQL 12.2, compiled by Visual C++ build 1914, 64-bit
No notable errors in the server log or from the Postgres server itself.

The table structure:

CREATE TABLE logtable
(
    key character varying(20) COLLATE pg_catalog."default" NOT NULL,
    id integer,
    column3 integer,
    column4 integer,
    column5 integer,
    column6 integer,
    column7 integer,
    column8 integer,
    column9 character varying(128) COLLATE pg_catalog."default",
    column10   character varying(2048) COLLATE pg_catalog."default",
    column11   character varying(2048) COLLATE pg_catalog."default",
    column12   character varying(2048) COLLATE pg_catalog."default",
    column13   character varying(2048) COLLATE pg_catalog."default",
    column14   character varying(2048) COLLATE pg_catalog."default",
    column15   character varying(2048) COLLATE pg_catalog."default",
    column16   character varying(2048) COLLATE pg_catalog."default",
    column17   character varying(2048) COLLATE pg_catalog."default",
    column18   character varying(2048) COLLATE pg_catalog."default",
    column19   character varying(2048) COLLATE pg_catalog."default",
    column21 character varying(256) COLLATE pg_catalog."default",
    column22 character varying(256) COLLATE pg_catalog."default",
    column23 character varying(256) COLLATE pg_catalog."default",
    column24 character varying(256) COLLATE pg_catalog."default",
    column25 character varying(256) COLLATE pg_catalog."default",
    column26 character varying(256) COLLATE pg_catalog."default",
    column27 character varying(256) COLLATE pg_catalog."default",
    column28 character varying(256) COLLATE pg_catalog."default",
    column29 character varying(256) COLLATE pg_catalog."default",
    column30 character varying(256) COLLATE pg_catalog."default",
    column31 character varying(256) COLLATE pg_catalog."default",
    column32 character varying(256) COLLATE pg_catalog."default",
    column33 character varying(256) COLLATE pg_catalog."default",
    column34 character varying(256) COLLATE pg_catalog."default",
    column35 character varying(256) COLLATE pg_catalog."default",
    entrytype integer,
    column37 bigint,
    column38 bigint,
    column39 bigint,
    column40 bigint,
    column41 bigint,
    column42 bigint,
    column43 bigint,
    column44 bigint,
    column45 bigint,
    column46 bigint,
    column47 character varying(128) COLLATE pg_catalog."default",
    timestampcol timestamp without time zone,
    column49 timestamp without time zone,
    column50 timestamp without time zone,
    column51 timestamp without time zone,
    column52 timestamp without time zone,
    archivestatus integer,
    column54 integer,
    column55 character varying(20) COLLATE pg_catalog."default",
    CONSTRAINT pkey PRIMARY KEY (key)
        USING INDEX TABLESPACE tablespace
)

TABLESPACE tablespace;

ALTER TABLE schema.logtable
    OWNER to user;

CREATE INDEX idx_timestampcol
    ON schema.logtable USING btree
    ( timestampcol ASC NULLS LAST )
    TABLESPACE tablespace ;

CREATE INDEX idx_test2
    ON schema.logtable USING btree
    ( entrytype  ASC NULLS LAST)
    TABLESPACE tablespace
    WHERE archivestatus <= 1;

CREATE INDEX idx_arcstatus
    ON schema.logtable USING btree
    ( archivestatus ASC NULLS LAST)
    TABLESPACE tablespace;

CREATE INDEX idx_entrytype
    ON schema.logtable USING btree
    ( entrytype ASC NULLS LAST)
    TABLESPACE tablespace ;


The table contains 14,000,000 entries and takes up about 3.3 GB of data.
No triggers; inserts are probably 5-20K rows per day.

SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='logtable';

relname           |relpages|reltuples|relallvisible|relkind|relnatts|relhassubclass|reloptions|pg_table_size|
------------------|--------|---------|-------------|-------|--------|--------------|----------|-------------|
logtable          |  405988| 14091424|       405907|r      |      54|false         |NULL      |   3326803968|


The slow running query:

SELECT column1,..., column54 FROM logtable WHERE ((entrytype = 4000 OR entrytype = 4001 OR entrytype = 4002) AND (archivestatus <= 1)) ORDER BY timestampcol DESC;

This query runs in about 45-60 seconds.
The same query runs in about 289 ms in Oracle and 423 ms in SQL Server.
Now, I understand that actually loading all the results (about 520K rows) would take a while,
but that shouldn't be exactly what happens here, right? There should be a result-set iterator that can retrieve all the data but doesn't do so up front.

With the help of some people on Slack and in a Stack Overflow thread, I've found a configuration parameter that helps performance:

set random_page_cost = 1;

This improved performance from 45-60 s to 15-35 s (plausible, since we are using SSDs).
Still not acceptable but definitely an improvement. 
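If this holds up, I'd probably persist it instead of setting it per session; a minimal sketch, assuming superuser access (some tuning guides suggest ~1.1 rather than exactly 1 for SSDs):

ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();  -- random_page_cost takes effect without a server restart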
Some maybe relevant system parameters:

effective_cache_size 4GB
maintenance_work_mem 1GB
shared_buffers 2GB
work_mem 1GB

Currently I'm accessing the data through DBeaver (JDBC - postgresql-42.2.5.jar) and our Java application (JDBC - postgresql-42.2.19.jar). Both use defaultRowFetchSize=5000 so that not everything is loaded into memory at once and the results are limited.
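As I understand it, the driver's fetch size maps to server-side portal fetches, roughly equivalent to the following cursor loop (a sketch of the protocol behavior, not our actual code; note the ORDER BY sort still has to finish before the first batch can be returned, so the fetch size alone doesn't make the first rows arrive faster):

BEGIN;
DECLARE log_cur CURSOR FOR
    SELECT column1, ..., column54
      FROM logtable
     WHERE entrytype IN (4000, 4001, 4002)
       AND archivestatus <= 1
     ORDER BY timestampcol DESC;
FETCH 5000 FROM log_cur;  -- first batch of defaultRowFetchSize rows
-- further FETCHes as the application iterates
COMMIT;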
The explain plan:

EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE)...
(Above Query)

Gather Merge  (cost=347142.71..397196.91 rows=429006 width=2558) (actual time=21210.019..22319.444 rows=515841 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=141487 read=153489
  ->  Sort  (cost=346142.69..346678.95 rows=214503 width=2558) (actual time=21148.887..21297.428 rows=171947 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 62180kB
        Worker 0:  Sort Method: quicksort  Memory: 56969kB
        Worker 1:  Sort Method: quicksort  Memory: 56837kB
        Buffers: shared hit=141487 read=153489
        Worker 0: actual time=21129.973..21296.839 rows=166776 loops=1
          Buffers: shared hit=45558 read=49514
        Worker 1: actual time=21114.439..21268.117 rows=165896 loops=1
          Buffers: shared hit=45104 read=49506
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5652.74..327147.77 rows=214503 width=2558) (actual time=1304.813..20637.462 rows=171947 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=103962
              Buffers: shared hit=141473 read=153489
              Worker 0: actual time=1280.472..20638.620 rows=166776 loops=1
                Buffers: shared hit=45551 read=49514
              Worker 1: actual time=1275.274..20626.219 rows=165896 loops=1
                Buffers: shared hit=45097 read=49506
              ->  BitmapOr  (cost=5652.74..5652.74 rows=520443 width=0) (actual time=1179.438..1179.438 rows=0 loops=1)
                    Buffers: shared hit=9 read=1323
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..556.61 rows=54957 width=0) (actual time=161.939..161.940 rows=65970 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared hit=1 read=171
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2243.22 rows=221705 width=0) (actual time=548.849..548.849 rows=224945 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=4 read=576
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2466.80 rows=243782 width=0) (actual time=468.637..468.637 rows=224926 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=4 read=576
Settings: random_page_cost = '1', search_path = '"$user", schema, public', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.578 ms
Execution Time: 22617.351 ms


As mentioned before, Oracle does this much faster.

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                         |  6878 |  2491K|       |  2143   (1)| 00:00:01 |
|   1 |  SORT ORDER BY                        |                         |  6878 |  2491K|  3448K|  2143   (1)| 00:00:01 |
|   2 |   INLIST ITERATOR                     |                         |       |       |       |            |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| logtable                |  6878 |  2491K|       |  1597   (1)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | idx_entrytype           |  6878 |       |       |    23   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------------

Is there anything else I can analyze, or any information you might need to analyze this further?

Re: Very slow Query compared to Oracle / SQL - Server

From
luis.roberto@siscobra.com.br
Date:
----- Original message -----
> From: "Semen Yefimenko" <semen.yefimenko@gmail.com>
> To: "pgsql-performance" <pgsql-performance@lists.postgresql.org>
> Sent: Thursday, May 6, 2021 11:38:39
> Subject: Very slow Query compared to Oracle / SQL - Server


> SELECT column1,..., column54 FROM logtable WHERE ((entrytype = 4000 OR entrytype = 4001 OR
> entrytype = 4002) AND (archivestatus <= 1)) ORDER BY timestampcol DESC;
 

The first thing I would try is rewriting the query to:

SELECT column1,..., column54 
  FROM logtable
 WHERE (entrytype in (4000,4001,4002)) 
   AND (archivestatus <= 1)
 ORDER BY timestampcol DESC;

Check if that makes a difference...

Luis R. Weck 



Re: Very slow Query compared to Oracle / SQL - Server

From
Alexey M Boltenkov
Date:
On 05/06/21 19:11, luis.roberto@siscobra.com.br wrote:
[...]



The IN statement will probably just result in the recheck condition changing to entrytype = any('{a,b,c}'::int[]). It looks like the selectivity of archivestatus is not high enough for the index idx_arcstatus to be used.

Please try creating a partial index with a condition like (archivestatus <= 1), and rewrite the select to use (archivestatus is not null and archivestatus <= 1).

CREATE INDEX idx_arcstatus_le1 ON schema.logtable ( archivestatus ) where (archivestatus <= 1) TABLESPACE tablespace;
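I.e., a sketch of the rewritten select I have in mind:

SELECT column1, ..., column54
  FROM logtable
 WHERE entrytype IN (4000, 4001, 4002)
   AND archivestatus IS NOT NULL AND archivestatus <= 1
 ORDER BY timestampcol DESC;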

Re: Very slow Query compared to Oracle / SQL - Server

From
Alexey M Boltenkov
Date:
On 05/06/21 21:15, Alexey M Boltenkov wrote:
[...]

I'm sorry, the 'archivestatus is not null' condition is only necessary for an index built without NULLs.


CREATE INDEX idx_arcstatus_le1 ON schema.logtable ( archivestatus ) where (archivestatus is not null and archivestatus <= 1) TABLESPACE tablespace;

Re: Very slow Query compared to Oracle / SQL - Server

From
Semen Yefimenko
Date:
Yes, rewriting the query with an IN clause was also my first approach, but it didn't help much.
The query plan did change a little bit, but performance was not impacted.

CREATE INDEX idx_arcstatus_le1 ON schema.logtable ( archivestatus ) WHERE (archivestatus <= 1);
ANALYZE schema.logtable;

This resulted in this query plan:

Gather Merge  (cost=344618.96..394086.05 rows=423974 width=2549) (actual time=7327.777..9142.358 rows=516031 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=179817 read=115290
  ->  Sort  (cost=343618.94..344148.91 rows=211987 width=2549) (actual time=7258.314..7476.733 rows=172010 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 64730kB
        Worker 0:  Sort Method: quicksort  Memory: 55742kB
        Worker 1:  Sort Method: quicksort  Memory: 55565kB
        Buffers: shared hit=179817 read=115290
        Worker 0: actual time=7231.774..7458.703 rows=161723 loops=1
          Buffers: shared hit=55925 read=36265
        Worker 1: actual time=7217.856..7425.754 rows=161990 loops=1
          Buffers: shared hit=56197 read=36242
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5586.50..324864.86 rows=211987 width=2549) (actual time=1073.266..6805.850 rows=172010 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=109146
              Buffers: shared hit=179803 read=115290
              Worker 0: actual time=1049.875..6809.231 rows=161723 loops=1
                Buffers: shared hit=55918 read=36265
              Worker 1: actual time=1035.156..6788.037 rows=161990 loops=1
                Buffers: shared hit=56190 read=36242
              ->  BitmapOr  (cost=5586.50..5586.50 rows=514483 width=0) (actual time=945.179..945.179 rows=0 loops=1)
                    Buffers: shared hit=3 read=1329
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..738.13 rows=72893 width=0) (actual time=147.915..147.916 rows=65970 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared hit=1 read=171
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2326.17 rows=229965 width=0) (actual time=473.450..473.451 rows=225040 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=1 read=579
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2140.61 rows=211624 width=0) (actual time=323.801..323.802 rows=225021 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=1 read=579
Settings: random_page_cost = '1', search_path = '"$user", schema, public', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.810 ms
Execution Time: 9647.406 ms

Seemingly faster.
After running a few selects, I reran ANALYZE.
Now it's even faster, probably due to caching and other mechanisms.

Gather Merge  (cost=342639.19..391676.44 rows=420290 width=2542) (actual time=2944.803..4534.725 rows=516035 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=147334 read=147776
  ->  Sort  (cost=341639.16..342164.53 rows=210145 width=2542) (actual time=2827.256..3013.960 rows=172012 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 71565kB
        Worker 0:  Sort Method: quicksort  Memory: 52916kB
        Worker 1:  Sort Method: quicksort  Memory: 51556kB
        Buffers: shared hit=147334 read=147776
        Worker 0: actual time=2771.975..2948.928 rows=153292 loops=1
          Buffers: shared hit=43227 read=43808
        Worker 1: actual time=2767.752..2938.688 rows=148424 loops=1
          Buffers: shared hit=42246 read=42002
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5537.95..323061.27 rows=210145 width=2542) (actual time=276.401..2418.925 rows=172012 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=122495
              Buffers: shared hit=147320 read=147776
              Worker 0: actual time=227.701..2408.580 rows=153292 loops=1
                Buffers: shared hit=43220 read=43808
              Worker 1: actual time=225.996..2408.705 rows=148424 loops=1
                Buffers: shared hit=42239 read=42002
              ->  BitmapOr  (cost=5537.95..5537.95 rows=509918 width=0) (actual time=203.940..203.941 rows=0 loops=1)
                    Buffers: shared hit=1332
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..680.48 rows=67206 width=0) (actual time=31.155..31.156 rows=65970 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared hit=172
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2220.50 rows=219476 width=0) (actual time=112.459..112.461 rows=225042 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=580
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2258.70 rows=223236 width=0) (actual time=60.313..60.314 rows=225023 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=580
Settings: random_page_cost = '1', search_path = '"$user", schema, public', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.609 ms
Execution Time: 4984.490 ms

I don't see the new index being used, but it seems to boost performance nevertheless.
I kept the query as it was, i.e. I did not rewrite it with the IS NOT NULL condition.
Thank you already for the hint. What else can I do? With the current parameters, the query finishes in about 3.9-5.2 seconds, which is already much better but still nowhere near the ~280 ms in Oracle.
I would love to get it to at least 1 second.


On Thu, May 6, 2021 at 20:20, Alexey M Boltenkov <padrebolt@yandex.ru> wrote:
[...]

Re: Very slow Query compared to Oracle / SQL - Server

From
Andreas Joseph Krogh
Date:
On Thursday, May 6, 2021 at 20:59:34, Semen Yefimenko <semen.yefimenko@gmail.com> wrote:
Yes, rewriting the query with an IN clause was also my first approach, but it didn't help much.
The query plan did change a little bit, but performance was not impacted.

CREATE INDEX idx_arcstatus_le1 ON schema.logtable ( archivestatus ) WHERE (archivestatus <= 1);
ANALYZE schema.logtable;

This resulted in this query plan:
[...]
 
I assume (4000,4001,4002) are just example values and might be anything? Otherwise, you could simply include them in your partial index.
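For example, something like this (just a sketch with a hypothetical index name; putting timestampcol in the key would additionally let the planner read the rows pre-sorted):

CREATE INDEX idx_entrytype_arch_ts ON schema.logtable (timestampcol DESC)
    WHERE entrytype IN (4000, 4001, 4002) AND archivestatus <= 1;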
 
--
Andreas Joseph Krogh

Re: Very slow Query compared to Oracle / SQL - Server

From
Vijaykumar Jain
Date:
I am not sure if the goal is performance for this specific set of predicates or performance in general.

Also, from the explain plan it seems there is still a significant number of buffers read rather than hit.
That means I/O, which may contribute to the slow result.

What is the size of the table and the indexes?
Is it possible to increase shared_buffers?
It seems you end up reading a ton of rows and columns, which would benefit from having the pages in cache,
although the cache needs to be warmed by a query or via an external extension :)
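Something like this should answer the size question and warm the cache (a sketch; pg_prewarm is a contrib extension, so it may need to be installed first):

SELECT pg_size_pretty(pg_table_size('schema.logtable'))   AS table_size,
       pg_size_pretty(pg_indexes_size('schema.logtable')) AS index_size;

CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('schema.logtable');  -- loads the heap pages into shared buffers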

Can you try tuning by increasing shared_buffers slowly, in steps of 500MB, and running EXPLAIN ANALYZE against the query each time?

If the buffers read are reduced, I guess that would help speed up the query.
FYI, increasing shared_buffers requires a server restart.
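For example (a sketch):

ALTER SYSTEM SET shared_buffers = '2500MB';
-- then restart the server, e.g. pg_ctl restart -D /path/to/data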

As Always,
Ignore if this does not work :)


Thanks,
Vijay



On Fri, 7 May 2021 at 00:56, Andreas Joseph Krogh <andreas@visena.com> wrote:
[...]


--
Thanks,
Vijay
Mumbai, India

Re: Very slow Query compared to Oracle / SQL - Server

From
Justin Pryzby
Date:
On Thu, May 06, 2021 at 04:38:39PM +0200, Semen Yefimenko wrote:
> Hi there,
> 
> I've recently been involved in migrating our old system to SQL Server and
> then PostgreSQL. Everything has been working fine so far but now after
> executing our tests on Postgres, we saw a very slow running query on a
> large table in our database.
> I have tried asking on other platforms but no one has been able to give me
> a satisfying answer.

> With the help of some people on Slack and in a Stack Overflow thread, I've
> found a configuration parameter that helps performance:
> set random_page_cost = 1;

I wonder what the old query plan was...
Would you include links to your prior correspondence?

>         ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5652.74..327147.77 rows=214503 width=2558) (actual time=1304.813..20637.462 rows=171947 loops=3)
>               Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
>               Filter: (logtable.archivestatus <= 1)
>               Heap Blocks: exact=103962
>               Buffers: shared hit=141473 read=153489
>
> -------------------------------------------------------------------------------------------------------------------------
> | Id  | Operation                             | Name                    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
> -------------------------------------------------------------------------------------------------------------------------
> |   0 | SELECT STATEMENT                      |                         |  6878 |  2491K|       |  2143   (1)| 00:00:01 |
> |   1 |  SORT ORDER BY                        |                         |  6878 |  2491K|  3448K|  2143   (1)| 00:00:01 |
> |   2 |   INLIST ITERATOR                     |                         |       |       |       |            |          |
> |*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| logtable                |  6878 |  2491K|       |  1597   (1)| 00:00:01 |
> |*  4 |     INDEX RANGE SCAN                  | idx_entrytype           |  6878 |       |       |    23   (0)| 00:00:01 |
> -------------------------------------------------------------------------------------------------------------------------
> 
> Is there anything else I can analyze, or any information you might need to
> analyze this further?

Oracle is apparently doing a single scan on "entrytype".

As a test, you could try forcing that, like:
begin; SET enable_bitmapscan=off ; explain (analyze) [...]; rollback;
or
begin; DROP INDEX idx_arcstatus; explain (analyze) [...]; rollback;

You could try to reduce the cost of that scan by clustering on idx_arcstatus
and then analyzing.  That will affect all other queries, too.  Also, the
"clustering" won't be preserved by future inserts/updates/deletes, so you may
have to redo it as a periodic maintenance command.
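Something like this (a sketch; note CLUSTER takes an ACCESS EXCLUSIVE lock while it rewrites the table):

CLUSTER schema.logtable USING idx_arcstatus;
ANALYZE schema.logtable;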

-- 
Justin



Re: Very slow Query compared to Oracle / SQL - Server

From
Alexey M Boltenkov
Date:
On 05/06/21 22:58, Alexey M Boltenkov wrote:
Have you tried excluding NOT NULL from the index? Can you give the dispersion of archivestatus?


06.05.2021, 21:59, "Semen Yefimenko" <semen.yefimenko@gmail.com>:
[...]

BTW, please try to reset random_page_cost.


Re: Very slow Query compared to Oracle / SQL - Server

From
Alexey M Boltenkov
Date:
On 05/06/21 23:02, Alexey M Boltenkov wrote:
[...]
The root of the problem is in:

              Heap Blocks: exact=122495

  Buffers: shared hit=147334 read=147776

Have you tuned shared_buffers enough? Each block is 8 kB by default.
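As a rough sanity check (a sketch):

SHOW block_size;       -- 8192 bytes by default
SHOW shared_buffers;
-- 147776 blocks read * 8 kB ≈ 1.2 GB of heap touched outside shared buffers in one query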
Write to me directly off-list if anything comes up :)


Re: Very slow Query compared to Oracle / SQL - Server

From
Imre Samu
Date:
> Postgres Version : PostgreSQL 12.2,
>  ... ON ... USING btree

IMHO:
The next minor (bugfix & security) release is near (expected ~ May 13th, 2021): https://www.postgresql.org/developer/roadmap/
so you can update your PostgreSQL to 12.7 (full reindexing recommended afterwards!).
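For the reindexing, a sketch (REINDEX ... CONCURRENTLY is available since version 12 and avoids blocking writes):

REINDEX TABLE CONCURRENTLY schema.logtable;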
 
You can find a lot of B-tree index-related fixes:
 - Fix possible undercounting of deleted B-tree index pages in VACUUM VERBOSE output
 - Fix wrong bookkeeping for oldest deleted page in a B-tree index
 - Ensure INCLUDE'd columns are always removed from B-tree pivot tuples
 - Avoid repeated marking of dead btree index entries as dead
 - Fix failure of parallel B-tree index scans when the index condition is unsatisfiable


> COLLATE pg_catalog."default"

You can test the "C" collation on some columns (the keys?); in theory, it should be faster:
"The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them."
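For example, an extra index on the key column for testing (a sketch; the index name is made up):

CREATE INDEX idx_key_c ON schema.logtable (key COLLATE "C");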

Best,
 Imre


On Thu, May 6, 2021 at 16:38, Semen Yefimenko <semen.yefimenko@gmail.com> wrote:
[...]

Re: Very slow Query compared to Oracle / SQL - Server

From
Semen Yefimenko
Date:
Sorry if I'm answering everyone cumulatively in one e-mail; I'm not sure how I'm supposed to do it (single e-mails vs. many).
 
Can you try tuning by increasing shared_buffers slowly, in steps of 500MB, and running EXPLAIN ANALYZE against the query each time?

-- 2500 MB shared buffers - random_page_cost = 1;
Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=2076.329..3737.050 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=295446
  ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=2007.487..2202.707 rows=172172 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 65154kB
        Worker 0:  Sort Method: quicksort  Memory: 55707kB
        Worker 1:  Sort Method: quicksort  Memory: 55304kB
        Buffers: shared hit=295446
        Worker 0: actual time=1963.969..2156.624 rows=161205 loops=1
          Buffers: shared hit=91028
        Worker 1: actual time=1984.700..2179.697 rows=161935 loops=1
          Buffers: shared hit=92133
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=322.125..1618.971 rows=172172 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=110951
              Buffers: shared hit=295432
              Worker 0: actual time=282.201..1595.117 rows=161205 loops=1
                Buffers: shared hit=91021
              Worker 1: actual time=303.671..1623.299 rows=161935 loops=1
                Buffers: shared hit=92126
              ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=199.119..199.119 rows=0 loops=1)
                    Buffers: shared hit=1334
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=28.856..28.857 rows=65970 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared hit=172
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2223.63 rows=219760 width=0) (actual time=108.871..108.872 rows=225283 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=581
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2261.87 rows=223525 width=0) (actual time=61.377..61.377 rows=225264 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=581
Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.940 ms
Execution Time: 4188.083 ms
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-- 3000 MB shared buffers - random_page_cost = 1;
Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=2062.280..3763.408 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=295446
  ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=1987.933..2180.422 rows=172172 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 66602kB
        Worker 0:  Sort Method: quicksort  Memory: 55149kB
        Worker 1:  Sort Method: quicksort  Memory: 54415kB
        Buffers: shared hit=295446
        Worker 0: actual time=1963.059..2147.916 rows=159556 loops=1
          Buffers: shared hit=89981
        Worker 1: actual time=1949.726..2136.200 rows=158554 loops=1
          Buffers: shared hit=90141
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=340.705..1603.796 rows=172172 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=113990
              Buffers: shared hit=295432
              Worker 0: actual time=317.918..1605.548 rows=159556 loops=1
                Buffers: shared hit=89974
              Worker 1: actual time=304.744..1589.221 rows=158554 loops=1
                Buffers: shared hit=90134
              ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=218.972..218.973 rows=0 loops=1)
                    Buffers: shared hit=1334
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=37.741..37.742 rows=65970 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared hit=172
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2223.63 rows=219760 width=0) (actual time=119.120..119.121 rows=225283 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=581
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2261.87 rows=223525 width=0) (actual time=62.097..62.098 rows=225264 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=581
Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 2.717 ms
Execution Time: 4224.670 ms
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-- 3500 MB shared buffers - random_page_cost = 1;
Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=3578.155..4932.858 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=14 read=295432 written=67
  ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=3482.159..3677.227 rows=172172 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 58533kB
        Worker 0:  Sort Method: quicksort  Memory: 56878kB
        Worker 1:  Sort Method: quicksort  Memory: 60755kB
        Buffers: shared hit=14 read=295432 written=67
        Worker 0: actual time=3435.131..3632.985 rows=166842 loops=1
          Buffers: shared hit=7 read=95783 written=25
        Worker 1: actual time=3441.545..3649.345 rows=179354 loops=1
          Buffers: shared hit=5 read=101608 written=20
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=345.111..3042.932 rows=172172 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=96709
              Buffers: shared hit=2 read=295430 written=67
              Worker 0: actual time=300.525..2999.403 rows=166842 loops=1
                Buffers: shared read=95783 written=25
              Worker 1: actual time=300.552..3004.859 rows=179354 loops=1
                Buffers: shared read=101606 written=20
              ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=241.996..241.997 rows=0 loops=1)
                    Buffers: shared hit=2 read=1332
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=37.129..37.130 rows=65970 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared read=172
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2223.63 rows=219760 width=0) (actual time=131.051..131.052 rows=225283 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=1 read=580
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2261.87 rows=223525 width=0) (actual time=73.800..73.800 rows=225264 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=1 read=580
Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.597 ms
Execution Time: 5389.811 ms
 
This doesn't seem to have had an effect. 
Thanks for the suggestion. 

Have you tried excluding NOT NULL values from the index? Can you give the dispersion of archivestatus?

Yes I have; it yielded the same performance boost as:

 create index test on logtable(entrytype) where archivestatus <= 1;
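(For the dispersion question, a sketch along these lines would show how the values are distributed; cnt is just an illustrative alias:)

SELECT archivestatus, count(*) AS cnt
FROM logtable
GROUP BY archivestatus
ORDER BY archivestatus;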

I wonder what the old query plan was...
Would you include links to your prior correspondence?

The prior execution plans are present in the SO post.
The other forums I've tried are the official Slack channel:
https://postgresteam.slack.com/archives/C0FS3UTAP/p1620286295228600
and SO: https://stackoverflow.com/questions/67401792/slow-running-postgresql-query
But I think most of the points discussed in those posts have already been mentioned by you, except for index bloat.

Oracle is apparently doing a single scan on "entrytype".
As a test, you could try forcing that, like:
begin; SET enable_bitmapscan=off ; explain (analyze) [...]; rollback;
or
begin; DROP INDEX idx_arcstatus; explain (analyze) [...]; rollback;
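(Spelled out with the query from this thread, the first test would look roughly like this — a sketch, with the column list abbreviated; SET LOCAL keeps the change scoped to the transaction:)

BEGIN;
SET LOCAL enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT column1, /* ... */ column54
FROM logtable
WHERE (entrytype = 4000 OR entrytype = 4001 OR entrytype = 4002)
  AND archivestatus <= 1
ORDER BY timestampcol DESC;
ROLLBACK;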

I've tried enable_bitmapscan=off but it didn't yield any good results.

-- 2000 MB shared buffers - random_page_cost = 4 - enable_bitmapscan to off
Gather Merge  (cost=543949.72..593050.69 rows=420836 width=2542) (actual time=7716.031..9043.399 rows=516517 loops=1)
  Output: column1, .., column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=192 read=406605
  ->  Sort  (cost=542949.70..543475.75 rows=210418 width=2542) (actual time=7642.666..7835.527 rows=172172 loops=3)
        Output: column1, .., column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 58803kB
        Worker 0:  Sort Method: quicksort  Memory: 60376kB
        Worker 1:  Sort Method: quicksort  Memory: 56988kB
        Buffers: shared hit=192 read=406605
        Worker 0: actual time=7610.482..7814.905 rows=177637 loops=1
          Buffers: shared hit=78 read=137826
        Worker 1: actual time=7607.645..7803.561 rows=167316 loops=1
          Buffers: shared hit=80 read=132672
        ->  Parallel Seq Scan on schema.logtable  (cost=0.00..524345.70 rows=210418 width=2542) (actual time=1.669..7189.365 rows=172172 loops=3)
              Output: column1, .., column54
              Filter: ((logtable.archivestatus <= 1) AND ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002)))
              Rows Removed by Filter: 4533459
              Buffers: shared hit=96 read=406605
              Worker 0: actual time=1.537..7158.286 rows=177637 loops=1
                Buffers: shared hit=30 read=137826
              Worker 1: actual time=1.414..7161.670 rows=167316 loops=1
                Buffers: shared hit=32 read=132672
Settings: enable_bitmapscan = 'off', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.725 ms
Execution Time: 9500.928 ms
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-- 2000 MB shared buffers - random_page_cost = 1 - enable_bitmapscan to off
Gather Merge  (cost=543949.72..593050.69 rows=420836 width=2542) (actual time=7519.032..8871.433 rows=516517 loops=1)
  Output: column1, .., column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=576 read=406221
  ->  Sort  (cost=542949.70..543475.75 rows=210418 width=2542) (actual time=7451.958..7649.480 rows=172172 loops=3)
        Output: column1, .., column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 58867kB
        Worker 0:  Sort Method: quicksort  Memory: 58510kB
        Worker 1:  Sort Method: quicksort  Memory: 58788kB
        Buffers: shared hit=576 read=406221
        Worker 0: actual time=7438.271..7644.241 rows=172085 loops=1
          Buffers: shared hit=203 read=135166
        Worker 1: actual time=7407.574..7609.922 rows=172948 loops=1
          Buffers: shared hit=202 read=135225
        ->  Parallel Seq Scan on schema.logtable  (cost=0.00..524345.70 rows=210418 width=2542) (actual time=2.839..7017.729 rows=172172 loops=3)
              Output: column1, .., column54
              Filter: ((logtable.archivestatus <= 1) AND ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002)))
              Rows Removed by Filter: 4533459
              Buffers: shared hit=480 read=406221
              Worker 0: actual time=2.628..7006.420 rows=172085 loops=1
                Buffers: shared hit=155 read=135166
              Worker 1: actual time=3.948..6978.154 rows=172948 loops=1
                Buffers: shared hit=154 read=135225
Settings: enable_bitmapscan = 'off', random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.621 ms
Execution Time: 9339.457 ms

Have you tuned shared_buffers enough? Each block is 8 kB by default.
BTW, please try to reset random_page_cost.
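(That is, something like:)

RESET random_page_cost;  -- back to the default of 4
SHOW random_page_cost;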

Look above. 

I will try upgrading the minor version next.
I will also try setting up a 13.x version locally, importing the data from 12.2 into 13.x, and seeing if it is faster.


On Thu, 6 May 2021 at 23:16, Imre Samu <pella.samu@gmail.com> wrote:
> Postgres Version : PostgreSQL 12.2,
>  ... ON ... USING btree

IMHO:
The next minor (bugfix & security) release is near (expected ~ May 13th, 2021: https://www.postgresql.org/developer/roadmap/),
so you can update your PostgreSQL to 12.7 (+ full reindexing recommended; see the sketch after the list below).
 
You can find a lot of B-tree index-related fixes:
- Fix possible undercounting of deleted B-tree index pages in VACUUM VERBOSE output
- Fix wrong bookkeeping for oldest deleted page in a B-tree index
- Ensure INCLUDE'd columns are always removed from B-tree pivot tuples
- Avoid repeated marking of dead B-tree index entries as dead
- Fix failure of parallel B-tree index scans when the index condition is unsatisfiable
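For the recommended full reindexing after the upgrade, a minimal sketch; REINDEX ... CONCURRENTLY is available since PostgreSQL 12 and avoids blocking writes:

REINDEX TABLE CONCURRENTLY schema.logtable;
-- or per index, e.g.:
REINDEX INDEX CONCURRENTLY schema.idx_entrytype;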


> COLLATE pg_catalog."default"

You can test the "C" collation on some columns (the keys?); in theory, it should be faster:
"The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them."

Best,
 Imre



Re: Very slow Query compared to Oracle / SQL - Server

From
Vijaykumar Jain
Date:
OK, one last thing, not to be a PITA, but just in case this helps.

postgres=# SELECT * FROM pg_stat_user_indexes WHERE relname = 'logtable';
postgres=# SELECT * FROM pg_stat_user_tables WHERE relname = 'logtable';
Basically, I just want to verify that the table is not bloated.
Looking at n_live_tup vs n_dead_tup would help understand it; see the focused query below.
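A focused version of the same check:

SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'logtable';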

If you see too many dead tuples:
VACUUM (ANALYZE, VERBOSE) logtable; -- gets rid of dead tuples (if no transactions are still using them)

and then run your query.

Thanks,
Vijay



--
Thanks,
Vijay
Mumbai, India

Re: Very slow Query compared to Oracle / SQL - Server

From
Semen Yefimenko
Date:
As mentioned in the Slack comments:
SELECT pg_size_pretty(pg_relation_size('logtable')) as table_size, pg_size_pretty(pg_relation_size('idx_entrytype')) as index_size, (pgstattuple('logtable')).dead_tuple_percent;
 table_size | index_size | dead_tuple_percent
------------+------------+--------------------
 3177 MB    | 289 MB     |                  0

I have roughly 6 indexes, which are all around 300 MB.
SELECT pg_relation_size('logtable') as table_size, pg_relation_size('idx_entrytype') as index_size, 100-(pgstatindex('idx_entrytype')).avg_leaf_density as bloat_ratio;
 table_size | index_size | bloat_ratio
------------+------------+-------------------
 3331694592 | 302555136  | 5.219999999999999
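(Note for anyone reproducing these numbers: pgstattuple() and pgstatindex() come from the pgstattuple contrib extension, which has to be installed once per database:)

CREATE EXTENSION IF NOT EXISTS pgstattuple;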

Your queries:
n_live_tup | n_dead_tup
-----------+-----------
  14118380 |          0

For testing, I've also been running VACUUM and ANALYZE pretty much before every test run.

Am Fr., 7. Mai 2021 um 10:44 Uhr schrieb Vijaykumar Jain <vijaykumarjain.github@gmail.com>:
ok one last thing, not to be a PITA, but just in case if this helps.

postgres=# SELECT * FROM pg_stat_user_indexes where relname = 'logtable'; postgres=# SELECT * FROM pg_stat_user_tables where relname = 'logtable';
basically, i just to verify if the table is not bloated.
looking at n_live_tup vs n_dead_tup would help understand it.

if you see too many dead tuples,
vacuum (ANALYZE,verbose) logtable; -- would get rid of dead tuples (if there are no tx using the dead tuples) 

and then run your query.

Thanks,
Vijay






On Fri, 7 May 2021 at 13:34, Semen Yefimenko <semen.yefimenko@gmail.com> wrote:
Sorry if I'm cumulatively answering everyone in one E-Mail, I'm not sure how I'm supposed to do it. (single E-Mails vs many) 
 
Can you try tuning by increasing the shared_buffers slowly in steps of  500MB, and running explain analyze against the query.

-- 2500 MB shared buffers - random_page_cost = 1;
Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=2076.329..3737.050 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=295446
  ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=2007.487..2202.707 rows=172172 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 65154kB
        Worker 0:  Sort Method: quicksort  Memory: 55707kB
        Worker 1:  Sort Method: quicksort  Memory: 55304kB
        Buffers: shared hit=295446
        Worker 0: actual time=1963.969..2156.624 rows=161205 loops=1
          Buffers: shared hit=91028
        Worker 1: actual time=1984.700..2179.697 rows=161935 loops=1
          Buffers: shared hit=92133
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=322.125..1618.971 rows=172172 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=110951
              Buffers: shared hit=295432
              Worker 0: actual time=282.201..1595.117 rows=161205 loops=1
                Buffers: shared hit=91021
              Worker 1: actual time=303.671..1623.299 rows=161935 loops=1
                Buffers: shared hit=92126
              ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=199.119..199.119 rows=0 loops=1)
                    Buffers: shared hit=1334
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=28.856..28.857 rows=65970 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared hit=172
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2223.63 rows=219760 width=0) (actual time=108.871..108.872 rows=225283 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=581
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2261.87 rows=223525 width=0) (actual time=61.377..61.377 rows=225264 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=581
Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.940 ms
Execution Time: 4188.083 ms
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-- 3000 MB shared buffers - random_page_cost = 1;
Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=2062.280..3763.408 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=295446
  ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=1987.933..2180.422 rows=172172 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 66602kB
        Worker 0:  Sort Method: quicksort  Memory: 55149kB
        Worker 1:  Sort Method: quicksort  Memory: 54415kB
        Buffers: shared hit=295446
        Worker 0: actual time=1963.059..2147.916 rows=159556 loops=1
          Buffers: shared hit=89981
        Worker 1: actual time=1949.726..2136.200 rows=158554 loops=1
          Buffers: shared hit=90141
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=340.705..1603.796 rows=172172 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=113990
              Buffers: shared hit=295432
              Worker 0: actual time=317.918..1605.548 rows=159556 loops=1
                Buffers: shared hit=89974
              Worker 1: actual time=304.744..1589.221 rows=158554 loops=1
                Buffers: shared hit=90134
              ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=218.972..218.973 rows=0 loops=1)
                    Buffers: shared hit=1334
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=37.741..37.742 rows=65970 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared hit=172
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2223.63 rows=219760 width=0) (actual time=119.120..119.121 rows=225283 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=581
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2261.87 rows=223525 width=0) (actual time=62.097..62.098 rows=225264 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=581
Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 2.717 ms
Execution Time: 4224.670 ms
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-- 3500 MB shared buffers - random_page_cost = 1;
Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=3578.155..4932.858 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=14 read=295432 written=67
  ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=3482.159..3677.227 rows=172172 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 58533kB
        Worker 0:  Sort Method: quicksort  Memory: 56878kB
        Worker 1:  Sort Method: quicksort  Memory: 60755kB
        Buffers: shared hit=14 read=295432 written=67
        Worker 0: actual time=3435.131..3632.985 rows=166842 loops=1
          Buffers: shared hit=7 read=95783 written=25
        Worker 1: actual time=3441.545..3649.345 rows=179354 loops=1
          Buffers: shared hit=5 read=101608 written=20
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=345.111..3042.932 rows=172172 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=96709
              Buffers: shared hit=2 read=295430 written=67
              Worker 0: actual time=300.525..2999.403 rows=166842 loops=1
                Buffers: shared read=95783 written=25
              Worker 1: actual time=300.552..3004.859 rows=179354 loops=1
                Buffers: shared read=101606 written=20
              ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=241.996..241.997 rows=0 loops=1)
                    Buffers: shared hit=2 read=1332
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=37.129..37.130 rows=65970 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared read=172
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2223.63 rows=219760 width=0) (actual time=131.051..131.052 rows=225283 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=1 read=580
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2261.87 rows=223525 width=0) (actual time=73.800..73.800 rows=225264 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=1 read=580
Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.597 ms
Execution Time: 5389.811 ms
 
This doesn't seem to have had an effect. 
Thanks for the suggestion. 

Have you tried excluding NOT NULL from the index? Can you give the dispersion of archivestatus?

Yes I have; it yielded the same performance boost as:

CREATE INDEX test ON logtable (entrytype) WHERE archivestatus <= 1;

I wonder what the old query plan was...
Would you include links to your prior correspondence?

The prior execution plans are present in the SO post.
The other forums I've tried are the official Slack channel:
https://postgresteam.slack.com/archives/C0FS3UTAP/p1620286295228600
and SO: https://stackoverflow.com/questions/67401792/slow-running-postgresql-query
But I think most of the points discussed in those posts have already been mentioned by you, except for index bloat.

Oracle is apparently doing a single scan on "entrytype".
As a test, you could try forcing that, like:
begin; SET enable_bitmapscan=off ; explain (analyze) [...]; rollback;
or
begin; DROP INDEX idx_arcstatus; explain (analyze) [...]; rollback;

I've tried enable_bitmapscan=off but it didn't yield any good results.

-- 2000 MB shared buffers - random_page_cost = 4 - enable_bitmapscan to off
Gather Merge  (cost=543949.72..593050.69 rows=420836 width=2542) (actual time=7716.031..9043.399 rows=516517 loops=1)
  Output: column1, .., column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=192 read=406605
  ->  Sort  (cost=542949.70..543475.75 rows=210418 width=2542) (actual time=7642.666..7835.527 rows=172172 loops=3)
        Output: column1, .., column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 58803kB
        Worker 0:  Sort Method: quicksort  Memory: 60376kB
        Worker 1:  Sort Method: quicksort  Memory: 56988kB
        Buffers: shared hit=192 read=406605
        Worker 0: actual time=7610.482..7814.905 rows=177637 loops=1
          Buffers: shared hit=78 read=137826
        Worker 1: actual time=7607.645..7803.561 rows=167316 loops=1
          Buffers: shared hit=80 read=132672
        ->  Parallel Seq Scan on schema.logtable  (cost=0.00..524345.70 rows=210418 width=2542) (actual time=1.669..7189.365 rows=172172 loops=3)
              Output: column1, .., column54
              Filter: ((logtable.archivestatus <= 1) AND ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002)))
              Rows Removed by Filter: 4533459
              Buffers: shared hit=96 read=406605
              Worker 0: actual time=1.537..7158.286 rows=177637 loops=1
                Buffers: shared hit=30 read=137826
              Worker 1: actual time=1.414..7161.670 rows=167316 loops=1
                Buffers: shared hit=32 read=132672
Settings: enable_bitmapscan = 'off', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.725 ms
Execution Time: 9500.928 ms
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-- 2000 MB shared buffers - random_page_cost = 1 - enable_bitmapscan to off
Gather Merge  (cost=543949.72..593050.69 rows=420836 width=2542) (actual time=7519.032..8871.433 rows=516517 loops=1)
  Output: column1, .., column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=576 read=406221
  ->  Sort  (cost=542949.70..543475.75 rows=210418 width=2542) (actual time=7451.958..7649.480 rows=172172 loops=3)
        Output: column1, .., column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 58867kB
        Worker 0:  Sort Method: quicksort  Memory: 58510kB
        Worker 1:  Sort Method: quicksort  Memory: 58788kB
        Buffers: shared hit=576 read=406221
        Worker 0: actual time=7438.271..7644.241 rows=172085 loops=1
          Buffers: shared hit=203 read=135166
        Worker 1: actual time=7407.574..7609.922 rows=172948 loops=1
          Buffers: shared hit=202 read=135225
        ->  Parallel Seq Scan on schema.logtable  (cost=0.00..524345.70 rows=210418 width=2542) (actual time=2.839..7017.729 rows=172172 loops=3)
              Output: column1, .., column54
              Filter: ((logtable.archivestatus <= 1) AND ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002)))
              Rows Removed by Filter: 4533459
              Buffers: shared hit=480 read=406221
              Worker 0: actual time=2.628..7006.420 rows=172085 loops=1
                Buffers: shared hit=155 read=135166
              Worker 1: actual time=3.948..6978.154 rows=172948 loops=1
                Buffers: shared hit=154 read=135225
Settings: enable_bitmapscan = 'off', random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.621 ms
Execution Time: 9339.457 ms

Have you tuned shared_buffers enough? Each block is 8 kB by default.
BTW, please try to reset random_page_cost.
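For instance:

SHOW shared_buffers;
RESET random_page_cost;  -- back to the server default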

Look above. 

I will try upgrading the minor version next.
I will also try setting up a 13.x version locally, importing the data from 12.2 into 13.x, and seeing if it is faster.


On Thu, 6 May 2021 at 23:16, Imre Samu <pella.samu@gmail.com> wrote:
> Postgres Version : PostgreSQL 12.2,
>  ... ON ... USING btree

IMHO:
The next minor (bugfix & security) release is near (expected ~ May 13th, 2021): https://www.postgresql.org/developer/roadmap/
so you can update your PostgreSQL to 12.7 (plus full reindexing is recommended; a sketch follows after the list below).
 
You can find a lot of B-tree index-related fixes:
- Fix possible undercounting of deleted B-tree index pages in VACUUM VERBOSE output
- Fix wrong bookkeeping for oldest deleted page in a B-tree index
- Ensure INCLUDE'd columns are always removed from B-tree pivot tuples
- Avoid repeated marking of dead btree index entries as dead
- Fix failure of parallel B-tree index scans when the index condition is unsatisfiable
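
A minimal sketch of that reindex (REINDEX ... CONCURRENTLY is available since PostgreSQL 12; adjust the names to your schema):

REINDEX INDEX CONCURRENTLY schema.idx_entrytype;  -- one index at a time, without blocking writes
REINDEX TABLE CONCURRENTLY schema.logtable;       -- or rebuild every index on the table at once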


> COLLATE pg_catalog."default"

You can test the "C" collation on some columns (the keys?); in theory, it should be faster:
"The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them."

Best,
 Imre



Re: Very slow Query compared to Oracle / SQL - Server

From
Vijaykumar Jain
Date:
Is this on Windows?

I see a thread that mentions a performance penalty due to parallel workers.


There is a mailing thread with the subject line:
Huge performance penalty with parallel queries in Windows x64 v. Linux x64



On Fri, 7 May 2021 at 2:33 PM Semen Yefimenko <semen.yefimenko@gmail.com> wrote:
As mentioned in the Slack comments:
SELECT pg_size_pretty(pg_relation_size('logtable')) as table_size, pg_size_pretty(pg_relation_size('idx_entrytype')) as index_size, (pgstattuple('logtable')).dead_tuple_percent;
table_size | index_size | dead_tuple_percent
-----------+------------+--------------------
3177 MB    | 289 MB     | 0

I have roughly 6 indexes, which all have around 300 MB.
SELECT pg_relation_size('logtable') as table_size, pg_relation_size('idx_entrytype') as index_size, 100-(pgstatindex('idx_entrytype')).avg_leaf_density as bloat_ratio;
table_size | index_size | bloat_ratio
-----------+------------+-------------------
3331694592 | 302555136  | 5.219999999999999

The results of your queries:
n_live_tup n_dead_tup
14118380   0

For testing, I've also been running VACUUM and ANALYZE pretty much before every test run.

On Fri, 7 May 2021 at 10:44, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
Ok, one last thing, not to be a PITA, but just in case this helps.

postgres=# SELECT * FROM pg_stat_user_indexes WHERE relname = 'logtable';
postgres=# SELECT * FROM pg_stat_user_tables WHERE relname = 'logtable';
Basically, I just want to verify that the table is not bloated.
Looking at n_live_tup vs n_dead_tup would help understand that.

If you see too many dead tuples:
VACUUM (ANALYZE, VERBOSE) logtable; -- gets rid of dead tuples (if there are no transactions still using them)

and then run your query.

Thanks,
Vijay

Re: Very slow Query compared to Oracle / SQL - Server

From
Rick Otten
Date:

On Thu, May 6, 2021 at 10:38 AM Semen Yefimenko <semen.yefimenko@gmail.com> wrote:
...
SELECT column1,..., column54  where ((entrytype = 4000 or entrytype = 4001 or entrytype = 4002) and (archivestatus <= 1)) order by timestampcol desc;


I know several people have suggested using `IN` to replace the OR statements; that would be my first go-to also.  Another approach I have found helpful is to keep in mind that whenever you have an `OR` in a WHERE clause, it can be replaced with a `UNION ALL`.  Usually the `UNION ALL` is faster.
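
A sketch of both rewrites (SELECT * stands in here for the 54-column list of the original query; the UNION ALL form is only equivalent because each row matches at most one entrytype, so no duplicates can arise):

-- IN variant:
SELECT *
FROM logtable
WHERE entrytype IN (4000, 4001, 4002)
  AND archivestatus <= 1
ORDER BY timestampcol DESC;

-- UNION ALL variant; the final ORDER BY applies to the combined result:
SELECT * FROM logtable WHERE entrytype = 4000 AND archivestatus <= 1
UNION ALL
SELECT * FROM logtable WHERE entrytype = 4001 AND archivestatus <= 1
UNION ALL
SELECT * FROM logtable WHERE entrytype = 4002 AND archivestatus <= 1
ORDER BY timestampcol DESC;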

I recommend avoiding `OR` in WHERE clauses as much as possible.  Sometimes it can't be helped, especially if you need an exclusive or, but most of the time there is another way that is usually better.

Another thought: is "archivestatus" really a boolean, or does it have multiple states?  If it is actually a boolean, can you change the column data type?
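
Something like this, purely as a sketch, assuming archivestatus only ever holds 0 and 1 (the indexes that reference the column, idx_arcstatus and the partial idx_test2, would have to be dropped and recreated around the change):

ALTER TABLE schema.logtable
    ALTER COLUMN archivestatus TYPE boolean
    USING (archivestatus = 1);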

Re: Very slow Query compared to Oracle / SQL - Server

From
Semen Yefimenko
Date:
For testing purposes I set up a separate Postgres 13.2 instance on Windows.
To my surprise, it works perfectly fine. Also, the indexes have about 1/4 of the size they had on 12.6.
I'll try setting up a new 12.6 instance and see if I can reproduce anything.

This explain plan is from a local Postgres 13.2 instance on an SSD, with default settings and without setting random_page_cost.

Gather Merge  (cost=19444.07..19874.60 rows=3690 width=2638) (actual time=41.633..60.538 rows=7087 loops=1)
  Output: column1, .. ,column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=2123
  ->  Sort  (cost=18444.05..18448.66 rows=1845 width=2638) (actual time=4.057..4.595 rows=2362 loops=3)
        Output: column1, .. ,column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 3555kB
        Buffers: shared hit=2123
        Worker 0:  actual time=0.076..0.077 rows=0 loops=1
          Sort Method: quicksort  Memory: 25kB
          Buffers: shared hit=7
        Worker 1:  actual time=0.090..0.091 rows=0 loops=1
          Sort Method: quicksort  Memory: 25kB
          Buffers: shared hit=7
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=61.84..16243.96 rows=1845 width=2638) (actual time=0.350..2.419 rows=2362 loops=3)
              Output: column1, .. ,column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=2095
              Buffers: shared hit=2109
              Worker 0:  actual time=0.030..0.030 rows=0 loops=1
              Worker 1:  actual time=0.035..0.036 rows=0 loops=1
              ->  BitmapOr  (cost=61.84..61.84 rows=4428 width=0) (actual time=0.740..0.742 rows=0 loops=1)
                    Buffers: shared hit=14
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..19.50 rows=1476 width=0) (actual time=0.504..0.504 rows=5475 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared hit=7
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..19.50 rows=1476 width=0) (actual time=0.056..0.056 rows=830 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=3
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..19.50 rows=1476 width=0) (actual time=0.178..0.179 rows=782 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=4
Planning Time: 0.212 ms
Execution Time: 61.692 ms


I've also installed a locally running 12.6 on Windows.
Unfortunately I couldn't reproduce the issue. I loaded the data with a tool I wrote a few months ago, which inserts data and creates sequences and indexes independently of that database.
The query also finishes in about 70 ms. Then I tried pg_dump into a different database on the same dev server (where the slow query still exists). The performance is just as bad on that database, and the indexes are also all 300 MB big (whereas on my local instance they're around 80 MB).
Now I'm inserting the data into the remote dev database with the same tool I used for my local installations.
This will still take some time, so I will update once I have tested it.
So far it seems like something is skewed on the development database.


Am Fr., 7. Mai 2021 um 11:56 Uhr schrieb Vijaykumar Jain <vijaykumarjain.github@gmail.com>:
Is this on windows ?

I see a thread that mentions of performance penalty due to parallel worker 


There is a mailing thread with subject line - 
Huge performance penalty with parallel queries in Windows x64 v. Linux x64



On Fri, 7 May 2021 at 2:33 PM Semen Yefimenko <semen.yefimenko@gmail.com> wrote:
As mentionend in the slack comments :
SELECT pg_size_pretty(pg_relation_size('logtable')) as table_size, pg_size_pretty(pg_relation_size('idx_entrytype')) as index_size, (pgstattuple('logtable')).dead_tuple_percent;
table_size | index_size | dead_tuple_percent
------------+------------+--------------------
3177 MB | 289 MB | 0

I have roughly 6 indexes which all have around 300 MB
SELECT pg_relation_size('logtable') as table_size, pg_relation_size(idx_entrytype) as index_size, 100-(pgstatindex('idx_entrytype')).avg_leaf_density as bloat_ratio
table_size | index_size | bloat_ratio
------------+------------+-------------------
3331694592 | 302555136 | 5.219999999999999

Your queries:
n_live_tup n_dead_tup
14118380   0

For testing, I've also been running VACUUM and ANALYZE pretty much before every test run.

Am Fr., 7. Mai 2021 um 10:44 Uhr schrieb Vijaykumar Jain <vijaykumarjain.github@gmail.com>:
ok one last thing, not to be a PITA, but just in case if this helps.

postgres=# SELECT * FROM pg_stat_user_indexes where relname = 'logtable'; postgres=# SELECT * FROM pg_stat_user_tables where relname = 'logtable';
basically, i just to verify if the table is not bloated.
looking at n_live_tup vs n_dead_tup would help understand it.

if you see too many dead tuples,
vacuum (ANALYZE,verbose) logtable; -- would get rid of dead tuples (if there are no tx using the dead tuples) 

and then run your query.

Thanks,
Vijay






On Fri, 7 May 2021 at 13:34, Semen Yefimenko <semen.yefimenko@gmail.com> wrote:
Sorry if I'm cumulatively answering everyone in one E-Mail, I'm not sure how I'm supposed to do it. (single E-Mails vs many) 
 
Can you try tuning by increasing the shared_buffers slowly in steps of  500MB, and running explain analyze against the query.

-- 2500 MB shared buffers - random_page_cost = 1;
Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=2076.329..3737.050 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=295446
  ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=2007.487..2202.707 rows=172172 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 65154kB
        Worker 0:  Sort Method: quicksort  Memory: 55707kB
        Worker 1:  Sort Method: quicksort  Memory: 55304kB
        Buffers: shared hit=295446
        Worker 0: actual time=1963.969..2156.624 rows=161205 loops=1
          Buffers: shared hit=91028
        Worker 1: actual time=1984.700..2179.697 rows=161935 loops=1
          Buffers: shared hit=92133
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=322.125..1618.971 rows=172172 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=110951
              Buffers: shared hit=295432
              Worker 0: actual time=282.201..1595.117 rows=161205 loops=1
                Buffers: shared hit=91021
              Worker 1: actual time=303.671..1623.299 rows=161935 loops=1
                Buffers: shared hit=92126
              ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=199.119..199.119 rows=0 loops=1)
                    Buffers: shared hit=1334
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=28.856..28.857 rows=65970 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared hit=172
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2223.63 rows=219760 width=0) (actual time=108.871..108.872 rows=225283 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=581
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2261.87 rows=223525 width=0) (actual time=61.377..61.377 rows=225264 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=581
Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.940 ms
Execution Time: 4188.083 ms
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-- 3000 MB shared buffers - random_page_cost = 1;
Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=2062.280..3763.408 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=295446
  ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=1987.933..2180.422 rows=172172 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 66602kB
        Worker 0:  Sort Method: quicksort  Memory: 55149kB
        Worker 1:  Sort Method: quicksort  Memory: 54415kB
        Buffers: shared hit=295446
        Worker 0: actual time=1963.059..2147.916 rows=159556 loops=1
          Buffers: shared hit=89981
        Worker 1: actual time=1949.726..2136.200 rows=158554 loops=1
          Buffers: shared hit=90141
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=340.705..1603.796 rows=172172 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=113990
              Buffers: shared hit=295432
              Worker 0: actual time=317.918..1605.548 rows=159556 loops=1
                Buffers: shared hit=89974
              Worker 1: actual time=304.744..1589.221 rows=158554 loops=1
                Buffers: shared hit=90134
              ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=218.972..218.973 rows=0 loops=1)
                    Buffers: shared hit=1334
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=37.741..37.742 rows=65970 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared hit=172
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2223.63 rows=219760 width=0) (actual time=119.120..119.121 rows=225283 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=581
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2261.87 rows=223525 width=0) (actual time=62.097..62.098 rows=225264 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=581
Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 2.717 ms
Execution Time: 4224.670 ms
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-- 3500 MB shared buffers - random_page_cost = 1;
Gather Merge  (cost=343085.23..392186.19 rows=420836 width=2542) (actual time=3578.155..4932.858 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=14 read=295432 written=67
  ->  Sort  (cost=342085.21..342611.25 rows=210418 width=2542) (actual time=3482.159..3677.227 rows=172172 loops=3)
        Output: column1, .. , column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 58533kB
        Worker 0:  Sort Method: quicksort  Memory: 56878kB
        Worker 1:  Sort Method: quicksort  Memory: 60755kB
        Buffers: shared hit=14 read=295432 written=67
        Worker 0: actual time=3435.131..3632.985 rows=166842 loops=1
          Buffers: shared hit=7 read=95783 written=25
        Worker 1: actual time=3441.545..3649.345 rows=179354 loops=1
          Buffers: shared hit=5 read=101608 written=20
        ->  Parallel Bitmap Heap Scan on schema.logtable  (cost=5546.39..323481.21 rows=210418 width=2542) (actual time=345.111..3042.932 rows=172172 loops=3)
              Output: column1, .. , column54
              Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
              Filter: (logtable.archivestatus <= 1)
              Heap Blocks: exact=96709
              Buffers: shared hit=2 read=295430 written=67
              Worker 0: actual time=300.525..2999.403 rows=166842 loops=1
                Buffers: shared read=95783 written=25
              Worker 1: actual time=300.552..3004.859 rows=179354 loops=1
                Buffers: shared read=101606 written=20
              ->  BitmapOr  (cost=5546.39..5546.39 rows=510578 width=0) (actual time=241.996..241.997 rows=0 loops=1)
                    Buffers: shared hit=2 read=1332
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..682.13 rows=67293 width=0) (actual time=37.129..37.130 rows=65970 loops=1)
                          Index Cond: (logtable.entrytype = 4000)
                          Buffers: shared read=172
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2223.63 rows=219760 width=0) (actual time=131.051..131.052 rows=225283 loops=1)
                          Index Cond: (logtable.entrytype = 4001)
                          Buffers: shared hit=1 read=580
                    ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..2261.87 rows=223525 width=0) (actual time=73.800..73.800 rows=225264 loops=1)
                          Index Cond: (logtable.entrytype = 4002)
                          Buffers: shared hit=1 read=580
Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.597 ms
Execution Time: 5389.811 ms
 
This doesn't seem to have had an effect. 
Thanks for the suggestion. 

Have you tried excluding NOT NULL values from the index? Can you give the value distribution of archivestatus?

Yes I have; it yielded the same performance boost as:

 create index test on logtable(entrytype) where archivestatus <= 1;
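
A combined variant that also matches the ORDER BY might be worth trying -- just a sketch, untested, and the index name is made up:

CREATE INDEX idx_entrytype_ts_partial
    ON schema.logtable (entrytype, timestampcol DESC)
    WHERE archivestatus <= 1;

-- caveat: with three entrytype values OR'ed together, the planner can
-- only use the leading column for the filter, so the sort may still
-- not come for free; whether it helps depends on costs and row counts.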

I wonder what the old query plan was...
Would you include links to your prior correspondence?

The prior execution plans are in the SO post.
The other forums I've tried are the official Slack channel:
https://postgresteam.slack.com/archives/C0FS3UTAP/p1620286295228600
And SO : https://stackoverflow.com/questions/67401792/slow-running-postgresql-query
But I think most of the points discussed in these posts have already been mentioned by you, except index bloat.

Oracle is apparently doing a single scan on "entrytype".
As a test, you could try forcing that, like:
begin; SET enable_bitmapscan=off ; explain (analyze) [...]; rollback;
or
begin; DROP INDEX idx_arcstatus; explain (analyze) [...]; rollback;

I've tried enable_bitmapscan=off but it didn't yield any good results.

-- 2000 MB shared buffers - random_page_cost = 4 - enable_bitmapscan = off
Gather Merge  (cost=543949.72..593050.69 rows=420836 width=2542) (actual time=7716.031..9043.399 rows=516517 loops=1)
  Output: column1, .., column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=192 read=406605
  ->  Sort  (cost=542949.70..543475.75 rows=210418 width=2542) (actual time=7642.666..7835.527 rows=172172 loops=3)
        Output: column1, .., column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 58803kB
        Worker 0:  Sort Method: quicksort  Memory: 60376kB
        Worker 1:  Sort Method: quicksort  Memory: 56988kB
        Buffers: shared hit=192 read=406605
        Worker 0: actual time=7610.482..7814.905 rows=177637 loops=1
          Buffers: shared hit=78 read=137826
        Worker 1: actual time=7607.645..7803.561 rows=167316 loops=1
          Buffers: shared hit=80 read=132672
        ->  Parallel Seq Scan on schema.logtable  (cost=0.00..524345.70 rows=210418 width=2542) (actual time=1.669..7189.365 rows=172172 loops=3)
              Output: column1, .., column54
              Filter: ((logtable.archivestatus <= 1) AND ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002)))
              Rows Removed by Filter: 4533459
              Buffers: shared hit=96 read=406605
              Worker 0: actual time=1.537..7158.286 rows=177637 loops=1
                Buffers: shared hit=30 read=137826
              Worker 1: actual time=1.414..7161.670 rows=167316 loops=1
                Buffers: shared hit=32 read=132672
Settings: enable_bitmapscan = 'off', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.725 ms
Execution Time: 9500.928 ms
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-- 2000 MB shared buffers - random_page_cost = 1 - enable_bitmapscan = off
Gather Merge  (cost=543949.72..593050.69 rows=420836 width=2542) (actual time=7519.032..8871.433 rows=516517 loops=1)
  Output: column1, .., column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=576 read=406221
  ->  Sort  (cost=542949.70..543475.75 rows=210418 width=2542) (actual time=7451.958..7649.480 rows=172172 loops=3)
        Output: column1, .., column54
        Sort Key: logtable.timestampcol DESC
        Sort Method: quicksort  Memory: 58867kB
        Worker 0:  Sort Method: quicksort  Memory: 58510kB
        Worker 1:  Sort Method: quicksort  Memory: 58788kB
        Buffers: shared hit=576 read=406221
        Worker 0: actual time=7438.271..7644.241 rows=172085 loops=1
          Buffers: shared hit=203 read=135166
        Worker 1: actual time=7407.574..7609.922 rows=172948 loops=1
          Buffers: shared hit=202 read=135225
        ->  Parallel Seq Scan on schema.logtable  (cost=0.00..524345.70 rows=210418 width=2542) (actual time=2.839..7017.729 rows=172172 loops=3)
              Output: column1, .., column54
              Filter: ((logtable.archivestatus <= 1) AND ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002)))
              Rows Removed by Filter: 4533459
              Buffers: shared hit=480 read=406221
              Worker 0: actual time=2.628..7006.420 rows=172085 loops=1
                Buffers: shared hit=155 read=135166
              Worker 1: actual time=3.948..6978.154 rows=172948 loops=1
                Buffers: shared hit=154 read=135225
Settings: enable_bitmapscan = 'off', random_page_cost = '1', temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.621 ms
Execution Time: 9339.457 ms

Have you tuned shared_buffers enough? Each block is 8 kB by default.
BTW, please try to reset random_page_cost.
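
For a quick check of how much of logtable actually sits in shared buffers, the pg_buffercache extension can help -- a sketch, assuming the extension can be installed on the server:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- count the 8 kB shared buffers currently holding pages of logtable
SELECT count(*) AS buffers,
       pg_size_pretty(count(*) * 8192) AS cached
FROM pg_buffercache
WHERE reldatabase = (SELECT oid FROM pg_database
                     WHERE datname = current_database())
  AND relfilenode = pg_relation_filenode('schema.logtable');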

See the plans above.

I will try upgrading the minor version next.
I will also try setting up a 13.x instance locally, importing the data from 12.2, and seeing if it might be faster.


On Thu, May 6, 2021 at 23:16, Imre Samu <pella.samu@gmail.com> wrote:
> Postgres Version : PostgreSQL 12.2,
>  ... ON ... USING btree

IMHO:
The next minor (bugfix & security) release is near (expected ~May 13th, 2021: https://www.postgresql.org/developer/roadmap/),
so you can update your PostgreSQL to 12.7 (full reindexing afterwards is recommended!).
 
You can find a lot of B-tree index-related fixes:
- Fix possible undercounting of deleted B-tree index pages in VACUUM VERBOSE output
- Fix wrong bookkeeping for oldest deleted page in a B-tree index
- Ensure INCLUDE'd columns are always removed from B-tree pivot tuples
- Avoid repeated marking of dead B-tree index entries as dead
- Fix failure of parallel B-tree index scans when the index condition is unsatisfiable
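
A sketch of the reindexing step after the upgrade (REINDEX ... CONCURRENTLY is available since PostgreSQL 12, so it does not block writes):

-- rebuild every index on the table:
REINDEX TABLE CONCURRENTLY schema.logtable;
-- or one index at a time:
REINDEX INDEX CONCURRENTLY schema.idx_entrytype;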


> COLLATE pg_catalog."default"

You can test the "C" Collation  in some columns  (keys ? ) ;  in theory, it should be faster :
"The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them."

Best,
 Imre




--
Thanks,
Vijay
Mumbai, India

Re: Very slow Query compared to Oracle / SQL - Server

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Fri, May 07, 2021 at 05:57:19PM +0200, Semen Yefimenko wrote:
>> For testing purposes I set up a separate postgres 13.2 instance on windows.
>> To my surprise, it works perfectly fine. Also indexes, have about 1/4 of
>> the size they had on 12.6.

> In pg13, indexes are de-duplicated by default.
> But I suspect the performance is better because data was reload, and the
> smaller indexes are a small, additional benefit.

Index bloat is often a consequence of inadequate vacuuming.  You might
need to dial up autovacuum's aggressiveness to keep their sizes in check.
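
For example (a sketch; the right numbers depend on the update rate):

-- vacuum after ~1% of rows change instead of the 20% default
ALTER TABLE schema.logtable SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);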

            regards, tom lane



Re: Very slow Query compared to Oracle / SQL - Server

From
Peter Geoghegan
Date:
On Fri, May 7, 2021 at 9:16 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> In pg13, indexes are de-duplicated by default.
>
> But I suspect the performance is better because data was reload, and the
> smaller indexes are a small, additional benefit.

That's a very reasonable interpretation, since the bitmap index scans
themselves just aren't doing that much I/O -- we see that there is
much more I/O for the heap scan, which is likely to be what the
general picture looks like no matter how much bloat there is.

However, I'm not sure if that reasonable interpretation is actually
correct. The nbtinsert.c code that handles deleting LP_DEAD index
tuples no longer relies on having a page-level garbage item flag set
in Postgres 13 -- it just scans the line pointer array for LP_DEAD
items each time. VACUUM has a rather unhelpful tendency to unset the
flag when it shouldn't, which we're no longer affected by. So that's
one possible explanation.

Another possible explanation is that smaller indexes (due to
deduplication) are more likely to get index scans, which leads to
setting the LP_DEAD bit of known-dead index tuples in passing more
often (bitmap index scans won't do the kill_prior_tuple optimization).
There could even be a virtuous circle over time. (Note that the index
deletion stuff in Postgres 14 pretty much makes sure that this
happens, but it is probably at least much more likely in Postgres 13
compared to 12.)
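
To put a number on the bloat angle, the pgstattuple extension's pgstatindex() reports average leaf density, which could be compared on 12 vs. 13 -- a sketch, assuming the extension can be installed:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- freshly built B-tree leaf pages are typically ~90% dense;
-- much lower suggests bloat
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('schema.idx_entrytype');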

I could easily be very wrong about all of this in this instance,
though, because the behavior I've described is highly non-linear and
therefore highly unpredictable in general (not to mention highly
sensitive to workload characteristics). I'm sure that I've thought
about this stuff way more than any other individual Postgres
contributor, but it's easy to be wrong in any given instance. The real
explanation might be something else entirely. Though it's hard not to
imagine that what really matters here is avoiding all of that bitmap
heap scan I/O.

-- 
Peter Geoghegan



Re: Very slow Query compared to Oracle / SQL - Server

From
Peter Geoghegan
Date:
On Fri, May 7, 2021 at 2:28 PM Peter Geoghegan <pg@bowt.ie> wrote:
> That's a very reasonable interpretation, since the bitmap index scans
> themselves just aren't doing that much I/O -- we see that there is
> much more I/O for the heap scan, which is likely to be what the
> general picture looks like no matter how much bloat there is.
>
> However, I'm not sure if that reasonable interpretation is actually
> correct. The nbtinsert.c code that handles deleting LP_DEAD index
> tuples no longer relies on having a page-level garbage item flag set
> in Postgres 13 -- it just scans the line pointer array for LP_DEAD
> items each time.

BTW, I am pointing all of this out because I've heard informal reports
of big improvements following an upgrade to Postgres 13 that seem
unlikely to be related to the simple fact that indexes are smaller
(most of the time you cannot save that much I/O by shrinking indexes
without affecting when and how TIDs/heap tuples are scanned).

It's necessary to simulate the production workload to have *any* idea
if LP_DEAD index tuple deletion might be a factor. If the OP is just
testing this one query on Postgres 13 in isolation, without anything
bloating up (or cleaning up) indexes, then that doesn't really tell us
anything about how Postgres 13 compares to Postgres 12. As you said,
simply shrinking the indexes is nice, but not enough -- we'd need some
second-order effect to get acceptable performance over time
and under real world conditions.

-- 
Peter Geoghegan



Re: Very slow Query compared to Oracle / SQL - Server

From
didier
Date:
Are you sure you're using the same data set?

Unless I'm overlooking something obvious, one result has 500,000 rows and the other 7,000.



Re: Very slow Query compared to Oracle / SQL - Server

From
Semen Yefimenko
Date:
> Unless I'm overlooking something obvious, one result has 500,000 rows and the other 7,000.
You are right, it wasn't. I have two datasets, one containing 12 million entries and the other 14 million. I accidentally used the 12-million-entry one, in which that SQL query matches only about 7,000 rows.
For now I have tested Postgres 12.6 with default settings and it finished in 12 seconds. I'll do some thorough testing and let you know once I'm done; sorry for the confusion.




Re: Very slow Query compared to Oracle / SQL - Server

From
Semen Yefimenko
Date:
I've done some testing on different versions of Postgres.
Unfortunately, after the weekend the problem vanished.
The systems are running as usual and the query finishes in 500 ms.
It must have been an issue with the VMs or the disks.
Either way, thank you for your support.

By the way, here are some test results.

----------------------------------------------------Linux PG_13.2 (docker instance)
Gather  (cost=1000.00..573832.18 rows=486255 width=2567) (actual time=232.444..23682.816 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=15883 read=390758
  ->  Parallel Seq Scan on schema.logtable  (cost=0.00..524206.67 rows=202606 width=2567) (actual time=256.462..23522.715 rows=172172 loops=3)
        Output: column1, .. , column54
        Filter: ((logtable.archivestatus <= 1) AND ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002)))
        Rows Removed by Filter: 4533459
        Buffers: shared hit=15883 read=390758
        Worker 0:  actual time=266.613..23529.215 rows=171917 loops=1
          JIT:
            Functions: 2
            Options: Inlining true, Optimization true, Expressions true, Deforming true
            Timing: Generation 0.805 ms, Inlining 52.127 ms, Optimization 150.748 ms, Emission 63.482 ms, Total 267.162 ms
          Buffers: shared hit=5354 read=130007
        Worker 1:  actual time=270.921..23527.953 rows=172273 loops=1
          JIT:
            Functions: 2
            Options: Inlining true, Optimization true, Expressions true, Deforming true
            Timing: Generation 1.217 ms, Inlining 49.556 ms, Optimization 154.765 ms, Emission 65.153 ms, Total 270.690 ms
          Buffers: shared hit=5162 read=130108
Planning Time: 0.356 ms
JIT:
  Functions: 6
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 3.578 ms, Inlining 106.136 ms, Optimization 443.580 ms, Emission 217.728 ms, Total 771.021 ms
Execution Time: 23736.150 ms
-----------Query takes 245 ms
-----------Index size: average 80 MB
----------------------------------------------------Windows PG_12.6 (local instance)
Gather  (cost=1000.00..575262.60 rows=499935 width=2526) (actual time=2.155..2555.388 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=128 read=406517
  ->  Parallel Seq Scan on schema.logtable  (cost=0.00..524269.10 rows=208306 width=2526) (actual time=0.651..2469.220 rows=172172 loops=3)
        Output: column1, .. , column54
        Filter: ((logtable.archivestatus <= 1) AND ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002)))
        Rows Removed by Filter: 4533459
        Buffers: shared hit=128 read=406517
        Worker 0: actual time=0.637..2478.110 rows=172580 loops=1
          Buffers: shared hit=41 read=135683
        Worker 1: actual time=0.084..2474.863 rows=173408 loops=1
          Buffers: shared hit=42 read=135837
Planning Time: 0.201 ms
Execution Time: 2572.065 ms
-----------Query takes 18 ms
-----------Index size: average 300 MB
----------------------------------------------------Windows PG_13.2 (local instance)
Gather  (cost=1000.00..575680.37 rows=503383 width=2531) (actual time=1.045..2586.700 rows=516517 loops=1)
  Output: column1, .. , column54
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=8620 read=398025
  ->  Parallel Seq Scan on schema.logtable  (cost=0.00..524342.07 rows=209743 width=2531) (actual time=0.346..2485.163 rows=172172 loops=3)
        Output: column1, .. , column54
        Filter: ((logtable.archivestatus <= 1) AND ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002)))
        Rows Removed by Filter: 4533459
        Buffers: shared hit=8620 read=398025
        Worker 0: actual time=0.155..2487.411 rows=174277 loops=1
          Buffers: shared hit=2954 read=133173
        Worker 1: actual time=0.746..2492.533 rows=173762 loops=1
          Buffers: shared hit=2813 read=133935
Planning Time: 0.154 ms
Execution Time: 2604.983 ms
-----------Query takes 18 ms
-----------Index size: average 80 MB
----------------------------------------------------Windows PG_12.6 (remote instance)
Bitmap Heap Scan on schema.logtable  (cost=10326.36..449509.96 rows=530847 width=2540) (actual time=406.235..6770.263 rows=516517 loops=1)
  Output: column1, .. , column54
  Recheck Cond: ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
  Filter: (logtable.archivestatus <= 1)
  Heap Blocks: exact=294098
  Buffers: shared hit=3632 read=291886
  ->  BitmapOr  (cost=10326.36..10326.36 rows=536922 width=0) (actual time=212.117..212.124 rows=0 loops=1)
        Buffers: shared hit=1420
        ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..1196.37 rows=64525 width=0) (actual time=30.677..30.678 rows=65970 loops=1)
              Index Cond: (logtable.entrytype = 4000)
              Buffers: shared hit=183
        ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..4605.07 rows=249151 width=0) (actual time=110.538..110.539 rows=225283 loops=1)
              Index Cond: (logtable.entrytype = 4001)
              Buffers: shared hit=619
        ->  Bitmap Index Scan on idx_entrytype  (cost=0.00..4126.79 rows=223247 width=0) (actual time=70.887..70.888 rows=225264 loops=1)
              Index Cond: (logtable.entrytype = 4002)
              Buffers: shared hit=618
Settings: temp_buffers = '80MB', work_mem = '1GB'
Planning Time: 0.409 ms
Execution Time: 7259.515 ms
-----------Query takes 570 ms
-----------Index size: average 300 MB



