Thread: [BUGS] BUG #14635: Query is executed slower on hot standby slave database than on master database

The following bug has been logged on the website:

Bug reference:      14635
Logged by:          Vitaliy Gomenyuk
Email address:      vgomenyuk@callfire.com
PostgreSQL version: 9.2.14
Operating system:   CentOS 6
Description:

Hello,

We have two equal servers (CPU, RAM, HDD).
We are currently using PostgreSQL 9.2.14.
We have set up master-slave streaming replication.

Running the following query on both the master and slave databases:
explain analyze SELECT "om".* FROM "Table" AS "om" WHERE (om."UserID" =
11111) AND (NOT om."Deleted") AND (om."Status" = 0) ORDER BY "StampToSend"
asc LIMIT 11;
The execution plans are the same on the master and slave databases.

Execution time on the master database: 138.116 ms
Execution time on the hot standby slave database with working replication:
1623.496 ms
Execution time on the slave database with replication turned off (after
creating the trigger file): 132.354 ms (equal to the execution time on the
master database).

Why is there such a huge difference in execution time between PostgreSQL with
and without working streaming replication?
How can we improve the situation, i.e. make the execution time on the hot
standby slave database with working replication equal to the master database?

Need help. Thanks.

Vitaliy





On Fri, Apr 28, 2017 at 11:38 PM, <vgomenyuk@callfire.com> wrote:
> [ original report quoted above ]

Is the issue happening always or only sometimes?
Can you please check the explain plan output on both machines with the verbose
and buffers options?

explain (analyze, verbose, buffers) /*query*/

The explain output may help in identifying where in the plan the query is taking
more time compared to the master.

Regards,
Hari Babu
Fujitsu Australia
Hello, 

I didn't find a way to write a response to your message on the PostgreSQL support
web site; I didn't find any button like "Response" or "New comment". Can you please
share this info? Maybe there is an instruction somewhere?

I will write my response by mail.

Yes, it happens every time.

Here is the execution plan from the master:
texting=# explain (analyze, verbose, buffers) SELECT "om".* FROM "Table" AS "om" WHERE (om."UserID" = 11111) AND (NOT om."Deleted") AND (om."Status" = 0) ORDER BY "StampToSend" asc LIMIT 11;

                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3819.29..3819.30 rows=6 width=207) (actual time=139.847..139.847 rows=0 loops=1)
   Output: "ID", "MessageTypeID", "ServerID", "UserID", "RecipientsCount", "Credits", "Subject", "Message", "StampToSend", "StampProcessed", "SendToNewRecipients", "StampNewRecipientsChecked", "InsufficientCreditsEmailSent", "Deleted", "Status", "CreatedAt", "UpdatedAt", "Code", "Source", "MultiplePartsNumber"
   Buffers: shared hit=70256
   ->  Sort  (cost=3819.29..3819.30 rows=6 width=207) (actual time=139.845..139.845 rows=0 loops=1)
         Output: "ID", "MessageTypeID", "ServerID", "UserID", "RecipientsCount", "Credits", "Subject", "Message", "StampToSend", "StampProcessed", "SendToNewRecipients", "StampNewRecipientsChecked", "InsufficientCreditsEmailSent", "Deleted", "Status", "CreatedAt", "UpdatedAt", "Code", "Source", "MultiplePartsNumber"
         Sort Key: om."StampToSend"
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=70256
         ->  Bitmap Heap Scan on public."Table" om  (cost=3795.14..3819.21 rows=6 width=207) (actual time=139.836..139.836 rows=0 loops=1)
               Output: "ID", "MessageTypeID", "ServerID", "UserID", "RecipientsCount", "Credits", "Subject", "Message", "StampToSend", "StampProcessed", "SendToNewRecipients", "StampNewRecipientsChecked", "InsufficientCreditsEmailSent", "Deleted", "Status", "CreatedAt", "UpdatedAt", "Code", "Source", "MultiplePartsNumber"
               Recheck Cond: ((om."UserID" = 11111) AND (om."Status" = 0))
               Filter: (NOT om."Deleted")
               Buffers: shared hit=70256
               ->  BitmapAnd  (cost=3795.14..3795.14 rows=6 width=0) (actual time=139.832..139.832 rows=0 loops=1)
                     Buffers: shared hit=70256
                     ->  Bitmap Index Scan on "OutgoingMessages_UserID"  (cost=0.00..237.99 rows=6872 width=0) (actual time=0.032..0.032 rows=12 loops=1)
                           Index Cond: (om."UserID" = 11111)
                           Buffers: shared hit=4
                     ->  Bitmap Index Scan on "OutgoingMessages_Status_StampToSend_Deleted"  (cost=0.00..3556.90 rows=80249 width=0) (actual time=139.761..139.761 rows=9158 loops=1)
                           Index Cond: ((om."Status" = 0) AND (om."Deleted" = false))
                           Buffers: shared hit=70252
 Total runtime: 139.900 ms
(22 rows)


Here is the execution plan from the slave:
texting=# explain (analyze, verbose, buffers) SELECT "om".* FROM "Table" AS "om" WHERE (om."UserID" = 11111) AND (NOT om."Deleted") AND (om."Status" = 0) ORDER BY "StampToSend" asc LIMIT 11;

                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3819.29..3819.30 rows=6 width=207) (actual time=1472.199..1472.199 rows=0 loops=1)
   Output: "ID", "MessageTypeID", "ServerID", "UserID", "RecipientsCount", "Credits", "Subject", "Message", "StampToSend", "StampProcessed", "SendToNewRecipients", "StampNewRecipientsChecked", "InsufficientCreditsEmailSent", "Deleted", "Status", "CreatedAt", "UpdatedAt", "Code", "Source", "MultiplePartsNumber"
   Buffers: shared hit=70256
   ->  Sort  (cost=3819.29..3819.30 rows=6 width=207) (actual time=1472.197..1472.197 rows=0 loops=1)
         Output: "ID", "MessageTypeID", "ServerID", "UserID", "RecipientsCount", "Credits", "Subject", "Message", "StampToSend", "StampProcessed", "SendToNewRecipients", "StampNewRecipientsChecked", "InsufficientCreditsEmailSent", "Deleted", "Status", "CreatedAt", "UpdatedAt", "Code", "Source", "MultiplePartsNumber"
         Sort Key: om."StampToSend"
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=70256
         ->  Bitmap Heap Scan on public."Table" om  (cost=3795.14..3819.21 rows=6 width=207) (actual time=1472.191..1472.191 rows=0 loops=1)
               Output: "ID", "MessageTypeID", "ServerID", "UserID", "RecipientsCount", "Credits", "Subject", "Message", "StampToSend", "StampProcessed", "SendToNewRecipients", "StampNewRecipientsChecked", "InsufficientCreditsEmailSent", "Deleted", "Status", "CreatedAt", "UpdatedAt", "Code", "Source", "MultiplePartsNumber"
               Recheck Cond: ((om."UserID" = 11111) AND (om."Status" = 0))
               Filter: (NOT om."Deleted")
               Buffers: shared hit=70256
               ->  BitmapAnd  (cost=3795.14..3795.14 rows=6 width=0) (actual time=1472.186..1472.186 rows=0 loops=1)
                     Buffers: shared hit=70256
                     ->  Bitmap Index Scan on "OutgoingMessages_UserID"  (cost=0.00..237.99 rows=6872 width=0) (actual time=0.025..0.025 rows=12 loops=1)
                           Index Cond: (om."UserID" = 11111)
                           Buffers: shared hit=4
                     ->  Bitmap Index Scan on "OutgoingMessages_Status_StampToSend_Deleted"  (cost=0.00..3556.90 rows=80249 width=0) (actual time=1470.853..1470.853 rows=8671249 loops=1)
                           Index Cond: ((om."Status" = 0) AND (om."Deleted" = false))
                           Buffers: shared hit=70252
 Total runtime: 1472.253 ms
(22 rows)


The difference is:
->  Bitmap Index Scan on "OutgoingMessages_Status_StampToSend_Deleted"  (cost=0.00..3556.90 rows=80249 width=0) (actual time=139.761..139.761 rows=9158 loops=1)

->  Bitmap Index Scan on "OutgoingMessages_Status_StampToSend_Deleted"  (cost=0.00..3556.90 rows=80249 width=0) (actual time=1470.853..1470.853 rows=8671249 loops=1)

Vitaliy Gomenyuk <vgomenyuk@callfire.com> writes:
> [ same query slower on slave ]

Hmm, the discrepancy is evidently in the larger bitmap index scan:

> There is an execution plan from master:
>                      ->  Bitmap Index Scan on "OutgoingMessages_Status_StampToSend_Deleted"  (cost=0.00..3556.90 rows=80249 width=0) (actual time=139.761..139.761 rows=9158 loops=1)
>                            Index Cond: ((om."Status" = 0) AND (om."Deleted" = false))
>                            Buffers: shared hit=70252

> There is an execution plan from slave:
>                      ->  Bitmap Index Scan on "OutgoingMessages_Status_StampToSend_Deleted"  (cost=0.00..3556.90 rows=80249 width=0) (actual time=1470.853..1470.853 rows=8671249 loops=1)
>                            Index Cond: ((om."Status" = 0) AND (om."Deleted" = false))
>                            Buffers: shared hit=70252

The contents of the indexes should be the same, so why is the slave
returning so many more rows?  It has to be because the index entries are
not marked as killed (known-dead-to-everybody), or not being treated as
killed, in the slave.  I vaguely recall that there's a difference in the
rules for index entry visibility on slaves, but it's not clear to me why
that should be.
        regards, tom lane





On Thu, May 4, 2017 at 12:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ analysis quoted above ]

The index cleanup done by full vacuum and by single-page vacuum is WAL-logged,
so when those records get replayed on the slave, both indexes must be the same.

Maybe the WAL wasn't replayed on the slave because of a conflicting transaction?
Or could it be different in some other scenario?
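
One quick way to check for recovery conflicts on the standby might be the
pg_stat_database_conflicts view (just a sketch; it only counts queries cancelled
because of conflicts, it does not prove that WAL replay was held up):

SELECT datname, confl_snapshot, confl_lock, confl_bufferpin, confl_deadlock, confl_tablespace
FROM pg_stat_database_conflicts;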


Hi Vitaliy,

Is it possible for you to check the status of the replication? And is it also
possible for you to create another fresh slave and check whether the issue
happens there as well?

Regards,
Hari Babu
Fujitsu Australia
Hello Haribabu,

Thank you for your answer.

Here is the status of the replication:
select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status", pg_is_xlog_replay_paused() "replication_paused", now() - pg_last_xact_replay_timestamp() "replication_delay";
 
receive_location | replay_location | recovery_status | replication_paused | replication_delay
------------------+-----------------+-----------------+--------------------+-------------------
 66A/8F77A330     | 66A/8F77A330    | t               | f                  | 00:00:00.015721
(1 row)

We have several slave servers, and I have also set up fresh slave databases several times - I got the same situation each time.

I still need help with it.

Waiting for your response. Thanks.

Hello, guys,

Any updates on this?

Thank you in advance.

On Thu, May  4, 2017 at 10:26:17AM +1000, Haribabu Kommi wrote:
>     The contents of the indexes should be the same, so why is the slave
>     returning so many more rows?  It has to be because the index entries are
>     not marked as killed (known-dead-to-everybody), or not being treated as
>     killed, in the slave.  I vaguely recall that there's a difference in the
>     rules for index entry visibility on slaves, but it's not clear to me why
>     that should be.
> 
> 
> The index cleanup by the full vacuum and vacuum one page are WAL logged,
> so when they gets replayed on the slave, both the indexes must be same.
> 
> May be the WAL didn't replayed on the slave because of conflict transaction?
> Or Any other scenarios it may be different?

We don't WAL-log hint bits.  Could that affect queries running on the
standbys?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



What does "We don't WAL-log hint bits" mean?



On Sat, May 13, 2017 at 3:30 AM, Bruce Momjian <bruce@momjian.us> wrote:
We don't WAL-log hint bits.  Could that affect queries running on the
standbys?

Yes, this may be a reason for the extra number of records that are present in the
index.

Hi Vitaliy,

Hint bits are used to mark tuples as created and/or deleted by transactions
that are known committed or aborted. The changes to the hint bits are not
WAL-logged. They are WAL-logged when data checksums are enabled or the
wal_log_hints GUC is set, but these two options are not available in
PostgreSQL version 9.2.
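
As a side note, those heap hint bits can be looked at with the pageinspect
extension, roughly like this (a sketch only; the table name and block number are
just examples, and the mask values 0x0100/0x0800 are HEAP_XMIN_COMMITTED and
HEAP_XMAX_INVALID from the server headers):

CREATE EXTENSION pageinspect;  -- has to be created on the master, the standby is read-only
SELECT lp, t_xmin, t_xmax,
       (t_infomask::int & 256)  <> 0 AS xmin_committed_hint,   -- HEAP_XMIN_COMMITTED = 0x0100
       (t_infomask::int & 2048) <> 0 AS xmax_invalid_hint      -- HEAP_XMAX_INVALID   = 0x0800
FROM heap_page_items(get_raw_page('"Table"', 0));              -- block 0 is just an example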

Is it possible for you to test the query after executing a REINDEX of the index?

or

Try adding the pgstattuple and pageinspect extensions and verify whether the index
is the same on both the master and the standby.

If there are more records in the standby's index compared to the master's, this may
be because of some cleanup that is happening on the master but has not been done on
the standby.
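
For example, something like the following could be run on both servers and the
results compared (only a sketch; the block number is just an example of a leaf page,
and the extensions have to be created on the master because the standby is read-only):

CREATE EXTENSION pgstattuple;
CREATE EXTENSION pageinspect;
-- overall B-tree statistics (size, leaf pages, deleted pages, density)
SELECT * FROM pgstatindex('"OutgoingMessages_Status_StampToSend_Deleted"');
-- per-page statistics; dead_items counts entries already marked as killed (LP_DEAD)
SELECT live_items, dead_items
FROM bt_page_stats('"OutgoingMessages_Status_StampToSend_Deleted"', 1);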

Regards,
Hari Babu
Fujitsu Australia
On Mon, May 15, 2017 at 5:41 PM, Vitaliy Gomenyuk
<vgomenyuk@callfire.com> wrote:
> What does it mean "We don't WAL-log hint bits"?

(Please avoid top-posting...)

Haribabu has already given a good enough answer upthread. Here are
some more details...

Here is a description of what hint bits are:
https://wiki.postgresql.org/wiki/Hint_Bits

This sentence means that no WAL records are generated when a hint bit
is updated on a page. There are circumstances where setting a hint bit
on a page does cause WAL activity though: if wal_log_hints is enabled
and a hint bit update is the first change to a page after a checkpoint,
then a WAL record containing a full-page image of that page is written.
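
For reference, on 9.4 or later it could be enabled like this (a sketch only;
neither wal_log_hints nor ALTER SYSTEM exists in 9.2, and the setting only takes
effect after a server restart):

ALTER SYSTEM SET wal_log_hints = on;   -- or put wal_log_hints = on in postgresql.conf
-- initializing the cluster with data checksums (initdb --data-checksums) has the same effect
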
-- 
Michael



On Mon, May 15, 2017 at 10:44 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> [ hint bit explanation and suggestions quoted above ]

Because hint bits affect only heap tuples, I think they are not relevant
here. But because marking an index entry as dead is not WAL-logged,
it's possible that unnecessary index entries are marked as dead on the
master server while they are not marked on the standby server. This would
make an index scan on the standby server return many more entries than on
the master server.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Thank you for all your answers.

1. We can't reindex the table on a hot-standby database:
texting=# REINDEX TABLE "Table";
ERROR:  cannot execute REINDEX during recovery

2. I'm not sure that the problem is caused by hint bits not being WAL-logged, based on my further investigation:
  2.0. We have two equal servers for the master and slave databases (CPU, RAM, HDD).
  2.1. On the master database I run a query; it executes in 138.116 ms.
  2.2. On a hot-standby slave database with a working replication process I run the same query; it executes in 1623.496 ms (much slower).
  2.3. Then on that hot-standby slave database I create a trigger file, which stops the replication process immediately, and right after that I run the same query; it executes in 132.354 ms (as fast as on the master).
  2.4. During the time when I create the trigger file (less than 1 second), the indexes in that table cannot change. Moreover, that hot-standby slave database was created from a full master copy just before I started the test, so the master and slave databases are equal. The only difference is the replication process running in hot-standby mode. So only the replication process can cause this behaviour. But why, and how can the situation be improved?

The question is still open.




On Thu, May 18, 2017 at 9:26 AM, Vitaliy Gomenyuk <vgomenyuk@callfire.com> wrote:

2. I'm not sure that the problem is with not WAL-logged hint bits, because due to my next investigation:
  2.0. We have two equal servers for master and slave database (CPU, RAM, HDD).
  2.1. On master database I'm running a query, it executes in 138.116 ms
  2.2. On a hot-standby slave database with working replication process I'm running the same query, it executes in 1623.496 ms (much slower)
  2.3. Then on that hot-standby slave database I've created a trigger file, which stops a replication process immediately and right after that, I'm running the same query, it executes in 132.354 ms (fast like on master). 
  2.4. During the time when I'm creating a trigger file (less then 1 second), indexes in that table can't be changed. Even more, that slave hot-standby database was created from the full master copy just before I started test. So master and slave databases are equal. One difference is turned on replication process in a hot-standby mode. So only a replication process can cause this behaviour.

If the index hint bits were set on the master and then a base backup is taken, then those bits are set on the replica created from that backup.  But it does not use the index hint bits while it is in hot standby mode.  But when it is promoted, it will start using them.
 
 
But why and how to improve the situation?

Vacuum your table (on the master).  

Cheers,

Jeff
Hello, 

So we have two reasons for the current problem:

1. Changes to hint bits are not WAL-logged => this can be fixed by upgrading PostgreSQL to at least version 9.4 and turning on the "wal_log_hints" parameter;

2. The slave DB does not use the index hint bits while it is in hot standby mode => how can this be fixed?

Thank you for your answers.



On Fri, May 26, 2017 at 9:42 PM, Vitaliy Gomenyuk <vgomenyuk@callfire.com> wrote:
Hello, 

So we have two reasons of current problem:

1. Changes to hint bits are not WAL logged => can be fixed by upgrading PostgreSQL at least to 9.4 version and turning on parameter "wal_log_hints";

2. Slave DB does not use the index hint bits while it is in hot standby mode => how this can be fixed?

The main reason the slave DB does not use the index hint bits is the difference
between the xmin values on the master and the standby. This is required for proper
MVCC behavior for the queries that are running on the standby.

The following comment from the code explains this in more detail:

/*
* During recovery we ignore killed tuples and don't bother to kill them
* either. We do this because the xmin on the primary node could easily be
* later than the xmin on the standby node, so that what the primary
* thinks is killed is supposed to be visible on standby. So for correct
* MVCC for queries during recovery we must ignore these hints and check
* all tuples. Do *not* set ignore_killed_tuples to true when running in a
* transaction that was started during recovery. xactStartedInRecovery
* should not be altered by index AMs.
*/

Even if the hint bits were WAL-logged, it wouldn't solve the problem of too many
rows being scanned on the standby. I feel the only way to reduce the number of dead
tuples visible on the standby is to run VACUUM frequently, or to tune AUTOVACUUM to
run more often by changing the default AUTOVACUUM parameters of the tables that are
frequently updated/deleted, as these operations generate dead tuples.
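
For example, something like this on the master (only a sketch; the numbers are
purely illustrative and would need tuning for the actual write rate of the table):

VACUUM ANALYZE "Table";                                           -- one-off cleanup right away
ALTER TABLE "Table" SET (autovacuum_vacuum_scale_factor = 0.01,   -- vacuum after ~1% dead rows
                         autovacuum_vacuum_threshold = 1000);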
 
Regards,
Hari Babu
Fujitsu Australia
That makes sense.
Sadly, there is no other way except vacuum to improve the situation.
Thank you for all your answers.

On Fri, May 26, 2017 at 10:12 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:


> [ xmin explanation and code comment quoted above ]

But why can't the same mechanisms used on tables work on indexes, which is to wait out or conflict out any too-old transactions on the standby?  Is there a fundamental reason for that, or just that no one got around to implementing it?

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> But why can't the same mechanisms used on tables work on indexes, which is
> to wait out or conflict out any too-old transactions on the standby?  Is
> there a fundamental reason for that, or just that no one got around to
> implementing it?

I think it's probably do-able, but not exactly trivial.  The existing
behavior was an expedient solution when we were first getting hot standby
to work at all.  Seems like it may be time to go back and improve it.
        regards, tom lane

