Home > mailing lists

Extending pg_stat_statements to expose queryid - Mailing list pgsql-hackers

From	Sameer Thakur
Subject	Extending pg_stat_statements to expose queryid
Date	September 5, 2013 12:45:01
Msg-id	CABzZFEv_XaXu_4T8m3SCSvxSg_wcHB9RXOJrg5kmRpiiRrwYTQ@mail.gmail.com Whole thread Raw
List	pgsql-hackers

Tree view

<div dir="ltr"><p class="">Hello All,<p class="">I am trying to revive the discussion about exposing queryid in
pg_stat_statements.<pclass="">I did find the same request posted on hackers @ <a
href="http://www.postgresql.org/message-id/CABUevExLnb6xJwS=8rTaLOfDOS-tFm09==Z2M_vz5hhFkgWBTg@mail.gmail.com">http://www.postgresql.org/message-id/CABUevExLnb6xJwS=8rTaLOfDOS-tFm09==Z2M_vz5hhFkgWBTg@mail.gmail.com</a><p
class="">and<pclass=""><a
href="http://www.postgresql.org/message-id/CACN56+NLMTwHg8eQQqNYzqe2Q0nEGJoKmGFiUSK_aoHw627Q8Q@mail.gmail.com">http://www.postgresql.org/message-id/CACN56+NLMTwHg8eQQqNYzqe2Q0nEGJoKmGFiUSK_aoHw627Q8Q@mail.gmail.com</a><p
class=""><pclass=""> From the discussions I concluded<p class=""><p class="">1. The main use case for exposing queryid,
isit being a better substitute to hashing the query text of a pg_stat_statements snapshot, to make a candidate key.
Problemsoccur when hash value should be different even if query text is same. For example when a table referred in a
queryis dropped and recreated or when the query text is same on different schemas and schema name is not included in
querytext. <p class="">2. Exposing queryid was proposed earlier but was not accepted. The main reason was that queryid
couldbe unstable as well. Since  queryid was derived from hashing query tree and query tree could undergo changes
betweenminor PostgreSQL releases, meant the queryid for same query could be different between releases, resulting in
incorrectstatement statistics collection. <p class="">3. Another problem is to distinguish between queries whose
statisticsare continuously maintained and queries which are intermittent, whose statistics might be silent reset,
withoutthe reporting tool being wiser.<p class="">4. A solution to avoid misrepresentation of intermittent queries as
consistentqueries would be to assign a unique number to each new row and once that row is discarded, the unique number
cannotbe reused. The drawbacks here is possible collision of unique values generated. <p class="">5. A  patch
implementingsolution for identifying intermittent query is @ <a
href="https://github.com/fdr/postgres/branches/error-prop-pg_stat_statements-v2">https://github.com/fdr/postgres/branches/error-prop-pg_stat_statements-v2</a>.
<pclass="">The solution avoids using a counter, and achieves the same result by the property that intermittent queries
accumulateerrors due to eviction from hashtable while consistent queries do not. Error accumulation would be the
parameterby which a reporting tool can figure out if there was eviction of queries between snapshots.<p class="">6. To
addressthe problem of unstable queryid generated from query tree, it was proposed to eliminate any possible
misunderstandingthat queryid will remain the same between releases, by xoring the hash from query tree with statistics
sessionid.This also helps in all cases where the statistics file is reset like crash recovery,recovery mode, ensuring a
newhash value for reset statistics.<p class="">To avoid increasing the chance of collision, a longer session key and
paddingthe queryid can be done to complete the XOR. Implementation of this is @ <a
href="https://github.com/fdr/postgres/branches/pg_stat_statements-identification-v3">https://github.com/fdr/postgres/branches/pg_stat_statements-identification-v3</a><p
class="">7.The patch pg_stat_statements-identification-v3 was returned with feedback for more documentation in
commitfest2013-01.<p class="">Questions:<p class=""><p class="">1. Is there a plan to re-introduce this patch? The code
seemsto be documented.<p class="">2. There was mention of further testing of error propagation using hooks. Could this
beelaborated?<p class="">3. There was a use case that exposing queryid could be used to aggregate statistics across WAL
basedreplication clusters. But now that queryid is derived from statistics session id, which is randomly generated,
thisuse case is still not addressed. Is this correct?<p class=""><p class=""> Regards<p class="">Sameer<p
class=""></div>

pgsql-hackers by date:

From: Pavel Stehule
Date: 05 September 2013, 11:43:14
Subject: Re: Improving avg performance for numeric

From: Heikki Linnakangas
Date: 05 September 2013, 13:29:43
Subject: Re: Analysis on backend-private memory usage (and a patch)

Extending pg_stat_statements to expose queryid - Mailing list pgsql-hackers

Previous

Next