Extending pg_stat_statements to expose queryid - Mailing list pgsql-hackers
From | Sameer Thakur |
---|---|
Subject | Extending pg_stat_statements to expose queryid |
Date | |
Msg-id | CABzZFEv_XaXu_4T8m3SCSvxSg_wcHB9RXOJrg5kmRpiiRrwYTQ@mail.gmail.com Whole thread Raw |
List | pgsql-hackers |
<div dir="ltr"><p class="">Hello All,<p class="">I am trying to revive the discussion about exposing queryid in pg_stat_statements.<pclass="">I did find the same request posted on hackers @ <a href="http://www.postgresql.org/message-id/CABUevExLnb6xJwS=8rTaLOfDOS-tFm09==Z2M_vz5hhFkgWBTg@mail.gmail.com">http://www.postgresql.org/message-id/CABUevExLnb6xJwS=8rTaLOfDOS-tFm09==Z2M_vz5hhFkgWBTg@mail.gmail.com</a><p class="">and<pclass=""><a href="http://www.postgresql.org/message-id/CACN56+NLMTwHg8eQQqNYzqe2Q0nEGJoKmGFiUSK_aoHw627Q8Q@mail.gmail.com">http://www.postgresql.org/message-id/CACN56+NLMTwHg8eQQqNYzqe2Q0nEGJoKmGFiUSK_aoHw627Q8Q@mail.gmail.com</a><p class=""><pclass=""> From the discussions I concluded<p class=""><p class="">1. The main use case for exposing queryid, isit being a better substitute to hashing the query text of a pg_stat_statements snapshot, to make a candidate key. Problemsoccur when hash value should be different even if query text is same. For example when a table referred in a queryis dropped and recreated or when the query text is same on different schemas and schema name is not included in querytext. <p class="">2. Exposing queryid was proposed earlier but was not accepted. The main reason was that queryid couldbe unstable as well. Since queryid was derived from hashing query tree and query tree could undergo changes betweenminor PostgreSQL releases, meant the queryid for same query could be different between releases, resulting in incorrectstatement statistics collection. <p class="">3. Another problem is to distinguish between queries whose statisticsare continuously maintained and queries which are intermittent, whose statistics might be silent reset, withoutthe reporting tool being wiser.<p class="">4. A solution to avoid misrepresentation of intermittent queries as consistentqueries would be to assign a unique number to each new row and once that row is discarded, the unique number cannotbe reused. The drawbacks here is possible collision of unique values generated. <p class="">5. A patch implementingsolution for identifying intermittent query is @ <a href="https://github.com/fdr/postgres/branches/error-prop-pg_stat_statements-v2">https://github.com/fdr/postgres/branches/error-prop-pg_stat_statements-v2</a>. <pclass="">The solution avoids using a counter, and achieves the same result by the property that intermittent queries accumulateerrors due to eviction from hashtable while consistent queries do not. Error accumulation would be the parameterby which a reporting tool can figure out if there was eviction of queries between snapshots.<p class="">6. To addressthe problem of unstable queryid generated from query tree, it was proposed to eliminate any possible misunderstandingthat queryid will remain the same between releases, by xoring the hash from query tree with statistics sessionid.This also helps in all cases where the statistics file is reset like crash recovery,recovery mode, ensuring a newhash value for reset statistics.<p class="">To avoid increasing the chance of collision, a longer session key and paddingthe queryid can be done to complete the XOR. Implementation of this is @ <a href="https://github.com/fdr/postgres/branches/pg_stat_statements-identification-v3">https://github.com/fdr/postgres/branches/pg_stat_statements-identification-v3</a><p class="">7.The patch pg_stat_statements-identification-v3 was returned with feedback for more documentation in commitfest2013-01.<p class="">Questions:<p class=""><p class="">1. Is there a plan to re-introduce this patch? The code seemsto be documented.<p class="">2. There was mention of further testing of error propagation using hooks. Could this beelaborated?<p class="">3. There was a use case that exposing queryid could be used to aggregate statistics across WAL basedreplication clusters. But now that queryid is derived from statistics session id, which is randomly generated, thisuse case is still not addressed. Is this correct?<p class=""><p class=""> Regards<p class="">Sameer<p class=""></div>
pgsql-hackers by date: