Thread: Connection function
Hello all, I am very new to postgresql and python, but I have written a function to retrieve some admin data from the server. I think this is more of a python question rather than a psycopg2 question, but the context is psycopg2. I was wondering if this is the right place to post it and ask for a critique? Thanks, Bill
On 03/24/2012 05:03 PM, Bill House wrote: > Hello all, > > I am very new to postgresql and python, but I have written a function to > retrieve some admin data from the server. > > I think this is more of a python question rather than a psycopg2 > question, but the context is psycopg2. > > I was wondering if this is the right place to post it and ask for a > critique? I would say yes, though I am not sure what the question is:)? > > Thanks, > > Bill > -- Adrian Klaver adrian.klaver@gmail.com
On 03/24/2012 08:25 PM, Adrian Klaver wrote: > On 03/24/2012 05:03 PM, Bill House wrote: >> Hello all, >> >> I am very new to postgresql and python, but I have written a function to >> retrieve some admin data from the server. >> >> I think this is more of a python question rather than a psycopg2 >> question, but the context is psycopg2. >> >> I was wondering if this is the right place to post it and ask for a >> critique? > > I would say yes, though I am not sure what the question is:)? > >> >> Thanks, >> >> Bill >> > > Thanks, Sometimes it is helpful to know how many records one is working with. From my reading in the psycopg2 documentation (2.4.4) and experimentation, I think that this information is only available if one uses an unnamed cursor (in which case all the data will have been fetched to the client) or with an named cursor (but only after all the data has been fetched). If there is a large amount of data, this may be problematic. Am I wrong about this? If so, please let me know. Anyway, the way I approached this issue was to count the records I was interested in and return the value. The code below works. I am interested in a critique of it from more experienced persons. Are there more elegant or efficient ways to do what I have done? In particular, I am interested in: 1) the python string substitution technique I used 2) the choice of the postgresql source of data I was seeking (the record count) 3) if there was an existing function I overlooked which does the same thing (in psycopg2 or postgresql) Regarding # 2 above, I had been relying on the table: pg_stat_user_tables and the n_live_tup value. This item has shown the correct count for many days. Today, after updating my machines and rebooting, the count is 0 and I am not really sure why. Records are still there; and even though I have selected data from the database, the count is still 0. So I took the more direct but time consuming approach of just counting the records in the table. This will be better anyway because many times one will not be selecting all the records in a table, so this function can be expanded to include a parameter for the selection criteria so that that identical criteria may be used again in the work portion of any operation. Thanks in advance for your input. Bill code (indent 2): ------------------------------------------------------- #!/usr/bin/python # z_psycopg2_row_cnt7f.py # A program to illustrate the methods of creating or supplying runtime variables # to functions using psycopg2 functions import sys import os import psycopg2 import psycopg2.extras import psycopg2.extensions #for string generator import string import random con = None def gen_random_str(str_len): """generates a random string str_len long - randomly chooses and assembles a number of characters from the digits and the upper and lower case alphabet """ rand_str = ''.join(random.choice(string.ascii_uppercase + string.digits + string.ascii_lowercase) for x in range(str_len)) return rand_str def rec_cnt_func(conn, table_name): """Count the records selected from the table, right now all of them""" #Generate a random string to use as a curson name (short right now for debugging purposes) #The commands following will use this string rand_curs_name = gen_random_str(3) #Substitute data into a command string and execute #create the cursor exec "%s = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)" % (rand_curs_name,) #count the records in the table exec "%s.execute(\"SELECT COUNT (*) FROM %s;\")" % (rand_curs_name, table_name) #fetch the data exec "row = %s.fetchone()" % (rand_curs_name,) #All done now, throw the cursor away exec "%s.close()" % (rand_curs_name,) #return the data to the calling program return row[0] def main(): #supply your connection string here current_dsn = "host=xx" try: con = psycopg2.connect(current_dsn) #supply your table name here table_name_in = "xxxx" #Call the function which will retrieve the data from postgresql rec_cnt_val = rec_cnt_func(con, table_name_in) #print the formatted data in an informative way print "Records in table: {}: {:> 20,}".format(table_name_in, rec_cnt_val) except psycopg2.DatabaseError, e: print "psycopg2 reports error: {}".format(e) sys.exit(1) finally: if con: con.close() return if __name__ == '__main__': main() ----------------------------------------------------- end code
On 03/25/2012 08:14 AM, Bill House wrote: > On 03/24/2012 08:25 PM, Adrian Klaver wrote: >> On 03/24/2012 05:03 PM, Bill House wrote: >>> Hello all, >>> >>> I am very new to postgresql and python, but I have written a function to >>> retrieve some admin data from the server. >>> >>> I think this is more of a python question rather than a psycopg2 >>> question, but the context is psycopg2. >>> >>> I was wondering if this is the right place to post it and ask for a >>> critique? >> >> I would say yes, though I am not sure what the question is:)? >> >>> >>> Thanks, >>> >>> Bill >>> >> >> > Thanks, > > Sometimes it is helpful to know how many records one is working with. > > From my reading in the psycopg2 documentation (2.4.4) and > experimentation, I think that this information is only available if one > uses an unnamed cursor (in which case all the data will have been > fetched to the client) or with an named cursor (but only after all the > data has been fetched). > > If there is a large amount of data, this may be problematic. > > Am I wrong about this? If so, please let me know. See comments in line. > > Anyway, the way I approached this issue was to count the records I was > interested in and return the value. > > The code below works. > > I am interested in a critique of it from more experienced persons. Are > there more elegant or efficient ways to do what I have done? > > In particular, I am interested in: > 1) the python string substitution technique I used That works, though it also possible to use dictionaries for the Psycopg strings see here: http://initd.org/psycopg/docs/usage.html#query-parameters I find the dict form easier to use for many parameters because it is self documenting. The rest of the string substitution seems to be tied to using exec which I do not think is necessary, see below for more. > 2) the choice of the postgresql source of data I was seeking (the > record count) > 3) if there was an existing function I overlooked which does the > same thing (in psycopg2 or postgresql) > > Regarding # 2 above, I had been relying on the table: > pg_stat_user_tables and the n_live_tup value. This item has shown the > correct count for many days. Today, after updating my machines and > rebooting, the count is 0 and I am not really sure why. Records are > still there; and even though I have selected data from the database, the > count is still 0. The above are system tables/views used by the statistics mechanism of Postgres. For a starting point on this look at: http://www.postgresql.org/docs/9.0/interactive/sql-analyze.html Short version running Analyze will repopulate the tables. > > So I took the more direct but time consuming approach of just counting > the records in the table. That is the only way to get a 'true' count due to the MVCC nature of Postgres. There has been a recent optimization of that, but I cannot remember what version of Postgres has it. > > This will be better anyway because many times one will not be selecting > all the records in a table, so this function can be expanded to include > a parameter for the selection criteria so that that identical criteria > may be used again in the work portion of any operation. > > Thanks in advance for your input. > > Bill > > code (indent 2): > ------------------------------------------------------- > #!/usr/bin/python > # z_psycopg2_row_cnt7f.py > # A program to illustrate the methods of creating or supplying runtime > variables > # to functions using psycopg2 functions > > import sys > import os > > import psycopg2 > import psycopg2.extras > import psycopg2.extensions > > #for string generator > import string > import random > > con = None > > def gen_random_str(str_len): > """generates a random string str_len long - > randomly chooses and assembles a number of characters from the digits > and the upper and lower case alphabet """ > rand_str = ''.join(random.choice(string.ascii_uppercase + > string.digits + string.ascii_lowercase) for x in range(str_len)) > return rand_str > > def rec_cnt_func(conn, table_name): > """Count the records selected from the table, right now all of them""" > #Generate a random string to use as a curson name (short right now for > debugging purposes) > #The commands following will use this string > rand_curs_name = gen_random_str(3) > #Substitute data into a command string and execute > #create the cursor > exec "%s = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)" % > (rand_curs_name,) > #count the records in the table > exec "%s.execute(\"SELECT COUNT (*) FROM %s;\")" % (rand_curs_name, > table_name) Not sure why you are using exec? Why not?: cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor,name=rand_curs_name) Same for below. > #fetch the data > exec "row = %s.fetchone()" % (rand_curs_name,) > #All done now, throw the cursor away > exec "%s.close()" % (rand_curs_name,) > #return the data to the calling program > return row[0] > > > def main(): > #supply your connection string here > current_dsn = "host=xx" > > try: > con = psycopg2.connect(current_dsn) > > #supply your table name here > table_name_in = "xxxx" > > #Call the function which will retrieve the data from postgresql > rec_cnt_val = rec_cnt_func(con, table_name_in) > > #print the formatted data in an informative way > print "Records in table: {}: {:> 20,}".format(table_name_in, > rec_cnt_val) > > except psycopg2.DatabaseError, e: > print "psycopg2 reports error: {}".format(e) > sys.exit(1) > > finally: > > if con: > con.close() > return > > if __name__ == '__main__': > main() > ----------------------------------------------------- > end code > -- Adrian Klaver adrian.klaver@gmail.com
On 03/25/2012 02:12 PM, Adrian Klaver wrote: > On 03/25/2012 08:14 AM, Bill House wrote: >> On 03/24/2012 08:25 PM, Adrian Klaver wrote: >>> On 03/24/2012 05:03 PM, Bill House wrote: >>>> Hello all, >>>> >>>> I am very new to postgresql and python, but I have written a >>>> function to >>>> retrieve some admin data from the server. >>>> >>>> I think this is more of a python question rather than a psycopg2 >>>> question, but the context is psycopg2. >>>> >>>> I was wondering if this is the right place to post it and ask for a >>>> critique? >>> >>> I would say yes, though I am not sure what the question is:)? >>> >>>> >>>> Thanks, >>>> >>>> Bill >>>> >>> >>> >> Thanks, >> >> Sometimes it is helpful to know how many records one is working with. >> >> From my reading in the psycopg2 documentation (2.4.4) and >> experimentation, I think that this information is only available if one >> uses an unnamed cursor (in which case all the data will have been >> fetched to the client) or with an named cursor (but only after all the >> data has been fetched). >> >> If there is a large amount of data, this may be problematic. >> >> Am I wrong about this? If so, please let me know. > > See comments in line. > >> >> Anyway, the way I approached this issue was to count the records I was >> interested in and return the value. >> >> The code below works. >> >> I am interested in a critique of it from more experienced persons. Are >> there more elegant or efficient ways to do what I have done? >> >> In particular, I am interested in: >> 1) the python string substitution technique I used > > That works, though it also possible to use dictionaries for the > Psycopg strings see here: > > http://initd.org/psycopg/docs/usage.html#query-parameters > > I find the dict form easier to use for many parameters because it is > self documenting. > > > The rest of the string substitution seems to be tied to using exec > which I do not think is necessary, see below for more. > > >> 2) the choice of the postgresql source of data I was seeking (the >> record count) >> 3) if there was an existing function I overlooked which does the >> same thing (in psycopg2 or postgresql) >> >> Regarding # 2 above, I had been relying on the table: >> pg_stat_user_tables and the n_live_tup value. This item has shown the >> correct count for many days. Today, after updating my machines and >> rebooting, the count is 0 and I am not really sure why. Records are >> still there; and even though I have selected data from the database, the >> count is still 0. > > The above are system tables/views used by the statistics mechanism of > Postgres. For a starting point on this look at: > > http://www.postgresql.org/docs/9.0/interactive/sql-analyze.html > > Short version running Analyze will repopulate the tables. > >> >> So I took the more direct but time consuming approach of just counting >> the records in the table. > > That is the only way to get a 'true' count due to the MVCC nature of > Postgres. There has been a recent optimization of that, but I cannot > remember what version of Postgres has it. > >> >> This will be better anyway because many times one will not be selecting >> all the records in a table, so this function can be expanded to include >> a parameter for the selection criteria so that that identical criteria >> may be used again in the work portion of any operation. >> >> Thanks in advance for your input. >> >> Bill >> >> code (indent 2): >> ------------------------------------------------------- >> #!/usr/bin/python >> # z_psycopg2_row_cnt7f.py >> # A program to illustrate the methods of creating or supplying runtime >> variables >> # to functions using psycopg2 functions >> >> import sys >> import os >> >> import psycopg2 >> import psycopg2.extras >> import psycopg2.extensions >> >> #for string generator >> import string >> import random >> >> con = None >> >> def gen_random_str(str_len): >> """generates a random string str_len long - >> randomly chooses and assembles a number of characters from the digits >> and the upper and lower case alphabet """ >> rand_str = ''.join(random.choice(string.ascii_uppercase + >> string.digits + string.ascii_lowercase) for x in range(str_len)) >> return rand_str >> >> def rec_cnt_func(conn, table_name): >> """Count the records selected from the table, right now all of >> them""" >> #Generate a random string to use as a curson name (short right now >> for >> debugging purposes) >> #The commands following will use this string >> rand_curs_name = gen_random_str(3) >> #Substitute data into a command string and execute >> #create the cursor >> exec "%s = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)" % >> (rand_curs_name,) >> #count the records in the table >> exec "%s.execute(\"SELECT COUNT (*) FROM %s;\")" % (rand_curs_name, >> table_name) > > Not sure why you are using exec? Why not?: > > cur = > conn.cursor(cursor_factory=psycopg2.extras.DictCursor,name=rand_curs_name) > > Same for below. > >> #fetch the data >> exec "row = %s.fetchone()" % (rand_curs_name,) >> #All done now, throw the cursor away >> exec "%s.close()" % (rand_curs_name,) >> #return the data to the calling program >> return row[0] >> >> >> def main(): >> #supply your connection string here >> current_dsn = "host=xx" >> >> try: >> con = psycopg2.connect(current_dsn) >> >> #supply your table name here >> table_name_in = "xxxx" >> >> #Call the function which will retrieve the data from postgresql >> rec_cnt_val = rec_cnt_func(con, table_name_in) >> >> #print the formatted data in an informative way >> print "Records in table: {}: {:> 20,}".format(table_name_in, >> rec_cnt_val) >> >> except psycopg2.DatabaseError, e: >> print "psycopg2 reports error: {}".format(e) >> sys.exit(1) >> >> finally: >> >> if con: >> con.close() >> return >> >> if __name__ == '__main__': >> main() >> ----------------------------------------------------- >> end code >> > > Thanks, I will do the reading you have suggested. Regarding your question about why the use of exec: I read where psycopg2 has quite a bit of capability for parameter substitution, but that functionality is limited to just psycopg2. I grew to rely on runtime construction, substitution and execution of commands in another environment and I was trying to mimic that behavior in a more general way in python. This way works, I was just wondering if there was a better way. Regarding the statistics, I have since learned that the command: select reltuples from pg_class where relname = 'your_file_name'; Will give a record count. I don't know how well it will keep up in a dynamic environment but it's a start. Thanks, Bill
On 03/25/2012 06:00 PM, Bill House wrote: > >> > Thanks, I will do the reading you have suggested. > > Regarding your question about why the use of exec: I read where psycopg2 > has quite a bit of capability for parameter substitution, but that > functionality is limited to just psycopg2. The benefit of using the psycopg2 parameter substitution is it takes care of escaping strings and avoiding SQL injection. See here for humorous take on that: http://xkcd.com/327/ > > I grew to rely on runtime construction, substitution and execution of > commands in another environment and I was trying to mimic that behavior > in a more general way in python. This way works, I was just wondering > if there was a better way. I have been told over the years that using exec is not a good idea. For more detail: http://books.google.com/books?id=JnR9hQA3SncC&pg=PA328&lpg=PA328&dq=python++exec+bad&source=bl&ots=Jb4TLr-03u&sig=_Jz5cTdPZq1PeU1vajQgaDIgtgo&hl=en&sa=X&ei=GslvT-q1FuzYiAKI85zDBQ&ved=0CGMQ6AEwCA Python has a variety of ways to work with strings. I found the following a good introduction to Python: http://shop.oreilly.com/product/9780596158071.do > > Regarding the statistics, I have since learned that the command: > > select reltuples from pg_class where relname = 'your_file_name'; > > Will give a record count. I don't know how well it will keep up in a > dynamic environment but it's a start. If you have not already found it, look at: http://wiki.postgresql.org/wiki/Slow_Counting Not sure what your use case for the count is but remember when using named cursors: " Named cursors are also iterable like regular cursors. Note however that before Psycopg 2.4 iteration was performed fetching one record at time from the backend, resulting in a large overhead. The attribute itersize now controls how many records are fetched at time during the iteration: the default value of 2000 allows to fetch about 100KB per roundtrip assuming records of 10-20 columns of mixed number and strings; you may decrease this value if you are dealing with huge records." > > Thanks, > > Bill > > > > -- Adrian Klaver adrian.klaver@gmail.com
On 03/25/2012 09:00 PM, Bill House wrote: > Regarding the statistics, I have since learned that the command: > > select reltuples from pg_class where relname = 'your_file_name'; > > Will give a record count. I don't know how well it will keep up in a > dynamic environment but it's a start. reltuples "is only an estimate used by the planner." See http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html So if you want a more or less accurate count of rows at that given point in time, then you should not depend on it. As for the optimization of SELECT count(*) which Adrian had mentioned, it's a 9.2 feature (as a result of index-only scans). See https://wiki.postgresql.org/wiki/Todo under Cache Usage. Joe
On 25/03/12 17:14, Bill House wrote: [snip] > Sometimes it is helpful to know how many records one is working with. > > From my reading in the psycopg2 documentation (2.4.4) and > experimentation, I think that this information is only available if one > uses an unnamed cursor (in which case all the data will have been > fetched to the client) or with an named cursor (but only after all the > data has been fetched). > > If there is a large amount of data, this may be problematic. Not at all. You just need to MOVE to the end of the result set and count how many rows you skipped. At the moment psycopg doesn't use MOVE return value (that tells you what you want to know) but can just .execute() the MOVE and access the result on the cursor. Also, we could probably extend scroll() to return the number of scrolled rows in the future. > In particular, I am interested in: > 1) the python string substitution technique I used Use bound parameters: http://packages.python.org/psycopg2/usage.html#query-parameters federico
Thanks for all the help from everyone; I have a lot of studying to do. Regards Bill On 03/26/2012 02:52 AM, Federico Di Gregorio wrote: > On 25/03/12 17:14, Bill House wrote: > [snip] >> Sometimes it is helpful to know how many records one is working with. >> >> From my reading in the psycopg2 documentation (2.4.4) and >> experimentation, I think that this information is only available if one >> uses an unnamed cursor (in which case all the data will have been >> fetched to the client) or with an named cursor (but only after all the >> data has been fetched). >> >> If there is a large amount of data, this may be problematic. > > Not at all. You just need to MOVE to the end of the result set and > count how many rows you skipped. At the moment psycopg doesn't use > MOVE return value (that tells you what you want to know) but can just > .execute() the MOVE and access the result on the cursor. Also, we > could probably extend scroll() to return the number of scrolled rows > in the future. > >> In particular, I am interested in: >> 1) the python string substitution technique I used > > Use bound parameters: > > http://packages.python.org/psycopg2/usage.html#query-parameters > > federico >