Re: Connection function - Mailing list psycopg

From Bill House
Subject Re: Connection function
Date
Msg-id 4F6FBFBD.9060408@house-grp.net
Whole thread Raw
In response to Re: Connection function  (Adrian Klaver <adrian.klaver@gmail.com>)
Responses Re: Connection function  (Adrian Klaver <adrian.klaver@gmail.com>)
Re: Connection function  (Joe Abbate <jma@freedomcircle.com>)
List psycopg
On 03/25/2012 02:12 PM, Adrian Klaver wrote:
> On 03/25/2012 08:14 AM, Bill House wrote:
>> On 03/24/2012 08:25 PM, Adrian Klaver wrote:
>>> On 03/24/2012 05:03 PM, Bill House wrote:
>>>> Hello all,
>>>>
>>>> I am very new to postgresql and python, but I have written a
>>>> function to
>>>> retrieve some admin data from the server.
>>>>
>>>> I think this is more of a python question rather than a psycopg2
>>>> question, but the context is psycopg2.
>>>>
>>>> I was wondering if this is the right place to post it and ask for a
>>>> critique?
>>>
>>> I would say yes, though I am not sure what the question is:)?
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Bill
>>>>
>>>
>>>
>> Thanks,
>>
>> Sometimes it is helpful to know how many records one is working with.
>>
>>  From my reading in the psycopg2 documentation (2.4.4) and
>> experimentation, I think that this information is only available if one
>> uses an unnamed cursor (in which case all the data will have been
>> fetched to the client) or with an named cursor (but only after all the
>> data has been fetched).
>>
>> If there is a large amount of data, this may be problematic.
>>
>> Am I wrong about this?  If so, please let me know.
>
> See comments in line.
>
>>
>> Anyway, the way I approached this issue was to count the records I was
>> interested in and return the value.
>>
>> The code below works.
>>
>> I am interested in a critique of it from more experienced persons.  Are
>> there more elegant or efficient ways to do what I have done?
>>
>> In particular, I am interested in:
>>      1) the python string substitution technique I used
>
> That works, though it also possible to use dictionaries for the
> Psycopg strings see here:
>
> http://initd.org/psycopg/docs/usage.html#query-parameters
>
> I find the dict form easier to use for many parameters because it is
> self documenting.
>
>
> The rest of the string substitution seems to be tied to using exec
> which I do not think is necessary, see below for more.
>
>
>>      2) the choice of the postgresql source of data I was seeking (the
>> record count)
>>      3) if there was an existing function I overlooked which does the
>> same thing (in psycopg2 or postgresql)
>>
>> Regarding # 2 above, I had been relying on the table:
>> pg_stat_user_tables and the n_live_tup value.  This item has shown the
>> correct count for many days.  Today, after updating my machines and
>> rebooting, the count is 0 and I am not really sure why.  Records are
>> still there; and even though I have selected data from the database, the
>> count is still 0.
>
> The above are system tables/views used by the statistics mechanism of
> Postgres. For a starting point on this look at:
>
> http://www.postgresql.org/docs/9.0/interactive/sql-analyze.html
>
> Short version running Analyze will repopulate the tables.
>
>>
>> So I took the more direct but time consuming approach of just counting
>> the records in the table.
>
> That is the only way to get a 'true' count due to the MVCC nature of
> Postgres. There has been a recent optimization of that, but I cannot
> remember what version of Postgres has it.
>
>>
>> This will be better anyway because many times one will not be selecting
>> all the records in a table, so this function can be expanded to include
>> a parameter for the selection criteria so that that identical criteria
>> may be used again in the work portion of any operation.
>>
>> Thanks in advance for your input.
>>
>> Bill
>>
>> code (indent 2):
>> -------------------------------------------------------
>> #!/usr/bin/python
>> # z_psycopg2_row_cnt7f.py
>> # A program to illustrate the methods of creating or supplying runtime
>> variables
>> # to functions using psycopg2 functions
>>
>> import sys
>> import os
>>
>> import psycopg2
>> import psycopg2.extras
>> import psycopg2.extensions
>>
>> #for string generator
>> import string
>> import random
>>
>> con = None
>>
>> def gen_random_str(str_len):
>>    """generates a random string str_len long -
>>    randomly chooses and assembles a number of characters from the digits
>> and the upper and lower case alphabet """
>>    rand_str = ''.join(random.choice(string.ascii_uppercase +
>> string.digits + string.ascii_lowercase) for x in range(str_len))
>>    return rand_str
>>
>> def rec_cnt_func(conn, table_name):
>>    """Count the records selected from the table, right now all of
>> them"""
>>    #Generate a random string to use as a curson name (short right now
>> for
>> debugging purposes)
>>    #The commands following will use this string
>>    rand_curs_name = gen_random_str(3)
>>    #Substitute data into a command string and execute
>>    #create the cursor
>>    exec "%s = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)" %
>> (rand_curs_name,)
>>    #count the records in the table
>>    exec "%s.execute(\"SELECT COUNT (*) FROM %s;\")" % (rand_curs_name,
>> table_name)
>
> Not sure why you are using exec? Why not?:
>
> cur =
> conn.cursor(cursor_factory=psycopg2.extras.DictCursor,name=rand_curs_name)
>
> Same for below.
>
>>    #fetch the data
>>    exec "row = %s.fetchone()" % (rand_curs_name,)
>>    #All done now, throw the cursor away
>>    exec "%s.close()" % (rand_curs_name,)
>>    #return the data to the calling program
>>    return row[0]
>>
>>
>> def main():
>>    #supply your connection string here
>>    current_dsn = "host=xx"
>>
>>    try:
>>      con = psycopg2.connect(current_dsn)
>>
>>      #supply your table name here
>>      table_name_in = "xxxx"
>>
>>      #Call the function which will retrieve the data from postgresql
>>      rec_cnt_val = rec_cnt_func(con, table_name_in)
>>
>>      #print the formatted data in an informative way
>>      print "Records in table: {}: {:>  20,}".format(table_name_in,
>> rec_cnt_val)
>>
>>    except psycopg2.DatabaseError, e:
>>      print "psycopg2 reports error: {}".format(e)
>>      sys.exit(1)
>>
>>    finally:
>>
>>      if con:
>>        con.close()
>>      return
>>
>> if __name__ == '__main__':
>>    main()
>> -----------------------------------------------------
>> end code
>>
>
>
Thanks, I will do the reading you have suggested.

Regarding your question about why the use of exec: I read where psycopg2
has quite a bit of capability for parameter substitution, but that
functionality is limited to just psycopg2.

I grew to rely on runtime construction, substitution and execution of
commands in another environment and I was trying to mimic that behavior
in a more general way in python.  This way works, I was just wondering
if there was a better way.

Regarding the statistics, I have since learned that the command:

    select reltuples from pg_class where relname = 'your_file_name';

Will give a record count.  I don't know how well it will keep up in a
dynamic environment but it's a start.

Thanks,

Bill




psycopg by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Connection function
Next
From: Adrian Klaver
Date:
Subject: Re: Connection function