Thread: BUG #18527: Imported data using a csv file and made a latest entry into same table using a form. Got an issue!

The following bug has been logged on the website:

Bug reference:      18527
Logged by:          Vishnu Vardhan Vithanala
Email address:      vishnu.vardhann05@gmail.com
PostgreSQL version: 16.3
Operating system:   Windows
Description:

When I bulk uploaded some data using a csv file, it was all good. There were
543 rows in order and it went into the table and sat cleanly!

I've connected the same table to an input form. When I hit submit, the
latest entry which is supposed to be the last entry with a number id of 544
didn't go to the end of the table (bottom most) but instead sat on top of
all previous bulk uploaded entries.

Each entry has this kind of standard ID: DCMARIN10042024001
Subsequent entries will have the same ID with last 3 digits incrementing by
1.
DCMARIN10042024002
DCMARIN10042024003
...
...
DCMARIN10042024543

These ID's are generated using the below python code (flask):
def generate_unique_id(dept, sub_dept, table):
    ProjectCode = ""

    # Automatic Start Date capture - Production
    #start_date = date.today()

    # Start Date - Testing
    start_date = datetime.strptime('01-08-2024', '%d-%m-%Y').date()

    # To check if there is at least 1 entry in the DB or not
    first_entry = table.query.first()
    lastRow = str(table.query.order_by(table.id.desc()).first().id) if
first_entry else None

    # Format Date in lastRow from String to Date Object
    prev_date = datetime.strptime(lastRow[7:15], "%d%m%Y").date() if lastRow
is not None else None

    # If today > latest date that is already existing in DB - For first
entry of today
    if first_entry is None or start_date > prev_date:
        ProjectCode =
f"{dept}{sub_dept}{start_date.strftime('%d%m%Y')}001"
    
    # Subsequent entries for the same date
    elif start_date == prev_date:
        ProjectCode =
f"{dept}{sub_dept}{start_date.strftime('%d%m%Y')}{str(int(lastRow[15:18]) +
1).zfill(3)}"
    
    return ProjectCode

The difference in ordering of entries between importing data from csv and
manually creating a new entry into the table is causing a hindrance in
generating proper project codes. When ordering the rows based on
project_code in either ascending or descending order, the latest entry which
I made using the form is not filtering properly. In short, the order is not
being applied to it! This is causing the generation of wrong & repeated
project code and dumping into a column that acts as primary key and which by
default is supposed to be unique! It's throwing an error that the key
already exists!

This issue based on my observation is due to the variation of data entry
into the table - between bulk upload and manual upload (using a form). When
I dumped the same data by using some python code based on pandas library and
running it through the csv file and using flask to create and save entries,
the issue seems to be resolved!

I'm not sure if this is a bug or not but it seemed like a problem that is
worth solving!


On Mon, Jul 1, 2024 at 8:32 AM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      18527
Logged by:          Vishnu Vardhan Vithanala
Email address:      vishnu.vardhann05@gmail.com
PostgreSQL version: 16.3
Operating system:   Windows
Description:       

When I bulk uploaded some data using a csv file, it was all good. There were
543 rows in order and it went into the table and sat cleanly!

There is no such thing as "in order" when it comes to the contents of a table.

The rest of what you posted I cannot follow but unless you can distill this down to a test case whereby you only use psql and server features it is not going to be recognized as a bug in PostgreSQL.

If you need help writing programs using PostgreSQL the -general mailing list is where to seek it.  Right now it seems you have made incorrect assumptions and built on top of those.  That will require some work on your end to redesign things to work in a manner consistent with the non-ordering of tables in PostgreSQL.  Usually just adding an order by clause to your queries solves the issue.

David J.

On 2024-07-01 17:08 +0200, PG Bug reporting form wrote:
> When I bulk uploaded some data using a csv file, it was all good. There were
> 543 rows in order and it went into the table and sat cleanly!
> 
> I've connected the same table to an input form. When I hit submit, the
> latest entry which is supposed to be the last entry with a number id of 544
> didn't go to the end of the table (bottom most) but instead sat on top of
> all previous bulk uploaded entries.
> 
> Each entry has this kind of standard ID: DCMARIN10042024001
> Subsequent entries will have the same ID with last 3 digits incrementing by
> 1.
> DCMARIN10042024002
> DCMARIN10042024003
> ...
> ...
> DCMARIN10042024543
> 
> These ID's are generated using the below python code (flask):
> def generate_unique_id(dept, sub_dept, table):
>     ProjectCode = ""
> 
>     # Automatic Start Date capture - Production
>     #start_date = date.today()
> 
>     # Start Date - Testing
>     start_date = datetime.strptime('01-08-2024', '%d-%m-%Y').date()
> 
>     # To check if there is at least 1 entry in the DB or not
>     first_entry = table.query.first()
>     lastRow = str(table.query.order_by(table.id.desc()).first().id) if
> first_entry else None

Do you expect that lastRow is the one with the latest date encoded in
its ID?  If so, then date format %d%m%Y won't give you the expected
chronological ordering.  Use %Y%m%d (big-endian) instead.

>     # Format Date in lastRow from String to Date Object
>     prev_date = datetime.strptime(lastRow[7:15], "%d%m%Y").date() if lastRow
> is not None else None
> 
>     # If today > latest date that is already existing in DB - For first
> entry of today
>     if first_entry is None or start_date > prev_date:
>         ProjectCode =
> f"{dept}{sub_dept}{start_date.strftime('%d%m%Y')}001"
>     
>     # Subsequent entries for the same date
>     elif start_date == prev_date:
>         ProjectCode =
> f"{dept}{sub_dept}{start_date.strftime('%d%m%Y')}{str(int(lastRow[15:18]) +
> 1).zfill(3)}"
>     

This will generate duplicate IDs if you generate more than 1000 IDs per
day because this uses some weird modular arithmetic with
str(int(lastRow[15:18])+1).zfill(3), which will eventually generate 998,
999, 1000, 101, 102, ...

The row with number 1000 won't necessarily be the "last row", even with
date format %Y%m%d.  For example, 101 will come before 1000:

    test=> select unnest('{DCMARIN202404101000,DCMARIN20240410101}'::text[]) order by 1 desc;
           unnest
    ---------------------
     DCMARIN20240410101
     DCMARIN202404101000
    (2 rows)

>     return ProjectCode

-- 
Erik