Thread: Metadata performance

Metadata performance

From
"ldh@laurent-hasson.com"
Date:

Hello,

 

I am writing code that gets the metadata for a database. I use DatabaseMetaData to get tables, their columns, indices, PKs and FKs. I have instrumented my code and get the following metrics:

 

Schemas:    19 in    26ms or  0.3%

Tables :   280 in     3ms or  0.0%

Columns: 5,638 in   922ms or 10.8%

PK     :   280 in   630ms or  7.4%

FK-Out :   351 in 3,049ms or 35.8%

FK-In  :   353 in 2,735ms or 32.1%

Indices:   768 in 1,141ms or 13.4%

------------------------------------

Total: 8,509ms

 

I am on Postgres 12.3 with driver 42.2.18.

 

My questions are:

  • Are the methods DatabaseMetaData.getImportedKeys and getExportedKeys expected to take so long?
  • If so, is there another method to get that information that would perform much better?
  • DatabaseMetaData.getIndexInfo also feels like it should perform better

 

This utility I am working on is run often during development time. It takes a total of 9s and change, and out of that, 8.5s is spent just on fetching the meta-data 😊

 

I have tried multithreading this by distributing table meta-data fetching across multiple connections (I know the driver would block if multiple threads were to use a single connection), and to my surprise, seems that the driver also serializes all the requests across multiple connections: the meta-data APIs took the same amount of time over a single connection, or distributed over 4.

 

Any help or pointers are much appreciated.

 

Thank you,

Laurent.

Re: Metadata performance

From
Dave Cramer
Date:


On Wed, 2 Dec 2020 at 17:53, ldh@laurent-hasson.com <ldh@laurent-hasson.com> wrote:

Hello,

 

I am writing code that gets the metadata for a database. I use DatabaseMetaData to get tables, their columns, indices, PKs and FKs. I have instrumented my code and get the following metrics:

 

Schemas:    19 in    26ms or  0.3%

Tables :   280 in     3ms or  0.0%

Columns: 5,638 in   922ms or 10.8%

PK     :   280 in   630ms or  7.4%

FK-Out :   351 in 3,049ms or 35.8%

FK-In  :   353 in 2,735ms or 32.1%

Indices:   768 in 1,141ms or 13.4%

------------------------------------

Total: 8,509ms

 

I am on Postgres 12.3 with driver 42.2.18.

 

My questions are:

  • Are the methods DatabaseMetaData.getImportedKeys and getExportedKeys expected to take so long?
  • If so, is there another method to get that information that would perform much better?
  • DatabaseMetaData.getIndexInfo also feels like it should perform better

 

It is not fast. I presume you have a lot of tables and columns.
 

This utility I am working on is run often during development time. It takes a total of 9s and change, and out of that, 8.5s is spent just on fetching the meta-data 😊

 

I have tried multithreading this by distributing table meta-data fetching across multiple connections (I know the driver would block if multiple threads were to use a single connection), and to my surprise, seems that the driver also serializes all the requests across multiple connections: the meta-data APIs took the same amount of time over a single connection, or distributed over 4.

I find that hard to believe. 

Care to share your schema ?

Dave

 

Any help or pointers are much appreciated.

 

Thank you,

Laurent.

AW: Metadata performance

From
Daniel Migowski
Date:

Hello,

 

I don’t know i it helps but PostgreSQL has some ugly slow information_schema views that fetch from the tables in schema pg_catalog. I created some views from there directly which were mostly instant afterwards. Maybe you can refactor your code to query that catalog directly instead of using the Metadata classes in the driver (or maybe you can improve the driver).

 

Kind regards,

Daniel Migowski

 

 

 

 

Von: ldh@laurent-hasson.com <ldh@laurent-hasson.com>
Gesendet: Mittwoch, 2. Dezember 2020 23:53
An: pgsql-jdbc@lists.postgresql.org
Betreff: Metadata performance

 

Hello,

 

I am writing code that gets the metadata for a database. I use DatabaseMetaData to get tables, their columns, indices, PKs and FKs. I have instrumented my code and get the following metrics:

 

Schemas:    19 in    26ms or  0.3%

Tables :   280 in     3ms or  0.0%

Columns: 5,638 in   922ms or 10.8%

PK     :   280 in   630ms or  7.4%

FK-Out :   351 in 3,049ms or 35.8%

FK-In  :   353 in 2,735ms or 32.1%

Indices:   768 in 1,141ms or 13.4%

------------------------------------

Total: 8,509ms

 

I am on Postgres 12.3 with driver 42.2.18.

 

My questions are:

  • Are the methods DatabaseMetaData.getImportedKeys and getExportedKeys expected to take so long?
  • If so, is there another method to get that information that would perform much better?
  • DatabaseMetaData.getIndexInfo also feels like it should perform better

 

This utility I am working on is run often during development time. It takes a total of 9s and change, and out of that, 8.5s is spent just on fetching the meta-data 😊

 

I have tried multithreading this by distributing table meta-data fetching across multiple connections (I know the driver would block if multiple threads were to use a single connection), and to my surprise, seems that the driver also serializes all the requests across multiple connections: the meta-data APIs took the same amount of time over a single connection, or distributed over 4.

 

Any help or pointers are much appreciated.

 

Thank you,

Laurent.

Re: Metadata performance

From
Dave Cramer
Date:


On Thu, 3 Dec 2020 at 05:36, Daniel Migowski <dmigowski@ikoffice.de> wrote:

Hello,

 

I don’t know i it helps but PostgreSQL has some ugly slow information_schema views that fetch from the tables in schema pg_catalog. I created some views from there directly which were mostly instant afterwards. Maybe you can refactor your code to query that catalog directly instead of using the Metadata classes in the driver (or maybe you can improve the driver).

 

Kind regards,

Daniel Migowski


I'm confused . You somehow made the slow information_schema views faster ?


Dave Cramer
www.postgres.rocks 

 

 

 

 

Von: ldh@laurent-hasson.com <ldh@laurent-hasson.com>
Gesendet: Mittwoch, 2. Dezember 2020 23:53
An: pgsql-jdbc@lists.postgresql.org
Betreff: Metadata performance

 

Hello,

 

I am writing code that gets the metadata for a database. I use DatabaseMetaData to get tables, their columns, indices, PKs and FKs. I have instrumented my code and get the following metrics:

 

Schemas:    19 in    26ms or  0.3%

Tables :   280 in     3ms or  0.0%

Columns: 5,638 in   922ms or 10.8%

PK     :   280 in   630ms or  7.4%

FK-Out :   351 in 3,049ms or 35.8%

FK-In  :   353 in 2,735ms or 32.1%

Indices:   768 in 1,141ms or 13.4%

------------------------------------

Total: 8,509ms

 

I am on Postgres 12.3 with driver 42.2.18.

 

My questions are:

  • Are the methods DatabaseMetaData.getImportedKeys and getExportedKeys expected to take so long?
  • If so, is there another method to get that information that would perform much better?
  • DatabaseMetaData.getIndexInfo also feels like it should perform better

 

This utility I am working on is run often during development time. It takes a total of 9s and change, and out of that, 8.5s is spent just on fetching the meta-data 😊

 

I have tried multithreading this by distributing table meta-data fetching across multiple connections (I know the driver would block if multiple threads were to use a single connection), and to my surprise, seems that the driver also serializes all the requests across multiple connections: the meta-data APIs took the same amount of time over a single connection, or distributed over 4.

 

Any help or pointers are much appreciated.

 

Thank you,

Laurent.

RE: Metadata performance

From
"ldh@laurent-hasson.com"
Date:

I have looked at going directly against the tables, but trying to stay within the confines of JDBC proper. I could create a class system with plugins on a per-database level so this behavior could be overridden with custom logic, but seemed like overkill. I may have no choice.

 

As for the JDBC driver, what do you mean improve it? I am not familiar with the innards of it so I suspect this is not an easy task.

 

 

From: Daniel Migowski <dmigowski@ikoffice.de>
Sent: Thursday, December 3, 2020 05:37
To: ldh@laurent-hasson.com; pgsql-jdbc@lists.postgresql.org
Subject: AW: Metadata performance

 

Hello,

 

I don’t know i it helps but PostgreSQL has some ugly slow information_schema views that fetch from the tables in schema pg_catalog. I created some views from there directly which were mostly instant afterwards. Maybe you can refactor your code to query that catalog directly instead of using the Metadata classes in the driver (or maybe you can improve the driver).

 

Kind regards,

Daniel Migowski

 

 

 

 

Von: ldh@laurent-hasson.com <ldh@laurent-hasson.com>
Gesendet: Mittwoch, 2. Dezember 2020 23:53
An: pgsql-jdbc@lists.postgresql.org
Betreff: Metadata performance

 

Hello,

 

I am writing code that gets the metadata for a database. I use DatabaseMetaData to get tables, their columns, indices, PKs and FKs. I have instrumented my code and get the following metrics:

 

Schemas:    19 in    26ms or  0.3%

Tables :   280 in     3ms or  0.0%

Columns: 5,638 in   922ms or 10.8%

PK     :   280 in   630ms or  7.4%

FK-Out :   351 in 3,049ms or 35.8%

FK-In  :   353 in 2,735ms or 32.1%

Indices:   768 in 1,141ms or 13.4%

------------------------------------

Total: 8,509ms

 

I am on Postgres 12.3 with driver 42.2.18.

 

My questions are:

-          Are the methods DatabaseMetaData.getImportedKeys and getExportedKeys expected to take so long?

-          If so, is there another method to get that information that would perform much better?

-          DatabaseMetaData.getIndexInfo also feels like it should perform better

 

This utility I am working on is run often during development time. It takes a total of 9s and change, and out of that, 8.5s is spent just on fetching the meta-data 😊

 

I have tried multithreading this by distributing table meta-data fetching across multiple connections (I know the driver would block if multiple threads were to use a single connection), and to my surprise, seems that the driver also serializes all the requests across multiple connections: the meta-data APIs took the same amount of time over a single connection, or distributed over 4.

 

Any help or pointers are much appreciated.

 

Thank you,

Laurent.