DatabaseManager¶
DatabaseManager
is the base class providing the methods to interact with databases.
- class pandemy.DatabaseManager(container=None, engine_config=None, **kwargs)[source]¶
Bases:
abc.ABC
Base class with functionality for managing a database.
Each database type will subclass from
DatabaseManager
and implement the initializer which is specific to each database type.DatabaseManager
is never used on its own, but merely provides the methods to interact with the database to its subclasses.Initialization of a
DatabaseManager
creates the connection string (conn_str) and the database engine, which are used to connect to and interact with the database. These are available as attributes on the instance. The initializer can contain any number of parameters needed to connect to the database and should always support container, engine_config and **kwargs.- Parameters
container (SQLContainer or None, default None) – A container of database statements that can be used by the
DatabaseManager
.engine_config (dict or None) – Additional keyword arguments passed to the
sqlalchemy.create_engine()
function.**kwargs (dict) – Additional keyword arguments that are not used by the
DatabaseManager
.
- Attributes
conn_str (str) – The connection string for connecting to the database.
engine (
sqlalchemy.engine.Engine
) – The engine for interacting with the database.
- delete_all_records_from_table(table, conn)[source]¶
Delete all records from the specified table.
- Parameters
table (str) – The table to delete all records from.
conn (sqlalchemy.engine.base.Connection) – An open connection to the database.
- Raises
pandemy.InvalidTableNameError – If the supplied table name is invalid.
pandemy.DeleteFromTableError – If data cannot be deleted from the table.
- execute(sql, conn, params=None)[source]¶
Execute a SQL statement.
- Parameters
sql (str or sqlalchemy.sql.elements.TextClause) – The SQL statement to execute. A string value is automatically converted to a
sqlalchemy.sql.elements.TextClause
with thesqlalchemy.sql.expression.text()
function.conn (sqlalchemy.engine.base.Connection) – An open connection to the database.
params (dict or list of dict or None, default None) –
Parameters to bind to the SQL statement sql. Parameters in the SQL statement should be prefixed by a colon (:) e.g.
:myparameter
. Parameters in params should not contain the colon ({'myparameter': 'myvalue'}
).Supply a list of parameter dictionaries to execute multiple parametrized statements in the same method call, e.g.
[{'parameter1': 'a string'}, {'parameter2': 100}]
. This is useful for INSERT, UPDATE and DELETE statements.
- Returns
A result object from the executed statement.
- Return type
- Raises
pandemy.InvalidInputError – If sql is not of type str or sqlalchemy.sql.elements.TextClause.
pandemy.ExecuteStatementError – If an error occurs when executing the statement.
See also
sqlalchemy.engine.Connection.execute()
: The method used for executing the SQL statement.sqlalchemy.engine.CursorResult
: The return type from the method.
Examples
To process the result from the method the database connection must remain open after the method is executed i.e. the context manager cannot be closed before processing the result:
import pandemy db = SQLiteDb(file='mydb.db') with db.engine.connect() as conn: result = db.execute('SELECT * FROM MyTable;', conn=conn) for row in result: print(row) # process the result ...
- load_table(sql, conn, params=None, index_col=None, columns=None, parse_dates=None, localize_tz=None, target_tz=None, dtypes=None, chunksize=None, coerce_float=True)[source]¶
Load a SQL table into a
pandas.DataFrame
.Specify a table name or a SQL query to load the
pandas.DataFrame
from. Usespandas.read_sql()
function to read from the database.- Parameters
sql (str or sqlalchemy.sql.elements.TextClause) – The table name or SQL query.
conn (sqlalchemy.engine.base.Connection) – An open connection to the database to use for the query.
params (dict of str or None, default None) – Parameters to bind to the SQL query sql. Parameters in the SQL query should be prefixed by a colon (:) e.g.
:myparameter
. Parameters in params should not contain the colon ({'myparameter': 'myvalue'}
).index_col (str or sequence of str or None, default None) – The column(s) to set as the index of the
pandas.DataFrame
.columns (list of str or None, default None) – List of column names to select from the SQL table (only used when sql is a table name).
parse_dates (list or dict or None, default None) –
List of column names to parse as dates.
Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.
Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of
pandas.to_datetime()
. Especially useful with databases without native datetime support, such as SQLite.
localize_tz (str or None, default None) – Localize naive datetime columns of the returned
pandas.DataFrame
to specified timezone. If None no localization is performed.target_tz (str or None, default None) – The timezone to convert the datetime columns of the returned
pandas.DataFrame
into after they have been localized. If None no conversion is performed.dtypes (dict or None, default None) – Desired data types for specified columns {‘column_name’: data type}. Use pandas or numpy data types or string names of those. If None no data type conversion is performed.
chunksize (int or None, default None) – If chunksize is specified an iterator of DataFrames will be returned where chunksize is the number of rows in each
pandas.DataFrame
. If chunksize is supplied timezone localization and conversion as well as dtype conversion cannot be performed i.e. localize_tz, target_tz and dtypes have no effect.coerce_float (bool, default True) – Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.
- Returns
df –
pandas.DataFrame
with the result of the query or an iterator of DataFrames if chunksize is specified.- Return type
pandas.DataFrame or Iterator[pandas.DataFrame]
- Raises
pandemy.LoadTableError – If errors when loading the table using
pandas.read_sql()
.pandemy.SetIndexError – If setting the index of the returned
pandas.DataFrame
fails when index_col is specified and chunksize is None.pandemy.DataTypeConversionError – If errors when converting data types using the dtypes parameter.
See also
pandas.read_sql()
: Read SQL query or database table into apandas.DataFrame
.pandas.to_datetime()
: The function used for datetime conversion with parse_dates.
Examples
When specifying the chunksize parameter the database connection must remain open to be able to process the DataFrames from the iterator. The processing must occur within the context manager:
import pandemy db = pandemy.SQLiteDb(file='mydb.db') with db.engine.connect() as conn: df_gen = db.load_table(sql='MyTableName', conn=conn, chunksize=3) for df in df_gen: print(df) # Process your DataFrames ...
- manage_foreign_keys(conn, action)[source]¶
Manage how the database handles foreign key constraints.
Should be implemented by DatabaseManagers whose SQL dialect supports enabling/disabling checking foreign key constraints. E.g. SQLite.
- Parameters
conn (sqlalchemy.engine.base.Connection) – An open connection to the database.
action (str) – How to handle foreign key constraints in the database.
- Raises
pandemy.InvalidInputError – If invalid input is supplied to action.
pandemy.ExecuteStatementError – If changing the handling of foreign key constraint fails.
- save_df(df, table, conn, if_exists='append', index=True, index_label=None, chunksize=None, schema=None, dtype=None, method=None)[source]¶
Save the
pandas.DataFrame
df to specified table in the database.If the table does not exist it will be created. If the table already exists the column names of the
pandas.DataFrame
df must match the table column definitions. Usespandas.DataFrame.to_sql()
method to write thepandas.DataFrame
to the database.- Parameters
df (pandas.DataFrame) – The DataFrame to save to the database.
table (str) – The name of the table where to save the
pandas.DataFrame
.conn (sqlalchemy.engine.base.Connection) – An open connection to the database.
if_exists (str, {'append', 'replace', 'fail'}) –
How to update an existing table in the database:
’append’: Append the
pandas.DataFrame
to the existing table.’replace’: Delete all records from the table and then write the
pandas.DataFrame
to the table.’fail’: Raise
pandemy.TableExistsError
if the table exists.
index (bool, default True) – Write
pandas.DataFrame
index as a column. Uses the name of the index as the column name for the table.index_label (str or sequence of str or None, default None) – Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the
pandas.DataFrame
uses apandas.MultiIndex
.chunksize (int or None, default None) – The number of rows in each batch to be written at a time. If None, all rows will be written at once.
schema (str, None, default None) – Specify the schema (if database flavor supports this). If None, use default schema.
dtype (dict or scalar, default None) – Specifying the data type for columns. If a dictionary is used, the keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode. If a scalar is provided, it will be applied to all columns.
method (None, 'multi', callable, default None) –
Controls the SQL insertion clause used:
- None:
Uses standard SQL INSERT clause (one per row).
- ’multi’:
Pass multiple values in a single INSERT clause. It uses a special SQL syntax not supported by all backends. This usually provides better performance for analytic databases like Presto and Redshift, but has worse performance for traditional SQL backend if the table contains many columns. For more information check the SQLAlchemy documentation.
- callable with signature (pd_table, conn, keys, data_iter):
This can be used to implement a more performant insertion method based on specific backend dialect features. See: pandas SQL insertion method.
- Raises
pandemy.TableExistsError – If the table exists and
if_exists='fail'
.pandemy.DeleteFromTableError – If data in the table cannot be deleted when
if_exists='replace'
.pandemy.InvalidInputError – Invalid values or types for input parameters.
pandemy.InvalidTableNameError – If the supplied table name is invalid.
pandemy.SaveDataFrameError – If the
pandas.DataFrame
cannot be saved to the table.
See also
pandas.DataFrame.to_sql()
: Write records stored in a DataFrame to a SQL database.pandas SQL insertion method : Details about using the method parameter.
SQLiteDb¶
SQLiteDb
is a DatabaseManager
for the flexible file based database SQLite.
- class pandemy.SQLiteDb(file=':memory:', must_exist=False, container=None, engine_config=None, **kwargs)[source]¶
Bases:
pandemy.databasemanager.DatabaseManager
A SQLite
DatabaseManager
.- Parameters
file (str or pathlib.Path, default ':memory:') – The file (with path) to the SQLite database. The default creates an in memory database.
must_exist (bool, default False) – If
True
validate that file exists unlessfile=':memory:'
. If it does not existpandemy.DatabaseFileNotFoundError
is raised. IfFalse
the validation is omitted.container (SQLContainer or None, default None) – A container of database statements that the SQLite
DatabaseManager
can use.engine_config (dict or None) – Additional keyword arguments passed to the
sqlalchemy.create_engine()
function.**kwargs (dict) – Additional keyword arguments that are not used by
SQLiteDb
.
- Raises
pandemy.InvalidInputError – If invalid types are supplied to file, must_exist and container.
pandemy.DatabaseFileNotFoundError – If the database file does not exist when
must_exist=True
.pandemy.CreateEngineError – If the creation of the database engine fails.
See also
pandemy.DatabaseManager
: The parent class.sqlalchemy.create_engine()
: The function used to create the database engine.SQLAlchemy SQLite dialect : Implementation of the SQLite dialect in SQLAlchemy.
SQLite : The SQLite homepage.
- manage_foreign_keys(conn, action='ON')[source]¶
Manage how the database handles foreign key constraints.
In SQLite the check of foreign key constraints is not enabled by default.
- Parameters
conn (sqlalchemy.engine.base.Connection) – An open connection to the database.
action ({'ON', 'OFF'}) – Enable (‘ON’) or disable (‘OFF’) the check of foreign key constraints.
- Raises
pandemy.InvalidInputError – If invalid input is supplied to action.
pandemy.ExecuteStatementError – If the enabling/disabling of the foreign key constraints fails.