DatabaseManager¶

DatabaseManager is the base class providing the methods to interact with databases.

class pandemy.DatabaseManager(container=None, engine_config=None, **kwargs)[source]¶

Bases: abc.ABC

Base class with functionality for managing a database.

Each database type will subclass from DatabaseManager and implement the initializer which is specific to each database type. DatabaseManager is never used on its own, but merely provides the methods to interact with the database to its subclasses.

Initialization of a DatabaseManager creates the connection string (conn_str) and the database engine, which are used to connect to and interact with the database. These are available as attributes on the instance. The initializer can contain any number of parameters needed to connect to the database and should always support container, engine_config and **kwargs.

Parameters

container (SQLContainer or None, default None) – A container of database statements that can be used by the DatabaseManager.
engine_config (dict or None) – Additional keyword arguments passed to the sqlalchemy.create_engine() function.
**kwargs (dict) – Additional keyword arguments that are not used by the DatabaseManager.

Attributes

conn_str (str) – The connection string for connecting to the database.
engine (sqlalchemy.engine.Engine) – The engine for interacting with the database.

delete_all_records_from_table(table, conn)[source]¶

Delete all records from the specified table.

Parameters

table (str) – The table to delete all records from.
conn (sqlalchemy.engine.base.Connection) – An open connection to the database.

Raises

pandemy.InvalidTableNameError – If the supplied table name is invalid.
pandemy.DeleteFromTableError – If data cannot be deleted from the table.

execute(sql, conn, params=None)[source]¶

Execute a SQL statement.

Parameters

sql (str or sqlalchemy.sql.elements.TextClause) – The SQL statement to execute. A string value is automatically converted to a sqlalchemy.sql.elements.TextClause with the sqlalchemy.sql.expression.text() function.
conn (sqlalchemy.engine.base.Connection) – An open connection to the database.
params (dict or list of dict or None, default None) –
Parameters to bind to the SQL statement sql. Parameters in the SQL statement should be prefixed by a colon (:) e.g. :myparameter. Parameters in params should not contain the colon ({'myparameter': 'myvalue'}).

Supply a list of parameter dictionaries to execute multiple parametrized statements in the same method call, e.g. [{'parameter1': 'a string'}, {'parameter2': 100}]. This is useful for INSERT, UPDATE and DELETE statements.

Returns

A result object from the executed statement.

Return type

sqlalchemy.engine.CursorResult

Raises

pandemy.InvalidInputError – If sql is not of type str or sqlalchemy.sql.elements.TextClause.
pandemy.ExecuteStatementError – If an error occurs when executing the statement.

See also

sqlalchemy.engine.Connection.execute() : The method used for executing the SQL statement.
sqlalchemy.engine.CursorResult : The return type from the method.

Examples

To process the result from the method the database connection must remain open after the method is executed i.e. the context manager cannot be closed before processing the result:

import pandemy

db = SQLiteDb(file='mydb.db')

with db.engine.connect() as conn:
    result = db.execute('SELECT * FROM MyTable;', conn=conn)

     for row in result:
         print(row)  # process the result
         ...

load_table(sql, conn, params=None, index_col=None, columns=None, parse_dates=None, localize_tz=None, target_tz=None, dtypes=None, chunksize=None, coerce_float=True)[source]¶

Load a SQL table into a pandas.DataFrame.

Specify a table name or a SQL query to load the pandas.DataFrame from. Uses pandas.read_sql() function to read from the database.

Parameters

sql (str or sqlalchemy.sql.elements.TextClause) – The table name or SQL query.
conn (sqlalchemy.engine.base.Connection) – An open connection to the database to use for the query.
params (dict of str or None, default None) – Parameters to bind to the SQL query sql. Parameters in the SQL query should be prefixed by a colon (:) e.g. :myparameter. Parameters in params should not contain the colon ({'myparameter': 'myvalue'}).
index_col (str or sequence of str or None, default None) – The column(s) to set as the index of the pandas.DataFrame.
columns (list of str or None, default None) – List of column names to select from the SQL table (only used when sql is a table name).
parse_dates (list or dict or None, default None) –
- List of column names to parse as dates.
- Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.
- Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native datetime support, such as SQLite.
localize_tz (str or None, default None) – Localize naive datetime columns of the returned pandas.DataFrame to specified timezone. If None no localization is performed.
target_tz (str or None, default None) – The timezone to convert the datetime columns of the returned pandas.DataFrame into after they have been localized. If None no conversion is performed.
dtypes (dict or None, default None) – Desired data types for specified columns {‘column_name’: data type}. Use pandas or numpy data types or string names of those. If None no data type conversion is performed.
chunksize (int or None, default None) – If chunksize is specified an iterator of DataFrames will be returned where chunksize is the number of rows in each pandas.DataFrame. If chunksize is supplied timezone localization and conversion as well as dtype conversion cannot be performed i.e. localize_tz, target_tz and dtypes have no effect.
coerce_float (bool, default True) – Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.

Returns

df – pandas.DataFrame with the result of the query or an iterator of DataFrames if chunksize is specified.

Return type

pandas.DataFrame or Iterator[pandas.DataFrame]

Raises

pandemy.LoadTableError – If errors when loading the table using pandas.read_sql().
pandemy.SetIndexError – If setting the index of the returned pandas.DataFrame fails when index_col is specified and chunksize is None.
pandemy.DataTypeConversionError – If errors when converting data types using the dtypes parameter.

See also

pandas.read_sql() : Read SQL query or database table into a pandas.DataFrame.
pandas.to_datetime() : The function used for datetime conversion with parse_dates.

Examples

When specifying the chunksize parameter the database connection must remain open to be able to process the DataFrames from the iterator. The processing must occur within the context manager:

import pandemy

db = pandemy.SQLiteDb(file='mydb.db')

with db.engine.connect() as conn:
    df_gen = db.load_table(sql='MyTableName', conn=conn, chunksize=3)

    for df in df_gen:
        print(df)  # Process your DataFrames
        ...

manage_foreign_keys(conn, action)[source]¶

Manage how the database handles foreign key constraints.

Should be implemented by DatabaseManagers whose SQL dialect supports enabling/disabling checking foreign key constraints. E.g. SQLite.

Parameters

conn (sqlalchemy.engine.base.Connection) – An open connection to the database.
action (str) – How to handle foreign key constraints in the database.

Raises

pandemy.InvalidInputError – If invalid input is supplied to action.
pandemy.ExecuteStatementError – If changing the handling of foreign key constraint fails.

save_df(df, table, conn, if_exists='append', index=True, index_label=None, chunksize=None, schema=None, dtype=None, method=None)[source]¶

Save the pandas.DataFrame df to specified table in the database.

If the table does not exist it will be created. If the table already exists the column names of the pandas.DataFrame df must match the table column definitions. Uses pandas.DataFrame.to_sql() method to write the pandas.DataFrame to the database.

Parameters

df (pandas.DataFrame) – The DataFrame to save to the database.
table (str) – The name of the table where to save the pandas.DataFrame.
conn (sqlalchemy.engine.base.Connection) – An open connection to the database.
if_exists (str, {'append', 'replace', 'fail'}) –
How to update an existing table in the database:
- ’append’: Append the pandas.DataFrame to the existing table.
- ’replace’: Delete all records from the table and then write the pandas.DataFrame to the table.
- ’fail’: Raise pandemy.TableExistsError if the table exists.
index (bool, default True) – Write pandas.DataFrame index as a column. Uses the name of the index as the column name for the table.
index_label (str or sequence of str or None, default None) – Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the pandas.DataFrame uses a pandas.MultiIndex.
chunksize (int or None, default None) – The number of rows in each batch to be written at a time. If None, all rows will be written at once.
schema (str, None, default None) – Specify the schema (if database flavor supports this). If None, use default schema.
dtype (dict or scalar, default None) – Specifying the data type for columns. If a dictionary is used, the keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode. If a scalar is provided, it will be applied to all columns.
method (None, 'multi', callable, default None) –
Controls the SQL insertion clause used:
- None:
  Uses standard SQL INSERT clause (one per row).
- ’multi’:
  Pass multiple values in a single INSERT clause. It uses a special SQL syntax not supported by all backends. This usually provides better performance for analytic databases like Presto and Redshift, but has worse performance for traditional SQL backend if the table contains many columns. For more information check the SQLAlchemy documentation.
- callable with signature (pd_table, conn, keys, data_iter):
  This can be used to implement a more performant insertion method based on specific backend dialect features. See: pandas SQL insertion method.

Raises

pandemy.TableExistsError – If the table exists and if_exists='fail'.
pandemy.DeleteFromTableError – If data in the table cannot be deleted when if_exists='replace'.
pandemy.InvalidInputError – Invalid values or types for input parameters.
pandemy.InvalidTableNameError – If the supplied table name is invalid.
pandemy.SaveDataFrameError – If the pandas.DataFrame cannot be saved to the table.

See also

pandas.DataFrame.to_sql() : Write records stored in a DataFrame to a SQL database.
pandas SQL insertion method : Details about using the method parameter.

SQLiteDb¶

SQLiteDb is a DatabaseManager for the flexible file based database SQLite.

class pandemy.SQLiteDb(file=':memory:', must_exist=False, container=None, engine_config=None, **kwargs)[source]¶

Bases: pandemy.databasemanager.DatabaseManager

A SQLite DatabaseManager.

Parameters

file (str or pathlib.Path, default ':memory:') – The file (with path) to the SQLite database. The default creates an in memory database.
must_exist (bool, default False) – If True validate that file exists unless file=':memory:'. If it does not exist pandemy.DatabaseFileNotFoundError is raised. If False the validation is omitted.
container (SQLContainer or None, default None) – A container of database statements that the SQLite DatabaseManager can use.
engine_config (dict or None) – Additional keyword arguments passed to the sqlalchemy.create_engine() function.
**kwargs (dict) – Additional keyword arguments that are not used by SQLiteDb.

Raises

pandemy.InvalidInputError – If invalid types are supplied to file, must_exist and container.
pandemy.DatabaseFileNotFoundError – If the database file does not exist when must_exist=True.
pandemy.CreateEngineError – If the creation of the database engine fails.

See also

pandemy.DatabaseManager : The parent class.
sqlalchemy.create_engine() : The function used to create the database engine.
SQLAlchemy SQLite dialect : Implementation of the SQLite dialect in SQLAlchemy.
SQLite : The SQLite homepage.

manage_foreign_keys(conn, action='ON')[source]¶

Manage how the database handles foreign key constraints.

In SQLite the check of foreign key constraints is not enabled by default.

Parameters

conn (sqlalchemy.engine.base.Connection) – An open connection to the database.
action ({'ON', 'OFF'}) – Enable (‘ON’) or disable (‘OFF’) the check of foreign key constraints.

Raises

pandemy.InvalidInputError – If invalid input is supplied to action.
pandemy.ExecuteStatementError – If the enabling/disabling of the foreign key constraints fails.