Auto Generating Migrations#

Alembic can view the status of the database and compare against the table metadata in the application, generating the "obvious" migrations based on a comparison. This is achieved using the --autogenerate option to the alembic revision command, which places so-called candidate migrations into a new migration file. We review and modify these by hand as needed, then proceed normally.

To use autogenerate, we first need to modify our env.py so that it gets access to a table metadata object that contains the target. Suppose our application has a declarative base in myapp.mymodel. This base contains a MetaData object which contains Table objects defining our database. We make sure this is loaded in env.py and then passed to EnvironmentContext.configure() via the target_metadata argument. The env.py sample script used in the generic template already has a variable declaration near the top for our convenience, where we replace None with our MetaData. Starting with:

    # add your model's MetaData object here
    # for 'autogenerate' support
    # from myapp import mymodel
    # target_metadata = mymodel.Base.metadata
    target_metadata = None

we change it to:

    from myapp.mymodel import Base

    target_metadata = Base.metadata

The above example refers to the generic alembic env.py template, e.g. the one created by default when running alembic init, and not the special-use templates such as multidb. Please consult the source code and comments within the env.py script directly for specific guidance on where and how the autogenerate metadata is established.

If we look later in the script, down in run_migrations_online(), we can see the directive passed to EnvironmentContext.configure():

    def run_migrations_online():
        engine = engine_from_config(
            config.get_section(config.config_ini_section),
            prefix='sqlalchemy.')

        with engine.connect() as connection:
            context.configure(
                connection=connection,
                target_metadata=target_metadata
            )

            with context.begin_transaction():
                context.run_migrations()

We can then use the alembic revision command in conjunction with the --autogenerate option. Suppose our MetaData contained a definition for the account table, and the database did not. We’d get output like:

    $ alembic revision --autogenerate -m "Added account table"
    INFO  [alembic.context] Detected added table 'account'
    Generating /path/to/foo/alembic/versions/27c6a30d7c24.py ...done

We can then view our file 27c6a30d7c24.py and see that a rudimentary migration is already present:

"""empty message Revision ID: 27c6a30d7c24 Revises: None Create Date: 2011-11-08 11:40:27.089406 """ # revision identifiers, used by Alembic. revision = '27c6a30d7c24' down_revision = None from alembic import op import sqlalchemy as sa def upgrade(): ### commands auto generated by Alembic - please adjust! ### op.create_table( 'account', sa.Column('id', sa.Integer()), sa.Column('name', sa.String(length=50), nullable=False), sa.Column('description', sa.VARCHAR(200)), sa.Column('last_transaction_date', sa.DateTime()), sa.PrimaryKeyConstraint('id') ) ### end Alembic commands ### def downgrade(): ### commands auto generated by Alembic - please adjust! ### op.drop_table("account") ### end Alembic commands ### 

The migration hasn't actually run yet, of course. We do that via the usual upgrade command. We should also go into our migration file and alter it as needed, including adjustments to the directives as well as the addition of other directives that these may depend on - specifically, data changes in between creates/alters/drops.
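
For example, once the file has been reviewed, the migration can be applied with the usual command (shown here targeting the most recent revision, head):

    $ alembic upgrade head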

What does Autogenerate Detect (and what does it not detect?)#

The vast majority of user issues with Alembic center on what kinds of changes autogenerate can and cannot detect reliably, as well as how it renders Python code for what it does detect. It is critical to note that autogenerate is not intended to be perfect. It is always necessary to manually review and correct the candidate migrations that autogenerate produces. The feature is becoming more comprehensive and error-free with each release, but one should take note of the current limitations.

Autogenerate will detect:

- Table additions and removals
- Column additions and removals
- Change of nullable status on columns
- Basic changes in indexes and explicitly-named unique constraints
- Basic changes in foreign key constraints

Autogenerate can optionally detect:

- Change of column type, when the EnvironmentContext.configure.compare_type flag is enabled (the default)
- Change of server default, when the EnvironmentContext.configure.compare_server_default flag is enabled

Autogenerate can not detect:

- Changes of table name; these will come out as a drop of the old table and an add of the new one
- Changes of column name; these will come out as a drop of the old column and an add of the new one
- Anonymously named constraints; give constraints explicit names, ideally via a naming convention, so that they can be compared

Autogenerate can't currently, but will eventually detect:

- Some free-standing constraint additions and removals, such as CHECK constraints
- Sequence additions and removals

Notable third-party libraries that extend the built-in Alembic autogenerate functionality#

Autogenerating Multiple MetaData collections#

The target_metadata collection may also be defined as a sequence if an application has multiple MetaData collections involved:

    from myapp.mymodel1 import Model1Base
    from myapp.mymodel2 import Model2Base

    target_metadata = [Model1Base.metadata, Model2Base.metadata]

The sequence of MetaData collections will be consulted in order during the autogenerate process. Note that each MetaData must contain unique table keys (i.e. the "key" is the combination of the table's name and schema); if two MetaData objects contain a table with the same schema/name combination, an error is raised.
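
The sequence is passed to EnvironmentContext.configure() exactly as a single MetaData object would be; a minimal env.py sketch reusing the structure shown earlier:

    context.configure(
        connection=connection,
        # the list of MetaData collections defined above
        target_metadata=target_metadata,
    )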

Controlling What to be Autogenerated#

The autogenerate process scans across all table objects within the database that is referred to by the current database connection in use.

The list of objects that are scanned in the target database connection includes:

See also: Specifying the Schema Name - an in-depth introduction to how SQLAlchemy interprets schema names

Omitting Schema Names from the Autogenerate Process#

As the above set of database objects is typically to be compared to the contents of a single MetaData object, particularly when the EnvironmentContext.configure.include_schemas flag is enabled, there is an important need to filter out unwanted "schemas", which for some database backends might be the list of all the databases present. This filtering is best performed using the EnvironmentContext.configure.include_name hook, which provides for a callable that may return a boolean true/false indicating if a particular schema name should be included:

    def include_name(name, type_, parent_names):
        if type_ == "schema":
            # note this will not include the default schema
            return name in ["schema_one", "schema_two"]
        else:
            return True

    context.configure(
        # ...
        include_schemas = True,
        include_name = include_name
    )

Above, when the list of schema names is first retrieved, the names will be filtered through the above include_name function so that only schemas named "schema_one" and "schema_two" will be considered by the autogenerate process.

In order to include the default schema, that is, the schema that is referred to by the database connection without any explicit schema being specified, the name passed to the hook is None. To alter our above example to also include the default schema, we compare to None as well:

    def include_name(name, type_, parent_names):
        if type_ == "schema":
            # this **will** include the default schema
            return name in [None, "schema_one", "schema_two"]
        else:
            return True

    context.configure(
        # ...
        include_schemas = True,
        include_name = include_name
    )

Omitting Table Names from the Autogenerate Process#

The EnvironmentContext.configure.include_name hook is also most appropriate to limit the names of tables in the target database to be considered. If a target database has many tables that are not part of the MetaData, the autogenerate process will normally assume these are extraneous tables in the database to be dropped, and it will generate an Operations.drop_table() operation for each. To prevent this, the EnvironmentContext.configure.include_name hook may be used to search for each name within the tables collection of the MetaData object and ensure names which aren't present are not included:

    target_metadata = MyModel.metadata

    def include_name(name, type_, parent_names):
        if type_ == "table":
            return name in target_metadata.tables
        else:
            return True

    context.configure(
        # ...
        target_metadata = target_metadata,
        include_name = include_name,
        include_schemas = False
    )

The above example is limited to table names present in the default schema only. In order to search within a MetaData collection for schema-qualified table names as well, note that a table present in a non-default schema will be present under a name of the form "schemaname.tablename". The EnvironmentContext.configure.include_name hook will present this schema name on a per-tablename basis in the parent_names dictionary, using the key "schema_name", which refers to the name of the schema currently being considered, or None if the schema is the default schema of the database connection:

    # example fragment

    if parent_names["schema_name"] is None:
        return name in target_metadata.tables
    else:
        # build out schema-qualified name explicitly
        return (
            "%s.%s" % (parent_names["schema_name"], name)
            in target_metadata.tables
        )

More simply, however, the parent_names dictionary also includes the dot-concatenated name already constructed under the key "schema_qualified_table_name", which is suitably formatted for tables in the default schema as well, with the dot omitted. So the full example of omitting tables with schema support may look like:

    target_metadata = MyModel.metadata

    def include_name(name, type_, parent_names):
        if type_ == "schema":
            return name in [None, "schema_one", "schema_two"]
        elif type_ == "table":
            # use schema_qualified_table_name directly
            return (
                parent_names["schema_qualified_table_name"]
                in target_metadata.tables
            )
        else:
            return True

    context.configure(
        # ...
        target_metadata = target_metadata,
        include_name = include_name,
        include_schemas = True
    )

The parent_names dictionary will also include the key "table_name" when the name being considered is that of a column or constraint object local to a particular table.
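
For example, a hook might use this key to skip the columns of one particular table; a minimal sketch, where the table name legacy_table is purely illustrative:

    def include_name(name, type_, parent_names):
        if type_ == "column":
            # skip columns belonging to a hypothetical legacy table
            return parent_names["table_name"] != "legacy_table"
        else:
            return True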

The EnvironmentContext.configure.include_name hook only refers to reflected objects, and not those located within the target MetaData collection. For more fine-grained rules that include both MetaData and reflected objects, the EnvironmentContext.configure.include_object hook discussed in the next section is more appropriate.

Omitting Based on Object#

The EnvironmentContext.configure.include_object hook provides for object-level inclusion/exclusion rules based on the Table object being reflected as well as the elements within it. This hook can be used to limit objects both from the local MetaData collection and from the target database. The limitation is that when it reports on objects in the database, it will have fully reflected that object, which can be expensive if a large number of objects will be omitted. The example below refers to a fine-grained rule that will skip changes on Column objects that have a user-defined flag skip_autogenerate placed into the info dictionary:

    def include_object(object, name, type_, reflected, compare_to):
        if (
            type_ == "column"
            and not reflected
            and object.info.get("skip_autogenerate", False)
        ):
            return False
        else:
            return True

    context.configure(
        # ...
        include_object = include_object
    )
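
For the above hook to have an effect, the flag would be placed into the info dictionary of the relevant Column objects in the model; a minimal sketch, reusing the hypothetical account model and declarative base from earlier examples:

    from sqlalchemy import Column, Integer, String
    from myapp.mymodel import Base  # the declarative base from earlier examples

    class Account(Base):
        __tablename__ = "account"

        id = Column(Integer, primary_key=True)
        # this column will be excluded from autogenerate comparisons
        # by the include_object hook above
        name = Column(String(50), info={"skip_autogenerate": True})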

Comparing and Rendering Types#

Autogenerate's comparison and rendering of Python-based type objects in migration scripts presents a challenge, in that there is a very wide variety of types to be rendered in scripts, including those that are part of SQLAlchemy as well as user-defined types. A few options are given to help out with this task.

Controlling the Module Prefix#

When types are rendered, they are generated with a module prefix, so that they are available based on a relatively small number of imports. The prefix used is based on the kind of datatype as well as configuration settings. For example, when Alembic renders SQLAlchemy types, it will by default prefix the type name with sa.:

Column("my_column", sa.Integer()) 

The use of the sa. prefix is controllable by altering the value of EnvironmentContext.configure.sqlalchemy_module_prefix:

    def run_migrations_online():
        # ...

        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            sqlalchemy_module_prefix="sqla.",
            # ...
        )

        # ...

In either case, the sa. prefix, or whatever prefix is desired, should also be included in the imports section of script.py.mako; this file also defaults to import sqlalchemy as sa.
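
For example, if the sqla. prefix from the previous example were in use, the import line in script.py.mako would be adjusted to match; a sketch of just the relevant imports:

    from alembic import op
    import sqlalchemy as sqla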

For user-defined types, that is, any custom type that is not within the sqlalchemy. module namespace, by default Alembic will use the value of __module__ for the custom type:

Column("my_column", myapp.models.utils.types.MyCustomType()) 

The imports for the above type again must be made present within the migration, either manually or by adding them to script.py.mako.

The above custom type has a long and cumbersome name based on the use of __module__ directly, which also implies that many imports would be needed in order to accommodate many kinds of types. For this reason, it is recommended that user-defined types used in migration scripts be made available from a single module. Suppose we call it myapp.migration_types:

    # myapp/migration_types.py

    from myapp.models.utils.types import MyCustomType

We can first add an import for migration_types to our script.py.mako:

    from alembic import op
    import sqlalchemy as sa
    import myapp.migration_types
    ${imports if imports else ""}

We then override Alembic’s use of __module__ by providing a fixed prefix, using the EnvironmentContext.configure.user_module_prefix option:

    def run_migrations_online():
        # ...

        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            user_module_prefix="myapp.migration_types.",
            # ...
        )

        # ...

Above, we now would get a migration like:

Column("my_column", myapp.migration_types.MyCustomType()) 

Now, when we inevitably refactor our application to move MyCustomType somewhere else, we need only modify the myapp.migration_types module, instead of searching and replacing all instances within our migration scripts.

Affecting the Rendering of Types Themselves#

The methodology Alembic uses to generate SQLAlchemy and user-defined type constructs as Python code is plain old __repr__() . SQLAlchemy’s built-in types for the most part have a __repr__() that faithfully renders a Python-compatible constructor call, but there are some exceptions, particularly in those cases when a constructor accepts arguments that aren’t compatible with __repr__() , such as a pickling function.

When building a custom type that will be rendered into a migration script, it is often necessary to explicitly give the type a __repr__() that will faithfully reproduce the constructor for that type. This, in combination with EnvironmentContext.configure.user_module_prefix , is usually enough. However, if additional behaviors are needed, a more comprehensive hook is the EnvironmentContext.configure.render_item option. This hook allows one to provide a callable function within env.py that will fully take over how a type is rendered, including its module prefix:

    def render_item(type_, obj, autogen_context):
        """Apply custom rendering for selected items."""
        if type_ == 'type' and isinstance(obj, MySpecialType):
            return "mypackage.%r" % obj

        # default rendering for other objects
        return False


    def run_migrations_online():
        # ...

        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            render_item=render_item,
            # ...
        )

        # ...

In the above example, we'd ensure our MySpecialType includes an appropriate __repr__() method, which is invoked when the object is interpolated with the "%r" format specifier.
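
As a sketch of what such a type might look like (MySpecialType and its length argument are hypothetical, and the class is assumed to live in mypackage to match the prefix used above):

    from sqlalchemy import types

    class MySpecialType(types.TypeDecorator):
        """Hypothetical custom type rendered into migration scripts."""

        impl = types.Unicode
        cache_ok = True

        def __init__(self, length=None):
            self.length = length
            super().__init__(length)

        def __repr__(self):
            # renders e.g. "MySpecialType(length=50)"; combined with the
            # render_item hook above, this becomes
            # "mypackage.MySpecialType(length=50)" in the migration file
            return "MySpecialType(length=%r)" % self.length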

The callable we use for EnvironmentContext.configure.render_item can also add imports to our migration script. The AutogenContext passed in contains a data member called AutogenContext.imports, which is a Python set() to which we can add new imports. For example, if MySpecialType were in a module called mymodel.types, we can add the import for it as we encounter the type:

    def render_item(type_, obj, autogen_context):
        """Apply custom rendering for selected items."""
        if type_ == 'type' and isinstance(obj, MySpecialType):
            # add import for this type
            autogen_context.imports.add("from mymodel import types")
            return "types.%r" % obj

        # default rendering for other objects
        return False

The finished migration script will include our imports where the ${imports} expression is used, producing output such as:

    from alembic import op
    import sqlalchemy as sa
    from mymodel import types

    def upgrade():
        op.add_column('sometable', Column('mycolumn', types.MySpecialType()))

Comparing Types#

The default type comparison logic will work for SQLAlchemy built-in types as well as basic user-defined types. This logic is enabled by default. It can be disabled by setting EnvironmentContext.configure.compare_type to False:

    context.configure(
        # ...
        compare_type = False
    )

Changed in version 1.12.0: The default value of EnvironmentContext.configure.compare_type has been changed to True .

The default type comparison logic (which is end-user extensible) currently (as of Alembic version 1.4.0) works by comparing the generated SQL for a column. It does this in two steps:

Alternatively, the EnvironmentContext.configure.compare_type parameter accepts a callable function which may be used to implement custom type comparison logic, for cases such as where special user defined types are being used:

    def my_compare_type(context, inspected_column,
                metadata_column, inspected_type, metadata_type):
        # return False if the metadata_type is the same as the inspected_type
        # or None to allow the default implementation to compare these
        # types. a return value of True means the two types do not
        # match and should result in a type change operation.
        return None

    context.configure(
        # ...
        compare_type = my_compare_type
    )

Above, inspected_column is a sqlalchemy.schema.Column as returned by sqlalchemy.engine.reflection.Inspector.reflect_table(), whereas metadata_column is a sqlalchemy.schema.Column from the local model environment. A return value of None indicates that default type comparison should proceed.
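
For example, a comparator might handle a hypothetical custom type specially while deferring everything else to the default logic; a minimal sketch:

    def my_compare_type(context, inspected_column,
                metadata_column, inspected_type, metadata_type):
        if isinstance(metadata_type, MySpecialType):
            # our hypothetical custom type: never report a type change
            return False
        # all other types: let Alembic's default comparison decide
        return None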

Applying Post Processing and Python Code Formatters to Generated Revisions#

Revision scripts generated by the alembic revision command can optionally be piped through a series of post-production functions which may analyze or rewrite Python source code generated by Alembic, within the scope of running the revision command. The primary intended use of this feature is to run code-formatting tools such as Black or autopep8, as well as custom-written formatting and linter functions, on revision files as Alembic generates them. Any number of hooks can be configured and they will be run in series, given the path to the newly generated file as well as configuration options.

The post write hooks, when configured, run against generated revision files regardless of whether or not the autogenerate feature was used.

Alembic's post write system is partially inspired by the pre-commit tool, which configures git hooks that reformat source files as they are committed to a git repository. Pre-commit can serve this role for Alembic revision files as well, applying code formatters to them as they are committed. Alembic's post write hooks are useful mainly in that they can format the files immediately upon generation, rather than at commit time, and they can also be useful for projects that prefer not to use pre-commit.

Basic Post Processor Configuration#

The alembic.ini samples now include commented-out configuration illustrating how to configure code-formatting tools, or other tools like linters, to run against the newly generated file path. Example:

    [post_write_hooks]

    # format using "black"
    hooks = black

    black.type = console_scripts
    black.entrypoint = black
    black.options = -l 79

Above, we configure hooks to be a single post write hook labeled "black". Note that this label is arbitrary. We then define the configuration for the "black" post write hook, which includes:

- black.type - the type of hook runner to use; Alembic includes two built-in runners, "console_scripts" and "exec"
- black.entrypoint - the name of the console script to invoke, here black
- black.options - the command-line options to pass to the tool

New in version 1.12: added new exec runner

The following configuration option is specific to the "console_scripts" hook runner:

- entrypoint - the name of the console script entry point to run, e.g. black

The following configuration option is specific to the "exec" hook runner:

- executable - the bare executable name, which will be searched for in $PATH, or a full pathname to avoid potential issues with path interception.

The following options are supported by both "console_scripts" and "exec":

- options - a line of command-line options to be passed to the tool; the token REVISION_SCRIPT_FILENAME, where present, is substituted with the path of the revision file being processed
- cwd - optional working directory from which the hook will be run

Note: Make sure options for the script are provided such that it will rewrite the input file in place. For example, when running autopep8, the --in-place option should be provided:

    [post_write_hooks]
    hooks = autopep8
    autopep8.type = console_scripts
    autopep8.entrypoint = autopep8
    autopep8.options = --in-place REVISION_SCRIPT_FILENAME

When running alembic revision -m "rev1", we will now see the black tool's output as well:

    $ alembic revision -m "rev1"
      Generating /path/to/project/versions/481b13bc369a_rev1.py ... done
      Running post write hook "black" ...
    reformatted /path/to/project/versions/481b13bc369a_rev1.py
    All done! ✨ 🍰 ✨
    1 file reformatted.
      done

Hooks may also be specified as a list of names, which correspond to hook runners that will run sequentially. As an example, we can also run the zimports import rewriting tool (written by Alembic’s author) subsequent to running the black tool, using a configuration as follows:

    [post_write_hooks]

    # format using "black", then "zimports"
    hooks = black, zimports

    black.type = console_scripts
    black.entrypoint = black
    black.options = -l 79 REVISION_SCRIPT_FILENAME

    zimports.type = console_scripts
    zimports.entrypoint = zimports
    zimports.options = --style google REVISION_SCRIPT_FILENAME

When using the above configuration, a newly generated revision file will be processed first by the “black” tool, then by the “zimports” tool.

Alternatively, one can run pre-commit itself as follows:

    [post_write_hooks]
    hooks = pre-commit
    pre-commit.type = console_scripts
    pre-commit.entrypoint = pre-commit
    pre-commit.options = run --files REVISION_SCRIPT_FILENAME
    pre-commit.cwd = %(here)s

(The last line helps to ensure that the .pre-commit-config.yaml file will always be found, regardless of from where the hook was called.)

Writing Custom Hooks as Python Functions#

The previous section illustrated how to run command-line code formatters, through the use of a post write hook provided by Alembic known as console_scripts . This hook is in fact a Python function that is registered under that name using a registration function that may be used to register other types of hooks as well.

To illustrate, we will use the example of a short Python function that wants to rewrite the generated code to use tabs instead of four spaces. For simplicity, we will illustrate how this function can be present directly in the env.py file. The function is declared and registered using the write_hooks.register() decorator:

    from alembic.script import write_hooks
    import re


    @write_hooks.register("spaces_to_tabs")
    def convert_spaces_to_tabs(filename, options):
        lines = []
        with open(filename) as file_:
            for line in file_:
                lines.append(
                    re.sub(
                        r"^(    )+",
                        # group(0) is the full run of leading 4-space groups;
                        # each group of four spaces becomes one tab
                        lambda m: "\t" * (len(m.group(0)) // 4),
                        line,
                    )
                )
        with open(filename, "w") as to_write:
            to_write.write("".join(lines))

Our new "spaces_to_tabs" hook can be configured in alembic.ini as follows:

    [alembic]
    # ...

    # ensure the revision command loads env.py
    revision_environment = true

    [post_write_hooks]

    hooks = spaces_to_tabs

    spaces_to_tabs.type = spaces_to_tabs

When alembic revision is run, the env.py file will be loaded in all cases; the custom "spaces_to_tabs" function will be registered, and it will then be run against the newly generated file path:

    $ alembic revision -m "rev1"
      Generating /path/to/project/versions/481b13bc369a_rev1.py ... done
      Running post write hook "spaces_to_tabs" ... done

Running Alembic Check to test for new upgrade operations#

When developing code it's useful to know if a set of code changes has made any net change to the database model, such that new revisions would need to be generated. To automate this, Alembic provides the alembic check command. This command runs through the same process as alembic revision --autogenerate, up until the point where revision files would be generated, but does not generate any new files. Instead, it returns an error code plus a message if it detects that new operations would be rendered into a new revision, or, if not, returns a success code plus a message. When alembic check returns a success code, this indicates that the alembic revision --autogenerate command would produce only empty migrations and does not need to be run.

alembic check can be worked into CI systems and on-commit schemes to ensure that incoming code does not warrant new revisions to be generated. In the example below, a check that detects new operations is illustrated:

    $ alembic check
    FAILED: New upgrade operations detected:
    [('add_column', None, 'my_table', Column('data', String(), table=<my_table>)),
     ('add_column', None, 'my_table', Column('newcol', Integer(), table=<my_table>))]

By contrast, when no new operations are detected:

    $ alembic check
    No new upgrade operations detected.

New in version 1.9.0.

The alembic check command uses the same model comparison process as the alembic revision --autogenerate process. This means parameters such as EnvironmentContext.configure.compare_type and EnvironmentContext.configure.compare_server_default are in play as usual, and that the limitations of autogenerate detection apply equally when running alembic check.
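
The same check can also be invoked programmatically, e.g. from a test suite, via Alembic's Python command API; a minimal sketch, assuming an alembic.ini at the project root:

    from alembic import command
    from alembic.config import Config

    # equivalent to running "alembic check" at the command line; raises
    # an error if new upgrade operations would be generated
    command.check(Config("alembic.ini"))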
