Agnostic Database Migrations¶
Agnostic is a database migrations system that is agnostic to programming language, framework, and database system. It supports MySQL, PostgreSQL, and SQLite.
Overview¶
Agnostic Database Migrations is a database migration tool that is agnostic to:
- database system
- programming language
- object/relational mapper (ORMs)
- build system
- merging source code branches
- workflow
Favoring convention over configuration, Agnostic is lightweight and conceptually simple.
Stop stressing about migrations and start using Agnostic!
Quick Start¶
Ok, Mr. or Mrs. Impatient, you don’t want to read 14 pages of dry, technical documentation. You just want to see what it can do, right?
We will assume that you already have a running application that uses a database
named myapp
. This application has a single table called customer
:
CREATE TABLE customer (
name character varying(255),
address character varying(255),
home_phone character varying(255)
);
Note
One of the beautiful features of Agnostic is that doesn’t care how you got to this point, or what tools you use to build your database. You can build your database with a SQL script, an ORM, or even the butterfly effect. Agnostic won’t get jealous.
Now your customers are demanding a few features, and these new features require changes to the existing database structure:
- Add a
cell_phone
column. - Add a
nickname
column.
Let’s see how we can do this with Agnostic.
First, create a directory to hold the migrations. We will name the directory
migrations
. (You could name it something else, but that would take a few
extra minutes to explain, and this wouldn’t be a “quick” start.)
~/myapp $ mkdir migrations
Next, bootstrap the migrations system. This step creates a table inside your database to hold metadata about migrations.
~/myapp $ agnostic -t postgres -u myuser -d mydb bootstrap
Migration table created.
Now, write SQL scripts for each of the changes and save them in the
migrations
folder.
~/myapp $ cat > migrations/add_nickname.sql
ALTER TABLE customer ADD nickname VARCHR(255);
~/myapp $ cat > migrations/add_cell_phone.sql
ALTER TABLE customer ADD cell_phone VARCHAR(255);
Finally, run the migrations:
~/myapp $ agnostic -t postgres -u myuser -d mydb migrate
Backing up database "myapp" to "/tmp/tmprhty1nc7".
About to run 2 migrations in database "myapp":
* Running migration add_cell_phone (1/2)
* Running migration add_nickname (2/2)
Migration failed because:
failed to run external tool "psql" (exit 3):
ERROR: type "varchr" does not exist
LINE 1: ALTER TABLE customer ADD nickname VARCHR(255);
^
Will try to restore from backup…
Restored from backup.
Aborted!
Ruh roh! The first migration ran fine, but it looks like the second
migration has a typo: VARCHR
instead of VARCHAR
. Luckily, Agnostic
automatically backs up your database before running migrations. In the event of
a failure, it automatically restores from that backup so that you don’t get
stuck in an in-between state.
Note
You can disable Agnostic’s automatic backup/restore behavior with the
--no-backup
flag.
Let’s fix the typo and run it again.
~/myapp $ sed -i 's:VARCHR:VARCHAR:' migrations/add_nickname.sql
ALTER TABLE customer ADD nickname VARCHAR(255);
~/myapp $ agnostic -t postgres -u myuser -d mydb migrate
Backing up database "myapp" to "/tmp/tmpm8glpgaa".
About to run 2 migrations in database "myapp":
* Running migration add_cell_phone (1/2)
* Running migration add_nickname (2/2)
Migrations completed successfully.
Removing backup "/tmp/tmpm8glpgaa".
Sweet! You’re done…
Agnostic keeps track of what migrations have already been applied, so we can easily run future migrations without accidentally re-executing previous migrations.
~/myapp $ cat > migrations/drop_nickname.sql
ALTER TABLE customer DROP nickname;
~/myapp $ agnostic -t postgres -u myuser -d mydb list
Name | Status | Started At | Completed At
---------------+-----------+---------------------+--------------------
add_cell_phone | succeeded | 2015-05-23 21:09:33 | 2015-05-23 21:09:34
add_nickname | succeeded | 2015-05-23 21:09:34 | 2015-05-23 21:09:34
drop_nickname | pending | N/A | N/A
~/myapp $ agnostic -t postgres -u myuser -d mydb migrate
Backing up database "myapp" to "/tmp/tmpiq5fhnh6".
About to run 1 migration in database "myapp":
* Running migration drop_nickname (1/1)
Migrations completed successfully.
Removing backup "/tmp/tmpiq5fhnh6".
Easy peasy, right?
Purpose¶
If you’re new to migrations, or coming from a different migrations system, you may be wondering what exactly is meant by “agnostic database migrations”.
When you develop and deploy an application that is backed up by a relational database, you will eventually need to deploy a new version of that application that expects a slightly different, improved database structure. In most production use cases, it’s not acceptable to just drop the database and rebuild it. Instead, you must modify the existing database to match what the application expects, and you need to do so without corrupting or destroying any of your production data.
On small projects, you might be able to handle this process manually: you write a SQL script for each new release and then you run that script whenever you need to deploy an upgraded version.
On large projects, however, you’ll find that it quickly grows to be a bigger problem that you can reasonably manage. It becomes very difficult to ensure that all of your environments have exactly the same database structure; the bugs that arise from having slightly different database structures in different places (imagine a missing foreign key constraint) cause corrupted data to build up slowly over time and eventually turn into a nightmarish debugging scenario.
Alternatives¶
There are a lot of options for database migrations:
- Django (Python) has South.
- Doctrine (PHP) has Migrations.
- Java has migrate4j.
- Perl has DBIx::Migration::Directories.
- PHP has Phinx.
- Ruby On Rails has Active Record Migrations
- SQLAlchemy (Python) has Alembic.
(This is just a sample of the many tools out there.)
Why are there so many different migration tools?
The main reason that there are so many tools is that—for some strange reason—the developers think that each programming language or ORM needs its own separate migration tool. These solutions are simultaneously over-engineered and too restrictive.
Consider the Alembic tutorial as an example:
- Run
alembic init
to intialize a directory structure, including anenv.py
configuration file. - You also need an
alembic.ini
file, which contains 20 configuration directives by default. - Edit a Mako template file in order to customize automatically generated migrations. (You have free time to learn Mako, right?)
- Run
alembic revision -m foo
to create a template for a new migration script. - Write the migration script in Python, using the Alembic API. (You have time to learn that API, right?)
- Write an
upgrade()
method for the migration, and you may write adowngrade()
method as well. (Just keep in mind that downgrading won’t work at all if you have even a single migration that doesn’t implement downgrade().)
As if you didn’t have already have enough complex things to learn, memorize, and operate, these migration systems expect you to read 100 pages of documentation just so you can manage migrations. Hopefully you’re not thinking about using a different ORM or programming language on your next project—you’ll have to learn a whole new migrations system, too!
In contrast, consider Agnostic:
- Open source.
- Lightweight.
- Not tied to a specific programming lanuage.
- Not tied to a specific ORM.
- Not tied to a specific database system.
- Migrations written in pure SQL.
- No configuration files.
- High automated test coverage.
Agnostic is a migrations system you can use on all of your projects.
License¶
Agnostic is released under an MIT license.
Installation¶
Install From PyPI¶
The easiest way to install Agnostic is from PyPI.
Note
Agnostic requires Python 3, so make sure that you are using pip3
, not
pip2
. Don’t worry: you can use Agnostic with Python 2 projects — or
projects written in any programming language — but Agnostic itself needs
to run in the Python 3 interpreter.
Agnostic requires a specific driver for each database you want to use it with, so choose one of the following forms to install Agnostic with the correct drivers.
~ $ pip3 install agnostic[mysql]
~ $ pip3 install agnostic[postgres]
~ $ pip3 install agnostic[sqlite]
Agnostic expects some database tools to be present in order to make snapshots and backups. You only need to install dependencies for the database that you’re using.
- mysql
- Requires the
PyMySQL
Python driver and both themysql
andmysqldump
executables. - postgres
- Requires the
pg8000
Python driver and both thepsql
andpg_dump
executables. - sqlite3
- Requires the
sqlite3
executable.
Install From Source¶
Agnostic is also easy to install from source, in case you want to install a pre-
release version. You can clone the repo or download a Zip file from the
project’s repository. Once you
have it cloned or unzipped, run the setup.py
script.
Note
Make sure to use Python 3, not Python 2. See notice in previous section.
Go into the Agnostic directory and then install using one of the following forms.
~ $ cd agnostic
~/agnostic $ pip3 install .[mysql]
~/agnostic $ pip3 install .[postgres]
~/agnostic $ pip3 install .[sqlite]
Make sure to review the dependencies in the “Install from PyPI” section above.
Verification¶
You can verify that Agnostic is installed by running the following command. (Your version number may not match the version shown here. That’s OK.)
~ $ agnostic --version
agnostic, version 1.0
This command does not check your dependencies, so you may still find that Agnostic does not work when you start trying to interact with your database. In that case, review the dependency information in the previous sections again.
Command Line¶
Global Options¶
The command line interface is built around a set of subcommands. A global set of
options applies to all subcommands, such as database type, username, hostname,
etc. Each subcommand may also have its own arguments and options. You can view
built-in help with the --help
flag.
$ agnostic -h
Usage: agnostic [OPTIONS] COMMAND [ARGS]...
Agnostic database migrations: upgrade schemas, save your sanity.
Options:
-t, --db-type <type> Type of database. [required]
-h, --host <host> Database hostname.
-p, --port <port> Database port.
-u, --user <user> Database username.
--password <pass> Database password.
-d, --database <database> Name of database to operate on. [required]
-s, --schema <schema> The default schema[s] to use when connecting to
the database. (PostgreSQL only. WARNING:
EXPERIMENTAL!!!)
-m, --migrations-dir <dir> Path to migrations directory. (Default:
./migrations) [required]
-D, --debug Display stack traces when exceptions occur.
--version Show the version and exit.
--help Show this message and exit.
Commands:
bootstrap Bootstrap the migrations table.
drop Drop the migrations table.
list List migrations.
migrate Run pending migrations.
snapshot Take a snapshot of the current DB structure and write it to...
test Test pending migrations.
Most options may either be passed on the command line or exported in the environment. Each database type treats these options differently.
Option | MySQL | PostgreSQL | SQLite |
---|---|---|---|
-t --db-type $AGNOSTIC_TYPE |
Required mysql |
Required postgres |
Required sqlite |
-h --host $AGNOSTIC_HOST |
Default localhost |
Default localhost |
N/A |
-p --port $AGNOSTIC_PORT |
Default 3306 |
Default 5342 |
N/A |
-u --user $AGNOSTIC_USER |
Required | Required | N/A |
--password $AGNOSTIC_PASSWORD |
Required | Required | N/A |
-d --database $AGNOSTIC_DATABASE |
Required | Required | Required [1] |
-s --schema $AGNOSTIC_SCHEMA |
N/A | Default "$user",public [2] |
N/A |
-m --migrations-dir $AGNOSTIC_MIGRATIONS_DIR |
Default migrations |
Default migrations |
Default migrations |
Notes on options:
[1] | In SQLite, the “database” option is the path to the database file. |
[2] | “Schema” support for PostgreSQL is considered experimental. Please provide feedback if you use this feature and like/dislike it. |
Warning
Be careful with passwords!
Passing a password in the --password
argument is not safe in multi-user
environments because the password will be visible in plaintext, both in your
shell history and in the system-wide process list.
We recommend that you type the password. Alternatively, export the
AGNOSTIC_PASSWORD
variable in your environment, but be wary of the
security implications of storing this value in the environment or in a
world-readable .profile
or .bashrc
.
(Lack Of) Configuration¶
Agnostic prefers convention over configuration, and in order to avoid the need for configuration files, Agnostic takes all of its configurations as command line arguments. This design makes Agnostic easier to learn, but it may result in more typing, since the configuration needs to be passed into Agnostic each time you run it.
If you prefer to use a configuration file, here are two ways to do that.
- Set up an alias.
- Set up an environment file.
The first idea is obvious to shell power users: set up an alias (e.g. in your
~/.profile
or ~/.bash_aliases
) for Agnostic, like this:
alias ag=agnostic -h myhost -t postgres -u myuser -d mydb -m /opt/myapp/migrations
Now you can run shorter commands like ag snapshot foo.sql
or ag migrate
.
This approach may be a bit limiting if you have multiple projects and each
project has different database settings.
The second approach is a bit more flexible when dealing with multiple projects.
Create a file that contains Agnostic environment variables and put it in your
project’s root directory. Let’s call it .agnostic_env
.
export AGNOSTIC_HOST=myhost
export AGNOSTIC_USER=myuser
export AGNOSTIC_TYPE=postgres
export AGNOSTIC_DATABASE=myapp
export AGNOSTIC_MIGRATIONS_DIR=/opt/myapp/migrations
When you are working on a project, source these environment variables into your shell:
/opt/myapp $ source .agnostic_env
Now you can run commands like agnostic snapshot foo.sql
and agnostic
migrate
and Agnostic will read the parameters from your environment variables.
When you switch to work on another project, you just need to source that
project’s .agnostic_env
.
bootstrap¶
~ $ agnostic bootstrap --help
Usage: agnostic bootstrap [OPTIONS]
Bootstrap the migrations table.
Agnostic stores migration metadata inside of the database that it is
managing. The bootstrap process creates a table to store this tracking data
and also (optionally) loads pre-existing migration metadata into it.
Options:
--load-existing / --no-load-existing
Track existing migrations in the new
migration table. (default: --load-existing)
--help Show this message and exit.
The bootstrap
command creates a table inside the managed schema to track
migrations metadata.
- load-existing
- By default, the bootstrap command loads existing migrations into the
metadata table with the special status
bootstrapped
. This option can be to control that behavior. See Build vs. Migrate for more information.
drop¶
~ $ agnostic drop --help
Usage: agnostic drop [OPTIONS]
Drop the migrations table.
BACK UP YOUR DATA BEFORE USING THIS COMMAND!
This destroys all metadata about what migrations have and have not been
applied. This is typically only useful when debugging.
Options:
-y, --yes Do not display warning: assume "yes".
--help Show this message and exit.
The drop
command has the opposite effect of bootstrap
: it deletes the
metadata table.
- yes
- By default, Agnostic requires the user to type
y
onstdin
to confirm that they want to delete this table. This prompt can be skipped by passing the--yes
flag.
list¶
~ $ agnostic list --help
Usage: agnostic list [OPTIONS]
List migrations.
This shows migration metadata: migrations that have been applied (and the
result of that application) and migrations that are pending.
* bootstrapped: a migration that was inserted during the bootstrap
process.
* failed: the migration did not apply cleanly; the migrations system
will not be able to operate until this is rectified, typically by
restoring from a backup.
* pending: the migration has not been applied yet.
* succeeded: the migration applied cleanly.
Applied migrations are ordered by the "started_at" timestamp. Pending
migrations follow applied migrations and are sorted in the same order that
they would be applied.
Options:
--help Show this message and exit.
List all known migrations, both applied and pending. See :ref:metadata for more information.
migrate¶
~ $ agnostic migrate --help
Usage: agnostic migrate [OPTIONS]
Run pending migrations.
Options:
--backup / --no-backup Automatically backup the database before running
migrations, and in the event of a failure,
automatically restore from that backup. (default:
--backup).
--help Show this message and exit.
Run all pending migrations in the pre-determined order. See :ref:running_migrations for more details on this process.
- backup
- By default, Agnostic backs up your schema. In the event of a migrations failure, Agnostic will try to restore from this backup. You can disable this behavior, if desired.
snapshot¶
~ $ agnostic snapshot --help
Usage: agnostic snapshot [OPTIONS] OUTFILE
Take a snapshot of the current schema and write it to OUTFILE.
Snapshots are used for testing that migrations will produce a schema that
exactly matches the schema produced by your build system. See the online
documentation for more details on how to use this feature.
Options:
--help Show this message and exit.
A snapshot is a dump of the current schema, sans data. Snapshots are useful for testing migrations, as detailed in Workflow.
- outfile
- The name of the file to write the snapshot to.
test¶
~ $ agnostic test --help
Usage: agnostic test [OPTIONS] CURRENT TARGET
Test pending migrations.
Given two snapshots, one of your "current" state and one of your "target"
state, this command verifies: current + migrations = target.
If you have a schema build system, this command is useful for verifying
that your new migrations will produce the exact same schema as the build
system.
Note: you may find it useful to set up a database/schema for testing
separate from the one that you use for development; this allows you to test
repeatedly without disrupting your development work.
Options:
-y, --yes Do not display warning: assume "yes".
--help Show this message and exit.
- current
- A snapshot of the database before the most recent changes.
- target
- A snapshot of the database after the most recent changes.
The test
command verifies that a set of migrations will run without error
and will also precisely produce the desired target schema. See Write & Test
Migrations for more details.
Design¶
Overview¶
The best way to understand Agnostic is to learn how it works under the hood and the reasoning that supports its design. Both of these aspects are very simple — I promise! The conceptual simplicity of Agnostic is one of its strong points.
Note
The term schema is unfortunately overloaded by database vendors to mean different things. In most databases, a schema is a description of the structure of your data, i.e. the tables, columns, and data types. In other databases (e.g. Postgres) a schema is a logical grouping of objects that have shared access controls.
To avoid confusing, we use the word “schema” in this documentation to refer specifically to the Postgres definition of the term. We use the terms “database” or “database structure” to describe the definitions of tables, columns, etc.
Let’s quickly review the essential elements of Agnostic’s design…
Pure SQL¶
In order to truly be agnostic, as well as reduce learning curve, Agnostic migration scripts are always written in pure SQL. You won’t need to learn a new API just to write migration scripts, and you also won’t be confined by the capabilities of that API.
Some migrations systems can convert migrations written in their native API to pure SQL for DBAs that insist on seeing real SQL. However, like most machine-generated code, this transliterated SQL tends to be overly complex and difficult to read. By writing migrations in SQL from the start, you can write them to be human-readable.
Data vs. Database Structure¶
Some migration systems offer to help manage data for you. Others focus exclusively on managing the database structure only.
Agnostic focuses on managing the database structure, but that does not prohibit managing data. The “pure SQL” aspect of writing migrations means that you can of course write DML statements to manipulate data.
Agnostic provides tools for verifying the correctness of database structure modifications (see Write & Test Migrations), but due to the innumerable complexities of real world data, it does not attempt to verify correctness of data modifications.
Up vs. Down¶
Some migration systems allow you write both “up” scripts (upgrade the database structure) and “down” scripts (revert the database structure to an earlier version).
In Agnostic, there is no concept of a “down” migration, for the following reasons:
- Allowing two scripts for each migration complicates the storage and metadata for migration scripts.
- If an “up” migration discards some data, such as dropping a table, then there is no possible way to write a “down” migration that restores that data.
- Despite being nominally optional, “down” scripts only work if all migrations have “down” scripts. If even a single migration lacks a “down” script, then there is no possibility for downgrading.
- The need for downgrading a database structure in production is very rare; for those use cases where it is valuable, a backup may be faster, easier, and safer.
There are many limitations and caveats on “down” scripts, and in years of development work, this author has decided that the costs far outweight any benefits.
Storing Migrations¶
Migration scripts are stored in files contained within a directory of your
choosing. By convention, this directory is called migrations
and Agnostic
will look for it in your current working directory, but you can choose any
directory and pass it to Agnostic with the --migrations-dir
argument.
There is no prescribed layout for files within this folder; you are free to arrange your migration scripts however you want. All you need to know are these two rules:
- Migrations are named according to their relative path within the migration directory, without the
.sql
suffix.- Migrations are sorted by name.
These rules are illustrated by the following example. Assume that you have the following directory tree.
migrations/
add_phone_column.sql
drop_last_name_index.sql
social/
add_friends_join_table.sql
add_favorites_column.sql
readme.txt
zero_balance_constraint.sql
Agnostic will scan this directory and enumerate the following migration names:
add_phone_column
drop_last_name_index
social/add_friends_join_table
social/add_favorites_column
zero_balance_constraint
Each migration has been named by taking its path (relative to the migrations
directory) and removing the .sql
suffix. Files without a .sql
suffix are
ignored. The names are sorted (case sensitive) so that they will always be
applied in a deterministic order.
Danger
The first rule of Agnostic migrations is:
You do not talk about Agnostic migrations!
No wait, hold up… that’s the first rule of Fight Club. Sorry, I was getting really hyped up. The first rule of migrations is actually much tamer, but no less important:
Do not rename migrations after you have deployed them!
Migration names are used to keep track of which migrations have been applied and which have not been applied. (That process is described further down.) If you rename a migration, it will likely lead to that migration being applied twice, which could result in a migration failure.
In a development environment, you’ll probably be fine renaming migrations, as long as you and other developers know how to rebuild a database from scratch. But in a production environment, it’s just asking for trouble.
Sample File Layout¶
You may now be wondering:
How does Agnostic manage dependencies between migrations?
What a good looking question, fair reader!
Some migrations systems ignore this question altogether, and other systems tackle this question by introducing complex dependency resolution — yet another cognitive load for developers who want migrations that “just work”.
Agnostic’s simple and open-ended approach allows you to manage dependencies however you like, but without introducing a lot of extra work.
Here is an example file layout for migrations that minimizes dependency management without adding significant cognitive load. This is just an example, of course! You may find other systems that work even better for you own team, and Agnostic is cool with that.
Let’s assume that you use semantic versioning or
something like it. We will group all migrations into subdirectories, where each
subdirectory has a 6 digit name that corresponds to a semantic version number.
For example, version 1.2.3 would be named 010203
and version 12.34.56 would
be named 123456
.
This convention gives us a migrations directory layout like this:
migrations/
010000/
add_address_line_2.sql
add_home_phone.sql
010001/
add_cell_phone.sql
010200/
normalize_phones.sql
020000/
add_user_join_table.sql
Note
You can nest directories as deeply as you want, in case you want more fine-grained subgroups.
The beauty of this simple arrangement is that Agnostic will automatically sort migrations into the correct order: scripts for version 1.0.1 run before scripts for version 1.2.0, which in turn run before scripts for version 2.0.0. Any dependency conflicts between versions are automatically handled for us, with hardly any extra work on our own part.
But what about dependency conflicts within a single version? Again, Agnostic doesn’t prescribe a single, right answer. You have a lot of options, and it’s best for your team to pick a convention that works for you and stick to it. Here are some ideas:
- If conflicts are related to the same feature, that might be a good hint that they belong in the same migration script. Try combining them into a single SQL script where the statements are re-ordered to solve the dependency.
- Re-order the migrations by prefixing the file names with special characters.
An exclamation (
!
) sorts to the top, while an at-symbol (@
) sorts to the bottom. - If you have dozens or hundreds of migration scripts per version, then the special character approach may get cumbersome. Try moving the scripts that have dependency conflicts on each other into a subdirectory together, and then use special characters to reorder them within that subdirectory.
Metadata¶
Migration metadata is stored in the same database that Agnostic is managing for you. This arrangement is highly convenient: Agnostic already has access to this database, and the metadata stays right next to your data. If you backup your database, then your Agnostic metadata is backed up, too!
The metadata table looks like this:
CREATE TABLE "agnostic_migrations" (
name VARCHAR(255) PRIMARY KEY,
status VARCHAR(255),
started_at TIMESTAMP,
completed_at TIMESTAMP
)
We saw in a previous section how the migration name is determined (relative path, minus the .sql
suffix). The status can be any of the following:
- bootstrapped: The migration was added to the table when the migration system was bootstrapped, but it was never actually executed.
- succeeded: The migration was successfully executed.
- failed: The migration failed.
- pending: The migration has not been executed yet, but would be executed if
you ran the
migrate
command.
The started_at
and completed_at
columns make for a simple audit history,
so that you can see when various migrations were actually applied to a
particular system. For a more thorough explanation of bootstrapped, see: Build vs. Migrate.
Running Migrations¶
Now that we know how migration files are stored on disk and how migrations are represented in a table, we can complete the puzzle: running migrations. This is a rough outline of how migrations are executed.
- Make a backup, if requested.
- Compute pending migrations
- Enumerate all migration files in the migrations directory and sort them as described previously.
- Enumerate all the migrations that exist in the metadata table.
- The “pending” migrations are those that exist on disk but not in the metadata table.
- For each pending migration:
- Enter the migration into the metadata table, set the status to
failed
, and set thestarted_at
time to the current time. - Try to run the pending migration.
- If it succeeds, change the status to
succeeded
and set thecompleted_at
time to the current time. - If it fails, abort the entire process. If a backup was requested in step 1, try to restore from that backup now.
- Enter the migration into the metadata table, set the status to
- If all migrations completed successfully and a backup file was created in step 1, then remove that backup file.
Note that Agnostic fails fast: an error in any single migration causes the entire process to be aborted. In order to make this process as painless as possible, Agnostic backs up the database before it attempts to migrate it. This backup is automatically restored in the event of a failure.
Note
If restoring from backup fails, please note that the backup file will not removed. It remains on disk so that you can attempt a manual recovery.
Some database systems have transactional DDL that allows Agnostic to roll back all of the migrations in the event of a failure. Agnostic does not, however, rely on this feature by default, for two reasons:
- Not all DDL statements are transactional. We don’t want you to think you have a transactional DDL safety net only to find that it’s not there at that one, heart-thumping moment when you’re migrating a major production database and it fails.
- The overhead of creating a backup is negligible for small and medium sized datasets — no more than a few seconds.
If you are confident that you don’t need this feature, and you wish to avoid the
overhead of creating a backup file, you may pass the --no-backup
option to
Agnostic.
Build vs. Migrate¶
Most migration systems are part of an ORM, and most ORMs have an option to define the database structure using a native API, then generate SQL statements to build that database structure. This naturally leads to a difficult question:
How do we ensure that the build process always results in the same exact database structure as migrating?
This is deceptively difficult. Small difference in database structures across multiple instances of your application can lead to obvious, catastrophic failure or—even worse—can lead to the ticking time bomb of slow-but-unnoticed data corruption. This problem can reach nightmarish magnitudes if you have software deployed on hundreds or thousands of customer sites.
It’s imperative that all deployed instances of your application have exactly the same database structure.
Despite the obvious need, it’s not clear how best to pursue this stated goal. One possibility is to ignore your ORM’s database builder and always build new instances solely from migrations. With this convention, your initial database structure is treated as a “migration #1”, and (along with a deterministic migration sort order) ensures that all instances will always be built identically.
This approach does have drawbacks, though:
- Your ORM’s database builder is part of the benefit of using an ORM! You are creating additional work and also run the risk that the migration script you write doesn’t perfectly match what the ORM expects.
- It feels inefficient to have to build a brand new database structure by building a series of old, crufty database structures first.
The other approach is to try to maintain your ORM database structure and migrations in parallel, hoping, praying, and tediously testing to make sure that migration scripts perfectly replicate the effect of changing your ORM models.
Agnostic doesn’t have an opinionated stance on this question.
You are free to pick either approach, but if you decide to maintain your ORM database structure and migrations in parallel, then Agnostic can make this process easier and safer.
When you first bootstrap Agnostic on a given database structure, it loads all of
the existing migrations and sets their statuses to boostrapped
— but it
doesn’t actually execute any of them. This special status indicates that these
are migrations that already exist in the current database structure, but instead
of being put there by running migration scripts, they were put their by the
ORM’s database build tool.
When Agnostic sees this status, it will know that it does not need to run these migration scripts again. (For more information on how to do this, see: Write & Test Migrations.) Once you get used to Agnostic, you may even want to include the bootstrap step in your database build process.
On the other hand, if you want to build all new instances from scratch purely
using migrations, then you don’t want existing migrations to be bootstrapped,
because that would prevent any of them from running at all! You can disable this
behavior by passing the --no-load-existing
option to the bootstrap
command.
Workflow¶
Overview¶
Agnostic doesn’t impose any particular workflow—that’s one of its selling points! This document describes a hypothetical workflow for the purpose of illustration. By stepping through this workflow, we may gain a better insight into how to use Agnostic within any workflow of our choosing.
In this hypothetical scenario, we are going to be developing a customer database. Along the way, we will use Agnostic’s testing tools to make sure that our migrations work the way we expect them to. We will also handle a merge from a coworker’s source code.
Note
We’ll be using PostgreSQL throughout this example workflow. If you’re using a different system, then you may need to adjust some of the SQL statements, but the concepts should translate readily.
Initial Set Up¶
To begin with, let’s assume we’ve never used Agnostic on this project before. We need to create a directory to hold our migration scripts and then bootstrap the migrations system.
~/myapp $ mkdir migrations
~/myapp $ agnostic -t postgres -u myuser -d mydb bootstrap
Migration table created
That’s it! If you’re familiar with other migrations systems, then you may be wondering where the rest of the set up procedure is.
Snapshot Current Database Structure¶
In our hypothetical application, we are using an ORM. We define our data models using the ORM API and then the ORM generates SQL to build a database structure.
Note
This ORM build process saves us effort, as long as we can make sure that building a new database structure from scratch always produces the same exact result as running migrations on an existing database structure.
Our ORM outputs SQL to create a customer table like this:
CREATE TABLE customer (
name VARCHAR(255) PRIMARY KEY,
address VARCHAR(255),
phone VARCHAR(255)
);
Before we begin working on a new task, we should snapshot the current database structure. A snapshot contains database structure statements, but it does not include any data. Snapshots are useful for testing, which we will see later.
~/myapp $ agnostic -t postgres -u myuser -d mydb snapshot current.sql
Creating snapshot...
Snapshot written to "current.sql".
We name the file current.sql
so that we can compare later database builds to
it.
Build New Database With ORM¶
We put Agnostic aside for a while and tinker with our ORM data models. After some time building and testing our new features, we ask the ORM to build a new database.
CREATE TABLE customer (
name VARCHAR(255) PRIMARY KEY,
address VARCHAR(255),
home_phone VARCHAR(255),
cell_phone VARCHAR(255)
);
This example may be small enough that the database changes are obvious, but real world use cases will often be far more complex. Let’s use Agnostic to help us understand what the ORM has changed. We’ll begin by taking a second snapshot.
~/myapp $ agnostic -t postgres -u myuser -d mydb bootstrap
Migration table created
~/myapp $ agnostic -t postgres -u myuser -d mydb snapshot target.sql
Creating snapshot...
Snapshot written to "target.sql".
Now we have two SQL files, current.sql
and target.sql
. The former
describes how our database structure looked before we started working on these
new features, and the latter describes the target state that we want our
migrations to produce.
Let’s compare these two database structures to identify the differences.
~/myapp $ diff current.sql target.sql
51c51,52
< phone character varying(255)
---
> home_phone character varying(255),
> cell_phone character varying(255)
The diff helps us see that the phone
column was replaced with home_phone
and cell_phone
. Now that we have some idea what we need to do, we can write
some migrations that convert the database structure in current.sql
into the
database structure in target.sql
.
Write & Test Migrations¶
We could write one migration to change both phone number fields, but for the purpose of highlighting Agnostic’s features, we’ll write these as two separate migrations.
~/myapp $ cat > migrations/add_home_phone.sql
ALTER TABLE "customer" RENAME COLUMN "phone" to "home_phone";
~/myapp $ cat > migrations/add_cell_phone.sql
ALTER TABLE "customer" ADD COLUMN "cell_phone" VARCHAR(255);
With most migration systems, we’d simply cross our fingers, check in these scripts, and hope that they produce the precise effect that we desire. However, we’d really like to test that these migrations produce exactly the same database structure that the ORM generated.
Here’s a possible testing process:
- Load a “current” snapshot of the database.
- Run migrations on the current snapshot.
- Snapshot this new, migrated database.
- Build a new database using your ORM.
- Snapshot this ORM-built database.
- Compare the migrated snapshot to the target snapshot.
- If there are any differences between the snapshots, then the test fails.
- If the snapshots are identical, then the test passes and we can go to lunch early!
Sounds like a lot of thankless, tedious work, right?
Luckily, Agnostic automates this process!
~/myapp $ agnostic -t postgres -u myuser -d mydb test current.sql target.sql
WARNING: This will drop the database "myapp"!
Are you 100% positive that you want to do this? [y/N]: y
Dropping database "myapp".
Loading current snapshot "current.sql".
About to run 2 migrations in database "myapp":
* Running migration add_cell_phone (1/2)
* Running migration add_home_phone (2/2)
Finished migrations.
Snapshotting the migrated database.
Comparing migrated database to target database.
Test passed: migrated database matches target database!
In just a few seconds, Agnostic was able to perform that tedious testing process that we were dreading, and better yet, it proves that our migrations do exactly what we hoped for!
You can now commit your migrations with a high degree of assurance. (If you are actually heading out to lunch right now, can you get me a sandwich? I’m famished. Thanks!)
Merge Coworker’s Branch¶
Of course, you always write perfect code on the first try, don’t you, dear reader? But what happens when you merge in your coworkers’ code? You can easily test that their migrations work correctly and are compatible with your own migrations.
~/myapp $ # SCM checkout original version && ORM build database
~/myapp $ agnostic -t postgres -u myuser -d mydb bootstrap
Migration table created
~/myapp $ agnostic -t postgres -u myuser -d mydb snapshot current.sql
Creating snapshot...
Snapshot written to "current.sql".
~/myapp $ # SCM checkout latest version && ORM build database
~/myapp $ agnostic -t postgres -u myuser -d mydb bootstrap
Migration table created
~/myapp $ agnostic -t postgres -u myuser -d mydb snapshot target.sql
Creating snapshot...
Snapshot written to "target.sql".
~/myapp $ agnostic -t postgres -u myuser -d mydb test current.sql target.sql
WARNING: This will drop the database "myapp"!
Are you 100% positive that you want to do this? [y/N]: y
Dropping database "myapp".
Loading current snapshot "current.sql".
About to run 3 migrations in database "myapp":
* Running migration add_cell_phone (1/3)
* Running migration add_home_phone (2/3)
* Running migration add_office_phone (3/3)
Error: failed to run external tool "psql" (exit 3):
ERROR: column "phone" does not exist
Shnikeys! Your coworker’s add_office_phone
migration didn’t work. What could
have gone wrong? Let’s take a look at coworker’s migration.
~/myapp $ cat migrations/add_office_phone.sql
ALTER TABLE "customer" RENAME COLUMN "phone" to "office_phone";
Recall that Agnostic sorts migrations alphabetically, so your migration
add_home_phone
renames the phone
column before your coworker’s migration
script has a chance to run.
Fortunately, Agnostic made it easy to catch this mistake, so let’s try fixing it:
~/myapp $ cat > migrations/add_office_phone.sql
ALTER TABLE "customer" ADD COLUMN "office_phon" VARCHAR(255);
Now re-execute the test:
~/myapp $ agnostic -t postgres -u myuser -d mydb test current.sql target.sql
WARNING: This will drop the database "myapp"!
Are you 100% positive that you want to do this? [y/N]: y
Dropping database "myapp".
Loading current snapshot "current.sql".
About to run 3 migrations in database "myapp":
* Running migration add_cell_phone (1/3)
* Running migration add_home_phone (2/3)
* Running migration add_office_phone (3/3)
Finished migrations.
Snapshotting the migrated database.
Comparing migrated database to target database.
Test failed: migrated database differs from target database.
--- Migrated DB
+++ Target DB
@@ -50,7 +50,7 @@
address character varying(255),
home_phone character varying(255),
cell_phone character varying(255),
- office_phon character varying(255)
+ office_phone character varying(255)
);
Error: Test failed. See diff output above.
This time, the migration runs successfully, but it doesn’t produce the correct database structure. Agnostic points out where the migrated database differs from the target database, and the mistake is blindingly obvious: you misspelled “phone” in your migration!
One last fix and re-test:
~/myapp $ sed -i 's:office_phon:office_phone:' migrations/add_office_phone.sql
~/myapp $ agnostic -t postgres -u myuser -d mydb test current.sql target.sql
WARNING: This will drop the database "myapp"!
Are you 100% positive that you want to do this? [y/N]: y
Dropping database "myapp".
Loading current snapshot "current.sql".
About to run 3 migrations in database "myapp":
* Running migration add_cell_phone (1/2)
* Running migration add_home_phone (2/2)
* Running migration add_office_phone (3/3)
Finished migrations.
Snapshotting the migrated database.
Comparing migrated database to target database.
Test passed: migrated database matches target database!
Nice work, sir or madam! You’ve earned an 80’s style movie slow clap.
Clap… Clap… Clap… Clap…
Note
Because migration testing is so easy, you can easily retest multiple times at various stages in your team’s software development lifecycle. In particular, you should consider running one last test before each release that covers all of the migrations in that release. This helps catch merge issues.
Migrate Production¶
When you’ve done your due dilligence during development, there’s not much left to be surprised by when you migrate your production databases.
~/myapp $ agnostic -t postgres -u myuser -d mydb migrate
Backing up database "myapp" to "/tmp/tmpuy2v7hxc".
About to run 3 migrations in database "myapp":
* Running migration add_cell_phone (1/3)
* Running migration add_home_phone (2/3)
* Running migration add_office_phone (3/3)
Migrations completed successfully.
Removing backup "/tmp/tmpuy2v7hxc".
Smooth as pie, easy as silk. (Is that a thing people say?)
Note
Agnostic is a well-behaved command line script so that it is easy to integrate in your deployment or upgrade scripts. Once you get comfortable with it, migrations can just be another step in your lights-out build/deploy process.
Credits¶
Thanks to DARPA and Hyperion Gray LLC.
Thanks to all the engineers I worked with over the years who commiserated about the hassle of managing schemas, and helped me brainstorm to find a better way.
Thanks to all the bad migrations systems I have used (and occasionally wrote myself) that taught me what not to do.