Amazon Redshift Unexpected Commit

If you are familiar with sqlhammer.com, then you know that I heavily specialize in the Microsoft Data Platforms. My experiences with Microsoft, and specifically the SQL Server products, established an expectation that explicit transactions would always commit/rollback successfully or there would be an exception thrown to explain the problem.

That expectation set the stage for a bit of confusion on my part when I found myself using Amazon Redshift and a failed statement, which was in an explicit transaction, did not trigger a rollback!

What was going on?

My outward composure was as expressionless as a seasoned Aes Sedai, but inside I was panicking. Transactions exist to keep you safe and maintain data integrity by making sure that multi-step processes succeed completely or rollback as if they never happen. If this fundamental ACID concept is being violated, I might as well start questioning gravity!

After my initial reaction I began my research and quickly found this passage from the Redshift TRUNCATE command knowledge base.

The TRUNCATE command commits the transaction in which it is run; therefore, you can’t roll back a TRUNCATE operation, and a TRUNCATE command may commit other operations when it commits itself.

Demonstration

Now that we know it was the TRUNCATE command in my script which caused, what I considered to be, the odd behavior. This is how it works out in code.

Traditional RDBMS (SQL Server)

In SQL Server, or PostgreSQL for that matter, the TRUNCATE command is allowed in a transaction and it will commit or rollback like any other DML operation. In all of the scripts, below, I will do the following.

  1. Check my row counts
  2. Begin a transaction
  3. TRUNCATE the table
  4. INSERT one row
  5. Check my row counts from within the transaction
  6. ROLLBACK
  7. Check to see if  my row checks revert or not
All commands in the transaction were successfully rolled back. No change to the table.

Massively Parallel Processing – (Azure SQL Data Warehouse)

Still within the Microsoft stack, but as we move into an MPP architecture, we find the behavior change a bit.

The script fails when trying to run the TRUNCATE because it is not valid in a transaction. Rollback is successful without any table changes.

Massively Parallel Processing – (Amazon Redshift)

Up until this point, I feel very safe. Both SQL Server and Azure SQL Data Warehouse make me feel comfortable with my transactions, even though I will have to stop trying to use TRUNCATE within a transaction in Azure SQL DW.

Oh no, it didn’t roll any of my changes back!

As you can see and Amazon describes in their documentation, the TRUNCATE command will commit the explicit transaction. Then we write a row and try to ROLLBACK. That rollback has no transaction to rollback anymore. This is another difference between Redshift and SQL Server. SQL Server will throw an exception stating that there is no transaction to rollback. Redshift silently succeeds.

Take away

This Amazon Redshift behavior is not that hard to work around but it can be a disaster if you are not aware of it. Redshift is not going to warn you of the potential for your transactions to be handled differently than you have explicitly coded them.

At this point, I am educating my team to make sure they all are very aware but I am also considering making the DROP TABLE/CREATE TABLE method a mandatory part of our development standards because DROP/CREATE commands do not cause an automatic commit.

This article has 2 comments

  1. […] Derik Hammer notes that you cannot rollback a TRUNCATE TABLE operation in Redshift: […]

  2. Good information

Leave a Reply

%d bloggers like this: