Tips for Running Scripts in Production

Publish Date: Jun 16 '20

I know, how dare I suggest running a script in production. I am a Site Reliability Engineer; I should never condone such craziness. But the truth is, there will likely come a time when you need to run a script in production to update or clean up some data. In this post, I am going to give you some tips on how to write and execute a script in production as safely as possible.

1) Track Your Progress

Nothing is worse than writing a giant block of code, pasting it into a console, then hitting enter and watching it sit there. You have no idea where the code is in the script or what it is doing and that, at least for me, is terrifying.

For this reason, you always want to make sure you output some sort of progress meter from your scripts. This allows you to follow along and know where you are in your process. If you are using Ruby, consider some well-placed puts statements. Below is a script that we recently used at DEV to clean up some incorrectly cached data. Notice the puts statements throughout the script that allow us to follow along as it does its work.

# tag_ids is assumed to hold the IDs of the affected tags, defined earlier in the console
invalid_articles = []
Tag.where(id: tag_ids).find_each do |tag|
  # Print how many taggings this tag has so we know how much work is coming
  puts tag.taggings.count

  tag.taggings.each_with_index do |tagging, index|
    # Print a progress marker every 100 taggings
    puts index if index % 100 == 0
    article = tagging.taggable
    next unless article

    result = article.update(cached_tag_list: article.tags.pluck(:name).join(", "))
    if result
      puts "Article update success #{article.id}"
    else
      puts "Article update failure #{article.id}"
      invalid_articles << article
    end
  end
end

Also notice that we are keeping track of any invalid articles that we might find while running this script. Especially when you are cleaning up bad data, always assume you might stumble across more of it and prepare for that in your script. Here we use an if/else statement to catch any invalid articles. You could also use a begin/rescue block.
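
As a rough sketch of the begin/rescue variant: in a Rails console you would typically call update! (which raises on a validation failure) instead of update, and rescue ActiveRecord::RecordInvalid. To keep the example runnable outside of Rails, it uses a hypothetical FakeArticle struct and a stand-in InvalidRecord error. Both are assumptions for illustration and are not part of the DEV codebase.

```ruby
# Stand-ins so the sketch runs without Rails. In a real console you would use
# your ActiveRecord model and rescue ActiveRecord::RecordInvalid instead.
class InvalidRecord < StandardError; end

FakeArticle = Struct.new(:id, :cached_tag_list, :valid) do
  # Mimics a bang-style update: raises instead of returning false.
  def update!(cached_tag_list:)
    raise InvalidRecord, "Article #{id} is invalid" unless valid

    self.cached_tag_list = cached_tag_list
    true
  end
end

articles = [
  FakeArticle.new(1, "old", true),
  FakeArticle.new(2, "old", false),
  FakeArticle.new(3, "old", true)
]

invalid_articles = []

articles.each do |article|
  begin
    article.update!(cached_tag_list: "ruby, rails")
    puts "Article update success #{article.id}"
  rescue InvalidRecord => e
    # Log the failure and keep going rather than aborting the whole run.
    puts "Article update failure #{article.id}: #{e.message}"
    invalid_articles << article
  end
end

puts "Invalid article ids: #{invalid_articles.map(&:id).inspect}"
```

The rescue is deliberately limited to one error class, so an unexpected exception still stops the script instead of being silently swallowed.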

2) Record Before and After States

When you are updating records, there is always a chance something will go off the rails. To be able to "roll back", track the before state as you make the updates. If we update our script above to do this, here is what it would look like.

invalid_articles = []
# Maps article id => cached tag list as it was before this script touched it
before_update_tag_lists = {}
Tag.where(id: tag_ids).find_each do |tag|
  tag.taggings.each_with_index do |tagging, index|
    # Print a progress marker every 100 taggings
    puts index if index % 100 == 0
    article = tagging.taggable
    next unless article
    # Record the current cached tag list for every article
    before_update_tag_lists[article.id] = article.cached_tag_list

    result = article.update(cached_tag_list: article.tags.pluck(:name).join(", "))
    if result
      puts "Article update success #{article.id}"
    else
      puts "Article update failure #{article.id}"
      invalid_articles << article
    end
  end
end

If anything goes wrong while this script is running, the before_update_tag_lists hash has all of our original data in it. Using this original data, we can loop back through the articles and update them again with the old lists if necessary.
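
A minimal sketch of what that rollback loop might look like. StubArticle and articles_by_id are hypothetical stand-ins so the example runs without Rails; in a real console the lookup would be something like Article.find_by(id: article_id) against your own model.

```ruby
# Stand-in for an ActiveRecord model so the sketch runs without Rails.
StubArticle = Struct.new(:id, :cached_tag_list) do
  def update(cached_tag_list:)
    self.cached_tag_list = cached_tag_list
    true
  end
end

# Pretend these are the records after a run that wrote bad data.
articles_by_id = {
  1 => StubArticle.new(1, "wrong, tags"),
  2 => StubArticle.new(2, "also, wrong")
}

# The hash the script built before each update: article id => original list.
before_update_tag_lists = { 1 => "ruby, rails", 2 => "devops" }

failed_rollbacks = []

before_update_tag_lists.each_with_index do |(article_id, original_tag_list), index|
  # Same progress marker as the original script.
  puts index if index % 100 == 0

  # Hypothetical lookup; with ActiveRecord this would be a find_by call.
  article = articles_by_id[article_id]
  next unless article

  if article.update(cached_tag_list: original_tag_list)
    puts "Article rollback success #{article_id}"
  else
    puts "Article rollback failure #{article_id}"
    failed_rollbacks << article_id
  end
end
```

Keep in mind that the hash only lives in the memory of the console session, so if that session dies, the original data goes with it. For a long run it may be worth also writing the hash out to a file.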

3) Write Production Quality Code

It can be tempting when you are writing a script to use as little syntax as possible. Usually, this means throwing in single-letter variables everywhere. You probably won't ever use this code again, so why waste time making it look pretty and readable? Because a pretty, readable script is easier to understand and follow, and a script that is easy to understand will help you avoid writing bugs.

In my script example above I clearly write out what each object is that I am working with. This allows nearly anyone to look at the script and be able to understand what it is doing. This leads me to my next script writing tip.

4) Have Your Script Reviewed

The same way you never want to push code out to production without a code review, you shouldn't run a script in production without a code review. This is another reason why you want to make sure your script is understandable and readable, because you want someone else to be able to also figure out what it is doing.

We all know the value a second set of eyes on our code brings. Even if you find yourself in a situation where time is tight and you need to run a script ASAP, try as hard as you can to get a second set of eyes on it. I can't tell you the number of times a fresh set of eyes has kept me from botching a script update.

5) Use Screen or Tmux for Long Running Scripts

Tmux and Screen let you start a terminal session on the remote machine that keeps running even if your SSH connection drops. This ensures that if you lose your connection while your script is running, the script will not be interrupted, and you can reattach to the session later. Thanks @kinduff for the reminder!

Alejandro AR • Jun 17 '20

If the task takes a good amount of time, you have risks of being disconnected from either the SSH session, your internet provider, etc.

To avoid this, I recommend running these scripts (although I do not recommend running scripts like this at all) using screen. It's super easy to use:

1. SSH into the desired instance
2. Start a new screen session using screen
3. Run the long running script
4. Press Ctrl + a followed by d to detach the session
5. You can now close everything, even the SSH session
6. You can reattach to the screen session using screen -r

Screen has a lot of awesome things, make sure to check out the man page.

Running a script in production is never ideal, but following these tips can make the experience much less daunting.

Happy scripting!

Comments 19 total

Shaiju T • Jun 16, 2020
Hi 😄, by the way, are you talking about SQL script or YML script? Because Ruby is a programming language, right?
- Molly Struve (she/her) • Jun 16, 2020
  Nope, I am talking about Ruby. In this sense, I am using "script" to define a small chunk of Ruby code that is run standalone in a console.
- Juan A. Fuentest Torcat • Jun 17, 2020
  Programming languages can still be used to write scripts ;)
  - Shaiju T • Jun 17, 2020
    What kind of script is used in this post? Is there documentation or a tutorial so I can learn?
    - Molly Struve (she/her) • Jun 17, 2020
      The script in this post is written in Ruby. I highly recommend Googling Ruby Tutorial and you will find MANY resources to help you learn Ruby in whatever learning style is best for you.
Rafi • Jun 17, 2020
I recently came across this gem data-migrate which allows you to do data migrations like you do schema migrations.
- Molly Struve (she/her) • Jun 17, 2020
  One reason I am wary about doing data migrations in schema migrations is bc sometimes they can take a large amount of time and if your schema migrations are executed inline with your deploy pipeline that can block a deploy. At DEV we use DataUpdateScripts which is pretty similar but it runs asynchronously in the background.
  
  Set Up Framework For Running Data Update Scripts #6025
  
  mstruve posted on Feb 11, 2020
  
  What type of PR is this? (check all applicable)
  
  [x] Feature
  
  Description
  
  While hooking up our first Elasticsearch model Tags, I realized that in order for search to work I would have to manually reindex all of the tags in our database before the search code went live. This is not a huge deal for us, but it means an extra undocumented step that others who are using this codebase might miss. This framework would give us the ability to run scripts like this the same way we run migrations.
  
  module DataUpdateScripts
    class IndexTags
      def run
        Tag.find_each(&:index_to_elasticsearch)
      end
    end
  end
  
  A script like this would be deployed ahead of using the new data. When the app deploys or a local environment is updated, the DataUpdateWorker would look at all the files in the data_update_scripts folder. For any file that has not been run, i.e. is not in our database, it will create a record and then call run on that class.
  
  We used something like this at my prior company because we had to keep 5+ VPCs data in sync and it worked out really well.
  
  Why not use migrations? The reason we may want to separate this from migrations is so that it is not tied to our deploy process. In the past, I have had scripts that take hours to run bc they touch a lot of data and you don't want that holding up a deploy. This way a deploy goes out, kicks off a worker, and that worker does its thing in the background for however long it needs.
  
  THOUGHTS?!
  
  Added to documentation?
  
  [x] readme
  
  If people are on board with this approach I will add the necessary documentation to this branch as well
  
  - Rafi • Jun 17, 2020
    Awesome !!!
  - Rafi • Jun 17, 2020
    But how do you keep track of the order in which the scripts ran? Do the script file names get a timestamp attached to them, similar to regular migrations?
    - Molly Struve (she/her) • Jun 17, 2020
      Yep! That is exactly how it works, same as migrations. Here you can see a list of our scripts github.com/thepracticaldev/dev.to/...
Ben Sinclair • Jun 17, 2020
When I do this (which is more often than I'd admit in an interview) I'll usually do something like this:
# result = article.update(cached_tag_list: article.tags.pluck(:name).join(", "))
puts "result = article.update(cached_tag_list: #{article.tags.pluck(:name).join(", ")})"
next
Excuse my butchered pseudo-ruby, I don't use it so am inferring from yours. What I mean, though, is I display what would be called as a kind of dry-run, and short-circuit the rest of the loop. Then when I'm ready, I uncomment the "real" line.
- Molly Struve (she/her) • Jun 17, 2020
  That is definitely another great way to try it out first before actually doing it. When I "dry run" a script I usually run the script for a single object, check that it looks how I expect and then I let the script loose on the rest.
Alejandro AR • Jun 17, 2020
If the task takes a good amount of time, you have risks of being disconnected from either the SSH session, your internet provider, etc.

To avoid this, I recommend running these scripts (although I do not recommend running scripts like this at all) using screen. It's super easy to use:
1. SSH into the desired instance
2. Start a new screen session using screen
3. Run the long running script
4. Press Ctrl + a followed by d to detach the session
5. You can now close everything, even the SSH session
6. You can reattach to the screen session using screen -r
Screen has a lot of awesome things, make sure to check out the man page.
- Molly Struve (she/her) • Jun 17, 2020
  Oh man, YES! screen or tmux sessions are a must as well for long-running scripts!
- Juan A. Fuentest Torcat • Jun 18, 2020
  I actually learned about tmux one time I had to run a script that took hours to finish, total savior <3
- Ian Pride • Jun 19, 2020
  Screen is a must for SSH... or even local if you want to be minimal in your shells.
Julien Camblan • Jun 17, 2020
I learned all these lessons the hard way by running scripts in production without applying them in the first place. I think I'd still have hair if I'd read this kind of article before. So thank you, it will save lives for sure!
Lee Noble • Jun 20, 2020
A technique I used recently was to have my script write out two SQL files: one with all the fixes, and the other to revert everything back to how it was. The script took a long time to run but was completely non-destructive. The SQL could then be examined and run on a copy of the database and the results checked, both the fixing SQL and the reversion (just check that both databases are the same again). Then just run the fix SQL on production.
Alex Bit • Jul 11, 2023
great article. while this is about a script for changing data, the points are also applicable to writing scripts for transforming code / codemods. thanks for sharing molly.
