
“Trust but Verify” Your Chef Infrastructure

A cornerstone of infrastructure as code is treating our infrastructure like any other software project, which means thoroughly testing every change. As Chef users we have a plethora of options when it comes to testing infrastructure code. We can ensure teams follow agreed-upon coding standards by linting with RuboCop and Foodcritic. We can quickly test the outcome of complex cookbook logic with unit tests in ChefSpec. We can run lengthy but thorough integration tests with Bats, Serverspec, or InSpec from within Test Kitchen. Combined, these tests ensure that high-quality code powers our infrastructure.

Where testing tends to get complex is in bringing those frameworks together so that every change to our infrastructure is tested before it reaches production. Testing is particularly tricky for those of us who keep our Chef environments, roles, and cookbooks in a monolithic repository, where all code lives in a single Git repository that the whole team works out of. That structure makes it difficult to determine which Chef assets to test when a pull request comes in. While we may accept the time a full integration run takes after changing a single cookbook, we can’t afford to run integration tests on every cookbook in the repo whenever a simple change is made. Instead, we decompose the changes in a commit to determine which assets have been changed, so we can test just those individual assets.
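As a rough illustration of that decomposition, the Ruby sketch below maps a list of changed file paths to the Chef assets they belong to. It assumes a conventional chef-repo layout (cookbooks/, roles/, environments/, data_bags/), and the method name changed_assets is hypothetical, not part of any Chef tooling.

```ruby
require 'set'

# Hypothetical helper: group changed file paths by the Chef asset they touch.
# Assumes a conventional chef-repo layout at the repository root.
def changed_assets(paths)
  assets = Hash.new { |hash, key| hash[key] = Set.new }

  paths.each do |path|
    parts = path.split('/')
    case parts.first
    when 'cookbooks'
      # Any file under cookbooks/<name>/ means cookbook <name> changed.
      assets[:cookbooks] << parts[1] if parts.length > 2
    when 'roles', 'environments'
      # Roles and environments are single JSON files; keep the full path.
      assets[parts.first.to_sym] << path if path.end_with?('.json')
    when 'data_bags'
      # Data bag items live at data_bags/<bag>/<item>.json.
      assets[:data_bags] << path if parts.length == 3 && path.end_with?('.json')
    end
  end

  # Convert each Set to an Array; files outside these folders are ignored.
  assets.transform_values(&:to_a)
end

changes = [
  'cookbooks/nginx/recipes/default.rb',
  'cookbooks/nginx/metadata.rb',
  'roles/web.json',
  'README.md'
]
p changed_assets(changes)
```

Two files in the same cookbook collapse into a single entry, which is the point: the expensive tests then run once per changed cookbook rather than once per changed file.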

To test just the assets that change in a monolithic repository, I wrote Reagan. Reagan is built to “Trust but Verify” Chef repository pull requests on GitHub using Jenkins. A Jenkins job polls your Chef repository for new pull requests to test. When a new PR is opened, Reagan retrieves the list of files changed in the pull request from the GitHub API and builds from it a list of changed assets. For data bags, environments, and roles it runs simple JSON validation. For cookbooks, Reagan runs “knife cookbook test” and then checks the Chef server to ensure the cookbook version has been incremented. After performing those basic sanity checks, it also runs tests defined in a per-cookbook YAML file, which lets you vary your testing methods depending on the code. The reagan_test.yml config is simply a list of commands to run, like this:

tests:
  - rubocop .
  - foodcritic . -f any
  - rspec
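To picture what happens with that file, here is a minimal Ruby sketch that loads a cookbook’s reagan_test.yml, runs each listed command from inside the cookbook directory, and records whether it passed. It is an approximation under stated assumptions, not Reagan’s actual code; run_cookbook_tests and cookbook_passed? are hypothetical names.

```ruby
require 'yaml'
require 'open3'

# Hypothetical sketch: run every command listed under "tests" in a
# cookbook's reagan_test.yml, from inside that cookbook's directory.
# Returns a hash of command => true/false (whether it exited 0).
def run_cookbook_tests(cookbook_dir)
  config_path = File.join(cookbook_dir, 'reagan_test.yml')
  return {} unless File.exist?(config_path)

  # An empty YAML file loads as nil, so fall back to an empty hash.
  config = YAML.safe_load(File.read(config_path)) || {}
  commands = config.fetch('tests', [])

  commands.each_with_object({}) do |command, results|
    # capture2e merges stdout and stderr; chdir runs the command in the cookbook.
    output, status = Open3.capture2e(command, chdir: cookbook_dir)
    puts output unless status.success? # show the failing tool's output
    results[command] = status.success?
  end
end

# A cookbook passes only if every configured command exits 0.
def cookbook_passed?(results)
  results.values.all?
end
```

Treating each command’s exit status as the pass/fail signal keeps the config tool-agnostic: anything that exits non-zero on failure, like RuboCop, Foodcritic with -f any, or rspec, can be listed without extra glue.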

Jenkins then updates the PR’s testing status and, depending on configuration, may send emails or post to chat systems like Slack. Here’s an example of a successful lint test run by Reagan:

[Screenshot: a successful Reagan lint test]

Here’s an example of a similar test failing due to a Rubocop offense:

[Screenshot: a Reagan test failing on a RuboCop offense]

Reagan helped me ensure that quality code entered our production environment without adding a burdensome process. The source is available at https://github.com/tas50/reagan, with setup instructions in the README. The gem is also available on RubyGems.

Using Chef to Graph Deploys in Graphite

It’s pretty obvious at this point that I think Chef is an amazing product. I’m also quite smitten with Graphite for graphing the world, or at least the little part of the world that I’m responsible for. Combined, Chef and Graphite can do some impressive things, one of which is graphing product deploys.

I rely on a little trick that Etsy first showed off in its Code as Craft post “Track Every Release”: Graphite’s drawAsInfinite() function draws any value above zero as a vertical line. If you create a metric for deploys, you can send a value of “1” every time you deploy, then overlay that data on top of your system or network metrics and search for patterns. From the command line it looks something like this:

echo "servers.my_current_server.deploy 1 $(date +%s)" | nc graphite.mydomain.com 2003
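The same line can be sent from Ruby with the standard socket library. This is a minimal sketch of Graphite’s plaintext protocol (“path value timestamp” followed by a newline, on port 2003 by default); graphite.mydomain.com is the placeholder host from the command above, and send_graphite_metric is a hypothetical name.

```ruby
require 'socket'

# Hypothetical helper: send one data point using Graphite's plaintext protocol.
# Each line is "<metric path> <value> <unix timestamp>\n".
def send_graphite_metric(path, value, host: 'graphite.mydomain.com', port: 2003,
                         timestamp: Time.now.to_i)
  line = "#{path} #{value} #{timestamp}\n"
  # connect_timeout keeps an unreachable Graphite server from hanging the caller;
  # the block form closes the socket once the line is written.
  Socket.tcp(host, port, connect_timeout: 5) do |sock|
    sock.write(line)
  end
  line
end

# Equivalent to the echo | nc one-liner above (placeholder host):
# send_graphite_metric('servers.my_current_server.deploy', 1)
```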

To do the same thing from Chef, create an execute resource. You can then notify that resource from anywhere in your recipe that performs what you consider a “deploy”, and Ohai data provides what you need to send the metric to the right location. Here’s an example:

execute 'graph_deploy' do
  # Path: servers.<environment>.<fqdn with dots as underscores>.deploys.<app>
  command %Q[echo "servers.#{node.chef_environment}.#{node['fqdn'].gsub('.', '_')}.deploys.my_app_name 1 $(date +%s)" | nc graphite.mydomain.com 2003]
  timeout 5            # give up if Graphite doesn't respond within 5 seconds
  ignore_failure true  # a timeout or unreachable Graphite server won't fail the Chef run
  action :nothing      # only runs when another resource notifies it
end

Now for the breakdown: I set up my Graphite system with all servers in a folder called “servers”, and under that things are broken out by Chef environment, so I use node.chef_environment. From there I need to make sure the value goes under the current server. I use FQDNs for my nodes in Graphite, but Graphite uses periods as its folder delimiter, so I replace the periods with underscores using gsub. Next I create a folder for all my deploys, since I run multiple applications on a system, and within that folder I create the actual metric with the same name as the service. The resource times out after 5 seconds and ignores failures, so if my Graphite server goes down, Chef runs continue. Because its action is :nothing, the resource never executes on its own; to run it, I notify its :run action from another resource.
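The path-building logic described above can be pulled out into a plain Ruby method, which makes it easy to check outside of a Chef run. This is a sketch; deploy_metric_path is a hypothetical name, and in a recipe its arguments would come from node.chef_environment and node['fqdn'].

```ruby
# Hypothetical helper mirroring the metric layout described above:
# servers.<chef environment>.<fqdn with dots replaced>.deploys.<app name>
def deploy_metric_path(environment, fqdn, app_name)
  # Graphite treats "." as a folder delimiter, so flatten the FQDN.
  host = fqdn.gsub('.', '_')
  ['servers', environment, host, 'deploys', app_name].join('.')
end

puts deploy_metric_path('production', 'web01.example.com', 'my_app_name')
# prints servers.production.web01_example_com.deploys.my_app_name
```

Inside a recipe, the resource that performs the deploy would then trigger the graph with a line such as notifies :run, 'execute[graph_deploy]', :delayed.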

And here’s the end result, with a deploy overlaid on a few metrics:

[Graph: a deploy marker overlaid on system metrics in Graphite]
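To build a similar overlay, Graphite’s render API accepts several target parameters, and wrapping the deploy series in drawAsInfinite() turns each positive data point into a vertical line. The sketch below only assembles such a URL; the host, the metric paths (including the CPU metric), and the name deploy_overlay_url are placeholders.

```ruby
require 'uri'

# Hypothetical helper: build a Graphite render URL that overlays deploy
# markers on top of an existing system metric.
def deploy_overlay_url(host, system_target, deploy_target, from: '-24h')
  params = [
    ['target', system_target],
    # drawAsInfinite draws a full-height vertical line wherever the value is above zero.
    ['target', "drawAsInfinite(#{deploy_target})"],
    ['from', from]
  ]
  # encode_www_form percent-encodes the parentheses in the function call.
  URI::HTTP.build(host: host, path: '/render', query: URI.encode_www_form(params)).to_s
end

puts deploy_overlay_url(
  'graphite.mydomain.com',
  'servers.production.web01_example_com.cpu.total.user',
  'servers.production.web01_example_com.deploys.my_app_name'
)
```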

A return to roots: Ramblings from Nerd Land

Several years ago, when I first toyed with the idea of registering a domain and once again publishing content on the web, I envisioned a place where I could publish my more lucid technology tidbits (aka rants). The original version of this site was just that: a blog of random thoughts and helpful tips I had discovered. I’ve since removed that content, but I continually find myself discovering new techniques in my day-to-day work that could benefit others, so I’ve decided now is the time to bring back my original Ramblings From Nerd Land blog. I hope to post here from time to time.