Last time we talked about Puppet, I told you about ralsh, the tool which allows you to access Puppet’s Resource Abstraction Layer. It was a bit esoteric for many of you; I hope today’s entry will make up for it. I’m going to talk about the Puppet language today, as well as going over the development strategy I use when working on my Puppet manifests.
Let me start by giving you an overview of the language. The Puppet language is a domain-specific language (DSL) for describing system configuration. The language is declarative: you specify the configuration you desire, and it is up to Puppet to figure out how to get there. The fundamental “atom” of the Puppet language is the resource. Get used to that term, ladies and gentlemen, you’ll be hearing it a lot. A resource is a logical component of configuration: a file, a package, a user, a cron job; these are all resources which Puppet can manage natively. Puppet also gives you the ability to create defined resources, which are collections of native resources that you can parameterize and treat as a single logical resource. I’ll talk more about that later.
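To make the vocabulary concrete, here is a minimal sketch of a few native resources and one defined resource. The package name, user names, cron job, and the admin_user definition are all hypothetical examples chosen for illustration; none of them come from Digg's manifests.

```puppet
# Native resources: each has a type, a title, and a set of attributes.
package { "openssh-server":
  ensure => installed,
}

user { "deploy":
  ensure => present,
  shell  => "/bin/bash",
}

cron { "logrotate":
  command => "/usr/sbin/logrotate /etc/logrotate.conf",
  user    => "root",
  hour    => 2,
  minute  => 0,
}

# A defined resource (hypothetical): a parameterized bundle of native resources.
# $name is the title given when the definition is used.
define admin_user($shell = "/bin/bash") {
  user { $name:
    ensure => present,
    shell  => $shell,
  }
  file { "/home/${name}":
    ensure  => directory,
    owner   => $name,
    require => User[$name],
  }
}

# Using the definition looks just like declaring a native resource.
admin_user { "alice": }
admin_user { "bob": shell => "/bin/zsh" }
```

Note that nothing here says how to create a user or install a package; each block only describes the state the system should end up in.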
Puppet also has variables. Because it is a declarative language, Puppet has scoping and assignment rules that are different from other (imperative) programming languages. Many people find these rules frustrating and/or counter-intuitive, but they turn out to be less limiting in practice than you’d expect. The first thing to remember about Puppet variables is that you cannot change the value of a variable within a single scope. Since Puppet is declarative, you cannot rely on file order to represent the order in which a configuration is instantiated; changing a variable within a single scope would require file ordering to be consistent with instantiation. The second thing to remember about Puppet variables is that they are dynamically scoped, which essentially means that scope hierarchies are created based on where code is evaluated as opposed to where it is defined. The Puppet documentation has a very good example which helps illustrate this constraint in practice.
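Here is a small sketch of both rules, assuming the dynamic scoping behaviour described above. The variable, class, and node names are made up for illustration.

```puppet
# Top scope.
$role = "base"

class show_role {
  # $role is not set inside this class, so the lookup walks up the
  # scope hierarchy of wherever the class is evaluated.
  notice("role is ${role}")
}

node "web01.example.com" {
  # A node is a new scope, so this shadows the top-scope value;
  # it is not a reassignment.
  $role = "digghttp"

  # Uncommenting the next line would fail: a variable cannot be
  # assigned twice within a single scope.
  # $role = "memcache"

  # Under dynamic scoping, show_role is evaluated from the node's
  # scope and therefore sees "digghttp".
  include show_role
}
```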
Collections of resources, defined resources, and variables can be grouped into a class. Although these classes share some of the semantics of object-oriented programming classes (inheritance, the ability to override certain declarations in a subclass), one should be careful to remember that Puppet is declarative, not object-oriented. One of the biggest sources of frustration I see in new Puppet users comes from not trying to think in line with Puppet’s model. Puppet is a somewhat opinionated tool; the faster you get used to the Puppet Way, the more benefit you will get from using it.
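As a rough illustration of inheritance and overriding, the sketch below declares a class and a subclass that changes two attributes of a resource declared in its parent. The class names (ssh, ssh::disabled), the package name, and the service name are hypothetical.

```puppet
class ssh {
  package { "openssh-server":
    ensure => installed,
  }

  service { "ssh":
    ensure  => running,
    enable  => true,
    require => Package["openssh-server"],
  }
}

class ssh::disabled inherits ssh {
  # Override syntax: a capitalized resource reference followed by
  # the attributes whose declared values should change.
  Service["ssh"] {
    ensure => stopped,
    enable => false,
  }
}
```

The override does not run after the parent in any procedural sense; it only changes the final declared state of the resource, which is the declarative mindset in miniature.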
Collections of classes, definitions, and resources can be grouped together in a module. Modules are a special helping of Puppet’s awesome sauce, and I’ll get into them a bit more later in this article, when I walk you through the creation of a module.
Finally, Puppet allows you to define nodes, which can include classes, definitions, and resources. Nodes are similar to, but distinct from, classes; nodes can inherit from other nodes, and they define a new variable scope when declared. However, as your Puppet infrastructure grows, you will probably transition away from using Puppet’s internal node declarations and use External Nodes instead (a topic I’ll cover another time).
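A node declaration might look something like the sketch below. The hostname and the base and digghttp class names are assumptions made for the example; only the idea of nodes including classes and inheriting from other nodes comes from the description above.

```puppet
# A parent node holding what every server should get.
node basenode {
  include base
}

# A concrete node, matched by hostname, that inherits from basenode
# and adds the classes specific to this server.
node "web01.example.com" inherits basenode {
  include memcached
  include digghttp
}
```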
Let’s talk about how I combine these elements when working on Digg’s Puppet manifests. As in many other programming languages, the development process in Puppet is one of iteratively creating abstractions to encapsulate lower-level details. From a top-down perspective, I tend to think in terms of server roles: logically contained units of functionality that a given node can provide. For example, a Digg webserver might have the following roles:
- base node
  - This is the basic role that all Digg nodes fulfill. It encapsulates all the things we want to be present on every server: packages we always want, administrative scripts, ssh configuration, etc.
- memcache
  - A server fulfilling this role has memcached running and is able to serve as part of a memcache pool.
- digghttp
  - This role encapsulates all of the configuration we need in order for a server to be part of our web cluster.
Once I’ve come up with the high-level role I want to create, I will usually create a module for it. In my Puppet modulepath I create the minimal module structure; for example: mkdir -p memcached/manifests && touch memcached/manifests/init.pp. Since I know a high-level role like this will require config files and templates, I might create those directories as well: mkdir -p memcached/{manifests,files,templates} && touch memcached/manifests/init.pp. At this point I’ll often start an incremental development process. I’ll define a few resources I know I need, make sure they work, and then add a few more. This development style has caused me problems; it is easy to forget to set up proper relationships between resources if you are adding things to your manifest as you go. Still, it works well for me and I usually catch ordering issues in later testing.
So, what do we need to configure a memcache node? Well, we need to get memcached installed, so we’ll need a package resource:
package {
  "memcached":
    ensure => installed;
}
After memcached is installed we need to configure it. This calls for a file resource. If our configuration is relatively standard across the cluster, we can use a static file. We put memcached.conf in the files/ subdirectory of our module and define a file resource:
file {
  "/etc/memcached.conf":
    source  => "puppet:///memcached/memcached.conf",
    require => Package["memcached"],
}
If we’d like to parameterize this a bit more, we can use a template instead. We put memcached.conf.erb in the templates subdirectory of our module and define that file resource like so:
file {
  "/etc/memcached.conf":
    content => template("memcached/memcached.conf.erb"),
    require => Package["memcached"],
}
Puppet templates give you the full power of Ruby (they are just ERB with access to Puppet variables!). Our memcached.conf.erb might look like:
# AUTOGENERATED BY PUPPET
# memcached config file
# Run memcached as a daemon. This command is implied, and is not needed for the
# daemon to run. See the README.Debian that comes with this package for more
# information.
-d
# Log memcached's output to <%= memcached_log_dir %>
logfile <%= memcached_log_dir %>
# Be verbose
<% if verbose -%>
-v
<% end -%>
# Start with a cap of 64 megs of memory. It's reasonable, and the daemon default
# Note that the daemon will grow to this size, but does not start out holding this much
# memory
-m <%= memcached_memory_cap %>
# Default connection port is <%= memcached_port %>
-p <%= memcached_port %>
# Run the daemon as root. The start-memcached will default to running as root if no
# -u command is present in this config file
-u <%= memcached_user %>
I won’t go into detail about how to use ERB but, as you can see, if you define variables (perhaps in your site.pp file) like memcached_log_dir, they will be inserted into the appropriate places in the template.
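For instance, the variables the template expects could be set along these lines. Every value below is an illustrative assumption rather than Digg's actual configuration; note that the template uses memcached_log_dir as the full argument to logfile, so the sketch gives it a file path.

```puppet
# site.pp (illustrative values only)
$memcached_log_dir    = "/var/log/memcached.log"
$verbose              = true    # emits the -v line in the template
$memcached_memory_cap = 64      # megabytes, passed to -m
$memcached_port       = 11211   # memcached's default port, passed to -p
$memcached_user       = "nobody"
```

Whichever file you put them in, the variables need to be in a scope that is visible when the memcached class is evaluated, otherwise the template has nothing to substitute.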
To complete our memcached module, we probably want to be sure that the memcached service is running, and configured to start at boot. A service resource will do the job nicely:
service {
  "memcached":
    enable    => true,
    ensure    => running,
    subscribe => [ Package["memcached"], File["/etc/memcached.conf"] ],
}
In a Puppet service, “enable” defines whether the service will be started at boot, while “ensure” defines whether the service should be running or not. It is a subtle distinction, but an important one. The “subscribe” parameter tells Puppet to refresh the service when one of the referenced resources is changed; if the memcached package is upgraded, or the config file changes, Puppet will restart the service.
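The same refresh relationship can also be declared from the other side, using the notify metaparameter on the resource that changes. This is only a sketch of an alternative spelling, not the form used in the rest of this article:

```puppet
file { "/etc/memcached.conf":
  content => template("memcached/memcached.conf.erb"),
  require => Package["memcached"],
  # notify is the mirror image of subscribe: when this file changes,
  # Puppet refreshes (restarts) the memcached service.
  notify  => Service["memcached"],
}

service { "memcached":
  enable => true,
  ensure => running,
}
```

Which direction you choose is mostly a matter of where you want the relationship to be visible: on the service, where you can see everything that restarts it, or on each resource that triggers the restart.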
Putting it all together, we’ll end up with a memcached directory under our Puppet modulepath with a manifests subdirectory and a templates subdirectory. The templates subdirectory will contain the ERB file we defined above (memcached.conf.erb). The manifests subdirectory has a file, init.pp, containing the finished memcached manifest:
class memcached {
  package {
    "memcached":
      ensure => installed;
  }

  file {
    "/etc/memcached.conf":
      content => template("memcached/memcached.conf.erb"),
      require => Package["memcached"],
  }

  service {
    "memcached":
      enable    => true,
      ensure    => running,
      subscribe => [ Package["memcached"], File["/etc/memcached.conf"] ],
  }
}
Now, to set up a memcached node, all you have to do is add include memcached to the node definition. Puppet is smart enough to autoload classes from your modulepath and knows how to map the template path we used (memcached/memcached.conf.erb) to the actual location of the template file in modulepath/memcached/templates/. This autoloading magic is a big win and is the reason we use modules for all of our Puppet manifests.
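As a sketch, a node using the module could look like this. The hostname and the memory value are hypothetical, and the per-node variable relies on the dynamic scoping behaviour discussed earlier in this article:

```puppet
node "cache01.example.com" {
  # Hypothetical per-node setting: give this box a larger cache than
  # whatever default is set at top scope.
  $memcached_memory_cap = 1024

  # Pulls in the package, config file, and service from the module.
  include memcached
}
```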
I intentionally chose a fairly simple example this time around. Next time I hope to delve deeper into Puppet and show you how to get the real work done. See you then!