Tag Archives: Linux

Creating RR Records in Route53 with Ansible

So if you are like me and using Ansible to deploy all of your infrastructure, you probably run into more than your fair share of issues. Often Ansible modules take a list or a string, but output fancy structured data. One particular case of this is if you are using the route53 module and creating RR records in DNS. Here is what the route53 module accepts as a value:

- value
        The new value when creating a DNS record.  Multiple comma-
        spaced values are allowed.  When deleting a record all values
        for the record must be specified or Route53 will not delete
        it. [Default: None]

So here you can see that it takes a string (the source code indicates it also accepts lists) listing the ip addresses to point to. If you are deploying in a cloud however you probably need to pull the IP addresses from the inventory, which is stored as complex structured data. Extracting a list of IP addresses out of this is an extremely difficult task. To do this we will need to make heavy use of the features Ansible provides us through its Jinja2 variable expressions.

The first thing that we need is a list of all the variables for the machines in a particular group. Ansible doesn’t provide this, but it does provide two similar variables. First, it provides a variable names ‘group’ which gives the names of all the members of a particular group. Secondly it provides the ‘hostvars’ variable which gives you all of the variables in a dictionary with keys as the hostname and the variables as the values. As there is no builtin that does what we want we are going to need a custom filter plugin.

To install this plugin place it in a directory then add that directory to the end of your filter_plugins configuration option. For example if you placed the file in the plugins/filter directory your filter_plugins line might look like this:

filter_plugins     = /usr/share/ansible_plugins/filter_plugins:plugins/filter

This plugin will take a dictionary and a list, lookup all the keys within the list and return their values in a new list in the same order as the keys were in. So we can use this to get all the variables for the machines we are interested in. For example to get the variables for the hosts in the webservers group we can now do this:

{{ hostvars|fetchlistfromdict(groups.webservers) }}

Now that we have the right variables we simply need to filter it down to just the attribute we need. Jinja2 provides the map filter for this purpose. Map takes a list and can either pick out a particular attribute, or run another filter on all the items in the list. So to get the IP addresses we would do this:

{{ hostvars|fetchlistfromdict(groups.webservers)|map(attribute='ansible_default_ipv4.address') }}

If you try to print this out in a debug line you might notice that you don’t get a list in response and Ansible instead gives you a string with the following:

 <generator object do_map at 0x2d7f640>

To work around this issue you need to explicitly convert it to a list. This is fairly simple with the list filter. Adding that onto this example looks like this:

{{ hostvars|fetchlistfromdict(groups.webservers)|map(attribute='ansible_default_ipv4.address')|list }}

You now have a list of all the IP addresses ready to pass into a template, a role, or in this case into the route53 module. Finally we put this all together to create that round robin record we wanted at the start.

route53:
  command: create
  zone: example.com
  record: rr.example.com
  type: A
  value: "{{ hostvars|fetchlistfromdict(groups.webservers)|map(attribute='ansible_default_ipv4.address')|list }}"

Resources

System Calls

What are System Calls?

Systems calls are the interface between userspace and kernelspace. They are essentially procedures belonging to the operating system that you can call when you want something done that can’t be done in userspace. This includes anything system related, accessing devices, files, managing processes and networking. On my 64bit machine with a 3.5.4 kernel there are 344 system calls (cat /usr/src/linux/arch/x86/syscalls/syscall_64.tbl | egrep -v ‘^#’ | egrep -v ‘^$’ | wc -l) .

Calling procedures is generally a simple affair, just push the current context onto the stack and jump to the new address. Calling a system call on the other hand is quite a bit more difficult. The kernel executes in a higher privileged mode on the CPU and userspace executes on a lower privilege mode. To transfer through to the higher privilege mode a CPU exception is triggered, which then launches in the higher mode, however as you can’t pass arguments through this mechanism they are placed in the CPU registers before the exception. Newer CPUs have implemented SYSENTER/SYSEXIT calls (or similar) that work differently and are designed to speed up the process.

You can see the system calls that a process makes while it is running by using strace. Strace runs a program in such a way that it can intercept all the system calls it makes and gather details about them. Watching a program’s system calls is a good way to see what it is doing, if its hanging on IO, if it is stuck in an infinite loop or if it has deadlocked.

Debugging with strace

Interpreting the output of strace can be daunting, the average binary does many things on startup and shutdown that you’ve probably never coded. Lets dissect an example of stracing the date command:

execve("/bin/date", ["date"], [/* 63 vars */]) = 0
brk(0)                                  = 0xabd000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febdf000

You can see here the initial execution of the date binary. This is done with the execv call, which asks the operating system to replace the current process with the binary you provided and begin execution.

You can see the next thing that happens is a call to brk(0), this is a way to grab the processes current program break. The program break is the location of the end of the data segment (actually its the first location after), moving the program break up has the effect of allocating more memory to the process, and moving it down has the effect of deallocating it. This is the way malloc allocates memory.

Then mmap is called to allocate 4096 bytes at a location of the kernels choosing (addr is NULL). Reads and writes are allowed, the memory is anonymous (not backed by a file) and private to this process. This initial memory is associated with the loading of dynamic libraries by the dynamic linker.

access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=143007, ...}) = 0
mmap(NULL, 143007, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f13febbc000
close(3)                                = 0

So this is the dynamic linker running trying to find all the libraries that out program requires. The first thing it does is try to load any libraries that may be listed in /etc/ld.so.preload, this is the same way the LD_PRELOAD environment variable works, except backed by a file.

Next it opens /etc/ld.so.cache, checks its file attributes, maps the file to memory, then closes the file. This file is created by the command ldconfig based on the file /etc/ld.so.conf and other files it includes (its common to include /etc/ld.so.conf.d/*.conf). The /etc/ld.so.cache file contains, in machine readable format, a list of all the libraries in the previously mentioned files and their location.

open("/lib64/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\"\240\2356\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=48128, ...}) = 0
mmap(0x369da00000, 2128984, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x369da00000
mprotect(0x369da07000, 2093056, PROT_NONE) = 0
mmap(0x369dc06000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x369dc06000
close(3)                                = 0

Here we are loading the librt library. The librt library is a part of glibc which provides realtime extensions. You can see here that the open call returns with file descriptor 3. Then in the mmap call you can see it mapping the library with PROT_READ and PROT_EXEC permissions, keeping it private and denying writes. Loading things this way means that if another program loads the same library it can share the same page in physical ram. It then protects a portion of that libraries memory from any read or write. It maps another section of the file as read write, but not written to the file, and finally closes the file descriptor.

open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\27\242\2346\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2076800, ...}) = 0
mmap(0x369ca00000, 3896632, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x369ca00000
mprotect(0x369cbad000, 2097152, PROT_NONE) = 0
mmap(0x369cdad000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ad000) = 0x369cdad000
mmap(0x369cdb3000, 17720, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x369cdb3000
close(3)                                = 0

This is basically the same thing as the above library, except it is the libc library. The libc library provides the methods specified in the ANSI C and POSIX C libraries.

open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320k\340\2346\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=145176, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febbb000
mmap(0x369ce00000, 2208760, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x369ce00000
mprotect(0x369ce17000, 2093056, PROT_NONE) = 0
mmap(0x369d016000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x369d016000
mmap(0x369d018000, 13304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x369d018000
close(3)                                = 0

Again, another library being loaded into memory. This time its the libpthread library, which provides POSIX thread support.

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febba000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febb9000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febb8000

Here we are mapping three pages of memory using mmap. This memory is not backed by a file and is private to the process.

arch_prctl(ARCH_SET_FS, 0x7f13febb9700) = 0
mprotect(0x60d000, 4096, PROT_READ)     = 0
mprotect(0x369dc06000, 4096, PROT_READ) = 0
mprotect(0x369cdad000, 16384, PROT_READ) = 0
mprotect(0x369d016000, 4096, PROT_READ) = 0
mprotect(0x369c821000, 4096, PROT_READ) = 0
munmap(0x7f13febbc000, 143007)          = 0
set_tid_address(0x7f13febb99d0)         = 4110
set_robust_list(0x7f13febb99e0, 24)     = 0
rt_sigaction(SIGRTMIN, {0x369ce06720, [], SA_RESTORER|SA_SIGINFO, 0x369ce0f500}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x369ce067b0, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x369ce0f500}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0

These lines are likely the setup of the threading for the application. arch_prctl is being used to set the FS register, which the pthread library uses to store a pointer to a struct containing the thread local information for that thread. mprotect is being called on all the memory that above was made protected from both reads and writes to make it read only. Some memory is unmapped to return it to the system with munmap. set_tid_address is called to set the tread id, and get the pid of the process (4110).

We then see a bunch of calls to rt_sigaction and a rt_sigprocmask which is basically plumbing all the signals needed for threading. Finally getrlimit is used to see that the limits on stack sizes are.

brk(0)                                  = 0xabd000
brk(0xade000)                           = 0xade000
brk(0)                                  = 0xade000

Malloc() isn’t a system call, instead it uses other calls such as brk() and mmap() to create memory. Because a calls to these methods are expensive libc batches them up. So malloc() when initially called allocates a bigger chunk of memory than you asked for, and as you ask for further bits it give you chunks of the same memory you allocated. What you can see here is more than likely a malloc() call checking the current program break (the end of the heap), then moving it ahead (to increase the heap size) then checking where it is again.

open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=104997456, ...}) = 0
mmap(NULL, 104997456, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f13f8795000
close(3)                                = 0

So all the library loading and setup is now done. What we are doing here is loading the locales file. This will be done by most things that use libc, but its particularly important for the date command as date formatting changes drastically in different locales. Basically we open a file descriptor, stat it (usually to find size, permissions and such), map the file into memory, then closes the file descriptor. Because mmap has been used to get the file into memory we don’t need to use the read() and write() calls to access it. This will prevent an expensive system call, but the first time you access each page of the data a page fault will be generated, which could be just as expensive.

open("/etc/localtime", O_RDONLY)        = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2183, ...}) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=2183, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febde000
read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"..., 4096) = 2183
lseek(3, -1398, SEEK_CUR)               = 785
read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\5\0\0\0\0"..., 4096) = 1398
lseek(3, 2182, SEEK_SET)                = 2182
close(3)                                = 0
munmap(0x7f13febde000, 4096)            = 0

Here we can see date is opening the /etc/localtime file. This file contains information about the current timezone (it is often a symlink to the correct file). You can see here, the file is opened, read, a couple of seeks happen (to find particular entries) and the file is read again from the new point. Finally having retrieved the required information from the file, it is closed and its memory is unmapped.

fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febde000</pre>
write(1, "Thu Sep 27 16:48:57 EST 2012\n", 29Thu Sep 27 16:48:57 EST 2012
 ) = 29

At this point the date command has all that is required to calculate the date, its text representation and display it. You’ll notice that there has been no call to the operating system to fetch the time, this is because modern x86 machines include a special method called vsyscalls that allows userspace programs to make certain system calls without ever context switching. One of the system calls implemented via this method is the one to get the current unix timestamp, so date doesn’t need to call into kernelspace for this.

close(1)                                = 0
munmap(0x7f13febde000, 4096)            = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

STDOUT and STDERR are closed and the process group is ended. The date command is now finished.

Simple Configuration Service

The Basics

So I’ve seen a few implementations of ‘config services’ being built in all sorts of {ruby,python,php,java} applications. It really gets on my nerves to see people implementing things that could be much simpler. As Engineers we should really strive to get the most simple product built in a flexible way. So here is my go at implementing a ‘config service’ (basically a key value store optimised for reads). This was originally tweeted in a set of four tweets (1, 2, 3, 4).

Tweet 1

A 'Config Service' that fits in a tweet:
<Location /config>
  DAV svn
  SVNPath /var/svn/config
  SVNAutoversioning on
</Location>

Simply place this in your apache server configuration. You’ll need to create the SVN repo, which can be done with ‘svnadmin create /var/svn/config’ then chmod it to be writiable and readable by apache. What we are doing here is turning on WebDAV (which lets us upload and make directories on the apache server) and turning on SVN backed autoversioning which means that every change we make will be recorded in the SVN repo.

Tweet 2

To get a variable:
curl -s http://config/config/namespace/key

This simply performs a GET request for a key. You can’t do this until you add some keys to the server.

Tweet 3

To change a variable:
echo "value" | curl -T- http://config/config/namespace/key

This creates/modifies a key. You will need to have the namespace created first, which is in the next tweet.

Tweet 4

Make a new namespace:
curl -XMKCOL http://config/config/namespace

This creates a namespace to put keys in.

Extensions

Users

The service above doesn’t support users, so anyone can modify the keys, and the svn logs look like this:

------------------------------------------------------------------------
r5 | (no author) | 2012-10-15 12:36:33 +1100 (Mon, 15 Oct 2012) | 2 lines

Autoversioning commit:  a non-deltaV client made a change to
/namespace/key
------------------------------------------------------------------------

All we need to do is require the user to authenticate when doing PUT requests to the server. You can use this method to restrict certain namespaces too. Generate a htpasswd file for the users and place it in /var/svn/users and change the code to read as follows.

<Location /config>
  DAV svn
  SVNPath /var/svn/config
  SVNAutoversioning on
  AuthName "Config Service"
  AuthType Basic
  AuthUserFile /var/svn/users
  <LimitExcept GET>
    require valid-user
  </Limit>
</Location>

Viewing the logs

Simply do this:

svn checkout http://config/config
cd config
svn log

You can also use this to make bulk/atomic changes in the same way you would make changes to a subversion repository.

Ansible Talk @ Infra Coders

Here are the notes that I used in my talk at Infrastructure Coders. Each section was also put on the screen as a ‘slide’. The configuration that I used in the demo is available at GitHub. A full video of the meetup is available on the Infrastructure Coders Youtube Channel, my talk starts at 25:05.

Ansible
--------

0. There is nothing in the hat
  - Start a RHEL install
    - Cmd line: console=ttyS0 ks=http://admin01/ns3.cfg
    - If you want to follow the demos grab the ansible config from my github
    - You will need to substitute hostnames in the ansible hosts file
    - You should copy the firewall config from ns2 (remove port 647 if paranoid)

1. The problem
  - Ansible is the combination of several functions
  - There was a plan to build config management on func
  - However func is a pain to setup
  - Puppet and Chef have a steep learning curve
  - Ansible was also built to simplfy complexrelease procedures
  - You need to know ruby to extend Puppet/Chef

2. Ansible
  - Designed so you can figure out how to use it in 15 minutes
  - Designed to be easy to setup
  - Doesn't require much to be installed on the managed host
  - Designed to do config management/deployment/ad hoc
  - Other people do security better, just use SSH
  - You can extend ansible in any language that can output JSON

3. Simple Ansible Demo
  - Ansible hosts file
  - Ansible can be run directly on the command line
    - Run cat /etc/redhat-release
    - Get info using the setup module
  - It can prompt for auth, or use key based auth
    - On the new machine show it prompting
    - Run the rhelsetup script on the new machine
    - Install vim-enhanced

4. Playbooks
  - This is the method of scripting Ansible
  - Done in YAML
  - Executed in order *gee thanks puppet*
  - Designed to be as easy as possible

5. Example playbook
  - Playbook for the name servers
    - https://github.com/smarthall/tildaslash.com/blob/master/playbooks/zones.play
    - Can have multiple plays in a book
    - Can serialise if you dont want all to be down at once
  - Template config for the name servers
    - https://github.com/smarthall/tildaslash.com/blob/master/playbooks/zones/named.conf.j2
  - Firewall install script
    - https://github.com/smarthall/tildaslash.com/blob/master/playbooks/firewall.play

6. My thoughts
  - Config management has been around a while, its going from art to science
  - Ansible covers more ground than puppet and chef do
  - Ansible doesn't compromise on simplicity to do that
  - I don't have to focus on the nodes, I can focus on services
  - There is something missing
    - Disk config is done in kickstarts
    - Network config can't be done by Ansible
    - Need to find a way to cover both with one

The playlist of all the videos is available at Youtube.