Category Archives: System Administration

Error messages aren’t perfect

When diagnosing a problem with a complex system such as Linux, you sometimes need to step back, stop what you’re doing and take a different approach. Usually when a program fails on Linux you will get some kind of error message, traceback or coredump. Most people prefer to see an error message rather than the latter two.

Tracebacks and coredumps are computer generated, which makes them more accurate than error messages, but harder for humans to understand. Error messages, however, are put in place by the programmer, which means they can occasionally be misleading, inaccurate, ambiguous or just plain wrong. This is not always the programmer’s fault; sometimes it’s hard to describe exactly what went wrong. Other times the error describes the situation perfectly, but the sysadmin jumps to a different conclusion based on their circumstances.

Example

Some time ago we had some users complaining about a problem when trying to use X Forwarding via SSH. On this server /home was mounted off a Novell NetWare NFS share. They were getting the following output and were unable to run X11 applications.

[code]xauth: error in locking authority file /home/daniel/.Xauthority[/code]

Seeing this error I assumed that something was going wrong with the locking mechanism of NFS. I tried mounting the NFS share with the explicit lock option, but the same error remained. I tried explicitly giving the sync option too, but to no avail. I ended up trying many different NFS options until eventually I gave up and asked the Novell administrators to check their servers. I was convinced that something on their end was causing this locking error.

The Novell administrator responded that they could see nothing wrong on their end. This must mean that something was wrong on our side. I tried restarting the nfsstad and lockd initscripts, and then the whole machine, but once again the same issue persisted. I checked the server using the rpcinfo command, which showed that everything was working fine. I even connected to the daemon using telnet (though I couldn’t talk its language) and confirmed a firewall was not in the way.

I thought that maybe there was something going wrong in the interaction between the client and the server, so I ran tcpdump to capture all the packets transferred between them. This is where I made a small breakthrough. I found an NFS reply that had returned with SERVFAIL and error code 526. Googling for this error and NetWare generally pointed towards a problem with character sets not being preserved on the Novell server. There was nothing but ordinary characters on the filesystem. So much for that idea.

I wanted to know exactly what was happening when xauth was trying to lock the file, so I ran strace on it. Here are the last few lines (after xauth mmapped its libraries):

[code]stat("/home/e71377/.Xauthority-c", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
unlink("/home/e71377/.Xauthority-c") = 0
unlink("/home/e71377/.Xauthority-l") = -1 ENOENT (No such file or directory)
open("/home/e71377/.Xauthority-c", O_WRONLY|O_CREAT|O_EXCL, 0600) = 3
close(3) = 0
link("/home/e71377/.Xauthority-c", "/home/e71377/.Xauthority-l") = -1 ESERVERFAULT (Unknown error 526)
write(2, "xauth: error in locking authori"..., 65xauth: error in locking authority file /home/e71377/.Xauthority
) = 65
exit_group(1) = ?[/code]

So it appears that this was not a file locking problem at all. xauth was successfully creating the files but it failed when it tried to create a hardlink. Reviewing the code for libXau (AuLock.c) revealed exactly why:

[code lang="c"]    while (retries > 0) {
        if (creat_fd == -1) {
            creat_fd = open (creat_name, O_WRONLY | O_CREAT | O_EXCL, 0600);
            if (creat_fd == -1) {
                if (errno != EACCES)
                    return LOCK_ERROR;
            } else
                (void) close (creat_fd);
        }
        if (creat_fd != -1) {
#ifndef X_NOT_POSIX
            /* The file system may not support hard links, and pathconf should tell us that. */
            if (1 == pathconf(creat_name, _PC_LINK_MAX)) {
                if (-1 == rename(creat_name, link_name)) {
                    /* Is this good enough? Perhaps we should retry. TEST */
                    return LOCK_ERROR;
                } else {
                    return LOCK_SUCCESS;
                }
            } else {
#endif
                if (link (creat_name, link_name) != -1)
                    return LOCK_SUCCESS;
                if (errno == ENOENT) {
                    creat_fd = -1; /* force re-creat next time around */
                    continue;
                }
                if (errno != EEXIST)
                    return LOCK_ERROR;
#ifndef X_NOT_POSIX
            }
#endif
        }
        (void) sleep ((unsigned) timeout);
        --retries;
    }[/code]

xauth isn’t trying to lock the file through flock() or another file locking method, which means the NFS locking mechanism could not have been the cause. Instead xauth creates a scratch file and then, to make sure it is the only program altering .Xauthority, creates a hard link to it. If the link succeeds then it’s the only program; if the link already exists then another program has the lock. The problem happens when xauth tries to make the hard link. Interestingly, there is a fallback that uses rename() instead, but it is only taken when pathconf() reports that the filesystem does not support hard links. As the strace shows, xauth went on to call link(), so the fallback was not used here.
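The create-then-link dance can be sketched in a few lines of Python. This is only an illustration of the pattern, not a port of libXau: the function names are made up for this example, there is no retry loop or timeout, and the scratch file is removed straight away rather than when the lock is released.

```python
import os


def acquire_link_lock(path):
    """Try to lock `path` using the create-then-link pattern.

    Returns True if the lock was taken, False if another holder exists.
    (Hypothetical helper for illustration; not part of libXau.)
    """
    creat_name = path + "-c"  # scratch file, same suffix xauth uses
    link_name = path + "-l"   # the lock itself
    # O_EXCL makes the create fail if the scratch file already exists.
    fd = os.open(creat_name, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)
    try:
        # link() is atomic: it fails with EEXIST if the lock name is taken.
        os.link(creat_name, link_name)
        return True
    except FileExistsError:
        return False
    finally:
        # The lock name keeps the inode alive, so the scratch name can go.
        os.unlink(creat_name)


def release_link_lock(path):
    """Drop the lock by removing the link name."""
    os.unlink(path + "-l")
```

Any other failure from os.link() is raised as an OSError carrying the errno, so on a share like the one in this story I would expect the exception to show errno 526 rather than anything about locking.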

It appeared that the NFS server did not support hard links. To test this theory I created several files and attempted to create hard links using ‘cp -l file1 file2’, and they failed in exactly the same way. All I had to do now was explain to the Novell administrator that the problem was not locking, and was in fact that we were mounting a filesystem which did not support hard links on a POSIX compatible system. The Novell share was changed to support hard links (don’t ask me how, I’m not a Novell guy) and everything was working again.
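The same check can be scripted. The following is a rough sketch (the function name is my own invention) that creates a temporary file in a directory, tries to hard link it, and reports the errno if the link fails.

```python
import os
import tempfile


def supports_hard_links(directory):
    """Return True if a hard link can be created inside `directory`."""
    fd, source = tempfile.mkstemp(dir=directory)
    os.close(fd)
    target = source + ".link"
    try:
        os.link(source, target)
    except OSError as exc:
        # On the share from this story this is where errno 526 would show up.
        print(f"link() failed: errno {exc.errno}: {exc.strerror}")
        return False
    else:
        # Both names should now refer to the same inode.
        same = os.path.samefile(source, target)
        os.unlink(target)
        return same
    finally:
        os.unlink(source)
```

Calling supports_hard_links("/home") on the affected client would be the equivalent of the ‘cp -l’ test, with the advantage that the raw error number is printed instead of being hidden behind a friendlier message.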

Conclusion

The lesson to take away from here is not that hardlinks are required on POSIX, or that xauth doesn’t use file locking but locks itself via a dance of hardlinks. The lesson here is that you should never trust error messages. Take them as a hint, use them as a starting point but do not take them as law. You need to remember that the error message was written by a human and you may not be interpreting it how it was written.

mod_pagespeed is not (always) the answer

What is mod_pagespeed

Google recently released a chunk of code in the form of an Apache module. The idea is that you install it in your Apache server, where it sits between your application and the web browser and modifies the served responses to make the page load faster.
It does this by using combinations of filters. Some are well known best practices, others are newer ideas. For example, one filter simply minifies your JavaScript while another embeds small images in a page using data-uris. The changes these filters make range from low risk to high risk. It should be noted that not all the filters will improve the page load time, and some can even make pages slower.

So what’s the issue?

The issue here really isn’t mod_pagespeed, but the way people are viewing it. In my job as a Web Performance Engineer I have had several people recently say to me “let’s put mod_pagespeed on our web server to make it faster”. This is a break from normal attitudes. If someone were to say “we should put our images into data-uris” then people would question the speed benefit, or the extra load on the server. For some reason when Google implement a page speed module people just assume that it will make their page faster, and that it will work in their environment. The truth is that Google really have no idea what the module will do to your page.

The second issue is that all these tweaks can usually be better implemented at the application level. If you minify all your JavaScript as part of your build process then the web server will not have to do it for you. The same applies to data-uris. If they are simply part of the page then the web server doesn’t need to read in the extra image, base64 encode it, then compress it. All that is quite a lot of work, which only really needs to be done once.
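As a rough idea of what doing it once at build time could look like, here is a small Python sketch that turns image bytes into a data-uri. The function name and the example filename are made up for illustration; a real build step would read the bytes from disk and write the result into the page or stylesheet.

```python
import base64
import mimetypes


def to_data_uri(data, filename):
    """Build a base64 data-uri for `data`, guessing the type from `filename`."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"  # fallback when the type is unknown
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Example with the first six bytes of a GIF header (not a complete image).
print(to_data_uri(b"GIF89a", "dot.gif"))
```

Because the encoding happens in the build, the web server only ever serves the finished string and never has to touch the image again.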

So what should I use mod_pagespeed for then?

You don’t always have access to the application code. If you are using third party software then before mod_pagespeed you may have had no control over the minification of CSS. This is where the module really shines. It gives you a layer between the application code and the web browser where you can apply all sorts of performance tuning.

The other advantage I can see is in quickly finding the best tunings to apply to your application. You can set up mod_pagespeed and run experiments with the filters on and off, along with a control, to quickly figure out which rules you should apply in your application.

Rebooting with ‘The Big Hammer’

Today I had a machine I was working on spit the dummy in a really bad way. It had a tonne of IO errors on its root filesystem and eventually decided to remount it read only. Of course this meant that it was almost entirely wedged. I tried the reboot command and the init command, and everything would lock up my terminal. Not having console or physical access to the machine I couldn’t simply hit the power button, so I used the Linux magic SysRq commands:


[code]# echo 1 > /proc/sys/kernel/sysrq
# echo b > /proc/sysrq-trigger[/code]

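Of course the disk errors meant that it was unable to boot back up, but ‘The Big Hammer’ struck me as something extremely useful. Be aware that the b trigger reboots immediately, without syncing or unmounting any filesystems.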
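If you wanted to keep the hammer in a script, a sketch might look like the following. The function name and both parameters are my own additions: proc_root exists only so the function can be pointed at a fake directory for testing, and sync_first adds the s (sync) and u (remount read only) triggers before b, which is a gentler variant than the two commands I actually ran. It needs root, and on a real system it will not return.

```python
import os


def sysrq_reboot(proc_root="/proc", sync_first=False):
    """Force an immediate reboot through the magic SysRq interface."""
    enable = os.path.join(proc_root, "sys", "kernel", "sysrq")
    trigger = os.path.join(proc_root, "sysrq-trigger")
    # Enable all SysRq functions, as 'echo 1 > /proc/sys/kernel/sysrq' does.
    with open(enable, "w") as f:
        f.write("1\n")
    # s = sync disks, u = remount read only, b = reboot right now.
    keys = ["s", "u", "b"] if sync_first else ["b"]
    for key in keys:
        with open(trigger, "w") as f:
            f.write(key + "\n")
    return keys
```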

Protecting Email with DKIM

One of the problems with email and the protocol used to transfer it (SMTP) is that they were designed long ago when the Internet was a much friendlier place. When SMTP was designed it was assumed that other hosts on the Internet could be trusted. This is particularly visible in the configuration of relays where the sender doesn’t have to be identified. A mail relay will accept mail from any server regardless of where the mail appears to be coming from.

To attempt to rectify this, SPF was created. To set up SPF you add either a TXT or an SPF record to the DNS zone you will be sending from. This record defines which servers are allowed to send mail on behalf of that domain. So on my domain danielhall.me I could publish an SPF record that says only my mail server is allowed to send mail from addresses ending in @danielhall.me. Any mailserver receiving mail that claims to be from my domain but does not come from an address listed in my SPF record can see that the mail is likely forged and throw it away. SPF works well in most situations but fails at a very common use case. If someone I send mail to forwards it to another address using an automatic process (not by clicking forward in their client), then the mail will still appear to come from my domain when it reaches the final recipient; however, it will have come from the original recipient’s mailserver, which is not listed in my SPF record.
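To make the forwarding failure concrete, here is a heavily simplified SPF evaluator in Python. It is a sketch under stated assumptions: it only understands the ip4, ip6 and all mechanisms and silently skips everything else (a, mx, include and so on), and the record and addresses in the example are documentation addresses, not my real DNS.

```python
import ipaddress

RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_result(record, sender_ip):
    """Evaluate a simplified SPF record against the connecting IP address."""
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF record")
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # mechanisms default to "pass" when they match
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        if term.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(term[4:], strict=False)
            matched = ip.version == network.version and ip in network
        elif term == "all":
            matched = True
        else:
            continue  # unsupported mechanism in this sketch
        if matched:
            return RESULTS[qualifier]
    return "neutral"  # nothing matched


record = "v=spf1 ip4:192.0.2.10 -all"        # hypothetical record
print(spf_result(record, "192.0.2.10"))      # my own mailserver
print(spf_result(record, "198.51.100.7"))    # somebody's forwarding server
```

The second lookup is the forwarding case: the message is untouched, but it arrives from an address the record never mentioned, so the check fails.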

DKIM solves this problem by giving each sending mailserver a cryptographic key pair. The public key is published in a DNS record in that zone, and the private key is stored somewhere safe on the server. The server then signs the headers (especially the From: header) and the body of all outgoing emails. This signature is attached to the email as an extra header. When the receiving server gets the email, it extracts the signature and uses it, along with the list of signed headers, to verify the message against the public key of the signing domain. This means that as long as the mail passed through an authorised mailserver at some point, and the signed parts have not been altered since, it will be considered valid.
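A full signer needs RSA, which is not in the Python standard library, so the sketch below covers only one step: the body hash that ends up in the bh= tag of the signature header. It assumes the "simple" body canonicalization and SHA-256, and the function names are mine. The point it illustrates is that the result depends only on the message content, not on which server delivered it, which is why a DKIM signature can survive forwarding where SPF cannot.

```python
import base64
import hashlib


def simple_body_canon(body):
    """Apply 'simple' body canonicalization to a body given as bytes."""
    # Ignore empty lines at the end of the body...
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    # ...and make sure the body ends with exactly one CRLF.
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body


def body_hash(body):
    """Return the base64 SHA-256 hash of the canonicalized body."""
    digest = hashlib.sha256(simple_body_canon(body)).digest()
    return base64.b64encode(digest).decode("ascii")


original = b"Hello from danielhall.me\r\n"
print(body_hash(original) == body_hash(original + b"\r\n\r\n"))  # trailing blank lines ignored
print(body_hash(original) == body_hash(b"Hello from elsewhere\r\n"))  # altered body differs
```

In a real signer this value is placed in the signature header, and the selected headers plus that signature header are then signed with the private key.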

Setting up DKIM is a relatively simple process. You will need access to the zone records for your domain and access to the configuration of all the mailservers which mail originating at your domain passes through. You also need to be aware that signing mail makes it slightly more processor intensive to send an email. If you send a large amount of email this difference could be quite significant. If you’re using sendmail you may be able to alleviate it by switching to a less resource hungry MTA like Exim. You should also note that in some configurations DKIM cannot be set up. For example, if you use masquerading in sendmail, DKIM will always fail as sendmail will modify the From: header after signing.

Ultimately DKIM is a good move for the internet community at large, especially when combined with SPF. DKIM-signed mail can be cryptographically shown to have passed through a mailserver authorised by the sending domain. While it does take a little more effort to set up and maintain, it gives recipients a way to confirm that mail from your domain really came from you or your company. In the end DKIM can protect your company against phishing attempts and improve your spam scores.

Random thought: What would Email look like if it were designed today?

SSH Agent Forwarding

So you use keys to SSH between your hosts, and you either have separate keys for each machine you use, or worse you have the same key on each machine. Let’s go over why each of those is bad, and let’s see how SSH agent forwarding will help with those issues and make things easier for you in general.

The key reason an SSH agent and SSH agent forwarding are so useful is the way keys can be attacked. If I wanted to get your SSH private key I could find some flaw in the system that would give me that /home/you/.ssh/id_rsa file you have. Of course a malicious user with root access to the system could just go in and grab it. You can prevent this kind of attack by setting a passphrase on the key. Of course the root user could replace SSH with a special version designed to get your passphrase, steal the key out of memory or set up a keylogger. This effectively means that your private key is not safe on any system where a person you don’t trust has root access, or which has other users and exploitable vulnerabilities.

Single Private Key on Multiple Machines

In this example you’re trusting the security of every single machine you have your private key on. Should it get compromised then you have to revoke your public key from every host, and regenerate the private key to place on every host. Every time you put your private key on a machine you increase the chances that it could be compromised.

Multiple Private Keys On Multiple Machines

So we’re getting a little closer to a good solution. In this instance we don’t have to regenerate our key and roll it out to all hosts in the event of a compromise. You can also have segregated groups, one set of keys for work, another for home and so on. Your keys can still be compromised easily though, and once compromised they can be used until you revoke them manually.

SSH Agent Forwarding

There is a way to keep your key safe from compromise. First I’ll have to explain how SSH authenticates you using your key. When you’re authenticating with SSH keys your private key is never sent; the server sends you some random data and challenges your client to encrypt it with your private key. It then verifies the encrypted data by decrypting it with the public key and checking that it matches the data originally sent. Now the way most people would SSH from the second host to a third host is to use a private key stored on the second host. Unfortunately this means that you have to store a key (that is open to compromise) on the second host. SSH agent forwarding instead tells the SSH client on the second host to send the challenge data back to the SSH client (or SSH agent) on the first host. The agent encrypts the data and sends it back via the SSH session, and the second host passes it on to the third host.

The beauty of this method is that the second host never sees a private key, and the challenge response is useless for connecting to a different host. Even if the second host is compromised there isn’t a private key there to steal. It should be noted that if the second host is compromised, an attacker can still ask your agent to authenticate to a different host while your session is open, or take over the session to the third host. Both of these are temporary though, and unless the malicious user installs their own key (something easy to notice) they cannot get back in.
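The flow can be modelled in a few lines of Python. This is a toy, not the SSH protocol: it uses textbook RSA with a tiny key built from two known Mersenne primes, all the class and method names are invented for the example, and nothing here should be used for real security. What it does show is which host holds what.

```python
import hashlib
import secrets

# Toy key pair (insecure on purpose, for illustration only).
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)


def digest_int(data, modulus):
    """Hash the challenge and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


class Agent:
    """Runs on the first host; the only place the private exponent lives."""

    def __init__(self, private_exponent, modulus):
        self._d = private_exponent
        self._n = modulus

    def sign(self, challenge):
        return pow(digest_int(challenge, self._n), self._d, self._n)


class IntermediateHost:
    """The second host: holds no key, only a channel back to the agent."""

    def __init__(self, agent_channel):
        self._agent_channel = agent_channel

    def answer_challenge(self, challenge):
        return self._agent_channel(challenge)  # forwarded, never computed here


class Server:
    """The third host: knows only the public half of the key."""

    def __init__(self, public_exponent, modulus):
        self._e = public_exponent
        self._n = modulus

    def authenticate(self, client):
        challenge = secrets.token_bytes(32)  # fresh random data per login
        response = client.answer_challenge(challenge)
        return pow(response, self._e, self._n) == digest_int(challenge, self._n)


agent = Agent(D, N)
second_host = IntermediateHost(agent.sign)
third_host = Server(E, N)
print(third_host.authenticate(second_host))
```

Because each challenge is fresh random data, a response captured on the second host cannot be replayed later, which matches the point above about the compromise being temporary.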

Diagram detailing how an SSH connection is authenticated using agent forwarding.

If you want to know more about how this works, there is a wonderful tech tip at http://unixwiz.net/techtips/ssh-agent-forwarding.html.

But how?

SSH agent forwarding is even easier than copying keys all over the place. The first step is to generate keys on all the machines you log on to directly. You need to be sure these machines are secure and that your keys will stay safe, though this is sometimes not possible. You then add the generated public key to the authorized_keys file on all the machines you will connect to from this one, including ones that take two or more steps to get to. Finally you edit your ~/.ssh/config file to tell SSH to forward your agent through those hosts. Include the intermediate hosts in this list, but not the endpoints. You could also use SSHmenu to add the arguments automatically to those SSH commands. The following explicitly enables forwarding to fred and aaron.missgner.com, and disables it for all other hosts.

[code]Host fred
  ForwardAgent yes

Host aaron.missgner.com
  ForwardAgent yes

Host *
  ForwardAgent no[/code]
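The order of those blocks matters, because SSH uses the first value it finds for each option. The Python sketch below models just that rule. It is a simplification I wrote for illustration: the function name and data layout are made up, and it uses fnmatch for the wildcard, which is close to but not identical to how SSH matches Host patterns.

```python
from fnmatch import fnmatchcase


def effective_option(blocks, hostname, option, default=None):
    """Return the first value found for `option` among matching Host blocks."""
    for pattern, options in blocks:
        if fnmatchcase(hostname, pattern) and option in options:
            return options[option]  # first match wins, later blocks are ignored
    return default


CONFIG = [
    ("fred", {"ForwardAgent": "yes"}),
    ("aaron.missgner.com", {"ForwardAgent": "yes"}),
    ("*", {"ForwardAgent": "no"}),
]

print(effective_option(CONFIG, "fred", "ForwardAgent"))
print(effective_option(CONFIG, "some.other.host", "ForwardAgent"))
```

If the catch-all block were listed first it would match every host, and the more specific entries would never get a chance to turn forwarding on.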

Random thought: Linux has Plug ‘n Pray too, you plug the device in and pray the drivers aren’t proprietary.