Category Archives: Computing

Error messages aren’t perfect

When diagnosing a problem with a complex system such as Linux you sometimes need to step back, stop what you’re doing and take a different approach. Usually when a program fails on Linux you will get some kind of error message, traceback or coredump. Most people would rather see an error message than either of the other two.

Tracebacks and coredumps are computer generated, which makes them more accurate than error messages, but harder for humans to understand. Error messages, however, are put in place by the programmer, which means they can occasionally be misleading, inaccurate, ambiguous or just plain wrong. This is not always the programmer’s fault; sometimes it’s hard to describe exactly what went wrong. Other times the error describes the situation perfectly, but the sysadmin jumps to a different conclusion based on their circumstances.

Example

Some time ago we had some users complaining about a problem when trying to use X Forwarding via SSH. On this server /home was mounted off a Novell NetWare NFS share. They were getting the following output and were unable to run X11 applications.

[code]xauth: error in locking authority file /home/daniel/.Xauthority[/code]

Seeing this error I assumed that something was going wrong with the locking mechanism of NFS. I tried mounting the NFS share with the explicit lock option, but the same error remained. I tried explicitly giving the sync option too, but to no avail. I ended up trying many different NFS options until eventually I gave up and asked the Novell administrators to check their servers. I was convinced that something on their end was causing this locking error.

The Novell administrator responded that they could see nothing wrong on their end. This must mean that something was wrong on our side. I tried restarting the nfsstad and lockd initscripts and the whole machine, but once again the same issue persisted. I checked the server using the rpcinfo command, which showed that everything was working fine. I even connected to the daemon using telnet (though I couldn’t talk its language) and confirmed a firewall was not in the way.

I thought that maybe there was something going wrong in the interaction between the client and the server, so I ran a tcpdump to capture all the packets transferred between them. This is where I made a small breakthrough. I found an NFS reply that had returned with SERVFAIL and error code 526. Googling for this error and NetWare generally pointed towards a problem with character sets not being preserved on the Novell server. There was nothing but ordinary characters on the filesystem, so that idea went nowhere.

I wanted to know exactly what was happening when xauth was trying to lock the file, so I did an strace on it. Here are the last few lines (after xauth mmaped its libraries):

[code]stat("/home/e71377/.Xauthority-c", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
unlink("/home/e71377/.Xauthority-c") = 0
unlink("/home/e71377/.Xauthority-l") = -1 ENOENT (No such file or directory)
open("/home/e71377/.Xauthority-c", O_WRONLY|O_CREAT|O_EXCL, 0600) = 3
close(3) = 0
link("/home/e71377/.Xauthority-c", "/home/e71377/.Xauthority-l") = -1 ESERVERFAULT (Unknown error 526)
write(2, "xauth: error in locking authori"..., 65xauth: error in locking authority file /home/e71377/.Xauthority
) = 65
exit_group(1) = ?[/code]

So it appears that this was not a file locking problem at all. xauth was successfully creating the files but it failed when it tried to create a hardlink. Reviewing the code for libXau (AuLock.c) revealed exactly why:

[code lang="c"]while (retries > 0) {
    if (creat_fd == -1) {
        creat_fd = open (creat_name, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (creat_fd == -1) {
            if (errno != EACCES)
                return LOCK_ERROR;
        } else
            (void) close (creat_fd);
    }
    if (creat_fd != -1) {
#ifndef X_NOT_POSIX
        /* The file system may not support hard links, and pathconf should tell us that. */
        if (1 == pathconf(creat_name, _PC_LINK_MAX)) {
            if (-1 == rename(creat_name, link_name)) {
                /* Is this good enough? Perhaps we should retry. TEST */
                return LOCK_ERROR;
            } else {
                return LOCK_SUCCESS;
            }
        } else {
#endif
            if (link (creat_name, link_name) != -1)
                return LOCK_SUCCESS;
            if (errno == ENOENT) {
                creat_fd = -1; /* force re-creat next time around */
                continue;
            }
            if (errno != EEXIST)
                return LOCK_ERROR;
#ifndef X_NOT_POSIX
        }
#endif
    }
    (void) sleep ((unsigned) timeout);
    --retries;
}[/code]

xauth isn’t trying to lock the file through flock() or any other kernel file locking mechanism, which is why none of the NFS locking options made a difference. Instead xauth creates a file and then, to make sure it is the only program altering .Xauthority, it creates a hard link to it. If the link succeeds then it’s the only program; if not, another program has the lock. The problem happens when xauth tries to make the hard link. Interestingly, the code above has a fallback inside the POSIX-only block: when pathconf() reports that the filesystem allows only one link per file, it uses rename() instead of link(). The strace output shows link() being called, so that fallback was not taken here.
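The same link-based locking idea can be sketched in a few lines of Python. This is a simplified illustration, not a port of libXau: the function names, the retry defaults and the decision to reuse a leftover scratch file are my own assumptions, and only the "-c" and "-l" suffixes come from the strace output above.

```python
import os
import time


def lock_with_hard_link(path, retries=5, timeout=1.0):
    """Take a lock on `path` by hard linking a scratch file to path + "-l".

    Returns True when the lock was obtained and False when another process
    held it for every retry. Any other OSError propagates to the caller.
    """
    creat_name = path + "-c"  # scratch file
    link_name = path + "-l"   # the lock itself

    for _ in range(retries):
        try:
            # O_EXCL makes the open fail if the scratch file already exists.
            fd = os.open(creat_name, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
            os.close(fd)
        except FileExistsError:
            pass  # simplification: reuse a scratch file left by an earlier attempt
        try:
            # link() is atomic: it fails with EEXIST if the lock already exists.
            os.link(creat_name, link_name)
            return True
        except FileNotFoundError:
            continue  # scratch file vanished; recreate it on the next pass
        except FileExistsError:
            time.sleep(timeout)  # somebody else holds the lock
    return False


def unlock(path):
    """Remove the scratch file and the lock, ignoring files that are missing."""
    for suffix in ("-c", "-l"):
        try:
            os.unlink(path + suffix)
        except FileNotFoundError:
            pass
```

On a filesystem that refuses hard links, os.link() raises an OSError that is neither of the two handled cases, so the failure surfaces with its real errno instead of being reported as a locking problem.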

It appeared that the NFS server did not support hard links. To test this theory I created several files and attempted to hard link them using ‘cp -l file1 file2’. They failed in exactly the same way. All I had to do now was explain to the Novell administrator that the problem was not locking, but that we were mounting a filesystem which did not support hard links on a POSIX compatible system. The Novell share was changed to support hard links (don’t ask me how, I’m not a Novell guy) and everything was working again.
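The manual ‘cp -l’ test could also be scripted. The probe below is a minimal sketch under my own naming (supports_hard_links and the "linktest-" prefix are hypothetical): it creates a temporary file in the directory under test, tries to hard link it, and cleans up after itself.

```python
import os
import tempfile


def supports_hard_links(directory):
    """Return True if a hard link can be created inside `directory`."""
    fd, source = tempfile.mkstemp(dir=directory, prefix="linktest-")
    os.close(fd)
    target = source + "-link"
    try:
        os.link(source, target)
    except OSError:
        # Covers "not supported" as well as odd server-side errors.
        return False
    else:
        os.unlink(target)
        return True
    finally:
        os.unlink(source)  # always remove the temporary source file
```

Calling supports_hard_links("/home") on the affected mount would have pointed at the real cause much earlier than the NFS lock options did.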

Conclusion

The lesson to take away from here is not that hardlinks are required on POSIX, or that xauth doesn’t use file locking but locks itself via a dance of hardlinks. The lesson here is that you should never trust error messages. Take them as a hint, use them as a starting point but do not take them as law. You need to remember that the error message was written by a human and you may not be interpreting it how it was written.

mod_pagespeed is not (always) the answer

What is mod_pagespeed

Google recently released a chunk of code in the form of an Apache module. The idea is that you install it in your Apache server, it sits in between your application and the web browser and modifies the served requests to make the page load faster.
It does this by using combinations of filters. Some are well known best practices, others are newer ideas. For example, one filter simply minifies your JavaScript while another embeds small images in the page using data URIs. The changes these filters make range from low risk to high risk. It should be noted that not all the filters will improve page load time, and some can even make pages slower.

So what’s the issue?

The issue here really isn’t mod_pagespeed itself, but the way people are viewing it. In my job as a Web Performance Engineer I have had several people recently say to me “let’s put mod_pagespeed on our web server to make it faster”. This is a break from normal attitudes. If someone were to say “we should put our images into data URIs” then people would question the speed benefit, or the extra load on the server. For some reason when Google implement a page speed module people just assume that it will make their page faster, and that it will work in their environment. The truth is that Google really have no idea what the module will do to your page.

The second issue is that all these tweaks can usually be better implemented at the application level. If you minify all your JavaScript as part of your build process then the web server will not have to do it for you. The same applies to data URIs. If they are simply part of the page then the web server doesn’t need to read in the extra image, base64 encode it, then compress it. All that is quite a lot of work, which only really needs to be done once.
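As an illustration of doing this work once at build time, here is a minimal sketch of turning an image into a data URI. Data URIs carry their payload base64 encoded; the function name to_data_uri is my own and the MIME type has to be supplied by the caller.

```python
import base64


def to_data_uri(data, mime_type):
    """Encode raw bytes as a data URI suitable for an img src or a CSS url()."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

A build step could read each small image with open(path, "rb").read(), pass the bytes through to_data_uri and write the result straight into the generated HTML or CSS, so nothing has to be encoded when the page is served.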

So what should I use mod_pagespeed for then?

You don’t always have access to the application code. If you are using third party software then before mod_pagespeed you may have had no control over the minification of CSS. This is where the module really shines. It gives you a layer between the application code and the web browser where you can apply all sorts of performance tuning.

The other advantage I can see is in quickly finding the best tunings to apply to your application. You can set up mod_pagespeed and run experiments with the filters on and off, alongside a control, to quickly figure out which rules you should apply in your application.

Rebooting with ‘The Big Hammer’

Today I had a machine I was working on spit the dummy in a really bad way. It had a tonne of IO errors on its root filesystem and eventually decided to remount it read only. Of course this meant that it was almost entirely wedged. I tried the reboot command and the init command, and everything would lock up my terminal. Not having console or physical access to the machine I couldn’t simply hit the power button, so I used the Linux magic SysRq commands:


[code]# echo 1 > /proc/sys/kernel/sysrq
# echo b > /proc/sysrq-trigger[/code]

Of course the disk errors meant that it was unable to boot but ‘The Big Hammer’ struck me as something extremely useful.

Programming in JavaScript

JavaScript is an interesting language. It’s partly a functional programming language and partly object oriented. It uses a C style syntax but borrows its naming conventions from Java (mostly). Personally I find JavaScript to be one of the most interesting languages that I have played with. The complaints I hear most often regarding JavaScript are that it is very hard to learn and that there are many subtle differences between the interpreters.

Difficult to learn

This used to be mostly true. JavaScript was a poorly documented language, often only documented in tutorial form by w3schools, or technically documented as ECMAScript. The sheer volume of tutorials and blog posts made the good information hard to find. When looking for information on how to perform a particular task you largely had to download some sample code and figure out how it was done based on that.

More recently though JavaScript has caught the wave that is trying to standardize the web and this has somewhat improved the situation. Browser manufacturers are documenting their JavaScript implementations and largely converging on a common standard. Additionally many helper libraries have been introduced to make the task of working on JavaScript even easier. Once JavaScript may have been difficult to learn, but as of late this is no longer true.

JavaScript Documentation:

Subtle differences in interpretation

This is one of the biggest problems you still see in JavaScript today. You will often find developers writing functions simply to deal with the differences between browsers, and there are even entire libraries dedicated to abstracting away the differences. If I had a dollar for every implementation of a function to get an XMLHttpRequest object across browsers, I wouldn’t need my job.

Unfortunately it is still very important to know the differences between implementations of JavaScript if you plan on writing anything that will run on more than one browser. These differences may be in the features available in the language, in the Document Object Model or in the way the browser handles CSS. Thankfully many people work on documenting the differences and abstracting around them in libraries.

JavaScript Implementation Differences:

JavaScript Libraries:

Random Thought: If only Facebook didn’t get in the way of JavaScript all the time…

Protecting Email with DKIM

One of the problems with email and the protocol used to transfer it (SMTP) is that they were designed long ago when the Internet was a much friendlier place. When SMTP was designed it was assumed that other hosts on the Internet could be trusted. This is particularly visible in the configuration of relays, where the sender doesn’t have to be identified. A mail relay will accept mail from any server regardless of where the mail appears to be coming from.

To attempt to rectify this SPF was created. To set up SPF you add either a TXT or an SPF record to the DNS zone you will be sending from. This record defines which servers are allowed to send mail that is coming from that domain. So on my domain danielhall.me I could publish an SPF record that says only my mail server is allowed to send mail from addresses that end in @danielhall.me. Any mailserver receiving mail that claims to be from my domain but does not come from an address listed in my SPF record can see that the mail is likely forged and throw it away. SPF works well in most situations but fails at a very common use case. If someone I send mail to forwards it to another address using an automatic process (not by clicking forward in their client) then the mail will still appear to come from my domain when it reaches the user it was forwarded to, but it will have come from the original recipient’s mailserver, which is not listed in my SPF record.
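To make the SPF check concrete, here is a deliberately small sketch of how a receiver might evaluate a record. It is an assumption-laden simplification: it only understands the ip4:, ip6: and all mechanisms, it skips everything else (a, mx, include and so on), it does no DNS lookups, and the function name and the example addresses (taken from the ranges reserved for documentation) are hypothetical.

```python
import ipaddress

# Meaning of the qualifier that may prefix each mechanism.
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_result(record, sender_ip):
    """Evaluate the ip4:/ip6:/all mechanisms of an SPF record for one sender IP."""
    ip = ipaddress.ip_address(sender_ip)
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF record")
    for term in terms[1:]:
        qualifier = "+"  # a mechanism without a qualifier means pass
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        if term.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(term[4:], strict=False)
            matched = network.version == ip.version and ip in network
        elif term == "all":
            matched = True
        else:
            continue  # mechanisms this sketch does not implement
        if matched:
            return RESULTS[qualifier]
    return "neutral"  # nothing matched
```

With a record such as "v=spf1 ip4:192.0.2.10 -all", mail arriving from 192.0.2.10 evaluates to pass, while the same mail relayed through a forwarder at 198.51.100.7 evaluates to fail, which is exactly the forwarding problem described above.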

DKIM solves this problem by giving each sending mailserver a cryptographic key pair. The public key is published in a DNS record in that zone and the private key is stored somewhere safe on the server. The server then signs the headers (especially the From: header) and the body of all outgoing emails. This signature is attached to the email as an extra header. When the receiving server gets the email it takes the signature and the list of signed headers and verifies them against the public key of the signing domain. This means that as long as the mail has passed through an authorised mailserver at some point, and the signed parts have not been modified since, it will be considered valid.

Setting up DKIM is a relatively simple process. You will need access to the zone records for your domain and to the configuration of every mailserver that mail originating at your domain passes through. You also need to be aware that signing makes sending an email slightly more processor intensive. If you send a large amount of email this difference could be quite significant. If you’re using sendmail you may be able to alleviate it by switching to a less resource hungry MTA like Exim. You should also note that in some configurations DKIM cannot be set up. For example, if you use masquerading in sendmail DKIM will always fail, as sendmail will modify the From: header after signing.

Ultimately DKIM is a good move for the internet community at large, especially when combined with SPF. Mail signed with DKIM can be cryptographically proven to have passed through a server authorised by the sending domain. While it does take a little more effort to set up and maintain, it gives recipients a way to confirm that mail from your domain really came from you or your company. DKIM can help protect your company against phishing attempts and improve your standing with spam filters.
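One piece of the signing step can be shown without any key material: the body hash that ends up in the signature header. The sketch below assumes the "simple" body canonicalization and SHA-256 as described in RFC 6376; the function names are mine, and the actual signing of the headers with the private key is left out because it needs a crypto library.

```python
import base64
import hashlib


def simple_body_canonicalize(body):
    """Apply "simple" body canonicalization: drop trailing empty lines and
    make sure the body ends with exactly one CRLF."""
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body


def body_hash(body):
    """Return the base64 SHA-256 digest of the canonicalized body,
    i.e. the value a signer would place in the bh= tag."""
    digest = hashlib.sha256(simple_body_canonicalize(body)).digest()
    return base64.b64encode(digest).decode("ascii")
```

A verifier recomputes the same hash over the body it received, so trailing blank lines added in transit do not matter, but any change to the visible content produces a different value and the signature check fails.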

Random thought: What would Email look like if it were designed today?