Category Archives: Programming

System Calls

What are System Calls?

Systems calls are the interface between userspace and kernelspace. They are essentially procedures belonging to the operating system that you can call when you want something done that can’t be done in userspace. This includes anything system related, accessing devices, files, managing processes and networking. On my 64bit machine with a 3.5.4 kernel there are 344 system calls (cat /usr/src/linux/arch/x86/syscalls/syscall_64.tbl | egrep -v ‘^#’ | egrep -v ‘^$’ | wc -l) .

Calling procedures is generally a simple affair, just push the current context onto the stack and jump to the new address. Calling a system call on the other hand is quite a bit more difficult. The kernel executes in a higher privileged mode on the CPU and userspace executes on a lower privilege mode. To transfer through to the higher privilege mode a CPU exception is triggered, which then launches in the higher mode, however as you can’t pass arguments through this mechanism they are placed in the CPU registers before the exception. Newer CPUs have implemented SYSENTER/SYSEXIT calls (or similar) that work differently and are designed to speed up the process.

You can see the system calls that a process makes while it is running by using strace. Strace runs a program in such a way that it can intercept all the system calls it makes and gather details about them. Watching a program’s system calls is a good way to see what it is doing, if its hanging on IO, if it is stuck in an infinite loop or if it has deadlocked.

Debugging with strace

Interpreting the output of strace can be daunting, the average binary does many things on startup and shutdown that you’ve probably never coded. Lets dissect an example of stracing the date command:

execve("/bin/date", ["date"], [/* 63 vars */]) = 0
brk(0)                                  = 0xabd000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febdf000

You can see here the initial execution of the date binary. This is done with the execv call, which asks the operating system to replace the current process with the binary you provided and begin execution.

You can see the next thing that happens is a call to brk(0), this is a way to grab the processes current program break. The program break is the location of the end of the data segment (actually its the first location after), moving the program break up has the effect of allocating more memory to the process, and moving it down has the effect of deallocating it. This is the way malloc allocates memory.

Then mmap is called to allocate 4096 bytes at a location of the kernels choosing (addr is NULL). Reads and writes are allowed, the memory is anonymous (not backed by a file) and private to this process. This initial memory is associated with the loading of dynamic libraries by the dynamic linker.

access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=143007, ...}) = 0
mmap(NULL, 143007, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f13febbc000
close(3)                                = 0

So this is the dynamic linker running trying to find all the libraries that out program requires. The first thing it does is try to load any libraries that may be listed in /etc/ld.so.preload, this is the same way the LD_PRELOAD environment variable works, except backed by a file.

Next it opens /etc/ld.so.cache, checks its file attributes, maps the file to memory, then closes the file. This file is created by the command ldconfig based on the file /etc/ld.so.conf and other files it includes (its common to include /etc/ld.so.conf.d/*.conf). The /etc/ld.so.cache file contains, in machine readable format, a list of all the libraries in the previously mentioned files and their location.

open("/lib64/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\"\240\2356\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=48128, ...}) = 0
mmap(0x369da00000, 2128984, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x369da00000
mprotect(0x369da07000, 2093056, PROT_NONE) = 0
mmap(0x369dc06000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x369dc06000
close(3)                                = 0

Here we are loading the librt library. The librt library is a part of glibc which provides realtime extensions. You can see here that the open call returns with file descriptor 3. Then in the mmap call you can see it mapping the library with PROT_READ and PROT_EXEC permissions, keeping it private and denying writes. Loading things this way means that if another program loads the same library it can share the same page in physical ram. It then protects a portion of that libraries memory from any read or write. It maps another section of the file as read write, but not written to the file, and finally closes the file descriptor.

open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\27\242\2346\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2076800, ...}) = 0
mmap(0x369ca00000, 3896632, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x369ca00000
mprotect(0x369cbad000, 2097152, PROT_NONE) = 0
mmap(0x369cdad000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ad000) = 0x369cdad000
mmap(0x369cdb3000, 17720, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x369cdb3000
close(3)                                = 0

This is basically the same thing as the above library, except it is the libc library. The libc library provides the methods specified in the ANSI C and POSIX C libraries.

open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320k\340\2346\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=145176, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febbb000
mmap(0x369ce00000, 2208760, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x369ce00000
mprotect(0x369ce17000, 2093056, PROT_NONE) = 0
mmap(0x369d016000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x369d016000
mmap(0x369d018000, 13304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x369d018000
close(3)                                = 0

Again, another library being loaded into memory. This time its the libpthread library, which provides POSIX thread support.

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febba000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febb9000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febb8000

Here we are mapping three pages of memory using mmap. This memory is not backed by a file and is private to the process.

arch_prctl(ARCH_SET_FS, 0x7f13febb9700) = 0
mprotect(0x60d000, 4096, PROT_READ)     = 0
mprotect(0x369dc06000, 4096, PROT_READ) = 0
mprotect(0x369cdad000, 16384, PROT_READ) = 0
mprotect(0x369d016000, 4096, PROT_READ) = 0
mprotect(0x369c821000, 4096, PROT_READ) = 0
munmap(0x7f13febbc000, 143007)          = 0
set_tid_address(0x7f13febb99d0)         = 4110
set_robust_list(0x7f13febb99e0, 24)     = 0
rt_sigaction(SIGRTMIN, {0x369ce06720, [], SA_RESTORER|SA_SIGINFO, 0x369ce0f500}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x369ce067b0, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x369ce0f500}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0

These lines are likely the setup of the threading for the application. arch_prctl is being used to set the FS register, which the pthread library uses to store a pointer to a struct containing the thread local information for that thread. mprotect is being called on all the memory that above was made protected from both reads and writes to make it read only. Some memory is unmapped to return it to the system with munmap. set_tid_address is called to set the tread id, and get the pid of the process (4110).

We then see a bunch of calls to rt_sigaction and a rt_sigprocmask which is basically plumbing all the signals needed for threading. Finally getrlimit is used to see that the limits on stack sizes are.

brk(0)                                  = 0xabd000
brk(0xade000)                           = 0xade000
brk(0)                                  = 0xade000

Malloc() isn’t a system call, instead it uses other calls such as brk() and mmap() to create memory. Because a calls to these methods are expensive libc batches them up. So malloc() when initially called allocates a bigger chunk of memory than you asked for, and as you ask for further bits it give you chunks of the same memory you allocated. What you can see here is more than likely a malloc() call checking the current program break (the end of the heap), then moving it ahead (to increase the heap size) then checking where it is again.

open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=104997456, ...}) = 0
mmap(NULL, 104997456, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f13f8795000
close(3)                                = 0

So all the library loading and setup is now done. What we are doing here is loading the locales file. This will be done by most things that use libc, but its particularly important for the date command as date formatting changes drastically in different locales. Basically we open a file descriptor, stat it (usually to find size, permissions and such), map the file into memory, then closes the file descriptor. Because mmap has been used to get the file into memory we don’t need to use the read() and write() calls to access it. This will prevent an expensive system call, but the first time you access each page of the data a page fault will be generated, which could be just as expensive.

open("/etc/localtime", O_RDONLY)        = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2183, ...}) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=2183, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febde000
read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"..., 4096) = 2183
lseek(3, -1398, SEEK_CUR)               = 785
read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\5\0\0\0\0"..., 4096) = 1398
lseek(3, 2182, SEEK_SET)                = 2182
close(3)                                = 0
munmap(0x7f13febde000, 4096)            = 0

Here we can see date is opening the /etc/localtime file. This file contains information about the current timezone (it is often a symlink to the correct file). You can see here, the file is opened, read, a couple of seeks happen (to find particular entries) and the file is read again from the new point. Finally having retrieved the required information from the file, it is closed and its memory is unmapped.

fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f13febde000</pre>
write(1, "Thu Sep 27 16:48:57 EST 2012\n", 29Thu Sep 27 16:48:57 EST 2012
 ) = 29

At this point the date command has all that is required to calculate the date, its text representation and display it. You’ll notice that there has been no call to the operating system to fetch the time, this is because modern x86 machines include a special method called vsyscalls that allows userspace programs to make certain system calls without ever context switching. One of the system calls implemented via this method is the one to get the current unix timestamp, so date doesn’t need to call into kernelspace for this.

close(1)                                = 0
munmap(0x7f13febde000, 4096)            = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

STDOUT and STDERR are closed and the process group is ended. The date command is now finished.

Programming In Javascript

Javascript is an interesting language. Its partly a functional programming language and part object oriented. It uses a C style syntax but borrows its naming conventions from Java (mostly). Personally I find Javascript language to be one of the most interesting languages that I have played with. The complaints I hear most often regarding Javascript are that it is very hard to learn and that there are many subtle differences between the interpreters.

Difficult to learn

This used to be mostly true. Javascript was a poorly documented language, often only documented in tutorial form by w3schools, or technically documented as ECMAScript. The absolute wealth of tutorials and blog posts made the good information few and far between. Largely when looking for information on how to perform a particular function you had to download some sample code and figure out how it was done based on that.

More recently though Javascript has caught the wave that is trying to standardize the web and this has somewhat improved the situation. Browser manufacturers are documenting their Javascript implementations and largely converging on a common standard. Additionally many helper libraries have been introduced to make the task of working on Javascript even easier. Once Javascript may have been difficult to learn, but as of late this is no longer true.

Javascript Documentation:

Subtle differences in interpretation

This is is one of the biggest problems you still see in Javascript today. You will often find developers writing functions to simply deal with the differences between browsers, there are even entire libraries dedicate to to abstracting away the differences. If I had a dollar for every implementation of a function to get a XMLHttpRequest object across browsers, I wouldn’t need my job.

Unfortunately it is still however very important to know the differences between implementations of Javascript if you plan on writing anything that will run on more than one browser. These difference may be in the features available in the language, in the Document Object Model or in the way the browser handles CSS. Thankfully many people work on documenting the difference and abstracting around them in libraries.

Javascript Implementations Differences:

Javascript Libraries:

Random Thought: If only Facebook didn’t get in the way of Javascript all the time…

Cross-Domain AJAX

When making an xmlhttprequest from a website the browser will restrict you to the site from which the script came. This is a security precaution. If sites were able to tell the browser to make requests from other domains then they would be able to DDOS a site with a users browser. There are legitimate reasons to make requests to other sites though.

Many sites offer web services, xml data and json encoded data. These can provide almost anything from the weather, to search results, to advanced APIs. To use these services from your site using javascript you’ll have to employ one of the methods below.

Signing Javascript

Firefox allows you to sign your Javascript and place it in a jar file. This will give your code more privileges, You can also request these permissions explicitly without having your code signed, but having a dialog box appear for every AJAX request could get very tiring for the user. Another problem with this approach is that it isn’t documented very well and its Firefox specific. The first link in the references section deals with this method.

Access-Control Headers

This is the w3 approved method of allowing a client from another domain to access your web service. It is a server side method and requires no changes on the client to implement. This is both and advantage and a disadvantage. If you have control over the server then this method is simple, otherwise (for sites such as Yahoo API or other public services) you will not be able to implement this. It should also be noted that this was implemented in Firefox 3.5 so it can’t be used with earlier versions, or other browsers.

To use this method you tell your service to output extra headers that tell the browser whether access was allowed or denied.

Flash Enabled xmlhttprequest

This method involves using an invisible flash player to perform the actual request then handing the result back to the Javascript for processing. Flash still performs permission checking by looking for a /crossdomain.xml file in the root directory of the domain the request is being made to. There are several libraries that implement this approach and a few even implement in a way which is compatible with xmlhttprequest. One downside is this Flash is required, though recently Flash is required for several major sites and most browsers will have it installed.

Add Sites To Trusted Zone

Internet Explorer allows and denies cross-domain based xmlhttprequests based on the security setting. This approach is likely not going to be used on the Internet as it requires user interaction and is Internet Explorer specific. On a corporate Intranet this is slightly less difficult but not by much.

Apache mod_proxy

With this method you use the same server you shared the page from to proxy the requests automatically to the server with the data you’re fetching. For this to work your version of Apache has to be compiled with proxy support or you need to have the mod_proxy dso loaded. This method increases the latency of requests as they must first go via your server. It should also be noted that this cannot be implemented in .htaccess file and must be done in the main configuration.

Manual Proxy

If you don’t have control over your servers configuration then you can mimic the above method by writing a script that forwards the variables required and forwards back the data. This approach can even be more preferable than the above method as it allows you to preprocess the variables and cache the data if required.

References

http://www.mozilla.org/projects/security/components/signed-scripts.html

http://dev.w3.org/2006/waf/access-control/

http://developer.yahoo.com/javascript/howto-proxy.html

https://developer.mozilla.org/En/HTTP_Access_Control

http://ejohn.org/blog/cross-site-xmlhttprequest/

http://ajaxpatterns.org/XMLHttpRequest_Call

http://ajaxpatterns.org/Flash-enabled_XHR

Random Thought: Can you use AJAX to make web applications cleaner?

Writing a Daemon in C

What is a Daemon?

A daemon is a program that runs in the background. A daemon will usually be started at system startup and end at system shutdown. The exceptions to this rule are programs like the Bluetooth SDP daemon, which is activated when a new Bluetooth HCI is found,, and ends when it is removed. Daemons run transparently and do not normally interact with the user directly.

Daemons start as ordinary processes but they eventually ‘fork and die’ to start running in the background. Some daemons do only the ‘fork and die’ step but ignore other important steps. Here is a list of what a daemon should do:

  1. Fork to create a child, and exit the parent process.
  2. Change the umask so that we aren’t relying on the one set in the parent.
  3. Open logs to write to in the case of an error.
  4. Create a new session id and detach from the current session.
  5. Change the working directory to somewhere that won’t get unmounted.
  6. Close STDIN, STDOUT and STDERR.

These steps ensure that our association with the calling environment is destroyed and our daemon is now free to run as a completely separate process.

Lastly before writing the daemon you should make sure the code is written securely and in a way that fails gracefully. If your daemon crashes it will not be able to prompt the user about what action to take. The user may not even notice until it is too late.

Forking a child process

In Unix fork() is the only system call with two return values. When you call fork a child process is created which is a near copy of its parent (some things will be different in the child eg. process id). The fork command then returns a 0 in the child and the childs process id in the parent, on failure a -1 is sent to the parent. Generally a program will then check whether it is the child or parent by these return values (just like in movies when a cloned character will check to see if he has a belly button and hence is the original). Here is a snippet of code to do this:

pid_t pid;

/* Clone ourselves to make a child */   
pid = fork(); 

/* If the pid is less than zero,
   something went wrong when forking */
if (pid < 0) {
    exit(EXIT_FAILURE);
}

/* If the pid we got back was greater
   than zero, then the clone was
   successful and we are the parent. */
if (pid > 0) {
    exit(EXIT_SUCCESS);
}

/* If execution reaches this point we are the child */

Changing the umask

Because we are a clone of our parent we’ve inherited its umask. This means the child doesn’t know what permissions files will end up with when it tries to create them. We do this by simply calling umask like this:

/* Set the umask to zero */
umask(0);

Open logs to write to

This part can be done in several different ways. You could open text files, log to a database or use syslog. The method I’m going to demonstrate here is to log using syslog. Syslog sends your log messages to a system wide logger, where they can be configured to be written to a file, send to a network server or filtered away entirely.

/* Open a connection to the syslog server */
openlog(argv[0],LOG_NOWAIT|LOG_PID,LOG_USER); 

/* Sends a message to the syslog daemon */
syslog(LOG_NOTICE, "Successfully started daemon\n"); 

/* this is optional and only needs to be done when your daemon exits */
closelog();

Create a new session id

Each process on a Unix system is a member of a process group (or session). The id of each group is the process id of its owner. When we forked from our parent earlier we will have inherited its process group, and our process group leader will still be its parent process. We want to create our own process group and become our own process leader otherwise we will look like an orphan. We can do this easily as follows:

pid_t sid;

/* Try to create our own process group */
sid = setsid();
if (sid < 0) {
    syslog(LOG_ERR, "Could not create process group\n");
    exit(EXIT_FAILURE);
}

Changing the working directory

At the moment we have the working directory we inherited from our parent. This working directory could be a network mount, a removable drive or somewhere the administrator may want to unmount at some point. To unmount any of these the system will have to kill any processes still using them, which would be unfortunate for our daemon. For this reason we set our working directory to the root directory, which we are sure will always exist and can’t be unmounted.

/* Change the current working directory */
if ((chdir("/")) < 0) {
    syslog(LOG_ERR, "Could not change working directory to /\n");
    exit(EXIT_FAILURE);
}

Closing the standard file descriptors

A daemon doesn’t interact with the user directly it has no use for STDIN, STDOUT and STDERR and we really have no idea where these are connected or where anything we write to them will end up. As these file descriptors are not required and effectively useless we should close them to save some system resources and prevent any related security problems. We close these descriptors like this:

/* Close the standard file descriptors */
close(STDIN_FILENO);
close(STDOUT_FILENO);
close(STDERR_FILENO);

Writing the payload

Now you have a C program that is capable of becoming a daemon, but its a pretty useless daemon if it exits immediately. Payload code is really up to you to design. I’ll offer you a few tips on designing your payload.

  • Put your payload in a loop. Generally in a daemon you want to perform the same action over and over again until you’re killed. If you have to cleanup (such as closing syslog) when the daemon is about to be killed you should add an exit clause that will be activated by a SIGTERM signal handler.
  • Make your code as fast an efficient as possible. This is something you should do with any program, but with daemons it is important that you do not hamper the performance of the rest of the system. This is especially true if you’re going to be running this daemon on desktop systems.
  • Be aware that your code may be preempted very often. As your daemon is going to be running for the amount of time the system is up, it is likely that its execution will be preempted.
  • Be paranoid about security. Daemons are common attack vectors and can be used to gain privileged access to a system. You should consider dropping any privileges that you don’t require.

Conclusion

So if we take all the code I’ve mentioned in this post and put it all together you have a simple daemon. You can download the source from the link here: daemon.c.
If your daemon is only going to be run on Linux and not on a System V style system such as Solaris you can use the daemon function to do a lot of this work for you.

References

Linux Daemon Writing HOWTO in C
Linux Daemon writing in C++

Random Thought: It appears the devil uses a Unix based OS, probably OSX.