It turned out that emulating waitpid for threads was more complicated than I initially assumed. Some older Linux kernels also exhibited strange behavior in which the cloned child could execute before the parent did. Fixes for this and a number of other problems went into Systrace 1.6c, which is now also available as a Debian package. I tested this on various 2.4 kernels and distributions and was able to use the ptrace backend to run complicated applications like Firefox and X-Chat. Things look good.