This software is released under the GNU General Public License. Please see the file COPYING for more details.
This manual may be out of date. Please check the Bypass Web Page for the most recent version.
An interposition agent is a small piece of software which transforms a program's operation by placing itself between the program and the operating system. When the program attempts certain system calls, the agent grabs control and manipulates the results.
Interposition agents can be used for many reasons:
An interposition agent created by Bypass can be added to (nearly) any UNIX program at run-time. The receiving program must be dynamically linked, but it need not otherwise be specially prepared for Bypass. Agents created by Bypass have been used with unmodified system tools such as cp, grep, and emacs.
Bypass is a code generator, much like the compiler tools yacc and bison. The programmer provides a specification which lists what system calls are to be trapped and what code is to replace them. Bypass parses the specification and produces C++ source for an agent which implements the programmer's intentions.
Writing interposition agents from scratch is tricky -- each operating system implements its system calls in a slightly different manner, so writing portable code to trap them is quite difficult. Bypass hides all these unfortunate details from the programmer.
In addition to building interposition agents, Bypass can also build split execution systems. A split execution system consists of a matched interposition agent and a shadow process. The interposition agent attaches to a program, traps its system calls, and sends them back to another machine, where the shadow process executes them and returns the results. Under this arrangement, a program can run on any networked machine and yet execute exactly as if it were running on the same machine as the shadow.
This manual describes how to accomplish all of these things using Bypass. We will begin with a simple interposition agent that measures a program's I/O behavior, proceed through several agents which work with external storage systems, and conclude with two simple split execution systems.
Bypass was created by Douglas Thain at the University of Wisconsin. Rajesh Rajamani and Francesco Prelz made valuable contributions to the multithreaded feature. Massimo Sgaravatto was an early and brave debugger. Thank you!
For more information about Bypass, please contact:
The Bypass Web Site
http://www.cs.wisc.edu/condor/bypass
Douglas Thain
thain@cs.wisc.edu
http://www.cs.wisc.edu/~thain
Miron Livny
miron@cs.wisc.edu
http://www.cs.wisc.edu/~miron
The Condor Team
condor-admin@cs.wisc.edu
http://www.cs.wisc.edu/condor
Computer Sciences Department
University of Wisconsin
1210 W. Dayton St.
Madison WI 53706
Unpack the distribution and run the configure program. Here is an example of installing Bypass in /home/fred/bypass using the C-Shell:
% cd /tmp
% gunzip bypass.tar.gz
% tar xvf bypass.tar
% cd bypass
% ./configure --prefix /home/fred/bypass
Bypass runs on a variety of UNIX-like operating systems. If you attempt to run Bypass on an operating system we have not tested, you will be given a warning, but the configuration process will attempt to go ahead. Please understand that Bypass deals with many system-specific low-level details, and we don't expect it will run out of the box with every new operating system. If you try Bypass on a new operating system, we would be happy to hear of your results.
Bypass has run on the following systems:
Bypass requires the tools gcc, g++, bison, flex, make, and perl. The configuration process will check to make sure you have these tools in your path. If you do not have them, you can get them from the Free Software Foundation.
A number of example programs are distributed with Bypass. Some of them make use of other software packages such as Globus and SRB. If you have these packages, you may add them with options to configure like so:
% ./configure --prefix /home/fred/bypass --with-srb-path /usr/local/srb
Bypass will work fine without these optional packages, but some of the example programs will not be built.
After configuring, build the software, add the bin directory to your path, and set the BYPASS_LIBRARY_DIR variable to point to the lib directory. Here is an example of building Bypass using the C-Shell:
% make
% make install
% setenv PATH ${PATH}:/home/fred/bypass/bin
% setenv BYPASS_LIBRARY_DIR /home/fred/bypass/lib
examples/info_agent.so. If you would like to try this example out, skip right to the section on injecting an agent. Come back to this section when you would like to learn how to write and build agents.
read and write system calls to count how many bytes each uses, and traps the exit system call to display a summary message as the program exits. Each trapped system call must be declared and then followed by the code to execute in its place.
The agent_prologue section is optional, and contains any header code required by the user-written code that follows. The prologue in this example includes the standard I/O interface and declares two variables to keep track of the number of bytes transferred.
Note: If you use any preprocessor commands in this section, then you must replace each # with @. For example, #include becomes @include, and #define becomes @define.
Each procedure declaration looks very much like a C procedure declaration: there must be a return type, a name, and the formal parameters. Next comes the agent_action keyword and some C++ code delimited by double braces. A semicolon ends the declaration.
The declaration names what system call is to be replaced and gives the code that is to take its place. The code may be any arbitrary C++ fragment -- it may compute values, use parameters, or even invoke the replaced procedure. Notice in the example above that the new definition of read invokes the original read and then stores the result before returning.
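The counting logic described above can be sketched in plain C++, outside of Bypass. This is not Bypass specification syntax, and it does not trap anything by itself: in a real agent, Bypass generates the code that routes read and write to replacements like these. The names counting_read, counting_write, and summary_message are hypothetical, and the message format is assumed from the sample output shown later in this manual.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical counters, analogous to the two variables declared in the prologue.
static long bytes_read = 0;
static long bytes_written = 0;

// Stand-in for a replacement read: invoke the original, record the result, return it.
ssize_t counting_read(int fd, void *buf, size_t count) {
    ssize_t result = ::read(fd, buf, count);
    if (result > 0) bytes_read += result;   // errors (-1) and EOF (0) add nothing
    return result;
}

// Stand-in for a replacement write, following the same pattern.
ssize_t counting_write(int fd, const void *buf, size_t count) {
    ssize_t result = ::write(fd, buf, count);
    if (result > 0) bytes_written += result;
    return result;
}

// Builds the kind of summary a replacement exit could print before exiting.
std::string summary_message() {
    char line[128];
    int n = std::snprintf(line, sizeof line,
                          "NOTICE: process %d: %ld bytes read, %ld bytes written",
                          static_cast<int>(getpid()), bytes_read, bytes_written);
    assert(n > 0 && n < static_cast<int>(sizeof line));
    return std::string(line);
}
```

Only successful transfers are counted, so a failed call leaves the totals unchanged. Whether the example agent distributed with Bypass treats errors the same way is not stated here.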
Building an Agent
Once you have written a specification, run Bypass to generate source code. Bypass will create a header file and a C++ source file with _agent appended.
NOTE: If you want to use threaded agents or threaded applications, you'll have to compile with the -DUSE_PTHREADS flag.
% bypass -agent info.bypass
-fPIC flag to make sure the object contains position independent code.
% g++ -fPIC -g -I/home/fred/bypass/include -c info_agent.C -o info_agent.o
% g++ -fPIC -g -I/home/fred/bypass/include -c -DUSE_PTHREADS info_agent.C -o info_agent.o
On Linux or Solaris:
% g++ -shared info_agent.o -o info_agent.so -L/home/fred/bypass/lib -lbypass -ldl
% g++ -shared info_agent.o -o info_agent.so -L/home/fred/bypass/lib -lbypass
Note: There are many esoteric operating system errors that you may encounter in this section. If these instructions don't work for you, please consult the frequently asked questions.
To inject an agent into a program, we will instruct the linker to load the agent as a shared library before any other libraries are referenced. This is done on most platforms by setting an environment variable. Again, this process is slightly different on each platform. Here are some examples:
On Linux or Solaris:
% setenv LD_PRELOAD /path/to/info_agent.so
% setenv _RLD_LIST /path/to/info_agent.so:DEFAULT
Now, run any old command that you like -- try something as simple as ls -l:
% ls -l
total 28
drwxr-xr-x 2 thain 23330 2048 Apr 11 14:03 CVS
-rw-r--r-- 1 thain 23330 382 Apr 13 13:35 Makefile
-rw-r--r-- 1 thain 23330 679 Apr 13 13:35 Makefile.config
-rw-r--r-- 1 thain 23330 382 Apr 7 10:35 Makefile.template
-rw-r--r-- 1 thain 23330 2363 Apr 11 13:42 README
drwxr-xr-x 3 thain 23330 2048 Apr 13 13:35 bin
-rwxr-xr-x 1 thain 23330 5600 Apr 9 19:54 configure
drwxr-xr-x 3 thain 23330 2048 Apr 13 13:31 doc
drwxr-xr-x 3 thain 23330 4096 Apr 13 13:35 examples
drwxr-xr-x 3 thain 23330 2048 Apr 11 13:25 lib
drwxr-xr-x 3 thain 23330 4096 Apr 13 13:35 src
NOTICE: process 5657: 297267 bytes read, 703 bytes written
Notice that the program ran as normal, but the interposition agent counted up all the I/O performed and displayed a message just as the application ended. Go ahead and try more complicated programs such as emacs or netscape. You should see results for these programs as well.
To return to normal operation, simply unset the environment variable:
On Linux or Solaris:
% unsetenv LD_PRELOAD
% unsetenv _RLD_LIST
% setenv LD_PRELOAD "/path/to/auto_gass_agent.so /path/to/info_agent.so"
For example, the measurement layer may be placed above or below the Automatic GASS layer. If above, the measurement layer will only record those operations actually attempted by the application. If below, the measurement layer will record all operations performed by the combination of the application and the Automatic GASS layer, including any reads or writes necessary to implement globus_gass_open. You might want to have both -- one above, and one below. However, if you do this, you must make a separate copy of the agent file for each instance; otherwise, the two instances will share the same internal data structures. To combine two measurement agents with the Automatic GASS agent, you might do this:
% cp info_agent.so top_agent.so
% setenv LD_PRELOAD "/path/to/top_agent.so /path/to/auto_gass_agent.so /path/to/info_agent.so"
We will give two examples of split execution systems built using Bypass. The first simply sends the standard I/O operations without modification. The second prevents access to particular files and logs a message for each file opened.
Remote I/O System
This example, io.bypass, is a simple split execution system which traps and forwards the standard UNIX I/O operations. It should already be built for you in the examples directory.
Notice that no agent_action or shadow_action is given for any of the procedures. When no action is given, the default is for the agent to send the call via RPC to the shadow, which then invokes the procedure normally.
Notice also that some annotations have been made to the parameters of open, read, and write. These three procedures have pointer arguments. Pointers are a bit tricky because they refer to some large, variable size amount of data. Bypass must be informed of what direction pointer data must move, and how much data is to be transferred.
The parameter name is given to open to determine what file to open. We know that open will not send back any data when it completes, so we determine that the name data only flows in to the procedure. The parameter is a null-terminated string, which is indicated by the keyword "string".
Every pointer argument must be prefixed with "in", "out", or "in out" to describe which way data flows. "in" indicates that data flows from the agent to the shadow, while "out" indicates that data flows from the shadow to the agent. Following "in" or "out" may be a number of additional constructs:
string indicates this parameter points to a null-terminated string. Data up to and including the terminator will be transferred. A string may also be an out parameter, but then the keyword string must be followed by a quoted expression which must evaluate to the maximum number of bytes available to store the string.
opaque "expr" indicates this parameter points to opaque binary data. "expr" is a C++ expression which must evaluate to the exact number of bytes to transfer.
array "expr" indicates this parameter points to an array of objects of the given type. "expr" is a C++ expression which must evaluate to the number of objects to transfer.
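To make the three constructs concrete, here is a small C++ sketch of how much in-memory data each annotation describes. It is an illustration only: the helper names are hypothetical, and the way Bypass actually encodes this data on the wire is not described in this manual, so the array figure in particular is the size of the objects in memory, not a claim about the transfer format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// "in string": data up to and including the terminator is transferred.
std::size_t in_string_bytes(const char *s) {
    assert(s != nullptr);
    return std::strlen(s) + 1;
}

// "out string "expr"": expr is the maximum number of bytes available
// to store the returned string, terminator included.
std::size_t out_string_capacity(std::size_t expr) {
    return expr;
}

// opaque "expr": expr is the exact number of bytes.
std::size_t opaque_bytes(std::size_t expr) {
    return expr;
}

// array "expr": expr counts objects of the parameter's type, not bytes.
template <typename T>
std::size_t array_bytes_in_memory(std::size_t expr) {
    return expr * sizeof(T);
}
```

The distinction that matters when writing a specification is the unit of "expr": bytes for opaque and for out strings, objects for array.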
Using Split Execution Systems
To build a split execution system, run bypass with both the -agent and -shadow flags. This time, three files will be written: io_agent.C, io_shadow.C, and io.h.
The agent must be compiled against the headers in the include directory and linked against libbypass.a in the lib directory of the Bypass distribution. If your platform requires particular libraries in order to use sockets (such as -lnsl and -lsocket), make sure to link against those, too.
% bypass -agent -shadow io.bypass
% g++ -I/home/fred/bypass/include -c io_agent.C -o io_agent.o
% g++ -shared io_agent.o -L/home/fred/bypass/lib -lbypass -o io_agent.so -ldl -lnsl -lsocket
The shadow consists only of io_shadow.C. It should be compiled into a standalone program and linked with libbypass.a.
% g++ -I/home/fred/bypass/include -c io_shadow.C -o io_shadow.o
% g++ io_shadow.o -L/home/fred/bypass/lib -lbypass -o io_shadow -lnsl -lsocket
To execute the program, the shadow must be started, and then the agent must be instructed where to find the shadow.
First, run the shadow with no arguments. It will display a message indicating the host and port it is running on, and then it will wait for an agent to connect:
% ./io_shadow
setenv BYPASS_SHADOW_HOST www.xxx.yyy.zzz
setenv BYPASS_SHADOW_PORT pppp
The agent must be told the host and port number of the shadow it is to connect to. These are passed by way of environment variables. Handily, the shadow has printed them out in a format which is convenient for cutting and pasting. Paste these into another window, and then run the agent as you normally would.
% setenv BYPASS_SHADOW_HOST www.xxx.yyy.zzz
% setenv BYPASS_SHADOW_PORT pppp
% setenv LD_PRELOAD `pwd`/io_agent.so
% cat /etc/passwd
bypass_agent: Getting configuration from environment...
bypass_agent: Connecting to www.xxx.yyy.zzz port pppp...
bypass_agent: Connection made.
All the input and output for the program will be conducted in the shadow window:
% ./io_shadow
setenv BYPASS_SHADOW_HOST www.xxx.yyy.zzz
setenv BYPASS_SHADOW_PORT pppp
bypass_shadow: Waiting for connection...
root:*:0:1:System Administration:/:/sbin/sh
operator:*:5:5:System Backups:/u/o/p/operator:/bin/tcsh
...
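The agent learns where the shadow is from the two environment variables above. The following C++ sketch shows one way such a lookup could be written. It is an assumption about the general approach, not the code inside libbypass: the ShadowAddress type and both function names are hypothetical, and only the variable names BYPASS_SHADOW_HOST and BYPASS_SHADOW_PORT come from this manual.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical holder for the shadow's location.
struct ShadowAddress {
    std::string host;
    int port;
};

// Parses a decimal TCP port; returns -1 if the text is not a valid port.
int parse_port(const char *text) {
    assert(text != nullptr);
    char *end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || value < 1 || value > 65535) return -1;
    return static_cast<int>(value);
}

// Fills 'out' from the environment; returns false if either variable
// is missing or the port is not usable.
bool read_shadow_address(ShadowAddress &out) {
    const char *host = std::getenv("BYPASS_SHADOW_HOST");
    const char *port = std::getenv("BYPASS_SHADOW_PORT");
    if (host == nullptr || *host == '\0' || port == nullptr) return false;
    int parsed = parse_port(port);
    if (parsed < 0) return false;
    out.host = host;
    out.port = parsed;
    return true;
}
```

Because the configuration travels through the environment, it is inherited by every program started from the same shell, which is why pasting the two setenv lines once is enough for all later commands in that window.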
examples directory.
Each Bypass declaration involves an action on either (or both) the agent and shadow programs. We have already seen code explicitly invoked on the agent side by an agent_action block. We can also control the code executed on the shadow by specifying shadow_action blocks. When no agent_action is given, the default is to invoke a remote procedure call. When no shadow_action is given, the default is to invoke the procedure of the same name. For example, this declaration:
int open( in string const char *path, int flags, [int mode] );
implies these actions:
agent_action {{ return bypass_shadow_open( path, flags, mode ); }}
shadow_action {{ return open( path, flags, mode ); }};
We can explicitly specify these action blocks to create some very powerful code. Let's modify open to create a simple sandbox. If the user attempts to open a file not in the current directory (that is, one whose name contains a slash), we will return a permission error. Otherwise, we will forward the request to the shadow, which will print out a brief notice and open the file.
int open( in string const char *path, int flags, [int mode] )
agent_action {{
    if(strchr(path,'/')) {
        printf("DENIED: agent tried to open %s\n",path);
        errno = EPERM;
        return -1;
    } else {
        return bypass_shadow_open(path,flags,mode);
    }
}}
shadow_action {{
    printf("NOTICE: agent opened %s\n",path);
    return open(path,flags,mode);
}};
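The policy test used by the agent action above can be tried on its own. This C++ sketch isolates just the decision, using a hypothetical helper named check_sandbox; it performs no opening and no RPC.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>

// Returns 0 if the sandbox would forward the request to the shadow.
// Otherwise sets errno to EPERM and returns -1, as the agent action does.
int check_sandbox(const char *path) {
    assert(path != nullptr);
    if (std::strchr(path, '/')) {
        errno = EPERM;
        return -1;
    }
    return 0;
}
```

Note that the test is purely textual. A name such as ./notes.txt refers to a file in the current directory but still contains a slash, so this simple sandbox denies it along with /etc/passwd and ../secret.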
% ./io_shadow
setenv BYPASS_SHADOW_HOST www.xxx.yyy.zzz
setenv BYPASS_SHADOW_PORT pppp
...
If multiple programs are to be run using the same split execution model, it may be more useful to have a server listening on a well-known port and forking off shadows for each incoming connection. This is easily done with command line options to the shadow:
% ./io_shadow -port 50000 -multiprocess
On the other hand, your application may not require that each program use a unique shadow process. In this case, the shadow can listen on a well-known port and simply start a new thread for each connection. (This option is only available if the pthread library was available at build time.) For example:
% ./io_shadow -port 50000 -multithread
Finally, the -debug option will display lots of information about each connection.
% ./io_shadow -debug
Note: The multithreading option is only available on Linux and then only if the pthreads library can be found. If you are building the examples in the Bypass package, the configure program will take care of finding and using the pthreads library. If you are building your own software, you must compile the shadow with -DUSE_PTHREADS and direct the compiler and linker to the pthreads library yourself.
Two authentication methods are currently provided: Globus GSS and Trivial. Globus GSS authentication is more secure, but requires the Globus software and appropriate certificates. Trivial authentication is less secure, but has no special requirements. Both the agent and the shadow may use one or both authentication methods -- when they connect, they will negotiate a mutually acceptable method.
Globus GSS authentication is the recommended authentication mechanism for Bypass. If the Globus software was available when Bypass was built, then it is the default. Globus GSS uses public/private key cryptography to identify the agent to the shadow. The user running the agent is identified in human readable form as an X.509 subject, which looks something like:
/C=US/O=Bedrock Township/OU=Construction Services/CN=Fred Flintstone
fred@construction.bedrock.gov
Authorization is performed by looking for the subject name in an authorization file. The -authfile option specifies the path to an authorization file you can create. This file simply lists subject names, one to a line. An asterisk may be used as a wildcard to match several subjects. An example authorization file might be:
/C=US/O=Bedrock Township/OU=Construction Services/CN=Fred Flintstone
/C=US/O=University of Petonkwa/OU=Computer Sciences/*
fred@construction.bedrock.gov
*@administration.bedrock.gov
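The effect of such a file can be sketched in C++ as follows. This is a guess at the behavior, not the shadow's real implementation: the manual does not say how the asterisk is matched, so the sketch assumes shell-style matching through the POSIX fnmatch function, in which * may also match the / characters inside an X.509 subject. The function names are hypothetical.

```cpp
#include <cassert>
#include <fnmatch.h>
#include <sstream>
#include <string>
#include <vector>

// Splits the text of an authorization file into one pattern per line,
// skipping blank lines.
std::vector<std::string> parse_authfile(const std::string &text) {
    std::vector<std::string> patterns;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) patterns.push_back(line);
    }
    return patterns;
}

// Returns true if the subject matches any pattern. With flags == 0,
// fnmatch lets '*' match any run of characters, including '/'.
bool is_authorized(const std::vector<std::string> &patterns,
                   const std::string &subject) {
    assert(!subject.empty());
    for (const std::string &pattern : patterns) {
        if (fnmatch(pattern.c_str(), subject.c_str(), 0) == 0) return true;
    }
    return false;
}
```

Under this assumption, the example file admits Fred by exact name, anyone in the Computer Sciences unit of the University of Petonkwa, and any trivial identity at administration.bedrock.gov.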
Important Security Note: If you do not specify an authorization file, it is assumed you are willing to accept any connection. I do not recommend doing this in a production environment. Use -authfile to control the allowed connections.
Note: The Globus option is only available if the Globus libraries can be found. If you are building the examples in the Bypass package, the configure program will take care of finding and using them. If you are building your own software, you must compile the agent and shadow with -DUSE_GLOBUS_GSS and direct the compiler and linker to the various and sundry Globus libraries yourself.
#, and commands for the second pass should begin with @.
For example, you may want to use the preprocessor to manage what code is included, but the code itself may also require the preprocessor:
#ifdef sun
int exit( int status )
agent_action {{
    @define SUCCESS -1
    exit(SUCCESS);
}};
#else
int exit( int status )
agent_action {{
    @define SUCCESS 0
    exit(SUCCESS);
}};
#endif
A trailing argument enclosed in brackets indicates that the procedure takes a variable argument list, but that any call to the procedure should be treated as supplying one argument with the bracketed name and type.
A replacement for fcntl might be declared like this:
int fcntl( int fd, int command, [void *arg] );
type : [unsigned] [const] [struct] type-name star-list
star-list : /* nothing */
          | [const] '*' star-list
BYPASS_SHADOW_HOST
BYPASS_SHADOW_PORT
BYPASS_DEBUG
BYPASS_FAILURE_PASSTHROUGH
This default behavior is chosen for two reasons. First, re-establishing the connection and rebuilding any state that was accumulated at the shadow is beyond the power of the application. Second, Bypass forces an abnormal termination (killed by signal) so that the scheduling system does not assume the application exited normally.
The modified behavior may be useful to some agents (such as the Grid Console) which may have enough information to trap and retry such errors.
int bypass_shadow_*( ... );
bypass_shadow prepended. For example, to invoke a remote open, invoke bypass_shadow_open with the same arguments as open. This may be done inside of any agent action.
void bypass_debug( char *fmt, ... );
void bypass_error( char *fmt, ... );
void bypass_die();
int bypass_failure_passthrough;
Bypass generates code for a variety of UNIX-like platforms. On each platform, there are many tricky details to trapping and invoking each of the system calls. Bypass collects all these details together in a knowledge file, lib/bypass_knowledge. When a user requests that a call be trapped, Bypass consults the knowledge file and generates several pieces of code for each of the user's declarations.
The knowledge file has the same syntax as a regular Bypass input file, but it makes heavy use of option rules. An option rule lists the tricky details needed for a particular system call. An option rule by itself does not generate any code -- it only specifies options in case the user wants to trap the named procedure.
For example, the option statement for read is:
options "read"
    entry "_read", "__read"
    syscall
    local_name "read"
    remote_name "read"
;
entry indicates that trapping read also involves catching the related _read and __read. syscall indicates that read is a true system call (as opposed to a standard library call). local_name and remote_name give the names of the procedures to invoke when operating locally or via RPC. These are almost always the same as the regular procedure.
Option rules tend to be very similar to the example above -- only a few break the pattern. So, rules may use wildcards which specify the options for a whole class of system calls. For example, the first entry in the knowledge file looks something like this:
options "*"
    entry "_*", "__*"
    local_name "*"
    remote_name "*"
    syscall
    library "libc"
;
This statement indicates that any system call will get the rules mentioned, with the call's name substituted for each occurrence of "*".
A call may match several option rules. For example, fstat would match entries named "*", "f*", and "fstat". If this happens, the rules are applied in the order they appear in the knowledge file.
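The matching and substitution just described can be sketched in C++. This is an illustration under a simplifying assumption, namely that a rule name contains at most one "*". It is not the code Bypass uses to read the knowledge file, and all three function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True if 'name' matches 'pattern', where the pattern is either a literal
// name or contains a single '*' standing for any run of characters.
bool rule_matches(const std::string &pattern, const std::string &name) {
    assert(!pattern.empty());
    std::string::size_type star = pattern.find('*');
    if (star == std::string::npos) return pattern == name;
    const std::string prefix = pattern.substr(0, star);
    const std::string suffix = pattern.substr(star + 1);
    if (name.size() < prefix.size() + suffix.size()) return false;
    return name.compare(0, prefix.size(), prefix) == 0 &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Substitutes the call's name for each occurrence of '*' in an option value,
// so that "__*" applied to "read" yields "__read".
std::string expand(const std::string &value, const std::string &name) {
    std::string out;
    for (char c : value) {
        if (c == '*') out += name;
        else out += c;
    }
    return out;
}

// Returns the matching rule names in the order they appear in the file,
// which is the order in which they would be applied.
std::vector<std::string> matching_rules(const std::vector<std::string> &rules,
                                        const std::string &name) {
    std::vector<std::string> matched;
    for (const std::string &rule : rules) {
        if (rule_matches(rule, name)) matched.push_back(rule);
    }
    return matched;
}
```

Because rules are applied in file order, a later and more specific rule such as "fstat" is in a position to refine whatever the general "*" rule established first.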
The allowed statements in the option rules are:
syscall
libcall
library "library-name"
.a or .so.
plain
kill
entry "name1", "name2", ...
local_name "name"
remote_name "name"
switch_code {{ code-fragment }}
indirect "name"
syscall(SYS_socketcall,SYS_socket,...).
instead <procedure-decl>
also {{ code-fragment }}
In addition to the usual declaration, add on this bit of code.
The knowledge file is heavily commented with the reasons behind each unusual system call. The adventuresome reader should skip right to the knowledge file to learn all the dirty details.
The standard injection method requires that the accepting program be dynamically linked. This is true of most programs on modern UNIX systems. On a few systems, critical programs (such as cp and ls) are statically linked so that they may be used without the standard library present. Statically linked programs will ignore any interposition agents. On some systems, you can use the ldd program to determine if a program is dynamically linked.
In particular, many of the standard IRIX utilities are statically linked. However, the GNU utilities, provided in /usr/gnu/bin, are dynamically linked and should give you the behavior you want.
Most compilers produce relocatable object code by default. This kind of code is annotated with relocations which tell the linker how to rewrite bits of the code as it is placed in memory. This method is just fine for static linking.
Some operating systems do not allow relocatable code to be used in a dynamic library, because processing all those relocations would result in a very long startup time for a simple application. So, any code that gets placed in a dynamic library must be position independent code -- all references in the code use pc- or base-relative addresses that do not require any relocations. This allows objects to be re-arranged at run-time without a stiff penalty.
The upshot of this is that all code that goes into an interposition agent must be position-independent. This is easy to get -- you just compile with the -fPIC flag. Any additional libraries (such as Globus or SRB) linked against the interposition agent must also be compiled as position-independent.
If you have already built these packages and didn't specify the -fPIC flag at compilation time, then I'm afraid you'll have to re-build them from scratch with -fPIC enabled.
Text relocation remains                     referenced
    against symbol              offset      in file
ASN1_UTCTIME_set                0x740       /p/condor/workspaces/ssl/lib/libcrypto.a(x509_vfy.o)
... (followed by about a million similar lines) ...
This error means you didn't compile all the various libraries and objects in your agent as position independent code. Please see the preceding question for more information.
This warning says that your interposition agent defined a system call that was already defined in the standard library. In most programs, this indicates a bug, but for our purposes, this is exactly what you want -- your interposition agent is replacing a standard system call. You may safely ignore these warnings.
The situation on IRIX is a little complicated. IRIX currently has three binary program models -- o32, n32, and n64. A program built for one model can only be used with libraries of the same model. This message means that the program you are trying to run does not have the same binary model as the interposition agent, so you cannot inject the agent into it. You need to rebuild either the application or the agent so that they have the same model. Consult the documentation for your compiler to see exactly how to do this.
To the best of my knowledge, you can't do so by simply setting an environment variable. It may be possible to do so by creating a separate program which uses the /proc interface to load a program, suspend it, and inject the interposition agent. If you come up with a better method, we would be happy to document it here.
Many of the ideas and techniques used in Bypass were inspired by similar features in Condor. Bypass does not require Condor, nor does Condor use Bypass. They are separate programs.