Bypass Manual

Release 2.5.3 7-August-2002

Front Matter

Bypass is Copyright (C) 1999-2002 Douglas Thain.

This software is released under the GNU General Public License. Please see the file COPYING for more details.

This manual may be out of date. Please check the Bypass Web Page for the most recent version.

Table of contents

  • Overview
  • Getting started
  • Beginning Example
  • Writing an Agent
  • Building an Agent
  • Injecting an Agent
  • Layering Agents
  • Example of Layering
  • Layering Rules
  • Layer Order Matters
  • Split Execution Systems
  • Remote I/O System
  • Using Split Execution Systems
  • Controlled I/O System
  • Shadow Options
  • Security
  • Notes for Wizards
  • Frequently Asked Questions

    Overview

    Bypass is a tool for writing interposition agents.

    An interposition agent is a small piece of software which transforms a program's operation by placing itself between the program and the operating system. When the program attempts certain system calls, the agent grabs control and manipulates the results.

    Interposition agents can be used for many reasons:

  • To measure and debug a program.
  • To give new capabilities to an old program.
  • To attach programs to new storage systems.
  • To emulate one system while using another.
    An interposition agent created by Bypass can be added to (nearly) any UNIX program at run-time. The receiving program must be dynamically-linked, but it need not otherwise be specially prepared for Bypass. Agents created by Bypass have been used with unmodified system tools such as cp, grep, and emacs.

    Bypass is a code generator, much like the compiler tools yacc and bison. The programmer provides a specification which lists what system calls are to be trapped and what code is to replace them. Bypass parses the specification and produces C++ source for an agent which implements the programmer's intentions.

    Writing interposition agents from scratch is tricky: each operating system implements its system calls in a slightly different manner, so writing portable code to trap them is quite difficult. Bypass hides all of these unfortunate details from the programmer.

    In addition to building interposition agents, Bypass can also build split execution systems. A split execution system consists of a matched interposition agent and a shadow process. The interposition agent attaches to a program, traps its system calls, and sends them back to another machine, where the shadow process executes them and returns the results. Under this arrangement, a program can run on any networked machine and yet execute exactly as if it were running on the same machine as the shadow.

    This manual describes how to accomplish all of these things using Bypass. We will begin with a simple interposition agent that measures a program's I/O behavior, proceed through several agents which work with external storage systems, and conclude with two simple split execution systems.

    Bypass was created by Douglas Thain at the University of Wisconsin. Rajesh Rajamani and Francesco Prelz made valuable contributions to the multithreaded feature. Massimo Sgaravatto was an early and brave debugger. Thank you!

    For more information about Bypass, please contact:

    The Bypass Web Site
    http://www.cs.wisc.edu/condor/bypass

    Douglas Thain
    thain@cs.wisc.edu
    http://www.cs.wisc.edu/~thain

    Miron Livny
    miron@cs.wisc.edu
    http://www.cs.wisc.edu/~miron

    The Condor Team
    condor-admin@cs.wisc.edu
    http://www.cs.wisc.edu/condor

    Computer Sciences Department
    University of Wisconsin
    1210 W. Dayton St.
    Madison WI 53706


    Getting started

    To begin, unpack the Bypass distribution into a scratch directory and run the configure program. Here is an example of installing Bypass in /home/fred/bypass using the C-Shell:
    % cd /tmp
    % gunzip bypass.tar.gz
    % tar xvf bypass.tar
    % cd bypass
    % ./configure --prefix /home/fred/bypass
    

    Bypass runs on a variety of UNIX-like operating systems. If you attempt to run Bypass on an operating system we have not tested, you will be given a warning, but the configuration process will attempt to go ahead. Please understand that Bypass deals with many system-specific low-level details, and we don't expect it will run out of the box with every new operating system. If you try Bypass on a new operating system, we would be happy to hear of your results.

    Bypass has run on the following systems:

  • SPARC Solaris 2.5.1, 2.7, 2.8
  • Intel Solaris 2.6
  • Intel Linux 2.0, 2.2 with GNU libc
  • Intel Linux 2.0 with libc5
  • MIPS IRIX 6.2, 6.5
  • Alpha OSF/1 4.0
    Bypass requires the tools gcc, g++, bison, flex, make, and perl. The configuration process will check to make sure you have these tools in your path. If you do not have them, you can get them from the Free Software Foundation.

    A number of example programs are distributed with Bypass. Some of them make use of other software packages such as Globus and SRB. If you have these packages, you may enable them with options to configure, like so:

    ./configure --prefix /home/fred/bypass --with-srb-path /usr/local/srb
    
    Bypass will work fine without these optional packages, but some of the example programs will not be built.

    After configuring, build the software, add the bin directory to your path, and set the BYPASS_LIBRARY_DIR variable to point to the lib directory. Here is an example of building Bypass using the C-Shell:

     % make
     % make install
     % setenv PATH ${PATH}:/home/fred/bypass/bin
     % setenv BYPASS_LIBRARY_DIR /home/fred/bypass/lib
    

    Beginning Example

    This beginning example was automatically built for you as examples/info_agent.so. If you would like to try this example out, skip right to the section on injecting an agent. Come back to this section when you would like to learn how to write and build agents.

    Writing an Agent

    This first example, info.bypass, is a simple agent which measures the I/O performed by an application. It traps the read and write system calls to count how many bytes each one transfers, and traps the exit system call to display a summary message as the program exits. Each trapped system call must be declared and then followed by the code to execute in its place.

    The agent_prologue section is optional and contains any header code required by the user-written code that follows. The prologue in this example includes the standard I/O interface and declares two variables to keep track of the number of bytes transferred.

    Note:
    If you use any preprocessor commands in this section, then you must replace the #s with @s. For example, #include becomes @include, and #define becomes @define.

    Each procedure declaration looks very much like a C procedure declaration: there must be a return type, a name, and the formal parameters. Next comes the agent_action keyword, followed by some C++ code delimited by double braces. A semicolon ends the declaration.

    The declaration names the system call to be replaced and gives the code that is to take its place. The code may be any arbitrary C++ fragment: it may compute values, use parameters, or even invoke the replaced procedure. In info.bypass, for example, the new definition of read invokes the original read and then records the result before returning.
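    As a rough illustration of the bookkeeping that the read, write, and exit replacements in info.bypass perform, here is a minimal C++ sketch. It is not the distributed info.bypass and not Bypass-generated code: the names counted_read, counted_write, and print_summary are hypothetical, and the "original" calls are simply ::read and ::write rather than the trapped entry points Bypass would generate.

```cpp
// Illustration only: the counting an info-style agent does around read/write.
// Bypass would generate the trapping glue; here the originals are ::read/::write.
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sys/types.h>
#include <unistd.h>

static long long bytes_read = 0;     // total bytes returned by successful reads
static long long bytes_written = 0;  // total bytes accepted by successful writes

// Invoke the original read, record the result, and return it unchanged.
ssize_t counted_read(int fd, void *data, size_t length) {
    ssize_t result = ::read(fd, data, length);
    if (result > 0) bytes_read += result;  // errors (-1) and end-of-file (0) add nothing
    return result;
}

// Same pattern for write.
ssize_t counted_write(int fd, const void *data, size_t length) {
    ssize_t result = ::write(fd, data, length);
    if (result > 0) bytes_written += result;
    return result;
}

// What the exit replacement would print before really exiting.
void print_summary(FILE *out) {
    std::fprintf(out, "NOTICE: process %d: %lld bytes read, %lld bytes written\n",
                 static_cast<int>(getpid()), bytes_read, bytes_written);
}
```

    Writing five bytes into a pipe with counted_write and reading them back with counted_read leaves both counters at 5; print_summary then emits a line in the same format as the NOTICE message in the ls -l example.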

    Building an Agent

    Once you have written a specification, run Bypass to generate source code. Bypass will create a header file (info.h) and a C++ source file whose name has _agent appended (info_agent.C).
    NOTE: If you want to use threaded agents or threaded applications, you'll have to compile with the -DUSE_PTHREADS flag.
    % bypass -agent info.bypass  
    
    Compile the source into an object file. You will need to use the -fPIC flag to make sure the object contains position independent code.
    % g++ -fPIC -g -I/home/fred/bypass/include -c info_agent.C -o info_agent.o
    
    To use threaded agents or applications, compile it like this instead:

    % g++ -fPIC -g -I/home/fred/bypass/include -c -DUSE_PTHREADS info_agent.C -o info_agent.o
    
    Finally, convert the object file into a shared library. This process is slightly different on each platform. You may have to experiment with the linker flags to get correct results. Here are some examples:

    On Linux or Solaris:

    % g++ -shared info_agent.o -o info_agent.so -L/home/fred/bypass/lib -lbypass -ldl
    
    On OSF/1 or IRIX:
    % g++ -shared info_agent.o -o info_agent.so -L/home/fred/bypass/lib -lbypass
    

    Injecting an Agent

    Note:
    There are many esoteric operating system errors that you may encounter in this section. If these instructions don't work for you, please consult the frequently asked questions.

    To inject an agent into a program, we will instruct the dynamic linker to load the agent as a shared library before any other libraries are referenced. On most platforms, this is done by setting an environment variable. Again, the process is slightly different on each platform. Here are some examples:

    On Linux or Solaris:

    % setenv LD_PRELOAD /path/to/info_agent.so
    
    On OSF/1 or IRIX:
    % setenv _RLD_LIST /path/to/info_agent.so:DEFAULT
    

    Now, run any old command that you like -- try something as simple as ls -l:

    % ls -l
    
    total 28
    drwxr-xr-x   2 thain    23330        2048 Apr 11 14:03 CVS
    -rw-r--r--   1 thain    23330         382 Apr 13 13:35 Makefile
    -rw-r--r--   1 thain    23330         679 Apr 13 13:35 Makefile.config
    -rw-r--r--   1 thain    23330         382 Apr  7 10:35 Makefile.template
    -rw-r--r--   1 thain    23330        2363 Apr 11 13:42 README
    drwxr-xr-x   3 thain    23330        2048 Apr 13 13:35 bin
    -rwxr-xr-x   1 thain    23330        5600 Apr  9 19:54 configure
    drwxr-xr-x   3 thain    23330        2048 Apr 13 13:31 doc
    drwxr-xr-x   3 thain    23330        4096 Apr 13 13:35 examples
    drwxr-xr-x   3 thain    23330        2048 Apr 11 13:25 lib
    drwxr-xr-x   3 thain    23330        4096 Apr 13 13:35 src
    NOTICE: process 5657: 297267 bytes read, 703 bytes written
    

    Notice that the program ran as normal, but the interposition agent counted up all the I/O performed and displayed a message just as the application ended. Go ahead and try more complicated programs such as emacs or netscape. You should see results for these programs as well.

    To return to normal operation, simply unset the environment variable:

    On Linux or Solaris:

    % unsetenv LD_PRELOAD
    
    On OSF/1 or IRIX:
    % unsetenv _RLD_LIST
    

    Layering Agents

    Example of Layering

    Several agents may be applied to a single program at once. This yields a stack of software components, with each component called a layer. The topmost layer is the application, and the bottommost layer is the standard library. In between are zero or more agent layers. To apply several agents at once to a program, just list them one by one in the preloading command. For example, to apply both the Automatic GASS and measurement layers to a single program, do this:
    % setenv LD_PRELOAD "/path/to/auto_gass_agent.so /path/to/info_agent.so"
    

    Layering Rules

    In a program composed of multiple layers, there can be many definitions of and references to a single procedure name. In most programs, this constitutes an error, but in Bypass it is normal. We must carefully define some rules which describe the exact binding of names between layers.
    1. A process keeps track of its active layer in a global variable. A process begins execution with the top layer active.
    2. A call to a trapped procedure name selects the definition in the layer immediately below the active layer. If none is present, the next layer below is searched, and so on.
    3. After selecting but before invoking a trapped procedure, the active layer is lowered to that of the selected definition. Before returning, the active layer is restored to its previous value.
    4. A call to a non-trapped procedure does not consult or affect the active layer. Such calls are bound according to the normal linking policy of the operating system.
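    The four rules can be modeled in a few lines of C++. The sketch below is a simulation under simplifying assumptions (every trapped procedure takes and returns a single int, and layers are numbered from 0 at the top); it is not how Bypass implements layering internally, and LayerStack, Layer, and call are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct LayerStack;  // defined below

// A trapped procedure: receives the stack (so it can re-invoke the name) and one int.
using Definition = std::function<int(LayerStack &, int)>;

struct Layer {
    std::string name;
    std::map<std::string, Definition> defs;  // trapped procedures defined in this layer
};

struct LayerStack {
    std::vector<Layer> layers;  // layers[0] is the application (top); the last is the library
    std::size_t active = 0;     // rule 1: execution begins with the top layer active

    // Calls to trapped names go through here. Non-trapped calls (rule 4) are
    // ordinary C++ calls and never consult or change `active`.
    int call(const std::string &proc, int arg) {
        // rule 2: start just below the active layer and search downward
        for (std::size_t i = active + 1; i < layers.size(); ++i) {
            auto it = layers[i].defs.find(proc);
            if (it == layers[i].defs.end()) continue;
            std::size_t saved = active;
            active = i;                           // rule 3: lower the active layer
            int result = it->second(*this, arg);
            active = saved;                       // rule 3: restore it before returning
            return result;
        }
        throw std::runtime_error("no definition of " + proc + " below the active layer");
    }
};
```

    In a stack of application, measurement agent, and library, a read issued by the application selects the agent's definition. When the agent re-invokes read, the active layer is already the agent's, so the search begins below it and reaches the library instead of recursing into the agent. For brevity, the sketch does not restore the active layer if a definition throws.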

    Layer Order Matters

    An extensive discussion of layering and its consequences can be found in our published papers. However, it is important to at least note here that the ordering of layers matters.

    For example, the measurement layer may be placed above or below the Automatic GASS layer. If above, the measurement layer will record only those operations actually attempted by the application. If below, the measurement layer will record all operations performed by the combination of the application and the Automatic GASS layer, including any reads or writes necessary to implement globus_gass_open. You might want to have both: one above and one below. However, if you do this, you must make a separate copy of the agent library for each instance; otherwise, both instances will share the same internal data structures. To combine two measurement agents with the Automatic GASS agent, you might do this:

    % cp info_agent.so top_agent.so
    % setenv LD_PRELOAD "/path/to/top_agent.so /path/to/auto_gass_agent.so /path/to/info_agent.so"
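    To see why the order matters, here is a small C++ model in which each layer is a function wrapping the layer below it. The staging layer is a hypothetical stand-in for Automatic GASS that performs one extra read of its own for every request; the names and numbers are illustrative only.

```cpp
#include <cassert>
#include <functional>

// A read-like layer: takes the number of bytes requested, returns bytes read.
using ReadFn = std::function<long(long)>;

// Stand-in for the standard library: every request is fully satisfied.
ReadFn make_bottom() { return [](long n) { return n; }; }

// Measurement layer: passes the call down and counts what comes back.
ReadFn measure(ReadFn below, long &counter) {
    return [below, &counter](long n) {
        long r = below(n);
        if (r > 0) counter += r;
        return r;
    };
}

// Hypothetical staging layer: performs its own extra read, then the application's.
ReadFn stage(ReadFn below, long extra) {
    return [below, extra](long n) {
        below(extra);     // the layer's own I/O, invisible to the application
        return below(n);  // the request the application actually made
    };
}
```

    Wrapping a 10-byte read as measure(stage(measure(bottom, below), 100), above) leaves above at 10 and below at 110. Passing the same counter to both measurement layers would report 120, which is the kind of confusion that sharing one copy of an agent's data structures produces.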
    

    Split Execution Systems

    Bypass can be used to build split execution systems. As the name implies, such a system splits a program's execution between two machines. An application runs on a remote machine while an interposition agent traps some of its system calls. The agent sends some of those system calls via RPC back to a shadow process, which runs in the user's home environment. The shadow performs the system calls and sends the results back to the agent and application. This system allows a process running on a remote machine to behave as if it were running on the user's home machine.

    We will give two examples of split execution systems built using Bypass. The first simply sends the standard I/O operations without modification. The second prevents access to particular files and logs a message for each file opened.

    Remote I/O System

    This example, io.bypass, is a simple split execution system which traps and forwards the standard UNIX I/O operations. It should already be built for you in the examples directory.

    Notice that no agent_action or shadow_action is given for any of the procedures. When no action is given, the default is for the agent to send the call via RPC to the shadow, which then invokes the procedure normally.

    Notice also that some annotations have been made to the parameters of open, read, and write. These three procedures have pointer arguments. Pointers are a bit tricky because they may refer to a large, variable-sized amount of data. Bypass must be told in which direction pointer data moves and how much data is to be transferred.

    The name parameter tells open which file to open. Because open does not send back any data through this parameter when it completes, the name data flows only into the procedure. The parameter is a null-terminated string, which is indicated by the keyword "string".

    Every pointer argument must be prefixed with in, out, or in out to describe which way data flows: in indicates that data flows from the agent to the shadow, out indicates that data flows from the shadow to the agent, and in out indicates that data flows both ways. Following in or out may be one of these additional constructs:

  • string indicates this parameter points to a null-terminated string. Data up to and including the terminator will be transferred. A string may also be an out parameter, but then the keyword string must be followed by a quoted expression which must evaluate to the maximum number of bytes available to store the string.
  • opaque "expr" indicates this parameter points to opaque binary data. "expr" is a C++ expression which must evaluate to the exact number of bytes to transfer.
  • array "expr" indicates this parameter points to an array of objects of the given type. "expr" is a C++ expression which must evaluate to the number of objects to transfer.
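    These annotations determine how many bytes a stub must copy for each pointer. Below is a minimal C++ sketch of that size calculation for the in direction. Message, put_string, put_opaque, and put_array are hypothetical names, and a real stub would also have to encode field lengths (or let the receiver recompute them) so the shadow can split the message; that part is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical outgoing message buffer (not a Bypass type).
struct Message {
    std::vector<unsigned char> bytes;
    void put(const void *data, std::size_t n) {
        const unsigned char *p = static_cast<const unsigned char *>(data);
        bytes.insert(bytes.end(), p, p + n);
    }
};

// in string: everything up to and including the null terminator.
void put_string(Message &m, const char *s) { m.put(s, std::strlen(s) + 1); }

// in opaque "expr": exactly expr bytes, whatever they contain.
void put_opaque(Message &m, const void *data, std::size_t expr) { m.put(data, expr); }

// in array "expr": expr objects of type T, i.e. expr * sizeof(T) bytes.
template <typename T>
void put_array(Message &m, const T *items, std::size_t expr) {
    m.put(items, expr * sizeof(T));
}
```

    For an out parameter the same sizes apply in the other direction: the shadow sends the bytes back and the agent copies them into the caller's buffer.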
    Using Split Execution Systems

    To build a split execution system, run bypass with both the -agent and -shadow flags. This time, three files will be written: io_agent.C, io_shadow.C, and io.h.

    The agent must be compiled with the headers found in the include directory of the Bypass distribution and linked against libbypass.a, found in the lib directory. If your platform requires particular libraries in order to use sockets (such as -lnsl and -lsocket on Solaris, as in the example below), make sure to link against those, too.

    % bypass -agent -shadow io.bypass
    % g++ -fPIC -I/home/fred/bypass/include -c io_agent.C -o io_agent.o
    % g++ -shared io_agent.o -L/home/fred/bypass/lib -lbypass -o io_agent.so -ldl -lnsl -lsocket
    

    The shadow consists only of io_shadow.C. It should be compiled into a standalone program and linked with libbypass.a.

     %  g++ -I/home/fred/bypass/include -c io_shadow.C -o io_shadow.o
     %  g++ io_shadow.o -L/home/fred/bypass/lib -lbypass -o io_shadow -lnsl -lsocket
    

    To execute the program, the shadow must be started, and then the agent must be instructed where to find the shadow.

    First, run the shadow with no arguments. It will display a message indicating the host and port it is running on, and then it will wait for an agent to connect:

    %  ./io_shadow
    setenv BYPASS_SHADOW_HOST www.xxx.yyy.zzz
    setenv BYPASS_SHADOW_PORT pppp
    

    The agent must be told the host and port number of the shadow it is to connect to. These are passed by way of environment variables. Handily, the shadow has printed them out in a format which is convenient for cutting and pasting. Paste these into another window, and then run the agent as you normally would.
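    The agent-side lookup of these two variables might look roughly like the following C++ sketch. It is an assumption for illustration, not Bypass's actual code: ShadowAddress and shadow_address_from_env are hypothetical, and the port is simply required to be a decimal number from 1 to 65535.

```cpp
#include <cassert>
#include <cerrno>
#include <stdlib.h>  // getenv, strtol (and setenv, used in the usage example)
#include <string>

// Hypothetical holder for the shadow's address.
struct ShadowAddress {
    std::string host;
    int port = 0;
};

// Reads BYPASS_SHADOW_HOST and BYPASS_SHADOW_PORT. Returns false if either is
// missing or the port is not a decimal number between 1 and 65535.
bool shadow_address_from_env(ShadowAddress &out) {
    const char *host = getenv("BYPASS_SHADOW_HOST");
    const char *port = getenv("BYPASS_SHADOW_PORT");
    if (host == nullptr || *host == '\0' || port == nullptr) return false;
    errno = 0;
    char *end = nullptr;
    long value = strtol(port, &end, 10);
    if (errno != 0 || end == port || *end != '\0' || value < 1 || value > 65535) return false;
    out.host = host;
    out.port = static_cast<int>(value);
    return true;
}
```

    With BYPASS_SHADOW_HOST set to 127.0.0.1 and BYPASS_SHADOW_PORT set to 50000 the lookup succeeds; a placeholder such as pppp that was never replaced is rejected instead of being treated as port 0.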

    % setenv BYPASS_SHADOW_HOST www.xxx.yyy.zzz
    % setenv BYPASS_SHADOW_PORT pppp
    % setenv LD_PRELOAD `pwd`/io_agent.so
    % cat /etc/passwd 
    bypass_agent: Getting configuration from environment...
    bypass_agent: Connecting to www.xxx.yyy.zzz port pppp...
    bypass_agent: Connection made.
    

    All the input and output for the program will be conducted in the shadow window:

    % ./io_shadow
    setenv BYPASS_SHADOW_HOST www.xxx.yyy.zzz
    setenv BYPASS_SHADOW_PORT pppp
    bypass_shadow: Waiting for connection...
    root:*:0:1:System Administration:/:/sbin/sh
    operator:*:5:5:System Backups:/u/o/p/operator:/bin/tcsh
    ...
    

    Controlled I/O System

    This example, controlled_io.bypass, is a split execution system which selectively controls what files an application can open. It should already be built for you in the examples directory.

    Each Bypass declaration involves an action on either (or both) the agent and shadow programs. We have already seen code explicitly invoked on the agent side by an agent_action block. We can also control the code executed on the shadow by specifying shadow_action blocks. When no agent_action is given, the default is to invoke a remote procedure call. When no shadow_action is given, the default is to invoke the procedure of the same name. For example, this declaration:

    int open( in string const char *path, int flags, [int mode] );
    

    implies these actions:

            agent_action
            {{
                    return bypass_shadow_open( path, flags, mode );
            }}
            shadow_action
            {{
                    return open( path, flags, mode );
            }};
    

    We can explicitly specify these action blocks to create some very powerful code. Let's modify open to create a simple sandbox. If the user attempts to open a file outside the current directory (that is, if the path contains a slash), we will return a permission error. Otherwise, we will forward the request to the shadow, which will print a brief notice and open the file.

    int open( in string const char *path, int flags, [int mode] )
            agent_action
            {{
    		if(strchr(path,'/')) {
    			printf("DENIED: agent tried to open %s\n",path);
    			errno = EPERM;
    			return -1;
    		} else {
    			return bypass_shadow_open(path,flags,mode);
    		}
            }}
            shadow_action
            {{
                    printf("NOTICE: agent opened %s\n",path);
                    return open(path,flags,mode);
            }};
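    The decision made by this agent_action can be pulled out into a plain function, which makes the policy easy to test on its own. This is a hypothetical refactoring (sandbox_check is not a Bypass function); the agent_action would call it and, when it returns 0, forward the request with bypass_shadow_open.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>

// The sandbox policy from the agent_action above, as a standalone function.
// Returns 0 if the open may be forwarded to the shadow, or -1 with errno set
// to EPERM if the path contains a slash.
int sandbox_check(const char *path) {
    if (std::strchr(path, '/') != nullptr) {  // any slash: absolute or relative path
        errno = EPERM;
        return -1;
    }
    return 0;
}
```

    The check is purely textual: a symbolic link in the current directory, or the name .. itself, can still refer to something outside it.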
    

    Shadow Options

    In the examples shown above, each program communicates through its agent with its own shadow. By default, the shadow selects an available port at random, displays it, and requires that the user inform the agent of the port:
     % ./io_shadow
    setenv BYPASS_SHADOW_HOST www.xxx.yyy.zzz
    setenv BYPASS_SHADOW_PORT pppp
    ...
    
    If multiple programs are to be run using the same split execution model, it may be more useful to have a server listening on a well-known port and forking off shadows for each incoming connection. This is easily done with command line options to the shadow:
     % ./io_shadow -port 50000 -multiprocess
    
    On the other hand, your application may not require that each program use a unique shadow process. In this case, the shadow can listen on a well-known port and simply fork off a new thread for each connection. (This option is only available if the pthread library was available at build time.) For example:
     % ./io_shadow -port 50000 -multithread
    
    Finally, the -debug option will display lots of information about each connection.
     % ./io_shadow -debug
    
    Note:
    The multithreading option is only available on Linux and then only if the pthreads library can be found. If you are building the examples in the Bypass package, the configure program will take care of finding and using the pthreads library. If you are building your own software, you must compile the shadow with -DUSE_PTHREADS and direct the compiler and linker to the pthreads library by yourself.

    Security

    By default, a Bypass shadow will accept any incoming connection. This is acceptable for a testing environment, but out of the question for production use. Bypass splits such security concerns into two realms: authentication and authorization. Authentication is the process of confirming the identity of the subject, that is, the name of the agent attempting to communicate. Authorization is the process of determining which subjects are allowed to connect.

    Two authentication methods are currently provided: Globus GSS and Trivial. Globus GSS authentication is more secure, but requires the Globus software and appropriate certificates. Trivial authentication is less secure, but has no special requirements. Both the agent and the shadow may use one or both authentication methods -- when they connect, they will negotiate a mutually acceptable method.

    Globus GSS authentication is the recommended authentication mechanism for Bypass. If the Globus software was available when Bypass was built, then it is the default. Globus GSS uses public/private key cryptography to identify the agent to the shadow. The user running the agent is identified in human readable form as an X.509 subject, which looks something like:

    /C=US/O=Bedrock Township/OU=Construction Services/CN=Fred Flintstone
    
    Trivial authentication is provided as a poor man's alternative. In this scheme, the agent transmits the name of the user running the agent to the shadow. The shadow accepts this name without question and then uses a reverse DNS lookup to determine the name of the machine running the agent. These two names are attached together into a subject name that looks something like an email address:

    fred@construction.bedrock.gov
    

    Authorization is performed by looking for the subject name in an authorization file. The -authfile option specifies the path to an authorization file you can create. This file simply lists subject names, one to a line. An asterisk may be used as a wildcard to match several subjects. An example authorization file might be:

    /C=US/O=Bedrock Township/OU=Construction Services/CN=Fred Flintstone
    /C=US/O=University of Petonkwa/OU=Computer Sciences/*
    fred@construction.bedrock.gov
    *@administration.bedrock.gov
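    Matching a subject against the authorization file amounts to simple wildcard comparison. The C++ sketch below assumes that each non-empty line is one pattern and that an asterisk matches any run of characters (including none); how Bypass itself treats whitespace or other special characters is not described here, and wildcard_match and is_authorized are hypothetical names. With Trivial authentication, the subject would be the user name and the reverse-DNS host name joined by @.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// '*' matches any run of characters, including none; other characters match themselves.
bool wildcard_match(const char *pattern, const char *subject) {
    if (*pattern == '\0') return *subject == '\0';
    if (*pattern == '*') {
        // Let the '*' absorb 0, 1, 2, ... characters of the subject.
        for (const char *s = subject;; ++s) {
            if (wildcard_match(pattern + 1, s)) return true;
            if (*s == '\0') return false;
        }
    }
    return *pattern == *subject && wildcard_match(pattern + 1, subject + 1);
}

// True if any non-empty line of the authorization file matches the subject.
bool is_authorized(std::istream &authfile, const std::string &subject) {
    std::string line;
    while (std::getline(authfile, line)) {
        if (!line.empty() && wildcard_match(line.c_str(), subject.c_str())) return true;
    }
    return false;
}
```

    Against the example file above, barney@administration.bedrock.gov is accepted by the last line, while barney@construction.bedrock.gov matches no line and is refused.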
    

    Important Security Note:
    If you do not specify an authorization file, it is assumed you are willing to accept any connection. I do not recommend doing this for a production environment. Use -authfile to control the allowed connections.
    Note:
    The Globus option is only available if the Globus libraries can be found. If you are building the examples in the Bypass package, the configure program will take care of finding and using them. If you are building your own software, you must compile the agent and shadow with -DUSE_GLOBUS_GSS and direct the compiler and linker to the various and sundry Globus libraries by yourself.


    Notes for Wizards

    If this is your first trip through the manual, we recommend that you stop here and try your hand at the examples. Return to this section if you need more details about the esoteric bits of Bypass.

    Preprocessor Commands

    Bypass code is fed through the C preprocessor twice. Bypass runs the preprocessor before reading your input file, and then it is run again as part of the C++ compiling stage. Commands for the first pass should begin with #, and commands for the second pass should begin with @.

    For example, you may want to use the preprocessor to manage what code is included, but the code itself may also require the preprocessor:

    #ifdef sun
            void exit( int status )
                    agent_action
                    {{
                            @define SUCCESS -1
                            exit(SUCCESS);
                    }};
    #else
            void exit( int status )
                    agent_action
                    {{
                            @define SUCCESS 0
                            exit(SUCCESS);
                    }};
    #endif
    

    Variable Arguments

    In general, Bypass does not support procedures with a variable number of arguments. However, there are several system calls which declare a variable argument interface, but always use one extra argument with a predictable type.

    A trailing argument enclosed in brackets indicates that the procedure is declared with a variable argument list, but that every call should be treated as passing exactly one extra argument with the bracketed name and type.

    A replacement for fcntl might be declared like this:

    int fcntl( int fd, int command, [void *arg] );
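    In generated code, the bracketed argument amounts to fetching exactly one extra argument of the given type and passing it along. Here is a minimal C++ sketch using <cstdarg>, assuming the fcntl declaration above; forwarding_fcntl is a hypothetical name, and every caller must actually supply the third argument, because reading an argument that was never passed is undefined behavior.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdint>
#include <fcntl.h>
#include <unistd.h>

// Sketch of a replacement for  int fcntl( int fd, int command, [void *arg] );
// The bracketed argument means: always fetch exactly one extra void* and pass
// it on to the real call.
int forwarding_fcntl(int fd, int command, ...) {
    va_list ap;
    va_start(ap, command);
    void *arg = va_arg(ap, void *);  // the one extra argument of predictable type
    va_end(ap);
    return fcntl(fd, command, arg);  // re-invoke the real fcntl with it
}
```

    For example, setting FD_CLOEXEC on a pipe through forwarding_fcntl with F_SETFD and reading it back with F_GETFD behaves exactly as calling fcntl directly.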
    

    Supported Data Types

    Bypass does not go to the trouble of supporting every last possible combination of C++ type keywords. The following syntax for types is supported:
    type      : [unsigned] [const] [struct] type-name star-list
    star-list : /* nothing */
              | [const] '*' star-list
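    One way to read this grammar is to write a small recognizer for it. The C++ sketch below is illustrative, not Bypass's parser: ParsedType, tokenize, and parse_type are hypothetical, and a const before a star is merely recorded, not interpreted. Under the grammar as written, the keywords must appear in the order shown, so const unsigned int is rejected while unsigned int * is accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// What the grammar lets a type carry.
struct ParsedType {
    bool is_unsigned = false;
    bool is_const = false;
    bool is_struct = false;
    std::string name;
    std::vector<bool> const_before_star;  // one entry per '*': preceded by const?
};

// Split on spaces and tabs; every '*' becomes a token of its own.
std::vector<std::string> tokenize(const std::string &text) {
    std::vector<std::string> tokens;
    std::string word;
    for (char c : text) {
        if (c == ' ' || c == '\t' || c == '*') {
            if (!word.empty()) { tokens.push_back(word); word.clear(); }
            if (c == '*') tokens.push_back("*");
        } else {
            word += c;
        }
    }
    if (!word.empty()) tokens.push_back(word);
    return tokens;
}

// Recognizes  [unsigned] [const] [struct] type-name star-list
// where       star-list : nothing | [const] '*' star-list
bool parse_type(const std::string &text, ParsedType &out) {
    std::vector<std::string> t = tokenize(text);
    std::size_t i = 0;
    auto accept = [&](const char *keyword) {
        if (i < t.size() && t[i] == keyword) { ++i; return true; }
        return false;
    };
    out = ParsedType{};
    out.is_unsigned = accept("unsigned");
    out.is_const = accept("const");
    out.is_struct = accept("struct");
    if (i >= t.size() || t[i] == "*" || t[i] == "const") return false;  // type-name required
    out.name = t[i++];
    while (i < t.size()) {
        bool preceded_by_const = accept("const");
        if (!accept("*")) return false;  // anything else does not fit the star-list
        out.const_before_star.push_back(preceded_by_const);
    }
    return true;
}
```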
    

    Environment Variables

    Agents created by Bypass consult several environment variables.
    BYPASS_SHADOW_HOST
    
    In this variable, the user must place the hostname or IP address of the shadow to contact.
    BYPASS_SHADOW_PORT
    
    In this variable, the user must place the port number of the shadow to contact.
    BYPASS_DEBUG
    
    If this optional variable is set, a Bypass agent will display some debugging information on the standard error stream.
    BYPASS_FAILURE_PASSTHROUGH
    
    If this optional variable is set, errors that are normally fatal will be returned as normal error codes from RPC routines. For example, if the network fails while an agent is executing a bypass_shadow_open, the normal result is for Bypass to display an error and kill the process. However, if this variable is set, the RPC will simply return -1 with an appropriate errno, such as EPIPE.

    This default behavior is chosen for two reasons. First, re-establishing the connection and rebuilding any state that was accumulated at the shadow is beyond the power of the application. Second, Bypass forces an abnormal termination (killed by signal) so that the scheduling system does not assume the application exited normally.

    The modified behavior may be useful to some agents (such as the Grid Console) which may have enough information to trap and retry such errors.
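    The two behaviors can be sketched in C++ as follows. The names failure_passthrough, load_failure_policy, rpc_failed, and with_retries are hypothetical, and std::abort is used only as one way to end the process with a signal; Bypass's own bypass_die may be implemented differently.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <stdlib.h>  // getenv, abort (and setenv, used in the usage example)

// Hypothetical analogue of the bypass_failure_passthrough flag.
static int failure_passthrough = 0;

void load_failure_policy() {
    failure_passthrough = (getenv("BYPASS_FAILURE_PASSTHROUGH") != nullptr);
}

// Called by a (hypothetical) RPC stub when the connection to the shadow fails.
int rpc_failed(int err) {
    if (!failure_passthrough) {
        // Default: report, then terminate abnormally so that a scheduler sees
        // "killed by signal" rather than a normal exit.
        std::fprintf(stderr, "agent: lost connection to shadow (errno %d)\n", err);
        abort();  // raises SIGABRT
    }
    errno = err;  // passthrough: report it like any other system call error
    return -1;
}

// An agent able to recover might retry an operation a few times.
// `attempt` is any callable returning -1 on failure.
template <typename F>
int with_retries(F attempt, int tries) {
    int result = -1;
    for (int i = 0; i < tries && result == -1; ++i) result = attempt();
    return result;
}
```

    With BYPASS_FAILURE_PASSTHROUGH set, rpc_failed(EPIPE) returns -1 with errno equal to EPIPE instead of killing the process, which is what gives a retrying agent the chance to act.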

    Built-In Utilities

    For most purposes, code written in agent blocks need not contain any specialized Bypass code. However, a few procedures are provided that may be useful to the agent programmer.
    int bypass_shadow_*( ... );
    
    If building both a shadow and an agent, Bypass will generate RPC stubs that match each procedure declaration. These stubs bear the name of the replaced procedure with bypass_shadow_ prepended. For example, to invoke open remotely, call bypass_shadow_open with the same arguments as open. This may be done inside any agent action.
    void bypass_debug( char *fmt, ... );
    
    This function accepts printf-style arguments and displays the output on the standard error stream if BYPASS_DEBUG is set.
    void bypass_error( char *fmt, ... );
    
    This function accepts printf-style arguments and always displays the output on the standard error stream.
    void bypass_die();
    
    This function forces the calling process to terminate abnormally.
    int bypass_failure_passthrough;
    
    Setting this integer to true has the same effect as setting BYPASS_FAILURE_PASSTHROUGH.

    The Knowledge File

    Bypass generates code for a variety of UNIX-like platforms. On each platform, there are many tricky details to trapping and invoking each of the system calls. Bypass collects all these details together in a knowledge file, lib/bypass_knowledge. When a user requests that a call be trapped, Bypass consults the knowledge file and generates several pieces of code for each of the user's declarations.

    The knowledge file has the same syntax as a regular Bypass input file, but it makes heavy use of option rules. An option rule lists the tricky details needed for a particular system call. An option rule by itself does not generate any code; it only specifies options in case the user wants to trap the named procedure.

    For example, the option statement for read is:

    options "read"
            entry "_read", "__read"
            syscall
            local_name "read"
            remote_name "read"
    	;
    
    entry indicates that trapping read also involves catching the related _read and __read. syscall indicates that read is a true system call (as opposed to a standard library call.) local_name and remote_name give the names of the procedures to invoke when operating locally or via RPC. These are almost always the same as the regular procedure.

    Option rules tend to be very similar to the example above -- only a few break the pattern. So, rules may use wildcards which specify the options for a whole class of system calls. For example, the first entry in the knowledge file looks something like this:

    options "*"
            entry "_*", "__*"
            local_name "*"
            remote_name "*"
            syscall
            library "libc"
            ;
    
    This statement indicates that any system call will get the rules mentioned, with the call's name substituted for each occurrence of "*".

    A call may match several option rules. For example, fstat would match entries named "*", "f*", and "fstat". If this happens, the rules are applied in the order they appear in the knowledge file.
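    The matching and substitution described above can be modeled in C++. This is a sketch under assumptions the manual does not state: later rules overwrite single-valued settings such as local_name, entry names accumulate across rules, and a pattern may contain any number of asterisks. OptionRule, ResolvedOptions, resolve, and substitute are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// '*' matches any run of characters, including none.
bool glob_match(const char *p, const char *s) {
    if (*p == '\0') return *s == '\0';
    if (*p == '*') {
        for (;; ++s) {
            if (glob_match(p + 1, s)) return true;
            if (*s == '\0') return false;
        }
    }
    return *p == *s && glob_match(p + 1, s + 1);
}

struct OptionRule {
    std::string pattern;               // e.g. "*", "f*", "fstat"
    std::string local_name;            // template; "" means this rule does not set it
    std::vector<std::string> entries;  // templates such as "_*"
};

struct ResolvedOptions {
    std::vector<std::string> matched;  // patterns that applied, in file order
    std::string local_name;
    std::vector<std::string> entries;
};

// The call's name replaces every '*' in a template.
std::string substitute(const std::string &tmpl, const std::string &name) {
    std::string out;
    for (char c : tmpl) {
        if (c == '*') out += name;
        else out += c;
    }
    return out;
}

ResolvedOptions resolve(const std::vector<OptionRule> &rules, const std::string &name) {
    ResolvedOptions r;
    for (const OptionRule &rule : rules) {  // knowledge-file order
        if (!glob_match(rule.pattern.c_str(), name.c_str())) continue;
        r.matched.push_back(rule.pattern);
        if (!rule.local_name.empty())       // assumption: later rules win
            r.local_name = substitute(rule.local_name, name);
        for (const std::string &e : rule.entries)  // assumption: entries accumulate
            r.entries.push_back(substitute(e, name));
    }
    return r;
}
```

    Resolving fstat against rules "*", "f*", and "fstat" applies all three in order and produces the entry points _fstat and __fstat from the wildcard rule, while write matches only "*".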

    The allowed statements in the option rules are:

  • syscall
    This procedure may be re-invoked with a real system call.

  • libcall
    This procedure must be re-invoked by consulting a library routine. Whether the library is static or dynamic is determined by the command line.

  • library "library-name"
    Use this library when re-invoking this procedure. This name should not have a trailing .a or .so.

  • plain
    There is no system call or library routine matching this procedure, so flag an error if it is attempted in local mode.

  • kill
    This procedure has an inline or static definition in the system include files. Issue some pre-processor magic to kill these definitions before re-defining the procedure.

  • entry "name1", "name2", ...
    Provide additional entry points with these names.

  • local_name "name"
    Use this name when re-invoking this procedure locally.

  • remote_name "name"
    Use this name when invoking the RPC version of this procedure.

  • switch_code {{ code-fragment }}
    Do not automatically generate switch code. Use this code instead.

  • indirect "name"
    Generate an indirect system call using this name as the primary name and the procedure name as the secondary name. This is used to generate the Linux socket calls, where socket() becomes syscall(SYS_socketcall,SYS_socket,...).

  • instead <procedure-decl>
    Do not generate the procedure that the user requested. Generate this one instead.

  • also {{ code-fragment }}
    In addition to the usual declaration, add on this bit of code.

    The knowledge file is heavily commented with the reasons behind each unusual system call. The adventuresome reader should skip right to the knowledge file to learn all the dirty details.
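    The syscall option says that, in local mode, a trapped procedure may be re-invoked with a real system call. As a minimal sketch of what that means, assuming Linux and glibc (the mechanism differs between systems, which is exactly the kind of detail the knowledge file records), the hypothetical function agent_write reaches the kernel through syscall(SYS_write, ...) instead of calling write() again. This is not the code Bypass generates; the name agent_write and the byte counter total_bytes_written are illustrative only.

```cpp
#include <sys/syscall.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Bookkeeping done by this sketch: total bytes successfully written.
// It stands in for whatever an agent might measure about trapped calls.
long total_bytes_written = 0;

// Hypothetical trapped write(). A real agent would define write (and its
// _write/__write entry points) itself; a different name is used here so
// the sketch does not replace the program's own write().
ssize_t agent_write(int fd, const void* buf, std::size_t count) {
    assert(buf != nullptr || count == 0);

    // Re-invoke the real system call directly. Calling write() here, from
    // inside an agent that had replaced write(), would re-enter the agent.
    long result = syscall(SYS_write, fd, buf, count);

    if (result > 0) total_bytes_written += result;
    return static_cast<ssize_t>(result);  // -1 with errno set on failure
}
```

    A procedure marked libcall has no system call to fall back on, so it would instead have to reach the library's own definition of the routine; that path is not shown here.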

    Frequently Asked Questions

    1. Why do some of my programs, like cp and ls, ignore the interposition agent?

      The standard injection method requires that the accepting program be dynamically linked. This is true of most programs on modern UNIX systems. On a few systems, critical programs (such as cp and ls) are statically linked so that they may be used without the standard library present. Statically linked programs will ignore any interposition agents. On some systems, you can use the ldd program to determine if a program is dynamically linked.

      In particular, many of the standard IRIX utilities are statically linked. However, the GNU utilities, provided in /usr/gnu/bin, are dynamically linked and should give you the behavior you want.
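      Where ldd is unavailable, the same question can be answered by inspecting the executable itself. On ELF systems, a dynamically linked program carries a PT_INTERP program header naming the run-time loader, while a statically linked one does not. The C++ sketch below assumes a Linux-style <elf.h>, 64-bit ELF files, and host byte order; is_dynamically_linked is a hypothetical helper, not part of Bypass, and it does not handle 32-bit binaries such as those built for the IRIX o32 and n32 models.

```cpp
#include <elf.h>
#include <cassert>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: true if the 64-bit ELF executable at `path` has a
// PT_INTERP program header (i.e. it is started by a dynamic loader).
// Returns false for static executables, non-ELF or 32-bit files, and
// files that cannot be read.
bool is_dynamically_linked(const std::string& path) {
    assert(!path.empty());  // precondition: callers supply a path
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;

    Elf64_Ehdr header;
    if (!in.read(reinterpret_cast<char*>(&header), sizeof header)) return false;
    if (std::memcmp(header.e_ident, ELFMAG, SELFMAG) != 0 ||
        header.e_ident[EI_CLASS] != ELFCLASS64) {
        return false;
    }

    // Walk the program header table looking for the interpreter entry.
    for (unsigned i = 0; i < header.e_phnum; ++i) {
        Elf64_Phdr ph;
        std::streamoff offset = static_cast<std::streamoff>(
            header.e_phoff + static_cast<Elf64_Off>(i) * header.e_phentsize);
        if (!in.seekg(offset, std::ios::beg)) return false;
        if (!in.read(reinterpret_cast<char*>(&ph), sizeof ph)) return false;
        if (ph.p_type == PT_INTERP) return true;
    }
    return false;
}
```

      On Linux, calling is_dynamically_linked("/proc/self/exe") checks the calling program itself; a false result for an executable suggests that interposition agents will be ignored by it.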

    2. What is position independent code? Why do I need it?

      Most compilers produce relocatable object code by default. This kind of code is annotated with relocations which tell the linker how to rewrite bits of the code as it is placed in memory. This method is just fine for static linking.

      Some operating systems do not allow relocatable code to be used in a dynamic library, because processing all those relocations would result in a very long startup time for a simple application. So, any code that gets placed in a dynamic library must be position independent code -- all references in the code use pc- or base-relative addresses that do not require any relocations. This allows objects to be re-arranged at run-time without a stiff penalty.

      The upshot of this is that all code that goes into an interposition agent must be position-independent. This is easy to get -- you just compile with the -fPIC flag. Any additional libraries (such as Globus or SRB) linked against the interposition agent must also be compiled as position-independent.

      If you have already built these packages and didn't specify the -fPIC flag at compilation time, then I'm afraid you'll have to re-build them from scratch with -fPIC enabled.

    3. What does this linker error mean?

      Text relocation remains                         referenced
          against symbol                  offset      in file
      ASN1_UTCTIME_set                    0x740       /p/condor/workspaces/ssl/lib/libcrypto.a(x509_vfy.o)
      ...
      (followed by about a million similar lines)
      ...
      

      This error means you didn't compile all the various libraries and objects in your agent as position independent code. Please see the preceding question for more information.

    4. What does this IRIX warning mean?

      ld32: WARNING 85: definition of __write in info_agent.o preempts that definition in /usr/lib32/mips3/libc.so.1.

      This warning says that your interposition agent defined a system call that was already defined in the standard library. In most programs, this indicates a bug, but for our purposes, this is exactly what you want -- your interposition agent is replacing a standard system call. You may safely ignore these warnings.

    5. What does this IRIX error message mean?

      9038:ls: rld: Warning: elfmap: running old 32-bit executable but finding new 32-bit shared objects with matching DSO name in the search path. You may not have set the environment variables correctly, please set LD_LIBRARY_PATH for old 32-bit objects, LD_LIBRARYN32_PATH for new 32-bit objects and LD_LIBRARY64_PATH for 64-bit objects -- continue searching ...

      The situation on IRIX is a little complicated. IRIX currently has three binary program models -- o32, n32, and n64. A program built for one model can only be used with libraries of the same model. This message means that the program you are trying to run does not have the same binary model as the interposition agent, so the agent cannot be injected into it. You need to rebuild either the application or the agent so that they have the same model. Consult the documentation for your compiler to see exactly how to do this.

    6. Using HPUX, how can I inject an agent into a program at run time?

      To the best of my knowledge, you can't do so by simply setting an environment variable. It may be possible to do so by creating a separate program which uses the /proc interface to load a program, suspend it, and inject the interposition agent. If you come up with a better method, we would be happy to document it here.

    7. How does Bypass relate to Condor?

      Many of the ideas and techniques used in Bypass were inspired by similar features in Condor. Bypass does not require Condor, nor does Condor use Bypass. They are separate programs.