NaviServer - programmable web server

nscgi(n) 4.99.30 nscgi "NaviServer Modules"

Name

nscgi - NaviServer CGI Interface Guide

Description

What is CGI and How Does it Work?

CGI (Common Gateway Interface) is a standard way of running programs from a Web server. Often, CGI programs are used to generate pages dynamically or to perform some other action when someone fills out an HTML form and clicks the submit button. NaviServer provides full support for CGI v1.1.

Basically, CGI works like this:

A reader sends a URL that causes the NaviServer to use CGI to run a program. The NaviServer passes input from the reader to the program and output from the program back to the reader. CGI acts as a "gateway" between the NaviServer and the program you write.

The program run by CGI can be any type of executable file on the server platform. For example, you can use C, C++, Perl, Unix shell scripts, Fortran, or any other compiled or interpreted language. You can also use Tcl scripts with CGI, though the NaviServer API will not be available to them.

With NaviServer, you have the option of using the embedded Tcl and C interfaces instead of CGI. Typically, the Tcl and C interfaces provide better performance than CGI (see the NaviServer Tcl Developer's Guide for information on the Tcl interface and the NaviServer C Developer's Guide for information on the C interface).

You may want to use CGI for existing, shareware, or freeware programs that use the standard CGI input, output, and environment variables. Since CGI is a standard interface used by many Web servers, there are lots of example programs and function libraries available on the World Wide Web and by ftp. This chapter describes the interface and points you to locations where you can download examples.

For example, suppose you have a form that lets people comment on your Web pages. You want the comments emailed to you and you want to automatically generate a page and send it back to your reader.

  1. The reader fills out your form and clicks the "Submit" button. The FORM tag in your page might look like this:

     <FORM METHOD="POST" ACTION="/cgi-bin/myprog">
    

    The METHOD controls how the information typed into the form is passed to your program. It can be GET or POST. The ACTION determines which program should be run.

    Other ways for a reader to run a program are by providing a direct link to the program without allowing the reader to supply any variables through a form, or by using the ISINDEX tag.

  2. When NaviServer gets a request for a URL that maps to a CGI directory or a CGI file extension (as defined in the configuration file), it starts a separate process and runs the program within that process. The NaviServer also sets up a number of environment variables within that process. These environment variables include some standard CGI variables, and optionally any variables you define in the configuration file for this type of program.

  3. The program runs. The program can be any type of executable program. For example, you can use C, C++, Perl, Unix shell scripts, or Fortran.

    In this example, the program takes the comments from the form as input and sends them to you as email. If the form method is GET, it gets the input from an environment variable. If the form method is POST, it gets the input from standard input. It also assembles an HTML page and sends it to standard output.

  4. Any information the program passes to standard output is automatically sent to the NaviServer when the program finishes running.

  5. The server adds any header information needed to identify the output and sends it back to the reader's browser, which displays the output.
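
The five steps above can be sketched as a small CGI program. The following Python script is a minimal illustration, not part of NaviServer: the field name comments, the function names, and the page text are assumptions, and the email step is left out.

```python
#!/usr/bin/env python3
"""Hypothetical feedback handler for the form walk-through above."""
import html
import os
import sys
from urllib.parse import parse_qs


def read_form(environ, stdin):
    """Return the submitted fields as a dict mapping names to lists of values."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: the encoded fields arrive on standard input.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("utf-8", "replace")
    else:
        # GET: the encoded fields arrive in QUERY_STRING.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw, keep_blank_values=True)


def build_response(fields):
    """Assemble the header, the required blank line, and an HTML page."""
    # "comments" is an assumed field name, not one defined by this guide.
    comment = html.escape(fields.get("comments", [""])[0])
    body = (
        "<HTML><HEAD><TITLE>Thank you</TITLE></HEAD>"
        f"<BODY>We received: {comment}</BODY></HTML>"
    )
    return "Content-type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    form = read_form(os.environ, sys.stdin.buffer)
    # Sending the email is omitted; a real program would do it here.
    sys.stdout.write(build_response(form))
```

Keeping the input handling and the page generation in separate functions makes it possible to exercise both without a running server.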

Configuring CGI with NaviServer

You can control the behavior of NaviServer's CGI interface by setting parameters in a configuration file. For example, you can control which files and directories are treated as CGI programs, you can determine how to run various types of programs, and you can set a group of environment variables for each type of program you use.

Note that if you're defining multiple servers, you will need to configure the CGI interface for each server configuration.

To enable and configure CGI:

  1. Edit your NaviServer configuration file, usually named nsd.tcl.

  2. Choose the server for which you want to enable CGI (for example, the server named Server1). Then add the CGI module to that server so that it is loaded at start time. For example:

     ns_section "ns/server/Server1/modules"
     ns_param nscgi nscgi.so
    
  3. Add a section for the server with the suffix module/nscgi to configure the module. Typically, you define mappings with the Map parameter to indicate which HTTP method and path should be directed to which CGI program. In most cases, the mapping points to a directory on the machine in which the CGI scripts are located.

    The value specified via the parameter map is of the form

    method pattern ?path?

    where method refers to the HTTP request method (e.g., HEAD, GET, or POST) and pattern refers to either a relative URL or a glob pattern to match against the HTTP request. This is how nscgi determines whether a request is a CGI request. path is optional and indicates either the directory where the CGI executables can be found, or the executable that should be used to fulfill the CGI request.

    If path is not specified, then the URL must refer to a file which is the CGI executable. If path is specified and is a directory, then the filename portion of the URL must refer to a CGI executable in that directory. Otherwise, path must refer to a CGI executable which will handle all requests for this pattern. For example:

     ns_section "ns/server/Server1/module/nscgi" {
       ns_param Map  "GET /cgi /usr/local/cgi"
       ns_param Map  "POST /*.cgi"
     }
    
  4. If you want to call a CGI program (script) which requires an interpreter (e.g., Perl or bash), you will need to define the CGI interpreters via the module parameter Interps.

    • Add a definition for the Interps parameter to your CGI configuration section using e.g. the name CGIinterps.

       ns_section "ns/server/Server1/module/nscgi" {
         ns_param Map    "GET /cgi /usr/local/cgi"
         ns_param Map    "POST /*.cgi"
         ns_param Interps CGIinterps
       }
      
    • Then add a section under ns/interps for the chosen name (here CGIinterps) and add the mappings from file extensions to script interpreters there. When a CGI script with one of the specified extensions (below .pl and .sh) is to be executed, NaviServer runs the script via the named executable.

       ns_section "ns/interps/CGIinterps" {
         ns_param .pl  {c:\perl\bin\perl.exe}
         ns_param .sh  {c:\mks\mksnt\sh.exe(MKSenv)}
       }
      

      If no mapping of extensions to script interpreters is provided, the called script must have executable permissions.

      Files in the cgi directory without execute permissions are served as plain files when the parameter allowstaticresources is set to true. This way, images and similar files can be served directly from a cgi-bin directory.

  5. If the CGI script or the script interpreter requires additional environment variables, you can define them via the module parameter Environment and a matching section containing the definitions.

    • Add a definition for the Environment parameter to your CGI configuration section.

       ns_section "ns/server/Server1/module/nscgi" {
         ns_param Map         "GET /cgi /usr/local/cgi"
         ns_param Map         "POST /*.cgi"
         ns_param Interps     CGIinterps
         ns_param Environment cgi
       }
      
    • Then add a section under ns/environment for the chosen name (here cgi) containing the required environment variable definitions.

       ns_section "ns/environment/cgi" {
         ns_param FOO BAR     ;# defines environment variable "FOO=BAR"
         ns_param TMP /tmp    ;# defines environment variable "TMP=/tmp"
       }
      
  6. Further configuration parameters for module/nscgi are:

    • gethostbyaddr is a boolean parameter to indicate whether to resolve the peer IP address to its hostname when setting the REMOTE_HOST environment variable. If false, the peer IP address will be used instead. Caution: turning this option on can negatively impact performance due to the overhead of having to perform a DNS look-up on every CGI request. Default: off

    • limit is an integer parameter to indicate the maximum number of concurrent CGI requests to execute. 0 means unlimited. Default: 0

    • maxinput is an integer parameter to indicate the maximum number of bytes to accept in the HTTP request. 0 means unlimited. Mostly useful to limit the size of POST'ed data. Default: 1024000

    • maxwait is an integer parameter to indicate the amount of time in seconds to wait in the queue when the concurrency limit has been reached. The server responds with a "503 Service Unavailable" error on timeout. If limit is set to 0, this setting has no effect. Default: 30

    • systemenvironment is a boolean parameter that controls whether the CGI program inherits the server process's environment variables. Enabling this could leak sensitive information from the parent's environment if the CGI program displays its environment variables, which is the behavior of some common error-handling code. Default: false

    • allowstaticresources is a boolean parameter that controls whether static resources (e.g. images) can be served directly from the CGI bin directory. Default: false

How Web Pages Run CGI Programs

There are several ways a Web page can run a CGI program:

URLs that Run CGI Programs

For each method of running a CGI program described in the previous section, the browser software sends a URL to the server. (In addition, the HTTP header sent with the URL includes information that the server passes to the program as environment variables.)

Generally the URL to run a CGI program can have these parts:

CGI path[/extra path information][?query string]

  • The CGI path is the location of the CGI program to run. The path can be a relative or absolute reference to the program file.

  • The optional extra path information can be included in the URL to provide either a directory location the CGI program should use or some extra information for the CGI program. The path is relative to the root directory for Web pages. The extra path information is available to the CGI program in the PATH_INFO environment variable.

  • The optional query string is preceded by a question mark (?) and contains either a single variable or a set of field names and variables for the CGI program to use. The query string is available to the CGI program in either the QUERY_STRING environment variable or the standard input location (if the form method is POST).

For example, the query string from a form with 3 fields could be:

Field1=Value1&Field2=Value2&Field3=Value3

Spaces in the query string are replaced with plus signs (+). Any special characters (such as ?, =, &, +) are replaced with %xx, where xx is the hexadecimal value for that character.
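
The encoding rules above can be tried out with Python's standard urllib.parse module. This is only an illustration; the field values are made up to include a space and several special characters.

```python
from urllib.parse import quote_plus, urlencode

# Hypothetical field values containing a space and special characters.
fields = [("Field1", "two words"), ("Field2", "a&b=c"), ("Field3", "50%+")]

# urlencode joins name=value pairs with "&" and encodes each part.
query_string = urlencode(fields)
print(query_string)

# quote_plus encodes a single value: space becomes "+", others become %xx.
single_value = quote_plus("What? A+B")
print(single_value)
```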

Here are some examples of URLs that could run a CGI program:

  • http://www.mysite.com/cgi-bin/gettime

    This URL runs the gettime program, which could return a page with the current time. There are no variables, so you might use this as a direct link.

  • http://www.mysite.com/cgi-bin/listdir/misc/mydir

    This URL runs the listdir program and passes it /misc/mydir as extra path information. This might be a direct link in a page.

  • http://www.mysite.com/cgi-bin/search?navigate

    This URL runs the search program and passes it the word "navigate" as input. This URL doesn't include any field names, so it might be passed by pages with an ISINDEX tag.

  • http://www.webcrawler.com/cgi-bin/WebQuery?searchText=word

    This is a real URL that runs the WebCrawler search program and passes a value for the searchText field of "word". Normally, CGI programs that accept field values like these are run from a form.
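
To see how the example URLs above break down into the three parts, the following Python sketch splits them with urllib.parse.urlsplit. It assumes that all programs live directly under /cgi-bin/ and that the program name is the first path segment after that prefix; the real server derives this from its Map configuration instead. The function name split_cgi_url is hypothetical.

```python
from urllib.parse import urlsplit


def split_cgi_url(url, cgi_prefix="/cgi-bin/"):
    """Split a URL into (CGI path, extra path information, query string)."""
    parts = urlsplit(url)
    if not parts.path.startswith(cgi_prefix):
        raise ValueError("not under the assumed CGI prefix")
    # Assumption: the program name is the first segment after the prefix.
    rest = parts.path[len(cgi_prefix):]
    program, slash, extra = rest.partition("/")
    return cgi_prefix + program, slash + extra, parts.query


for url in (
    "http://www.mysite.com/cgi-bin/gettime",
    "http://www.mysite.com/cgi-bin/listdir/misc/mydir",
    "http://www.mysite.com/cgi-bin/search?navigate",
):
    print(split_cgi_url(url))
```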

If your programs are not executed, make sure the program file allows read and execute access.

Input to CGI Programs

CGI programs can get input from these sources:

Accessing Environment Variables

Different languages allow you to access environment variables in different ways. Here are some examples:

C or C++

 #include <stdlib.h>
  
 char *browser = getenv("HTTP_USER_AGENT");

Perl

 $browser = $ENV{'HTTP_USER_AGENT'};

Bourne shell

 BROWSER=$HTTP_USER_AGENT

C shell

 set BROWSER = "$HTTP_USER_AGENT"
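
Python

For comparison, the same lookup in Python might look like the following. The fallback value "unknown" is an assumption for the case where the browser sent no User-Agent header and the variable is therefore unset.

```python
import os

# os.environ.get returns the default instead of raising KeyError
# when the variable is not set.
browser = os.environ.get("HTTP_USER_AGENT", "unknown")
```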

Standard Environment Variables

These standard environment variables are defined for all CGI programs by the NaviServer:

AUTH_TYPE:

If the server supports user authentication, and the script is protected, this is the protocol-specific authentication method used to validate the user. For CGI programs run by NaviServer, this is always "Basic".

Example: Basic

CONTENT_LENGTH:

If the CGI program is run by a form with the POST method, this variable contains the length of the contents of standard input in bytes. There is no null or EOF character at the end of standard input, so in some languages (such as C and Perl) you should check this variable to find out how many bytes to read from standard input.

Example: 442

CONTENT_TYPE:

If the CGI program is run by a form with the POST method, this variable contains the MIME type of the information sent by the browser. By default, browsers send form data as application/x-www-form-urlencoded; forms that upload files use multipart/form-data instead.

GATEWAY_INTERFACE:

The version number of the CGI specification this server supports.

Example: CGI/1.1

HTTP_ACCEPT:

A comma-separated list of the MIME types the browser will accept, as specified in the HTTP header the browser sends. Many browsers do not send complete lists, and the list does not include external viewers the user has installed. If you want to send browser-specific output, you may also want to check the browser name, which is specified by the HTTP_USER_AGENT variable.

Examples:

 */*, application/x-navidoc
 */*, image/gif, image/x-xbitmap, image/jpeg

HTTP_FROM:

This variable may contain the email address of the reader who caused the CGI program to run. However, some browsers do not send the email address for privacy reasons. Also, users may enter false email addresses in their preferences settings.

Example: itsme@mydomain.com

HTTP_IF_MODIFIED_SINCE:

This variable contains a date and time if the browser wants a response only if the data has been modified since the specified date and time. The date is in GMT standard time. Many browsers do not send this information.

Example: Thursday, 23-Nov-95 17:00:00 GMT

HTTP_REFERER:

This variable contains the URL of the page or other location from which the reader sent the request to run the CGI program. For example, if the reader runs the program from a form, this variable contains the URL of that form.

Example: http://www.mydomain.com/mydir/feedback.htm

HTTP_USER_AGENT:

This variable tells which browser the reader is using to send the request. Normally, the format is "browser name/version".

Example: Mozilla/1.2N (Windows; I; 16-bit)

PATH_INFO:

This variable contains any extra path information included in the URL sent by the browser. Commonly, this type of URL is used to pass a relative directory location to your program. For example, the following URL runs the listdir program and passes it /misc/mydir as extra path information:

http://www.mysite.com/cgi-bin/listdir/misc/mydir

Another use for this type of URL is to pass information to the program without using a form or to pass form-specific variables in addition to the user-specified variables. For example:

http://www.mysite.com/cgi-bin/search/keyword=navigate

Examples: /misc/mydir and /keyword=navigate

PATH_TRANSLATED:

This variable translates the relative path from PATH_INFO into the absolute path by prepending the server's root directory for Web documents. This is useful because PATH_INFO, which the reader can view, need not reveal the physical location of your files on the server.

Example: /NaviServer/pages/misc/mydir
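
NaviServer computes PATH_TRANSLATED for you. The Python sketch below only illustrates the relationship between the two variables; the function name translate_path and the check against paths that climb out of the page root are assumptions, not a description of the server's implementation.

```python
import posixpath


def translate_path(page_root, path_info):
    """Prepend the page root to PATH_INFO, refusing paths that escape it."""
    root = posixpath.normpath(page_root)
    candidate = posixpath.normpath(posixpath.join(root, path_info.lstrip("/")))
    # Reject results such as /etc/passwd produced by ".." segments.
    if candidate != root and not candidate.startswith(root + "/"):
        raise ValueError("path escapes the page root")
    return candidate


print(translate_path("/NaviServer/pages", "/misc/mydir"))
```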

QUERY_STRING:

This variable contains information passed by a form or link to the program. The QUERY_STRING contains information in the following situations:

  • The reader submitted a form that uses the GET method.

  • The reader submitted a query in a page with the ISINDEX tag. (The text the user types is also decoded and sent to the program's command line in this situation. The QUERY_STRING provides the non-decoded information.)

  • A direct link included information after a "?" in the URL.

The QUERY_STRING is encoded in a format like this:

Field1=Value1&Field2=Value2&Field3=Value3

Your CGI program should decode the QUERY_STRING. Functions that decode this string are publicly available for most languages. The string encoding follows these rules:

  • Field name/value pairs are separated by an "&" sign.

  • A field's name and its value are separated by an "=" sign. Field names are specified by the NAME attribute. Field values depend on the type of field:

Text field and text area: The value is the text typed into the field. Multiline text is sent as one line with the return character encoded as described below.

Radio Buttons: The value is the value of the button that is selected.

Checkbox: The name and value usually appear in the list only if the box is checked. Some browsers may send the name of the checkbox only.

Selection List: The value of a selection list is the text of the item that is selected. If multiple items can be selected, there is a name/value pair with the same name for each item that is selected.

Image Field: Two name/value pairs are sent. ".x" and ".y" are added to the field name and the values are the x and y coordinates (measured in pixels from an origin at the upper-left corner of the image). For example:

Figfield.x=185&Figfield.y=37

Hidden Fields: You can use hidden fields with fixed values (or values set when a CGI program generated the page). The value is set with the VALUE attribute. Some older browsers make hidden fields visible.

Range Fields: The value is the numeric value of the field (sent as a string). Some browsers do not support range fields.

Named Submit Buttons: You can place multiple Submit buttons in a form. If you add a NAME attribute to the Submit button, that name will be sent, along with the label of the button as the value. All the Submit buttons in a form run the same CGI program, but the CGI program can perform different actions based on which button was clicked. Some browsers do not support named submit buttons.

  • Spaces are replaced by "+" signs.

  • Special characters are replaced by a "%" sign followed by the hexadecimal value of the character. Here are some common characters and their hex values:

 # -- %23
 = -- %3D
 / -- %2F
 % -- %25
 : -- %3A
 \ -- %5C
 & -- %26
 ; -- %3B
 tab -- %09
 + -- %2B
 ? -- %3F
 carriage return -- %0D
 line feed -- %0A
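
As one example of such a decoding function, Python's standard urllib.parse.parse_qs applies these rules. The submission below is made up: it combines a text field, a multiline text area, a multiple-selection list, and the image field from the example above.

```python
from urllib.parse import parse_qs

# Hypothetical encoded form submission.
raw = (
    "name=Ann+Lee&note=50%25+off%0D%0Atoday"
    "&color=red&color=blue&Figfield.x=185&Figfield.y=37"
)

# parse_qs maps each field name to a list, because names may repeat.
fields = parse_qs(raw)

name = fields["name"][0]      # "+" decoded to a space
note = fields["note"][0]      # %25 decoded to "%", %0D%0A to a line break
colors = fields["color"]      # one entry per selected item
click = (int(fields["Figfield.x"][0]), int(fields["Figfield.y"][0]))
```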

REMOTE_ADDR:

The IP address of the machine from which or through which the browser is making the request. This information is always available.

Example: 199.221.53.76

REMOTE_HOST:

The full domain name of the machine from which or through which the browser is making the request. The server determines this name by a DNS lookup of the peer IP address when the gethostbyaddr parameter is enabled; otherwise, this variable contains the IP address, as in REMOTE_ADDR.

Example: mybox.company.com

REMOTE_USER:

If the server prompted the reader for a username and password because the script is protected by the NaviServer's access control, this variable contains the username the reader provided.

Example: nsadmin

REQUEST_METHOD:

The method used to send the request to the server. For direct links, the method is GET. For requests from forms, the method may be GET or POST. Another method is HEAD, which CGI programs can treat like GET or can provide header information without page contents.

SCRIPT_NAME:

The virtual path to the CGI script or program being executed from the URL used to execute the script. You may want to use this variable if the program generates a page that contains a form that can be used to run the program again -- for example, to search for another string.

Example: /cgi-bin/search

SERVER_NAME:

The full hostname, domain name alias, or IP address of the server that ran the CGI program.

Example: www.mysite.com

SERVER_PORT:

The server port number to which the request was sent. This may be any number between 1 and 65,535. The default port for HTTP is 80.

Example: 80

SERVER_PROTOCOL:

The name and version number of the information protocol used to pass this request from the client to the server.

Example: HTTP/1.0

SERVER_SOFTWARE:

The name and version number of the server software running the CGI program.

Example: NaviServer/4.99.30

Other Environment Variables:

In addition to the preceding environment variables, the HTTP header lines received from the client, if any, are placed into the environment with the prefix HTTP_ followed by the header name. The header name is converted to upper case and any hyphens (-) in it are changed to underscores (_); for example, User-Agent becomes HTTP_USER_AGENT. The server may exclude any headers it has already processed, such as Content-type and Content-length.

Also, you can specify environment variables to be passed to a CGI program in the NaviServer configuration file.
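
The naming of the HTTP_ variables can be illustrated with a one-line Python function. It follows the CGI/1.1 convention (upper case, hyphens replaced by underscores), which matches names such as HTTP_USER_AGENT listed above; the function name header_to_env_name is hypothetical.

```python
def header_to_env_name(header_name):
    """Map an HTTP request header name to its CGI environment variable name."""
    return "HTTP_" + header_name.upper().replace("-", "_")


for header in ("User-Agent", "If-Modified-Since", "Referer"):
    print(header, "->", header_to_env_name(header))
```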

Accessing Standard Input

If a form uses the POST method to send a request, the field names and values are sent to standard input and the length of this string is provided in the CONTENT_LENGTH environment variable. The format of the standard input string is the same as the format of the QUERY_STRING environment variable when the GET method is used.

Different languages allow you to access the standard input in different ways. Here are some simplified examples. Your programs should also do some error checking.

C or C++

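 #include <stdio.h>
 #include <stdlib.h>
 #define MAX_INPUT_LENGTH 10000
 
 int main(void)
 {
     char inputtext[MAX_INPUT_LENGTH + 1];
     char *inputlenstr = getenv("CONTENT_LENGTH");  /* NULL if not set */
     long inputlen = (inputlenstr != NULL) ? strtol(inputlenstr, NULL, 10) : 0;
     size_t nread;
 
     if (inputlen < 0 || inputlen > MAX_INPUT_LENGTH) {
         return 1;                                  /* refuse oversized input */
     }
     nread = fread(inputtext, 1, (size_t)inputlen, stdin);
     inputtext[nread] = '\0';                       /* input is not terminated */
     return 0;
 }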

Bourne shell

 read input   # reads one line of standard input into $input
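
Python

The error checking mentioned above could look like the following Python sketch. The function name read_post_body and the error messages are assumptions; the limit of 10000 bytes is the same illustrative value as in the C example.

```python
import io

MAX_INPUT_LENGTH = 10000  # illustrative limit, as in the C example


def read_post_body(environ, stream, limit=MAX_INPUT_LENGTH):
    """Read exactly CONTENT_LENGTH bytes from stream, with basic checks."""
    try:
        length = int(environ.get("CONTENT_LENGTH", "0"))
    except ValueError:
        raise ValueError("CONTENT_LENGTH is not a number") from None
    if length < 0 or length > limit:
        raise ValueError("CONTENT_LENGTH out of range")
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("request body shorter than CONTENT_LENGTH")
    return data


# An in-memory stream stands in for standard input here.
body = read_post_body({"CONTENT_LENGTH": "5"}, io.BytesIO(b"a=b+c"))
print(body)
```

In a real CGI program, the call would pass os.environ and sys.stdin.buffer, so that the body is read as bytes.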

Output from CGI Programs

To send output from a CGI program to the reader's browser, you send the output to the standard output location. Different languages allow you to send text to standard output in different ways. Here are some examples:

C or C++

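 #include <stdio.h>
 #include <stdlib.h>
 
 int main(void)
 {
     /* getenv returns NULL if the browser sent no User-Agent header */
     const char *agent = getenv("HTTP_USER_AGENT");
 
     printf("Content-type: text/html\r\n\r\n");
     printf("<HEAD><TITLE>Hello</TITLE></HEAD>");
     printf("<BODY>You are using %s.</BODY>", agent != NULL ? agent : "an unknown browser");
     return 0;
 }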

Perl

 #!/opt/local/bin/perl
 print "Content-type: text/plain\r\n\r\n";
 for my $var ( sort keys %ENV ) {
    printf "%s = \"%s\"\r\n", $var, $ENV{$var};
 }

Bourne shell

 echo Content-type: text/html
 echo 
 echo \<HEAD\>\<TITLE\>Hello\</TITLE\>\</HEAD\>
 echo \<BODY\>
 echo You are using $HTTP_USER_AGENT.\</BODY\>

HTTP Headers

Messages sent between a Web browser and a Web server contain header information that the software uses to determine how to display or interpret the information. The header information is not displayed by the browser.

The NaviServer automatically generates some HTTP header information and your program can add other information to the header.

Header Information Generated by NaviServer

When your CGI program sends output to the standard output location, the server automatically adds the following HTTP header information before sending the output to the reader's browser:

 HTTP/1.0 200 OK
 MIME-Version: 1.0
 Server: NaviServer/3.0
 Date: Monday, 06-Nov-95 17:50:15 GMT
 Content-length: 20134

However, if the name of your CGI program begins with "nph-", the NaviServer will not parse the output you send. Instead, the output is sent directly to the client. In this case, you must include the information above in your output. Generally, it is best to avoid using this "non-parsed header" feature because any errors may be sent to standard output and could make the header information incorrect. Also, with non-parsed headers, the server does not interpret the output, so the response code and content length are written out as 0 (zero) and 0 (zero) in the access log file.
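
To show what "include the information above in your output" means in practice, here is a minimal Python sketch of the bytes a non-parsed-header program would have to write itself. The function name nph_response is hypothetical, and the header set is reduced to the status line, Date, Content-type, and Content-length.

```python
from email.utils import formatdate


def nph_response(body, content_type="text/html"):
    """Build a complete HTTP response, as an nph- program must."""
    payload = body.encode("utf-8")
    head = "\r\n".join([
        "HTTP/1.0 200 OK",                      # status line, normally added by the server
        "Date: " + formatdate(usegmt=True),     # current time in GMT
        f"Content-type: {content_type}",
        f"Content-length: {len(payload)}",      # length in bytes, not characters
    ])
    return head.encode("ascii") + b"\r\n\r\n" + payload


response = nph_response("<BODY>Hello</BODY>")
print(response.decode("utf-8"))
```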

Header Information Generated by Your Program

You can specify header information at the beginning of the output you send back to the client. After the header, add a blank line and then start the output you want the reader to see. The blank line is required. Your program should always send the Content-type header (unless you are using the Location header). The other headers listed below it are optional. For example,

 Content-type: text/html
 
 <HTML>
 <HEAD><TITLE>My title</TITLE></HEAD>
 <BODY>text goes here...</BODY>
 </HTML>

Content-type:

You should always use this header to specify the MIME type of the output you are sending (unless you are using the Location header). If you are sending an HTML page as output, use a Content-type of text/html. If you are sending untagged text, send a Content-type of text/plain. If you send images, you might use a Content-type of image/gif or image/jpeg. You can send any type of output from your CGI program -- just be sure to specify the correct MIME type.

Example: Content-type: text/html

Content-encoding:

Use this header if the output you are sending is compressed. The Content-type should specify the type of the uncompressed file. For example, use x-gzip for GNU zip compression and x-compress for standard UNIX compression.

Example: Content-encoding: x-compress

Expires:

Use this header to specify when the browser should consider the file "out-of-date". Browsers can use this date to determine whether to load the page from their local cache of pages or to reload the file from the server.

Example: Expires: Monday, 06-Nov-95 17:50:15 GMT

Location:

Use this header if you want to send an existing document as output. The server automatically sends the document you specify to the browser. You will probably want to specify a full URL for the Location. If you specify a complete URL (such as http://www.mysite.com/out/response.htm), relative references in that file will be resolved using the information in the URL you specify. If you specify a relative URL (such as /out/response.htm), references in that file will be resolved using the directory that contains the CGI program.

If you send a Location header, you do not need to send a Content-type header. However, you may want to send HTML-tagged text including a link to the location for browsers that do not support this type of redirection. You can specify any type of URL as the output location. For example, you can send an FTP, Gopher, or News URL.

Example: Location: http://www.my.org/outbox/accepted.html

Status:

The NaviServer sends a status code to the browser in the first line of every HTTP header. The default status code for success is "200 OK". You can send other status codes by specifying the Status header.

Some browsers may not know how to handle all HTTP status codes, so your program should also send HTML output after the header to describe error situations that occur.

Example: Status: 401 Unauthorized
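
The headers described in this section can be combined as in the following Python sketch. The helper cgi_response is hypothetical; it emits a Status header when one is given, a Location header for redirects, and a Content-type header whenever a body is sent or no Location is used.

```python
def cgi_response(body="", content_type="text/html", status=None, location=None):
    """Build CGI output: header lines, a blank line, then the body."""
    headers = []
    if status is not None:
        headers.append(f"Status: {status}")
    if location is not None:
        headers.append(f"Location: {location}")
    # Content-type may be omitted only for a Location response without a body.
    if location is None or body:
        headers.append(f"Content-type: {content_type}")
    return "\r\n".join(headers) + "\r\n\r\n" + body


denied = cgi_response("<BODY>Access denied.</BODY>", status="401 Unauthorized")
redirect = cgi_response(location="http://www.my.org/outbox/accepted.html")
print(denied)
print(redirect)
```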

Sending HTML

To send a Web page to a reader's browser from a CGI program, first output this line followed by a blank line:

Content-type: text/html

Then, generate and output the HTML tags and content that make up the page. You can send any HTML tags you would normally use when creating pages.

If the file you want to send already exists, you can use the Location header described in the previous section to send that file as output from the CGI program.

Advice for CGI Programming

CGI Examples

You can download lots of examples and working CGI programs from the Web. Here are some places to look:

Keywords

module, nscgi