nscgi - NaviServer Modules

Description

This page describes the configuration of the module nscgi, which can be optionally loaded into NaviServer to run CGI programs.

What is CGI and How Does it Work?

CGI (Common Gateway Interface) is a standard way of running programs from a Web server. Often, CGI programs are used to generate pages dynamically or to perform some other action when someone fills out an HTML form and clicks the submit button. NaviServer provides full support for CGI v1.1 (RFC 3875).

Basically, CGI works like this: A web client sends a URL that causes the NaviServer to use CGI to run a program generating the content. NaviServer passes the input from the client to the program, runs it, and returns the output from the program back to the client. CGI acts as a "gateway" between the web server and the program executed at request time.

The program run by CGI can be any type of executable file on the server platform. For example, you can use C, C++, Perl, PHP, Python, Unix shell scripts, Fortran, or any other compiled or interpreted language.

One can also use plain Tcl scripts with CGI, but be aware that the NaviServer Tcl API will not be available to these scripts. To implement dynamic content with NaviServer, one has the option of using the embedded Tcl and C interfaces instead of CGI. Typically, the Tcl and C interfaces provide better performance than CGI (see the NaviServer Tcl Developer's Guide for information on the Tcl interface and the NaviServer C Developer's Guide for information on the C interface).

You may want to use CGI for existing, shareware, or freeware programs that use the standard CGI input, output, and environment variables. Since CGI is a standard interface used by many Web servers, there are lots of programs and libraries available. This chapter describes the interface and points you to locations where you can find more information.

A basic use case for CGI is an HTML form whose values are sent to a CGI program, which computes some output and returns the results as HTML. For example, the form lets people comment on your Web pages, you want the comments emailed to you, and finally you want to automatically generate a page and send it back to the web client.

The following steps describe the basic principles:

1. The reader fills out your form and clicks the "Submit" button. The HTML FORM tag in your page might look like this:
```
 <FORM METHOD="POST" ACTION="/cgi-bin/myprog">
```
The METHOD controls how the information typed into the form is passed to your program. It can be GET or POST. The ACTION determines which program should be run.

Other ways for a reader to run a CGI script are a direct link to the program, which does not allow the reader to supply any variables through a form (e.g. <A href="/cgi-bin/myprog">...), or other means.

2. When the server gets a request for a URL that maps to a CGI directory or a CGI file extension (as defined in the configuration file), it starts a separate process and runs the program within that process. The server also sets up a number of environment variables within that process. These environment variables include some standard CGI variables, and optionally any variables you define in the configuration file for this type of program.

3. The program runs.

In this example, the CGI program myprog takes the comments from the form as input and sends them to you as email. If the form method is GET, it gets the input from an environment variable. If the form method is POST, it gets the input from standard input. It also assembles an HTML page and sends it to standard output.

4. Any information the program passes to standard output is automatically sent to NaviServer when the program finishes running.

5. The server adds any header information needed to identify the output and sends it back to the reader's browser, which displays the output.
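
The steps above can be sketched as a small CGI program. This is a minimal illustration in Python, not the myprog of the example: the form field name comment and the function names read_form and respond are assumptions made for this sketch, and the email step is left out. The request is simulated with a dictionary and an in-memory stream so that the sketch runs without a web server.

```python
import html
import io
from urllib.parse import parse_qs


def read_form(environ, stdin):
    """Return the submitted form fields as a dict of value lists."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST":
        # POST data arrives on standard input. There is no EOF marker,
        # so read exactly CONTENT_LENGTH bytes.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("utf-8")
    else:
        # GET data arrives in the QUERY_STRING environment variable.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


def respond(environ, stdin):
    """Build the CGI output: header line, blank line, HTML body."""
    fields = read_form(environ, stdin)
    comment = fields.get("comment", [""])[0]  # hypothetical field name
    body = "<html><body><p>Thanks for your comment: {}</p></body></html>".format(
        html.escape(comment))
    return "Content-Type: text/html\n\n" + body


# Simulated POST request (assumption: form field "comment").
demo_body = b"comment=Nice+%3Cpage%3E"
demo_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(demo_body))}
print(respond(demo_env, io.BytesIO(demo_body)))
```

In a real CGI script, the last lines would instead pass os.environ and sys.stdin.buffer to respond and write the result to standard output.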

Configuring CGI for NaviServer

Loading the CGI module

In order to use CGI within NaviServer, one has to load the nscgi module, which is part of every NaviServer installation. As for all modules, one has to specify in the configuration file for which server the module should be enabled. In the following example, we use the server named s1 and add the module to that server so that it is loaded at start time.

 ns_section ns/server/s1/modules {
   ns_param nscgi nscgi.so
 }

When the configuration file contains multiple server definitions, you will need to configure the CGI interface for each of these, since these might have different requirements (see NaviServer Configuration Reference for the configuration of virtual servers and multiple server definitions).

For the configuration of the server, one has to provide information for the following tasks:

Identify CGI Programs

The configuration file has to specify which requests should be treated as CGI requests. Without this information, the CGI program file would be treated by the server as a static resource by fastpath and its content would be sent back to the web client.

Therefore, a section for the nscgi module is needed in the configuration file to specify which HTTP methods and paths should be treated as CGI requests. The following example specifies that every GET request for a resource named *.cgi anywhere in the URL space should be treated as a CGI request.

 ns_section ns/server/s1/module/nscgi {
   ns_param map  "GET /*.cgi"
 }

One could also restrict the path to a certain subdirectory in the URL space, and/or define multiple HTTP methods and paths.

 ns_section ns/server/s1/module/nscgi {
   ns_param map  "GET /cgi-bin/*.cgi"
   ns_param map  "POST /cgi-bin/*.cgi"
   # ...
 }

Identify CGI Programs in the Middle of the Path

The CGI specification supports script names not only at the end of the request URL, but also in the middle of the path. This is not very common, but it is allowed by RFC 3875. When such a request is issued, the remainder of the path after the script name is passed to the script via the environment variable PATH_INFO. This trailing path information is an alternative to using query variables to pass context information to the CGI script.

Below is an example of a CGI script name (here info.cgi) in the middle of a request path:

/cgi-dir/info.cgi/foo/bar?var=value&...

Since the URL space handling of NaviServer supports wildcards only in the last segment of a URL path, we have to specify the script name without wildcards in such cases.

 ns_section ns/server/s1/module/nscgi {
   # ...
   ns_param map  "GET /cgi-dir/info.cgi"
   # ...
 }

With this definition, info.cgi is identified as a CGI script in the path above. For the mentioned request path, the script gets /foo/bar passed via the environment variable PATH_INFO.
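
The split between script name and trailing path can be illustrated as follows. This is a simplified sketch of the idea and not NaviServer's URL space implementation; the function name split_cgi_path is hypothetical, and the request path is assumed to be passed without its query string.

```python
def split_cgi_path(request_path, script_url):
    """Split a request path into (SCRIPT_NAME, PATH_INFO).

    Returns None when the request path does not address the script.
    """
    if request_path == script_url:
        # Script name at the end of the path: no extra path information.
        return script_url, ""
    if request_path.startswith(script_url + "/"):
        # Everything after the script name becomes PATH_INFO.
        return script_url, request_path[len(script_url):]
    return None


print(split_cgi_path("/cgi-dir/info.cgi/foo/bar", "/cgi-dir/info.cgi"))
```

For the request path from the example, the second element of the result is /foo/bar, the value the script sees in PATH_INFO.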

Specify CGI Script Interpreters

Typically, CGI scripts are not named with the extension .cgi; they are Perl, PHP, Python, ... scripts that use the language-specific file extensions. In this step, we specify with which interpreter a CGI script should be executed.

For compiled CGI programs, this identification is not necessary. Scripted CGI programs might also be executed via the shebang convention. For security and maintenance reasons, however, web server administrators might want more control over which script interpreters, in which versions, execute the CGI scripts.

The script interpreters are specified per file extension in a separate section of the configuration file under ns/interps. The name of this section is also specified in the module definition. In the following example, the section is named CGIinterps and contains definitions for Perl and PHP. When a CGI script with one of the specified extensions (below .pl and .php) is to be executed, NaviServer will call the script via the named executable.

 ns_section ns/server/s1/module/nscgi {
   ns_param map    "GET /cgi/*.php"
   ns_param map    "POST /cgi/*.php"
   ns_param map    "GET /cgi/*.pl"
   # ...
   ns_param Interps CGIinterps
 }
 
 ns_section ns/interps/CGIinterps {
   ns_param .pl    "/opt/local/bin/perl"
   ns_param .php   "/opt/local/bin/php-cgi83"
 }

If no interpreter mapping is provided, the called script must have executable permissions.

Specify Source Locations

In some more complex setups, it is desirable to have separate source locations for certain CGI applications. The source location is typically a directory that is not under the page directory of NaviServer and may be managed via a package manager or a source code repository.

The source-code mapping can be achieved by additionally specifying the path in the map value. The following example specifies that requests for Perl scripts starting with /cgi should be resolved against the specified source location.

 ns_section ns/server/s1/module/nscgi {
   # ...
   ns_param map  "GET /cgi/*.pl /server/perl-scripts"
   # ...
 }

The provided source location is registered via ns_register_fasturl2file such that fastpath can resolve the request path against this source location.

Register CGI Handlers from a Script

As an alternative to registration via the configuration file, CGI handlers can also be registered with the command ns_register_cgi. Loading the CGI module and specifying the CGI script interpreters still have to be done via the configuration file.

Further configuration parameters

gethostbyaddr

is a boolean parameter to indicate whether to resolve the peer IP address to its hostname when setting the REMOTE_HOST environment variable. If false, the peer IP address will be used instead. Caution: turning this option on can negatively impact performance due to the overhead of having to perform a DNS look-up on every CGI request. Default: off

limit

is an integer parameter to indicate the maximum number of concurrent CGI requests to execute. 0 means unlimited. Default: 0

maxinput

is an integer parameter to indicate the maximum number of bytes to accept in the HTTP request. 0 means unlimited. Mostly useful to limit the size of POSTed data. Default: 1024000

maxwait

is an integer parameter to indicate the amount of time to wait in seconds in the queue when the concurrency limit has been reached. Server will respond with a "503 Service Unavailable" error on timeout. If limit is set to 0, this setting will have no effect. Default: 30

systemenvironment

is a boolean parameter to specify whether all environment variables of the server process should be passed to the CGI program. See below for more information about environment variables. Default: false

allowstaticresources

is a boolean parameter that controls whether static resources (e.g. images) can be served directly by the nscgi module.

In general, this option is not needed, since static content should be served via fastpath, which provides all its features (caching, compressed output, etc.). The WordPress example on this page shows a complex setup, where the static resources are delivered from a non-standard location. Note that enabling this option might be security relevant, since it can lead to the unexpected delivery of configuration files etc. Default: false
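
The interplay of the parameters limit and maxwait described above can be pictured with a counting semaphore. The following is an illustrative model only, not the nscgi implementation; the class name CgiLimiter and the convention of returning a numeric status are assumptions of this sketch.

```python
import threading


class CgiLimiter:
    """Illustrative model of the limit/maxwait parameters (not NaviServer code)."""

    def __init__(self, limit=0, maxwait=30):
        # limit == 0 means unlimited, so no semaphore is needed.
        self._slots = threading.Semaphore(limit) if limit > 0 else None
        self._maxwait = maxwait

    def run(self, handler):
        """Run handler() if a slot frees up within maxwait seconds, else 503."""
        if self._slots is None:
            return handler()
        if not self._slots.acquire(timeout=self._maxwait):
            # Concurrency limit reached and the wait timed out.
            return 503
        try:
            return handler()
        finally:
            self._slots.release()


limiter = CgiLimiter(limit=1, maxwait=0.1)
print(limiter.run(lambda: 200))
```

With limit set to 0, the semaphore is never created, which mirrors the statement that maxwait has no effect in that case.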

Sample Configuration for WordPress

In the following example, we provide the definitions for running the popular WordPress package under NaviServer via CGI. We assume that the WordPress sources have been downloaded into the directory /var/www/wordpress/. The following definition specifies that for GET and POST requests in the directory /wordpress with the extension .php, the source files should be taken from the downloaded folder. We use /opt/local/bin/php-cgi83 for executing the PHP files.

 ns_section ns/server/s1/module/nscgi {
   foreach httpMethod {GET POST} {
     ns_param map  "$httpMethod /wordpress/*.php /var/www/wordpress/"
   }
   ns_param interps php8
 }
 
 ns_section ns/interps/php8 {
   ns_param .php    "/opt/local/bin/php-cgi83"
 }

This definition is not yet complete. WordPress expects that index.php is executed for URLs pointing to a directory. Make sure to include this value in the fastpath setup.

 ns_section ns/server/s1/fastpath {
    # ...
    ns_param directoryfile  "index.adp index.tcl index.html index.htm index.php"
    # ...
 }

Sample Configuration for Joomla

The setup for Joomla is very similar to the setup for WordPress. Using this setup, the Joomla system will be available on the web server under the URL /joomla. In the setup, we reuse the CGI interpreter section php8 and the directoryfile from the WordPress setup. The definition assumes that the Joomla source code is installed on the server in the directory /var/www/joomla.

 ns_section ns/server/s1/module/nscgi {
   foreach httpMethod {GET POST} {
     ns_param map  "$httpMethod /joomla/*.php /var/www/joomla/"
   }
   ns_param interps php8
 }

Environment Variables passed to CGI programs

CGI programs receive input via environment variables (and standard input for methods like POST or PUT) and provide output via standard console output channels. We describe here in detail the environment variables, which are passed to the CGI programs.

Accessing Environment Variables

Different languages allow you to access environment variables in different ways. Here are some examples:

C or C++

 #include <stdlib.h>
  
 char *browser = getenv("HTTP_USER_AGENT");

Perl

 $browser = $ENV{'HTTP_USER_AGENT'};

Bourne shell

 BROWSER=$HTTP_USER_AGENT

C shell

 set BROWSER = $HTTP_USER_AGENT

Tcl

 set browser $::env(HTTP_USER_AGENT)
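
Python

The same lookup in Python, shown as a minimal sketch. The fallback value "unknown" is an arbitrary choice for this illustration; using os.environ.get avoids an exception when the client sent no User-Agent header.

```python
import os

# HTTP_USER_AGENT is only set if the client sent a User-Agent header,
# so fall back to a placeholder instead of raising KeyError.
browser = os.environ.get("HTTP_USER_AGENT", "unknown")
print(browser)
```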

Standard Environment Variables

These standard environment variables are provided for all CGI programs by NaviServer. These variables are described in detail in the RFC 3875 in Section 4.1.

AUTH_TYPE:

If the server supports user authentication, and the script is protected, this is the protocol-specific authentication method used to validate the user. A typical value is Basic for Basic authentication.

CONTENT_LENGTH:

If the CGI program is run by a form with the POST method, this variable contains the length of the contents of standard input in bytes. There is no null or EOF character at the end of standard input, so in some languages (such as C and Perl) you should check this variable to find out how many bytes to read from standard input.

Example: 442

CONTENT_TYPE:

If the CGI program is run by a form with the POST method, this variable contains the MIME type of the information sent by the browser. Forms are typically sent as application/x-www-form-urlencoded, or as multipart/form-data when the form contains file uploads.

GATEWAY_INTERFACE:

The version number of the CGI specification this server supports.

Example: CGI/1.1

HTTP_ACCEPT:

A comma-separated list of the MIME types the browser will accept, as specified in the HTTP header the browser sends. Many browsers do not send complete lists, and the list does not include external viewers the user has installed. If you want to send browser-specific output, you may also want to check the browser name, which is specified by the HTTP_USER_AGENT variable.

Examples:

 */*, application/x-navidoc
 */*, image/gif, image/x-xbitmap, image/jpeg
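
A CGI program that wants to inspect this list can split it at the commas. The following sketch assumes the hypothetical helper name accepted_types; it drops any parameters after a semicolon (such as quality values) and does not sort by preference.

```python
def accepted_types(http_accept):
    """Return the MIME types listed in an HTTP_ACCEPT value, in order."""
    types = []
    for item in http_accept.split(","):
        # Drop optional parameters such as ";q=0.8" and surrounding blanks.
        mime = item.split(";", 1)[0].strip()
        if mime:
            types.append(mime)
    return types


print(accepted_types("*/*, image/gif, image/x-xbitmap, image/jpeg"))
```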

HTTP_IF_MODIFIED_SINCE:

This variable contains a date and time if the browser wants a response only if the data has been modified since the specified date and time. The date is in GMT standard time. Many browsers do not send this information.

Example: Thursday, 23-Nov-95 17:00:00 GMT

HTTP_REFERER:

This variable contains the URL of the page or other location from which the reader sent the request to run the CGI program. For example, if the reader runs the program from a form, this variable contains the URL of that form.

Example: http://www.mydomain.com/mydir/feedback.htm

HTTP_USER_AGENT:

This variable tells which browser the reader is using to send the request. Normally, the format is "browser name/version".

Example: Mozilla/1.2N (Windows; I; 16-bit)

PATH_INFO:

This variable contains the path information from the URL after the name of the CGI program. Commonly, this type of URL is used to pass a relative directory location to your program. For example, the following URL runs the listdir program and passes it /misc/mydir as extra path information:

http://www.mysite.com/cgi-bin/listdir/misc/mydir

PATH_TRANSLATED:

This variable contains the extra path information from PATH_INFO, translated to the source location in the filesystem.

Example: /var/www/myserver/pages/misc/mydir

QUERY_STRING:

This variable contains the query information passed by a form or link to the CGI program.

REMOTE_ADDR:

The IP address of the machine from which or through which the browser is making the request. This information is always available.

Example: 199.221.53.76

REMOTE_HOST:

The full domain name of the machine from which or through which the browser is making the request. The hostname is only resolved when the parameter gethostbyaddr is turned on; otherwise, this variable contains the peer IP address. If this variable is blank, use the REMOTE_ADDR variable instead.

Example: mybox.company.com

REMOTE_USER:

If the server prompted the reader for a username and password because the script is protected by NaviServer's access control, this variable contains the username the reader provided.

Example: nsadmin

REQUEST_METHOD:

The method used to send the request to the server. For direct links, the method is GET. For requests from forms, the method may be GET or POST. Another method is HEAD, which CGI programs can treat like GET or can provide header information without page contents.

SCRIPT_FILENAME:

This variable contains the absolute pathname of the currently executing script in the filesystem.

SCRIPT_NAME:

This variable might be empty or the part of the request leading to the CGI program.

Example: /cgi-bin/search

SERVER_NAME:

The full hostname, domain name alias, or IP address of the server that ran the CGI program.

Example: www.mysite.com

SERVER_PORT:

The server port number to which the request was sent. This may be any number between 1 and 65535. The default for HTTP is 80.

Example: 80

SERVER_PROTOCOL:

The name and version number of the information protocol used to pass this request from the client to the server.

Example: HTTP/1.0

SERVER_SOFTWARE:

The name and version number of the server software running the CGI program.

Example: NaviServer/5.0.0

HTTP_*:

In addition to the preceding environment variables, the HTTP header lines received from the client, if any, are placed into the environment with the prefix HTTP_ followed by the header name. Following RFC 3875, the header name is converted to upper case and any hyphens (-) are changed to underscores (_). The server may exclude any headers it has already processed, such as content-type and content-length.
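
The naming rule can be written down in one line. The sketch below follows the convention of RFC 3875, Section 4.1.18 (upper case, "-" replaced by "_", prefix HTTP_); the function name header_to_env is hypothetical, and the sketch does not model which headers the server excludes.

```python
def header_to_env(name):
    """Map an HTTP header field name to its CGI environment variable name."""
    return "HTTP_" + name.upper().replace("-", "_")


for header in ("User-Agent", "If-Modified-Since", "Referer"):
    print(header, "->", header_to_env(header))
```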

Extra Environment Variables

One can configure NaviServer to pass further environment variables to CGI programs.

In case the CGI program or the CGI script interpreter requires additional environment variables, one can define these via the module parameter Environment and a matching section for the detailed definitions.
- Add a definition for the Environment parameter to your CGI configuration section.
```
 ns_section ns/server/s1/module/nscgi {
   ns_param map         "GET /cgi /usr/local/cgi"
   ns_param map         "POST /*.cgi"
   ns_param interps     cgi
   ns_param environment cgi
 }
```
- Then add a section under ns/environment for the chosen name (here cgi) containing the required environment variable definitions.
```
 ns_section ns/environment/cgi {
   ns_param FOO BAR     ;# defines environment variable "FOO=BAR"
   ns_param TMP /tmp    ;# defines environment variable "TMP=/tmp"
 }
```
systemenvironment is a boolean parameter in the module/nscgi section of the configuration file that controls whether the CGI program inherits the server process's environment variables. Enabling it could leak sensitive information from the parent's environment if the CGI program displays its environment variables, which is the behavior of some common error-handling code. Default: false
```
 ns_section ns/server/s1/module/nscgi {
   # ...
   ns_param systemenvironment true
   # ...
 }
```

Output from CGI Programs

To create an HTTP response, the CGI program outputs the header and content to the standard output. Different languages allow you to send text to standard output in different ways. Here are some examples: