A Rough Look at CGI

First, take a look at the Wikipedia introduction to CGI: http://zh.wikipedia.org/wiki/%E9%80%9A%E7%94%A8%E7%BD%91%E5%85%B3%E6%8E%A5%E5%8F%A3

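When developing web applications we rarely touch low-level details such as CGI. But if you want to understand the request-response cycle thoroughly, or write your own application server, you need to know CGI in detail. The dynamic web technologies of many languages are based on the ideas of CGI and extend them, for example Python's WSGI and Perl's PSGI.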

There is a good introductory article on CGI: http://www.jdon.com/idea/cgi.htm

A plain HTTP server can only serve static requests. But what if a request needs data that has to be fetched from a database? We call this a dynamic request.

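Suppose we write a web application without any HTTP server at all, using raw socket programming. Our program parses the user's request (the HTTP protocol) itself. If the request is for a static file, we call the method that serves static files; if it is a dynamic request, we call the method that handles that particular request. Apart from HTTP, no other protocol is involved, so the processing is direct and the flow is clear. But whenever the system needs a new feature, we have to add or modify the corresponding method, which makes the system hard to extend and maintain. Building a website this way also demands a lot from the programmer, who must be fluent in socket programming and HTTP as well as HTML.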

The flow is sketched below in pseudocode:

 while True:
     conn = server.accept()                   # accept a connection
     req = conn.read()                        # read the user's request
     headers = parse_http(req)                # parse the request
     if is_static(headers["url"]):            # static request
         res_str = do_static(headers["url"])
     else:                                    # dynamic request
         res_str = do_dynamic(req)
     res_str = end_handler(res_str)           # post-process the response string
     conn.write(res_str)                      # send the result to the user
     conn.close()                             # close the connection

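With this approach we would have to repeat the steps above for every web system we write, and all the dynamic-request handlers would have to be written in the same language. So the approach is feasible but not practical. In effect, what it produces is a single application system with the server built in.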
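The dispatch step of the loop above can be tried without opening a socket. The following is a minimal sketch, not a real server: it parses only the request line, and the names `parse_request_line`, `is_static`, `handle` and the extension list are assumptions made for illustration.

```python
import os

# Assumed list of "static" extensions, chosen only for this illustration.
STATIC_EXTENSIONS = {".html", ".css", ".js", ".png"}


def parse_request_line(raw: bytes) -> dict:
    """Parse only the first line of a request, e.g. b'GET /a.html HTTP/1.0'."""
    first_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, url, version = first_line.split(" ")
    return {"method": method, "url": url, "version": version}


def is_static(url: str) -> bool:
    """Treat a URL as static when its path ends in a known file extension."""
    path = url.split("?", 1)[0]
    return os.path.splitext(path)[1] in STATIC_EXTENSIONS


def handle(raw: bytes) -> str:
    """Dispatch to the static or the dynamic branch; both are hard-wired here."""
    request = parse_request_line(raw)
    if is_static(request["url"]):
        body = "static file: " + request["url"]
    else:
        body = "dynamic result for: " + request["url"]
    return "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n" + body
```

Because both branches live inside `handle`, adding a new dynamic feature means editing the server itself, which is exactly the maintenance problem described above.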

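The idea stays the same, but we extract the repeated steps, that is, everything except the dynamic handling, into an HTTP server. To build a website we then only need to write the static files and the scripts that handle dynamic requests. We call these CGI scripts, and they can be written in any language. Most high-performance HTTP servers are written in C/C++; to invoke a CGI script, such a server can use the Unix exec family of functions, which are built on the execve system call. The functions of the exec family and their usage are listed below.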

 Functions: the exec family
 Purpose:   load and run another program, replacing the current process image
 Usage:
  int execl(const char *path, const char *arg0, ..., (char *)NULL);
  int execle(const char *path, const char *arg0, ..., (char *)NULL, char *const envp[]);
  int execlp(const char *file, const char *arg0, ..., (char *)NULL);
  int execlpe(const char *file, const char *arg0, ..., (char *)NULL, char *const envp[]);   (DOS/Windows C runtimes; not POSIX)
  int execv(const char *path, char *const argv[]);
  int execve(const char *path, char *const argv[], char *const envp[]);
  int execvp(const char *file, char *const argv[]);
  int execvpe(const char *file, char *const argv[], char *const envp[]);   (GNU extension)

 Example:
 /* execv example */
 #include <stdio.h>
 #include <stdlib.h>
 #include <errno.h>
 #include <unistd.h>

 int main(int argc, char *argv[])
 {
    int i;

    printf("Command line arguments:\n");
    for (i = 0; i < argc; i++)
       printf("[%2d] : %s\n", i, argv[i]);

    printf("About to exec child with arg1 arg2 ...\n");
    fflush(stdout);             /* flush buffered output before the process image is replaced */
    execv("CHILD.EXE", argv);   /* returns only if the exec failed */

    perror("exec error");
    exit(1);
 }
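The same idea, starting another program and handing it an environment, can be tried from Python. Python exposes `os.execve`, which replaces the current process just like the C call; the sketch below uses `subprocess.run` instead so that the caller survives and can read the child's output. The variable name `GREETING` and the helper `run_child` are made up for this illustration.

```python
import os
import subprocess
import sys


def run_child(extra_env: dict) -> str:
    """Run a child Python process with an explicit environment; return its stdout."""
    # The child program only prints one environment variable it was given.
    child_code = "import os; print(os.environ['GREETING'])"
    env = dict(os.environ)   # start from the parent's environment
    env.update(extra_env)    # then add the variables we want to pass down
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `run_child({"GREETING": "hi from parent"})` returns the string the child printed, which shows that the environment is the channel through which the parent passes data to the program it launches.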

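But when the server invokes a CGI script, the script needs some information about the request. CGI has an important concept for this: environment variables. The environment variables defined by CGI/1.1 are listed below.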

  Environment variables

    Environment variables are used to pass data about the request from
    the server to the script. They are accessed by the script in a system
    defined manner. In all cases, a missing environment variable is
    equivalent to a zero-length (NULL) value, and vice versa. The
    representation of the characters in the environment variables is
    system defined.

    Case is not significant in the names, in that there cannot be two
    different variable whose names differ in case only. Here they are
    shown using a canonical representation of capitals plus underscore
    ("_"). The actual representation of the names is system defined; for
    a particular system the representation may be defined differently to
    this.

    The variables are:

       AUTH_TYPE
       CONTENT_LENGTH
       CONTENT_TYPE
       GATEWAY_INTERFACE
       HTTP_*
       PATH_INFO
       PATH_TRANSLATED
       QUERY_STRING
       REMOTE_ADDR
       REMOTE_HOST
       REMOTE_IDENT
       REMOTE_USER
       REQUEST_METHOD
       SCRIPT_NAME
       SERVER_NAME
       SERVER_PORT
       SERVER_PROTOCOL
       SERVER_SOFTWARE

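The meaning of each variable should be clear from its name; they look a bit like HTTP headers. By now the CGI processing flow should be clear. To summarize: when the web server receives a CGI request, it sets a number of environment variables for the CGI program and runs the CGI script. The script reads the variables it is interested in from the environment (for example the query string, QUERY_STRING), processes the request, and writes the response. For how to set and read environment variables, see the article "Unix Environment Variables Explained".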
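The script side of this flow can be sketched in a few lines of Python. This is an illustrative example, not code from any particular server: the function name `cgi_response` and the query parameter `name` are assumptions, while `REQUEST_METHOD` and `QUERY_STRING` are the CGI/1.1 variables listed above.

```python
import os
from urllib.parse import parse_qs


def cgi_response(environ) -> str:
    """Build a CGI response (headers, blank line, body) from CGI variables."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical query parameter used only for this example.
    name = query.get("name", ["world"])[0]
    body = f"{method} says hello, {name}\n"
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    # When a web server runs this file as a CGI script, stdout goes to the client.
    print(cgi_response(os.environ), end="")
```

Passing the environment in as an argument keeps the logic testable: the same function works with `os.environ` under a real server and with a plain dictionary in a test.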

Here is part of the source of an HTTP server that handles CGI requests:

 #include    <stdio.h>
 #include    <stdlib.h>
 #include    <string.h>
 #include    <unistd.h>
 #include    <sys/types.h>
 #include    <sys/stat.h>
 #include    <sys/socket.h>

 /* make_server_socket(), cannot_do(), do_404(), do_ls(), do_cat() and the
    other helpers are defined elsewhere in the full server source */

 void read_til_crnl(FILE *fp);
 void process_rq(char *rq, int fd);

 int main(int ac, char *av[])
 {
     int     sock, fd;
     FILE    *fpin;
     char    request[BUFSIZ];

     if ( ac == 1 ){
         fprintf(stderr,"usage: ws portnum\n");
         exit(1);
     }
     sock = make_server_socket( atoi(av[1]) );
     if ( sock == -1 ) exit(2);

     /* main loop here */
     while(1){
         /* take a call and buffer it */
         fd = accept( sock, NULL, NULL );
         fpin = fdopen(fd, "r" );

         /* read request */
         fgets(request,BUFSIZ,fpin);
         printf("got a call: request = %s", request);
         read_til_crnl(fpin);

         /* do what client asks */
         process_rq(request, fd);

         fclose(fpin);
     }
 }

 /* ------------------------------------------------------ *
    read_til_crnl(FILE *)
    skip over all request info until a CRNL is seen
    ------------------------------------------------------ */
 void read_til_crnl(FILE *fp)
 {
     char    buf[BUFSIZ];
     while( fgets(buf,BUFSIZ,fp) != NULL && strcmp(buf,"\r\n") != 0 )
         ;
 }

 /* ------------------------------------------------------ *
    process_rq( char *rq, int fd )
    do what the request asks for and write reply to fd
    handles request in a new process
    rq is HTTP command:  GET /foo/bar.html HTTP/1.0
    ------------------------------------------------------ */
 void process_rq( char *rq, int fd )
 {
     char    cmd[BUFSIZ], arg[BUFSIZ];

     /* create a new process and return if not the child */
     if ( fork() != 0 )
         return;

     strcpy(arg, "./");        /* precede args with ./ */
     if ( sscanf(rq, "%s%s", cmd, arg+2) != 2 )
         exit(1);              /* malformed request: the child must not return to the accept loop */

     if ( strcmp(cmd,"GET") != 0 )
         cannot_do(fd);
     else if ( not_exist( arg ) )
         do_404(arg, fd );
     else if ( isadir( arg ) )
         do_ls( arg, fd );
     else if ( ends_in_cgi( arg ) )
         do_exec( arg, fd );
     else
         do_cat( arg, fd );
     exit(0);                  /* child is done; only the parent keeps accepting */
 }

The listing above is only part of the HTTP server code. Here is the code that handles a CGI request:

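 void do_exec( char *prog, int fd )
 {
     FILE    *fp;

     fp = fdopen(fd,"w");
     header(fp, NULL);                 /* send the start of the HTTP response (helper defined elsewhere) */
     fflush(fp);
     dup2(fd, 1);                      /* stdout now goes to the client socket */
     dup2(fd, 2);                      /* stderr too */
     close(fd);
     execl(prog,prog,(char *)NULL);    /* replace this process with the CGI program */
     perror(prog);                     /* reached only if execl failed */
     exit(1);
 }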

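This uses Unix I/O redirection: the script's standard output (System.out.print() in Java, print in Python) is connected directly to fd. In other words, whatever the script prints is exactly what the user receives.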
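The redirection can be reproduced in Python on a Unix-like system. The sketch below is an assumption-laden illustration rather than a port of the C code: a `socket.socketpair()` stands in for the client connection, the inline `SCRIPT` stands in for a CGI program on disk, and the helper names `serve_cgi` and `demo` are invented here. Passing the socket's file descriptor as `stdout` plays the role of `dup2(fd, 1)`.

```python
import os
import socket
import subprocess
import sys

# Stand-in for a CGI script: prints a header, a blank line, then a body.
SCRIPT = """
import os
print("Content-Type: text/plain")
print()
print("query=" + os.environ["QUERY_STRING"])
"""


def serve_cgi(conn: socket.socket, query_string: str) -> None:
    """Send a status line, then run the script with stdout wired to the socket."""
    conn.sendall(b"HTTP/1.0 200 OK\r\n")
    env = dict(os.environ, QUERY_STRING=query_string, REQUEST_METHOD="GET")
    # stdout=conn.fileno() makes the child's print() write straight to the socket.
    subprocess.run([sys.executable, "-c", SCRIPT], env=env,
                   stdout=conn.fileno(), check=True)
    conn.close()


def demo() -> bytes:
    """Use a socket pair in place of a real client connection."""
    server_side, client_side = socket.socketpair()
    serve_cgi(server_side, "name=cgi")
    data = b""
    while chunk := client_side.recv(4096):
        data += chunk
    client_side.close()
    return data
```

The bytes returned by `demo()` begin with the server's status line and continue with whatever the script printed, with no copying done by the server in between.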

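Of course, CGI is the most basic of these techniques and also the least efficient: every CGI request costs a fork(), and the CGI script has to run on the same machine as the web server. Many technologies have since appeared to replace it, such as FastCGI and SCGI, and individual languages have extended the CGI idea into their own specifications, such as Python's WSGI.

Each language also has its own solutions for deploying a web system behind an HTTP server. The most common is to extend the HTTP server with a module that handles the requests. Python's mod_python, for example, essentially embeds a Python interpreter in Apache.

A very popular architecture today puts an HTTP server in front as a proxy. It accepts user requests, answers requests for static files directly, and forwards dynamic requests to an application server; the application server returns its result to the HTTP server, which then returns it to the user. This is the Server/Gateway model. In Python a typical combination is Nginx + Gunicorn: Nginx is the proxy server and Gunicorn is the WSGI server. Nginx forwards dynamic requests to Gunicorn, and Gunicorn wraps each request in the form the WSGI specification requires and then calls the corresponding app. Because the WSGI server is itself written in Python, it can call the handler directly, so no fork() is needed per request. (The processing is similar to CGI, but it follows WSGI, a specification specific to Python and better suited to it.) For the details of the processing, see Python's wsgiref module. WSGI is analogous to Java's Servlet, and Gunicorn is analogous to Tomcat.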
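To make the comparison with CGI concrete, here is a minimal WSGI sketch. The application `app` and the helper `call_app` are hypothetical names; `call_app` plays the part a WSGI server such as Gunicorn would play, using `wsgiref.util.setup_testing_defaults` from the standard library to fill in the required keys. Note how `environ` carries the same kind of keys as the CGI environment variables, only passed as a dictionary in a function call instead of through a new process.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A minimal WSGI application: read the request from environ, return body chunks."""
    text = f"path={environ['PATH_INFO']} query={environ['QUERY_STRING']}"
    body = text.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(path: str, query: str):
    """Play the role of the WSGI server: build environ, call app, collect the reply."""
    environ = {"PATH_INFO": path, "QUERY_STRING": query}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

To serve `app` over HTTP for a quick experiment, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` should be enough; port 8000 is an arbitrary choice.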

If you want to understand how CGI requests are processed, I recommend reading Python's CGI source code directly; it is very easy to follow.

For the CGI/1.1 specification, see http://tools.ietf.org/html/draft-robinson-www-interface-00
