URL Encoding

Before data entered in a form can be sent to a CGI program, each form element's name (specified by the NAME attribute) is paired with the value entered by the user to create a NAME=VALUE pair. For example, if a field is named "age" and the user enters "30", the NAME=VALUE pair is "age=30". In the transferred data, NAME=VALUE pairs are separated by the ampersand (&) character.
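As a rough illustration of how the pairs are assembled, the following Perl sketch joins a list of field names and values into a single string. The sub name build_form_data and the sample fields are made up for this example, and no encoding is applied at this stage.

```perl
use strict;
use warnings;

# Join a flat list (name, value, name, value, ...) into NAME=VALUE
# pairs separated by ampersands. No encoding is applied here.
sub build_form_data {
    my @fields = @_;
    my @pairs;
    for (my $i = 0; $i < @fields; $i += 2) {
        push @pairs, $fields[$i] . '=' . $fields[$i + 1];
    }
    return join('&', @pairs);
}

# Hypothetical form with two fields.
print build_form_data('name', 'Pat', 'age', '30'), "\n";
```

Run as written, this prints name=Pat&age=30.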

Under the GET method, form information is sent as part of the URL, so it cannot include spaces or other special characters that are not allowed in URLs, nor characters that have a reserved meaning in URLs, such as slashes (/). (For the sake of consistency, the same constraint applies when the POST method is used.) The Web browser therefore encodes user-supplied information before sending it.

URL Encoding replaces spaces and other special characters in the query string with a percent sign (%) followed by the character's two-digit hexadecimal code. (URL Encoding is also sometimes called hexadecimal encoding.) Suppose a user fills out and submits a form containing his or her birthday in the format mm/dd/yy (e.g., 11/05/73). The forward slashes in the birthday are among the special characters that can't appear unencoded in the client's request to the CGI program, so the browser encodes the data when it issues the request. The following sample request shows the resulting encoding:

POST /cgi-bin/birthday.pl HTTP/1.0
.
. [HTTP Header information]
.
Content-length: 21

birthday=11%2F05%2F73

The sequence %2F stands for the slash (/) character, whose ASCII code is 2F in hexadecimal (47 in decimal).
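To make the encoding step concrete, here is a minimal Perl sketch of what the browser does to a value before sending it. The sub name url_encode is hypothetical, and the set of characters left unencoded (letters, digits, "-", "_" and ".") is an assumption for this example; browsers differ slightly in which characters they leave alone. The sketch also assumes single-byte characters.

```perl
use strict;
use warnings;

# Percent-encode a value: every character outside the assumed "safe"
# set is replaced by "%" plus its two-digit hexadecimal code.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.-])/sprintf("%%%02X", ord($1))/eg;
    return $text;
}

print 'birthday=', url_encode('11/05/73'), "\n";
```

This prints birthday=11%2F05%2F73, the same body shown in the sample request.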

CGI scripts have to provide some way to "decode" Form data the client has encoded. Here's a short CGI program, written in Perl, that can process this Form:

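#!/usr/local/bin/perl

# Read the POST body; its size is given by the CONTENT_LENGTH variable.
read(STDIN, $input, $ENV{'CONTENT_LENGTH'});

# Convert each %XX escape back to the character it stands for.
$input =~ s/%([\dA-Fa-f]{2})/pack("C", hex($1))/eg;

# Split "birthday=11/05/73" at the first "=" into name and value.
($field_name, $birthday) = split(/=/, $input, 2);

print "Content-type: text/plain", "\n\n";

print "Hey, your birthday is on: $birthday. ";
print "That's what you told me, right?", "\n";

exit (0);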

The line:

$input =~ s/%([\dA-Fa-f]{2})/pack("C", hex($1))/eg;

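is a Perl substitution that uses a regular expression to find each percent sign followed by two hexadecimal digits and replace it with the corresponding character, so "%2F" becomes "/" again. This should look somewhat familiar, because regular expressions were covered briefly in Week 5 in the Introduction to JavaScript RegExp.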
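The program above handles a single field. For a form with several fields, one common approach is to split the data at the ampersands first, then split each pair at its first equals sign, and only then decode, so that an encoded "&" or "=" inside a value is not mistaken for a separator. A minimal sketch, with the sub name parse_form_data and the sample input invented for this example:

```perl
use strict;
use warnings;

# Turn "name1=value1&name2=value2" into a hash of decoded values.
sub parse_form_data {
    my ($input) = @_;
    my %form;
    foreach my $pair (split(/&/, $input)) {
        next if $pair eq '';                        # skip empty pairs
        my ($name, $value) = split(/=/, $pair, 2);  # split at first "="
        $value = '' unless defined $value;
        # Decode the name and the value only after splitting.
        foreach ($name, $value) {
            s/%([\dA-Fa-f]{2})/pack("C", hex($1))/eg;
        }
        $form{$name} = $value;
    }
    return %form;
}

# Hypothetical two-field form body; %26 is an encoded ampersand.
my %form = parse_form_data('birthday=11%2F05%2F73&motto=R%26D');
print "$form{'birthday'}\n";
print "$form{'motto'}\n";
```

This prints 11/05/73 and then R&D. Had the data been decoded before splitting, the "&" inside the second value would have been treated as a pair separator.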

NOTE: As a special case, the space character can be encoded as a plus sign (+) instead of its hexadecimal notation (%20). When %20 is used rather than "+", the encoding of spaces matches the output of JavaScript's escape() method.
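A decoder therefore has to accept both forms. One way to do this, sketched below under the assumption that the data arrives with spaces as either "+" or %20, is to turn plus signs into spaces before decoding the %XX sequences. The sub name url_decode is hypothetical. The order matters: a literal plus sign is sent as %2B, and it must not be turned into a space after decoding.

```perl
use strict;
use warnings;

# Decode one form value, accepting "+" as well as %20 for a space.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;                                   # "+" means space
    $text =~ s/%([\dA-Fa-f]{2})/pack("C", hex($1))/eg;  # then %XX escapes
    return $text;
}

print url_decode('New+York%20City'), "\n";
print url_decode('1%2B1'), "\n";
```

This prints New York City and then 1+1.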