
Stateful -vs- Stateless
HTTP is a stateless protocol. This means that
an HTTP server has no information in a request to tie it to any other request. The data in a
response is based only on the information the client sends in the request. It's like doing a
math problem in high school -- you are only allowed to use the facts given in the
problem plus mathematical logic to derive an answer.
HTTP stands out from most of the other protocols you are probably familiar with. Those
protocols are "stateful", which means information supplied in one request can
be used to modify future requests. In fact, these protocols have a concept of a "session" in which a batch of requests is sent and responses
received. FTP (File Transfer Protocol) has many
states, including "the current directory". SMTP
(Simple Mail Transfer Protocol) and POP (Post Office Protocol) both include
a concept of "who you are" which is used for all requests. NNTP (Network News Transfer Protocol) allows you to "change Usenet groups" to
direct where future requests for articles will be retrieved from.
Stateless protocols generally have the advantage that they require fewer resources on the
server -- the resources are pushed into the client. But the disadvantage is that the
client needs to tell the server enough information on each request to be able to get the
proper answer. Cookies are a method for a server to ask the client to store arbitrary data
for use in future connections. The server is asking the client to keep state information.
The hardest part of personalizing a Web page is maintaining state
-- tracking users as they click through your site. Web browsers and
servers have no built-in mechanisms to keep tabs on and remember users as they go from page
to page. That is, after a user sends a request to the server and a Web page is returned,
the server forgets all about the user and the page she has just downloaded. If a user clicks
on a link, the server doesn’t have background information about what page the user is
coming from and, more importantly, if the user returns to the page at a later date, there is
no information available to the server about the user’s previous actions on the page.
Maintaining state can be important to developing complex interactive applications.
Several sites work around this problem using complex server-side CGI scripts. But there is a
simpler solution: newer browsers address this problem with cookies, a method of storing
information locally in the browser and sending it to the server whenever the appropriate
pages are requested by the user. Because cookies allow Web builders to ask a user for
personal information, store the data on their computers, and retrieve that knowledge when
the user returns, they are the most common way to track visitors.
The cookie mechanism allows servers to personalize pages for each client, or
remember selections the client has made when browsing through various pages of a site --
all without having to use a complicated (or more time-consuming) CGI/database system on the
server's side.
Cookies work in the following way: When a CGI program identifies a new user,
it adds an extra header to its response containing an identifier for that user and other
information that the server may glean from the client's input. This header informs the
cookie-enabled browser to add this information to the client's cookies file. After
this, all requests to that URL from the browser will include the cookie information as an
extra header in the request. The CGI program uses this information to return a document
tailored to that specific client. The cookies are stored on the client user's hard
drive, so the information remains even when the browser is closed and reopened.
JavaScript provides the capability to work with client-side information stored as
cookies.
Cookies provide a method to store information at the client side and have the browser
provide that information to the server along with a page request. A cookie always includes
the address of the server that sent it. That's the primary idea behind a cookie:
Identification.
Where did the term cookies come from?
"Lou Montulli, (currently?) the protocols manager in
Netscape's client product division, wrote the cookies specification for Navigator 1.0, the
first browser to use the technology. Montulli says there's nothing particularly amusing
about the origin of the name: 'A cookie is a well-known computer science term that is used
when describing an opaque piece of data held by an intermediary. The term fits the usage
precisely; it's just not a well-known term outside of computer science circles.'"
The Truth about Cookies
- Cookies just identify the computer being used, not the individual using the computer.
- A cookie is not a script. A cookie may be written by a script (either a CGI or
JavaScript) but the cookies themselves are simply passive text strings.
- The Netscape specification limits a cookie to 4 KB of text.
Most cookies, however, rarely exceed 20-30 characters (a fraction of a kilobyte). The number
of cookies on your machine is limited to 20 per site visited, up to a maximum
of 300 in total. When a limit is reached, the oldest cookies are deleted to make room.
- Cookie security is such that only the originating domain can ever use the contents of
your cookie. The trick that companies such as DoubleClick use is to embed a
graphic from their domain on a page from another domain. When the graphic (usually a
banner) is loaded, the DoubleClick domain sets a cookie.
- The specifications allow for cookies to be set with or without an expiry date.
The former are called 'Persistent Cookies' and the latter 'Non-persistent Cookies'.
A cookie without a valid (future) expiry date will not be stored on your machine but
will be available for the duration of the current session (i.e., until you quit the browser).
- Cookie files stored on the client computer are easily read by any word processing
program, text editor or web browsing software. If a merchant actually stores sensitive
information in a Cookie, that information can be read by any Cookie savvy person with
access to the computer storing the Cookie. Most web merchants sophisticated enough to
use Cookie based shopping programs will take steps to protect any information
transmitted and stored via Cookie technology. For example:
- The merchant could use only Secure Sockets Layer (SSL) or other encryption-enabled Web pages to send and
receive sensitive Cookie information to protect that information from Web miscreants
sniffing that merchant's web correspondence. Any Cookie containing sensitive
information could be created using the "secure" attribute so that it can be
retrieved only by a computer running SSL enabled software. Additionally, any sensitive
information actually stored in the Cookie should be encrypted to hide it from others
with access to the web surfer's computer.
- Better yet, the merchant can use a "short form Cookie" that does not store
the actual data but instead contains a pointer that the merchant's computer can use to
locate the file on the merchant's machine where the information collected is stored.
The bottom line is that an unsuspecting Web consumer, using current Cookie-enabled
browsers in their default mode and ignorant of the fact or content of a cookie, must rely
on the merchant to "do the right thing".
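The "short form Cookie" idea can be sketched as follows. This is only an illustration under stated assumptions: the in-memory Map standing in for the merchant's files, the cookie name customerId, and the helper names are all hypothetical, and the code is written for Node.js rather than a CGI environment.

```javascript
// Hypothetical server-side store: the cookie carries only an opaque ID,
// while the collected data stays on the merchant's machine.
const nodeCrypto = require("crypto");

const customerRecords = new Map(); // id -> record kept on the server

function createShortFormCookie(record) {
  // 16 random bytes, hex-encoded, act as a hard-to-guess pointer.
  const id = nodeCrypto.randomBytes(16).toString("hex");
  customerRecords.set(id, record);
  return "customerId=" + id + "; path=/; secure";
}

function lookUpRecord(cookieHeader) {
  // Pull the customerId value out of a "Cookie:" header value.
  const match = /(?:^|;\s*)customerId=([0-9a-f]+)/.exec(cookieHeader);
  return match ? customerRecords.get(match[1]) : undefined;
}
```

Anyone reading the cookie file sees only the identifier; the record itself never leaves the server.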
What Cookies cannot do
- Cookies CANNOT be used to get a person's e-mail address. They can save the
e-mail address after a user types it into a form, but they can't GET anything. A
cookie is just a holder.
- Cookies do not steal credit card numbers, passwords or any other information. Rather
they allow a web site to store information a visitor voluntarily submits to that web
site on that visitor's machine. In this regard, Cookies are no different than the
traditional databases maintained by retail stores, mail order houses, and other
merchants so many of us trust implicitly with the same information, except that the Cookie is stored
only on the same machine used to supply that information.
- Cookies cannot be accessed by any computer other than the computer that created the
cookie. Yes, if a web surfer goes to Company Y's web page and orders a product, Company
Y can store whatever information that surfer is required to provide to complete that
sale as a Cookie on the surfer's machine. Equally true is that only Company Y can
retrieve that information. Companies A, B, C etc., running on a different computer,
cannot access any of the data stored in the Company Y Cookie. Bottom line, storing the
information in a Cookie poses no greater risk of Company Y misconduct than providing
Company Y access to that same information via mail, telephone, fax or a Cookie-less web
page.
- Web sites that send Cookies cannot, by virtue of creating that Cookie, access any
information stored on the system housing the Cookie that does not appear in that Cookie.
The Cookie at most allows the web site creating it to retrieve from a visitor's system
information that visitor has already submitted to that web site.
Cookies can be used for a multitude of tasks including:
- Reminder calendars that use cookies to store appointments and other
messages.
- Country tours that users can take during several visits to a Website;
cookies are used to remember where the user left off.
- Adventure games that use cookies to keep track of pertinent character
data and the current state of the game.
- Storing data as you move from one page (or frame) to another, for example shopping
carts.
- Saving user preferences.
- Greeting people by name.
- Notifying visitors of what has changed since their last visit.
- Using CGI you can use a cookie to identify repeat visitors to your site and their
movement patterns.
The last point and others like it cause concern for some users. What you should realize
is that tracking of visitors existed long before cookies. Using CGI and server-side
scripts you can be tracked much more efficiently than by the humble cookie.
cookies.txt
During a browsing session Netscape stores your cookies in memory, but when you quit they
go into a file called cookies.txt (e.g., in C:\Program Files\Netscape\Users\Username). On a
Macintosh the cookie jar is called MagicCookie and resides in the preferences folder. Every
time you open your browser, your cookies are read in from disk, and every time you close
your browser, your cookies are re-saved to disk. As a cookie expires, it is discarded from
memory and it is no longer saved to the hard drive.
www.sislands.com FALSE / FALSE 856869067 headCount 5
.sislands.com TRUE /javascript/week7/html FALSE 959145732 counter 3
www.sislands.com FALSE / FALSE 856869067 userName Frank%20Peter
Each line represents a single piece of stored information. A tab is inserted between each
of the fields.
- domain - The domain that originated the cookie. The domain parameter takes the
flexibility of the path parameter one step further. If a site uses multiple servers
within a domain, then it is important to make the cookie accessible to pages on any of
these servers.
domain=www.sislands.com
Cookies can be assigned to individual machines, or to an entire Internet domain or
sub-domain. The only restriction on this value is that it must contain at least two
dots (.sislands.com, not sislands.com) for the normal top-level domains, or three dots
for the "extended" domains (.ecom.sislands.com, not ecom.sislands.com).
- flag -
A TRUE/FALSE value indicating if all machines within a
given domain can access the variable. This value is set automatically by the browser,
depending on the value you set for domain.
- path - If you provide a cookie path attribute, the browser will check it
against your script's URL before returning the cookie. For example, if you specify the
path "/cgi-bin", then the cookie will be returned to each of the scripts
"/cgi-bin/tally.pl", "/cgi-bin/order.pl", and "/cgi-bin/customer_service/complain.pl",
but not to the script "/cgi-private/site_admin.pl". The path "/foo"
would match "/foobar" and "/foo/bar.html". The path "/" is
the most general path: setting path to "/" causes the cookie
to be sent to any CGI script on your site. If no path is given, the browser uses the
path of the document that set the cookie.
The two examples below should help explain the path and what it means:
- Cookies created in Week 1 and read in Week 1 (annotated version)
- Cookies created in Week 1 and read in Week 7 (annotated version)
- secure - If the second boolean ("secure") attribute is
set, the cookie will only be sent to your script if the CGI request is occurring on a
secure channel, such as SSL (default is false).
- expiry date - The large number before the cookie name. It represents the number
of seconds since Jan 1, 1970 00:00:00 GMT (the same epoch JavaScript uses, although JavaScript dates count milliseconds). Hence,
there are no Y2K issues with Cookies.
- name and value - At the end of each line are the cookie name and cookie value:
the cookie itself.
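As a rough illustration of the field layout, the sketch below splits one cookies.txt line on tabs. The function and property names are made up for this example. It also assumes the expiry field counts seconds, which is what the sample values suggest (read as milliseconds, 856869067 would fall in January 1970).

```javascript
// Field order follows the description above; property names are illustrative.
function parseCookiesTxtLine(line) {
  const [domain, flag, path, secure, expiry, name, value] = line.split("\t");
  return {
    domain,
    allHosts: flag === "TRUE", // TRUE if every host in the domain may read it
    path,
    secure: secure === "TRUE",
    // Assumed to be seconds; JavaScript's Date expects milliseconds.
    expires: new Date(Number(expiry) * 1000),
    name,
    value: unescape(value), // undo the escape() encoding of the stored value
  };
}

const sample = "www.sislands.com\tFALSE\t/\tFALSE\t856869067\tuserName\tFrank%20Peter";
const cookie = parseCookiesTxtLine(sample);
console.log(cookie.name + " = " + cookie.value + ", expires " + cookie.expires.toUTCString());
```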
Setting Cookies
To set a cookie it is only necessary to specify a name-value pair. The domain will be set
automatically and the path will default to the path of the document that set the cookie. A cookie set without an expiry date will
not be written to the cookie file as it cannot persist beyond the current session.
Cookie values may not include semicolons, commas, or whitespace. For
this reason, you may want to use the JavaScript escape()
function to encode the value before storing it in the cookie. If you do this you’ll have
to use the corresponding unescape() function when you
read the cookie value.
escape() creates and returns a new string
that contains an encoded version of the string. The string is encoded as follows: all
spaces, most punctuation, accented characters, and any other characters that are not ASCII letters or
numbers are converted to the form %xx, where xx is the two hexadecimal digits that
represent the ISO-8859-1 (Latin-1) encoding of the character. For example, the ! character has the Latin-1 encoding of 33, which is 21 hexadecimal,
so escape() replaces this character with the
sequence %21. Thus the expression:
escape("Hello World!");
yields the string:
Hello%20World%21
while
unescape("Hello%20World%21");
yields the string:
Hello World!
The purpose of the escape() encoding is to ensure
that the string is portable to all computers and transmittable across all networks,
regardless of the character encodings the computer or networks support (as long as they
support ASCII).
The encoding performed by escape() is like the URL
encoding used to encode query strings and other portions of a URL that might include
spaces, punctuation, or characters outside the standard ASCII character set.
The only real difference is that in URL encoding of query strings, spaces are
replaced with a '+' character, while escape() replaces spaces with the %20
sequence.
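A small round trip may make the encoding concrete. The value below is invented for illustration. The sketch also shows encodeURIComponent(), which is not mentioned in the text above; it is included only as a commonly used alternative that encodes characters as UTF-8 bytes.

```javascript
// A value containing characters that are not allowed in a cookie.
const raw = "size=large; color=red, blue";

// escape()/unescape() as described above (legacy functions, still available).
const legacy = escape(raw);
// encodeURIComponent()/decodeURIComponent() are an alternative pair.
const modern = encodeURIComponent(raw);

console.log(legacy);
console.log(modern);
// Both encodings can be reversed to recover the original value.
console.log(unescape(legacy) === raw, decodeURIComponent(modern) === raw);
```

Whichever pair is used, the same pair has to be used for both writing and reading the cookie.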
Here is the syntax used to set a cookie using JavaScript:
document.cookie="NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN; secure";
and from the server:
Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN; secure
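The syntax can be assembled by a small helper, sketched below. The function name buildCookie and its options object are hypothetical. The date is formatted with toUTCString(), which produces a GMT date string that browsers accept, although its layout differs slightly from the Wdy, DD-Mon-YY form.

```javascript
// Hypothetical helper: builds the string assigned to document.cookie
// (or sent after "Set-Cookie: ") from its parts.
function buildCookie(name, value, options = {}) {
  const parts = [name + "=" + escape(value)];
  if (options.expires) parts.push("expires=" + options.expires.toUTCString());
  if (options.path) parts.push("path=" + options.path);
  if (options.domain) parts.push("domain=" + options.domain);
  if (options.secure) parts.push("secure");
  return parts.join("; ");
}

const cookieString = buildCookie("userName", "Frank Peter", {
  expires: new Date(Date.UTC(2030, 0, 1)),
  path: "/",
});
// In a browser this would be followed by: document.cookie = cookieString;
console.log(cookieString);
```

Only the name-value pair is mandatory, so every other attribute is added only when the caller supplies it.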
Optional Attributes for Set-Cookies
| NAME |
DESCRIPTION |
| NAME=VALUE |
Both name and value can be any
strings that do not contain a semicolon, comma, or whitespace. Encoding such as URL
encoding may be used if these characters are required in the name
or value, as long as your script is prepared to handle it. |
| domain=DOMAIN |
This attribute specifies a domain name range for which the cookie will be
returned. The domain-name must contain at least two dots
(.), e.g., ".microsoft.com". This value would cover both "www.microsoft.com"
and "msdn.microsoft.com", and any other server in the microsoft.com
domain.
When searching the cookie list for valid cookies, a comparison of the domain
attributes of the cookie is made with the Internet domain name of the host from
which the URL will be fetched. If there is a tail match, then the cookie will go
through path matching to see if it should be sent. "Tail matching" means
that the domain attribute is matched against the tail of the fully qualified domain name
of the host.
Only hosts within the specified domain can set a cookie for a domain, and domains
must have at least two (2) or three (3) periods in them to prevent domains of the
form: ".com", ".edu", and ".us". Any domain that falls
within one of the seven special top-level domains listed below requires only two
periods. Any other domain requires at least three. The seven special top-level
domains are: "COM", "EDU", "NET", "ORG",
"GOV", "MIL", and "INT".
The default value of domain is the host name of the server which generated the
cookie response. |
| expires=DATE |
Specifies the expiry date of a cookie. After this date the cookie will no longer
be stored by the client or sent to the server. DATE takes the form Wdy, DD-Mon-YY
HH:MM:SS GMT (dates are only stored in GMT). By default, the cookie expires at the
end of the browser session. |
| path=PATH |
The path attribute is used to specify the subset of URLs in a domain for which the
cookie is valid. If a cookie has already passed domain matching, then the pathname
component of the URL is compared with the path attribute, and if there is a match,
the cookie is considered valid and is sent along with the URL request. The path
"/foo" would match "/foobar" and "/foo/bar.html". The
path "/" is the most general path. If the path is not specified, it is
assumed to be the same path as the document being described by the header which
contains the cookie.
NOTE: The more specific the path, the earlier the cookie is listed when several
cookies from the cookies.txt file match. All matching cookies
from that domain will still be sent in the HTTP header. |
| secure |
If a cookie is marked secure, it will only be transmitted if the communications
channel with the host is a secure one. Currently this means that secure cookies will
only be sent to HTTPS (HTTP over SSL) servers. If
secure is not specified, a cookie is considered safe to be sent in the clear over
unsecured channels.
So mark it as secure if you are, for instance, running a JavaScript shopping cart
with SSL. |
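The domain and path rules in the table can be sketched as two simple string tests. This is a sketch of the rules as the table states them, with hypothetical function names, and not a description of what any particular browser implements.

```javascript
// Tail match: the cookie's domain must equal the end of the host name.
function domainMatches(cookieDomain, host) {
  return host.toLowerCase().endsWith(cookieDomain.toLowerCase());
}

// Path match: the cookie's path must be a prefix of the requested path.
function pathMatches(cookiePath, requestPath) {
  return requestPath.startsWith(cookiePath);
}

// A cookie is sent only if it passes domain matching and then path matching.
function shouldSend(cookie, host, requestPath) {
  return domainMatches(cookie.domain, host) && pathMatches(cookie.path, requestPath);
}

const example = { domain: ".microsoft.com", path: "/foo" };
console.log(shouldSend(example, "msdn.microsoft.com", "/foo/bar.html"));
```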
By comparison, the Cookie field in a request header contains only a set of NAME=VALUE pairs for the requested URL:
Cookie: name1=VALUE1; name2=VALUE2 …
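On the receiving side, a script has to split that field back into individual pairs. A minimal sketch, assuming the values were encoded with escape() and using a hypothetical function name:

```javascript
// Turns the value of a Cookie request field into an object of name/value pairs.
function parseCookieHeader(headerValue) {
  const cookies = {};
  for (const pair of headerValue.split(";")) {
    const index = pair.indexOf("=");
    if (index === -1) continue; // skip empty or malformed pairs
    const name = pair.slice(0, index).trim();
    cookies[name] = unescape(pair.slice(index + 1).trim());
  }
  return cookies;
}

const received = parseCookieHeader("headCount=5; userName=Frank%20Peter");
console.log(received.userName);
```

The same function works on the string returned by document.cookie in a browser, since it uses the same NAME=VALUE layout.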
Multiple Set-Cookie fields can be sent in a single response header from the server.
Note: a cookie that has the same path and name as an existing
cookie will overwrite the old one. This can be used as a way of erasing cookies:
write a new one with an expiry date that has already passed.
For a cookie to persist beyond the current session a valid expiry date must be set. This
is a date later than the current date and time. The best way to
set an expiry date is to take the current date value, add a set time period and convert to
GMT (remember we're on a global network). Future cookie standards may allow setting a
duration rather than a fixed date.
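The date arithmetic can be sketched as below. The helper name expiryDate and the cookie name counter are chosen only for illustration.

```javascript
// Returns a Date the given number of days after `now` (negative for the past).
function expiryDate(daysFromNow, now = new Date()) {
  const msPerDay = 24 * 60 * 60 * 1000;
  return new Date(now.getTime() + daysFromNow * msPerDay);
}

// Persistent cookie: expires 30 days from now, expressed in GMT.
const keep = "counter=3; expires=" + expiryDate(30).toUTCString();

// Erasing: same name, with an expiry date that has already passed.
const erase = "counter=; expires=" + expiryDate(-1).toUTCString();

console.log(keep);
console.log(erase);
```

Working in milliseconds since the epoch avoids any time zone handling; toUTCString() does the conversion to GMT at the end.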
The cookie(s) that you set or accept are only accessible at pages with a matching domain
name and a matching path. Also, the cookies must not have reached or passed their expiry date.
When these criteria are met the cookies become available to JavaScript via the
document.cookie property.
Where are the Cookies stored?
Where does MSIE keep its cookies?
Microsoft keeps its cookies in different locations. You will find your cookies in the
folder C:\windows\cookies in Windows 9X and C:\WinNT\profiles\username\cookies in Windows NT.
Each individual domain's cookies are stored in their own file, named with the username
that accessed the site. For example, if I went to Yahoo, I would get a cookie that is stored
in the file frank@yahoo.txt.
Note that the username is not sent with the cookie.
Where does Netscape keep its cookies?
You will find your cookies file, cookies.txt, in the folder C:\Program Files\Netscape\Users\YourName.
Controlling Cookies within your Browsers
If You Want to Control Which Cookies You Accept:
You can order your browser to accept all cookies or to alert you every time a cookie is
offered. Then you can decide whether to accept one or not.
If you're using Internet Explorer 4.0 and above:
1. Choose View, then
2. Internet Options.
3. Click the Advanced tab,
4. Scroll down to the yellow exclamation icon under Security and choose one of the
three options to regulate your use of cookies.
If you're using Firefox:
On your menu bar, click:
1. Edit, then
2. Preferences, then
3. click on Privacy.
4. Set your options in the box labeled "Cookies".
How to See Cookies You've Accepted:
If you're using Internet Explorer 4.0 and above:
On your menu bar, click:
1. View, then
2. Internet Options.
3. Under the tab General (the default tab) click
4. Settings, then
5. View Files.
Stopping Cookies
The options to allow all or deny all cookies are relatively clear.
The option to warn before accepting cookies is useful when you are developing a site that
uses cookies but becomes annoying when you are browsing the internet. Some servers are able
to use cookies to gather information about visitor behavior. When these are incorrectly
configured a single page can set a cookie for every graphic.
The intermediate option is to block cookies that do not originate from the current
domain. This means that if you are at http://www.foo.com/ and a server at http://www.bar.com/
tries to set a cookie through a banner graphic on the page, that cookie will not be
accepted.
Another popular method is to replace your cookies.txt file with a folder of the same name.
This prevents any cookies from being accepted.
HTTP and how it works
When a user requests a page, an HTTP request is sent to the server. The request includes
a header that defines several pieces of information, including the page being requested.
The server returns an HTTP response that also includes a header. The header contains
information about the document being returned, including its MIME type (such as text/html
for a standard HTML page or image/gif for a GIF file).
Cookies and HTTP Headers
Cookie information is shared between the client browser and a server using fields in the
HTTP headers.
When the user requests a page for the first time, a cookie (or more than one cookie) can
be stored in the browser by a Set-Cookie entry in the header of the response from the
server. The Set-Cookie field includes the information to be stored in the cookie along with
several optional pieces of information, including an expiry date, path, and server
information, and whether the cookie requires a secure connection.
Then, when the user requests a page in the future, if a matching cookie is found among
all the stored cookies, the browser sends a Cookie field to the server in the request header.
The header will contain the information stored in that cookie.
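The browser's half of this exchange can be modelled with a toy "cookie jar". The class and method names below are hypothetical, and the model deliberately ignores the domain, path, and expiry checks a real browser performs before returning a cookie.

```javascript
// Toy model of the browser side: remember what Set-Cookie fields delivered,
// then hand it back in the Cookie field of later requests.
class CookieJar {
  constructor() {
    this.cookies = new Map(); // name -> value, in the order first received
  }

  // Called with the value of each Set-Cookie field in a response.
  store(setCookieValue) {
    const nameValue = setCookieValue.split(";")[0];
    const index = nameValue.indexOf("=");
    this.cookies.set(nameValue.slice(0, index).trim(), nameValue.slice(index + 1).trim());
  }

  // Builds the value of the Cookie field for the next request.
  requestHeader() {
    return [...this.cookies].map(([name, value]) => name + "=" + value).join("; ");
  }
}

const jar = new CookieJar();
jar.store("headCount=5; path=/");
jar.store("userName=Frank%20Peter; path=/");
console.log("Cookie: " + jar.requestHeader());
```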
Cookies and CGI scripts
In order for cookies to be useful, it is necessary for the server to be able to take
advantage of the cookie information it receives and for the server to be able to generate
cookie headers if they are needed. This is done primarily by CGI scripts.
For instance, if you want to provide a custom search tool that would search WWW indices
selected by the user, you would need to develop a system that follows this basic pattern:
- User calls the site using a URL that requests a CGI script.
- The script checks whether it is the user’s first time at the site by checking
whether there is a cookie field in the HTTP request header.
- If there is no cookie, the script sends back a new search page with all choices
unselected and an empty search field.
- If there is a Cookie field, the script interprets the cookie and returns a page with
all the user’s previous choices selected.
- When the user conducts a search, the script returns the search results along with a
Set-Cookie field in the header to reset the cookie to the newly selected values that the
user used for the search.
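The pattern above could be sketched roughly as follows. Everything here is an assumption made for illustration: the function name, the cookie name indices, the shape of the request and response objects, and the placeholder index names. A real CGI script would read the Cookie field from its environment and print the headers itself.

```javascript
// Hypothetical handler. `requestHeaders` holds the request header fields;
// `searchChoices` is the list of indices picked in a submitted search, or null.
function handleSearchRequest(requestHeaders, searchChoices) {
  const cookieField = requestHeaders["Cookie"] || "";
  const match = /(?:^|;\s*)indices=([^;]*)/.exec(cookieField);
  const previous = match && match[1] ? unescape(match[1]).split(",") : [];

  if (searchChoices) {
    // A search was run: return results and reset the cookie to the new choices.
    return {
      headers: { "Set-Cookie": "indices=" + escape(searchChoices.join(",")) },
      body: "Results from: " + searchChoices.join(", "),
    };
  }
  // No search yet: pre-select earlier choices, or none on a first visit.
  return {
    headers: {},
    body: "Search page, selected: " + (previous.join(", ") || "none"),
  };
}

const firstVisit = handleSearchRequest({}, null);
const afterSearch = handleSearchRequest({}, ["indexA", "indexB"]);
const returnVisit = handleSearchRequest({ Cookie: afterSearch.headers["Set-Cookie"] }, null);
console.log(firstVisit.body);
console.log(returnVisit.body);
```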
Implementing this type of server-side processing for cookies may significantly
increase the load on a Web server. With this model, most pages are being built
dynamically based on receiving cookie information in the header.
This is in contrast to typical Web pages, which are static, and all the server needs to
do is send the current file to the client without any additional processing.