HTTP: Doing What Your Browser Does For You by Nekogaimasu on 07/09/02

Before we start:

This tutorial assumes you have knowledge and experience using telnet or something that can do the same job - like netcat. Knowing a bit of HTML would come in handy too.

Author's Disclaimer:

This tutorial is provided free as a voluntary service of the author. No responsibility is held, nor is any claim made, for the accuracy, safety or any other aspect of this tutorial's content.


OK, I wanted to write a tutorial on HTTP, and you want to learn it for whatever reason: to make a web browser or just to learn more about the Web. Seeing as I don't want to write a big introduction, I'm going to get right into it.

HTTP is a service that usually runs on port 80 (though you will also find it on 81, 8080, 8000, 1080 or whatever). You can see this if you connect to microsoft.com on port 80 and give it a few carriage returns:

HTTP/1.1 400 Bad Request
Server: Microsoft-IIS/5.0
Date: Mon, 07 Jan 2002 02:17:22 GMT
Content-Type: text/html
Content-Length: 87

<html><head><title>Error</title></head><body>The parameter is incorrect. </body></html>

As you can see, many web servers (or daemons - pronounced just like demon) give an excessive amount of information out to anyone (some, like yahoo's daemon, don't give any at all). Let's take a moment to examine this dump. The first phrase is "HTTP/1.1", which you'd probably correctly guess means that this server is using version 1.1 of HTTP. The second part is a status code with a friendly translation afterwards. These status codes are standard three-digit numbers. Here's what the HTTP 1.1 RFC (no. 2616) says:

The Status-Code element is a 3-digit integer result code of the attempt to understand and satisfy the request. These codes are fully defined in section 10. The Reason-Phrase is intended to give a short textual description of the Status-Code. The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase.

The first digit of the Status-Code defines the class of response. The last two digits do not have any categorization role. There are 5 values for the first digit:

- 1xx: Informational - Request received, continuing process
- 2xx: Success - The action was successfully received, understood, and accepted [yippee!]
- 3xx: Redirection - Further action must be taken in order to complete the request
- 4xx: Client Error - The request contains bad syntax or cannot be fulfilled
- 5xx: Server Error - The server failed to fulfil an apparently valid request

(There is a detailed list of these codes in the Appendix.) Looking at this, we can see that code 400 is a client error. This means that microsoft.com's daemon is blaming us for the fact that it can't make head or tail of our HTTP request. We can find out the name of this blaming daemon in the next line: Server: Microsoft-IIS/5.0. Well, you would expect microsoft.com to use the latest version of IIS, the Microsoft web daemon, now, wouldn't you? Putting this kind of information out in response to a connection from anywhere is often seen as a security risk by security experts, and as an excellent opportunity by lame "hackers" who'll just look up these details on Bugtraq, download an exploit that looks nice and do lots of nasty things to the server.

Enough of that. Let's skip the next line (the current date and time, in GMT in this case) and go on to the last two header lines. The Content-Type tells us that what we got in return was HTML (web page text), and the Content-Length tells us that the HTML (the bit the user would see in a common browser) takes up 87 bytes. If you count the characters, you'll find that Mr. I.I.S. Daemon is perfectly correct.
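The same reading of the dump can be done by a program. Below is a minimal Python sketch, not a full HTTP parser: it assumes the whole response is already in memory, that header lines are not folded across lines, and it uses a made-up sample response (the body and its length are hypothetical, chosen only so the check is easy to follow). The function name parse_response is also just an illustrative choice.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (version, code, reason, headers, body)."""
    # The header block ends at the first blank line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: HTTP-version, 3-digit status code, reason phrase.
    version, _, rest = lines[0].partition(" ")
    code, _, reason = rest.partition(" ")

    # Header names are case-insensitive, so store them in lower case.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


# Hypothetical response shaped like the dump above, with a made-up body.
SAMPLE = (
    b"HTTP/1.1 400 Bad Request\r\n"
    b"Server: Microsoft-IIS/5.0\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 11\r\n"
    b"\r\n"
    b"Bad request"
)

version, code, reason, headers, body = parse_response(SAMPLE)
print(version, code, reason)
print("server:", headers["server"])
print("length matches:", int(headers["content-length"]) == len(body))
```

Comparing Content-Length with the number of body bytes is the same check as counting the characters by hand.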

OK, so now you know how to get an error message from a web server. Very good, but you probably want to get a full-blooded web page. The magic word for this is in fact GET. GET with a slash (/) after it, that is (you'll see why soon).

GET /

Hotmail - The World's FREE Web-Based Email

[MSN Hotmail Logo]

Hotmail redirect in progress. Please wait...

(Hotmail.com hasn't given us a personal life story here - that's to do with the GET /, as you'll see soon.) Using GET / on hotmail.com gets the URL http://hotmail.com/. The URL http://google.com/jobs/benefits.html would be requested by connecting to google.com and typing GET /jobs/benefits.html. See:

GET /jobs/benefits.html

Google Job Benefits

...
Return to Google homepage.

Cool Jobs at Google

You should see how it works by now: to see what should go after the GET, just drop the http:// and the domain name. There are more methods (as they are called) than GET - just remember that they are case-sensitive (i.e. use GET, not get or Get).
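The "drop the http:// and the domain name" rule can be sketched in a few lines of Python using the standard urllib.parse module. This is only an illustration under some assumptions: it handles plain http:// URLs, falls back to port 80 and to / when the URL leaves them out, and the function name request_target is made up for this example.

```python
from urllib.parse import urlsplit


def request_target(url: str):
    """Return (host, port, target) for a plain http:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("only http:// URLs are handled in this sketch")
    host = parts.hostname
    port = parts.port or 80       # no port in the URL means the usual port 80
    target = parts.path or "/"    # no path in the URL means the root, /
    if parts.query:
        target += "?" + parts.query
    return host, port, target


for url in ("http://hotmail.com/", "http://google.com/jobs/benefits.html"):
    host, port, target = request_target(url)
    # Connect to host:port, then type the line printed after the arrow.
    print(f"{host}:{port} -> GET {target}")
```

The host and port say where to connect, and the target is what goes after GET.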

GET - as already covered.

HEAD - like GET, but asks the daemon to return only the headers for a URL. This is little known (but useful) and I've seen many servers not implement it properly (e.g. yahoo.com).

PUT - gives the server a file to store at the URL you name.

POST - gives the server some information (it's up to the server what it does with it; sometimes POST is just like PUT).

DELETE - a request that the server delete a resource (now, now, a server's not going to delete something without a reason, so don't get any ideas). Even a success response doesn't guarantee that the file is really gone.

TRACE - asks the server to echo your request back to you, which is handy for seeing what any proxies in between did to it.

OPTIONS - asks which methods and options the server supports. I've never got this to work.

CONNECT - reserved for use with proxies that can switch to being a tunnel (e.g. for SSL).

Not all of these work on all daemons unless you give the version of HTTP you are using (1.0 or 1.1, depending on how you feel). To do this, add HTTP/1.x to the end of your request line (for instance, GET / HTTP/1.1), but now you have to press Enter twice.

HEAD / HTTP/1.0

HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI TELo OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
Content-Location: http://tkmsftwbw08/default.htm
Date: Sun, 06 Jan 2002 07:44:33 GMT
Content-Type: text/html
Accept-Ranges: bytes
Last-Modified: Fri, 04 Jan 2002 01:52:58 GMT
ETag: "ee33ae88c294c11:865"
Content-Length: 26597

(See how useful HEAD is? You can find out all about the server and file without having 26597 bytes streaming past your client.)
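For anyone who would rather script the HEAD exchange above than type it into telnet, here is a minimal Python sketch using the standard socket module. It is an illustration under stated assumptions: build_request and fetch_head are made-up names, the request mirrors the bare HEAD / HTTP/1.0 shown above (no extra headers unless you pass some), and the reading loop relies on the server closing the connection when it is done, which is the usual HTTP/1.0 behaviour.

```python
import socket


def build_request(method, target, version="1.0", headers=None):
    """Build request bytes: request line, optional headers, then a blank line."""
    lines = [f"{method} {target} HTTP/{version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # The trailing CRLF CRLF is the "press Enter twice" part.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch_head(host, port=80, timeout=10.0):
    """Send HEAD / to host and return the raw response text (needs network)."""
    request = build_request("HEAD", "/")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


# Building the request needs no network, so it is safe to show here.
print(build_request("HEAD", "/"))
```

Calling fetch_head("microsoft.com") would open a real connection, so it is left out of the demonstration; what comes back today will not match the 2002 dump.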

(Notice that using HTTP 1.0 or more means that the server gives you header stuff.) Why do you have to press Enter twice rather than once? Because by specifying 1.0 or more as your HTTP version number, you are now using HTTP v1.0 or more (duh...). With HTTP 1.x, you can give your own headers after the request line. The server keeps accepting stuff from you until it gets a double carriage return (a blank line, in other words). With these headers you'd give extra information, like the language you'd prefer to have your web page in. I once set up my own dummy web daemon (with netcat, if you're interested) and got three different browsers (Netscape, Internet Explorer and Lynx) to connect to it and get the main page. You can see what kind of information the browsers give away gratis:

Internet Explorer:
GET / HTTP/1.1
Accept: application/msword, image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; TBP_7.0_GS)
Host: ###.###.###.###
Connection: Keep-Alive

Netscape:
GET / HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.08 [en] (Win98; I ;Nav)
Host: ###.###.###.###
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8

Lynx:
GET / HTTP/1.0
Accept: text/html, text/plain, text/sgml, */*;q=0.01
Accept-Encoding: gzip, compress
Accept-Language: en
Negotiate: trans
User-Agent: Lynx/2.8.1rel.2 libwww-FM/2.14
Via: 1.0 ###.###.###.###:80 (Squid/2.4.STABLE3)
X-Forwarded-For: ###.###.###.###
Host: ###.###.###.###
Cache-Control: max-age=259200
Connection: keep-alive
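A dummy daemon that wanted to do more than dump these requests would have to pick them apart. Here is a minimal Python sketch of that step. It is not how the netcat setup worked (netcat just prints what arrives); parse_request is a made-up name, folded header lines are not handled, and the Host value 192.0.2.1 is a documentation placeholder standing in for the masked address.

```python
def parse_request(raw: str):
    """Split raw request text into (method, target, version, headers)."""
    # Everything up to the first blank line is the request head.
    head = raw.replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = head.split("\n")

    # Request line, e.g. "GET / HTTP/1.0".
    method, target, version = lines[0].split()

    # Header names are case-insensitive, so store them in lower case.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


# A shortened copy of the Netscape request, with a placeholder Host value.
NETSCAPE = (
    "GET / HTTP/1.0\r\n"
    "Connection: Keep-Alive\r\n"
    "User-Agent: Mozilla/4.08 [en] (Win98; I ;Nav)\r\n"
    "Host: 192.0.2.1\r\n"
    "Accept-Language: en\r\n"
    "\r\n"
)

method, target, version, headers = parse_request(NETSCAPE)
print(method, target, version)
for name in ("connection", "user-agent", "host"):
    print(f"{name}: {headers[name]}")
```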

That's quite a bit of information as well. The Lynx output has a little more because I used a copy on a remote network, which went through a proxy (hence the Via and X-Forwarded-For lines). (Why do Netscape and Internet Explorer feel the need to broadcast the name of people's operating systems?) The important headers here are Connection, User-Agent and Host. The value for Connection in all cases is keep-alive, because browser-makers know that making a new connection is a real rigmarole. Actually, with HTTP v1.1 the default is keep-alive, so unless the server issues a Connection: close in its response, you should assume that the connection will keep on going. Whenever a Connection: close appears in a message, from server or client, the connection will close after that message is complete. Now, here's what RFC 2616 has to say about User-Agent:

The User-Agent request-header field contains information about the user agent originating the request. This is for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations. User agents SHOULD include this field with requests.

SHOULD is RFC-speak for a recommendation that you should only ignore with good reason and after weighing up all the implications (it is defined in RFC 2119...). User-Agent usually gives its information in a big to small (most to least significant) format, and sometimes strange things turn up in it. Anyway, now we have the Host header. This just specifies the hostname, so if you connected to yahoo.com, you'd have Host: yahoo.com amongst the rest. RFC 2616 does not fail in its attempt at stressing the importance of Host:

A client MUST include a Host header field in all HTTP/1.1 request messages. If the requested URI does not include an Internet host name for the service being requested, then the Host header field MUST be given with an empty value. An HTTP/1.1 proxy MUST ensure that any request message it forwards does contain an appropriate Host header field that identifies the service being requested by the proxy. All Internet-based HTTP/1.1 servers MUST respond with a 400 (Bad Request) status code to any HTTP/1.1 request message which [sic] lacks a Host header field.

So don't be naughty (or a bad request) and forget your Host - unless of course you are using HTTP v1.0. Even with HTTP v1.0, however, you sometimes come across servers and situations where you're stuck with it. Consider this connection to www.tripod.lycos.com:

GET / HTTP/1.0

302 Found

Found

The document has moved here.

Hang on, I am connecting to www.tripod.lycos.com, so why the hell is this daemon telling me to go to www.tripod.lycos.com??? This is what I was muttering to myself when I tried this some time ago, until I suddenly thought of using Host. Adding Host: www.tripod.lycos.com to the request worked like a charm, but it's still a little weird.
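One plausible explanation (a guess, not something the tripod server confirmed) is that the machine serves several sites from one address and uses Host to decide which one you meant, bouncing you to a default name when it cannot tell. The Python sketch below illustrates that idea only. The function route_by_host, the sites table and the choice of a 302 for a missing Host on HTTP/1.0 are all assumptions made for the example; the 400 for a missing Host on HTTP/1.1 is the rule from the RFC passage quoted above.

```python
def route_by_host(request_text: str, sites: dict):
    """Return (status_code, site) for a request, choosing the site by Host."""
    head = request_text.replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = head.split("\n")
    version = lines[0].split()[-1]          # last word of the request line

    host = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional :port suffix (IPv6 literals are not handled).
            host = value.strip().lower().split(":")[0]

    if host is None:
        if version == "HTTP/1.1":
            return 400, None                # RFC 2616: 1.1 request without Host
        return 302, None                    # assumed: redirect to a default name
    if host in sites:
        return 200, sites[host]
    return 404, None


# Hypothetical table of sites living on one machine.
SITES = {"www.tripod.lycos.com": "tripod front page"}

print(route_by_host("GET / HTTP/1.0\r\n\r\n", SITES))
print(route_by_host("GET / HTTP/1.0\r\nHost: www.tripod.lycos.com\r\n\r\n", SITES))
print(route_by_host("GET / HTTP/1.1\r\n\r\n", SITES))
```

Under these assumptions the first request gets the puzzling redirect, and the same request with a Host line gets the page.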

This would have to be the end of this HTTP tutorial (hope you got something from it). Just a few final points though:

You can GET any kind of media (pictures, videos, sounds, etc.), not just HTML.

Look at the HTTP 1.1 RFC (2616), which I quoted heavily throughout this document, if you still have any questions, although I must warn you that an RFC makes terrible bedtime reading (I quoted the more gripping, on-the-edge-of-your-seat things...).

If you really liked this tutorial or have something else to say about it, then email me at neko@rootshell.be.

Copying this document

This document is released under the condition that it is only reproduced, partially or completely, in its original form and together with the name Nekogaimasu and this condition.

Appendix

Here's some useful information I promised regarding status codes, as taken from RFC 2616:

The individual values of the numeric status codes defined for HTTP/1.1, and an example set of corresponding Reason-Phrase's, are presented below. The reason phrases listed here are only recommendations -- they MAY be replaced by local equivalents without affecting the protocol.

Status-Code =
[Protocol info]
"100" ; Section 10.1.1: Continue
| "101" ; Section 10.1.2: Switching Protocols
[OK]
| "200" ; Section 10.2.1: OK
| "201" ; Section 10.2.2: Created
| "202" ; Section 10.2.3: Accepted
| "203" ; Section 10.2.4: Non-Authoritative Information
| "204" ; Section 10.2.5: No Content
| "205" ; Section 10.2.6: Reset Content
| "206" ; Section 10.2.7: Partial Content
[Redirection]
| "300" ; Section 10.3.1: Multiple Choices
| "301" ; Section 10.3.2: Moved Permanently
| "302" ; Section 10.3.3: Found
| "303" ; Section 10.3.4: See Other
| "304" ; Section 10.3.5: Not Modified
| "305" ; Section 10.3.6: Use Proxy
| "307" ; Section 10.3.8: Temporary Redirect
[Your fault]
| "400" ; Section 10.4.1: Bad Request
| "401" ; Section 10.4.2: Unauthorized
| "402" ; Section 10.4.3: Payment Required
| "403" ; Section 10.4.4: Forbidden
| "404" ; Section 10.4.5: Not Found
| "405" ; Section 10.4.6: Method Not Allowed
| "406" ; Section 10.4.7: Not Acceptable
| "407" ; Section 10.4.8: Proxy Authentication Required
| "408" ; Section 10.4.9: Request Time-out
| "409" ; Section 10.4.10: Conflict
| "410" ; Section 10.4.11: Gone
| "411" ; Section 10.4.12: Length Required
| "412" ; Section 10.4.13: Precondition Failed
| "413" ; Section 10.4.14: Request Entity Too Large
| "414" ; Section 10.4.15: Request-URI Too Large
| "415" ; Section 10.4.16: Unsupported Media Type
| "416" ; Section 10.4.17: Requested range not satisfiable
| "417" ; Section 10.4.18: Expectation Failed
[My fault]
| "500" ; Section 10.5.1: Internal Server Error
| "501" ; Section 10.5.2: Not Implemented
| "502" ; Section 10.5.3: Bad Gateway
| "503" ; Section 10.5.4: Service Unavailable
| "504" ; Section 10.5.5: Gateway Time-out
| "505" ; Section 10.5.6: HTTP Version not supported
| extension-code

extension-code = 3DIGIT
Reason-Phrase = *<TEXT, excluding CR, LF>

HTTP status codes are extensible. HTTP applications are not required to understand the meaning of all registered status codes, though such understanding is obviously desirable. However, applications MUST understand the class of any status code, as indicated by the first digit, and treat any unrecognized response as being equivalent to the x00 status code of that class, with the exception that an unrecognized response MUST NOT be cached. For example, if an unrecognized status code of 431 is received by the client, it can safely assume that there was something wrong with its request and treat the response as if it had received a 400 status code. In such cases, user agents SHOULD present to the user the entity returned with the response, since that entity is likely to include human-readable information which will explain the unusual status.
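The fallback rule in that last paragraph is easy to express in code. The Python sketch below is one way to do it, assuming the set of "understood" codes is exactly the list in this appendix; the names status_class and effective_code are made up for the example.

```python
# Class names keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

# The codes listed in the appendix above (306 is not among them).
KNOWN_CODES = (
    {100, 101, 307}
    | set(range(200, 207))
    | set(range(300, 306))
    | set(range(400, 418))
    | set(range(500, 506))
)


def status_class(code: int) -> str:
    """Name the class of a status code from its first digit."""
    if code // 100 not in STATUS_CLASSES:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


def effective_code(code: int) -> int:
    """Treat an unrecognized code as the x00 code of its class."""
    if code in KNOWN_CODES:
        return code
    status_class(code)            # raises if the first digit is not 1-5
    return (code // 100) * 100


for code in (404, 431, 599):
    print(code, status_class(code), effective_code(code))
```

The RFC's own example is covered: an unrecognized 431 is handled as if it were a 400.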

All images, content & text (unless other ownership applies) are © copyrighted 2000 -  , Infosecwriters.com. All rights reserved. Comments are property of the respective posters.