INTRODUCTION TO INTERNET
The Internet is a worldwide network that is widely used to connect universities, government offices, companies and private individuals. For a machine to be on the Internet means that it runs the TCP/IP protocol stack, has an IP address, and can send IP packets to all the other machines on the Internet. A private individual with a personal computer can call up an Internet service provider using a modem, be assigned a temporary IP address, and send IP packets to other Internet hosts.
An internet consists of a set of connected networks that act as an integrated whole. The Internet provides universal interconnection while allowing individual groups to use whichever network hardware is best suited to their needs. As a network of networks, it allows communication to take place among research institutions, individuals, and all ‘Internet Citizens’.
As a complex system of interlinked networks, the Internet supports millions of ‘servers’, computers housing large volumes of all sorts of information. The Internet is where millions of friends and strangers can chat. It lets people browse through thousands of online libraries, play new games, and trade software. Another feature of the Internet is that it has no geographic bounds. Users log on from India to the US, from India to Australia, and so on.
The WWW, an Internet environment, was a breakthrough that came into wide public use from 1993. It is a software scheme for imposing order over the mass of free-form information on the Internet by organizing it into easily understood pages. Hyperlinking is a software technique that has made the web a powerful cyber helper. When composing a web page, an author can create hyperlinks: words that appear in bold type and indicate a path to some other information. Using a program known as a web browser on a personal computer or workstation, one can read pages stored on any web computer.
Many kinds of information are now available on World Wide Web servers. The WWW offers hypertext technology that links together a ‘web’ of documents so that these can be navigated in any number of ways with the use of sophisticated Internet-specific graphical user interface (GUI) software (for example, Mosaic, Netscape, Explorer). The WWW may also provide hypermedia, in which the contents may include graphics, video, voice, and/or music apart from text. Web server software now also allows the delivery of live, real-time audio and video.
Magazines, specs, documents, advertisements, technical information, etc., of interest to business people are available on servers, generating considerable corporate traffic as knowledge workers access them via the enterprise network. Specifically, the WWW is a set of public specifications and a library of code for building information servers and clients. The WWW is ideal for supporting cooperative work in complex research fields. It uses Internet-based architectures employing public and open specifications, along with free sample implementations on the client and server end, so that anyone can build a client or a server.
The three key components of WWW are:
- URL (Uniform Resource Locator)
- HTTP (Hyper Text Transfer Protocol)
- HTML (Hyper Text Markup Language)
A URL is the address of the document, which is to be retrieved from a network server. It contains the identification of the protocol, the server, and the filename of the document. When the user clicks on a link in a document, the link icon in the document contains the URL, which the client employs to initiate the session with the intended server.
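As a rough illustration of the three parts of a URL named above, the following Python sketch splits a URL using the standard library module urllib.parse. The helper name url_parts and the sample URL are assumptions made for this example only.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Return (protocol, server, filename) for a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path

# Hypothetical URL, used only for illustration.
protocol, server, filename = url_parts("http://www.example.com/docs/index.html")
print(protocol)  # identification of the protocol
print(server)    # the server that holds the document
print(filename)  # the filename of the document on that server
```

A browser performs a similar split before it opens a session with the server named in the link.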
HTTP is the protocol used in support of the information transfer. It is a fixed set of messages and replies that both the server and the client understand. The document itself, which is returned using HTTP upon the issuance of a URL, is coded in HTML. The browser interprets the HTML to identify the various elements of the document and render it on the screen of the client.
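The idea of a fixed set of messages and replies can be sketched in Python as follows. The sketch only builds the text of a minimal GET request and splits the first line of a reply; it does not open a network connection. The function names and the host name are assumptions chosen for illustration.

```python
def build_get_request(host, path):
    """Build the text of a minimal HTTP/1.0 GET request (a client message)."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"

def parse_status_line(line):
    """Split the first line of a server reply into (version, code, reason)."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason

# Hypothetical host and document, used only for illustration.
request = build_get_request("www.example.com", "/index.html")
reply = parse_status_line("HTTP/1.0 200 OK\r\n")
```

In a real exchange the reply's status line would be followed by header lines, a blank line, and the HTML document itself.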
HTML specifications are reasonable for 2-dimensional page layout, but not necessarily for truly interactive browsing. Newly emerging languages such as Virtual Reality Modeling Language (VRML) and Java are designed to enhance the web browsing experience. VRML offers a method of describing 3-dimensional space so that the user can navigate in 3-D. Java is an object-oriented programming language that adds animation and real-time interaction through inline applications.
Activities running on Web servers that are CPU- and communications-intensive include applications generating real-time graphics using charts and colors to show trends in the stock market, voter returns, Geographic Information Systems (GISs), weather maps, database statistics, and analysis related to e-commerce.
Some of the Web server software is based on the concept of streaming media, which delivers audio and video on demand, rather than requiring a user to download a file from the web and play it back from the local server or hard drive.
The Internet provides connectivity for a wide range of application processes called network services. One can exchange electronic mail, access and participate in discussion forums, search databases, browse indexes, transfer files, and so forth. TCP and IP were developed for basic control of information delivery across the Internet. Application layer protocols, such as TELNET, FTP, SMTP, and HTTP, have been added to the TCP/IP suite of protocols to provide specific network services.
Hardware components of the Internet include routers, PCs, workstations, and servers. Software vendors include Netscape, Spyglass, Spry, NetManage, Microsoft, and PC-NFS. Carriers include providers of dial-up access and providers of dedicated access (for example, interexchange carriers).
E-mail is the most familiar and widely used network service. It is a system for sending messages or files to other computer users based on mailbox addresses rather than a direct host-to-host exchange, and it supports mail exchange between users on the same or different computers. Unlike other client-server applications, e-mail allows users to send anything from short notes to extensive files without worrying about the current availability of the receiving host. Some e-mail uses are as follows:
- Send a single message to one or many recipients.
- Send messages that include text, voice, video, or graphics.
- Send messages to users on networks outside the Internet.
- Send messages calling for a response from a computer program rather than a user.
E-mail design is similar to the postal system. Addresses are used to identify both the recipient and sender of a message (return address). Messages that cannot be delivered within a specified amount of time are returned to the sender. Every user on the network has a private mailbox. Received mail is stored in the mailbox until the recipient removes or discards it.
Electronic mail differs from other message transfer services provided by the Internet in that it provides a mechanism called spooling, which allows a user to send mail even if the network is currently disconnected or the receiving machine is not operational. When a message is sent, a copy is placed in a storage facility called a spool. A spool resembles a queue: messages are processed on a first-come, first-served basis. A client process running in the background searches the spool every 30 seconds. It looks for new messages and not-yet-sent old messages and attempts delivery. If the client process is unable to deliver a message, it marks the message with the time of the attempted delivery, leaves it in the spool, and repeats the attempt at a later time. If all attempts at delivery fail, the message may be deemed undeliverable after several days and returned to the mailbox of the sender. A message is considered delivered only when both client and server agree that the recipient has seen and disposed of it. Until then, copies are kept in both the sending spool and the receiving mailbox.
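The spooling behaviour described above can be sketched as a small Python class. This is a simplified model, not the code of any real mail system: the class name Spool, the deliver callback, and the use of an attempt limit (in place of the several-day time limit in the text) are assumptions made to keep the example short.

```python
from collections import deque

class Spool:
    """A toy mail spool: a first-come, first-served queue with retries."""

    def __init__(self, max_attempts=3):
        self.queue = deque()              # messages waiting for delivery
        self.returned = []                # messages given up on, back to sender
        self.max_attempts = max_attempts  # assumed limit, for illustration only

    def submit(self, message):
        """Place a copy of an outgoing message in the spool."""
        self.queue.append({"message": message, "attempts": 0, "last_try": None})

    def scan(self, deliver, now):
        """One background pass: try each spooled message once, in order."""
        delivered = []
        for _ in range(len(self.queue)):
            entry = self.queue.popleft()
            if deliver(entry["message"]):
                delivered.append(entry["message"])
                continue
            entry["attempts"] += 1
            entry["last_try"] = now       # mark the time of the failed attempt
            if entry["attempts"] >= self.max_attempts:
                self.returned.append(entry["message"])  # deemed undeliverable
            else:
                self.queue.append(entry)  # stays in the spool for a later pass
        return delivered
```

A background process would call scan at regular intervals (every 30 seconds in the description above), passing a function that attempts the actual delivery.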
An e-mail system consists of two subsystems:
User Agent (UA): A user agent provides the user interface, allowing people to read and send e-mail. It retrieves incoming messages, mails outgoing messages, and helps to compose messages by providing command-based, menu-based or graphical methods of interaction.
Message Transfer Agent (MTA): An MTA moves messages from the source to the destination. It serves as the mail system’s interface with the network and runs in the background to move e-mail through the system.
Architecture & Services
Fig.(a). Two Directional Electronic Mail
An e-mail system supports five basic functions or services (items 1 to 5 below), along with several more advanced ones (items 6 to 9):
1. Composition: Process of creating messages and answers. The system provides assistance with addressing and the numerous header fields attached to each message. For example, when answering a message, the e-mail system can extract the originator’s address from the incoming e-mail and automatically insert it in to the proper place in the reply.
2. Transfer: This refers to moving messages from the originator to the recipient. This requires establishing a connection to the destination or some intermediate machine, outputting the message, and releasing the connection.
3. Reporting: The originator is informed about what happened to the message – delivered, rejected or lost.
4. Displaying: The incoming message is displayed to the user so that it can be read. Necessary conversions, such as formatting or conversion of digitized voice, are done if required.
5. Disposition: This is concerned with what the recipient does with the message after receiving it. A message can be saved or deleted after reading, or may be forwarded to some other person.
6. Forwarding: This is an advanced feature in which e-mail may be automatically forwarded to another user if the recipient is away for some period of time.
7. Mailbox creation: A user can create a mailbox to store incoming E-mail. Mailboxes can be created and destroyed. Contents of mailboxes can be inspected and messages can be inserted and deleted from the mailboxes.
8. Mailing list: A message can be sent to a group of persons together with a single command. When a message is sent to the mailing list, identical copies are delivered to everyone on the list.
9. Registered E-mail: Allows the originator to know that his message has arrived. Automatic notification of undelivered mail is also done.
Other advanced features are carbon copies, high-priority e-mail, secret (encrypted) e-mail, alternative recipients if the primary one is not available, and the ability for secretaries to handle their boss’s e-mail.
A user agent is a user interface that routes messages between the users and the local MTA and accepts commands for composing, receiving and replying to messages, as well as for manipulating mailboxes. A user agent may provide menu- or icon-driven interfaces operated with a mouse, or 1-character commands entered through the keyboard.
Thus, the user agent interacts with the user and essentially defines what the user can do.
A user agent also manages a message store (MS), which is used to store messages.
Some of the user agent’s functions are listed below:
- Send Mail. The User Agent (UA) accepts a message and address from a user and gives it to the MTA for delivery. The UA will invoke an editor to allow the user to write a message, or it will simply send the contents of a previously constructed file (which presumably contains the message). The address may be a DNS address of the form mailbox@location, or may take other forms such as defining attributes (attribute = value pairs) or mailing lists. The User Agent also appends a header to the user message containing the following information:
- cc recipients (those who get copies)
- blind copy recipients (those who get copies secretly)
- subject (short note describing its contents)
- message ID
- reply-by date
- sensitivity (degree of confidentiality)
This information is useful to the receiving UA for delivering the message to the appropriate users and for displaying a summary of all mail messages in a user’s mailbox.
- Display a list of mail messages. This is similar to a directory list in a computer account that lists all files. In this case, the UA provides a list of all messages for the current user. Each entry in the list typically contains the following items:
- source of the message
- subject field
- message size
- various indicators indicating whether the message has been read, deleted, answered, or forwarded to another user.
- message ID
- date received
- Display the contents of a message on the computer or workstation screen.
- Reply. The UA allows the user to respond to a currently displayed message by invoking an editor through which the user can construct a response. When the user is finished, the UA automatically sends the response to the source of the previously displayed message.
- Forward the current message to one or more specified recipients.
- Extract the message and store it in a file. This is convenient if you want to save a message, print it, or make it available to others without explicitly forwarding it to them.
- Delete unwanted messages. Messages are not always physically deleted but rather are marked for deletion. Then another command, such as a purge, deletes all messages marked for deletion. This allows accidentally deleted messages to be recovered.
- Undelete a message. Removes the deletion mark mentioned previously.
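The delete, undelete and purge functions listed above can be modelled with a deletion mark kept separately from the stored messages. The Python sketch below is one possible arrangement; the class and method names are assumptions made for this example.

```python
class Mailbox:
    """A toy mailbox in which deletion is a mark until a purge is issued."""

    def __init__(self):
        self.messages = {}   # message ID -> message text
        self.marked = set()  # IDs currently marked for deletion

    def add(self, msg_id, text):
        self.messages[msg_id] = text

    def delete(self, msg_id):
        """Mark a message for deletion; it is still stored."""
        if msg_id in self.messages:
            self.marked.add(msg_id)

    def undelete(self, msg_id):
        """Remove the deletion mark, recovering the message."""
        self.marked.discard(msg_id)

    def purge(self):
        """Physically remove every message that is marked for deletion."""
        for msg_id in self.marked:
            self.messages.pop(msg_id, None)
        self.marked.clear()
```

Because delete only sets a mark, a message deleted by accident can be recovered with undelete at any time before the next purge.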
Message Transfer Agent (MTA)
The MTA is software running on a dedicated workstation or computer and is part of the e-mail system’s backbone. Each MTA communicates with one or more UAs and with other MTAs. Its basic function is to accept mail from a UA or another MTA, examine it, and route it. For example, when it receives mail from a UA, it verifies the format of the mail. If the format is not correct, it informs the UA that an error has occurred so the sender can be notified. If it is correct, there are two possibilities. First, the recipient may be reachable via another UA to which the MTA is connected. For example, user A sends mail to user B, who is connected to the same MTA. In this case the MTA gives the mail to the appropriate UA for delivery. Second, the UA that will deliver the mail may be connected to another MTA. Here user A may have sent mail to user C, who is connected to some other MTA. In this case the mail must be routed to another MTA. Collectively, the MTAs execute a routing strategy that sends the mail through one or more MTAs until it reaches the desired one. Then the mail is sent to the appropriate UA for delivery.
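The decision an MTA makes for each message can be sketched as a single Python function. The two lookup tables, the function name route, and the addresses are hypothetical. A real MTA uses a far richer routing strategy than a dictionary lookup.

```python
def route(recipient, local_users, next_hop):
    """Decide what an MTA does with a message addressed to `recipient`.

    local_users maps addresses served by this MTA to their user agent.
    next_hop maps a remote location to the MTA that handles it.
    Both tables are assumptions made for this sketch.
    """
    if "@" not in recipient:
        return ("error", "bad address format")       # report back to the UA
    location = recipient.split("@", 1)[1]
    if recipient in local_users:
        return ("deliver", local_users[recipient])   # hand to the attached UA
    if location in next_hop:
        return ("forward", next_hop[location])       # pass to another MTA
    return ("error", "no route")

local = {"b@site1.example": "UA-B"}
hops = {"site2.example": "MTA-2"}
print(route("b@site1.example", local, hops))  # user B, same MTA
print(route("c@site2.example", local, hops))  # user C, some other MTA
```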
The UA appends a header to a message sent from the user. If the MTA must route the mail to another MTA it also appends additional bytes to the message and header. These additional bytes are called the envelope. The envelope is used by the MTAs for routing, error checking, and verification. The following are some of the fields in the envelope:
- Destination address
- Sender address for possible acknowledgment or return
- Mail identification number
- Deferred date. A user may specify that delivery must occur after a given date.
- Delivery date. A user may specify that delivery must occur before a given date.
- Field specifying whether a message should be returned if it cannot be delivered
- Bytes for error detection
- Encryption information, such as the location of a key
- Digital signatures. This field can provide authentication for the sender. This is useful if a sender later denies transmitting something, or at least “cannot recall” sending it.
MESSAGE FORMAT AND TRANSFER
Messages consist of a primitive envelope, some number of header fields, a blank line, and then the message body. Each header field consists of a single line of ASCII text containing the field name, a colon, and, for most fields, a value.
The principal header fields related to message transport are:
To: E-mail address(es)(DNS address) of primary recipient(s)
Cc: E-mail address(es) of secondary recipient(s)
Bcc: E-mail address(es) for blind carbon copies. The Bcc: (Blind Carbon Copy) field is like the Cc: field, except that the line is deleted from all the copies sent to the primary and secondary recipients. This feature allows people to send copies to third parties without the primary and secondary recipients knowing this.
From: person or people who created the message.
Sender: E-mail address of the actual sender. This field may be omitted if it is the same as the From field. Otherwise, the sender may be a different person (for example, a secretary) from the actual creator of the message (for example, a business executive).
Received: Line added by each message transfer agent along the route. The line contains the agent’s identity, the date and time the message was received, and other information that can be used for finding bugs in the routing system.
Return-Path: can be used to identify a path back to the sender. This information can be gathered from all the Received: headers.
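The layout described above (header fields, a blank line, then the body) can be examined with the email package from the Python standard library. The message text and all addresses below are invented for illustration.

```python
from email import message_from_string

# A hypothetical message: header fields, a blank line, then the body.
RAW = (
    "From: boss@example.com\n"
    "Sender: secretary@example.com\n"
    "To: alice@example.org\n"
    "Cc: bob@example.org\n"
    "Subject: Meeting\n"
    "\n"
    "The meeting is at 10.\n"
)

msg = message_from_string(RAW)
for name, value in msg.items():   # each header is a field name and a value
    print(name, "->", value)
body = msg.get_payload()          # everything after the blank line
```

Here the Sender field differs from the From field, as in the secretary and executive case mentioned above.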
The message transfer system is concerned with relaying messages from originator to the recipient. The simplest way to do this is to establish a transport connection from the source machine to the destination machine and then just transfer the message.
SIMPLE MAIL TRANSFER PROTOCOL (SMTP)
SMTP is the basic protocol for transmitting messages between computers in the Internet.
Certainly one of the most common uses of networks is electronic mail, the ability to send a message or file to a specific user at a local or remote site. Typically, you send a message by specifying the e-mail address of the recipient. The usual address format is name@host-text-address. The message is buffered at the destination site and is accessible only by the intended user.
There are some similarities with file transfer protocols. For example, both use a client and server to negotiate transfer of data. However, e-mail typically sends the file to a specified user, in whose account the message is buffered. Also, the e-mail client and server work in the background. For example, if you get a file using FTP, you wait until the file arrives before doing another task. If you send (or receive) a file using e-mail, you can do other tasks while the client and server perform the mail delivery in the background. In the case of receiving mail you need not even be logged on. You will be notified of new mail the next time you log in to the system.
The standard mail protocol in the TCP/IP suite is the Simple Mail Transfer Protocol (SMTP). It runs above TCP/IP and below any local mail service. Its primary responsibility is to make sure mail is transferred between different hosts. By contrast, the local service is responsible for distributing mail to specific recipients.
When a user sends mail, the local mail facility determines whether the address is local or requires a connection to a remote site. In the latter case, the local mail facility stores the mail (much as you would put a letter in a mailbox), where it waits for the client SMTP. When the client SMTP delivers the mail, it first calls TCP to establish a connection with the remote site. When the connection is made, the client and server SMTPs exchange packets and eventually deliver the mail. At the remote end the local mail facility gets the mail and delivers it to the intended recipient.
The packets are also called SMTP protocol data units (PDUs) or simply commands. When the TCP connection is made, the server sends a 220 PDU indicating it is ready to receive mail. The number 220 serves to identify the type of packet. Afterward, the client and server exchange the identities of their respective sites. Next, the client sends a MAIL FROM PDU indicating there is mail and identifying the sender. If the server is willing to accept mail from that sender, it responds with a 250 OK PDU.
The client then sends one or more RCPT TO PDUs specifying the intended recipients, to determine whether the recipients are there before sending the mail. For each recipient, the server responds with a 250 OK PDU (recipient exists) or a 550 recipient not here PDU. After the recipients have all been identified, the client sends a DATA PDU indicating it will begin mail transmission. The server’s response is a 354 start mail PDU, which gives the OK to start sending and specifies a sequence the client should use to mark the mail’s end. In this case the sequence is CR LF . CR LF, that is, a line containing only a period. The client sends the mail in fixed-size PDUs, placing this sequence at the mail’s end. When the server gets the last PDU, it acknowledges receipt of the mail with another 250 OK PDU. Finally, the client and server exchange PDUs indicating they are ceasing mail delivery, and TCP releases the connection.
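The exchange described above can be followed step by step with a toy server written in Python. It runs entirely in memory and handles one command line per call. It is a simplified teaching model, not a conforming SMTP implementation, and the class name, the reply wording after each code, and the addresses are assumptions.

```python
class ToySMTPServer:
    """A simplified, in-memory model of the server side of the dialogue."""

    def __init__(self, mailboxes):
        self.mailboxes = set(mailboxes)  # recipients known to this server
        self.recipients = []             # recipients accepted so far
        self.lines = []                  # lines of the mail being received
        self.in_data = False             # True between DATA and the final "."
        self.delivered = []              # (recipients, text) pairs accepted

    def greeting(self):
        return "220 ready"               # sent when the connection is made

    def handle(self, line):
        """Process one line from the client; return the reply, if any."""
        if self.in_data:
            if line == ".":              # a line with only a period ends the mail
                self.in_data = False
                self.delivered.append((list(self.recipients), "\n".join(self.lines)))
                self.recipients, self.lines = [], []
                return "250 OK"
            self.lines.append(line)
            return None                  # no reply while mail is being sent
        verb = line.upper()
        if verb.startswith("HELO") or verb.startswith("MAIL FROM:"):
            return "250 OK"
        if verb.startswith("RCPT TO:"):
            address = line[len("RCPT TO:"):].strip().strip("<>")
            if address in self.mailboxes:
                self.recipients.append(address)
                return "250 OK"
            return "550 recipient not here"
        if verb == "DATA":
            self.in_data = True
            return "354 start mail"
        if verb == "QUIT":
            return "221 closing"
        return "500 unrecognized command"
```

Feeding the server HELO, MAIL FROM, one or more RCPT TO lines, DATA, the mail text, a lone period, and QUIT reproduces the sequence of 250, 550, 354 and 250 replies in the description.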
THE WORLD WIDE WEB
The World Wide Web is an architectural framework for accessing linked documents spread out over thousands of machines all over the Internet. It has a colorful graphical interface that is easy for beginners to use, and it provides an enormous wealth of information on almost every conceivable subject.
The Web (also known as WWW) began in 1989 at CERN, the European center for nuclear research. The WWW is a worldwide, Internet-based, multimedia presentation system. It is a system of cooperating Internet host computers that offer multimedia presentations, indexes, cross references, and text-search capabilities so that users can find text documents across the globe. The main vehicles for users to traverse the WWW are directories, which organize WWW sites by topic and evaluate them, and search engines, which scan WWW pages for keywords or phrases. The WWW is not owned by anybody. People are responsible for the documents they create and make available to the public. Via the Internet, hundreds of thousands of people generate information that is accessible from homes, schools, and work places around the world.
The Web grew out of the need to have large teams of internationally dispersed researchers collaborate using a constantly changing collection of reports, blueprints, drawings, photos, and other documents.
The Web is basically a client-server system. Before the client and server are discussed, the concept of the Domain Name System is introduced.
DOMAIN NAME SYSTEM (DNS)
The domain name service is intended to simplify access to and use of the Internet. It is implemented using the Domain Name System (DNS) and the appropriate DNS servers. It is a service for any host or user on the Internet, not just for HTTP (web) servers, and it is an important option to consider when building a web site. A DNS server is a computer that translates numeric IP addresses to and from user-friendly names (domain names) that identify hosts and networks on the Internet. The domain name service is a hierarchical, distributed method of organizing the name space of the Internet. The DNS administratively groups hosts into a hierarchy of authority that allows addressing and other information to be widely distributed and maintained. A key advantage of the DNS is that using it eliminates dependence on a centrally maintained file that maps host names to addresses. DNS is supported by domain name servers.
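The hierarchical organization of the name space can be made visible by listing the domains that contain a given host, from the most general to the most specific. The Python sketch below does only string handling, with no network access. The function name is an assumption, and int.cs.ku.edu is used purely as a sample host name.

```python
def domain_hierarchy(name):
    """List the domains containing `name`, most general first."""
    labels = name.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

for level in domain_hierarchy("int.cs.ku.edu"):
    print(level)
```

Each entry in the list corresponds to one level of authority in the hierarchy, starting from the top-level domain.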
The IP address is a numeric address that serves a role analogous to a telephone number. In its usual (IPv4) representation, an IP address consists of four decimal values separated by periods. The computer named int.cs.ku.edu, for instance, is assigned the number 184.108.40.206. IP addresses are numeric and can be easily understood and manipulated by the hardware and software that must move information over the Internet. So IP addresses are better suited to computers, and domain addresses are better suited to humans. DNS allows a translation between the domain name and the IP address. Domain names do not necessarily have four parts. They might have only two parts (a top-level domain such as ‘edu’ or ‘com’, preceded by a subdomain), or three, four or many.
The only limitations are:
- a domain-style name cannot exceed 255 characters
- each part of the name cannot exceed 63 characters
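The two limits above, and the numeric nature of an IP address, can be checked with a few lines of Python. The function names are assumptions made for this sketch. The check covers only the two length limits and empty parts, not every rule that applies to real domain names, and 192.0.2.1 is an address reserved for documentation.

```python
import ipaddress

def is_valid_domain_name(name):
    """Check the two length limits: 255 characters overall, 63 per part."""
    labels = name.rstrip(".").split(".")
    return len(name) <= 255 and all(0 < len(label) <= 63 for label in labels)

def ipv4_to_int(dotted):
    """Convert four dotted decimal values into the single 32-bit number."""
    return int(ipaddress.IPv4Address(dotted))

print(is_valid_domain_name("int.cs.ku.edu"))
print(ipv4_to_int("192.0.2.1"))
```

On a machine with network access, the standard library function socket.gethostbyname(name) asks the local resolver for the actual translation from a domain name to an IP address.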
Domain names used in the Internet
All hosts attached to a network or subnet of the Internet must be registered with one of these organizational domains. The overall directory for the Internet is partitioned according to these domains. The choice of domain to be registered under is made to minimize the number of referrals. Hence, if a host is to be attached to a network that belongs to an educational institution, since it is likely that most network transactions will involve hosts that are attached to its own network or to those of other educational institutions, it is registered within the EDU domain.
Each domain uses an appropriate naming hierarchy. In the EDU domain the next level in the hierarchy is the names of the different educational institutions, while in the COM domain it is each commercial organization. The general scheme and naming convention are as shown in figure (c):