XML is widely regarded as a foundation for how systems will be built in the future. It overcomes most of the limitations of HTML. For the long-term maintenance and repurposing of information, XML does the heavy lifting without computing gymnastics, and it brings greater improvements in information interchange to the web publishing business. Given a choice, it is better to build new systems on XML, both for flexibility and for the ability to extend the system easily in the future.
XML is an important standard in the development of e-commerce systems, and one that people should know about. The specification was published as a W3C Proposed Recommendation in December 1997 and became a Recommendation in February 1998; it is the result of a working group sponsored by the World Wide Web Consortium (W3C) and various vendors. XML grew out of the frustration that HTML developers and users of complex web applications felt with the limits of HTML, and it is now a standard that addresses many of those concerns.
Definition of XML
XML describes the format of documents and, together with a style sheet language, their presentation, and it gives applications control over the content of the documents and systems that use it. It is much more powerful than HTML. XML is likely to be the next-generation language for the Web and for business applications.
XML is a technology that lets developers create their own markup tags, so they are not limited to the tags and functionality that HTML provides. XML linking also allows a link to lead to multiple locations, whereas each HTML link leads to a single destination.
XML is defined by a number of related specifications:
Extensible Markup Language (XML) 1.0 – Defines the syntax of XML. This specification is the primary focus.
XML Pointer Language (XPointer) and XML Linking Language (XLink) – XLink defines a standard way to represent links between resources. In addition to simple links, like HTML's <a> tag, XML has a mechanism for links between read-only resources. XPointer describes how to address a resource; XLink describes how to associate two or more resources.
Extensible Stylesheet Language (XSL) – Defines the standard style sheet language for XML.
Why XML-based Technology?
1) Reuse and repurpose information quickly and efficiently.
2) Reduce the maintenance costs associated with e-commerce solutions.
3) Design flexibility into the system, making it easier to make changes as our business requirements dictate.
4) Share information easily inside our organization and with our business partners.
Comparison of HTML with XML
Standard Generalized Markup Language (SGML) provides a grammar for specifying document formats based on standard markup annotations. HTML is a general-purpose hypertext display language based on SGML. It specifies a document format that allows text formatting commands as well as hypertext link and image display commands.
HTML provides limited input features. The HTML document can specify that a form should be displayed. The HTML display program (web client) allows the user to fill in entries in the form. Menus and other graphical input facilities are also provided. When the user has finished entering values, he can press a ‘submit’ button.
At that point, the values entered by the user are collected, and the web client sends a request for a document, passing those values as arguments. This document is actually an executable program, and the web server invokes the program with the specified arguments. The actual layout of the screen, the specific forms to be filled in, and the menu items to be chosen are all under the control of the HTML document. There is no continuous connection between the client and the server.
Structure of XML document
An XML document consists of two parts: the Document Type Definition (DTD) and the document body. The DTD defines the vocabulary and the structure of the document, and it is the existence of the DTD that makes XML extensible. As long as two businesses or applications use the same DTD, they will be able to understand each other. The body is simply a flat text document, which is not very convenient for programmatic processing. For this reason, the W3C has defined the Document Object Model (DOM) as a high-level object representation of XML documents.
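The two-part structure can be sketched in Java with the standard JAXP parser. This is only an illustration: the `note`, `to` and `body` element names and the `DtdSketch` class are hypothetical, and the DTD is placed inline in front of the body rather than in a shared file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdSketch {
    // Hypothetical vocabulary: a <note> holds exactly one <to> and one <body>.
    static final String DTD =
        "<!DOCTYPE note [\n"
      + "  <!ELEMENT note (to, body)>\n"
      + "  <!ELEMENT to (#PCDATA)>\n"
      + "  <!ELEMENT body (#PCDATA)>\n"
      + "]>\n";

    // Returns true when the document body conforms to the DTD above.
    static boolean isValid(String documentBody) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the body against the DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn validation errors into exceptions instead of log messages.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + documentBody)));
            return true;
        } catch (SAXException e) {
            return false; // the body violates the DTD or is not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<note><to>Ann</to><body>Hello</body></note>"));
        System.out.println(isValid("<note><body>Hello</body></note>"));
    }
}
```

Two applications that agree on the same DTD can each run a check like this before trusting a document they receive.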
DOM is a set of interfaces that provide convenient object-based programmatic access to XML documents. The DOM API is defined in IDL. This allows an application or programmer to access and manipulate the structure and content of XML documents. A number of vendors, including IBM, Sun and Microsoft, provide parsers for their respective environments, which automatically transform between XML documents in their text format and an object representation that conforms to the DOM API.
Domain-specific applications can then manipulate XML documents in this convenient object form, without having to resort to manually encoding and decoding raw text files or to proprietary vendor-specific APIs. There is also another API for programmatically processing XML documents: SAX, the Simple API for XML, is an event-based API that allows documents to be easily parsed by applications.
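As a rough comparison of the two APIs, the following Java sketch counts elements once through DOM and once through SAX, using the JAXP classes bundled with the JDK. The `order` and `item` tags and the `DomSaxSketch` class are made up for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class DomSaxSketch {
    // Hypothetical document using made-up tag names.
    static final String XML =
        "<order id=\"42\"><item>pen</item><item>ink</item></order>";

    static InputStream input() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    // DOM: the parser builds the whole tree in memory, then we navigate it.
    static int countItemsWithDom() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(input());
            return doc.getElementsByTagName("item").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // SAX: the parser calls back once per element as it reads the text.
    static int countItemsWithSax() {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("item".equals(qName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(input(), handler);
            return count[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("DOM count: " + countItemsWithDom());
        System.out.println("SAX count: " + countItemsWithSax());
    }
}
```

DOM is the more convenient choice when the application needs to revisit or modify the document, while SAX avoids holding the whole tree in memory.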
XHTML
XHTML is another form of XML, and can be seen as a transition path from HTML to XML. XHTML, by its very name, is meant to be extensible, yet it allows authors to retain the semantics present in HTML. Indeed, XHTML can be rendered in current HTML browsers with only a few minor considerations for backward compatibility. This bridge from HTML to XML lets web developers sharpen their skills and become fully prepared for the explosion of XML about to occur on the Web.
Whatever XML document you prepare to store useful or secret data needs to be protected. One product on the market for this purpose is X/Secure, developed by Baltimore Technologies Pvt. Ltd.
X/Secure
X/Secure is a radically new product that provides XML digital signatures and XML encryption for the first time. It comprises a toolkit for application developers and a suite of security utilities that can be used in system integration. It provides full encryption and digital signature capabilities, which can be used in an intranet, extranet or Internet environment. X/Secure offers the following benefits:
- Allows secure e-business XML systems to be created.
- Provides complete confidentiality of XML documents.
- Prevents tampering of documents during either transmission or storage.
- Unique and irrefutable digital signatures can be applied to part or all of a document.
- Full Public Key Infrastructure (PKI) support.
- No prior knowledge of cryptography is required.
Advantages of XML
1. Its self-describing tags identify what your content is all about.
2. Data is easily repurposed via tags.
3. Creating, using and reusing tags is easy, making XML highly extensible.
4. XML data types map easily among different applications, so it is highly interoperable.
5. It makes transferring data easy: simply give it XML tags.
6. Without modifying the XML code, a company can provide access to the information stored in its mainframe server.
7. XML is easy to implement and offers numerous advantages to organizations that want to join the B2B e-commerce bandwagon by leveraging their existing resources. These organizations benefit from the flexibility and robustness of XML, which makes it the lingua franca of B2B.
8. We can define new tags and attributes at will.
9. Document structures can be nested to any level of complexity.
10. Any XML document can contain an optional description of its grammar, which applications can use to perform structural validation.
Disadvantages
1. The majority of browsers still render only HTML, so you will need to add an XML-to-HTML translator.
2. Performance is still lower than that of equivalent HTML documents.
3. XML-tagged documents are still rare; you will likely be doing a bit of conversion of older data.
4. Standard tag sets for different applications and industries are not in widespread use yet.
Application of XML
- It can be used for databases.
- It is used to store the data, while XSL is used to display only those contents that we would like the viewer to see.
- It can be used as a web development tool, like HTML but with enhanced powers.
- It is a machine independent language.
For most programmers, working on a project is mostly about the coding: the technologies and their versions, the framework used, the database design, and the various techniques used to write non-redundant, functional scriptlets.
But for me a web project is more about the finished product, which goes global (published worldwide) the moment it is hosted. It is about the user interface and all the related activities that come bundled with the project, and the whole experience of being on a site for a while and admiring it as a complete product.
When I say a "complete product", I mean a set of user interface screens built with attention to detail: useful graphics, icons, or self-explanatory images, with the right blend of text to keep the visitor interested. Each interaction (a form, a set of links in content, menus, the common aesthetics of the site) needs a finishing touch that pleasantly surprises the visitor with the unexpected.
With the advent of internet access in the most remote parts of the world, the interface should bear the theme of the site along with being generic enough to be understandable by anybody and everybody.
Experts say that, for a first-time visitor to stay on a site, the golden words are "to get what you are looking for, at the earliest with higher accuracy". I agree with that, but would like to add, that the visitor must get the feel and the urge to visit the site again, not only for the content he had found but also to re-experience the warmth that he had felt.
Maybe I am echoing a designer's perspective on a web site, but the point that I want to make here, is that, any web project has to have a blend of both technology and design to have a great user interface.
Complementing Java with XML Schema
Consider a Java class as a set of constraints, and an instance of that class as data that adheres to those constraints. The data in this case is binary data; in other words, bytecode. The constraints define the variables that can be filled with data, the methods that can be implemented, and acceptable inputs and outputs. However, the actual values of the variables and method calls are unknown, and undetermined until runtime. Much as a content author populates an XML document with data that conforms to a schema, an application populates a Java instance with values for the specific task at hand. This concept can then be layered upon itself when you consider a Java interface as another set of constraints, this time on the class definition.
The interface defines the actual method signatures, what inputs and outputs are acceptable, and what contract classes that implement the interface must follow. In this way, an interface constrains class definitions, which in turn constrain class instances.
While this chain of constraints makes for highly effective modeling and object-oriented design, the data that is used to set values within the class instance is not constrained except by type. As long as the variable is, for example, an int, any range is accepted. Implementing further constraints requires code within the class or method implementation. In addition, the return values of methods are similarly unconstrained. The application client then must enforce validation of its own if the return value of the invoked method must fall within a certain range. This makes for quite a bit of extra coding, and also can result in ambiguity to those using classes you may have written.
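A minimal Java sketch of the hand-written validation this paragraph describes follows. The `PortHolderSketch` class and the upper bound of 64,000 are assumptions chosen for illustration; the point is only that the `int` type by itself accepts any value, so the range has to be enforced in code.

```java
class PortHolderSketch {
    private int portNumber = 0;

    // The int type alone would accept any value, so the range
    // must be checked by hand inside the method.
    void setPortNumber(int portNumber) {
        if (portNumber <= 0 || portNumber >= 64000) {
            throw new IllegalArgumentException(
                "portNumber out of range: " + portNumber);
        }
        this.portNumber = portNumber;
    }

    int getPortNumber() {
        return portNumber;
    }

    public static void main(String[] args) {
        PortHolderSketch holder = new PortHolderSketch();
        holder.setPortNumber(8080);
        System.out.println(holder.getPortNumber());
    }
}
```

A client of this class cannot discover the allowed range without reading the source or the documentation, which is the ambiguity the text refers to.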
The values you may be returning may not be in the range of values the client expects; if validation is not explicitly coded, serious and unexpected behavior can result from miscommunication.
The perfect solution and complement to Java in this case is XML Schema. By using XML Schema to constrain the data acceptable for member variables in a Java class instance, much tighter controls can be enforced that enable application clients to know exactly what ranges of data may be returned from method calls. XML Schema can also be used to define values acceptable for class instance use. For example, let's look back at some of the member variables used by our XmlRpcConfigurationHandler class:
private String uri;
private String hostname;
private int portNumber = 0;
private Hashtable handlers;
The problem here is that the hostname may need to be limited to a certain number of characters in length; the port number should be a positive integer less than 64,000; and the handlers may have additional constraints. XML-RPC clients are able to set these members to any value of the appropriate type, forcing the handler to perform validation within code. However, an XML Schema could be supplied along with the Java class definition to express these constraints.
In such a schema, each member variable is treated as an attribute of the class itself. With the XML Schema as a counterpart to your Java code, validation can occur outside of the code with a standard mechanism, and the client can act more intelligently, understanding the allowed ranges and constraints on allowed data types.
While this integration at the Java Virtual Machine (JVM) level is still a long way from reality, the promise of integration is important enough to warrant thinking about how validation is currently occurring. If you can convert your data constraints from DTDs to schemas, you are ahead of the game if and when XML Schema is integrated more tightly with the Java language. Additionally, you may find ways to integrate XML Schema constraints into your application logic in the process of constraint conversion.
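One way to approximate this idea today is to validate an XML rendering of the member variables against a schema using the JDK's javax.xml.validation API. The sketch below is illustrative only: the `handlerConfig` element, the 64-character hostname limit, and the `HandlerConfigSchemaSketch` class are assumptions, not part of any XML-RPC library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class HandlerConfigSchemaSketch {
    // Hypothetical schema: each member variable becomes a constrained attribute.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"handlerConfig\"><xs:complexType>"
      + "<xs:attribute name=\"hostname\" use=\"required\">"
      + "<xs:simpleType><xs:restriction base=\"xs:string\">"
      + "<xs:maxLength value=\"64\"/>"   // assumed length limit
      + "</xs:restriction></xs:simpleType></xs:attribute>"
      + "<xs:attribute name=\"portNumber\" use=\"required\">"
      + "<xs:simpleType><xs:restriction base=\"xs:positiveInteger\">"
      + "<xs:maxExclusive value=\"64000\"/>"
      + "</xs:restriction></xs:simpleType></xs:attribute>"
      + "</xs:complexType></xs:element></xs:schema>";

    // Validates the values outside of the handler's own code.
    // The hostname is not XML-escaped here; a real implementation would do so.
    static boolean isValid(String hostname, int portNumber) {
        String xml = "<handlerConfig hostname=\"" + hostname
                   + "\" portNumber=\"" + portNumber + "\"/>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a constraint in the schema was violated
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("localhost", 8080));
        System.out.println(isValid("localhost", 70000));
    }
}
```

Because the constraints live in the schema text rather than in Java code, the same schema could be handed to a client so that it knows the allowed ranges before making a call.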
Pattern Matching
Extending our look at XML Schema in light of how it can constrain data and integrate with Java even further, we look at XML Schema's pattern matching capabilities. In the last section, we talked about using XML Schema to avoid complicated validation within Java code. This is only applicable, though, if XML Schema can do more than just determine simple numeric ranges and String lengths. For example, ensuring that a monetary value is entered with allowed formatting applied is more complicated than requiring a data type and length. Instead, pattern matching must occur, as a dollar value can be entered in a number of ways:
$4.50
$45.96
$54
$45.6
These types of scenarios must be handled for schemas to be useful for data validation. XML Schema provides the ability to perform pattern matching through the pattern attribute on an element or attribute.
While matching a dollar amount is a simple case, as Perl and regular expression aficionados will let you know, it does show that XML Schema provides for pattern matching constraints. You should consult the XML Schema specification for more information about the supported regular expression constructs. These expressions allow very complex validation to occur in your schema, which relieves your Java code of performing that validation in complicated code.
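As a hedged illustration of pattern matching, the sketch below checks the dollar formats listed earlier. In the W3C XML Schema Recommendation the constraint is written as an xs:pattern facet inside a simple type restriction; the `price` element, the particular regular expression, and the `DollarPatternSketch` class are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class DollarPatternSketch {
    // Hypothetical schema: a dollar sign, one or more digits, and an
    // optional decimal point followed by one or two digits.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"price\"><xs:simpleType>"
      + "<xs:restriction base=\"xs:string\">"
      + "<xs:pattern value=\"[$][0-9]+([.][0-9]{1,2})?\"/>"
      + "</xs:restriction></xs:simpleType></xs:element>"
      + "</xs:schema>";

    // Returns true when the text matches the pattern facet in the schema.
    static boolean isValidPrice(String text) {
        String xml = "<price>" + text + "</price>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the value does not match the pattern
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String value : new String[] {"$4.50", "$45.96", "$54", "$45.6", "4.50"}) {
            System.out.println(value + " -> " + isValidPrice(value));
        }
    }
}
```

Schema patterns are implicitly anchored to the whole value, so no start or end markers are needed in the expression.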
XML-RPC and Distributed Systems
A particularly important application of using data constraints to complement Java code is in the case of XML-RPC, which we have already looked at briefly. Currently, XML-RPC libraries have a predefined set of variable types that can be passed between server and client.
These constraints on what can be transferred across the network allow the client to have only a general idea of the handlers on the server and still interact with them. However, there is no knowledge of the ranges of values accepted as input and returned as output; although this may seem a minor issue, a handler being able to set these constraints and allow the client to recognize them can save significant processing time in validation and greatly increase its usability.
With a schema defining the ranges and specifics of data input and output for XML-RPC handlers, a complete map of a handler's functionality is available to clients. This also applies for developers seeking to use another developer's classes. Not only is the input known, but specific details about useful input are available; exceptions thrown as a result of invalid data can be almost completely eliminated. This more complete mapping of XML-RPC handlers could easily be extended to other distributed systems; a prime candidate for this is EJB.
In addition to the remote interface, imagine an XML Schema contract of the allowable data that is input and output from the methods in the remote interface. This additional information could greatly enhance the usability and reliability of distributed systems, particularly when the developers of the beans and handlers are not able to directly communicate with developers of application clients.
Databases and XML
Another revolution that could be brought on by additional data constraints and mappings is XML involvement in database use. First, it should be pointed out that we are not talking about pure XML databases here. Although complete XML database systems are being developed, they are very young technologies, and will most likely encounter a lot of resistance among traditionalists in management and application development. Additionally, there has yet to be a compelling reason for converting existing relational databases to this new format.
What is worth taking a long look at, however, is using XML to map data from Java (or any other programming language) to a relational or object-oriented database. Again, the key is that while mappings occur today, these mappings do not reflect the physical constraints that may exist on a database. Thus, complex validation and range checking have to occur in application code before database inserts and updates can occur, and even then rigorous error checking has to be performed to ensure that errors do not occur from database constraint violations.
Concept of CGI
CGI stands for Common Gateway Interface, and couples your web page to a program which does the desired task for you. Now you can make dynamic web pages, pages that can interact with the user. You can think of forms, quizzes, counters, a bulletin board system, you name it! CGI is intended to be platform and language independent.
There are a variety of popular server-side technologies for developing web-based applications. Historically, the most widely used has been CGI. CGI lets HTTP (Hypertext Transfer Protocol) clients interact with programs across a network through a web server. CGI is a standard for interfacing applications with a web server. These CGI applications can be written in many different programming languages. Permission is granted within the web server by a webmaster (or the author of the web site) to allow specific programs to be executed on the web server. Typically, CGI applications reside in the directory /cgi-bin.
CGI is not a language. It's a simple protocol that can be used to communicate between Web forms and your program. A CGI script can be written in any language that can read STDIN, write to STDOUT, and read environment variables, i.e. virtually any programming language, including C, Perl, or even shell scripting.
It is not a programming language. That means, for example:
- You do not have to learn Perl
- You can use the languages you already know
- You can use any language as long as it
- can read input
- can write output
And what computer language can’t?
- For that matter, you do not need to use a language.
- It is not a programming style. You can use your own.
- It is not cryptic. Perl is cryptic, all right, but see above: you don't need to use Perl.
- It is not for Unix gurus only. In fact, you don’t have to be any kind of guru. All you need is to know how to program. And you already know that!
Structure of a CGI Script
Here's the typical sequence of steps for a CGI script:
- Read the user's form input.
- Do what you want with the data.
- Write the HTML response to STDOUT.
Reading the User's Form Input
The form data arrives as one long string of name-value pairs in this format:
"name1=value1&name2=value2&name3=value3"
So just split on the ampersands and equal signs. Then, do two more things to each name and value:
- Convert all "+" characters to spaces, and
- Convert all "%xx" sequences to the single character whose ASCII value is "xx", in hex. For example, convert "%3d" to "=".
This is needed because the original long string is URL-encoded, to allow for equal signs, ampersands, and so forth in the user's input.
So where do you get the long string? That depends on the HTTP method the form was submitted with:
For GET submissions, it's in the environment variable QUERY_STRING.
For POST submissions, read it from STDIN. The exact number of bytes to read is in the environment variable CONTENT_LENGTH.
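The parsing steps can be sketched in Java as follows. This is a minimal illustration rather than a full CGI library: the `FormInputSketch` class is hypothetical, `URLDecoder` from the JDK is used to perform both the "+" and the "%xx" conversions (the `Charset` overload requires Java 10 or later), and only the GET case is shown in `main`.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FormInputSketch {
    // Splits the long string on "&" and "=", then URL-decodes each part.
    // If a name appears more than once, the last value wins in this sketch.
    static Map<String, String> parse(String encoded) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return fields;
        }
        for (String pair : encoded.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // URLDecoder turns "+" into a space and "%xx" into the character
            // with that hex value.
            fields.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return fields;
    }

    public static void main(String[] args) {
        // For a GET submission the web server places the string in QUERY_STRING.
        // A POST handler would instead read CONTENT_LENGTH bytes from System.in.
        System.out.println(parse(System.getenv("QUERY_STRING")));
    }
}
```

Splitting before decoding matters: an encoded "=" or "&" inside a value must not be mistaken for a separator.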
Sending the Response Back to the User
First, write the line
Content-type: text/html
plus another blank line, to STDOUT. After that, write your HTML response page to STDOUT, and it will be sent to the user when your script is done. That's all there is to it.
Yes, you're generating HTML code on the fly. It's not hard; it's actually pretty straightforward. HTML was designed to be simple enough to generate this way.
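For comparison with the scripts in this section, here is a small Java sketch of generating the response. The `CgiResponseSketch` class and its parameters are hypothetical, and the text is inserted into the page without HTML escaping, which a real program would need to add.

```java
class CgiResponseSketch {
    // Builds the header line, the blank line that ends the headers,
    // and then the HTML page itself.
    static String buildResponse(String title, String bodyText) {
        return "Content-type: text/html\n"
             + "\n"
             + "<html><head><title>" + title + "</title></head>\n"
             + "<body><p>" + bodyText + "</p></body></html>\n";
    }

    public static void main(String[] args) {
        // Everything written to standard output is sent back to the user.
        System.out.print(buildResponse("Test Page", "Hello, world!"));
    }
}
```

Keeping the header and the blank line at the very start of the output is what lets the web server and browser interpret the rest as HTML.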
If you want to send back an image or other non-HTML response, send the matching Content-type line instead. For image contents:
Content-type: image/gif
GIF89a&%*$@#--- binary contents of GIF file here ---$(*&%(*@#......
Let's write a simple first program. Enter the following lines into a new file, and name it "first.pl".
#!/usr/bin/perl
print "Hello, world!\n";
Save the file. Now, in the Unix shell, you'll need to type:
chmod 755 first.pl
This changes the file permissions to allow you to run the program. You will have to do this every time you create a new script; however, if you're editing an existing script, the permissions will remain the same and won't need to be changed again.
Now, in the Unix shell, you'll need to type this to run the script:
./first.pl
If all goes well, you should see it print Hello, world! to your screen.
A CGI program is still a Perl script. But one important difference is that a CGI usually generates a web page (for example, a form-processing CGI, such as a guestbook, usually returns a "thank you for writing" page). If you are writing a CGI that's going to generate an HTML page, you must include this statement somewhere in the script, before you print out anything else:
print "Content-type:text/html\n\n";
This is a content header that tells the receiving web browser what sort of data it is about to receive; in this case, an HTML document. If you forget to include it, or if you print something else before printing this header, you'll get an "Internal Server Error" when you try to access the CGI. A good rule of thumb is to put the Content-type line at the top of your script (just below the #!/usr/bin/perl line).
Now let's take our original first.pl script, and make it into a CGI script that displays a web page. If you are running this on a Unix server that lets you run CGIs in your public_html directory, you will probably need to rename the file to first.cgi, so that it ends in the .cgi extension. Here is what it should look like:
#!/usr/bin/perl
print "Content-type:text/html\n\n";
print "<html><head><title>Test Page</title></head>\n";
print "<body>\n";
print "<h2>Hello, world!</h2>\n";
print "</body></html>\n";
To run this, just move it into your public_html or cgi-bin directory, and type the direct URL for the CGI into your browser.
The output would be