Complementing Java with XML Schema

 

Consider a Java class as a set of constraints, and an instance of that class as data that adheres to those constraints. The data in this case is binary rather than textual. The constraints define the variables that can be filled with data, the methods that can be implemented, and the acceptable inputs and outputs. However, the actual values of the variables and method calls are unknown until runtime. Much as a content author populates an XML document with data that conforms to a schema, an application populates a Java instance with values for the specific task at hand. This concept can then be layered upon itself when you consider a Java interface as another set of constraints, this time on the class definition.

The interface defines the actual method signatures, what inputs and outputs are acceptable, and what contract classes that implement the interface must follow. In this way, an interface constrains class definitions, which in turn constrain class instances.

While this chain of constraints makes for highly effective modeling and object-oriented design, the data used to set values within the class instance is not constrained except by type. As long as the variable is, for example, an int, any value an int can hold is accepted. Implementing further constraints requires code within the class or method implementation. The return values of methods are similarly unconstrained, so the application client must enforce validation of its own if the return value of an invoked method must fall within a certain range. This makes for quite a bit of extra coding, and can also create ambiguity for those using classes you have written.
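As a minimal sketch of the point above, the following shows an interface constraining a class, and the class then checking a range by hand because the int type alone cannot express it. The names PortConfigurable and ManualPortConfig are hypothetical, and the range borrows the port-number limit mentioned later in this section.

```java
// Hypothetical interface: it constrains method signatures, but not value ranges.
interface PortConfigurable {
    void setPortNumber(int portNumber);
    int getPortNumber();
}

// Hypothetical implementation: the range constraint lives in hand-written code.
class ManualPortConfig implements PortConfigurable {
    private int portNumber = 0;

    @Override
    public void setPortNumber(int portNumber) {
        // The type system would accept -1 or 2000000 here; only this check
        // keeps the value positive and less than 64,000.
        if (portNumber <= 0 || portNumber >= 64000) {
            throw new IllegalArgumentException(
                "portNumber out of range: " + portNumber);
        }
        this.portNumber = portNumber;
    }

    @Override
    public int getPortNumber() {
        return portNumber;
    }
}
```

A client reading only the interface has no way to learn the allowed range short of reading the implementation or triggering the exception.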

The values you return may not be in the range the client expects; if validation is not explicitly coded, serious and unexpected behavior can result from this miscommunication.

The perfect solution and complement to Java in this case is XML Schema. By using XML Schema to constrain the data acceptable for member variables in a Java class instance, much tighter controls can be enforced, and application clients can know exactly what ranges of data may be returned from method calls. XML Schema can also be used to define the values acceptable as input to a class instance. For example, let's look back at some of the member variables used by our XmlRpcConfigurationHandler class:

 

private String uri;

private String hostname;

private int portNumber = 0;

private Hashtable handlers;

The problem here is that the hostname may need to be a limited number of characters in length; the port number should be a positive integer less than 64,000; and the handlers may have additional constraints. XML-RPC clients are able to set these variables to any value of the appropriate type, forcing the handler to perform validation within code. However, imagine an XML Schema supplied along with the Java class definition.

In such a schema, each member variable is treated as an attribute of the class itself. With this XML Schema as a counterpart to your Java code, validation can occur outside of the code through a standard mechanism, and the client can act more intelligently, understanding the allowed ranges and constraints on allowed data types.

While this integration at the Java Virtual Machine (JVM) level is still a long way from reality, the promise of integration is important enough to warrant thinking about how validation is currently occurring. If you can convert your data constraints from DTDs to schemas, you are ahead of the game if and when XML Schema is integrated more tightly with the Java language. Additionally, you may find ways to integrate XML Schema constraints into your application logic in the process of constraint conversion.
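The schema counterpart described above might look roughly like the following sketch, which treats hostname and portNumber as attributes and checks candidate values with the standard javax.xml.validation API. The class name HandlerConfigSchema, the element name, and the 64-character hostname limit are assumptions made for illustration; only the port-number bound comes from the text.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class HandlerConfigSchema {
    // Hypothetical schema: each member variable becomes an attribute.
    private static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "  <xs:element name='XmlRpcConfigurationHandler'>"
      + "    <xs:complexType>"
      + "      <xs:attribute name='hostname' use='required'>"
      + "        <xs:simpleType>"
      + "          <xs:restriction base='xs:string'>"
      + "            <xs:minLength value='1'/>"
      + "            <xs:maxLength value='64'/>"  // assumed length limit
      + "          </xs:restriction>"
      + "        </xs:simpleType>"
      + "      </xs:attribute>"
      + "      <xs:attribute name='portNumber' use='required'>"
      + "        <xs:simpleType>"
      + "          <xs:restriction base='xs:positiveInteger'>"
      + "            <xs:maxExclusive value='64000'/>"
      + "          </xs:restriction>"
      + "        </xs:simpleType>"
      + "      </xs:attribute>"
      + "    </xs:complexType>"
      + "  </xs:element>"
      + "</xs:schema>";

    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true if the values satisfy the schema; no range checks in Java.
    static boolean isValid(String hostname, int portNumber) {
        String xml = "<XmlRpcConfigurationHandler hostname='" + escape(hostname)
                   + "' portNumber='" + portNumber + "'/>";
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;  // the validator reports constraint violations as SAXException
        }
    }

    // Minimal escaping so the hostname is safe inside a single-quoted attribute.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("'", "&apos;");
    }
}
```

With this in place, HandlerConfigSchema.isValid("localhost", 8585) should pass, while a port of 70000 or an empty hostname should be rejected by the schema rather than by hand-written checks.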

 

Pattern Matching

Continuing our look at how XML Schema can constrain data and integrate with Java, we turn to XML Schema's pattern matching capabilities. In the last section, we talked about using XML Schema to avoid complicated validation within Java code. This is only applicable, though, if XML Schema can do more than check simple numeric ranges and String lengths. For example, ensuring that a monetary value is entered in an allowed format is more complicated than requiring a data type and length. Instead, pattern matching must occur, as a dollar value can be entered in a number of ways:

$4.50

$45.96

$54

$45.6

These types of scenarios must be handled for schemas to be useful for data validation. XML Schema provides the ability to perform pattern matching through the pattern facet, which can be applied to the simple type of an element or attribute.

In a pattern for the values above, the dollar sign is required ($). Then a sequence of digits follows, occurring one or more times ([0-9]+). Then, as signified by the question mark following the entire parenthesized group, an optional cents qualifier can be given (as in $4.50). Again, digits can appear, but this time only singly or in a pair ({1,2}), and they must be preceded by a decimal point (.).
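Assembling the pieces described above gives a pattern along the lines of [$][0-9]+(\.[0-9]{1,2})?, shown here in a small sketch that applies it through an XML Schema pattern facet. The class name DollarPattern and the element name price are hypothetical, and the dollar sign is written as the character class [$] so that it is unambiguously a literal character.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class DollarPattern {
    // Hypothetical schema: a 'price' element restricted by a pattern facet.
    private static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "  <xs:element name='price'>"
      + "    <xs:simpleType>"
      + "      <xs:restriction base='xs:string'>"
      + "        <xs:pattern value='[$][0-9]+(\\.[0-9]{1,2})?'/>"
      + "      </xs:restriction>"
      + "    </xs:simpleType>"
      + "  </xs:element>"
      + "</xs:schema>";

    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true if the text is an acceptable dollar value under the pattern.
    static boolean matches(String text) {
        // Escape markup characters so arbitrary input stays plain element content.
        String content = text.replace("&", "&amp;").replace("<", "&lt;");
        String xml = "<price>" + content + "</price>";
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }
}
```

All four sample values ($4.50, $45.96, $54, and $45.6) should be accepted, while inputs such as 4.50 (no dollar sign) or $4.505 (three digits after the decimal point) should be rejected. A schema pattern must match the entire value, so no explicit anchors are needed.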

While this is a simple example, as Perl and regular expression aficionados will let you know, it does show that XML Schema provides for pattern matching constraints. You should consult the XML Schema specification for more information about the supported regular expression constructs. Using these expressions lets very complex validation occur in your schema, which relieves your Java code of having to perform that validation itself.

 

XML-RPC and Distributed Systems

A particularly important application of using data constraints to complement Java code is in the case of XML-RPC, which we have already looked at briefly. Currently, XML-RPC libraries have a predefined set of variable types that can be passed between server and client.

These constraints on what can be transferred across the network allow the client to have only a general idea of the handlers on the server and still interact with them. However, there is no knowledge of the ranges of values accepted as input and returned as output; although this may seem a minor issue, a handler being able to set these constraints and allow the client to recognize them can save significant processing time in validation and greatly increase its usability.

With a schema defining the ranges and specifics of data input and output for XML-RPC handlers, a complete map of a handler's functionality is available to clients. This also applies for developers seeking to use another developer's classes. Not only is the input known, but specific details about useful input are available; exceptions thrown as a result of invalid data can be almost completely eliminated. This more complete mapping of XML-RPC handlers could easily be extended to other distributed systems; a prime candidate for this is EJB.

In addition to the remote interface, imagine an XML Schema contract of the allowable data that is input and output from the methods in the remote interface. This additional information could greatly enhance the usability and reliability of distributed systems, particularly when the developers of the beans and handlers are not able to directly communicate with developers of application clients.
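One way to picture such a contract is the sketch below, in which a handler publishes a schema describing one input and one output, and a client consults it before and after a remote call. Everything here is hypothetical: the class name QuoteHandlerContract, the quantity and total elements, and their ranges are invented for illustration, and no actual XML-RPC library is involved.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class QuoteHandlerContract {
    // Hypothetical contract: 'quantity' is the handler's input (1 to 100),
    // 'total' is its return value (a non-negative decimal).
    private static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "  <xs:element name='quantity'>"
      + "    <xs:simpleType>"
      + "      <xs:restriction base='xs:int'>"
      + "        <xs:minInclusive value='1'/>"
      + "        <xs:maxInclusive value='100'/>"
      + "      </xs:restriction>"
      + "    </xs:simpleType>"
      + "  </xs:element>"
      + "  <xs:element name='total'>"
      + "    <xs:simpleType>"
      + "      <xs:restriction base='xs:decimal'>"
      + "        <xs:minInclusive value='0'/>"
      + "      </xs:restriction>"
      + "    </xs:simpleType>"
      + "  </xs:element>"
      + "</xs:schema>";

    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("contract schema did not compile", e);
        }
    }

    // Checks one named value (an input or an output) against the contract.
    // Unknown names and malformed values are treated as contract violations.
    static boolean accepts(String name, String value) {
        String xml = "<" + name + ">" + value + "</" + name + ">";
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }
}
```

A client could call QuoteHandlerContract.accepts("quantity", "5") before invoking the handler and accepts("total", ...) on the result, so that out-of-range data is caught on either side without the two development teams having to coordinate directly.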

 

 

Databases and XML

Another revolution that could be brought on by additional data constraints and mappings is XML involvement in database use. First, it should be pointed out that we are not talking about pure XML databases here. Although complete XML database systems are being developed, they are very young technologies, and will most likely encounter a lot of resistance among traditionalists in management and application development. Additionally, there has yet to be a compelling reason for converting existing relational databases to this new format.

What is worth taking a long look at, however, is using XML to map data from Java (or any other programming language) to a relational or object-oriented database. Again, the key is that while mappings occur today, these mappings do not reflect the physical constraints that may exist in a database. Thus, complex validation and range checking have to occur in application code before database inserts and updates can occur, and even then rigorous error checking has to be performed to ensure that database constraint violations do not cause errors.

 

