At the language level, data are stored in data structures. At the TCP/UDP level, data are communicated as ‘messages’ or streams of bytes; hence the data structures must be flattened (converted to a sequence of bytes) before transmission.
Different machines have different representations of primitive data, such as integers, floating-point numbers and character codes. There are two solutions.
Either both machines agree on a format type (indicated in the parameter list), or an intermediate external standard is used. An external data representation is an agreed standard for the representation of data structures and primitive values, e.g. CORBA Common Data Representation (CDR), usable from many languages, and Java object serialization, for Java code only.
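To illustrate why an agreed representation is needed, the Java sketch below encodes one integer and decodes it under two byte orders. The class ByteOrderDemo and its method names are hypothetical and exist only for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: ByteOrderDemo is a hypothetical helper, not a library class.
final class ByteOrderDemo {
    // Encode a 32-bit integer using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Decode four bytes as a 32-bit integer using the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] wire = encode(1985, ByteOrder.BIG_ENDIAN);
        // Receiver agrees with the sender on the byte order: the value survives.
        System.out.println("agreed order:     " + decode(wire, ByteOrder.BIG_ENDIAN));
        // Receiver assumes the opposite order: the same bytes give a different value.
        System.out.println("mismatched order: " + decode(wire, ByteOrder.LITTLE_ENDIAN));
    }
}
```

The bytes on the wire are identical in both cases; only the receiver's interpretation differs, which is the problem an external data representation removes.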
Marshalling is the process of taking a collection of data items and assembling them into a form suitable for transmission. Unmarshalling is the process of disassembling them on arrival to restore the original data items.
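A minimal sketch of marshalling and unmarshalling by hand in Java, assuming a hypothetical Person class with a name, a place and a year. It uses the fixed big-endian encoding of DataOutputStream as the agreed format; this is one possible choice, not the format of any particular middleware.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: PersonMarshaller and Person are hypothetical names.
final class PersonMarshaller {
    static final class Person {
        final String name;
        final String place;
        final int year;

        Person(String name, String place, int year) {
            this.name = name;
            this.place = place;
            this.year = year;
        }
    }

    // Marshal: assemble the data items into a sequence of bytes.
    static byte[] marshal(Person p) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(p.name);   // 2-byte length followed by the encoded characters
            out.writeUTF(p.place);
            out.writeInt(p.year);   // 4 bytes, most significant byte first
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Unmarshal: read the items back in the same order to restore the original.
    static Person unmarshal(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new Person(in.readUTF(), in.readUTF(), in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Person copy = unmarshal(marshal(new Person("Danie", "America", 1985)));
        System.out.println(copy.name + ", " + copy.place + ", " + copy.year);
    }
}
```

Both sides must know the order and types of the items, because the byte sequence carries values only.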
Three alternative approaches to external data representation and marshalling:
- CORBA’s common data representation (CDR)
- Java’s object serialization
- XML (Extensible Markup Language): defines a textual format for representing structured data
In the first two, marshalling and unmarshalling are carried out by a middleware layer. For XML, software for marshalling and unmarshalling is available.
In the first two, primitive data types are marshalled into a binary form; in XML they are represented textually.
Another issue is whether the marshalled data include information about the type of their contents. In CDR, just the values of the objects are transmitted. In Java serialization, type information is included in the serialized form. In XML, type information refers to externally defined sets of names (with types) called namespaces.
Although we are interested in the use of external data representation for the arguments and results of RMIs and RPCs, it has a more general use for representing data structures, objects, or structured documents in a form suitable for transmission or storing in files.
CORBA CDR
15 primitive types, including: short, long, unsigned short, unsigned long, float, double, char, boolean, octet, any
Constructed types: sequence, string, array, struct, enum and union
CDR does not deal with objects (only Java serialization does: objects and trees of objects)
• Person struct with value: {‘Danie’, ‘America’, 1985}
CORBA CDR message
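The layout of such a message can be sketched in Java as follows. This is an assumption-laden illustration, not an ORB implementation: it follows the simplified layout commonly drawn for this example (each string as a 4-byte length, the characters, then padding to a 4-byte boundary, followed by the year as a 4-byte integer, all big-endian). A real CDR encoder differs in details such as string terminators and how byte order is signalled. CdrSketch and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Illustrative only: a simplified CDR-style layout, not wire-compatible with a real ORB.
final class CdrSketch {
    // Bytes a string occupies: 4-byte length + characters padded to a 4-byte boundary.
    static int encodedSize(byte[] chars) {
        return 4 + ((chars.length + 3) / 4) * 4;
    }

    // Write the length, the characters, then zero bytes up to the next 4-byte boundary.
    static void putString(ByteBuffer buf, byte[] chars) {
        buf.putInt(chars.length);
        buf.put(chars);
        while (buf.position() % 4 != 0) {
            buf.put((byte) 0);
        }
    }

    // Flatten the Person struct in declaration order; no type information is written.
    static byte[] marshalPerson(String name, String place, int year) {
        byte[] n = name.getBytes(StandardCharsets.US_ASCII);
        byte[] p = place.getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.allocate(encodedSize(n) + encodedSize(p) + 4)
                .order(ByteOrder.BIG_ENDIAN);
        putString(buf, n);
        putString(buf, p);
        buf.putInt(year);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] message = marshalPerson("Danie", "America", 1985);
        System.out.println("message length in bytes: " + message.length);
    }
}
```

Because only values are written, the receiver must already know that the message holds two strings followed by an integer.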
Java object serialization
Serialization: flattening an object or a connected set of objects into a serial form suitable for storing on disk or transmitting in a message
Deserialization: the reverse, restoring the object(s) from the serial form; it assumes no a priori knowledge of the types of the objects, so the serialized form must be self-contained
• serializing an object also serializes all the objects it references, so that when the object is reconstructed all of its references can be fulfilled at the destination
• this is a recursive procedure
Person p = new Person("Danie", "America", 1985);
The true serialized form contains additional markers; h0 and h1 are handles
serialize: create an instance of class ObjectOutputStream on the stream and invoke its writeObject method
deserialize: open an ObjectInputStream on the stream and use its readObject method to reconstruct the original object
(de)serialization is carried out automatically in RMI
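The two steps can be sketched as a round trip through a byte array. The Person class here is a hypothetical stand-in with the three fields used in the example; SerializationDemo and its helper methods are likewise illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative only: SerializationDemo and Person are hypothetical names.
final class SerializationDemo {
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String place;
        final int year;

        Person(String name, String place, int year) {
            this.name = name;
            this.place = place;
            this.year = year;
        }
    }

    // Serialize: wrap the stream in an ObjectOutputStream and call writeObject.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    // Deserialize: wrap the stream in an ObjectInputStream and call readObject.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person p = new Person("Danie", "America", 1985);
        Person copy = (Person) deserialize(serialize(p));
        System.out.println(copy.name + ", " + copy.place + ", " + copy.year);
    }
}
```

The serialized bytes include the class name and field descriptions as well as the values, which is why the result is larger than a values-only encoding.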
Reflection: the ability to enquire about the properties of a class, such as the names and types of its instance variables and methods
- enables classes to be created from their names
- enables a constructor with given argument types to be created for a given class
- makes it possible to do serialization and deserialization in a completely generic manner
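The reflection facilities listed above can be sketched with the standard java.lang.reflect API. ReflectionDemo and its Person class are hypothetical; the sketch looks a class up by name, obtains a constructor with given argument types, and reads a field by name.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;

// Illustrative only: ReflectionDemo and Person are hypothetical names.
final class ReflectionDemo {
    static class Person {
        private final String name;
        private final String place;
        private final int year;

        Person(String name, String place, int year) {
            this.name = name;
            this.place = place;
            this.year = year;
        }
    }

    // Create an instance knowing only the class name and the constructor argument types.
    static Object create(String className, String name, String place, int year) {
        try {
            Class<?> cls = Class.forName(className);
            Constructor<?> ctor =
                    cls.getDeclaredConstructor(String.class, String.class, int.class);
            ctor.setAccessible(true);
            return ctor.newInstance(name, place, year);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Read an instance variable by name, without compile-time knowledge of the class.
    static Object readField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object p = create(Person.class.getName(), "Danie", "America", 1985);
        // Enquire about the names and types of the instance variables.
        for (Field f : p.getClass().getDeclaredFields()) {
            System.out.println(f.getType().getSimpleName() + " " + f.getName()
                    + " = " + readField(p, f.getName()));
        }
    }
}
```

A generic serializer can use the same calls to walk the fields of any object and to rebuild it at the destination.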
Extensible Markup Language (XML)
Extensible Markup Language (XML) is defined by the World Wide Web Consortium (W3C). Data items are tagged with ‘markup’ strings. Tags relate to the structure of the text that they enclose.
XML is used for:
- enabling clients to communicate with web services
- defining the interfaces and other properties of web services
- archiving and retrieval systems
- specification of user interfaces
- encoding of configuration files in operating systems
Clients usually use SOAP messages to communicate with web services
SOAP: an XML format whose tags are published for use by web services and their clients
XML elements and attributes
Elements: a portion of character data surrounded by matching start and end tags
• An empty tag has no content and is terminated with /> instead of >. For example, the empty tag … tag
Attributes: an element is generally a container for data, whereas an attribute is used for labelling that data
• attributes are for simple values
• if data contains substructures or several lines, it must be defined as an element
Names start with a letter, _ or :
Binary data: expressed as character data in base64
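As a small illustration of the last point, binary data can be turned into character data with the standard java.util.Base64 class before being placed inside an element. The element name used in main and the helper class Base64Demo are hypothetical.

```java
import java.util.Base64;

// Illustrative only: Base64Demo is a hypothetical helper.
final class Base64Demo {
    // Wrap binary data in an element by encoding it as base64 character data.
    static String toElement(String tag, byte[] data) {
        return "<" + tag + ">" + Base64.getEncoder().encodeToString(data) + "</" + tag + ">";
    }

    // Recover the original bytes from the element's character data.
    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] data = {0, 1, 2, (byte) 0xFF};
        System.out.println(toElement("photo", data));
    }
}
```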
Parsing and well-formed documents
A well-formed document obeys a set of rules, e.g. the rules for the XML prolog:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
XML namespaces: a namespace is identified by a URL referring to the file containing the namespace definitions.
For example:
xmlns:pers = "http://www.cdk4.net/person"
Illustration of the use of a namespace in the Person structure
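A hedged sketch of how a namespace-qualified Person document might be parsed in Java with the standard DOM API. The document text is an assumption built from the Person values and the pers prefix used in these notes, with the namespace URL written in its usual http:// form; NamespaceDemo is a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Illustrative only: the document below is an assumed example, not taken from a figure.
final class NamespaceDemo {
    static final String NS = "http://www.cdk4.net/person";

    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
            + "<person pers:id=\"123456789\" xmlns:pers=\"" + NS + "\">"
            + "<pers:name>Danie</pers:name>"
            + "<pers:place>America</pers:place>"
            + "<pers:year>1985</pers:year>"
            + "</person>";

    // Return the text of the first element with the given local name in the namespace.
    static String text(String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for prefix-to-URL resolution
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagNameNS(NS, localName).item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(text("name") + ", " + text("place") + ", " + text("year"));
    }
}
```

The lookup is by namespace URL and local name, so it keeps working if the document chooses a prefix other than pers.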
An XML schema [www.w3.org VIII] defines the elements and attributes that can appear in a document, how the elements are nested, the order and number of elements, and whether an element is empty or can include text
• used for encoding and validation
An XML schema for the Person structure
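Validation against such a schema can be sketched with the standard javax.xml.validation API. The schema text below is an assumption modelled on the Person structure (name, place, year, plus a hypothetical id attribute) and is not copied from a figure; SchemaDemo is an illustrative class name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Illustrative only: the schema is an assumed one for the Person structure.
final class SchemaDemo {
    static final String XSD =
            "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xsd:element name=\"person\" type=\"personType\"/>"
            + "<xsd:complexType name=\"personType\">"
            + "<xsd:sequence>"
            + "<xsd:element name=\"name\" type=\"xsd:string\"/>"
            + "<xsd:element name=\"place\" type=\"xsd:string\"/>"
            + "<xsd:element name=\"year\" type=\"xsd:positiveInteger\"/>"
            + "</xsd:sequence>"
            + "<xsd:attribute name=\"id\" type=\"xsd:positiveInteger\"/>"
            + "</xsd:complexType>"
            + "</xsd:schema>";

    // True if the document conforms to the schema: right elements, order and types.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document (or the schema itself) was rejected
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<person id=\"123456789\"><name>Danie</name>"
                + "<place>America</place><year>1985</year></person>";
        String bad = "<person><name>Danie</name><year>soon</year></person>";
        System.out.println("good document valid: " + isValid(good));
        System.out.println("bad document valid:  " + isValid(bad));
    }
}
```

The second document is rejected because it omits place and gives a year that is not a positive integer.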