External Data Representation & Marshalling

Suthesana
5 min read · Jun 21, 2020


At the language level, data are stored in data structures. At the TCP/UDP level, data are communicated as ‘messages’ or streams of bytes. Data structures must therefore be flattened (converted to a sequence of bytes) before transmission and rebuilt on arrival.

Different machines use different representations for primitive data such as integers, floating-point numbers and character codes. There are two ways to let such machines exchange values.

Either both machines agree on a format, with the format used indicated in the message (included in the parameter list), or an intermediate external standard is used. An external data representation is an agreed standard for the representation of data structures and primitive values, e.g. CORBA Common Data Representation (CDR), which can be used from many languages, and Java object serialization, which is for Java code only.
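To see why an agreed representation matters, the sketch below encodes the same 32-bit integer in the two common byte orders. It is only an illustration: the class name ByteOrderSketch and the value 1985 are chosen for this example and are not part of any standard.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderSketch {
    // Encode a 32-bit integer using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Render bytes as two-digit hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Same value, two different byte sequences on the wire.
        System.out.println("big-endian:    " + hex(encode(1985, ByteOrder.BIG_ENDIAN)));    // 00 00 07 c1
        System.out.println("little-endian: " + hex(encode(1985, ByteOrder.LITTLE_ENDIAN))); // c1 07 00 00
    }
}
```

A receiver that assumed the wrong order would read a different number, which is why the format is either indicated in the message or fixed by an external standard.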

Marshalling is the process of taking a collection of data items and assembling them into a form suitable for transmission. Unmarshalling is the process of disassembling them on arrival to restore the original data items.
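As a rough illustration of marshalling and unmarshalling, the sketch below flattens three data items into a byte array and rebuilds them. It uses Java's DataOutputStream and DataInputStream as the agreed format; the class name MarshalSketch and the field order (name, place, year) are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class MarshalSketch {
    // Marshal: assemble the data items into a sequence of bytes.
    static byte[] marshal(String name, String place, int year) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name);  // 2-byte length followed by modified UTF-8 bytes
            out.writeUTF(place);
            out.writeInt(year);  // 4 bytes, big-endian
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Unmarshal: read the items back in the same agreed order.
    static Object[] unmarshal(byte[] message) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            String name = in.readUTF();
            String place = in.readUTF();
            int year = in.readInt();
            return new Object[] { name, place, year };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] message = marshal("Danie", "America", 1985);
        System.out.println(message.length + " bytes -> " + Arrays.toString(unmarshal(message)));
    }
}
```

Both sides must know the order and types of the fields in advance, because this format carries no type information.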

Three alternative approaches to external data representation and marshalling:

  • CORBA’s common data representation (CDR)
  • Java’s object serialization
  • XML (Extensible Markup Language): defines a textual format for representing structured data

In the first two, marshalling and unmarshalling are carried out by a middleware layer. For XML, software for marshalling and unmarshalling is available.

In the first two, primitive data types are marshalled into a binary form; in XML they are represented textually.

Another difference is whether the marshalled data include information about the type of their contents. In CDR, just the values of the objects are transmitted. In Java, type information is included in the serialized form. In XML, type information refers to externally defined sets of names (with types) called namespaces.

Although we are interested in the use of external data representation for the arguments and results of RMIs and RPCs, it has a more general use for representing data structures, objects, or structured documents in a form suitable for transmission or storage in files.

CORBA CDR

15 primitive types, including short, long, unsigned short, unsigned long, float, double, char, boolean, octet and any.

Constructed types: sequence, string, array, struct, enum and union.

CDR does not deal with objects (only Java serialization does: objects and trees of objects).

• Person struct with value: {‘Danie’, ‘America’, 1985}

CORBA CDR message
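The flattened layout of such a message can be sketched in Java. This is a simplified approximation and not wire-compatible with a real CORBA ORB: it assumes each string is written as a 4-byte length followed by its characters, padded to a 4-byte boundary, and that the year is a 4-byte unsigned long in big-endian order. The names CdrSketch, putString and marshalPerson are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class CdrSketch {
    // Write a string as: 4-byte length, the characters, then zero padding
    // up to the next 4-byte boundary (assumed layout, see lead-in).
    static void putString(ByteBuffer buf, String s) {
        byte[] chars = s.getBytes(StandardCharsets.US_ASCII);
        buf.putInt(chars.length);
        buf.put(chars);
        int padding = (4 - chars.length % 4) % 4;
        for (int i = 0; i < padding; i++) {
            buf.put((byte) 0);
        }
    }

    // Flatten the Person struct; only values are written, no type information.
    static byte[] marshalPerson(String name, String place, int year) {
        // Fixed-size buffer keeps the sketch short; long strings would overflow it.
        ByteBuffer buf = ByteBuffer.allocate(256).order(ByteOrder.BIG_ENDIAN);
        putString(buf, name);
        putString(buf, place);
        buf.putInt(year);
        byte[] message = new byte[buf.position()];
        buf.flip();
        buf.get(message);
        return message;
    }

    public static void main(String[] args) {
        byte[] message = marshalPerson("Danie", "America", 1985);
        System.out.println("message length: " + message.length + " bytes");
    }
}
```

Because only values are sent, the receiver can unmarshal the message only if it already knows the struct definition (two strings followed by an unsigned long).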

Java object serialization

Serialization: flattening an object or a connected set of objects into a serial form suitable for storing on disk or transmitting in a message.

Deserialization: the reverse, restoring the object or set of objects from the serialized form. It assumes no a priori knowledge of the types of the objects, so the serialized form must be self-contained.

• Serializing an object also serializes all the objects it references, so that when the object is reconstructed at the destination, all of its references can be fulfilled.

• This is a recursive procedure.

Person p = new Person("Danie", "America", 1985);

The true serialized form contains additional markers; h0 and h1 are handles

Serialize: create an instance of class ObjectOutputStream on the stream and invoke its writeObject method.

Deserialize: open an ObjectInputStream on the stream and use its readObject method to reconstruct the original object.

Serialization and deserialization are carried out automatically in RMI.
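The serialize and deserialize steps described above can be sketched as a round trip through a byte array. The Person class here is a minimal stand-in: its field names (name, place, year) and the wrapper class SerializationSketch are assumptions for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationSketch {
    // A class must implement Serializable for its instances to be serialized.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String place;
        final int year;

        Person(String name, String place, int year) {
            this.name = name;
            this.place = place;
            this.year = year;
        }
    }

    // Serialize: wrap the stream in an ObjectOutputStream and call writeObject.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialize: wrap the stream in an ObjectInputStream and call readObject.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person p = new Person("Danie", "America", 1985);
        Person copy = (Person) deserialize(serialize(p));
        System.out.println(copy.name + ", " + copy.place + ", " + copy.year);
    }
}
```

The referenced String objects are serialized along with the Person, which is the recursive behaviour noted earlier.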

Reflection: the ability to enquire about the properties of a class, such as the names and types of its instance variables and methods.

  • It enables classes to be created from their names.
  • It enables a constructor with given argument types to be created for a given class.
  • Reflection makes it possible to do serialization and deserialization in a completely generic manner.
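A small sketch of the reflection facilities listed above: it enumerates the instance variables of a class and creates an instance from the class name and a constructor with given argument types. The Person class and the helper names describeFields and create are hypothetical and exist only for this illustration.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class ReflectionSketch {
    static class Person {
        String name;
        String place;
        int year;

        Person(String name, String place, int year) {
            this.name = name;
            this.place = place;
            this.year = year;
        }
    }

    // Enquire about a class: list "type name" for each instance variable.
    static List<String> describeFields(Class<?> cls) {
        List<String> result = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // skip static and compiler-generated fields
            }
            result.add(f.getType().getSimpleName() + " " + f.getName());
        }
        return result;
    }

    // Create an object from a class name and a constructor with given argument types.
    static Object create(String className, String name, String place, int year) {
        try {
            Class<?> cls = Class.forName(className);
            Constructor<?> ctor = cls.getDeclaredConstructor(String.class, String.class, int.class);
            return ctor.newInstance(name, place, year);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFields(Person.class));
        Object p = create(Person.class.getName(), "Danie", "America", 1985);
        System.out.println(p.getClass().getSimpleName());
    }
}
```

A generic serializer can walk the fields this way without being written for any particular class.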

Extensible Markup Language (XML)

Extensible Markup Language (XML) is defined by the World Wide Web Consortium (W3C). Data items are tagged with ‘markup’ strings, and the tags relate to the structure of the text that they enclose. XML is used to enable clients to communicate with web services and to define the interfaces and other properties of web services. It is also used in archiving and retrieval systems, in the specification of user interfaces, and for encoding configuration files in operating systems. Clients usually use SOAP messages to communicate with web services.

SOAP: an XML format whose tags are published for use by web services and their clients.

XML elements and attributes

Elements: an element is a portion of character data surrounded by matching start and end tags.

• An empty tag has no content and is terminated with /> instead of >. For example, the empty tag …

Attributes: an element is generally a container for data, whereas an attribute is used for labelling that data.

• Attributes are for simple values.

• If data contains substructures or several lines, it must be defined as an element.

Names start with a letter, _ or :

Binary data is expressed as character data in base64.
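As a small sketch of the base64 point, the code below encodes a few arbitrary bytes as character data inside an element and decodes them again, using java.util.Base64. The element name photo and the helper names are invented for this example.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64Sketch {
    // Wrap binary data as base64 character data inside an element.
    static String toElement(String tag, byte[] data) {
        return "<" + tag + ">" + Base64.getEncoder().encodeToString(data) + "</" + tag + ">";
    }

    // Recover the original bytes from the element's character data.
    static byte[] fromText(String base64Text) {
        return Base64.getDecoder().decode(base64Text);
    }

    public static void main(String[] args) {
        byte[] data = { 0, 1, 2, (byte) 255 };
        System.out.println(toElement("photo", data)); // <photo>AAEC/w==</photo>
        System.out.println(Arrays.toString(fromText("AAEC/w==")));
    }
}
```

Base64 output uses only printable characters, so it can sit inside an XML document without clashing with markup, at the cost of roughly a third more bytes.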

Parsing and well-formed documents

A well-formed document follows a set of rules, e.g. the XML prolog on its first line:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

XML namespaces: each namespace is identified by a URL referring to the file containing the namespace definitions.

For example:

xmlns:pers = "http://www.cdk4.net/person"

Illustration of the use of a namespace in the Person structure
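A namespace-aware parse of such a Person document can be sketched with the JDK's DOM parser. The document text below is an assumption made for this example (element names name, place and year under the pers prefix), not a copy of the original illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class NamespaceSketch {
    static final String NS = "http://www.cdk4.net/person";

    // Assumed example document: every child element is in the pers namespace.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
        + "<person xmlns:pers=\"" + NS + "\">"
        + "<pers:name>Danie</pers:name>"
        + "<pers:place>America</pers:place>"
        + "<pers:year>1985</pers:year>"
        + "</person>";

    // Return the text of the first element with the given local name in the namespace.
    static String read(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagNameNS(NS, localName).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(read(XML, "name") + ", " + read(XML, "place") + ", " + read(XML, "year"));
    }
}
```

The lookup uses the namespace URL rather than the pers prefix, so it still works if a document binds the same namespace to a different prefix. Note that year comes back as text; XML carries values textually and leaves typing to a schema.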

XML schemas [www.w3.org VIII] define the elements and attributes that can appear in a document, how the elements are nested, the order and number of elements, and whether an element is empty or can include text.

• used for encoding and validation

An XML schema for the Person structure
