XML


What is XML?

 

XML is a standard, simple, self-describing way of encoding both text and data so that content can be processed with relatively little human intervention and exchanged across diverse hardware, operating systems, and applications.

 

In brief, XML offers a widely adopted standard way of representing text and data in a format that can be processed without much human or machine intelligence. Information formatted in XML can be exchanged across platforms, languages, and applications, and can be used with a wide range of development tools and utilities.

 

Source: http://www.softwareag.com/xml/about/starters.htm

 

Introduction

 

XML (Extensible Markup Language) is a general-purpose specification for creating custom markup languages. It is classified as an extensible language, because it allows the user to define the mark-up elements. XML's purpose is to aid information systems in sharing structured data, especially via the Internet, to encode documents, and to serialize data; in the last context, it compares with text-based serialization languages such as JSON, YAML and S-Expressions.

 

XML's set of tools helps developers in creating web pages but its usefulness goes well beyond that. XML, in combination with other standards, makes it possible to define the content of a document separately from its formatting, making it easy to reuse that content in other applications or for other presentation environments.

Most importantly, XML provides a basic syntax that can be used to share information between different kinds of computers, different applications, and different organizations without needing to pass through many layers of conversion.

 

XML began as a simplified subset of the Standard Generalized Markup Language (SGML) and is meant to be readable by people. By adding semantic constraints, application languages can be implemented in XML. These include XHTML, RSS, MathML, GraphML, Scalable Vector Graphics, MusicXML, and others. Moreover, XML is sometimes used as the specification language for such application languages.

 

XML is recommended by the World Wide Web Consortium (W3C). It is a fee-free open standard. The recommendation specifies lexical grammar and parsing requirements.

 

HINTS for using XML in your solutions/projects

 

1.a Allowed characters in XML files

The characters <, > and & have a special meaning in XML files (solutions, projects etc.), so they must not be used literally in descriptions, titles etc.

Notice: in your XML files you have to substitute the following operators:

  <  becomes  &lt;
  >  becomes  &gt;
  &  becomes  &amp;

 Examples:

  rule="a < b & c > b" - wrong

  rule="a &lt; b &amp; c &gt; b" - correct

 

  <description> ... uses Lintul2 & Slim</description> - wrong

  <description> ... uses Lintul2 &amp; Slim</description> - correct
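
When rule or description text is generated by a script, the same substitutions can be applied automatically. The following is a minimal sketch in Python, assuming the text is assembled programmatically before it is written into the XML file. The variable names are illustrative only. `escape` comes from the standard `xml.sax.saxutils` module and replaces &, < and >.

```python
from xml.sax.saxutils import escape

# Hypothetical rule text as it would be written by hand.
raw_rule = "a < b & c > b"

# escape() replaces &, < and > with their entity references, so the
# result can be placed inside an attribute value or element text.
escaped_rule = escape(raw_rule)
print(escaped_rule)  # a &lt; b &amp; c &gt; b

raw_description = "uses Lintul2 & Slim"
print(escape(raw_description))  # uses Lintul2 &amp; Slim
```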

 

1.b Quotes within quotes in XML files

Obviously this cannot work: title="A streetcar named "Desire"". To use quotes within XML attributes you have these alternatives:

- use single quotes for the quoted text: title="A streetcar named 'Desire'"

- use single quotes to enclose the attribute: title='A streetcar named "Desire"'

- use &quot; for ": title="A streetcar named &quot;Desire&quot;"
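
The quoting alternatives can be checked with a short Python sketch. It assumes a made-up element name `<play>`. `quoteattr` from the standard `xml.sax.saxutils` module picks a suitable quoting style for a given value, and `xml.etree.ElementTree` shows that the single-quoted and the `&quot;` spelling carry the same attribute value.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

title = 'A streetcar named "Desire"'

# quoteattr() returns the value already wrapped in suitable quotes.
attr = quoteattr(title)
print(attr)  # 'A streetcar named "Desire"'

# Both spellings below parse to the same attribute value.
# The element name <play> is made up for this example.
single = ET.fromstring("<play title=" + attr + "/>")
entity = ET.fromstring('<play title="A streetcar named &quot;Desire&quot;"/>')
print(single.get("title") == entity.get("title"))  # True
```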

 

2. Order of attributes for resources

Technically the order of attributes does not matter, but for readability keep a consistent order, such as:

 1. id

 2. datatype

 3. unit

 4. description

 5a. key    (for <res>)

 5b. rule   (for <mgm>, <out> etc.)

 5c. source (in <inputs>)

 

3. When to add units=""

 - Does the variable have a dimension, or is it used for calculation? - yes

 - Is the variable a DATE, a CHAR, a BOOLEAN, or does it act as a switch? - no

 

4. Where to add units?

 - to <var> and <dyn>

 - to <res>

 - to <parameter> (variables from data files that are in XML format)

 

5. Where should units be omitted?

 - input (unit is determined by the SimVariable)

 - out (unit is determined by the SimVariable)

 

Correctness

 

An XML document has two correctness levels:

 

Well-formed. A well-formed document conforms to the XML syntax rules; e.g. if a start-tag (< >) appears without a corresponding end-tag (</>), the document is not well-formed. A document that is not well-formed is not XML; a conforming parser is disallowed from processing it.

Valid. A valid document additionally conforms to semantic rules, either user-defined or given in an XML schema or DTD; e.g. if a document contains an undefined element, then it is not valid; a validating parser is disallowed from processing it.
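
The first level can be tested with any conforming parser. The following is a minimal sketch in Python using the standard `xml.etree.ElementTree` module. The helper name `is_well_formed` and the sample documents are assumptions made for illustration. ElementTree checks well-formedness only and does not validate against a schema or DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<note><to>Ann</to></note>"))  # True
print(is_well_formed("<note><to>Ann</note>"))       # False: <to> is never closed
print(is_well_formed("<a/><b/>"))                   # False: two root elements
```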

 

Well-formedness

 

If only well-formedness is required, XML is a generic framework for storing any amount of text or any data whose structure can be represented as a tree. The only indispensable syntactical requirement is that the document has exactly one root element (also known as the document element), i.e. the text must be enclosed between a root start-tag and a corresponding end-tag. The following is a minimal "well-formed" XML document:

 

xml_fig1

 

The root element can be preceded by an XML declaration stating which XML version is in use (normally 1.0); the declaration might also contain character encoding and external dependency information. In XML 1.0 the declaration is optional. Starting with XML version 1.1 it is mandatory, because an XML document without an XML declaration is assumed to be a version 1.0 document.

 

xml_fig2

 

Comments can be placed anywhere in the tree, including in the text if the content of the element is text or #PCDATA.

 

xml_fig3

 

XML comments start with <!-- and end with -->. Two consecutive dashes (--) may not appear anywhere in the text of the comment.

In any meaningful application, additional markup is used to structure the contents of the XML document. The text enclosed by the root tags may contain an arbitrary number of XML elements. The basic syntax for one element is:

 

xml_fig4

 

 

The two instances of «element_name» are referred to as the start-tag and end-tag, respectively. Here, «Element Content» is some text which may again contain XML elements. So, a generic XML document contains a tree-based data structure. Here is an example of a structured XML document:

 

xml_fig5
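
The tree structure becomes visible when a document is loaded by a parser. The sketch below uses Python's standard `xml.etree.ElementTree` module on a made-up document (the `library`, `book`, `title` and `author` names are assumptions, not taken from the figure) and prints one indented line per element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are made up.
doc = """<library>
  <book id="b1">
    <title>Walden</title>
    <author>Henry David Thoreau</author>
  </book>
  <book id="b2">
    <title>Emma</title>
    <author>Jane Austen</author>
  </book>
</library>"""

def outline(element, depth=0):
    """Return one indented line per element, visited depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(doc)
print("\n".join(outline(root)))

# Element content is reached by walking down from the root.
titles = [book.findtext("title") for book in root.findall("book")]
print(titles)  # ['Walden', 'Emma']
```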

 

Attribute values must always be quoted, using single or double quotes, and each attribute name may appear only once in any single element.
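
Both attribute rules are enforced by the parser. The following is a small Python sketch, assuming a `<res>` element with `id` and `unit` attributes as used in the hints above. The helper name `parses` is hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses('<res id="x" unit="kg"/>'))  # True: double quotes
print(parses("<res id='x' unit='kg'/>"))  # True: single quotes
print(parses('<res id=x/>'))              # False: value is not quoted
print(parses('<res id="x" id="y"/>'))     # False: attribute name repeated
```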

 

XML requires that elements be properly nested: elements may never overlap, and so must be closed in the reverse of the order in which they were opened. For example, the fragment below cannot be part of a well-formed XML document because the title and author elements are closed in the wrong order:

 

xml_fig6

 

One way of writing the same information so that it could be incorporated into a well-formed XML document is as follows:

 

xml_fig7

 

XML provides special syntax for representing an element with empty content. Instead of writing a start-tag followed immediately by an end-tag, a document may contain an empty-element tag. An empty-element tag resembles a start-tag but contains a slash just before the closing angle bracket. The following three examples are equivalent in XML:

 

xml_fig8
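
The equivalence can be confirmed with a parser. In this Python sketch the element name `foo` is an assumption and the three spellings are the usual ones (start-tag plus end-tag, empty-element tag with a space, empty-element tag without a space). They may differ from the spellings used in the figure.

```python
import xml.etree.ElementTree as ET

# Three spellings of an element with empty content.
variants = ["<foo></foo>", "<foo />", "<foo/>"]
parsed = [ET.fromstring(v) for v in variants]

for source, element in zip(variants, parsed):
    # Each one yields the same tag, no text and no child elements.
    print(source, "->", element.tag, element.text, len(element))

# Serialising them again gives identical output for all three.
serialised = {ET.tostring(element) for element in parsed}
print(len(serialised))  # 1
```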

 

An empty-element tag may contain attributes:

 

xml_fig9

 

 

Validity

 

By leaving the names, allowable hierarchy, and meanings of the elements and attributes open and definable by a customizable schema or DTD, XML provides a syntactic foundation for the creation of purpose-specific, XML-based markup languages. The general syntax of such languages is rigid: documents must adhere to the general rules of XML, ensuring that all XML-aware software can at least read and understand the relative arrangement of information within them.

The schema merely supplements the syntax rules with a set of constraints. Schemas typically restrict element and attribute names and their allowable containment hierarchies, such as only allowing an element named 'birthday' to contain one element named 'month' and one element named 'day', each of which has to contain only character data. The constraints in a schema may also include data type assignments that affect how information is processed; for example, the 'month' element's character data may be defined as being a month according to a particular schema language's conventions, perhaps meaning that it must not only be formatted a certain way, but also must not be processed as if it were some other type of data.

 

An XML document that complies with a particular schema/DTD, in addition to being well-formed, is said to be valid.
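
To illustrate the difference between the two levels, the 'birthday' constraints can be written out by hand. This Python sketch is not a schema or DTD validator (the standard `xml.etree.ElementTree` module does not validate). The function name `is_valid_birthday` and the sample documents are assumptions. The function only mimics what a schema for this one element would enforce.

```python
import xml.etree.ElementTree as ET

def is_valid_birthday(text):
    """Hand-written stand-in for the 'birthday' schema constraints."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not even well-formed
    if root.tag != "birthday":
        return False
    # Exactly one <month> and one <day>, and nothing else.
    if sorted(child.tag for child in root) != ["day", "month"]:
        return False
    # Each child holds only character data (no nested elements).
    return all(len(child) == 0 for child in root)

ok = "<birthday><month>January</month><day>21</day></birthday>"
no_day = "<birthday><month>January</month></birthday>"
extra = "<birthday><month>January</month><day>21</day><year>1983</year></birthday>"

print(is_valid_birthday(ok))      # True
print(is_valid_birthday(no_day))  # False: well-formed, but <day> is missing
print(is_valid_birthday(extra))   # False: well-formed, but <year> is not allowed
```

The last two documents are well-formed and are rejected only by the additional constraints, which is the distinction between well-formed and valid.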

 

For further information regarding XML and DTD please read: XML, DTD

 

Source: http://en.wikipedia.org/wiki/Xml