What is XML

XML stands for Extensible Markup Language. It is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. XML is widely used for data representation and data interchange between systems.

Key Features of XML

Extensible:
- XML allows users to define their own tags and document structure. This flexibility makes it suitable for a wide range of applications.
Self-Descriptive:
- XML documents are self-descriptive, meaning that the data is structured in a way that describes its own meaning. This makes it easier to understand the data without needing additional documentation.
Platform and Language Independent:
- XML is not tied to any specific programming language or platform, making it a versatile choice for data interchange between different systems.
Hierarchical Structure:
- XML data is organized in a tree-like structure, with a single root element and nested child elements. This hierarchy allows for complex data representation.
Human-Readable:
- XML is designed to be easily readable by humans, which makes it easier to debug and understand.

XML Encoding

Encoding in the context of XML refers to the conversion of Unicode characters into their binary representation. This is important for ensuring that the data can be correctly interpreted by different systems and applications.

UTF Encoding

UTF stands for Unicode Transformation Format. It is a way to encode Unicode characters into binary format. There are several types of UTF encoding, with the most common being:

UTF-8:
- Description: UTF-8 is a variable-length encoding that uses 1 to 4 bytes to represent each character. It is backward compatible with ASCII, meaning that the first 128 Unicode characters (U+0000 to U+007F) are represented using a single byte, making it efficient for texts primarily in English.
- Usage: UTF-8 is widely used on the web and is the default encoding for many applications, including XML documents.
UTF-16:
- Description: UTF-16 uses 2 bytes for every character in the Basic Multilingual Plane (BMP) and 4 bytes (a surrogate pair) for characters outside it. It is not backward compatible with ASCII, and for mostly-English text it takes roughly twice the space of UTF-8.
- Usage: UTF-16 is the native in-memory string format of several platforms and languages, such as the Windows API, Java, and JavaScript.
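
The byte counts described above can be checked directly. The following is a minimal Python sketch; the helper name encoded_length and the four sample characters are chosen here purely for illustration.

```python
def encoded_length(char: str, encoding: str) -> int:
    """Return how many bytes `char` occupies when stored in `encoding`."""
    return len(char.encode(encoding))


# Illustrative samples: ASCII letter, accented Latin letter,
# euro sign (still inside the BMP), and an emoji (outside the BMP).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    # "utf-16-le" is used rather than "utf-16" so that no
    # byte-order mark is added to the count.
    utf8_len = encoded_length(ch, "utf-8")
    utf16_len = encoded_length(ch, "utf-16-le")
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

The ASCII letter needs 1 byte in UTF-8 but 2 in UTF-16, while the emoji needs 4 bytes in both.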

Example of XML

Here’s a simple example of an XML document representing a list of books:

xml
<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book>
        <title>XML Developer's Guide</title>
        <author>John Doe</author>
        <year>2021</year>
        <price>29.99</price>
    </book>
    <book>
        <title>Learning XML</title>
        <author>Jane Smith</author>
        <year>2020</year>
        <price>39.99</price>
    </book>
</library>

Explanation of the Example

Root Element: <library> is the root element that contains all other elements.
Child Elements: Each <book> element represents a book and contains child elements such as <title>, <author>, <year>, and <price>.
Self-Descriptive: The tags used in the XML document describe the data they contain, making it easy to understand.
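
To see the root/child structure from a program's point of view, here is a small sketch that reads the same library document with Python's standard xml.etree.ElementTree module. The function name list_books and the choice to convert year and price to numbers are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book>
        <title>XML Developer's Guide</title>
        <author>John Doe</author>
        <year>2021</year>
        <price>29.99</price>
    </book>
    <book>
        <title>Learning XML</title>
        <author>Jane Smith</author>
        <year>2020</year>
        <price>39.99</price>
    </book>
</library>"""


def list_books(root):
    """Return (title, author, year, price) for every <book> child of the root."""
    books = []
    for book in root.findall("book"):
        books.append((
            book.findtext("title"),
            book.findtext("author"),
            int(book.findtext("year")),
            float(book.findtext("price")),
        ))
    return books


# Parse from bytes so the parser honours the declared encoding.
root = ET.fromstring(LIBRARY_XML.encode("utf-8"))
print(root.tag)  # name of the root element
for title, author, year, price in list_books(root):
    print(f"{title} by {author} ({year}): {price}")
```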

XML Documents Must Have a Root Element

Every XML document must contain exactly one root element, which is the parent of all other elements.
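
A quick way to check this rule is to hand a parser a document with two top-level elements. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the two sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


single_root = "<library><book/><book/></library>"  # one root wrapping both books
two_roots = "<book/><book/>"                       # two top-level elements

print(is_well_formed(single_root))
print(is_well_formed(two_roots))
```

The first string is accepted; the second is rejected because the parser finds a second element after the root has already been closed.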

XML Prolog

The XML prolog is a declaration that appears at the very beginning of an XML document. It provides important information about the XML document, such as the version of XML being used and the character encoding of the document. The prolog is optional, but it is recommended to include it for clarity and to ensure proper parsing of the XML document.

Structure of the XML Prolog

The XML prolog typically has the following structure:

xml
<?xml version="1.0" encoding="UTF-8"?>

Components of the XML Prolog

XML Declaration:
- The prolog starts with the <?xml declaration, which indicates that the document is an XML document.
Version:
- The version attribute specifies the version of XML being used. The most common version is 1.0 (XML 1.1 exists but is rarely used). This attribute is mandatory whenever the XML declaration is present.
- Example: version="1.0"
Encoding:
- The encoding attribute specifies the character encoding used in the document. This attribute is optional but highly recommended, especially if the document contains characters outside the ASCII range. If it is omitted, parsers assume UTF-8 or UTF-16.
- Common encodings include UTF-8, UTF-16, and ISO-8859-1.
- Example: encoding="UTF-8"
Standalone (Optional):
- The standalone attribute can be included to indicate whether the document relies on external markup declarations (like DTDs). It can have two possible values:
  - yes: The document is self-contained and does not rely on external resources.
  - no: The document may rely on external resources.
- Example: standalone="yes"

Example of an XML Prolog

Here’s an example of an XML document with a prolog:

xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<library>
    <book>
        <title>XML Developer's Guide</title>
        <author>John Doe</author>
        <year>2021</year>
        <price>29.99</price>
    </book>
</library>
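
When XML is generated by a program, the declaration is usually written by the library rather than by hand. The following is a rough sketch using Python's standard xml.etree.ElementTree (the xml_declaration argument of tostring needs Python 3.8 or later). Note that this library writes only version and encoding; it does not emit a standalone attribute.

```python
import xml.etree.ElementTree as ET

# Build the same small document in memory.
library = ET.Element("library")
book = ET.SubElement(library, "book")
ET.SubElement(book, "title").text = "XML Developer's Guide"
ET.SubElement(book, "author").text = "John Doe"
ET.SubElement(book, "year").text = "2021"
ET.SubElement(book, "price").text = "29.99"

# Serialize to bytes and ask for the XML declaration to be included.
xml_bytes = ET.tostring(library, encoding="UTF-8", xml_declaration=True)

# The first line of the output is the declaration.
print(xml_bytes.decode("utf-8"))
```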

Importance of the XML Prolog

Parsing: The prolog helps XML parsers understand how to interpret the document. For example, knowing the encoding allows the parser to correctly read the characters in the document.
Compatibility: Specifying the XML version ensures compatibility with different XML processors and tools.
Clarity: Including the prolog makes it clear to anyone reading the XML document what version and encoding are being used, which can be important for data interchange and integration.
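
The parsing point can be demonstrated with a document stored in an encoding other than UTF-8. In this sketch (Python standard library, with an invented author name containing an accented letter), the same ISO-8859-1 bytes are parsed once with an encoding declaration and once without it.

```python
import xml.etree.ElementTree as ET

name = "Jos\u00e9"  # illustrative value containing a non-ASCII character

# Same content, both stored as ISO-8859-1 bytes.
declared = (
    f'<?xml version="1.0" encoding="ISO-8859-1"?><author>{name}</author>'
).encode("iso-8859-1")
undeclared = f"<author>{name}</author>".encode("iso-8859-1")

# With the declaration, the parser decodes the bytes correctly.
parsed_name = ET.fromstring(declared).text
print(parsed_name == name)

# Without it, the parser assumes UTF-8 and hits an invalid byte sequence.
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError as err:
    undeclared_ok = False
    print("Parse failed:", err)
```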

Note

XML tags are case sensitive.
Opening and closing tags must be written with the same case; for example, <Message> cannot be closed with </message>.
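
The case rule is easy to verify with a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name parses and the sample tags are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if xml_text is accepted by the parser, otherwise False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


same_case = "<Message>Hello</Message>"   # opening and closing tags match exactly
mixed_case = "<Message>Hello</message>"  # closing tag differs only in case

print(parses(same_case))
print(parses(mixed_case))
```

The second document is rejected because the parser treats Message and message as two different tag names.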

Entity References

Some characters have a special meaning in XML.

If you place a character like "<" inside an XML element, it will generate an error because the parser interprets it as the start of a new element.

xml
<message>Salary < 2000</message>

This will cause a parsing error.

To avoid this error, replace the "<" character with an entity reference.

xml
<message>Salary &lt; 2000</message>

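There are five predefined entity references in XML:

&lt;     <    less than
&gt;     >    greater than
&amp;    &    ampersand
&apos;   '    apostrophe
&quot;   "    quotation mark

Strictly speaking, only "<" and "&" are illegal inside element content. The ">" character is allowed, but replacing it as well is a good habit.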
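
In practice, escaping is usually left to a library function instead of being done by hand. Below is a small sketch using escape from Python's standard xml.sax.saxutils module; the sample strings are invented for illustration. By default escape handles &, < and >; quotes can be added through its optional entities dictionary.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Salary < 2000 & bonus > 0"

# Replace &, < and > with their entity references.
escaped = escape(raw)
print(escaped)  # Salary &lt; 2000 &amp; bonus &gt; 0

# The escaped text can now be placed safely inside an element.
document = f"<message>{escaped}</message>"
round_trip = ET.fromstring(document).text
print(round_trip == raw)  # the parser turns the entities back into characters

# Quotes are not escaped by default; pass extra mappings when needed.
quoted = escape('say "hi"', {'"': "&quot;"})
print(quoted)  # say &quot;hi&quot;
```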