What is XML?
XML stands for Extensible Markup Language. It is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. XML is widely used for data representation and data interchange between systems.
Key Features of XML
Extensible:
- XML allows users to define their own tags and document structure. This flexibility makes it suitable for a wide range of applications.
Self-Descriptive:
- XML documents are self-descriptive, meaning that the data is structured in a way that describes its own meaning. This makes it easier to understand the data without needing additional documentation.
Platform and Language Independent:
- XML is not tied to any specific programming language or platform, making it a versatile choice for data interchange between different systems.
Hierarchical Structure:
- XML data is organized in a tree-like structure, with a single root element and nested child elements. This hierarchy allows for complex data representation.
Human-Readable:
- XML is designed to be easily readable by humans, which makes it easier to debug and understand.
XML Encoding
Encoding, in the context of XML, refers to how Unicode characters are converted into the bytes that are stored in a file or sent over a network. A parser must know which encoding was used in order to turn those bytes back into the correct characters, so that different systems and applications interpret the data the same way.
UTF Encoding
UTF stands for Unicode Transformation Format. It is a way to encode Unicode characters into binary format. There are several types of UTF encoding, with the most common being:
UTF-8:
- Description: UTF-8 is a variable-length encoding that uses 1 to 4 bytes to represent each character. It is backward compatible with ASCII, meaning that the first 128 Unicode characters (U+0000 to U+007F) are represented using a single byte, making it efficient for texts primarily in English.
- Usage: UTF-8 is widely used on the web and is the default encoding for many applications, including XML documents.
UTF-16:
- Description: UTF-16 uses 2 bytes for most characters but 4 bytes (a surrogate pair) for characters outside the Basic Multilingual Plane (BMP). It can be more compact than UTF-8 for text dominated by characters such as Chinese, Japanese, or Korean, which take 3 bytes in UTF-8 but only 2 in UTF-16.
- Usage: UTF-16 is the string representation used by Windows APIs, Java, JavaScript, and .NET. An XML document encoded in UTF-16 must begin with a byte order mark (BOM).
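The byte counts above can be checked with a short Python sketch. It encodes a few sample characters (chosen here for illustration) and reports how many bytes each needs in UTF-8 and in UTF-16; `encoded_sizes` is a hypothetical helper, not part of any library.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes `text` needs in UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used because plain "utf-16" prepends a 2-byte BOM
        "utf-16": len(text.encode("utf-16-le")),
    }

for ch in ["A", "é", "€", "中", "😀"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

With these samples, `A` takes 1 byte in UTF-8 and 2 in UTF-16, `中` takes 3 and 2, and the emoji `😀`, which lies outside the BMP, takes 4 in both.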
Example of XML
Here’s a simple example of an XML document representing a list of books:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book>
    <title>XML Developer's Guide</title>
    <author>John Doe</author>
    <year>2021</year>
    <price>29.99</price>
  </book>
  <book>
    <title>Learning XML</title>
    <author>Jane Smith</author>
    <year>2020</year>
    <price>39.99</price>
  </book>
</library>
```
Explanation of the Example
- Root Element: `<library>` is the root element that contains all other elements.
- Child Elements: Each `<book>` element represents a book and contains child elements such as `<title>`, `<author>`, `<year>`, and `<price>`.
- Self-Descriptive: The tags used in the XML document describe the data they contain, making it easy to understand.
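To see the tree structure in practice, the library document can be loaded with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch: the document is embedded as a bytes literal, and `findall`/`findtext` walk from the root element to each `<book>` and its children.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book>
    <title>XML Developer's Guide</title>
    <author>John Doe</author>
    <year>2021</year>
    <price>29.99</price>
  </book>
  <book>
    <title>Learning XML</title>
    <author>Jane Smith</author>
    <year>2020</year>
    <price>39.99</price>
  </book>
</library>"""

root = ET.fromstring(LIBRARY_XML)  # the <library> element
print(root.tag)                    # library

for book in root.findall("book"):  # direct <book> children of the root
    title = book.findtext("title")
    year = int(book.findtext("year"))
    price = float(book.findtext("price"))
    print(f"{title} ({year}): {price:.2f}")
```

Element text is always a string, so numeric fields such as `<year>` and `<price>` are converted explicitly.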
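XML Documents Must Have a Root Element
Every well-formed XML document has exactly one root element, which encloses all other elements. In the library example, that element is `<library>`.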
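A parser enforces this rule. The sketch below, assuming Python's standard `xml.etree.ElementTree` parser, uses a hypothetical helper `is_well_formed` to show that two top-level elements are rejected while the same elements wrapped in a single root are accepted.

```python
import xml.etree.ElementTree as ET

def is_well_formed(data: bytes) -> bool:
    """Return True if `data` parses as a single XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(b"<library><book/><book/></library>"))  # True: one root
print(is_well_formed(b"<book/><book/>"))                     # False: two top-level elements
```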
XML Prolog
The XML prolog is the part of an XML document that comes before the root element. It normally begins with the XML declaration, which provides important information such as the version of XML being used and the character encoding of the document. The declaration is optional, but it is recommended to include it for clarity and to ensure proper parsing. When present, it must be the very first thing in the document: apart from an optional byte order mark, nothing may precede it, not even whitespace.
Structure of the XML Prolog
The XML prolog typically has the following structure:
```xml
<?xml version="1.0" encoding="UTF-8"?>
```
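The XML declaration, when present, must be the very first thing in a document. This can be observed with Python's `xml.etree.ElementTree`. In this sketch, the hypothetical helper `parses` reports whether some bytes parse: a single leading space before the declaration is enough to make the document invalid, while leaving the declaration out entirely is allowed.

```python
import xml.etree.ElementTree as ET

DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>'

def parses(data: bytes) -> bool:
    """Return True if `data` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

print(parses(DECLARATION + b"<library/>"))         # True: declaration comes first
print(parses(b" " + DECLARATION + b"<library/>"))  # False: whitespace precedes it
print(parses(b"<library/>"))                       # True: declaration is optional
```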
Components of the XML Prolog
XML Declaration:
- The prolog starts with the `<?xml` declaration, which indicates that the document is an XML document.
- Its attributes, when present, must appear in the order `version`, `encoding`, `standalone`.
Version:
- The `version` attribute specifies the version of XML being used. The most common version is `1.0`. This attribute is mandatory.
- Example: `version="1.0"`
Encoding:
- The `encoding` attribute specifies the character encoding used in the document. It is optional: when it is omitted (and no external source such as an HTTP header supplies one), parsers assume UTF-8, or UTF-16 if the document starts with a byte order mark. It must therefore be declared whenever the document is stored in any other encoding.
- Common encodings include `UTF-8`, `UTF-16`, and `ISO-8859-1`.
- Example: `encoding="UTF-8"`
Standalone (Optional):
- The `standalone` attribute indicates whether the document relies on markup declarations stored outside it, such as an external DTD. It can have two possible values:
  - `yes`: The document is self-contained; no external markup declarations affect its content.
  - `no`: External markup declarations may affect the document. If the attribute is omitted and external declarations exist, `no` is assumed.
- Example: `standalone="yes"`
Example of an XML Prolog
Here’s an example of an XML document with a prolog:
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<library>
  <book>
    <title>XML Developer's Guide</title>
    <author>John Doe</author>
    <year>2021</year>
    <price>29.99</price>
  </book>
</library>
```
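The attributes of a prolog such as the one above can be read programmatically. This sketch uses the `XmlDeclHandler` callback of Python's standard `xml.parsers.expat` module, which receives the version, the encoding, and a standalone flag (1 for `yes`, 0 for `no`, -1 when omitted); `read_xml_declaration` is a hypothetical helper name.

```python
import xml.parsers.expat

def read_xml_declaration(data: bytes) -> dict:
    """Return the version, encoding, and standalone values of the XML declaration.

    Returns an empty dict if the document has no declaration.
    """
    result = {}

    def on_xml_decl(version, encoding, standalone):
        # encoding is None when the attribute is absent;
        # standalone is 1 ("yes"), 0 ("no"), or -1 (omitted)
        result["version"] = version
        result["encoding"] = encoding
        result["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return result

print(read_xml_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><library/>'
))
```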
Importance of the XML Prolog
Parsing: The prolog helps XML parsers understand how to interpret the document. For example, knowing the encoding allows the parser to correctly read the characters in the document.
Compatibility: Specifying the XML version tells processors which set of rules (XML 1.0 or 1.1) the document follows, so a tool can tell whether it supports the document.
Clarity: Including the prolog makes it clear to anyone reading the XML document what version and encoding are being used, which can be important for data interchange and integration.
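The role of the encoding attribute in parsing can be demonstrated with Python's `xml.etree.ElementTree`. This sketch stores the word "Café" in ISO-8859-1, where `é` is the single byte `0xE9`. With a matching `encoding` attribute the parser decodes it correctly; without one the parser falls back to UTF-8, and the same bytes are rejected. `parse_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# "Café" stored as ISO-8859-1: the é becomes the single byte 0xE9
declared = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Café</name>'.encode("iso-8859-1")
undeclared = "<name>Café</name>".encode("iso-8859-1")

def parse_text(data: bytes) -> Optional[str]:
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None

# With the encoding declared, 0xE9 is decoded as é
print(parse_text(declared))    # Café
# Without it the parser assumes UTF-8, where 0xE9 must be followed by
# continuation bytes, so the "<" after it makes the document invalid
print(parse_text(undeclared))  # None
```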
Note:
- XML tags are case sensitive: `<Title>` and `<title>` are different elements.
- Opening and closing tags must be written with the same case.
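Case sensitivity is easy to confirm with Python's `xml.etree.ElementTree`. In this sketch, `parses` is a hypothetical helper; a closing tag whose case differs from its opening tag is a parse error, and `<Title>` and `<title>` are treated as two distinct element names.

```python
import xml.etree.ElementTree as ET

def parses(data: bytes) -> bool:
    """Return True if `data` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

print(parses(b"<title>Learning XML</title>"))  # True
print(parses(b"<Title>Learning XML</title>"))  # False: closing tag case differs

# <Title> and <title> are two different element names
book = ET.fromstring(b"<book><Title>A</Title><title>B</title></book>")
print([child.tag for child in book])  # ['Title', 'title']
print(book.findtext("title"))         # B
```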
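Entity References
Some characters have a special meaning in XML markup. A literal `<` in text would be read as the start of a tag, and a literal `&` as the start of an entity reference, so both must always be escaped in text content. A literal `>` is allowed (except in the sequence `]]>`), but it is commonly escaped as well:

```xml
<message>Salary &gt; 2000</message>
```

XML predefines five entity references:
- `&lt;` for `<` (less than)
- `&gt;` for `>` (greater than)
- `&amp;` for `&` (ampersand)
- `&apos;` for `'` (apostrophe)
- `&quot;` for `"` (quotation mark)

`&apos;` and `&quot;` are mainly needed inside attribute values delimited by the same quote character.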
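In Python, the standard `xml.sax.saxutils.escape` function performs this substitution for `&`, `<`, and `>`, and accepts a dictionary of extra replacements for quotes. This sketch escapes a sample string (made up for illustration) and then parses it back to show that the parser turns the entity references into the original characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Salary > 2000 & bonus < 500"

# escape() replaces &, <, and > with &amp;, &lt;, and &gt;
escaped = escape(raw)
print(escaped)  # Salary &gt; 2000 &amp; bonus &lt; 500

# Extra replacements handle quotes, e.g. for attribute values
quoted = escape("He said \"it's fine\"", {'"': "&quot;", "'": "&apos;"})
print(quoted)  # He said &quot;it&apos;s fine&quot;

# Parsing turns the entity references back into the original characters
message = ET.fromstring(f"<message>{escaped}</message>")
print(message.text == raw)  # True
```

Serializers such as `ElementTree.tostring` escape element text automatically, so manual escaping is mainly needed when XML is assembled by string concatenation.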