How to Parse and Print XML File in Tree Form using libxml2 in C Programming?

Here we’ll see how to write a C program that prints an XML file on the screen. XML is widely used to store and transport data over the internet, so parsing an XML file and using its data is a basic programming requirement.

Format of XML file

Before jumping into the code, it is important to understand the basic format of an XML file. Here is a sample XML file.

XML is a markup language like HTML, but its tags are not a predefined set. Any name can be used as a tag in XML, which is why it is called eXtensible Markup Language. Here are a few important constructs we need to know about an XML file.
Tag: A tag is the basic markup construct of an XML file. It begins with < and ends with >. In the example XML file, <catalog> and <book> are tags. There are three types of tag: 1) a start tag such as <catalog>, 2) an end tag such as </catalog> and 3) an empty-element tag such as <catalog />. The example XML file does not contain an empty-element tag.
Element: An element is a logical document component and the main building block of an XML file. It generally starts with a start tag and ends with an end tag, or it can consist of a single empty-element tag. The characters between the start tag and the end tag, if any, are called the content of the element. An element can contain markup, including other elements, which are called its children. Our example file contains one big catalog element which has a few book elements. We can picture an XML file as a hierarchical tree of elements.
Attribute: An attribute is also a markup construct, and is basically a name-value pair. It exists inside a start tag or an empty-element tag. In our XML file, id is an attribute in the <book id="bk101"> start tag.

C Program to Parse and Print XML file

The C program below can read any XML file and print it as a tree. We’ll use the XML file above as the input of the program; the file name is hard-coded in the program. One important thing to note is that the standard C library does not include functionality to parse XML files, so I used libxml2. On Red Hat based Linux, the libxml2 development package is typically named libxml2-devel and can be installed with the system package manager (for example, yum install libxml2-devel).

On Debian based Linux, the development package is typically named libxml2-dev (for example, apt-get install libxml2-dev).

Here is the complete C program.

In the main() function above, xmlReadFile() loads and parses the XML file (dummy.xml) and returns the document tree. We then get the root element from the document tree using the libxml2 function xmlDocGetRootElement().

The root node (element) of the XML tree is passed to the print_xml() function, which prints the whole XML content in hierarchical form. This function traverses the input node and all of its following siblings. If a node is of type ELEMENT, it prints some information about the node. libxml2 also keeps other types of nodes (such as text nodes) as siblings of ELEMENT nodes, which is why we skip every node that is not an ELEMENT. The tag name is always printed; if the node is a leaf, we print its content, otherwise we print the value of its “id” attribute. We do not print the content of non-leaf nodes because libxml2 returns the content of all nested children as the content of the node, so the output would be lengthy and repetitive. Apart from printing the information about the node, we also call print_xml() recursively on the children of the current node. This way all nodes are printed.

The above program can be compiled with gcc, using the compiler and linker flags reported by xml2-config --cflags and xml2-config --libs.

Output of the Program

