Parse and Print XML File in Tree Form using libxml2 in C

XML is a widely used format for storing and transporting data over the internet, and parsing an XML file is a very basic programming requirement. Here we’ll see how to parse and print the content of an XML file in the C programming language.

XML File Format

Before jumping into the code, we should understand the basic format of an XML file. We’ll use this XML file as an example.

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology
      society in England, the young survivors lay the
      foundation for a new society.</description>
   </book>
</catalog>

XML is a markup language like HTML. Unlike HTML, however, its tags are not predefined: any valid name can be used as an XML tag. That’s why it is called eXtensible Markup Language. Here are a few XML constructs you should be aware of.

Tag: A tag is the most important XML markup construct. It starts with ‘<’ and ends with ‘>’. <catalog> and <book> are examples of tags in our example XML file. There are three types of tags: 1) start tags such as <catalog>, 2) end tags such as </catalog>, and 3) empty-element tags such as <catalog />. Our example file has no empty-element tags.

Element: An element is a logical component of an XML document. It generally starts with a start tag and ends with the matching end tag, or it consists of a single empty-element tag. The characters between the start tag and end tag, if any, are called the content of the element. The content can include markup, including other elements, which are called the element’s children. Our example file contains one big catalog element with a few book elements inside it. So we can imagine an XML file as a hierarchical tree of elements.

Attribute: An attribute is a markup construct that is basically a name-value pair. It appears inside a start tag or an empty-element tag. In our XML file, id is an example of an attribute in the <book id="bk101"> tag.

C Program to Parse and Print XML File

The standard C library does not provide an XML parser, so this program uses the libxml2 parser. You have to install the libxml2 development library explicitly. If you don’t have it installed already, run the following command to install it.

For Red Hat-based Linux:

yum install libxml2-devel

And for Debian-based Linux:

apt-get install libxml2-dev

The Program

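#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

/* Compile: gcc test.c `xml2-config --cflags --libs` */

/* Returns 1 if node has no element children (a leaf), otherwise 0. */
int is_leaf(xmlNode * node)
{
  xmlNode * child = node->children;
  while(child)
  {
    if(child->type == XML_ELEMENT_NODE) return 0;

    child = child->next;
  }

  return 1;
}

/* Prints node and all of its following siblings, recursing into children. */
void print_xml(xmlNode * node, int indent_len)
{
    while(node)
    {
        if(node->type == XML_ELEMENT_NODE)
        {
          /* Leaf: print the text content; otherwise print the "id" attribute.
             Both calls return a newly allocated string (or NULL) that must
             be released with xmlFree(). */
          xmlChar *value = is_leaf(node) ? xmlNodeGetContent(node)
                                         : xmlGetProp(node, (const xmlChar *)"id");

          /* "%*c" right-aligns '-' in a field indent_len*2 characters wide */
          printf("%*c%s:%s\n", indent_len*2, '-', (const char *)node->name,
                 value ? (const char *)value : "(null)");

          if(value) xmlFree(value);
        }
        print_xml(node->children, indent_len + 1);
        node = node->next;
    }
}

int main(void)
{
  xmlDoc *doc = NULL;
  xmlNode *root_element = NULL;

  doc = xmlReadFile("dummy.xml", NULL, 0);

  if (doc == NULL) {
    fprintf(stderr, "Could not parse the XML file\n");
    return 1;
  }

  root_element = xmlDocGetRootElement(doc);

  print_xml(root_element, 1);

  xmlFreeDoc(doc);

  xmlCleanupParser();

  return 0;
}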

This program first reads the XML file using the xmlReadFile() function. The file name is hard-coded as ‘dummy.xml’, so this file must be present in the current working directory before you run the program. xmlReadFile() returns the parsed XML document tree, or NULL if the file cannot be read or parsed. We then get the root element from the document tree using the xmlDocGetRootElement() function.

The root node (element) of the XML tree is passed to the print_xml() function, which prints the whole XML content in hierarchical form. This function visits the input node and all of its following siblings. If a node is of type XML_ELEMENT_NODE, it prints some information about the node. libxml2 also keeps other types of nodes, such as the text nodes that hold the whitespace between tags, as siblings of element nodes. That’s why we skip every node that is not an element node. For each element, the tag name is printed. Then, if the node is a leaf node (it has no child elements), we print its content; otherwise, we print the value of its “id” attribute.

We do not print the content of non-leaf nodes because xmlNodeGetContent() returns the text of all nested children as the content of the node, so the output would be long and repetitive. Apart from printing the information of the node, print_xml() also calls itself recursively for the children of the current node. This way, all nodes get printed.

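To compile this program, run the following command. The source file comes before the libraries because the linker resolves symbols in command-line order; with the libraries first, some toolchains fail with “undefined reference” errors.

gcc test.c `xml2-config --cflags --libs`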

Here is the output of the program.

 -catalog:(null)
   -book:bk101
     -author:Gambardella, Matthew
     -title:XML Developer's Guide
     -genre:Computer
     -price:44.95
     -publish_date:2000-10-01
     -description:An in-depth look at creating applications
      with XML.
   -book:bk102
     -author:Ralls, Kim
     -title:Midnight Rain
     -genre:Fantasy
     -price:5.95
     -publish_date:2000-12-16
     -description:A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.
   -book:bk103
     -author:Corets, Eva
     -title:Maeve Ascendant
     -genre:Fantasy
     -price:5.95
     -publish_date:2000-11-17
     -description:After the collapse of a nanotechnology
      society in England, the young survivors lay the
      foundation for a new society.
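The indentation in this output comes from the "%*c" conversion in print_xml(): the * takes the field width from the argument list, and %c prints '-' right-aligned in that field, so each extra nesting level adds two spaces before the dash. The hypothetical helper below builds one such line with snprintf() so the formatting can be checked on its own.

```c
#include <stdio.h>

/* Formats one line the way print_xml() does and returns the length
   reported by snprintf(). '-' is right-aligned in a field that is
   indent_len * 2 characters wide. */
int format_line(char *buf, size_t size, int indent_len,
                const char *name, const char *value)
{
    return snprintf(buf, size, "%*c%s:%s", indent_len * 2, '-', name, value);
}
```

For example, format_line(buf, sizeof buf, 2, "book", "bk101") produces "   -book:bk101": three spaces, then the dash.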

Author: Srikanta

I write here to help readers learn and understand computer programming, algorithms, networking, OS concepts, etc. in a simple way. I have 20 years of working experience in computer networking and industrial automation.



9 thoughts on “Parse and Print XML File in Tree Form using libxml2 in C”

  1. I am using the CodeBlocks IDE on Debian Buster, and after installing
    libxml2-dev the program would not build.

    There are a couple of tricks:

    1) Go to /usr/include/libxml2
    Copy folder libxml to /usr/include so the path to the contents will
    be /usr/include/libxml
    2) Open the Project->Build Options -> Linker Settings tab -> Link Libraries
    Add /usr/lib/x86_64-linux-gnu/libxml2.so to the list.

    THEN it will find everything and build properly.

    If you are using some other IDE then step 2 will be different of course.
    If you are building the hard way from the command line then you will
    have to inform the linker as to where libxml2.so is located.

    1. Try this function:

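      void save_xml_to_csv(xmlNode * node, FILE *fp, int indent_len)
      {
        int i = 0;
        while(node)
        {
          if(node->type == XML_ELEMENT_NODE)
          {
            xmlChar *value = is_leaf(node) ? xmlNodeGetContent(node)
                                           : xmlGetProp(node, (const xmlChar *)"id");
            /* one empty column per nesting level, then name,value */
            for(i = 0; i < indent_len; i++) fputc(',', fp);
            fprintf(fp, "%s,%s\n", (const char *)node->name,
                    value ? (const char *)value : "");
            if(value) xmlFree(value);
          }
          save_xml_to_csv(node->children, fp, indent_len + 1);
          node = node->next;
        }
      }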

      Call this function like:

      doc = xmlReadFile("dummy.xml", NULL, 0);
      root_element = xmlDocGetRootElement(doc);
      FILE *fp = fopen("output.csv", "w");
      save_xml_to_csv(root_element, fp, 0);
      fclose(fp);
      xmlFreeDoc(doc);
      xmlCleanupParser();

      Another important thing: the XML content should not contain any commas (,) or newlines, because they will distort the output CSV file. The example XML file contains both, so you have to remove them before trying this.

  2. Thanks a lot for sharing the documents.
    I need some more information regarding XML parsing with a C program in Eclipse and Visual Studio 2013.
    If there is any such document, please share it.
    Thanks in advance.
