Working with XML Files in the R Programming Language
XML is a plain-text file format used to share both data and the structure of that data on the World Wide Web, the Internet, and elsewhere.
XML stands for Extensible Markup Language.
Like HTML, XML uses markup tags. Unlike HTML, however, where the tags describe the structure of a page, XML tags explicitly define the meaning of the data they contain.
You can read an XML file in R using the XML package, which can be installed with the following command:
install.packages("XML")
Input data
Create an XML file by copying the following data into a text editor such as Notepad. Save the file as input.xml (the name used by the code in this tutorial) and select 'All files' as the file type.
<RECORDS>
  <EMPLOYEE>
    <ID>1</ID>
    <NAME>Rick</NAME>
    <SALARY>623.3</SALARY>
    <STARTDATE>1/1/2012</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>2</ID>
    <NAME>Dan</NAME>
    <SALARY>515.2</SALARY>
    <STARTDATE>9/23/2013</STARTDATE>
    <DEPT>Operations</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>3</ID>
    <NAME>Michelle</NAME>
    <SALARY>611</SALARY>
    <STARTDATE>11/15/2014</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>4</ID>
    <NAME>Ryan</NAME>
    <SALARY>729</SALARY>
    <STARTDATE>5/11/2014</STARTDATE>
    <DEPT>HR</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>5</ID>
    <NAME>Gary</NAME>
    <SALARY>843.25</SALARY>
    <STARTDATE>3/27/2015</STARTDATE>
    <DEPT>Finance</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>6</ID>
    <NAME>Nina</NAME>
    <SALARY>578</SALARY>
    <STARTDATE>5/21/2013</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>7</ID>
    <NAME>Simon</NAME>
    <SALARY>632.8</SALARY>
    <STARTDATE>7/30/2013</STARTDATE>
    <DEPT>Operations</DEPT>
  </EMPLOYEE>
  <EMPLOYEE>
    <ID>8</ID>
    <NAME>Guru</NAME>
    <SALARY>722.5</SALARY>
    <STARTDATE>6/17/2014</STARTDATE>
    <DEPT>Finance</DEPT>
  </EMPLOYEE>
</RECORDS>
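If you prefer not to use a text editor, the same file can be written from R itself. The following is a minimal sketch using only base R's sprintf() and writeLines(); the vector names (id, name, salary, startdate, dept) are illustrative choices, not part of the XML package.

```r
# Field values for the eight employee records (illustrative vector names).
id <- 1:8
name <- c("Rick", "Dan", "Michelle", "Ryan", "Gary", "Nina", "Simon", "Guru")
salary <- c("623.3", "515.2", "611", "729", "843.25", "578", "632.8", "722.5")
startdate <- c("1/1/2012", "9/23/2013", "11/15/2014", "5/11/2014",
               "3/27/2015", "5/21/2013", "7/30/2013", "6/17/2014")
dept <- c("IT", "Operations", "IT", "HR", "Finance", "IT", "Operations", "Finance")

# sprintf() is vectorised, so this builds one <EMPLOYEE> block per record.
records <- sprintf(
  paste0("  <EMPLOYEE>\n    <ID>%d</ID>\n    <NAME>%s</NAME>\n",
         "    <SALARY>%s</SALARY>\n    <STARTDATE>%s</STARTDATE>\n",
         "    <DEPT>%s</DEPT>\n  </EMPLOYEE>"),
  id, name, salary, startdate, dept
)

# Wrap the records in the <RECORDS> root element and save to input.xml
# in the current working directory.
writeLines(c("<RECORDS>", records, "</RECORDS>"), "input.xml")
```

Salaries are kept as strings so that values such as 843.25 are written exactly as they appear in the listing above.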
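Read the XML file
R reads the XML file with the xmlParse() function. The result is a parsed document object (of class XMLInternalDocument) rather than an ordinary R list, but its nodes can be accessed with list-style indexing, as the following sections show.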
# Load the package required to read XML files.
library("XML")
# Also load the other required package.
library("methods")
# Give the input file name to the function.
result <- xmlParse(file = "input.xml")
# Print the result.
print(result)
When we run the above code, the following result is obtained:
R prints the parsed document, which reproduces the contents of input.xml. The output begins as follows, and the remaining seven EMPLOYEE records follow in the same form:
<?xml version="1.0"?>
<RECORDS>
  <EMPLOYEE>
    <ID>1</ID>
    <NAME>Rick</NAME>
    <SALARY>623.3</SALARY>
    <STARTDATE>1/1/2012</STARTDATE>
    <DEPT>IT</DEPT>
  </EMPLOYEE>
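To see what xmlParse() actually returns without needing input.xml on disk, the sketch below parses a short inline excerpt of the employee data. It relies on xmlParse()'s asText argument, which tells the function to treat the string as XML content instead of a file name; the variable names are illustrative.

```r
library("XML")

# A two-record excerpt of input.xml, held in a string so the sketch runs on its own.
xml_text <- paste0(
  "<RECORDS>",
  "<EMPLOYEE><ID>1</ID><NAME>Rick</NAME><DEPT>IT</DEPT></EMPLOYEE>",
  "<EMPLOYEE><ID>2</ID><NAME>Dan</NAME><DEPT>Operations</DEPT></EMPLOYEE>",
  "</RECORDS>"
)

# asText = TRUE: parse the string itself rather than opening a file.
doc <- xmlParse(xml_text, asText = TRUE)

# The result is a parsed document object, not an ordinary R list.
print(class(doc))

# xmlRoot() returns the top-level element; xmlName() gives its tag name.
print(xmlName(xmlRoot(doc)))
```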
Get the number of nodes in the XML file
# Load the packages required to read XML files.
library("XML")
library("methods")
# Give the input file name to the function.
result <- xmlParse(file = "input.xml")
# Extract the root node from the XML file.
rootnode <- xmlRoot(result)
# Find the number of nodes in the root.
rootsize <- xmlSize(rootnode)
# Print the result.
print(rootsize)
When we execute the above code, we get the following result:
[1] 8
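The count is 8 because xmlSize() counts only the direct children of a node, so each EMPLOYEE element counts once regardless of what it contains. The sketch below illustrates this on a self-contained three-record excerpt (an assumption standing in for input.xml) and uses xmlSApply() with xmlName() to list the tag name of every child.

```r
library("XML")

# Three-record excerpt of input.xml, inlined so the sketch is self-contained.
xml_text <- paste0(
  "<RECORDS>",
  "<EMPLOYEE><ID>1</ID><NAME>Rick</NAME></EMPLOYEE>",
  "<EMPLOYEE><ID>2</ID><NAME>Dan</NAME></EMPLOYEE>",
  "<EMPLOYEE><ID>3</ID><NAME>Michelle</NAME></EMPLOYEE>",
  "</RECORDS>"
)
rootnode <- xmlRoot(xmlParse(xml_text, asText = TRUE))

# xmlSize() counts direct children only: one per <EMPLOYEE> element.
n_records <- xmlSize(rootnode)
# Applied to a single record, it counts that record's fields instead.
n_fields_first <- xmlSize(rootnode[[1]])

# xmlSApply() applies a function to every child; xmlName() returns the tag name.
child_names <- xmlSApply(rootnode, xmlName)

print(n_records)
print(n_fields_first)
print(child_names)
```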
Details of the first node
Let's look at the first record of the parsed file. It shows the various elements contained within a top-level node.
# Load the packages required to read XML files.
library("XML")
library("methods")
# Give the input file name to the function.
result <- xmlParse(file = "input.xml")
# Extract the root node from the XML file.
rootnode <- xmlRoot(result)
# Print the first child of the root node.
print(rootnode[1])
When we run the above code, the following result is obtained:
$EMPLOYEE
<EMPLOYEE>
  <ID>1</ID>
  <NAME>Rick</NAME>
  <SALARY>623.3</SALARY>
  <STARTDATE>1/1/2012</STARTDATE>
  <DEPT>IT</DEPT>
</EMPLOYEE>

attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList"
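The class attribute in this output shows that single brackets return a node list. The sketch below contrasts rootnode[1] with rootnode[[1]], which returns the EMPLOYEE node itself, and lists that record's field tags with xmlChildren(). It uses a one-record inline excerpt as an assumed stand-in for input.xml.

```r
library("XML")

# One-record excerpt of input.xml, inlined so the sketch runs on its own.
xml_text <- paste0(
  "<RECORDS>",
  "<EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY>",
  "<STARTDATE>1/1/2012</STARTDATE><DEPT>IT</DEPT></EMPLOYEE>",
  "</RECORDS>"
)
rootnode <- xmlRoot(xmlParse(xml_text, asText = TRUE))

# Single brackets return a list of nodes ...
first_as_list <- rootnode[1]
# ... while double brackets return the <EMPLOYEE> node itself.
first_node <- rootnode[[1]]

# xmlChildren() names its elements after the child tags.
field_names <- names(xmlChildren(first_node))

print(class(first_as_list))
print(xmlName(first_node))
print(field_names)
```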
Get different elements of a node
# Load the packages required to read XML files.
library("XML")
library("methods")
# Give the input file name to the function.
result <- xmlParse(file = "input.xml")
# Extract the root node from the XML file.
rootnode <- xmlRoot(result)
# Get the first element of the first node.
print(rootnode[[1]][[1]])
# Get the fifth element of the first node.
print(rootnode[[1]][[5]])
# Get the second element of the third node.
print(rootnode[[3]][[2]])
When we run the above code, the following result is obtained:
<ID>1</ID>
<DEPT>IT</DEPT>
<NAME>Michelle</NAME>
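Indexing returns whole nodes, tags included. To get only the text inside a node, wrap it in xmlValue(). The sketch below does this for the second element of the third record and, as a variation, selects the NAME child of every record by tag name rather than by position. The three-record inline excerpt is an assumed stand-in for input.xml.

```r
library("XML")

# Three-record excerpt of input.xml, inlined so the sketch is self-contained.
xml_text <- paste0(
  "<RECORDS>",
  "<EMPLOYEE><ID>1</ID><NAME>Rick</NAME></EMPLOYEE>",
  "<EMPLOYEE><ID>2</ID><NAME>Dan</NAME></EMPLOYEE>",
  "<EMPLOYEE><ID>3</ID><NAME>Michelle</NAME></EMPLOYEE>",
  "</RECORDS>"
)
rootnode <- xmlRoot(xmlParse(xml_text, asText = TRUE))

# xmlValue() extracts the text content of a node.
third_name <- xmlValue(rootnode[[3]][[2]])

# A child can also be picked by tag name: emp[["NAME"]].
all_names <- xmlSApply(rootnode, function(emp) xmlValue(emp[["NAME"]]))

print(third_name)
print(unname(all_names))
```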
Convert the XML file to a data frame
To manage the data effectively, particularly when a file holds many records, read the contents of the XML file into a data frame, which is R's standard structure for data analysis.
# Load the packages required to read XML files.
library("XML")
library("methods")
# Convert the input XML file to a data frame.
xmldataframe <- xmlToDataFrame("input.xml")
# Print the data frame.
print(xmldataframe)
When we run the above code, the following result is obtained:
  ID     NAME SALARY  STARTDATE       DEPT
1  1     Rick  623.3   1/1/2012         IT
2  2      Dan  515.2  9/23/2013 Operations
3  3 Michelle    611 11/15/2014         IT
4  4     Ryan    729  5/11/2014         HR
5  5     Gary 843.25  3/27/2015    Finance
6  6     Nina    578  5/21/2013         IT
7  7    Simon  632.8  7/30/2013 Operations
8  8     Guru  722.5  6/17/2014    Finance
Because the data is now in a data frame, you can use R's standard data frame functions to process and analyze it.
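As an example of such processing, the sketch below converts the columns to proper types and runs a few common queries. It assumes xmlToDataFrame() returned text columns (its usual behavior when colClasses is not given); to keep the sketch runnable without the XML package, a hypothetical stand-in for xmldataframe is typed in by hand with the same eight records.

```r
# Hypothetical stand-in for xmldataframe: the same records as text columns,
# typed in by hand so this sketch needs no XML package.
xmldataframe <- data.frame(
  ID = as.character(1:8),
  NAME = c("Rick", "Dan", "Michelle", "Ryan", "Gary", "Nina", "Simon", "Guru"),
  SALARY = c("623.3", "515.2", "611", "729", "843.25", "578", "632.8", "722.5"),
  STARTDATE = c("1/1/2012", "9/23/2013", "11/15/2014", "5/11/2014",
                "3/27/2015", "5/21/2013", "7/30/2013", "6/17/2014"),
  DEPT = c("IT", "Operations", "IT", "HR", "Finance", "IT", "Operations", "Finance"),
  stringsAsFactors = FALSE
)

# Convert the text columns to integer, numeric and Date types.
xmldataframe$ID <- as.integer(xmldataframe$ID)
xmldataframe$SALARY <- as.numeric(xmldataframe$SALARY)
xmldataframe$STARTDATE <- as.Date(xmldataframe$STARTDATE, format = "%m/%d/%Y")

# Rows for the IT department only.
it_staff <- subset(xmldataframe, DEPT == "IT")

# Name of the employee with the highest salary.
top_earner <- xmldataframe$NAME[which.max(xmldataframe$SALARY)]

# Mean salary per department.
avg_salary <- aggregate(SALARY ~ DEPT, data = xmldataframe, FUN = mean)

print(it_staff)
print(top_earner)
print(avg_salary)
```

With the real file, the same conversion and query lines can be applied directly to the result of xmlToDataFrame("input.xml").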