Reading XML files in the R programming language

XML is a file format that uses standard ASCII text to share both the file format and the data on the World Wide Web, the Internet, and elsewhere.

XML stands for Extensible Markup Language.

Like HTML, XML contains markup tags. But whereas HTML tags describe the structure of the page, XML tags describe the meaning of the data they contain.

You can read an XML file in R using the XML package. Install the package with the following command.

install.packages("XML")
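
Calling install.packages() on every run reinstalls the package each time. A common pattern, sketched below, installs it only when it is missing. The repos URL is an assumption (the CRAN cloud mirror); substitute whichever mirror you normally use.

```r
# Install the XML package only when it is not already available.
# The repos value is an assumed mirror, not something the tutorial prescribes.
if (!requireNamespace("XML", quietly = TRUE)) {
  install.packages("XML", repos = "https://cloud.r-project.org")
}

# TRUE once the package can be loaded.
xml_available <- requireNamespace("XML", quietly = TRUE)
print(xml_available)
```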

Input data

Create an XML file by copying the following data into a text editor such as Notepad. Save the file as input.xml, choosing All Files (*.*) as the file type so that the .xml extension is kept.

<RECORDS>
   <EMPLOYEE>
      <ID>1</ID>
      <NAME>Rick</NAME>
      <SALARY>623.3</SALARY>
      <STARTDATE>1/1/2012</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>2</ID>
      <NAME>Dan</NAME>
      <SALARY>515.2</SALARY>
      <STARTDATE>9/23/2013</STARTDATE>
      <DEPT>Operations</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>3</ID>
      <NAME>Michelle</NAME>
      <SALARY>611</SALARY>
      <STARTDATE>11/15/2014</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>4</ID>
      <NAME>Ryan</NAME>
      <SALARY>729</SALARY>
      <STARTDATE>5/11/2014</STARTDATE>
      <DEPT>HR</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>5</ID>
      <NAME>Gary</NAME>
      <SALARY>843.25</SALARY>
      <STARTDATE>3/27/2015</STARTDATE>
      <DEPT>Finance</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>6</ID>
      <NAME>Nina</NAME>
      <SALARY>578</SALARY>
      <STARTDATE>5/21/2013</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>7</ID>
      <NAME>Simon</NAME>
      <SALARY>632.8</SALARY>
      <STARTDATE>7/30/2013</STARTDATE>
      <DEPT>Operations</DEPT>
   </EMPLOYEE>
   <EMPLOYEE>
      <ID>8</ID>
      <NAME>Guru</NAME>
      <SALARY>722.5</SALARY>
      <STARTDATE>6/17/2014</STARTDATE>
      <DEPT>Finance</DEPT>
   </EMPLOYEE>
</RECORDS>
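
If you would rather create the file from R than from a text editor, the sketch below writes the same eight records to input.xml using base R only. The helper vectors and the sprintf() template are illustrative choices, not part of the XML package, and the approach assumes the values contain no characters that need escaping in XML (such as & or <).

```r
# Employee records from the listing above, held in plain vectors.
id        <- 1:8
name      <- c("Rick", "Dan", "Michelle", "Ryan", "Gary", "Nina", "Simon", "Guru")
salary    <- c("623.3", "515.2", "611", "729", "843.25", "578", "632.8", "722.5")
startdate <- c("1/1/2012", "9/23/2013", "11/15/2014", "5/11/2014",
               "3/27/2015", "5/21/2013", "7/30/2013", "6/17/2014")
dept      <- c("IT", "Operations", "IT", "HR", "Finance", "IT", "Operations", "Finance")

# Build one <EMPLOYEE> element per record (sprintf is vectorised).
employees <- sprintf(
  paste0("<EMPLOYEE><ID>%d</ID><NAME>%s</NAME><SALARY>%s</SALARY>",
         "<STARTDATE>%s</STARTDATE><DEPT>%s</DEPT></EMPLOYEE>"),
  id, name, salary, startdate, dept
)

# Wrap the records in the root element and write the file.
xml_lines <- c("<RECORDS>", employees, "</RECORDS>")
writeLines(xml_lines, "input.xml")
```

The file is written to the current working directory, which is also where the later examples look for input.xml.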

Read XML file

R reads the XML file with the xmlParse() function. The parsed document is held in R as a tree of nodes that can be indexed much like a list.

# Load the package required to read XML files.
library("XML")

# Also load the other required package.
library("methods")

# Give the input file name to the function.
result <- xmlParse(file = "input.xml")

# Print the result.
print(result)

Running the code above prints the parsed document. The element values are shown here without their tags:

1
Rick
623.3
1/1/2012
IT

2
Dan
515.2
9/23/2013
Operations

3
Michelle
611
11/15/2014
IT

4
Ryan
729
5/11/2014
HR

5
Gary
843.25
3/27/2015
Finance

6
Nina
578
5/21/2013
IT

7
Simon
632.8
7/30/2013
Operations

8
Guru
722.5
6/17/2014
Finance
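
To experiment with xmlParse() without a file on disk, a document can also be parsed from a character string. This is a minimal sketch, assuming the XML package is installed; the two-record string xml_text is a shortened, hypothetical stand-in for input.xml.

```r
library("XML")

# Shortened stand-in for input.xml (hypothetical sample, two records only).
xml_text <- paste0(
  "<RECORDS>",
  "<EMPLOYEE><ID>1</ID><NAME>Rick</NAME><DEPT>IT</DEPT></EMPLOYEE>",
  "<EMPLOYEE><ID>2</ID><NAME>Dan</NAME><DEPT>Operations</DEPT></EMPLOYEE>",
  "</RECORDS>"
)

# asText = TRUE tells xmlParse() that the argument is XML content, not a file name.
doc <- xmlParse(xml_text, asText = TRUE)

# Inspect what kind of object the parser returned.
doc_class <- class(doc)
print(doc_class)

# Name of the root element of the parsed tree.
root_name <- xmlName(xmlRoot(doc))
print(root_name)
```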

Get the number of nodes in the XML file

# Load the packages required to read XML files.
library("XML")
library("methods")

# Give the input file name to the function.
result <- xmlParse(file = "input.xml")

# Extract the root node from the XML file.
rootnode <- xmlRoot(result)

# Find the number of nodes in the root.
rootsize <- xmlSize(rootnode)

# Print the result.
print(rootsize)

Running the code above produces the following result:

[1] 8
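
xmlSize() counts only the direct children of the node it is given. As a cross-check, the sketch below counts the same nodes with an XPath query and also counts the fields inside one record. It assumes the XML package is installed and uses a shortened, hypothetical three-record sample in place of input.xml.

```r
library("XML")

# Shortened stand-in for input.xml (hypothetical sample).
xml_text <- paste0(
  "<RECORDS>",
  "<EMPLOYEE><ID>1</ID><NAME>Rick</NAME></EMPLOYEE>",
  "<EMPLOYEE><ID>2</ID><NAME>Dan</NAME></EMPLOYEE>",
  "<EMPLOYEE><ID>3</ID><NAME>Michelle</NAME></EMPLOYEE>",
  "</RECORDS>"
)
doc <- xmlParse(xml_text, asText = TRUE)
rootnode <- xmlRoot(doc)

# Number of direct children of the root node.
rootsize <- xmlSize(rootnode)

# The same count via XPath: every EMPLOYEE element in the document.
employee_nodes <- getNodeSet(doc, "//EMPLOYEE")
xpath_count <- length(employee_nodes)

# Number of child elements inside the first EMPLOYEE node.
fields_per_employee <- xmlSize(rootnode[[1]])

print(c(rootsize, xpath_count, fields_per_employee))
```

The two counts agree here because every child of the root is an EMPLOYEE element; in a file with mixed child elements the XPath count would be the more specific of the two.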

Details of the first node

Let's look at the first node of the parsed file. It gives us an idea of the different elements present in a top-level node.

# Load the packages required to read XML files.
library("XML")
library("methods")

# Give the input file name to the function.
result <- xmlParse(file = "input.xml")

# Extract the root node from the XML file.
rootnode <- xmlRoot(result)

# Print the first node of the root.
print(rootnode[1])

Running the code above produces the following result:

$EMPLOYEE
1
Rick
623.3
1/1/2012
IT

attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList"
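
Printing a node is useful for inspection, but for further processing it helps to pull the field names and values out as ordinary R vectors. This is a minimal sketch, assuming the XML package is installed; the single-record string is a hypothetical stand-in for the first EMPLOYEE node of input.xml.

```r
library("XML")

# One-record stand-in for input.xml (hypothetical sample).
xml_text <- paste0(
  "<RECORDS>",
  "<EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY>",
  "<STARTDATE>1/1/2012</STARTDATE><DEPT>IT</DEPT></EMPLOYEE>",
  "</RECORDS>"
)
doc <- xmlParse(xml_text, asText = TRUE)
first_employee <- xmlRoot(doc)[[1]]

# xmlChildren() returns the child nodes as a list named by element.
fields <- xmlChildren(first_employee)
field_names <- names(fields)

# xmlValue() extracts the text content of each child node.
field_values <- sapply(fields, xmlValue)

print(field_names)
print(field_values)
```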

Get different elements of a node

# Load the packages required to read XML files.
library("XML")
library("methods")

# Give the input file name to the function.
result <- xmlParse(file = "input.xml")

# Extract the root node from the XML file.
rootnode <- xmlRoot(result)

# Get the first element of the first node.
print(rootnode[[1]][[1]])

# Get the fifth element of the first node.
print(rootnode[[1]][[5]])

# Get the second element of the third node.
print(rootnode[[3]][[2]])

Running the code above prints the three selected elements, whose values are:

1
IT
Michelle
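
Positional indexing depends on the order of the elements in the file. The sketch below shows two alternatives: selecting a child by its element name, and using an XPath expression. It assumes the XML package is installed and uses a shortened, hypothetical three-record sample in place of input.xml.

```r
library("XML")

# Shortened stand-in for input.xml (hypothetical sample).
xml_text <- paste0(
  "<RECORDS>",
  "<EMPLOYEE><ID>1</ID><NAME>Rick</NAME><DEPT>IT</DEPT></EMPLOYEE>",
  "<EMPLOYEE><ID>2</ID><NAME>Dan</NAME><DEPT>Operations</DEPT></EMPLOYEE>",
  "<EMPLOYEE><ID>3</ID><NAME>Michelle</NAME><DEPT>IT</DEPT></EMPLOYEE>",
  "</RECORDS>"
)
doc <- xmlParse(xml_text, asText = TRUE)
rootnode <- xmlRoot(doc)

# xmlValue() returns the text content of a node as a string.
first_id <- xmlValue(rootnode[[1]][[1]])

# Children can be selected by element name instead of position.
third_name <- xmlValue(rootnode[[3]][["NAME"]])

# XPath reaches the same element without stepping through the tree by hand.
third_name_xpath <- xpathSApply(doc, "/RECORDS/EMPLOYEE[3]/NAME", xmlValue)

# XPath can also collect one field from every record at once.
all_depts <- xpathSApply(doc, "//EMPLOYEE/DEPT", xmlValue)

print(first_id)
print(third_name)
print(all_depts)
```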

XML to data frame

To handle the data effectively in large files, we read the data in the XML file as a data frame and then process the data frame for analysis.

# Load the packages required to read XML files.
library("XML")
library("methods")

# Convert the input XML file to a data frame.
xmldataframe <- xmlToDataFrame("input.xml")

# Print the data frame.
print(xmldataframe)

Running the code above produces the following result:

  ID     NAME SALARY  STARTDATE       DEPT
1  1     Rick 623.30 2012-01-01         IT
2  2      Dan 515.20 2013-09-23 Operations
3  3 Michelle 611.00 2014-11-15         IT
4  4     Ryan 729.00 2014-05-11         HR
5  5     Gary 843.25 2015-03-27    Finance
6  6     Nina 578.00 2013-05-21         IT
7  7    Simon 632.80 2013-07-30 Operations
8  8     Guru 722.50 2014-06-17    Finance

Since the data is now available as a data frame, we can use the usual data frame functions to read and manipulate it.
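
As an illustration of that last step, the sketch below converts the columns to working types and runs two simple queries. It assumes the XML package is installed and that xmlToDataFrame() returns the columns as text, which is its behaviour when no colClasses argument is supplied; the three-record string is a hypothetical stand-in for input.xml, and the date format string matches the month/day/year values in the sample data.

```r
library("XML")

# Shortened stand-in for input.xml (hypothetical sample).
xml_text <- paste0(
  "<RECORDS>",
  "<EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY>",
  "<STARTDATE>1/1/2012</STARTDATE><DEPT>IT</DEPT></EMPLOYEE>",
  "<EMPLOYEE><ID>2</ID><NAME>Dan</NAME><SALARY>515.2</SALARY>",
  "<STARTDATE>9/23/2013</STARTDATE><DEPT>Operations</DEPT></EMPLOYEE>",
  "<EMPLOYEE><ID>3</ID><NAME>Michelle</NAME><SALARY>611</SALARY>",
  "<STARTDATE>11/15/2014</STARTDATE><DEPT>IT</DEPT></EMPLOYEE>",
  "</RECORDS>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# Read the records into a data frame, keeping text as character strings.
employees <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Convert the text columns to the types needed for analysis.
employees$ID        <- as.integer(employees$ID)
employees$SALARY    <- as.numeric(employees$SALARY)
employees$STARTDATE <- as.Date(employees$STARTDATE, format = "%m/%d/%Y")

# Ordinary data frame operations now apply.
it_staff   <- subset(employees, DEPT == "IT")
max_salary <- max(employees$SALARY)

print(it_staff)
print(max_salary)
```

Converting the types explicitly keeps numeric comparisons and date arithmetic from silently operating on strings.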