
Web data in the R programming language

Many websites provide data for their users. For example, the World Health Organization (WHO) provides reports on health and medical information in the form of CSV, txt and XML files. Using R programs, we can extract specific data from such websites programmatically.

Some R packages used to extract data from the web are "RCurl", "XML" and "stringr". They are used to connect to URLs, identify the required links to files, and download those files to the local environment.

Install R packages

The packages RCurl, XML, stringr and plyr are required for processing the URLs and the links to the files (plyr supplies the l_ply function used later). If any of these packages is not available in your R environment, you can install it with install.packages().
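A minimal sketch of the installation step, assuming the four packages this walkthrough relies on (RCurl, XML, stringr, and plyr, the last because l_ply comes from plyr). The helper name missing_packages is hypothetical; it only reports what is not yet installed, and the install.packages call is left commented out so nothing is downloaded unless you choose to run it.

```r
# Packages used in this walkthrough (plyr is assumed because l_ply lives there).
required <- c("RCurl", "XML", "stringr", "plyr")

# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required)
print(to_install)

# Uncomment to install whatever is missing from CRAN.
# if (length(to_install) > 0) install.packages(to_install)
```

If to_install prints character(0), everything needed is already present.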

Input data

We will visit the weather data URL and use R to download the CSV files for the year 2015.


We will use the getHTMLLinks() function to collect the URLs of the files. Then we use the download.file() function to save the files to the local system. Because we want to apply the same code to several files, we will create a function that can be called multiple times. The filenames are passed to this function as parameters, in the form of an R list object.

# Load the packages used below.
library(RCurl)
library(XML)
library(stringr)
library(plyr)

# Read the URL (supply the address of the weather data page here).
url <- ""

# Gather the HTML links present in the webpage.
links <- getHTMLLinks(url)

# Identify only the links which point to the JCMB 2015 files.
filenames <- links[str_detect(links, "JCMB_2015")]

# Store the file names as a list.
filenames_list <- as.list(filenames)

# Create a function to download the files by passing the URL and filename list.
downloadcsv <- function(mainurl, filename) {
  # Build the full address of the file and save it under its own name.
  filedetails <- str_c(mainurl, filename)
  download.file(filedetails, filename)
}

# Now apply the l_ply function and save the files into the current R working directory.
# Pass the base address of the page as mainurl.
l_ply(filenames, downloadcsv, mainurl = "")
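The filtering and URL-building steps can be tried without touching the network. The sketch below uses base R equivalents (grepl for str_detect, paste0 for str_c) on a made-up vector of links. The base URL http://example.com/weather/ and the link names are placeholders chosen for illustration, not the real site.

```r
# Hypothetical links, standing in for the output of getHTMLLinks(url).
links <- c("JCMB_2014.csv", "JCMB_2015.csv", "JCMB_2015_Jan.csv", "readme.txt")

# Keep only the links whose name contains "JCMB_2015".
filenames <- links[grepl("JCMB_2015", links, fixed = TRUE)]

# Placeholder base URL (an assumption, not the page used in this post).
mainurl <- "http://example.com/weather/"

# Build the full address of each file, as downloadcsv does with str_c.
fileurls <- paste0(mainurl, filenames)
print(fileurls)
```

Printing fileurls before downloading is a cheap way to confirm that the base URL and the filenames join correctly, for instance that a trailing slash is not missing.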

Confirm file download

After running the code above, you will find the following files in the current R working directory.

"JCMB_2015.csv" "JCMB_2015_Apr.csv" "JCMB_2015_Feb.csv" "JCMB_2015_Jan.csv"
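One way to check the result from within R, sketched under the assumption that the files were saved to the current working directory, is to list the CSV files whose names start with JCMB_2015. The function name downloaded_files is hypothetical.

```r
# Hypothetical helper: list downloaded JCMB 2015 CSV files in a directory.
downloaded_files <- function(dir = ".") {
  list.files(path = dir, pattern = "^JCMB_2015.*\\.csv$")
}

# Show the matching files in the current working directory.
print(downloaded_files())
```

An empty result suggests the downloads did not run or were saved somewhere else; getwd() shows which directory R is currently using.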