
Working with CSV files in the R programming language

In R, we can read data from files stored outside the R environment. We can also write data to files that the operating system stores and makes available to other programs.

R can read and write files in many formats, such as CSV, Excel, and XML.

In this chapter, we will learn how to read data from a CSV file and then write data to a CSV file. When a file is referred to by its name alone, it must be in R's current working directory for R to find it. We can, of course, set our own working directory and read files from there.

Get and set working directory

You can use the getwd() function to check which directory R is using as its working directory. You can also set a new working directory with the setwd() function.

# Get and print current working directory.
print(getwd())

# Set current working directory.
setwd("/web/com")

# Get and print current working directory.
print(getwd())

When we run the above code, we get the following result:

[1] "/web/com/1441086124_2016"
[1] "/web/com"

The exact output depends on your operating system and on the directory R was started in.
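Because setwd() invisibly returns the directory that was current before the change, a script can switch directories temporarily and restore the original one afterwards. The sketch below uses tempdir() in place of the article's /web/com path. That substitution is an assumption made only so the example runs on any machine.

```r
# Remember the directory R started in.
start_dir <- getwd()

# setwd() invisibly returns the directory that was current before
# the change, so we keep it for later.
previous <- setwd(tempdir())
print(getwd())  # now R's per-session temporary directory

# Switch back to where we started.
setwd(previous)
print(getwd())
```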

Input as a CSV file

A CSV file is a plain-text file in which the values in each row are separated by commas. Consider the following data, saved in a file named input.csv.

You can create this file in Windows Notepad by copying and pasting the data below. Use Notepad's Save As option, choose All files as the file type, and save the file as input.csv.

id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
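Instead of using Notepad, you can create the same file from R itself with writeLines(), which writes each element of a character vector as a separate line. This is a minimal sketch. It writes to R's temporary directory, and csv_path is a name chosen only for illustration. To create the file in the working directory, pass "input.csv" instead.

```r
# One string per line of the CSV file: the header, then one row per employee.
csv_lines <- c(
  "id,name,salary,start_date,dept",
  "1,Rick,623.3,2012-01-01,IT",
  "2,Dan,515.2,2013-09-23,Operations",
  "3,Michelle,611,2014-11-15,IT",
  "4,Ryan,729,2014-05-11,HR",
  "5,Gary,843.25,2015-03-27,Finance",
  "6,Nina,578,2013-05-21,IT",
  "7,Simon,632.8,2013-07-30,Operations",
  "8,Guru,722.5,2014-06-17,Finance"
)

# Write to the temporary directory; use "input.csv" here to write
# into the current working directory instead.
csv_path <- file.path(tempdir(), "input.csv")
writeLines(csv_lines, csv_path)

# Check that the file exists and has 9 lines (header + 8 rows).
print(file.exists(csv_path))
print(length(readLines(csv_path)))
```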

Read a CSV file

The following is a simple example of using the read.csv() function to read a CSV file from your current working directory.

data <- read.csv("input.csv")
print(data)

When we run the above code, we get the following result:

  id     name salary start_date       dept
1  1     Rick 623.30 2012-01-01         IT
2  2      Dan 515.20 2013-09-23 Operations
3  3 Michelle 611.00 2014-11-15         IT
4  4     Ryan 729.00 2014-05-11         HR
5  5     Gary 843.25 2015-03-27    Finance
6  6     Nina 578.00 2013-05-21         IT
7  7    Simon 632.80 2013-07-30 Operations
8  8     Guru 722.50 2014-06-17    Finance
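read.csv() guesses a type for each column, and str() is a quick way to inspect the result. Note that start_date is not yet a date. In R 4.0 and later it is read as character text, while older versions default to factors. The following is a minimal sketch that converts the column with as.Date(). It passes the same data inline through read.csv()'s text argument, so no file is needed. The csv_text variable is introduced here only for that purpose.

```r
# Same data as input.csv, passed inline through the text argument.
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"
data <- read.csv(text = csv_text)

# Show each column's type: id is integer, salary is numeric,
# and start_date is not yet a Date.
str(data)

# Convert the ISO-formatted strings (YYYY-MM-DD) to Date objects.
data$start_date <- as.Date(data$start_date)
print(class(data$start_date))
```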

CSV file analysis

By default, the read.csv() function returns its result as a data frame. This is easy to check, as shown below. We can also check the number of columns and rows.

data <- read.csv("input.csv")
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))

When we run the above code, we get the following result:

[1] TRUE
[1] 5
[1] 8

Once the data has been read into a data frame, we can apply any of the functions that work on data frames, as described in the previous section.
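The following sketch applies a few other data-frame functions to this data. It computes the overall mean salary, uses aggregate() to find the highest salary in each department, and uses table() to count the employees per department. The data is embedded with read.csv()'s text argument so the example runs without input.csv.

```r
# Same data as input.csv, passed inline through the text argument.
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"
data <- read.csv(text = csv_text)

# Average salary over all eight employees.
print(mean(data$salary))

# Highest salary in each department (formula interface: salary grouped by dept).
by_dept <- aggregate(salary ~ dept, data = data, FUN = max)
print(by_dept)

# Number of employees in each department.
print(table(data$dept))
```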

Get the maximum salary

# Create a data frame.
data <- read.csv("input.csv")

# Get the max salary from data frame.
sal <- max(data$salary)
print(sal)

When we run the above code, we get the following result:

[1] 843.25

Get the details of the person with the maximum salary

We can fetch the rows that meet specific filter criteria, similar to a SQL WHERE clause.

# Create a data frame.
data <- read.csv("input.csv")

# Get the max salary from data frame.
sal <- max(data$salary)

# Get the person detail having max salary.
retval <- subset(data, salary == sal)
print(retval)

When we run the above code, we get the following result:

  id name salary start_date    dept
5  5 Gary 843.25 2015-03-27 Finance
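The same row can also be selected without subset() by using base R indexing with a logical vector. The sketch below shows this approach, along with which.max() as a shortcut. Keep in mind that which.max() returns only the first row when several rows share the maximum. As before, the data is embedded inline so no file is needed.

```r
# Same data as input.csv, passed inline through the text argument.
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"
data <- read.csv(text = csv_text)

# Logical indexing: TRUE marks the rows to keep; the empty slot
# after the comma keeps every column.
top_paid <- data[data$salary == max(data$salary), ]
print(top_paid)

# which.max() gives the row number of the (first) largest salary.
print(data[which.max(data$salary), "name"])
```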

Get all the people working in the IT department

# Create a data frame.
data <- read.csv("input.csv")

retval <- subset(data, dept == "IT")
print(retval)

When we run the above code, we get the following result:

  id     name salary start_date dept
1  1     Rick  623.3 2012-01-01   IT
3  3 Michelle  611.0 2014-11-15   IT
6  6     Nina  578.0 2013-05-21   IT

Get the people in the IT department whose salary is greater than 600

# Create a data frame.
data <- read.csv("input.csv")

info <- subset(data, salary > 600 & dept == "IT")
print(info)

When we run the above code, we get the following result:

  id     name salary start_date dept
1  1     Rick  623.3 2012-01-01   IT
3  3 Michelle  611.0 2014-11-15   IT
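A condition can also test whether a value belongs to a set by using %in%, which works like SQL's IN. In addition, subset()'s select argument keeps only the named columns. The following is a minimal sketch; the choice of departments and columns is purely illustrative.

```r
# Same data as input.csv, passed inline through the text argument.
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"
data <- read.csv(text = csv_text)

# Rows whose dept is either "IT" or "HR", keeping only name and salary.
it_or_hr <- subset(data, dept %in% c("IT", "HR"), select = c(name, salary))
print(it_or_hr)
```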

Get the people who joined after 1 January 2014

# Create a data frame.
data <- read.csv("input.csv")

retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))
print(retval)

When we run the above code, we get the following result:

  id     name salary start_date    dept
3  3 Michelle 611.00 2014-11-15      IT
4  4     Ryan 729.00 2014-05-11      HR
5  5     Gary 843.25 2015-03-27 Finance
8  8     Guru 722.50 2014-06-17 Finance
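The condition above keeps everyone who started after 1 January 2014, so it also includes Gary, who joined in 2015. To keep only the people who joined during 2014, one option is to compare the year extracted with format(). The following is a minimal sketch that uses the same embedded data.

```r
# Same data as input.csv, passed inline through the text argument.
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"
data <- read.csv(text = csv_text)

# format(..., "%Y") extracts the four-digit year as a string.
joined_2014 <- subset(data, format(as.Date(start_date), "%Y") == "2014")
print(joined_2014)
```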

Write to a CSV file

R can create a CSV file from an existing data frame using the write.csv() function. When only a file name is given, the file is created in the working directory.

# Create a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# Write filtered data into a new file.
write.csv(retval, "output.csv")

newdata <- read.csv("output.csv")
print(newdata)

When we run the above code, we get the following result:

  X id     name salary start_date    dept
1 3  3 Michelle 611.00 2014-11-15      IT
2 4  4     Ryan 729.00 2014-05-11      HR
3 5  5     Gary 843.25 2015-03-27 Finance
4 8  8     Guru 722.50 2014-06-17 Finance

Here, column X contains the row names of retval, which write.csv() stores in the file by default. You can drop this column by passing the additional parameter row.names = FALSE when writing the file.

# Create a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# Write filtered data into a new file.
write.csv(retval, "output.csv", row.names = FALSE)

newdata <- read.csv("output.csv")
print(newdata)

When we run the above code, we get the following result:

  id     name salary start_date    dept
1  3 Michelle 611.00 2014-11-15      IT
2  4     Ryan 729.00 2014-05-11      HR
3  5     Gary 843.25 2015-03-27 Finance
4  8     Guru 722.50 2014-06-17 Finance
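Instead of dropping the row names when writing, another option is to keep them in the file and tell read.csv() to use the first column as row names through its row.names argument. This is a minimal sketch that embeds the input data and writes to R's temporary directory. The name out_path is chosen only for illustration.

```r
# Same data as input.csv, passed inline through the text argument.
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"
data <- read.csv(text = csv_text)
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# write.csv() stores the row names (3, 4, 5, 8) as an unnamed first column.
out_path <- file.path(tempdir(), "output.csv")
write.csv(retval, out_path)

# row.names = 1 turns that first column back into row names,
# so no extra X column appears.
newdata <- read.csv(out_path, row.names = 1)
print(newdata)
```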