
Learning binary files in the R programming language

A binary file stores information as raw bits and bytes (0s and 1s) rather than as readable text. Binary files are not human-readable, because when their bytes are interpreted as text they turn into characters and symbols, many of which are non-printable.

If you try to open a binary file in a text editor, you will see characters such as Ø mixed with other unreadable symbols.
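To see where such symbols come from, the short R sketch below passes a raw vector to writeBin() instead of a file, so it returns the bytes it would have written. The integers 1 and 256 are stored as 4-byte values, and the byte order is set explicitly to little-endian (an assumption made here so the output is the same on every machine). Bytes such as 00 and 01 have no printable form, which is why a text editor shows binary data as odd symbols.

```r
# Passing raw() as the connection makes writeBin() return the bytes
# instead of writing them to a file.
bytes <- writeBin(c(1L, 256L), raw(), endian = "little")
print(bytes)
#> [1] 01 00 00 00 00 01 00 00

# Each integer occupies 4 bytes, so two integers take 8 bytes.
length(bytes)
#> [1] 8
```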

A binary file has to be read by a specific program to be usable. For example, a Microsoft Word document is a binary file that is meant to be opened with Word, which turns it into something humans can read. Besides the readable text, the file contains a lot of other information, such as character formatting and page numbers, stored alongside the alphabetic characters. In the end, a binary file is simply a continuous sequence of bytes. Even the line break we see in a text file is just one more character, which joins one line to the next.
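A quick way to check this in R is charToRaw(), which shows the bytes behind a string. In the sketch below, the line break "\n" turns out to be the single byte 0a sitting between the letters, exactly like any other character.

```r
# Two "lines" of text separated by a line break.
text <- "ab\ncd"

# Show the underlying bytes: a = 61, b = 62, line break = 0a, c = 63, d = 64.
bytes <- charToRaw(text)
print(bytes)
#> [1] 61 62 0a 63 64

# Converting the bytes back gives the original string, line break included.
identical(rawToChar(bytes), text)
#> [1] TRUE
```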

Sometimes R has to process data that another program produced as a binary file. R may also need to create binary files that other programs can read.
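When a binary file is shared with another program, the byte layout has to match what that program expects. writeBin() and readBin() have size and endian arguments for this. The sketch below writes to a raw vector and assumes, purely for illustration, a hypothetical reader that expects 2-byte big-endian integers.

```r
# Default: integers are written with their natural size of 4 bytes.
default_bytes <- writeBin(258L, raw())
length(default_bytes)
#> [1] 4

# Hypothetical target format: 2-byte, big-endian integers.
# 258 is 0x0102, so the most significant byte (01) comes first.
shared_bytes <- writeBin(258L, raw(), size = 2, endian = "big")
print(shared_bytes)
#> [1] 01 02

# Reading the bytes back must use the same size and byte order.
readBin(shared_bytes, integer(), n = 1, size = 2, endian = "big")
#> [1] 258
```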

In the R programming language, there are two functions for creating and reading binary files: writeBin() and readBin(), respectively.

Syntax

writeBin(object, con)

readBin(con, what, n)

The parameters used above are as follows:

  • con is the connection object (or file name) used for reading or writing the binary file.
  • object is the R vector to be written to the binary file.
  • what is the type of data to read, such as character(), integer() or numeric().
  • n is the maximum number of items (not bytes) to read from the binary file.
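The sketch below puts the two functions together and also shows that n is an upper limit on the number of items rather than a byte count. It writes to a temporary file created with tempfile() so it can run anywhere; that file is not part of the original example.

```r
path <- tempfile(fileext = ".bin")

# Write three integers in binary ("wb") mode.
con <- file(path, "wb")
writeBin(c(10L, 20L, 30L), con)
close(con)

# 3 integers x 4 bytes each = 12 bytes on disk.
file.size(path)
#> [1] 12

# Read them back in binary ("rb") mode. n = 10 is only a maximum:
# readBin() stops at the end of the file and returns 3 values.
con <- file(path, "rb")
values <- readBin(con, integer(), n = 10)
close(con)
print(values)
#> [1] 10 20 30
```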

Example

As an example, we use "mtcars", a data set built into R. First we create a csv file from it, then convert a few of its records into a binary file saved on disk. Finally, we read this binary file back into R.

Write a binary file

We write the "mtcars" data frame to a csv file, read five records back from it, and then save those records as a binary file.

# Write only the "cyl", "am" and "gear" columns of the mtcars data frame to a csv file.

write.table(mtcars[, c("cyl", "am", "gear")], file = "mtcars.csv", row.names = FALSE, na = "",
  col.names = TRUE, sep = ",")

# Store 5 records from the csv file as a new data frame.
# read.table() reads these whole-number columns as integers (4 bytes each).

new.mtcars <- read.table("mtcars.csv", sep = ",", header = TRUE, nrows = 5)

# Create a connection object to write the binary file using "wb" mode.

write.filename <- file("/web/com/binmtcars.dat", "wb")

# Write the column names of the data frame to the connection object.

writeBin(colnames(new.mtcars), write.filename)

# Write the records in each of the columns to the file.

writeBin(c(new.mtcars$cyl, new.mtcars$am, new.mtcars$gear), write.filename)

# Close the file for writing so that it can be read by other programs.

close(write.filename)
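To see what ends up in binmtcars.dat, the same byte layout can be rebuilt in memory. This sketch assumes the same five records and three columns as above but takes them directly from mtcars and writes to raw vectors, so it does not depend on the /web/com path. writeBin() writes each column name followed by a terminating NUL byte, and each integer takes 4 bytes.

```r
# The same three columns and five records as new.mtcars above.
cols <- mtcars[1:5, c("cyl", "am", "gear")]

# Column names: "cyl", "am", "gear", each followed by a NUL byte.
name_bytes <- writeBin(colnames(cols), raw())
length(name_bytes)   # 4 + 3 + 5 = 12 bytes

# mtcars stores these columns as doubles, so convert them to integers
# to match the 4-byte integers read.table() produced from the csv file.
value_bytes <- writeBin(as.integer(c(cols$cyl, cols$am, cols$gear)), raw())
length(value_bytes)  # 15 values x 4 bytes = 60 bytes

# Total size of the binary file written above.
length(name_bytes) + length(value_bytes)  # 72 bytes
```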

Read a binary file

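The binary file created above stores all the data as one continuous sequence of bytes, with nothing marking where the column names end and the values begin. So to read it back, we have to ask for the right data type and the right number of items, in the same order in which they were written.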
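Reading bytes with the wrong type does not raise an error; it simply produces meaningless values. The sketch below writes the three column names to a raw vector and reads those 12 bytes back as integers. Assuming little-endian byte order (set explicitly here, and used by most current machines), the result is three large numbers of the kind that bindata starts with in the example, which is why the column values only begin at its 4th element.

```r
# The column names as they are written to the file, NUL bytes included.
name_bytes <- writeBin(c("cyl", "am", "gear"), raw())

# Interpret the 12 bytes of text as three 4-byte integers.
as_integers <- readBin(name_bytes, integer(), n = 3, endian = "little")
print(as_integers)
#> [1]    7108963 1728081249    7496037

# Read with the right type, the same bytes give back the names.
readBin(name_bytes, character(), n = 3)
#> [1] "cyl"  "am"   "gear"
```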

# Create a connection object to read the file in binary mode using "rb".

read.filename <- file("/web/com/binmtcars.dat", "rb")

# First read the column names. n = 3 as we have 3 columns.

column.names <- readBin(read.filename, character(), n = 3)

# Next read the column values. Reopen the file so that reading starts again from the first byte.
# n = 18 because the 12 bytes of the 3 column names are read as 3 integers, followed by 15 values.

close(read.filename)

read.filename <- file("/web/com/binmtcars.dat", "rb")

bindata <- readBin(read.filename, integer(), n = 18)

close(read.filename)

# Print the data.

print(bindata)

# Read the values from the 4th to the 8th element, which represent "cyl".

cyldata <- bindata[4:8]

print(cyldata)

# Read the values from the 9th to the 13th element, which represent "am".

amdata <- bindata[9:13]

print(amdata)

# Read the values from the 14th to the 18th element, which represent "gear".

geardata <- bindata[14:18]

print(geardata)

# Combine all the read values into a matrix.

finaldata <- cbind(cyldata, amdata, geardata)

colnames(finaldata) <- column.names

print(finaldata)

When we run the above code, it produces the following output:

 [1]    7108963 1728081249    7496037          6          6          4
 [7]          6          8          1          1          1          0
[13]          0          4          4          4          3          3

[1] 6 6 4 6 8

[1] 1 1 1 0 0

[1] 4 4 4 3 3

     cyl am gear
[1,]   6  1    4
[2,]   6  1    4
[3,]   4  1    4
[4,]   6  0    3
[5,]   8  0    3

As you can see, reading the binary file in R has given us back the original data.
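Reopening the file is not the only option. A connection remembers its position, so after the column names have been read, a second readBin() call on the same connection continues right after them. The sketch below uses this approach, writes to a temporary file instead of /web/com/binmtcars.dat, and builds a data frame rather than a matrix. It assumes, as in the example above, three integer columns with five records each.

```r
# Temporary file used instead of /web/com/binmtcars.dat.
path <- tempfile(fileext = ".dat")
cols <- mtcars[1:5, c("cyl", "am", "gear")]

# Write the names, then the values column by column: cyl, am, gear.
con <- file(path, "wb")
writeBin(colnames(cols), con)
writeBin(as.integer(unlist(cols)), con)
close(con)

con <- file(path, "rb")
column.names <- readBin(con, character(), n = 3)
# The connection is now positioned just after the names,
# so only the 15 integer values are left to read.
values <- readBin(con, integer(), n = 15)
close(con)

# Split the values back into one column per name (matrix() fills by column).
finaldata <- as.data.frame(matrix(values, ncol = 3, dimnames = list(NULL, column.names)))
print(finaldata)
```

Because the reads follow the write order exactly, no index arithmetic such as bindata[4:8] is needed.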