An introduction to Pandas in Python and its important features
Data science has found many applications in recent years. Success in today's world, especially in the digital domain, depends on working with data and big data.
Data scientists use a range of tools to help organizations in many areas, hiring among them. Python is a popular language among data scientists.
Among the various Python packages, pandas is a valuable tool for working with data.
What is Pandas?
The Pandas package is considered the home for data in Python. It lets you do many different things with data. Its most basic tools sort, transform, and analyze data. For example, suppose you have a dataset in CSV format. When the dataset is loaded into pandas, it becomes a table.
The user can compute statistics on the table, such as the mean and the distribution, and detect correlation between fields. Users can also remove duplicates and blanks, that is, clean the data. Another possibility is to display the information in the table as various charts. Basic applications of pandas are:
- Receiving the data
- Getting a basic familiarity with the structure of the data (mean, distribution, relationships between columns, etc.)
- Displaying data visually
- Removing redundancy or gaps in the data
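As a minimal sketch of the statistics mentioned above, the following computes a mean, a summary of the distribution, and a correlation between two columns. The small fruit table is sample data invented for this illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "apples": [3, 2, 0, 1, 1],
    "oranges": [0, 3, 7, 2, 2],
})

mean_apples = df["apples"].mean()                # average of one column
summary = df.describe()                          # count, mean, std, min, quartiles, max
correlation = df["apples"].corr(df["oranges"])   # Pearson correlation of two columns

print(mean_apples)
print(summary)
print(correlation)
```

In this sample the correlation is negative, because customers who bought more apples tended to buy fewer oranges.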
The importance of pandas in data science
If we had to summarize the importance of pandas for data science enthusiasts in one sentence, we would say: "Without learning pandas, a successful career in the data field is not possible." Yes, pandas is that important in data science. As we said, pandas gives you the tools to do things like sort data or show it as charts.
This helps you get a good understanding of the incoming data and fix potential problems in it before you enter deeper data processing, so that the data is as error-free as possible. The pandas package is also used together with some other Python packages (for example, for charts).
Pandas links to other Python libraries
Pandas is built on the NumPy library. NumPy provides developers with features for advanced mathematical operations and for working with arrays. In fact, NumPy is the foundation of pandas. Pandas also works together with other Python libraries:
- SciPy : provides statistical features that are used alongside pandas
- Matplotlib : provides the plotting features used by pandas
- Scikit-learn : provides machine learning tools that work with pandas data
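The link to NumPy can be seen directly. In this small sketch (the table is sample data chosen for illustration), a DataFrame is converted to a NumPy array and a NumPy function is applied to a pandas column.

```python
import numpy as np
import pandas as pd

# Sample data chosen for illustration.
df = pd.DataFrame({"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]})

values = df.to_numpy()         # the table as a plain NumPy array
print(type(values))
print(values.shape)            # rows and columns of the array

# NumPy functions accept pandas columns directly.
total_apples = np.sum(df["apples"])
print(total_apples)
```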
Get started with Pandas
Anyone who wants to start working with pandas must have adequate knowledge of Python. Pandas is one of the ten tools necessary to succeed in data science, and it is meant to be used along with the others. If you do not have basic familiarity with Python, you can use the Python training course at the school house, which is taught by one of the best coders in Iran (Gadi Mirmirani). If you are familiar with Python, download and install Pandas.
Download and install Pandas
Pandas is usually downloaded by writing commands. This can be hard for new or less experienced users, so we recommend downloading Anaconda. Anaconda provides a graphical installation environment and installs a comprehensive range of packages needed for data science on your computer at the same time. Another way to install Pandas is to write one of the following commands in the terminal (or Command Prompt).
conda install pandas
OR
pip install pandas
After installing pandas, use the following command to import it into your Python environment.
import pandas as pd
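A quick way to confirm that the installation and import worked is to print the installed version. This is only a small check, not a required step.

```python
import pandas as pd

# If this prints a version number, pandas is installed and importable.
print(pd.__version__)
```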
How to work with Pandas
Pandas has two basic components:
- Series : a single column of data
- DataFrame : a two-dimensional table made up of columns
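The relationship between the two components can be sketched as follows: two Series are placed side by side to form a DataFrame, and selecting one column of the DataFrame gives a Series back. The names and values are sample data for illustration.

```python
import pandas as pd

# Two Series, each one a single named column of data.
apples = pd.Series([3, 2, 0, 1], name="apples")
oranges = pd.Series([0, 3, 7, 2], name="oranges")

# Placing the Series side by side gives a DataFrame.
table = pd.concat([apples, oranges], axis=1)
print(table)

# Selecting one column of a DataFrame returns a Series again.
column = table["apples"]
print(type(column))
```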
Build a table ( DataFrame ) in Python
DataFrames are tables that store input data. To build a sample DataFrame, assume that our input is the sales table of a grocery store. Each row is a customer, and each column shows how many apples or oranges that customer bought. First, we define the data set in Python.
data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}
Then put the data into a pandas DataFrame. To do this, you write:
purchases = pd.DataFrame(data)
purchases
In the output of this code, each row is labeled with a number from 0 to 3. This means that each customer is identified by a number. If we want to use customer names instead, we must change the code above to the following:
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])
purchases
The output now shows the customer names as row labels.
You can look up a customer's purchases by name:
purchases.loc['June']
The output shows 3 apples and 0 oranges.
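The steps above can be put together in one self-contained sketch. The lookups of a single cell and of several rows are additions for illustration; they use the same `loc` accessor as the article.

```python
import pandas as pd

data = {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]}
purchases = pd.DataFrame(data, index=["June", "Robert", "Lily", "David"])

june = purchases.loc["June"]                      # one row, returned as a Series
lily_oranges = purchases.loc["Lily", "oranges"]   # one cell: row label, then column label
two_customers = purchases.loc[["June", "David"]]  # several rows, returned as a DataFrame

print(june)
print(lily_oranges)
print(two_customers)
```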
With this simple example, we tried to acquaint you with the concept and use of the main components of Pandas. In practice, however, pandas is mostly used to read real files such as datasets. Next, we show how to read CSV files, JSON files, and SQL databases with pandas. Data will generally be provided to you in these three formats.
Read data from CSV file
Let us return to the fruit shop example. This time the fruit sales are given to us in a file named purchases.csv. To read this file with pandas in Python or a notebook, you write:
df = pd.read_csv('purchases.csv')
df
In the output, you can see an extra index column numbered from 0 to 3. To remove it, tell pandas to use column zero (the names column) as the index. You write:
df = pd.read_csv('purchases.csv', index_col=0)
df
The table is then displayed without the extra column.
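The effect of `index_col=0` can be tried without a file on disk. In this sketch, the CSV text is an assumed stand-in for the contents of purchases.csv (the article does not show the file), and `io.StringIO` lets `read_csv` read it as if it were a file.

```python
import io
import pandas as pd

# Assumed contents of purchases.csv, invented for illustration.
csv_text = """,apples,oranges
June,3,0
Robert,2,3
Lily,0,7
David,1,2
"""

# Without index_col, the names land in an ordinary column and
# pandas adds its own 0..3 index on the left.
with_extra = pd.read_csv(io.StringIO(csv_text))
print(with_extra)

# With index_col=0, the first column becomes the index.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(with_index)
```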
Read data from JSON file
Another widely used format for providing data is JSON. To read the purchases.json file, we need to write and run the following code:
df = pd.read_json('purchases.json')
df
JSON files usually do not have the extra index column problem. But sometimes pandas has difficulty parsing the structure of a JSON file. In these cases, it is necessary to set the orient keyword argument according to the structure. For more details, see the read_json documentation.
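As a rough illustration of the orient argument, the sketch below reads the same data from two JSON layouts. Both JSON strings are invented for this example: in the first, the outer keys are column names (the layout `read_json` assumes by default for a table); in the second, the outer keys are row labels, so `orient="index"` is passed.

```python
import io
import pandas as pd

# Hypothetical JSON layouts, invented for illustration.
columns_json = '{"apples": {"June": 3, "Robert": 2}, "oranges": {"June": 0, "Robert": 3}}'
index_json = '{"June": {"apples": 3, "oranges": 0}, "Robert": {"apples": 2, "oranges": 3}}'

# Outer keys are column names: the default orient works.
df_columns = pd.read_json(io.StringIO(columns_json))

# Outer keys are row labels: tell pandas with orient="index".
df_index = pd.read_json(io.StringIO(index_json), orient="index")

print(df_columns)
print(df_index)
```

Both calls produce the same table, with customers as rows and fruits as columns.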
Read data from SQL database
To get data from a database, you first need to connect to it from Python. The example here uses SQLite through the sqlite3 module, which is part of Python's standard library. If it is not available in your Python, you can install the pysqlite3 package using the following command:
pip install pysqlite3
To connect to the database through the library, type the following commands.
import sqlite3
con = sqlite3.connect("database.db")
Once the connection is established, it is time to extract the desired data. The following command reads the purchases table from the database into a DataFrame.
df = pd.read_sql_query("SELECT * FROM purchases", con)
df
If you see the extra index column in the output, use the following command to remove it.
df = df.set_index('index')
df
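The whole database round trip can be tried with an in-memory SQLite database, so no database.db file is needed. This is a sketch under assumptions: the purchases table is created here from sample data, whereas in the article it already exists in the database.

```python
import sqlite3
import pandas as pd

# An in-memory database stands in for database.db (assumption for this sketch).
con = sqlite3.connect(":memory:")

# Create the purchases table from sample data. The unnamed DataFrame
# index is written as a column called "index".
purchases = pd.DataFrame(
    {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]},
    index=["June", "Robert", "Lily", "David"],
)
purchases.to_sql("purchases", con)

# Reading the table back gives "index" as an ordinary extra column.
df = pd.read_sql_query("SELECT * FROM purchases", con)
columns_before = list(df.columns)

# Turn that column back into the row labels.
df = df.set_index("index")
print(df)

con.close()
```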
Convert DataFrame to original format
Sometimes we open data with pandas, make changes to it, and then want to save it back in its original format. A pandas DataFrame can be converted to CSV, JSON, and SQL, respectively, with the following three commands:
df.to_csv('new_purchases.csv')
df.to_json('new_purchases.json')
df.to_sql('new_purchases', con)
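A simple way to check that saving worked is to write the files and read them back. The sketch below does this for CSV and JSON in a temporary folder, which is a choice made for this example so that nothing is left on disk; the table is sample data.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame(
    {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]},
    index=["June", "Robert", "Lily", "David"],
)

# Write to a temporary folder that is deleted when the block ends.
with tempfile.TemporaryDirectory() as folder:
    csv_path = Path(folder) / "new_purchases.csv"
    json_path = Path(folder) / "new_purchases.json"

    df.to_csv(csv_path)
    df.to_json(json_path)

    # Read both files back to confirm the data survived the round trip.
    from_csv = pd.read_csv(csv_path, index_col=0)
    from_json = pd.read_json(json_path)

print(from_csv)
print(from_json)
```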
Other commands for working with data
Obviously, pandas cannot be taught completely in a single article. But we try to go far enough that after reading this article, the reader will be able to work with Pandas at an intermediate level. Important pandas commands include:
Description of the command | Command |
Shows basic information about the DataFrame, such as the number of rows and columns, the number of non-null values in each column, and the data type of each column. | DataFrame_df.info() |
Displays the first n rows of the table. If the parentheses are left empty, the first five rows are printed by default. | DataFrame_df.head(n) |
Displays the last n rows of the DataFrame. If the parentheses are left empty, the last five rows are printed by default. | DataFrame_df.tail(n) |
Returns the number of rows and columns of the DataFrame. | DataFrame_df.shape |
Builds a temporary ( temp ) DataFrame by appending the DataFrame to itself, so that changes can be tried without touching the original. DataFrame.append was removed in pandas 2.0; pd.concat does the same job. | temp_df = pd.concat([DataFrame_df, DataFrame_df]) |
Removes duplicate rows. | temp_df = temp_df.drop_duplicates() |
Finds the null (missing) values in the DataFrame. | DataFrame_df.isnull() |
Removes the rows that contain null values. | DataFrame_df.dropna() |
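The commands in the table can be tried together on a small table. In this sketch, the data is invented for illustration and includes one missing value so that isnull and dropna have something to find. `pd.concat` is used to build the temporary copy, since `DataFrame.append` is no longer available in pandas 2.0 and later.

```python
import numpy as np
import pandas as pd

# Sample data with one missing value, invented for illustration.
DataFrame_df = pd.DataFrame({
    "apples": [3, 2, np.nan, 1],
    "oranges": [0, 3, 7, 2],
})

DataFrame_df.info()               # prints a summary of columns, counts, and dtypes
print(DataFrame_df.head(2))       # first two rows
print(DataFrame_df.tail(2))       # last two rows
print(DataFrame_df.shape)         # (rows, columns)

# Temporary copy with every row repeated, then the repeats removed.
temp_df = pd.concat([DataFrame_df, DataFrame_df])
rows_with_duplicates = len(temp_df)
temp_df = temp_df.drop_duplicates()
rows_after_cleanup = len(temp_df)

# Count the missing values per column, then drop the rows that have any.
missing_per_column = DataFrame_df.isnull().sum()
cleaned = DataFrame_df.dropna()

print(rows_with_duplicates, rows_after_cleanup)
print(missing_per_column)
print(cleaned)
```

Note that dropna and drop_duplicates return new tables; DataFrame_df itself is left unchanged.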
Conclusions about the use of pandas
Pandas is widely used in data science. In fact, everyone interested in a career in data science should learn pandas. With pandas you can load data, work on it (add or remove columns, sort it, etc.), and finally save it back in its original file format. But the uses of pandas do not end here.
Other Python libraries also build on it. In this article, we tried to take a simple first step with pandas. For more advanced training, you can visit the pandas course at the school house.