Python’s Pandas dictionary-1 (Basics)

Maniswaroop
Dec 9, 2020

Hello guys!!

If you work with Python, especially in data science or as an analyst, you often need to manipulate data: querying it, merging it, building pivot tables, and grouping it with aggregation methods (mean, median, mode, min, max, etc.), all of which we can also do in SQL.

But there is an easy way to do all of this in Python with the Pandas library, and I find it simpler than SQL.
Let’s learn a few Pandas techniques that come up most often in a data scientist’s or analyst’s work life.
First, we start with the basic methods in Pandas.

  1. Read data/Import Excel files:
    Most of the time we work with .csv and .xlsx files, so let’s discuss how to read or import those files here.
  a. Import CSV file:
import pandas as pd
# Raw string (r'...') so the '\' characters in the path are kept as-is
df = pd.read_csv(r'My_folder\Pandas\file_name.csv')

# [OR] use '/' in the path, which needs no r'...' prefix
df = pd.read_csv('My_folder/Pandas/file_name.csv')

Note: In the first call we use r'file path' because the path contains ‘\’, which Python would otherwise treat as the start of an escape sequence.
In the second call we don’t need r'file path' because the path uses ‘/’.
In my experience on Windows, I ran into errors when adding a file path with ‘\’ and no r prefix.
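
To make the note above concrete, here is a minimal sketch. The file name sample.csv and its contents are made up for illustration, and the file is written to a temporary folder so the example can run anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

# A raw string keeps every backslash literally; without the r prefix,
# each backslash has to be doubled to mean the same thing.
assert r'My_folder\Pandas\file_name.csv' == 'My_folder\\Pandas\\file_name.csv'

# Hypothetical sample file, written to a temporary folder for the demo.
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / 'sample.csv'
csv_path.write_text('name,score\nAsha,81\nRavi,92\n')

# read_csv also accepts a pathlib.Path, which sidesteps the slash question.
df = pd.read_csv(csv_path)
print(df)
```

Building the path with pathlib is one option among several; the two string forms shown earlier work just as well.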

b. Import XLSX file:
Excel files use the extension .xlsx (or .xls for the older format).

import pandas as pd
# for importing a .xlsx file
df = pd.read_excel(r'My_folder\Pandas\file_name.xlsx')
# for importing a .xls file
df = pd.read_excel(r'My_folder\Pandas\file_name.xls')

2. See the data:

# a. Print the top 5 rows of df
df.head()  # [OR]
df[:5]
# b. Print the last 5 rows of df
df.tail()  # [OR]
df[-5:]

c. To check df information such as each column’s data type, the index range (total rows in df), and the number of non-null values per column.

df.info()
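
One detail worth checking when slicing by position: df[-5:] gives the last five rows, while df[5:] gives everything after the first five. A small sketch with made-up data (the column names id and value are only for illustration):

```python
import pandas as pd

# Made-up data: 8 rows, so "first 5" and "last 5" are different sets.
df = pd.DataFrame({'id': range(1, 9),
                   'value': [10, 20, 30, 40, 50, 60, 70, 80]})

first_five = df[:5]   # same rows as df.head()
last_five = df[-5:]   # same rows as df.tail()
skip_five = df[5:]    # rows AFTER the first five, not the last five

print(first_five.equals(df.head()))
print(last_five.equals(df.tail()))
print(len(skip_five))

# Prints column dtypes, non-null counts and the index range.
df.info()
```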

3. Change column’s data type or dtype:
a. Convert ‘object’ to ‘numeric’:

# If all the values are numeric, without letters or special characters.
df['column1'] = df['column1'].astype('int')
df['column1'] = df['column1'].astype('float')
# (or)
df['column1'] = pd.to_numeric(df['column1'])

b. Convert ‘numeric’ to ‘object’:

df['column1'] = df['column1'].astype('object')

c. Convert Date column to date_type:

df['Date'] = pd.to_datetime(df['Date'])
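
Here is a small sketch of these conversions on made-up data. The column names qty and price are hypothetical, and errors='coerce' is an optional extra that is handy when a column has a few values that cannot be parsed.

```python
import pandas as pd

# Made-up data: every column starts out as text.
df = pd.DataFrame({
    'qty': ['1', '2', '3'],
    'price': ['9.5', 'n/a', '12'],
    'Date': ['2020-12-01', '2020-12-05', '2020-12-09'],
})

# Works because every value in 'qty' is a clean integer string.
df['qty'] = df['qty'].astype('int')

# errors='coerce' turns values that cannot be parsed ('n/a') into NaN
# instead of raising an error.
df['price'] = pd.to_numeric(df['price'], errors='coerce')

# Parse the text dates into datetime values.
df['Date'] = pd.to_datetime(df['Date'])

print(df.dtypes)
```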

4. Central Tendency:
As we know, the measures of central tendency are mean, median, and mode.

a. Mean:

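# To check the mean (average) of all numeric columns.
# numeric_only=True skips text columns, which recent Pandas versions no longer skip on their own.
df.mean(numeric_only=True)
# To check the mean of one column.
df['Column_name'].mean()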

b. Median:

# To check the median of all numeric columns.
df.median(numeric_only=True)
# To check the median of one column.
df['Column_name'].median()

c. Mode:

# To check the mode of all numeric columns (plain df.mode() covers every column).
df.mode(numeric_only=True)
# To check the mode of one column.
df['Column_name'].mode()
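
As a quick illustration, here are the three measures on a tiny made-up DataFrame (the column names city, sales and visits are only for the example):

```python
import pandas as pd

# Made-up data with one text column and two numeric columns.
df = pd.DataFrame({
    'city': ['Pune', 'Delhi', 'Pune', 'Chennai'],
    'sales': [1, 2, 2, 5],
    'visits': [10, 20, 30, 40],
})

# numeric_only=True leaves the text column 'city' out of the calculation.
means = df.mean(numeric_only=True)
medians = df.median(numeric_only=True)

# mode() returns a Series because a column can have several modes (ties).
sales_mode = df['sales'].mode()

print(means)
print(medians)
print(sales_mode)
```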

d. Summary statistics of df:

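· Here you’ll get count, mean, std (standard deviation), min, 25% value, 50% value (median), 75% value, and max for each numerical column.
· For categorical (object) columns we get count, unique, top, and freq. When df also has numeric columns, describe() leaves the categorical ones out by default; pass include='object' or include='all' to see them.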

# For all numeric columns
df.describe()
# For all columns, including categorical (object) ones
df.describe(include='all')
# For selected columns.
df[['Column1', 'Column2', 'Column3']].describe()
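
A short sketch of the difference between the default summary and include='all', using made-up data (the column names category and price are hypothetical):

```python
import pandas as pd

# Made-up data: one categorical column and one numeric column.
df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'C'],
    'price': [10.0, 20.0, 30.0, 40.0],
})

# Default: only the numeric column is summarised.
numeric_summary = df.describe()

# include='all': adds count, unique, top and freq for the categorical column.
full_summary = df.describe(include='all')

print(numeric_summary)
print(full_summary)
```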

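5. Delete or drop values from df:

# Drop columns (drop returns a new DataFrame, so assign it back to keep the change):
df = df.drop(['Column1', 'column2'], axis=1)
# Drop rows at index positions 1 to 9 (the end of the slice, 10, is excluded)
df = df.drop(df.index[1:10])
# Or, if you want to drop rows based on a condition or a column category
df = df.loc[df['column1'] != 'Blank']  # [OR]
df = df.loc[~(df['column1'] == 'Blank')]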
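
The sketch below runs these three kinds of drop on made-up data. Two things it tries to show: drop() leaves the original df untouched unless you assign the result, and the comparison has to sit inside parentheses when negating with ~, because ~ is applied before ==.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    'column1': ['x', 'Blank', 'y', 'Blank', 'z'],
    'column2': [1, 2, 3, 4, 5],
    'column3': [10, 20, 30, 40, 50],
})

# Drop a column; df itself is not modified.
without_col = df.drop(['column3'], axis=1)

# Drop the rows at positions 1 and 2 (the slice end, 3, is excluded).
without_rows = df.drop(df.index[1:3])

# Keep only the rows where column1 is not 'Blank'.
kept = df.loc[~(df['column1'] == 'Blank')]

print(without_col.columns.tolist())
print(without_rows.index.tolist())
print(kept)
```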

6. Missing Values:

To check the number of NaN (missing) values in each column:

df.isnull().sum()

To check the % of NaN (missing) values in each column (multiply the fraction by 100 to get a percentage):

df.isnull().sum() / len(df.index) * 100
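
A minimal sketch on made-up data. It uses df.isnull().mean() as a shortcut for dividing the count by the number of rows; multiplying by 100 turns the fraction into a percentage.

```python
import numpy as np
import pandas as pd

# Made-up data with some missing values.
df = pd.DataFrame({
    'a': [1.0, np.nan, 3.0, np.nan],
    'b': ['x', None, 'y', 'z'],
    'c': [1, 2, 3, 4],
})

# Number of missing values per column.
missing_count = df.isnull().sum()

# Share of missing values per column, as a percentage.
missing_pct = df.isnull().mean() * 100

print(missing_count)
print(missing_pct)
```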

For the Python’s Pandas dictionary-2 (Advanced) techniques, follow the link below.

https://maniswaroop.medium.com/pythons-pandas-dictionary-2-advanced-87389aeb8173
