Python’s Pandas dictionary-2 (Advanced)

Maniswaroop
5 min readDec 11, 2020

--

Here you’ll see the advanced techniques like Query, Group by aggregation, Pivot Table, Melt, Concat, Append, Merge (Joins) etc.

1. Data Query:

There are multiple scenarios and ways to query the data.
Let’s assume the below data frame we are using for query.

a. Select a column from df.

b. Select multiple columns from df

c. Select data using .loc or .iloc or .ix functions.
In some pandas versions .ix is not working or it will give some warnings, there you can warnings by importing ‘warnings’ library.

The working nature of .loc, .iloc and .ix are probably same, but syntax difference.

Use,
.loc function when you know required index names or number and column’s names.
.iloc function when you want to filter rows based on index number and column’s order number.
.ix is very flexible, it has both working nature like both .loc and .iloc functions had.

From the below code ‘[ : ,:]‘ or ‘[ : ]’ means all (rows and columns).
where, ‘ : ‘ before ‘ , ‘ ([:,) means querying rows AND
‘ : ’ after ‘ , ‘ (:]) means querying columns.

d. Select data frame based on condition of column’s value or category.

Query the data where ‘col_D’ contains ‘a’

And

Query the data where ‘col_B’ contains data ≤ 50

e. Select data frame based on multiple condition of column’s value or category.

f. Select data frame between dates.

g. Query the data by using ‘string’ name.
Here, we query the data using ‘date’ column by converting it to ‘object’ or ‘str’ dtype.

h. Using query function:
Another way to do query the data in Pandas is using query function.

2. Group By Data:

Below Data Frame we will use for Group By practice.

Note: In below coding snaps you can see we used ‘reset_index()’ will helps you to return proper data set.

a. Group by SUM:

b. Group by MEAN:

c. Group by MIN:

d. Group by MAX:

e. Group by Count:

f. Group by Count %:

g. Group by ROLLING MEAN:
Rolling is the shifting the selected column’s rows data with given aggregation functions like Mean, Median, Sum etc.
.rolling(3).mean() means taking the mean of next 3 numerical rows in the current row.
.rolling(3) will shift the data to next 3 steps
.rolling(-3) will shift the data to previous 3 steps.

3. Pivot Tables:

People who knows MS Excel, might also familiar with Pivot Tables. Pivot Tables are one form on Group By methods.

In below coding we did SUM and MEAN aggregations for selected columns.

4. Melt Data:

Melt option will create a relation for multiple Columns and their values with in 2 columns.

5. Concat:

Concatenation used to combine 2 or more Data Frames either Row wise or Column wise.
Column wise Concatenation
:

Row wise Concatenation (also called as append data):
By default ‘axis=0’for Row wise concatenation.

6. Append:
Append also same as Row wise concatenation but syntax is different.
Syntax is below.

df.append(New_df)

7. Merge:

People who have knowledge on SQL might also familiar with joins also known as Merging in Pandas.
There are multiple types of Joins like Inner join, Right Join, Left Join and Outer Join in Pandas.

a. Inner Join:
Inner join combines 2 Data frames by mapping the row’s values of selected columns which is ‘on = ‘Views_show’ in the below code.
In the ‘New_df’ Dataframe we change the ‘Views_platform’ column name to ‘Col_A’ for our better understanding to the output DF of Inner join dataframe.
Note: From below result after join two dataframes you can see the shape is (40,9).
df’s shape: (80,9) and New_df’s shape: (40,9)

b. Left Join:
From below figure, we add ‘New_df ’ columns in ‘df ’data frame where ‘views_show’ column from ‘df’ dataframe matches the same values of ‘New_df’ dataframe.
Note: You can absorb that ‘col_A’ column from ‘New_df’ dataframe added in final output dataframe.
and the shape is (80, 9) because df’s shape is (80,9).

C. Right Join:
Here, we add ‘df ’ columns in ‘New_df ’data frame where ‘views_show’ column from ‘New_df’ dataframe matches the same values of ‘df’ dataframe.
Note: final output dataframe’s shape is (40, 9) because New_df’s shape is (40,9).

d. Outer Join:
Here, we combine both ‘New_df ’and ‘df’ dataframe without any mapping.
Note: final output dataframe’s shape is (80, 9) because higher len df’s shape is (80,9).

Hope you enjoyed learning, Please write your comments to add more methods.

Thank You!

--

--

No responses yet