Pandas: DataFrame Operations

Pandas DataFrame Analysis

View and analyze your data frames with built-in Pandas methods. Stats made simple!

Example 1: Get a quick statistical summary.

import pandas as pd  # every example in this post assumes this import

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
print(df.describe())

# Output:
#          A    B
# count  3.0  3.0
# mean   2.0  5.0
# std    1.0  1.0
# min    1.0  4.0
# 25%    1.5  4.5
# 50%    2.0  5.0
# 75%    2.5  5.5
# max    3.0  6.0

Example 2: Calculate column-wise means.

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
print(df.mean())

# Output:
# A    2.0
# B    5.0
# dtype: float64
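
mean() averages down each column by default. Here is a minimal sketch of a few common variants. The Label column is a made-up text column, not part of the example above, added only to show how non-numeric data is skipped:

```python
import pandas as pd

# "Label" is a hypothetical text column, added only to show numeric_only.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "Label": ["x", "y", "z"]})

# Column-wise means, skipping the non-numeric column.
col_means = df.mean(numeric_only=True)

# Row-wise means across the numeric columns only.
row_means = df[["A", "B"]].mean(axis=1)

# Several statistics at once; the result has one row per statistic.
summary = df[["A", "B"]].agg(["mean", "median", "max"])

print(col_means)
print(row_means)
print(summary)
```

Passing axis=1 switches from "down each column" to "across each row", which is usually the part people mix up.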

Example 3: Return the first n rows in a data frame.

data = {
    "Name": ["John", "Alice", "Bob", "Emma", "Mike"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Paris", "London", "Sydney", "Tokyo"],
}
df = pd.DataFrame(data)
print(df.head(3))

# Output:
#     Name  Age      City
# 0   John   25  New York
# 1  Alice   30     Paris
# 2    Bob   35    London

Example 4: Return the last n rows in a data frame.

data = {
    "Name": ["John", "Alice", "Bob", "Emma", "Mike"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Paris", "London", "Sydney", "Tokyo"],
}
df = pd.DataFrame(data)
print(df.tail(3))

# Output:
#    Name  Age    City
# 2   Bob   35  London
# 3  Emma   28  Sydney
# 4  Mike   32   Tokyo

Example 5: Get the data frame information.

data = {
    "Name": ["John", "Alice", "Bob", "Emma", "Mike"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Paris", "London", "Sydney", "Tokyo"],
}
df = pd.DataFrame(data)
df.info()  # info() prints the summary itself and returns None

# Output:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      int64
#  2   City    5 non-null      object
# dtypes: int64(1), object(2)
# memory usage: 248.0+ bytes

💡 Pro tip: Let Pandas do the math while you grab a coffee.

Pandas DataFrame Manipulation

Add, update, or drop columns and rows with ease.

Example 1: Add and drop columns.

data = {
    "Name": ["John", "Alice", "Bob", "Emma", "Mike"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Paris", "London", "Sydney", "Tokyo"],
}
df = pd.DataFrame(data)

df["Country"] = ["United States", "France", "United Kingdom", "Australia", "Japan"]
print(df.head(2))

print("================")

df.drop("Age", axis=1, inplace=True)
# To drop multiple: df.drop(["Age", "City"], axis=1, inplace=True)
# Using columns: df.drop(columns="Age", inplace=True)
print(df.head(2))

# Output:
#     Name  Age      City        Country
# 0   John   25  New York  United States
# 1  Alice   30     Paris         France
# ================
#     Name      City        Country
# 0   John  New York  United States
# 1  Alice     Paris         France

Example 2: Insert and drop rows.

data = {
    "Name": ["John", "Alice", "Bob", "Emma", "Mike"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Paris", "London", "Sydney", "Tokyo"],
}
df = pd.DataFrame(data)

df.loc[len(df.index)] = ["Drake", 32, "Bangkok"]
# To replace: df.loc[2] = ["Drake", 32, "Bangkok"]
print(df.tail(2))

print("================")

df.drop(1, axis=0, inplace=True)
# To drop multiple: df.drop([1, 3], axis=0, inplace=True)
# Using index: df.drop(index=1, inplace=True)
print(df.head(2))

# Output:
#     Name  Age     City
# 4   Mike   32    Tokyo
# 5  Drake   32  Bangkok
# ================
#    Name  Age      City
# 0  John   25  New York
# 2   Bob   35    London
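
One caveat with df.loc[len(df.index)]: it assumes the index labels still run from 0 to n-1. After a drop, len(df.index) can match a label that already exists, and the assignment then overwrites that row instead of appending. Here is a hedged sketch of the problem and of pd.concat as an alternative, using a shortened, made-up version of the data above:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Alice", "Bob"], "Age": [25, 30, 35]})
df = df.drop(index=0)  # index labels are now 1 and 2

# len(df.index) is 2, and label 2 already exists, so this overwrites Bob.
overwritten = df.copy()
overwritten.loc[len(overwritten.index)] = ["Drake", 32]

# pd.concat appends regardless of the existing labels;
# ignore_index=True renumbers the result from 0.
new_rows = pd.DataFrame({"Name": ["Drake", "Emma"], "Age": [32, 28]})
combined = pd.concat([df, new_rows], ignore_index=True)

print(overwritten)
print(combined)
```

pd.concat is also the more natural choice when adding several rows at once.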

Example 3: Rename column names and indexes.

data = {
    "Name": ["John", "Alice", "Bob", "Emma", "Mike"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Paris", "London", "Sydney", "Tokyo"],
}
df = pd.DataFrame(data)

df.rename(columns={"City": "Address"}, inplace=True)
# Using mapper: df.rename(mapper={"City": "Address"}, axis=1, inplace=True)
print(df.head(2))

print("================")

df.rename(index={1: 100}, inplace=True)
# Using mapper: df.rename(mapper={1: 100}, axis=0, inplace=True)
print(df.head(2))

# Output:
#     Name  Age   Address
# 0   John   25  New York
# 1  Alice   30     Paris
# ================
#       Name  Age   Address
# 0     John   25  New York
# 100  Alice   30     Paris

Think of it like playing Tetris but with data.

Pandas Indexing and Slicing

Access data by label or by position, no guessing required.

Example 1: Indexing by column or row.

df = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})
print(df['A'])  # Access column
print(df.iloc[0])  # Access first row by position
print(df.loc[0])  # Access first row by label

Example 2: Slice rows and columns.

print(df.iloc[:1])  # First row
print(df[['A', 'B']])  # Selected columns
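
One detail worth seeing in code: iloc slices exclude the end position, while loc slices include the end label. Here is a small sketch, assuming a made-up frame with string labels so the two are easy to tell apart:

```python
import pandas as pd

# String index labels are an assumption for illustration only.
df = pd.DataFrame({"A": [10, 20, 30], "B": [40, 50, 60]}, index=["x", "y", "z"])

# Position-based slice: the end position is excluded, like a Python list.
by_position = df.iloc[0:2]

# Label-based slice: the end label is included.
by_label = df.loc["x":"y"]

# Both select rows "x" and "y" here.
print(by_position.equals(by_label))

# Rows and columns together: a label slice for rows, a list for columns.
subset = df.loc["y":"z", ["B"]]
print(subset)
```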

Indexing in Pandas is like peeling an onion: layer by layer.

Pandas Select

Filter specific rows or data based on conditions.

Example 1: Simple condition.

print(df[df['A'] > 15])

Example 2: Multiple conditions.

# Parenthesize each condition; with the df above, no row passes both, so the result is empty.
print(df[(df['A'] > 10) & (df['B'] < 40)])
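
Beyond & (and), a few other filtering patterns come up often. This is a sketch on a slightly larger made-up frame, not the one used above:

```python
import pandas as pd

# Illustrative data; three rows make the different filters easier to compare.
df = pd.DataFrame({"A": [10, 20, 30], "B": [30, 40, 50]})

# OR: keep rows where either condition holds (rows 0 and 2).
either = df[(df["A"] > 20) | (df["B"] < 40)]

# NOT: invert a condition with ~ (row 0).
negated = df[~(df["A"] > 15)]

# Membership test against a list of values (rows 0 and 2).
members = df[df["A"].isin([10, 30])]

# The same kind of filter written as a query string (row 1).
queried = df.query("A > 10 and B < 50")

print(either, negated, members, queried, sep="\n\n")
```

Python's plain and, or, and not do not work on whole columns, which is why the operators &, |, and ~ are used with parentheses around each condition.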

It’s like swiping right on the rows you love.

Pandas MultiIndex

For when a single index just isn’t enough.

Example: Create and use a MultiIndex.

arrays = [['A', 'A', 'B'], [1, 2, 1]]
index = pd.MultiIndex.from_arrays(arrays, names=('Letter', 'Number'))
df = pd.DataFrame({'Value': [10, 20, 30]}, index=index)
print(df)
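
The example above creates the MultiIndex. Here is a minimal sketch of how one might then select from it, reusing the same data:

```python
import pandas as pd

arrays = [["A", "A", "B"], [1, 2, 1]]
index = pd.MultiIndex.from_arrays(arrays, names=("Letter", "Number"))
df = pd.DataFrame({"Value": [10, 20, 30]}, index=index)

# All rows under the outer label "A"; the result is indexed by Number.
letter_a = df.loc["A"]

# One exact cell: pass a tuple with one label per level.
single = df.loc[("A", 2), "Value"]

# Cross-section on the inner level: every row where Number == 1.
number_1 = df.xs(1, level="Number")

# Move the index levels back into ordinary columns.
flat = df.reset_index()

print(letter_a, single, number_1, flat, sep="\n\n")
```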

MultiIndex is for the overachievers. You know who you are.

Pandas Reshape

Reshape your data with melt and pivot.

Example 1: Use melt for long format.

df = pd.DataFrame({'ID': [1, 2], 'Value': [10, 20]})
melted = pd.melt(df, id_vars='ID')
print(melted)

Example 2: Use pivot for wide format.

pivoted = melted.pivot(index='ID', columns='variable', values='value')
print(pivoted)
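
With a single value column, melt does not change much. The round trip is easier to see with two measurements per ID. This is a sketch with made-up Height and Weight columns, and the names Measure and Reading are arbitrary choices:

```python
import pandas as pd

# Hypothetical wide table: one row per ID, one column per measurement.
wide = pd.DataFrame({"ID": [1, 2], "Height": [170, 180], "Weight": [65, 80]})

# Wide to long: one row per (ID, measurement) pair.
long = pd.melt(wide, id_vars="ID", var_name="Measure", value_name="Reading")

# Long back to wide: pivot, then turn ID back into a regular column
# and drop the leftover "Measure" label on the column axis.
restored = (
    long.pivot(index="ID", columns="Measure", values="Reading")
    .reset_index()
    .rename_axis(columns=None)
)

print(long)
print(restored)
```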

Shape your data like a pro.

Pandas Duplicate Values

Find and remove duplicate rows. Because no one likes redundancy.

Example 1: Detect duplicates.

df = pd.DataFrame({'A': [1, 1, 2], 'B': [3, 3, 4]})
print(df.duplicated())

Example 2: Drop duplicates.

df.drop_duplicates(inplace=True)
print(df)
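
By default both methods compare entire rows and treat the first occurrence as the original. Here is a sketch of the subset and keep parameters, using made-up Name and City data:

```python
import pandas as pd

# Illustrative data: "John" appears three times, twice with the same city.
df = pd.DataFrame({
    "Name": ["John", "John", "Alice", "John"],
    "City": ["Paris", "Paris", "Tokyo", "London"],
})

# Whole-row duplicates: only row 1 repeats an earlier row exactly.
full_dupes = df.duplicated()

# Compare on one column only: rows 1 and 3 repeat the name in row 0.
name_dupes = df.duplicated(subset="Name")

# keep=False flags every copy, including the first occurrence.
all_copies = df.duplicated(subset="Name", keep=False)

# Keep the last row per name; returns a new frame, df is unchanged.
last_per_name = df.drop_duplicates(subset="Name", keep="last")

print(last_per_name)
```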

Duplicates, begone!

Pandas Pivot

Reorganize your data with pivot.

Example: Pivot data.

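data = {'Date': ['2024-01-01', '2024-01-02'], 'Category': ['A', 'B'], 'Value': [10, 20]}
df = pd.DataFrame(data)
# One row per Date, one column per Category, cells filled from Value
pivoted = df.pivot(index='Date', columns='Category', values='Value')
print(pivoted)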

Pivots make data feel fancy.

Pandas Pivot Table

Summarize data with a pivot table.

Example: Create a pivot table.

data = {'Category': ['A', 'A', 'B'], 'Value': [10, 20, 30]}
df = pd.DataFrame(data)
pivot_table = df.pivot_table(values='Value', index='Category', aggfunc='sum')
print(pivot_table)
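
The practical difference from pivot is that pivot_table aggregates repeated index/column pairs, where pivot raises an error. Here is a minimal sketch, assuming a made-up Region column alongside the data above:

```python
import pandas as pd

# "Region" is a hypothetical column; (A, East) appears twice on purpose.
df = pd.DataFrame({
    "Category": ["A", "A", "B", "B"],
    "Region": ["East", "East", "East", "West"],
    "Value": [10, 20, 30, 40],
})

# pivot() cannot reshape when an index/column pair is duplicated.
try:
    df.pivot(index="Category", columns="Region", values="Value")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table() sums the duplicates; fill_value=0 fills the missing (A, West) cell.
table = df.pivot_table(
    values="Value", index="Category", columns="Region",
    aggfunc="sum", fill_value=0,
)
print(table)
```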

You’re basically at Excel-level now.
