Pandas: Introduction

Getting Started with Pandas

Pandas is a powerful Python library for data manipulation and analysis, and your gateway to managing data effectively.

It’s like Excel, but smarter and much less likely to crash when you sort something.

First step: import it into your project. By convention, it is aliased as pd.

import pandas as pd

Pandas Series

A Pandas Series is a one-dimensional labeled array. Think of it as a fancy list with labels (indices). We can access values by their indices.

Example 1: Create a basic Series from a list.

s = pd.Series([10, 20, 30])
print(s)

# Output:
# 0    10
# 1    20
# 2    30
# dtype: int64

Example 2: Assign custom indices to the elements.

s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s)

# Output:
# a    10
# b    20
# c    30
# dtype: int64

Example 3: Create a Series from a dictionary. The keys become the index labels.

s = pd.Series({"bmw": 10, "honda": 9, "suzuki": 8})
print(s)

# Output:
# bmw       10
# honda      9
# suzuki     8
# dtype: int64

Example 4: Create a Series by selecting specific keys from a dictionary.

s = pd.Series({'bmw': 10, 'honda': 9, 'suzuki': 8}, index=['bmw', 'honda'])
print(s)

# Output:
# bmw      10
# honda     9
# dtype: int64
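
Accessing values by their indices, as mentioned at the start of this section, could look like the sketch below. The variable names are only illustrative; the labels come from Example 2.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s["b"]          # label-based lookup with square brackets
by_loc = s.loc["c"]        # explicit label-based lookup
by_position = s.iloc[0]    # explicit position-based lookup
subset = s[["a", "c"]]     # a list of labels returns a new Series

print(by_label, by_loc, by_position)
print(subset)
```

Using .loc for labels and .iloc for positions keeps the intent unambiguous, which matters once the labels themselves are integers.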

💡

Fun fact: A Series can hold any data type. Yes, even your favorite memes. For mixed or arbitrary Python objects, the dtype will be object.
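
A quick way to see this for yourself is to compare a mixed Series with a purely numeric one. This is a minimal sketch; the values are made up for illustration.

```python
import pandas as pd

mixed = pd.Series([1, "two", 3.0])   # int, str, and float in one Series
numbers = pd.Series([1, 2, 3])       # integers only

print(mixed.dtype)    # object
print(numbers.dtype)  # int64
```

The object dtype is flexible, but numeric operations on it tend to be slower than on a native numeric dtype.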

Pandas DataFrame

A DataFrame is a two-dimensional labeled data structure. Think of it as a table of rows and columns, where each column is a Series. It’s the bread and butter of Pandas.

Example 1: Create a DataFrame from a dictionary.

data = {
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
}
df = pd.DataFrame(data)
print(df)

# Output:
#     Name  Age
# 0  Alice   25
# 1    Bob   30

Example 2: Create a DataFrame from lists.

data = [[1, "John"], [2, "Bob"]]
df = pd.DataFrame(data, columns=["ID", "Name"])
print(df)

# Output:
#    ID  Name
# 0   1  John
# 1   2   Bob

💡

Pro tip: DataFrames can hold multiple data types, just like that one junk drawer in your kitchen.
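
To check what is in the drawer, you can inspect the dtype of each column with df.dtypes. The Height and Member columns below are assumptions added for illustration; they are not part of the earlier examples.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "Height": [1.65, 1.80],     # hypothetical column
    "Member": [True, False],    # hypothetical column
})

# One dtype per column; the exact label shown for the text column
# depends on the pandas version.
print(df.dtypes)
```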

Pandas Index

An Index is the array of labels that identifies the rows of a Series or DataFrame; column names are stored in an Index as well. Labels can be customized.

Example 1: Setting a column as an Index.

data = {
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
}
df = pd.DataFrame(data)
df.set_index("Name", inplace=True)
print(df)

# Output:
#        Age
# Name
# Alice   25
# Bob     30
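
Once a column is the index, its values can be used to look up rows by label. Here is a minimal sketch using .loc on the same data; it uses the non-inplace form of set_index, which returns a new DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
df = df.set_index("Name")        # returns a new DataFrame

bob_age = df.loc["Bob", "Age"]   # row label, then column label
alice_row = df.loc["Alice"]      # a whole row comes back as a Series

print(bob_age)    # 30
print(alice_row)
```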

Example 2: Setting a range as an Index.

data = {
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
}
df = pd.DataFrame(data, index=pd.RangeIndex(100, 102, name="Index"))
print(df)

# Output:
#         Name  Age
# Index
# 100    Alice   25
# 101      Bob   30

Example 3: Renaming an index label.

data = {
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
}
df = pd.DataFrame(data, index=pd.RangeIndex(100, 102, name="Index"))
df.rename(index={101: "First"}, inplace=True)
print(df)

# Output:
#         Name  Age
# Index
# 100    Alice   25
# First    Bob   30
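
Note that rename(index=...) changes individual labels, not the name of the index itself. If you want to rename the index as a whole, rename_axis is one way to do it. A small sketch, where "RowID" is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob"], "Age": [25, 30]},
    index=pd.RangeIndex(100, 102, name="Index"),
)

renamed = df.rename_axis("RowID")   # returns a new DataFrame

print(renamed.index.name)  # RowID
print(df.index.name)       # Index (the original is unchanged)
```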

Example 4: Resetting an Index.

data = {
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
}
df = pd.DataFrame(data)
df.set_index("Name", inplace=True)
df.reset_index(inplace=True)
print(df)

# Output:
#     Name  Age
# 0  Alice   25
# 1    Bob   30

Example 5: Listing Indices.

data = {
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
}
df = pd.DataFrame(data)
print(df.index)
print(df.index.values)

# Output:
# RangeIndex(start=0, stop=2, step=1)
# [0 1]

💡

Remember, the index is your friend, so use it wisely. A few more index types are CategoricalIndex, DatetimeIndex, and IntervalIndex.
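
As a rough illustration of two of these index types, the sketch below builds a DatetimeIndex with pd.date_range and a CategoricalIndex directly. The dates, counts, and size labels are made-up sample values.

```python
import pandas as pd

# DatetimeIndex: three consecutive days
dates = pd.date_range("2024-01-01", periods=3, freq="D")
visits = pd.Series([5, 8, 3], index=dates)

print(type(visits.index).__name__)             # DatetimeIndex
print(visits.loc[pd.Timestamp("2024-01-02")])  # 8

# CategoricalIndex: repeated labels drawn from a small set of categories
sizes = pd.CategoricalIndex(["S", "M", "S"])
print(sizes.categories.tolist())               # ['M', 'S']
```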

Pandas Array

Pandas builds on NumPy arrays for efficient data storage, and it also provides its own array type through pd.array.

Example 1: Create a pandas array and wrap it in a Series.

arr = pd.array(["John", "Alice", "Bob"])
arr_series = pd.Series(arr)
print(arr_series)

# Output:
# 0     John
# 1    Alice
# 2      Bob
# dtype: string

Example 2: Convert a column to a NumPy array.

data = {
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
}
df = pd.DataFrame(data)
arr = df["Name"].to_numpy()
print(arr)

# Output:
# ['Alice' 'Bob']

Example 3: Modify data using NumPy.

data = {
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
}
df = pd.DataFrame(data)
arr = df["Age"].to_numpy()
df["Age"] = arr + 2
print(df)

# Output:
#     Name  Age
# 0  Alice   27
# 1    Bob   32

💡

Fun fact: If you’re a NumPy fan, you’ll feel right at home here.
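
For instance, NumPy functions can be applied to pandas objects directly. The sketch below assumes a small Series of ages similar to the earlier examples.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 30], index=["Alice", "Bob"])

doubled = np.multiply(ages, 2)        # ufuncs return a Series and keep the index
mean_age = np.mean(ages.to_numpy())   # plain NumPy on the underlying array

print(doubled)
print(mean_age)   # 27.5
```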
