# Pandas: DataFrame Operations

## Pandas DataFrame Analysis

View and analyze your data frames with built-in Pandas methods. Stats made simple!

Example 1: Get a quick statistical summary.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
print(df.describe())

# Output:
#          A    B
# count  3.0  3.0
# mean   2.0  5.0
# std    1.0  1.0
# min    1.0  4.0
# 25%    1.5  4.5
# 50%    2.0  5.0
# 75%    2.5  5.5
# max    3.0  6.0
```

Example 2: Calculate column-wise means.

```python
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
print(df.mean())

# Output:
# A    2.0
# B    5.0
# dtype: float64
```

Example 3: Return the first n rows in a data frame.

```python
data = {
    "Name": ["John", "Alice", "Bob", "Emma", "Mike"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Paris", "London", "Sydney", "Tokyo"],
}
df = pd.DataFrame(data)
print(df.head(3))

# Output:
#     Name  Age      City
# 0   John   25  New York
# 1  Alice   30     Paris
# 2    Bob   35    London
```

Example 4: Return the last n rows in a data frame.

```python
data = {
    "Name": ["John", "Alice", "Bob", "Emma", "Mike"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Paris", "London", "Sydney", "Tokyo"],
}
df = pd.DataFrame(data)
print(df.tail(3))

# Output:
#    Name  Age    City
# 2   Bob   35  London
# 3  Emma   28  Sydney
# 4  Mike   32   Tokyo
```

Example 5: Get the data frame information.

```python
data = {
    "Name": ["John", "Alice", "Bob", "Emma", "Mike"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Paris", "London", "Sydney", "Tokyo"],
}
df = pd.DataFrame(data)
df.info()  # info() prints the summary itself and returns None, so no print() is needed

# Output:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      int64
#  2   City    5 non-null      object
# dtypes: int64(1), object(2)
# memory usage: 248.0+ bytes
```
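One thing worth knowing: the frame from Examples 3 to 5 mixes text and numeric columns, and in recent Pandas versions (2.0 and later) calling `df.mean()` on such a frame raises an error instead of quietly skipping the text columns. Here is a minimal sketch, assuming you only want statistics for the numeric columns. The variable names `numeric_means` and `age_summary` are just illustrative.

```python
import pandas as pd

data = {
    "Name": ["John", "Alice", "Bob", "Emma", "Mike"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Paris", "London", "Sydney", "Tokyo"],
}
df = pd.DataFrame(data)

# Restrict the calculation to numeric columns; "Name" and "City" are ignored.
numeric_means = df.mean(numeric_only=True)
print(numeric_means)  # only "Age" remains, with a mean of 30.0

# describe() also works on a single column (a Series).
age_summary = df["Age"].describe()
print(age_summary)
```

Selecting the column first (`df["Age"].mean()`) would work just as well when you care about a single column.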

💡 Pro tip: Let Pandas do the math while you grab a coffee.

---

## Pandas DataFrame Manipulation

Add, update, or drop columns and rows with ease.

Example 1: Add and drop columns.

```python
data = {
    "Name": ["John", "Alice", "Bob", "Emma", "Mike"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Paris", "London", "Sydney", "Tokyo"],
}
df = pd.DataFrame(data)

df["Country"] = ["United States", "France", "United Kingdom", "Australia", "Japan"]
print(df.head(2))
print("================")

df.drop("Age", axis=1, inplace=True)
# To drop multiple: df.drop(["Age", "City"], axis=1, inplace=True)
# Using columns: df.drop(columns="Age", inplace=True)
print(df.head(2))

# Output:
#     Name  Age      City        Country
# 0   John   25  New York  United States
# 1  Alice   30     Paris         France
# ================
#     Name      City        Country
# 0   John  New York  United States
# 1  Alice     Paris         France
```

Example 2: Insert and drop rows.

```python
data = {
    "Name": ["John", "Alice", "Bob", "Emma", "Mike"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Paris", "London", "Sydney", "Tokyo"],
}
df = pd.DataFrame(data)

df.loc[len(df.index)] = ["Drake", 32, "Bangkok"]
# To replace: df.loc[2] = ["Drake", 32, "Bangkok"]
print(df.tail(2))
print("================")

df.drop(1, axis=0, inplace=True)
# To drop multiple: df.drop([1, 3], axis=0, inplace=True)
# Using index: df.drop(index=1, inplace=True)
print(df.head(2))

# Output:
#     Name  Age     City
# 4   Mike   32    Tokyo
# 5  Drake   32  Bangkok
# ================
#    Name  Age      City
# 0  John   25  New York
# 2   Bob   35    London
```

Example 3: Rename column names and indexes.
```python
data = {
    "Name": ["John", "Alice", "Bob", "Emma", "Mike"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Paris", "London", "Sydney", "Tokyo"],
}
df = pd.DataFrame(data)

df.rename(columns={"City": "Address"}, inplace=True)
# Using mapper: df.rename(mapper={"City": "Address"}, axis=1, inplace=True)
print(df.head(2))
print("================")

df.rename(index={1: 100}, inplace=True)
# Using mapper: df.rename(mapper={1: 100}, axis=0, inplace=True)
print(df.head(2))

# Output:
#     Name  Age   Address
# 0   John   25  New York
# 1  Alice   30     Paris
# ================
#       Name  Age   Address
# 0     John   25  New York
# 100  Alice   30     Paris
```

Think of it like playing Tetris, but with data.

---

## Pandas Indexing and Slicing

Access data with labels or positions, no guessing required.

Example 1: Indexing by column or row.

```python
df = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})
print(df['A'])     # Access column
print(df.iloc[0])  # Access first row by position
print(df.loc[0])   # Access first row by label
```

Example 2: Slice rows and columns.

```python
print(df.iloc[:1])     # First row
print(df[['A', 'B']])  # Selected columns
```

Indexing in Pandas is like peeling an onion: layer by layer.

---

## Pandas Select

Filter rows based on conditions.

Example 1: Simple condition.

```python
# Reuses df from the indexing examples above
print(df[df['A'] > 15])
```

Example 2: Multiple conditions.

```python
# With the sample df no row satisfies both conditions, so this prints an empty frame
print(df[(df['A'] > 10) & (df['B'] < 40)])
```

It’s like swiping right on the rows you love.

---

## Pandas MultiIndex

For when a single index just isn’t enough.

Example: Create and use a MultiIndex.

```python
arrays = [['A', 'A', 'B'], [1, 2, 1]]
index = pd.MultiIndex.from_arrays(arrays, names=('Letter', 'Number'))
df = pd.DataFrame({'Value': [10, 20, 30]}, index=index)
print(df)
```

MultiIndex is for the overachievers. You know who you are.

---

## Pandas Reshape

Reshape your data with `melt` and `pivot`.

Example 1: Use `melt` for long format.
```python
df = pd.DataFrame({'ID': [1, 2], 'Value': [10, 20]})
melted = pd.melt(df, id_vars='ID')
print(melted)
```

Example 2: Use `pivot` for wide format.

```python
pivoted = melted.pivot(index='ID', columns='variable', values='value')
print(pivoted)
```

Shape your data like a pro.

---

## Pandas Duplicate Values

Find and remove duplicate rows. Because no one likes redundancy.

Example 1: Detect duplicates.

```python
df = pd.DataFrame({'A': [1, 1, 2], 'B': [3, 3, 4]})
print(df.duplicated())
```

Example 2: Drop duplicates.

```python
df.drop_duplicates(inplace=True)
print(df)
```

Duplicates, begone!

---

## Pandas Pivot

Reorganize your data with `pivot`.

Example: Pivot data.

```python
# 'Metric' is an illustrative column added so pivot has a column key besides the values.
data = {'Date': ['2024-01-01', '2024-01-02'], 'Metric': ['sales', 'sales'], 'Value': [10, 20]}
df = pd.DataFrame(data)
pivoted = df.pivot(index='Date', columns='Metric', values='Value')
print(pivoted)
```

Pivots make data feel fancy.

---

## Pandas Pivot Table

Summarize data with a pivot table.

Example: Create a pivot table.

```python
data = {'Category': ['A', 'A', 'B'], 'Value': [10, 20, 30]}
df = pd.DataFrame(data)
pivot_table = df.pivot_table(values='Value', index='Category', aggfunc='sum')
print(pivot_table)
```

You’re basically at Excel-level now.

---
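As a follow-up to the pivot and pivot table sections above, here is a rough sketch of when you might reach for one over the other. `pivot` only reshapes, so it raises a `ValueError` if an index/column pair appears more than once, while `pivot_table` aggregates the repeats. The `Region` column and the variable names are hypothetical, added purely for illustration.

```python
import pandas as pd

# "Region" is a hypothetical column; Category "A" / Region "East" appears twice.
df = pd.DataFrame({
    "Category": ["A", "A", "B"],
    "Region": ["East", "East", "West"],
    "Value": [10, 20, 30],
})

# pivot cannot decide which of the duplicate entries to keep, so it raises.
try:
    df.pivot(index="Category", columns="Region", values="Value")
    pivot_failed = False
except ValueError:
    pivot_failed = True
print("pivot failed on duplicates:", pivot_failed)

# pivot_table sums the duplicates; fill_value=0 replaces missing combinations.
table = df.pivot_table(
    values="Value",
    index="Category",
    columns="Region",
    aggfunc="sum",
    fill_value=0,
)
print(table)
```

If your data is guaranteed to have unique index/column pairs, plain `pivot` is enough; otherwise `pivot_table` with an explicit `aggfunc` is the safer choice.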