pandas df.loc practice

df.loc is a label-based indexer in the pandas library for accessing and modifying data in a DataFrame. It is an attribute used with square brackets rather than a method you call. Because it selects rows and columns by their labels, it plays a central role in everyday data manipulation. This post covers the basics of df.loc, its syntax, and a few examples that illustrate its capabilities.

Syntax and usage

The syntax for using df.loc is:

df.loc[row_labels, column_labels]

Here, df is the DataFrame you want to access or modify, row_labels are the labels of the rows you want to select, and column_labels are the labels of the columns you want to select. Each part can be a single label, a list of labels, a slice of labels, or a boolean array. If column_labels is omitted, all columns are returned.
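
The same bracket syntax also works on the left-hand side of an assignment, which is how df.loc modifies data. The sketch below uses a small two-row DataFrame made up for illustration; the names and values are hypothetical.

```python
import pandas as pd

# Hypothetical two-row DataFrame, used only for this sketch
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Read one value: row label 1, column label 'Age'
bob_age = df.loc[1, 'Age']

# Write one value through the same label pair
df.loc[1, 'Age'] = 31

# Write several columns of one row at once
df.loc[0, ['Name', 'Age']] = ['Alicia', 26]

# Assigning to a row label that does not exist yet appends a new row
df.loc[2] = ['Carol', 35]
```

Writing through a single df.loc[...] call, rather than chained indexing such as df['Age'][1] = 31, is the pattern the pandas documentation recommends for avoiding SettingWithCopyWarning. The last assignment shows setting with enlargement: a row label that does not exist yet is appended.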

Example 1: Basic selection

Let's create a sample DataFrame to illustrate the usage of df.loc:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston']
}

df = pd.DataFrame(data)

Now, we have a DataFrame df with the following structure:

    Name  Age           City
0  Alice   25       New York
1    Bob   30  San Francisco
2  Carol   35    Los Angeles
3  David   28        Chicago
4    Eve   22         Boston

To select a single row and column, we can use:

df.loc[0, 'Name']

This returns 'Alice', the value at the intersection of the row labeled 0 and the column 'Name'.
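
Because df.loc works on labels, the 0 here is a row label that only coincides with the row's position because pandas assigned a default integer index. A quick sketch that uses set_index to make the 'Name' column the index shows the difference, with df.iloc handling position-based access:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston'],
}
df = pd.DataFrame(data)

# Use the names as row labels instead of the default 0..4 integers
by_name = df.set_index('Name')

# df.loc now expects a name as the row label
alice_city = by_name.loc['Alice', 'City']

# Position-based access goes through df.iloc; column 0 is now 'Age'
first_age = by_name.iloc[0, 0]

# The integer 0 is no longer a row label, so df.loc raises KeyError
label_zero_missing = False
try:
    by_name.loc[0, 'City']
except KeyError:
    label_zero_missing = True
```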

Example 2: Selecting multiple rows and columns

To select multiple rows and columns, we can use lists of labels:

df.loc[[1, 3], ['Name', 'City']]

This returns the following DataFrame:

    Name           City
1    Bob  San Francisco
3  David        Chicago
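
Whether a label is passed on its own or wrapped in a list changes the shape of the result: a single label collapses that axis and returns a Series, while a list (even a one-element list) keeps it and returns a DataFrame. A short sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston'],
})

row_as_series = df.loc[1]          # Series indexed by the column names
row_as_frame = df.loc[[1]]         # DataFrame with a single row (label 1)
names_only = df.loc[:, 'Name']     # Series: every row, one column
names_frame = df.loc[:, ['Name']]  # DataFrame: every row, one column
```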

Example 3: Selecting rows and columns using slices

Slices can be used to select a range of row and column labels:

df.loc[1:3, 'Age':'City']

This returns the following DataFrame:

   Age           City
1   30  San Francisco
2   35    Los Angeles
3   28        Chicago

Note that, unlike regular Python slicing, a df.loc slice includes its end label: 1:3 selects rows 1, 2, and 3, and 'Age':'City' includes the 'City' column.
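
The inclusive endpoint is easiest to see next to df.iloc, the position-based counterpart, which follows Python's usual half-open convention. A minimal comparison on a trimmed copy of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 22],
})

# Label-based: both endpoints included, so labels 1, 2 and 3
by_label = df.loc[1:3]

# Position-based: the stop is excluded, so positions 1 and 2 only
by_position = df.iloc[1:3]

# Column slices by label are inclusive as well
label_cols = df.loc[:, 'Name':'Age'].columns.tolist()
```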

Example 4: Boolean indexing

df.loc also supports Boolean indexing, which allows for filtering rows based on conditions:

df.loc[df['Age'] > 25, ['Name', 'City']]

This returns the 'Name' and 'City' columns for the rows where 'Age' is greater than 25:

    Name           City
1    Bob  San Francisco
2  Carol    Los Angeles
3  David        Chicago
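
Conditions can be combined with & (and), | (or), and ~ (not). Each comparison needs its own parentheses because these operators bind more tightly than > or !=. The same mask can also drive an assignment. A sketch on the sample data; the particular filter and the age update are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston'],
})

# Rows where Age > 25 AND City is not Chicago
mask = (df['Age'] > 25) & (df['City'] != 'Chicago')
filtered = df.loc[mask, ['Name', 'City']]

# Use the mask on the left-hand side: add 1 to Age only for matching rows
df.loc[mask, 'Age'] = df.loc[mask, 'Age'] + 1
```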