pandas df.loc practice
df.loc is the label-based indexer of a pandas DataFrame. It is an attribute used with square brackets rather than a method call, and it lets you access and modify data by row and column labels, which makes it a core tool for everyday data manipulation. This explanation covers the basics of df.loc and its syntax, with examples that illustrate what it can do.
Syntax and usage
The syntax for using df.loc is:
df.loc[row_labels, column_labels]
Here, df is the DataFrame you want to access or modify, row_labels are the labels of the rows you want to select, and column_labels are the labels of the columns you want to select. Each part can be a single label, a list of labels, a slice of labels, or a Boolean array (see Example 4). The column part is optional: df.loc[row_labels] selects whole rows.
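To make these options concrete, here is a minimal sketch of how the label types determine the shape of the result. It rebuilds the same sample DataFrame used in the examples below so that it runs on its own.

```python
import pandas as pd

# Same sample data as in the examples; the default index labels are 0..4.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston'],
})

scalar = df.loc[2, 'Name']         # single row label + single column label -> scalar
one_col = df.loc[[0, 2], 'Name']   # list of row labels + single column -> Series
one_row = df.loc[2]                # column part omitted -> the whole row as a Series
sub_df = df.loc[[0, 2], ['Name']]  # lists on both axes -> DataFrame, even with one column

print(scalar)                      # Carol
print(type(one_col).__name__)      # Series
print(type(one_row).__name__)      # Series
print(type(sub_df).__name__)       # DataFrame
```

As a rule of thumb, passing a list (rather than a single label) keeps that axis in the result, which is why sub_df stays a DataFrame.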
Example 1: Basic selection
Let's create a sample DataFrame to illustrate the usage of df.loc:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston'],
}
df = pd.DataFrame(data)
Now we have a DataFrame df with the following structure:
    Name  Age           City
0  Alice   25       New York
1    Bob   30  San Francisco
2  Carol   35    Los Angeles
3  David   28        Chicago
4    Eve   22         Boston
To select a single value, pass one row label and one column label:
df.loc[0, 'Name']
This returns 'Alice', the value at the intersection of the row labeled 0 and the column 'Name'.
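Because df.loc works on labels, the 0 in df.loc[0, 'Name'] is the row's index label, not its position; the two only coincide because the default index is 0, 1, 2, and so on. The sketch below contrasts df.loc with df.iloc, the position-based counterpart in pandas. The sorted and Name-indexed variants (by_age, by_name) are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston'],
})

# Sorting by Age keeps each row's original label but changes the row order.
by_age = df.sort_values('Age')
print(by_age.loc[0, 'Name'])    # label 0 -> Alice, wherever that row now sits
print(by_age.iloc[0]['Name'])   # position 0 -> Eve, the youngest

# With string labels, df.loc selects rows by name.
by_name = df.set_index('Name')
print(by_name.loc['Bob', 'Age'])  # 30
```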
Example 2: Selecting multiple rows and columns
To select multiple rows and columns, pass lists of labels:
df.loc[[1, 3], ['Name', 'City']]
This returns the following DataFrame:
    Name           City
1    Bob  San Francisco
3  David        Chicago
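Every label in such a list has to exist. As far as we know, since pandas 1.0 a list containing a label that is not in the index raises KeyError instead of silently producing rows of NaN. The sketch below shows that behavior and, as one possible alternative, DataFrame.reindex, which fills missing labels with NaN explicitly. The label 10 is an arbitrary missing label chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston'],
})

raised = False
try:
    df.loc[[1, 10], ['Name', 'City']]   # 10 is not a row label
except KeyError as err:
    raised = True
    print('KeyError:', err)

# If missing labels should become NaN rows instead, ask for that explicitly.
filled = df.reindex([1, 10])[['Name', 'City']]
print(filled)
```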
Example 3: Selecting rows and columns using slices
Slices can be used to select a range of row and column labels:
df.loc[1:3, 'Age':'City']
This returns the following DataFrame:
   Age           City
1   30  San Francisco
2   35    Los Angeles
3   28        Chicago
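Note that, unlike regular Python slicing, a df.loc slice includes both endpoints: 1:3 selects the rows labeled 1, 2, and 3, and 'Age':'City' selects every column from 'Age' through 'City' in the DataFrame's column order.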
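A quick way to see the difference is to slice the same range with df.loc (labels, stop included) and df.iloc (positions, stop excluded, as in plain Python). A minimal sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston'],
})

label_slice = df.loc[1:3]    # labels 1, 2 and 3: both ends included
pos_slice = df.iloc[1:3]     # positions 1 and 2: stop excluded
print(len(label_slice), len(pos_slice))   # 3 2

# Column label slices are inclusive too and follow the column order.
print(list(df.loc[:, 'Name':'Age'].columns))  # ['Name', 'Age']
```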
Example 4: Boolean indexing
df.loc also supports Boolean indexing, which filters rows by a condition:
df.loc[df['Age'] > 25, ['Name', 'City']]
This returns the following DataFrame, containing the rows where 'Age' is greater than 25:
    Name           City
1    Bob  San Francisco
2  Carol    Los Angeles
3  David        Chicago
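The same selections can also be assigned to, which is how df.loc modifies data. Assigning through a single df.loc call writes into df itself, whereas a chained pattern such as df[mask]['Age'] = ... may only change a temporary copy. A minimal sketch follows; the 'Senior' column, the Seattle update, and the Chicago exclusion are hypothetical choices for illustration. Multiple conditions are combined with & (and | for "or"), each wrapped in parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Age': [25, 30, 35, 28, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston'],
})

# Update a single cell selected by row label and column label.
df.loc[4, 'City'] = 'Seattle'

# Update every row matching a combined condition.
df['Senior'] = False
df.loc[(df['Age'] > 25) & (df['City'] != 'Chicago'), 'Senior'] = True

print(df)
```

Here Bob and Carol are flagged, while David is excluded by the City condition even though he is older than 25.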