Pandas Filter by Multiple Conditions

Filtering is one of the most common DataFrame operations in Pandas. In this post, we’ll look at how to filter a Pandas DataFrame by several conditions at once. Pandas offers multiple ways to do this; the following examples demonstrate eval(), loc[], and query().

Method 1: Using eval()

eval() evaluates an expression that is passed as a string. When the expression is a condition, the result is a Boolean Series, which acts as a filter on the DataFrame and returns the rows that match the condition.

Syntax

DataFrame_object[DataFrame_object.eval("Conditions")]
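To make the syntax above concrete, here is a minimal sketch of the two steps it combines. The small DataFrame `df` and the variable names `mask` and `filtered` are assumptions made for illustration only; they are not part of the examples in this post.

```python
import pandas

# Hypothetical two-column frame, used only for this illustration.
df = pandas.DataFrame({"fee": [1000, 400, 100], "points1": [34, 32, 78]})

# Step 1: eval() evaluates the string against the columns
# and returns a Boolean Series with one value per row.
mask = df.eval("fee > 300 and points1 < 35")
print(mask.tolist())

# Step 2: indexing with the Boolean Series keeps the rows where it is True.
filtered = df[mask]
print(filtered)
```

Storing the mask in a variable first can make longer filters easier to read and to debug.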

Example 1

Let’s create a DataFrame with 6 columns and 4 rows and return the rows where the id is greater than 20 and the name ends with “n”.

import pandas

remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
                            [21,'siva','fail',400,32,45],
                            [20,'sahaja','pass',100,78,90],
                            [22,'suryam','fail',450,76,56]
],columns=['id','name','status','fee','points1','points2'])

# Display the DataFrame - remarks

print(remarks)

print()

# Return the rows where the id is greater than 20 and the name ends with 'n'.
# The whole condition is passed to eval() as one string.

print(remarks[remarks.eval("id > 20 & name.str.endswith('n').values")])

Output

   id    name status   fee  points1  points2
0  23  sravan   pass  1000       34       56
1  21    siva   fail   400       32       45
2  20  sahaja   pass   100       78       90
3  22  suryam   fail   450       76       56

   id    name status   fee  points1  points2
0  23  sravan   pass  1000       34       56

Only one row has a name that ends with ‘n’ and an id greater than 20. Here, we combined two conditions with the ‘&’ operator.
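Inside the string passed to eval(), ‘&’ and ‘and’ can be used interchangeably, and the comparisons do not need parentheses. The following sketch checks this on the same data; the conditions on fee and points2 and the variable names `with_ampersand` and `with_and` are chosen for illustration.

```python
import pandas

remarks = pandas.DataFrame([[23, 'sravan', 'pass', 1000, 34, 56],
                            [21, 'siva', 'fail', 400, 32, 45],
                            [20, 'sahaja', 'pass', 100, 78, 90],
                            [22, 'suryam', 'fail', 450, 76, 56]],
                           columns=['id', 'name', 'status', 'fee', 'points1', 'points2'])

# The same two conditions, written once with '&' and once with 'and'.
with_ampersand = remarks[remarks.eval("fee > 300 & points2 < 76")]
with_and = remarks[remarks.eval("fee > 300 and points2 < 76")]

# Both spellings select the same rows.
print(with_ampersand.equals(with_and))
```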

Example 2

Return the rows where the id is greater than 20, ‘points1’ is less than 35, and the name starts with ‘s’.

import pandas

remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
                            [21,'siva','fail',400,32,45],
                            [20,'sahaja','pass',100,78,90],
                            [22,'suryam','fail',450,76,56]
],columns=['id','name','status','fee','points1','points2'])

# Return the rows where the id is greater than 20, the name starts with 's', and points1 is less than 35.

print(remarks[remarks.eval("id > 20 & name.str.startswith('s').values & points1 < 35")])

Output

   id    name status   fee  points1  points2
0  23  sravan   pass  1000       34       56
1  21    siva   fail   400       32       45

Two rows match the condition.

Method 2: Using loc[]

Syntax

DataFrame_object.loc[]

Parameter

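Index label: A single label or a list of labels from the row index. loc[] also accepts a Boolean array or Series with one value per row, which is what we pass when filtering by conditions.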
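Because loc[] accepts a row selector and, optionally, a column selector, a filter can also limit the columns that are returned. The sketch below assumes we only want the name and fee columns of the rows that match; the variable names `mask` and `subset` are illustrative.

```python
import pandas

remarks = pandas.DataFrame([[23, 'sravan', 'pass', 1000, 34, 56],
                            [21, 'siva', 'fail', 400, 32, 45],
                            [20, 'sahaja', 'pass', 100, 78, 90],
                            [22, 'suryam', 'fail', 450, 76, 56]],
                           columns=['id', 'name', 'status', 'fee', 'points1', 'points2'])

# Boolean Series: True for rows with fee above 300 and status 'fail'.
mask = (remarks['fee'] > 300) & (remarks['status'] == 'fail')

# First argument selects rows, second argument selects columns.
subset = remarks.loc[mask, ['name', 'fee']]
print(subset)
```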

Example 1

Create a DataFrame named ‘remarks’ with 6 columns. Let’s return the rows where fee is greater than 300 and points2 is less than 76.

import pandas

remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
                             [21,'siva','fail',400,32,45],
                            [20,'sahaja','pass',100,78,90],
                            [22,'suryam','fail',450,76,56]

],columns=['id','name','status','fee','points1','points2'])

# Display the DataFrame - remarks

print(remarks)

print()

# Return the rows based on the fee column where fee is greater than 300 and points2 less than 76

print(remarks.loc[(remarks['fee'] > 300) & (remarks['points2'] < 76)])

Output

   id    name status   fee  points1  points2
0  23  sravan   pass  1000       34       56
1  21    siva   fail   400       32       45
2  20  sahaja   pass   100       78       90
3  22  suryam   fail   450       76       56

   id    name status   fee  points1  points2
0  23  sravan   pass  1000       34       56
1  21    siva   fail   400       32       45
3  22  suryam   fail   450       76       56

There are 3 rows where the fee is greater than 300 and points2 less than 76. Here, we specified two conditions with the ‘&’ operator.
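Note that each condition in the loc[] example is wrapped in parentheses. In plain Python, ‘&’ binds more tightly than ‘>’ and ‘<’, so leaving the parentheses out changes the meaning of the expression. The sketch below illustrates the pitfall; the variable names `error_message` and `correct` are assumptions for this illustration.

```python
import pandas

remarks = pandas.DataFrame([[23, 'sravan', 'pass', 1000, 34, 56],
                            [21, 'siva', 'fail', 400, 32, 45],
                            [20, 'sahaja', 'pass', 100, 78, 90],
                            [22, 'suryam', 'fail', 450, 76, 56]],
                           columns=['id', 'name', 'status', 'fee', 'points1', 'points2'])

# Without parentheses, Python first computes 300 & remarks['points2'],
# and the resulting chained comparison raises a ValueError.
try:
    remarks.loc[remarks['fee'] > 300 & remarks['points2'] < 76]
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# With parentheses, each comparison is evaluated before '&' combines them.
correct = remarks.loc[(remarks['fee'] > 300) & (remarks['points2'] < 76)]
print(correct)
```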

Example 2

Create a DataFrame named ‘remarks’ with 6 columns. Let’s return the rows where fee is greater than 300, points2 is less than 76, and the status is ‘fail’.

import pandas

remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
                             [21,'siva','fail',400,32,45],
                            [20,'sahaja','pass',100,78,90],
                            [22,'suryam','fail',450,76,56]


],columns=['id','name','status','fee','points1','points2'])

# Return the rows based on the fee column where fee is greater than 300 and points2 less than 76, and the status is 'fail'.

print(remarks.loc[(remarks['fee'] > 300) & (remarks['points2'] < 76) & (remarks['status'] == 'fail')])

Output

   id    name status  fee  points1  points2
1  21    siva   fail  400       32       45
3  22  suryam   fail  450       76       56

There are 2 rows where the fee is greater than 300, points2 is less than 76, and the status is ‘fail’. Here, we specified three conditions with the ‘&’ operator.
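As the number of conditions grows, chaining them with ‘&’ on one line becomes hard to read. One possible alternative, not used in the examples above, is to collect the conditions in a list and combine them with functools.reduce and operator.and_ from the Python standard library. This is only a sketch; the names `conditions`, `combined`, and `result` are illustrative.

```python
import functools
import operator

import pandas

remarks = pandas.DataFrame([[23, 'sravan', 'pass', 1000, 34, 56],
                            [21, 'siva', 'fail', 400, 32, 45],
                            [20, 'sahaja', 'pass', 100, 78, 90],
                            [22, 'suryam', 'fail', 450, 76, 56]],
                           columns=['id', 'name', 'status', 'fee', 'points1', 'points2'])

# Each entry is a Boolean Series; the list can be built up dynamically.
conditions = [
    remarks['fee'] > 300,
    remarks['points2'] < 76,
    remarks['status'] == 'fail',
]

# operator.and_ is the function form of '&'; reduce applies it pairwise.
combined = functools.reduce(operator.and_, conditions)

result = remarks.loc[combined]
print(result)
```

This selects the same rows as the three-condition loc[] call in Example 2.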

Method 3: Using query()

query() takes a condition as a string expression and returns the rows of the DataFrame for which the expression is true. Make sure to write the expression inside quotes.

Syntax

DataFrame_object.query("Expression")
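The expression passed to query() can also refer to Python variables by prefixing their names with ‘@’, which avoids building the string by hand. The following is a minimal sketch; the variable names `min_fee` and `max_points2` are assumptions for illustration.

```python
import pandas

remarks = pandas.DataFrame([[23, 'sravan', 'pass', 1000, 34, 56],
                            [21, 'siva', 'fail', 400, 32, 45],
                            [20, 'sahaja', 'pass', 100, 78, 90],
                            [22, 'suryam', 'fail', 450, 76, 56]],
                           columns=['id', 'name', 'status', 'fee', 'points1', 'points2'])

# Hypothetical thresholds kept in ordinary Python variables.
min_fee = 300
max_points2 = 76

# '@name' inside the expression looks up the Python variable 'name'.
result = remarks.query("fee > @min_fee and points2 < @max_points2")
print(result)
```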

Example

Let’s return the rows based on the fee column where fee is greater than 300 and points2 less than 76.

import pandas

remarks = pandas.DataFrame([[23,'sravan','pass',1000,34,56],
                             [21,'siva','fail',400,32,45],
                            [20,'sahaja','pass',100,78,90],
                            [22,'suryam','fail',450,76,56]


],columns=['id','name','status','fee','points1','points2'])

# Return the rows based on the fee column where fee is greater than 300 and points2 less than 76

print(remarks.query("fee>300 and points2 < 76"))

Output

   id    name status   fee  points1  points2
0  23  sravan   pass  1000       34       56
1  21    siva   fail   400       32       45
3  22  suryam   fail   450       76       56

There are 3 rows where the fee is greater than 300 and points2 less than 76. Here, we specified two conditions using the ‘and’ operator.
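The same filter can be written with any of the three methods covered in this post. The sketch below runs all three on the same data and checks that they return identical results; the variable names `via_eval`, `via_loc`, and `via_query` are illustrative.

```python
import pandas

remarks = pandas.DataFrame([[23, 'sravan', 'pass', 1000, 34, 56],
                            [21, 'siva', 'fail', 400, 32, 45],
                            [20, 'sahaja', 'pass', 100, 78, 90],
                            [22, 'suryam', 'fail', 450, 76, 56]],
                           columns=['id', 'name', 'status', 'fee', 'points1', 'points2'])

# Method 1: eval() builds a Boolean Series that is used as an index.
via_eval = remarks[remarks.eval("fee > 300 and points2 < 76")]

# Method 2: loc[] with parenthesized conditions combined by '&'.
via_loc = remarks.loc[(remarks['fee'] > 300) & (remarks['points2'] < 76)]

# Method 3: query() filters directly from the string expression.
via_query = remarks.query("fee > 300 and points2 < 76")

# DataFrame.equals compares shape, values, and dtypes.
print(via_eval.equals(via_loc) and via_loc.equals(via_query))
```

Which form to use is largely a matter of readability: string expressions are compact, while loc[] keeps the conditions as ordinary Python objects.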

Conclusion

Filtering is one of the most frequently used DataFrame operations in Pandas. In this guide, we discussed how to filter a DataFrame by multiple conditions. We worked through several examples that extract data from a DataFrame using eval(), loc[], and query(). After reading this article, you should be able to filter data by multiple conditions yourself.



from https://ift.tt/FRTAPgz
