Pandas Weighted Average

A weighted average is an average in which each value contributes in proportion to an assigned weight, so values with larger weights influence the result more than the others. This article walks through three ways to calculate a weighted average for a Pandas DataFrame, each with an example.

Formula

(values_column*weights_column).sum()/weights_column.sum()

Here, values_column is the numeric column in the Pandas DataFrame that stores the values, and weights_column is the numeric column that will store the weight of each value.
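As a minimal sketch of the formula, the snippet below applies it to a small made-up DataFrame. The data are hypothetical and chosen only so the arithmetic is easy to follow; the column names simply reuse the placeholders from the formula.

```python
import pandas as pd

# Hypothetical data: three values, the last one carrying twice the weight
df = pd.DataFrame({"values_column": [10, 20, 30],
                   "weights_column": [1, 1, 2]})

# (values * weights).sum() / weights.sum()  ->  (10 + 20 + 60) / 4
weighted = (df["values_column"] * df["weights_column"]).sum() / df["weights_column"].sum()

# Plain (unweighted) mean for comparison  ->  (10 + 20 + 30) / 3
plain = df["values_column"].mean()

print(weighted)  # 22.5
print(plain)     # 20.0
```

The weighted result is pulled toward 30 because that value has the largest weight.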

Method 1: Return Weighted Average

Let’s write a custom function that computes the weighted average of two DataFrame columns. It multiplies the weights by the values, adds the products with sum(), and divides the result by the total of the weights:

sum(DataFrame_object[weight_data]*DataFrame_object[value_data])/DataFrame_object[weight_data].sum()

Here, weight_data is the column in the DataFrame that holds weights for values in the value_data column.
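The expression above divides by the total of the weights, so it does not give a meaningful result when the weights add up to zero. The following is one possible guard, not part of the original method: the helper name safe_weighted_avg and its choice to return NaN in that case are assumptions made for illustration.

```python
import math
import pandas as pd

def safe_weighted_avg(df, value_data, weight_data):
    """Hypothetical helper: weighted average that returns NaN
    when the weights sum to zero instead of dividing by zero."""
    total_weight = df[weight_data].sum()
    if total_weight == 0:
        return math.nan
    return (df[value_data] * df[weight_data]).sum() / total_weight

# Made-up data in which every weight is zero
zero_weights = pd.DataFrame({"value": [2, 3], "weight": [0, 0]})
print(safe_weighted_avg(zero_weights, "value", "weight"))  # nan

# Made-up data with usable weights: (10 + 20 + 60) / 4
normal = pd.DataFrame({"value": [10, 20, 30], "weight": [1, 1, 2]})
print(safe_weighted_avg(normal, "value", "weight"))  # 22.5
```

Whether NaN, zero, or an exception is the right response depends on the application.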

Example

In this example, we have a DataFrame named ‘calculations’ with 2 columns of integer type. Now, we will create a custom function, ‘weighted_avg_calculation’, to calculate the weighted average and call it with the DataFrame and the names of these two columns as arguments.

import pandas

# Create the DataFrame with 2 columns and 5 rows
calculations = pandas.DataFrame.from_dict({'count': [7, 8, 9, 0, 4],
                                           'quantity': [2, 3, 4, 5, 2]})

# Display the DataFrame - calculations
print(calculations)

# Custom function that calculates the weighted average
def weighted_avg_calculation(calculations, value_data, weight_data):
    return sum(calculations[weight_data] * calculations[value_data]) / calculations[weight_data].sum()

print()

# Call the function by passing the DataFrame, 'quantity' as value_data and 'count' as weight_data
print(weighted_avg_calculation(calculations, 'quantity', 'count'))

Output

   count  quantity
0      7         2
1      8         3
2      9         4
3      0         5
4      4         2

2.9285714285714284

Explanation

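The custom function multiplies each quantity by its count, adds the products (14 + 24 + 36 + 0 + 8 = 82), and divides by the total of the counts (7 + 8 + 9 + 0 + 4 = 28).

So, the weighted average of the above DataFrame is 82 / 28, which is approximately 2.93.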

Method 2: Return Weighted Average in Groups

Now, we will use the groupby() function to group the rows and return the weighted average of each group. The apply() method is called on the grouped object; it takes the custom function followed by the column names, which it forwards to the function as arguments.

DataFrame_object.groupby('grouping_column').apply(weighted_avg_calculation,'value_data','weight_data')

Here, rows are grouped based on the values in the ‘grouping_column’. The weighted_avg_calculation is a custom function that computes the weighted average. The weight_data is the column in the DataFrame that holds the weights for the values in the value_data column.
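The way apply() forwards its extra arguments can be sketched on a small made-up DataFrame. The data and the column names group, value and weight are hypothetical. The column names are passed as keyword arguments here, which is equivalent to passing them positionally. Selecting the two needed columns before apply() is an optional choice: recent pandas releases have changed how apply() treats the grouping column, and the explicit selection keeps the sketch independent of that behaviour.

```python
import pandas as pd

# Hypothetical data with two groups, 'a' and 'b'
df = pd.DataFrame({"group": ["a", "a", "b"],
                   "value": [10, 20, 30],
                   "weight": [1, 3, 2]})

def weighted_avg_calculation(frame, value_data, weight_data):
    # frame is the sub-DataFrame that apply() passes in for one group
    return (frame[weight_data] * frame[value_data]).sum() / frame[weight_data].sum()

# Extra keyword arguments given to apply() are handed on to the function
result = df.groupby("group")[["value", "weight"]].apply(
    weighted_avg_calculation, value_data="value", weight_data="weight"
)
print(result)
# group 'a': (10*1 + 20*3) / 4 = 17.5
# group 'b': (30*2) / 2 = 30.0
```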

Example

In this example, we have a DataFrame named ‘calculations’ with 3 columns. Now, we will create a custom function, ‘weighted_avg_calculation’, to calculate the weighted average and call the function with the two columns by passing them as arguments. We will group the rows based on the ‘item’ column and return the weighted average in each group.

import pandas

# Create the DataFrame with 3 columns and 5 rows
calculations = pandas.DataFrame.from_dict({'count': [12, 34, 56, 10, 15],
                                           'quantity': [100, 200, 345, 670, 50],
                                           'item': ['plastic', 'iron', 'iron', 'steel', 'plastic']})

# Display the DataFrame - calculations
print(calculations)

# Custom function that calculates the weighted average
def weighted_avg_calculation(calculations, value_data, weight_data):
    return sum(calculations[weight_data] * calculations[value_data]) / calculations[weight_data].sum()

print()

# Group the rows by 'item' and compute the weighted average of each group
print(calculations.groupby('item').apply(weighted_avg_calculation, 'quantity', 'count'))

Output

   count  quantity     item
0     12       100  plastic
1     34       200     iron
2     56       345     iron
3     10       670    steel
4     15        50  plastic

item
iron       290.222222
plastic     72.222222
steel      670.000000
dtype: float64

Explanation

The custom function is the same as in Method 1. The apply() method calls it once for each group and collects the results into a Series indexed by ‘item’.

There are three groups in the calculations DataFrame.

  1. The weighted average for the ‘iron’ group is 290.22
  2. The weighted average for the ‘plastic’ group is 72.22
  3. The weighted average for the ‘steel’ group is 670.00
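The same per-group figures can also be obtained without apply(). The sketch below is an alternative approach, not the method used above: it multiplies values by weights once for the whole DataFrame, sums the products and the weights per group, and divides the two. The variable names numerator and denominator are chosen here for illustration.

```python
import pandas as pd

calculations = pd.DataFrame({
    "count": [12, 34, 56, 10, 15],
    "quantity": [100, 200, 345, 670, 50],
    "item": ["plastic", "iron", "iron", "steel", "plastic"],
})

# Contribution of each row: value * weight
products = calculations["quantity"] * calculations["count"]

# Sum the contributions and the weights per item, then divide
numerator = products.groupby(calculations["item"]).sum()
denominator = calculations["count"].groupby(calculations["item"]).sum()
result = numerator / denominator

print(result)
# iron:    (34*200 + 56*345) / (34 + 56) = 26120 / 90
# plastic: (12*100 + 15*50)  / (12 + 15) = 1950 / 27
# steel:   (10*670) / 10                 = 670
```

Because both sums are indexed by item, the division lines the groups up automatically.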

Method 3: Return Weighted Average Using NumPy

The NumPy module provides the average() function, which accepts the values and their weights and returns the weighted average, so it can be applied directly to the columns of a Pandas DataFrame.

  1. In the first parameter, we need to pass the values column.
  2. In the second parameter, weights, we pass the weight_data column.

numpy.average(DataFrame_object['value_data'],weights=DataFrame_object['weight_data'])

Example

In this example, we have a DataFrame named ‘calculations’ with 2 columns. We will directly use numpy.average() to calculate the weighted average.

import pandas
import numpy

# Create the DataFrame with 2 columns and 5 rows
calculations = pandas.DataFrame.from_dict({'count': [12, 34, 56, 10, 15],
                                           'quantity': [100, 200, 345, 670, 50]})

# Display the DataFrame - calculations
print(calculations)

print()

# Weighted average of 'quantity', using 'count' as the weights
print(numpy.average(calculations['quantity'], weights=calculations['count']))

Output

   count  quantity
0     12       100
1     34       200
2     56       345
3     10       670
4     15        50

273.7795275590551

Explanation

Here, the quantity column will be the value, and the count will be the weights.

The weighted average is 34770 / 127, which is approximately 273.78.
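Two further behaviours of numpy.average() are worth a short sketch. With returned=True the function also hands back the sum of the weights, and weights that add up to zero raise a ZeroDivisionError. The variable names avg, weight_sum and zero_weights_rejected are chosen here for illustration.

```python
import numpy as np
import pandas as pd

calculations = pd.DataFrame({"count": [12, 34, 56, 10, 15],
                             "quantity": [100, 200, 345, 670, 50]})

# returned=True gives back a pair: (weighted average, sum of the weights)
avg, weight_sum = np.average(calculations["quantity"],
                             weights=calculations["count"],
                             returned=True)
print(avg, weight_sum)

# Weights that sum to zero cannot be normalized and raise an error
zero_weights_rejected = False
try:
    np.average([1, 2], weights=[0, 0])
except ZeroDivisionError as err:
    zero_weights_rejected = True
    print("cannot average:", err)
```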

Conclusion

A weighted average is straightforward to calculate for a Pandas DataFrame. We covered three approaches: a custom function, the same function applied to each group with groupby(), and the average() function from NumPy. Averages come up in almost every kind of analysis, even in something as small as a grocery budget, and on datasets with millions of rows these approaches make it easy to account for the relative importance of each value.


