SQL Server Median Function

The statistical median, or short, refers to a value that separates a set of values in half. You can think of the median of the middle value within a set of sorted values in ascending or descending order.

The median typically indicates the largest or smallest value, depending on the set referenced. For example, in a set with values:

{100,200,300,400,500,600,700,800,900}

The median value in the above set is 500. Hence, 500 is the fourth largest value in the first set and the fourth smallest in the second set.

This article will learn how to calculate a column in SQL Server. Keep in mind that there is no specific function to perform a statistical median in SQL Server.

The Basics

Let us start at the basics and understand how to calculate the median for a set of values.

In statistics, to calculate the median of a set, we start by arranging the values in either an ascending or descending order. Once we have the data sorted logically, we determine the middle value.

If the set contains an odd number of values, we consider the middle value the median of the specific set.

However, if the set comprises an even number of values, we determine the two middle values in the set, add them, and divide by 2.

We can express the formula for calculating the median of a given set as:

SOURCE: Wikipedia.

Calculate Median in SQL Server

Let us learn how to calculate the median in SQL Server. Let us start by setting up demo information as shown in the queries below:

Create database:

CREATE DATABASE median;

Use the database

USE median;

Create a table with columns as shown:

USE median;
CREATE TABLE sample_data (
        id INT PRIMARY KEY IDENTITY(1,1) NOT NULL,
        product VARCHAR(50),
        price money,
        quantity INT
);
INSERT INTO sample_data(product, price, quantity)
VALUES ('Adjustable Chair', 380.40, 1),
           ('Windproof Umbrella', 26.77, 3),
           ('Amazon Echo Dot', 39.99, 5),
           ('Air Purifier', 99.99, 6),
           ('4K Security Camera', 109.85, 4),
           ('Fitness Tracker', 67.49, 10),
           ('Touch Screen Gloves', 12.99, 8),
           ('Apple AirPods Pro', 329.99, 5),
           ('Sony WH-1000XM4', 320.99, 5),
           ('MacBook Air', 999.99, 10),
           ('Dell XPS 13', 1170.00, 6);

Once we have the sample data, we can calculate the median of the provided data.

Method 1 – SQL Ranking and CTE

The first method we can use to calculate the median of values is the rank function and common table expressions. This method works even in older versions of SQL Server.

This works by grouping the set into 50 percent highest and 50 percent lowest values.

We can use this as shown in the example query below:

SELECT
(
(SELECT MAX(price) FROM
 (SELECT top 50 percent price FROM sample_data ORDER BY price) AS bottomhalf)
 +
 (SELECT MIN(price) FROM
  (SELECT top 50 percent price FROM sample_data ORDER BY price DESC) AS tophalf)
) / 2 AS median

The resulting value is as:

median
---------------------
109.85
(1 ROW affected)

Method 2 – Percentile_cont

As mentioned, at the time of writing this article, there is no median function in SQL Server. However, we can use the PERCENTILE_CONT function to achieve the same functionality.

The function returns the value ranked at a specific percent for a defined set of values. Hence, if we set the percent value to 0.5, the function will return a median value.

Consider the example query below:

SELECT product, price, percentile_cont(0.5)
           within-GROUP (ORDER BY price)
           OVER (partition BY product) AS median
           FROM sample_data ORDER BY product DESC;

The query returns the output as:

Learn more about PERCENTILE_CONT in the documentation.

Closing

This article discusses the statistical median and various ways to calculate a column’s median in SQL Server.



from https://ift.tt/3GCZIsb

Post a Comment

0 Comments