題目
Table: Numbers
+-------------+------+
| Column Name | Type |
+-------------+------+
| num | int |
| frequency | int |
+-------------+------+
num is the primary key for this table.
Each row of this table shows the frequency of a number in the database.
The median is the value separating the higher half from the lower half of a data sample.
Write an SQL query to report the median of all the numbers in the database after decompressing the Numbers table. Round the median to one decimal point.
SQL Schema
Create table If Not Exists Numbers (num int, frequency int)
Truncate table Numbers
insert into Numbers (num, frequency) values ('0', '7')
insert into Numbers (num, frequency) values ('1', '1')
insert into Numbers (num, frequency) values ('2', '3')
insert into Numbers (num, frequency) values ('3', '1')
解題思考
- 題目要求輸出
Numbers表格的中位數median。
本題要求的邏輯為 :
1. 先將Numbers內的數字num以頻次frequency展開為等長的數列
2. L ← 將所有num展開的數列依num大小有序串接
3. 求出 L 的中位數median,並將中位數作為輸出結果
Input:
Numbers table:
+-----+-----------+
| num | frequency |
+-----+-----------+
| 0 | 7 |
| 1 | 1 |
| 2 | 3 |
| 3 | 1 |
+-----+-----------+
Output:
+--------+
| median |
+--------+
| 0.0 |
+--------+
Explanation:
If we decompress the Numbers table,
we will get [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3],
so the median is (0 + 0) / 2 = 0.
- 中位數的定義 → 在一組數列
_S_中,有一半數字個數會小於中位數,另外一半數字個數會大於中位數。假設_L_← 數列_S_中,小於等於Numbers.num的數字個數_R_← 數列_S_中,大於等於Numbers.num的數字個數_N_← 數列_S_中,Numbers.num的數字個數
則數列_S_的數字個數應等同於_L + N + R_, 對Numbers中的所有num,該條件都成立 - 考慮當
_L_≠_R_的情況,若Numbers.num是中位數,當_N_加入至個數較少的短邊時 (_L_或_R_) ,被_N_加入的短邊會成為長邊:
當_L < R_,_L + N > R _當_R < L_,_R + N > L _若Numbers.num不是中位數,當_L < R_ 時,即使把_N_放到短邊,仍會得到_L + N < R_的結果,這表示_N_不在數列中間 - 考慮當
_L = R_的情況,根據定義已知Numbers.num是中位數
解決方案
select
round(avg(n.num),1) median
from Numbers n
where
n.Frequency >= abs(
(select sum(Frequency) from Numbers where num<=n.num) -
(select sum(Frequency) from Numbers where num>=n.num)
)