題目
Table: Numbers
+-------------+------+
| Column Name | Type |
+-------------+------+
| num | int |
| frequency | int |
+-------------+------+
num is the primary key for this table.
Each row of this table shows the frequency of a number in the database.
The median
is the value separating the higher half from the lower half of a data sample.
Write an SQL query to report the median
of all the numbers in the database after decompressing the Numbers
table. Round the median to one decimal point
.
SQL Schema
Create table If Not Exists Numbers (num int, frequency int)
Truncate table Numbers
insert into Numbers (num, frequency) values ('0', '7')
insert into Numbers (num, frequency) values ('1', '1')
insert into Numbers (num, frequency) values ('2', '3')
insert into Numbers (num, frequency) values ('3', '1')
解題思考
- 題目要求輸出
Numbers
表格的中位數median
。
本題要求的邏輯為 :
1. 先將Numbers
內的數字num
以頻次frequency
展開為等長的數列
2. L ← 將所有num
展開的數列依num
大小有序串接
3. 求出 L 的中位數median
,並將中位數作為輸出結果
Input:
Numbers table:
+-----+-----------+
| num | frequency |
+-----+-----------+
| 0 | 7 |
| 1 | 1 |
| 2 | 3 |
| 3 | 1 |
+-----+-----------+
Output:
+--------+
| median |
+--------+
| 0.0 |
+--------+
Explanation:
If we decompress the Numbers table,
we will get [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3],
so the median is (0 + 0) / 2 = 0.
- 中位數的定義 → 在一組數列
_S_
中,有一半數字個數會小於中位數,另外一半數字個數會大於中位數。假設_L_
← 數列_S_
中,小於等於Numbers.num
的數字個數_R_
← 數列_S_
中,大於等於Numbers.num
的數字個數_N_
← 數列_S_
中,Numbers.num
的數字個數
則數列_S_
的數字個數應等同於_L + N + R_
, 對Numbers
中的所有num
,該條件都成立 - 考慮當
_L_
≠_R_
的情況,若Numbers.num
是中位數,當_N_
加入至個數較少的短邊
時 (_L_
或_R_
) ,被_N_
加入的短邊
會成為長邊
:
當_L < R_
,_L + N > R _
當_R < L_
,_R + N > L _
若Numbers.num
不是中位數,當_L < R_ 時
,即使把_N_
放到短邊,仍會得到_L + N < R_
的結果,這表示_N_
不在數列中間 - 考慮當
_L = R_
的情況,根據定義已知Numbers.num
是中位數
解決方案
select
round(avg(n.num),1) median
from Numbers n
where
n.Frequency >= abs(
(select sum(Frequency) from Numbers where num<=n.num) -
(select sum(Frequency) from Numbers where num>=n.num)
)