題目
Table: Failed
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| fail_date | date |
+--------------+---------+
fail_date is the primary key for this table.
This table contains the days of failed tasks.
Table: Succeeded
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| success_date | date |
+--------------+---------+
success_date is the primary key for this table.
This table contains the days of succeeded tasks.
A system is running one task every day
. Every task is independent of the previous tasks. The tasks can fail or succeed.
Write an SQL query to generate a report of period_state
for each continuous interval of days in the period from 2019-01-01
to 2019-12-31
.
period_state
is ’failed'
if tasks in this interval failed or 'succeeded'
if tasks in this interval succeeded. Interval of days are retrieved as start_date
and end_date.
Return the result table ordered by start_date
.
SQL Schema
Create table If Not Exists Failed (fail_date date)
Create table If Not Exists Succeeded (success_date date)
Truncate table Failed
insert into Failed (fail_date) values ('2018-12-28')
insert into Failed (fail_date) values ('2018-12-29')
insert into Failed (fail_date) values ('2019-01-04')
insert into Failed (fail_date) values ('2019-01-05')
Truncate table Succeeded
insert into Succeeded (success_date) values ('2018-12-30')
insert into Succeeded (success_date) values ('2018-12-31')
insert into Succeeded (success_date) values ('2019-01-01')
insert into Succeeded (success_date) values ('2019-01-02')
insert into Succeeded (success_date) values ('2019-01-03')
insert into Succeeded (success_date) values ('2019-01-06')
解題思考 (Variable 版本)
題目要求輸出 period_state
、 start_date
和 end_date
欄位的表格。
其中將連續日期都為相同 state 的定義為一個 period,且 start_date
和 end_date
將表示該 period 的起始日期和結束日期。
+--------------+--------------+--------------+
| period_state | start_date | end_date |
+--------------+--------------+--------------+
| succeeded | 2019-01-01 | 2019-01-03 |
| failed | 2019-01-04 | 2019-01-05 |
| succeeded | 2019-01-06 | 2019-01-06 |
+--------------+--------------+--------------+
- 最開始,我的想法是利用 variable,在每次
period_state
變換時,透過變數將該period_state
第一個日期作為錨點anchor
,再使用rank()
對start_date
做升序排序rn
。 - 透過
with clause
建立status
表格,聯合Failed
表格和Successed
表格,篩選出fail_date
、success_date
介於2019–01–01
至2019-12-31
的資料集。 - 透過
with clause
建立date_anchor
表格,擴增anchor_date
欄位以便後續找出end_date
。variable current_state
← 儲存當前 record 的 period_statevariable date_anchor
← 儲存當前 period 的第一個日期
透過子查詢初始化variable current_state
和variable date_anchor
- 查詢
date_anchor
以建立子查詢表格a
。
子查詢表格中,利用rank()
對start_date
做升序,以標記排序結果rn
。並利用max(start_date) over(partition by anchor_date)
取得每個 period 中最大的start_date
,即該 period 的end_date
。 - 查詢
a
作為主要表格,利用group by end_date
和having min(rn)
以篩選出符合輸出條件的資料。
解決方案 (Variable 版本)
with
status as (
select
period_state, start_date
from (
select fail_date as start_date, 'failed' as period_state from Failed
union all
select success_date as start_date, 'succeeded' as period_state from Succeeded
) tmp
where start_date between '2019-01-01' and '2019-12-31'
order by start_date
),
date_anchor as (
select
period_state, start_date, anchor_date
from(
select
a.period_state,
a.start_date,
if(@current_state=a.period_state, @date_anchor, a.start_date) as anchor_date,
if(@current_state=a.period_state, @date_anchor, @date_anchor:=a.start_date),
@current_state:=a.period_state
from status a, (select @current_state:="initial", @date_anchor:="2018-12-31") init
) anchor
)
select
a.period_state, a.start_date, a.end_date
from (
select
period_state, start_date,
max(start_date) over(partition by anchor_date) as end_date,
rank() over(order by start_date) rn
from date_anchor
) a
group by end_date
having min(rn)
order by start_date
解題思考 (Rank版本)
- 在提交了 variable 版本後,發現連續日期在排序後,其
rn
等差為1
,這表示每次period_state
變換時,取第一個日期作為該period
的最小值,則該period
中所有資料都應符合start_date-rn=min(start_date)
的條件,因此又寫了一個 Rank 版本。 - 透過
with clause
分別建立failed_state
、successeded_state
表格。
篩選資料時間介於2019–01–01
至2019-12-31
的資料集,
擴增period_state
欄位,
利用rank() over(order by )
標註排序結果rn
。 - 做子查詢表格
opt
,聯合failed_state
和successeded_state
。
在failed_state
表格,以group by period_state, date_add(fail_date, interval -rn day)
,分別找出failed_state
和successeded_state
,每個 period 的min(date)
和max(date)
,作為輸出結果的start_date
和end_date
欄位。
解決方案 (Rank版本)
with
failed_state as (
select
'failed' as period_state,
fail_date,
rank() over(order by fail_date) -1 as rn
from Failed
where fail_date between '2019-01-01' and '2019-12-31'
),
succeeded_state as (
select
'succeeded' as period_state,
success_date,
rank() over(order by success_date) -1 as rn
from Succeeded
where success_date between '2019-01-01' and '2019-12-31'
)
select
period_state, start_date, end_date
from (
select
period_state,
min(fail_date) as start_date,
max(fail_date) as end_date
from failed_state
group by period_state, date_add(fail_date, interval -rn day)
union
select
period_state,
min(success_date) as start_date,
max(success_date) as end_date
from succeeded_state
group by period_state, date_add(success_date, interval -rn day)
) opt