Sqlzoo练习笔记

前置课程

未明学院的《SQL零基础教程》

sqlzoo官网

https://sqlzoo.net/

#文字样式匹配查询

https://sqlzoo.net/wiki/SELECT_names

15.For Monaco-Ville the name is Monaco and the extension is -Ville.

Show the name and the extension where the capital is an extension of name of the country.

方法一:REPLACE函数。

1
2
3
4
SELECT w.name, REPLACE(w.capital, w.name, '')
FROM world AS w
WHERE w.capital LIKE concat('%', w.name, '%')
AND w.capital != w.name
1
2
3
SELECT name,REPLACE(capital,name,'') AS extension
FROM world
WHERE capital Like CONCAT(name, '_%')

方法二:截取字符串

1
2
3
4
SELECT w.name, RIGHT(w.capital, length(w.capital)-length(w.name))
FROM world AS w
WHERE w.capital LIKE concat('%', w.name, '%')
AND w.capital != w.name

#Nobel表查询

https://sqlzoo.net/wiki/SELECT_from_Nobel_Tutorial

14.The expression subject IN ('Chemistry','Physics') can be used as a value - it will be 0 or 1.

Show the 1984 winners and subject ordered by subject and winner name; but list Chemistry and Physics last.

1
2
3
4
SELECT winner, subject
FROM nobel
WHERE yr = 1984
ORDER BY subject IN ('Chemistry','Physics'), subject, winner

subject IN ('Chemistry','Physics')是一个逻辑判断,输出0和1,也可以作为排序的依据,这样物理和化学的就输出1排在最后了。

winner subject subject IN ('Chemistry','Physics')
Richard Stone Economics 0
Jaroslav Seifert Literature 0
César Milstein Medicine 0
Georges J.F. Köhler Medicine 0
Niels K. Jerne Medicine 0
Desmond Tutu Peace 0
Bruce Merrifield Chemistry 1
Carlo Rubbia Physics 1
Simon van der Meer Physics 1

#子查询原理

https://sqlzoo.net/wiki/Using_nested_SELECT

4.Show each country that has a population greater than the population of ALL countries in Europe.

Note that we mean greater than every single country in Europe; not the combined population of Europe.

1
2
3
4
SELECT name FROM world
WHERE population > ALL
(SELECT population FROM world
  WHERE continent='Europe')

#子查询练习

https://sqlzoo.net/wiki/SELECT_within_SELECT_Tutorial

5.Germany (population 80 million) has the largest population of the countries in Europe. Austria (population 8.5 million) has 11% of the population of Germany.

Show the name and the population of each country in Europe. Show the population as a percentage of the population of Germany.

1
SELECT name, concat(ROUND((population/(SELECT population FROM world WHERE name='Germany'))*100,0), '%') FROM world WHERE continent = 'Europe'

7.Find the largest country (by area) in each continent, show the continent, the name and the area.

1
2
3
4
5
SELECT continent, name, area FROM world x
WHERE area >= ALL
 (SELECT area FROM world y
     WHERE y.continent=x.continent
       AND area>0)

8.List each continent and the name of the country that comes first alphabetically.

方法一:

1
2
3
4
5
SELECT continent, name FROM world x
WHERE name IN
 (SELECT TOP 1 y.name FROM world y
     WHERE y.continent=x.continent
       ORDER BY y.name)

注意:The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified. 所以上述方法需要加TOP才行。

也可以用MIN就不需要ORDER BY了。

方法二:

1
2
3
4
SELECT continent, name FROM world x
WHERE name <= ALL
 (SELECT MIN(y.name) FROM world y
     WHERE y.continent=x.continent)

9.Find the continents where all countries have a population <= 25000000. Then find the names of the countries associated with these continents. Show name, continent and population.

1
2
3
4
SELECT name, continent, population FROM world x
WHERE 25000000 > ALL (
 (SELECT y.population FROM world y
     WHERE y.continent=x.continent))

10.Some countries have populations more than three times that of any of their neighbours (in the same continent). Give the countries and continents.

1
2
3
4
SELECT name, continent FROM world x
WHERE population > ALL (
 (SELECT y.population*3 FROM world y
     WHERE y.continent=x.continent AND x.name!=y.name))

#表连接练习1

https://sqlzoo.net/wiki/The_JOIN_operation

11.For every match involving 'POL', show the matchid, date and the number of goals scored.

1
2
3
SELECT matchid, mdate, COUNT(matchid)
FROM game JOIN goal ON (matchid = id)
WHERE (team1 = 'POL' OR team2 = 'POL') GROUP BY matchid, mdated

13.List every match with the goals scored by each team as shown. This will use "CASE WHEN" which has not been explained in any previous exercises.

mdate team1 score1 team2 score2
1 July 2012 ESP 4 ITA 0
10 June 2012 ESP 1 ITA 1
10 June 2012 IRL 1 CRO 3
...

Notice in the query given every goal is listed. If it was a team1 goal then a 1 appears in score1, otherwise there is a 0. You could SUM this column to get a count of the goals scored by team1. Sort your result by mdate, matchid, team1 and team2.

1
2
3
4
SELECT mdate,
team1,SUM(CASE WHEN teamid=team1 THEN 1 ELSE 0 END) score1,
team2,SUM(CASE WHEN teamid=team2 THEN 1 ELSE 0 END) score2
FROM game JOIN goal ON matchid = id GROUP BY mdate, team1, team2 ORDER BY mdate

我的和答案的差别在于0:0的答案保留,而我的则去除了。所以怎样保留0:0的答案呢?

原来正确答案为:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
SELECT mdate,
       team1,
       SUM(CASE WHEN teamid=team1 THEN 1 ELSE 0 END) score1,
       team2,
       SUM(CASE WHEN teamid=team2 THEN 1 ELSE 0 END) score2
FROM game 
LEFT JOIN goal ON matchid = id
GROUP BY mdate,matchid,team1,team2
ORDER BY mdate,matchid,team1,team2;		-- LEFT JOIN是为了防止缺少比赛中双方都没进球的情况
-- 原文链接:https://blog.csdn.net/weixin_45739141/article/details/104079549

#表连接练习2

https://sqlzoo.net/wiki/More_JOIN_operations

7.Obtain the cast list for the film 'Alien'

1
2
3
4
5
SELECT actor.name FROM casting JOIN actor
ON (casting.actorid=actor.id)
WHERE movieid IN
(SELECT id FROM movie WHERE title = 'Alien'
AND actorid=actor.id

8.List the films in which 'Harrison Ford' has appeared

1
2
3
4
5
SELECT title FROM movie JOIN casting
ON (movie.id=casting.movieid)
WHERE casting.actorid =
(SELECT actor.id FROM actor
WHERE actor.name LIKE '%Harrison Ford%')

10.List the films together with the leading star for all 1962 films.

1
2
3
4
5
6
7
SELECT title,
(SELECT actor.name FROM actor
 WHERE actor.id=casting.actorid)
FROM movie JOIN casting
ON (movie.id=casting.movieid)
WHERE casting.ord = 1
AND movie.yr=1962

11Which were the busiest years for 'Rock Hudson', show the year and the number of movies he made each year for any year in which he made more than 2 movies.

1
2
3
4
5
6
SELECT yr,COUNT(title) FROM
  movie JOIN casting ON movie.id=movieid
        JOIN actor   ON actorid=actor.id
WHERE name='Rock Hudson'
GROUP BY yr
HAVING COUNT(title) > 1

12.List the film title and the leading actor for all of the films 'Julie Andrews' played in.

1
2
3
4
5
6
7
8
SELECT title, name FROM casting
JOIN movie ON (casting.movieid=movie.id)
JOIN actor ON (casting.actorid=actor.id)
WHERE movieid IN
(SELECT movieid FROM actor
JOIN casting ON (actor.id=casting.actorid)
WHERE name = 'Julie Andrews')
AND ord = 1

#自连接练习

https://sqlzoo.net/wiki/Self_join

5.Execute the self join shown and observe that b.stop gives all the places you can get to from Craiglockhart, without changing routes. Change the query so that it shows the services from Craiglockhart to London Road.

1
2
3
4
5
SELECT a.company, a.num, a.stop, b.stop
FROM route a JOIN route b ON
  (a.company=b.company AND a.num=b.num)
WHERE a.stop=53
AND b.stop=(SELECT id FROM stops WHERE name='London Road')

6.The query shown is similar to the previous one, however by joining two copies of the stops table we can refer to stops by name rather than by number. Change the query so that the services between 'Craiglockhart' and 'London Road' are shown.

1
2
3
4
5
6
7
SELECT a.company, a.num, stopa.name, stopb.name
FROM route a JOIN route b ON
  (a.company=b.company AND a.num=b.num)
  JOIN stops stopa ON (a.stop=stopa.id)
  JOIN stops stopb ON (b.stop=stopb.id)
WHERE stopa.name='Craiglockhart'
AND stopb.name='London Road'

10.Find the routes involving two buses that can go from Craiglockhart to Lochend.
Show the bus no. and company for the first bus, the name of the stop for the transfer,
and the bus no. and company for the second bus.

Hint

Self-join twice to find buses that visit Craiglockhart and Lochend, then join those on matching stops.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SELECT S1.num, S1.company, S1.name, S2.num, S2.company
FROM
(SELECT stopb.name, a.company, a.num FROM route a JOIN route b
ON (a.company=b.company AND a.num=b.num)
JOIN stops stopa ON (a.stop=stopa.id)
JOIN stops stopb ON (b.stop=stopb.id)
WHERE stopa.name='Craiglockhart') AS S1
JOIN
(SELECT stopa.name, a.company, a.num FROM route a JOIN route b
ON (a.company=b.company AND a.num=b.num)
JOIN stops stopa ON (a.stop=stopa.id)
JOIN stops stopb ON (b.stop=stopb.id)
WHERE stopb.name='Lochend') AS S2
ON (S1.name=S2.name)

#常用函数

https://sqlzoo.net/wiki/Functions_Reference

🤣

CAST allows you to convert from one type to another.

Often a CAST is implicit - for example if you concatenate a string with a number the number will be automatically changed to a string. However sometimes you need to make the CAST explicit.

1
CAST(expr TO type) 

这个很有用:

1
SELECT CAST(population/1000000 AS DECIMAL(8,1))

🤣

COALESCE takes any number of arguments and returns the first value that is not null.

1
2
3
4
COALESCE(x,y,z) = x if x is not NULL
COALESCE(x,y,z) = y if x is NULL and y is not NULL
COALESCE(x,y,z) = z if x and y are NULL but z is not NULL
COALESCE(x,y,z) = NULL if x and y and z are all NULL

🤣

a DIV b returns the integer value of a divided by b.

1
8 DIV 3 -> 2

整除原来是这个。

🤣

EXTRACT allows you to retrieve components of a date.

You can extract also YEAR, MONTH, DAY, HOUR, MINUTE, SECOND.

1
2
3
EXTRACT(YEAR FROM d)    EXTRACT(MONTH FROM d)
EXTRACT(DAY FROM d)     EXTRACT(HOUR FROM d)
EXTRACT(MINUTE FROM d)  EXTRACT(SECOND FROM d)

🤣

IFNULL takes two arguments and returns the first value that is not null.

1
2
IFNULL(x,y) = x if x is not NULL
IFNULL(x,y) = y if x is NULL

这个和COALESCE(coalesce 合并、结合)功能是一样的。

🤣

INSTR(s1, s2) returns the character position of the substring s2 within the larger string s1. The first character is in position 1. If s2 does not occur in s1 it returns 0.

1
INSTR('Hello world', 'll') -> 3 

🤣

NULLIF returns NULL if the two arguments are equal; otherwise NULLIF returns the first argument.

1
2
NULLIF(x,y) = NULL if x=y
NULLIF(x,y) = x if x != y     

🤣

d + i returns the date i days after the date d.

1
DATE '2006-05-20' + 7  -> DATE '2006-05-27'  

这个应该没问题。

🤣

d + INTERVAL i DAY returns the date i days after the date d.

You can also add YEAR, MONTH, DAY, HOUR, MINUTE, SECOND

You can also add a negative value.

DATE '2006-05-20' + INTERVAL 5 DAY   -> DATE '2006-05-25' 
DATE '2006-05-20' + INTERVAL 5 MONTH -> DATE '2006-10-20' 
DATE '2006-05-20' + INTERVAL 5 YEAR  -> DATE '2011-05-20' 

🤣

POSITION(s1 IN s2) returns the character position of the substring s1 within the larger string s2. The first character is in position 1. If s1 does not occur in s2 it returns 0.

1
POSITION('ll' IN 'Hello world') -> 3

这个和INSTR功能一样,只是后比前大。

🤣

REPLACE(f, s1, s2) returns the string f with all occurances of s1 replaced with s2.

1
REPLACE('vessel','e','a') -> 'vassal'

🤣

SUBSTRING allows you to extract part of a string.

它有两种写法:

1
SUBSTRING('Hello world' FROM 2 FOR 3) -> 'ell'
1
SUBSTRING('Hello world', 2, 3) -> 'ell'

#窗口函数

Window functions - SQLZOO

5.You can use SELECT within SELECT to pick out only the winners in Edinburgh.

Show the parties that won for each Edinburgh constituency in 2017.

这题非常简单,要求的是每个地区的第一名,直接用partiton by把各地域的投票数排序后再选第一名即可。

1
2
3
4
5
6
7
SELECT constituency, party FROM
(SELECT constituency, party, votes,
RANK() OVER (PARTITION BY constituency ORDER BY votes DESC) as posn
FROM ge WHERE constituency
BETWEEN 'S14000021' AND 'S14000026'
AND yr  = 2017) AS m
WHERE posn = 1

6.You can use COUNT and GROUP BY to see how each party did in Scotland. Scottish constituencies start with 'S'

Show how many seats for each party in Scotland in 2017.

1
2
3
4
SELECT party, COUNT(*)
FROM ge
WHERE constituency LIKE 'S%' AND yr=2017
GROUP BY party

这答案不对吗?Scottish的constituency应该比以S打头的范围小吧?

懂了,获得seat需要在当地第一。(就是在前一题的基础上把各党派获奖情况算一算)

1
2
3
4
5
6
7
8
SELECT party,COUNT(*) FROM (
SELECT constituency,party, votes,
RANK() OVER (PARTITION BY constituency ORDER BY votes DESC) as posn
FROM ge
WHERE constituency LIKE 'S%'
AND yr  = 2017) rk
WHERE rk.posn=1
GROUP BY rk.party

Window LAG - SQLZOO

3.The number of confirmed case is cumulative - but we can use LAG to recover the number of new cases reported for each day.

Show the number of new cases for each day, for Italy, for March.

(SET @@sql_mode='ANSI';这题该网站要加上这句才能跑)

1
2
3
4
5
6
7
8
SELECT name, DAY(whn),
confirmed-LAG(confirmed, 1)
OVER (PARTITION BY name ORDER BY whn)
FROM covid
WHERE name = 'Italy'
AND MONTH(whn) = 3
AND YEAR(whn) = '2020'
ORDER BY whn;

这样一样:

1
2
3
4
5
6
7
8
9
SELECT name, DAY(whn),
CASE WHEN DAY(whn)=1 THEN null
ELSE confirmed-LAG(confirmed, 1)
OVER (PARTITION BY name ORDER BY whn) END
FROM covid
WHERE name = 'Italy'
AND MONTH(whn) = 3
AND YEAR(whn) = '2020'
ORDER BY whn;

4.The data gathered are necessarily estimates and are inaccurate. However by taking a longer time span we can mitigate some of the effects.

You can filter the data to view only Monday's figures WHERE WEEKDAY(whn) = 0.

Show the number of new cases in Italy for each week - show Monday only.

1
2
3
4
5
6
7
SELECT name
,DATE_FORMAT(whn,'%Y-%m-%d')
,confirmed-LAG(confirmed, 1) OVER (partition by name ORDER BY whn) AS lag
FROM covid
WHERE name = 'Italy'
AND WEEKDAY(whn) = 0 AND YEAR(whn) = 2020
ORDER BY whn;

5.You can JOIN a table using DATE arithmetic. This will give different results if data is missing.

Show the number of new cases in Italy for each week - show Monday only.

In the sample query we JOIN this week tw with last week lw using the DATE_ADD function.

1
2
3
4
5
6
7
8
SELECT tw.name, DATE_FORMAT(tw.whn,'%Y-%m-%d'), 
tw.confirmed-lw.confirmed
FROM covid tw LEFT JOIN covid lw ON 
DATE_ADD(lw.whn, INTERVAL 1 WEEK) = tw.whn
AND tw.name=lw.name
WHERE tw.name = 'Italy'
AND WEEKDAY(tw.whn)=0
ORDER BY tw.whn

这题是上一题的另一种解法,通过变换日期然后JOIN就直接获得了同在一行的这周和上周的确诊,直接相减即获得新增确诊。

7.The query shown includes a JOIN t the world table so we can access the total population of each country and calculate infection rates (in cases per 100,000).

Show the infect rate ranking for each country. Only include countries with a population of at least 10 million.

1
2
3
4
5
6
SELECT 
world.name,ROUND(100000*confirmed/population,0) infection,
rank()over(order by infection desc)
FROM covid JOIN world ON covid.name=world.name
WHERE whn = '2020-04-20' AND population > 10000000
ORDER BY population DESC

这题答案有问题。

8.For each country that has had at last 1000 new cases in a single day, show the date of the peak number of new cases.

先求出每日新增,然后给每日新增排序。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
SET @@sql_mode='ANSI';
SELECT name
,DATE_FORMAT(whn,'%Y-%m-%d')
,peakNewCases FROM
    (SELECT *
            ,RANK() OVER(PARTITION BY name ORDER BY peakNewCases DESC) AS rank
     FROM
             (SELECT name, whn,
                     confirmed-LAG(confirmed, 1)
                     OVER (PARTITION BY name ORDER BY whn)
                     AS peakNewCases
                     FROM covid) AS c WHERE peakNewCases > 1000)
             AS cc
     WHERE rank=1
     order by whn