본문 바로가기

Structured Query Language/MY_SQL_intermediate

SQL_Intermediate 03. 유저 세그먼트별 분석

03. 유저 세그먼트별 분석


Q13. 전체 유저의 Demographic을 알고 싶어요. 성, 연령별 유저 숫자를 알려주세요.
어느 세그먼트가 가장 숫자가 많나요? 

참고로 기타 성별은 하나로, 연령은 5세단위로 적당히 묶어주시고 숫자가 높은 순서대로 보여주세요.


CASE WHEN 사용NULL 값 오류 대처GROUP BY, ORDERBY 사용

 

SELECT CASE WHEN LENGTH(gender) < 1 THEN 'Others'
			ELSE gender
            END AS gender 
		,CASE WHEN age IS NULL THEN '무응답'
			 WHEN age <= 15 THEN '15세 이하'
             WHEN age <= 20 THEN '15~20세'
             WHEN age <= 25 THEN '21~25세'
             WHEN age <= 30 THEN '26~30세'
             WHEN age <= 35 THEN '31~35세'
             WHEN age <= 40 THEN '36~40세'
             WHEN age <= 45 THEN '41~45세'
			 WHEN age >= 46 THEN '46세 이상'
             END AS age 
	 , COUNT(*) AS cnt
FROM tbl_customer
GROUP BY 1 , 2
ORDER BY 3 DESC;

 

 

 

 

Q14 Q13결과의 성, 연령을 '남성(25~29세)'와 같이 통합해 주시고, 

각 성, 연령이 전체 고객에서 얼마나 차지하는지 분포(%)를 알려주세요,
역시 분포가 높은 순서대로 알려주세요


concat 함수는 두 텍스트를 붙여주는 함수
scala subquery 하나의 값에 사용가능

 

SELECT CONCAT ( CASE WHEN gender = 'M' THEN '남성'
					WHEN gender = 'F' THEN '여성'
                    WHEN LENGTH(gender) <1 THEN '기타'
                    WHEN gender = 'Others' THEN '기타'
                    END
				, "("
				,	CASE WHEN age IS NULL THEN '무응답'
					 WHEN age <= 15 THEN '15세 이하'
					 WHEN age <= 20 THEN '15~20세'
					 WHEN age <= 25 THEN '21~25세'
					 WHEN age <= 30 THEN '26~30세'
					 WHEN age <= 35 THEN '31~35세'
					 WHEN age <= 40 THEN '36~40세'
					 WHEN age <= 45 THEN '41~45세'
					 WHEN age >= 46 THEN '46세 이상'
					 END 
                     , ")" ) AS segement
                 , round(count(*) / ( SELECT count(*)
						from tbl_customer) * 100 , 2 ) AS per
FROM tbl_customer
GROUP BY 1 
ORDER BY 2 DESC ;

 

 

Q15. 2020년 7월, 성별에 따라 구매 건수와, 총 Revenue를 구해주세요. 

이전처럼 남녀 이외의 성별은 하나로 묶어주세요


JOIN과 GROUP BY 함께 사용

 

SELECT CASE WHEN B.gender = 'M' THEN '남성'
			WHEN B.gender = 'F' THEN '여성'
			WHEN LENGTH(B.gender) < 1 THEN '기타'
			WHEN B.gender = 'Others' THEN '기타'
			END AS gender
		, count(*) AS cnt
        , sum(price) AS revenue
FROM tbl_purchase AS A
LEFT JOIN tbl_customer AS B
ON A.customer_id = B.customer_id 
WHERE purchased_at >= '2020-07-01'
AND purchased_at < '2020-08-01'
GROUP BY 1  ;

 

 

 

Q16. 2020년 7월, 성별, 연령대에 따라 구매 건수와, 총 Revenue를 구해주세요.


JOIN과 GROUP BY 함께 사용

 

SELECT CASE WHEN B.gender = 'M' THEN '남성'
			WHEN B.gender = 'F' THEN '여성'
			WHEN LENGTH(B.gender) < 1 THEN '기타'
			WHEN B.gender = 'Others' THEN '기타'
			END AS gender
		, CASE
			 WHEN age <= 15 THEN '15세 이하'
			 WHEN age <= 20 THEN '15~20세'
			 WHEN age <= 25 THEN '21~25세'
			 WHEN age <= 30 THEN '26~30세'
			 WHEN age <= 35 THEN '31~35세'
			 WHEN age <= 40 THEN '36~40세'
			 WHEN age <= 45 THEN '41~45세'
			 WHEN age >= 46 THEN '46세 이상'
			 END AS age_group
		, count(*) AS cnt
        , sum(price) AS revenue
FROM tbl_purchase AS A
LEFT JOIN tbl_customer AS B
ON A.customer_id = B.customer_id 
WHERE purchased_at >= '2020-07-01'
AND purchased_at < '2020-08-01'
GROUP BY 1 , 2
ORDER BY 4 DESC;