Covid-19 Data Analysis Case Study | Tai Wai Kang

Introduction

In this case study, I analyze the coronavirus (Covid-19) pandemic, with a focus on Malaysia.

Ask

Available Dataset

I chose a dataset from Our World in Data for the analysis. The dataset is completely open access under the Creative Commons BY license: users may use, distribute, and reproduce it in any medium, provided the source and authors are credited.

Analysis Task

Analyze Covid-19 trends and discover how Covid-19 vaccination affects humanity.

Analysis Questions

  1. What are some trends in Covid-19 in Malaysia?
  2. How could these trends affect people in Malaysia?

Prepare

ROCCC Analysis

Reliability: High. The data covers large populations across the world's continents.

Original: Low. The data are collected by third parties and made available by Our World in Data.

Comprehensive: High. The data include continent, country, population, vaccination, and death figures, and the data format is clean and tidy.

Current: High. The data collection runs from January 29, 2020 to March 15, 2022.

Cited: High. The dataset is cited; the citation is given at the end of this case study.

Overall, the dataset's integrity is acceptable for this Covid-19 case study. Its only weakness is that it was not collected first-hand, so this case study can only identify hints of Covid-19 trends, which should be verified against data collected by the first party.

Process

Tools for analysis

  • SQL Server Management Studio
  • Excel
  • Tableau 

Analysis Process

  • Separate the original dataset (CSV format) into two Excel files, ‘covid19_death.xls’ and ‘covid_vaccination.xls’.
  • Check data integrity in the Excel files, looking for data errors (date format and structural errors) in the dataset.
  • Import both files into the SQL Server database ‘Covid19’.
  • Table names: covid19_death$ and covid19_vaccination$
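
The split in the first step was done in Excel. As a rough scripted alternative, the sketch below splits one CSV into a deaths file and a vaccinations file that share the same key columns. The sample rows are made up, and the assignment of columns to each file is an assumption for illustration; the case study does not list which columns went into each file.

```python
import csv
import io

# Hypothetical sample in the shape of the original CSV; the values are made up.
SAMPLE_CSV = """iso_code,continent,location,date,population,total_cases,total_deaths,new_vaccinations_smoothed
XXX,Asia,Exampleland,2021-01-01,1000,10,1,
XXX,Asia,Exampleland,2021-01-02,1000,15,2,5
"""

# Assumed column split: both files keep the key columns so they can be joined later.
KEY_COLUMNS = ["iso_code", "continent", "location", "date"]
DEATH_COLUMNS = KEY_COLUMNS + ["population", "total_cases", "total_deaths"]
VACCINATION_COLUMNS = KEY_COLUMNS + ["new_vaccinations_smoothed"]


def split_columns(csv_text, columns):
    """Return CSV text that contains only the requested columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not in `columns` are ignored
    return out.getvalue()


death_csv = split_columns(SAMPLE_CSV, DEATH_COLUMNS)
vaccination_csv = split_columns(SAMPLE_CSV, VACCINATION_COLUMNS)
print(death_csv)
print(vaccination_csv)
```

Keeping location and date in both outputs is what makes the later join on those two columns possible.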

Figure 1: Data imported into SQL Server Management Studio

Data Cleaning

The dataset is clean and tidy; no errors were detected during this process.

  1. covid19_death
  2. covid_vaccination
  • Note: The dataset was uploaded to Google Sheets for viewing purposes.

Analyse

Data Exploration

In the analysis stage, the analysis questions guide the work. Three trends were identified:

  • Total Cases vs Total Deaths in Malaysia
  • Rate of Malaysia Population Decrease
  • Percent Population Vaccinated

To explore how these trends relate, three SQL queries were created to aggregate the data.
 

Total Cases vs Total Deaths in Malaysia

SQL Query: 

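-- Sum the Malaysia rows and derive the death percentage.
-- Note: total_cases and total_deaths are cumulative columns, so SUM adds up every date's running total.
SELECT SUM(CAST(total_cases AS decimal)) AS Total_Case_Malaysia,
       SUM(CAST(total_deaths AS decimal)) AS Total_Deaths_Malaysia,
       (SUM(CAST(total_deaths AS decimal)) / SUM(CAST(total_cases AS decimal))) * 100 AS Malaysia_DeathPercentage
FROM covid19_death$
WHERE iso_code LIKE '%M%' AND location = 'Malaysia'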

Observations: 

  • The query returns 671,417,690 total cases and 6,728,847 total deaths for Malaysia.
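
Since total_cases and total_deaths in the Our World in Data dataset are cumulative counts, it may help to see how SUM and MAX behave on such a column. The sketch below uses Python's sqlite3 module with a made-up three-day table; the location name and all values are hypothetical, and SQLite stands in for SQL Server here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE covid19_death "
    "(location TEXT, date TEXT, total_cases INTEGER, total_deaths INTEGER)"
)
# Hypothetical cumulative counts for three consecutive days.
conn.executemany(
    "INSERT INTO covid19_death VALUES (?, ?, ?, ?)",
    [
        ("Exampleland", "2021-01-01", 100, 1),
        ("Exampleland", "2021-01-02", 150, 2),
        ("Exampleland", "2021-01-03", 200, 4),
    ],
)

summed_cases, latest_cases, summed_deaths, latest_deaths = conn.execute(
    """
    SELECT SUM(total_cases), MAX(total_cases),
           SUM(total_deaths), MAX(total_deaths)
    FROM covid19_death
    WHERE location = 'Exampleland'
    """
).fetchone()

# SUM adds the running totals of all days; MAX returns the latest running total.
print("SUM:", summed_cases, summed_deaths)
print("MAX:", latest_cases, latest_deaths)
```

On a cumulative column, SUM grows with the number of dates in the table, while MAX reflects the count as of the last date.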
 

Rate of Malaysia Population Decrease

SQL Query: 

-- Deaths as a percentage of the population, per date; rows without death data are skipped.
SELECT (CAST(total_deaths AS decimal) / CAST(population AS decimal)) * 100 AS Population_Malaysia, date
FROM covid19_death$
WHERE location = 'Malaysia' AND total_deaths IS NOT NULL
ORDER BY date

Observations: 

  • From 2020-01-24 to 2020-03-16, the calculated percentage is NULL because total_deaths is NULL for those dates.
  • To avoid confusion during visualization, the NULL values are filtered out.
  • Malaysia's population decrease percentage increases over time.
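
The effect of filtering out NULL rows can be sketched with a small in-memory table. This is a minimal illustration using Python's sqlite3 module rather than SQL Server; the location name, population, and death counts are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE covid19_death "
    "(location TEXT, date TEXT, population INTEGER, total_deaths INTEGER)"
)
# Hypothetical rows: the first date has no death data yet (NULL).
conn.executemany(
    "INSERT INTO covid19_death VALUES (?, ?, ?, ?)",
    [
        ("Exampleland", "2021-01-01", 1000, None),
        ("Exampleland", "2021-01-02", 1000, 1),
        ("Exampleland", "2021-01-03", 1000, 5),
    ],
)

QUERY = """
    SELECT date, total_deaths * 100.0 / population AS death_percentage
    FROM covid19_death
    WHERE location = 'Exampleland' {extra_filter}
    ORDER BY date
"""

# Without the filter, the NULL row yields a NULL (None) percentage.
unfiltered = conn.execute(QUERY.format(extra_filter="")).fetchall()
# IS NOT NULL removes the rows that have no death data.
filtered = conn.execute(
    QUERY.format(extra_filter="AND total_deaths IS NOT NULL")
).fetchall()

print(unfiltered)
print(filtered)
```

NULL is tested with IS NOT NULL rather than compared to the string 'NULL', because a comparison against NULL never evaluates to true.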
 

Percent Population Vaccinated

SQL Query: 

-- CTE: running total of smoothed daily vaccinations for Malaysia, ordered by date.
WITH tempPopulationVaccinated (continent, location, date, population, new_vaccinations_smoothed, AccumulateNewVaccination) AS
(
    SELECT Death.continent, Death.location, Death.date, CAST(Death.population AS decimal), Vaccine.new_vaccinations_smoothed,
           SUM(CAST(Vaccine.new_vaccinations_smoothed AS decimal)) OVER (ORDER BY Death.date) AS AccumulateNewVaccination
    FROM covid19_death$ Death
    INNER JOIN covid19_vaccination$ Vaccine
        ON Death.location = Vaccine.location
        AND Death.date = Vaccine.date
    WHERE Death.location = 'Malaysia'
    GROUP BY Death.location, Death.continent, Death.date, Death.population, Vaccine.new_vaccinations_smoothed
)
-- Express the running total as a percentage of the population.
SELECT *, (AccumulateNewVaccination / population) * 100 AS vaccinatedPopulation
FROM tempPopulationVaccinated

Observations: 

  • Vaccination starts on 2021-02-24.
  • The vaccinated percentage of Malaysia's population increases from 2021-02-24 to 2022-03-15.
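
The running total behind this query can be sketched on a small table. The example below uses Python's sqlite3 module and assumes SQLite 3.25 or later, which added window functions; the table name `vaccination` and all values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vaccination "
    "(date TEXT, population INTEGER, new_vaccinations_smoothed INTEGER)"
)
# Hypothetical daily vaccination counts for a population of 1000.
conn.executemany(
    "INSERT INTO vaccination VALUES (?, ?, ?)",
    [
        ("2021-02-24", 1000, 10),
        ("2021-02-25", 1000, 30),
        ("2021-02-26", 1000, 60),
    ],
)

rows = conn.execute(
    """
    SELECT date,
           -- Running total: sum of all rows up to and including the current date.
           SUM(new_vaccinations_smoothed) OVER (ORDER BY date) AS AccumulateNewVaccination,
           SUM(new_vaccinations_smoothed) OVER (ORDER BY date) * 100.0 / population
               AS vaccinatedPopulation
    FROM vaccination
    ORDER BY date
    """
).fetchall()

for row in rows:
    print(row)
```

In the Our World in Data codebook, new_vaccinations_smoothed counts doses administered rather than people, so a percentage computed this way can exceed 100 once people receive more than one dose.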

Share

The visualizations were created in Tableau.

  • Total Cases vs Total Deaths in Malaysia

Malaysia's death percentage is 1.002% of 671,417,690 Covid-19 cases. The total number of deaths caused by Covid-19 is 6,728,847.

Covid-19 had an impact on Malaysia's population: by this calculation, about 6.7 million deaths are attributed to the virus.
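
As a quick arithmetic check, the death percentage can be recomputed from the two totals reported above. The variable names are illustrative.

```python
# Totals as reported in this case study.
total_cases = 671_417_690
total_deaths = 6_728_847

# Death percentage = deaths / cases * 100, the same formula as the SQL query.
death_percentage = total_deaths / total_cases * 100
print(round(death_percentage, 3))
```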

  • Rate of Malaysia Population Decrease

Malaysia's population decrease percentage increased from March 2020 to January 2022, then decreased from January 2022 to March 2022.

This chart indicates that Covid-19 had an impact on Malaysia's population.

 

  • Percent Population Vaccinated

Malaysia's population started to receive Covid-19 vaccinations in March 2021. The percentage keeps increasing until January 2022 and starts to decrease from January 2022 to March 2022.

This chart indicates that a high percentage of Malaysia's population has received Covid-19 vaccinations.

I discovered one interesting trend during visualization:

  • Effect of Vaccination on Malaysia Population

Both lines on the graph follow a similar trend: Malaysia's vaccination started in March 2021, and from May 2021 the Malaysia population decrease percentage increases drastically.

According to the red line, the vaccinated percentage of Malaysians peaks in January 2022, while the yellow line shows a decrease in the death percentage.


Covid-19 Malaysia Analysis Dashboard


Conclusion

From the analysis, we can see the clear impact of Covid-19 on Malaysia's population. The total number of deaths in Malaysia reached 6,728,847. However, we discovered that vaccination could be one of the reasons the death percentage of Malaysia's population is decreasing.

Recommendation

  • Malaysians are strongly encouraged to get Covid-19 vaccinations to avoid critical illness that can cause death.

Hannah Ritchie, Edouard Mathieu, Lucas Rodés-Guirao, Cameron Appel, Charlie Giattino, Esteban Ortiz-Ospina, Joe Hasell, Bobbie Macdonald, Diana Beltekian and Max Roser (2020) - "Coronavirus Pandemic (COVID-19)". Published online at OurWorldInData.org. Retrieved from: 'https://ourworldindata.org/coronavirus' [Online Resource]
