If you torture the data long enough, it will confess.

It is Week 3 already, yipeeeee, and we must say that the Side Hustle Bootcamp has been a great platform for fine-tuning our skills. We have had to do a lot of collaboration, problem-solving, research, and communication, which are basic soft skills a data analyst is required to possess. This week's task came in quite early, so the team had ample time to strategize on how to turn in the best deliverables.

The task was to work on a FIFA World Cup analysis with data from 1930 to the present, using Microsoft Power BI to scrape, clean, and visualize the data. We recognized that, to save ourselves from getting lost in this sea of data and ending up directionless, it was vital for us to focus on telling our story based on the required deliverables, which are:

-Most World Cup Titles Won

-Attendance, Number of Teams, Goals, and Matches per Cup

-Goals per Team per World Cup

-Matches with the Highest Attendance

-Stadium with the Highest Average Attendance

-Countries That Have Won the Cup

-Number of Goals per Country

-Match Outcome by Home and Away Teams

Data Sourcing & Collection

After carrying out in-depth research, we settled on the up-to-date data on Wikipedia, which clearly segments all the features needed for our visualizations:

hashnode 1.PNG

hashnode 2.PNG

The data, presented in table format, covers 1930 to date. As soon as the data was sourced and agreed upon, the team got to work.

Data Scraping & Preparation

The data was scraped into Power BI using the 'Get data' menu option, then cleaned and transformed in Power Query by removing duplicates, null values, and features that wouldn't be needed in the analysis. Each column was set to the correct data type, and the highest-attendance columns (number, venue, and games) were split to separate the city from the country.

Screenshot (9).png

Screenshot (10).png
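For readers who would like to see these cleaning steps outside Power Query, here is a minimal Python sketch of the same idea: drop rows with nulls, drop duplicates, fix the data types, and split a combined location field into city and country. The column names (`year`, `location`, `attendance`), the `clean_rows` helper, and the sample rows are all made up for illustration; they are not our actual table.

```python
# Made-up sample rows, loosely shaped like a scraped attendance table.
# None of these values come from the real data set.
RAW_ROWS = [
    {"year": "1930", "location": "Alpha City, Exampleland", "attendance": "10,000"},
    {"year": "1930", "location": "Alpha City, Exampleland", "attendance": "10,000"},  # duplicate
    {"year": "1934", "location": "Beta Town, Samplestan", "attendance": None},        # null value
    {"year": "1938", "location": "Gamma Port, Demoland", "attendance": "25,500"},
]


def clean_rows(rows):
    """Remove nulls and duplicates, convert types, and split city from country."""
    seen = set()
    cleaned = []
    for row in rows:
        # Skip rows that contain a missing (None or blank) value.
        if any(value is None or str(value).strip() == "" for value in row.values()):
            continue
        # Skip rows that are exact duplicates of one already kept.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # Split "City, Country" on the first comma.
        city, _, country = row["location"].partition(",")
        cleaned.append({
            "year": int(row["year"]),                               # text -> whole number
            "city": city.strip(),
            "country": country.strip(),
            "attendance": int(row["attendance"].replace(",", "")),  # "10,000" -> 10000
        })
    return cleaned


if __name__ == "__main__":
    for row in clean_rows(RAW_ROWS):
        print(row)
```

In Power Query these steps roughly correspond to Remove Duplicates, Remove Empty, Change Type, and Split Column by Delimiter.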

Data Visualization

After cleaning and transformation were done, the different tables were modeled using their primary keys and divided into fact and dimension tables. As of the 2018 FIFA World Cup, twenty-one final tournaments have been held and a total of 79 national teams have competed. From the dashboard, we discovered that Brazil, which is the top goal-scoring team, has participated the most and has also won the World Cup five times, followed by Germany and Italy.
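To make the fact and dimension idea concrete, here is a small Python sketch, not our actual Power BI model. It assumes a hypothetical `team_id` key linking a dimension table of teams to a fact table with one row per tournament, then counts titles per team. The winners listed are the tournament winners from 1930 to 2018, with West Germany's titles counted under Germany.

```python
from collections import Counter

# Dimension table: one row per team, keyed by a hypothetical team_id.
TEAMS = {
    1: "Uruguay", 2: "Italy", 3: "Germany", 4: "Brazil",
    5: "England", 6: "Argentina", 7: "France", 8: "Spain",
}

# Fact table: one row per tournament as (year, winner team_id).
# West Germany's wins (1954, 1974, 1990) are recorded under Germany.
TOURNAMENTS = [
    (1930, 1), (1934, 2), (1938, 2), (1950, 1), (1954, 3), (1958, 4),
    (1962, 4), (1966, 5), (1970, 4), (1974, 3), (1978, 6), (1982, 2),
    (1986, 6), (1990, 3), (1994, 4), (1998, 7), (2002, 4), (2006, 2),
    (2010, 8), (2014, 3), (2018, 7),
]


def titles_per_team(tournaments, teams):
    """Join each fact row to the team dimension and count titles per team name."""
    return Counter(teams[team_id] for _year, team_id in tournaments)


if __name__ == "__main__":
    titles = titles_per_team(TOURNAMENTS, TEAMS)
    # Print teams from most to fewest titles.
    for team, count in titles.most_common():
        print(team, count)
```

Keeping team names in one dimension table means a name only has to be corrected in one place, while the fact table stays a compact list of keys and measures.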

On our dashboard, Uruguay 6–1 Yugoslavia, a semi-final game in 1930, shows up as the match with the highest attendance. Miroslav Klose of Germany is the all-time top goal scorer.

hashnode 3.PNG

hashnode 4.PNG

hashnode 5.PNG

hashnode 6.PNG

Through it all, we can say that Week 3 came with a better experience, and the learning surely continues. So do stay glued to this page to learn more about our journey.

Till next week,

Side Hustle Bootcamper.