Projects
DVD Rental Database Project
Dataset:
The dataset used for this project was the Sakila DVD Rental database, which holds information about a company that rents DVD movies. In this project, I used PostgreSQL to query the database and answer the questions presented below.
Objectives & Goals:
In this project, I queried the database to answer the following questions:
1: How many movies has each actor played in?
2: Are there movies that are not in the inventory and, if so, how many?
3: How many times has each movie that is available for rent been rented, and what is its total revenue?
4: What is the rental_rate of the movies that are not in the inventory?
5: How many customers have not returned more than 1 DVD?
6: How many movies has each customer rented?
7: Most rented movies by genre and how much was paid for them
8: Rental Count and Revenue by Genre and Date
9: How many times was each movie rented, by genre (for the movies that are available for rent)?
10: How many rented movies were returned late, early, and on time?
11: What were the total rentals for family movies?
12: How many movies are in each category, by duration quartile?
13: What is the number of rentals per month for each store?
14: How much did our top customers pay in each month in 2007?
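As an illustration of question 2, here is a minimal sketch of how the movies missing from the inventory could be counted with a LEFT JOIN. The project itself used PostgreSQL; this sketch uses Python's built-in sqlite3 module with an in-memory database so that it runs on its own. The film and inventory tables and their film_id and inventory_id columns follow the Sakila schema, while the sample rows and the helper function names are made up for the example and are not taken from the repository.

```python
import sqlite3


def build_toy_database():
    """Create an in-memory database with a few hypothetical Sakila-like rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, rental_rate REAL);
        CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
        -- Hypothetical sample data: 'Film B' has no inventory copies.
        INSERT INTO film VALUES (1, 'Film A', 0.99), (2, 'Film B', 2.99), (3, 'Film C', 4.99);
        INSERT INTO inventory VALUES (10, 1), (11, 1), (12, 3);
        """
    )
    return conn


def count_films_not_in_inventory(conn):
    """Count films that have no matching row in the inventory table."""
    query = """
        SELECT COUNT(*)
        FROM film AS f
        LEFT JOIN inventory AS i ON i.film_id = f.film_id
        WHERE i.inventory_id IS NULL
    """
    return conn.execute(query).fetchone()[0]


conn = build_toy_database()
missing = count_films_not_in_inventory(conn)
print(f"Films not in inventory: {missing}")
```

The LEFT JOIN keeps every film, and the IS NULL filter retains only those without an inventory copy. Selecting f.rental_rate instead of COUNT(*) in the same query would be one way to approach question 4.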
Analysis:
You can consult the queries I created for every question in the following GitHub repository:
https://github.com/fabiogoncalves27/DVD-Rental-Database-Project
Visualization:
To give a more comprehensive representation of the results, I created a dashboard using Power BI.
The dashboard is shown above, and the Power BI file is available in the GitHub repository:
https://github.com/fabiogoncalves27/DVD-Rental-Database-Project
Investigating School Projects Python Project
Dataset:
The dataset used for this project consisted of six .csv files named 'Donations.csv', 'Donors.csv', 'Projects.csv', 'Resources.csv', 'Schools.csv' and 'Teachers.csv', which hold information about the school projects. You can consult the Entity Relationship Diagram (ERD) of this dataset in the picture presented below:
Objectives & Goals:
In this project, I used Visual Studio Code and created a Jupyter Notebook to manipulate the data of the .csv files using Python and answer the following questions:
1: Which 10 states have the most schools that opened projects to gather donations?
2: What are the top 10 states in which schools gathered the highest average donations for their projects?
3: Investigate the maximum, minimum, mean, median, 25th and 75th percentiles of the "Donation Amount" column.
4: What is the average donation amount across all projects?
5: What are the minimum and maximum?
6: In which states do donors make the most donations?
7: Is there a relationship between the number of projects offered and the number of donations made by donors? Which states are performing better in this case?
8: How many states are responding to project requests below average, and which states are performing best in terms of donations per project?
9: How many different project types exist? What is the total donation amount for each of them?
10: How many project subject category trees exist? Which ones attracted the most donations?
11: What is the mean time it takes for a project to be fully funded after being posted, and how does it vary between states?
12: In which 10 states are projects funded earlier than the average number of days?
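As an illustration of question 3, here is a minimal sketch of the summary statistics for a "Donation Amount" column. It uses only the Python standard library so that it runs without extra packages; the notebook in the repository may well use a different approach. The function names, the in-memory CSV text, and the sample amounts are hypothetical and only stand in for 'Donations.csv'.

```python
import csv
import io
import statistics


def read_amounts(file_obj, column="Donation Amount"):
    """Read one numeric column from an open CSV file into a list of floats."""
    return [float(row[column]) for row in csv.DictReader(file_obj)]


def summarize(amounts):
    """Return min, quartiles, mean and max for a list of donation amounts."""
    # method="inclusive" treats the data as the whole population and
    # interpolates linearly between the sorted values.
    q1, median, q3 = statistics.quantiles(amounts, n=4, method="inclusive")
    return {
        "min": min(amounts),
        "25%": q1,
        "median": median,
        "mean": statistics.mean(amounts),
        "75%": q3,
        "max": max(amounts),
    }


# Hypothetical sample standing in for 'Donations.csv'.
sample_csv = io.StringIO(
    "Donation ID,Donation Amount\n"
    "a,10\n"
    "b,25\n"
    "c,25\n"
    "d,50\n"
    "e,100\n"
)
summary = summarize(read_amounts(sample_csv))
for name, value in summary.items():
    print(f"{name}: {value}")
```

With the real file, the same functions could be called on `open("Donations.csv", newline="")`, assuming the column header is spelled exactly "Donation Amount".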
Analysis:
You can consult the code that answers these questions in the Jupyter Notebook provided in the following GitHub repository:
https://github.com/fabiogoncalves27/School_Data_Project
Visualization:
To give a more comprehensive representation of the results, I created some plots in the Jupyter Notebook.
Investigating Netflix Movies Python Project
Dataset:
The dataset used for this project was a .csv file named 'netflix_data.csv' that holds information about the streaming service. You can consult the dataset along with the table below, which details the column names and descriptions:
Objectives & Goals:
In this project, I used Visual Studio Code and created a Jupyter Notebook to manipulate the data of the csv file using Python.
Analysis:
You can consult the files I created for this project in the following GitHub repository:
https://github.com/fabiogoncalves27/Netflix_Data_Project
Covid Data Project
Dataset:
The dataset used for this project consisted of two Excel documents that contain information about Covid-related deaths and Covid vaccinations. In this project, I used Microsoft SQL Server Management Studio to create a database from the two Excel files and to query the data in order to answer the questions presented below.
Objectives & Goals:
In this project, I queried the database to answer the following questions:
1: What are the global numbers for total cases, total deaths and death percentage?
2: What is the percentage of population infected per country?
3: What is the total death count per continent?
4: What is the average percentage of population infected over time?
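As an illustration of questions 1 and 2, here is a minimal sketch of the percentage calculations, written in Python rather than SQL so that it runs on its own. It assumes that death percentage means total deaths divided by total cases, and that percentage of population infected means total cases divided by population; the queries in the repository may define these differently. The countries and figures are hypothetical.

```python
def percentage(part, whole):
    """Return part as a percentage of whole, or None when whole is zero."""
    if whole == 0:
        return None
    return 100 * part / whole


# Hypothetical rows: (country, population, total_cases, total_deaths)
rows = [
    ("Country A", 1_000_000, 50_000, 1_000),
    ("Country B", 2_000_000, 20_000, 200),
]

# Question 1: global totals and death percentage (deaths relative to cases).
total_cases = sum(cases for _, _, cases, _ in rows)
total_deaths = sum(deaths for _, _, _, deaths in rows)
death_percentage = percentage(total_deaths, total_cases)

# Question 2: percentage of population infected per country.
infected_by_country = {
    country: percentage(cases, population)
    for country, population, cases, _ in rows
}

print(total_cases, total_deaths, round(death_percentage, 2))
print(infected_by_country)
```

Returning None for a zero denominator mirrors the usual guard against division by zero, which matters when a country has no reported cases yet.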
Analysis:
You can consult the queries I created for every question in the following GitHub repository:
https://github.com/fabiogoncalves27/Data_Analyst_Portfolio
Visualization:
To give a more comprehensive representation of the results, I created a dashboard using Tableau.
The dashboard is shown above; you can also consult the Tableau dashboard at the following link: