
Projects

DVD Rental Database Project

Dataset:


The dataset used for this project was the Sakila DVD Rental database, which holds information about a company that rents DVD movies. I used PostgreSQL to query the database and answer the questions presented below.
 


Objectives & Goals:


In this project, I queried the database to answer the following questions:

1: How many movies has each actor played in?
2: Are there movies that are not in the inventory and, if so, how many?
3: How many times has each movie that is available for rent been rented, and what is its total revenue?
4: What is the rental_rate of the movies that are not in the inventory?
5: How many customers have failed to return more than 1 DVD?
6: How many movies has each customer rented?
7: Which movies are the most rented by genre, and how much was paid for them?
8: What are the rental count and revenue by genre and date?
9: How many times was each movie rented, by genre (for the movies that are available for rent)?
10: How many rented movies were returned late, early, and on time?
11: What were the total rentals for family movies?
12: How many movies in each category fall into each rental duration quartile?
13: What is the number of rentals per month for each store?
14: How much did our top customers pay in each month in 2007?
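
As an illustration of questions 2 and 4, here is a minimal sketch of one way to find movies with no inventory copies. It is not the query from the repository: it uses Python's built-in sqlite3 module rather than PostgreSQL, and the two tables are small stand-ins modeled on Sakila's film and inventory tables, with invented rows.

```python
import sqlite3

# Toy stand-in for two Sakila tables; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, rental_rate REAL);
    CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
    INSERT INTO film VALUES (1, 'Film A', 0.99), (2, 'Film B', 2.99), (3, 'Film C', 4.99);
    INSERT INTO inventory VALUES (1, 1), (2, 1), (3, 3);
""")

# A LEFT JOIN keeps every film; films without any inventory row
# come back with NULL in the inventory columns, so we filter on that.
missing_films = conn.execute("""
    SELECT f.title, f.rental_rate
    FROM film AS f
    LEFT JOIN inventory AS i ON i.film_id = f.film_id
    WHERE i.inventory_id IS NULL
    ORDER BY f.title
""").fetchall()

missing_count = len(missing_films)
print(missing_count, missing_films)
```

The same LEFT JOIN with an IS NULL filter should carry over to PostgreSQL largely unchanged; a NOT EXISTS subquery is a common alternative.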

 


Analysis:


You can consult the queries I created for every question in the following GitHub repository:

https://github.com/fabiogoncalves27/DVD-Rental-Database-Project

Visualization:


To get a more comprehensive representation of the results, I created a dashboard using Power BI.

You can see the dashboard above and consult the Power BI file in the GitHub repository:

https://github.com/fabiogoncalves27/DVD-Rental-Database-Project

Investigating School Projects Python Project

Dataset:


The dataset used for this project consisted of 6 .csv files named 'Donations.csv', 'Donors.csv', 'Projects.csv', 'Resources.csv', 'Schools.csv' and 'Teachers.csv', which hold information about school projects. You can consult the Entity Relationship Diagram (ERD) of this dataset in the picture below:

ERD.jpg

Objectives & Goals:

In this project, I used Visual Studio Code and created a Jupyter Notebook to manipulate the data in the .csv files using Python and answer the following questions:

1: Which 10 states have the highest number of schools that opened projects to gather donations?

2: What are the top 10 states in which schools gathered the highest average donations for their projects?

3: What are the maximum, minimum, mean, median, and 25th and 75th percentiles of the "Donation Amount" column?

4: What is the average donation amount across all projects?

5: What are the minimum and maximum donation amounts?

6: In which states do donors make the most donations?

7: Is there a relationship between the number of projects offered and the number of donations made by donors? Which states perform better in this respect?

8: How many states respond to project requests at a below-average rate, and which states perform best in terms of donations per project?

9: How many different project types exist? What is the total donation amount for each of them?

10: How many project subject category trees exist? Which ones attracted the most donations?

11: What is the mean time it takes for a project to be fully funded after it is posted, and how does it vary between states?

12: In which 10 states are projects funded in fewer days than average?
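
To make question 3 concrete, here is a minimal sketch of the summary statistics it asks for. It is not the code from the notebook: it uses only Python's standard library, and the amounts are invented values standing in for the "Donation Amount" column of 'Donations.csv'.

```python
import statistics

# Invented donation amounts standing in for the "Donation Amount" column.
donation_amounts = [5.0, 10.0, 20.0, 25.0, 50.0, 100.0, 200.0]

# n=4 splits the data into quartiles; method="inclusive" treats the data as
# the full population and interpolates linearly between neighbouring values.
q25, q50, q75 = statistics.quantiles(donation_amounts, n=4, method="inclusive")

summary = {
    "min": min(donation_amounts),
    "25%": q25,
    "median": statistics.median(donation_amounts),
    "mean": statistics.mean(donation_amounts),
    "75%": q75,
    "max": max(donation_amounts),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

On real data the mean and the median can differ considerably, as they do in this toy sample, because a few large donations pull the mean upward.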


Analysis:


You can consult the code that answers these questions in the Jupyter Notebook provided in the following GitHub repository:

https://github.com/fabiogoncalves27/School_Data_Project

Visualization:

To get a more comprehensive representation of the results, I created some plots in the Jupyter Notebook.

Investigating Netflix Movies Python Project

Dataset:


The dataset used for this project was a .csv file named 'netflix_data.csv' that holds information about the streaming service. The table below details the column names and their descriptions:

table.png

Objectives & Goals:

In this project, I used Visual Studio Code and created a Jupyter Notebook to manipulate the data of the csv file using Python.


Analysis:

You can consult the files I created for this project in the following GitHub repository:
https://github.com/fabiogoncalves27/Netflix_Data_Project

Covid Data Project

Dataset:


The dataset used for this project consisted of 2 Excel documents that contain information about Covid-related deaths and Covid vaccinations. I used Microsoft SQL Server Management Studio to create a database from the 2 Excel files and to query the data in order to answer the questions presented below.
 


Objectives & Goals:


In this project, I queried the database to answer the following questions:

1: What are the global numbers for total cases, total deaths and death percentage?

2: What is the percentage of population infected per country?
3: What is the total death count per continent?

4: What is the average percentage of population infected over time?
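
The percentages in questions 1 and 2 come down to simple ratios, sketched below. This is only an illustration in Python, not the SQL from the repository, and the rows and field layout (location, population, total cases, total deaths) are assumptions made up for the example.

```python
# Invented rows: (location, population, total_cases, total_deaths).
rows = [
    ("Country A", 1_000_000, 50_000, 500),
    ("Country B", 2_000_000, 40_000, 1_000),
]

def percentage(part, whole):
    """Return part as a percentage of whole, or None when whole is zero."""
    return None if whole == 0 else 100.0 * part / whole

# Question 1: global totals and deaths as a percentage of cases.
global_cases = sum(cases for _, _, cases, _ in rows)
global_deaths = sum(deaths for _, _, _, deaths in rows)
global_death_pct = percentage(global_deaths, global_cases)

# Question 2: share of each country's population that was infected.
infected_pct_by_country = {
    location: percentage(cases, population)
    for location, population, cases, _ in rows
}

print(global_cases, global_deaths, global_death_pct)
print(infected_pct_by_country)
```

Guarding against a zero denominator matters in practice, since early rows in a case-count dataset can report zero cases.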

 


Analysis:


You can consult the queries I created for every question in the following GitHub repository:

https://github.com/fabiogoncalves27/Data_Analyst_Portfolio

Visualization:


To get a more comprehensive representation of the results, I created a dashboard using Tableau.

You can see the dashboard above or consult the Tableau dashboard at the following link:

https://public.tableau.com/app/profile/f.bio.gon.alves/viz/CovidProjectDashboard_17017008451520/Painel1

Painel 1.png