The dashboard I built to land my first business intelligence internship
Learn how to build an ETL process with Google Trends, Python, PostgreSQL, and Power BI.
1. INTRODUCTION
In the age of technology and changing consumer habits, the car industry in South America is at a critical point where traditional methods and modern digital marketing strategies come together. This is where exciting opportunities for growth and efficiency arise.
Our project aims to analyze the automotive industry in five South American countries (Peru, Colombia, Argentina, Bolivia, and Chile), focusing on several brands between 2021 and 2022. This analysis, presented through a dashboard such as the one in Figure 2, seeks to provide insights for digital marketing purposes.
The article is organized into five sections, as follows:
- Section 1: Brief description that acts as the motivating foundation of this article.
- Section 2: Explanation of the ETL diagram for the project.
- Section 3: The technical section for the project where Python and pgAdmin4 will be used.
- Section 4: Reporting data for the project insights.
- Section 5: Project findings emphasizing the importance of visualization tools, digital marketing and suggesting future areas for study.
You can visit my GitHub in this link for the full code of the project: AlexRoman938/brand_car_dashboard (github.com)
2. ETL ARCHITECTURE DIAGRAM
ETL stands for Extract, Transform, Load. It is a data integration process that involves extracting data from various sources, transforming it into a consistent format, and loading it into a target system. ETL ensures data quality and enables analysis and reporting.
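To make the three stages concrete before we touch real tools, here is a toy sketch of ETL as three plain Python functions. Everything in it (the function names, the sample rows, the in-memory "warehouse") is illustrative, not part of any library:

```python
# A toy sketch of the three ETL stages as plain functions.
# The data and function names are illustrative only.

def extract():
    # Pretend these rows came from an external source.
    return [{"country": "PE", "brand": "Kia", "searches": 40},
            {"country": "PE", "brand": "MG", "searches": 60}]

def transform(rows):
    # Normalize country codes into full names (a consistent format).
    names = {"PE": "Peru"}
    return [{**row, "country": names.get(row["country"], row["country"])}
            for row in rows]

def load(rows, target):
    # Here the "target system" is just an in-memory list.
    target.extend(rows)
    return target

warehouse = []
load(transform(extract()), warehouse)
print(warehouse[0]["country"])  # Peru
```

In the rest of the article, `extract` becomes pytrends, `transform` becomes pandas, and `load` becomes PostgreSQL.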
2.1. Extract Data
We will use Google Trends as a database to extract data, it is a public web-based tool that allows users to explore the popularity of search queries on Google. It can be used in Data Analytics projects to gather insights about the popularity of specific topics.
In this case, we will use it to search for car brands such as Kia, Mitsubishi, Peugeot, Fuso, Chery, MG, and GAC Motor in several South American countries (Argentina, Bolivia, Chile, Colombia, and Peru) between 2021-01-01 and 2022-12-31.
2.2. Transform Data
Once we have extracted the data, we will have one dataframe per car brand and country (e.g., a dataframe for Kia searches in Peru, or MG searches in Colombia). Next, we need to concatenate all of them into a single "final dataframe" and apply some feature engineering to it before moving on to the next stage.
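On toy data, the concatenate-and-aggregate step looks like the sketch below. The numbers are made up; only the shape (one keyword column per frame, plus Country and Brand labels) mirrors what we will build later:

```python
import pandas as pd

# Two per-brand/per-country frames like the ones Google Trends returns,
# with made-up search counts.
df_pe = pd.DataFrame({"Kia": [10, 20], "Country": "PE", "Brand": "Kia"})
df_co = pd.DataFrame({"MG": [5, 15], "Country": "CO", "Brand": "MG"})

# Stack them into one "final dataframe"; missing keyword columns become NaN.
final_df = pd.concat([df_pe, df_co], axis=0)

# Sum the keyword columns per (Country, Brand); sum() ignores NaN.
report = (final_df.groupby(["Country", "Brand"])[["Kia", "MG"]]
          .sum().reset_index())
report["#Number"] = report[["Kia", "MG"]].sum(axis=1)
print(report[["Country", "Brand", "#Number"]])
```

The real pipeline in Section 3.2 does exactly this, just with seven brands and five countries.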
2.3. Load Data
After the transform process, we will load the "final dataframe" into pgAdmin4. pgAdmin is an open-source administration and development platform for PostgreSQL.
2.4. Reporting Data
Finally, we will connect pgAdmin4 and Power BI to build an interactive dashboard.
3. CODING STAGE
In this stage we are going to code in Python 3.9 using an Anaconda environment. So, we need to run the command below in the Anaconda Prompt:
conda create -n project_car python=3.9
Where “project_car” is the name of our environment (You can change it).
Next, we have to activate the project environment:
conda activate project_car
3.1. Extract Data Code
Once we have set up our Anaconda environment, we will use the pytrends library, a Python package that provides a simple, unofficial interface for querying Google Trends.
So, we need to write the below command:
pip install pytrends
First, we have to initialize all arguments (These were shown in Section 2.1).
from pytrends.request import TrendReq

if __name__ == '__main__':
    """
    Set up a custom User-Agent header for making HTTP requests with
    the requests library in Python.
    The User-Agent header is part of the HTTP protocol and is used to
    identify the client making the request. It typically includes
    information about the client's operating system, browser, and version.
    """
    requests_args = {
        'headers': {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'
        }
    }
    pytrends = TrendReq(requests_args=requests_args)

    """
    ARGUMENTS THAT WE WANT TO SEARCH AND FILTER
    """
    search_terms = ["Kia", "Mitsubishi", "Peugeot", "Fuso", "Chery", "MG", "GAC Motor"]
    countries_filt = ["AR", "BO", "CL", "CO", "PE"]
    start_date = '2021-01-01'
    end_date = '2022-12-31'
Where AR is Argentina, BO is Bolivia, CL is Chile, CO is Colombia and PE is Peru.
Next, we have to create a main function that searches and filters using the previous arguments…
from pytrends.request import TrendReq
import time

def main(search_terms, countries_filt, start_date, end_date):
    """
    INPUT: ALL VARIABLES THAT WE WANT TO SEARCH AND FILTER
    TEMPORARY OUTPUT: MULTIPLE DATAFRAMES IN "df_list" (These were mentioned in Section 2.2)
    """
    df_list = []
    for search_term in search_terms:
        for country in countries_filt:
            # cat=12 because we want searches related to the automobile
            # industry (the "Autos & Vehicles" Google Trends category).
            pytrends.build_payload(kw_list=[search_term], cat=12,
                                   timeframe=f'{start_date} {end_date}',
                                   geo=country, gprop='')
            interest_over_time_df = pytrends.interest_over_time()
            interest_over_time_df = interest_over_time_df.reset_index()
            interest_over_time_df["Country"] = country
            interest_over_time_df["Brand"] = search_term
            df_list.append(interest_over_time_df)

            # Pause for 3 seconds between requests to avoid
            # overwhelming Google Trends.
            time.sleep(3)
    return df_list

if __name__ == '__main__':
    requests_args = {
        'headers': {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'
        }
    }
    pytrends = TrendReq(requests_args=requests_args)

    search_terms = ["Kia", "Mitsubishi", "Peugeot", "Fuso", "Chery", "MG", "GAC Motor"]
    countries_filt = ["AR", "BO", "CL", "CO", "PE"]
    start_date = '2021-01-01'
    end_date = '2022-12-31'

    main(search_terms, countries_filt, start_date, end_date)
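One practical caveat: Google Trends often throttles repeated requests (HTTP 429 errors), so in a real run the request inside the loop may fail intermittently even with the pause. Below is a small, generic retry helper I would wrap around the pytrends call; the helper and its names are my own sketch, not part of pytrends:

```python
import time

def with_retries(func, attempts=3, base_delay=1.0):
    """Call func(); on failure, wait with exponential backoff and retry.

    This is a generic helper (not part of pytrends): it retries up to
    `attempts` times, sleeping base_delay, 2*base_delay, 4*base_delay, ...
    between tries, and re-raises the last error if all attempts fail.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Inside the loop from the code above, the call would become something like:
# with_retries(lambda: pytrends.build_payload(
#     kw_list=[search_term], cat=12,
#     timeframe=f'{start_date} {end_date}', geo=country, gprop=''))
```

Exponential backoff is a common pattern for rate-limited web sources; three attempts with a short base delay is usually enough for a scrape of this size.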
3.2. Transform Data
In this section we are going to use the pandas library to concatenate all the dataframes in df_list and apply some feature engineering. So, let's move on… :D
from pytrends.request import TrendReq
import time
import pandas as pd

def main(search_terms, countries_filt, start_date, end_date):
    """
    INPUT: ALL VARIABLES THAT WE WANT TO SEARCH AND FILTER
    TEMPORARY OUTPUT: A SINGLE FINAL DATAFRAME
    WITH THE COLUMNS "COUNTRY, BRAND, #NUMBER (NUMBER OF SEARCHES)"
    """
    df_list = []
    for search_term in search_terms:
        for country in countries_filt:
            # cat=12 for the "Autos & Vehicles" Google Trends category.
            pytrends.build_payload(kw_list=[search_term], cat=12,
                                   timeframe=f'{start_date} {end_date}',
                                   geo=country, gprop='')
            interest_over_time_df = pytrends.interest_over_time()
            interest_over_time_df = interest_over_time_df.reset_index()
            interest_over_time_df["Country"] = country
            interest_over_time_df["Brand"] = search_term
            df_list.append(interest_over_time_df)
            time.sleep(3)

    # Concatenate all dataframes in df_list
    final_df = pd.concat(df_list, axis=0)

    """
    Feature engineering
    """
    # Drop the date and isPartial columns, plus the rows labeled 0
    final_df.drop(columns=['date', 'isPartial'], index=0, inplace=True)

    # Group by country and brand, summing each keyword column
    report = final_df.groupby(['Country', 'Brand']).agg({'Kia': 'sum',
                'Mitsubishi': 'sum', 'Peugeot': 'sum',
                'Fuso': 'sum', 'Chery': 'sum', 'MG': 'sum', 'GAC Motor': 'sum'})

    # Turn the group keys back into ordinary columns
    report.reset_index(inplace=True)

    # Add a total column summing across all keyword columns
    selected_columns = ['Kia', 'Mitsubishi', 'Peugeot', 'Fuso', 'Chery',
                        'MG', 'GAC Motor']
    row_sum = report.loc[:, selected_columns].sum(axis=1)
    report['#Number'] = row_sum.values

    # Drop the per-keyword columns that we won't use anymore
    report.drop(columns=selected_columns, inplace=True)

    # Replace country codes with country names
    report['Country'] = report['Country'].replace({'AR': 'Argentina', 'BO': 'Bolivia',
                'CL': 'Chile', 'CO': 'Colombia', 'PE': 'Peru'})
    return report

if __name__ == '__main__':
    requests_args = {
        'headers': {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'
        }
    }
    pytrends = TrendReq(requests_args=requests_args)
    search_terms = ["Kia", "Mitsubishi", "Peugeot", "Fuso", "Chery", "MG", "GAC Motor"]
    countries_filt = ["AR", "BO", "CL", "CO", "PE"]
    start_date = '2021-01-01'
    end_date = '2022-12-31'
    main(search_terms, countries_filt, start_date, end_date)
3.3. Load Data
This section could be more technical, as we need to install pgAdmin4. We can search YouTube for "how to install pgAdmin4 on [our operating system: Windows, Linux, or macOS]" and we will find many tutorials.
Once we've installed pgAdmin4 on our system, we have to create a database for the project:
Next, we have to type the database's name and save 💾.
Excellent! 😍 We've set up our database in pgAdmin4. But for the moment, our database is empty! 😮
So, it's time to load our transformed dataframe into pgAdmin4.
from pytrends.request import TrendReq
from sqlalchemy import create_engine
import time
import pandas as pd

def main(search_terms, countries_filt, start_date, end_date, engine):
    """
    INPUT: ALL VARIABLES THAT WE WANT TO SEARCH AND FILTER
    OUTPUT: THE FINAL DATAFRAME LOADED INTO THE DATABASE
    WITH THE COLUMNS "COUNTRY, BRAND, #NUMBER (NUMBER OF SEARCHES)"
    """
    df_list = []
    for search_term in search_terms:
        for country in countries_filt:
            # cat=12 for the "Autos & Vehicles" Google Trends category.
            pytrends.build_payload(kw_list=[search_term], cat=12,
                                   timeframe=f'{start_date} {end_date}',
                                   geo=country, gprop='')
            interest_over_time_df = pytrends.interest_over_time()
            interest_over_time_df = interest_over_time_df.reset_index()
            interest_over_time_df["Country"] = country
            interest_over_time_df["Brand"] = search_term
            df_list.append(interest_over_time_df)
            time.sleep(3)

    final_df = pd.concat(df_list, axis=0)
    final_df.drop(columns=['date', 'isPartial'], index=0, inplace=True)
    report = final_df.groupby(['Country', 'Brand']).agg({'Kia': 'sum',
                'Mitsubishi': 'sum', 'Peugeot': 'sum',
                'Fuso': 'sum', 'Chery': 'sum', 'MG': 'sum', 'GAC Motor': 'sum'})
    report.reset_index(inplace=True)
    selected_columns = ['Kia', 'Mitsubishi', 'Peugeot', 'Fuso', 'Chery',
                        'MG', 'GAC Motor']
    row_sum = report.loc[:, selected_columns].sum(axis=1)
    report['#Number'] = row_sum.values
    report.drop(columns=selected_columns, inplace=True)
    report['Country'] = report['Country'].replace({'AR': 'Argentina', 'BO': 'Bolivia',
                'CL': 'Chile', 'CO': 'Colombia', 'PE': 'Peru'})

    # Loading to PostgreSQL (visible in pgAdmin4)
    report.to_sql(name="2021_2022_vehicles", con=engine, if_exists="replace")

if __name__ == '__main__':
    """
    conn_string is used to establish a connection to the PostgreSQL database.
    Remember to edit the username, password, and database_name to match your setup.
    """
    username = "postgres"
    password = "snow"
    database_name = "car_project"
    conn_string = f"postgresql://{username}:{password}@localhost:5432/{database_name}"
    # Build a SQLAlchemy engine from the connection string for pandas' to_sql
    engine = create_engine(conn_string)

    requests_args = {
        'headers': {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'
        }
    }
    pytrends = TrendReq(requests_args=requests_args)
    search_terms = ["Kia", "Mitsubishi", "Peugeot", "Fuso", "Chery", "MG", "GAC Motor"]
    countries_filt = ["AR", "BO", "CL", "CO", "PE"]
    start_date = '2021-01-01'
    end_date = '2022-12-31'
    main(search_terms, countries_filt, start_date, end_date, engine)
NOW IT’S TIME TO RUN THE CODE…!!! 🏃🏃🏃
Congratulations!!! 🎊 We loaded the final dataframe into our database in pgAdmin4. We can view it under Tables.
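If we want to sanity-check the to_sql/read-back round trip without touching the real PostgreSQL instance, SQLite (bundled with Python) works as a stand-in, since pandas supports sqlite3 connections directly. This is only a local test sketch with made-up rows, not part of the pipeline:

```python
import sqlite3
import pandas as pd

# A stand-in for the real report dataframe, with made-up numbers.
report = pd.DataFrame({"Country": ["Peru", "Chile"],
                       "Brand": ["Kia", "MG"],
                       "#Number": [300, 450]})

# An in-memory SQLite database plays the role of PostgreSQL here.
conn = sqlite3.connect(":memory:")
report.to_sql("2021_2022_vehicles", con=conn, if_exists="replace", index=False)

# Read it back to confirm the load worked.
back = pd.read_sql('SELECT * FROM "2021_2022_vehicles"', conn)
print(len(back))  # 2
```

The table name starts with a digit, which is why it must be double-quoted in the SQL query; the same applies when querying it in PostgreSQL.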
4. Reporting Data
In this section, we will download, connect, and analyze the data in Power BI. To keep things brief, we will download the file brand_cars_dashboard.pbix from the project's GitHub repository.
4.1. Download File
Once we’re in the project’s GitHub repository, we need to click on “brand_cars_dashboard.pbix”.
Now, we have to click on the "Download" icon.
4.2. Connection pgAdmin4 and PowerBI
Once we've downloaded the file, we have to connect our pgAdmin4 database to the Power BI file:
Step 1: Open the Power BI file. In the "Home" view → click "Get Data" → click "More…"
Step 2: Type "postgre", select "PostgreSQL database", and click "Connect".
Step 3: Fill in Server and Database with "localhost" and "car_project" (our database's name). Then click "OK".
Step 4: Check the checkbox next to our table and click "Load".
Once we've clicked "Load", Power BI will connect to our PostgreSQL database. Finally, it will show us the data.
4.3. Data Analysis
It's time to think… How can we get insights from our data?
We could start with basic questions:
Q1: How many searches in total are there?
Q2: How many searches (%) by country are there?
Q3: How many searches by brand are there?
Q4: How many searches by brand and country are there?
Q5: Summarize Data
Q1: How many searches in total are there?
We have to use the Card visualization… because it displays a single value, which is ideal for showing our KPI.
We can see there are 85K searches in total by users.
Q2: How many searches (%) by country are there?
We have to use the Pie chart visualization… because we want to show percentage (%) values.
We can see the country with the most searches is Chile (35.16%).
Q3: How many searches by brand are there?
We have to use the Stacked bar chart visualization… because we want to split our data into segments.
We can see the most searched brand is MG (more than 20K searches) among users.
Q4: How many searches by brand and country are there?
We have to use the Clustered column chart visualization… because it makes for easy comparison between multiple categories and their respective values.
We can see that in Argentina and Chile there are many searches for MG and Peugeot.
Q5: Summarize Data
We have to use the Matrix visualization… because we want to summarize the data in a structured, spreadsheet-like format.
We can see a summarized table 😆.
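Before (or instead of) building visuals, the same questions can be sanity-checked in pandas. With a report table shaped like the one from Section 3 (the numbers below are made up for illustration), Q1 and Q2 reduce to a sum and a normalized groupby:

```python
import pandas as pd

# Illustrative rows; the real figures come from the loaded database table.
report = pd.DataFrame({"Country": ["Chile", "Peru", "Chile"],
                       "Brand": ["MG", "Kia", "Peugeot"],
                       "#Number": [500, 300, 200]})

total = report["#Number"].sum()                      # Q1: total searches
by_country = report.groupby("Country")["#Number"].sum()
share = (by_country / total * 100).round(2)          # Q2: % by country
print(total, share["Chile"])  # 1000 70.0
```

Cross-checking the dashboard's cards and pie slices against a quick script like this is a cheap way to catch connection or aggregation mistakes.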
If you want to know more about data analysis, you can visit my profile and read my previous stories: [TUTORIAL] A BASIC DATA ANALYSIS WITH POSITIVES COVID 19 PERU USING PYTHON | by Alexander Roman | MLearning.ai | Medium and Exploratory Data Analysis (EDA) on MyAnimeList data | by Alexander Roman | MLearning.ai | Medium
5. CONCLUSION AND RECOMMENDATION
Power BI is a powerful visualization tool, and it connects easily with PostgreSQL databases. It has an easy-to-use interface and strong capabilities for turning data into useful information. Combined with PostgreSQL, it helps businesses build dynamic visuals, improve decision-making, and succeed with data-driven strategies.
During the analysis, we aimed to determine the search trends for car brands between 2021 and 2022 in selected South American countries. Based on the outcomes, we propose the following insights:
- Utilize paid advertisements on platforms like Google Ads, Facebook Ads, or Instagram Ads to reach specific audiences interested in the MG and Peugeot car brands in Argentina and Chile, as these brands have a significant number of searches in these countries (Figure 20).
- Implement content marketing for the MG brand by sharing relevant content to engage the entire audience interested in the brand, considering it’s the most searched brand overall (Figure 19).
- Invest efficiently in digital marketing campaigns such as advertising, social media, etc., for the MG, Peugeot, KIA, Mitsubishi, and Chery brands across different South American countries due to the high user searches, exceeding 10,000 searches, compared to other brands (Figure 21).
- Utilize influencer marketing in Chile, where searches overall account for 35.16% of the total (Figure 18). By collaborating with influencers passionate about automobiles, we can harness this trend to authentically and effectively present our brands, directly reaching a highly interested and engaged audience in the automotive world.
As a direction for future work, we suggest employing alternative data extraction methods to enhance the quality and quantity of the data. Additionally, using different visualization techniques such as box plots, scatter plots, treemaps, etc., is recommended to discover improved representations and insights for subsequent marketing campaigns.
Finally, thank you for reading this article. If you would like to contact me, this is my LinkedIn: Alexander Daniel Roman Gabriel | LinkedIn 🙌
REFERENCES
- Google Trends
- PostgreSQL: The world’s most advanced open source database
- pgAdmin — PostgreSQL Tools
- Data Visualization | Microsoft Power BI