Taxi Drivers Efficiency Analysis with SQL & Tableau

I recently completed a hands-on SQL project using the Chicago Taxi Trips dataset on BigQuery, where I calculated taxi drivers' efficiency based on total fare earned per minute spent on trips for a single day.

🧠 What I Did:

1. Queried the dataset with a Common Table Expression (CTE):

One of the most useful SQL features I leveraged in this project was the Common Table Expression (CTE).

A CTE lets you create a temporary, named result set that can be referenced within your main query. For me, it makes my SQL logic far more readable and manageable.

Instead of cramming all calculations into one long block of code, I broke things down step by step—calculating total trip duration, number of trips, and total fare inside the CTE. This made the final SELECT query cleaner, easier to debug, and more efficient to run.

# Creating a CTE
WITH table_mains AS (
  SELECT
    taxi_id,
    SUM (TIMESTAMP_DIFF(trip_end_timestamp, trip_start_timestamp, MINUTE)) AS total_trip_duration,
    COUNT (*) AS trip_count,
    SUM (fare) AS total_fare
  FROM
    `bigquery-public-data.chicago_taxi_trips.taxi_trips`
  WHERE
    DATE (trip_start_timestamp) = '2013-10-03'

# Filtering out rows that could have problems so as to get a clean result
    AND (fare) is NOT NULL
    AND (TIMESTAMP_DIFF(trip_end_timestamp, trip_start_timestamp, MINUTE)) > 0
  GROUP BY taxi_id
)

2.Filtered out incomplete or unrealistic data while creating the CTE:

To ensure accurate results and insights, I filtered out rows with missing fare values and zero trip durations. This step is crucial—it prevents skewed efficiency scores and keeps the analysis reliable

# Filtering out rows that could have problems so as to get a clean result
    AND (fare) is NOT NULL
    AND (TIMESTAMP_DIFF(trip_end_timestamp, trip_start_timestamp, MINUTE)) > 0

3. Calculated key metrics: total trip duration, total fare, number of trips, and efficiency score and ranked drivers based on their efficiency score:

I calculated key performance metrics like total trip duration, total fare, and trip count.

Using SAFE_DIVIDE, I computed the efficiency score (fare per minute) to avoid errors from division by zero. Then, I applied the RANK() window function to rank drivers based on this score—making it easy to identify the most efficient drivers at a glance.

Also, I used the WHERE clause to extract results for drivers that had more than 5 trips and ordered the results by efficiency rank.

SELECT 
  taxi_id,
  total_trip_duration,
  trip_count,
  total_fare,
  SAFE_DIVIDE (total_fare, trip_count) AS avg_trip_cost,
  SAFE_DIVIDE (total_fare, total_trip_duration) AS efficiency_score,
  RANK () OVER (
    ORDER BY SAFE_DIVIDE (total_fare, total_trip_duration) DESC
  ) AS efficiency_rank
FROM 
  table_mains

# Say we want to see the results for taxis that travelled more than 5 trips
WHERE
  trip_count >= 5
ORDER BY efficiency_score DESC

📊 Results:

After running the query and getting my query result, I saved the result and went ahead to import the saved result into Tableau to create a visualization and draw insights from the result.

The visualization brings the data to life—making it easy to compare driver performance at a glance with its interactivity. It also helps stakeholders quickly identify top performers and make data-driven decisions with clarity

🧪 Tools Used:

SQL (Google BigQuery Sandbox)
Tableau (for visualization)

💭 Reflection:
This project helped me reinforce my understanding of analytic functions like SAFE_DIVIDE() and RANK(), and taught me how to turn raw data into actionable insights.

What do you think of this approach? Feedback and ideas are welcome!

Thank you!

Jeremiah Oseremi Mayaki @omeiza_mayaki

Taxi Drivers Efficiency Analysis with SQL & Tableau

Comments 2 total