I recently completed a hands-on SQL project using the Chicago Taxi Trips dataset on BigQuery, where I calculated taxi drivers' efficiency based on total fare earned per minute spent on trips for a single day.
🧠 What I Did:
1. Queried the dataset with a Common Table Expression (CTE):
One of the most useful SQL features I leveraged in this project was the Common Table Expression (CTE).
A CTE lets you create a temporary, named result set that can be referenced within your main query. For me, it makes my SQL logic far more readable and manageable.
Instead of cramming all calculations into one long block of code, I broke things down step by step—calculating total trip duration, number of trips, and total fare inside the CTE. This made the final SELECT query cleaner, easier to debug, and more efficient to run.
# Creating a CTE
WITH table_mains AS (
SELECT
taxi_id,
SUM (TIMESTAMP_DIFF(trip_end_timestamp, trip_start_timestamp, MINUTE)) AS total_trip_duration,
COUNT (*) AS trip_count,
SUM (fare) AS total_fare
FROM
`bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE
DATE (trip_start_timestamp) = '2013-10-03'
# Filtering out rows that could have problems so as to get a clean result
AND (fare) is NOT NULL
AND (TIMESTAMP_DIFF(trip_end_timestamp, trip_start_timestamp, MINUTE)) > 0
GROUP BY taxi_id
)
2.Filtered out incomplete or unrealistic data while creating the CTE:
To ensure accurate results and insights, I filtered out rows with missing fare values and zero trip durations. This step is crucial—it prevents skewed efficiency scores and keeps the analysis reliable
# Filtering out rows that could have problems so as to get a clean result
AND (fare) is NOT NULL
AND (TIMESTAMP_DIFF(trip_end_timestamp, trip_start_timestamp, MINUTE)) > 0
3. Calculated key metrics: total trip duration, total fare, number of trips, and efficiency score and ranked drivers based on their efficiency score:
I calculated key performance metrics like total trip duration, total fare, and trip count.
Using SAFE_DIVIDE, I computed the efficiency score (fare per minute) to avoid errors from division by zero. Then, I applied the RANK() window function to rank drivers based on this score—making it easy to identify the most efficient drivers at a glance.
Also, I used the WHERE clause to extract results for drivers that had more than 5 trips and ordered the results by efficiency rank.
SELECT
taxi_id,
total_trip_duration,
trip_count,
total_fare,
SAFE_DIVIDE (total_fare, trip_count) AS avg_trip_cost,
SAFE_DIVIDE (total_fare, total_trip_duration) AS efficiency_score,
RANK () OVER (
ORDER BY SAFE_DIVIDE (total_fare, total_trip_duration) DESC
) AS efficiency_rank
FROM
table_mains
# Say we want to see the results for taxis that travelled more than 5 trips
WHERE
trip_count >= 5
ORDER BY efficiency_score DESC
📊 Results:
After running the query and getting my query result, I saved the result and went ahead to import the saved result into Tableau to create a visualization and draw insights from the result.
The visualization brings the data to life—making it easy to compare driver performance at a glance with its interactivity. It also helps stakeholders quickly identify top performers and make data-driven decisions with clarity
🧪 Tools Used:
SQL (Google BigQuery Sandbox)
Tableau (for visualization)
💭 Reflection:
This project helped me reinforce my understanding of analytic functions like SAFE_DIVIDE() and RANK(), and taught me how to turn raw data into actionable insights.
What do you think of this approach? Feedback and ideas are welcome!
Thank you!
This is a well-structured and thoughtful approach. Using a CTE made your query more readable and modular, which is great for debugging and collaboration. Filtering out bad data early was a smart move to keep your results accurate. In the same way that searching for a taxi near me helps you quickly find efficient transport options, your use of SAFE_DIVIDE and RANK() makes it easy to identify the most efficient drivers in the dataset. Bringing the results into Tableau adds a strong visual layer that makes insights easy to understand. Overall, it's a solid blend of SQL logic and visualization — great job.