Building a Scalable API Event Logger using Pub/Sub and BigQuery
Publish Date: Nov 3
API logging is one of the most underrated aspects of backend engineering. We often rely on structured logs for debugging, but when it comes to analytics, user behavior tracking, or performance insights, traditional logs begin to show their limitations.
That’s where event-based API logging comes in.
In this post, I’ll walk you through building a simple but scalable API event logger using Go, Google Pub/Sub, and BigQuery. We’ll also see how this approach differs from traditional GCP logs and why it might be the better choice for data-driven teams.
☑️ Prerequisite
Before we begin, we assume you have some knowledge of middleware, Google Cloud Pub/Sub, and BigQuery (Google's data warehouse).
Though this approach can be implemented in any language/framework, I have created a working Go application project for this post.
Let's roll in!🍟
📇 Overview
Modern APIs generate a continuous stream of data about how users interact with your system — but traditional logging methods often treat this as an afterthought, providing debugging information rather than actionable events. Event-based API logging flips that model: every API interaction becomes a structured, analyzable event.
Before diving into how this logger works with Pub/Sub and BigQuery, let’s first understand what event-based API logging means and why conventional logging systems can fall short for analytics and long-term insights.
What does "event-based API logging" mean?
Event-based API logging treats each API request (and response) as a discrete event, capturing key attributes such as method, URL, user ID, timestamp, request/response payloads, and more. Unlike traditional logs that mix system messages, debug lines, and errors, an event-based approach focuses on meaningful business or technical interactions (e.g., "user 123 hit endpoint /v1/items", "response code 201").
By publishing these events to a message broker (such as Google Cloud Pub/Sub) and then storing them in an analytics store (like BigQuery), we enable downstream systems — dashboards, ML jobs, user-behavior pipelines — to subscribe, transform, and analyze without modifying the producer.
What are some limitations of "old-style" logging?
While tools like Google Cloud Logging (formerly Stackdriver) are great for operations, they impose constraints when we want long-term analytics or behavioural metrics. For instance, the default retention period for log buckets is 30 days, after which entries are deleted unless we change it.
On the querying side, Logs Explorer is purpose-built for debugging rather than large-scale analytics; we don't get the same SQL flexibility or performance as a dedicated data warehouse. In short, for one-off troubleshooting, it works; for streaming user-behaviour metrics, it often doesn't.
Now let's dive deep into our implementation 👇
🧠 API Logging Flow
The general idea is to capture all API calls using a middleware and send a Pub/Sub event with all the information. The subscriber then consumes this Pub/Sub message and creates a record in BigQuery.
Here are the steps involved:
Client makes an API call
API Server (Go app in this example) processes the request and logs details like method, URL, user_id, request/response JSON
The Logging Middleware publishes this log as an event to Pub/Sub
A GCP BigQuery subscriber automatically consumes these events and stores them in a BigQuery table
We can now run SQL queries on our API activity!
Now that we know the flow, let's discuss the Middleware implementation
⚙️ The Logging Middleware
The APILogEvent struct
```go
// APILogEvent represents an API request/response log event
type APILogEvent struct {
	RequestID    null.String `json:"request_id"`
	Service      string      `json:"service"`
	URL          string      `json:"url"`
	Method       string      `json:"method"`
	ResponseCode int         `json:"response_code"`
	ResponseBody null.String `json:"response_body,omitempty"`
	RequestBody  null.String `json:"request_body,omitempty"`
	UserID       null.String `json:"user_id,omitempty"`
	Duration     float64     `json:"duration"`
	Version      string      `json:"version"`
	Name         string      `json:"name"`
	CreatedAt    time.Time   `json:"created_at"`
}
```
Service - represents the API service name, so we can use it across multiple API services
UserID - represents the caller of the API
The Middleware
```go
package middleware

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
	"time"

	"api-pubsub-logger/internal/pubsub"
	"api-pubsub-logger/internal/utils"
	"api-pubsub-logger/pkg/logger"

	"github.com/gorilla/mux"
	"gopkg.in/guregu/null.v3"
)

// Routes to skip from logging (e.g., health checks)
var skipRoutes = map[string]struct{}{
	"GET::/health": {},
}

// responseRecorder is a wrapper for http.ResponseWriter to capture response data
type responseRecorder struct {
	http.ResponseWriter
	body       *bytes.Buffer
	statusCode int
}

func (rw *responseRecorder) Write(b []byte) (int, error) {
	rw.body.Write(b)
	return rw.ResponseWriter.Write(b)
}

func (rw *responseRecorder) WriteHeader(statusCode int) {
	rw.statusCode = statusCode
	rw.ResponseWriter.WriteHeader(statusCode)
}

// LoggingMiddleware logs HTTP requests and responses to Pub/Sub
func LoggingMiddleware(pubsubClient pubsub.Publisher, serviceName string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// Check if the request URL path should be skipped
			routeKey := fmt.Sprintf("%s::%s", strings.ToUpper(r.Method), r.URL.Path)
			if _, skip := skipRoutes[routeKey]; skip {
				next.ServeHTTP(w, r)
				return
			}

			startTime := time.Now()

			// Read request body
			var requestBody []byte
			if r.Body != nil {
				requestBody, _ = io.ReadAll(r.Body)
				r.Body = io.NopCloser(bytes.NewBuffer(requestBody)) // Restore the request body
			}

			// Create response recorder to capture response
			recorder := &responseRecorder{
				ResponseWriter: w,
				body:           &bytes.Buffer{},
				statusCode:     http.StatusOK,
			}

			// Call the next handler
			next.ServeHTTP(recorder, r)

			// Extract context values
			ctx := r.Context()
			requestID := utils.GetRequestID(ctx)
			userID := utils.GetUserID(ctx)

			// Mask sensitive data in request and response bodies
			maskedRequestBody := string(utils.MaskSensitiveData(requestBody))
			maskedResponseBody := string(utils.MaskSensitiveData(recorder.body.Bytes()))

			// Extract route version and name from mux router
			routeName, routeVersion := extractRouteVersionAndName(mux.CurrentRoute(r))

			// Create API log event
			logData := logger.APILogEvent{
				RequestID:    null.NewString(requestID, len(requestID) > 0),
				Service:      serviceName,
				Method:       r.Method,
				URL:          r.URL.String(),
				RequestBody:  null.NewString(maskedRequestBody, len(maskedRequestBody) > 0),
				ResponseBody: null.NewString(maskedResponseBody, len(maskedResponseBody) > 0),
				ResponseCode: recorder.statusCode,
				UserID:       null.NewString(userID, len(userID) > 0),
				Version:      routeVersion,
				Name:         routeName,
				CreatedAt:    startTime,
				Duration:     time.Since(startTime).Seconds(),
			}

			// Publish to Pub/Sub asynchronously using a background context.
			// We use context.Background() instead of the request context because
			// the request context gets canceled when the HTTP response is sent,
			// but we want the publishing to complete independently.
			go sendToPubSub(context.Background(), pubsubClient, logData)
		})
	}
}

// extractRouteVersionAndName extracts the name and version from the mux route
func extractRouteVersionAndName(route *mux.Route) (string, string) {
	var name, version string
	if route != nil {
		name = route.GetName()
		pathTemplate, _ := route.GetPathTemplate()
		parts := strings.Split(strings.Trim(pathTemplate, "/"), "/")
		if len(parts) > 0 && strings.HasPrefix(parts[0], "v") {
			version = parts[0]
		}
	}
	return name, version
}

// sendToPubSub publishes log data to Pub/Sub
func sendToPubSub(ctx context.Context, client pubsub.Publisher, logData logger.APILogEvent) {
	if err := client.PublishAPILogEvent(ctx, logData); err != nil {
		log.Printf("Failed to publish API log event: %v", err)
	}
}
```
The router can be set up as
```go
package http

import (
	"net/http"

	"api-pubsub-logger/internal/http/handlers"
	"api-pubsub-logger/internal/http/middleware"

	"github.com/gorilla/mux"
)

// Handler mounts all the handlers at the appropriate routes and adds any required middleware
func (h *Handler) Handler() http.Handler {
	r := mux.NewRouter()

	// Apply global middleware
	r.Use(middleware.RequestIDMiddleware)
	r.Use(middleware.UserIDMiddleware)
	r.Use(middleware.LoggingMiddleware(h.PubSubClient, h.ServiceName))

	// Health check endpoint (not logged due to skip in middleware)
	r.Methods("GET").Path("/health").Name("health").HandlerFunc(handlers.HealthCheck)

	h.router = r
	return r
}
```
Let's understand a bit deeper what's happening 🙇
The Pub/Sub client is set up in main.go and stored on the HTTP Handler struct, which the middleware then accesses
Logging can be skipped for specific routes via the skipRoutes map
To keep sensitive information out of the logs, a sensitiveKeys map in utils/mask.go lists the fields to mask
The logs are pushed to Pub/Sub in the background, so the API latency remains unaffected
Each message is appended as a new row in our api_logs table — matching the APILogEvent structure. BigQuery automatically maps these fields when the Pub/Sub topic is linked to a dataset.
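The masking helper itself isn't shown in this post; here is a hedged sketch of what `utils.MaskSensitiveData` could look like (the `sensitiveKeys` entries and the top-level-only masking are assumptions for illustration; the repo's implementation may mask nested fields differently):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sensitiveKeys lists JSON fields whose values should be redacted.
// The real list lives in utils/mask.go; these entries are examples.
var sensitiveKeys = map[string]struct{}{
	"email":    {},
	"phone":    {},
	"password": {},
}

// MaskSensitiveData replaces the values of sensitive top-level JSON
// fields with "***". Non-JSON payloads are returned unchanged.
func MaskSensitiveData(data []byte) []byte {
	var m map[string]interface{}
	if err := json.Unmarshal(data, &m); err != nil {
		return data // not a JSON object; leave as-is
	}
	for key := range m {
		if _, ok := sensitiveKeys[key]; ok {
			m[key] = "***"
		}
	}
	masked, err := json.Marshal(m)
	if err != nil {
		return data
	}
	return masked
}

func main() {
	in := []byte(`{"email":"a@b.com","item":"book"}`)
	fmt.Println(string(MaskSensitiveData(in)))
	// {"email":"***","item":"book"}
}
```

Masking before publishing matters here because these events end up in a long-lived warehouse table, not a 30-day log bucket.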
That's it! 🎯
🤔 Why Not Just Use GCP Structured Logs?
GCP’s structured logs are powerful but not ideal for data analytics use cases.
| Feature | Cloud Logging | Pub/Sub + BigQuery Logger |
| --- | --- | --- |
| Query language | Logs Explorer (limited SQL) | Full BigQuery SQL |
| Query performance | Slow for analytics | Fast, SQL-based via BigQuery |
| Custom schema | Limited | Fully customizable |
| Integration | GCP only | Can export anywhere |
📊 What We Can Do with the Data
Once the events land in BigQuery, we can compute metrics such as per-endpoint traffic, error rates, latency percentiles, and per-user activity with plain SQL. Beyond the queries themselves, the approach brings several advantages:
✅ Asynchronous Logging — doesn’t block API latency.
✅ Scalable — Pub/Sub handles millions of events.
✅ Analytics Ready — BigQuery can query directly.
✅ Customizable Schema — include app version, route, or user agent.
✅ Cross-Language Ready — can be replicated in Node.js, Python, etc.
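As a concrete example of the "Analytics Ready" point, here is a sketch of a BigQuery query over the api_logs table (the `my_dataset` dataset name is a placeholder; the columns follow the APILogEvent schema) that computes error rate and p95 latency per route over the last week:

```sql
-- Error rate and p95 latency per route, last 7 days.
SELECT
  name,
  COUNT(*) AS total_requests,
  COUNTIF(response_code >= 500) / COUNT(*) AS error_rate,
  APPROX_QUANTILES(duration, 100)[OFFSET(95)] AS p95_duration_s
FROM `my_dataset.api_logs`
WHERE created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY name
ORDER BY total_requests DESC;
```

Queries like this are exactly what Logs Explorer makes awkward and a warehouse makes trivial.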
📌 You can check out my GitHub repository for a complete working example of this approach 👇