This post shares the technical challenges and solutions we encountered while developing an ID card capture and face recognition system. From the initial YOLOv12 plan to a direct Canvas API implementation, it collects the lessons we learned through repeated trial and error.
🎯 Project Overview: Ambitious Start and Practical Approach
Figure: Final architecture of the Face Recognition project, a stable structure reached after multiple iterations
This project began as the development of a KYC (Know Your Customer) system that captures ID cards through a webcam and extracts faces from the captured images for feature analysis. Initially, we aimed for high accuracy by adopting cutting-edge deep learning technology, but practical constraints led us to pivot to a more pragmatic approach.
Initial Goals and Expectations
- High-precision object detection: ID card detection using YOLOv12
- Real-time processing: Fast response speed in web environments
- High accuracy: Stable recognition rates in various environments
- User-friendly UI: Intuitive capture guidance
📈 Development Timeline: The Journey
Figure: Complete timeline of the project development process, with the key challenges and solutions at each stage
Phase 1: Initial Direction Setting and Review
Development began with a plan to adopt the YOLOv12 deep learning model. The goal was to detect ID cards accurately with a state-of-the-art object detection model and then extract the facial region from the captured image.
// Initial YOLOv12 implementation attempt (not actually implemented)
class YOLODetector {
  constructor() {
    this.model = null;
    this.isLoaded = false;
  }
  async loadModel() {
    // Model loading attempt
    // Issue: Difficulty securing appropriate training data
    // Issue: Model size and performance issues in web environment
  }
}
Key Issues:
- Data acquisition difficulties: Legal/ethical constraints in collecting ID card image training data
- Model size: Large models unsuitable for web environments
- Development complexity: Specialized knowledge required for deep learning model optimization
Due to these practical constraints, we pivoted to an OpenCV-based approach.
Phase 2: OpenCV-based Feature Implementation and Optimization
After the pivot, we attempted a practical approach using OpenCV.js and MediaPipe.
// MediaPipe face detection implementation
import mediaPipeManager from '../../utils/mediaPipeManager.js';
import FaceAngleCalculator from '../utils/face-angle-calculator.js';
class FaceDetector {
  async detectFaces(video) {
    try {
      if (!mediaPipeManager.isReady) {
        await this.initialize();
      }
      const results = await mediaPipeManager.detectFaces(
        video,
        performance.now(),
        this.modelKey
      );
      if (!results || !results.faceLandmarks || results.faceLandmarks.length === 0) {
        return { faces: [] };
      }
      // Face landmark-based angle calculation
      const faces = results.faceLandmarks.map((landmarks, index) => {
        const angles = FaceAngleCalculator.calculateAngles(landmarks);
        const quality = this.assessQuality(landmarks);
        return {
          landmarks,
          angles,
          quality,
          boundingBox: this.calculateBoundingBox(landmarks)
        };
      });
      return { faces };
    } catch (error) {
      console.warn('Face detection error:', error);
      return { faces: [] };
    }
  }
}
Key Implementation Features:
- Canny edge detector: ID card border detection
- Automatic capture system: Auto-capture when guidelines are aligned
- Real-time quality assessment: Image sharpness and lighting condition checks
// ID card border detection using Canny edge detection
class DocumentDetector {
  detectDocument(imageData) {
    // Canny edge detection
    const edges = this.cannyEdgeDetection(imageData);
    // Find contours
    const contours = this.findContours(edges);
    // Filter rectangular contours
    const rectangles = this.filterRectangularContours(contours);
    // Select the largest rectangle as ID card
    return this.selectLargestRectangle(rectangles);
  }
  cannyEdgeDetection(imageData) {
    // Gradient calculation using Sobel filter
    const gradientX = this.sobelX(imageData);
    const gradientY = this.sobelY(imageData);
    // Calculate gradient magnitude (a full Canny pass also needs the
    // gradient direction for non-maximum suppression)
    const magnitude = this.calculateMagnitude(gradientX, gradientY);
    // applyThreshold is expected to cover non-maximum suppression
    // and the double threshold
    return this.applyThreshold(magnitude);
  }
}
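The "double threshold" step mentioned in the comment is not shown in the post. As a rough sketch of how it is commonly done (hysteresis thresholding), the function below keeps strong pixels and any weak pixels connected to them. The function name, the flat `magnitude` array layout, and the threshold values are assumptions for illustration, not the project's actual code.

```javascript
// Hypothetical sketch: hysteresis (double) thresholding on a flat array of
// gradient magnitudes stored row by row. Not the project's implementation.
function hysteresisThreshold(magnitude, width, height, low, high) {
  const EDGE = 255;
  const out = new Uint8ClampedArray(width * height);
  const stack = [];
  // Pass 1: pixels at or above the high threshold are definite edges.
  for (let i = 0; i < magnitude.length; i++) {
    if (magnitude[i] >= high) {
      out[i] = EDGE;
      stack.push(i);
    }
  }
  // Pass 2: grow edges into 8-connected neighbors that reach the low threshold.
  while (stack.length > 0) {
    const i = stack.pop();
    const x = i % width;
    const y = (i - x) / width;
    for (let dy = -1; dy <= 1; dy++) {
      for (let dx = -1; dx <= 1; dx++) {
        const nx = x + dx;
        const ny = y + dy;
        if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
        const n = ny * width + nx;
        if (out[n] === 0 && magnitude[n] >= low) {
          out[n] = EDGE;
          stack.push(n);
        }
      }
    }
  }
  return out;
}
```

Weak responses that touch a strong edge survive, while isolated weak responses (typically noise) are dropped, which is the reason for using two thresholds instead of one.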
Phase 3: Framework Migration Review and Algorithm Replacement
As development progressed, we faced difficulties with debugging and increasing code complexity.
React Migration Review:
// React component structure review (actually maintained vanilla JS)
import { useState, useRef } from 'react';
function FaceRecognitionApp() {
  const [isDetecting, setIsDetecting] = useState(false);
  const [faceData, setFaceData] = useState(null);
  const videoRef = useRef(null);
  // Issue: Complexity of existing MediaPipe integration
  // Issue: State management overhead
  // Decision: Keep vanilla JS
}
Douglas-Peucker Algorithm Issues:
// Existing Douglas-Peucker implementation (accuracy issues)
class ContourSimplifier {
  douglasPeucker(points, epsilon) {
    // Issue: Inaccurate results with complex ID card shapes
    // Issue: Sensitive to lighting changes
    if (points.length < 3) return points;
    let maxDistance = 0;
    let maxIndex = 0;
    // Find the point farthest from the line between the two endpoints
    for (let i = 1; i < points.length - 1; i++) {
      const distance = this.perpendicularDistance(
        points[i], points[0], points[points.length - 1]
      );
      if (distance > maxDistance) {
        maxDistance = distance;
        maxIndex = i;
      }
    }
    // Recursive simplification (problem point)
    if (maxDistance > epsilon) {
      const left = this.douglasPeucker(points.slice(0, maxIndex + 1), epsilon);
      const right = this.douglasPeucker(points.slice(maxIndex), epsilon);
      // Drop the shared split point from the left half before joining
      return left.slice(0, -1).concat(right);
    }
    return [points[0], points[points.length - 1]];
  }
}
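The `perpendicularDistance` helper that the simplifier calls is not shown in the post. A minimal sketch, assuming points are plain `{ x, y }` objects, could look like this:

```javascript
// Hypothetical helper: distance from point p to the line through a and b.
function perpendicularDistance(p, a, b) {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const lengthSquared = dx * dx + dy * dy;
  // Degenerate line (a and b coincide): fall back to point-to-point distance.
  if (lengthSquared === 0) {
    return Math.hypot(p.x - a.x, p.y - a.y);
  }
  // |cross product| / |ab| is the height of triangle (a, b, p) over base ab.
  return Math.abs(dx * (p.y - a.y) - dy * (p.x - a.x)) / Math.sqrt(lengthSquared);
}
```

The degenerate case matters for closed contours, where the first and last points are often identical. Without the fallback the division yields `NaN`, every comparison against `maxDistance` fails, and the contour collapses to its endpoints, which may be one source of unstable results with this algorithm.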
Replacement with Bounding Box Algorithm:
// Improved bounding box-based detection
class ImprovedDocumentDetector {
  detectDocumentBoundingBox(contours) {
    let bestCandidate = null;
    let bestScore = 0;
    for (const contour of contours) {
      // Area-based filtering
      const area = this.calculateArea(contour);
      if (area < this.minArea || area > this.maxArea) continue;
      // Aspect ratio check (ID card ratio)
      const boundingRect = this.getBoundingRect(contour);
      const aspectRatio = boundingRect.width / boundingRect.height;
      if (aspectRatio < 1.4 || aspectRatio > 1.8) continue;
      // Calculate rectangularity score
      const rectangularityScore = this.calculateRectangularity(contour);
      // Calculate total score
      const totalScore = rectangularityScore * 0.7 +
        this.calculatePositionScore(boundingRect) * 0.3;
      if (totalScore > bestScore) {
        bestScore = totalScore;
        bestCandidate = boundingRect;
      }
    }
    return bestCandidate;
  }
  calculateRectangularity(contour) {
    const boundingRect = this.getBoundingRect(contour);
    const contourArea = this.calculateArea(contour);
    const rectArea = boundingRect.width * boundingRect.height;
    // Measure how close the contour is to a rectangle
    return contourArea / rectArea;
  }
}
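The detector relies on `calculateArea` and `getBoundingRect`, which the post does not show. The sketch below is one plausible way to write them, assuming each contour is an array of `{ x, y }` points. It uses the shoelace formula for the polygon area and an axis-aligned bounding box; both choices are assumptions rather than the project's confirmed implementation.

```javascript
// Hypothetical helpers; a contour is assumed to be an array of {x, y} points.

// Polygon area via the shoelace formula.
function calculateArea(contour) {
  let twiceArea = 0;
  for (let i = 0; i < contour.length; i++) {
    const a = contour[i];
    const b = contour[(i + 1) % contour.length]; // wrap around to close the polygon
    twiceArea += a.x * b.y - b.x * a.y;
  }
  return Math.abs(twiceArea) / 2;
}

// Axis-aligned bounding rectangle of the contour.
function getBoundingRect(contour) {
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const { x, y } of contour) {
    minX = Math.min(minX, x);
    minY = Math.min(minY, y);
    maxX = Math.max(maxX, x);
    maxY = Math.max(maxY, y);
  }
  return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}

// Ratio of contour area to bounding-box area (1 means a perfect upright rectangle).
function rectangularity(contour) {
  const rect = getBoundingRect(contour);
  return calculateArea(contour) / (rect.width * rect.height);
}

// Made-up sample shapes.
const card = [{ x: 0, y: 0 }, { x: 160, y: 0 }, { x: 160, y: 100 }, { x: 0, y: 100 }];
const diamond = [{ x: 1, y: 0 }, { x: 2, y: 1 }, { x: 1, y: 2 }, { x: 0, y: 1 }];
```

With an axis-aligned box, the upright `card` scores 1 while the `diamond` (a square rotated 45 degrees) scores only 0.5, so this measure quietly penalizes tilted cards. That is usually acceptable when the capture guide asks the user to hold the card level.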
Phase 4: System Stabilization and Code Improvement
Figure: Tech stack evolution process, from problem identification to final solution
In the final phase, we focused on system stability. The most important decision was to remove the OpenCV.js dependency entirely and implement the image processing we needed directly on top of the Canvas API.
OpenCV.js Removal and Direct Canvas API Implementation:
// Direct image processing implementation using Canvas API
// (getPixel, calculateBrightness and calculateContrast are helpers not shown here)
class CanvasImageProcessor {
  constructor() {
    this.canvas = document.createElement('canvas');
    this.ctx = this.canvas.getContext('2d');
  }
  // Grayscale conversion (modifies imageData in place)
  convertToGrayscale(imageData) {
    const data = imageData.data;
    for (let i = 0; i < data.length; i += 4) {
      const gray = data[i] * 0.299 + data[i + 1] * 0.587 + data[i + 2] * 0.114;
      data[i] = gray;     // R
      data[i + 1] = gray; // G
      data[i + 2] = gray; // B
      // data[i + 3] = alpha (unchanged)
    }
    return imageData;
  }
  // Sobel edge detection
  applySobelFilter(imageData) {
    const width = imageData.width;
    const height = imageData.height;
    const data = imageData.data;
    const output = new ImageData(width, height);
    // Sobel X kernel
    const sobelX = [
      [-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]
    ];
    // Sobel Y kernel
    const sobelY = [
      [-1, -2, -1],
      [ 0,  0,  0],
      [ 1,  2,  1]
    ];
    // Skip the 1-pixel border so the 3x3 kernel never reads outside the image
    for (let y = 1; y < height - 1; y++) {
      for (let x = 1; x < width - 1; x++) {
        let gx = 0, gy = 0;
        // Apply 3x3 kernel
        for (let ky = -1; ky <= 1; ky++) {
          for (let kx = -1; kx <= 1; kx++) {
            const pixel = this.getPixel(data, x + kx, y + ky, width);
            gx += pixel * sobelX[ky + 1][kx + 1];
            gy += pixel * sobelY[ky + 1][kx + 1];
          }
        }
        // Calculate gradient magnitude (values above 255 are clamped on write)
        const magnitude = Math.sqrt(gx * gx + gy * gy);
        const index = (y * width + x) * 4;
        output.data[index] = magnitude;
        output.data[index + 1] = magnitude;
        output.data[index + 2] = magnitude;
        output.data[index + 3] = 255;
      }
    }
    return output;
  }
  // Image quality assessment
  assessImageQuality(imageData) {
    const brightness = this.calculateBrightness(imageData);
    const contrast = this.calculateContrast(imageData);
    const sharpness = this.calculateSharpness(imageData);
    return {
      brightness: brightness,
      contrast: contrast,
      sharpness: sharpness,
      // The average is only meaningful if all three metrics share one scale
      // (for example 0..1); calculateSharpness below is not normalized
      overall: (brightness + contrast + sharpness) / 3
    };
  }
  calculateSharpness(imageData) {
    // Sharpness measurement using Laplacian filter
    const laplacian = [
      [0, -1, 0],
      [-1, 4, -1],
      [0, -1, 0]
    ];
    const width = imageData.width;
    const height = imageData.height;
    const data = imageData.data;
    let sumOfSquares = 0;
    let count = 0;
    for (let y = 1; y < height - 1; y++) {
      for (let x = 1; x < width - 1; x++) {
        let sum = 0;
        for (let ky = -1; ky <= 1; ky++) {
          for (let kx = -1; kx <= 1; kx++) {
            const pixel = this.getPixel(data, x + kx, y + ky, width);
            sum += pixel * laplacian[ky + 1][kx + 1];
          }
        }
        sumOfSquares += sum * sum;
        count++;
      }
    }
    // Root mean square of the Laplacian response (higher means sharper)
    return Math.sqrt(sumOfSquares / count);
  }
}
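The processor calls `getPixel`, `calculateBrightness`, and `calculateContrast` without showing them. The sketch below is one way they might be written, assuming the image has already been converted to grayscale and that both metrics should land in the 0..1 range. The normalization constants and the `makeGrayImage` test helper are assumptions for illustration.

```javascript
// Hypothetical helpers for an ImageData-like object { data, width, height }.
// Assumes a grayscale image, so the red channel holds the gray value.
function getPixel(data, x, y, width) {
  return data[(y * width + x) * 4];
}

// Mean gray level scaled to 0..1.
function calculateBrightness(imageData) {
  const { data } = imageData;
  let sum = 0;
  for (let i = 0; i < data.length; i += 4) sum += data[i];
  return sum / (data.length / 4) / 255;
}

// Standard deviation of gray levels scaled to 0..1.
// 127.5 is the largest possible standard deviation for 8-bit values
// (half the pixels at 0, half at 255).
function calculateContrast(imageData) {
  const { data } = imageData;
  const pixelCount = data.length / 4;
  const mean = calculateBrightness(imageData) * 255;
  let squaredDiffs = 0;
  for (let i = 0; i < data.length; i += 4) {
    const diff = data[i] - mean;
    squaredDiffs += diff * diff;
  }
  return Math.sqrt(squaredDiffs / pixelCount) / 127.5;
}

// Builds a grayscale test image from a flat list of gray values.
function makeGrayImage(grays, width, height) {
  const data = new Uint8ClampedArray(width * height * 4);
  grays.forEach((gray, i) => {
    data[i * 4] = data[i * 4 + 1] = data[i * 4 + 2] = gray;
    data[i * 4 + 3] = 255;
  });
  return { data, width, height };
}
```

Because these helpers return values between 0 and 1 while the Laplacian-based sharpness does not, the sharpness value would also need to be scaled (for example by dividing by an empirically chosen maximum) before the three are averaged into `overall`.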
Improved Server-side Quality Check Logic:
// Integrated quality check on server
class QualityValidator {
  validateImageQuality(imageBuffer, metadata) {
    const qualityChecks = {
      brightness: this.checkBrightness(imageBuffer),
      contrast: this.checkContrast(imageBuffer),
      sharpness: this.checkSharpness(imageBuffer),
      resolution: this.checkResolution(metadata),
      fileSize: this.checkFileSize(imageBuffer.length)
    };
    // Calculate total score
    const weights = {
      brightness: 0.2,
      contrast: 0.2,
      sharpness: 0.3,
      resolution: 0.2,
      fileSize: 0.1
    };
    let totalScore = 0;
    for (const [check, score] of Object.entries(qualityChecks)) {
      totalScore += score * weights[check];
    }
    return {
      score: totalScore,
      details: qualityChecks,
      passed: totalScore >= 0.7,
      recommendations: this.generateRecommendations(qualityChecks)
    };
  }
  generateRecommendations(checks) {
    const recommendations = [];
    if (checks.brightness < 0.3) {
      recommendations.push('Please increase lighting');
    } else if (checks.brightness > 0.8) {
      recommendations.push('Lighting is too bright. Please reduce it slightly');
    }
    if (checks.sharpness < 0.5) {
      recommendations.push('Please hold the camera steady and take a clear photo');
    }
    if (checks.contrast < 0.4) {
      recommendations.push('Please increase contrast between background and ID card');
    }
    return recommendations;
  }
}
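To see how the weighted score behaves, here is a small worked example. The weights are copied from the validator; the sample scores and the `weightedScore` function name are made up for illustration, and each check is assumed to return a value between 0 and 1.

```javascript
// Weights copied from the validator; everything else here is illustrative.
const weights = { brightness: 0.2, contrast: 0.2, sharpness: 0.3, resolution: 0.2, fileSize: 0.1 };

// Weighted sum of per-check scores (each assumed to be in 0..1).
function weightedScore(scores, weights) {
  let total = 0;
  for (const [name, weight] of Object.entries(weights)) {
    total += scores[name] * weight;
  }
  return total;
}

// Made-up sample inputs.
const sharpButDark = { brightness: 0.2, contrast: 0.6, sharpness: 0.9, resolution: 1, fileSize: 1 };
const blurry = { brightness: 0.7, contrast: 0.7, sharpness: 0.2, resolution: 1, fileSize: 1 };

console.log(weightedScore(sharpButDark, weights).toFixed(2)); // 0.73
console.log(weightedScore(blurry, weights).toFixed(2));       // 0.64
```

The first sample clears the 0.7 pass mark even though its brightness (0.2) is low enough to trigger the "Please increase lighting" recommendation, so a weighted sum can pass an image with one weak dimension. If that is not desired, a per-check minimum could be enforced alongside the total.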
🔧 Core Technical Challenges and Solutions
1. Escaping Dependency Hell
Problem: Increased loading time due to large OpenCV.js library
// Problematic OpenCV.js loading
const loadOpenCV = () => {
  return new Promise((resolve, reject) => {
    const script = document.createElement('script');
    script.src = 'https://docs.opencv.org/4.5.0/opencv.js';
    script.onload = () => {
      cv.onRuntimeInitialized = () => {
        resolve(cv); // Takes 3-5 seconds to complete loading
      };
    };
    script.onerror = reject;
    document.head.appendChild(script);
  });
};
Solution: Instant loading with direct Canvas API implementation
// Improved instant loading approach
class LightweightImageProcessor {
  constructor() {
    // Immediately available, no external dependencies
    this.ready = true;
  }
  // Implement only necessary features directly
  processImage(imageData) {
    // Can process immediately
    return this.applyFilters(imageData);
  }
}
2. MediaPipe Loading Error Resolution
Problem: MediaPipe initialization failure and memory leaks
// Problematic MediaPipe initialization
class ProblematicMediaPipe {
  async initialize() {
    // Issue: Duplicate initialization attempts
    // Issue: Insufficient memory cleanup
    this.faceMesh = new FaceMesh({
      locateFile: (file) => {
        return `https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${file}`;
      }
    });
  }
}
Solution: Singleton pattern and proper resource management
// Improved MediaPipe manager
class MediaPipeManager {
  constructor() {
    this.faceMesh = null;
    this.isInitialized = false;
    this.isInitializing = false;
  }
  async initialize() {
    if (this.isInitialized) return;
    if (this.isInitializing) {
      // Prevent duplicate initialization
      // (waitForInitialization is a helper not shown here)
      await this.waitForInitialization();
      return;
    }
    this.isInitializing = true;
    try {
      this.faceMesh = new FaceMesh({
        locateFile: (file) => {
          return `https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${file}`;
        }
      });
      this.faceMesh.setOptions({
        maxNumFaces: 1,
        refineLandmarks: true,
        minDetectionConfidence: 0.5,
        minTrackingConfidence: 0.5
      });
      await this.faceMesh.initialize();
      this.isInitialized = true;
    } catch (error) {
      console.error('MediaPipe initialization failed:', error);
      throw error;
    } finally {
      this.isInitializing = false;
    }
  }
  // Release the model so repeated start/stop cycles do not leak memory
  cleanup() {
    if (this.faceMesh) {
      this.faceMesh.close();
      this.faceMesh = null;
    }
    this.isInitialized = false;
  }
}
// Singleton instance
const mediaPipeManager = new MediaPipeManager();
export default mediaPipeManager;
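The manager calls `waitForInitialization`, which is not shown. One common alternative to a flag plus a wait helper is to store the in-flight promise and hand the same promise to every caller. The sketch below illustrates that pattern with a hypothetical `OnceInitializer` class and a stand-in `load` function; it is not the project's actual code and does not touch MediaPipe itself.

```javascript
// Hypothetical sketch: run an async loader at most once and share its promise.
class OnceInitializer {
  constructor(load) {
    this.load = load;    // async function that performs the real setup
    this.promise = null; // in-flight or completed initialization
  }

  initialize() {
    if (!this.promise) {
      // The async arrow invokes load() right away and turns a thrown error
      // into a rejected promise.
      this.promise = (async () => this.load())().catch((error) => {
        this.promise = null; // forget the failure so a later call can retry
        throw error;
      });
    }
    return this.promise; // concurrent callers all await the same promise
  }

  reset() {
    this.promise = null;
  }
}

// Usage with a stand-in loader that counts how often it runs.
let loadCalls = 0;
const initializer = new OnceInitializer(async () => {
  loadCalls += 1;
  return 'ready';
});
const first = initializer.initialize();
const second = initializer.initialize();
```

Here `first` and `second` are the same promise and the loader runs once, so there is no window in which two callers both see "not initialized" and start loading the model twice.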
3. Face Angle Calculation Accuracy Improvement
Problem: Incorrect step progression due to inaccurate angle measurement
// Previous inaccurate angle calculation
function calculateYawAngle(landmarks) {
  // Issue: Simple 2D calculation
  const leftEye = landmarks[33];
  const rightEye = landmarks[263];
  const nose = landmarks[1]; // Never used below
  // Inaccurate calculation method: the slope of the eye line in the image
  // plane measures head tilt (roll), not left-right rotation (yaw)
  const angle = Math.atan2(
    rightEye.y - leftEye.y,
    rightEye.x - leftEye.x
  ) * 180 / Math.PI;
  return angle; // Inaccurate result
}
Solution: Accurate angle calculation based on 3D landmarks
// Improved 3D angle calculation
class FaceAngleCalculator {
  static calculateAngles(landmarks) {
    if (!landmarks || landmarks.length < 468) {
      return { yaw: 0, pitch: 0, roll: 0 };
    }
    // Key landmark points (based on MediaPipe 468 points)
    const noseTip = landmarks[1];          // Nose tip
    const leftEyeCorner = landmarks[33];   // Left eye corner
    const rightEyeCorner = landmarks[263]; // Right eye corner
    const leftMouth = landmarks[61];       // Left mouth
    const rightMouth = landmarks[291];     // Right mouth
    // Used as the chin reference; index 18 sits just below the lower lip
    // (index 152 is the point more commonly used for the chin tip)
    const chin = landmarks[18];
    const forehead = landmarks[10];        // Forehead
    // Calculate Yaw (left-right rotation)
    const yaw = this.calculateYaw(
      leftEyeCorner, rightEyeCorner, noseTip
    );
    // Calculate Pitch (up-down rotation)
    const pitch = this.calculatePitch(
      forehead, noseTip, chin
    );
    // Calculate Roll (tilt)
    const roll = this.calculateRoll(
      leftEyeCorner, rightEyeCorner
    );
    return { yaw, pitch, roll };
  }
  static calculateYaw(leftEye, rightEye, nose) {
    // Calculate eye center point
    const eyeCenter = {
      x: (leftEye.x + rightEye.x) / 2,
      y: (leftEye.y + rightEye.y) / 2,
      z: (leftEye.z + rightEye.z) / 2
    };
    // Vector between nose and eye center
    const noseVector = {
      x: nose.x - eyeCenter.x,
      y: nose.y - eyeCenter.y,
      z: nose.z - eyeCenter.z
    };
    // Vector between eyes (reference line)
    const eyeVector = {
      x: rightEye.x - leftEye.x,
      y: rightEye.y - leftEye.y,
      z: rightEye.z - leftEye.z
    };
    // Calculate normal vector using cross product
    const normal = this.crossProduct(eyeVector, noseVector);
    // Calculate Yaw angle (convert radians to degrees).
    // For a frontal face the normal has no x component, so measuring its
    // rotation in the x-z plane gives 0 degrees when looking straight ahead;
    // atan2(normal.y, normal.x) would return 90 degrees for that pose.
    const yaw = Math.atan2(normal.x, normal.z) * (180 / Math.PI);
    return this.normalizeAngle(yaw);
  }
  static calculatePitch(forehead, nose, chin) {
    // Face centerline vector
    const faceVector = {
      x: chin.x - forehead.x,
      y: chin.y - forehead.y,
      z: chin.z - forehead.z
    };
    // Angle with vertical reference line
    const pitch = Math.atan2(
      faceVector.z,
      Math.sqrt(faceVector.x * faceVector.x + faceVector.y * faceVector.y)
    ) * (180 / Math.PI);
    return this.normalizeAngle(pitch);
  }
  static calculateRoll(leftEye, rightEye) {
    // Tilt between eyes
    const roll = Math.atan2(
      rightEye.y - leftEye.y,
      rightEye.x - leftEye.x
    ) * (180 / Math.PI);
    return this.normalizeAngle(roll);
  }
  static crossProduct(a, b) {
    return {
      x: a.y * b.z - a.z * b.y,
      y: a.z * b.x - a.x * b.z,
      z: a.x * b.y - a.y * b.x
    };
  }
  static normalizeAngle(angle) {
    // Normalize angle to -180 ~ 180 range
    while (angle > 180) angle -= 360;
    while (angle < -180) angle += 360;
    return angle;
  }
}
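A cheap way to validate a head-pose formula is to feed it synthetic landmarks whose rotation is known. The sketch below builds a made-up frontal pose, rotates it about the vertical axis, and checks that a yaw formula based on the face normal recovers the rotation. The coordinates, helper names, and axis convention (y grows downward, smaller z is closer to the camera) are assumptions chosen to resemble MediaPipe output, not measurements from the project.

```javascript
// Hypothetical test harness for yaw formulas using synthetic landmarks.
function cross(a, b) {
  return {
    x: a.y * b.z - a.z * b.y,
    y: a.z * b.x - a.x * b.z,
    z: a.x * b.y - a.y * b.x
  };
}

function sub(a, b) {
  return { x: a.x - b.x, y: a.y - b.y, z: a.z - b.z };
}

// Rotates a point about the vertical (y) axis by `degrees`.
function rotateY(p, degrees) {
  const t = degrees * Math.PI / 180;
  return {
    x: p.x * Math.cos(t) + p.z * Math.sin(t),
    y: p.y,
    z: -p.x * Math.sin(t) + p.z * Math.cos(t)
  };
}

// Made-up frontal pose: eyes on the x axis, nose below and in front of them.
const frontal = {
  leftEye: { x: -1, y: 0, z: 0 },
  rightEye: { x: 1, y: 0, z: 0 },
  nose: { x: 0, y: 1, z: -1 }
};

// Same pose turned by `degrees` about the vertical axis.
function posed(degrees) {
  return {
    leftEye: rotateY(frontal.leftEye, degrees),
    rightEye: rotateY(frontal.rightEye, degrees),
    nose: rotateY(frontal.nose, degrees)
  };
}

// Normal of the plane spanned by the eye line and the eye-center-to-nose vector.
function faceNormal({ leftEye, rightEye, nose }) {
  const eyeCenter = {
    x: (leftEye.x + rightEye.x) / 2,
    y: (leftEye.y + rightEye.y) / 2,
    z: (leftEye.z + rightEye.z) / 2
  };
  return cross(sub(rightEye, leftEye), sub(nose, eyeCenter));
}

// Yaw as the rotation of the normal within the horizontal (x-z) plane.
function yawFromNormal(normal) {
  return Math.atan2(normal.x, normal.z) * 180 / Math.PI;
}

// Variant that mixes the vertical component in, shown for comparison.
function yawFromXY(normal) {
  return Math.atan2(normal.y, normal.x) * 180 / Math.PI;
}
```

In this harness `yawFromNormal(faceNormal(posed(30)))` comes out at 30 degrees and the frontal pose at 0, whereas the `yawFromXY` variant reports 90 degrees for the frontal pose. Running candidate formulas through a check like this before wiring them into the capture flow helps catch axis mix-ups early.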
📊 Results and Improvements
Performance Improvement Metrics
| Item | Before | After | Improvement |
|---|---|---|---|
| Initial Loading Time | 3-5 seconds | Under 0.5 seconds | 90% reduction |
| Memory Usage | 150-200 MB | 50-80 MB | 60% reduction |
| Face Detection Accuracy | 75% | 92% | +17 percentage points |
| System Stability | Medium | High | 95% crash reduction |
Key Achievements
- Significant Loading Speed Improvement: 90% reduction in initial loading time by removing OpenCV.js
- Enhanced System Stability: Resolved memory leaks and 95% reduction in crashes
- Accuracy Improvement: Recognition accuracy rose by 17 percentage points (75% to 92%) through 3D landmark-based angle calculation
- Code Quality Enhancement: Strengthened modularization and error handling
Technical Learning Points
// Final integrated system architecture
class IntegratedFaceRecognitionSystem {
  constructor() {
    this.mediaManager = mediaPipeManager;
    this.imageProcessor = new CanvasImageProcessor();
    this.qualityValidator = new QualityValidator();
    this.angleCalculator = FaceAngleCalculator;
  }
  async processFrame(videoElement) {
    try {
      // 1. Face detection with MediaPipe
      const faceResults = await this.mediaManager.detectFaces(videoElement);
      if (faceResults.faces.length === 0) {
        return { success: false, reason: 'NO_FACE_DETECTED' };
      }
      const face = faceResults.faces[0];
      // 2. Image quality assessment with Canvas API
      const imageData = this.extractImageData(videoElement);
      const quality = this.imageProcessor.assessImageQuality(imageData);
      // 3. 3D landmark-based angle calculation
      const angles = this.angleCalculator.calculateAngles(face.landmarks);
      // 4. Comprehensive validation
      const validation = this.validateFrame(face, quality, angles);
      return {
        success: true,
        face: face,
        quality: quality,
        angles: angles,
        validation: validation
      };
    } catch (error) {
      console.error('Frame processing error:', error);
      return { success: false, reason: 'PROCESSING_ERROR', error };
    }
  }
  validateFrame(face, quality, angles) {
    const checks = {
      faceSize: this.checkFaceSize(face.boundingBox),
      facePosition: this.checkFacePosition(face.boundingBox),
      imageQuality: quality.overall >= 0.7,
      faceAngle: this.checkFaceAngle(angles),
      landmarks: face.landmarks.length >= 468
    };
    const passed = Object.values(checks).every(check => check);
    return {
      passed,
      checks,
      recommendations: this.generateRecommendations(checks, quality, angles)
    };
  }
}
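`validateFrame` depends on helpers such as `checkFaceAngle` that the post does not show. As a rough illustration, an angle check could compare each axis against a tolerance, and the boolean checks could be summarized into a list of failures for user guidance. The limits below and the `summarizeChecks` helper are invented for this sketch; the project's real thresholds are not given in the post.

```javascript
// Hypothetical tolerances in degrees; the project's real limits are not stated.
const DEFAULT_ANGLE_LIMITS = { yaw: 15, pitch: 15, roll: 10 };

// True when every axis is within its tolerance of a frontal pose.
function checkFaceAngle(angles, limits = DEFAULT_ANGLE_LIMITS) {
  return Object.keys(limits).every(
    (axis) => Math.abs(angles[axis]) <= limits[axis]
  );
}

// Collects the names of failed checks so the UI can explain what to fix.
function summarizeChecks(checks) {
  const failed = Object.keys(checks).filter((name) => !checks[name]);
  return { passed: failed.length === 0, failed };
}

// Example: a slightly turned but acceptable face in a frame that is too blurry.
const summary = summarizeChecks({
  faceAngle: checkFaceAngle({ yaw: 5, pitch: -10, roll: 2 }),
  imageQuality: false
});
```

Returning the failed check names rather than only a boolean makes it straightforward to map each failure to a specific on-screen hint.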
🎯 Future Development Plans
Short-term Goals
- Mobile Optimization: Touch interface and responsive design
- Multi-face Processing: Stable operation in environments with multiple people
- Real-time Feedback Improvement: More intuitive user guidance
Medium-term Goals
- Liveness Verification: Feature to distinguish between real people and photos
- ID Text Recognition: OCR-based information extraction
- Security Enhancement: Biometric data encryption and secure storage
Long-term Goals
- AI Model Integration: Development of custom face recognition models
- Cloud Expansion: Serverless architecture for high-volume processing
- Internationalization: Support for various countries' identification documents
💡 Key Lessons Learned
Technical Lessons
- Importance of Dependency Management: The value of reducing external library dependencies and implementing core features directly
- Incremental Improvement: Don't try to create a perfect solution from the start; build a working version first, then improve
- Balance Between Performance and Features: The latest technology isn't always the best; choosing appropriate technology that fits project requirements is crucial
Project Management Lessons
- Importance of Requirements Convergence: Clarifying requirements in the initial design phase determines the overall development direction
- Technical Debt Management: Writing sustainable code while balancing performance and functionality
- Problem-solving Skills: Improved ability to systematically analyze and solve complex technical challenges
This project was a comprehensive growth opportunity that went beyond simple feature development to encompass real-time system design, performance optimization, and user experience improvement. I will continue to build more stable and accurate face authentication systems in the future.
Tags: #FaceRecognition #MediaPipe #Canvas #OpenCV #WebRTC #ImageProcessing #KYC #JavaScript #ComputerVision #WebDevelopment