Articles by Tag #dataengineering

Browse our collection of articles on a range of IT topics. Dive in and explore something new!

The Ultimate Linux Command Cheat Sheet for Data Engineers and Analysts

Introduction: As a data engineer or analyst, your day-to-day responsibilities likely...

May 21

All About Parquet Part 08 - Reading and Writing Parquet Files in Python

Free Copy of Apache Iceberg the Definitive Guide; Free Apache Iceberg Crash Course; Iceberg Lakehouse...

Oct 21 '24

Streaming SQL in Stateful DataFlows

Streaming SQL Functionality: SQL Streaming Queries and Stream Processing Operations is...

Feb 22

Comprehensive LuxDevHQ Data Engineering Course Guide

This comprehensive course spans 4 months (16 weeks) and equips learners with expertise in Python,...

Jan 21

Data Engineering Roadmap for 2025

I have 2 goals with this roadmap: first, it is a very complete list of the topics that I believe...

Jul 25

Open Source Data Engineering Landscape 2025

Alireza Sadeghi Reposted from...

Feb 13

When Small Parquet Files Become a Big Problem (and How I Ended Up Writing a Compactor in PyArrow)

It all began with a fairly normal data pipeline. Events were coming in through Kafka, landing in AWS...

May 9

How to Design a Relational Database Schema in 2025

Designing a good relational database schema is one of the most important steps when building any...

Jun 23

All About Parquet Part 10 - Performance Tuning and Best Practices with Parquet

Free Copy of Apache Iceberg the Definitive Guide; Free Apache Iceberg Crash Course; Iceberg Lakehouse...

Oct 21 '24

Data Orchestration Tool Analysis: Airflow, Dagster, Flyte

Introduction: Data orchestration tools are key for managing data pipelines in modern...

Jan 23

All About Parquet Part 05 - Compression Techniques in Parquet

Free Copy of Apache Iceberg the Definitive Guide; Free Apache Iceberg Crash Course; Iceberg Lakehouse...

Oct 21 '24

Designing robust and scalable relational databases: A series of best practices.

Throughout my experience as a data engineer and software developer, I've had the pleasure (and...

Nov 19 '24

Skip the Database: Building Analytics Dashboards Directly from S3 Files

Introduction: Hello guys, my name is Jo and welcome back to my engineering blog! I'm gonna...

Sep 27

🎥 Speed Up PostgreSQL Queries by 7x with Smarter Indexing

PostgreSQL is powerful, but scanning huge datasets can slow down queries. What if you could skip...

Mar 10

SQL Server 2025 - What’s New and How to Visualize the Schema

What's New in SQL Server 2025: SQL Server 2025 brings several important updates that make...

Jul 18

The Apache Iceberg™ Small File Problem

If you've been following Apache Iceberg™ at all, you've no doubt heard whispers about "the small file...

Dec 11 '24

How to Document SQL Server Schemas Visually in 2025

Working with SQL Server, you usually get great performance and solid tools, but documenting the...

Jun 26

Building an Automated Weather Data Pipeline with Apache Kafka and Cassandra

This article creates an end-to-end weather data pipeline that collects weather data from multiple...

Apr 7

The Complete Guide to Setting Up PostgreSQL on Windows 11 and WSL2

TL;DR: Start on your data engineering journey by installing WSL2 with Ubuntu on Windows 11 and setting...

Apr 26

Database Design Errors to Avoid & How To Fix Them

Even now, in 2025, with powerful database tools and cloud platforms, developers still make elementary...

Jul 17

Zero-Downtime Database Migration: The Complete Engineering Guide

Mission Critical: Migrating a production database is like replacing the engine of a plane mid-flight...

Sep 7

SQL Query Optimization for Data Engineers

Moving data efficiently can make the difference between a smooth system and a frustratingly slow one....

May 6

Building a News Sentiment Analysis Pipeline with Apache Airflow and Snowflake

This is a fully automated pipeline for fetching news articles, analysing their sentiment, and...

Aug 22

Ultimate guide to creating a pipeline (Apache Airflow)

Hello there data enthusiasts. Today's guide walks you through building a complete data pipeline using...

May 22

Create, Read, Update, Delete (CRUD) | MongoDB Tutorial 2025

Table of Contents: Introduction to MongoDB; Installation & Database Creation; CRUD...

Apr 14

Introduction to Data Engineering

Before starting on the roadmap, I think it's important to discuss what Data Engineering really is...

Feb 25

The Rise of Real-Time Data: Why Batch Might Be Fading

Ever wondered why your favorite apps feel so snappy and responsive these days? The quiet revolution...

Aug 2

Building a Data Career: The Skills That Truly Matter

The need for people to understand, prioritize, manage, and analyze data is not slowing down in any...

Aug 14

Building a Big Data Playground Sandbox for Learning

Introduction: As a data engineer, I'm always seeking opportunities to experiment with...

Oct 17 '24

SQLite Database GUI Tool | Design and Visualize with DbSchema

If you are looking for a way to design and manage your SQLite database, DbSchema helps you create...

Mar 26