Tag #dataengineering Articles

Browse our collection of articles on various topics related to IT technologies. Dive in and explore something new!

The Ultimate Linux Command Cheat Sheet for Data Engineers and Analysts

Introduction: As a data engineer or analyst, your day-to-day responsibilities likely...

May 21

Streaming SQL in Stateful DataFlows

Streaming SQL Functionality: SQL Streaming Queries and Stream Processing Operations is...

Feb 22

All About Parquet Part 08 - Reading and Writing Parquet Files in Python

Free Copy of Apache Iceberg: The Definitive Guide | Free Apache Iceberg Crash Course | Iceberg Lakehouse...

Oct 21 '24

Comprehensive LuxDevHQ Data Engineering Course Guide

This comprehensive course spans 4 months (16 weeks) and equips learners with expertise in Python,...

Jan 21

Open Source Data Engineering Landscape 2025

Chen Debra

#dataengineering

#opensource

#apacheddolphinscheduler

#database

Alireza Sadeghi Reposted from...

Feb 13

When Small Parquet Files Become a Big Problem (and How I Ended Up Writing a Compactor in PyArrow)

It all began with a fairly normal data pipeline. Events were coming in through Kafka, landing in AWS...

May 9

Data Orchestration Tool Analysis: Airflow, Dagster, Flyte

Introduction: Data orchestration tools are key for managing data pipelines in modern...

Jan 23

All About Parquet Part 05 - Compression Techniques in Parquet

Oct 21 '24

All About Parquet Part 10 - Performance Tuning and Best Practices with Parquet

Oct 21 '24

🎥 Speed Up PostgreSQL Queries by 7x with Smarter Indexing

PostgreSQL is powerful, but scanning huge datasets can slow down queries. What if you could skip...

Mar 10

Designing robust and scalable relational databases: A series of best practices.

Throughout my experience as a data engineer and software developer, I've had the pleasure (and...

Nov 19 '24

What Apache Iceberg REST Catalog is and isn't

Alex Merced

#dataengineering

#datascience

I've recently...

Aug 18 '24

How to Design a Relational Database Schema in 2025

Roxana Maria Haidiner

Designing a good relational database schema is one of the most important steps when building any...

Jun 23

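Engenharia de Dados com Scala: masterizando o processamento de dados em tempo real com Apache Flink e Google Pub/Sub (Data Engineering with Scala: Mastering Real-Time Data Processing with Apache Flink and Google Pub/Sub)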

Note: this article is also available in English 🌎 Apache Flink is a framework for processing...

Aug 9 '24

Understanding Apache Iceberg Delete Files

Apache Iceberg...

Aug 29 '24

Ultimate guide to creating a pipeline (Apache Airflow)

Hello there data enthusiasts. Today's guide walks you through building a complete data pipeline using...

May 22

Serverless PDF Processing with AWS Lambda and Textract

Serverless computing has transformed the way we build applications by eliminating the need to manage...

Sep 28 '24

Cadu Magalhães

#braziliandevs

#dataengineering

Introdução à Engenharia de Dados (Introduction to Data Engineering)

Before starting on the roadmap, I think it is important to discuss what is really meant by the engineering of...

Feb 25

Create, Read, Update, Delete (CRUD) | MongoDB Tutorial 2025

Roxana Maria Haidiner

Table of Contents: Introduction to MongoDB, Installation & Database Creation, CRUD...

Apr 14

The Developer’s Guide to Real-Time Data Platforms!

The data landscape has expanded immensely, making it crucial for organizations to leverage data...

Aug 21 '24

SQL Query Optimization for Data Engineers

Moving data efficiently can make the difference between a smooth system and a frustratingly slow one....

May 6

Jupyter Notebooks in Docker

Why use Docker for Jupyter Notebooks? Docker provides an efficient and reproducible...

Nov 29 '24

The Apache Iceberg™ Small File Problem

If you've been following Apache Iceberg™ at all, you've no doubt heard whispers about "the small file...

Dec 11 '24

SQLite Database GUI Tool | Design and Visualize with DbSchema

Roxana Maria Haidiner

If you are looking for a way to design and manage your SQLite database, DbSchema helps you create...

Mar 26

Building an Automated Weather Data Pipeline with Apache Kafka and Cassandra

This article creates an end-to-end weather data pipeline that collects weather data from multiple...

Apr 7

Understanding Apache Iceberg's metadata.json file

Aug 21 '24

Understanding RAID Levels: A Comprehensive Guide to RAID 0, 1, 5, 6, 10, and Beyond

In today’s fast-paced digital landscape, data storage is crucial for safeguarding critical...

Jul 11 '24

Get the Records after and before the Searched One — From SQL to SPL #18

Judith-Data-Processing-Hacks

Problem description & analysis: The ProductionLine_Number in a certain table of the...

Apr 14

Data Engineering in 2024: Innovations and Trends Shaping the Future

As 2024 unfolds, data engineering is becoming more integral to organizational success than ever...

Oct 27 '24

Data Engineering with Scala: Mastering Real-Time Data Processing with Apache Flink and Google Pub/Sub

Note: this article is also available in Brazilian Portuguese 🌎 Apache Flink is a distributed data...

Oct 18 '24