Pizofreude (@pizofreude)

About: CAE Engineer & Data Cruncher, Tech Stack Enthusiast, FOSS Advocate, and Avid Cyclist.

Location: Earth ("Die Erde")
Joined: Nov 12, 2022

Articles (86 total)

Peer Review 3: France Data Engineering Job Market Transformations, Visualization, and Feedback (Part 2)

Introduction: Welcome back to the final part of the peer review of the France Data Engineering Job...

May 2

Peer Review 3: France Data Engineering Job Market Analysis Pipeline Infra (Part 1)

Introduction: Welcome to the third peer review in the series for DataTalks Club Data Engineering...

May 2

Peer Review 2: Data Warehousing, Transformation, and Reproducibility in tfl-data-visualization (Part 2)

Welcome to the second part of my peer review series on the tfl-data-visualization project, a...

May 1

Peer Review 2: TfL Station Footfall Data Analysis Pipeline (Part 1)

Peer reviews are a cornerstone of building high-quality data engineering projects. They don’t just...

May 1

Peer Review 1: Poland's Real Estate Market Dashboards and Insights with Streamlit (Part 2)

Introduction: Welcome to the second part of Peer Review 1, where we continue exploring the...

Apr 30

Peer Review 1: Analyzing Poland's Real Estate Market (Part 1)

Introduction: Welcome to the first part of Peer Review 1 for DTC DEZOOMCAMP. This two-part...

Apr 30

InsightFlow Part 9: Workflow Orchestration with Kestra

9. Workflow Orchestration with Kestra. In modern data engineering, orchestrating workflows...

Apr 29

InsightFlow Part 8: Setting Up AWS Athena for Data Analysis in InsightFlow

InsightFlow GitHub Repo. In this post, we’ll explore how Amazon Athena was set up for querying and...

Apr 29

InsightFlow Part 7: Data Quality Implementation & Best Practices for InsightFlow

InsightFlow GitHub Repo. In this post, we’ll explore how data quality was implemented in the...

Apr 29

InsightFlow Part 6: Implementing ETL Processes with AWS Glue for InsightFlow

InsightFlow GitHub Repo. In this post, we’ll explore how AWS Glue was used to implement the ETL...

Apr 29

InsightFlow Part 5: Designing the Data Model & Schema with dbt for InsightFlow

InsightFlow GitHub Repo. In this post, we’ll dive into how the data model and schema for the...

Apr 29

InsightFlow Part 4: Data Exploration & Understanding the Datasets

InsightFlow GitHub Repo. Before diving into building any data pipeline, a crucial first step is Data...

Apr 29

InsightFlow Part 3: Building the Data Ingestion Layer with AWS Batch

InsightFlow GitHub Repo. In this post, we’ll explore how the data ingestion layer for the InsightFlow...

Apr 29

InsightFlow Part 2: Setting Up the Cloud Infrastructure with Terraform

In this post, I’ll walk you through how I set up the cloud infrastructure for my project,...

Apr 28

InsightFlow Part 1: Building an Integrated Retail & Economic Data Pipeline - Project Introduction

Introduction: I'm thrilled to begin documenting my journey building "InsightFlow" - an...

Apr 9

Study Notes 6.15: Kafka & Flink Streaming

1. Introduction to Kafka Streaming with PyFlink. Streaming Data Processing: Involves...

Mar 18

Study Notes 6.13-14: Kafka Streaming with Python & PySpark Structured Streaming with Kafka

1. Overview of Kafka Streaming with Python. Purpose & Context: This session...

Mar 18

Study Notes 6.11-12: Kafka ksqlDB, Connect & Schema Registry

1. Overview of Kafka ksqlDB & Kafka Connect. ksqlDB: ksqlDB is Kafka’s SQL-based...

Mar 18

Study Notes 6.7-10: Kafka Streams Basics, JOIN, Testing & Windowing

1. Overview: Kafka Streams Basics. Objective: Learn the fundamental building blocks of...

Mar 18

Study Notes 6.5-6: Kafka Producer, Consumer & Configuration

1. Overview of Kafka Producer & Consumer. Objective: Learn how to produce and consume...

Mar 18

Study Notes 6.3-4: What is Kafka & Confluent Cloud

1. Introduction to Kafka in Stream Processing. Context of Stream Processing: Stream...

Mar 18

Study Notes 6.1-2: Introduction to Stream Processing

1. Overview of Stream Processing. Definition: Continuous, real-time processing of data as...

Mar 18

Study Notes 5.6.3-4: Setting Up a Dataproc Cluster & Connecting Spark to BigQuery

Study Notes 5.6.3 - Setting Up a Dataproc Cluster in GCP. 1. Introduction: GCloud...

Mar 4

Study Notes 5.6.1-2: Spark on Cloud & Local

Study Notes 5.6.1 - Connecting Spark to GCS. 1. Overview: This guide explains how...

Mar 4

Study Notes 5.5.1-2: Operations on Spark RDDs & Spark RDD mapPartitions

Study Notes 5.5.1 - Spark RDDs. 1. Introduction to RDDs on Spark: Resilient...

Mar 4

Study Notes 5.4.1-3: Anatomy of a Spark Cluster, GroupBy & Joins in Spark

Study Notes 5.4.1 - Anatomy of a Spark Cluster. 1. Introduction: In this lesson,...

Mar 4

Study Notes 5.3.3-4: Data Processing & SQL with Spark

Study Notes 5.3.3: Preparing Yellow and Green Taxi Data. 1. Overview and...

Mar 4

Study Notes 5.3.1-2: First Look at Spark/PySpark & Spark DataFrames

Study Notes 5.3.1 - on Spark/PySpark. These notes cover the basics and some intermediate...

Mar 4

Study Notes 5.1.1-2: Introduction to Batch Processing & Spark

1. Introduction to Batch Processing. What is Batch Processing? Batch processing...

Mar 4

Study Notes 4.5.2: Visualizing Data with Metabase (Alternative B)

1. Introduction to Metabase. Purpose: Metabase is a tool for visualizing and exploring...

Feb 25