Articles by Tag #spark

Browse our collection of articles on various topics related to IT technologies. Dive in and explore something new!

Choosing the Right Real-Time Stream Processing Framework

Introduction: In the dynamic realm of real-time stream processing, tools like ksqlDB,...

Sep 8 '24

Hadoop/Spark is too heavy, esProc SPL is light

With the advent of the era of big data, the amount of data continues to grow. In this case, it is...

Jul 8 '24

Large-Scale Auditing with UC Lineage Tables in Databricks

Exploring Unity Catalog Lineage Tables in Databricks 🚀 The Lineage Tables of Unity...

Dec 10 '24

Understanding and Applying Apache Spark Tuning Strategies

Reasons to read this article. First-hand experience gained in moments of chaos and...

Nov 7 '24

End-to-End Realtime Streaming Data Engineering Project

This repository demonstrates a data engineering pipeline using Spark Structured Streaming. It...

Aug 7 '24

Real-Time Air Traffic Data Analysis with Spark Structured Streaming and Apache Kafka

Today we live in a world where petabytes of data are generated every second. As such, the...

Oct 28 '24

[Databricks API as an internal service] dbutils: notebook.run, widgets.getArgument, widgets.text, and notebook_params

Reducing Costs with Process Automation in Databricks. I had a need in a...

Nov 2 '24

Databricks - Variant Type Analysis

The VARIANT data type is a recent introduction in Databricks (available in Databricks Runtime 15.3...

Jun 29 '24

Advanced Deduplication Using Apache Spark: A Guide for Machine Learning Pipelines

In the era of big data, ensuring the quality and accuracy of your data is paramount for both business...

Oct 14 '24

Complete Beginner's Guide: Building a Weather ETL Pipeline with PySpark

Introduction: Welcome to the exciting world of data engineering! In this comprehensive...

Jun 6

Enhancing Data Security with Spark: A Guide to Column-Level Encryption - Part 2

This post describes how you can build an AWS Glue ingestion job with the PySpark aes_encrypt() function...

Dec 22 '24

My journey learning Apache Spark

My Journey Learning Apache Spark on Coursera. As someone passionate about data, I recently...

Oct 26 '24

Study Notes 5.1.1-2 Introduction to Batch Processing & Spark

1. Introduction to Batch Processing. What is Batch Processing? Batch processing...

Mar 4

Building a YouTube Channel Analytics Dashboard with Airflow, Spark, and Grafana

Introduction: In the rapidly evolving creator economy, data-driven insights are essential...

Jun 10

Spark On Kubernetes

Spark on Kubernetes via Helm chart. This article aims to introduce the feature of Apache Spark...

May 5

PySpark optimization techniques

There are a variety of different parts of Spark jobs that you might want to optimize, and...

Aug 28 '24

Study Notes 5.3.3-4 Data Processing & SQL with Spark

Study Notes 5.3.3: Preparing Yellow and Green Taxi Data. 1. Overview and...

Mar 4

AWS Glue vs AWS Lambda: A Serverless Comparison for Data Engineering on AWS

The cloud has revolutionized the way companies manage and analyze their data. Amazon...

Feb 16

Run PySpark Local Python Windows Notebook

Introduction: PySpark is the Python API for Apache Spark, an open-source distributed...

Jan 21

Top 5 Things You Should Know About Spark

1. DataFrame is a Dataset. Try searching for a DataFrame API in the Scala Spark documentation...

Aug 29 '24

How to treat secure data on lakehouse

In the modern data stack, the lakehouse has emerged as a hybrid solution that combines the...

Apr 8

Automating Data Quality with DQX: Performance and Practicality

Introduction to DQX. In today's landscape, where data is often compared to the "new...

Feb 27

Integrating Elasticsearch with Spark

I had the opportunity to host a session with the Elastic community explaining the integration of...

Oct 14 '24

Like IDE for SparkSQL: SparkSQLHelper v2024.1.4 released

Since I published the article:...

Dec 24 '24

Journey Through Spark SQL

Journey Through Spark SQL: A Behind-the-Scenes Adventure. Introduction: Have you ever...

Oct 12 '24

Spark Review: The Game-Changer for Effortless Content Creation

The Key Features of Spark: Host unlimited websites and domains on our cloud hosting servers for a low...

Feb 7

Designing a Scalable Shuffle Service for Big Data on AWS

Introduction: Shuffle operations are a critical component of distributed data processing frameworks...

Feb 5

Comprehensive Guide to Schema Inference with MongoDB Spark Connector in PySpark

When working with MongoDB and Apache Spark, you might encounter situations where the MongoDB Spark...

Jun 27 '24

Machine Learning with Spark and Groovy

In a previous post, machine-learning-groovy.html (Spanish), I was playing around with grouping similar customers...

Jul 21 '24

Setting Up IOMete: A Cloud-Independent Data Platform Based on Spark

IOMete is a powerful, cloud-independent data platform built on Apache Spark, designed to enable...

Jun 10