Articles by Tag #bigdata

Browse our collection of articles on various topics related to IT technologies. Dive in and explore something new!

What to use parquet or CSV?

History of Parquet File: A Big Data Storage Revolution. The Parquet file format has emerged...

May 7 '24

Trino & Iceberg Made Easy: A Ready-to-Use Playground

Earlier I briefly introduced Apache Iceberg and built an out-of-the-box experiment...

May 20 '24

Stream Data at scale from millions of sources with Amazon Kinesis (Serverless)

Amazon Kinesis is a streaming service specifically designed to address the complexities of data...

May 20 '24

Using ReAct Agents LLMs to Draw Insights from Tabular Data

Introduction: Large language models (LLMs) have significantly changed the way I and other developers...

Sep 11 '24

The Apache Iceberg™ Small File Problem

If you've been following Apache Iceberg™ at all, you've no doubt heard whispers about "the small file...

Dec 11 '24

Processing 20 Million Records in Under 5 Seconds with Apache Hive

Getting Started with Hadoop and Apache Hive: Architecture, Configuration, and Optimization. In this article...

Nov 2 '24

Hands-on introduction to Apache Iceberg

Lessons learned through a PoC for a challenging use case. Introduction: Apache Iceberg...

Oct 28 '24

Data Visualisation Basics

Why use data vis? When you need to work with a new data source, with a huge amount of data,...

Sep 6 '24

To Index Data is To Sort Data

Indexing is commonly used among programmers. Without fully grasping the idea behind the technique, a...

Aug 26 '24

How to Load Datasets Efficiently in Pandas: A Complete Guide

"Without data, you're just another person with an opinion." — W. Edwards Deming In today’s...

Feb 18

Introduction to Big Data Analysis

Data refers to raw, unprocessed facts, statistics, or information collected for reference, analysis,...

Nov 17 '24

How Data Science & Analytics Are Transforming Industries Today

In a world where every click, purchase and interaction generates data, companies that fail to...

Apr 21

The Heart of DolphinScheduler: In-Depth Analysis of the Quartz Scheduling Framework

Quartz is an open-source Java job scheduling framework that provides powerful capabilities for...

Nov 20 '24

Building a Big Data Playground Sandbox for Learning

Introduction: As a data engineer, I'm always seeking opportunities to experiment with...

Oct 17 '24

🎉 Apache Ambari 3.0.0 Released: A New Chapter in Hadoop Cluster Management

Apache Ambari 3.0.0 brings major improvements to cluster management capabilities, featuring Apache Bigtop integration, Java 17 support, and much more!

Apr 7

Is distributed technology the panacea for big data processing?

Using a distributed cluster to process big data is the mainstream approach at present, and splitting a big task...

Jun 6 '24

Introduction to Hadoop:)

Hadoop is an open-source software framework designed to handle and process large volumes of data...

Nov 24 '24

Using DolphinScheduler API to Achieve Efficient Batch Workflow Import and Script Deployment

When I implemented batch generation of DolphinScheduler tasks and imported them, I found that...

Jan 22

The current Lakehouse is like a false proposition

From all-in-one machines, hyper-convergence, and cloud computing to HTAP, we constantly try to combine...

Jun 12 '24

5 Game-Changing Habits to Master Your Data Science Journey

The journey to becoming a data scientist isn’t for the faint of heart. It’s a demanding but rewarding...

Jan 28

Mastering Big Data with GCP: My Capstone Journey in Cloud Data Analysis

Introduction: As a data enthusiast, I’ve always been fascinated by the power of cloud...

Mar 25

Introduction to Big Data

Abstract: Big data refers to large and complex datasets that require advanced techniques to store,...

Nov 2 '24

Introduction to Apache Hadoop & MapReduce

The History of Hadoop: There are mainly two problems with big data. Storage for a...

Jun 30 '24

SeaTunnel-Powered Data Integration: How 58 Group Handles Over 500 Billion+ Data Points Daily

Introduction: In the digital age, data has become one of the most valuable assets for...

Nov 20 '24

Discover Q2 2024's top and bottom-ranked stocks with FINQ

The stock market can often feel like an exhilarating yet unpredictable rollercoaster ride. The second...

Jul 28 '24

Big Data Trends That Will Impact Your Business In 2025

What made Men In Black so incredibly awesome? Was it their coordinated suits? Or was it the fact that...

Dec 24 '24

Vector Databases: Leading a New Era of Big Data and AI Integration

1. Introduction: Driven by the wave of digitalization, the growth rate of data has reached...

Jul 10 '24

🚀 Unlock the Power of ORC File Format 📊

Are you diving into the world of data storage and processing? Look no further! My latest blog...

Nov 22 '24

Apache Pyspark

It is a fast and general-purpose distributed computing system for big data processing. It provides an...

Apr 1

Introduction to Data lakes: The future of big data storage

Reflection 8: Data lakes have emerged as a pivotal component in the realm of big data management,...

Dec 14 '24