How I Built a Smarter ZIP Engine with AI: My Day 9 & 10 Journey (Pagonic Project)
By SetraTheX (@setrathexx) · Published Jun 26, 2025

A true story of breakthroughs, setbacks, and lessons from the frontlines of AI-assisted software development.


🧠 Introduction

In my previous article, I shared how I built a modern ZIP engine with almost no coding background—just by managing AI tools like GitHub Copilot and ChatGPT. But the journey didn't end there. In this follow-up, I'll tell you what really happened in Days 9 and 10: the plans, the failures, the breakthroughs, and the reality of working with AI on the edge of what's possible.


🎯 Day 9 & 10 Plan: Chasing More Speed

After the previous steps, my goal for Day 9 was simple: boost extraction speed even further. My plan? Parallel processing and SIMD (single-instruction, multiple-data hardware acceleration) to make the engine even faster. I dreamed of a true parallel extraction engine, expecting 2-3x speed gains over previous results.

But reality was different. The "benchmark records" I imagined never came. Everything looked great in theory, but in practice, things got messy. My first tests showed only a 4-6% improvement. I was aiming for 200-300%! This was when I learned one of the most important lessons in software: real progress comes from honestly analyzing your failures.


😬 First Disappointment & The Autopsy Report

For days, I thought I had built a parallel extraction system. My code, my architecture, my plans—they all looked great. But the benchmark results were disappointing. So, like any developer, I sat down and wrote an autopsy report. I analyzed, line by line, what was missing and why I wasn't seeing the leap I expected.

Executive Summary:

  • Excellent architecture, but no real parallel speedup.
  • Root cause: No real parallel algorithm, just simulation.
  • Result: Marginal gain, no breakthrough.
  • Action: True parallelization and large-scale testing needed.

This report was the first turning point. Now I had the answer to "where did I go wrong?"—and that answer opened the door to a new plan.


🔄 Restart: Copilot Crashes & Switching to Cursor

Armed with my autopsy report, I created a new "Day 9+" plan. I spent five days on optimization, refactoring, and new modules. Every time I thought "this is it!", Copilot would crash at the next step. My zip_handler.py had grown past 4000 lines, and Copilot couldn't scan or suggest code anymore. Even the smallest change would freeze the IDE—AI just gave up.

This was where I saw the limits of "tool dependency" in software. When Copilot got stuck, I looked for a new way and switched to Cursor. With Cursor, I rolled back to a simpler archive from a few days earlier—right after Day 8, before Day 9 had even started. I told the AI everything I'd been through, shared the autopsy, and updated the plan. This time, I completed Day 9 in one clean shot.

During this process, I learned that working with AI isn't just "tell it, it does it." Sometimes, you have to dive in and add code line by line yourself. Cursor couldn't auto-insert into the compressed, layered code. For the first time, I manually added code with AI's guidance. It was hard, it slowed me down, but I made it work.


Day 9 – Parallel Extraction: Deciding When to Go Parallel ⚡

Another key breakthrough was realizing that parallel extraction only makes sense for large archives. With AI, I developed a simple but effective heuristic: decide between parallel and single-threaded based on file size and count. This eliminated unnecessary thread overhead and visibly improved performance.

def is_parallel_beneficial(self, total_size: int, file_count: int) -> bool:
    # Below these thresholds, thread startup and coordination overhead
    # costs more than parallelism saves.
    return (
        total_size >= 10 * 1024 * 1024 and  # 10MB+ total size
        file_count >= 3                     # 3+ files minimum
    )
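To make the cutoff concrete, here is how the check behaves on a few example archives ('handler' stands in for whichever instance owns this method):

# Illustrative calls; 'handler' is a hypothetical instance of the
# class that owns is_parallel_beneficial.
handler.is_parallel_beneficial(50 * 1024 * 1024, 10)  # True: 50MB across 10 files
handler.is_parallel_beneficial(2 * 1024 * 1024, 100)  # False: under the 10MB floor
handler.is_parallel_beneficial(20 * 1024 * 1024, 2)   # False: too few files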

Day 9 – From Naive to Enterprise-Grade Decompression 🏗️

At first, my decompression function was just a one-liner delegating everything to Python's standard library. But after hitting a performance bottleneck, I integrated my own ZIP parser and AI-driven strategy selection. Now, the best extraction method is chosen automatically for each file, and performance for large archives is dramatically improved.

Before:

import zipfile

def decompress(self, zip_path, output_dir):
    # Naive version: hand everything to the stdlib in one shot.
    return zipfile.ZipFile(zip_path).extractall(output_dir)


After:

def decompress(self, zip_path, output_dir):
    entries = self._parse_central_directory(zip_path)
    # Total payload size across all entries (field name follows our parser).
    total_size = sum(e.uncompressed_size for e in entries)
    if self.is_parallel_beneficial(total_size, len(entries)):
        return self._parallel_decompress_with_pools(zip_path, entries, output_dir)
    else:
        return self._fast_single_thread_decompress(zip_path, entries, output_dir)
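For the curious: central directory parsing starts from the End Of Central Directory (EOCD) record near the tail of the file. My actual parser is more involved, but a minimal sketch of that first step could look like this (the helper name and returned fields are my own illustration, not Pagonic's exact code):

import struct

EOCD_SIG = b"PK\x05\x06"  # End Of Central Directory signature

def find_eocd(data: bytes) -> dict:
    # The EOCD record sits near the end of the file, possibly followed
    # by a ZIP comment, so search backwards for its signature.
    pos = data.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("EOCD signature not found: not a valid ZIP file")
    # Layout: signature, four 16-bit disk/count fields, then the central
    # directory size and offset (32-bit each), then the comment length.
    fields = struct.unpack_from("<4s4HIIH", data, pos)
    return {
        "entry_count": fields[4],  # total central directory records
        "cd_size": fields[5],      # central directory size in bytes
        "cd_offset": fields[6],    # where the central directory starts
    }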

Day 9 – SIMD CRC32 Optimization 🏎️

CRC32 validation was a major bottleneck. With hardware-accelerated CRC32, I made this process 8-9x faster.

import zlib

def fast_crc32(data: bytes, value: int = 0) -> int:
    try:
        # Hardware-accelerated path if the crc32c package is installed.
        # Note: crc32c uses the Castagnoli polynomial, whose values differ
        # from ZIP's standard CRC-32, so the two branches aren't
        # interchangeable when validating ZIP-stored checksums.
        import crc32c
        return crc32c.crc32c(data, value)
    except ImportError:
        return zlib.crc32(data, value) & 0xffffffff
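If you want to sanity-check that speedup on your own hardware, a quick micro-benchmark like this sketch (using the fast_crc32 defined above) will do; the 8-9x figure comes from my tests, and your numbers will vary by CPU:

import os
import timeit
import zlib

payload = os.urandom(16 * 1024 * 1024)  # 16MB of random test data

t_zlib = timeit.timeit(lambda: zlib.crc32(payload), number=20)
t_fast = timeit.timeit(lambda: fast_crc32(payload), number=20)
print(f"zlib.crc32: {t_zlib:.3f}s, fast_crc32: {t_fast:.3f}s")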

Day 9 – Buffer Pooling 💾

Allocating a new buffer for every extraction was slow. With a buffer pool, I enabled reuse and sped up memory operations.

class OptimizedBufferPool:
    def get_aligned_buffer(self, size: int) -> memoryview:
        # Round the request up to an 8-byte boundary so buffers fall into
        # predictable size buckets that are easy to reuse.
        aligned_size = ((size + 7) // 8) * 8
        return memoryview(bytearray(aligned_size))
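The snippet above only shows the aligned-allocation side. For context, here is a minimal sketch of the reuse path, where released buffers go back into per-size buckets instead of being discarded (class and method names are my own illustration, not Pagonic's exact code):

class SimpleBufferPool:
    def __init__(self):
        self._free = {}  # aligned size -> list of reusable bytearrays

    def acquire(self, size: int) -> memoryview:
        aligned = ((size + 7) // 8) * 8  # same 8-byte alignment as above
        bucket = self._free.get(aligned, [])
        return memoryview(bucket.pop() if bucket else bytearray(aligned))

    def release(self, view: memoryview) -> None:
        # Hand the underlying bytearray back to its size bucket for reuse.
        self._free.setdefault(len(view), []).append(view.obj)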

🤖 Day 10: Smart ZIP Handler, AI Decisions & Benchmark Challenge

The goal for Day 10 was to make the ZIP handler smart. I designed an AI system that analyzes ZIP contents and automatically selects the best extraction or compression strategy. It worked! The engine became fully adaptive. But a new problem appeared: my benchmark tests didn't show which AI/heuristic decisions were made or with what parameters.

At this point, I worked with Cursor to analyze and plan in detail. I added a system to log every AI decision and parameter for each test. Because of the compressed code structure, I had to add these logging functions manually, line by line. This process showed me again how working with AI can sometimes be painstaking and require patience.


Day 10 – AI-Driven Strategy Selection 🧩

The AI system I built analyzes ZIP metadata and suggests the optimal extraction strategy. Now, the engine can optimize itself for every file.

def suggest_extraction_strategy(self, zip_metadata: dict) -> dict:
    if zip_metadata['entropy'] < 2.0:
        # Low entropy: repetitive, highly compressible data; a large
        # sequential buffer is fastest.
        return {'mode': 'fast', 'buffer_size': 65536}
    elif zip_metadata['file_count'] > 10:
        # Many files: spread the work across threads in chunks.
        return {'mode': 'parallel', 'chunk_size': 262144}
    else:
        return {'mode': 'standard'}
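A quick illustration of what the engine gets back ('handler' and the metadata values are made up for the example):

# Low entropy means highly compressible data, so the 'fast' path wins
# regardless of file count.
meta = {'entropy': 1.4, 'file_count': 42}
print(handler.suggest_extraction_strategy(meta))
# -> {'mode': 'fast', 'buffer_size': 65536}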

Day 10 – Logging Every AI Decision 📝

After making the engine smart, I wanted to see which AI decision was made with which parameters. So, I added a system to log every AI/heuristic decision and parameter for each test.

import json

def log_ai_decision(test_name: str, strategy: dict):
    # One JSON object per line keeps the log easy to stream and grep.
    with open('ai_decision_log.json', 'a') as f:
        f.write(json.dumps({'test': test_name, 'strategy': strategy}) + '\n')
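Since each decision is one JSON object per line, reading the log back for analysis is just as simple:

import json

# Replay the decision log written by log_ai_decision above.
with open('ai_decision_log.json') as f:
    for line in f:
        record = json.loads(line)
        print(record['test'], '->', record['strategy'])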

Now, at the end of every test, I can transparently see which AI decision was made, with which parameters and why. This is a huge advantage for both debugging and performance analysis.


💡 Takeaways: The Reality of AI-Assisted Coding

As I said in my first article, I still don't really "write" code. I manage the AI. I've become skilled at steering Copilot and ChatGPT, understanding their quirks, and knowing when to trust and when to doubt. I know how Copilot's models work, and I always start with a plan.

But here's the truth: AI isn't magic. It's not as easy as "just tell it, it does it." Sometimes, AI is just dumb. I can't count how many times I had to roll back, how many times it crashed or slowed me down. The biggest challenge? Adding new features to a benchmark system with legacy code I didn't fully understand. But with patience, planning, and a lot of manual effort, I made it work.


📝 Advice for Other AI Developers

If you want to build real software with AI, don't just rely on the tools. Always have a roadmap and a detailed plan. Set strict rules (like my @Kurallar.md file) and guide the AI step by step. Check every change, never trust AI blindly—it sometimes deletes the wrong code or gets completely lost. Be careful and methodical; the rest will follow.


🚀 Results: What Did I Actually Achieve?

  • Parallel extraction and SIMD acceleration are now real and working.
  • AI-driven strategy selection: The engine automatically picks the best method for every file.
  • Transparent benchmark: Every AI decision and parameter is logged and visible.
  • Performance: Large files now extract at 250–340 MB/s, with 100% test pass rate.
  • Personal growth: I'm still not a "coder," but I'm a much better AI manager.

📊 Benchmark Results (Older Data)

Here are some concrete performance numbers from my earlier tests. Note that these are from an older version, and method names may change as the project evolves:

🗜️ Compression Methods

Method         Avg Speed (MB/s)   Max Speed (MB/s)
standard       232.5              342.7
modular_full   228.2              343.0
memory_pool    149.6              169.5

📂 Decompression Methods

Method                     Avg Speed (MB/s)   Max Speed (MB/s)
parallel_decompression     236.2              282.6
legacy_decompression       186.0              248.7
simd_crc32_decompression   185.2              249.9
hybrid_decompression       185.0              247.9

📈 These results show the real impact of each optimization and AI-driven strategy. Parallel decompression clearly outperforms the other methods, validating the approach developed on Day 9.


🎯 Conclusion

Days 9 and 10 weren't just about code—they were about learning to manage AI, to recover from mistakes, and to build real, production-level features even when the tools fight back.

In this process I started over countless times: Copilot hit its limits, I switched to Cursor, I added code by hand, and I experienced both the limits and the power of AI firsthand. Every failure brought a new lesson and a new solution. Most importantly, I learned that planning and patience are the real keys to AI-assisted software development.

Today, I still don't write code, but I know how to manage AI, set rules, and step in manually when needed. If you want to build software with AI, remember: the real magic isn't in the code, it's in the mind that guides it.

💬 Have you tried managing AI in your own dev projects? Let me know how it went; I'd love to compare notes.

And the journey continues...


📦 Project: Pagonic (name may change!)

👤 Developer: SetraTheXX

Comments (5)

  • SetraTheX · Jun 26, 2025

    I look forward to hearing your thoughts, and if you have anything you'd like to ask, I'd be happy to answer.

  • Nevo David · Jun 27, 2025

    growth like this is always nice to see. kinda makes me wonder - what keeps stuff going long-term for you once the hype fizzles out?

    • SetraTheX · Jun 27, 2025

      To be honest, my excitement during the first week was super high, but of course that fades over time.
      What really keeps me connected to this project is the feeling at the end of each day when I can say,
      "I made progress today." That small sense of achievement is something I genuinely enjoy.

      Following my plans step by step and knowing I’m moving forward: that’s what motivates me.
      That’s why, even after 1.5 months, my enthusiasm hasn’t dropped at all.
      In fact, this approach only makes me want to finish the project even more.

  • Dotallio · Jun 27, 2025

    I really felt the pain of tool crashes and having to jump in by hand - it's wild how much AI can help, but you still end up doing the hard parts yourself sometimes. Have you tried layering in even more autonomy, like connecting strategy selection directly to feedback from benchmarks, or does that add too much complexity?

    • SetraTheX · Jun 27, 2025

      That's a really good question, and interestingly, the current AI system already does something quite similar to what you're describing.

      It inspects the archive contents (file count, entropy, compression method, etc.) and chooses what it deems the best extraction strategy. Then the engine simply executes that choice.

      What makes this even more useful is that the benchmark system keeps a log of the reasoning behind each decision the AI made. So I can see exactly what logic it employed, what parameters it considered, and which strategy it adopted for any given test, which is of great value.

      So yes, at this stage there isn't a proper feedback loop where the AI can learn from and improve on its own results, but the groundwork is definitely there.

      By the way, I truly love that question. You're thinking exactly along the same lines as I have.

      It honestly means a lot to me, as I'm still quite green and early in this journey, so I really appreciate you taking the time to ask.
