2.2 - 2.3.2 Models/Concepts (System Performance Brendan Gregg 2nd)

2.2 Models

2.2.1 System Under Test

It is important to consider the impact of perturbations (interventions) on test results! 🔍

Perturbations are unexpected or background activities that can distort the results of system performance measurements. They may arise from:

  • planned system activity (for example, automatic updates, backups),
  • actions of other users (if the system is multiuser),
  • competing workloads (other applications or virtual machines in the cloud).

🌩 Special difficulties in cloud environments

In cloud infrastructures (AWS, Azure, GCP, etc.), a physical host can serve multiple virtual machines (VMs). The activity of neighboring VMs ("noisy neighbors") can affect your system while remaining invisible from inside your test environment (SUT, System Under Test). This is the "noisy neighbor" problem, and it can seriously distort benchmark results!

🧩 The complexity of modern distributed systems

Modern environments often consist of many components:

  • Load balancers (Nginx, HAProxy)
  • Proxies and caching servers (Varnish, Redis)
  • Web and app servers (Apache, Node.js)
  • Databases (PostgreSQL, MongoDB)
  • Network Storage (AWS S3, Ceph)

Each of these elements can introduce its own perturbations! For example, a cache can mask the real load on the database, and network delays can affect the response time.

🔎 How to minimize the impact of perturbations?

  1. Careful mapping of the environment: draw a diagram of all system components.
  2. Isolation of the test environment: use dedicated resources whenever possible.
  3. Monitoring of background processes (for example, via `top`, `htop`, or Prometheus).
  4. Statistical analysis: run the test several times and average the results (see the sketch below).
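
A minimal sketch of point 4, assuming a Python environment: run a benchmark repeatedly and report the mean and spread, so one-off perturbations show up as outliers. `run_benchmark` here is a placeholder; substitute the operation you actually measure.

```python
import statistics
import time

def run_benchmark() -> float:
    """Placeholder workload: replace the body with the operation you actually measure."""
    start = time.perf_counter()
    sum(i * i for i in range(100_000))  # dummy CPU-bound work
    return time.perf_counter() - start

# Multiple runs: averaging dampens one-off perturbations, and a high
# standard deviation is itself a hint that something is interfering.
runs = [run_benchmark() for _ in range(10)]
print(f"mean   = {statistics.mean(runs) * 1000:.2f} ms")
print(f"median = {statistics.median(runs) * 1000:.2f} ms")
print(f"stdev  = {statistics.stdev(runs) * 1000:.2f} ms")
```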

🚀 Conclusion: Perturbations are the hidden enemy of accurate measurements. Always analyze the environment, isolate the interference and check the results several times!

2.2.2 Queueing System

Deep analysis: modeling queues, delays, and system performance

📊 Modeling components as queue systems

Many elements of the IT infrastructure (disks, CPUs, network interfaces) can be represented as queueing systems. This allows you to predict their behavior under load!

🔹 Example with disks:

  • As the load increases, the disk response time increases non-linearly due to the queue of requests.
  • The M/M/1 model (random arrivals, exponentially distributed service times, a single server) is often used to predict delays.

Practical application: If your disk can serve 100 IOPS but 150 requests arrive per second, the queue will grow and latency will skyrocket! ☄️
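
A quick sanity check of that claim, using the load factor ρ = λ/μ from the M/M/1 model explained below (the 100 and 150 are the assumed numbers from this example):

```python
arrival_rate = 150.0  # λ: requests arriving per second
service_rate = 100.0  # μ: requests the disk can complete per second (IOPS)

rho = arrival_rate / service_rate  # load factor ρ = λ/μ
print(f"ρ = {rho:.2f}")
if rho >= 1.0:
    print("ρ >= 1: the queue grows without bound and latency skyrockets")
else:
    print("ρ < 1: the queue is stable")
```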

What is M/M/1? (In simple words)

https://en.wikipedia.org/wiki/M/M/1_queue

Imagine that:

  • M: Customers arrive randomly. It's like people walking into a store: you can't predict when the next customer will arrive. This is a Poisson process, which describes random events occurring over time.
  • /M/: Serving each customer takes a random amount of time, as if the cashier serves every customer at a different speed (depending on the number of goods, etc.). Service times follow an exponential distribution.
  • /1: There is one server (or cashier). Only one customer can be served at a time.

As a result: M/M/1 is a simple queue model where customers arrive randomly, service times are random, and there is a single server.

Real-life analogies

  • There is a queue in the store at one cashier: Customers come to the store at different times, everyone has their own basket of goods, and the cashier serves everyone at different speeds.
  • Calls to a call center with one operator: Calls arrive randomly, each call takes a different time (depending on the problem), and there is only one operator to take the call.
  • Request processing on a single-processor web server: Users send requests to the server randomly, each request takes different time to process, and the server has only one processor.

Key parameters of the M/M/1 model

To understand how the M/M/1 model works, we need the following parameters:

  1. λ (lambda) is the arrival rate: how many customers arrive on average per unit of time. For example, 10 clients per hour.
  2. μ (mu) is the service rate: how many clients the server can handle on average per unit of time. For example, 15 clients per hour.

Important: If λ > μ, the queue will grow indefinitely! The server will not be able to keep up with the flow of clients. For the queue to be stable, ρ = λ / μ (the load factor) must be less than 1. ρ indicates what fraction of the time the server is busy serving.

What does ρ = λ / μ < 1 (load factor) mean?

ρ (rho) is the load factor. It shows what fraction of the time the server is busy working. It is calculated as λ / μ.

If ρ < 1, the server serves clients faster than they arrive. In our example:

  • λ = 10 clients per hour
  • μ = 15 clients per hour
  • ρ = 10 / 15 ≈ 0.67

Here ρ is less than 1 (0.67 < 1). This means the cashier is busy only 67% of the time and has time to rest between clients. The queue will be stable because the cashier keeps up with the customers.

Key performance indicators

The M/M/1 model allows us to estimate the following indicators:

  1. ρ (load factor): What fraction of the time is the server busy? ρ = λ / μ. If ρ = 0.8, the server is busy 80% of the time.
  2. P₀ (probability of an empty system): What is the probability that there are no clients in the system? P₀ = 1 - ρ. If ρ = 0.8, then P₀ = 0.2, i.e. the server is idle 20% of the time.
  3. L (average number of clients in the system): How many customers are in the system on average (queued + in service)? L = λ / (μ - λ).
  4. Lq (average number of customers in the queue): How many customers are waiting in the queue on average? Lq = ρ * L = λ² / (μ(μ - λ)).
  5. W (average time a client spends in the system): How much time does a client spend in the system (waiting + service)? W = 1 / (μ - λ) (equivalently, W = L / λ).
  6. Wq (average waiting time in the queue): How much time does a customer spend waiting in the queue? Wq = ρ / (μ - λ) (equivalently, Wq = Lq / λ).

Formulas for calculation (if you are interested)

Here are the formulas for calculating these indicators. Don't be scared: the main thing is to understand the concept.

  • ρ = λ / μ
  • P₀ = 1 - ρ
  • L = λ / (μ - λ)
  • Lq = (λ²)/(μ(μ - λ))
  • W = 1 / (μ - λ)
  • Wq = λ / (μ(μ - λ))

Calculation example (simple)

Let's say 10 customers come to the store per hour (λ = 10) and the cashier serves 15 customers per hour (μ = 15).

  1. ρ = 10 / 15 ≈ 0.67. The server is loaded at 67%.
  2. P₀ = 1 - 0.67 = 0.33. The probability that the cashier is free is 33%.
  3. L = 10 / (15 - 10) = 2. On average, there are 2 customers in the store (in line or at the checkout).
  4. Lq = (10 * 10) / (15 * (15 - 10)) ≈ 1.33. On average, 1.33 customers are waiting in the queue.
  5. W = 1 / (15 - 10) = 0.2 hours = 12 minutes. The average customer spends 12 minutes in the store.
  6. Wq = 10 / (15 * (15 - 10)) ≈ 0.133 hours = 8 minutes. On average, a customer waits in line for 8 minutes.
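
A minimal sketch that reproduces these numbers in plain Python; it is nothing but the formulas above:

```python
def mm1_metrics(lam: float, mu: float) -> dict:
    """M/M/1 steady-state metrics for arrival rate lam (λ) and service rate mu (μ)."""
    if lam >= mu:
        raise ValueError("unstable queue: requires λ < μ (ρ < 1)")
    rho = lam / mu  # load factor ρ = λ/μ
    return {
        "rho": rho,                         # server utilization
        "P0": 1 - rho,                      # probability the system is empty
        "L": lam / (mu - lam),              # avg. clients in the system
        "Lq": lam**2 / (mu * (mu - lam)),   # avg. clients waiting in the queue
        "W": 1 / (mu - lam),                # avg. time in the system (hours here)
        "Wq": lam / (mu * (mu - lam)),      # avg. time waiting in the queue
    }

# The store example: λ = 10 clients/hour, μ = 15 clients/hour
for name, value in mm1_metrics(10, 15).items():
    print(f"{name:>3} = {value:.3f}")
# rho = 0.667, P0 = 0.333, L = 2.000, Lq = 1.333, W = 0.200 h, Wq = 0.133 h
```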

Advantages of the M/M/1 model

  • Simplicity: Easy to understand and use.
  • Applicability: Describes many real systems.
  • Usefulness: Helps to evaluate system performance, design new systems, and optimize existing ones.

Limitations of the M/M/1 model

  • Random nature: Assumes random customer arrivals and random service times. In reality, this is not always the case.
  • One server: Not suitable for systems with multiple servers.
  • Does not take into account priorities: Does not take into account the priorities of customers.

Conclusion

The M/M/1 model is a powerful tool for understanding and analyzing queues. Knowing its basics, you can evaluate how to improve customer service, reduce waiting times, and improve the efficiency of various systems. I hope it's clearer to you now! If you have any questions, don't hesitate to ask! ☕


⏱️ Latency is the Queen of Metrics

Latency is the time spent waiting before an operation is performed. It looks different in different contexts:

🌐 Network examples:

  1. DNS latency: domain name resolution time (for example, google.com → IP address).
  2. TCP connection latency: connection setup delay (the 3-way handshake).
  3. HTTP request latency: the time from sending the request to receiving the first byte of the response (TTFB). (The sketch after this list measures the first two.)
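
A minimal sketch of measuring DNS and TCP connection latency with nothing but the Python standard library; `example.com` and port 443 are placeholder assumptions, substitute your own target:

```python
import socket
import time

HOST = "example.com"  # placeholder target; substitute your own host
PORT = 443

# 1. DNS latency: time to resolve the name to an address
start = time.perf_counter()
addr_info = socket.getaddrinfo(HOST, PORT, proto=socket.IPPROTO_TCP)
dns_ms = (time.perf_counter() - start) * 1000
print(f"DNS latency:         {dns_ms:6.1f} ms")

# 2. TCP connection latency: time for the 3-way handshake alone
# (connect to the already-resolved address so DNS is not counted twice)
family, _, _, _, sockaddr = addr_info[0]
start = time.perf_counter()
with socket.socket(family, socket.SOCK_STREAM) as s:
    s.settimeout(3.0)
    s.connect(sockaddr)
    tcp_ms = (time.perf_counter() - start) * 1000
print(f"TCP connect latency: {tcp_ms:6.1f} ms")
```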

🔢 Why is latency more important than IOPS?

  • IOPS (Input/Output Operations Per Second) shows only the number of operations, but not their execution speed.
  • Latency is measured in time units (ms, µs), which allows you to:
  • Compare heterogeneous systems (network vs. disk).
    • Accurately assess the impact of optimizations.

Example:

  • 100 network I/O operations at 100 ms each = 10,000 ms total.
  • 50 disk I/O operations at 50 ms each = 2,500 ms total. Conclusion: the disk workload finishes 4 times faster! (A sketch of this conversion follows below.)
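
The same metric-to-time conversion as a tiny sketch; the counts and latencies are the assumed numbers from the example above:

```python
# (operation count, per-operation latency in ms): the assumed numbers from the example
workloads = {
    "network I/O": (100, 100.0),
    "disk I/O":    (50, 50.0),
}

# Converting counts into total time makes heterogeneous systems directly comparable.
for name, (ops, latency_ms) in workloads.items():
    print(f"{name}: {ops} ops * {latency_ms:.0f} ms = {ops * latency_ms:,.0f} ms total")
```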

📏 Table of time units

It is important to use the correct units for accurate measurements:

| Unit | Abbreviation | Fraction of a second |
|------|--------------|----------------------|
| Millisecond | ms | 0.001 (10⁻³) |
| Microsecond | µs | 0.000001 (10⁻⁶) |
| Nanosecond | ns | 0.000000001 (10⁻⁹) |
| Picosecond | ps | 0.000000000001 (10⁻¹²) |

🔬 Interesting fact: A delay of 1 ns is the time it takes for light to travel only ~30 cm!


🎯 Key findings

  1. Queue models help predict performance degradation (for example, disks under load).
  2. Latency is a universal metric for comparing systems. Always specify the context: "TCP latency", "disk seek latency", etc.
  3. Convert metrics to time to make informed decisions (for example, choosing between network and disk I/O).

**Remember:** "Performance is the art of measurement, not guesswork!"

The shocking truth about latency: from nanoseconds to millennia!

⚡ How to understand time scales in computers?

Imagine that 1 CPU cycle (0.3 ns) is 1 second in our "scaled" world. Then:

| Event | Real latency | On the "1 cycle = 1 sec" scale | 💡 Real-life analogy |
|-------|--------------|--------------------------------|----------------------|
| L1 cache access | 0.9 ns | 3 sec | The blink of an eye |
| Read from RAM | 100 ns | 6 minutes | Making coffee |
| SSD read | 50 µs | 45 hours | Two days off |
| HDD seek | 5 ms | 6 months | Six months of pregnancy |
| Ping from SF to New York | 40 ms | 4 years | A full university degree |
| Physical server reboot | 5 minutes | 32,000 years | Back to the last Ice Age |
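
A minimal sketch that derives the scaled column: divide each real latency by one CPU cycle (0.3 ns) to get "seconds" in the scaled world, then format the result in human terms. The event list mirrors the table above:

```python
CPU_CYCLE_S = 0.3e-9  # one 0.3 ns CPU cycle maps to 1 second in the scaled world

# (event, real latency in seconds): values from the table above
events = [
    ("L1 cache access", 0.9e-9),
    ("Read from RAM", 100e-9),
    ("SSD read", 50e-6),
    ("HDD seek", 5e-3),
    ("Ping SF -> NYC", 40e-3),
    ("Physical server reboot", 5 * 60),
]

def humanize(seconds: float) -> str:
    """Render a duration in the largest convenient unit."""
    units = [("years", 31_557_600), ("days", 86_400), ("hours", 3_600),
             ("min", 60), ("sec", 1)]
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:,.0f} {name}"
    return f"{seconds:.1f} sec"

for event, real_s in events:
    scaled = real_s / CPU_CYCLE_S  # seconds in the "1 cycle = 1 sec" world
    print(f"{event:<24} {humanize(scaled)}")
```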

🤯 Shock content:

  • In the time it takes light to reach your eyes from a book (~1.7 ns), the processor manages to execute 5+ instructions!
  • Waiting for a response from an HDD (10 ms) is, on the CPU scale, like waiting 12 months for a package!

🌍 Why is this important?

  1. Code optimization: If your algorithm reaches for RAM instead of the cache yet again, it's like waiting 6 minutes instead of 3 seconds!
  2. Architecture choice:
     • SSD vs HDD: the difference between hours and years of waiting.
     • Server location: the latency between the USA and Australia (183 ms) is comparable to 19 years in CPU time!

🔧 Practical advice

  • Cache aggressively: L1 cache is faster than RAM by 100+ times.
  • Avoid disk I/O in critical code: One request per HDD = months CPU downtime.
  • Consider geography: Place servers closer to users — the difference between SF and UK (81 ms) Like 8 years waiting!

📚 Source: Adapted from "Systems Performance: Enterprise and the Cloud" (Brendan Gregg), chapters 6 (CPU), 9 (Disks), 10 (Network).

🎯 Philosophical summary:

"Computers live in another time dimension. Your task is not to keep them waiting!" ⏳💻
