Performance

Concepts

CPU time is the time code spends running on the CPU, and can be divided into two categories:
- User time, or time spent in user-space.
- System (or privileged) time, or time spent in kernel-space.
The run queue is the list of runnable processes: the tasks that are either currently running or ready to run.
I/O wait is the time a task spends waiting for I/O to complete. As a task isn't runnable in this state, the operating system is free to schedule other work.
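A minimal sketch of the user/system split, using Python's `os.times()`, which reports the process's cumulative user and system CPU time. The workloads (`user_heavy`, `system_heavy`, `blocked`) are hypothetical. `time.sleep` stands in for a blocked task: like a task in I/O wait, a sleeping task isn't runnable, so it accumulates wall-clock time but almost no CPU time.

```python
import os
import time

def cpu_split(fn):
    """Run fn(); return the (user, system, wall) seconds it consumed in this process."""
    t0, w0 = os.times(), time.perf_counter()
    fn()
    t1, w1 = os.times(), time.perf_counter()
    return t1.user - t0.user, t1.system - t0.system, w1 - w0

def user_heavy():
    # Pure computation: almost all of this is user time.
    sum(i * i for i in range(3_000_000))

def system_heavy():
    # Many tiny unbuffered writes: each os.write() is a system call,
    # so a noticeable share of the time is spent in kernel space.
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        for _ in range(200_000):
            os.write(fd, b"x")
    finally:
        os.close(fd)

def blocked():
    # Hypothetical stand-in for a blocked task: not runnable, so it
    # accrues wall-clock time but almost no CPU time.
    time.sleep(0.2)

for fn in (user_heavy, system_heavy, blocked):
    user, system, wall = cpu_split(fn)
    print(f"{fn.__name__:>12}: user={user:.2f}s system={system:.2f}s wall={wall:.2f}s")
```

The exact figures depend on the machine and OS; the point is the shape: `user_heavy` is almost all user time, `system_heavy` shows a larger system share, and `blocked` shows wall time with next to no CPU time.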

Troubleshooting

Disk I/O

High disk utilisation may indicate insufficient buffering of disk operations, or insufficient throughput from the underlying storage.
- This might be caused by swapping, where physical memory is exhausted and pages are moved to swap (or page file) space on persistent storage.
Consider the ratio of read/write operations to bytes read/written. Many operations each transferring few bytes (i.e. low throughput per operation) may indicate a buffering issue: the application is issuing many small reads or writes rather than fewer, larger ones.
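A sketch of the operations-to-bytes ratio: `CountingRawWriter` is a hypothetical in-memory stand-in for a storage device that counts write operations. The same 10,000 small records reach it either directly or through Python's `io.BufferedWriter`, which coalesces them into far fewer, larger writes and so raises the bytes-per-operation figure.

```python
import io

class CountingRawWriter(io.RawIOBase):
    """Hypothetical raw sink that counts write calls, standing in for a disk device."""

    def __init__(self):
        super().__init__()
        self.ops = 0
        self.bytes_written = 0

    def writable(self):
        return True

    def write(self, b):
        self.ops += 1
        self.bytes_written += len(b)
        return len(b)

def bytes_per_op(raw):
    return raw.bytes_written / raw.ops if raw.ops else 0.0

def write_records(sink, n, record=b"x" * 16):
    for _ in range(n):
        sink.write(record)

# Unbuffered: every 16-byte record becomes its own operation.
raw_unbuffered = CountingRawWriter()
write_records(raw_unbuffered, 10_000)

# Buffered: records are coalesced into writes of up to 64 KiB.
raw_buffered = CountingRawWriter()
buffered = io.BufferedWriter(raw_buffered, buffer_size=64 * 1024)
write_records(buffered, 10_000)
buffered.flush()

for label, raw in (("unbuffered", raw_unbuffered), ("buffered", raw_buffered)):
    print(f"{label}: {raw.ops} ops, {bytes_per_op(raw):.0f} bytes/op")
```

Both sinks receive the same total bytes; only the number of operations differs, which is the signal the ratio is meant to surface.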

Memory

Virtual memory allows the OS to overcommit its physical memory: the total memory allocated can exceed what is physically available. This is ideal for desktop applications, where workloads tend to be bursty and switching between multiple applications is commonplace, but is less well-suited to servers or containers dedicated to a single workload, which generally uses all of its allocated resources.

Network

A LAN's optimal load is considered to be around 40% (assuming multiple NICs are competing for the same shared medium). Beyond this point, the rate of collisions means much of the network's capacity is consumed by collisions and retransmissions rather than useful traffic.
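The 40% rule of thumb can be checked with simple arithmetic: utilisation is the bits transferred over an interval divided by the link's capacity over that interval. A minimal sketch; `network_utilisation` is a hypothetical helper and the traffic figures are made up. Note the factor of 8, since interface counters usually report bytes while link speeds are quoted in bits per second.

```python
def network_utilisation(bytes_transferred, interval_seconds, link_bits_per_second):
    """Hypothetical helper: fraction of link capacity used over an interval."""
    bits = bytes_transferred * 8
    return bits / (interval_seconds * link_bits_per_second)

# Rule-of-thumb threshold for a shared-medium LAN, as described above.
SATURATION_THRESHOLD = 0.40

# Hypothetical example: 50 MB transferred in 10 s on a 100 Mbit/s link.
u = network_utilisation(50_000_000, 10, 100_000_000)
print(f"utilisation={u:.0%}, saturated={u >= SATURATION_THRESHOLD}")
```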

Profilers

Profilers may be either:

- Sampling, where the profiler records the currently executing function at regular intervals. While sampling has relatively low overhead, it is prone to misrepresenting relative costs (e.g. by only sampling when threads are in states where it's safe to inspect the stack, or by using an interval that misses a significant event). The interval-related error is unlikely to be a problem in practice given a sufficiently long time window and short sampling interval, though sampling only at safe points is a systematic bias that a longer window won't remove.
- Instrumenting, where instrumentation is inserted directly into the application's code to record significant events. Whilst more intrusive and with higher overhead, instrumenting profilers can be more accurate (e.g. identifying functions inlined by JIT compilers).
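A minimal Python sketch contrasting the two approaches on the same hypothetical workload (`busy_a`, `busy_b`). `instrument` is a hypothetical decorator that records exact call counts and elapsed time; `sample` is a hypothetical sampler thread that uses CPython's `sys._current_frames()` to record the function on top of the main thread's stack every few milliseconds. The sampler can only run when the main thread releases the GIL, so its samples land at interpreter switch points, loosely analogous to sampling only where it's safe to inspect the stack.

```python
import collections
import functools
import sys
import threading
import time

# Instrumenting: wrap functions to record exact call counts and elapsed time.
call_stats = collections.defaultdict(lambda: [0, 0.0])  # name -> [calls, seconds]

def instrument(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            stats = call_stats[fn.__name__]
            stats[0] += 1
            stats[1] += time.perf_counter() - start
    return wrapper

# Sampling: periodically record which function is on top of a thread's stack.
def sample(thread_id, interval, stop, counts):
    while not stop.is_set():
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)

@instrument
def busy_a(seconds):
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass

@instrument
def busy_b(seconds):
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass

samples = collections.Counter()
stop = threading.Event()
sampler = threading.Thread(
    target=sample, args=(threading.get_ident(), 0.005, stop, samples))
sampler.start()
busy_a(0.4)  # hypothetical workload: about 4x as long as busy_b
busy_b(0.1)
stop.set()
sampler.join()

print("sampled:", samples.most_common())
print("instrumented:", dict(call_stats))
```

The sampled counts should roughly reflect the 4:1 split of time between the two functions, while the instrumented figures are exact but required wrapping every call.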

Blocking function calls (e.g. performing select on a socket), which indicate time spent waiting for I/O, may be excluded from some profilers' visualisations, since they don't actively prevent the OS from scheduling other tasks.

It's easier to identify lock contention and synchronisation issues in a timeline view across threads than by viewing the call times of individual functions.
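A sketch of the per-thread data a timeline view is built from, assuming four hypothetical workers that each hold a shared lock for about 50 ms. Every worker spends about the same time inside the critical section, so per-function call times look unremarkable; recording when each thread requested and acquired the lock exposes the serialisation, with waits growing by roughly one hold period per thread.

```python
import threading
import time

lock = threading.Lock()
timeline = []  # (thread name, seconds waited for the lock, seconds held)
timeline_guard = threading.Lock()
HOLD_SECONDS = 0.05  # hypothetical critical-section duration
start_together = threading.Barrier(4)

def worker():
    start_together.wait()  # release all threads at once to force contention
    requested = time.perf_counter()
    with lock:
        acquired = time.perf_counter()
        time.sleep(HOLD_SECONDS)  # simulated work inside the critical section
        released = time.perf_counter()
    with timeline_guard:
        timeline.append((threading.current_thread().name,
                         acquired - requested, released - acquired))

threads = [threading.Thread(target=worker, name=f"worker-{i}") for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for name, waited, held in sorted(timeline, key=lambda e: e[1]):
    print(f"{name}: waited {waited * 1000:6.1f} ms, held {held * 1000:5.1f} ms")
```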

Types of test

Throughput
Batch (elapsed time to complete a fixed amount of work)
Response/operation/transaction time
- Averages
- Percentiles
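A sketch of why percentiles complement averages in response-time tests, using Python's `statistics` module on hypothetical data: 95 requests at 10 ms and 5 at 500 ms. The mean (34.5 ms) matches no request anyone actually experienced, whereas p50 (10 ms) and p99 (500 ms) describe the typical and tail cases. `statistics.quantiles` interpolates using its default `'exclusive'` method, so other tools may report different values for in-between percentiles such as p95.

```python
import statistics

# Hypothetical response times (ms): 95 fast requests and 5 slow outliers.
response_times_ms = [10] * 95 + [500] * 5

mean = statistics.mean(response_times_ms)
# n=100 yields 99 cut points; index k holds the (k + 1)th percentile.
percentiles = statistics.quantiles(response_times_ms, n=100)
p50, p95, p99 = percentiles[49], percentiles[94], percentiles[98]

print(f"mean={mean} ms, p50={p50} ms, p95={p95} ms, p99={p99} ms")
```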

Variance in results

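Student's t-test gives a p-value: the probability of observing a difference at least as large as the one measured if the null hypothesis (that the baseline and specimen results are the same) were true.
The α-value (significance level) sets the arbitrary threshold: a delta is considered statistically significant when p < α.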
statistical significance != statistical importance
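A minimal sketch of a two-sample Student's t-test with pooled variance, using only the standard library and hypothetical timings. Rather than computing a p-value (which needs the t-distribution, available via e.g. `scipy.stats.ttest_ind`), it compares |t| with the two-tailed critical value for 18 degrees of freedom at α = 0.05 (2.101, from standard t tables); |t| exceeding that value is equivalent to p < α.

```python
import math
import statistics

def students_t(baseline, specimen):
    """Two-sample Student's t statistic, assuming equal variances (pooled)."""
    n1, n2 = len(baseline), len(specimen)
    pooled_var = ((n1 - 1) * statistics.variance(baseline)
                  + (n2 - 1) * statistics.variance(specimen)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(specimen) - statistics.mean(baseline)) / std_err
    return t, n1 + n2 - 2  # statistic, degrees of freedom

# Hypothetical timings (ms): the specimen is consistently 5 ms slower.
baseline = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]
specimen = [x + 5 for x in baseline]

t, df = students_t(baseline, specimen)
assert df == 18, "the critical value below is only valid for df = 18"
T_CRITICAL_DF18_ALPHA05 = 2.101  # two-tailed, alpha = 0.05
significant = abs(t) > T_CRITICAL_DF18_ALPHA05
print(f"t={t:.2f}, df={df}, significant at alpha=0.05: {significant}")
```

Here the 5 ms regression is clearly significant; whether 5 ms matters is the separate question of importance.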

Types of benchmarks

Nanobenchmarks
Millibenchmarks
Microbenchmarks test small units of program code, and may be useful for comparing algorithm implementations.
Mesobenchmarks are a middle ground between micro- and macrobenchmarks, and include dependencies.
Macrobenchmarks test the full application.
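As an illustration of a microbenchmark comparing implementations, a sketch using Python's `timeit` to compare membership tests on a list (linear scan) and a set (hash lookup). The collection size and repeat counts are arbitrary assumptions.

```python
import timeit

N = 10_000
as_list = list(range(N))
as_set = set(as_list)
target = N - 1  # worst case for the linear list scan

# repeat() times `number` calls per trial; take the minimum over trials,
# which is the figure least affected by other activity on the machine.
list_best = min(timeit.repeat(lambda: target in as_list, number=1_000, repeat=5))
set_best = min(timeit.repeat(lambda: target in as_set, number=1_000, repeat=5))

print(f"list: {list_best / 1_000 * 1e6:.2f} µs per lookup")
print(f"set:  {set_best / 1_000 * 1e6:.3f} µs per lookup")
```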