adaptive service monitor
No two servers are alike. No two servers will ever experience the same conditions.
For those evolving servers, we have created the Adaptive Service Monitor™ ("asm"), a
statistical monitor that collects and analyzes performance data and adapts the server to the changing
needs of its users.
How it works
asm periodically collects samples from selected services at arbitrary intervals.
These samples are compared against historical trends, with a focus on significant
changes between samples. If a statistically significant change is found (α = 0.05),
asm performs further analysis to determine the cause of the bottleneck. It analyzes
process information, memory utilization, disk I/O, and network throughput, then adapts the
server by lessening the burden of the conflicting service. After this exploratory tuning,
asm records server information again and reanalyzes it to determine a success rate. This
information feeds future decisions. As a result, the server stays healthy
during peak hours and continually retunes itself as Web sites grow.
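The cycle described above (sample, test for a significant change, diagnose, tune, record the outcome) can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not asm's actual code: the class `AdaptiveMonitor` and the callables `is_significant`, `diagnose`, and `tune` are hypothetical placeholders for the statistical test, the bottleneck analysis, and the service retuning the text describes.

```python
from collections import defaultdict
from typing import Callable, Optional


class AdaptiveMonitor:
    """Hypothetical sketch of a collect -> compare -> diagnose -> tune -> record cycle."""

    def __init__(self,
                 is_significant: Callable[[list[float], list[float]], bool],
                 diagnose: Callable[[], Optional[str]],
                 tune: Callable[[str], bool]):
        self.is_significant = is_significant  # e.g. a one-sided t-test at alpha = 0.05
        self.diagnose = diagnose              # returns the conflicting service, or None
        self.tune = tune                      # applies a tuning action; True if load recovered
        self.history: list[float] = []
        self.outcomes = defaultdict(lambda: [0, 0])  # service -> [successes, attempts]

    def step(self, window: list[float]) -> Optional[str]:
        """Process one window of samples; return the service that was tuned, if any."""
        tuned = None
        if self.history and self.is_significant(self.history, window):
            service = self.diagnose()
            if service is not None:
                ok = self.tune(service)
                stats = self.outcomes[service]
                stats[0] += int(ok)   # successful tunings
                stats[1] += 1         # total tuning attempts
                tuned = service
        self.history.extend(window)   # new samples become part of the historical trend
        return tuned

    def success_rate(self, service: str) -> float:
        """Fraction of tuning attempts on `service` that brought the load back down."""
        successes, attempts = self.outcomes[service]
        return successes / attempts if attempts else 0.0
```

Keeping the per-service success rate alongside the history is what lets later decisions favor tuning actions that have worked before, in the spirit of "this information feeds future decisions."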
To better understand how asm operates, let us step through a brief example. This guided
tour walks through the basic process in asm using two services.
- Consider a MySQL server and a Web server running in tandem on 2 GB of RAM. MySQL is
consuming 1 GB of RAM, and the Web server the remaining 1 GB. Any memory demand beyond that is paged to disk
through a swap file. Paging reduces server performance by shifting memory from fast RAM to the much slower disk.
- In an extraordinary circumstance, a site suffers from the "Slashdot Effect", meaning that it receives
an influx of page requests beyond normal tolerance levels. The Web server scales up the number of concurrent connections
in an attempt to cope with this drastic change.
- An average system would buckle under the pressure and become progressively slower during this period. asm
analyzes the server loads and discovers a spike in load averages. After exploratory research, it determines that
the Web server has grown in memory utilization from 1 GB to 1.5 GB, with 0.5 GB of memory being paged to disk,
significantly reducing performance.
- asm analyzes the other services and notes that MySQL's query and table caches are fairly underutilized, meaning
that an unnecessary chunk of memory is allocated to MySQL's buffers for future caching that never occurs and
is unlikely to occur in the near future under normal conditions.
- MySQL is retuned to shrink its caching buffers, freeing memory for other applications such as the Web server.
The Web server now has an extra 512 MB of RAM available, which eliminates paging by moving its swapped-out memory back into the much faster RAM.
Load averages subside to normal tolerance levels, and asm records the response for future decision making.
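The memory arithmetic in this walkthrough can be checked with a few lines. This sketch assumes, as the example does, 2 GB of physical RAM shared by two services, with the amount paged equal to whatever demand exceeds RAM; operating system and kernel overhead are ignored, and the helper names `paged_mb` and `shrink` are hypothetical.

```python
RAM_MB = 2048  # physical RAM in the example (2 GB)


def paged_mb(usage_mb: dict[str, int], ram_mb: int = RAM_MB) -> int:
    """Memory demand that no longer fits in RAM and must be paged to swap (simplified model)."""
    return max(0, sum(usage_mb.values()) - ram_mb)


def shrink(usage_mb: dict[str, int], service: str, amount_mb: int) -> dict[str, int]:
    """Return a copy of usage_mb with `service` reduced by amount_mb (e.g. smaller MySQL caches)."""
    new = dict(usage_mb)
    new[service] = max(0, new[service] - amount_mb)
    return new


baseline = {"mysql": 1024, "web": 1024}  # normal operation
spike = {"mysql": 1024, "web": 1536}     # Web server grows to 1.5 GB under load
retuned = shrink(spike, "mysql", 512)    # MySQL cache buffers reduced by 512 MB

print(paged_mb(baseline), paged_mb(spike), paged_mb(retuned))  # prints "0 512 0"
```

The 512 MB freed from MySQL exactly covers the 0.5 GB the Web server had pushed into swap, which is why paging drops back to zero in the retuned state.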
- What is asm in a nutshell?
An adaptive service monitor that analyzes server performance and tunes the server to sustain peak throughput.
- Who developed asm?
asm is developed by Apis Networks for use on Apis Networks servers.
- What type of statistics are used?
We use a one-sided t-distribution to compare data. Data is collected on a per-server basis;
that is, calculations are not compared across all servers, but only against data from the server on which asm runs.
Significance is evaluated at α = 0.05 once adequate data has been mined.
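A one-sided comparison at α = 0.05 might look like the sketch below. The answer above does not say which form of the t-test asm uses, so this assumes a pooled-variance (Student's) two-sample test asking whether recent samples have a higher mean than the server's own history; the function names and the truncated critical-value table are illustrative, not asm's code.

```python
import math
import statistics

# One-sided critical values t(0.05, df) from a standard Student's t table.
T_CRIT_05 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015, 6: 1.943,
             7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812, 11: 1.796, 12: 1.782,
             13: 1.771, 14: 1.761, 15: 1.753, 16: 1.746, 17: 1.740, 18: 1.734,
             19: 1.729, 20: 1.725, 21: 1.721, 22: 1.717, 23: 1.714, 24: 1.711,
             25: 1.708, 26: 1.706, 27: 1.703, 28: 1.701, 29: 1.699, 30: 1.697,
             40: 1.684, 60: 1.671, 120: 1.658}


def t_critical(df: int) -> float:
    """Critical value for the largest tabulated df <= df (a conservative choice)."""
    return T_CRIT_05[max(k for k in T_CRIT_05 if k <= df)]


def significantly_higher(history: list[float], recent: list[float]) -> bool:
    """One-sided pooled two-sample t-test: is mean(recent) > mean(history) at alpha = 0.05?"""
    n1, n2 = len(history), len(recent)
    if n1 < 2 or n2 < 2:
        return False  # not enough data mined yet to estimate variances
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * statistics.variance(history)
              + (n2 - 1) * statistics.variance(recent)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    diff = statistics.mean(recent) - statistics.mean(history)
    if se == 0:
        return diff > 0  # no variance at all: any increase counts
    return diff / se > t_critical(df)
```

Because degrees of freedom that fall between table rows are rounded down to the nearest tabulated row, the test errs toward flagging fewer changes. A library routine such as SciPy's `scipy.stats.ttest_ind` with `alternative="greater"` could supply exact p-values instead.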
- Which services are monitored?
asm can monitor and tune kernel-level metrics such as disk I/O and swapping, as well as per-process
CPU utilization, process counts, Web server throughput, sendmail, MySQL, and PostgreSQL. asm can also monitor and restart additional
services as the need arises, but cannot retune them on demand.
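For the additional services that asm can only monitor and restart, a single monitoring pass might look like the sketch below. The function `watchdog_pass` and its `is_healthy` and `restart` callbacks are hypothetical; a real deployment would plug in actual health probes and the system's own service manager.

```python
from typing import Callable, Iterable


def watchdog_pass(services: Iterable[str],
                  is_healthy: Callable[[str], bool],
                  restart: Callable[[str], None]) -> list[str]:
    """Hypothetical monitor-and-restart pass for services asm cannot retune.

    Nothing is tuned here: a service whose health check fails is simply restarted.
    Returns the names of the services that were restarted.
    """
    restarted = []
    for name in services:
        if not is_healthy(name):
            restart(name)
            restarted.append(name)
    return restarted
```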
- What are some examples of dynamic retuning?
asm can switch kernel elevators (2.6), modify table/query cache allowances in MySQL,
adjust WAL and page costs in PostgreSQL, toggle keepalives in Apache, change readahead rates on drives, and even
- Is asm guaranteed to keep the server up?
asm runs as a service on top of the Linux kernel. Under rare, unavoidable circumstances
that trigger a kernel panic, asm is unable to fully recover the server. However, in most cases, where load averages
rise gradually to levels at which the server can still operate, asm responds quickly to reduce server load.
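A gradual rise in load can be spotted from the load averages Linux exposes in `/proc/loadavg`, whose first three fields are the 1-, 5-, and 15-minute averages. The sketch below parses that file and applies a simple heuristic (short-term average above the longer-term ones); the heuristic and the helper names are assumptions for illustration, not asm's actual detection logic.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float      # 1-minute load average
    five: float     # 5-minute load average
    fifteen: float  # 15-minute load average


def parse_loadavg(text: str) -> LoadAvg:
    """Parse the first three fields of Linux /proc/loadavg."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return LoadAvg(one, five, fifteen)


def is_rising(load: LoadAvg) -> bool:
    """Heuristic (assumed): load is climbing if 1-min > 5-min > 15-min average."""
    return load.one > load.five > load.fifteen


def read_loadavg(path: str = "/proc/loadavg") -> LoadAvg:
    """Read the current load averages on a Linux host."""
    with open(path) as f:
        return parse_loadavg(f.read())
```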
- When will asm be available on the servers?
Parts of asm are already live on our servers. How do you think we keep our uptime
so high? asm is built around
a modular framework, which allows us to progressively deliver changes that enhance overall system stability.
Originally, asm performed basic threshold checks; since then it has evolved from a series of range checks into
proactive screening that prevents even the rarest problems from manifesting. Once is enough.