I’m a Lumberjack and I’m OK: System Logging
Prof M. E. Kabay, PhD, CISSP-ISSMP[1]
One of the more arcane aspects of running a multiuser computer system is logging. No, not the kind of logging that the Monty Python character sings about in his famous (or notorious) song about British Columbia[2] – this kind of logging refers to keeping records of many different types of events on the system. For example, operating systems on mainframes, minicomputers, and some LANs (or on LANs equipped with appropriate monitoring software) can record events such as logons and logoffs, process initiations and terminations, file accesses, and disk I/O.
Logging serves
many functions. Basically, logging keeps a record of who was doing what to
which data on what systems at which times. Simply knowing that logging is
taking place can contribute to self-regulation: knowing that actions are monitored can reduce harmful behavior. Log files provide information for
controlling the system; for example, system administrators and managers can
limit access to certain resources in response to observations of abuse and they
can change parameters and resources in response to trends. Log files also serve
a fundamental purpose in forensic investigations.
In the early
days of computing, system managers had to decide how often to close log files;
disk space was expensive. However, these days, disk space is not much of an
issue. In 1980, as I’ve mentioned in other commentaries, a 120 MB hard disk the
size of a washing machine cost U$25,000 – about U$75,000 in current value and
roughly U$625/MB. In contrast, today as I write in early 2009, a 1 TB Maxtor
hard disk the size of a book costs less than $200 or roughly U$0.0002/MB (~1/50th
of a penny per MB). So the price per megabyte was roughly 3,276,800 times greater about 30 years ago, which works out to prices falling to about 61% of the previous year’s level each year (a compounded decline of roughly 39% per year).[3]
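As a quick sanity check on those figures, here is a minimal sketch (assuming, as above, U$75,000 in current value for the 1980 disk and a binary terabyte of 1,048,576 MB for the 2009 disk):

    # Rough check of the price-per-MB comparison in the text.
    price_1980_per_mb = 75_000 / 120             # inflation-adjusted U$ per MB in 1980
    price_2009_per_mb = 200 / (1024 * 1024)      # U$ per MB for a 1 TB drive (binary TB)

    ratio = price_1980_per_mb / price_2009_per_mb
    annual_multiplier = (1 / ratio) ** (1 / 30)  # compounded change over ~30 years

    print(f"1980: ~U${price_1980_per_mb:.0f}/MB; 2009: ~U${price_2009_per_mb:.5f}/MB")
    print(f"Ratio: ~{ratio:,.0f}x; each year's price was ~{annual_multiplier:.0%} "
          f"of the previous year's")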
So the main issue today with respect to logging to disk is simply preventing data loss if the system or the logging process crashes; one approach is to disable buffered I/O and force immediate writes to the disk. Since one can and should dedicate a storage device exclusively to logging, performance should not be a problem as
long as the logging process is separate and does not hold up what is being
monitored.
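As an illustration, here is a minimal sketch of an append-only log writer that forces each record to physical storage immediately; the file path and record format are invented for the example, and a real installation would put the file on a dedicated device and run the writer as a separate process:

    import os
    import time

    LOG_PATH = "/var/log/audit/events.log"   # hypothetical path on a dedicated log volume

    def append_log_record(message):
        """Append one timestamped record and force it to disk before returning."""
        record = f"{time.strftime('%Y-%m-%dT%H:%M:%S')} {message}\n"
        # O_APPEND gives atomic appends; opening at this level avoids user-space buffering.
        fd = os.open(LOG_PATH, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
        try:
            os.write(fd, record.encode("utf-8"))
            os.fsync(fd)   # flush OS buffers so a crash cannot lose the record
        finally:
            os.close(fd)

    append_log_record("LOGON user=ralph terminal=tty03")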
The system manager
must decide how long to archive log files. Usually, there are legal
requirements which can guide the establishment of definite policies. These
policies must be monitored and enforced to avoid serious problems such as
contempt of court citations if records are deleted in violation of those
requirements. For legal and functional purposes, log files must be archived in
environmentally sound and secure storage facilities; normally, these are off
site just like those for other system backups.
Each operating
system can have particular variations in log file structure; you should look for
log-file analysis tools that are specific to your environment. A search engine
such as GOOGLE provides a wealth of references for the string “operating system” “log file analysis”;
for example, my search located 13,400 hits in 0.94 seconds, and dozens of products were listed right from the start of the results. The Exefind Software Search service alone, which was located by the GOOGLE search, listed five pages of names and descriptions of log-analysis utilities.[4]
Such analytical
software is necessary because it is usually impossible to examine all the records
– there could be millions. One needs to be able to break out the unusual
events. Using appropriate software, one can set filters to scan for unusual
conditions. Some systems define baseline events – that is, the norm – and spot
unusual ones using statistical methods. Human beings can scan the exception
reports and look for patterns; more sophisticated software systems can use
artificial intelligence (AI) to help spot patterns and anomalies.
AI systems may
use accumulated observations and statistical methods to spot outliers that
signal unusual and perhaps dangerous events; for example, suppose that no more
than one user logon in 1,000 over the last year has used an ID from the
accounting department between the hours of midnight and six in the morning – so
why did “Ralph” try to log on at 3:30 in the morning? And if the real
“Ralph” from accounting has not had to try his password more than twice in the
last 1,523 logons, then how come this “Ralph” tried 18 passwords at 3:30 in the
morning before giving up?
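A minimal sketch of that kind of baseline test might look like the following; the thresholds and record fields are invented for illustration, and real log-analysis products use far more sophisticated statistics:

    from datetime import datetime

    # Hypothetical baseline derived from a year of accounting-department log data.
    BASELINE = {
        "odd_hours": (0, 6),       # logons between midnight and 06:00 are rare (<0.1%)
        "max_failed_attempts": 2,  # the real "Ralph" has never needed more than two tries
    }

    def flag_anomalies(event):
        """Return the reasons an authentication event looks suspicious."""
        reasons = []
        ts = datetime.fromisoformat(event["timestamp"])
        start, end = BASELINE["odd_hours"]
        if start <= ts.hour < end:
            reasons.append(f"logon at {ts:%H:%M} is outside normal hours")
        if event["failed_attempts"] > BASELINE["max_failed_attempts"]:
            reasons.append(f"{event['failed_attempts']} failed passwords exceeds baseline")
        return reasons

    print(flag_anomalies({"user": "ralph", "timestamp": "2009-03-01T03:30:00",
                          "failed_attempts": 18}))
    # -> ['logon at 03:30 is outside normal hours', '18 failed passwords exceeds baseline']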
In the early
days of computing, users normally got charged for every aspect of resource utilization;
for example, they might accumulate charges of $0.00001/disk I/O and
$0.00002/process initiation. Users received the itemized bills showing their
resource utilization – typically monthly; such bills promoted efficient use of
resources. I remember personally being alerted by the Vice President of
Operations at Mathema, the big data center where I worked in the 1980s, that
one of our clients was generating reports that were three to four times more
expensive in the last few months than in previous years. He told me to find out
why and to help reduce the costs.[5]
Searching the log files for changes in their resource utilization, I noted that
their total disk I/O charges had been climbing rapidly in recent months;
further analysis showed that both online access and batch-job I/O for the order reports had risen far out of line. I investigated their databases and found that they had not repacked their most-used detail data set for a long time. As a result, records for line items in individual orders were no longer stored contiguously according to the primary index[6], order number. Thus printing out lists of all the order details forced the database to make many scattered reads across the data set. I told the client’s database administrator to repack the data set so that related records would end up in the same data block, reducing the number of I/Os for reads by order number by the blocking factor (records per block). The client’s costs for the monthly global report dropped from $1,200 back down into the $300
range.
Chargeback
systems can also play a direct role in improving system security by increasing
user involvement. Any user being charged more than expected can alert system
management to the anomaly, which may be the result of hacking or of malware.
If log files
are to be useful in forensic work, they must be protected against unauthorized
alterations. There are many ways of doing so, including checksums, digital signatures, and encryption. Checksums are hash totals, generated with a standard algorithm, that are appended to each record; any change made to a record without recomputing its checksum using the correct algorithm is immediately detectable because the stored checksum no longer matches. In addition, if the checksum is
computed by including the checksum from the preceding record, an attacker would
have to recompute the checksums for every single record following an insertion,
deletion, or modification of the original records.
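A minimal sketch of such chained checksums, using an ordinary cryptographic hash (the record contents are invented for illustration):

    import hashlib

    def chain_checksums(records):
        """Append to each record a checksum that also covers the previous checksum."""
        previous = ""            # seed value for the first record
        protected = []
        for record in records:
            checksum = hashlib.sha256((previous + record).encode("utf-8")).hexdigest()
            protected.append((record, checksum))
            previous = checksum  # chain: the next checksum depends on this one
        return protected

    log = ["03:29 logon user=ralph result=fail",
           "03:30 logon user=ralph result=fail",
           "03:31 logon user=ralph result=success"]

    for record, checksum in chain_checksums(log):
        print(record, checksum[:16])
    # Altering, inserting, or deleting any record invalidates every later checksum.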
Once a log file
has been closed, one can use public key cryptography to sign the entire file with
an authorized signing key. No one without access to the signing key will be
able to generate a valid digital signature for a modified log file.
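One way to do this is sketched below, assuming the third-party Python cryptography package and an Ed25519 key pair standing in for whatever signing infrastructure an organization actually uses; the file name is a placeholder:

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    signing_key = Ed25519PrivateKey.generate()   # in practice, kept under strict control
    verify_key = signing_key.public_key()        # can be handed to auditors

    with open("events-2009-03.log", "rb") as f:  # hypothetical closed log file
        contents = f.read()

    signature = signing_key.sign(contents)       # archive the signature with the log

    # Later, anyone holding the public key can confirm the file has not been altered.
    try:
        verify_key.verify(signature, contents)
        print("Log file is intact.")
    except InvalidSignature:
        print("Log file has been modified!")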
Large systems
often made use of memory dumps, which in the 1980s were still frequently called
core dumps.[7]
These core dumps were files containing the entire contents of active memory
(RAM) and were very useful for debugging and forensics. There were two types of
dumps: some were obtained through diagnostic utilities (debuggers) in real time
and others were captured after abnormal system shutdown using special
utilities.
System level
DEBUG utilities gave administrators complete access to RAM, thus allowing a total bypass of system security. These were extremely powerful and therefore dangerous tools; they allowed users with root privileges to copy or alter any portion of memory, to access system tables by name, and to make changes such as stopping processes, altering priorities, and so on. It was critically important to control access to these tools; often, no single user was permitted to initiate the system debugger without the presence of a colleague.
Early core
dumps were so tiny by today's standards that they could actually be printed to
paper; for example, the maximum memory size of an HP 3000 Series III in 1980 was 2 MB. A perfectly ordinary PC today can easily have 2 GB of RAM or more – quite impossible to print and read manually.[8]
Even in the 1980s, however, we much preferred to work directly on electronic
versions of the core dumps (usually mag tapes) and do our analysis on our
terminals using the analytical utilities.
A core dump can
be a major security vulnerability. It contains cleartext versions of vast amounts of confidential data – possibly including data that is normally stored in encrypted form; it can include I/O buffers such as input from keyboards and files or output to displays and files.
It would be a disaster to release a core dump to unauthorized personnel. There
is even a serious question about whether vendors should be permitted to see
memory dumps.
This informal
overview should get you interested in finding out about logging in your own
case-study organization or industry. Ask your system and network managers about
their practices and discuss the costs and benefits of current standards and
possible enhancements with them.
And we haven’t
even touched the issue of application logging. . . .
[1] Sincere thanks to MSIA Administrative
Director Elizabeth Templeton for proofreading.
[2] Jones, T., M. Palin, & F. Tomlinson (1969). “I’m a Lumberjack and I’m OK.” Monty Python Show. Official video at <http://pythonline.com/node/18116931>. Lyrics at <http://www.metrolyrics.com/im-a-lumberjack-lyrics-monty-python.html>.
[3] (1/3,276,800)^(1/30)=0.6065
[5] Students may be surprised to learn that
the data center manager would want to reduce
income from the charges to a client. However, we took a long-term view of our
relationship with our clients; our job was to help them make the best of our
services so that they would stay with us as long as it made sense and then buy
computer systems from us (we were an authorized HP distributor) when it was
time for them to become autonomous.
[6] A detail data set might have many
indexes, but one of them was supposed to be identified as the most frequently
used one and could be designated the primary index. In a special operation
called a data set condense or repack, all the records would be rewritten so
that they would fall into place according to the increasing value of their
primary index. Thus if the order number were the primary index, as many records as
possible for order #1245 would be placed one after another – perhaps in the
same block (the unit of physical I/O;
i.e., what got read into a memory buffer by the disk driver). Thus reading all
the records for a specific order number would be measurably faster from a
packed data set because the blocking factor would determine the total number of
I/Os required to read the entire file. If there were, say, 100,000 records packed 10 per block, only 10,000 I/Os would be required instead of
100,000. Since the drives of that time could not reasonably complete more than
40 I/Os per second, the difference between 100,000 I/Os and 10,000 I/Os was the
difference between 41 minutes just for physical I/O and 4 minutes. CPU overhead
would normally add at least as much time, so the difference in job time could
be 80 minutes for the unpacked data set versus 8 minutes for the properly
packed data set.
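As a quick check of that arithmetic (figures taken from the example above):

    records = 100_000
    blocking_factor = 10     # records per block after repacking
    ios_per_second = 40      # rough disk performance of the era

    unpacked_ios = records                      # worst case: one I/O per record
    packed_ios = records // blocking_factor     # one I/O per block

    print(f"Unpacked: {unpacked_ios:,} I/Os, ~{unpacked_ios / ios_per_second // 60:.0f} minutes of physical I/O")
    print(f"Packed:   {packed_ios:,} I/Os, ~{packed_ios / ios_per_second // 60:.0f} minutes of physical I/O")
    # Unpacked: 100,000 I/Os, ~41 minutes of physical I/O
    # Packed:   10,000 I/Os, ~4 minutes of physical I/O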
[7] Here’s one of my favorite tech-support stories from the days when I was in charge of the Phone In Consulting Service (PICS) at Hewlett-Packard in the early 1980s; it’s an excerpt from a real conversation with a customer from that time:
Customer: The HP3000 crashed 10 minutes ago.
Mich: So did you take a dump?
Customer: [long pause] Yes, but I don’t see
what that has to do with the computer.
Mich: NO, a core dump!
Customer: Oh – a core dump – yeah, sure, I did that.
[8] Suppose we use 256 characters x 88
lines = 22,528 bytes/page. Then 1 MB would take ~46 pp and 2 GB would take ~95,000
pp. If the manual inspection rate were 1 minute per page (and that would be
fast), it would take ~66 days to read the dump once.