Design Ideas for a Future Computer Virus... and for a Future Security Architecture

-- by François-René Rideau

This article was initially written from 2002-01-14 to 2002-01-19. Notes on possible counter-measures were added from 2003-07-07 to 2003-07-08. This article is available under the bugroff license. Its length is about 7000 words.

Introduction

In a preceding article, Les virus informatiques comme sous-produits des logiciels exclusifs (in French; see also its -- still incomplete -- adaptation into English, Computer Viruses Are Caused By Proprietary Software), I explained the reasons why proprietary OSes are so intrinsically unsafe and prone to viruses: because software is distributed by as many monopolist black-box vendors as there are pieces of software, without any possibility of trust or coordination, and end-users (who cannot afford to surrender control of their computer to a single monopolist) must be able to install software packages received by email, CD-ROM, etc. Even if operating system makers had an interest in making things reliable (they don't), the situation is intrinsically desperate as far as security goes. The fact that mainstream systems are so ridiculously easy to infect means that those viruses that currently survive and prosper are extremely crude and primitive ones: there is no need to spend time being elaborate when a virus can just ride on the wave of one of the huge security issues that are common in proprietary systems, and spread from end-user to end-user.

As a counterpoint, I wanted to explore how elaborate viruses would have to be, so as to survive and propagate in free software systems, considering the much more secure practices by which software is distributed. And since free software systems evolve in reaction to such threats, I also explored the dynamics of this evolution, that would only raise the stakes of the challenge for such viruses.

As explained in the previous article, in the world of free software, there is a free market in organizing software into coherent reliable "distributions", which end-users install directly, each from one trusted server or CD-ROM set. There is never a need for the end-user to install any small utility, pirated software, teaser demonstration, or anything of the kind; silly animations can be available using applet viewers, and real programs are packaged by your distribution maker (Debian, FreeBSD, OpenBSD, RedHat, etc.). Thus, it is a rare event that an end-user installs software, and mail user-agents and file browsers are configured to not install or run any untrusted software by default. The virus must survive all this time without being detected, and the day that the rare opportunity comes, it must be able to run on the software configurations of the day, which might have changed since the day the virus was initially launched. Also, those people who would have to be infected so that a virus could spread to lots of end-users are the developers and maintainers of software packages; but they are also the people most proficient at avoiding being caught by easy tricks, the people with the greatest interest in not being caught, and the people who would most quickly find out and eradicate the virus before it could do much damage, and who would be able to fix that damage afterwards.

All these phenomena combine to make it a very difficult task for a dedicated virus developer to take the free software community by storm. To measure the difficulty of such a task, I decided to throw out a few ideas for the design of a very robust virus that could defeat the vigilance of free software developers. I then also explore the counter-measures that may have to be taken to efficiently defuse the threat of such viruses.

Malicious effects

To start with, let's recall the kind of things that a virus or other malware can do, once it has actively infected a system. Indeed, although a virus has to replicate itself so as to spread, this might not be its sole activity, and even this activity can have undesirable side-effects.

Using up resources

The most common malicious effect is simply to use the limited resources of infected systems for purposes different from their owners'.

The "normal" activity of a virus can be disruptive in itself, by its utilization of processor time, memory storage, network bandwidth, etc., in replicating itself, and the waste of human time while fighting it.
The way the virus modifies its environment can also unintentionally cause malfunctions, bugs, crashes, resource starvation, and otherwise prevent the normal behaviour of the systems of which it is a parasite.
The virus may participate in distributed computation projects that consume computing resources (SETI, RC5-64, Carnivore, and other such things).
The virus may participate in denial of service (DoS) attacks against some designated target, which will consume networking resources, and may result in legitimate traffic being blocked in retaliation.
The virus may open a generic backdoor for an operator or remote program to log into the system, explore it, and install new unwanted behaviours.

Counter-measures: Quotas can help stop the uncontrolled waste of resources and detect the culprit while leaving enough resources for the administrator to inspect the live system (particularly if the quota drops to zero when the end-user isn't interacting with the application or otherwise explicitly endowing it). As for saving human resources spent administering machines, it will take better, more automated system administration and application development tools.
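To make the idea of a quota that "drops to zero when the end-user isn't interacting" concrete, here is a minimal sketch in Python. The class name InteractiveQuota, its parameters, and the notion of abstract "units" are all assumptions made for illustration; a real system would enforce this in the kernel or a supervisor, not in the application itself.

```python
class InteractiveQuota:
    """Hypothetical quota that is only usable shortly after user interaction."""

    def __init__(self, budget, idle_timeout):
        self.budget = budget              # units granted at each interaction
        self.idle_timeout = idle_timeout  # seconds of idleness before quota is zero
        self.remaining = 0
        self.last_interaction = None

    def user_interacted(self, now):
        """Refill the budget when the user explicitly interacts with the application."""
        self.remaining = self.budget
        self.last_interaction = now

    def available(self, now):
        """Units that may still be consumed at time `now`."""
        if self.last_interaction is None:
            return 0
        if now - self.last_interaction > self.idle_timeout:
            return 0  # the user went away: the quota drops to zero
        return self.remaining

    def consume(self, units, now):
        """Charge the quota and return True if allowed, else return False."""
        if units < 0:
            raise ValueError("units must be non-negative")
        if units > self.available(now):
            return False
        self.remaining -= units
        return True
```

A refused consume() call is also a natural place to log the offending program, which is how such a quota helps "detect the culprit" as well as limit the waste.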

Defeating privacy

When you monitor a system, you can try to access private data.

Passwords, passphrases, cryptographic keys, and the like, can be used to get further access.
Credit cards and other financial information can be gathered, and used automatically to fund various accounts at random, including not-so-random ones that the author controls, or charities that the author wants to fund.
All this data can be published on a broadcast medium such as newsgroups or weblogs, so the virus author can read them anonymously, and take advantage of them manually.
Trade secret, classified data, and private databases can also be accessed, and published widely.

However, before you can do all those things, you also need to have quite a good model of how these things work: detect where they are, how they can be used, when keys are used, what they open, whether it's safe to use them, and so on, all while avoiding needless risks and traps.

Counter-measures: Have fine-grained access-control lists or capabilities, so that a program may not access data it isn't meant to access. Access-control on meta-data can also help put an early stop to adaptive attempts to propagate.
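As an illustration of the capability approach, the following Python sketch models a store that can only be accessed by presenting a token naming one resource and a set of rights. The names Capability, Store, and attenuate are hypothetical, and Python objects are of course not unforgeable; the sketch only shows the discipline, assuming a runtime that would actually enforce it.

```python
class Capability:
    """Hypothetical token granting specific rights on a single named resource."""

    def __init__(self, resource, rights):
        self._resource = resource
        self._rights = frozenset(rights)

    def allows(self, resource, right):
        return resource == self._resource and right in self._rights

    def attenuate(self, rights):
        """Derive a weaker capability: rights can shrink but never grow."""
        return Capability(self._resource, self._rights & frozenset(rights))


class Store:
    """Data store where every access must present a matching capability."""

    def __init__(self):
        self._data = {}

    def read(self, cap, name):
        if not cap.allows(name, "read"):
            raise PermissionError("access denied")
        return self._data[name]

    def write(self, cap, name, value):
        if not cap.allows(name, "write"):
            raise PermissionError("access denied")
        self._data[name] = value
```

A program handed only a read capability on one document can neither modify it nor touch key files or other private data, whatever code ends up running inside that program.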

Tampering with data

When you have write access to some data, you can tamper with it and breach the integrity of the system.

If you just hate other people and want to make noise, you can just destroy their data.
A grandiose destruction can also be used by a virus to hide the smaller event that is the self-destruction of its own sensitive parts so as to avoid further tracing when it suspects having been detected.
If you really hate someone, you can modify his data in subtle but critical ways, that will bite him hard the day he uses it for an important purpose. Then cover your tracks by erasing the virus completely (but not anything else) so he doesn't suspect his data ever had the opportunity of being corrupted.
If the data is valuable, you can also encrypt it, and ask the owner for a ransom in exchange for decryption keys. This only works for incremental data that hasn't been backed up, though. However, it may work if you've been tampering with the data for a long time, never giving it a chance to be backed up, and have recorded an encrypted log of fixes to this data.
In some cases, devices attached to infected computers can be damaged or destroyed by pushing their controls beyond safe limits. If you're sure about controlling valuable devices and data, you can ask for a ransom for not tampering with them (this smells a lot like Palladium).

These are by far the most malicious things one can do with a virus. Definitely evil. Definitely difficult to do, too, since detecting what applies and preparing for action without getting caught, in a fully automated program that doesn't know much in advance about its target systems, and with little to no interaction, is quite a feat.

Counter-measures: the same counter-measures of access-control as for breaches in privacy can be extended to deal with breaches in integrity. Additionally, application-level write access to data that matters to users should as much as possible take place in reversible ways: modifications should be logged together with the previous unmodified state, so as to be able to override mischievous modifications and undo any bad effect. In other words, user-level system data should be stored in a "monotonic" knowledge-base, where the available information increases but is never deleted (or at least, not until it has been safely backed up, and not without explicit user consent). Considering the price of storage hardware and the input rate of a user with keyboard and mouse, this could be done systematically for any document entered on a typical console, and possibly also for typical uses of audio documents and pictures, etc.
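The "monotonic" knowledge-base can be sketched as an append-only log in which even an undo is recorded as a new entry. The following Python is a minimal in-memory illustration; the class MonotonicStore and its methods are names invented here, and a real implementation would write to durable, backed-up storage.

```python
class MonotonicStore:
    """Hypothetical append-only store: information is added, never deleted."""

    def __init__(self):
        self._log = []  # entries are (sequence_number, key, value)

    def write(self, key, value):
        """Append a new version and return its sequence number."""
        seq = len(self._log)
        self._log.append((seq, key, value))
        return seq

    def read(self, key, as_of=None):
        """Latest value of key, optionally as it was at sequence number as_of."""
        limit = len(self._log) if as_of is None else as_of + 1
        for _seq, k, value in reversed(self._log[:limit]):
            if k == key:
                return value
        raise KeyError(key)

    def revert(self, key, as_of):
        """Undo by appending: re-record the value the key had at as_of."""
        return self.write(key, self.read(key, as_of=as_of))

    def history(self, key):
        """Every version ever written for key, oldest first."""
        return [(seq, value) for seq, k, value in self._log if k == key]
```

Because revert() appends rather than erases, a mischievous modification can be overridden while remaining in the history as evidence.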

Summary of malicious effects

Note that the difficulties of achieving these malicious effects are not very much changed from a proprietary software system to a free software system. However, the diversity of free software systems in competition as regards security as well as thousands of details to which to adapt certainly makes it harder for virus developers to write a virus that can adapt to all the varying conditions met in the diverse systems they will have to target, as opposed to the uniform conditions imposed in the monocultures that proprietary software systems are. See Understanding the World below.

Counter-measures: To control access to resources in general, fine-grained quotas can help refine detection, at the expense of additional management: per-user, per-session, per-program and per-process quotas and access-control, resource-level and application-level filters; some of them might be temporarily passed over as capabilities, but reclaimed by the quota-owner and terminated with it. Additionally, all user-produced modifications should hopefully be stored in permanent storage media. Dynamic resource allocation and supervision tools will ultimately have to be developed that will automatically manage normal increase in resource usage and detection of suspicious patterns. Actually, the same techniques that must be used to prevent virus attack are only a natural elaboration of routine techniques that allow for safe and efficient computing in general, and ought to become available in standard system administration tools and development environments.
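The nesting of per-user, per-session and per-process quotas, with delegated shares that are "reclaimed by the quota-owner and terminated with it", might look like the following Python sketch. The Quota class and its delegate and revoke methods are assumptions for illustration, not an existing API.

```python
class Quota:
    """Hypothetical hierarchical quota: a child draws on every ancestor."""

    def __init__(self, limit, parent=None):
        self.limit = limit
        self.used = 0
        self.parent = parent
        self.revoked = False

    def _can_charge(self, units):
        # Refuse if this level is revoked or would exceed its own limit,
        # then ask every ancestor the same question.
        if self.revoked or self.used + units > self.limit:
            return False
        return self.parent is None or self.parent._can_charge(units)

    def _charge(self, units):
        self.used += units
        if self.parent is not None:
            self.parent._charge(units)

    def consume(self, units):
        """Charge this quota and all its ancestors, or refuse without side effects."""
        if not self._can_charge(units):
            return False
        self._charge(units)
        return True

    def delegate(self, limit):
        """Hand a sub-quota to a session, program or process."""
        return Quota(limit, parent=self)

    def revoke(self):
        """Terminate this quota; everything delegated from it stops working too."""
        self.revoked = True
```

Revoking a session quota thus cuts off every process that was running on a share of it, without affecting the user's other sessions.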

Stealth

In a secure system, viruses don't often get the opportunity to propagate or to execute malicious code. And when they do, they can be busted quickly, which would stop further propagation. Until they can reach the climax of their life and die, they must survive, and thus avoid detection by the immunity systems of target computer systems (intrusion-detection software, watchful system administrators, attentive users). The main stealth techniques for going undetected can be divided into Polymorphism, Discretion and Track-Covering.

Polymorphism

Polymorphism avoids recognition of the virus in its inactive forms by simple pattern-matching techniques thanks to the introduction of randomness and complexity in the aspect of the virus. Lack of polymorphism makes a virus easy to recognize, once first identified.

The main design decision here will be to compile the virus into bytecode, wordcode, compressed parse-tree, or some other high-level representation that allows for a lot of arbitrariness in creating variations of the code, independently from the processor-specific back-end, while at the same time permitting a great total set of high-level features.
Change the values (and maybe even meaning and grouping) of the bytecodes for every target executable; for instance, take a long sequence of different bytes in the executable, and use them as the seed for the bytecode interpreter or compiler.
To minimize impact on the program size, reuse parts of the existing program for both bytecode and bytecode interpreter, whenever possible.
Use some source of randomness when generating the bytecode and its interpreter.
Another advantage of the bytecode approach is that most of a virus' infrastructure can be high-level and target-system independent, thus sharing a lot between different operating systems or microprocessors.
Byte-code and multi-abstraction-level design also allows for making it harder to reverse-engineer the virus, making it last longer.
If speed is ever a problem, byte-code interpretation can be used just to dynamically compile and link a native version of those speed-sensitive parts of the virus. Again, randomness could be used to avoid in-memory detection by simple pattern-matching on memory.
When intercepting a system or library call, try to do it up the call chain, if possible, so as to make pattern-matching-based detection harder.

Counter-measures: There can be an arms-race of pattern-recognition and code-obfuscation between viruses and virus-detection programs; but viruses have the initiative in this regard, and the newest viruses will always remain undetectable by the current generation of virus detectors. Moreover, if virus designers get really serious, their viruses will become completely undetectable by any affordable pattern-recognition technique.

Thus, while reactive anti-virus software can be a complement to other security measures, and a quick and dirty audit tool for the time being, it cannot constitute an effective barrier against security breaches. People responsible for computer security should not rely on it, and should instead take proactive measures starting with sensible access control.

Discretion

Discretion is about avoiding recognition of the virus in its active forms by people or programs monitoring running processes. It works by only altering infected programs' behaviour in subtle ways that are difficult to observe, and sometimes even impossible to observe with the coarse-grained tools most used for monitoring (e.g. ltrace or strace).

Try to only eat resources when they will go unnoticed, interleaving any heavy CPU usage with normal computations, consuming memory during normal allocations, accessing disk or doing system or library calls along with the infected program, etc. Behaving as a parasite, just like a government uses taxation to live on the back of productive people.
Try to modify the behaviour of a program without modifying its system call pattern; e.g. read time by intercepting calls to time(2), allocate memory by intercepting sbrk(2), mmap(2) or malloc(3). If the modified program doesn't do all these things, it is still possible to do more, but then it would be more risky.
Choose behaviour alterations depending on target program behaviour, as known from a base of common knowledge, from data extrapolated from the information on DLL calls, and/or from actual observation and monitoring of its behaviour.
Have a model of how risky every observable behaviour is, and pick a random number to decide whether to undertake it, with a probability inversely proportional to the risk taken. Use system calls (particularly date and time) as source of randomness, and use information gathered at infection time in the model.
Adapt risk estimation and, accordingly, risk-taking, to access rights already obtained, rights hopefully obtained, and corresponding capacity of track-covering (see below): Do not hesitate to do things that would otherwise be considered as risky when you're pretty sure you can't be watched; but conversely, don't take any risk that is made wholly unnecessary because the desired effect is already achieved.
Use subliminal channels for communication: hide local persistent data in seemingly innocent configuration files; hide LAN communication in ARP or ICMP packets; hide WAN broadcast messages in USENET posts or weblog comments; etc.
Once again, ride on existing files, TCP connections, and posts, when applicable. Otherwise, use automatic content generation (based on a dynamically acquired corpus) to make these messages indistinguishable from normal noisy activity for all those who don't have access to enough information to know that there shouldn't have been activity here to begin with. Use volume control techniques (including exponential backoff) so as not to get noticed by extraordinary volume while using any such medium. Use steganography to hide messages in all the above activity.
When applicable, avoid dynamic library calls, that can be intercepted by ltrace and dependency-tracing wrappers, and can interfere with library extensions. Use direct system calls instead that can only be seen by strace. Even then, a system call added to the normal behaviour of the direct program is suspect, and it's better to hide it in the middle of lots of similar "normal" calls.
Try to detect the absence of a debugger or monitor before doing anything visibly wrong. If a debugger is detected, skip any behaviour, or better, mislead the enemy with a fake behaviour.
Do not help anti-virus developers by providing an accurate check of whether a particular file, process or system is already infected. For instance, a check that doesn't give false negatives is sufficient to avoid re-infection and the noticeable exponential slowdown factor that it would incur. Check routines easily accessible to the anti-virus developers should have 50% to 99% of false positives. More generally, see Multilayered Security below.
To avoid detection by anti-virus monitors scanning the memory of running processes, or disassembly by debugging enemies, use just-in-time generation and erasure of code for whatever bits of native code would have a typical pattern; also use the technique for critical bytecodes such as a check as above.

Counter-measures: The arms-race here will be in the development of high-level application-specific filters and logging. That is, system administrators must also develop a better model of what is normal interaction, so as to detect unusual patterns. Well understood high-level patterns of interaction, of course, should be directly encoded in programming languages, which then makes it possible to enforce the disciplined adherence of system behaviour to such high-level patterns. It then becomes possible to control system access, and to log and monitor authorized and unauthorized accesses at a more synthetic level, which makes it both easier for the administrator to assess, and more difficult for the attacker to mimic. Of course, such evolution toward higher-level programming languages with system-enforced invariant testing and enforcement ought to be the direction of software development in general.
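A very crude version of "a model of what is normal interaction" can be sketched as follows: record which short sequences of events appear in known-good traces of a program, then flag any sequence never seen before. The function names and the event names used in the example are hypothetical, and this sliding-window technique is a stand-in chosen for brevity, not the high-level language enforcement that the paragraph above ultimately calls for.

```python
def learn_profile(traces, n=2):
    """Collect every window of n consecutive events seen in known-good traces."""
    profile = set()
    for trace in traces:
        for i in range(len(trace) - n + 1):
            profile.add(tuple(trace[i:i + n]))
    return profile


def unusual_windows(trace, profile, n=2):
    """Return, in order, the windows of a new trace that the profile never saw."""
    windows = (tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))
    return [w for w in windows if w not in profile]


# Example with made-up event names: a program that normally only reads files.
normal = [["open", "read", "close"], ["open", "read", "read", "close"]]
profile = learn_profile(normal)
suspect = unusual_windows(["open", "read", "connect", "send", "close"], profile)
```

Here suspect lists the three windows involving the unexpected connect and send events, which is the kind of synthetic report an administrator can assess far more easily than a raw system call log.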

Track-Covering

Track-Covering alters the very means of detection of the immunity system, so as to confer invisibility to behaviours that would otherwise be quickly noticed. It consists in using the powers of the infected binaries, libraries and kernel, so as to make their alterations invisible to someone inspecting the system using tools that are themselves infected.

Though doing things without having one's track covered is risky, trying to cover tracks is risky, too, for someone not infected will not observe the same behaviour, and the discrepancy can lead to discovery of the virus. Hence, decisions must again be taken according to a model of the environment of the virus: who are the people whose files are currently infected? What are they likely to observe? Last but not least, track-covering alterations must be consistent. If we take the decision to pretend that some file's size or md5sum is not its real size or md5sum, but the one of the original file before infection, then we must do it always from now on, or at least not change on a whim. Consecutive tests should give the same result.
As for taking the decision of faking original file contents, size and/or checksums, the question is whether anyone who will check them later remembers the old values from before the infection, or whether he will check against values taken from the already infected file. In the latter case, nothing must be done. In the former, the old values must be constantly delivered, and it would be great to avoid doing any modification, unless necessary, or to revert modifications, once higher access rights are granted. For instance, if system libraries and crucial binaries are hit, there's no need to modify all the remaining binaries. And if the kernel is hit, even libraries are not so important to infect after all, unless copied to someone else over the network, or to restore kernel infection after the kernel is upgraded.
Files that mostly do not need content, size and checksum faking are files that have been created in infected state, at compile-time, link-time or assembly-time. However, they might still require some content faking when they are being debugged, so as to hide any obviously altered behaviour.

Counter-measures: Once the virus was activated without triggering immediate detection, it is often too late to save the infected system. However, once again, the system-wide enforcement of high-level discipline of abstract system behaviour, can help limit any damage (particularly so if data that matters to users is stored in monotonic data bases), and also help track culprit processes. This means that systems in the future ought to never execute potentially unsafe code but in high-level virtual machines (possibly compiled to machine code and cached, but subject to strict high-level abstract semantics).
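The observation made earlier in this section, that an observer who is not infected "will not observe the same behaviour", suggests a simple defensive check: compute file digests from two independent views (say, the running system and a boot from trusted read-only media) and compare them. The Python sketch below assumes each view has already been collected as a mapping from file name to digest; the function names are hypothetical, and SHA-256 is used here in place of the md5sum mentioned above.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex digest of file contents (SHA-256, an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def discrepancies(trusted_view, live_view):
    """Names whose digests differ between the views, or that appear in only one."""
    names = set(trusted_view) | set(live_view)
    return sorted(n for n in names if trusted_view.get(n) != live_view.get(n))


# Example with made-up contents standing in for real binaries.
trusted = {"ls": digest(b"original ls"), "cc": digest(b"original cc")}
live = {"ls": digest(b"original ls"), "cc": digest(b"altered cc"),
        "extra": digest(b"unexpected file")}
report = discrepancies(trusted, live)
```

Any name in report deserves investigation, since consistent faking in one view does nothing to hide the difference from the other.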

All in all, and not so surprisingly, we find that high-level abstract machines are a useful tool for white hat hackers as well as for black hat hackers, and here, the good guys have the initiative, as systems will hopefully emerge that adopt safe virtual machines as compulsory protection against malicious (or merely buggy) software.

Architecture

The previous section on stealth was about negative design constraints on a virus: what a virus cannot do and how it cannot do things, and within what limits it can live and survive. It remains to be seen what a virus can do and how it could be organized, so as to be able to replicate and do other useful things during its lifespan. That's what architecture is about.

Multilayered Security

It is a standard security procedure to provide failover modes of execution for systems that must resist attacks. Somehow, the same applies to viruses, that have to resist understanding by potential anti-virus developers. The idea is that the virus, like any kind of robust software, must be designed around several levels of behaviour, each with its own security requirements, so that if one security measure is defeated, there are still other fallback measures to defeat for the attacker to get to the heart of the system.

Before entering a more malicious mode that will incur a much higher risk of getting caught, try to find a scapegoat that can be blamed for the damage. For instance, in a multi-user system, you can pick an innocent user, and only have externally observable malicious effects when this user is logged in, run malicious processes under his name, etc. Or in a server, you can trigger malicious behaviour right after some upgrade or maintenance operation, etc., and produce a binary and/or source containing the code for the malicious behaviour without any reference to the rest of the virus infrastructure. The idea is that by blaming the wrong person or binary, some silly administrators won't tackle the real problem, and may even restore already infected binaries. A successful scapegoat tactic of course depends on careful selection of the scapegoat, hence on a very good understanding of the way system administrators think and react, and on the events that they think about.
Use decoys, that is, non-functional behaviour that will lead the anti-virus developer or system administrator in wrong directions that will keep him busy while the virus is at work in other places. For instance, don't just publish informational messages on USENET, but also publish really garbage messages with no content at wrong places. Do computations that will boggle the debugging engineer's mind, but only waste his time, or earn you blocks on SETI or another distributed computation project (it would be funny if the virus did find something, as no one would come forward to claim credit). Infect binaries with visible bad behaviours that seem to relate to what the virus does, but don't really.
When the virus is successfully detected, use shunts, that is, chunks of information that can be discovered without the virus analyst being able to understand everything about the virus out of that knowledge. Hence, in a source or binary version of the virus, keep whatever must be visible obfuscated, and compress and encrypt everything that isn't strictly necessary for bootstrapping purposes, and hide it steganographically as innocent-looking source or binary code. Use debug-protection techniques to prevent this further code from being decoded unless in a safe environment; if in an unsafe environment, decode a scapegoat instead. It should be conspicuously easier to decode the scapegoat, so that the virus analyst will think he has understood everything, whereas he hasn't.
For each level of virus behaviour, have appropriate shunts, scapegoats, decoys, etc. Be more subtle as you get deeper in the virus. Base the delusion at each place on the expected level of proficiency of someone who would have gone this far. Of course, you could try submitting the problem to some friends, but then you'd have to kill them afterwards.
Lock the deepest parts of the virus with cryptographic means such that the decryption key will have to be fetched from steganographic messages within a large body of USENET postings.
Use background noise as part of the encryption mechanism, so there is always a remaining random-looking chunk of bits in the code for the virus, whether it has been completely decoded or not -- thus the virus analysts never know for sure if they understand it all, or if there is still a hidden evil in what looks like noise; sometimes, they may feel safe, and be bitten later. Eventually, they may be actually safe, but will still worry and have horrible nightmares when they sleep. Muhahaha!

Counter-measures: To resist the attack of viruses designed in multiple layers of security, system designers will themselves have to design system security architecture in multiple layers. So as to prevent malicious programs from selectively triggering their behaviour, systems will screen any information on which to base mischievous selection of behaviour. Unless specifically meant to access such information, software modules should not be able to distinguish a "normal" failure from an access violation in their own operation or the operation of a peer component. Introspection of access rights should be forbidden by default; indirect introspection through the gathering of system statistics and similar data should be discouraged, too. Honey pots and other kinds of fool's gold should be systematically set up so as to catch suspicious behaviour -- and the absence of introspection should prevent the virus from avoiding discovery in such a setting. The virus developer, too, can be made to feel unsafe about being discovered, fast. Access rights designed for fault containment should also prevent complete system takeover once malicious software is executed in one place. Special cryptographic keys unlocked by a password prompted on a secure system menu should be required to modify the kernel and other security sensitive configuration, for instance. Finally, the layered use of virtual machines that ensure execution according to a proper semantic model at each level of the system minimizes the margin for dangerous behaviour from malicious or spurious programs.
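Two of these measures, failures that look the same whether the cause is absence or refusal, and honey pots that record whoever touches them, can be sketched together in a few lines of Python. The Vault class, the Unavailable exception and the example names are all hypothetical; the point is only that the caller learns nothing from the error while the administrator learns something from the alert.

```python
class Unavailable(Exception):
    """Single error raised for both 'does not exist' and 'not permitted'."""


class Vault:
    """Hypothetical data service with uniform failures and honey-pot entries."""

    def __init__(self, items, acl, honeypots=()):
        self._items = dict(items)
        self._acl = {name: set(callers) for name, callers in acl.items()}
        self._honeypots = set(honeypots)
        self.alerts = []  # (caller, name) pairs, for the administrator only

    def fetch(self, caller, name):
        if name in self._honeypots:
            # Nobody has a legitimate reason to ask for bait: record the attempt.
            self.alerts.append((caller, name))
            raise Unavailable("unavailable")
        if name not in self._items or caller not in self._acl.get(name, ()):
            # Same exception and message whether missing or forbidden.
            raise Unavailable("unavailable")
        return self._items[name]
```

Since a probing program cannot tell a missing item from a forbidden one, or either from bait, it cannot use such probes to decide when it is safe to misbehave.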

Targets

The whole structure of the virus will be articulated around its various modes of execution, as determined by what target it has already infected, and what targets it will be trying to infect next.

The ultimate piece of software to infect on a computer, so as to be able to intercept everything, yet cover all tracks, is the operating system kernel (or, further along the same line, the ROM BIOS; even further, there's the user's mind, but that's not on the computer). However, this isn't always possible, and when it is, it mightn't take effect without restarting the machine (unless a module is inserted at runtime). The next target, giving almost as much control as the kernel, yet easier to infect, is the system's main shared libraries. While these are great for doing things unnoticed in the system afterwards, however, the kernel and shared libraries are typically not things that get replicated and distributed to further computers. So their infection may correspond to the virus entering a mode of greater confidence in its local environment (after it checks it isn't in a honeypot), and can deploy more of its self-reproducing and malicious behaviours; but isn't directly conducive to replication to further systems.
As demonstrated by Ken Thompson, the best targets for the further propagation of infected programs are compilers, linkers, installers, network communication agents, and anything that gets an opportunity to make executables, or export or import them between systems: cc, as, ld, tar, scp, install, dpkg, perl, sendmail.
Similarly, the best targets to open a backdoor access are thus login, sshd, inetd, telnetd, libpam, etc., and all programs that already accept new connections during their normal use.
Don't even think of infecting small fry: when your access rights allow you to infect best choice targets, don't run the risk of detection by attacking lesser targets. Too small files are particularly to be avoided, unless with a good reason: they are the ones that usually don't run for long, so you can't do much unnoticed while they run, and the ones that will be used as obvious "honeypot" by virus analysts trying to catch "the virus, just the virus, nothing but the virus". If some small fry binary is to be infected, only put a small stub inside, to activate the full virus from another location.
Infect programs with the greatest chance of being distributed. This means that you have a model of which programs are copied between systems; for instance, programs with a high rate of buggy and/or incompatible behaviour may be copied from another system when the upgrade leads to utter failure (e.g. the Linux pppd); also games not part of standard distributions, proprietary software, or source code of "latest" software. Often, it's a .tar.gz, .rpm, .deb or .zip package that should be tampered with, but doing so risks detection by some software automatically comparing checksums, unless the infection was done at the moment the package was created, or the checksums can be tampered with at the moment the package is installed. Again, making the right decision here depends on the virus somehow understanding the world (see below).
Similarly, infecting shell or editor initialization scripts is dangerous, and should be used with much care in making things discreet. It is better to infect writeable user programs in his path or maybe to install latest (and infected) versions of some GNU utilities there (particularly if they are missing from /usr/ports and /usr/local). It all depends on the model we have of the proficiency of the user, the proficiency of the administrator, and what the administrator thinks about the proficiency of the user: would new binaries in $HOME/bin seem suspect? might the administrator be stupid enough to run ./ls --version?
The best target computers for replication of the full virus are machines from which originate programs distributed around the world: developers' machines, distribution-makers' machines, distribution mirror servers, file servers, mail routers, etc. They are also the best protected machines, those where it is most likely that abnormal behaviour will be spotted, that verification scripts run, that a potentially proficient virus analyst may attack the virus, etc. At the same time, developers often over-estimate their own capacity; they might recompile programs from the sources, but usually won't audit it enough to spot well-hidden dangerous code.
Other machines are only targets for end-of-chain malicious effects (see above), and should be endowed with a slave backdoor to obey the rest of the distributed network of infected machines, but not with enough information to be anything like a master in this whole network. Lesser machines can also be used to originate risky things, easily observable effects like attacks of new systems, etc.; they then act as shunts, and can be endowed with a scapegoat and/or decoy, in case they are caught in the act.
When spreading through executable files, there is the problem of avoiding infecting the same file many times, which would cause incremental slowdown and size increase as the executable file grows and grows, until the infection becomes obvious. The virus must thus include a test to check whether a file is already infected, and this check is a weakness in the virus, since isolating it could give the virus analyst a great tool to detect infections. Measures to overcome this weakness include: (1) making the check expensive enough that it is slow to run it over a large number of files -- unless you share a cryptographic secret known only to the virus author; or (2) making the test inaccurate, with a large number of false positives (executable files that look to the test like they are infected, even though they are not, and will thus be made immune), but few false negatives, if any, so as to avoid overinfection. For instance, one virus of old was testing a given bit in the contents or meta-data of a file; every other file would thus look like it had already been infected. Deeper layers of the virus might have more accurate tests, but they must only be made available when controlling the running environment (i.e. running as root in the machine, and controlling the libc and/or the kernel).
When spreading through source files, it is necessary to ensure that the change won't be too obvious. For instance, committing the virus as an incremental change to parts of a CVS tree browsed by hundreds of persons is a big no-no, but adding the virus to a .tar.gz source archive is ok. It is best to only target those projects that are not yet using any source control, or that are extracted from a CVS tree without the active CVS information. If the source attack is successful, the newly compiled virus could take care of removing the source modifications before they become visible. When there is an opportunity (which is rarer), it is safer to attack intermediate object files instead.
The virus must be able to run unattended without a glitch, and without user-visible effect, as long as it isn't actively watched by a proficient security expert, or hasn't decided to do something conspicuously visible (see malicious effects above). When watched, it must reveal as little as possible about its internal purposes. A virus module must not have more readable information than it needs, and it must not reveal such readable information to any less-trusted instance of the virus, unless temporarily, as required to achieve some specific purpose, depending on the currently secured targets, and the next targets to aim at. You can see that as a need-to-know approach to security among the many modules that make the virus.

Counter-measures: Basic infrastructure such as kernel, compiler, communication layers, user interface, etc., should be properly isolated from tampering. Use of unauthorized customized versions should typically happen within virtual machines that isolate the rest of the system from the risky activity. Using infected virtual machines as honeypots, it is easier to gather enough information on the attacker to successfully crack down on him. Automatic audits by secure parts of the system, based on checksums of sensitive code and data as stored on read-only media, can help detect access rights violations. Modification of pervasive configuration scripts as well as of binaries should be undertaken with extreme care, and the effect of unaudited modifications should be sandboxed by default until properly validated. Programs being distributed, the perfect target for malicious hackers, can be subject to code review; interfaces and access rights policies can be cryptographically signed by many reviewers, independently from the code itself, each release of which need only be signed by its author. Ultimately, this means that software will be developed using high-level programming languages that have a static and/or dynamic type system that can express constraints on side-effects.
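
As an illustration of the checksum-based audit mentioned above, here is a minimal sketch in Python. It assumes the trusted manifest was computed while the system was known to be clean, and that it is kept where the audited system cannot modify it (for instance on read-only media). The function names (sha256_of, build_manifest, audit) are made up for this example and do not come from any existing tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every regular file under root (by relative path) to its digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, trusted: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current tree against a trusted manifest.

    The trusted manifest is assumed to come from storage that the
    audited system cannot write to; otherwise the audit proves nothing.
    """
    current = build_manifest(root)
    return {
        "modified": sorted(
            n for n in trusted if n in current and current[n] != trusted[n]
        ),
        "missing": sorted(n for n in trusted if n not in current),
        "unexpected": sorted(n for n in current if n not in trusted),
    }
```

Such an audit is only as trustworthy as the code that runs it: if the kernel or the interpreter is itself compromised, the report can be falsified, which is why the audit should be run by a part of the system that is isolated from tampering.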

Understanding the World

As we saw above over and over, a crucial part in a successful virus is that it have a good enough model of the world, the various protection mechanisms it will have to defeat, the current environment in which it is running now, etc., so as to succeed. Actually, whereas all the features discussed above are "mere" engineering, that any dark side hacker or team of hackers could easily perform, given enough dedication, the crux of a virus, what will make it successful or not, will be its ability to adapt to the hostile world in which it is let loose. And when having to fight the security defenses of the ultimate target sought, that is, the light side hackers who make and distribute free software, having lots of well-oiled mechanisms won't be enough, without the sense to use the right one at the right moment.

The first and foremost source of information for the virus is the set of rules defined by the authors, based on their expertise. At the moment an executable file is infected, knowledge about the file being infected, its purpose, its behaviour, the libc symbols it links to, etc., can help take decisions as to what behaviours of the virus to enable. Then, when activated, the virus can gather information from the current host, by probing the filesystem and the other running processes. After having inspected the current host, and tried its best to get the highest privileges on the current host or set of file-sharing hosts, the virus can begin probing the network, and try to extend over the LAN, and then to the Internet. At each step, the virus must assess a good enough model of what it can do or must not do, so as to replicate as much as possible without getting caught, which would raise significantly the barriers to penetration into the ultimate target: developers' machines.
Local information can be stored and shared among runs of infected files by concealing it discreetly in configuration files, using cryptography to make it unreadable by others, and using steganography to make it unnoticeable to those not looking at the right place. Netscape cache, gimp swap, core files, GNOME configuration files, etc., anything can do, as long as other programs have low probability of tampering with it, and humans have low probability of watching it.
Instances of the virus running on multiple users on the same machine, multiple machines in the same local network, multiple sites around the world, etc., can coordinate: so as to reduce the amount of visible activity and risk-taking for a given achievement, so as to detect which copies got busted after which attempt, so as to gather usage statistics and be able to hide better, so as to share secret keys that were discovered, so as to discover network topology and scout for more targets to infect, so as to efficiently distribute information among all copies without overloading broadcast media like the USENET, etc. This coordination is only useful if it can help the virus learn from the shared information. It depends on the virus integrating a dynamic expert system that can actually enhance its model of the world thanks to data about past successes and failures.
Reaction to failure and discovery can be either to self-destruct, or to turn into a cheap plastic imitation (scapegoat or decoy). Reaction to failure of a peer will affect all other instances; the virus must make an estimate of what the enemy now knows. Depending on that estimate, it may choose to trust the current stealth techniques and go on or to retreat to a more stealthy mode: this retreat can consist in erasing tracks and committing suicide, in hiding and remaining dormant until possible better days (depending on the estimated attention span of the enemy), in polymorphing into scapegoats or decoys, or in becoming very aggressive in a desperate move before everything goes bust. Monitoring the history of peer instances, warning peers before something risky is attempted, etc., and reassuring them after it's successful, is thus essential in choosing strategy.
It is important for the virus to detect when it is being watched, when it has been discovered, when it is running in a honeypot, etc., so as to adapt its behaviour and preserve its invisibility or at least its secrets. Conversely, it is important to observe the pattern of "normal" use of the computer: what user is running which programs for how long, watching which files and leaving which files alone, etc., so as to decide what penetration technique to use next. This detection can be based on general rules, but also on observation of what users to watch out for, what processes are running with what pattern, etc., what are usual behaviours, and what are exceptional behaviours. Ideally, the virus ought to be able to identify some SETI or distributed.net as a long running process that is valuable to subvert, although their binaries could also be identified as highly watched with widely known checksum and it would be safer to tamper with a library they use than with the closely watched binaries themselves.
Once safe enough in their new host environment, virus instances can gather information from broadcast media on the internet or from peers, about new stealth precautions to take, new penetration techniques, new strategic models to use, new scapegoats and decoys, etc. Basing the virus on a high-level knowledge-based expert system that includes its own compiler can help take advantage of very compact declarative update information in an incremental way, which is most adapted to the lossy medium through which updates are transmitted.
Virus instances are to publish information about which were the successful targets, which are the next attempts, and whether the past published attempts were successful, so as to gather information about which targets are a better choice for further development, and which were the ones most conducive to detection. Most interesting are those programs that are compiled in userland, and whether or not they allow infecting further people or entering root mode.

All in all, we saw that a really adaptive virus would have to be based on an expert system that has a dynamic knowledge base, that describes both all the relevant security practices and all relevant software development and distribution practices, so that the virus can establish sensible strategies. Interestingly, if such an expert system existed, it could be used by "white-hat hackers" to build more robust systems impervious to virus attacks.

Counter-measures: Just as malicious software is ultimately based on an understanding of what bad behaviour is possible that isn't understood as bad by system defenses, system defenses are based on understanding what is legitimate software behaviour and what is malicious behaviour. Ultimately, virus development is an arms race between the malicious and the legitimate programmers; but as long as legitimate developers have the initial control of their own machine, they have the initiative in taking measures to defend their computer systems. One notable pitfall to avoid is the paranoia by which system administrators would fail to understand that some behaviour is legitimate and prohibit it: either they will succeed in prohibition, and destroy part of the utility of the computer systems, or they will fail in their prohibition, and open new vulnerabilities in their systems as users develop work-arounds for ill-adapted administrative policies. In the end, the computers are tools to satisfy the users, and administrators are there to serve users, not to impede their work. When system administrators fall into that pitfall, malicious developers win. (Same as when governments destroy liberties of law-abiding citizens as a response to bombings by terrorist outlaws.) Instead, legitimate users, developers, etc., must cooperate in defining and refining more efficient, higher-level, more automated uses of their computer systems -- and thus keep the lead in the race toward better computerized understanding of the world.

There is no way in which computer security experts can prevent virus developers from picking a good architecture. What we can do is develop computer environments that provide no nutrients for viruses to develop: make it so that a virus developer must endure extreme pain in terms of complexity to handle, for little gain in terms of virus survival. We can starve viruses out, if we are proactive in administering what programs can do in general, so as to trap viruses as a (very) particular case. Note then how we're back to the initial problem discussed in the previous article, where free software developers can afford a policy that is both tighter and dynamically easier to adapt as compared to proprietary software developers with respect to access rights for installing new software; which is why free software is intrinsically more secure.
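
To make "administering what programs can do in general" a little more concrete, here is a minimal sketch in Python of a default-deny policy table: each reviewed program is granted an explicit set of capabilities, and anything not granted, including everything for an unreviewed program, is refused. The program names, the capability strings and the check function are all hypothetical, chosen only for this illustration; a real system would enforce such a policy in the kernel or in a sandbox, not in a lookup table.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Capabilities granted to a program; anything absent is denied."""
    allowed: frozenset = field(default_factory=frozenset)

    def permits(self, capability: str) -> bool:
        return capability in self.allowed


# Hypothetical policy table, as a distribution maker might ship it.
POLICIES = {
    "mail-reader": Policy(frozenset({"net:connect", "read:~/Mail", "write:~/Mail"})),
    "image-viewer": Policy(frozenset({"read:~/Pictures"})),
}

# Programs that nobody has reviewed get the empty policy: deny everything.
UNREVIEWED = Policy()


def check(program: str, capability: str) -> bool:
    """Return True only if the program was explicitly granted the capability."""
    return POLICIES.get(program, UNREVIEWED).permits(capability)
```

For instance, check("image-viewer", "net:connect") is refused because an image viewer was never granted network access. A virus hiding in that viewer is thus trapped as a particular case of a general rule, with no need to recognize it as a virus.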

Conclusion

For a virus to be successful in surviving and doing malicious things in current and future free software operating systems, it would require quite an amount of proficiency, work, and tenacity. Making a virus that can robustly scale to a large range of situations is really a particularly hard instance of the problem of making a robust and adaptive piece of software, with the additional constraint that it is hardly possible, if at all, to upgrade, fix or patch the software after it was originally let loose, whereas those who will attack the virus, though they start without much information, will be able to disassemble it, test it in laboratory conditions, exchange information, grow new intrusion-detection techniques, learn better habits, etc.

With a fraction of the work invested in building a really robust virus, the person or group of developers able to build it could get rich and famous at developing actually useful software: writing compilers or decompilers, doing security consultancy, developing copy-protection schemes, growing expert systems, engineering large projects, teaching software design, building systems that automate tasks currently done by human administrators. In contrast, as far as infecting proprietary systems goes, viruses need be neither robust nor scalable, neither stealthy nor architected around multiple layers: a dedicated teenager can write one that will spread all over the world. Meanwhile, the overhead cost of entry before one can do useful, worthwhile work is very low in free software communities, whereas it is very high in proprietary software communities. These economic considerations again explain why viruses are such a constant nuisance with proprietary systems, whereas it is unlikely that they will ever be much of a danger with free software systems.

Additional Pointers

My preceding article, Les virus informatiques comme sous-produits des logiciels exclusifs (in French) and its -- still incomplete -- adaptation into English, Computer Viruses Are Caused By Proprietary Software.
Other articles that I (Faré) wrote.
A great story for all those interested in network security hacking and high-level programs of extreme complexity: True Names, by Vernor Vinge.
Bruce Ediger's page about viruses under UNIX.
Viruses may be one thing, worms are another; read about Warhol worms.
If you like this article, you can prove it by sending me a tip (using e-gold or paypal, etc.).

Faré -- François-René Rideau -- Ðặng-Vũ Bân