Term Project Proposals in Computer Science

Term Project Proposals
in Computer Science

François-René Rideau
TUNES Project
http://tunes.org/

1 Introduction
   1.1 Term Projects
   1.2 The Underlying Infrastructure
2 The Projects
   2.1 Erlang on top of Common Lisp
   2.2 Failover Feature for the Transactional Engine
   2.3 Content-addressed Data Store
   2.4 Monotonic Epistemic Logic
   2.5 Distributed Versioning Software
   2.6 Dynamic Cron
   2.7 Persistent Web Cache
   2.8 Session-aware Web Server Multiplexer
   2.9 Enhancing Exscribe
   2.10 Defining a Typesystem for Software Increments
   2.11 Building a Reflective Lisp Environment
3 Conclusion
   3.1 Deliverables
   3.2 Recommendations

1 Introduction

1.1 Term Projects

This is a set of proposals for term projects in Computer Science. They tackle domains such as Distributed Systems, Foundations of Computational Logic. They are proposed in the context of the TUNES Project for a free reflective computing system. They are primarily intended for students completing a Masters degree in Computer Science, but can be done with any person focused on studying Computer Science and already having acquired the basic skills and the general attitude. In each case, a successfully conducted project would lead to some software with practical direct or indirect applications for a large number of potential users.

Find the latest version of these proposals at the following URL: http://fare.tunes.org/computing/term-project-proposal.html.

Note that some of these projects have been included in the Google Summer of Code programme under the sponsorship of Lisp NYC: http://www.lispnyc.org/summerofcode.html But none was successfully taken care of in any of 2005, 2006 or 2007.

I am willing to provide guidance to the interns for the duration of the project. However, unless the interns will be able to visit me, my coaching will be restricted to Internet interaction; a second local supervisor is more than recommended. If the framework of the TUNES Project, a Free Software project run by a loose team of individuals, doesn't seem formal enough to you, I can easily obtain formal support and affiliation from a company or a university. If this is the case, please contact me <fare@tunes.org>.

These projects are conceived as parts of a coherent whole. They can be divided into tasks to be undertaken independently by interns; or they can be assigned to a team of interns, each tackling specific tasks, but cooperating to ensure that the different parts will work together; or they can be assigned to a sole intern working over several months.

The time estimates are rough estimates for good interns who would already be familiar with the main concepts involved in the project, and would do a proper job including test cases and minimal documentation. Interns who first have to get up to speed with some of the concepts may require more time to complete the project properly; this is not to be considered as a handicap, but as an opportunity to learn more.

1.2 The Underlying Infrastructure

All these proposals are intended to be built as extensions to existing Common Lisp infrastructure. Projects involving persistence may be developed on top of BKNR or Rucksack; projects involving distributed programming may be developed on top of cl-muproc; projects involving web interfaces may be developed on top of UCW or BKNR; etc. We recommend that the interns should reuse existing libraries and avoid reinventing the wheel when there's a perfectly good one that fits, unless the reinvention of a better wheel is a constitutive part of the project specification.

Of course, the proposed projects could also be developed using a different infrastructure and/or a different programming language, Still, we recommend using the specified infrastructure, for the following reasons:

Common Lisp is a versatile and extensible language, that can be tailored to the domain at hand. In a research project, this means that the interns can express the essence of the concepts being employed, factoring out most of the noise that clutters programs in other languages. This makes them more productive and prone to completing more interesting programs in the alloted time. This also makes it easier for their supervisor and for themselves to assess that they indeed have understood the concepts.
BKNR is a platform being developed and already deployed in several actual applications. Developing on top of it means that you can rely on a rich and robust base, and at the same time add value to existing systems. Interns will thus be able to make non-trivial contributions to actually useful projects, and earn credits on real-world software. At the same time, their work can be field tested on real applications. This makes things easier and their result more valuable for both interns and supervisors. Note that depending on previous experience and available support, the interns may prefer to use such software as Rucksack and UnCommon Web as alternatives to BKNR.
Free Software licenses mean that the code base and intern contributions can be freely used, copied, distributed and modified. They make reviewing the code for academic purposes easy. They enable interns to get involved in cooperative development of real programs instead of being isolated and writing throw-away programs. This also enables interns, supervisors and other software developers to pursue long term mutually beneficial relationships for software development and academic research.

2 The Projects

2.1 Erlang on top of Common Lisp

Erlang is a distributed programming language that makes programming in the Actor paradigm easy. cl-muproc is the embryo of an abstraction layer providing an Erlang-like programming model to Common Lisp. This project aims at extending the most popular Common Lisp implementations in order to make the concurrent programming features of Erlang fully available to them.

The intern will complete a working version of the cl-muproc code, portable to several Common Lisp implementations (say sbcl and clisp on Linux). He will provide the following implementation strategies for spawning new activities:

by using the threading mechanism built into the Lisp implementation,
by forking the current Unix process,
by asking a remote server to create a new activity,
by running a command that launches a new image in the background.

For each strategy, the Erlang communication protocols must be implemented accordingly. Of course, each of these strategies is only applicable for some combinations of underlying Lisp implementation and operating system that provide the features making it possible.

To demonstrate the functionality of this distributed-software infrastructure, the intern will develop an actual implementation of Erlang on top of it.

If time is limited, implementation of Erlang-like capabilities may be restricted to a subset of the suggested strategies. We recommend that the intern should start with the simplest way of completing the underlying infrastructure, implement Erlang on top of it, import simple test programs from the Erlang community, debug his implementation, and then develop the other ways of spawning activities.

Community Benefits: Current Lisp systems are currently lacking in the way of concurrent programming. Concurrent programming is better done right with the Erlang model (actually, the Actor paradigm) than by emulating the low-level C/C++ thread model like Java does (which is an abstraction%20inversion).

Intern Benefits: The intern will gain a deep understanding of concurrent and distributed programming, synchronization problems, fine details of operating system API, performance issues, etc.

Difficulty of the Task: This project may take from two to six months for one to three interns, depending on how much the interns aim to achieve. It can be assigned to a team of interns who will each tackle a different aspect of the overall system, yet cooperate to get them all working together.

Notes: cl-muproc currently provides a messaging mechanism subtly different from that of Erlang. This project specifically requires full compatibility with Erlang, with the eventual ability to interoperate with Erlang processes in mind. We recommend that the use of fork(2) and other unix system calls be done through cffi-unix or an extension thereof. For reference, also see the failed Erlisp project.

2.2 Failover Feature for the Transactional Engine

BKNR has a transactional engine based on the Prevalence technique, whereby a journal of object modifications and other high-level system updates is kept. Upon failure or system upgrade, transactions can then be replayed from the latest system snapshot.

The intern will implement a failover feature for BKNR: one or more live backup systems are kept up-to-date at all times; a monitor regularly assesses that the main system is operational, and activates one of the backups to replace it in case it fails.

The system should be designed so as to work in a multiple-machine setting, where the monitor and the various redundant copies of the system are each located on a different machine. However, the intern should also prepare a test configuration that runs with multiple processes on a single machine.

Community Benefits: Current Lisp systems are currently lacking in the way of orthogonal persistence. With such a distributed failover feature, it may become possible to use Common Lisp as a robust object-oriented database, obsoleting a lot of painful work done with relational databases at the time being.

Intern Benefits: The intern will gain a deep understanding of concurrent and distributed programming, synchronization problems, fine details of a persistence API, performance issues, etc.

Difficulty of the Task: This project may take two to four months for one or two interns.

2.3 Content-addressed Data Store

The intern will implement an extension to the BKNR blob store, that stores read-only blobs in the filesystem with an index based on a strong cryptographic hash of the contents. Blobs with identical contents are automatically merged.

In order to test this infrastructure, the intern will implement a drop-in replacement for the existing BKNR datastore using the content-addressed blob as a backend; automatic merging of identical files will then be a beneficial side-effect. This code will contain a proper level of indirection so as to handle hierarchical file names, and special attention will be given to the situation when clients modify files with already indexed contents.

Optionally, the intern will implement support for incremental backup and restoration of the blob store, integrate this with the backup and restore process of the rest of the transactional engine, and ensure the proper garbage collection of unused backed-up blobs.

Note that by computing hashes of very large files based on a subdivision of them as binary trees rather than as a sequences of bytes, it becomes possible to process file transmission and verification in a more incremental, parallelizable manner that might better suit the needs of P2P applications. However, special care and consultation with a cryptographic expert is required to ensure that such a scheme is safe.

Community Benefits: A Content-Addressed Data-Store part is a straightforward implementation technique for Monotonic Logic (see below), and has many applications: filesharing, P2P, databases, versioning, pattern-recognition, etc.

Intern Benefits: The intern will gain insight in Monotonic Logic, resource management, API implementation and evolution, use of a transactional engine; he will acquire useful skills for further, more challenging tasks.

Difficulty of the Task: This is a simple project, which might take two months for a intern familiar with the infrastructure. It can be proposed as a short project, or as a preliminary part of the following project. The part specified as optional above may be proposed as a separate project in the next year.

2.4 Monotonic Epistemic Logic

The intern will implement a library to provide Monotonic Epistemic Logic to Lisp programs.

Monotonic Epistemic Logic is the natural solution to the problem of having networking infrastructure that provides acceptable performance in the context of SNAIL: Slow Networking with Atrocious Interchange Latency, such as is the case with snail mail, interplanetary networking, and disconnected computing (1, 2).

Monotonic Logic means that the computing system is organized around a model where information accumulates but is never modified or deleted. Messages may thus be sent or received multiple times and in any order, without causing system inconsistency. This avoids negociation on the synchronization of states between parties during communication.

Epistemic Logic means that each key actor in the system maintains a model of what information itself and other key actors already have or need to acquire. Actors may thus send or broadcast messages that contain only the incremental knowledge needed by the destination parties. This minimizes the need to negociate what content to communicate.

When a host has to communicate with one or many other hosts, it will aggregate all the information to be transmitted at the time into a single chunk message. Chunks may then be exchanged over TCP/IP, email, some web interface, NetBLT (RFC 969, 998, 1030, 1080, 1986), or through files to be otherwise transmitted by snail mail (burnt on a CD, copied to a USB key, etc.) The monotonic epistemic logic should be independent from the transportation layer.

Additionally, messages may also be built by encrypting or cryptographically signing simpler messages, so as to provide authentication from multiple sources and for multiple destinations.

Extra points for a system that allows some newer facts to explicitly supersede older facts, and for a relevance logic that allows to restrict communication or storage to relevant facts. This way, it becomes possible to actually erase previous facts from the database while keeping the system coherent with respect to the main invariant of monotonic logic: communication consists in declaring facts, and fact declaration is idempotent. At least the latest supersede or cancellation message must be remembered so that the system maintains the commutativity of message declarations: messages can be received (and received again) in any order without changing the meaning of the system.

To illustrate the functionality of his software, the intern will build a distributed data store where each node uses the BKNR data store and maintains a model of what other nodes know or need to know.

Community Benefits: SNAIL networks are everywhere: PDA owners, inhabitants of poor or remote locations, space explorers, travelers, all will benefit from support for SNAIL networking. See also next proposal for a higher-level application of the SNAIL paradigm.

Intern Benefits: The intern will gain a deep understanding of concurrent and distributed programming, synchronization problems, and high-level approaches to solving them.

Difficulty of the Task: This project may involve one, two or three interns for a period of three to six months. It may profitably use the Content-addressed Datastore from the previous project, which may be included as a preliminary part of this project.

2.5 Distributed Versioning Software

The intern will build a distributed versioning system based on the Monotonic Epistemic Logic library as developed during the above project. Such a system is fit for SNAIL networks as above, but also for SNAIL networks as a conceptual model of social interaction.

An early versioning system may restrict its scope to differences between known file versions; any version discrepancy will be left for the users to cope with, and any version subsumption will have to be declared by the users. A less primitive versioning system may offer conflict management à la CVS; and a more elaborate system may use a theory of patches such as the one from DARCS. For inspiration, the intern may want to have a glance at existing systems such as Arch, ChangeSafe, Codeville, DARCS, Git, HO-CVS, Meta-CVS, monotone or VESTA (See also these reviews: 1, 2, 3, 4).

For additional credits, a simple compatibility layer should allow the software to be used as a drop-in replacement for CVS, Subversion or GIT, when properly configured; for yet more credits, a working default autoconfigurator should be included. Final points for a tool to feed modifications from existing CVS repositories into the new versioning system.

Community Benefits: Members of loosely-connected projects will benefit from support of SNAIL, not as a machine-level networking paradigm, but as a people-level networking paradigm.

Intern Benefits: The intern will become the leader in an important project with a major impact on the way software programmers interact.

Difficulty of the Task: This project may take three to six months for one or two interns.

2.6 Dynamic Cron

The intern will implement a Dynamic Cron: a dynamic job control system with an interactive web interface, integrated into the BKNR infrastructure.

Through the web interface of BKNR as well as through a Lisp interface to be documented by the intern, users may be able to schedule, stop, reschedule and delete jobs, list them, watch them, inspect their partial or ongoing output, etc.

Jobs may further be made to depend not just on a global timer, but also on the completion of other jobs. Additional credits will be granted if the Dynamic Cron is turned into a workflow management system.

Community Benefits: Such a dynamic scheduler will be a key piece of an infrastructure that transforms Common Lisp into a dynamic programming system with a web interface instead of (merely) a GUI or a command line (that it may also have).

Intern Benefits: The intern will gain a deep understanding of concurrent and distributed programming, synchronization problems, persistence, and high-level approaches to solving them.

Difficulty of the Task: This project may involve one or two interns from one to three months. It may profitably reuse parts of the Erlang-in-lisp project above, and may be conceived as a sequel to it, if the interns finish early within the alloted time.

2.7 Persistent Web Cache

The intern will implement a web proxy that caches pages, storing them in a persistent store. The proxy may be further extended with a crawling module that proactively crawls pages it predicts might soon be accessed.

Existing software that may inspire interns include wwwoffle, w3mir and V6. Also, for the purposes of recovering dead sites, the crawler may be extended to recover files from archive.org.

Intern Benefits: The intern will gain a deep understanding of concurrent and distributed programming, synchronization problems, persistence, and high-level approaches to solving them.

Difficulty of the Task: This project may take two to six months for one or two interns. It may profitably reuse results from above projects concerning Erlang-in-Lisp and the content-addressed blob store.

2.8 Session-aware Web Server Multiplexer

For the purpose of performance and scalability, it is useful that requests for a service be handled by a farm of servers instead of only one. Now, many web servers maintain session information; this is particularly the case for servers with continuation-based interfaces such as UnCommon Web. And, session information can be expensive to ``serialize´´ and ship between machines; once again this is particularly the case for continuations. Thus it is better that clients should always be served by the same server.

The intern will modify an existing web server frontend, so that it will delegate each requests to one server in a farm, in such a way that requests within a same session will always be served by the same server. The intern will arrange for such a setup to work with the UCW infrastructure.

To demonstrate his software, the intern will develop or adapt an existing simple e-commerce application. He will prepare several test configurations, one that runs on a single machine with multiple processes, and one that runs on multiple machines.

Community Benefits: As Common Lisp is more used into the realm of high-performance computing with web interfaces, there is greater need for tools that will make it easy to leverage together the power of cheap hardware and the expressive power of Common Lisp.

Intern Benefits: The intern will gain a deep understanding of web APIs, and how to provide high-level approaches to them.

Difficulty of the Task: For additional credits, and if time permits, the intern will integrate UCW and BKNR, and implement the reverse kind of (de)multiplexing: distributed transactions between multiple BKNR servers. This can also be conceived as a separate project to be done in the next year.

This project may take from one to six months for one intern.

2.9 Enhancing Exscribe

The intern will enhance Exscribe with a variety of features, turning it into a better platform for authoring documents.

New features to implement include, in order of priority:

A cl-typesetting back-end, allowing the output of PDF printable documents.
A reStructured Text front-end, allowing the input of documents in either plain rst, a mix of rst and a programmable strict subset of Exscribe. The subset will be characterized by an explicit code-expander to be applied to in-line Exscribe escape statements, or nil if disallowed.
An extension to cl-typesetting that allows to output proper footnotes, instead of having to put notes in a separate section.
Extensions to some literate programming software (e.g. Albert to use Exscribe with the ReST syntax by default).
Extensions to some web system so as to allow Wiki content with the rst syntax and some Exscribe extensions.
Extensions to Exscribe to allow for seamless use of inlined cl-pdf or LaTeX material, to be rendered as images in HTML output and included as PDF in PDF output.
Development of an Exscribe-mode for Emacs and/or its CLIM equivalent.
Extensions to cl-typesetting to use some generic declarative constraint-solving language for deciding of text layout.
Interface of Exscribe with TeXmacs or some other WYSIWYM text editor, possibly a CLIM one.

Community Benefits: A good programmable document authoring system is still lacking. Exscribe could become the basis for such a system, and be integrated with other systems to allow for easy-written Lisp documentation, literate programming software, etc.

Intern Benefits: The intern will gain a deep understanding of a layered document manipulation system.

Difficulty of the Task: This project can keep a number of interns busy indefinitely. It involves hacking the guts of several systems, and being compatible with existing software that may or may not have formal specifications.

2.10 Defining a Typesystem for Software Increments

This is a project in theoretical Computer Science, whereby the intern will develop a representation for Software Increments (program changes) and a Typesystem DT applicable to such changes, that closely matches an existing system T for programs, such that there be a natural transformation from DT(diff(nil,.) to T(.).

Note that you will first have to formalize programs and program changes. Possibly, starting with a type for terms, take its derivative type, and consider collections of pairs of term-derivative and function from (sub)term to (sub)term. Then deduce an obvious type abstraction for that from the type abstraction of terms. Then realize this won't encapsulate correctness of transformations least your initial type (and its derivative) somehow handles symbols (identifiers) in a proper way. Maybe discover something grand (and a posteriori trivial) about intention vs extension and meta-level representation, since the interesting case is about changing the extensional interface at a given symbol and all that entails for sites that consume that symbol.

Extra credits if the incremental typesystem can account not just for correct complete programs, but for erroneous partial programs, by representing the discrepancy between the two, i.e. what it would take to make the current program type-correct. Thus, the incremental typesystem could be used not just as a verification tool, but as a guide for what remains to be done in a software project.

Community Benefits: This proposal, if completed, will reconcile Static Type Systems with Dynamic Software Evolution, and open a whole new world of software engineering techniques and tools.

Intern Benefits: The successful intern will be well on his way to a Turing Award.

Difficulty of the Task: This is a research project that will take just a few weeks or months for whichever genius achieves it, and that other mortals will not be able to complete in any amount of time. I can't help you much, except to incite you to tackle this opportunity. An actual implementation that turns such a type system into an engineering tool might the whole lives of many people.

2.11 Building a Reflective Lisp Environment

In this project, the intern will develop a Reflective Lisp Environment, with introspective and intercessive capabilities such that it is possible to efficiently build on top of them such features as garbage collection, regions management, partial continuations, concurrency, distribution, code migration, orthogonal persistence of data and code, version control, configuration management, etc. All these features will be demonstrated. Both a Common Lisp layer and a Scheme layer will be built atop this environment.

The executive will be built on top of Ian Piumarta's COLA, or using similar techniques.

Community Benefits: This proposal, if completed, will provide the Lisp community at large with an environment capable of supporting live programming, and enjoying the benefits of an integrated environment capable of coherently automating meta-level tasks that programmers have to do manually in existing systems.

Intern Benefits: The successful intern will gain key insight on many aspects of the implementation of computing system.

Difficulty of the Task: This is a project that requires a lot of patience and debugging, especially when it comes to interfacing to existing systems, and overcoming the various quirks of hundreds of variants of the underlying hardware and other software that has to be interfaced with.

3 Conclusion

3.1 Deliverables

At the end of his project, each intern will submit to the supervisor the code he has written, and a report. The code and report will be previously reviewed by the TUNES Project coordinator.

The code will have to include automated tests for all the developed functionalities, easily runnable using a simple command, be it a Makefile target or the equivalent.

The report will contain the intern's reflections on the software development process. He will stress what he thinks he has learnt from both expected and unexpected problems that he faced, and the way he solved them. He will also make an assessment of his own level of satisfaction regarding his own contribution, the support he received or failed to receive from his supervisors and fellow developers. He may make suggestions as to how future similar projects could be turned into better learning experiences.

3.2 Recommendations

Interns are encouraged to maintain contact with the TUNES Project coordinator as well as with the BKNR developers. Not only will this make their own work easier, but it is also one of the aims of these projects that the interns should acquire skills in cooperative software development.

The use of the programming language Common Lisp should discourage neither interns nor supervisors who are not familiar with it. Interns who already know a programming language can become productive in Common Lisp in a matter of a few weeks: indeed Common Lisp is a programming language that can straightforwardly express the programming paradigms that the intern typically learn, imperative, object-oriented and functional. As their project unfolds, interns may learn on a need basis the additional functionalities of the language, its extensibility and its extensions. As for supervisors, the conciseness of Common Lisp code will only make their supervision task easier, due to the match between domain concepts and constructs in properly extended Lisp. For more information, we recommend the use of the Common Lisp resources available on the CLiki.net website.

Interns are also advised to write tests for each unit of code early on during software development, maybe even before writing the code itself: thus they will achieve a better understanding of the specifications, and they may detect mistakes earlier. The fact that the code will pass tests and be validated by the TUNES Project coordinator will ensure that it is indeed meaningful and not a rigged demo, making things clearer for the academic supervisors.

For use with BKNR, it is better that interns use the CMUCL implementation of Common Lisp. Actually, they could use any Lisp platform, if they are ready to make the additional porting effort of BKNR to that new platform.

Faré Rideau

Faré on Computing

Site by Faré Rideau — Donate: bitcoins 1fareF6wCNYYiLPGmyQjrd3AQdHBb1CJ6

or paypal