Induction of Rules for Biological Macromolecule Crystallization
Hennessy, D., Gopalakrishnan, V., Buchanan, B.G., Subramanian, D., Rosenberg, J.M. Induction of Rules for Biological Macromolecule Crystallization. In: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology (1994) 179-187. PMID: 7584388
X-ray crystallography is the method of choice for determining the 3-D structure of large macromolecules at sufficiently high resolution. The rate-limiting step in structure determination is crystallization itself: it takes anywhere from a few weeks to several years to obtain macromolecular crystals that yield good diffraction patterns. The theory of the forces that promote and maintain crystal growth is preliminary, so crystallographers systematically search a large parameter space of experimental settings to grow good crystals. There is a wealth of experimental data on crystal growth, most of which is in paper laboratory notebooks. Some of the data has been gathered in electronic form, e.g., the Biological Macromolecular Crystallization Database (BMCD), a repository of successful experimental conditions for growing over 800 different macromolecules (Gilliland 1987). Crystallographers need computational tools to gather and analyze past data in order to design new crystal growth trials. We are building the Crystallographer's Assistant (CA) to help crystallographers record and maintain experimental context in electronic form, to offer suggestions on experimental conditions that are likely to be successful, and to provide explanations for failed experiments. As an initial step in this project, we have applied RL, an inductive learning program, to the BMCD, and in this paper we report our initial experiments and findings. From the point of view of crystallography, we have discovered possibly significant new empirical relationships in crystal growth. From the point of view of machine learning, our work suggests refinements of existing methods for incorporating detailed domain knowledge into inductive analysis techniques.