A new method of solution for the occupancy problem and its application to operon size prediction
J Theor Biol. 2004 Apr 7;227(3):315-22
Warren F. Lamboy and Gabriel Moreno-Hagelsieb
published: 2004 | Research publication | Science
The problem of estimating the expected number of transcription units containing a specific number of genes arises in the context of operon size prediction in prokaryotic genomes, where operons are defined to be transcription units containing two or more genes. This problem is mathematically identical to the balls-in-urns occupancy problem in probability theory. In that problem, a fixed number of indistinguishable balls are randomly placed in a known number of distinguishable urns, subject to the restriction that no urns may remain empty, and an estimate is desired for the expected number of urns containing a specific number of balls. In this paper we present a new, simple technique for solving the occupancy problem when empty urns are allowed, and extend it to the case when each urn must contain the same non-zero minimum number of balls. Treating transcription units as equivalent to urns, and genes as equivalent to balls, we use that result to estimate the expected number of transcription units that contain a specific number of genes, and then apply the estimate to predicting the expected number of transcription units present in an entire genome. Since these predictions can be made for any completely sequenced and annotated prokaryotic genome, they provide a starting point for the comparison of regulatory complexity across such genomes.
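The occupancy calculation described in the abstract can be illustrated with a minimal sketch. It is not the paper's technique: it assumes that every arrangement of indistinguishable balls in distinguishable urns is equally likely, and it uses standard stars-and-bars counting. The function names `count_configs` and `expected_urns_with` and the parameter `minimum` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction
from math import comb


def count_configs(balls, urns, minimum=0):
    """Count placements of indistinguishable balls in distinguishable urns
    where every urn holds at least `minimum` balls (stars and bars)."""
    free = balls - urns * minimum  # balls left after meeting the minimum
    if free < 0:
        return 0
    if urns == 0:
        return 1 if free == 0 else 0
    return comb(free + urns - 1, urns - 1)


def expected_urns_with(k, balls, urns, minimum=0):
    """Expected number of urns holding exactly k balls.

    Assumption (not taken from the paper): all valid placements are
    equally likely. By linearity of expectation, the answer is
    urns * P(one given urn holds exactly k balls).
    """
    total = count_configs(balls, urns, minimum)
    if total == 0:
        raise ValueError("no valid placement for these parameters")
    if k < minimum or k > balls:
        return Fraction(0)
    # Fix k balls in one urn, then place the rest in the other urns.
    rest = count_configs(balls - k, urns - 1, minimum)
    return Fraction(urns * rest, total)


# Toy example: 4 genes (balls) in 2 transcription units (urns),
# each unit holding at least 1 gene. Units with k >= 2 are operons.
expected_operons = sum(expected_urns_with(k, 4, 2, minimum=1) for k in range(2, 5))
print(expected_operons)
```

In this toy case the valid placements are (1,3), (2,2) and (3,1), so the expected counts over all sizes add up to the number of urns, which is a quick sanity check on the counting. Setting `minimum=0` covers the case in which empty urns are allowed.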
revised Aug 31/06