In the post-genomic era, almost all genes have been sequenced and an enormous amount of data has been generated. Mining useful information from these data is therefore an important problem. In this paper, we present a software architecture for finding motifs using a genetic algorithm (GA). The approach can find potential motifs in the region extending from 2000 bp upstream (-2000) to 1000 bp downstream (+1000) of the transcription start site (TSS). Mutation in the GA is performed using position weight matrices so that completely conserved positions are preserved. Crossover is implemented with specially designed gap penalties to produce an optimal child pattern. We also present a rearrangement method based on position weight matrices that avoids very stable local minima, from which the genetic operators would otherwise have difficulty generating the optimal pattern. The results predicted by our approach are more accurate than those of the Gibbs sampler, and our approach requires less computation time than MEME.
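The idea of a mutation that leaves completely conserved positions untouched can be illustrated with a minimal Python sketch. The abstract does not specify the actual operator, so everything here is an assumption made for illustration: the function names, the use of plain column frequencies as the position weight matrix, the definition of "completely conserved" as a column in which one base has frequency 1.0, and the per-position mutation rate.

```python
import random

BASES = "ACGT"

def position_frequency_matrix(instances):
    """Column-wise base frequencies for equal-length, aligned motif instances."""
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all instances must have the same length")
    matrix = []
    for col in range(length):
        column = [s[col] for s in instances]
        matrix.append({b: column.count(b) / len(instances) for b in BASES})
    return matrix

def conserved_positions(matrix):
    """Indices of columns in which a single base has frequency 1.0."""
    return {i for i, col in enumerate(matrix) if max(col.values()) == 1.0}

def mutate(pattern, matrix, rate, rng):
    """Resample non-conserved positions from the column frequencies.

    Conserved positions are never changed (assumed behaviour, not the
    paper's documented operator).
    """
    frozen = conserved_positions(matrix)
    out = []
    for i, base in enumerate(pattern):
        if i not in frozen and rng.random() < rate:
            weights = [matrix[i][b] for b in BASES]
            base = rng.choices(BASES, weights=weights, k=1)[0]
        out.append(base)
    return "".join(out)

# Hypothetical aligned instances; columns 0, 1, 4 and 5 are fully conserved.
instances = ["TATAAT", "TATGAT", "TACAAT"]
pwm = position_frequency_matrix(instances)
child = mutate("TATAAT", pwm, rate=1.0, rng=random.Random(0))
print(child)
```

Restricting resampling to the non-conserved columns means the mutation can only explore variation that the current alignment already shows, which is one plausible reading of how a matrix-guided mutation would keep conserved positions fixed.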
International Journal of Software Engineering and Knowledge Engineering 15(3):571-585