
bioinformatics - Defining a complex tandem repeat motif with indels and substitutions - Stack Overflow


I am working on a tandem repeat project and I want to define a repeated motif that is complex, including indels and substitutions, with most bases being conserved. The motif varies in length between 22 and 28 bp.
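As a quick sanity check on the 22 to 28 bp range, the repeat-unit lengths can be tallied directly from the FASTA file. This is only a minimal sketch: `read_fasta` and `length_histogram` are helper names I made up, and the sequences below are toy data, not my real repeat units.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def length_histogram(records):
    """Count how many repeat units have each length."""
    return Counter(len(seq) for _, seq in records)


# Toy input, made up for illustration only.
example = """>unit1
ACGTACGTACGTACGTACGTAC
>unit2
ACGTACGTACGTACGTACGTACGTAC
>unit3
ACGTACGTACGTACGTACGTAC
"""

hist = length_histogram(read_fasta(example))
for length in sorted(hist):
    print(length, hist[length])
```

On the real data the same functions would be called on the contents of motifs_vntr_20p.fasta, for example `read_fasta(open("motifs_vntr_20p.fasta").read())`.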

I need to define it as an HMM motif because the tool I am using accepts only exact motifs or HMM motifs. An exact motif does not match in every sample because the repeat units are too variable.

I am looking for an appropriate way to model this motif, considering its variability. Since the motif includes both insertions/deletions (indels) and substitutions, I want to use a probabilistic model that can capture these variations while still recognizing the overall conserved structure.

I first used GLAM2 from the MEME suite, which provided me with a position probability matrix (PPM) for this motif. I was wondering if I could define it as an HMM motif, or if it lacks key information such as transition probabilities.

glam2 -a 11 -r 50 -o glam2_motif_20p n motifs_vntr_20p.fasta

Another approach I tried was multiple sequence alignment using MAFFT. I created a FASTA file where each sequence corresponds to one repeat of the motif (a total of 3,713 sequences). Then, I used hmmbuild from HMMER to build an HMM profile from the MAFFT alignment. However, I am unsure if this approach is reliable for modeling such a complex motif.

mafft --maxiterate 1000 --globalpair motifs_vntr_20p.fasta > mafft_vntr_20p.fasta

hmmbuild motifs_vntr_20p.hmm mafft_vntr_20p.fasta
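To make the point about transition probabilities concrete, here is a rough sketch of how a gapped alignment carries both kinds of information: per-column emission counts (what a PPM holds) and match/insert/delete transition counts (what a PPM lacks). It is not what hmmbuild actually does. The function names, the 50% gap threshold for choosing match columns, the flat pseudocount, and the pooling of transitions across positions are all simplifying assumptions of mine, and the alignment is toy data. A real profile HMM keeps transitions per position, and HMMER additionally applies sequence weighting and priors.

```python
from collections import Counter


def profile_counts(aligned, gap="-", max_gap_fraction=0.5):
    """Count match-column emissions and state transitions in an alignment.

    aligned: list of equal-length gapped strings.
    Columns whose gap fraction is below max_gap_fraction are treated as
    match columns (an assumed heuristic); the rest are insert columns.
    """
    n_cols = len(aligned[0])
    assert all(len(s) == n_cols for s in aligned), "rows must be aligned"
    is_match = [
        sum(s[c] == gap for s in aligned) / len(aligned) < max_gap_fraction
        for c in range(n_cols)
    ]
    emissions = [Counter() for c in range(n_cols) if is_match[c]]
    transitions = Counter()  # pooled over positions for brevity
    for s in aligned:
        prev, m = "B", -1  # "B" = begin state, m = current match column
        for c in range(n_cols):
            if is_match[c]:
                m += 1
                if s[c] == gap:
                    state = "D"  # gap in a match column = deletion
                else:
                    state = "M"
                    emissions[m][s[c]] += 1
            elif s[c] != gap:
                state = "I"  # residue in a non-match column = insertion
            else:
                continue  # gap in an insert column carries no state
            transitions[(prev, state)] += 1
            prev = state
        transitions[(prev, "E")] += 1  # "E" = end state
    return emissions, transitions


def to_probabilities(counter, pseudocount=1.0, alphabet="ACGT"):
    """Turn emission counts into probabilities with a flat pseudocount."""
    total = sum(counter[a] for a in alphabet) + pseudocount * len(alphabet)
    return {a: (counter[a] + pseudocount) / total for a in alphabet}


# Toy alignment, made up for illustration only.
toy = [
    "ACG-T",
    "ACG-T",
    "A-GTT",
    "ACG-T",
]
emissions, transitions = profile_counts(toy)
probs = to_probabilities(emissions[1])
print(len(emissions), "match columns")
print(dict(transitions))
print(probs)
```

In this toy case the third sequence contributes one M to D, one D to M, one M to I and one I to M transition. None of these can be recovered from the column frequencies alone, which is why I suspect the GLAM2 PPM by itself is not enough to define an HMM motif.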

Do you have any suggestions for better modeling this motif? Are there other tools that could be more suitable?

Thanks a lot!
