The Domains & Families tab

The Domains & Families tab lists the gene's hits to domains and gene families from various databases.

For each hit, an id, description, range, and coordinates are shown. Mousing over the id shows the member database's name, and clicking the id opens that database's page for the hit in a new browser window, if applicable. For applicable domains and families, mousing over the e-value reported by hmmsearch (or by rpsblast for COG, and by BLASTp for PDB) shows the score as reported by hmmsearch, rpsblast, BLASTp, or InterProScan. (For BlastProDom and ProfileScan, which do not use hmmsearch, e-values are not currently shown.)

Many models that are part of the FastHMM pipeline have been classified by InterPro. Where a FastHMM hit has an InterPro entry, the InterPro description is shown and links to the page describing that particular domain or family. Mousing over the description shows the InterPro id of the hit, if available. The range is color-coded so that hits with the same InterPro id have the same color; unclassified hits are gray. A legend is at the bottom of the page.

For PDB hits, we also show the percent identity returned by BLASTp.

If available, mousing over the range will show you the coverage of the hit to the target, as well as the length of the entire domain or family.
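As a rough illustration of the coverage figure, the sketch below assumes that coverage means the fraction of the domain or family model spanned by the hit. That definition, the function name, and the parameter names are assumptions made for this example; the page does not state how the value is computed.

```python
def model_coverage(model_start: int, model_end: int, model_length: int) -> float:
    """Fraction of a domain/family model spanned by a hit.

    Assumption: positions are 1-based and inclusive, and coverage is
    (aligned model positions) / (full model length).
    """
    if model_length <= 0:
        raise ValueError("model_length must be positive")
    if not 1 <= model_start <= model_end <= model_length:
        raise ValueError("hit must lie within the model")
    return (model_end - model_start + 1) / model_length


# Hypothetical hit spanning positions 11 to 60 of a 100-position model.
coverage = model_coverage(11, 60, 100)
print(f"{coverage:.0%} of a 100-position model")
```

Under this assumption, a hit that covers only part of a model may represent a partial domain.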

We have built gene trees for selected COG and Pfam hits. Clicking the red T will bring you to the gene tree for that gene and COG or Pfam.

You can sort the domain hits by start, by IPR, by domain database, or by category. Sorting by start simply lists each hit from the beginning of the gene to the end; if multiple hits have the same start, longer hits are shown first. Sorting by IPR groups hits with the same IPR id together; unclassified hits are shown after all classified hits. Sorting by domain database sorts by start within each database. Sorting by category (domain/family/pdb/site) groups each hit based on whether InterPro classifies it as a domain or a gene family; unclassified or other hits are shown under Sites, and PDBs are grouped separately as well.
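The first three sort orders can be sketched as sort keys. This is a minimal sketch, not the site's code: the Hit record and its field names are hypothetical, the alphabetical ordering of databases is an assumption, and so is ordering by start within each IPR group.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    """Hypothetical record for one domain or family hit."""
    hit_id: str
    database: str
    start: int
    end: int
    ipr: Optional[str]  # None when InterPro has not classified the hit


def sort_by_start(hits: List[Hit]) -> List[Hit]:
    # Earliest start first; among equal starts, longer hits come first.
    return sorted(hits, key=lambda h: (h.start, -(h.end - h.start)))


def sort_by_ipr(hits: List[Hit]) -> List[Hit]:
    # Classified hits grouped by IPR id, unclassified hits last.
    # Ordering by start inside each group is an assumption.
    return sorted(hits, key=lambda h: (h.ipr is None, h.ipr or "", h.start))


def sort_by_database(hits: List[Hit]) -> List[Hit]:
    # By start within each database; alphabetical database order is an assumption.
    return sorted(hits, key=lambda h: (h.database, h.start))
```

Each function returns a new list and leaves its input unchanged, so the same set of hits can be shown in any of the orders.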

The FastHMM pipeline replaces hmmpfam for searching many query sequences against many HMMs. We use or generate alignments for each family of HMMs, then use PSIBLAST with a high cutoff to analyze each gene sequence against each alignment. The PSIBLAST output is filtered to find candidate sequences for each HMM, and only those sequences are analyzed using hmmsearch against the matching HMM. The result is that, instead of analyzing over one million genes against each HMM, we analyze on the order of a few hundred, reducing processor time about 20-fold. We have found no false positives and very few false negatives compared to running hmmsearch. You can learn more about FastHMM, or download it, here.
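The filter-then-verify idea behind the pipeline can be sketched generically. This is not the FastHMM code: the function name is hypothetical, and the prefilter and verify callables are stand-ins for the PSIBLAST and hmmsearch steps, which are not invoked here.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def two_stage_search(
    sequences: Dict[str, str],
    families: Iterable[str],
    prefilter: Callable[[str, str], bool],
    verify: Callable[[str, str], bool],
) -> Tuple[List[Tuple[str, str]], int]:
    """Run an expensive check only on sequences that pass a cheap, permissive filter.

    Returns the confirmed (gene id, family) pairs and the number of expensive calls.
    """
    confirmed: List[Tuple[str, str]] = []
    expensive_calls = 0
    for family in families:
        # Stage 1: cheap, permissive screen over every sequence.
        candidates = [gid for gid, seq in sequences.items() if prefilter(seq, family)]
        # Stage 2: expensive check on the candidates only.
        for gid in candidates:
            expensive_calls += 1
            if verify(sequences[gid], family):
                confirmed.append((gid, family))
    return confirmed, expensive_calls


# Toy stand-ins: a "family" is a motif string, the prefilter looks for its
# first two letters, and the verifier requires the whole motif.
toy_sequences = {"g1": "MKTAYIAK", "g2": "MKTTTT", "g3": "GGGGGG"}
hits, calls = two_stage_search(
    toy_sequences,
    ["KTAY"],
    prefilter=lambda seq, motif: motif[:2] in seq,
    verify=lambda seq, motif: motif in seq,
)
print(hits, calls)
```

In the toy run, the expensive check is called for two of the three sequences rather than all three. The prefilter must be permissive, because any true hit it discards becomes a false negative.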

Currently we use these versions of external HMM libraries:

We update our analyses with the latest release of each database every six to twelve months.

InterPro is a database that classifies various member databases, so that similar models all have the same InterPro id. InterProScan is a tool that allows users to analyze genes against all InterPro member databases.

We use InterProScan for all non-HMM-based analyses. For HMM-based analyses we instead use our FastHMM pipeline, described above.

last updated November 1, 2007
