Blast

Help

File Formats
FASTA The FASTA format is a text-based format for representing nucleotide sequences or amino acid (protein) sequences, using a single-letter code for each nucleotide or amino acid.

Example:
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
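Because both nucleotides and amino acids use single-letter codes, the letters themselves hint at the sequence type. The sketch below is a rough heuristic only; the helper name `guess_alphabet` is hypothetical. It treats a sequence as nucleotide when it uses only A, C, G, T, U or N, so IUPAC ambiguity codes (R, Y, ...) or a very short peptide made only of those letters would be misclassified.

```python
# Hypothetical helper: guess the sequence type from its single-letter codes.
# Heuristic only: IUPAC ambiguity codes other than N are not recognised.
NUCLEOTIDE_CODES = set("ACGTUN")  # A, C, G, T/U, plus N for an unknown base


def guess_alphabet(sequence: str) -> str:
    """Return "nucleotide" if every letter is a nucleotide code, else "protein"."""
    letters = set(sequence.upper())
    return "nucleotide" if letters <= NUCLEOTIDE_CODES else "protein"


# The example record above, with its lines joined into one string.
example = (
    "MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG"
    "LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK"
    "IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL"
    "MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL"
)
print(guess_alphabet(example))       # prints: protein
print(guess_alphabet("ACGTACGTNN"))  # prints: nucleotide
```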

Each record in the file is composed of two parts:
- Description line: begins with the ">" (greater-than) symbol and holds a summary description of the sequence. Right after the ">" comes the sequence ID, which can be followed by one or more comments.
- Sequence representation: the sequence itself. It may be written on a single line or wrapped over multiple lines that all have the same number of characters (the last line may be shorter, as in the example above).
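The two-part structure described above can be read and written with a few lines of code. This is a minimal sketch, not the parser used by this site; the function names `parse_fasta` and `format_fasta` are hypothetical. It assumes the ID is the first whitespace-delimited token of the description line and the rest is the comment. The default 60-character line width matches the example above but is otherwise an assumption.

```python
from typing import Iterable, Iterator, List, Tuple


def _record(header: str, chunks: List[str]) -> Tuple[str, str, str]:
    """Split the description line into (ID, comment) and join the sequence lines."""
    parts = header.split(maxsplit=1)
    seq_id = parts[0] if parts else ""
    comment = parts[1] if len(parts) > 1 else ""
    return seq_id, comment, "".join(chunks)


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (sequence_id, comment, sequence) for each record.

    Single-line and multi-line sequences give the same result,
    because all sequence lines of a record are concatenated.
    """
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield _record(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield _record(header, chunks)


def format_fasta(seq_id: str, comment: str, sequence: str, width: int = 60) -> str:
    """Write one record, wrapping the sequence every `width` characters."""
    header = f">{seq_id} {comment}".rstrip()
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + body) + "\n"


text = ">seq1 example comment\nACGTACGT\nACG\n>seq2\nMKV\n"
print(list(parse_fasta(text.splitlines())))
# prints: [('seq1', 'example comment', 'ACGTACGTACG'), ('seq2', '', 'MKV')]
```

Joining all sequence lines before returning is what makes the single-line and wrapped layouts interchangeable; `format_fasta` then re-wraps to whatever width is wanted.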
Data type
Gene annotation File containing the location of the gene loci and the description of their structure and of the associated subfeatures (e.g. mRNAs, exons, CDSs).
CDS sequences File containing the sequences of the coding portion (CDS) of the transcribed mRNAs.
Functional annotation File containing the functional annotation of the mRNAs encoded in the gene loci.
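This page does not name the file format used for the gene annotation. GFF3 is a common choice, so the sketch below assumes a simplified GFF3 layout: nine tab-separated columns, with `ID=` and `Parent=` keys in the ninth column linking each subfeature to its parent (gene to mRNA to exon/CDS). The function name `build_hierarchy` and the sample lines are hypothetical, and escaped characters in attribute values are not decoded.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_hierarchy(gff_lines: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each parent ID to its children as (feature_type, feature_id) pairs."""
    children: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        # Column 9 holds key=value pairs separated by ';' (values not unescaped here).
        attrs = dict(field.split("=", 1) for field in cols[8].split(";") if "=" in field)
        feature = (cols[2], attrs.get("ID", ""))
        # A feature may have several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children[parent].append(feature)
    return dict(children)


lines = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1;Parent=gene1",
    "chr1\tsrc\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=mRNA1",
    "chr1\tsrc\tCDS\t150\t300\t.\t+\t0\tID=cds1;Parent=mRNA1",
]
print(build_hierarchy(lines))
# prints: {'gene1': [('mRNA', 'mRNA1')], 'mRNA1': [('exon', 'exon1'), ('CDS', 'cds1')]}
```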
Complete genome assembly File containing the integral collection of genomic sequences describing the genome of the species. It contains:
- For pseudomolecule-scale assemblies: all chromosome sequences and all unplaced sequences;
- For diploid pseudomolecule-scale assemblies: all chromosome sequences for both haplotypes and all unplaced sequences;
- For diploid draft assemblies: all sequences of the primary assembly and all haplotig sequences.
Genome haplotype 1/2 File containing the collection of pseudomolecule sequences for haplotype 1 or haplotype 2 of a diploid pseudomolecule-scale assembly.
mRNA sequences File containing the collection of sequences of the transcribed mRNAs, each built as the concatenation of all its exons.
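An mRNA sequence can be rebuilt from the genome by joining its exons in genomic order, then reverse-complementing the result when the transcript lies on the minus strand. This is a minimal sketch, assuming 1-based, inclusive exon coordinates (as in GFF3); `mrna_sequence` and `reverse_complement` are hypothetical names.

```python
from typing import List, Tuple

# Complement table for A/C/G/T/N in both cases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mrna_sequence(chrom_seq: str, exons: List[Tuple[int, int]], strand: str) -> str:
    """Concatenate exon sequences into a transcript sequence.

    Exons are (start, end) pairs, assumed 1-based and inclusive.
    They are joined in ascending genomic order; for '-' strand
    transcripts the joined sequence is then reverse-complemented.
    """
    spliced = "".join(chrom_seq[start - 1:end] for start, end in sorted(exons))
    return reverse_complement(spliced) if strand == "-" else spliced


chrom = "AAACCCGGGTTT"
print(mrna_sequence(chrom, [(1, 3), (7, 9)], "+"))  # prints: AAAGGG
print(mrna_sequence(chrom, [(1, 3), (7, 9)], "-"))  # prints: CCCTTT
```

Passing the CDS segments of a transcript instead of its exons gives the corresponding entry of the CDS sequences file under the same assumptions.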
Protein sequences File containing the collection of sequences of the proteins translated from the annotated CDS sequences.
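Translating a CDS reads it three bases at a time and maps each codon to an amino acid. The sketch below assumes the standard genetic code (NCBI translation table 1) and a CDS that starts on a codon boundary; organelle codes and alternative start codons are not handled. `translate_cds` is a hypothetical name, and ambiguous codons become `X`.

```python
# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds: str, to_stop: bool = True) -> str:
    """Translate a CDS into a protein using the standard genetic code.

    Stops at the first stop codon when `to_stop` is True; otherwise
    stop codons are written as '*'. Trailing partial codons are ignored.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # ambiguous codon -> X
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCTGGTAA"))                 # prints: MAW
print(translate_cds("ATGGCCTGGTAA", to_stop=False))  # prints: MAW*
```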
Repeats annotation File containing the positional information of all repetitive regions (e.g. transposable elements, low-complexity regions).
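One common use of the positional information is measuring how much of each sequence is repetitive. Repeat annotations often overlap, so intervals must be merged before their lengths are summed. This is a minimal sketch, assuming each repeat is given as (sequence ID, start, end) with 1-based, inclusive coordinates; `repeat_coverage` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple


def repeat_coverage(intervals: List[Tuple[str, int, int]]) -> Dict[str, int]:
    """Return, per sequence, the number of bases covered by at least one repeat.

    Overlapping intervals are merged so that no base is counted twice.
    """
    by_seq: Dict[str, List[Tuple[int, int]]] = {}
    for seq_id, start, end in intervals:
        by_seq.setdefault(seq_id, []).append((start, end))

    covered: Dict[str, int] = {}
    for seq_id, spans in by_seq.items():
        total = 0
        cur_start: Optional[int] = None
        cur_end: Optional[int] = None
        for start, end in sorted(spans):
            if cur_end is None or start > cur_end + 1:
                # Gap before this interval: close the current merged block.
                if cur_end is not None:
                    total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
            else:
                # Overlapping or adjacent: extend the current merged block.
                cur_end = max(cur_end, end)
        if cur_end is not None:
            total += cur_end - cur_start + 1
        covered[seq_id] = total
    return covered


repeats = [("chr1", 1, 100), ("chr1", 50, 150), ("chr1", 200, 209), ("chr2", 1, 10)]
print(repeat_coverage(repeats))  # prints: {'chr1': 160, 'chr2': 10}
```

Dividing each count by the length of the corresponding sequence gives its repeat fraction.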
Unplaced sequences For pseudomolecule-scale assemblies, this file collects all the assembled genomic sequences that could not be ascribed to any chromosome.