File Formats
FASTA
The FASTA format is a text-based format for representing nucleotide or amino acid (protein) sequences, using a single-letter code for each nucleotide or amino acid.
The file is composed of 2 parts:
- Description line: begins with the ">" (greater-than) symbol and holds a summary description of the sequence. It starts with the sequence ID, which can be followed by one or more comments.
- Sequence representation: the sequence itself. It may be written on a single line or split across multiple lines with the same number of characters (the last line may be shorter).
Example:
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
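The two-part layout described above can be read with a few lines of Python. This is a minimal sketch, not a reference parser: the function name `read_fasta`, the choice to split the description line at the first whitespace, and the comment text in the sample record are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (sequence ID, comment, sequence) for each FASTA record.

    Assumption: the sequence ID is the first whitespace-delimited token
    after ">", and everything after it on that line is the comment.
    """
    seq_id = None
    comment = ""
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if seq_id is not None:
                yield seq_id, comment, "".join(chunks)
            header = line[1:].split(None, 1)
            seq_id = header[0] if header else ""
            comment = header[1] if len(header) > 1 else ""
            chunks = []
        elif seq_id is not None:
            # Sequence lines are joined, so single-line and multi-line
            # records produce the same result.
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, comment, "".join(chunks)


# Hypothetical two-line record; the comment text is made up.
example = """>SEQUENCE_1 hypothetical comment
MTEITAAMVK
ELRESTGAGM
"""
records = list(read_fasta(example.splitlines()))
```

Joining the sequence lines means the line width used in the file does not affect the parsed sequence.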
Data type
Gene annotation
File containing the location of the gene loci and the description of their structure and of the associated subfeatures (e.g. mRNAs, exons, CDSs, etc.).
CDS sequences
File containing the sequences of the coding portion (CDS) of the transcribed mRNAs.
Functional annotation
File containing the functional annotation of the mRNAs encoded in the gene loci.
Complete genome assembly
File containing the complete collection of genomic sequences describing the genome of the species. It contains:
- For pseudomolecule-scale assemblies: all chromosome sequences and all unplaced sequences;
- For diploid pseudomolecule-scale assemblies: all chromosome sequences for both haplotypes and all unplaced sequences;
- For diploid draft assemblies: all sequences of the primary assembly and all haplotig sequences.
Genome haplotype 1/2
File containing the collection of pseudomolecule sequences for haplotype 1 or haplotype 2 of a diploid pseudomolecule-scale assembly.
mRNA sequences
File containing the collection of sequences of the transcribed mRNAs, each obtained by concatenating all the exons assigned to it.
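As an illustration of how an mRNA sequence can be obtained by concatenating exons, the sketch below splices exon intervals out of a chromosome sequence. It rests on assumptions not stated on this page: exon coordinates are taken to be 1-based and inclusive (as in GFF3), transcripts on the minus strand are reverse-complemented, and the sequence is kept in the DNA alphabet. The names `splice_mrna` and `reverse_complement` are hypothetical.

```python
from typing import List, Tuple

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def splice_mrna(chromosome: str, exons: List[Tuple[int, int]], strand: str = "+") -> str:
    """Concatenate exon intervals into one mRNA sequence.

    Assumption: each exon is (start, end), 1-based and inclusive,
    given on the forward strand of `chromosome`.
    """
    ordered = sorted(exons)  # genomic order, lowest start first
    spliced = "".join(chromosome[start - 1:end] for start, end in ordered)
    # For minus-strand transcripts, flip the joined sequence once.
    return reverse_complement(spliced) if strand == "-" else spliced


# Made-up 12-base chromosome with two exons of 3 bases each.
chromosome = "AAACCCGGGTTT"
forward = splice_mrna(chromosome, [(1, 3), (7, 9)], "+")
reverse = splice_mrna(chromosome, [(1, 3), (7, 9)], "-")
```

Sorting the exons before joining keeps the result independent of the order in which they are listed in the annotation.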
Protein sequences
File containing the collection of protein sequences translated from the annotated CDS sequences.
Repeats annotation
File containing the positional information for all repetitive regions (e.g. transposable elements, low complexity regions, etc.).
Unplaced sequences
For pseudomolecule-scale assemblies, this file collects all the assembled genomic sequences that could not be assigned to any chromosome.