Downloading Swiss-Prot FASTA sequences and creating a BLAST protein database

by rnnh
GNU/Linux ◆ xterm-256color ◆ bash

In this video, the FASTA amino acid sequences of Swiss-Prot are downloaded, and a BLAST protein database is created from these sequences using makeblastdb. UniProtKB/Swiss-Prot is a manually annotated, non-redundant protein sequence database. As it is well-annotated and curated, the Swiss-Prot database gives informative results when searched locally using blastp and blastx. The link used in the wget command is copied and pasted from the UniProt downloads page. This is the full link to the compressed FASTA sequences of the Swiss-Prot database: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
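The download step can be sketched roughly as follows. The URL is the one quoted above; the variable and function names (SPROT_URL, SPROT_FILE, download_swissprot) are illustrative assumptions, and the exact wget options used in the recording are not stated here, so the plainest form of the command is shown.

```shell
#!/usr/bin/env bash
# Link to the compressed Swiss-Prot FASTA sequences, as quoted in the description
SPROT_URL="ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz"

# wget saves the download under the last component of the URL
SPROT_FILE="${SPROT_URL##*/}"

# Hypothetical helper: download the file into the current directory
download_swissprot() {
    wget "$SPROT_URL"
}

# download_swissprot   # uncomment to start the download
```

After a call to download_swissprot, the compressed file named in SPROT_FILE should be present in the working directory.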

These FASTA amino acid sequences are compressed into a .gz (gzip) file. Before using the makeblastdb command, this FASTA file is uncompressed using gunzip, turning uniprot_sprot.fasta.gz into uniprot_sprot.fasta. Once the FASTA file is downloaded and uncompressed, makeblastdb is used to create a BLAST protein database of the amino acid sequences in this FASTA file. This BLAST protein database is named swissprot, and consists of three binary files.
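The two steps described above might look something like this in a script. The file names and the database name swissprot come from the description; the function names are illustrative assumptions, and only the basic makeblastdb options (-in, -dbtype, -out) are used, which may not match every option typed in the recording.

```shell
#!/usr/bin/env bash
FASTA_GZ="uniprot_sprot.fasta.gz"
FASTA="${FASTA_GZ%.gz}"   # uniprot_sprot.fasta
DB_NAME="swissprot"

# Hypothetical helper: gunzip replaces the .gz file with the uncompressed FASTA file
uncompress_fasta() {
    gunzip "$FASTA_GZ"
}

# Hypothetical helper: build the protein database from the FASTA file
make_swissprot_db() {
    # -dbtype prot : the input contains amino acid sequences
    # -out         : base name shared by the database files
    makeblastdb -in "$FASTA" -dbtype prot -out "$DB_NAME"
}

# uncompress_fasta && make_swissprot_db   # uncomment to run both steps
```

Depending on the installed BLAST+ version, makeblastdb may write more files than the three mentioned above; they all share the swissprot base name.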

Once the BLAST protein database is created, blastp and blastx can be used to search sequences against it. This database can be selected using the argument -db swissprot with blastp or blastx (the path to the swissprot database will need to be given if the command is run from a different directory).
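As a rough sketch of such searches, the helpers below pass the full path to the database so they can be run from any directory. The directory in DB_DIR, the function names, and the choice of tabular output (-outfmt 6) are assumptions for illustration and are not taken from the recording.

```shell
#!/usr/bin/env bash
DB_DIR="$HOME/blastdb"   # hypothetical directory holding the swissprot database files
DB_NAME="swissprot"

# Print the value to pass to -db: directory plus database base name
db_path() {
    printf '%s/%s\n' "$DB_DIR" "$DB_NAME"
}

# Hypothetical helper: search protein queries ($1 is a FASTA file) with blastp
search_proteins() {
    blastp -query "$1" -db "$(db_path)" -outfmt 6 -out "$1.blastp.tsv"
}

# Hypothetical helper: search nucleotide queries ($1 is a FASTA file) with blastx,
# which translates them before comparing against the protein database
search_nucleotides() {
    blastx -query "$1" -db "$(db_path)" -outfmt 6 -out "$1.blastx.tsv"
}

# search_proteins my_proteins.fasta   # my_proteins.fasta is a placeholder query file
```

When the command is run in the same directory as the database files, -db swissprot on its own is enough, as described above.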

Made for https://rnnh.github.io/bioinfo-notebook/