Introduction to the Standalone WWW Blast server
This standalone BLAST server was designed to resemble the regular NCBI BLAST server and the command-line NCBI BLAST program "blastall". It incorporates most features of the NCBI BLAST programs and should be relatively easy to use. The main point to note is that this server DOES NOT support request queuing or load balancing. As soon as the user presses the "Search" button, BLAST starts immediately, provided the entered information is valid. This server is therefore not intended to handle the heavy load that a public service may face. The standalone server assumes that the user has their own BLAST database to search and wants a simple WWW interface to that search. It is STRONGLY recommended that the user have experience installing and running the standalone NCBI BLAST programs.
After the files are uncompressed, the server is ready to be used immediately. Customizations to the program are welcome and may be made by experienced programmers using the source code, which is also provided. Recompiling the server executables requires that the programmer have compiled NCBI toolkit libraries available. The toolkit may be downloaded from the NCBI FTP site: ftp://ncbi.nlm.nih.gov
New in this version:
- PSI Blast now has new advanced statistics and the ability to produce Smith-Waterman alignments
- Added support for XML output.
After downloading the file wwwblast.Your_platform.tar.gz to your computer, place it in the document directory of the HTTPD server and uncompress it with:
gzip -d wwwblast.Your_platform.tar.gz
tar -xvpf wwwblast.Your_platform.tar
Please note that the "p" parameter in the tar options is significant: it preserves the file access permissions stored in the distribution. The temporary directory for BLAST overview images (TmpGifs) should have 777 permissions, and the logfiles (wwwblast.log and psiblast.log) should have 666.
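If the permissions were not preserved, they could be restored with a short script. This is only a minimal sketch: the function name set_wwwblast_permissions is hypothetical (it is not part of the distribution), and it assumes it is given the path of the unpacked "blast" directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the distribution): apply the
# permissions described above to an unpacked "blast" directory.
#   $1 - path of the "blast" directory
set_wwwblast_permissions() {
    blast_dir=$1
    # The overview image directory must be writable by the HTTPD account.
    chmod 777 "$blast_dir/TmpGifs" || return 1
    # The logfiles must be writable as well; skip any that are absent.
    for log in wwwblast.log psiblast.log; do
        if [ -f "$blast_dir/$log" ]; then
            chmod 666 "$blast_dir/$log" || return 1
        fi
    done
}
```

It might be called as: set_wwwblast_permissions ./blast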
After you uncompress the distribution file, a "blast" directory will be created. You can access the sample BLAST HTML input forms using these URLs:
- http://your_hostname/blast/blast.html
- http://your_hostname/blast/blast_cs.html
- http://your_hostname/blast/psiblast.html
- http://your_hostname/blast/psiblast_cs.html
This distribution comes with 2 BLAST databases: "test_aa_db", a sample protein database, and "test_na_db", a sample nucleotide database. These databases are configured to be searchable immediately with the corresponding BLAST programs.
- Root directory (./blast):
  blast.cgi        | BLAST search start-up C-shell file
  blast_cs.cgi     | BLAST search start-up C-shell file for version with Entrez lookups
  .nlmstmanrc      | Configuration file for the graphical overview image (do not edit!)
  blast.html       | sample BLAST search input HTML form
  blast_cs.html    | sample BLAST search input HTML form for version with Entrez lookups
  blast.rc         | Default configuration file for the WWW BLAST server
  psiblast.rc      | Default configuration file for the PSI/PHI WWW BLAST server
  psiblast.cgi     | PSI/PHI BLAST search start-up C-shell file
  psiblast_cs.cgi  | PSI/PHI BLAST search start-up C-shell file for version with Entrez lookups
  psiblast.html    | sample PSI/PHI BLAST search input HTML form
  psiblast_cs.html | sample PSI/PHI BLAST search HTML form for version with Entrez lookups
  psiblast.REAL    | Main PSI/PHI BLAST server executable
  psiblast_cs.REAL | Main PSI/PHI BLAST server executable for version with Entrez lookups
  xmlblast.html    | sample form that produces only XML Blast Output
  xmlblast.cgi     | sample start-up C-shell to produce XML content type
  blast_form.map   | Auxiliary map file for the front BLAST image
  nph-viewgif.cgi  | CGI program used to view and delete overview images
  readme.html      | this documentation
  wwwblast.log     | default logfile
  psiblast.log     | default PSI/PHI Blast logfile
- ./data: matrices used in BLAST searches
- ./db: files of the test BLAST databases test_aa_db and test_na_db. This directory also contains the binary of the formatdb program.
- ./docs: HTML pages used in the sample BLAST search input page
- ./images: images used in the sample BLAST search input page
- ./Src: source directory for the WWW BLAST server and the formatdb program
- ./Src/XML: source directory for the files related to the creation of XML output
  - blstxml.asn - ASN.1 definition for Blast XML
  - blstxml.dtd - corresponding DTD
- ./TmpGifs: storage for temporary BLAST overview GIF files
To set up databases for the standalone WWW BLAST server, follow these steps:
- Copy the file of concatenated FASTA entries that will be used as a database into the "./db" directory.
- Run the "formatdb" program there to format the database.
- Add the name of the database to the server configuration file.
- Add the name of the database to the (PSI/PHI) WWW BLAST search form.
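As an illustration of the formatting step, the sketch below builds a formatdb command line. The helper name formatdb_command and the file name my_proteins.fasta are hypothetical. The flags used are -i (input file), -p (T for protein, F for nucleotide) and -o (T to parse SeqIds).

```shell
#!/bin/sh
# Hypothetical helper: print the formatdb command for a FASTA file.
#   $1 - FASTA file name inside ./db
#   $2 - "protein" or "nucleotide"
formatdb_command() {
    case $2 in
        protein)    is_protein=T ;;
        nucleotide) is_protein=F ;;
        *) echo "unknown sequence type: $2" >&2; return 1 ;;
    esac
    # -o T asks formatdb to parse SeqIds and build indexes.
    echo "formatdb -i $1 -p $is_protein -o T"
}

# Print the command for a hypothetical protein FASTA file.
formatdb_command my_proteins.fasta protein
```

The printed command would then be run from inside the "./db" directory, after which the database name still has to be added to the configuration file and to the search form.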
One significant feature of the PSI/PHI Blast server is that the FASTA files
used as the source for BLAST databases must have a GI number in the SeqId
field. This is practically always true for FASTA files from the NCBI FTP site.
Local databases cannot be used with this version of PSI/PHI Blast unless
they have a ">gi|12345..." prefix in the definition line.
Databases for PSI/PHI Blast should always be created with formatdb using the
"-o T" flag. The test database "test_aa_db" was created using this flag, and
the database "test_na_db" was created without it.
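Before formatting a local FASTA file for PSI/PHI Blast, it may help to check that every definition line carries the ">gi|" prefix. The following is a minimal sketch; the helper name count_non_gi_deflines is hypothetical, and it checks only for the prefix, not for a valid GI number.

```shell
#!/bin/sh
# Hypothetical helper: count FASTA definition lines that do not start
# with the ">gi|" prefix. A result of 0 means every entry has the prefix.
#   $1 - FASTA file to inspect
count_non_gi_deflines() {
    # Select the definition lines, then count those lacking the prefix.
    grep '^>' "$1" | grep -c -v '^>gi|'
}
```

Any count above zero suggests the file needs its definition lines adjusted before it can serve as a PSI/PHI Blast database.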
If this distribution was not installed in the "/blast" directory under the
HTTPD document root directory, then the path to the distribution should be
set with the WWW_ROOT_PATH environment variable in the file psiblast.cgi or
psiblast_cs.cgi.
Both regular Blast and PSI/PHI Blast have client/server versions for
Entrez gi/accession lookups and for printing Taxonomy reports. The
client/server interface from the user to NCBI should be configured as for
any other NCBI client/server program. If the program "blastcl3" works
without problems, this server should also work. If the user is behind a
firewall, the default configuration will fail to work properly and special
configuration will be required. A general explanation of client/server
configuration to NCBI is beyond the scope of this document. Users who have
problems with such configuration should write to
info@ncbi.nlm.nih.gov
for further assistance.
The ability to produce XML output was added to this server. The XML
definition of BLAST output is tied to a simple ASN.1 specification designed
for this purpose. These definitions may be found in the directory ./Src/XML.
Any recommendations for improvements to this (possible) standard may be sent
to blast-help@ncbi.nlm.nih.gov or directly to me at shavirin@ncbi.nlm.nih.gov
XML may be printed by setting "Alignment view" in the blast.html (or
blast_cs.html) page to "Blast XML".
The resulting page will look empty, but if you open the page source (in
Netscape: View -> Page source) you will see the complete XML document.
As an example of XML usage, the sample page "xmlblast.html" was created. This
form calls the starter C-shell script "xmlblast.cgi", which sets the
Content-type of the output to ncbi/xmlblast. If the browser can handle this
type of content using a plug-in or a helper application, it is possible to
design a custom BLAST output viewer.
This standalone server uses the same tag convention as the regular NCBI BLAST server. The sample BLAST search form may be changed to accommodate the particular needs of the user in a custom search. Here is the list of these tags and their meanings. If a tag is missing from the search input page, it takes its default value. The exceptions are the tags PROGRAM, DATALIB and SEQUENCE (or SEQFILE), which must always be set.
- PROGRAM
- name of the BLAST program. Supported values are blastn, blastp, blastx, tblastx and tblastn
- DATALIB
- name of the database(s) to search. This implementation allows multiple databases to be searched. To use multiple databases, several "DATALIB" tags should be placed on the page, for example as checkboxes (see, for example, the Microbial Genomes BLAST databases page at NCBI). Note that all of these databases must be listed in the server configuration file.
- SEQUENCE and SEQFILE
- these tags are used to pass the sequence. The SEQUENCE tag is checked first for the input sequence; if it is missing, the SEQFILE tag is used instead.
- UNGAPPED_ALIGNMENT
- the default BLAST search is a gapped search; if set, this tag turns gapped alignment off
- MAT_PARAM
- used to set 3 parameters at the same time. The value for this tag should be in the format "mat_name d1 d2", where mat_name is the string name of the matrix (BLOSUM62, etc.), d1 is the integer cost to open a gap and d2 is the cost to extend a gap (the -G and -E parameters in blastall, respectively)
- GAP_OPEN
- sets the cost to open a gap; 0 or a missing tag invokes the default behavior
- GAP_EXTEND
- sets the cost to extend a gap; 0 or a missing tag invokes the default behavior
- X_DROPOFF
- Dropoff (X) for blast extensions in bits (default if zero) (-y parameter in "blastpgp" program)
- GENETIC_CODE
- Query Genetic code to use (for blastx only)
- THRESHOLD_1
- Threshold for extending hits in first pass in multipass model search (-f in blastall)
- THRESHOLD_2
- Threshold for extending hits in second pass in multipass model search
- MATRIX
- Matrix (default is BLOSUM62) (-M in blastall)
- EXPECT
- Expectation value (-e in blastall)
- NUM_OF_BITS
- Number of bits to trigger gapping (-N in blastpgp)
- NCBI_GI
- If the formatted database uses SeqIds in the NCBI format, this option turns on printing of gis together with accessions.
- FILTER
- Multiple instances of values of this tag are concatenated and passed to the engine as "filter_string" ("L" for low complexity and "m" if filter should be set for lookup table only) - any letter will turn default filtering on - DUST for nucleotides and SEG for proteins (-F in blastall)
- DESCRIPTIONS
- Number of one-line descriptions in the output (-v in blastall)
- ALIGNMENTS
- Number of alignments to show (-b in blastall)
- COLOR_SCHEMA
- Color schema to use when printing the alternative alignment. This option is valid only for the blastp and blastn programs. If set, it overrides the option set by "ALIGNMENT_VIEW"
- TAX_BLAST
- Print taxonomy reports. This option is valid only for the client/server version of regular Blast
- XML_OUTPUT
- Print XML Blast output. All other alignment view options will be disabled
- OTHER_ADVANCED
- this tag allows input of a string analogous to the command-line parameters of blastall. A parameter set in the OTHER_ADVANCED tag overrides all other settings of that parameter. Supported options include:
  - -G  gap open cost
  - -E  gap extend cost
  - -q  penalty for nucleotide mismatch
  - -r  reward for nucleotide match
  - -e  expect value
  - -W  wordsize
  - -v  number of descriptions to print
  - -b  number of alignments to show
  - -K  number of best hits from a region to keep
  - -Y  effective search space
- ALIGNMENT_VIEW
- will set type of alignment to show. Available options include:
- 0 - Pairwise
- 1 - master-slave with identities
- 2 - master-slave without identities
- 3 - flat master-slave with identities
- 4 - flat master-slave without identities
- OVERVIEW
- used to turn on or off printing of alignment overview image
- WWW_BLAST_TYPE
- this special tag is used to distinguish different BLAST search types. See the description of the configuration file.
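Because the search form is ordinary HTML, a search can in principle be submitted without a browser by sending the same tags directly to the CGI program. The sketch below is one way this might look, with several assumptions: the curl utility is installed, the CGI program accepts a multipart form POST (check the FORM element in blast.html to confirm), and the helper name submit_blast, the CURL_BIN variable and the host name are all hypothetical. The EXPECT value of 10 is only an example.

```shell
#!/bin/sh
# Hypothetical helper: submit a search to blast.cgi with curl.
#   $1 - host name of the HTTPD server
#   $2 - value for the PROGRAM tag
#   $3 - value for the DATALIB tag
#   $4 - file holding the query sequence (sent as the SEQUENCE tag)
# CURL_BIN may be set to "echo" for a dry run that only prints the arguments.
submit_blast() {
    # Assumption: the server accepts multipart/form-data (curl -F).
    ${CURL_BIN:-curl} -s \
        -F "PROGRAM=$2" \
        -F "DATALIB=$3" \
        -F "SEQUENCE=<$4" \
        -F "EXPECT=10" \
        -F "ALIGNMENT_VIEW=0" \
        "http://$1/blast/blast.cgi"
}
```

It might be called as: submit_blast your_hostname blastp test_aa_db query.fasta > result.html. Only PROGRAM, DATALIB and SEQUENCE are required; the other tags are included to show how optional tags would be passed.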
The default configuration file is "blast.rc" and the default logfile is "wwwblast.log". Setting the WWW_BLAST_TYPE tag to a specific value may change these names. This is useful if several different search input pages use the same CGI search engine but differ significantly in content and priorities. Here is the sample configuration file that comes with this distribution:
# Number of CPUs to use for a single request
#
NumCpuToUse 4
#
# Here is list of combination program/database,
# that allowed by BLAST service. Format: ...
#
blastn test_na_db
blastp test_aa_db
blastx test_aa_db
tblastn test_na_db
tblastx test_na_db
This file sets how many CPUs of the computer will be used in a BLAST search and which databases may be used with which programs. The logfile currently stores only limited information, but programmers may update it to store more values. Please note that HTTPD servers usually run under accounts that do not have write access to the disk, so for the logfile to be written its permissions should be set to 666.
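When a search is rejected, it can be useful to confirm that the program/database pair is actually listed in the configuration file. The sketch below assumes the layout seen in the sample file (program name in the first field, database names in the remaining fields, "#" starting a comment line). The helper name is_search_allowed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a program/database pair is listed in a
# configuration file laid out like the sample blast.rc.
#   $1 - configuration file, $2 - program name, $3 - database name
is_search_allowed() {
    awk -v prog="$2" -v db="$3" '
        /^#/ { next }                  # skip comment lines
        $1 == prog {                   # line for the requested program
            for (i = 2; i <= NF; i++)
                if ($i == db) found = 1
        }
        END { exit !found }            # status 0 only when the pair was seen
    ' "$1"
}
```

For example, with the sample file, is_search_allowed blast.rc blastp test_aa_db would succeed, while blastp with test_na_db would not.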
There is a way to debug these programs.
- Add the line "setenv DEBUG_COMMAND_LINE TRUE" to the *.cgi file
(uncomment it if it is already present)
- Run the search that causes the problem. This should create the
file "/tmp/__web.in".
- Set all necessary environment variables on the command line (BLASTDB at
least) and run from the command line: "blast.REAL < /tmp/__web.in"
This should run the problematic search without the WWW server. If it results
in a core dump, you may examine the core file with:
dbx blast.REAL core
and then use the "where" command to print the stack.
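The replay step can be wrapped in a small Bourne shell function. This is a sketch only: the helper name replay_search is hypothetical, it sets just the BLASTDB variable, and your installation may need further environment variables.

```shell
#!/bin/sh
# Hypothetical helper: replay a captured request outside the web server.
#   $1 - directory holding the BLAST databases (exported as BLASTDB)
#   $2 - server executable, for example ./blast.REAL
#   $3 - captured input file (defaults to /tmp/__web.in)
replay_search() {
    input=${3:-/tmp/__web.in}
    # Refuse to run if the captured request is missing or unreadable.
    if [ ! -r "$input" ]; then
        echo "captured request not found: $input" >&2
        return 2
    fi
    BLASTDB=$1
    export BLASTDB
    # Feed the captured request to the executable on standard input.
    "$2" < "$input"
}
```

It might be called from the "blast" directory as: replay_search ./db ./blast.REAL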
Sergei Shavirin
Last modified: Fri Aug 10 13:51:15 EST 2000