INSTALLING WEBBLAT

Installing A Web-Based Blat Server involves four major steps:
 1) Creating sequence databases.
 2) Running the gfServer program to create in-memory indexes of the databases.
 3) Editing the webBlat.cfg file to tell it what machine and port the gfServer(s)
    are running on, and optionally customizing the webBlat appearance to users.
 4) Copying the webBlat executable and webBlat.cfg to a directory where the web server can
    execute webBlat as a CGI.

CREATING SEQUENCE DATABASES

You create databases with the program faToTwoBit. Typically you'll create
a separate database for each genome you are indexing.  Each database can
contain up to four billion bases of sequence in an unlimited number of
records.  The databases for webPcr and webBlat are identical.

The input to faToTwoBit is one or more fasta format files each of which
can contain multiple records.  If the sequence contains repeat sequences,
as is the case with vertebrates and many plants, the repeat sequences can
be represented in lower case and the other sequence in upper case.  The gfServer
program can be configured to ignore the repeat sequences.  The output of
faToTwoBit is a file which is designed for fast random access and efficient
storage.  The output files store four bases per byte.  They use a small amount
of additional space to store the case of the DNA and to keep track of runs of
N's in the input.  Non-N ambiguity codes such as Y and U in the input sequence
will be converted to N.

Here's how a typical installation might create a mouse and a human genome database:
    cd /data/genomes
    mkdir twoBit
    faToTwoBit human/hg16/*.fa twoBit/hg16.2bit
    faToTwoBit mouse/mm4/*.fa twoBit/mm4.2bit
There's no need to put all of the databases in the same directory, but it can
simplify bookkeeping.  

The databases can also be in the .nib format which was used with blat and
gfClient/gfServer until recently.  The .nib format only packed 2 bases per
byte, and could only handle one record per nib file.  Recent versions of blat
and related programs can use .2bit files as well.

CREATING IN-MEMORY INDICES WITH GFSERVER

The gfServer program creates an in-memory index of a nucleotide sequence database.  
The index can either be for translated or untranslated searches.  Translated indexes
enable protein-based blat queries and use approximately two bytes per unmasked base 
in the database.  Untranslated indexes are used nucleotide-based blat queries as well
as for In-silico PCR.  An index for normal blat uses approximately 1/4 byte per base.  For
blat on smaller (primer-sized) queries or for In-silico PCR a more thorough index that
requires 1/2 byte per base is recommended.  The gfServer is memory intensive but typically
doesn not require a lot of CPU power.  Memory permitting multiple gfServers can be
run on the same machine.  

A typical installation might go:
    ssh bigRamMachine
    cd /data/genomes/twoBit
    gfServer start bigRamMachine 17779 hg16.2bit &
    gfServer -trans -mask start bigRamMachine 17778 hg16.2bit &
the -trans flag makes a translated index.   It will take approximately
15 minutes to build an untranslated index, and 45 minutes to build a
translate index.  To build an untranslated index to be shared with 
In-silico PCR do
    gfServer -stepSize=5 bigRamMachine 17779 hg16.2bit &
This index will be slightly more sensitive, noticeably so for small query sequences,
with blat.

EDITING THE WEBBLAT.CFG FILE

The webBlat.cfg file tells the webBlat program where to look for gfServers and
for sequence.  The basic format of the .cfg file is line oriented with the
first word of the line being a command.  Blank lines and lines starting with #
are ignored.  The webBlat.cfg and webPcr.cfg files are similar. The webBlat.cfg
commands are:
   gfServer  -  defines host and port a (untranslated) gfServer is running on, the 
   		associated sequence directory, and the name of the database to display in
                the webPcr web page.
   gfServerTrans - defines location of a translated server.
   background - defines the background image if any to display on web page
   company    - defines company name to display on web page
   tempDir    - where to put temporary files.  This path is relative to where the web
                server executes CGI scripts.  It is good to remove files that haven't
		been accessed for 24 hours from this directory periodically, 
		via a cron job or similar mechanism.
The background and company commands are optional.  The webBlat.cfg file must
have at least one valid gfServer or gfServerTrans line, and a tempDir line.
.  Here is a webBlat.cfg file that you
might find at a typical installation:

company Awesome Research Amalgamated
background /images/dnaPaper.jpg
gfServer bigRamMachine 17778 /data/genomes/2bit/hg16.2bit Human Genome
gfServer bigRamMachine 17779 /data/genomes/2bit/hg16.2bit Human Genome
gfServer mouseServer 17780 /data/genomes/2bit/mm4.2bit Mouse Genome
gfServer mouseServer 17781 /data/genomes/2bit/mm4.2bit Mouse Genome
tempDir ../tmp

PUTTING WEBBLAT WHERE THE WEB SERVER CAN EXECUTE IT

The details of this step vary highly from web server to web server.  On
a typical Apache installation it might be:
     ssh webServer
     cd kent/webBlat
     cp webBlat /usr/local/apache/cgi-bin
     cp webBlat.cfg /usr/local/apache/cgi-bin
assuming that you've put the executable and config file in kent/webBlat.
On OS-X, instead of /usr/local/apache/cgi-bin typically you'll copy stuff
to /LibraryWebServer/CGI-Executables.  Unless you are administering your
own computer you will likely need to ask your local system administrators
for help with this step.
