This directory has the data for the Nov 2003 freeze of GALA.
  panTro1, pt1

Directory structure:
        Readme.txt
        alignments/
                concise/
                hg16/ (axtBest)
                        gapchr*.dat.gz
                        localchr*.dat.gz
                        seqfile.dat.gz
        alternate_genes/
                *_Genes.dat.gz
                *_Genes_exons.dat.gz
        chromInfo.dat.gz
        conserved*.dat.gz
        cpgIsland.dat.gz
        fuguBlat/
                fuguBlat.dat.gz
                fuguBlat_block.dat.gz
        genes/
                exons.dat.gz
                genes.dat.gz
                go_defs_info.dat.gz
        isochore.dat.gz
        mrna/
                chimp_est.dat.gz
                chimp_est_block.dat.gz
                chimp_mrna.dat.gz
                chimp_mrna_block.dat.gz
                nonchimp_est.dat.gz
                nonchimp_est_block.dat.gz
                nonchimp_mrna.dat.gz
                nonchimp_mrna_block.dat.gz
                spliced_est.dat.gz
                spliced_est_block.dat.gz
        net_aligns/
                net_aligns_hg16.dat.gz
        repeats/
                chr*_repeats.dat.gz
        restriction_sites/
                chr*.dat.gz
        tables.txt

NOTES on general file format
.gz     The files were compressed using gzip to save space and download time.

.dat    The files are comma delimited with text fields enclosed in single
        quotes.  Quotes within text are escaped by doubling them.
        A Unix newline separates the table rows.
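
        The format described above can be read with Python's standard csv
        and gzip modules.  The following is a minimal sketch, not a tool
        shipped with GALA: the function names read_dat and read_dat_gz are
        hypothetical, and the sample row is made up for illustration.  It
        assumes nothing beyond what is stated above (comma delimiter,
        single-quoted text, doubled quotes, Unix newlines).

```python
import csv
import gzip
import io


def read_dat(stream):
    """Yield each row of an already-decompressed .dat text stream as a list of strings."""
    # quotechar="'" and doublequote=True match the README: text fields are in
    # single quotes, and a quote inside the text is written as two quotes.
    reader = csv.reader(stream, delimiter=",", quotechar="'", doublequote=True)
    for row in reader:
        yield row


def read_dat_gz(path_or_file):
    """Yield rows from a gzip-compressed .dat file (a path or a binary file object)."""
    # newline="" lets the csv module handle the line endings itself.
    with gzip.open(path_or_file, "rt", newline="") as handle:
        yield from read_dat(handle)


if __name__ == "__main__":
    # Made-up sample row; the real column layouts are defined in tables.txt.
    sample = "'chr1',100,200,'5'' UTR'\n"
    for row in read_dat(io.StringIO(sample)):
        print(row)
```

        All values come back as strings; numeric columns need to be
        converted according to the types listed in tables.txt.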

NOTES on individual files/tables.
alignments -
        These are pairwise alignments between panTro1 and the release named
        by each subfolder, except for the concise directory.
        Human (hg16) alignments are provided.
	The scoring is done using multiz.  
	These files can be used to generate lav files.
	The concise directory has the alignments in the concise format used
	by tools such as strong-hits.
        source: http://bio.cse.psu.edu
alternate genes -
        The alternate gene models are represented by pairs of files named
        after the track name.  These are all on freeze panTro1.
        download source: http://genome.ucsc.edu/
        source: tracks by different sources, indicated by name
chrom_info -
        The chromosome name, start, stop, and species.
        download source: http://genome.ucsc.edu/
conserved_regions.dat -
        The regions that are found conserved in the pairwise alignments
        using strong-hits minus exons from GALA's default set of genes.
        Three levels are measured: 60, 70, and 80 percent identity.
        The alignments are panTro1/hg16.
        source: http://bio.cse.psu.edu
cpgIsland.dat -
        CpG islands in the panTro1 sequence.
        source: http://genome.ucsc.edu/
genes -
        Right now the default gene set is Human RefSeq Genes.
        sources:
        - gene coordinates: http://genome.ucsc.edu/
enzyme_info.dat -
        The enzymes and patterns used for the restriction sites.
        source: http://www.neb.com/neb/products/res_enzymes/re_update_frame.html
fuguBlat -
        The chimp/fugu alignments, done using Blat.  There are 2 tables.
gc_percent.dat -
        The GC percent in fixed-size windows, genome-wide.
        download source: http://genome.ucsc.edu/
isochore.dat -
	This file contains the regions of arbitrary length that have relatively
	uniform GC content.  The table has the chromosome, start and stop points
	as well as the GC%.
	source: http://nekrut.bx.psu.edu/
mrna -
        Data from UCSC Genome Browser mRNA and EST tracks.  2 tables per track,
        with table names reflecting the track name.  These are mostly the psl
        tables in the tables.txt file.
        download source: http://genome.ucsc.edu/
repeats -
        Repetitive elements.
        download source: http://genome.ucsc.edu/
restriction_sites -
        Matches to the enzyme pattern for 128 enzymes on the hg16 sequence.
        source: http://bio.cse.psu.edu
tables.txt -
        The table definitions.  The field types are: integer, which ranges
        from -2,147,483,648 to 2,147,483,647; smallint, which ranges from
        -32,768 to 32,767; and varchar(N), which is a character field with
        up to N characters.  Fields generated from other fields are listed
        with the equation used to generate their values; these fields are
        used for indexing.
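
        The type limits above can be checked mechanically before loading
        a value into a table.  This is only a sketch: the helper name fits
        and the way a type is spelled ("integer", "smallint", "varchar(N)")
        are assumptions for illustration, and the actual layout of
        tables.txt may differ.

```python
import re

# Inclusive ranges from the tables.txt notes:
# integer is a signed 32-bit value, smallint a signed 16-bit value.
RANGES = {
    "integer": (-2**31, 2**31 - 1),    # -2,147,483,648 .. 2,147,483,647
    "smallint": (-2**15, 2**15 - 1),   # -32,768 .. 32,767
}
VARCHAR = re.compile(r"varchar\((\d+)\)$")


def fits(field_type, raw):
    """Return True if the string value `raw` fits the named field type."""
    field_type = field_type.strip().lower()
    if field_type in RANGES:
        low, high = RANGES[field_type]
        try:
            return low <= int(raw) <= high
        except ValueError:
            return False  # not a whole number at all
    match = VARCHAR.match(field_type)
    if match:
        # varchar(N) holds up to N characters.
        return len(raw) <= int(match.group(1))
    raise ValueError("unknown field type: " + field_type)


if __name__ == "__main__":
    print(fits("smallint", "32767"))
    print(fits("smallint", "32768"))
    print(fits("varchar(4)", "chr1"))
```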