This directory contains the data for the Feb 2004 (galGal2) freeze of GALA.

Directory structure:
	Readme.txt
*not available	align_3way/
		log.chr*.dat.gz
     	alignments/
		hg16/ (axtBest)
      			gapchr*.dat.gz
			localchr*.dat.gz
			seqfile.dat.gz

	alternate_genes/
		*_Genes.dat.gz
		*_Genes_exons.dat.gz
	chromInfo.dat.gz
        conserved90_hg16.dat.gz
        conserved80_hg16.dat.gz
 	conserved70_hg16.dat.gz
        conserved_tfbs/
                gg2Hg16/
			chr*.gala.gz
	cpgIsland.dat.gz
	gcPercent.dat.gz
        isochore.dat.gz
        fuguBlat.dat.gz
        fuguBlat_block.dat.gz
        cDNA/
               *.dat.gz
               *_block.dat.gz
	genes/
		exons.dat.gz
		gene_alias.dat.gz
		gene_cdd.dat.gz
		gene_dbids.dat.gz
		gene_expr.dat.gz
		gene_model_prots.dat.gz
*not available		gene_ontology.dat.gz
*not available		gene_orthologs.dat.gz
		gene_product.dat.gz
		genename.dat.gz
		genes.dat.gz
                godef.dat.gz
*not available	microRNA.dat.gz
*not available	multiple_aligns/
	predicted_promoters.dat.gz
*not available	recombRate.dat.gz
*not available	known_regulatory.dat.gz
	repeats/
		repeatschr*.dat.gz
	restriction_sites/
		chr*.dat.gz
*not available	rp_multiple/
		chr*.3wayRP.dat.gz
*not available	snp_allele.dat.gz
*not available	snps.dat.gz
	tables.txt
*not available	tissues.dat.gz
        tfbinding_sites/
  		chr*.total.gz

NOTES on general file format
.gz     The files were compressed using gzip to save space and download time.

.dat    The files are comma delimited with text fields enclosed in single
        quotes.  Quotes within text are escaped by doubling them.
        A Unix newline separates the table rows.
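
        As an illustration of the format described above, the following is a
        minimal Python sketch for reading a .dat.gz file.  It is not part of
        GALA; the function names read_dat and parse_dat are hypothetical, and
        the UTF-8 text encoding is an assumption.  All fields are returned
        as strings, so numeric columns still need converting.

```python
import csv
import gzip


def parse_dat(lines):
    """Yield each row of a .dat table as a list of strings.

    `lines` is any iterable of text lines (an open file, a StringIO, ...).
    Fields are comma delimited, text fields are wrapped in single quotes,
    and a quote inside a text field is written as two single quotes.
    """
    reader = csv.reader(lines, delimiter=",", quotechar="'", doublequote=True)
    for row in reader:
        yield row


def read_dat(path):
    """Yield rows from a gzip-compressed .dat file at `path`.

    The encoding is an assumption; undecodable bytes are replaced rather
    than raising an error.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace",
                   newline="") as handle:
        yield from parse_dat(handle)
```

        For example, list(read_dat("cpgIsland.dat.gz")) would load that table
        into memory; the column meanings are given in tables.txt.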

NOTES on individual files/tables.
alignments -
        Pairwise alignments between galGal2 and the assembly named by each
        subfolder.  Only human (hg16) alignments are provided.
        The best alignments were selected using axtBest.
	These files can be used to generate lav files.
        source: http://bio.cse.psu.edu
alternate genes -
        The alternate gene models are represented by pairs of files named
        after the track name.  These are all on freeze galGal2.
        download source: http://genome.ucsc.edu/
        source: tracks by different sources, indicated by name
chrom_info -
        The name, start, stop, and species for each chromosome.
        download source: http://genome.ucsc.edu/
conserved_regions.dat -
        The regions found to be conserved in the pairwise alignments,
        taken from strong hits minus the exons of GALA's default set of genes.
        Three identity thresholds are provided: 70, 80, and 90 percent.
        The alignment is galGal2/hg16.
        source: http://bio.cse.psu.edu
cpgIsland.dat -
        CpG islands in the galGal2 sequence.
        source: http://genome.ucsc.edu/
enzyme_info.dat -
        The enzymes and patterns used for the restriction sites.
        source: http://www.neb.com/neb/products/res_enzymes/re_update_frame.html
gc_percent.dat -
        The GC Percent track from UCSC for the galGal2 freeze.
        download source: http://genome.ucsc.edu/
isochore.dat -
        The GC Percent track from Anton Nekrutenko's Group at PSU
genes -
        The default set of genes is the RefSeq genes from the UCSC Genome
        Browser.  The Locus Link ID is then used to tie the gene
        coordinates to further annotations on the genes.  Each alternatively
        spliced form is listed as a separate gene, so we assigned
        a unique ID to each gene rather than using the Locus Link ID as the
        primary key.
        sources:
        -evid code from LL map data found at
        ftp.ncbi.nih.gov/genomes/H_sapiens/maps/mapview/elements/LOCUS_objects.gz
        -genbank files *.gbs
        ftp.ncbi.nih.gov/genomes/H_sapiens/CHR_N/*.gbs
        -refseq genes
        http://genome.ucsc.edu/
        -annotations for genes
        ftp.ncbi.nih.gov/refseq/LocusLink/LL_tmpl.gz
                        /repository/UniGene/Hs.data.gz
                        /repository/OMIM/genemap
        http://expression.gnf.org/cgi-bin/index.cgi
repeats -
        Repetitive elements.
        download source: http://genome.ucsc.edu/
restriction_sites -
        Matches to the recognition patterns of 128 enzymes on the galGal2
        sequence.
        source: http://bio.cse.psu.edu
snpall.dat -
snps.dat -
        sources: http://genome.ucsc.edu/  files:bgisnp.txt.gz
tables.txt -
        The table definitions.  The field types are: integer, which ranges
        from -2,147,483,648 to 2,147,483,647; varchar(N), which is a
        character field of up to N characters; and smallint, which ranges
        from -32,768 to 32,767.  Fields derived from other fields are listed
        with the equation used to generate their values; they are used for
        indexing.
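
        The field types above can be checked before loading rows into a
        database.  The following Python sketch is only an illustration: the
        function name fits is hypothetical, and it assumes the type names
        appear in tables.txt exactly as "integer", "smallint", and
        "varchar(N)".

```python
# Ranges as stated in the tables.txt notes.
INTEGER_RANGE = (-2_147_483_648, 2_147_483_647)
SMALLINT_RANGE = (-32_768, 32_767)


def fits(field_type, value):
    """Return True if the string `value` is valid for `field_type`.

    `field_type` is one of "integer", "smallint", or "varchar(N)".
    Raises ValueError for any other type name.
    """
    kind = field_type.strip().lower()
    if kind == "integer":
        low, high = INTEGER_RANGE
        return low <= int(value) <= high
    if kind == "smallint":
        low, high = SMALLINT_RANGE
        return low <= int(value) <= high
    if kind.startswith("varchar(") and kind.endswith(")"):
        max_len = int(kind[len("varchar("):-1])
        return len(value) <= max_len
    raise ValueError("unknown field type: " + field_type)
```

        A value that fails this check (for example a coordinate too large
        for smallint) usually means the wrong column definition was applied.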
tfbinding_sites -
        Matches to 166 transcription factor matrices.  The matrices come
        from Transfac (registration required).  These are matches on the
        galGal2 sequence of 75% or better.
        source: http://bio.cse.psu.edu