Downloads - The Effects of Alternative Splicing on Transmembrane Proteins in the Mouse Genome
This data is distributed in conjunction with the publication "The Effects of Alternative Splicing on Transmembrane Proteins in the Mouse Genome", presented at PSB 2004.  The data describes how alternative splicing alters transmembrane and signal peptide protein motifs in 8067 mouse proteins.  This is a nonredundant set derived from all mouse cDNAs with reasonable genomic alignments.

As detailed in the paper, these proteins were grouped dynamically by gene and splice variant according to the genomic alignment of the associated cDNA sequence.  The proteins actually assessed were derived from the genomic translation of the GenBank CDS regions.  This was a deliberate measure to factor out the effects of genetic variation.  Thus, in some cases, the protein analyzed may differ from the protein in the GenBank record.  All proteins were run through TMHMM and SignalP to identify putative transmembrane and signal peptide motifs, respectively.  The paper assesses how these protein annotations vary between different splice variants of the same gene, and analyzes the relation between these motifs and splice sites.

This data is available in RDF and N3 format, and is organized as follows.  The files all_genes.1.rdf to all_genes.14.rdf contain the data on all 8067 sequences and 6847 genes analyzed.  Because of its size, this data was divided by gene into fourteen files.  The files multi_variant_genes.1.rdf to multi_variant_genes.4.rdf contain a subset of that data: the 904 genes with multiple protein isoforms, with a total of 2118 sequences.  Finally, the file genes_with_differing_annotations.rdf reports on the 138 genes for which the motifs differed between the protein isoforms.  These genes were associated with 396 sequences.  Each file is also provided in N3 format under the same name with the .n3 extension.
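As a convenience for checking a download against the naming scheme described above, the following is a minimal Python sketch that enumerates the expected file names.  The function name expected_files is hypothetical and is not part of the distribution; the sketch assumes only the file names and counts given on this page.

```python
def expected_files(formats=("n3", "rdf")):
    """Return the file names implied by the naming scheme on this page.

    Assumption: every data set is provided once per format, as in the
    download list (14 all_genes parts, 4 multi_variant_genes parts, and
    one genes_with_differing_annotations file).
    """
    names = []
    # Multi-part data sets: (file prefix, number of parts).
    for prefix, parts in (("all_genes", 14), ("multi_variant_genes", 4)):
        for i in range(1, parts + 1):
            names.extend(f"{prefix}.{i}.{ext}" for ext in formats)
    # Single-file data set.
    names.extend(f"genes_with_differing_annotations.{ext}" for ext in formats)
    return names


if __name__ == "__main__":
    for name in expected_files():
        print(name)
```

Comparing this list with the contents of a local directory is one way to confirm that no part is missing before loading the data.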

In this data, each splice variant is represented by one transcript.  The properties associated with each transcript include its protein translation, its set of exons, and its set of annotations.  The exons are described by: genomic start and stop coordinates; whether they're part of a CDS region; the CDS start and stop, if they are not the same as the exon start and stop; and the translation frame, if any.  Each annotation is described by its set of protein spans: ungapped motifs in the protein sequence.  Each protein span is associated with one or more genomic spans, which are ungapped regions of genomic alignment.  If the genomic coordinates of a protein span are not divided by an intron, then the protein span will have one associated genomic span; otherwise it will have more than one.  Finally, each exon and each annotation is flagged according to whether it is common to all variants of the gene.
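The structure described above can be pictured with a small Python sketch.  This is only an illustration of the relationships between transcripts, exons, annotations, protein spans, and genomic spans; all class and field names here are hypothetical and are not the property names used in the RDF or N3 files.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GenomicSpan:
    """An ungapped region of genomic alignment (hypothetical field names)."""
    start: int
    stop: int


@dataclass
class ProteinSpan:
    """An ungapped motif in the protein sequence."""
    start: int
    stop: int
    genomic_spans: List[GenomicSpan] = field(default_factory=list)

    def divided_by_intron(self) -> bool:
        # A span not divided by an intron has exactly one genomic span.
        return len(self.genomic_spans) > 1


@dataclass
class Annotation:
    """A motif annotation, e.g. transmembrane or signal peptide."""
    kind: str
    protein_spans: List[ProteinSpan]
    common_to_all_variants: bool


@dataclass
class Exon:
    genomic_start: int
    genomic_stop: int
    in_cds: bool
    # CDS bounds are given only when they differ from the exon bounds.
    cds_start: Optional[int] = None
    cds_stop: Optional[int] = None
    # Translation frame, if any.
    frame: Optional[int] = None
    common_to_all_variants: bool = False


@dataclass
class Transcript:
    """One splice variant of a gene."""
    protein: str
    exons: List[Exon]
    annotations: List[Annotation]
```

For example, a protein span built with two genomic spans would report divided_by_intron() as true, while a span with a single genomic span would not.  The coordinates used in any such example are made up for illustration.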

For more information, please contact the authors of the paper.
DOWNLOADS
all_genes.1.n3
all_genes.1.rdf
all_genes.2.n3
all_genes.2.rdf
all_genes.3.n3
all_genes.3.rdf
all_genes.4.n3
all_genes.4.rdf
all_genes.5.n3
all_genes.5.rdf
all_genes.6.n3
all_genes.6.rdf
all_genes.7.n3
all_genes.7.rdf
all_genes.8.n3
all_genes.8.rdf
all_genes.9.n3
all_genes.9.rdf
all_genes.10.n3
all_genes.10.rdf
all_genes.11.n3
all_genes.11.rdf
all_genes.12.n3
all_genes.12.rdf
all_genes.13.n3
all_genes.13.rdf
all_genes.14.n3
all_genes.14.rdf
genes_with_differing_annotations.n3
genes_with_differing_annotations.rdf
multi_variant_genes.1.n3
multi_variant_genes.1.rdf
multi_variant_genes.2.n3
multi_variant_genes.2.rdf
multi_variant_genes.3.n3
multi_variant_genes.3.rdf
multi_variant_genes.4.n3
multi_variant_genes.4.rdf