January 15, 2006 Human Exon 1.0 ST Array Update Notes
Updated the design time annotations based on some improvements and bug fixes. These changes affect some of the transcript cluster and exon cluster IDs. They also affect transcript cluster groupings. New versions for both build 34 and build 35 of the human genome are provided. These include new meta probeset lists for doing gene-level signal estimation in ExACT.
- Changed: intron_controls are now bounded with their associated transcripts (approx. 3000)
- Corrected: number_independent_probes and number_nonoverlapping_probes counts
- Corrected: There were some cases where the first exon of a transcript was incorrectly treated as a separate transcript.
- Corrected: There were some cases where a group of PSRs should have been labeled 'extended' and placed in the same transcript, but instead were labeled 'ambiguous' and placed in distinct transcripts.
- Removed: The number_cross_hyb_probes field was removed from probeset lines because its values were incorrect and the field was deemed unnecessary. The self_cross_hybridizes and cross_hybridizes flags on the individual probe lines are correct and sufficient.
- Removed: 109 intron control probesets were removed because they were identical to other intron control probesets.
- Changed: The content of the build 35 gff files was previously partitioned by build 34 location rather than build 35 location. Content is now partitioned by build 35 location.
- Corrected: There were transcript cluster gff lines with incorrect genome coordinates. (They did not span all the contained exon clusters, PSRs, and probesets.)
- Corrected: Evidence comments in the gff files for build 35 were not properly placed in the unmapped file when annotations were not lifted from build 34 to build 35. This is now corrected.
- Changed: Transcript clusters with content which maps to different strands of the same chromosome or different chromosomes are now correctly broken up.
- Changed: Probe sequence in the extra feat field was reverse complemented in order to reflect the actual probe sequence on the chip rather than the sequence of the biological target. Because these are sense target probesets, the probe sequence in these new files is the reverse complement of the biological target it interrogates.
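The coordinate correction noted above requires a transcript cluster line to span all of its contained exon clusters, PSRs, and probesets. A minimal sketch of that check follows, assuming features are reduced to inclusive (start, end) pairs on the same chromosome and strand. The function names and the tuple representation are illustrative assumptions, not part of the gff files or any Affymetrix tool.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (start, end), inclusive, with start <= end.
Interval = Tuple[int, int]


def spanning_interval(children: Iterable[Interval]) -> Interval:
    """Return the smallest interval that covers every child feature."""
    children = list(children)
    if not children:
        raise ValueError("no child features given")
    start = min(s for s, _ in children)
    end = max(e for _, e in children)
    return (start, end)


def cluster_spans_children(cluster: Interval, children: Iterable[Interval]) -> bool:
    """True if the cluster coordinates cover all contained features."""
    lo, hi = spanning_interval(children)
    return cluster[0] <= lo and cluster[1] >= hi
```

For example, children at (100, 200), (150, 400), and (50, 120) need a cluster that covers at least 50 through 400; a cluster recorded as (100, 400) would fail the check.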
New NetAffx CSV Annotation Files
- Based on updated design time annotations described above
- Updated assignments based on the NetAffx 2005 Q4 annotation update.
- Added probeset_id column to transcript cluster CSV file. This field is set to the transcript_cluster_id, not the exon-level probeset ID. This change makes it easier to merge these annotations with gene-level signal estimates using the ExACT software.
- Fixed error in the total_probes and the assignment coverage statistics for transcript clusters.
- Changed ranking of assigned mRNAs by data source (applies to both probe set and transcript cluster CSV files): RefSeq, Ensembl-Transcript, GenBank, Ensembl-EST, Ensembl-Prediction
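The data source ranking listed above can be applied to a list of mRNA assignments with a stable sort. This is a minimal sketch, assuming each assignment is an (accession, source) pair; the pair layout, the function name, and the placeholder accessions in the usage note are hypothetical, and sources not in the list are simply placed last.

```python
# Ranking of data sources as given in these notes, highest priority first.
SOURCE_ORDER = [
    "RefSeq",
    "Ensembl-Transcript",
    "GenBank",
    "Ensembl-EST",
    "Ensembl-Prediction",
]
_RANK = {name: i for i, name in enumerate(SOURCE_ORDER)}


def rank_assignments(assignments):
    """Sort (accession, source) pairs by data source priority.

    sorted() is stable, so pairs from the same source keep their
    original relative order. Unknown sources sort after all known ones
    (an assumption of this sketch).
    """
    return sorted(assignments, key=lambda pair: _RANK.get(pair[1], len(_RANK)))
```

Calling rank_assignments([("mrna_a", "GenBank"), ("mrna_b", "RefSeq")]) puts the RefSeq entry first.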
New probeset sequence file
- Incorporates changes related to new design time build 35 annotation files
- Sequences are relative to build 35 of the human genome (previous release was build 34)
New probe tab and fasta file
- Probe sequence is now the reverse complement of what it was before, in order to reflect the actual probe sequence on the chip rather than the sequence of the biological target. Because these are sense target probesets, the probe sequence in these new files is the reverse complement of the biological target it interrogates.
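To compare sequences from the new probe files against the biological target, or against the previous release, the sequence has to be reverse complemented. A minimal sketch follows, assuming plain DNA strings containing only A, C, G, and T; the function name is illustrative and not taken from any Affymetrix tool.

```python
# Translation table mapping each base to its complement (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, so the same helper converts in either direction between the on-chip probe sequence and the target sequence.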
NetAffx Analysis Center Update
- Updated annotations based on the NetAffx 2005 Q4 annotation update.
- The probe match tool will now show hits that are the reverse complement of the actual biological target, reflecting the fact that these are sense target probes.
- Probe sequences displayed in the probeset view were changed to reflect the sequence on the chip rather than the biological target.
- Updated NetAffx DAS server
Human Exon 1.0 ST Array Sample Data Update
- Changed the chiptype setting in the CEL files from HuEx-1_0-st-v1 to HuEx-1_0-st-v2. This should mitigate chiptype mismatches when using 3rd party software with the unsupported CDF file.
- Added 3 more paired samples to the colon cancer data set. In two of these pairs there is a bright spot on the CEL image. The third pair contains a CEL file which is an outlier based on PLIER residuals.
New IGB bgn files reflecting new design time annotations will be made available as soon as they are ready.