LIBQUEST


LIBQUEST Instructions

Introduction

LIBQUEST is an algorithm to calculate similarity between tandem mass spectra. Mathematically, it is very similar to SEQUEST; it uses a cross-correlation function to measure spectral similarity, except that LIBQUEST compares a real spectrum to another real spectrum, whereas SEQUEST compares a real spectrum to a simulated spectrum of a peptide sequence.

LIBQUEST was written in the C programming language on a UNIX platform. The original reference is:

Yates, J. R., 3rd; Eng, J. K.; McCormack, A. L.; Schieltz, D. Anal Chem 1995, 67, 1426-1436.

Several modifications to the original algorithm were made. The program was modified to use MS2 formatted data files as input. The experimental and reference spectra are both preprocessed in the same manner: only the 100 most intense ions are kept, and the intensity of each peak is replaced by its square root. The scoring scheme was adjusted so that the search score is calculated by the following formula:

Score = 10 * CCer / sqrt(CCee * CCrr)

Where CCer represents the cross-correlation score of the experimental versus the reference spectrum, and CCee and CCrr represent the cross-correlation scores of the experimental spectrum versus itself and the reference spectrum versus itself, respectively. The program was modified to include LAM MPI functions to distribute processing across multiple nodes on a supercomputer.
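The preprocessing and scoring steps described above can be sketched in Python as follows. This is only an illustration, not the LIBQUEST source: the function names, the unit-width m/z binning, and the SEQUEST-style cross-correlation (zero-lag dot product minus the mean over lags of -75 to +75 bins) are assumptions made for the example. Only the top-100 filter, the square-root transform, and the final score formula come from the text.

```python
import math

def preprocess(peaks, top_n=100):
    """Keep the top_n most intense (m/z, intensity) peaks and take the
    square root of each intensity, as described in the text."""
    kept = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    return sorted((mz, math.sqrt(inten)) for mz, inten in kept)

def bin_spectrum(peaks, bin_width=1.0):
    """ASSUMPTION: unit-width m/z bins, keeping the maximum intensity per bin."""
    bins = {}
    for mz, inten in peaks:
        idx = int(mz / bin_width)
        bins[idx] = max(bins.get(idx, 0.0), inten)
    return bins

def cross_correlation(a, b, max_lag=75):
    """ASSUMPTION: SEQUEST-style cross-correlation, i.e. the zero-lag dot
    product minus the mean dot product over the non-zero lags."""
    def dot(lag):
        return sum(v * b.get(i + lag, 0.0) for i, v in a.items())
    background = sum(dot(lag) for lag in range(-max_lag, max_lag + 1)
                     if lag != 0) / (2 * max_lag)
    return dot(0) - background

def libquest_score(exp_peaks, ref_peaks):
    """Score = 10 * CCer / sqrt(CCee * CCrr), the formula given in the text.
    Raises ZeroDivisionError if either spectrum has no peaks."""
    e = bin_spectrum(preprocess(exp_peaks))
    r = bin_spectrum(preprocess(ref_peaks))
    cc_er = cross_correlation(e, r)
    cc_ee = cross_correlation(e, e)
    cc_rr = cross_correlation(r, r)
    return 10.0 * cc_er / math.sqrt(cc_ee * cc_rr)
```

Because CCer equals both CCee and CCrr when a spectrum is compared to itself, the formula normalizes a perfect match to a score of 10, and less similar spectra score lower.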

Running LIBQUEST

Library Construction

There are two ways to create a library that LIBQUEST can search. The first is to create a library from a specified set of MS2 files using the MakeLib Perl script. This library will not include information regarding the peptide sequence assignments for these spectra. This method of library construction is best for clustering applications that make use of the LIBQUEST output. In a directory of MS2 files that should be joined together to create a library, the command is:

/data/6/jrjohns1/perlscripts/MakeLib *ms2

This will create a file called ms2lib.lib. This file can then be used as a reference library for LIBQUEST.

The second method of library construction is to create a library of spectra that have been identified by SEQUEST and filtered by DTASelect, using the DTASelect2Lib Perl script. This library will include peptide sequence assignment information, which will be indicated in the LIBQUEST output. This method of library construction is best for identifying spectra that have already been observed, for example, comparing a synthetic peptide or protein to a set of MS/MS spectra. When running DTASelect, the --DB option must be used, and the -t 0 option must not be used. All other options are up to the user. In a directory that includes MS2 files and the associated DTASelect files, the command to run the script is:

/data/6/jrjohns1/perlscripts/DTASelect2Lib

This will create a library called ms2lib.lib consisting of only spectra that are specified in the DTASelect files.

Running LIBQUEST

Currently, LIBQUEST runs only on skinner. All files must be uploaded to skinner before searching. Three types of files are required: 1) input MS2 files to be searched against the library, 2) a library file generated by one of the library construction methods above, and 3) a libquest.params file. A libquest.params file can be found in the following directory:

/home/jrjohns1/libquest

This file should be copied to a directory in the user's account that contains the MS2 files and the library file, and then modified to specify the full path of the library and the mass tolerance desired for the search.

To submit the jobs on skinner, use the submit_libquest script with the following command:

/home/jrjohns1/libquest/submit_libquest

The user will be prompted to enter the number of nodes to use for the search. The progress of the submitted jobs can be monitored with the qstat command.

LIBQUEST Output Files

LIBQUEST generates .lqt files containing output information. There are two types of lines in a .lqt file: 1) S lines, which contain the input filename, scan number, and precursor mass, and 2) M lines, which contain the scan number of the library spectrum, the original filename of the library spectrum, the original scan number of the library spectrum, the precursor mass of the library spectrum, the LIBQUEST score, the peptide sequence assigned to the library spectrum, and the charge state of the peptide sequence assigned to the library spectrum. At a later time, this manual will be updated with methods for utilizing LIBQUEST output information for clustering applications. For now, the user must interpret the output either manually or by writing a customized Perl script.
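As a starting point for such a script, the following Python sketch reads S and M lines into simple records. The field order follows the description above, but the delimiter is an assumption: the example assumes tab-separated fields with the line type ("S" or "M") in the first column, which should be checked against a real .lqt file. The class and function names are hypothetical. Because libraries built with MakeLib carry no peptide assignments, the sequence and charge fields are treated as optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Match:
    lib_scan: int              # scan number of the library spectrum
    lib_file: str              # original filename of the library spectrum
    lib_orig_scan: int         # original scan number of the library spectrum
    lib_precursor_mass: float
    score: float               # LIBQUEST score
    sequence: Optional[str]    # absent for libraries without assignments
    charge: Optional[int]

@dataclass
class Query:
    filename: str
    scan: int
    precursor_mass: float
    matches: List[Match] = field(default_factory=list)

def parse_lqt(lines):
    """ASSUMPTION: tab-delimited lines whose first field is 'S' or 'M'."""
    queries = []
    for raw in lines:
        parts = raw.rstrip("\n").split("\t")
        if parts[0] == "S" and len(parts) >= 4:
            queries.append(Query(parts[1], int(parts[2]), float(parts[3])))
        elif parts[0] == "M" and len(parts) >= 6 and queries:
            seq = parts[6] if len(parts) > 6 and parts[6] else None
            chg = int(parts[7]) if len(parts) > 7 and parts[7] else None
            queries[-1].matches.append(
                Match(int(parts[1]), parts[2], int(parts[3]),
                      float(parts[4]), float(parts[5]), seq, chg))
    return queries

def best_match(query):
    """Return the highest-scoring M line for a query, or None if it has none."""
    return max(query.matches, key=lambda m: m.score, default=None)
```

A typical use would be `queries = parse_lqt(open("results.lqt"))` (the filename here is a placeholder), followed by a loop that keeps only those queries whose best match exceeds a chosen score threshold.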
