[inria-00269967, v1] gene selection in cancer classification using gpso/svm and ga/svm hybrid algorithms

Author manuscript, published in "Congress on Evolutionary Computation, Singapor : Singapour (2007)" Gene Selection in Cancer Classification using PSO/SVM and
GA/SVM Hybrid Algorithms
Enrique Alba, Jos´e Garc´ıa-Nieto, Laetitia Jourdan and El-Ghazali Talbi Abstract— In this work we compare the use of a Particle
a filter method, is in definition, independent of the learning Swarm Optimization (PSO) and a Genetic Algorithm (GA)
algorithm used after it. The second one, the wrapper method, (both augmented with Support Vector Machines SVM) for
which carries out the feature subset selection and classifi- the classification of high dimensional Microarray Data. Both
cation in the same process, engages a learning algorithm algorithms are used for finding small samples of informative
genes amongst thousands of them. A SVM classifier with 10-
to measure the classification accuracy. From a conceptual fold cross-validation is applied in order to validate and evaluate
point of view, wrapper approaches are clearly advantageous, the provided solutions. A first contribution is to prove that
since the features are selected by optimizing the discriminate P SOSV M is able to find interesting genes and to provide clas-
power of the finally used induction algorithm.
sification competitive performance. Specifically, a new version
of PSO, called Geometric PSO, is empirically evaluated for
Feature selection for gene expression analysis in cancer the first time in this work using a binary representation in
prediction often uses wrapper classification methods [7] to Hamming space. In this sense, a comparison of this approach
discriminate a type of tumor, to reduce the number of genes with a new GASV M and also with other existing methods of
to investigate in case of a new patient, and also to assist literature is provided. A second important contribution consists
in drug discovery and early diagnosis. Several classification in the actual discovery of new and challenging results on six
public datasets identifying significant in the development of a
algorithms could be used for wrapper methods, such as K- variety of cancers (leukemia, breast, colon, ovarian, prostate,
Nearest Neighbor (K-NN) [8] or Support Vector Machines and lung).
(SVM) [9]. By creating clusters a big reduction of the numberof considered genes and an improvement of the classification Microarray technology (DNA microarray) [1] allows to The definition of the feature selection problem is this: simultaneously analyze thousands of genes and thus can give given a set of features F = {f1, ., fi, ., fn}, find a subset important insights about cell’s function, since changes in F ⊆ F that maximizes a scoring function Θ : Γ → G such the physiology of an organism are generally associated with changes in gene expression patterns. Several gene expres- F = argmaxG⊂Γ{Θ(G)}, sion profiles obtained from tumors such as Leukemia [2],Colon [3] and Breast [4] have been studied and compared to where Γ is the space of all possible feature subsets of expression profiles of normal tissues. However, expression F and G a subset of Γ. The optimal feature selection data are highly redundant and noisy, and most genes are problem has been shown to be NP-hard [10]. Therefore, believed to be uninformative with respect to studied classes, only heuristics approaches are able to deal with large size as only a fraction of genes may present distinct profiles for problems. Recently, such advanced structured methods have different classes of samples. So, tools to deal with these been used to explore the huge space of feature subsets, like issues are critically important. These tools should learn to for example metaheuristics as Evolutionary Algorithms and, robustly identify a subset of informative genes embedded specifically, Genetic Algorithms (GAs) [11], [5], [12].
out of a large dataset which is contaminated with high- In this work, we are interested in gene selection and classification of DNA Microarray data in order to distin- In this context, feature selection is often considered as guish tumor samples from normal ones. For this purpose, a necessary preprocess step to analyze these data, as this we propose two hybrid models that use metaheuristics and method can reduce the dimensionality of the datasets and classification techniques. The first one consists of a Particle often conducts to better analyses [6].
Swarm Optimization (PSO) [13] combined with a SVM Two models of feature selection exist depending on approach. PSO is a population based metaheuristic inspired whether the selection is coupled with a learning scheme or by the social behavior of bird flocking or fish schooling.
not. The first one, filter model, which carries out the feature Specifically, a recent version called Geometric PSO [14] subset selection and the classification in two separate phases, (explained in Section II) has been used in this work. The uses a measure that is simple and fast to compute. Hence, second model is based on the popular GA using a spe- E. Alba and J. Garc´ıa-Nieto are with the Department of Lengua- cialized Size-Oriented Common Feature Crossover Operator jes y Ciencias de la Computaci´on, Universidad de M´alaga, (email: (SSOCF) [15], which keeps useful informative blocks and {eat,jnieto}@lcc.uma.es, url http://neo.lcc.uma.es) produces offsprings which have the same distribution than L. Jourdan and E-G. Talbi are with the LIFL/INRIA Futurs-Universit´e de Lille 1, Bˆat M3-Cit´e Scientifique, (email: {jourdan,talbi}@lifl.fr, url the parents. This model will be also combined with SVM in Both proposed approaches are experimentally assessed on six well-known cancer datasets (Leukemia, Colon, Breast, Support Vector Machines, a technique derived from statis- Ovarian [16], Prostate [17] and Lung [18]), discovering new tical learning theory, is used to classify points by assigning and challenging results and identifying specific genes that them to one of two disjoint half spaces [9]. So, SVM our work suggests as significants. Performances of proposed performs mainly a (binary) 2-class classification. For linearly PSO and GA algorithms solving the gene extraction problem separable data, SVM obtains the hyperplane which maxi- (using SVM) are compared in this paper. Specifically, we fo- mizes the margin (distance) between the training samples cused in the capacity of the P SOSV M combination in order and the class boundary. For non linearly separable cases, to provide considerable performance in this matter. In this samples are mapped to a high dimensional space where sense, comparisons with several state of art methods show such a separating hyperplane can be found. The assignment competitive results according to the conventional criteria.
is carried out by means of a mechanism called the kernel The outline of this work as follows. We review the PSO and the SVM techniques in order to introduce our P SOSV M SVM is widely used in the domain of cancer studies, hybrid model in Section II. In Section III, the six microarray protein identification and specially in Microarray data [6], datasets used in this study are described. Experimental results [12], [19]. Unfortunately, in many bioinformatics problems are presented in Section IV, including biological descriptions the number of features is significantly larger than the number of several obtained genes. Finally, we summarize our work of samples. For this reason, tools for decreasing the number and present some conclusions and possible future work in of features in order to improve the classification or to help to identify interesting features (genes) in noisy environmentsare necessary. In addition, SVM can treat data with a large number of genes, but it has been shown that its perfor- mance is increased by reducing the number of genes [20].
In this section, we describe the hybrid P SOSV M approach The hybrid PSO and hybrid GA approaches next proposed for gene selection and classification of Microarray data. The PSO algorithm is designed for obtaining gene subsets assolutions in order to reduce the high number of genes to be later classified. The SVM classifier is used whenever the In order to offer a basic idea of the operation of our fitness evaluation of a tentative gene subset is required.
P SOSV M approach, in Figure 1, we can observe a sim-ple scheme of how features are extracted from the initial microarray dataset and how the resulted subset is evaluated.
Particle Swarm Optimization was first proposed by In a first phase, the metaheuristic algorithm involved, PSO Kennedy and Eberhart in 1995 [13]. PSO is a population in this case, provides a binary encoded particle1 where each based evolutionary algorithm inspired in the social behavior bit2 represents a gene. If a bit is 1, it means that this gene of bird flocking or fish schooling. In the description of PSO, is kept in the subset and 0 indicates that the gene is not the swarm is made up of a certain number of particles (sim- included in the subset. Therefore, the particle length is equal ilar to population of individuals in EAs). At each iteration, to the number of genes in the initial microarray dataset.
all the particles move in the problem space to find the global The original PSO was initially developed for continuous optima. Each particle has a current position vector and a optimization problems. However, lots of practical enginee- velocity vector for directing its movement.
ring problems are formulated as combinatorial optimizationproblems and specifically as binary decisions. Several binary = ω · vki + ϕ1 · rnd1 · (pBesti − xki) + ϕ2 · rnd2 · (gi − xki) (2) versions of PSO can be found in present literature [21], [22].
Nevertheless, these versions consist on ad hoc adaptations from the original PSO and therefore their performances areusually improvable.
Equations 2 and 3 describe the velocity and position update With the aim of facing the gene selection problem, an of a given particle i at a certain iteration k. Equation 2 innovative version of PSO, based on the geometric frame- calculates a new velocity vi for each particle (potential work presented in [14], has been developed in this work.
solution) based on its previous velocity, the particle’s location This version, called Geometric Particle Swarm Optimiza- at which the best fitness so far has been found pBesti, tion (GPSO), enables to us to generalize PSO to virtually and the population global (or local neighborhood, in the any solution representation in a natural and straightforward neighborhood version of the algorithm) location at which way. This property was demonstrated for the cases of Eu- the best fitness so far has been achieved gi. Individual and clidean, Manhattan and Hamming spaces in the referenced social weight are represented by means of ϕ1 and ϕ2 factors work. Even a recently appeared work [23], uses the GPSO respectively. Finally, rnd1 and rnd2 are random numbers for solving the Sudoku Puzzle by means of permutations in range {0, 1}, and ω represents the inertia weight factor.
Equation 3 updates each particle’s position xi in solution 1chromosome in GA and solution (S) in Figure 1 Subset Evaluation
Solution (S). Provided by
Metaheuristic (PSO, GA)
SVM – Classification
10-Fold Cross
Validation
Microarray
Initial Dataset
1 0 1 1 . 0 0 1
Fitness(S)=Accuracy &
Subset Size
A simple scheme of how features (genes) are selected out from the original microarray dataset using a particle with binary encoding. In a second phase, the resulted subset is evaluated by means of a SVM classifier and 10-fold cross validation to obtain the fitness value (accuracy) of such particle.
representation. Since the gene selection problem has been Algorithm 1 Pseudocode of the GPSO for Hamming space.
represented by binary way, specific operators for Hamming space were used in the PSO described here.
2: while not stop condition do
3:
for each particle xi of the swarm S do
D. Geometric Particle Swarm Optimization if f itness(xi) is better than fitness(hi) then
In this version, the location of each particle i is represented as vector xi = xi1, xi2, ., xiN taking each bit xij (with if f itness(hi) is better than fitness(gi) then
j in {1, N }) binary values 0 or 1. The key issue of the GPSO is the concept of particle movement. In this approach, instead of the notion of velocity added to the position, a for each particle xi of the swarm S do
three-parent mask-based crossover (3PMBCX) operator is xi ← 3P MBCX((xi, w1), (gi, w2), (hi, w3)) applied to each particle in order to “move” it. According to the definition of 3PMBCX [14], given three parents a, b and 16: end while
c in {0, 1}n, generate randomly a crossover mask of length n 17: Output: best solution found
with symbols from the alphabet {a, b, c}. Build the offspringfilling each element with the bit from the parent appearingin the crossover mask at the position.
a second phase, after the evaluation of particles (line 4), The pseudocode of the GPSO algorithm for Hamming historical and social position are updated (lines 5 to 10).
spaces is illustrated in Algorithm 1. For a given particle i, Finally, particles are “moved” by means of the 3PMBCX three parents take part in the 3PMBCX operator (line 13): the operator (line 13). In addition, with a probability of 10%, a simple bit-mutation operator (line 14) is applied in order i, the social best position gi and the histori- to avoid the early convergence. This process is repeated i (of this particle). The weight values w1, w2 and w3 indicate for each element in the crossover until reach the stop condition fixed to a certain number of mask the probability of having values from the parents x gi or hi respectively. These weight values associated to each parent represent the inertia value of the current position (w1),the social influence of the global/local best position (w2) and Since the position of a particle xi represents a gene subset, the individual influence of the historical best position found the evaluation of each particle is carried out by means of the (w3). A constriction of the geometric crossover forces w1, SVM classifier to assess the quality of the represented gene w2 and w3 to be non-negative and add up to one.
subset. The fitness of a particle xi is calculated applying a 10-fold Cross Validation (10FCV) method to calculate the rate of In summary, the GPSO developed in this study oper- correct classification (accuracy) of a SVM trained with this ates as follows: In a first phase of the pseudocode, the gene subset. In 10FCV, the data set is divided into 10 subsets.
initialization of particles are carried out by means of the Each time, one of the 10 subsets is used as the test set and SwarmInitialization() function (Line 1). This special the other 9 subsets are put together to form the training set.
initialization method (used also in our GA approach) was Then the average error across all 10 trials is computed. The adapted to gene selection as follows. The swarm (population) complete fitness function is described in Equation 4.
was divided into four subsets of particles (chromosomes)initialized in different ways depending on the number of f itness(x) = α · (100/accuracy) + β · #f eatures, (4) features in each particle. That is, 10% of particles wereinitialized with N (prefixed value) selected genes (1s) located where α and β are weight values set to 0.75 and 0.25 randomly. Another 20% of particles were initialized with 2N respectively in order to control that the accuracy value genes, 30% with 3N genes and finally, the rest of particles takes priority over the subset size, since high accuracies are (40%) were initialized randomly and 50% of the genes were preferred when guiding the search process. The objective turned on. In these experiments N will be equal to 4. In here consists of maximizing the accuracy and minimizing the number of genes (#f eatures). For convenience (only All experiments were carried out using a PC with Linux minimization of fitness) the first factor is presented as O.S (Suse 9.0 with kernel 2.4.19) and a Pentium IV 2.8GHz processor, with 512MB of RAM. P SOSV M and GASV Malgorithms on six cancer related microarray datasets were independent executed 10 times over each dataset, in order to Instances used in this study consists of six well- have statistically meaningful conclusions as both algorithms known datasets issued of microarray experiments, ALL- AML Leuke-mia dataset, Breast cancer dataset, Colon tumordataset, Ovarian cancer dataset, Prostate cancer dataset, and Lung cancer dataset. All of them were taken from the public The parameters used in our PSO and GA algorithms Kent Ridge Bio-medical Data Repository with url http:// are shown in Table I. These parameter were selected after sdmc.lit.org.sg/GEDatasets/Datasets.html.
several test evaluations of each algorithm and dataset instance ◦ The ALL-AML Leukemia dataset consists of 72 mi- until reach the best configuration in terms of the quality of croarray experiments with 7129 gene expression le- solutions and the computational effort.
vels. Two classes for distinguishing: Acute MyeloidLeukemia (AML) and Acute Lymphoblastic Leukemia (ALL). The complete dataset contains 25 AML and 47 PSO AND GA PARAMETERS FOR GENE SUBSET SELECTION AND ◦ The Breast cancer dataset consists of 97 experiments with 24481 gene expression levels. Patients studied Parameter
Parameter
show two classes of diagnosis called relapse with 46 patients and non-relapse with 51 ones.
◦ The Colon tumor dataset consists of 62 microarray experiments collected from colon-cancer patients with2000 gene expression levels. Among them, 40 tumorbiopsies are from tumors and 22 (normal) biopsies are from healthy parts of the colons of the same patients.
◦ The Lung cancer dataset involves 181 microarray ex- Several observations can made based on the above expe- periments with 12533 gene expression levels. Classifi- riments, so we tackle the analysis of results focusing on the cation occurs between Malignant Pleural Mesothelioma performance and robustness of our algorithms, as well as (MPM) and Adenocarcinoma (ADCA) of the lung. In the quality of the obtained solutions providing a biological tissue samples there are 31 MPM and 150 ADCA.
description of most significant ones.
◦ The Ovarian cancer dataset consists of 253 microarray 1) Performance Analysis: From the point of view of experiments with 15154 gene expression levels. The the performance, both algorithms obtain in a few iterations goal of this experiment is to identify proteomic patterns acceptable results in gene selection, providing reduced sub- in serum that distinguish cancer from non-cancer scenar- sets with high classification rates. However, the behavior ios. The dataset includes 162 (of 253) ovarian cancers is slightly different. Figure 2 shows a graphical evolution, in terms of the average of the fitness value, of a typical ◦ The Prostate cancer dataset involves 136 microarray execution of P SOSV M and GASV M . It is noticeable that in experiments with 12600 gene expression levels. Two few iterations (4 or 5) the average of fitness decrease quickly classes must be differentiated: tumor with 77 (52 + 25) and then stop in similar solutions. The large diversity of samples and non-tumor with 59 (50+9) samples.
solutions provided in the initialization method (Section II-B)provokes fast of good solutions and the early convergence of both methods. Although the GASV M generally obtains For our P SOSV M approach, the PSO was imple- lower average than P SOSV M , whose solutions have in turn mented in C++ following the skeleton architecture of the MALLBA [24] library. For the GASV M approach the GA Results for all the datasets are shown in Table II. Columns was implemented in C++ using the ParadisEO [25] Frame- 2 and 3 contain the average of the best solutions obtained work. The GA implements a generational evolution strategy in 10 independent executions of P SOSV M and GASV M (offspring replacement with elitism) and uses the follow- respectively. Six state of the art methods from literature are ing operators: deterministic tournament selection, SSOCF presented in columns 4 to 10 in order to show how our crossover, and uniform mutation. The SVM classifier used proposals actually push forward the research in this area.
in both approaches is based on the LIBSVM [26] library.
Cells in - haven’t values to our knowledge. Standard criteria For the SVM confi-guration, the same parameters were used are used to compare the results: the classification accuracy in PSO and GA algorithms and the Kernel function was in terms of the rate of correct classification (first value in configured as Linear. The fitness function used in GASV M every table cell) and the number of selected genes (the value is the same one (described in Section II) as in P SOSV M .
COMPARISON OF RELEVANT WORKS ON CANCER CLASSIFICATION WITH PROPOSED MODELS GASV M AND P SOSV M . IN BOLD WE MARK THE MOST ACCURATE RESULTS. CELLS WITHOUT KNOWN VALUE (TO US) ARE MARKED WITH - CHARACTER Luekemia
Prostate
2) Algorithm Robustness: One of the most important criteria in evaluating any proposed algorithm is the quality of the algorithm and its ability to generate similar (identical) outcomes when executed several times. This factor is veryimportant for metaheuristics which is the case in this work.
To examine the robustness of the two proposed approaches, AVG Fitness
in some instances, in all ten runs both algorithms manages to find the same answer or similar ones (not identical).
However, it is worthwhile mentioning that the total accuracy and number of selected features in all the cases did notdeviate from each other by more than 5.5. Table III shows the Number of Iterations
result of running the GASV M and P SOSV M algorithms interms of statistical results, reporting the Best solution found, Fig. 2. Evolution of the average fitness (AVG Fitness) in a typical execution Mean and Standard Deviation of ten independent runs.
of P SOSV M and GASV M approaches using the Leukemia dataset.
COMPARISON IN TERMS OF STATISTICAL RESULTS OF THE GASV M In this comparison, we can observe that all solutions AND P SOSV M APPROACHES. THE Best SOLUTION FOUND, Mean AND SV M and GASV M algorithms present a classification rate higher than 86%, and subsets with four and less than four selected genes are common. We Leukemia
outperform all the existing results (to our knowledge) but one case [6] presents a smaller subset of 2 genes. We suspect that the initialization method used in our work helps the performance of the algorithms significantly, finding small Prostate
subsets with a high classification accuracy. PSO with SVMalgorithm was also used in [30] (PSO/SVM from now on), 3) Brief Biological Analysis of Selected Genes: Finally, a although showing several differences with respect to our summary of the best subsets of genes found for each dataset P SOSV M . In the first place, in PSO/SVM the binary values is shown in Table IV. All subset of genes which reported of each particle are chosen at each iteration by means of a close to 100% test accuracy and the minimum number of decision function based on threshold parameters, whereas in genes. It is remarkable that apparently (to our knowledge) P SOSV M , binary particles evolve following a completely several discovered genes that has not been seen in any past different mechanism, that is, 3PMBCX crossover and muta- studies. In this sense, we can provide a brief biological tion operators (Subsection II-D). In second place, during the description of some of the most frequently obtained genes evaluation phase, a leave-half-out cross-validation method since they are currently used in the design of drugs and was used in PSO/SVM whereas the P SOSV M validates the cancers treatment. Some of which are listed below: selected subsets by means of 10 k fold cross-validation.
• Gene L12052 at is “CAMP phosphodiesterase mRNA, If we compare our GPSO and GA metaheuristics com- 3’ end” which is used in drugs like Anagrelide or bined with SVM similar results are found. In general, the GA Milrinone. Specifically the Anagrelide is used for the approach obtain better best solutions (Hits) although the best treatment of essential thrombocytosis, and it was proved classification was provided by GP SOSV M for the Colon to be effective in treating patients with certain kinds of tumor dataset (100% accuracy and 2 genes in Table IV).
leukemia such as chronic myeloid leukemia [31]. This From the point of view of the accuracy average in all gene belongs to a set of 3 genes (reported from leukemia independent runs, the GPSO obtain a better performance dataset in Table IV) with 100% accuracy selected by the although the difference with regard to the GA (as shows • Gene AB022847 is a “Solute carrier family 6 (neu- SUBSETS OF GENES REPORTED WITH 100% TEST ACCURACY Leukemia
Prostate
Independent runs
Independent runs
Fig. 3. Accuracy obtained by P SOSV M and GASV M in each independent run. Legends specify the datasets with the number of features in parenthesis.
rotransmitter transporter, noradrenalin), member 2” lo- belongs to a set of 4 genes (reported from lung dataset in cated in plasma membrane. Current drugs like Radax- Table IV) with 100% accuracy selected by the GASV M .
afine, Amphetamine or Venlafaxine are associated withthis gene. Specifically, Venlafaxine is a prescription antidepressant first introduced by Wyeth in 1993. It In this work, two hybrid techniques for gene selection is specifically used in management of hot flashes in and classification of high dimensional DNA Microarray data survivors of breast cancer [32]. This gene belongs to a were presented and compared. These techniques are based set of 3 genes (reported from breast dataset in Table IV) on different metaheuristic algorithms such as GPSO and GA with 95.8763% accuracy selected by the P SOSV M .
used for feature selection using the SVM classifier to identify • Gene 36245 at is “5-hydroxytryptamine (serotonin) re- potentially good gene subsets. Specifically, the Geometric ceptor 2B” located in human plasma membrane. There PSO algorithm for Hamming space was used to solve a real are several drugs where this gene is used like Risperi- problem (gene selection in this case) for the first time (to done, Blonanserin and Mirtazapine. Some studies con- our knowledge). In addition, genes selected are validated by sider Mirtazapine as the first-choice agent for anxiety an accurate 10-fold cross validation method to improve the and depression after lung transplantation. This gene Both approaches (P SOSV M and GASV M ) were experi- [12] E. B. Huerta, B. Duval, and J.-K. Hao, “A Hybrid GASVM Approach mentally assessed on six well-known cancer datasets disco- for Gene Selection and Classification of Microarray Data,” in LectureNotes in Computer Science of EvoWorkshops, F. Rothlauf, J. Branke, vering new and challenging results, and identifying specific S. Cagnoni, E. Costa, C. Cotta, R. Drechsler, E. Lutton, P. Machado, genes that our work suggests as significant ones. In this J. H. Moore, J. Romero, G. D. Smith, G. Squillero, and H. Takagi, sense, comparisons with several state of art methods show [13] J. Kennedy and R. Eberhart, “Particle Swarm Optimization,” in Proc. competitive results according to standard evaluation. Results of the IEEE International Conference on Neural Networks, vol. 4, of 100% classification rate and few genes per subset (3 and 4) are obtained in most of our executions. The use of an [14] A. Moraglio, C. D. Chio, and R. Poli, “Geometric Particle Swarm Optimization,” in 10th European conference on Genetic Programming adapted initialization method has shown a great influence on (EuroGP 2007), ser. Lecture Notes in Computer Science, vol. 4445.
the performance of proposed algorithms, since it introduces an early set of acceptable solutions in their evolution process.
[15] L. Jourdan, C. Dhaenens, and E.-G. Talbi, “A genetic algorithm for feature selection in data-mining for genetics,” in Proceedings of the Continuing the line of this work, we are interested in deve- 4th Metaheuristics International ConferencePorto (MIC’2001), Porto, loping and testing several combinations of other metaheuris- tics with classification methods in order to discover new and [16] E. F. Petricoin, A. M. Ardekani, B. A. Hitt, P. J. Levine, V. A. Fusaro, S. M. Steinberg, G. B. Mills, C. Simone, D. A. Fishman, E. C. Kohn, better subsets of genes using specific Microarray datasets. In and L. A. Liotta, “Use of proteomic patterns in serum to identify this sense, the utilization of multiobjective approaches could ovarian cancer,” The Lancet, vol. 359, pp. 572–577, 2002.
contribute notably in gene subset selection.
[17] D. Singh, P. Febbo, K. Ross, D. Jackson, J. Manola, C. Ladd, P. Tamayo, A. Renshaw, A. D’Amico, and J. Richie, “Gene expression correlates of clinical prostate cancer behavior,” Cancer Cell, vol. 1, pp.
[18] G. J. Gordon, R. V. Jensen, L.-L. Hsiao, S. R. Gullans, J. E.
Blumenstock, S. Ramaswamy, W. G. Richards, D. J. Sugarbaker, and PERFOM 3+3 Mediterranean project. E. Alba and J.
R. Bueno, “Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma,” Cancer Res, vol. 62, pp. 4963–4967, 2002.
[19] S. Mukherjee, Classifying Microarray data using Support Vector contract TIN2005-08818-C04-01 (the OPLINK project, Machines, Boston, MA, 2003, ch. 9, pp. 166–185.
[20] T. Furey, N. Cristianini, N. Duffy, D. W. Bednarski, M. Schummer, and D. Haussler, “Support vector machines classification and validation of cancer tissue samples using microarray expression data,” Bioinformat-ics, vol. 16, no. 10, pp. 906–914, 2000.
[1] A. Pease, D. Solas, E. Sullivan, M. Cronin, C. P. Holmes, and S. Fodor, [21] J. Kennedy and R. Eberhart, “A Discrete Binary Version of the “Light-generated oligonucleotide arrays for rapid dna sequence anal- Particle Swarm Algorithm,” in Proceedings of the IEEE International ysis,” in Proc. Natl. Acad. Sci., vol. 96, USA, 1994, pp. 5022–5026.
Conference on Systems, Man and Cybernetics, vol. 5, 1997, pp. 4104– [2] R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P.
Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D.
Bloomfield, and E. S. Lander, “Molecular classification of cancer: Derivations, and Mathematical Insights,” 2005. [Online]. Available: Class discovery and class prediction by gene expression monitoring,” Science, vol. 286, pp. 531–537, 1999.
[23] A. Moraglio and J. Togelius, “Geometric particle swarm optimization [3] Alon, N. Barkai, D. Notterman, K. Gish, S. Ybarra, D. Mack, and for the sudoku puzzle,” in Proceedings of the Genetic and Evolutionary A. J. Levine, “Broad patterns of gene expression revealed by clustering Computation Conference (GECCO 2007). London, UK: ACM Press, analysis of tumor and normal colon tissues probed by oligonucleotide arrays,” Proc. Natl. Acad. Sci, vol. 96, pp. 6745–6750, 1999.
[24] E. Alba and M. Group, “Mallba: A Library of Skeletons for Combina- [4] L. J. van ’t Veer, H. Dai, M. J. van de Vijver, Y. D. He, A. A. M.
torial Optimisation,” in Proceedings of the Euro-Par, B. Monien and Hart, M. Mao, H. L. Peterse, K. van der Kooy, M. J. Marton, A. T.
R. Feldmann, Eds., vol. LNCS 2400, 2002, pp. 927–932.
Witteveen, G. J. Schreiber, R. M. Kerkhoven, C. Roberts, P. S. Linsley, [25] S. Cahon, E.-G. Talbi, and N. Melab, “Paradiseo: A framework for R. Bernards, and S. H. Friend, “Gene expression profiling predicts parallel and distributed metaheuristics,” in Proc. of Internationa the clinical outcome of breast cancer,” Nature, vol. 415, pp. 530–536, Parallel and Distributed Processing Symposium, 2003, pp. 144–155.
[5] Juliusdottir, D. Corne, E. Keedwell, and A. Narayanan, “Two-phase EA/K-NN for feature selection and classification in cancer microarray http://www.csie.ntu.edu.tw/ cjlin/libsvm, 2002.
datasets,” in CIBCB, 2005, pp. 1–8.
[27] K. Deb and A. R. Reddy, “Classification of two-class cancer data [6] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection reliably using evolutionary algorithms,” KanGAL Report No. 2003001, for cancer classification using support vector machines,” Machine Learning, vol. 46, no. 1-3, pp. 389–422, 2002. [Online]. Available: [28] L. Yu and H. Liu, “Redundancy based feature selection for microarry data,” in Proceedings of International Conference on Knowledge [7] J. Kohavi and G. H. John, “The wrapper approach,” in Feature Discovery and Data Mining, Seattle, Washington, 2004, pp. 22–25.
Selection for Knowledge Discovery and Data Mining, 1998, pp. 33–50.
[29] B. Liu, Q. Cui, T. Jiang, and S. Ma, “A combinational feature [Online]. Available: citeseer.ist.psu.edu/article/kohavi97wrapper.html selection and ensemble neural network method for classification of [8] Fix and J. L. Hodges, “Discriminatory analysis. nonparametric dis- gene expression data,” BMC Bioinformatics, vol. 5, pp. 136–148, 2004.
crimination: Consistency properties,” 4, US Air Force School of [30] Q. Shen, W. M. Shi, W. Kong, and B. X. Ye, “A combination of Aviation Medicine, Randolph Field, TX, Tech. Rep., 1951.
modified particle swarm optimization algorithm and support vector [9] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learn- machine for gene selection and classification,” Talanta, 2006.
ing, vol. 20, no. 3, pp. 273–297, 1995.
[31] R. T. Silver, “Anagrelide is effective in treating patients with [10] M. Narendra and K. Fukunaga, “A branch and bound algorithm for hydroxyurea-resistant thrombocytosis in patients with chronic myeloid feature subset selection,” IEEE Trans. on Computer, vol. 26, pp. 917– leukemia,” Leukemia, vol. 19, no. 3, p. 3943, 2005. [Online]. Available: [11] J. Yang and V. Honavar, Feature Extraction, Construction and Selec- [32] C. L. Loprinzi, “Venlafaxine in management of hot flashes in survivors tion: A Data Mining Perspective, 1998, ch. Feature Subset Selection of breast cancer: a randomised controlled trial,” The Lancet, vol. 356, Using a Genetic Algorithm, pp. 177–136.

Source: http://gardeux-vincent.eu/These/Papiers/Bibli1/Genomique/Alba07.pdf

Covered stent published article

ATRIUM VASCULAR GRAFTS CLINICAL PUBLICATIONS / PRESENTATIONS LITERATURE REVIEW Contents 1. Dialysis Applications 2. Aortic Applications 3. Peripheral Applications 4. Patch Applications 5. General Vascular Graft Applications 6. Supportive Graft Publications 1. DIALYSIS APPLICATIONS Evaluation of Early Cannulation, Patency, and Complications in Flixene™ Comp

Microsoft word - 06 app note automated colony counting.doc

A DIVISION OF THE SYNOPTICS GROUP Automated colony counting for molecular biologists How it can improve productivity when selecting recombinant clone Introduction Transformation efficiency is a measure of the amount transmitted and darkfield illumination, enabling coloniesof cells within the baceterial culture that are able to takeand plaques to be counted on a range of media types.