SwePub
Search the SwePub database


Results list for search: WFRF:(Johansson Tuve)

  • Results 1-10 of 54
1.
  • Ahlberg, Ernst, et al. (authors)
  • Using conformal prediction to prioritize compound synthesis in drug discovery
  • 2017
  • In: Proceedings of Machine Learning Research. - Stockholm: Machine Learning Research, pp. 174-184
  • Conference paper (peer-reviewed), abstract:
    • The choice of how much money and resources to spend to understand certain problems is of high interest in many areas. This work illustrates how computational models can be more tightly coupled with experiments to generate decision data at lower cost without reducing the quality of the decision. Several different strategies are explored to illustrate the trade-off between lowering costs and quality in decisions. AUC is used as a performance metric, and the number of objects that can be learnt from is constrained. Some of the strategies described reach AUC values over 0.9 and outperform more random strategies. The strategies that use conformal predictor p-values show varying results, although some are top performing. The application studied is taken from the drug discovery process. In the early stages of this process, compounds that could potentially become marketed drugs are routinely tested in experimental assays to understand their distribution and interactions in humans.
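The prioritization step in entry 1 rests on inductive conformal prediction (ICP) p-values. The following is a minimal sketch of that mechanism, assuming synthetic descriptor data and a scikit-learn random forest as the underlying model; none of these choices come from the paper itself.

```python
# Illustrative sketch of compound prioritization via ICP p-values;
# the synthetic data and the random forest are assumptions, not the
# paper's actual assay setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                        # stand-in compound descriptors
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)  # 1 = "active"

X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_cal, X_new, y_cal, _ = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Nonconformity score: 1 - predicted probability of the class in question.
cal_scores = 1.0 - model.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]
new_scores = 1.0 - model.predict_proba(X_new)[:, 1]   # hypothesize "active"

# ICP p-value: share of calibration scores at least as nonconforming.
p_active = ((np.sum(cal_scores[None, :] >= new_scores[:, None], axis=1) + 1)
            / (len(cal_scores) + 1))

# Prioritize the ten compounds most plausibly active for synthesis/assay.
print(np.argsort(-p_active)[:10])
```

Ranking by p-value rather than raw model probability is what ties the selection to the conformal framework's validity guarantee.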
2.
  • Boström, Henrik, et al. (authors)
  • Evaluation of a variance-based nonconformity measure for regression forests
  • 2016
  • In: 5th International Symposium on Conformal and Probabilistic Prediction with Applications, COPA 2016. - Cham: Springer. - ISBN 9783319333946, 9783319333953, pp. 75-89
  • Conference paper (peer-reviewed), abstract:
    • In a previous large-scale empirical evaluation of conformal regression approaches, random forests using out-of-bag instances for calibration, together with a k-nearest-neighbor-based nonconformity measure, were shown to obtain state-of-the-art performance with respect to efficiency, i.e., average size of prediction regions. However, the nearest-neighbor procedure not only requires that all training data be retained alongside the underlying model, but also incurs a significant computational overhead during both training and testing. In this study, a more straightforward nonconformity measure is investigated, where the difficulty estimate employed for normalization is based on the variance of the predictions made by the trees in a forest. A large-scale empirical evaluation is presented, showing that both the nearest-neighbor-based and the variance-based measures significantly outperform a standard (non-normalized) nonconformity measure, while no significant difference in efficiency between the two normalized approaches is observed. Moreover, the evaluation shows that state-of-the-art performance is achieved by the variance-based measure at a computational cost that is several orders of magnitude lower than when employing the nearest-neighbor-based nonconformity measure.
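The variance-based measure in entry 2 normalizes each absolute error by the spread of the individual tree predictions. A minimal sketch, assuming scikit-learn's RandomForestRegressor, a separate calibration split instead of the paper's out-of-bag calibration, and an illustrative smoothing constant beta:

```python
# Variance-based nonconformity for a regression forest: the difficulty
# estimate is the spread of the individual tree predictions, so no
# training data needs to be retained (unlike a k-NN-based measure).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def variance_nonconformity(forest, X, y, beta=0.1):
    per_tree = np.stack([t.predict(X) for t in forest.estimators_])
    y_hat, sigma = per_tree.mean(axis=0), per_tree.std(axis=0)
    return np.abs(y - y_hat) / (sigma + beta)  # beta avoids division by zero

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=400)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:300], y[:300])

# Calibrate, then scale the interval half-width by each object's difficulty.
alphas = np.sort(variance_nonconformity(forest, X[300:], y[300:]))
q = alphas[int(np.ceil(0.95 * (len(alphas) + 1))) - 1]  # 95% intervals

per_tree = np.stack([t.predict(X[:5]) for t in forest.estimators_])
mid, sigma = per_tree.mean(axis=0), per_tree.std(axis=0)
print(np.c_[mid - q * (sigma + 0.1), mid + q * (sigma + 0.1)])
```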
3.
  • Johansson, Ulf, et al. (authors)
  • Accuracy on a Hold-out Set: The Red Herring of Data Mining
  • 2006
  • In: Proceedings of SAIS 2006. - Umeå: Swedish Artificial Intelligence Society - SAIS, Umeå University, pp. 137-146
  • Conference paper (peer-reviewed), abstract:
    • When performing predictive modeling, the overall goal is to generate models likely to have high accuracy when applied to novel data. A technique commonly used to maximize generalization accuracy is to create ensembles of models, e.g., averaging the output from a number of individual models. Several more or less sophisticated techniques, aimed at either directly creating ensembles or selecting ensemble members from a pool of available models, have been suggested. Many techniques utilize a part of the available data not used for the training of the models (a hold-out set) to rank and select either ensembles or ensemble members based on accuracy on that set. The obvious underlying assumption is that increased accuracy on the hold-out set is a good indicator of increased generalization capability on novel data. Or, put another way, that there is high correlation between accuracy on the hold-out set and accuracy on yet unseen data. The experiments in this study, however, show that this is generally not the case; i.e., there is little to gain from selecting ensembles using hold-out set accuracy. The experiments also show that this low correlation holds for individual neural networks as well, making the entire use of hold-out sets to compare predictive models questionable.
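The claim in entry 3 is, at heart, a correlation measurement across a pool of models. A small-scale sketch of that experiment, with a synthetic data set and a pool of MLP classifiers standing in for the paper's networks:

```python
# Train a pool of networks, score each on a hold-out set and on fresh
# test data, and check how well hold-out accuracy predicts test
# accuracy. Data set, pool size and architecture are illustrative.
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.5, random_state=0)
X_hold, X_te, y_hold, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5,
                                              random_state=0)

hold_acc, test_acc = [], []
for seed in range(20):                         # a pool of candidate networks
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500,
                        random_state=seed).fit(X_tr, y_tr)
    hold_acc.append(net.score(X_hold, y_hold))
    test_acc.append(net.score(X_te, y_te))

# If hold-out accuracy were a reliable selection signal, r would be high.
r, _ = pearsonr(hold_acc, test_acc)
print(f"hold-out vs. test accuracy correlation: r = {r:.2f}")
```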
4.
  • Johansson, Ulf, et al. (authors)
  • Building Neural Network Ensembles using Genetic Programming
  • 2006
  • In: The 2006 IEEE International Joint Conference on Neural Network Proceedings. - Umeå: IEEE. - ISBN 0780394909, 9780780394902, pp. 1260-1265
  • Conference paper (peer-reviewed), abstract:
    • In this paper we present and evaluate a novel algorithm for ensemble creation. The main idea of the algorithm is to first independently train a fixed number of neural networks (here ten) and then use genetic programming to combine these networks into an ensemble. The use of genetic programming makes it possible not only to consider ensembles of different sizes, but also to use ensembles as intermediate building blocks. The final result is therefore more correctly described as an ensemble of neural network ensembles. The experiments show that the proposed method, when evaluated on 22 publicly available data sets, obtains very high accuracy, clearly outperforming the other methods evaluated. In this study, several micro techniques are used, and we believe that they all contribute to the increased performance. One such micro technique, aimed at reducing overtraining, is the training method used during genetic evolution, called tombola training. When using tombola training, training data is regularly resampled into new parts, called training groups. Each ensemble is then evaluated on every training group, and the actual fitness is determined solely from the result on the hardest group.
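Tombola training, as described in entry 4, amounts to a worst-group fitness function. A minimal sketch, assuming ensemble members expose a scikit-learn-style predict_proba and binary labels; the genetic programming machinery that evolves the ensembles is omitted:

```python
# Worst-group fitness: repartition the training data into groups (a
# simple stand-in for the paper's regular resampling), score the
# candidate ensemble on each group, and keep the hardest group's score.
import numpy as np

def tombola_fitness(ensemble, X, y, n_groups=5, rng=None):
    rng = rng or np.random.default_rng()
    idx = rng.permutation(len(X))              # shuffle into training groups
    scores = []
    for group in np.array_split(idx, n_groups):
        votes = np.mean([net.predict_proba(X[group])[:, 1] for net in ensemble],
                        axis=0)
        scores.append(np.mean((votes > 0.5).astype(int) == y[group]))
    return min(scores)                         # fitness = hardest group's score
```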
5.
  • Johansson, Ulf, et al. (authors)
  • Chipper: A Novel Algorithm for Concept Description
  • 2008
  • In: Frontiers in Artificial Intelligence and Applications. - IOS Press. - ISBN 9781586038670, pp. 133-140
  • Conference paper (peer-reviewed), abstract:
    • In this paper, several demands placed on concept description algorithms are identified and discussed. The most important criterion is the ability to produce compact rule sets that, in a natural and accurate way, describe the most important relationships in the underlying domain. An algorithm based on the identified criteria is presented and evaluated. The algorithm, named Chipper, produces decision lists, where each rule covers a maximum number of remaining instances while meeting requested accuracy requirements. In the experiments, Chipper is evaluated on nine UCI data sets. The main result is that Chipper produces compact and understandable rule sets, clearly fulfilling the overall goal of concept description. In the experiments, Chipper's accuracy is similar to that of standard decision tree and rule induction algorithms, while its rule sets have superior comprehensibility.
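Chipper's core loop (entry 5) greedily picks the rule covering the most remaining instances subject to an accuracy requirement. The sketch below uses single-feature threshold rules and quartile thresholds as simplifying assumptions; the paper's actual rule language may differ:

```python
# Chipper-style greedy decision-list induction: each rule must meet an
# accuracy requirement and is chosen to cover as many of the remaining
# instances as possible. y is assumed to hold integer class labels.
import numpy as np

def fit_decision_list(X, y, min_acc=0.8, max_rules=10):
    rules, remaining = [], np.arange(len(X))
    for _ in range(max_rules):
        best = None
        for f in range(X.shape[1]):
            for t in np.quantile(X[remaining, f], [0.25, 0.5, 0.75]):
                for op in (np.less_equal, np.greater):
                    covered = remaining[op(X[remaining, f], t)]
                    if len(covered) == 0:
                        continue
                    label = np.bincount(y[covered]).argmax()
                    acc = np.mean(y[covered] == label)
                    if acc >= min_acc and (best is None or len(covered) > best[0]):
                        best = (len(covered), f, t, op, label)
        if best is None:
            break                              # no rule meets min_acc
        _, f, t, op, label = best
        rules.append((f, t, op, label))
        remaining = remaining[~op(X[remaining, f], t)]
        if len(remaining) == 0:
            break
    default = np.bincount(y[remaining] if len(remaining) else y).argmax()
    return rules, default
```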
6.
  • Johansson, Ulf, et al. (authors)
  • Conformal Prediction Using Decision Trees
  • 2013
  • In: IEEE 13th International Conference on Data Mining (ICDM). - IEEE Computer Society. - ISBN 9780769551081, pp. 330-339
  • Conference paper (peer-reviewed), abstract:
    • Conformal prediction is a relatively new framework in which the predictive models output sets of predictions with a bound on the error rate, i.e., in a classification context, the probability of excluding the correct class label is lower than a predefined significance level. An investigation of the use of decision trees within the conformal prediction framework is presented, with the overall purpose of determining the effect of different algorithmic choices, including the split criterion, the pruning scheme and the way the probability estimates are calculated. Since the error rate is bounded by the framework, the most important property of conformal predictors is efficiency, which concerns minimizing the number of elements in the output prediction sets. Results from one of the largest empirical investigations to date within the conformal prediction framework are presented, showing that in order to optimize efficiency, the decision trees should be induced using no pruning and with smoothed probability estimates. The choice of split criterion used for the actual induction of the trees did not turn out to have any major impact on efficiency. Finally, the experimentation also showed that when using decision trees, standard inductive conformal prediction was as efficient as the recently suggested cross-conformal prediction. This is an encouraging result, since cross-conformal prediction uses several decision trees, thus sacrificing the interpretability of a single decision tree.
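Entry 6's recommended configuration, an unpruned tree with smoothed probability estimates inside standard inductive conformal prediction, can be sketched as follows; the Laplace smoothing is reconstructed from leaf sample counts, and the iris data and significance level 0.1 are illustrative:

```python
# ICP with a single unpruned decision tree and Laplace-smoothed leaf
# probabilities; data set and eps are illustrative choices.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                            random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # no pruning

def smoothed_proba(tree, X):
    # Laplace smoothing, (n_c + 1) / (n + k), rebuilt from leaf counts.
    frac = tree.predict_proba(X)                       # raw leaf fractions
    n = tree.tree_.n_node_samples[tree.apply(X)].reshape(-1, 1)
    return (frac * n + 1) / (n + frac.shape[1])

# Calibration scores: 1 - smoothed probability of the true class.
cal = 1 - smoothed_proba(tree, X_cal)[np.arange(len(y_cal)), y_cal]

eps = 0.1  # significance level: error rate is bounded by eps
pred_sets = []
for scores in 1 - smoothed_proba(tree, X_te):          # one score per label
    p = (np.sum(cal[None, :] >= scores[:, None], axis=1) + 1) / (len(cal) + 1)
    pred_sets.append(np.where(p > eps)[0])             # labels kept in the set
print(pred_sets[:5])
```

Efficiency, in the paper's sense, would then be the average number of labels in pred_sets.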
7.
8.
  • Johansson, Ulf, et al. (authors)
  • Evaluating Ensembles on QSAR Classification
  • 2009
  • Conference paper (peer-reviewed), abstract:
    • Novel, often quite technical, algorithms for ensembling artificial neural networks are constantly suggested. Naturally, when presenting a novel algorithm, the authors, at least implicitly, claim that their algorithm, in some aspect, represents the state of the art. Obviously, the most important criterion is predictive performance, normally measured using either accuracy or area under the ROC curve (AUC). This paper presents a study where the predictive performance of two widely acknowledged ensemble techniques, GASEN and NegBagg, is compared to more straightforward alternatives like bagging. The somewhat surprising result of the experimentation, using in total 32 publicly available data sets from the medical domain, was that both GASEN and NegBagg were clearly outperformed by several of the straightforward techniques. One particularly striking result was that not applying the GASEN technique, i.e., ensembling all available networks instead of using the subset suggested by GASEN, turned out to produce more accurate ensembles.
9.
  • Johansson, Ulf, et al. (authors)
  • Evaluating Standard Techniques for Implicit Diversity
  • 2008
  • In: Advances in Knowledge Discovery and Data Mining. - Berlin, Heidelberg: Springer Berlin/Heidelberg. - ISBN 9783540681243, 9783540681250, pp. 592-599
  • Conference paper (peer-reviewed), abstract:
    • When performing predictive modeling, ensembles are often utilized in order to boost accuracy. The problem of how to maximize ensemble accuracy is, however, far from solved. In particular, the relationship between ensemble diversity and accuracy is, especially for classification, not completely understood. More specifically, the fact that ensemble diversity and base classifier accuracy are highly correlated makes it necessary to balance these properties instead of just maximizing diversity. In this study, three standard techniques for obtaining implicit diversity in neural network ensembles are evaluated using 14 UCI data sets. The experiments show that standard resampling, i.e., dividing the training data by instances, produces more diverse models, but at the expense of base classifier accuracy, thus resulting in less accurate ensembles. Building ensembles using neural networks with heterogeneous architectures improves test set accuracy, but without actually increasing the diversity. The results regarding resampling using features are inconclusive: the ensembles become more diverse, but the level of test set accuracy is unchanged. For the setups evaluated, ensemble training accuracy and base classifier training accuracy are positively correlated with ensemble test accuracy, but the opposite holds for diversity; i.e., ensembles with low diversity are generally more accurate.
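Entry 9 contrasts implicit-diversity techniques; the first of them, instance resampling, together with a standard pairwise-disagreement diversity statistic, can be sketched as follows. The data, network pool, and the particular diversity measure are illustrative assumptions, not necessarily those used in the paper:

```python
# Bootstrap-resample training instances per network, then measure the
# resulting ensemble's diversity (mean pairwise disagreement) and its
# majority-vote accuracy on held-out data.
import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
rng = np.random.default_rng(0)

nets = []
for seed in range(10):
    boot = rng.integers(0, 400, size=400)      # resample training instances
    nets.append(MLPClassifier(hidden_layer_sizes=(10,), max_iter=500,
                              random_state=seed).fit(X[boot], y[boot]))

# Pairwise disagreement on held-out data as the diversity statistic.
preds = np.array([net.predict(X[400:]) for net in nets])
pairs = combinations(range(len(nets)), 2)
diversity = np.mean([np.mean(preds[i] != preds[j]) for i, j in pairs])

# Majority-vote ensemble accuracy on the same held-out data.
ens_acc = np.mean((preds.mean(axis=0) > 0.5).astype(int) == y[400:])
print(f"disagreement: {diversity:.3f}, ensemble accuracy: {ens_acc:.3f}")
```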
10.