Are grammatical representations useful for learning from biological sequence data? - a case study


Muggleton, S.H. (author): Department of Computer Science, University of York, Heslington, York, UK; Department of Computing, Imperial College of Science, Technology and Medicine, London, UK

Bryant, C.H. (author): Department of Computer Science, University of York, Heslington, York, UK; School of Computer and Mathematical Sciences, The Robert Gordon University, Aberdeen, Scotland, UK

Srinivasan, A. (author): Oxford University Computing Laboratory, Wolfson Building, Oxford, UK

Whittaker, A. (author): SmithKline Beecham, New Frontiers, Science Park, Harlow, Essex, UK; Psygnosis Ltd. Abingdon, UK

Topp, Simon (author): SmithKline Beecham, New Frontiers, Science Park, Harlow, Essex, UK

Rawlings, Chris (author): SmithKline Beecham, New Frontiers, Science Park, Harlow, Essex, UK; Oxagen Ltd. Milton Park, Abingdon, UK

Linköping University Electronic Press, 2001
English, 39 pp.
Series: Linköping Electronic Articles in Computer and Information Science, 1401-9841 ; Vol. 6:13

Related links: https://libris.kb.se...; https://liu.diva-por... (primary) (Raw object); https://urn.kb.se/re...

Report (other academic/artistic)

Abstract

This paper investigates whether Chomsky-like grammar representations are useful for learning cost-effective, comprehensible predictors of members of biological sequence families. The Inductive Logic Programming (ILP) Bayesian approach to learning from positive examples is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Collectively, five of the co-authors of this paper have extensive expertise on NPPs and general bioinformatics methods. Their motivation for generating an NPP grammar was that none of the existing bioinformatics methods could provide sufficient cost savings during the search for new NPPs. Prior to this project, experienced specialists at SmithKline Beecham had tried for many months to hand-code such a grammar, but without success. Our best predictor makes the search for novel NPPs more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. As far as these authors are aware, this is both the first biological grammar learnt using ILP and the first real-world scientific application of the ILP Bayesian approach to learning from positive examples.

A group of features is derived from this grammar. Other groups of features of NPPs are derived using other learning strategies. Amalgams of these groups are formed. A recognition model is generated for each amalgam using C4.5 and C4.5rules, and its performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The highest RA was achieved by a model which includes grammar-derived features. This RA is significantly higher than the best RA achieved without the use of the grammar-derived features.

Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of (the rare) positives, all the models are awarded a similar (high) score by predictive accuracy because they all exclude most of the abundant negatives.
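
The point about predictive accuracy can be illustrated with a small sketch. All counts below are hypothetical and are not taken from the report; the model names and the helper functions `accuracy` and `positive_coverage` are likewise assumptions made for illustration. The sketch does not compute Relative Advantage, whose definition is not given in this record; it only shows how two models that cover very different numbers of rare positives can receive nearly identical accuracy scores.

```python
# Hypothetical confusion-matrix counts for two recognition models on a
# test set with 10 positives (rare) and 9990 negatives (abundant).
# These numbers are invented for illustration only.

def accuracy(tp, fn, fp, tn):
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)

def positive_coverage(tp, fn):
    """Fraction of the true positives that the model recognises."""
    return tp / (tp + fn)

models = {
    "model_A": dict(tp=2, fn=8, fp=5, tn=9985),  # covers few positives
    "model_B": dict(tp=9, fn=1, fp=5, tn=9985),  # covers most positives
}

results = {
    name: (accuracy(**c), positive_coverage(c["tp"], c["fn"]))
    for name, c in models.items()
}

for name, (acc, cov) in results.items():
    print(f"{name}: accuracy={acc:.4f}, positive coverage={cov:.2f}")
```

Under these assumed counts the two accuracy scores differ by less than a tenth of a percentage point, while coverage of the positives differs by a factor of more than four, which is why a measure sensitive to the rare class is needed in such a domain.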


Subject: Natural Sciences; Data och informa ...; Bioinformatics
