SwePub

Multivariate patent analysis : using chemometrics to analyze collections of chemical and pharmaceutical patents

Sjögren, Rickard (author)
Umeå universitet, Kemiska institutionen
Stridh, Kjell (author)
Umeå universitet, Kemiska institutionen
Skotare, Tomas (author)
Umeå universitet, Kemiska institutionen
Trygg, Johan (author)
Umeå universitet, Kemiska institutionen, Sartorius Stedim Data Analytics, Umeå, Sweden
2018-05-10
2020
English.
In: Journal of Chemometrics. - : John Wiley & Sons. - 0886-9383 .- 1099-128X. ; 34:1
  • Journal article (peer-reviewed)
Abstract
  • Patents are an important source of technological knowledge, but the number of existing patents is vast and growing quickly. This makes it important to develop tools and methodologies that quickly reveal patterns in patent collections. In this paper, we describe how structured chemometric principles of multivariate data analysis can be applied to text analysis in a novel combination with common machine learning preprocessing methodologies. We demonstrate our methodology in two case studies. Using principal component analysis (PCA) on a collection of 12338 patent abstracts from 25 big pharma companies revealed the sub-fields in which the companies are active. Using PCA on a smaller collection of patents retrieved by searching for a specific term proved useful for quickly understanding how patent classifications relate to the search term. By using orthogonal projections to latent structures (O-PLS) on patent classification schemes, we were able to separate patents at a more detailed level than with PCA. Lastly, we performed multi-block modeling using OnPLS on bag-of-words representations of abstracts, claims, and detailed descriptions, respectively, showing that semantic variation relating to patent classification is consistent across multiple text blocks, represented as globally joint variation. We conclude that using machine learning to transform unstructured data into structured data provides a good preprocessing tool for subsequent chemometric multivariate data analysis, and an easily interpretable and novel workflow for understanding large collections of patents. We demonstrate this on collections of chemical and pharmaceutical patents.
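
To illustrate the kind of workflow the abstract describes (bag-of-words representation followed by PCA), here is a minimal sketch in plain Python. It is not the authors' implementation: the four toy texts are invented stand-ins for patent abstracts, the helper names are hypothetical, and the first principal component is extracted with the NIPALS iteration commonly used in chemometrics, which is an assumption about the algorithm rather than something the record states.

```python
import math
import re
from collections import Counter

# Toy stand-ins for patent abstracts (hypothetical, not data from the paper).
docs = [
    "antibody binding protein for cancer therapy",
    "antibody protein therapy for tumor treatment",
    "polymer coating composition for tablet formulation",
    "tablet formulation with polymer coating layer",
]

def bag_of_words(texts):
    """Return (vocabulary, count matrix) with one row per document."""
    tokens = [re.findall(r"[a-z]+", t.lower()) for t in texts]
    vocab = sorted({w for doc in tokens for w in doc})
    rows = []
    for doc in tokens:
        counts = Counter(doc)
        rows.append([float(counts[w]) for w in vocab])
    return vocab, rows

def center_columns(X):
    """Subtract each column mean, since PCA works on mean-centered variables."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def nipals_pc1(X, tol=1e-10, max_iter=500):
    """First principal component via NIPALS: returns (scores t, loadings p)."""
    cols = list(zip(*X))
    # Start from the column with the largest sum of squares.
    t = list(max(cols, key=lambda c: sum(v * v for v in c)))
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Loadings: project every variable (word) onto the current scores.
        p = [sum(c * s for c, s in zip(col, t)) / tt for col in cols]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: project every document onto the unit-length loadings.
        t_new = [sum(x * w for x, w in zip(row, p)) for row in X]
        diff = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if diff < tol:
            break
    return t, p

vocab, counts = bag_of_words(docs)
X = center_columns(counts)
scores, loadings = nipals_pc1(X)

# Documents with similar vocabulary end up on the same side of the component.
for doc, s in zip(docs, scores):
    print(f"{s:+.2f}  {doc}")

# Words with the largest absolute loadings drive the separation.
top = sorted(zip(vocab, loadings), key=lambda wl: abs(wl[1]), reverse=True)[:4]
print(top)
```

The scores place each document along the component and the loadings show which words pull documents apart, which is the sense in which such a model is interpretable. The sign of a principal component is arbitrary, so only the relative positions matter.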

Subject headings

NATURVETENSKAP  -- Kemi -- Annan kemi (hsv//swe)
NATURAL SCIENCES  -- Chemical Sciences -- Other Chemistry Topics (hsv//eng)

Keywords

text analytics
OnPLS
principal component analysis
orthogonal projections to latent structures
feature engineering

Publication and content type

ref (subject category)
art (subject category)
