class: center, middle, inverse, title-slide

# Text Mining for Academic Journals
## Miao Yu
### Pawliszyn Research Group
### 2017/09/20

---
class: inverse, center, middle

# Papers

???

This presentation is about papers, i.e. research articles published in academic journals.

I will cover my experience of using text mining to find hot topics in analytical chemistry. I will also cover some uses of text mining for PDFs (postdoctoral fellows) and young scientists.

These slides are actually a template. You can modify them to fit your own study.

---

## How far is it from PostDoc to PI?

--

<img src="https://yufree.github.io/presentation/figure/pdpi.png" width="80%" style="display: block; margin: auto;" />

von Bartheld CS, Houmanfar R, Candido A. (2015) https://doi.org/10.7717/peerj.1262

???

It's not easy to survive in academia, especially when many of the techniques are younger than you or your kids.

---
class: inverse, center, middle

### Genomics/Proteomics/Metabolomics
### Quantum Computing
### Big Data
### Nanotechnology
### Artificial Intelligence
### Blockchain
### 3D Printing
### Precision Medicine
### MOOCs

???

More and more new trends and technologies will change our lives, including our research. We may have to change our research in the near future by using new technology. Text mining is one of them, and I think almost any researcher could benefit from it.

Text mining, or natural language processing, uses statistical methods to find the patterns behind the literature. It covers term frequency, term frequency–inverse document frequency, word co-occurrences, topic modeling, sentiment analysis, etc.

Let's start with a simple question: how do you change research areas?

---

# Common workflow

--

## 1. Search keywords on WOS/Pubmed/Scopus

--

## 2. Collect papers

--

## 3. Start with a few reviews/feature articles

--

## 4. Update with RSS

???

Keyword tracking is always keyword-oriented.

---
class: inverse, center, middle

# Solid Phase MicroExtraction

???
My topic at Waterloo

---

# Subdisciplines in SPME

```r
library(scifetch)
library(dplyr)      # %>% and mutate()
library(lubridate)  # round_date()

# PubMed query: MeSH heading SPME, published 2007/08 to 2017/08
query <- 'Solid Phase MicroExtraction[MH] AND 2007/08:2017/08[DP]'
tmdf <- getpubmed(query, start = 1, end = 10000) %>%
  getpubmedtbl() %>%
  mutate(time = as.POSIXct(date, origin = "1970-01-01"),
         month = round_date(date, "month"))
```

<img src="textmining_files/figure-html/trend-1.png" style="display: block; margin: auto;" />

???

I developed the scifetch package to fetch data from PubMed and Google Scholar.

2668 records

Keywords alone give no picture of the whole discipline.

I found that SPME is a technique whose core is extraction. I don't think I should go too deep into this area. Instead, I prefer topics related to this technique, such as mass spectrum.

---
class: inverse, center, middle

# Analytical Chemistry

???

Should a PDF have as large a scope as a professor? No, as large as an editor.

Here I collected information on all the papers published in *Analytical Chemistry* in the past five years.

Five years captures the trends that are happening now but are not yet in textbooks.

8676 records found

---

# Frequently used words

<img src="textmining_files/figure-html/Frequently used words-1.png" style="display: block; margin: auto;" />

???

Something about mass spectrum and biological analysis

---

# Temporal Trends - Growing Words

<img src="textmining_files/figure-html/time-1.png" style="display: block; margin: auto;" />

???

Term frequency is count data, so I use a generalized linear model (binomial) to estimate the slope over time.

Here I select the 9 fastest-growing words in the titles. We can see that more of the concepts are about living systems. I also found metabolomics, which is what I have been working on in the past year since I came to Waterloo.

---

# Temporal Trends - Shrinking Words

<img src="textmining_files/figure-html/time2-1.png" style="display: block; margin: auto;" />

???

It is interesting to find that some words that have only recently entered our textbooks are disappearing as hot spots.
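
The slope estimate mentioned in the notes for the trend slides can be sketched roughly as follows. This is a minimal illustration rather than the code behind the figures: the counts are made-up toy numbers, and `word_counts` and `trend_slope` are hypothetical names. It assumes that, for each word, we know how often it appears per year and how many title words there are in total that year.

```r
# Toy data (invented for illustration only): yearly counts of two title
# words and the total number of title words in each year.
word_counts <- data.frame(
  year  = rep(2013:2017, times = 2),
  word  = rep(c("metabolomics", "electrode"), each = 5),
  count = c(12, 18, 25, 33, 41,  60, 52, 47, 40, 31),
  total = rep(c(9000, 9200, 9100, 9300, 9400), times = 2)
)

# Hypothetical helper: fit one binomial GLM for a single word.
# Successes are occurrences of the word, failures are all other words
# in that year. The coefficient on `year` is the change in the
# log-odds of seeing the word per year.
trend_slope <- function(df) {
  fit <- glm(cbind(count, total - count) ~ year,
             family = binomial, data = df)
  unname(coef(fit)["year"])
}

# One slope per word, sorted from fastest growing to fastest shrinking.
slopes <- sapply(split(word_counts, word_counts$word), trend_slope)
slopes <- sort(slopes, decreasing = TRUE)
print(slopes)
```

A positive slope marks a growing word and a negative slope a shrinking one; ranking the slopes is one way to pick the words shown on the two trend slides.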
---
class: inverse, center, middle

# Science & Nature

???

Go further

14964 records found

I collected papers published in Science and Nature over 3 years.

---

# Topic Model

<img src="textmining_files/figure-html/TMplot-1.png" style="display: block; margin: auto;" />

???

Topic modeling uses latent Dirichlet allocation to find potential topics among the papers.

Climate change, quantum, tumour, protein, neurons, evolution and astronomy are what they like.

My research should be mapped onto one of those topics.

---

# Insights from Text Mining

## Mass Spectrum
## Bioanalytical Chemistry
## Solid Phase MicroExtraction
## Living Samples

--

Use Mass Spectrum and Solid Phase MicroExtraction to study Omics by **in vivo** sampling

---
class: inverse, center, middle

# Metabolomics

---

# Sentiment analysis

<img src="textmining_files/figure-html/SAplot-1.png" style="display: block; margin: auto;" />

???

Should we use neutral words? Sentiment analysis can show this.

---

# Journal Tones

<img src="textmining_files/figure-html/EI-1.png" style="display: block; margin: auto;" />

???

Here I use term frequency–inverse document frequency, which shows the words preferred in different parts of the records.

Science has an American accent and Nature has a British tone.

---

# Word usage

<img src="textmining_files/figure-html/data-1.png" style="display: block; margin: auto;" /><img src="textmining_files/figure-html/data-2.png" style="display: block; margin: auto;" />

???

Is "data" a plural or a singular noun? You might also use text mining when writing papers.

---

# Final Comments

--

## Larger scope

--

## Capture trends

--

## Write papers

---
class: inverse, center, middle

# From Yahoo! to Google

???

Yahoo! used manual categories. Google uses algorithms.

---
class: center, middle

# Thanks!

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).

Source code is [**here**](https://github.com/yufree/presentation/blob/gh-pages/textmining/textmining.Rmd)

Contact me [**here**](https://yufree.cn/en) or @yu_free

Online textbook about text mining: http://tidytextmining.com/

???

For xaringan, @xieyihui

Keywords for PubMed: https://www.ncbi.nlm.nih.gov/books/NBK3827/

---

# Scan

<img src="textmining_files/figure-html/QRcode-1.png" style="display: block; margin: auto;" />