Design And Implementation Of Materials Genome, A High-Performance Material Science Information Searching System

Posted on: 2018-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: R Z Li
GTID: 2321330518497724
Subject: Physical chemistry
Abstract/Summary:
Cheminformatics and materials informatics play important roles in modern chemistry research. In cheminformatics, computer-assisted chemical information searching, including keyword-based searching, numeric filtering, and structural searching, has become a critical part of efficient information management.

We designed a high-performance chemical structure and data search engine called DCAIKU, built on CouchDB and ElasticSearch. DCAIKU handles complex filtering efficiently and converts the chemical structure similarity search problem into a general text search problem, so that it can use off-the-shelf full-text search engines. With the help of a schema-less document database, DCAIKU also supports flexible document structures and heterogeneous datasets.

Our evaluations show that DCAIKU can handle both keyword search and structural search against millions of records with high accuracy and low latency. Compared with similar search services, DCAIKU reaches the same level of accuracy while lowering latency by one order of magnitude and raising throughput by one order of magnitude. We expect DCAIKU to lay the foundation for large-scale, cost-effective keyword and structural search in materials science and chemistry research.
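The idea of recasting structure similarity as text search can be illustrated with a small sketch. The abstract does not say how DCAIKU derives its search terms, so everything below is an assumption made for illustration: character n-grams of a SMILES string stand in for real chemical fingerprint bits, a hypothetical `TokenIndex` class stands in for the inverted index that a full-text engine such as ElasticSearch maintains, and candidates are ranked by the Tanimoto coefficient over token sets.

```python
from collections import defaultdict


def smiles_tokens(smiles: str, n: int = 3) -> set[str]:
    """Assumed stand-in for a fingerprint: overlapping character n-grams of a SMILES string."""
    if len(smiles) < n:
        return {smiles}
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


class TokenIndex:
    """Hypothetical in-memory inverted index, mimicking per-term postings in a text engine."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.doc_tokens: dict[str, set[str]] = {}

    def add(self, doc_id: str, smiles: str) -> None:
        """Index one structure: each token points back to the documents containing it."""
        tokens = smiles_tokens(smiles)
        self.doc_tokens[doc_id] = tokens
        for tok in tokens:
            self.postings[tok].add(doc_id)

    def search(self, smiles: str, top_k: int = 5) -> list[tuple[str, float]]:
        """Return up to top_k (doc_id, Tanimoto score) pairs sharing at least one token."""
        query = smiles_tokens(smiles)
        # Count shared tokens per candidate by walking the postings lists,
        # which is the same work a text engine does for a multi-term query.
        shared: dict[str, int] = defaultdict(int)
        for tok in query:
            for doc_id in self.postings.get(tok, ()):
                shared[doc_id] += 1
        scored = []
        for doc_id, common in shared.items():
            union = len(query) + len(self.doc_tokens[doc_id]) - common
            scored.append((doc_id, common / union))
        # Highest score first; ties broken by document id for stable output.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:top_k]


if __name__ == "__main__":
    index = TokenIndex()
    index.add("ethanol", "CCO")
    index.add("propanol", "CCCO")
    index.add("benzene", "c1ccccc1")
    print(index.search("CCCO"))
```

Because only documents that share at least one token with the query are ever touched, the cost of a query depends on the postings lists it visits and not on the total number of records. That property is the reason an off-the-shelf text engine is attractive for this task. A production system would replace the n-gram tokens with a proper chemical fingerprint.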
Keywords/Search Tags: Materials Informatics, Search Engine, Cheminformatics, Structural Search, Schema-less Databases