
TCM FORMULAE SYSTEM ANALYSIS

Use Case Description


The orchestration of formulae through choices of herbs gives rise to a bipartite graph between formulae and herbs. A formula can also be interpreted as a collaboration of its member herbs, so formulae orchestration also leads to an herb-collaboration graph. The efficacy and toxicity of TCM formulae are thought to depend on complex interactions between herbs, which has implications for the effectiveness of herb-inspired drug discovery. In this empirical study, we seek a scientific interpretation of the complex relationships between herbs involved in formulae orchestration. The methods under investigation include: (1) complex network analysis; (2) frequent semantic subgraph mining.

Tong Yu 2008-12-6

CONTENTS
Use case: TCM Formula Interpretation and Analysis
Dataset
Query #1 Find all herbs originated from a Particular Medicinal Part
Query #2 Find the Contraindication of the herbs that satisfy the condition defined in query #1
Query #3 Find names of all herbs
Query #4 Find instances by a name
Query #5 Find instances by a name within a named graph
Query #6 Find properties of a herb
Query #7 Find drugs that interact with a herb
Query #8 Find How many herbs a formula has

USE CASE: TCM FORMULA INTERPRETATION AND ANALYSIS

Formulae search and interpretation is an important function of a decision-support system for drug discovery. A formal knowledge representation mechanism is needed to address the complexity of the biomedical domain model. For example, it is feasible to define a formula as a set of instances of the class Herb. The specification of a formula includes the enumeration of its individual herbs, their dosage and functional role, the rationale and methodology of the herb mixture, the mechanism of action, and its relationships with other medical concepts. A formula can be a well-defined classical formula, a customized prescription of drugs addressing the particular situation of a patient, or a set of drugs that patients orchestrate and consume for themselves. Given a set of formulae, a clinician needs to specify them accurately, to discern their differences, and to identify their relationships. The ability to formally define formulae, and the relationships between them, is a necessity for our problem. The discussion around formulae is also applicable to other medical concepts, such as examination, disease, and diagnosis.

ACTORS

TCM PRACTITIONER
A TCM Practitioner is a medical professional who can utilize the knowledge of TCM Formulae System for the care of patients.

TCM KNOWLEDGE CURATOR


A TCM Knowledge Curator is a medical professional who can manage, maintain, and use a knowledge management system for TCM.

TCM KNOWLEDGE ANALYST


A TCM Knowledge Analyst is a medical professional who can use TCM knowledge and data to perform knowledge discovery experiments.

IT ANALYST
An IT Analyst is an informatics professional who can develop a knowledge management system based on the requirements of the above TCM domain experts.

STORY FROM TCM DOMAIN


In Chinese history, TCM practitioners have developed an elaborate formulae system, under the essential principle that a formula should embody a proper herb companionship involving hierarchical social relationships between a single dominant figure, the king herb, and a set of subordinate figures such as minister herbs, assistant herbs, and carrier herbs. For example, Four-Gentleman decoction (FGD) is an ancient herbal formula documented in the Song Dynasty (960-1279), with medical actions to nurture the qi (the vital substance of the human body, as stated in TCM). It consists of the following four herbs: (1) Ginseng, the king herb (FGD-K for short) that nurtures the qi, (2) Atractylodis, the minister herb (FGD-M for short) that strengthens the spleen and dries dampness, (3) Sclerotium, the assistant herb (FGD-A for short) that assists the king and the minister in strengthening the spleen, and (4) Glycyrrhizae uralensis, the carrier herb (FGD-C for short) that warms and harmonizes the stomach.
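The role structure of a formula can be made concrete with a small data model. The following is a minimal sketch, not the system's actual schema: the class names `Role`, `Ingredient`, and `Formula`, and the check `is_well_formed`, are illustrative assumptions; only the herbs, roles, and actions of FGD come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """Functional roles of herbs in a formula (hypothetical names)."""
    KING = "king"
    MINISTER = "minister"
    ASSISTANT = "assistant"
    CARRIER = "carrier"


@dataclass(frozen=True)
class Ingredient:
    herb: str
    role: Role
    action: str


@dataclass(frozen=True)
class Formula:
    name: str
    ingredients: tuple

    def herbs_with_role(self, role):
        """Return the names of the herbs that play the given role."""
        return [i.herb for i in self.ingredients if i.role is role]

    def is_well_formed(self):
        """Assumed check: a formula has exactly one king herb."""
        return len(self.herbs_with_role(Role.KING)) == 1


# Four-Gentleman decoction as described in the text.
FGD = Formula(
    name="Four-Gentleman decoction",
    ingredients=(
        Ingredient("Ginseng", Role.KING, "nurtures the qi"),
        Ingredient("Atractylodis", Role.MINISTER,
                   "strengthens the spleen and dries dampness"),
        Ingredient("Sclerotium", Role.ASSISTANT,
                   "assists the king and the minister"),
        Ingredient("Glycyrrhizae uralensis", Role.CARRIER,
                   "warms and harmonizes the stomach"),
    ),
)
```

Dosage, rationale, and links to other medical concepts are omitted here; they would be further fields on `Ingredient` or `Formula`.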

APPROACH
In our frequent semantic subgraph discovery (FSSD) process, we first construct a set of named graphs using semantic queries, and then feed the result set to an existing frequent pattern mining operator, which treats each statement as an item.
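The idea of treating each statement as an item can be illustrated with a brute-force frequent itemset count. This is only a sketch under stated assumptions: it is not the mining operator used in the study, the function name `frequent_statement_sets` is hypothetical, and the three EHR graphs (with a generalized `patient` subject and placeholder herbs `herbA` and `herbB`) are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_statement_sets(named_graphs, min_support, max_size=2):
    """Count sets of statements that occur together in named graphs.

    named_graphs maps a graph name to a set of (subject, predicate, object)
    statements. A statement set is frequent if it is contained in at least
    min_support graphs. Exhaustive enumeration, suitable for small inputs only.
    """
    counts = Counter()
    for statements in named_graphs.values():
        items = sorted(set(statements))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: n for s, n in counts.items() if n >= min_support}


# Hypothetical named graphs, one per EHR transaction.
KYD = ("patient", "hasSyndrome", "KYD")
TAKES_A = ("patient", "takes", "herbA")
TAKES_B = ("patient", "takes", "herbB")

graphs = {
    "ehr1": {KYD, TAKES_A, TAKES_B},
    "ehr2": {KYD, TAKES_A},
    "ehr3": {KYD, TAKES_B},
}

patterns = frequent_statement_sets(graphs, min_support=2)
```

With a support threshold of 2, the pair of statements `KYD` and `TAKES_A` is kept because it appears in two graphs, while the pair `TAKES_A` and `TAKES_B` is dropped because it appears in only one.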

TRIGGERS
A Knowledge Analyst specifies a collection of formulae to analyze various associations among herbs. For example, a pharmaceutical knowledge analyst retrieves a set of EHRs related to Kidney Yang Deficiency (KYD), an instance of TCM Syndrome, to discover interesting patterns related to drug usage, efficacy, and safety.
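One simple way to look at associations among herbs is to project the formula-herb bipartite graph onto herbs, so that two herbs are linked whenever they appear in the same formula. The sketch below assumes this projection; the function name `collaboration_graph`, the formula "Formula X", and the herb "Herb Y" are hypothetical, while the FGD herbs are those named in this document.

```python
from collections import Counter
from itertools import combinations


def collaboration_graph(formulae):
    """Project a formula -> herbs mapping onto weighted herb-herb edges.

    The weight of an edge is the number of formulae in which both herbs
    appear. Each edge is stored once, with herb names in sorted order.
    """
    edges = Counter()
    for herbs in formulae.values():
        for a, b in combinations(sorted(set(herbs)), 2):
            edges[(a, b)] += 1
    return edges


formulae = {
    "Four-Gentleman decoction": [
        "Ginseng", "Atractylodis", "Sclerotium", "Glycyrrhizae uralensis",
    ],
    # Hypothetical second formula, for illustration only.
    "Formula X": ["Ginseng", "Glycyrrhizae uralensis", "Herb Y"],
}

edges = collaboration_graph(formulae)
```

Edge weights of this kind are the usual starting point for complex network measures such as degree or clustering over the herb-collaboration graph.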

PRECONDITIONS
Related datasets are integrated based on the herb ontology and accessible through a semantic web query interface.

PROCESS
The process is as follows:
(1) Represent the EHR transactions as semantic graphs with statements that capture medical facts such as patient condition, symptoms, drug usage, efficacy, and safety.
(2) Merge the EHR graphs with domain knowledge, and add annotations to every EHR graph using SPARQL CONSTRUCT queries.
(3) Specify restrictions in SPARQL, and run the query against the EHR graphs to get an (aggregated) result set of named graphs. This step supports analysts in filtering the data according to the problem context.
(4) Run the frequent semantic subgraph discovery algorithm against the result set, which yields a set of discovered patterns.
(5) Feed the discovered patterns into a semantic search portal (which can discover hidden associations between instances and/or concepts), and add semantic annotations to every pattern based on semantic associations.
(6) Visualize the patterns for user interpretation.
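Step (3), selecting the named graphs that satisfy a restriction, can be sketched without a SPARQL engine. The following is an illustrative stand-in only: the helper names `matches` and `select_graphs` are hypothetical, the restriction is a single triple pattern with `None` as a wildcard, and the EHR graphs (including the syndrome `OtherSyndrome`) are invented.

```python
def matches(statement, pattern):
    """True if a statement fits a triple pattern; None matches anything."""
    return all(p is None or p == s for s, p in zip(statement, pattern))


def select_graphs(named_graphs, restriction):
    """Keep the named graphs that contain a statement matching the restriction."""
    return {
        name: statements
        for name, statements in named_graphs.items()
        if any(matches(st, restriction) for st in statements)
    }


# Hypothetical EHR graphs keyed by graph name.
ehr_graphs = {
    "ehr1": {("patient", "hasSyndrome", "KYD"), ("patient", "takes", "herbA")},
    "ehr2": {("patient", "hasSyndrome", "OtherSyndrome")},
    "ehr3": {("patient", "hasSyndrome", "KYD"), ("patient", "takes", "herbB")},
}

# Restriction: the record mentions Kidney Yang Deficiency.
kyd_graphs = select_graphs(ehr_graphs, (None, "hasSyndrome", "KYD"))
```

The selected graphs are what step (4) would receive as input.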

POSTCONDITIONS
Discovered patterns can be annotated with domain knowledge based on semantic associations of concepts, and visualized as a semantically-enriched graph to facilitate human interpretation. These patterns can then be submitted to a knowledge base as knowledge proposals.

DATASET
Information about drugs can be found in the product lists of pharmaceutical companies and in online drug databases. DrugBank [Wishart et al, 2006] [Wishart et al, 2008] is a richly annotated resource that combines detailed drug data with comprehensive drug target and drug action information. DrugBank contained 4,252 drug entries, including 1,178 FDA-approved drugs (1,065 small molecules and 113 proteins/peptides) and 3,074 drugs under investigation (experimental drugs). The FDA-approved drugs target 394 human proteins in total. An analysis of the chemical similarity between drugs targeting these proteins reveals that most of the drugs have a distinct chemical structure. DrugBank contains a card for Huperzine A (DB01928), which records Huperzine A's drug type and category, chemical properties, drug target, etc. (http://www.drugbank.ca/cgi-bin/getCard.cgi?CARD=DB01928).

The TCM drug databases contain three major categories of knowledge: (1) the properties of TCM herbs; (2) the knowledge of TCM formulae as mixtures of herbs; (3) the chemical compounds of herbs.

TCM FORMULA DATABASE


The Database of Chinese Medical Formula (DCMF) contains the knowledge of TCM formulae as mixtures of herbs. The database contains more than 85,000 records of prescribed formulae. Each record contains a formula's formal name, clinical usage, efficacy and safety issues, ingredients, etc.

TCM HERB DATABASE


The Traditional Chinese Drug Database (TCDBASE) contains over 11,000 records of medicinal herbs. Each record contains an herb's name, biological properties, manufacturing methods, clinical usage, efficacy and safety issues, etc.

TCM CHEMICAL DATABASE


The Database of Chemical Composition from Chinese Herbal Medicine (DCCCHM) contains over 4,500 records of chemical compounds. Its main contents include the chemical's name, physical/chemical properties, herbal origin, pharmacological action, efficacy and safety issues, etc. The database of chemical compounds of herbs can link to Western drug databases via the chemical's formal name, and thus serves as a connecting point between TCM and Western medicine.

QUERY #1 FIND ALL HERBS ORIGINATED FROM A PARTICULAR MEDICINAL PART

The query specifies all herbs that have shu pi as a medicinal part. This query tests the capacity to retrieve entities by property, which often involves cross-table joins.

    PREFIX dart: <http://dart.zju.edu.cn/dartcore/dart>
    SELECT DISTINCT ?med
    WHERE {
      ?med dart:hasMedicinalPart ?yybw .
      ?yybw dart:hasName ?name .
      FILTER(?name = 'shu pi')
    }

Here the variable ?med is a drug, because we have specified the domain of hasMedicinalPart to be Drug.
The WHERE clause specifies the filtering condition: the herb has shu pi as a medicinal part.

Expected Results

QUERY #2 FIND THE CONTRAINDICATION OF THE HERBS THAT SATISFY THE CONDITION DEFINED IN QUERY #1

The query specifies the contraindications of the herbs that have shu pi as a medicinal part. This query tests the capacity to retrieve a property by a related property, which often involves cross-table joins.

    PREFIX dart: <http://dart.zju.edu.cn/dartcore/dart>
    SELECT DISTINCT ?contradication
    WHERE {
      ?med dart:hasContraindication ?contradication .
      ?med dart:hasMedicinalPart ?medicinalPart .
      ?medicinalPart dart:hasName ?name .
      FILTER(?name = 'shu pi')
    }

By using ?medicinalPart as the object of one triple and the subject of another, we traverse multiple links in the graph.
The WHERE clause specifies the filtering condition: the herb has shu pi as a medicinal part.
The WHERE clause also specifies the association: the contraindication is linked to the medicinal part through the herb.

Expected Results

?contradication
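The way a shared variable joins two triple patterns in Query #2 can be sketched with a tiny pattern matcher over an in-memory list of triples. This is an illustration of the join semantics only, not a SPARQL engine: the function `match_bgp` is hypothetical, and the resources `herb1`, `herb2`, `part1`, `part2`, `contra1`, and `contra2` are invented test data.

```python
def match_bgp(triples, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern.

    Terms starting with '?' are variables; all other terms are constants.
    A variable bound by one pattern must keep the same value in the others,
    which is what joins the patterns together.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match_bgp(triples, rest, new)


# Hypothetical data: only herb1 has shu pi as a medicinal part.
TRIPLES = [
    ("herb1", "dart:hasMedicinalPart", "part1"),
    ("part1", "dart:hasName", "shu pi"),
    ("herb1", "dart:hasContraindication", "contra1"),
    ("herb2", "dart:hasMedicinalPart", "part2"),
    ("part2", "dart:hasName", "other part"),
    ("herb2", "dart:hasContraindication", "contra2"),
]

QUERY_2 = [
    ("?med", "dart:hasContraindication", "?contradication"),
    ("?med", "dart:hasMedicinalPart", "?medicinalPart"),
    ("?medicinalPart", "dart:hasName", "shu pi"),
]

results = sorted({b["?contradication"] for b in match_bgp(TRIPLES, QUERY_2)})
```

Here the constant `"shu pi"` plays the part of the FILTER condition, and collecting the values into a set mirrors DISTINCT.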

QUERY #3 FIND NAMES OF ALL HERBS

The query specifies the list of all herb names. This query aims at testing the capacity for thorough searching of certain information across selected tables.

    PREFIX dart: <http://dart.zju.edu.cn/dartcore/dart>
    SELECT DISTINCT ?name
    WHERE {
      ?med a dart:Herb .
      ?med dart:hasName ?name .
    }
    LIMIT 50

The WHERE clause specifies the type of ?med as Herb.
The query potentially involves querying information across tables.
The query might be answered with a large result set, and might not finish online.
The SPARQL keyword a is a shortcut for the common predicate rdf:type, giving the class of a resource.
LIMIT is a solution modifier that limits the number of rows returned from a query. SPARQL has two other solution modifiers: ORDER BY, for sorting query solutions on the value of one or more variables; and OFFSET, used in conjunction with LIMIT and ORDER BY to take a slice of a sorted solution set (e.g. for paging).
The query might bring challenges to result transmission.
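The combined effect of ORDER BY, OFFSET, and LIMIT on a solution set can be sketched as sorting followed by slicing. The helper `page` is a hypothetical illustration of the semantics, not part of any SPARQL library; the herb names are the four FGD herbs mentioned earlier in this document.

```python
def page(rows, limit, offset=0):
    """Mimic ORDER BY ?name OFFSET offset LIMIT limit on distinct rows."""
    ordered = sorted(set(rows))  # DISTINCT, then ORDER BY
    return ordered[offset:offset + limit]


names = ["Ginseng", "Atractylodis", "Sclerotium", "Glycyrrhizae uralensis"]

first_page = page(names, limit=2)
second_page = page(names, limit=2, offset=2)
```

Fetching a large result set page by page in this way is one option for easing the transmission problem, at the cost of re-running the sorted query for every page.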

Expected Results

QUERY #4 FIND INSTANCES BY A NAME

The query specifies all instances that have a given name, together with their classes. This query tests the capacity for thorough searching of certain information across all tables.

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dart: <http://dart.zju.edu.cn/dartcore/dart>
    SELECT DISTINCT ?instance ?class
    WHERE {
      ?instance rdf:type ?class .
      ?instance dart:hasName '[name]' .
    }

The WHERE clause specifies the search keyword.
The query potentially involves the generation of a semantic index.

Expected Results

QUERY #5 FIND INSTANCES BY A NAME WITHIN A NAMED GRAPH

The query specifies a set of properties of all instances that have a given name. This query tests the expressiveness of the ontology and mapping.

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dart: <http://dart.zju.edu.cn/dartcore/dart>
    SELECT DISTINCT ?instance ?class
    FROM <http://dart.zju.edu.cn/dartcore/dart/herbase>
    WHERE {
      ?instance rdf:type ?class .
      ?instance dart:hasName '[name]' .
    }

The FROM clause specifies the herb subgraph.

Expected Results

QUERY #6 FIND PROPERTIES OF A HERB

The query specifies the properties (medicinal part and contraindication) of the herb with a given name. This query tests the capacity for thorough searching of certain information across all tables.

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dart: <http://dart.zju.edu.cn/dartcore/dart>
    SELECT DISTINCT ?yybw_name ?contradication
    FROM <http://dart.zju.edu.cn/dartcore/dart/herbase>
    WHERE {
      ?med rdf:type dart:Herb .
      ?med dart:hasName '[name]' .
      OPTIONAL { ?med dart:hasContraindication ?contradication }
      OPTIONAL { ?med dart:hasMedicinalPart ?yybw .
                 ?yybw dart:hasName ?yybw_name . }
    }

The query aims at listing all properties that a herb has.
OPTIONAL tries to match a graph pattern, but doesn't fail the whole query if the optional match fails. If an OPTIONAL pattern fails to match for a particular solution, any variables in that pattern remain unbound (no value) for that solution.

The FROM keyword lets us specify the target graph in the query itself.
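The behaviour of OPTIONAL described above corresponds to a left join over solutions. The sketch below illustrates that behaviour on plain dictionaries; the function `left_join` and the resources `herb1`, `herb2`, and `contra1` are hypothetical.

```python
def left_join(solutions, optional_rows, key):
    """Extend each solution with matching optional rows.

    A solution with no matching optional row is kept as it is, so the
    optional variables simply stay unbound for that solution.
    """
    out = []
    for sol in solutions:
        found = [row for row in optional_rows if row[key] == sol[key]]
        if found:
            out.extend({**sol, **row} for row in found)
        else:
            out.append(dict(sol))
    return out


# Hypothetical solutions of the mandatory part of the query.
herbs = [{"?med": "herb1"}, {"?med": "herb2"}]
# Hypothetical matches of the OPTIONAL contraindication pattern.
contraindications = [{"?med": "herb1", "?contradication": "contra1"}]

joined = left_join(herbs, contraindications, key="?med")
```

In this example `herb2` has no recorded contraindication, yet it still appears in the result, with `?contradication` left unbound.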

Expected Results

QUERY #7 FIND DRUGS THAT INTERACT WITH A HERB

The query specifies all drugs (with their properties) that interact with the herb with a given name.

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX dart: <http://dart.zju.edu.cn/dartcore/dart>
    SELECT DISTINCT ?name ?class
    FROM <http://dart.zju.edu.cn/dartcore/dart/herbase>
    WHERE {
      ?med rdf:type dart:Herb .
      ?med dart:hasName '[name]' .
      ?med dart:interacts ?reactor .
      ?reactor dart:hasName ?name .
      ?reactor rdf:type ?class .
    }

The FROM clause specifies the target graph; the WHERE clause lists the name and class of everything that interacts with the herb.

Expected Results

QUERY #8 FIND HOW MANY HERBS A FORMULA HAS

The query specifies, for each formula with a given efficacy, how many herbs it contains.

    PREFIX dart: <http://dart.zju.edu.cn/dartcore/dart>
    SELECT ?formula_name (COUNT(?herb) AS ?herbs)
    FROM <http://dart.zju.edu.cn/dartcore/dart/herbase>
    WHERE {
      ?formula a dart:Formula .
      ?formula dart:hasEfficacy '[efficacy]' .
      ?formula dart:hasName ?formula_name .
      ?formula dart:containsHerb ?herb .
    }
    GROUP BY ?formula_name

The FROM clause specifies the target graph.
Aggregate functions, such as COUNT, MIN, MAX, and SUM, calculate a single value from a set of results.

The GROUP BY clause breaks the query's result set into groups before applying the aggregate function(s).
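Grouping followed by COUNT can be sketched on plain solution rows. The helper `count_by_group` is a hypothetical illustration of the semantics; "Formula X" and "Herb Y" are invented, while the FGD rows use the herbs listed earlier in this document.

```python
from collections import Counter


def count_by_group(rows, group_var, counted_var):
    """Mimic GROUP BY group_var with COUNT(counted_var).

    Rows in which the counted variable is unbound (missing or None)
    are not counted.
    """
    return Counter(
        row[group_var] for row in rows if row.get(counted_var) is not None
    )


rows = [
    {"?formula_name": "Four-Gentleman decoction", "?herb": "Ginseng"},
    {"?formula_name": "Four-Gentleman decoction", "?herb": "Atractylodis"},
    {"?formula_name": "Four-Gentleman decoction", "?herb": "Sclerotium"},
    {"?formula_name": "Four-Gentleman decoction",
     "?herb": "Glycyrrhizae uralensis"},
    # Hypothetical second formula, for illustration only.
    {"?formula_name": "Formula X", "?herb": "Ginseng"},
    {"?formula_name": "Formula X", "?herb": "Herb Y"},
]

herbs_per_formula = count_by_group(rows, "?formula_name", "?herb")
```

Each distinct value of `?formula_name` becomes one group, and the count is computed per group rather than over the whole result set.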

Expected Results
