
MIPS 2012 IP for Development: The Emerging Paradigm (SJMSOM; IIT Bombay), pgs. 307-315 (ISBN No. 978-81-7425-330-9)

TITLE: PATENT PRIOR ART SEARCH: A DOT-CONNECT THEORY, A GAPING HOLE PROBLEM, AND A CROWDSOURCING SOLUTION

Author: Tanna Chirag

ABSTRACT The following paper is based on a prospective model for increasing the intelligence of search engines used for patent retrieval and document retrieval. It aims to propose a crowdsourced solution for building universal databases and query logic. It further aims to improve search engine intelligence by provisioning transparent intelligence. And it still further aims to improve visibility of abstracts on a common basis by providing universal meta-abstracts.

1. INTRODUCTION: (The Dot-Connect Theory): The gridlock of technical and scientific evidence, which is scouted for prior art research, comprises granted patent documents, patent application documents, research documents, journals, conference proceedings and the like documented evidence.

Inventions and their subsequent patents can be deemed analogous to the laying of mines in a pre-defined area, said area being a (metaphorical) technical or scientific field. As patents are granted, the progress of science is marked out, and each granted patent is a milestone or, better still, in conformity with the discussion herein, a mine. Further, continuing the analogy, it is in everyone's best interests to discover such mines whilst treading the pre-defined area; that is, it is necessary to understand the nature and construction of a patent before practising any technology, for practising it may set off the mine by infringing patent rights or by failing to account for novelty or inventive step. The effects of setting off such mines may be just as catastrophic, considering that typical litigation bills run up to enormous amounts!

All these existing documents have come to form a MEGAGRID which includes a compendium of smaller grids (GRID 1, GRID 2, GRID 3) in relation to the genesis of technologies (refer Figure 1).

Traversing this MEGAGRID in relation to innovating or practising the art makes it necessary to invoke search engines to find the connected dot pattern for each technology (GRID 1 or GRID 2 or GRID 3 or the like) in question. A number of search engines and databases have been developed for this task, and the queries which work on such search engines are largely restricted to keyword-based queries. Imparting intelligence to such queries is a matter of widespread research worldwide.

Just as it is necessary to invest in a strong research-oriented outlook, it is equally important to pre-empt threats by avoiding wayward wandering and adopting a pro-active, duly diligent outlook. The art of landscaping (in patents) involves a methodology to gauge the exact nature of the invention within the vast realm of existing prior art and to map the specifics of the current art onto the prior art, in order to understand the ground that has already been traversed and the ground that is potentially traversable.

A step wise procedure for an effective landscaping technique and report includes:

- understanding the subject matter by defining an outlining scaffold to define its perimeter, and its intermittent support structures;
- deriving keywords resembling the same outlining scaffold and intermittent structures, and drawing out alternative search-words for the same;
- formulating a query and running it in a plurality of databases to formulate a good hit-list of relevant patents/patent applications;
- analyzing the claims of each of the searched and identified patents/patent applications in order to map the claims of the relevant documents to the subject matter so as to arrive at a mapping quotient.

The mapping quotient eventually lies in a detailed analysis of similarities and dissimilarities between patents and the subject matter.

The entire genesis of a technological area becomes clearly visible once a successful patent landscape report or search report is formulated. This helps a research firm because it can now pick out the gaps and dig into them in search of potential markets and patentability gaps.

2. BACKGROUND (Field Setting): The art of writing, essentially, paved the way for documenting evidence. The history of pictographic writing can be traced back to 3000 B.C. [1]. The history of ideographic writing can be traced back to 2500 B.C. [1].

At its inception, the patent system dates back to the Greeks, as far back as 3rd century B.C. [2], the item of the monopoly by virtue of a patent being a recipe. This evidence of granting monopoly provides a seed for the thought process that grant of monopoly was an honour and a reward for creating something new.

The first Indian Patent was granted in 1856.

The first US Patent [3] was granted in 1790 for a process of making potash, and since then the gigantic USPTO machine has been credited with prosecuting and granting well over 7 million patents. The Federal Circuit works well in oiling and regulating this machine by providing judgements and arguments in order to streamline the system.

In England, the Crown issued letters patent providing any person with a monopoly to produce particular goods or provide particular services. Apart from the grant to John Kempe and his company [4], an early example of such letters patent was a grant by Henry VI in 1449 to John of Utynam, a Flemish man, for a 20-year monopoly for his invention.

The total number of pending patent applications (patent backlogs) across the world was estimated at around 4.2 million [5] in 2007. A WIPO report shows that the number of patent applications between 1985 and 2008 is 29,984,825 [6]. Each of these documents serves as evident prior art. And these are mere samples of the entire list. One can only speculate about the burgeoning volume of patent documents available, not to mention the research papers.

Classification methods have been universally accepted in order to classify a patent carefully. Still, it is a laborious and highly skilled task to come up, each time, with queries which carefully select all of the pertinent documents for study.

Furthermore, each patent document is a creative linguistic canvas drafted by patent attorneys whose primary aim is to create definitive and deterministic terminologies and then to attempt to mask them by departing from the use of routinely accepted words. E.g. a computer (understood colloquially) shall be used in patent context as a computing means or a processing means or a computational means, or even merely an intelligent means.

3. Literature Review (At current instance, T): Shneiderman's [7] information-seeking mantra, "overview first, zoom and filter, then details on demand", is what is being followed by the current search engines, which pull out largely relevant data. This is followed by human filtering and then subsequent querying for detailed information. Fully automated solutions are not applicable since analysis requires human insight, judgement, and the ability to make complex decisions. For certain tasks, a combination of automated analysis techniques and visualization techniques can only be supplemental, helping to bridge the analytic gap and save time.

As was found [8], some of the most experienced confessed, at a conference of professional patent searchers, to a growing anxiety about missing something. There is a dump of patent documents, journals, research papers, conference proceedings, encyclopedias, textbooks, paperback books; which need to be scavenged, ideally, before deciding if a subject matter is Novel and Inventive, and still there could be concerns of missing something out.

But when the ensemble of documents and digital content is so large, are the current methods enough to satiate the search palate?

4. Search Support and Database Support: 4.1 Gauging the Field (Wading knee-deep, neck-deep, or head-deep): Keyword search is the order of the day, used in the industry and accepted worldwide. But the question posed in the above sections brings the discussion to a point where it has become an accepted belief that intelligence needs to be incorporated into search engines at various levels. Each keyword search starts with studying the subject matter, garnering the essence of the technology in terms of cited keywords and their relationships, coming up with synonyms for the garnered technology terms and their causal relationships, and then formulating a string of queries with permutations and combinations of Boolean / logical / interconnecting operators interspersed between them. For an individual searcher, the range of synonyms that can be produced for a given subject matter is limited to the spread of technology to which that person has been exposed. All these pose the GAPING HOLE CHALLENGE. Therefore, whether or not any search is entirely complete is always questionable. And therefore, a user of such search engines and databases can only be left wondering about the level of technological progress surrounding a given technology or field of science.
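The query formulation step described above can be sketched in a few lines. This is only an illustrative sketch, not the method of any particular search engine: the function names are hypothetical, the `AND`/`OR` syntax is a generic assumption, and the second concept group ("playback", "reproduction") is invented for the example.

```python
from itertools import product


def build_boolean_query(concept_groups):
    """Join synonyms within a concept with OR, and concepts with AND."""
    clauses = []
    for synonyms in concept_groups:
        # Quote multi-word phrases so that they are searched as a unit.
        terms = [f'"{t}"' if " " in t else t for t in synonyms]
        clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)


def enumerate_term_combinations(concept_groups):
    """List every one-term-per-concept combination that the query covers."""
    return [" AND ".join(combo) for combo in product(*concept_groups)]


# Hypothetical concept groups: each inner list is one concept and its synonyms.
groups = [
    ["computer", "computing means", "processing means"],
    ["playback", "reproduction"],
]
query = build_boolean_query(groups)
combos = enumerate_term_combinations(groups)
print(query)
print(len(combos), "combinations covered")
```

The number of combinations grows multiplicatively with every synonym added, yet the query still covers only the synonyms the searcher happened to think of, which is the gaping hole discussed above.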

Irrespective of whether Novelty and Inventive Step can be established using the prior art search process, whether a business product or service would infringe, absent knowledge or abundant due diligence, remains a matter of risky judgement.

4.2 Crowdsourcing Solution: 4.2.1 Database Improvement: Plenty of databases have been built. These databases may be synonym databases, (particularly having hypernym databases and hyponym databases), semantic databases, syntactic databases and the like. These databases may be built by a set of employees or researchers governed by a single body.

What is proposed here, to increase the scope of present search methodologies and search engines, is the inclusion of diversity; a multi-view angle to any subject matter. The idea is to garner cloud support; to include every person, from any location, with access to the Internet and an interest in the project, to build these databases in relation to their understanding. This should ideally -s involved and include terms pertaining to analogy and innovative terms, conjugate terms, portmanteaus; instead of merely adhering to current databases with a rigid structure.

Crowdsourcing solutions have been successfully applied in order to unearth digitized and non-digitized prior art from the remotest of locations. Money-based incentives are offered to global employees, who are mere registrants to the program and contribute using their own versions of logic and methods towards constructing a robust patent regime. It has been observed that, since these people may be located anywhere in the world, with a variety of educational, technical, and professional backgrounds, their levels of understanding or interpretation may vary and their work methodology may vary, and therefore the results of such work have been fruitful in programs such as Peer-To-Patent and Article One Partners. A platform such as Article One Partners has more than one million scientists [9] and technologists worldwide to research the validity of specific patents, and banks on geographical, professional, experiential, and technical diversity.

Each person in such programs contributes prior art which they deem to be most relevant to the technology under review. And these citations have, more often than not, proven to be beyond the scope of what patent examiners have been successful in finding. The modification that can be incorporated herein, which this paper suggests, is to ask these cloud-dispersed people to simultaneously also enter their query list and databases used to unearth the data. While the query list helps to map an addendum to the synonym databases already available, the semantics and syntaxes can also be developed in relation to the technology under review. In a rigorous exercise of sorts, a diverse spectrum of technologies may be advertised on such platforms and usergenerated query lists be analysed to understand newer synonyms, newer analogies, newer syntactic relations, newer semantic relations and comprehensively to map new data onto existing databases. Simultaneous translation databases also may be created with the use of this global cloud-dispersed platform.

Still one more idea is to deviate from linear databases and develop branched databases (allied fields, analogies and the like).

As a case study, an on-field example can be cited: a case for sustainability of opinion on an electrical connector was being conducted. The research involved searching for prior art documents which included the concept of noise elimination by measuring the noise and adding a noise-cancelling component which is the vector opposite of the measured noise component. After careful searching in the electrical domain, the chemical domain was scavenged for analogous reactions wherein an acidic component could be negated by an (opposite) basic component in order to prove obviousness. For such purposes, taking various cases in a continuous learning and incorporating mode, at least three branched databases can be built: 1) Synonym Database; 2) Analogous Database, thereby incorporating creativity; and 3) Translation Database.

Figure 2 represents this solution system.

Figure 2

A search engine with a tri-mode can be envisaged; one which throws up search results with the synonym database, another which simultaneously throws up search results with an alternative analogous database, and yet another which uses translation database for results.
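A tri-mode search of this kind might be sketched as follows. Everything here is an assumption made for illustration: the three dictionaries stand in for the crowdsourced databases, their entries (including the French phrase) are hand-filled and unverified, and matching is plain case-insensitive substring search over a three-document toy corpus.

```python
# Hypothetical stand-ins for the three crowdsourced databases.
SYNONYM_DB = {"noise cancellation": ["noise elimination", "noise suppression"]}
ANALOGY_DB = {"noise cancellation": ["acid-base neutralization"]}
TRANSLATION_DB = {"noise cancellation": ["annulation du bruit"]}

MODES = {
    "synonym": SYNONYM_DB,
    "analogy": ANALOGY_DB,
    "translation": TRANSLATION_DB,
}


def search(corpus, terms):
    """Return ids of documents containing any of the terms (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return [
        doc_id
        for doc_id, text in corpus.items()
        if any(t in text.lower() for t in lowered)
    ]


def tri_mode_search(corpus, term):
    """Run one search per database, keeping the three result lists separate."""
    return {mode: search(corpus, db.get(term, [])) for mode, db in MODES.items()}


# Toy corpus, invented for the example.
corpus = {
    "D1": "A connector with noise suppression using an opposite vector component.",
    "D2": "Acid-base neutralization, where a basic component negates an acid.",
    "D3": "Connecteur avec annulation du bruit.",
}
results = tri_mode_search(corpus, "noise cancellation")
print(results)
```

Keeping the three result lists separate, rather than merging them, lets the searcher see which database produced which hit.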

There are many algorithms for the usage of lexicon databases, synonym databases, syntactic databases, and semantic databases. There are no accepted algorithms for the flow of DATA FEEDS into these databases. This crowdsourcing solution may be used to feed data extracted from various viewpoints into the databases.

The synonym and analogous databases can also overcome the diversity originating from language differences or defects in machine translations. Documents in Japanese or Chinese, for example, will carry only one meaning if rendered by a machine-translation algorithm. However, using the crowdsourcing solution, the diversity component can be expanded to include various synonyms and analogies pertaining to the field or terminologies in question, owing to human involvement, particularly from people of that specific region lending a helping hand.

4.2.2 Query Improvement Logic: With the available query set, now, from the crowdsourcing system, semantic and syntactic relationships may be revisited. Iterative query improvement can be incorporated with the abundance of proven test data. Each time a query is formulated, this logic may be used to fetch similar queries which have previously worked. A train of queries may be generated which links to the original query. Even algorithms for classifying earlier used or earlier accepted portions of queries may be used or suggested by the incorporation of this logic.
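One possible way to fetch similar queries which have previously worked is sketched below. The paper does not prescribe a similarity measure; Jaccard overlap of query terms is an assumption chosen for simplicity, and the query log and function names are hypothetical.

```python
import re

OPERATORS = {"and", "or", "not"}


def tokenize(query):
    """Lower-case word tokens of a query, ignoring Boolean operators."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return {w for w in words if w not in OPERATORS}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_queries(new_query, query_log, top_k=2):
    """Return up to top_k logged queries sharing at least one term, best first."""
    new_tokens = tokenize(new_query)
    scored = [(jaccard(new_tokens, tokenize(q)), q) for q in query_log]
    scored = [(score, q) for score, q in scored if score > 0]
    # Sort by descending score; break ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [q for _, q in scored[:top_k]]


# Hypothetical log of crowd-contributed queries that found relevant prior art.
query_log = [
    "(connector OR plug) AND noise AND cancel*",
    "acid AND base AND neutrali*",
    "noise AND (suppression OR elimination) AND vector",
]
suggestions = similar_queries("connector AND noise AND elimination", query_log)
print(suggestions)
```

The returned queries form the "train of queries" linked to the original one; a fuller system would presumably weight terms and use the stored semantic and syntactic relationships rather than raw term overlap.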

4.2.3 Search Engine (JUI-R): A Justified Unified Information Retrieval (JUI-R) search engine can be accomplished using iterative query improvement based on received data from a crowdsourced database. This is based on selection semantics and selection syntaxes which are preserved and stored and used to train search engine algorithms.

In conjunction with the developed databases (from 4.2.1) mentioned above, what the solution herein provides is an augmented semantic search plus database, built on diversity inclusion and using the query improvement logic (from 4.2.2).

Thus, the mere fractals of keyword search can be done away with.

4.3 Transparent Intelligence: As a continuous user of various search engines and as a provider of search results to clients, each day, I still conform to the old-school practice of manual reading and manual interpretation, due to a lack of visible intelligence. The manner in which selections are made in a semi-intelligent machine is questionable with respect to the level of intelligence plugged into the search engine. That it does not result in the exclusion of evidence cannot, then, be completely warranted. If there really is an intelligent engine to be relied upon, I would more probably rely on a TRANSPARENT INTELLIGENT ENGINE.

That is, instead of silently assuming various keywords and their synonyms from the database, the search engine may be developed to provide suggestions for a user to select from; i.e., for every "computer" that a user inputs, the system provides suggestions of "computing means", "computational means", "processing means" and the like. And for every "computer with playback means", the system provides suggestions of "computer linked to playback means", "computer communicably coupled to playback means" and the like. Thus, the user knows and controls the depth of the search layers that the search engine may intrude into, and tunes the level of intelligence, thereby choosing to include or exclude potentially analogous or potentially obscure results, which is beyond the scope of current search engines, techniques, and methodologies.
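The suggest-then-select interaction described above can be illustrated with a minimal sketch. The database contents mirror the examples in the text; the function names and the rule that only offered suggestions may be approved are assumptions of this sketch.

```python
# Hypothetical suggestion database, filled with the examples from the text.
SUGGESTION_DB = {
    "computer": ["computing means", "computational means", "processing means"],
}


def suggest(term, db=SUGGESTION_DB):
    """Show the user every expansion the engine would otherwise apply silently."""
    return list(db.get(term, []))


def build_query(term, approved, db=SUGGESTION_DB):
    """Build an OR-clause from the term plus only the user-approved suggestions."""
    offered = set(suggest(term, db))
    unknown = [a for a in approved if a not in offered]
    if unknown:
        raise ValueError(f"not among the offered suggestions: {unknown}")
    return " OR ".join(f'"{t}"' for t in [term] + list(approved))


offered = suggest("computer")
# The user keeps two suggestions and drops "computational means".
final_query = build_query("computer", ["computing means", "processing means"])
print(offered)
print(final_query)
```

Because the final query is assembled only from terms the user has seen and approved, the expansion applied by the engine is fully visible, which is the sense of "transparent" used here.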

5. Linking Patents and Research Documents: It is commonly known that each patent attorney indulges in patent terminology jugglery in order to best define the client's inventions. This includes the creation of terminology or the induction of techno-legal terminologies which depart far from the manner in which research papers are written. Generally, research papers stick to scientifically accepted terms and terminologies without attempts to get creative. However, for a good search, it is important that patent documents be scavenged thoroughly for related art along with research papers. A link needs to be established between the two for concurrent findings or peripheral findings, beyond the scope of a normal user-inputted static keyword search.

Figure 3 and the steps below illustrate how a pool of patent-related documents and research documents referring to one technology may be developed, using the above-mentioned databases and logic for the same:

1) Input specific terms or identify nouns from a given description;
2) Search for the terms in a set of patent-related documents;
3) Rank the searched terms in patent-related documents in their order of frequency;
4) Use above-mentioned synonym database and analogous database to create a list of similar terms;
5) Search for similar terms in a set of research papers;
6) Rank the searched terms in research papers in the order of their frequency;
7) Rank the linkage between the searched terms in the order of their frequency;
8) Link to required Database.

Or

1) Input specific terms or identify nouns from a given description;
2) Search for the terms in a set of research papers;
3) Rank the searched terms in research papers in their order of frequency;
4) Use above-mentioned synonym database and analogous database to create a list of similar terms;
5) Search for similar terms in a set of patent-related documents;
6) Rank the searched terms in patent-related documents in the order of their frequency;
7) Rank the linkage between the searched terms in the order of their frequency;
8) Link to required Database.
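The first of the two step sequences above (patents first, then research papers) can be sketched as follows. The paper does not define how the "linkage" in step 7 is scored; summing the two frequency counts is an assumption of this sketch, as are the function names, the toy documents, and the synonym entries.

```python
from collections import Counter


def term_frequencies(terms, documents):
    """Count how often each term occurs across the documents (case-insensitive)."""
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        for term in terms:
            counts[term] += lowered.count(term.lower())
    return counts


def link_terms(input_terms, patent_docs, research_docs, synonym_db):
    """Steps 2-7: rank patent terms, expand them, rank the resulting links."""
    patent_counts = term_frequencies(input_terms, patent_docs)  # steps 2-3
    links = []
    for term, p_count in patent_counts.most_common():
        for similar in synonym_db.get(term, []):  # step 4
            r_count = term_frequencies([similar], research_docs)[similar]  # steps 5-6
            if p_count and r_count:
                # Assumed linkage score: sum of both frequencies.
                links.append((term, similar, p_count + r_count))
    links.sort(key=lambda link: -link[2])  # step 7
    return links


# Toy inputs, invented for the example.
patent_docs = [
    "A computing means coupled to a playback means.",
    "The computing means stores data; the playback means renders it. "
    "A computing means may be remote.",
]
research_docs = [
    "The computer sends audio to a media player.",
    "Each computer runs the decoder.",
]
synonym_db = {"computing means": ["computer"], "playback means": ["media player"]}

links = link_terms(
    ["computing means", "playback means"], patent_docs, research_docs, synonym_db
)
print(links)
```

The second sequence is obtained by swapping the two document sets and using a database that maps research terms to patent terms.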

FIGURE 3

6. Road Ahead / STEP NEXT (At a prospective instance, T+n): Each new patent document is classified in accordance with universally accepted codes. What we propose is the use of accepted semantics and syntaxes to verbally define each patent. Instead of accepting user-written abstracts, the idea is to build a meta-abstract which can be crawled by a search engine using only the built databases. Thus, we have a common verbal dialogue. We identify each new patent application with terminologies from a given accepted database to provide accurate and discoverable metadata. The whole idea of patenting was to provide a monopoly in return for disclosure. If a patent is for disclosure purposes, then we should work towards such disclosure.
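As a rough illustration of meta-abstract tagging, the sketch below maps drafting terminology in an abstract onto tags from an accepted vocabulary. The vocabulary, its canonical tags, and the function name are hypothetical; a real accepted database would be far richer than this lookup table.

```python
# Hypothetical accepted vocabulary: drafting phrase -> common tag.
ACCEPTED_VOCABULARY = {
    "computing means": "computer",
    "processing means": "computer",
    "computational means": "computer",
    "intelligent means": "computer",
    "playback means": "media player",
}


def meta_abstract(abstract, vocabulary=ACCEPTED_VOCABULARY):
    """Return the sorted set of common tags whose drafting phrases appear."""
    lowered = abstract.lower()
    tags = {tag for phrase, tag in vocabulary.items() if phrase in lowered}
    return sorted(tags)


tags = meta_abstract("A processing means communicably coupled to a playback means.")
print(tags)
```

Two applications drafted with different terminology would then receive the same tags, so a search engine crawling only the tags would find both.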

7. CONCLUSIONS The above discussed subject matter is part of an Indian patent application; vide application no. 1962/MUM/2011.

Cumulatively, a collaborative platform is proposed with reference to this paper; one which utilises a crowdsourced database building solution with different types of databases; a search engine having a selectable level of intelligence thereby providing transparent intelligence and further having a query improvement logic, and a meta-abstract tagging mechanism to appropriate common tags to the multitude of patents and patent applications and research papers; irrespective of their drafting terminologies. Essentially, the question of WHAT DO WE SEE should not remain.

8. REFERENCES

[1] Available at http://www.ling.upenn.edu/courses/Fall_1998/ling001/Writinglect.html (Accessed 05th November, 2011)
[2] Stobbs, Gregory A., SOFTWARE PATENTS, Aspen Publishers, 2000 (at 3)
[3] Hernandez, Maria V., United States Patent and Trademark Office, 2001. See also http://www.uspto.gov/news/pr/2001/01-33.jsp
[4] E Wyndham Hulme, The History of the Patent System under the Prerogative and at Common Law, Law Quarterly Review, vol.46 (1896), pp.141-154.
[5] WORLD INTELLECTUAL PROPERTY INDICATORS (2009)
[6] Available at http://www.wipo.int/ipstats/en/statistics/patents/ (Accessed 05th November, 2011)
[7] Ben Shneiderman, The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. In Proceedings of the IEEE Symposium on Visual Languages, pages 336-343, Washington. IEEE Computer Society Press, 1996.
[8] Toward a More Rational Patent Search Paradigm; Kristine H. Atkinson, Legal Dept., Boston Scientific Corporation, Natick, Massachusetts, USA
[9] Available at http://finance.yahoo.com/news/Article-One-Partners-Launches-prnews-442199887.html?x=0&.v=1 (Accessed 05th November, 2011)
