
EMC DOCUMENTUM MANAGING DISTRIBUTED ACCESS

This white paper describes the various distributed architectures supported by EMC Documentum and the relative merits and demerits of each model. It can be used to evaluate which distributed model, or combination of models, is most suitable for a given set of business needs. It is particularly relevant to organizations whose users are dispersed throughout a large region or across the world, and whose primary objective is to improve the speed and efficiency of information collaboration and production across the enterprise.


Table of Contents

1. Introduction
2. Abbreviation and Acronyms used
3. The Foundation: The Documentum Repository
4. Why Distributed Access?
5. Documentum Solutions For Optimizing Content Responsiveness
6. Relative Comparison Between Single And Multiple Repositories
7. Documentum Distributed Architectures
8. References And Citations


About the Author


Lekha Menon

Lekha Menon is the Enterprise Content Management (ECM) Lead for the HiTech Industry Solution Units Domain group. She has focused on developments in ECM for the last three years and has 10 years of overall experience in software design, development, solution architecting, and training. She holds a Bachelor's degree in Electronics. She can be reached at lekha.menon@tcs.com.


Introduction
This paper outlines the options available to a design or solution architect planning a distributed architecture environment using EMC Documentum. It details the relative merits and demerits of each model, which must be considered during the planning stage before settling on a best-fit distributed architecture for any Documentum implementation.

Abbreviation and Acronyms used


The abbreviations and acronyms used in this paper are:

Acronym   Definition/Description
ACL       Access Control List
BOCS      Branch Office Caching Services
DMS       Documentum Messaging Services
RCS       Remote Content Server
WAN       Wide Area Network
WDK       Web Development Kit

The Foundation: The Documentum Repository


The Documentum repository comprises the following:

- Metadata, stored in a relational database
- Content files, usually stored in the file system

EMC Documentum Content Server is the core server technology that manages and controls access to the content and metadata in the repository. Documentum provides both a client/server interface (Documentum Desktop) and a Web-based interface (Webtop, a J2EE-based web application) for users to access the content and metadata.

The Documentum repository is often hosted at a single location, and multiple workgroups within a global enterprise connect over the network to access and retrieve content as shown in the following figure.

[Figure showing a Central Site (Content Server, Database, File System, WDK/App Server, Local Client) connected over the WAN to a Remote Site (Remote Client).]

Figure 1: A Typical Documentum Implementation

Why Distributed Access?


The prospect of poor Wide Area Network (WAN) performance, with unpredictable or slow data transfer times across vital WANs, has given pause to many organizations seeking to leverage the benefits of their content management systems enterprise-wide. Several factors affect content responsiveness:

- Bandwidth
- Network latency
- File size
- Frequency of remote fetches and updates

These mechanical challenges impact business. A distributed repository, which determines how content is accessed and stored across multiple servers and systems within an enterprise, addresses these key factors. When content files are hosted at multiple network locations, closer to the end user, the impact of network latency is mitigated. Network connections automatically transfer content among servers, rapidly delivering files where needed. Based on an organization's needs, the most suitable architecture can be selected after evaluating the strengths and limitations of each model.
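As a rough illustration of how bandwidth, latency, and file size combine, the sketch below estimates the time to fetch one file as the payload time (size divided by bandwidth) plus the latency cost of a number of round trips. The function name, the round-trip count, and the sample link figures are illustrative assumptions, not measurements or Documentum parameters.

```python
def estimate_transfer_seconds(file_size_mb, bandwidth_mbps, latency_ms, round_trips=10):
    """Rough fetch-time model: protocol round trips plus raw payload time.

    round_trips is an assumed number of request/response exchanges;
    the real value depends on the protocol and the client.
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    payload_seconds = (file_size_mb * 8) / bandwidth_mbps  # megabytes -> megabits
    latency_seconds = round_trips * (latency_ms / 1000.0)
    return payload_seconds + latency_seconds


# Hypothetical links: a 2 Mbps WAN with 300 ms latency and a 100 Mbps LAN with 1 ms latency.
wan = estimate_transfer_seconds(file_size_mb=20, bandwidth_mbps=2, latency_ms=300)
lan = estimate_transfer_seconds(file_size_mb=20, bandwidth_mbps=100, latency_ms=1)
print(f"WAN: {wan:.1f} s, LAN: {lan:.2f} s")  # WAN: 83.0 s, LAN: 1.61 s
```

Under these assumed figures the same 20 MB file takes roughly fifty times longer over the WAN, and the cost is paid again on every remote fetch, which is why the frequency of fetches appears in the list of factors.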


Documentum Solutions For Optimizing Content Responsiveness


The Documentum platform supports solutions that optimize global content access and ensure content responsiveness for distributed task teams. Documentum supports several distributed architecture models, the most important of which are described below.

Single Or Multiple Repositories

Single repository

- Single Repository with Branch Office Caching Services (BOCS): The primary repository maintains the document metadata, but the content is dynamically cached and stored, on demand, on a local file system located within a branch office, using BOCS.
- Single Repository with Multiple Content Servers: The primary repository maintains the document metadata, but multiple content servers are located close to remote users. The content is, therefore, stored at the location from which it is most frequently used.
- Single Repository with Multiple Content Servers Using Content Replication: The primary repository maintains the document metadata, and multiple content servers are located close to remote users. The content is stored at the location from which it is most frequently used. Additionally, a content replication job creates a copy of the content to store at each location.

Multiple repositories

- Multiple repositories using replication: There is a separate repository at each location, and periodic replication is scheduled to create copies of each docbase object (content and metadata) at every other location.
- Multiple repositories as a federation: This is similar to the previous model, but with an additional feature. A federation allows users, groups, and Access Control Lists (ACLs) for all participating repositories to be managed from a single "governing" repository.


Relative Comparison Between Single And Multiple Repositories


Comparison between Single and Multiple Repository Models

Single Repository Model

- A single repository enables real-time sharing of documents amongst users across locations. It is relatively easier to manage than a multi-repository model, while providing better performance than a centralized content storage architecture.
- A single repository model is less dependent on the replication job, since only the content is replicated. If the replication job has not run or has failed for any reason, a user at one location can still access the document from the remote site.
- This architecture will not by itself take care of Disaster Recovery. In a single repository model, if the remote content server goes down, the remote users can still connect to the central content server and continue to work. In this scenario, all the content that has already been replicated will be available. However, if the central content server goes down, the remote users cannot continue working, as the repository is based at the central site.
- The index agent and index server can only be installed at the primary site; consequently, documents that are uploaded to the remote site will not be indexed until the replication job has run and the remote content has been replicated to the central site.

Multiple Repository Model

- With multiple repositories, a user at one location will not be able to see a document uploaded by a user at the remote location until the replication job has run.
- With multiple repositories, replication jobs are required for content synchronization at specific intervals, and these consume significant network bandwidth. Configuring replication at short intervals will affect performance, and keeping very long periods between replication runs makes it impossible for users across different locations to share documents on a real-time basis.
- This architecture handles Disaster Recovery to a large extent. With multiple repositories, if either content server crashes, all users can still work by connecting to the other content server; however, only the replicated content and data will be available. Content and data changed since the last replication cycle will be lost.
- With multiple repositories, since replication happens in a two-way manner, conflicts can arise when one user at the remote site and another at the central site work on the same document before replication has happened.
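The two-way replication conflict described for the multiple repository model can be illustrated with a simple timestamp comparison. This is a minimal sketch under stated assumptions: the `DocState` record, the `classify_change` function, and the tick-based timestamps are hypothetical and do not represent how Documentum's replication jobs detect or resolve conflicts.

```python
from dataclasses import dataclass


@dataclass
class DocState:
    """Hypothetical per-repository record of one document."""
    modified_at: int  # time of the last local change, in arbitrary ticks


def classify_change(central, remote, last_replicated_at):
    """Decide what a two-way replication run would need to do for one document."""
    central_changed = central.modified_at > last_replicated_at
    remote_changed = remote.modified_at > last_replicated_at
    if central_changed and remote_changed:
        return "conflict"  # both copies were edited since the last run
    if central_changed:
        return "copy central -> remote"
    if remote_changed:
        return "copy remote -> central"
    return "in sync"


print(classify_change(DocState(12), DocState(15), last_replicated_at=10))  # conflict
print(classify_change(DocState(12), DocState(8), last_replicated_at=10))   # copy central -> remote
```

The longer the interval between replication runs, the wider the window in which both copies can change, which is the trade-off against bandwidth use noted in the comparison.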


Documentum Distributed Architectures


Single Repository Using BOCS

EMC Documentum BOCS enables local access to content without the additional requirement of setting up a local content server. It addresses the performance issues that branch offices experience because of network latency by placing content caches close to end users in branch offices or other remote locations where there may be limited infrastructure and no onsite administrators. This results in faster content transfers, particularly in high-latency environments. The content is stored locally, whereas the metadata, which is significantly smaller in size, is stored and managed centrally. Data caching with BOCS enables users to read and write to local caches that are synchronized with the primary content repository.

BOCS is a self-contained installation that does not require an additional EMC Documentum Content Server. It supports the use of existing hardware for local caches, without the need to purchase specific machines to match the central Content Server. Administration is lightweight and can be set up through EMC Documentum Administrator. The configuration is scalable: additional BOCS servers can be set up as and when needed to accommodate future growth.

In a BOCS configuration, when a remote user connects through a web browser, the EMC Documentum Web Development Kit (WDK)/Webtop server detects the user's network location and redirects the request to the BOCS server. The BOCS server then determines whether the requested content is available locally or needs to be fetched from the nearest content server and cached locally. Once fetched, the content is presented to the user through the Web browser interface. The metadata comes directly from the central database; BOCS plays no part in metadata handling and deals only with read and write requests for content.

BOCS also supports an additional feature known as "content pre-caching". If it is known that certain content will be accessed frequently or regularly by BOCS users, this content can be cached on the server before users request it. This ensures that even first-time users do not face the performance hit of remote content access. Pre-caching can be performed by a job or programmatically.

A BOCS server can communicate with a Documentum Messaging Services (DMS) server in either push or pull mode, based on the configuration set in the BOCS configuration object. In push mode, messages routed to the BOCS server through DMS are sent by the DMS server to the BOCS server. In pull mode, the BOCS server picks up the messages itself; the DMS server does not send them.

Content written to BOCS can be configured to be transferred to the central repository either synchronously or asynchronously. In an asynchronous write, the content is initially stored, or parked, on a BOCS server host and sent to the repository later. Once the content is parked on BOCS, a request to write it to the repository is sent; if the request is not fulfilled immediately, it is sent again by an internal, system-defined job. Content parked on a BOCS server for asynchronous write is not removed from the BOCS content cache after it is written to the repository. Instead, it becomes part of the cached content on the BOCS server. An object's metadata is always written to the repository immediately.


Asynchronous write operations ensure that a user does not wait for content to be saved to the repository when the network communication lines are slow. Additionally, other users in the network locations served by the BOCS server on which the content is parked have immediate access to the content.

Asynchronous write operations are best used when:

- The branch office and primary office are connected by slow network lines.
- The content is used primarily by users at the network locations served by the BOCS servers.
- The content to be saved or checked in is a large content file.

Limitations

Using asynchronous write has the following limitations:

- Parked content is unavailable to users who are not accessing the repository through the BOCS server on which the content is parked.
- If an application needs immediate access to particular content, asynchronous write cannot be used for that content unless the application is rewritten to check for the parked state before obtaining the content.
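The asynchronous write behaviour described above (metadata written immediately, content parked and resent until the repository accepts it, parked content retained in the cache) can be sketched as a small simulation. All class, method, and attribute names here are hypothetical stand-ins chosen for illustration; they are not part of any Documentum API.

```python
class CentralRepository:
    """Stand-in for the central repository (metadata plus content store)."""

    def __init__(self):
        self.metadata = {}
        self.content = {}
        self.content_transfer_ok = True  # set False to simulate a transfer that cannot complete yet

    def store_content(self, object_id, data):
        if not self.content_transfer_ok:
            return False
        self.content[object_id] = data
        return True


class BocsWriteCache:
    """Stand-in for a BOCS server that parks content for asynchronous write."""

    def __init__(self, repository):
        self.repository = repository
        self.cache = {}
        self.pending = []  # ids of objects whose content is parked, awaiting transfer

    def write_async(self, object_id, metadata, data):
        self.repository.metadata[object_id] = metadata  # metadata is written immediately
        self.cache[object_id] = data                    # content is parked locally
        self.pending.append(object_id)
        self.retry_pending()                            # first attempt to send the content

    def retry_pending(self):
        """Send parked content; keep anything that fails for the next attempt."""
        still_pending = []
        for object_id in self.pending:
            if not self.repository.store_content(object_id, self.cache[object_id]):
                still_pending.append(object_id)
        self.pending = still_pending

    def is_parked(self, object_id):
        return object_id in self.pending


repo = CentralRepository()
bocs = BocsWriteCache(repo)

repo.content_transfer_ok = False
bocs.write_async("doc-1", {"title": "Design spec"}, b"large file bytes")
print(bocs.is_parked("doc-1"), "doc-1" in repo.content)  # True False

repo.content_transfer_ok = True
bocs.retry_pending()  # stands in for the internal job that resends the request
print(bocs.is_parked("doc-1"), "doc-1" in repo.content, "doc-1" in bocs.cache)  # False True True
```

The `is_parked` check corresponds to the second limitation: an application that needs the content immediately has to test for the parked state before trying to obtain it from the repository.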

[Figure showing a Central Site (Content Server, Database, File System, Local Client) and a Remote Site (Remote Client, BOCS Cache) connected over the WAN, with the metadata and content paths labeled.]

Figure 2: A BOCS Implementation
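The BOCS read path and pre-caching described earlier amount to a read-through cache: content is served locally when present and fetched from the nearest content server otherwise. The sketch below illustrates that idea only; `ContentServer`, `BocsReadCache`, and their methods are hypothetical names, and network-location detection by the WDK/Webtop server is not modeled.

```python
class ContentServer:
    """Stand-in for the nearest Content Server holding the content files."""

    def __init__(self, files):
        self.files = dict(files)
        self.fetch_count = 0  # counts fetches that would cross the WAN

    def fetch(self, object_id):
        self.fetch_count += 1
        return self.files[object_id]


class BocsReadCache:
    """Stand-in for a BOCS server handling read requests for content."""

    def __init__(self, content_server):
        self.content_server = content_server
        self.cache = {}

    def read(self, object_id):
        # Serve from the local cache if possible; otherwise fetch and cache.
        if object_id not in self.cache:
            self.cache[object_id] = self.content_server.fetch(object_id)
        return self.cache[object_id]

    def precache(self, object_ids):
        """Warm the cache before any user asks for the content."""
        for object_id in object_ids:
            self.read(object_id)


server = ContentServer({"doc-1": b"spec", "doc-2": b"plan"})
bocs = BocsReadCache(server)

bocs.precache(["doc-1"])   # one remote fetch, done ahead of time
bocs.read("doc-1")         # served locally
bocs.read("doc-2")         # first request: remote fetch
bocs.read("doc-2")         # served locally
print(server.fetch_count)  # 2
```

Only the first request for content that was not pre-cached pays the remote fetch cost, which matches the first-user delay listed among the BOCS limitations.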


BOCS Advantages and Disadvantages


Strengths

- A Documentum content server installation is not required at the remote locations. This solution leverages the existing Documentum server installation and licensing.
- BOCS is network-aware and will automatically download and upload content to the nearest content server, whether it is a remote content server or the primary content server.
- Since there is no replicated content server or database to maintain, there is no need for onsite IT or other administrative support. Everything can be easily handled from a central location. With BOCS, the metadata (as well as permissions and entitlements) is accessed from the content server through WDK on the application server, enabling administrators to maintain central control over all the content.
- The backup process is much simpler than in all other distributed models, as all the content will be available locally at the central site.
- If full-text searching is a primary requirement, then replication becomes mandatory, as the index server will only index the documents from the central server. In such a situation, BOCS is the preferred configuration.

Limitations

- It requires installation and administration of BOCS at the remote site. Separate licenses will also need to be procured for BOCS.
- It functions only with the Web-based user interface (Webtop). Clients using the Desktop client/server interface cannot experience the benefits of BOCS.
- With BOCS, the first user requesting content from a remote location may experience a fetching delay due to the latency issues and bandwidth constraints affecting other network users.
- The content needs to be transferred between the content server and BOCS at regular intervals. The bandwidth needs to be sufficient to accommodate this periodic replication.


Single Repository With Multiple Content Servers

In this model, content is stored in a distributed storage area. A distributed storage area has multiple component storage areas. One component is located at the repository's primary site, and each remote site has one of the remaining components. Each site has a full Content Server installation. This model can be used for either Web-based clients or Desktop clients. In this configuration, metadata requests are handled by the Content Server at the primary site, and requests to write content to storage are handled by the Remote Content Servers (RCS), as depicted in the following figure.

[Figure showing a Central Site (Content Server, Database, WDK/App Server, File System, Local Client) and a Remote Site (Distributed Content Server, WDK/App Server, File System, Remote Client) connected over the WAN, with data requests labeled. Content may be at either location, but distributed so that frequently used content is close to its user.]

Figure 3: Single Repository Multiple Content Servers


Single Repository Model Advantages and Disadvantages


Strengths

- It exhibits improved performance for remote users, as content is accessed from the local site. This model is beneficial where a set of users belonging to one geographical location accesses common content, and the need for content sharing across several geographical locations is minimal.
- Since the database and repository are available at a central location, there is only a single point of management and maintenance for database and repository.
- Content Replication jobs can be added whenever needed, and stopped if not required.
- For sites using Desktop clients, this model is the only model available for a single-repository distributed configuration.
- This model is recommended for sites where full-text searching is not a requisite. In such a situation, replication is not mandatory, as there is no specific need for all content to be available at the central location, and this model would be the best fit.

Limitations

- The benefits are nullified in cases where content is frequently shared across multiple geographical locations.
- Users may still need to access content remotely if they are accessing a document that is not stored at their current location. In this situation, the performance experienced by the user would be similar to that of a non-distributed, centralized content architecture.
- Interruptions in connectivity between the main and remote locations would render the system unusable, as data requests are still routed to the main content repository.
- Installation is required at each remote site to add a local Content Server and Application Server. Thus, additional Documentum installation, administration, and management activities would be required at each site.
- Backup would need to be planned, because the standard EMC product for Documentum backup, NetWorker, does not recognize remote filestores. The remote filestore will need to be either backed up separately, or provision will need to be made to replicate content to the central site (refer to the next model).


Single Repository With Multiple Content Servers Using Content Replication

Documentum provides the ability to replicate content to one or more locations. This option entails a single repository with multiple content servers, the same as in the previous option. However, the content replication functionality is also used in this case, as depicted in the following figure. The content is replicated from its source component to the remaining components by user-defined content replication jobs.

[Figure showing a Central Site (Content Server, Database, WDK/App Server, File System, Local Client) and a Remote Site (Distributed Content Server, WDK/App Server, File System, Remote Client) connected over the WAN, with data requests labeled. Content Replication creates a local copy of content.]

Figure 4: Single Repository with Replication

This model supports the situation where the same piece of content is frequently accessed from multiple locations.

Content In A Distributed Storage Area

In this model, content is stored in a distributed storage area. A distributed storage area is a single storage area made up of multiple component storage areas. All sites in a model using a distributed storage area share the same repository, but each site has a distributed storage area component as its own local storage area to provide fast, local access to content. One component is located at the repository's primary site, and each remote site has one of the remaining components. Each site has a full Content Server installation and an Application Server installation (for Web-based clients) for the repository. The content is replicated from its source component to the remaining components by user-defined content replication jobs.

This model can be used for either Web-based clients or Desktop clients. Desktop clients at the remote sites use the Content Server at the remote site to access content. In this configuration, metadata requests are handled by the Content Server at the primary site, and content operations are handled by the distributed content servers at the remote sites, as shown in the following figure.


[Figure showing a Primary Site (Content Server, Distributed Store Component 1, DMS, Web Server) and two remote sites, each with a Content Server, a Web Server, and a distributed store component (Component 2 at Remote Site 1, Component 3 at Remote Site 2), serving Web clients at Sites 1 to 4.]

Figure 5: Single Repository with Distributed Storage

In this model, users in Site 1 and Site 2 are closer to Remote Site 1 and access the content stored in distributed storage component 2, located at the Remote Site 1 distributed content server, whereas users in Site 3 and Site 4 are closer to Remote Site 2 and access content stored in distributed storage component 3, located at the Remote Site 2 distributed content server. If users log in through a Web-based client, content requests are handled through the Web server at the appropriate remote site (Remote Site 1 or 2). If users log in through a Desktop-based client, content requests are handled by the Content Server at Remote Site 1 or 2.

Content Replication

Content replication is the process of replicating content files among distributed storage area components. This process ensures that users at each site have local copies of the files to access. Content replication can be scheduled to run automatically, or it can be performed manually.

Automatic Replication

The tools that can be used to replicate content automatically are:

- ContentReplication tool: The ContentReplication tool provides automatic replication on a regular schedule. It is implemented as a job. Once the parameters of the job are defined, the agent exec process executes it automatically on the preferred schedule.
- Surrogate get feature: The Surrogate get feature provides replication on demand. In this mode, when users request a content file that is not present in their local storage area, the server automatically searches for the file in the component storage areas and replicates it into the user's local storage area.
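The difference between scheduled replication and on-demand replication (surrogate get) can be illustrated with a toy model of a distributed storage area. This is a sketch under stated assumptions: the `DistributedStore` class, its methods, and the site names are hypothetical and do not reflect the actual ContentReplication job or Content Server internals.

```python
class DistributedStore:
    """Hypothetical model of a distributed storage area: one component per site."""

    def __init__(self, sites):
        self.components = {site: {} for site in sites}

    def save(self, site, object_id, data):
        self.components[site][object_id] = data

    def get(self, site, object_id, surrogate_get=True):
        """Return (data, source_site); with surrogate_get, copy a missing file locally."""
        local = self.components[site]
        if object_id in local:
            return local[object_id], site
        for other_site, component in self.components.items():
            if object_id in component:
                if surrogate_get:
                    local[object_id] = component[object_id]  # replicate on demand
                return component[object_id], other_site
        raise KeyError(object_id)

    def run_content_replication(self, source_site, target_site):
        """Scheduled-job analogue: copy every file the target component lacks."""
        copied = 0
        for object_id, data in self.components[source_site].items():
            if object_id not in self.components[target_site]:
                self.components[target_site][object_id] = data
                copied += 1
        return copied


store = DistributedStore(["primary", "remote1"])
store.save("primary", "doc-1", b"spec")

print(store.get("remote1", "doc-1")[1])  # primary  (first read crosses the WAN)
print(store.get("remote1", "doc-1")[1])  # remote1  (now served locally)

store.save("primary", "doc-2", b"plan")
print(store.run_content_replication("primary", "remote1"))  # 1
```

With on-demand replication the first reader at a site waits for the remote copy; with a scheduled job the content is already local, but only after the job has run.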


Manual Replication

To manually replicate content files, the following administration methods can be used:

- REPLICATE: The REPLICATE administration method copies a file from one storage area to another. The disks on which both component storage areas reside must be accessible to the server.
- IMPORT_REPLICA: The IMPORT_REPLICA administration method imports a file from another component of the distributed storage area, or from an external file system, into a storage area.

Both these methods can be executed from Documentum Administrator, the EXECUTE statement, or the Apply method.

Single Repository with Replication Advantages and Disadvantages


Strengths

- Since the database and repository are available at a central location, there is only a single point of management and maintenance for database and repository.
- If replication jobs are scheduled and content has been replicated locally, this model provides better performance than the previous model, even when some content is frequently viewed from multiple locations.
- Users across all locations can view, share, and modify documents on a real-time basis (unlike in a multi-repository model).
- For sites using Desktop clients, this model is the only model available for a single-repository distributed configuration.
- If provision is made to replicate all content from the remote filestore to the central server, then this model can handle full-text searching too, and backup using NetWorker also becomes a straightforward activity.

Limitations

- Installation is required at each remote site to add a local Content Server and Application Server. Thus, additional Documentum installation, administration, and management activities would be required at each site.
- There are two ways documents can be replicated: scheduled or on-demand. With scheduled replication, content may not be immediately available at remote sites. With on-demand replication, performance may suffer due to network limitations.
- Remote access still depends on the connection to the central repository, as all data requests are routed to the main location.
- This architecture will not by itself take care of Disaster Recovery. If the central server goes down, remote users cannot work either.
- For documents in the remote filestore that are not replicated, backup would need to be planned, because the standard EMC product for Documentum backup, NetWorker, does not recognize remote filestores. The remote filestore will need to be either backed up separately, or provision will need to be made to replicate content to the central site.


Multiple Repositories, Using Object Replication

In this model, an actual and complete repository resides at each location. The repositories are synchronized with Documentum's Object Replication functionality. This ensures that when new content is created, it is replicated to each location, as shown in the following figure.

[Figure showing a Central Site (Content Server, Database, WDK/App Server, File System, Local Client) and a Remote Site (Content Server, Database, WDK/App Server, File System, Remote Client) connected over the WAN. Object Replication creates a local copy of content.]

Figure 6: Multiple Repositories with Replication

Multiple Repository with Replication Advantages and Disadvantages


Strengths

- This architecture provides maximum benefit to remote users, as both the metadata and the content are stored locally.
- The system will function even when connectivity between locations is down. Documentum will replicate changes when connectivity is restored.

Limitations

- Installation is required at each remote site to add a local Content Server, a database server, an index server, and an Application Server. Thus, additional Documentum installation, administration, and management activities would be required at each site. Additional licenses will also need to be procured for each location.
- Changes made in one repository would not be reflected in the other repository until replication happens. Configuring replication at short intervals will affect performance, and keeping very long periods between replication runs would make it impossible for users across different locations to share documents on a real-time basis.


Multiple Repositories, Using Federation

This option is similar to the previous option; however, a Federation provides some additional functionality. In this option, multiple repositories are bound together to facilitate management of global users, groups, and ACLs. Users, groups, and ACLs are automatically propagated to all of the repositories of the federation from the "governing" repository.
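The propagation of global users, groups, and ACLs from the governing repository can be pictured with the following sketch. The `Repository` class, the `propagate_globals` function, and the sample names and permissions are hypothetical illustrations of the idea, not the federation jobs Documentum actually runs.

```python
class Repository:
    """Hypothetical stand-in for one repository in a federation."""

    def __init__(self, name):
        self.name = name
        self.users = set()
        self.groups = {}  # group name -> set of user names
        self.acls = {}    # ACL name -> {accessor: permission}


def propagate_globals(governing, members):
    """Copy the governing repository's users, groups, and ACLs to each member.

    Objects that exist only in a member repository are left untouched.
    """
    for member in members:
        member.users |= governing.users
        for group, users in governing.groups.items():
            member.groups[group] = set(users)
        for acl, entries in governing.acls.items():
            member.acls[acl] = dict(entries)


governing = Repository("governing")
governing.users = {"alice", "bob"}
governing.groups = {"engineering": {"alice", "bob"}}
governing.acls = {"project_acl": {"engineering": "WRITE"}}

paris = Repository("paris")
paris.users = {"claire"}  # a local-only user, assumed for the example

propagate_globals(governing, [paris])
print(sorted(paris.users))        # ['alice', 'bob', 'claire']
print(paris.acls["project_acl"])  # {'engineering': 'WRITE'}
```

The point of the sketch is that administrators change users, groups, and ACLs in one place and every member repository receives the same definitions.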

Federation Advantages and Disadvantages


Strengths

- Same advantages as with the previous model. Additionally, users, groups, and ACLs can be managed centrally.
- This option enables Federated Search, where a user can search across multiple repositories that form a federation.

Limitations

- Same disadvantages as with the previous model, with some added complexity in setting up the Federation.
- Replication is essential for this architecture. It requires very good WAN bandwidth and periodic monitoring of the Object Replication jobs.

References And Citations


About EMC Documentum: The EMC Documentum family of products helps to create content applications and solutions on a single foundation and build a common content repository. It is used to manage, store, secure, and deliver unstructured content in a systematic manner, according to predefined business rules, policies, and procedures. With a unified repository, various groups can easily share and reuse their content with other areas of the business that would benefit from access to this valuable information. More information can be obtained from www.emc.com.

1. White Paper - Using EMC Documentum to Improve Content Responsiveness in Distributed Environments
2. http://www.dmdeveloper.com/articles/administration/distributed.html
3. Documentum Distributed Configuration Guide Version 6


About HiTech ISU, HTTD


As a functional group within HiTech ISU, HTTD is mandated to provide leadership in technical and domain capabilities. HTTD supports both the presales and the delivery functions. HTTD consists of high tech domain CoEs, technology CoEs, Product Engineering groups and Domain University.

About Tata Consultancy Services (TCS)


Tata Consultancy Services Limited is an IT services, business solutions and outsourcing organization that delivers real results to global businesses, ensuring a level of certainty no other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled services delivered through its unique Global Network Delivery Model™, recognized as the benchmark of excellence in software development. A part of the Tata Group, India's largest industrial conglomerate, TCS has over 100,000 of the world's best trained IT consultants in 50 countries. The company generated consolidated revenues of US $5.7 billion for fiscal year ended 31 March 2008 and is listed on the National Stock Exchange and Bombay Stock Exchange in India. For more information, visit us at www.tcs.com.

To know more about how we help companies in the High Tech Industry overcome their challenges to achieve real business results, Contact: hitech.marketing@tcs.com

All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content / information contained here is correct at the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from TCS. Unauthorized use of the content / information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties. Copyright 2008 Tata Consultancy Services Limited

www.tcs.com
