dm_notes: Documentum Notes

November 8, 2010


Filed under: documentum — Raj V @ 11:29 pm

You have probably noticed that it has been quite a while since my last post. The reason is that I have been out of the DCTM world for over a year. You may not see DCTM updates anymore (unless I move back or find something worth blogging about).

May 7, 2009

DQL to find duplicate objects

Filed under: documentum, dql — Raj V @ 5:11 pm

Over a period of time, due to some bugs in our application, we found that a few documents whose attribute is supposed to be unique for a given version tree (shared by all versions) were not unique.

The logic for generating the unique attribute had a flaw.

For every document we create in Documentum, a unique identifier is assigned, which is the same for all versions of the document (similar to i_chronicle_id). Business users use this identifier to access the document or to refer to it in other applications.

To identify the documents that share the same attribute value in the repository, and the number of occurrences of each, I use the following query.

The same query can also be used to find duplicate documents (objects).

Replace the placeholders with your own unique attribute and document type in the DQL below:

DQL> SELECT <unique attribute>, count(*) FROM <doc type> GROUP BY <unique attribute> HAVING count(*) > 1 ORDER BY <unique attribute>

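This returns each duplicated identifier and the number of documents sharing it. Because the type name is not followed by the (ALL) keyword, DQL considers only the CURRENT version of each document, so every version tree is counted once.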
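To make the GROUP BY / HAVING logic concrete, here is a minimal in-memory Java sketch of what the query computes. It assumes you already hold (object id, unique attribute) pairs in memory; the `DuplicateFinder` and `Doc` names and the sample values are hypothetical, and in practice the DQL above does this work on the server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DuplicateFinder {
    // One input row: a document and the value of its "unique" attribute (hypothetical shape).
    static final class Doc {
        final String objectId;
        final String uniqueAttr;

        Doc(String objectId, String uniqueAttr) {
            this.objectId = objectId;
            this.uniqueAttr = uniqueAttr;
        }
    }

    // In-memory equivalent of
    // SELECT attr, count(*) ... GROUP BY attr HAVING count(*) > 1 ORDER BY attr.
    // TreeMap keeps keys sorted, mirroring ORDER BY.
    static Map<String, Integer> duplicates(List<Doc> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc d : docs) {
            counts.merge(d.uniqueAttr, 1, Integer::sum); // GROUP BY + count(*)
        }
        counts.values().removeIf(c -> c <= 1); // HAVING count(*) > 1
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("0900b276800077f3", "DOC-1001"));
        docs.add(new Doc("0900b276800077f4", "DOC-1002"));
        docs.add(new Doc("0900b276800077f5", "DOC-1001"));
        System.out.println(duplicates(docs)); // {DOC-1001=2}
    }
}
```

Each key that survives the filter is an identifier shared by more than one document, which is exactly the row set the HAVING clause keeps.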

A technique I use to generate a unique identifier for a document within the system is the following (it may not suit all business scenarios).

Below is the i_chronicle_id of a document (unique across all versions of the document):

i_chronicle_id : 0900b276800077f3

The first two characters specify the object type (09 is dm_document).

The next six characters specify the id of the docbase the document belongs to.

The remaining 8 characters are the unique id of the object for a given object type and docbase.

If you want to generate a unique custom document number within this docbase, you can use that unique id in the following format:

Unique document no. : <Business Prefix><unique document id>

ex: DOC800077f3
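The id layout and document-number pattern above can be sketched in plain Java. This is a minimal sketch, not a DFC API: the `ObjectIdParts` class is hypothetical and simply slices the 16-character id at the 2/6/8 boundaries described above; the decimal conversion of the docbase id is included only for convenience.

```java
final class ObjectIdParts {
    final String typeTag;      // first 2 hex digits, e.g. "09" for dm_document
    final String docbaseIdHex; // next 6 hex digits
    final String uniquePart;   // last 8 hex digits

    ObjectIdParts(String objectId) {
        if (objectId == null || !objectId.matches("[0-9a-fA-F]{16}")) {
            throw new IllegalArgumentException("Expected a 16-digit hex object id: " + objectId);
        }
        typeTag = objectId.substring(0, 2);
        docbaseIdHex = objectId.substring(2, 8);
        uniquePart = objectId.substring(8);
    }

    // Docbase id converted from hex to decimal.
    int docbaseId() {
        return Integer.parseInt(docbaseIdHex, 16);
    }

    // Builds "<Business Prefix><unique document id>", e.g. DOC800077f3.
    String documentNumber(String businessPrefix) {
        return businessPrefix + uniquePart;
    }

    public static void main(String[] args) {
        ObjectIdParts p = new ObjectIdParts("0900b276800077f3");
        System.out.println(p.typeTag + " " + p.docbaseIdHex + " " + p.uniquePart); // 09 00b276 800077f3
        System.out.println(p.docbaseId());           // 45686
        System.out.println(p.documentNumber("DOC")); // DOC800077f3
    }
}
```

Passing the i_chronicle_id (rather than the r_object_id of a later version) keeps the generated number identical for every version of the document.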

Switching to this new pattern has resolved the issue for newly created documents. We have yet to figure out how to remove the existing duplicates, as they are referenced by other systems and we can't simply change the unique attributes.


September 5, 2008

UCF Demystified : Client Config : Part 1

Filed under: documentum, ucf — Raj V @ 9:19 pm

UCF is the de facto standard for transferring content from a Documentum repository to the end client in Documentum's application clients (applications that don't have any DFC footprint on the client).
UCF is composed of 2 parts: a client and a server. UCF makes extensive use of configuration files.

UCF configuration files exist at two layers: one on the server, i.e. the app server (ucf.server.config.xml), and the other on the client (ucf.client.config.xml).

The rest of this article looks at the configuration files that apply to the client layer.

On the client, UCF has a few configuration files that define how it is launched.

Each of the configuration files below serves a specific purpose, detailed below:

  • ucf.launcher.config.xml: This file specifies where UCF should look for its files (libraries, configuration files, etc.). It holds the configuration parameter for the UCF home directory (“installs.home”), which the UCF launcher uses to find the files it needs to launch the UCF client process. During UCF installation this is the first file that gets created.
  • ucf.installs.config.xml:
    This is the main configuration file that the UCF launcher uses to launch the UCF Java process (on the client). This file defines the UCF Runtime environment on the client.

    • The <ucfInstall> element and its attributes define where the UCF libraries can be found and how to use them.
      • appId: defaults to ‘shared’. This specifies that the UCF application libraries are shared across multiple applications (all UCF-based consumers) for the same version of the UCF client.
      • version: The version of the UCF client library used to launch the UCF client process; this is the UCF version your application is bundled with. If you have multiple versions of UCF on your client machine, the version specified here is used for launching UCF.
        This version follows the same format other Documentum applications use:
        Format <major>.<minor>.<maintenance no.>.<build no.> ex.: ==> 5.3 SP4 build.
      • host: The host for which these UCF libraries were installed. If your %USERPROFILE% directory is the same on every host in your corporate network, each host from which you access UCF gets a separate UCF libraries directory. The UCF client process on a host picks up the UCF libraries only from the directory that matches the current host.
      • home: The UCF home directory, where all UCF-related libraries and configuration files are found. Its value is the same as the one defined in the ucf.launcher.config.xml file.

      This ucfInstall element defines the location of the UCF library files, configuration files and the intermediate temp directory.

    • <java> element (under ucfInstall element) and its enclosed elements define the runtime environment the UCF client process uses. The attributes of this element are detailed below:
      • version: The JRE version used by the UCF client. This is the version the UCF installer found on your host machine and judged suitable for launching the UCF process.
      • minVersion: The minimum Java version the current UCF build requires to launch the client successfully. If this minimum Java version is not met, UCF downloads its private JRE and uses it. If your version is higher than minVersion, no private JRE is downloaded.
      • exePath: The path to the javaw executable used to launch the UCF process.
        Hack: If I want to force UCF to be launched by a different version of Java (like JDK 1.6.0 instead of JDK 1.4.2), I can modify this path to point to that JRE. Beware that EMC doesn't support these hacks or Java versions it doesn't support.
      • classpath: The classpath used to launch the UCF process. Apart from these entries, the only libraries available to UCF at launch are the JRE's bootstrap jars.
        If you take a close look at this classpath entry, it follows a predefined format:

        • <installs.home>/<host>/<appId>/<bin>/<version>/*.jar
          (UCF versions before 5.3 SP3 don't have the version directory.)
          This is why the ucfInstall element defines these attributes.
    • <option> elements: These are the VM arguments passed to the UCF process at launch time.
      • The default options the installer specifies are:
        • java.library.path: The directory where UCFWin32JNI.dll is found (on Windows). This DLL gives UCF access to the Windows Registry, which it uses to track user operations on the client (checked-out documents, viewed documents, linked documents also called “InlinedDocuments”, housekeeping parameters, etc.). Whenever a document is transferred to the client (accessed using UCF), it's tracked in the registry.
        • java.util.logging.config.class: A standard java.util.logging system property (defined by the JDK) that names a class the LogManager instantiates to set up the logging configuration used for UCF calls. You can plug in your own configuration class instead of the default one if you have additional logging requirements.
          Note: One pitfall I see in using java.util.logging.Logger is that it doesn't record timing at millisecond granularity, unlike Log4J (Bug# 6285131).
          Custom logging configurations, however, are not supported by EMC (they are mentioned here only for completeness).
        • user.home: Java's user.home system property. This is the default directory where your documents are checked out, viewed or exported to. The location can, however, be changed by specifying it in the “ucf.client.config.xml” file (more on this later).
        • Any additional VM arguments that need to be passed can be added here,
          such as minimum/maximum heap size, debugging parameters or JMX listeners.
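To tie the ucfInstall and java attributes together, the sketch below parses a made-up ucfInstall fragment with the JDK's DOM parser, builds the jar directory from the classpath pattern above, and applies the minVersion rule. The element and attribute names follow this article, but the sample values, the exact file layout, treating `bin` as a literal directory name, and the version-comparison rule are all assumptions; this is not UCF's actual launcher code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class UcfInstallSketch {
    // Hypothetical sample: names follow the article, values are made up,
    // and the real file may nest or name things differently.
    static final String SAMPLE =
        "<ucfInstall appId=\"shared\" version=\"5.3.0.446\" host=\"MYHOST\""
        + " home=\"C:/Users/raj/Documentum/ucf\">"
        + "<java version=\"1.4.2_11\" minVersion=\"1.4.2\""
        + " exePath=\"C:/j2sdk1.4.2_11/jre/bin/javaw.exe\"/>"
        + "</ucfInstall>";

    // Numeric value of one version segment, ignoring suffixes such as "-b06".
    private static int segment(String[] parts, int i) {
        if (i >= parts.length) {
            return 0;
        }
        String digits = parts[i].replaceAll("^(\\d*).*$", "$1");
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Compares Java version strings like "1.4.2_11" and "1.5.0" segment by segment.
    static int compareVersions(String a, String b) {
        String[] x = a.split("[._]");
        String[] y = b.split("[._]");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int cmp = Integer.compare(segment(x, i), segment(y, i));
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    // Assumed rule: a private JRE is needed only when the found JRE is older than minVersion.
    static boolean needsPrivateJre(Element javaElement) {
        return compareVersions(javaElement.getAttribute("version"),
                javaElement.getAttribute("minVersion")) < 0;
    }

    // <installs.home>/<host>/<appId>/bin/<version>, assuming "bin" is literal.
    static String libraryDir(Element ucfInstall) {
        return ucfInstall.getAttribute("home") + "/" + ucfInstall.getAttribute("host") + "/"
            + ucfInstall.getAttribute("appId") + "/bin/" + ucfInstall.getAttribute("version");
    }

    static Element parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element ucfInstall = parse(SAMPLE);
        Element javaElem = (Element) ucfInstall.getElementsByTagName("java").item(0);
        System.out.println("UCF jars: " + libraryDir(ucfInstall) + "/*.jar");
        System.out.println("Private JRE needed: " + needsPrivateJre(javaElem)); // false
    }
}
```

Comparing segments numerically (rather than as strings) matters here: a plain string comparison would rank "1.10" below "1.9".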

UCF Launcher: The UCF launcher used by WDK applications launches the UCF process in shared mode. This means the launched UCF client process can be shared across multiple content transfer invocations. Being shared, the UCF client process lingers for some period after the current operation (this is also configurable; details to follow later), waiting for further requests from the UCF server.

Note: The statements in this article are neither supported nor suggested by EMC. They are purely my own understanding of the UCF component. Please use this information at your own discretion.
Neither EMC nor I will provide support or take responsibility if the default installation is altered.

August 14, 2008

How many objects can your docbase accommodate?

Filed under: documentum — Raj V @ 5:33 pm

In Documentum, every object has a unique identifier (r_object_id) in the repository, and this identifier is a 16-digit hexadecimal value (4 bits per digit, so 8 bytes). It is composed of a 2-digit hexadecimal id for the object type (not the same as the r_object_type attribute of an object), a 6-digit hexadecimal docbase id and an 8-digit unique identifier for the object.

  • Each object type is identified internally by the Content Server with a 2-digit hexadecimal identifier
    • Like ‘09’ for dm_document type
    • ‘0b’ for dm_folder type etc.
  • The next 6 digits of each object id are the id of the docbase the object belongs to. The docbase id is, in theory, globally unique.
  • The remaining 8 hexadecimal digits uniquely identify the object in the repository.

So for a given object type (including its subtypes) in a repository, we can accommodate a maximum of ‘ffffffff’ (hexadecimal) objects. This translates to 4,294,967,295 objects in decimal, roughly 4 billion. This is good enough for a single repository at any point.

However, over the course of time (maybe decades), this number could reach its limit as more and more documents (including multiple versions) are added.
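The capacity figure and the "decades" estimate are simple arithmetic. This sketch assumes the whole 8-hex-digit range is usable and a constant, hypothetical creation rate (one million objects per day for a single type tag); plug in your own rate.

```java
final class ObjectIdCapacity {
    // Largest value of the 8-hex-digit part of an object id: 0xffffffff.
    static final long MAX_IDS_PER_TYPE_TAG = Long.parseLong("ffffffff", 16);

    // Years until the range of one type tag is used up at a constant daily creation rate.
    static double yearsToExhaust(long objectsPerDay) {
        return MAX_IDS_PER_TYPE_TAG / (double) objectsPerDay / 365.25;
    }

    public static void main(String[] args) {
        System.out.println(MAX_IDS_PER_TYPE_TAG); // 4294967295
        // Hypothetical load: one million new objects (documents plus versions) per day.
        System.out.printf("%.1f years%n", yearsToExhaust(1_000_000L)); // 11.8 years
    }
}
```

At a tenth of that rate the same range would last well over a century, which is why the limit only matters for very busy types.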

Think of the ACLs defined in the repository (dm_acl type; type id: 45). The ACL type can also grow to a maximum of 4,294,967,295 objects in a docbase. However, even in an exceptional scenario, I don't see that many ACLs being defined in a single repository, so most of these identifiers stay unused.

Looking at the hierarchy of the “dm_document” type, there are many internally defined subtypes of “dm_document” that consume further ids from the range allocated to the dm_document type in the repository:

\ dm_document
\ dm_staged
\ dm_plugin
\ dm_java
\ dm_email_message
\ dm_format_preferences
\ dm_menu_system
\ dm_docset
\ dm_docset_run
\ dm_esign_template
\ dm_xml_config
\ dm_xml_style_sheet
\ dm_xml_zone
\ dm_xml_custom_code
\ dm_message_archive
\ dmc_notepage
\ dmc_jar
\ dmc_tcf_activity_template
\ dmc_tcf_activity
This clearly shows that in some cases it's possible to hit the limit for an object type (including subtypes), whereas in other cases we would never fill even a tenth of the space allocated to a type.

The way Documentum has structured object ids is well thought out, but how can these limitations be overcome?

One possible way is to increase the length of the object id from 8 bytes to 24/32/64 bytes. Again, OS and database limits may apply here. It is quite possible that the format was designed with the 16/32-bit operating systems available at the time in mind.

I recall from some discussion that Documentum is planning to support 32-byte ids in the future. If that is the case, they would probably provide a utility during upgrade that converts existing object ids to the longer format (e.g. from 8 bytes to 24 bytes).

Migrating these object ids may cause issues in environments using CIS filers that refer to the older object ids, as those ids can't be changed while the filers lock the content/metadata.

July 10, 2008

r_object_id vs user_name of dm_user: which is better?

Filed under: documentum — Raj V @ 3:22 am

I have always had an unanswered question whenever I see “r_accessor_name”, “owner_name”, authors, users_names (of a dm_group) and the many other places where CS (Content Server) refers to a user. Documentum stores the “user_name” in all of these attributes. Typically the “user_name” is “Last name, First Name”, while “user_os_name” and “user_login_name” are generally the unique login user id in a corporate network.

Why doesn't CS store the r_object_id of the user wherever it has to refer to a user, given that it is unique, immutable over a repository's lifetime, and easy to maintain?
dm_user is a contentless object and can't be versioned. (You can update the object's properties, but you can't create a version of it; i_vstamp tracks the number of updates.)
So an r_object_id always refers to the same object (version).

If CS had used the r_object_id instead of user_name across all objects, it would have been more manageable.

Here are a few cases that I believe cause trouble:

  • A user name correction or rename would have been easier (e.g. a maiden name change due to marriage, or a misspelled name).
    • Need to change the name on all objects, or deactivate the user and create the same user as a new user
    • Need to change the name in all groups defined.
    • A scenario where there are 2 users with the same name.
  • User deletion would have been easier and more maintainable (delete the user and run a script to remove all the orphaned object ids referred to by objects).
  • Better disk space utilization. Consider a scenario with 10 users (authors) on each object and 1000 objects in the repository. Assume an average user_name takes 20 bytes (probably a minimum). The space consumed by these user references for each version of the objects is 10 users * 20 bytes * 1000 objects = 200,000 bytes. The same scenario with r_object_id is 10 users * 16 bytes * 1000 objects = 160,000 bytes, a saving of 40,000 bytes (about 40 KB). Additional versions would save further space. (Disk space is inexpensive now, but what about maintainability?)
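The disk-space arithmetic from the last bullet, as a small Java check. The 20-byte average user_name is the assumption stated above, and 16 bytes is the length of an r_object_id string; `UserRefStorage` is just an illustrative name.

```java
final class UserRefStorage {
    // Bytes used by user references: users per object * bytes per reference * objects.
    static long bytes(int usersPerObject, int bytesPerRef, int objects) {
        return (long) usersPerObject * bytesPerRef * objects;
    }

    public static void main(String[] args) {
        long byName = bytes(10, 20, 1000); // user_name, assumed 20 bytes on average
        long byId = bytes(10, 16, 1000);   // r_object_id, 16 characters
        System.out.println(byName + " " + byId + " " + (byName - byId)); // 200000 160000 40000
    }
}
```

The saving scales linearly with the number of versions, since each version carries its own copy of these attributes.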

The only benefit I see in storing the user_name across all objects is performance: CS doesn't need to fetch the full name every time you access an object, as it is stored directly. But the same could easily be attained through object caching (already supported by the server).

Why has it been designed this way? Is there a specific reason, or was it just overlooked during the initial design phase?

Expert comments are welcome.

June 26, 2008

Role of UCF in Documentum Clients

Filed under: dfc, notes, ucf, wdk, webtop — Raj V @ 10:32 pm

In the early days, all Documentum clients were thick clients (either DFC-based or DMCL-based), i.e. a 2-tier architecture. You needed DFC or a DFC-based client installed on every machine that accessed the Documentum repository, or alternatively the legacy DMCL library based tools (IAPI, IDQL) to access it.

With the advent of WDK (a web-based application framework for repository access), a thick client is no longer required on individual clients. This was made possible by moving the Documentum repository client layer (DFC) from the traditional client/server 2-tier architecture to the middle tier (a 3-tier architecture).

Moving DFC to the middle tier enables the application server to access the repository. But how can the end client access repository content locally when there is no Documentum footprint on it?

When you perform content management operations, DFC retrieves the content from the Content Server on behalf of WDK and transfers it to the application server where DFC resides. But how do we transport the content from the app server to the end client?

To answer this, Documentum came up with an HTTP-based content transfer program that runs within the context of the client browser.
This program is a Java applet that transfers content from the app server to the client (outbound operations) and vice versa (inbound operations).
But due to the applet's limitations in processing complex document structures such as XML links and OLE links, this transport mechanism was limited to basic content management functionality.

These limitations led to a new, robust and extensible transport mechanism called UCF (Unified Client Facilities).

We can enable UCF in WDK-based applications (5.3 or later) through a configuration parameter.

How can you identify which transport facility (HTTP or UCF) your WDK-based application uses?

It is defined in your application's app.xml file (default entry: wdk\app.xml).
The config element below defines the mode:



What is UCF?

At a very high level, UCF is composed of two components: the UCF Server and the UCF Client.

The UCF Server plays two different roles: it presents itself as an end client to the DFC layer that communicates with the repository, and it presents itself as a server to the UCF Client (end client).

The overall communication channel is as follows:

Content Server <--> DFC <--> WDK/UCF Server <--> UCF Client

What are the benefits of UCF over the HTTP transfer mode, and why is it being made the de facto standard in Documentum-based applications (from D6)?

  • Performance and Throughput
    • There is a belief that the earlier HTTP-based transport is faster than UCF transport. In general, standard HTTP uploads perform better than HTTP downloads. Not sure why. Maybe the server spends more time responding, as most users are reading from the application and only a few are writing back to the server (uploading). (Just a thought.)
    • I do agree there have been a few performance issues, but there have been many improvements over time, and performance has improved a lot lately.
    • UCF provides client information to the server as and when required, accesses the registry, optimizes the content transfer, etc. Compared to the applet-based transport, there is a delay in launching and initializing the UCF client, caused by JVM startup, the UCF client's initial connection to the app server, and protocol negotiation with the server.
  • Extensibility
    • UCF is extensible. You can add your own requests (server) and handlers (client), plug them into the UCF infrastructure and have it perform your custom tasks.
  • Recovery
    • UCF supports recovery. Say you are exporting or viewing a content file and the socket connection breaks during the transfer: UCF retries the operation from where it left off and attempts to complete it.
  • DFC-based analysis for the client
    • As UCF has only a small footprint on the client, it sends the content file to the server. The server then analyzes the content and initiates a two-way communication channel with the client, which lets the server and client together perform the content analysis.
  • Client information available on the server (for DFC and WDK components)
    • The UCF client makes all the required client information available on the server, so the server can act as if it were the client.

None of the above benefits were available with the HTTP-based transport implementation, or they were available only with limited support.
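The recovery idea (resume from the bytes already received rather than restarting) can be illustrated with a generic sketch. This is not UCF's protocol or API: `FlakySource` is a hypothetical in-memory stand-in for a connection that drops once, and the retry limit and 4-byte chunk size are arbitrary.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class ResumableCopy {
    // Hypothetical source readable from any byte offset; it throws once, the first time
    // a read starts at or beyond failAt (simulating a dropped connection).
    static final class FlakySource {
        private final byte[] data;
        private long failAt;

        FlakySource(byte[] data, long failAt) {
            this.data = data;
            this.failAt = failAt;
        }

        // Reads up to buf.length bytes starting at offset; returns -1 at end of content.
        int read(long offset, byte[] buf) throws IOException {
            if (failAt >= 0 && offset >= failAt) {
                failAt = -1; // fail only once
                throw new IOException("connection reset");
            }
            if (offset >= data.length) {
                return -1;
            }
            int n = (int) Math.min(buf.length, data.length - offset);
            System.arraycopy(data, (int) offset, buf, 0, n);
            return n;
        }
    }

    // Copies the whole content, resuming from the bytes already received after each
    // failure instead of starting over; gives up after maxRetries failures.
    static byte[] transfer(FlakySource src, int maxRetries) throws IOException {
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        byte[] buf = new byte[4];
        int failures = 0;
        while (true) {
            try {
                int n = src.read(received.size(), buf);
                if (n < 0) {
                    return received.toByteArray();
                }
                received.write(buf, 0, n);
            } catch (IOException e) {
                if (++failures > maxRetries) {
                    throw e;
                }
                // keep the partial content; the next read resumes at received.size()
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "hello, ucf content".getBytes(StandardCharsets.US_ASCII);
        byte[] copy = transfer(new FlakySource(content, 6), 3);
        System.out.println(new String(copy, StandardCharsets.US_ASCII)); // hello, ucf content
    }
}
```

The key design point is that the offset of the next read is derived from what has already been received, so a retry never re-sends bytes the client already has.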

June 22, 2008

Enabling Documentum Composer features on Eclipse Europa

Filed under: documentum — Raj V @ 11:28 pm

I came across an article on installing Documentum Composer on the Eclipse Web Tools project. However, I had an Eclipse Europa 3.3.2 installation and wanted to enable Documentum Composer features on it.
The same rules apply, except for getting the suggested updates: Europa discovers the available features from the Eclipse site.

This turned out to be much easier, requiring little configuration, as the required EMF Validation framework was already part of Europa.

  • Install the Eclipse Europa JEE edition from the Eclipse Europa site.
  • Unzip EMC Documentum Composer to a separate folder (or extract the com.emc.* packages into the corresponding Eclipse plugins/features folders).
  • Copy com.emc.* from the Composer/plugins directory to the Eclipse/plugins directory.
  • Copy com.emc.* from the Composer/features directory to the Eclipse/features directory.
  • Start Eclipse, and you are ready with Documentum projects and JEE tooling that you can use for WDK development.

I hope EMC releases an update site for Composer along with a complete build.
This would allow the update site to be used to install “Composer” over existing installations of Eclipse Europa/Webtools.
This saves lots of RAM (a minimum of 300MB).

My previous installation was (hungry for memory):

  1. DCTM server (6.0 SP1 bundled with Weblogic)
  2. Oracle 10g R2 (before SP1 I was on Oracle 10g XE as the development docbase to save memory. XE is free, has the complete feature set a docbase requires, and has a small footprint)
  3. Tomcat Webtop + DA
  4. Eclipse for Documentum WDK Development
  5. EMC Documentum Composer for DocApp Builder/Installer

Now, with the new installation, I can at least live with one Eclipse installation that offers both WDK development and Documentum Composer in one IDE.

A few features I would like to see in Composer to make it a unified Documentum development tool:
IAPI/IDQL (features of Samson), a WDK WYSIWYG editor, and DFS SOA design time with an HTTP analyzer.

June 19, 2008

Query to find list of objects in folder along with its Folder Path

Filed under: dfc, documentum, dql — Raj V @ 3:42 pm

Occasionally we need to find the list of all objects in a folder and also retrieve their exact folder path within the same query.

This can be achieved easily through a DFC program, but it's a little tricky with a DQL query, as dm_sysobject stores only the folder id (i_folder_id) of the object instead of the folder path.

The folder path lives in the dm_folder object as a repeating attribute (r_folder_path), so we need to query dm_folder for it. The issue in DQL is that you can't select repeating attributes when you join multiple types; you will hit DM_QUERY2_E_REPEAT_TYPE_JOIN if you do.

Let's see what DFC can do and how to approach the same with DQL.

The DFC code snippet looks as below:

import com.documentum.fc.client.*;
import com.documentum.fc.common.DfException;
import java.util.ArrayList;
import java.util.List;

// folderPath: repository path of the folder to scan (the <<folderpath>> placeholder)
IDfFolder folder = (IDfFolder) session.getObjectByPath(folderPath);
if (folder != null) {
    List<String> docs = new ArrayList<String>();
    getContents(session, folder, docs);
    System.out.println("Total Number of Files : " + docs.size());
}

// Recursively collects "name<TAB>owner<TAB>folder path<TAB>modify date" for each document under folder
private void getContents(IDfSession session, IDfFolder folder, List<String> docs) throws DfException {
    // get the r_object_id of every object linked directly into this folder
    IDfCollection collection = folder.getContents("r_object_id");
    if (collection == null) {
        return;
    }
    try {
        while (collection.next()) {
            IDfSysObject object = (IDfSysObject) session.getObject(collection.getId("r_object_id"));
            if (object.getTypeName().equals("dm_folder") || object.getType().isSubTypeOf("dm_folder")) {
                getContents(session, (IDfFolder) object, docs); // descend into sub-folders
            } else {
                // i_folder_id and r_folder_path are repeating; index 0 takes the first value of each
                IDfFolder folderObj = (IDfFolder) session.getObject(object.getFolderId(0));
                docs.add(object.getObjectName() + "\t" + object.getOwnerName() + "\t"
                        + folderObj.getFolderPath(0) + "\t" + object.getModifyDate());
            }
        }
    } finally {
        collection.close(); // always release the collection
    }
}

In DQL, the same can be achieved as follows:

DQL> select A.r_object_id, A.object_name, B.r_folder_path from dm_document A, dm_folder_r B where any A.i_folder_id = B.r_object_id and B.r_folder_path like '%/System/%';

What we are doing here is joining dm_document's repeating attribute ‘i_folder_id’ with dm_folder_r, the table that holds dm_folder's repeating attribute values (one row per r_folder_path value). This way we don't query r_folder_path as a repeating attribute of a joined type. Had we queried the dm_folder type (instead of dm_folder_r), we would have hit the DQL restriction DM_QUERY2_E_REPEAT_TYPE_JOIN. Querying the underlying _r table instead lets the query pass through the DQL translator (just like the registered table concept).
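To see why the query returns one row per (document, folder path) pair, here is an in-memory Java sketch of the same join. The row classes and sample ids are hypothetical, and `contains` stands in for a LIKE pattern that uses only `%` wildcards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FolderPathJoin {
    // One dm_document row with its repeating i_folder_id values.
    static final class DocRow {
        final String objectId;
        final String objectName;
        final List<String> folderIds;

        DocRow(String objectId, String objectName, String... folderIds) {
            this.objectId = objectId;
            this.objectName = objectName;
            this.folderIds = Arrays.asList(folderIds);
        }
    }

    // One dm_folder_r row: a folder id and ONE of its r_folder_path values.
    static final class PathRow {
        final String folderId;
        final String folderPath;

        PathRow(String folderId, String folderPath) {
            this.folderId = folderId;
            this.folderPath = folderPath;
        }
    }

    // Mirrors: where any A.i_folder_id = B.r_object_id and B.r_folder_path like '%<fragment>%'
    static List<String> join(List<DocRow> docs, List<PathRow> paths, String fragment) {
        List<String> rows = new ArrayList<>();
        for (DocRow doc : docs) {
            for (PathRow path : paths) {
                if (doc.folderIds.contains(path.folderId) && path.folderPath.contains(fragment)) {
                    rows.add(doc.objectId + "\t" + doc.objectName + "\t" + path.folderPath);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical ids and paths
        List<DocRow> docs = Arrays.asList(
            new DocRow("0900b27680000001", "readme", "0b00b27680000001", "0b00b27680000003"),
            new DocRow("0900b27680000002", "config", "0b00b27680000002"));
        List<PathRow> paths = Arrays.asList(
            new PathRow("0b00b27680000001", "/System/Sysadmin"),
            new PathRow("0b00b27680000002", "/System/Applications"),
            new PathRow("0b00b27680000002", "/System/Shared/Apps"),
            new PathRow("0b00b27680000003", "/Temp"));
        join(docs, paths, "/System/").forEach(System.out::println); // 3 rows
    }
}
```

Note that "config" appears twice because its folder has two r_folder_path values, while "readme" appears once because its second folder (/Temp) fails the LIKE filter.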

June 5, 2008

Hierarchical list of Documentum types

Filed under: dql, notes — Raj V @ 8:06 pm

I came across a simple (and useful) query that displays the types and their hierarchy.

DQL> describe hierarchy persistent

Displays a hierarchical structure of all “Persistent Objects”.

DQL> describe hierarchy dm_sysobject
Object hierarchy list
\ dmi_expr_code
\ dm_mount_point
\ dm_location
\ dm_docbase_config
\ dm_server_config
\ dm_policy
\ dm_registered
\ dm_folder
\ dm_cabinet
\ dm_xml_application
\ dm_category
\ dmc_topic
\ dmc_room
\ dmc_module
\ dmc_aspect_type
\ dmc_validation_module
P.S.: I found it through Samson (just below the toolbar: Query Topics -> Type Management + List tree of types known in docbase -> Generate Query).

Building on the above query, I tried a DQL to find the hierarchy of dm_sysobject and it works perfectly. Cool.

DQL> describe hierarchy dm_sysobject

(lists all the subtypes of dm_sysobject hierarchically).

This can be handy for seeing which types extend a custom object type.

Here is the DQL to find the direct subtypes of a given type:
DQL> select name from dm_type where super_name = 'dm_sysobject';
(This query doesn't list the indirect subtypes.)
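Since the direct-subtype query returns one level at a time, the full hierarchy can also be rebuilt client-side from name/super_name pairs (for example from select name, super_name from dm_type). The sketch below is a hypothetical illustration of that traversal, with a made-up custom type; describe hierarchy already does this for you on the server.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TypeHierarchy {
    // superOf maps a type name to its super_name.
    // Returns every direct and indirect subtype of root, depth-first, children in alphabetical order.
    static List<String> allSubtypes(Map<String, String> superOf, String root) {
        Map<String, List<String>> children = new HashMap<>();
        for (Map.Entry<String, String> e : superOf.entrySet()) {
            children.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        for (List<String> kids : children.values()) {
            Collections.sort(kids); // deterministic output regardless of map order
        }
        List<String> out = new ArrayList<>();
        collect(children, root, out);
        return out;
    }

    private static void collect(Map<String, List<String>> children, String type, List<String> out) {
        for (String child : children.getOrDefault(type, Collections.<String>emptyList())) {
            out.add(child);
            collect(children, child, out); // recurse to pick up indirect subtypes
        }
    }

    public static void main(String[] args) {
        Map<String, String> superOf = new HashMap<>();
        superOf.put("dm_document", "dm_sysobject");
        superOf.put("dm_folder", "dm_sysobject");
        superOf.put("dm_cabinet", "dm_folder");
        superOf.put("my_custom_doc", "dm_document"); // hypothetical custom type
        System.out.println(allSubtypes(superOf, "dm_sysobject"));
        // [dm_document, my_custom_doc, dm_folder, dm_cabinet]
    }
}
```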

December 28, 2007

DQL: List of groups a user belongs to

Filed under: dm_group, dql, notes — Raj V @ 9:44 pm

Here is the DQL to query the list of groups a user belongs to:

select group_name from dm_group where any i_all_users_names = '<user name>';

This gives the list of all groups a user belongs to, directly or indirectly.
i_all_users_names is a computed value.

To query the list of groups the logged-in user belongs to:

select group_name from dm_group where any i_all_users_names = USER;

USER is a placeholder for the currently logged-in user.

Querying the ‘users_names’ attribute instead of i_all_users_names returns only the groups the user is directly a member of (not via subgroups).
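The difference between users_names and i_all_users_names is direct versus transitive membership. This hypothetical sketch models dm_group rows in memory (group name, member users, and member groups via groups_names) and computes both sets; in a repository the server maintains i_all_users_names for you.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class GroupMembership {
    // Hypothetical dm_group row: group_name, users_names (member users), groups_names (member groups)
    static final class Group {
        final String name;
        final Set<String> users;
        final Set<String> groups;

        Group(String name, List<String> users, List<String> groups) {
            this.name = name;
            this.users = new HashSet<>(users);
            this.groups = new HashSet<>(groups);
        }
    }

    // Like "where any users_names = ...": only groups the user is a direct member of.
    static Set<String> directGroups(Collection<Group> all, String user) {
        Set<String> result = new TreeSet<>();
        for (Group g : all) {
            if (g.users.contains(user)) {
                result.add(g.name);
            }
        }
        return result;
    }

    // Like "where any i_all_users_names = ...": direct groups plus every group that
    // contains one of them, followed transitively; result.add() also stops cycles.
    static Set<String> allGroups(Collection<Group> all, String user) {
        Set<String> result = directGroups(all, user);
        Deque<String> pending = new ArrayDeque<>(result);
        while (!pending.isEmpty()) {
            String member = pending.pop();
            for (Group g : all) {
                if (g.groups.contains(member) && result.add(g.name)) {
                    pending.push(g.name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Group> groups = Arrays.asList(
            new Group("wdk_devs", Arrays.asList("Raj V"), Collections.<String>emptyList()),
            new Group("developers", Collections.<String>emptyList(), Arrays.asList("wdk_devs")),
            new Group("all_staff", Collections.<String>emptyList(), Arrays.asList("developers")),
            new Group("ops", Arrays.asList("someone else"), Collections.<String>emptyList()));
        System.out.println(directGroups(groups, "Raj V")); // [wdk_devs]
        System.out.println(allGroups(groups, "Raj V"));    // [all_staff, developers, wdk_devs]
    }
}
```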
