Identity correlation
In information systems, identity correlation is a process that reconciles and validates the ownership of disparate user account login IDs (user names) that reside on systems and applications throughout an organization. It can permanently link ownership of those login IDs to particular individuals by assigning a unique identifier (also called a primary or common key) to all validated account login IDs.

The process of identity correlation validates that individuals have account login IDs only for the systems and applications they should have access to according to the organization’s business policies, access control policies and various application requirements.

A unique identifier, in the context of identity correlation, is any identifier which is guaranteed to be unique among all identifiers used for a group of individuals and for a specific purpose. There are three main types of unique identifiers, each corresponding to a different generation strategy:

• Serial numbers, assigned incrementally

• Random numbers, selected from a number space much larger than the maximum (or expected) number of objects to be identified. Although not really unique, some identifiers of this type may be appropriate for identifying objects in many practical applications, and so are referred to as “unique” within this context

• Names or codes allocated by choice, but forced to be unique by keeping a central registry such as the EPC Information Services of the EPCglobal Network

For the purposes of identity correlation, a unique identifier is typically a serial number or a random number selected from a number space much larger than the maximum number of individuals who will be identified. A unique identifier, in this context, is typically represented as an additional attribute in the directory associated with each particular data source. However, adding an attribute to each system-specific directory may conflict with application requirements or specific business requirements of the organization. Under these circumstances, unique identifiers may not be an acceptable addition for an organization.
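The three generation strategies can be illustrated with a minimal Python sketch. The names serial_ids, random_id and Registry are hypothetical, and the 128-bit number space is an assumption chosen only to be much larger than any realistic user base.

```python
import itertools
import secrets


def serial_ids(start=1):
    """Serial numbers: each identifier is one greater than its predecessor."""
    return itertools.count(start)


def random_id(bits=128):
    """Random number from a space of 2**bits values (assumed size).

    Collisions are possible in principle but improbable in practice.
    """
    return secrets.randbits(bits)


class Registry:
    """Names chosen freely, forced to be unique by a central registry."""

    def __init__(self):
        self._taken = set()

    def register(self, name):
        if name in self._taken:
            raise ValueError(f"{name!r} is already registered")
        self._taken.add(name)
        return name
```

A serial or random value produced this way would then be stored as the additional directory attribute described above.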

Basic Requirements of Identity Correlation

Identity Correlation involves several factors:

1. Linking Disparate Account IDs Across Multiple Systems or Applications

Many organizations must find a method to comply with audits that require them to link disparate application user identities with the actual people who are associated with those user identities.

Some individuals may have a fairly common first and/or last name, which makes it difficult to link the right individual to the appropriate account login ID, especially when those account login IDs are not linked to enough specific identity data to remain unique.

A typical construct of the login ID, for example, can be the first character of givenname + the next 7 of sn, with incremental uniqueness. This would produce login IDs like jsmith12, jsmith13, jsmith14, etc. for users John Smith, James Smith and Jack Smith, respectively.
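The login ID construct described above can be sketched as follows. The function name make_login_id and the exact suffix scheme (no suffix for the first user, then 2, 3, and so on) are assumptions; real systems number collisions in different ways.

```python
def make_login_id(given_name, surname, taken):
    """First character of the given name + first 7 of the surname.

    `taken` is the set of login IDs already in use; a numeric suffix
    is appended until the candidate is unique (assumed scheme).
    """
    base = (given_name[:1] + surname[:7]).lower()
    candidate, n = base, 1
    while candidate in taken:
        n += 1
        candidate = f"{base}{n}"
    taken.add(candidate)
    return candidate


taken = set()
ids = [make_login_id(g, "Smith", taken) for g in ("John", "James", "Jack")]
# ids == ["jsmith", "jsmith2", "jsmith3"]
```

Nothing in the resulting login IDs distinguishes John from James or Jack, which is why the login ID alone is not enough to identify the owner.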

Conversely, an individual might undergo a name change, either formally or informally, so that newly acquired account login IDs look drastically different from the login IDs that individual acquired before the change.

For example, a woman could get married and decide to use her new surname professionally. If her name was originally Mary Jones but she is now Mary Smith, she could call HR and ask them to update her contact information and email address with her new surname. This request would update her Microsoft Exchange login ID to mary.smith to reflect that surname change, but it might not actually update her information or login credentials in any other system she has access to. In this example, she could still be mjones in Active Directory and mj5678 in RACF.

Identity correlation should link the appropriate account login IDs to individuals whose names make them hard to tell apart, and should also link login IDs that look drastically different from system to system but belong to the same individual.

For more details on this topic, please see: The Second Wave: Linking Identities to Contexts

2. Discovering Intentional and Unintentional Inconsistencies in Identity Data

Inconsistencies in identity data typically develop over time in organizations as applications are added, removed or changed and as individuals attain or retain an ever-changing stream of access rights while they join, move within and leave the organization.

Application user login IDs do not always have a consistent syntax across different applications or systems, and many user login IDs are not specific enough to be correlated directly back to one particular individual within an organization.

User data inconsistencies can also occur due to simple manual input errors, non-standard nomenclature, or name changes that might not be identically updated across all systems.

The identity correlation process should take these inconsistencies into account to link up identity data that might seem to be unrelated upon initial investigation.
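One common first step in handling such inconsistencies is to reduce names to a normalized comparison key before comparing them. The sketch below is one possible approach, not a prescribed method; the function name normalize_name and the specific rules (strip accents, drop apostrophes, treat other punctuation as spaces, ignore case) are assumptions.

```python
import unicodedata


def normalize_name(raw):
    """Reduce a name to a comparison key.

    Accents, case, apostrophes, other punctuation and repeated
    whitespace are removed so that trivially different spellings
    of the same name compare equal.
    """
    decomposed = unicodedata.normalize("NFKD", raw)
    kept = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue            # drop accent marks
        if ch.isalnum():
            kept.append(ch)
        elif ch != "'":
            kept.append(" ")    # punctuation becomes a separator
    return " ".join("".join(kept).casefold().split())
```

With these rules, "  O'Brien,  José " and "OBRIEN JOSE" both reduce to the same key, so a manual input difference of this kind no longer hides the match.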

3. Identifying Orphan or Defunct Account Login IDs

Organizations can expand and consolidate through mergers and acquisitions, which increases the complexity of business processes, policies and procedures.

As an outcome of these events, users may move to different parts of the organization, attain a new position within the organization, or leave the organization altogether. At the same time, each new application that is added has the potential to produce a new, completely unique user ID.

Some identities may become redundant, others may be in violation of application-specific or more widespread departmental policies, others could be related to non-human or system account IDs, and still others may simply no longer be applicable for a particular user environment.

Projects that span different parts of the organization or focus on more than one application become difficult to implement because user identities are often not properly organized or recognized as being defunct due to changes in the business process.

An identity correlation process must identify all orphan or defunct account identities that no longer belong after such drastic shifts in an organization’s infrastructure.
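As a rough illustration, orphan detection can be expressed as a comparison between the accounts found on each system and an authoritative list of current individuals. The function name find_orphans, the record layout and the separate list of known system accounts are all assumptions made for this sketch.

```python
def find_orphans(accounts, active_people, service_accounts=frozenset()):
    """Return (system, login_id) pairs with no current owner.

    accounts: iterable of (system, login_id, owner_key); owner_key is
        None when no owner has been established.
    active_people: set of owner keys present in the authoritative source.
    service_accounts: known non-human accounts, reviewed separately.
    """
    orphans = []
    for system, login_id, owner_key in accounts:
        if (system, login_id) in service_accounts:
            continue
        if owner_key is None or owner_key not in active_people:
            orphans.append((system, login_id))
    return orphans
```

Accounts flagged this way are candidates for review rather than automatic removal, since an unmatched account may simply reflect incomplete correlation.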

4. Validating Individuals to their Appropriate Account IDs

Under such regulations as the Sarbanes-Oxley Act and the Gramm-Leach-Bliley Act, organizations are required to ensure the integrity of each user across all systems and to account for all access a user has to various back-end systems and applications in an organization.

If implemented correctly, identity correlation will expose compliance issues. Auditors frequently ask organizations to account for who has access to what resources. For companies that have not already fully implemented an enterprise identity management solution, identity correlation and validation are required to adequately attest to the true state of an organization’s user base.

This validation process typically requires interaction with individuals within an organization who are familiar with the organization’s user base from an enterprise-wide perspective, as well as those individuals who are responsible for and knowledgeable about each individual system and/or application-specific user base.

In addition, much of the validation process might ultimately involve direct communication with the individual in question to confirm particular identity data that is associated with that specific individual.

5. Assigning a unique primary or common key for every system or application Account ID that is attached to each individual

In response to various compliance pressures, organizations have the option of introducing unique identifiers for their entire user base to validate that each user belongs in each specific system or application in which that user has login capabilities.

To put such a policy into effect, various individuals familiar with the organization’s entire user base, as well as each system-specific user base, must be responsible for validating that certain identities should be linked together and other identities should be disassociated from each other.

Once the validation process is complete, a unique identifier can be assigned to that individual and his or her associated system-specific account login IDs.
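The assignment step can be sketched as a small in-memory store that issues one common key per validated individual and records which accounts it covers. The class names Person and CorrelationStore are hypothetical, and serial keys are an assumption; a production system would persist this data in a directory or database.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Person:
    common_key: int
    accounts: list = field(default_factory=list)  # (system, login_id) pairs


class CorrelationStore:
    def __init__(self):
        self._next_key = itertools.count(1)   # serial common keys (assumed)
        self._people = {}
        self._owner = {}                      # (system, login_id) -> key

    def assign(self, validated_accounts):
        """Issue one common key for a validated set of accounts."""
        accounts = list(validated_accounts)
        for account in accounts:
            if account in self._owner:
                raise ValueError(f"{account} is already linked")
        key = next(self._next_key)
        self._people[key] = Person(key, accounts)
        for account in accounts:
            self._owner[account] = key
        return key

    def owner_of(self, system, login_id):
        """Return the common key for an account, or None if unlinked."""
        return self._owner.get((system, login_id))


store = CorrelationStore()
mary = store.assign([
    ("Exchange", "mary.smith"),
    ("Active Directory", "mjones"),
    ("RACF", "mj5678"),
])
```

After assignment, the three differently named accounts from the earlier example all resolve to the same key, and an attempt to link any of them to a second individual is rejected.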

Approaches to Linking Disparate Account IDs

As mentioned above, in many organizations, users may sign into different systems and applications using different login IDs. There are many reasons to link these into "enterprise-wide" user profiles.

There are a number of basic strategies to perform this correlation, or "ID Mapping:"
  • Assume that account IDs are the same:
    • In this case, mapping is trivial.
    • This actually works in many organizations, in cases where a rigorous and standardized process has been used to assign IDs to new users for a long time.
  • Import mapping data from an existing system:
    • If an organization has implemented a robust process for mapping IDs to users over a long period, this data is already available and can be imported into any new identity management system.
  • Exact matching on attribute values:
    • Find one identity attribute or a combination of attributes on one system which correlate to one or more attributes on another system.
    • Connect IDs on the two systems by finding users whose attribute(s) are the same.
  • Approximate matching on attribute values:
    • The same as above, but instead of requiring attributes or expressions to match exactly, tolerate some differences.
    • This allows for mis-spelled, inconsistently capitalized and otherwise somewhat diverse names and similar identity values.
    • The risk here is that accounts which should not be connected will accidentally be matched by this process.
  • Self-service login ID reconciliation:
    • Invite users to fill in a form and indicate which IDs, on which systems, they own.
    • Users might lie or make mistakes -- so it's important to validate user input, for example by asking users to also provide passwords and to check those passwords.
    • Users might not recognize system names -- so it's important to offer alternatives or ask users for IDs+passwords in general, rather than asking them to specify which system those IDs are for.
  • Hire a consultant and/or do it manually:
    • This still leaves open the question of where the data comes from -- perhaps by interviewing every user in question?
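The exact and approximate matching strategies in the list above can be sketched with the Python standard library. The function names, the record layout (dictionaries with string attributes) and the 0.85 similarity threshold are assumptions; difflib.SequenceMatcher is only one of several possible similarity measures.

```python
from difflib import SequenceMatcher


def exact_matches(source, target, attribute):
    """Pair records whose `attribute` values are identical."""
    index = {}
    for record in target:
        index.setdefault(record[attribute], []).append(record)
    return [(s, t) for s in source for t in index.get(s[attribute], [])]


def approximate_matches(source, target, attribute, threshold=0.85):
    """Pair records whose `attribute` values are similar.

    Each pair carries its similarity score so that a reviewer can
    confirm or reject it; the threshold is an assumed starting point.
    """
    pairs = []
    for s in source:
        for t in target:
            a = s[attribute].casefold()
            b = t[attribute].casefold()
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                pairs.append((s, t, score))
    return pairs
```

Lowering the threshold finds more misspelled names but also raises the risk noted above of connecting accounts that belong to different people, so approximate matches are best treated as suggestions that still need validation.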

Common Barriers to Performing Identity Correlation

1. Privacy Concerns

Often, any process that requires an in-depth look into identity data brings up a concern for privacy and disclosure issues. Part of the identity correlation process implies that each particular data source will need to be compared against an authoritative data source to ensure consistency and validity against relevant corporate policies and access controls.

Any such comparison that involves an exposure of enterprise-wide, authoritative, HR-related identity data will require various non-disclosure agreements either internally or externally, depending on how an organization decides to undergo an identity correlation exercise.

Because authoritative data is frequently highly confidential and restricted, such concerns may prevent an organization from performing an identity correlation activity thoroughly and sufficiently.

2. Extensive Time and Effort Requirements

Most organizations have difficulty understanding the inconsistencies and complexities that lie within their identity data across all of their data sources. Typically, the process cannot be completed accurately or sufficiently by manually comparing two lists of identity data, or even by executing simple scripts to find matches between two different data sets. Even if an organization can dedicate full-time individuals to such an effort, the methodologies themselves usually do not expose a large enough percentage of defunct identities, validate a large enough percentage of matched identities, or identify system (non-person) account IDs well enough to pass the typical requirements of an identity-related audit.

See also

  • Sarbanes-Oxley Act (SOX)
  • Gramm-Leach-Bliley Act (GLBA)
  • Health Insurance Portability and Accountability Act (HIPAA)
  • Information Technology Audit (ITA)

Manual efforts to accomplish identity correlation require a great deal of time and staff effort, and do not guarantee that the effort will be completed successfully or in a compliant fashion.

Because of this, automated identity correlation solutions have recently entered the marketplace to provide less laborious ways of handling identity correlation exercises.

Typical automated identity correlation solution functionality includes the following characteristics:

• Analysis and comparison of identities within multiple data sources

• Flexible match criteria definitions and assignments for any combination of data elements between any two data sources

• Easy connectivity either directly or indirectly to all permissible sources of data

• Out-of-the-box reports and/or summaries of data match results

• Ability to manually override matched or unmatched data combinations

• Ability to view data results at a fine-grained level

• Assignment of unique identifiers to pre-approved or manually validated matched data

• Export abilities to send verified user lists back to source systems and/or provisioning solutions

• Ability to customize data mapping techniques to refine data matches

• Role-based access controls built into the solution to regulate identity data exposures as data is loaded, analyzed, and validated by various individuals both inside and outside of the organization

• Ability to validate identity data against end-users more quickly or efficiently than through manual methodologies
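Two of the characteristics listed above, configurable match criteria and manual override, can be illustrated with a short sketch. It does not describe any particular product; MatchRule, correlate and the record layout are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchRule:
    """One match criterion: a field in the source equals a field in the target."""
    source_field: str
    target_field: str


def correlate(source, target, rules, overrides=None):
    """Return {source_id: target_id} for records satisfying every rule.

    overrides maps a source id to a target id chosen by a reviewer,
    or to None when the reviewer rejects the automatic match.
    """
    result = {}
    for s in source:
        for t in target:
            if all(s[r.source_field] == t[r.target_field] for r in rules):
                result[s["id"]] = t["id"]
                break
    for source_id, target_id in (overrides or {}).items():
        if target_id is None:
            result.pop(source_id, None)
        else:
            result[source_id] = target_id
    return result
```

Keeping the rules as data rather than code is what allows different criteria to be assigned to each pair of data sources, and the override step keeps the final decision with a human reviewer.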

Three Methods of Identity Correlation Project Delivery

Identity correlation solutions can be implemented under three distinct delivery models. These delivery methodologies are designed to offer a solution that is flexible enough to fit various budget and staffing requirements, and to meet both short-term and long-term project goals and initiatives.

Software Purchase – This is the classic Software Purchase model where an organization purchases a software license and runs the software within its own hardware infrastructure.
  • Training is available and recommended
  • Installation Services are optional


Identity Correlation as a Service (ICAS) – ICAS is a subscription-based service where a client connects to a secure infrastructure to load and run correlation activities. This offering provides full functionality offered by the identity correlation solution without owning and maintaining hardware and related support staff.

Turn-Key Identity Correlation – A Turn-key methodology requires a client to contract with and provide data to a solutions vendor to perform the required identity correlation activities. Once completed, the solutions vendor will return correlated data, identify mismatches, and provide data integrity reports.

Validation activities will still require some direct feedback from individuals within the organization who understand the state of the organizational user base from an enterprise-wide viewpoint, as well as those individuals within the organization who are familiar with each system-specific user base. In addition, some validation activities might require direct feedback from individuals within the user base itself.

A Turn-Key solution can be performed as a single one-time activity or monthly, quarterly, or even as part of an organization’s annual validation activities. Additional services are available, such as:
  • Email Campaigns to help resolve data discrepancies
  • Consolidated or merged list generation

See Also: Related Topics

Related or associated topics which fall under the category of identity correlation may include:

Compliance Regulations / Audits

• Sarbanes-Oxley Act

• Gramm-Leach-Bliley Act

• Health Insurance Portability and Accountability Act

• Information Technology Audit

Management of identities

• Identity Management

• Unique identifier (Common Key)

• Identifier

• User Name

• User ID

• Provisioning

• Metadirectory

Access control

• Access control

• Single Sign On (SSO)

• Web Access Management

Directory services

• Directory service

• Lightweight Directory Access Protocol

• Metadata

• Virtual directory

Other categories

• Role-based access control (RBAC)

• Federation of user access rights on web applications across otherwise un-trusted networks

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 