Organizations increasingly view investments in data integration tools as a
strategic basis for enterprise data management. Vendors with capabilities across
multiple styles of data delivery, supported by strong metadata management and
service enablement, are becoming the focus of market demand.
The data integration tools market is gaining new momentum as organizations
recognize the role of these technologies in support of high-profile initiatives
such as master data management (MDM), business intelligence (BI) and delivery of
service-oriented architectures (SOAs). Recent focus on cost control has made
data integration tools a surprising priority as organizations realize the
"people" commitment for implementing and supporting custom-coded or semimanual
data integration approaches is no longer reasonable. Vendor consolidation
continues, driven by the convergence of single-purpose tools into data
integration suites or platforms. While most vendors still approach this market
with multiple products, metadata-driven architectures supporting a range of data
delivery styles continue to emerge. Organizations seeking data integration tools
must assess their current and future requirements and map them against product
functionality, including support for a range of data integration patterns and
latencies. Buyers must recognize that, because this market is still evolving,
disruptions caused by merger and acquisition activity are likely as smaller
vendors with valuable technology continue to be subsumed into larger entities to
form more complete data integration tools portfolios.
Market Overview
The discipline of data integration comprises the practices, architectural
techniques and tools for achieving consistent access to, and delivery of, data
across the spectrum of data subject areas and data structure types in the
enterprise, to meet the data consumption requirements of all applications and
business processes. As such, data integration capabilities are at the heart of
the information-centric infrastructure and will power the frictionless sharing
of data across all organizational and system boundaries. Contemporary pressures
are leading to an increased investment in data integration in all industries and
geographic regions. Business drivers, such as the imperative for speed to market
and agility to change business processes and models, are forcing organizations
to manage their data assets differently. Simplifying processes and the IT
infrastructure is necessary to achieve transparency, and transparency requires
a consistent and complete view of the data that represents the performance and
operation of the business. Data integration is a critical component of an
overall enterprise information management (EIM) strategy that can address these
data-oriented issues.
From a technology point of view, data integration tools were traditionally
delivered via a set of related markets, with vendors in each market offering a
specific style of data integration tool. Traditionally, tools for extraction,
transformation and loading (ETL) used predominantly in data warehouse/mart
implementations held the largest market shares in this overall space and formed
a "center of gravity" for market consolidation. BI efforts, with their focus on
metadata, promoted the infusion of the combined data integration market with
metadata management capabilities. However, in the past two years, vendors and
leading organizations have been pursuing a strategy of centralizing their
capabilities for semantic interpretation and reconciliation as a service within
a platform and reducing, in relative terms, their emphasis on connectivity and
specific data delivery styles. A variety of other related markets, such as those
for data quality tools, adapters and data modeling tools, overlap with the data
integration tools space. The result of this historical fragmentation in the
markets is the equally fragmented and complex way in which data integration is
accomplished in large enterprises — different teams using different tools with
little consistency, lots of overlap and redundancy, and no common management and
leverage of metadata. Technology buyers have been forced to acquire a portfolio
of tools from multiple vendors to amass the capabilities necessary to address
the full range of their data integration requirements. The market has not yet
reached a point at which data integration is typically achieved via a single
platform or suite, but notable improvements exist — especially relative to data
services delivery and metadata management.
With the emergence of the data integration tools market, separate and distinct
submarkets continue to converge, both at a vendor and technology level. This is
being driven by buyer demands (for example, organizations realizing they need to
think about data integration holistically and have a common set of data
integration capabilities they can use across the enterprise, particularly when
working on SOA initiatives that have many different consumers working in various
contexts). It is also being driven by vendors' actions (for example, vendors in
individual data integration submarkets organically expanding their capabilities
into neighboring areas, and acquisition activity bringing vendors from multiple
submarkets together). The result is a market for complete data integration tools
that address a range of different data integration styles and are based on
common design tooling, metadata and runtime architecture. This market has
supplanted the former data integration tools submarkets, such as ETL, and has
become the competitive landscape in which Gartner evaluates vendors for
placement within this Magic Quadrant. While the traditional ETL vendor submarket
was the driver for consolidation, it is important to note that, even after
vendors acquired federation or message-based integration capabilities, the
market is not showing wide implementation of either approach. Vendors that
supply all three types of integration techniques generally exhibit an
overwhelming strength in one — even those vendors that acquired market leaders
in a second submarket.
Gartner estimates the size of the market for data integration tools at
approximately $1.44 billion as of the end of 2007, and believes it is growing at
a compound annual rate of more than 17% (see "Market Trends: Data Integration
Tools, Worldwide, 2007-2012"). Services revenue from data integration tools
implementations is also growing, with the time and effort required to implement
the tools varying widely depending on the scope and complexity of the deployment
(see "Toolkit Sample Template: Data Integration LOE Estimator").
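Purely as arithmetic on the two figures above (not a forecast stated in the cited report), compounding the 2007 base at the 17% floor over the five years to 2012 gives a lower bound for the implied 2012 market size, assuming the rate holds every year:

```latex
V_{2012} \ge V_{2007}\,(1 + r)^{5} = 1.44 \times 1.17^{5} \approx 1.44 \times 2.192 \approx 3.16 \ \text{(US\$ billion)}
```

Because the stated growth rate is "more than 17%," the result is a minimum under that assumption rather than a point estimate.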
During 2H07 and 1H08, the market showed the most pronounced dichotomy in
execution that we have seen since 2006 when the Magic Quadrant for Data
Integration Tools first replaced the Magic Quadrant for ETL. The top execution
vendors — IBM, Informatica, SAP-Business Objects, Oracle and Microsoft — come
from the more traditional data integration tools heritage and their execution
capabilities arise from each vendor's traditional strength in supporting
integration of tabular (structured) data. The vendors in the Leaders' quadrant
have that traditional background, plus the incorporation of data integration for
newly emergent (unstructured) data types, as well as the ability to support all
major styles of data delivery. The remaining vendors demonstrate an interesting
response to the market demands and represent the potential for disruptive
practices in data integration. While ETI, Syncsort, Open Text and Pitney Bowes
Software are successful in the traditional data integration tools market
(providing only minimal support beyond ETL), the remaining vendors (all the
Visionaries plus Sybase) are pursuing a different strategy for challenging the
traditional Leaders. Some, including Sun Microsystems and Tibco Software,
recognize the emerging importance of services architectures and, while the
Leaders also offer this type of solution, see this area of performance as an
opportunity.
SAS, iWay Software, Pervasive Software and Sybase are each pursuing a focus that
is distinct from that of the Leaders in terms of market execution: channels,
integration tightly linked with quality and analytics, and/or blending search
with integration. The Leaders also possess features/functionality in these
areas, but the Visionaries differentiate on these characteristics (among
others). The challenge for the Visionaries will be to improve their execution,
which depends in part on how well their vision matches the emerging needs of the
market related to services-style delivery, metadata and data quality.
Market Definition/Description
The data integration tools market comprises vendors that offer software products
to enable the construction and implementation of data access and delivery
infrastructure for a variety of data integration scenarios, including:
- Data acquisition for BI and data warehousing: Extracting data from
operational systems, transforming and merging that data, and delivering it to
integrated data structures for analytic purposes. BI and data warehousing remain
a mainstay of the demand for data integration tools.
- Creation of integrated master data stores: Enabling the consolidation and
rationalization of the data representing critical business entities such as
customers, products and employees. MDM may or may not be subject-based, and data
integration tools can be used to build the data consolidation and
synchronization processes that are key to success.
- Data migrations/conversions: Traditionally addressed most often via the
custom coding of conversion programs, data integration tools are increasingly
addressing the data movement and transformation challenges inherent in the
replacement of legacy applications and consolidation efforts during merger and
acquisition activities.
- Synchronization of data between operational applications: Similar in
concept to each of the previous scenarios, data integration tools provide the
capability to ensure database-level consistency across applications, both on an
internal and interenterprise basis, and in a bidirectional or unidirectional
manner.
- Creation of federated views of data from multiple data stores: Data
federation, often referred to as enterprise information integration (EII), is
growing in popularity as an approach for providing real-time integrated views
across multiple data stores without physical movement of data. Data integration
tools are increasingly including this type of virtual federation capability.
- Delivery of data services in an SOA context: An architectural technique,
rather than a data integration usage itself, data services are the emerging
trend for the role and implementation of data integration capabilities within
SOAs. Data integration tools will increasingly enable the delivery of many types
of data services.
- Unification of structured and unstructured data: Not a specific use case
itself, and relevant to each of the above scenarios, there is an early but
growing trend toward leveraging data integration tools for merging both
structured and unstructured data sources, as organizations work on delivering a
holistic information infrastructure that addresses all data types.
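To make the master data consolidation scenario more concrete, the Python sketch below merges customer records from several sources into one "golden" record per customer. It assumes a deterministic match rule (same e-mail address after trimming and lowercasing) and a "most recently updated wins" survivorship rule; the record layout and the names `match_key` and `consolidate` are hypothetical, and real MDM deployments typically use richer matching and survivorship logic.

```python
from typing import Dict, List

Record = Dict[str, str]  # hypothetical layout: "email", "name", "updated" (YYYY-MM-DD), "source"


def match_key(record: Record) -> str:
    """Deterministic match rule (an assumption): same normalized e-mail means same customer."""
    return record["email"].strip().lower()


def consolidate(records: List[Record]) -> Dict[str, Record]:
    """Build one golden record per match key using a most-recently-updated survivorship rule."""
    golden: Dict[str, Record] = {}
    for rec in records:
        key = match_key(rec)
        current = golden.get(key)
        # ISO-8601 dates (YYYY-MM-DD) compare correctly as plain strings.
        if current is None or rec["updated"] > current["updated"]:
            golden[key] = {**rec, "email": key}  # store the normalized key on the survivor
    return golden
```

The same match and survivorship rules can be rerun as new source records arrive, which is how the consolidation process also becomes a synchronization process.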
Gartner has defined several classes of functional capabilities that vendors of
data integration tools must possess to deliver optimal value to organizations in
support of a full range of data integration scenarios:
- Connectivity/adapter capabilities (data source and target support).
- Data delivery capabilities.
- Data transformation capabilities.
- Metadata and data modeling capabilities.
- Design and development environment capabilities.
- Data governance capabilities (data quality, profiling and mining).
- Runtime platform capabilities.
- Operations and administration capabilities.
- Architecture and integration.
- Service-enablement capabilities.
Connectivity/Adapter Capabilities (Data Source and Target Support)
The ability to interact with a range of different data structure types,
including:
- Relational databases.
- Legacy and nonrelational databases.
- Various file formats.
- XML.
- Packaged applications such as CRM and supply chain management.
- Industry-standard message formats such as electronic data interchange (EDI),
SWIFT and Health Level Seven (HL7).
- Message queues, including those provided by application integration middleware
products and standards-based products (such as Java Message Service [JMS]).
- Emergent data types, such as e-mail, Web sites, office productivity tools and
content repositories.
In addition, data integration tools must support different modes of interaction
with this range of data structure types, including:
- Bulk acquisition and delivery.
- Granular trickle-feed acquisition and delivery.
- Changed-data capture (ability to identify and extract modified data).
- Event-based acquisition (time-based or data-value-based).
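Changed-data capture in production tools is commonly implemented by reading the DBMS transaction log or by using triggers. As a simpler, swapped-in illustration of the same idea, the Python sketch below derives insert, update and delete records by comparing two keyed snapshots of a source table. The function name `capture_changes` and the row layout are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Change = Tuple[str, object, Row]  # (operation, primary key, row image)


def capture_changes(before: Dict[object, Row], after: Dict[object, Row]) -> List[Change]:
    """Compare two snapshots keyed by primary key and emit change records.

    Snapshot comparison is only an illustrative stand-in for log- or trigger-based CDC.
    """
    changes: List[Change] = []
    for key, row in after.items():
        if key not in before:
            changes.append(("insert", key, row))  # key is new since the last snapshot
        elif before[key] != row:
            changes.append(("update", key, row))  # same key, different column values
    for key, row in before.items():
        if key not in after:
            changes.append(("delete", key, row))  # key vanished; emit the last known image
    return changes
```

Only these change records, rather than the full table, then need to be delivered downstream, which is what makes granular trickle-feed delivery practical.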
Data Delivery Capabilities
The ability to provide data to consuming applications, processes and databases
in a variety of modes, including:
- Physical bulk data movement between data repositories.
- Federated views formulated in memory.
- Message-oriented movement via encapsulation.
- Replication of data between homogeneous or heterogeneous database management
systems (DBMSs) and schemas.
In addition, support for delivery of data across the range of latency
requirements is important:
- Scheduled batch delivery.
- Streaming/real-time delivery.
- Event-driven delivery.
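A federated view differs from physical bulk movement in that the sources are queried when the view is requested and the combined result exists only in memory. The Python sketch below shows that idea with an in-memory hash join over two sources. Each source is modeled as a hypothetical callable that returns rows, standing in for a live query against a separate system.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Source = Callable[[], Iterable[Row]]  # hypothetical: a callable that queries one live source


def federated_view(left: Source, right: Source, key: str) -> List[Row]:
    """Inner-join two sources on `key` at request time, holding the result only in memory."""
    # Build an in-memory hash index over the right-hand source.
    index: Dict[object, List[Row]] = {}
    for row in right():
        index.setdefault(row[key], []).append(row)
    # Probe the index with each left-hand row; nothing is written to a target store.
    joined: List[Row] = []
    for lrow in left():
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined
```

Because the sources are re-queried on every call, the view reflects current source data, at the cost of repeating the join work each time it is requested.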
Data Transformation Capabilities
Built-in capabilities for achieving data transformation operations of varying
complexity, including:
- Basic transformations, such as data type conversions, string manipulations and
simple calculations.
- Intermediate complexity transformations, such as lookup and replace operations,
aggregations, summarizations, deterministic matching and management of slowly
changing dimensions.
- Complex transformations, such as sophisticated parsing operations on free-form
text and rich media.
In addition, the tools must provide facilities for development of custom
transformations and extension of packaged transformations.
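Management of slowly changing dimensions, listed among the intermediate transformations, is often done with a "Type 2" approach: when a tracked attribute changes, the current dimension row is closed and a new version is opened, so history is preserved. The Python sketch below illustrates that technique; the `DimRow` layout and the `apply_scd2` name are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension member (hypothetical layout)."""
    key: str                         # natural/business key, e.g. a customer number
    attrs: Tuple[str, ...]           # tracked descriptive attributes
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(dim: List[DimRow], key: str, attrs: Tuple[str, ...], as_of: date) -> List[DimRow]:
    """Apply an incoming record to the dimension using Type 2 (versioned history) semantics."""
    out: List[DimRow] = []
    current: Optional[DimRow] = None
    for row in dim:
        if row.key == key and row.valid_to is None:
            current = row    # hold back the open version for comparison
        else:
            out.append(row)  # history and other members pass through unchanged
    if current is None:
        out.append(DimRow(key, attrs, as_of))         # first sighting: open a version
    elif current.attrs == attrs:
        out.append(current)                           # no change: keep the open version
    else:
        out.append(replace(current, valid_to=as_of))  # close the old version...
        out.append(DimRow(key, attrs, as_of))         # ...and open a new one
    return out
```

Reapplying an unchanged record leaves the dimension as it was, so the step can safely be rerun in each load cycle.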
Metadata and Data Modeling Capabilities
Metadata management and data modeling are increasingly the heart of data
integration capabilities. Requirements in this area include:
- Automated discovery and acquisition of metadata from data sources, applications
and other tools.
- Data model creation and maintenance.
- Physical to logical model mapping and rationalization.
- Defining model-to-model relationships via graphical attribute-level mapping.
- Lineage and impact analysis reporting, via graphical and tabular format.
- An open metadata repository, with the ability to share metadata bidirectionally
with other tools.
- Automated synchronization of metadata across multiple instances of the tools.
- Ability to extend the metadata repository with customer-defined metadata
attributes and relationships.
- Documentation of project/program delivery definitions and design principles in
support of requirements definition activities.
- Business analyst/end-user interface to view and work with metadata.
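Lineage and impact analysis can both be viewed as traversals of attribute-level mapping metadata: impact analysis follows mappings downstream from a changed attribute, and lineage follows them upstream. The Python sketch below assumes that metadata is available as a simple source-to-targets dictionary; the attribute names and function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

Edges = Dict[str, List[str]]  # hypothetical attribute-level mapping metadata: source -> targets


def reachable(edges: Edges, start: str) -> Set[str]:
    """Breadth-first traversal returning every node reachable from `start`."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def impact(edges: Edges, attribute: str) -> Set[str]:
    """Impact analysis: everything downstream that a change to `attribute` could affect."""
    return reachable(edges, attribute)


def lineage(edges: Edges, attribute: str) -> Set[str]:
    """Lineage: every upstream attribute that feeds `attribute` (traverse reversed edges)."""
    reverse: Edges = {}
    for source, targets in edges.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return reachable(reverse, attribute)
```

The `seen` set also keeps the traversal from looping if the mapping metadata happens to contain cycles.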
Design and Development Environment Capabilities
Facilities for enabling the specification and construction of data integration
processes, including:
- Graphical representation of repository objects, data models and data flows.
- Workflow management for the development process, addressing requirements such as
approvals and promotions.
- Granular role-based and developer-based security.
- Team-based development capabilities, such as version control and collaboration.
- Functionality to support reuse across developers and projects, and facilitate
identification of redundancies.
- Support for testing and debugging.
Data Governance Capabilities (Data Quality, Profiling and Mining)
Mechanisms for aiding the understanding and assurance of quality of data over
time, including interoperability with:
- Data profiling tools.
- Data mining tools.
- Data quality tools.
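The criterion here is interoperability with separate profiling and quality tools, not building them into the integration tool. Purely to illustrate the kind of column statistics a profiling tool reports, the Python sketch below computes counts, missing values, distinct values and the most common character pattern. The function names and the "9 for digits, A for letters" pattern convention are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def shape(value: str) -> str:
    """Reduce a value to its character pattern: digits become 9, letters become A."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


def profile_column(values: Iterable[Optional[str]]) -> Dict[str, object]:
    """Compute simple column statistics of the kind profiling tools report."""
    vals = list(values)
    present = [v for v in vals if v not in (None, "")]  # treat None and empty strings as missing
    patterns = Counter(shape(v) for v in present)
    return {
        "count": len(vals),
        "nulls": len(vals) - len(present),
        "distinct": len(set(present)),
        "top_pattern": patterns.most_common(1)[0][0] if patterns else None,
    }
```

A value such as "1000A" in a ZIP code column would show up as a minority pattern ("9999A"), which is the kind of anomaly a data quality rule might then target.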
Runtime Platform Capabilities
Breadth of support for hardware and operating systems on which data integration
processes may be deployed, specifically:
- Mainframe environments, such as IBM z/OS and z/Linux.
- Midrange environments, such as IBM System i (formerly AS/400) or HP Tandem.
- Unix-based environments.
- Wintel environments.
- Linux environments.
Operations and Administration Capabilities
Facilities for enabling adequate ongoing support, management, monitoring and
control of data integration processes implemented via the tools, such as:
- Error-handling functionality, both predefined and customizable.
- Monitoring and control of runtime processes.
- Collection of runtime statistics to determine use and efficiency, as well as an
application-style interface for visualization and evaluation.
- Security controls, for both data "in flight" and administrator processes.
- Runtime architecture that ensures performance and scalability.
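Two of the operational capabilities above, predefined error handling and collection of runtime statistics, can be sketched together. In the hypothetical Python step below, rows that fail a transformation are routed to a reject list with the reason, instead of aborting the run, while counts and elapsed time are accumulated for monitoring. The names `RunStats` and `run_step` are illustrative assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


@dataclass
class RunStats:
    """Runtime statistics collected for one execution of a step."""
    read: int = 0
    written: int = 0
    rejected: int = 0
    elapsed_s: float = 0.0


def run_step(rows: Iterable[Row], transform: Callable[[Row], Row],
             stats: RunStats) -> Tuple[List[Row], List[Tuple[Row, str]]]:
    """Apply `transform` row by row, routing failures to a reject list instead of aborting."""
    good: List[Row] = []
    rejects: List[Tuple[Row, str]] = []
    start = time.perf_counter()
    for row in rows:
        stats.read += 1
        try:
            good.append(transform(row))
            stats.written += 1
        except (KeyError, ValueError, TypeError) as exc:
            # Keep the bad row and the reason so it can be inspected or reprocessed later.
            rejects.append((row, f"{type(exc).__name__}: {exc}"))
            stats.rejected += 1
    stats.elapsed_s = time.perf_counter() - start
    return good, rejects
```

Which exception types count as "rejectable" rather than fatal is a design choice; customizable error handling amounts to letting administrators change that policy.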
Architecture and Integration
The degree of commonality, consistency and interoperability between the various
components of the data integration toolset, including:
- Minimal number of products (ideally one) supporting all data delivery modes.
- Common metadata (single repository) and/or the ability to share metadata across
all components and data delivery modes.
- Common design environment for supporting all data delivery modes.
- Ability to switch seamlessly and transparently between delivery modes with
minimal rework.
- Interoperability with other integration tools and applications, via certified
interfaces and robust application programming interfaces (APIs).
- Efficient support for all data delivery modes regardless of runtime architecture
type (centralized server engine vs. distributed runtime).
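The goal of switching between delivery modes with minimal rework rests on keeping the transformation logic in shared metadata rather than in mode-specific code. The Python sketch below assumes a hypothetical declarative mapping (target attribute to rule) and executes that same definition in a bulk/batch mode and in a message-at-a-time mode.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

# Hypothetical declarative mapping metadata: target attribute -> rule over a source row.
MAPPING: Dict[str, Callable[[Row], object]] = {
    "customer_id": lambda r: int(r["cust_no"]),
    "full_name": lambda r: f"{r['first']} {r['last']}".strip(),
}


def apply_mapping(row: Row) -> Row:
    """Execute the shared mapping definition against one source row."""
    return {target: rule(row) for target, rule in MAPPING.items()}


def batch_delivery(rows: Iterable[Row]) -> List[Row]:
    """Bulk/batch mode: transform a whole extract in one pass."""
    return [apply_mapping(r) for r in rows]


def message_delivery(messages: Iterable[Row]) -> Iterator[Row]:
    """Message-oriented mode: transform each message as it arrives, reusing the same mapping."""
    for msg in messages:
        yield apply_mapping(msg)
```

Changing a rule in `MAPPING` changes both modes at once, which is the practical benefit of a common metadata and design layer.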
Service-Enablement Capabilities
As acceptance of data services concepts continues to grow, data integration
tools must exhibit service-oriented characteristics and provide support for SOA
deployments, such as:
- Ability to deploy all aspects of runtime functionality as data services.
- Management of publication and testing of data services.
- Interaction with service repositories and registries.
- Service enablement of the development and administration environments, such that
external tools and applications can dynamically modify and control runtime
behavior of the tools.
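As a minimal illustration of deploying integration functionality as a data service, the Python sketch below exposes a read-only customer lookup as a standard WSGI application (PEP 3333). The backing data, the `/customers/<id>` path and the function names are hypothetical. Publishing, registry interaction and service management, also listed above, are outside its scope.

```python
import json
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical backing data standing in for an integrated customer store.
CUSTOMERS: Dict[str, Dict[str, str]] = {"1": {"id": "1", "name": "Acme"}}


def _respond(start_response: Callable, status: str, payload: object,
             extra_headers: Sequence[Tuple[str, str]] = ()) -> List[bytes]:
    """Serialize `payload` as JSON and start the WSGI response."""
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(body)))] + list(extra_headers)
    start_response(status, headers)
    return [body]


def customer_service(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI application exposing GET /customers/<id> as a read-only data service."""
    if environ.get("REQUEST_METHOD") != "GET":
        return _respond(start_response, "405 Method Not Allowed",
                        {"error": "read-only service"}, [("Allow", "GET")])
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) == 2 and parts[0] == "customers" and parts[1] in CUSTOMERS:
        return _respond(start_response, "200 OK", CUSTOMERS[parts[1]])
    return _respond(start_response, "404 Not Found", {"error": "not found"})
```

For a local trial, the standard library's `wsgiref.simple_server.make_server(host, port, customer_service)` can serve the application; any other WSGI-compliant server would work equally well.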
Inclusion and Exclusion Criteria
For vendors to be included in this Magic Quadrant, they had to meet the
following requirements:
- Possess within their technology portfolio the subset of capabilities identified
by Gartner as most critical from within the overall range of capabilities
expected in data integration tools. Specifically, vendors must deliver the
following functional requirements:
  - Range of connectivity/adapter support (sources and targets): native access to
    relational DBMS products, plus access to nonrelational legacy data structures,
    flat files, XML, and message queues.
  - Mode of connectivity/adapter support (against a range of sources and targets):
    bulk/batch and change data capture.
  - Data delivery modes support: bulk/batch (ETL-style) delivery, plus at least one
    additional mode (federated views, message-oriented delivery or data
    replication).
  - Data transformation support: at a minimum, packaged capabilities for basic
    transformations (such as data type conversions, string manipulations and
    calculations).
  - Metadata and data modeling support: automated metadata discovery, lineage and
    impact analysis reporting, and an open metadata repository including mechanisms
    for bidirectional sharing of metadata with other tools.
  - Design and development support: graphical design/development environment and
    team development capabilities (such as version control and collaboration).
  - Data governance support: ability to interoperate at a metadata level with data
    profiling and/or data quality tools.
  - Runtime platform support: Windows, Unix or Linux operating systems.
  - Service enablement support: ability to deploy functionality as services
    conforming to SOA principles.
For this iteration of the Magic Quadrant, we added support for interaction with
message queues, changed-data capture capabilities and the ability to deploy
functionality as data services to these requirements, reflecting their
importance to a comprehensive information-centric infrastructure and the
increasing market demand for this functionality. In addition, vendors had to:
- Generate at least $20 million of annual software revenue from data integration
tools or maintain at least 300 production customers.
- Support data integration tools customers in at least two of the major geographic
regions (North America, Latin America, Europe and Asia/Pacific).
- Have customer implementations that reflect the use of the tools at an enterprise
(cross-departmental and multiproject) level.
We excluded vendors focusing only on one specific data subject area (for
example, only customer data integration), a single industry, or their own data
models and architectures.
Many other vendors of data integration tools exist beyond those included in this
Magic Quadrant. However, most do not meet the above criteria and, therefore, we
have not included them in this analysis. Market trends in the past three years
indicate that organizations want to use data integration tools that provide
flexible data access, delivery and operational management capabilities within a
single vendor solution. Excluded vendors frequently provide products to address
one very specific style of data delivery (for example, only data federation) but
cannot support other styles. Others provide a range of functionality, but
operate only in a single region or support only narrow, departmental
implementations. Some vendors meet all the functional, deployment and geographic
requirements but are very early in their maturity and have limited revenue and
few production customers. The following vendors, some of which are newer market
entrants with relevant capabilities, are sometimes considered by Gartner clients
alongside those appearing in the Magic Quadrant when deployment needs align with
their specific capabilities:
Ab Initio, Lexington, Massachusetts, www.abinitio.com — Application development toolbox
(Co>Operating System) and component library for metadata management and data
integration.
Alebra Technologies, Minneapolis, Minnesota, www.alebra.com — Parallel Data Mover
for cross-platform file and database copying and sharing.
Apatar, Chicopee, Massachusetts, www.apatar.com — Open-source data integration tools
focused on ETL and data federation scenarios.
Attunity, Burlington, Massachusetts, www.attunity.com — A range of data-integration-oriented
products, including adapters (Attunity Connect), change data capture (Attunity
Stream) and data federation (Attunity Federate) for various platforms and
database/file types.
CA, Islandia, New York, www.ca.com — Advantage Data Transformer provides ETL-oriented
data integration. InfoRefiner provides replication and propagation capabilities
for mainframe data repositories.
CDB Software, Houston, Texas, www.cdbsoftware.com — CDB/Delta provides change data
capture and replication capabilities for IBM DB2 on the z/OS platform.
Composite Software, San Mateo, California, www.compositesw.com — Composite
Information Server provides data federation/EII capabilities and supports
delivery of data services.
Datawatch, Chelmsford, Massachusetts, www.datawatch.com — The Monarch Data Pump product
provides ETL functionality with a bias toward extracting data from report text,
PDF files, spreadsheets and other less-structured data sources.
Denodo Technologies, Palo Alto, California, and Madrid, Spain, www.denodo.com — The
Denodo Platform provides data federation and mashup capabilities for joining
structured data sources with data from Web sites, documents and other
less-structured repositories.
Embarcadero Technologies, San Francisco, California, www.embarcadero.com — The
DT/Studio ETL tool provides support for a range of relational and other data
sources, and integrates with the vendor's data modeling and database design
tools.
ETL Solutions, Blaenau Ffestiniog, U.K., www.etlsolutions.com — Transformation
Manager provides a metadata-driven toolset for the authoring, testing, debugging
and deployment of various data integration requirements.
Exeros, Santa Clara, California, www.exeros.com — The Discovery product automates the
process of discerning the business rules that enable mapping and transformation
of data between dissimilar data structures.
expressor software, Burlington, Massachusetts, www.expressor-software.com — The
expressor product is based on a semantic approach to designing and managing data
integration processes.
GoldenGate Software, San Francisco, California, www.goldengate.com — Real-time,
heterogeneous data replication capabilities provided by the Transactional Data
Management (TDM) software platform.
Ikan Software, Mechelen, Belgium, www.etl4all.com — Java-based ETL technology named
ETL4all, supporting transformation servers on Windows, Linux, Unix and IBM
iSeries.
Innovative Routines International (CoSort), Melbourne, Florida, www.cosort.com —
The Fast Extract and SortCL tools provide for rapid unloading and transformation
of data in Oracle databases in support of ETL processes.
Jitterbit, Oakland, California, www.jitterbit.com — Freely downloadable software with a
focus on both application integration (event- and message-based) and data
integration.
Kalido, Burlington, Massachusetts, and London, U.K., www.kalido.com — The Kalido
Active Information Management software enables dynamic data modeling and change
management for data warehouses and master data environments.
Metatomix, Dedham, Massachusetts, www.metatomix.com — Follows a semantics-based approach
to creation of data services and federated views of data across multiple data
sources.
Pentaho, Orlando, Florida, www.pentaho.org — A provider of open-source BI
solutions, Pentaho has added data integration tools to its portfolio by
leveraging the Kettle open-source project and providing services and support.
Progress Software, Bedford, Massachusetts, www.progress.com — The DataXtend and
DataDirect product lines provide tools for data access, replication and
synchronization.
Quest Software, Aliso Viejo, California, www.quest.com — SharePlex provides real-time replication
support for Oracle DBMS environments and is targeted primarily at
high-availability applications.
Red Hat/MetaMatrix, Raleigh, North Carolina, www.redhat.com — The MetaMatrix
Server, Enterprise and Query products support creation of data models and
model-driven federated views of data.
Relational Solutions, Westlake, Ohio, www.relationalsolutions.com — The BlueSky
Integration Studio provides ETL capabilities in a simplified, low-cost toolset
that runs in the Windows environment.
SchemaLogic, Kirkland, Washington, www.schemalogic.com — Creation and maintenance of data
models (Workshop), business models (SchemaServer), and the ability to propagate
models and data across applications (Integration Service).
Seagull Software, Atlanta, Georgia, www.seagullsoftware.com — SmartDB for data
migrations to the Oracle E-Business Suite.
SOALogix, Reston, Virginia, www.soalogix.com — The Confero SOA product offers a
platform for the creation and delivery of data services for SOA.
Software AG, Darmstadt, Germany, www.softwareag.com — The Enterprise Information
Integrator product provides data federation capabilities and is geared toward
SOA deployments. The vendor's acquisition of webMethods in 2007 added
process-oriented integration capabilities.
Software Labs, Roseville, California, www.softlabsco.com — The xFusion Studio
product provides ETL functionality positioned toward a range of use cases
including BI and migrations.
Sypherlink, Dublin, Ohio, www.sypherlink.com — Metadata discovery and mapping via
Harvester, and access to data sources for creation of integrated views via
Exploratory Warehouse.
Talend, Los Altos, California, and Suresnes, France, www.talend.com — Open
Studio is an open-source tool that primarily supports ETL-oriented
implementations and is provided for on-premises deployment as well as in a
software-as-a-service (SaaS) delivery model.
TigerLogic (formerly Raining Data), Irvine, California, www.tigerlogic.com —
TigerLogic XDMS provides XML-based data federation and persistence, as well as
delivery of data services.
Vamosa, Glasgow, U.K., and Boston, Massachusetts, www.vamosa.com — Provides
content integration and migration, aimed at synchronization and consolidation of
document repositories, via its Content X-Change and Content Migrator products.
Vision Solutions, Irvine, California, www.visionsolutions.com — Real-time
database replication functionality is provided in the Vision Replicate1 product.
WhereScape, Portland, Oregon, www.wherescape.com — WhereScape Red enables rapid
creation and maintenance of data warehouses, including ETL functionality.
XAware, Colorado Springs, Colorado, www.xaware.com — Provides support for the access,
integration and service enablement of data sources via its XA-Suite product.