Databricks

Databricks enables companies to accelerate data-driven innovation with a unified approach to data analytics and AI. Using Data Migrator to automate the migration of Hadoop data and Hive metadata directly to Databricks lets organizations focus resources on developing new AI initiatives rather than on migration complexities. Through Data Migrator’s native integration with Databricks Unity Catalog, enterprises gain centralized data governance and access control, enabling faster data operations and a shorter time to business value.

Automate data and metadata migration to Databricks

Cirata Data Migrator is a safe and reliable cloud migration solution that automates the migration of Hadoop data and Hive metadata to the cloud. Data Migrator provides three key Databricks-specific functionalities:

Cirata Data Migrator supports Databricks Unity Catalog, enhancing data governance, access control, and collaboration while automating the large-scale transfer of data and metadata from existing data lakes to Databricks, even while the source data continues to change. This enables seamless adoption of the Databricks Data Intelligence Platform with minimal disruption and risk, and faster implementation.
Cirata Data Migrator streamlines the transfer of Apache Hive metadata directly to Databricks, giving your data initiatives the information they depend on. It keeps Databricks up to date with live data transfers from your existing data lake infrastructure, eliminating the need for complex and fragile data pipelines.
Cirata Data Migrator simplifies converting data from Hadoop and Spark formats to Databricks Delta Lake. This automation lets you take advantage of Databricks' unique features without manual transformations, and because the conversion runs on your scalable Databricks runtimes, it handles both large and small datasets flexibly.


Learn more in a partnership webinar from Databricks and Cirata on accelerating Hadoop migration to Databricks.

“As a long-standing partner, Cirata has helped many customers in their legacy Hadoop to Databricks migrations. Now, the seamless integration of Cirata Data Migrator with Unity Catalog enables enterprises to capitalize on our Data and AI capabilities to drive productivity and accelerate their business value.” — Siva Abbaraju, Go-to-Market Leader, Migrations, Databricks.

Case studies from global leaders on how Cirata solved their challenges.

Leading global airline

Challenge

Expedite migration of their on-premises Hadoop data in Cloudera to Azure Data Lake Storage (ADLS) Gen2 and Delta Lake on Databricks while the on-premises system remained active.

Results

Automated migration with no disruption to existing production environment.
Faster time-to-market for revenue-generating apps.
Reduced ongoing support costs and avoided further costs by decommissioning the on-premises platform sooner.

Leading global telecom

Challenge

Migrate tens of petabytes of data from their on-premises Hadoop environment to Microsoft Azure and Databricks without disrupting the business or causing downtime in their production environment.

Results

Migrated 13 PB of data from an on-premises Hortonworks cluster to ADLS Gen2.
Ability to block 1 billion robocalls per month, over 7.2 times more per year than before.
42% reduction in the original data integration timeline.

Leading global automotive manufacturer

Challenge

Multiple attempts to move from batch uploads to near-real-time replication, including with Microsoft data mover, failed to deliver both accurate replication and ongoing synchronization between the data center and the cloud.

Results

Enabled the automotive manufacturer to gain near-real-time insights from its data.
Data scientists could begin developing AI & ML models immediately.
Initial replication performed without business impact.

Cirata Data Migrator for Hadoop automates the movement of data to the cloud

The following capabilities enable zero business disruption, reduced risk, and best time-to-value.

Quick deployment and operation

Data Migrator is installed on an edge node of your Hadoop cluster. Deployment can be performed in minutes without impacting current operations, so users can begin moving data immediately.

Synchronization & replication

Existing datasets can be moved with a single pass through the source storage system, eliminating the CPU cycles and overhead of multiple scans. Data Migrator also continuously migrates ongoing changes from source to target with zero disruption to current production systems.

Support for multiple sources and targets

Data Migrator supports HDFS distributions v2.6 and higher as source systems, and leading cloud service providers and select independent software vendors, such as Databricks and Snowflake, as target systems. See the Data Migrator documentation for further details.

Transfer Hadoop data and Hive metadata

Data Migrator supports migration of HDFS data and Hive metadata to any public cloud or on-premises environment.

Data transfer at any scale

Datasets of any size, from terabytes to multiple petabytes, can be moved without affecting production environments. Horizontal scaling lets users increase migration capacity by configuring additional transfer agents to make full use of available bandwidth.

Easy management

The Cirata browser-based user interface (UI) lets users manage the entire data and metadata migration from a single management console.

Programmatic interface

Migrations can also be managed through a comprehensive, intuitive command-line interface, or through the self-documenting REST (representational state transfer) API to integrate the solution with other programs as needed.

Flexible configurations and precise control

Organizations can configure migration jobs to meet their specific needs, such as defining sources, targets, and which data to migrate. There are also advanced capabilities, such as migration prioritization, path mapping, and network bandwidth-management controls.

Transfer verification

Data Migrator contains a data transfer verification function that scans both source and target environments to ensure data fidelity and validate the success of all data transfers. Results and reports are delivered through the UI or by email.
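As a rough illustration of what source-to-target verification involves, the following Python sketch compares two directory trees by relative path, file size, and SHA-256 digest, and reports files that are missing, extra, or mismatched on the target. It is not Data Migrator's implementation: the function names are hypothetical, and local directories stand in for HDFS and cloud storage.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: Path) -> dict[str, tuple[int, str]]:
    """Map each file's path (relative to root) to its (size, digest)."""
    return {
        p.relative_to(root).as_posix(): (p.stat().st_size, file_digest(p))
        for p in root.rglob("*")
        if p.is_file()
    }


def verify_transfer(source: Path, target: Path) -> dict[str, list[str]]:
    """Hypothetical check: compare two trees and list any discrepancies."""
    src, dst = snapshot(source), snapshot(target)
    return {
        "missing_on_target": sorted(src.keys() - dst.keys()),
        "extra_on_target": sorted(dst.keys() - src.keys()),
        # Same relative path on both sides, but size or content differs.
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


if __name__ == "__main__":
    import sys

    report = verify_transfer(Path(sys.argv[1]), Path(sys.argv[2]))
    for kind, paths in report.items():
        print(f"{kind}: {len(paths)}")
```

Comparing the digest as well as the size catches files whose length matches but whose content differs. At petabyte scale, a real tool would avoid rehashing everything on both sides, for example by relying on checksums the storage system already records.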

Powerful metrics and real-time monitoring

Users stay informed about migration jobs through health and status metrics, including estimated completion times, as well as email notifications and real-time usage insights that enable hands-off operation.

The sweet spot: Cloudera/Hadoop data movement and data lake migration

Navigating the complexities of migrating Cloudera/Hadoop data to Databricks can be daunting. Cirata streamlines this process, addressing multiple challenges with precision and expertise to ensure a seamless transition for your data workloads.

Data migrations can block or slow Databricks adoption with the following issues: