Tuesday, June 17, 2008

Data Synchronization Techniques - Choosing the right one


1 Abstract
With mobile devices emerging as world’s dominant computing platform Microsoft has simplified application development. There are many synchronization techniques available. Synchronization is intended to use for offline applications, where data from central database is cached locally. The application works on cached data and the changes are updated in batch, conflicts should be detected and resolved if any between the actual database and cached data changes.

This paper discusses some synchronization technique and the process of choosing the right strategy of synchronizing in occasionally connected architectures.

2 Introduction
Thanks to Microsoft Compact Framework and Microsoft Windows Mobile developers’ toolkit as developers are not required to relearn a different set of skills or language to provide solution for emerging mobile era. The above point holds good only for the application development. When it comes to the important area where user might want to use the application when he is occasionally connected, SYNCHRONIZATION comes into picture. There are many SYNCHRONIZATION STRATEGIES available like Merge Replication, Remote Data Access and Web services. Each technique has its own set of advantages and limitations. We’ll discuss each Merge Replication, Remote Data Access and Web services and factors to be considered while choosing the right synchronization technique for any mobile application.

3 Target Audience
This paper is for those who are aware of the disconnected architecture and Microsoft technology available to implement the solutions for disconnected architecture.

4 Synchronization techniques in brief

4.1 Merge Replication
Merge Replication allows mobile devices and server to work on the same set of data independently, autonomously and data on devices can be synchronized by sending changed data to server and receive the latest data from the server. Merge Replication can be implemented using the SQL Server 2000/2005 (Server), Internet Information Services (Communication Facilitator) and SQL Server Mobile. Merge Replication requires more configuration and maintenance on the SQL Server and the Internet Information Services but it has advantages like synchronization of multiple tables at a time, Identity Range Management and built-in and custom conflict resolution capabilities.

Relationship between SQL Server data and SQL Server Mobile is specified declaratively through the definition publication on SQL Server. For a publication on SQL Server there can be many subscribers SQL Server Mobile.

4.2 Remote Data Access
Remote Data Access allows a SQL Server Mobile to pull the data from the SQL Server 2000/2005 and push the data from SQL Server Mobile to the SQL Server 2000/2005. Remote Data Access is more of a client centric technique where all the processing is done by the SQL Server Mobile by writing the pull SQL statements and deciding on what to push. Remote Data Access can be implemented using the SQL Server 2000/2005(Server), Internet Information Services (Communication Facilitator) and SQL Server Mobile. The SQL Server Mobile can optionally track all the changes that are made to local table.

4.3 Web Services
Web services can be used to pull data from the SQL Server 2000/2005 to the SQL Server Mobile. Application can work on the pulled data using the web service. At later point of time, either manually or by a scheduled process the changed data can be pushed back to the server using the web service. This approach requires minimal configuration but a lot of coding effort is required to achieve the synchronization using this technique.

5 Implementation Details

5.1 Replication
Replication methodology also comes with the .Net framework.
Replication makes use of the SQL server mobile (CE) agent Sqlcesa30.dll and the SQL server mobile (CE) replication provider Sqlcerp30.dll to do the replication.

To achieve synchronization through Replication we have to do the following configurations
  • Configuring SQL Server 2000

  • Configuring IIS

  • Configuring SQL server Mobile (CE)


5.1.1 Configuring SQL Server 2000
Create a snapshot folder in your machine
Share the snapshot folder created by giving full rights to the anonymous access user (IUSR_machinename)
Publish the required database with the help of the publication wizard from the SQL server 2000(Type of publication should be ‘Merge Replication’)
Add the anonymous user (IUSR_machinename) to the publication Access list of the publication

5.1.2 Configuring IIS
Create a virtual directory in the IIS
Copy the server agent Sqlcesa30.dll to the created virtual directory and register the same
Copy the replication provider Sqlcerp30.dll to the created virtual directory and register the same
Check on the anonymous access and make sure it is been mapped to the (IUSR_machinename) user is been mapped to the anonymous access

5.1.3 Configuring SQL Server Mobile
Subscribe a database to the database that is been published in the SQL server with the help of the Subscription wizard

After doing the above mentioned configurations, it is enough if we just create a Replication object assigning valid values to its properties and calling ‘Replication. Synchronize’ method.

Using Replication we can do two ways synchronization with the help of a single method ‘synchronize’ by setting a property of the replication object.

5.2 RDA
Remote data access methodology comes with the .Net framework.
RDA makes use of the SQL server mobile (CE) agent Sqlcesa30.dll to do the replication.

To achieve synchronization through RDA we have to do the following configurations

Create a virtual directory in the IIS
Copy the server agent Sqlcesa30.dll to the created virtual directory and register the same
Check on the anonymous access and make sure it is been mapped to the (IUSR_machinename) user is been mapped to the anonymous access

After doing the above mentioned configurations, it is enough if we just create a RDA object assigning valid values to its properties and calling either the ‘RDA.Push’ method or ‘RDA.Pull’ method.

‘RDA.Pull’ method gets the data from the central database (SQL server 2000) and populates the local mobile database.

‘RDA.Push’ method gets the data from the local mobile database and synchronizes the same with the central database.

Using RDA we can fetch only the data what we want by making use of the second parameter of the Pull method. We can pass a SQL query as the second parameter to the Pull method and we can get the result set in a table (local mobile database).

Also RDA tracks the records that have really been changed and synchronizes those records alone back to the central database. This RDA action can be triggered programmatically by passing the fourth parameter to the Pull method which is optional.

5.3 Web Services
In this approach we have written a web service to interact with the centralized database. This web service will be responsible for fetching the data from the centralized database and also for updating the centralized database.

This Web service will be deployed in the IIS server.

Using web services we will be able to fetch only the data which we require and two ways synchronization is attained.

This approach is totally independent of the server agent and the replication provider dlls. In this approach we don’t have to do any configurations as we do in the above two approaches.

6 Technical Comparison b/w Synchronization Techniques

6.1 Schema Changes
While using RDA, schema changes on the server are not supported. Depending on the schema change, a push from the client to the server may fail. If the server schema is changed, the client must drop the table and pull all of the data from the server again. Changing the server schema can cause you to change and recompile your application. Minor schema changes on the client are allowed.

When using Merge Replication, schema changes can be replicated to the Subscriber after the subscription is initially created. When a pending schema change needs to be replicated to a Subscriber, the schema change is first replicated, and then the resulting delta changes are exchanged between the Publisher and the Subscriber.

6.2 Identity Columns
When using incrementing number to manage the records inserted into the tables Merge Replication manages the identity both on the publisher and subscriber.

RDA requires the developers to manage the identity on their own which is an overhead.

Web Services also expects the developers to manage the identity columns.

6.3 Conflict Resolution
When working in the disconnected architecture the most important aspect is the way the conflicts are handled while synchronizing. As many users might work on the same data and modify the same set of records, the changes that should reflect into central system may depend on some rules.

Merge Replication fully supports the resolution and management of conflicts on the server, including built-in and custom conflict resolvers. In most of the scenarios the built-in resolvers with suffice but in systems driven by a lot of business rules the custom conflict resolvers can be used.

RDA and Web Services do not have any built-in facilities to facilitate the conflict resolution. The conflicts between two subscribers cannot be resolved using this architecture. Only conflicts between the central server SQL Server and the SQL Server Mobile can be resolved using the logic written by developers.

6.4 Track Data Changes
Merge Replication optionally allows both row level and column level tracking for the data that is changed. So when the data is synchronized, the amount of data that travels over the network is minimal.

RDA allows only row level tracking so the changes can be tracked at row level.

Web Services do not support any tracking for data changes.

6.5 Server Centric
With the technical comparison given so far, the Merge Replication would win the race but when it comes to configuration Merge Replication is not all that easy.

Merge Replication uses the replication features of the SQL Server. Creating a publication and the conflict resolvers create a lot of system tables on the server. Merge Replication also creates an additional column in each table that is published. A lot of server side configuration is required to achieve the Merge Replication. It also requires maintenance of server.

RDA and Web Services do not require any changes to the database server in order the achieve synchronization. RDA and Web Services are easy to maintain.

While Merge Replication is server invasive, RDA and Web Services are not server invasive.

7 Choosing the right approach
With the three approaches introduction, configuration details and the technical comparison discussed we’ll discuss further on the factors to be considered while choosing the right technique for the project under consideration.

According to the nature of the project the each factor can be used as evaluation criteria to settle for an approach best suited for the project.

7.1 Effort
Effort required in achieving the synchronization in the developer’s point of view. Using Merge Replication the effort required is minimal, then comes RDA and Web Service requires a lot of developer’s effort. If the project requires little effort on coding the Merge Replication is a clear winner.

7.2 Configuration
Merge Replication requires a lot of configurations and monitoring when in production, while RDA and Web Service require one time basic configuration.

7.3 Tracking data changes
Merge Replication provides row level and column level tracking of data changes. RDA supports row level tracking. Web Service does not support any tracking.

7.4 Transaction
Transaction handling is a built-in support in Merge Replication, while RDA and Web Services do not have built-in transaction support. RDA and Web Service require the developer to write his/her own mechanism for transaction handling.

7.5 Multiple table synchronization
Merge Replication supports multiple table synchronization. RDA and Web Services do not support multiple table synchronization in parallel.

7.6 Traffic
As Merge Replication allows row and column level tracking, the traffic is minimal when using the Merge Replication. RDA allows row level tracking and traffic is more compare to Merge Replication but less compared to web services where there is neither row level nor column level tracking available.

7.7 Conflict Resolution
Merge Replication allows conflict resolution between multiple subscribers with the built-in conflict resolvers and custom conflict resolvers. RDA and Web Services requires developer to write his mechanism for conflict resolution.

8 Decision-matrix or PUGH matrix to settle on an architecture
To rank the architecture for a project in consideration we can make use of the PUGH matrix. A basic decision matrix consists of establishing a set of weighted criteria(Factors discussed in previous sections) upon which the potential options(Merge Replication, RDA and Web Service) can be decomposed, scored, and summed to gain a total score which can then be ranked.

Taking an example, the PUGH matrix looks like the following



And the decision graph for PUGH matrix above looks like the following.



Clearly Merge Replication is the winner in the example considered above. But the Weightage given to each factor was uniform for all. In the real time when a project is in consideration each factor should be given weightage carefully to arrive at a correct approach for synchronization.

9 Conclusion
In this paper we have learnt various techniques available for data synchronization, their technical comparison and factors to be considered while taking up a disconnected architecture implementation. Using the approach described in this paper, you can successfully choose the appropriate synchronization technique for synchronizing data between a SQL Server Mobile and SQL Server. For example, the application in question can take the benefit of simplicity of RDA and Web Services or can make use of the advanced features of Merge Replication. Some applications may favour Merge Replication, some RDA or some Web Services according to the nature of the application, effort, configurations required to have the application up and running on the field.

2 comments: