Conflict Resolution
From Kolab Wiki
Definitions
- conflict: A conflict is any situation in which there are two object instances of the same object.
- object: An object is any kolab object (usually stored in an xCal/xCard file). Objects are usually globally unique and identified by uid. Objects have multiple versions as they are modified, and different versions of the same object may exist in different places.
- object instance: An object instance is an actual representation of the object and therefore always has one specific version. An instance can be an in-memory representation or, e.g., an imap-object.
Conflict Scenarios
- Parallel off-line editing of the same object.
- User A edits Object A, and does not synchronize
- User B edits Object A, and does not synchronize
- User A synchronizes
- User B synchronizes (and a conflict occurs)
- Multiple versions of an object with the same <uid>
- Multiple, different objects with the same <uid>
- Note: This SHOULD NOT happen, but it CAN still happen.
- User A is new to a company that has been around for the past millennium.
- All users in said company have always been using Kolab.
- Almost all possible values for <uid> are used somewhere (Note that this is not a problem if a proper globally unique id is used, the possibility remains though).
- User A has a personal calendar folder with some events.
- User A now subscribes to the shared calendaring folders that contain humongous sets of data.
- There is a potential, though perhaps with a low likelihood, that multiple, different objects exist with the same <uid>.
- Object gets deleted while another user edits it
- User A starts to edit a shared object
- User B deletes the same object meanwhile
- Scheduling conflicts
- An event is scheduled at the same time another event is taking place.
- Client unable to mark as deleted and/or expunge
- There is a temporary conflict on every write as a write always consists of an APPEND first, so there is a conflict until the old object gets expunged.
- Client A moves object to a different folder (Client B had access, no longer has access)
Conflict Resolution Requirements
1. Detecting that there is a conflict between two objects
- A client must be able to detect that two object instances refer to the same object within a single conflict domain.
- Primarily a client must be able to identify that two objects should be the same, but aren't, e.g. by UID and comparison.
- A client must also be able to detect when the object has changed while it was editing it, and must never overwrite an object (this is not a problem on imap, as it is not possible to modify imap objects anyway).
2. Resolving the conflict
- A client must be able to resolve a conflict using a conflict resolution strategy in order to write data. At all times it must be ensured that the user is not forced to make decisions they cannot make (because they don't know how to resolve the conflict) and that no data is lost (e.g. by automatically discarding one version).
Conflict Domain
A conflict domain is the domain within which conflicts should be detected. That means that within one conflict domain multiple occurrences of a single object result in a conflict, while the same object may exist in different conflict domains simultaneously. The general rule for calendaring items is that a calendar instance represents a conflict domain, within which conflicts MUST be detected and resolved. On the other hand, conflicts MUST NOT be resolved across the borders of conflict domains, as that could lead to data-loss. On the server, an imap folder currently equals a calendar instance, and thus a conflict domain, meaning conflicts MUST only be detected within a single folder (and for a single type, since a single calendar has one folder each for events, todos and journals).
A client MUST always see the full conflict domain to be able to resolve a conflict. Otherwise it's possible that different clients resolve a conflict differently, resulting in the object not being visible for one of the clients anymore, or even in deleting the object completely.
Kolab IMAP Conflict Resolution
The following outlines an approach to conflict resolution under these basic assumptions:
- The Kolab Objects are stored in IMAP, although they may be cached temporarily (e.g. for offline access).
- It is impossible to avoid conflicts on the server (due to IMAP), therefore all clients MUST be able to deal with conflicts.
- The IMAP store MUST always remain in a valid state, that means there is always exactly one valid object instance which can be worked with.
- This is because a client may synchronize at any time, and must get data it can work with (no locks or the like)
- The Conflict Resolution is always only applied to a single Conflict Domain.
Summary
- Reading clients MUST be able to identify the latest valid object version in case of conflicts according to #Identifying the valid object instance
- Writing clients MUST ensure that an appended version resulted in a valid object to avoid data-loss.
Conflict Detection
- A conflict exists whenever two objects with the same UID exist within a #Conflict Domain.
- An IMAP-Folder SHALL equal a #Conflict Domain.
- This implies that no conflict can exist across folders, and objects MAY coexist in several folders in different versions at the same time.
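The detection rule above can be sketched as a grouping by folder and object UID. This is only an illustration: the `Instance` type, its field names and the folder names are assumptions made for the example and are not part of the Kolab format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    folder: str      # IMAP folder, i.e. the conflict domain (hypothetical field)
    imap_uid: int    # UID assigned by the IMAP server within the folder
    object_uid: str  # Kolab object <uid>


def find_conflicts(instances):
    """Return {(folder, object_uid): [instances]} for every UID seen more than once per folder."""
    groups = defaultdict(list)
    for inst in instances:
        groups[(inst.folder, inst.object_uid)].append(inst)
    return {key: group for key, group in groups.items() if len(group) > 1}


instances = [
    Instance("Calendar", 1, "event-1"),
    Instance("Calendar", 2, "event-1"),         # same folder, same UID: conflict
    Instance("Shared/Calendar", 7, "event-1"),  # other folder: not a conflict
]
conflicts = find_conflicts(instances)
```

The copy in the second folder is deliberately ignored, since conflicts are only defined within a single conflict domain.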
Conflict Resolution
- There is always exactly one valid instance of an object, which may be used for further edits.
- Several object instances may exist (which is a conflict), but there is always only one valid instance, and all others are considered invalid and may be deleted.
- Clients MUST NOT use or rely on invalid objects at any time.
- Clients MUST NOT base new edits on invalid objects.
- Clients MUST NOT show invalid objects to the user.
- Clients MUST NOT use invalid objects for merges.
Identifying the valid object instance
The following set of rules SHALL be applied in the same order to identify the valid object instance within a Conflict Domain:
- If there is only one object instance, it SHALL be valid.
- If multiple object instances exist, the one with the lowest IMAP UID SHALL be selected
- Because a client cannot guarantee that no race-condition occurs while the IMAP-APPEND of an updated version of the object is in progress, the only reliable way is to declare only the first object valid. (Obviously it is not possible to take the latest, as further versions may appear at any time, so the "latest" doesn't actually exist.)
- All object instances SHALL be checked for whether their #X-Kolab-Version-UID Header contains the Version-UID of the currently selected object instance. If there is one, it is the successor and SHALL be selected for further checks.
- If there are multiple objects containing the same #X-Kolab-Version-UID Header, the one with the lowest IMAP UID wins (shouldn't happen normally).
- This check SHALL be repeated until no object instance contains an #X-Kolab-Version-UID Header containing the Version-UID of the currently selected object instance.
- The last selected object instance SHALL be the currently valid one, and all other instances MAY be deleted.
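A minimal sketch of these rules follows. It assumes that the X-Kolab-Version-UID header has already been parsed into a list `history`, where `history[0]` is the instance's own Version-UID and the remaining entries are the Version-UIDs of its ancestors. The `Instance` type and the `visited` guard against cyclic histories are additions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    imap_uid: int
    # Assumed layout: history[0] is this instance's own Version-UID,
    # history[1:] are the Version-UIDs of its ancestors.
    history: list = field(default_factory=list)


def find_valid(instances):
    """Return the valid instance among all instances of one object in one folder."""
    if not instances:
        return None
    # Start with the lowest IMAP UID (also covers the single-instance case).
    current = min(instances, key=lambda i: i.imap_uid)
    visited = {current.imap_uid}
    while True:
        own = current.history[0] if current.history else None
        # Successors list the current Version-UID among their ancestors.
        successors = [
            i for i in instances
            if own is not None
            and i.imap_uid not in visited
            and own in i.history[1:]
        ]
        if not successors:
            return current
        # Several successors of the same version: lowest IMAP UID wins.
        current = min(successors, key=lambda i: i.imap_uid)
        visited.add(current.imap_uid)


a = Instance(10, ["v1"])
b = Instance(11, ["v2", "v1"])
c = Instance(12, ["v3", "v1"])        # conflicting edit, also based on v1
d = Instance(13, ["v4", "v2", "v1"])  # successor of b
valid = find_valid([a, b, c, d])
```

In this example `d` ends up as the valid instance, while `c` lost the race against `b` and is invalid.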
Handling invalid object instances
- Invalid object instances MAY be deleted by any client, but only after the proper valid object instance has been identified.
Writing objects
It is the writing client's responsibility to ensure that written objects are valid. This requires writing a possible new version to the server, and checking its validity after a fresh sync. Failing to do so results in the loss of the conflicting data in case of a conflict.
- Clients SHALL check an appended version to be valid according to the rules outlined in #Identifying the valid object instance
- If the created object is not valid but resulted in a conflict, it is the writing client's responsibility to retry writing the object instance to the server (after ensuring the local object is a direct successor of the updated copy from the server)
- Clients MUST ensure the appended object instance is a direct successor of the latest valid object instance using #X-Kolab-Version-UID
- Otherwise clients with outdated information could overwrite more recent data.
- A write operation has only been completed safely after the client ensured the appended version became a valid object
Scenarios which make it difficult to detect if the APPEND succeeded
- The append was successful, but we can't check
- A: APPEND X (valid)
- B: APPEND Y (successor of X)
- B: DELETE X
- A: checking if append succeeded but X is already gone, and we don't know that Y is a successor of X
- The append was not successful, but we can't check
- A: APPEND X (invalid)
- B: APPEND Y (successor of X-1)
- B: DELETE X-1 (deleted because ancestor of Y)
- B: DELETE X (deleted because invalid/conflicting item)
- A: checking if append succeeded but X is already gone, and we don't know if it did succeed or not
The solution for this problem is:
- Record all #X-Kolab-Version-UIDs of all ancestors.
- If the currently valid object contains the #X-Kolab-Version-UID which we want to check we're good
- If not, the appended object resulted in a conflict and was deleted => retry
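The check can be sketched as a simple membership test, assuming the currently valid object carries its full history as a list of Version-UIDs (own Version-UID first, then the ancestors). The function name and the sample UIDs are made up for this illustration.

```python
def append_incorporated(my_version_uid, valid_history):
    """True if the version we appended is the valid instance or one of its ancestors."""
    return my_version_uid in valid_history


# First scenario: A appended X, B appended Y as a successor of X and deleted X.
succeeded = append_incorporated("X", ["Y", "X", "W"])

# Second scenario: A's X was in conflict, B based Y on the older version W.
needs_retry = not append_incorporated("X", ["Y", "W"])
```

In both scenarios X is no longer in the folder, but the history of Y is enough to tell them apart.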
Another option, not requiring the full history of X-Kolab-Version-UIDs:
- Always retry if the created object cannot be found.
- In case of a merge and a successful append another merge should be safe.
- For clients without non-interactive merges it would kinda suck though.
Resolving conflicts
Clients need to resolve conflicts of their local copy of an object to the latest valid version on the server, to ensure that the local copy is a direct successor of the latest valid object instance, and thus to be able to successfully write their version to the server. The used conflict resolution algorithm SHOULD always produce a usable result, which resolves the conflict, otherwise the object concerned may not be synchronized with the server. The conflict resolution algorithm consists of a set of suitable conflict resolution strategies, to turn the conflicting object into a successor of the latest valid object. Different clients MAY choose different strategies, which are suitable for the respective client (mobile devices have other needs than a desktop client).
Conflict Resolution strategies
A conflict resolution strategy provides a way to resolve a conflict. A strategy may fail, so multiple strategies may be tried until one succeeds in resolving the conflict. The last strategy should be one which never fails, e.g. resolving manually or keeping both versions.
- no conflict (fails as soon as a conflict has been detected)
- 3-way merge (fails when changes are conflicting, could produce a partially merged version)
- organizer wins (some users always win, e.g. the organizer of an event)
- automatic strategy (always local, always remote version, ....)
- user dialog (lets the user pick one)
- Avoid whenever possible, blocking user-dialogs are a PITA.
- keep both (keeps both versions but assigns a new UID to one of them, fails never).
A strategy MAY attach the original versions of the object, and mark the object if problems occurred during the merging using X-Kolab-Conflict, so those can be resolved at a later stage and no data is lost. This allows strategies to create a suggested version, which SHALL be used for further edits, while allowing clients which support that feature to provide advanced conflict resolution strategies, such as a manual merge.
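The idea of trying strategies in order until one succeeds can be sketched as follows. Objects are modelled as plain dictionaries, and the `StrategyFailed` exception, the function names and the "-copy" UID suffix are placeholders invented for this example.

```python
class StrategyFailed(Exception):
    """Raised by a strategy that cannot resolve the given conflict."""


def no_conflict(local, remote):
    # Fails as soon as the two versions differ.
    if local != remote:
        raise StrategyFailed("versions differ")
    return [remote]


def keep_both(local, remote):
    # Never fails: the local copy gets a new UID and both objects survive.
    # The "-copy" suffix stands in for a real UID generator.
    renamed = dict(local, uid=local["uid"] + "-copy")
    return [remote, renamed]


def resolve(local, remote, strategies):
    """Try each strategy in order and return the objects to keep."""
    for strategy in strategies:
        try:
            return strategy(local, remote)
        except StrategyFailed:
            continue
    raise RuntimeError("the last strategy in the chain must never fail")


local = {"uid": "e1", "summary": "Lunch"}
remote = {"uid": "e1", "summary": "Dinner"}
result = resolve(local, remote, [no_conflict, keep_both])
```

A real chain would place a 3-way merge between the two strategies shown here. Ending the chain with a strategy that cannot fail keeps the object synchronizable.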
Strategy: Merge
To be able to identify changed properties we need to do a 3-way diff which requires the latest common ancestor. Because the common ancestor is not necessarily available on the server, clients need to preserve it until the merge is complete. This can be achieved by always attaching the original version, from which a modification started, to the modified version until the modified version has been successfully written to the server and verified to be valid.
The merging algorithm SHALL consider the following rules in the same order to resolve conflicts:
- Special checking and handling of properties which are depending on others has to be done first.
- Where a property has only been modified by one instance, the changed version SHALL be used
- Where we have lists (such as addresses or attendees), the lists SHALL be merged whenever possible.
- Where properties are conflicting, the latest according to the last-modified date SHALL be taken (dataloss: mark using X-Kolab-Conflict).
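A simplified sketch of these rules on flat property dictionaries is shown below. It skips the special handling of dependent properties, treats `None` as "property absent", and returns the conflicting keys so that the caller can mark them (for instance via X-Kolab-Conflict). All names and the sample event are assumptions made for illustration.

```python
from datetime import datetime


def merge_lists(base, ours, theirs):
    """Keep items added on either side, drop items removed on either side."""
    removed = [x for x in base if x not in ours or x not in theirs]
    merged = []
    for item in list(ours) + list(theirs):
        if item not in merged and item not in removed:
            merged.append(item)
    return merged


def three_way_merge(base, ours, theirs, ours_modified, theirs_modified):
    """Merge property dicts; returns (merged, conflicting_keys)."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o
        elif o == b:
            value = t  # only theirs changed
        elif t == b:
            value = o  # only ours changed
        elif isinstance(o, list) and isinstance(t, list):
            value = merge_lists(b or [], o, t)
        else:
            # Both sides changed: the later last-modified date wins,
            # and the key is reported because data may be lost.
            value = o if ours_modified >= theirs_modified else t
            conflicts.append(key)
        if value is not None:
            merged[key] = value
    return merged, conflicts


base = {"summary": "Meeting", "location": "Room 1", "attendees": ["alice"]}
ours = {"summary": "Team meeting", "location": "Room 2", "attendees": ["alice", "bob"]}
theirs = {"summary": "Meeting", "location": "Room 3", "attendees": ["alice", "carol"]}
merged, conflicts = three_way_merge(
    base, ours, theirs,
    ours_modified=datetime(2012, 1, 1, 10, 0),
    theirs_modified=datetime(2012, 1, 1, 11, 0),
)
```

Here the summary is taken from the only side that changed it, the attendee lists are combined, and the location is a real conflict that is decided by the later modification date.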
The following diagram illustrates how to deal with merging in a disconnected IMAP scenario:
Pushing conflicts
To be able to push conflicting changes, the client MUST create a proposed version, e.g. produced by a merging algorithm (this is a requirement in order to always have a valid object instance on the server), and attach the conflicting objects as attachments. This way the conflict resolution does not block the process of synchronizing the object, and another client can detect the conflict and offer the user the option to correct/review the automatically merged object.
- A client MUST NOT automatically start a conflict resolution dialog if it detects such an item, and conflict resolution MUST always be optional and triggered by the user. This is to avoid endless conflict resolution loops.
Objects with a pending conflict SHALL be marked using the X-Kolab-Conflict header. Each header MUST contain the cid of an attachment containing one of the conflicting objects. The attachment SHALL contain the plain XML object with all attachments and the like stored inline (self-contained).
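As an illustration, such a message could be assembled with Python's standard `email` package. The MIME layout, the Content-ID values, the file names and the choice to put the cid into the header including its angle brackets are assumptions of this sketch; the text above only requires one X-Kolab-Conflict header per attached conflicting object.

```python
from email.message import EmailMessage


def build_proposed_message(proposed_xml, conflicting_xmls):
    """Build a message holding the proposed version plus the conflicting originals."""
    msg = EmailMessage()
    msg.set_content(proposed_xml, subtype="xml")
    for index, xml in enumerate(conflicting_xmls, start=1):
        cid = f"<conflict-{index}@example.invalid>"  # hypothetical Content-ID
        msg.add_attachment(
            xml, subtype="xml", cid=cid, filename=f"conflict-{index}.xml"
        )
        # One header per conflicting object, pointing at its attachment.
        msg["X-Kolab-Conflict"] = cid
    return msg


msg = build_proposed_message(
    "<event><summary>merged</summary></event>",
    ["<event><summary>local</summary></event>",
     "<event><summary>remote</summary></event>"],
)
conflict_cids = [str(h) for h in msg.get_all("X-Kolab-Conflict")]
attachment_cids = [
    str(part["Content-ID"]) for part in msg.iter_parts() if part.is_attachment()
]
```

A client that supports manual merges can look up each attachment by the cid named in the headers, while other clients simply work with the proposed version.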
Being able to push conflicts is important so e.g. a mobile device can push conflicting changes and it is possible to resolve the conflict on a Desktop Computer later on (as opposed to showing a conflict resolution dialog on a mobile device)
X-Kolab-Version-UID
X-Kolab-Version-UID SHALL identify a specific version of a kolab object.
- The UID MUST be a globally unique identifier for a specific object version.
- The UID MAY be a HASH value which is guaranteed to be unique (SHA-1 or alike is enough, we don't need cryptographic strength)
- The UID MAY be a generated globally unique identifier.
A client MUST assign a new X-Kolab-Version-UID to every object which is appended, and then remember it to verify that the APPEND succeeded without creating a duplicate.
Further a client SHALL record the history as a list of X-Kolab-Version-UIDs where the first one is the Version-UID of the current object instance.
This serves the following purpose:
- To be able to detect if an APPEND resulted in a conflict, i.e. to check if the object instance a client appended has a lower IMAP UID than another duplicate. For this we need a way of identifying a specific version of a kolab object before it gets an IMAP UID assigned on the server.
- To be able to tell if the object instance appended by a client has been incorporated into the currently valid version, in case the appended version has already been overwritten by another edit before the client could check #Scenarios which make it difficult to detect if the APPEND succeeded.
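A small sketch of generating a Version-UID and extending the history is given below. Both generator variants and the function names are illustrative. Note that a pure content hash yields the same value for two byte-identical versions, so the UUID variant may be the safer default.

```python
import hashlib
import uuid


def new_version_uid():
    """Generated globally unique identifier for one object version."""
    return str(uuid.uuid4())


def hashed_version_uid(payload: bytes):
    """Alternative: derive the Version-UID from the serialized object."""
    return hashlib.sha1(payload).hexdigest()


def extend_history(parent_history, version_uid):
    """History for a new instance: own Version-UID first, then the ancestors."""
    return [version_uid] + list(parent_history)


parent = ["v2", "v1"]
mine = new_version_uid()
history = extend_history(parent, mine)
```

The client keeps `mine` until it has verified, after a fresh sync, that this Version-UID appears in the history of the valid object instance.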
OUTDATED Alternative Approach: X-Kolab-Expected-IMAP-UID
As an alternative to X-Kolab-Version-UID, there could also be an X-Kolab-Expected-IMAP-UID. The client would just retry until it manages to write an object where the IMAP UID == X-Kolab-Expected-IMAP-UID. The latest object instance where this holds true is the currently valid one. Chances for conflicts increase, because writes of completely unrelated objects also increase the IMAP-UID and would therefore be conflicting.
X-Kolab-Version-UID is still required for the client to identify the object it has written.
No real advantage compared to X-Kolab-Ancestor and less powerful because there is no way to detect ancestors (a conflict results always in the client having to merge again, although the written instance might already have been incorporated).
OUTDATED Alternative Approach: X-Kolab-Sequence-Number
A monotonically increasing X-Kolab-Sequence-Number could be used to detect conflicts. If multiple objects with the same sequence number exist, the one with the lower IMAP UID is valid.
X-Kolab-Version-UID is still required for the client to identify the object it has written.
No real advantage compared to X-Kolab-Ancestor and less powerful because there is no way to detect ancestors (a conflict results always in the client having to merge again, although the written instance might already have been incorporated).
Concerns/Problems
- Every client MUST read all objects within a Conflict Domain (IMAP Folder), to be able to identify the valid object instance.
- Otherwise it would risk displaying/working with outdated/invalid data
- The identification process is not entirely trivial, but I couldn't find a way to simplify it so far.
- All clients need to preserve the X-Kolab-Version-UID history.
Considerations
General Considerations
Due to how Kolab works, with clients doing the processing and using the server only for Storage, there are some extra considerations to make as soon as multiple clients operate on the same dataset (either multiple clients of the same user or shared folders):
- Automatic tasks which are executed based on the data are potentially processed by multiple clients simultaneously.
- Without some special locking mechanism (which would seriously complicate things with offline clients), it must be ensured that all those operations may be executed simultaneously.
- For actions which create new data in the store this usually mean that all actions must be deterministic, so no conflicts are produced if several systems process the task simultaneously.
- Because a system may synchronize to the IMAP server at any time, it must be ensured that the IMAP store always remains in a valid state, which can also be used offline.
- Otherwise we would risk e.g. synchronizing some locks, which would block e.g. the editing of certain objects.
Endless Resolution Loop
The following scenario assumes that several systems are watching the imap store, and act automatically upon detected conflicts.
With several systems watching the same dataset for conflicts we could run into conflict-resolution loops:
- System A detects a conflict and starts to process it
- While System A is processing System B starts the conflict resolution as well.
- Both systems append their produced result, resulting in a new conflict => welcome to the loop
- One possibility to break the loop would be to first mark at least one of the messages as deleted (thus resolving the conflict), and only if that succeeds try to resolve the conflict (using one of the strategies). That works of course only under the assumption that the mark-as-deleted operation can succeed for just a single system (which I'm not sure is the case with e.g. cyrus-murder).
- The second conflict, of several resolved objects, doesn't require creating a new object and is solvable by deleting one of the two (or more). If there is a defined order which of the objects should be deleted and which should be kept, the loop can be broken safely. However, if one of the clients deletes the wrong object (and multiple are trying to resolve the same conflict) all objects are potentially lost.
- A client could mark all conflicting objects as deleted first (thus blocking the resolution for other clients), then always append the new object after doing the resolution (even if it is one of the objects previously marked as deleted), and then expunge. This strategy seems to prevent potential dataloss and minimize the chances of an endless loop. It depends on timing to break the loop though, which fails in the following scenarios.
- Generally a system which detects a conflict should "claim" the resolution as fast as possible (prevent other systems from resolving the same conflict). It can then take its time to do the actual resolution (which can be slow due to manual user interaction), and finally upload the result.
- Systems with offline capabilities are much more likely to produce conflict resolution loops:
- System A synchronizes a conflict from the server and goes offline
- System B resolves the conflict on the server and System A in its local imap cache
- System A synchronizes again resulting in a new conflict which it also synchronizes again
- Rinse and repeat
- PUSH notifications expose a similar problem as there is very little (or no?) time for a client to "claim" the resolution.
- Systems with offline capabilities might need to first fetch new data, resolve conflicts locally, and then do a full sync to avoid endless resolution loops.
- That would require being able to FETCH only, which we can't do with the imap resource atm. (the kolab resource has no control over when the imap resource syncs).
The following diagram should illustrate the endless loop:
Even if we try to minimize the chances for a serverside conflict, it's not possible to avoid it completely, so we need to be able to deal with it:
Conclusion
If a conflict resolution is used which produces a new item out of $N conflicting items, with potentially $M systems processing the conflict resolution, we produce $M new conflicts, thus resulting in a potentially endless conflict resolution loop. While this is all very much timing dependent, the chances to produce such a loop increase by using techniques such as:
- IMAP IDLE (aka push): conflicts are quickly distributed to many systems, before they can be resolved
- disconnected IMAP: clients have local resolutions which have not yet been written back to the server, nor are they aware of resolutions already available on the server.
A more general conclusion from this mental experiment is that automated modifications done by several clients must be done very carefully as one can get into mutual reactions of clients quickly.
Breaking the loop
In order to break the loop the following procedure may be applied:
- Client A synchronizes, which results in a conflict on the server.
- Client A and $M other clients detect the conflict and merge it according to a merging logic
- The merging logic must be strictly deterministic (in order for all clients to produce the same result) and require no user interaction at all times
- The merged object must indicate which source objects have been used to produce the merged version
- The merged object must be marked as such, and with which version of the merging algorithm it was created in order to be able to update the merging algorithm.
- All clients append their merged object, producing a new conflict (of potentially $M+1 objects)
- All clients get the new conflict
- All clients identify the merged objects to be identical (by their merge source indicator) and delete all except one.
- The one to keep is identified by its imap uid, which must be chosen according to a logic which guarantees that there is always the same choice among all clients which are simultaneously resolving the conflict (otherwise there is potential data-loss). E.g. the one with the lowest message sequence number would be a sane choice.
- Always an item with the latest merging algorithm version must be chosen in order to be able to update the merging algorithm.
- Note that it is absolutely crucial that all clients realize that this is a merged object and NEVER merge an already merged object again. If any client fails to mark its merged object as a merged object or fails to detect a merged object, thus merging again, we have a loop again.
- Disconnected IMAP clients may merge the item locally and modify the merged object again. On the next sync that will simply result in a new conflict, thus restarting the merging process.
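The duplicate handling in this procedure can be sketched as a deterministic selection. The `MergedObject` type and its fields are assumptions made for this example; the ordering (newest merging algorithm version first, then lowest IMAP UID) follows the rules described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MergedObject:
    imap_uid: int
    merge_sources: frozenset  # identifiers of the source objects that were merged
    algorithm_version: int    # version of the merging algorithm that produced it


def pick_survivors(merged_objects):
    """Return (keep, delete); every client must arrive at the same split."""
    groups = defaultdict(list)
    for obj in merged_objects:
        groups[obj.merge_sources].append(obj)
    keep, delete = [], []
    for group in groups.values():
        # Newest algorithm version first, then lowest IMAP UID.
        ranked = sorted(group, key=lambda o: (-o.algorithm_version, o.imap_uid))
        keep.append(ranked[0])
        delete.extend(ranked[1:])
    return keep, delete


sources = frozenset({"a", "b"})
objs = [
    MergedObject(21, sources, 1),
    MergedObject(22, sources, 2),
    MergedObject(23, sources, 2),
]
keep, delete = pick_survivors(objs)
```

Because the choice depends only on data every client sees after a sync, clients that run this independently pick the same survivor and delete the same duplicates.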
Potential problems:
- If IMAP IDLE is sufficiently quick to distribute the conflicts, and since every edit results in a temporary conflict, every edit of an object by a single client, results in $M clients trying to resolve the conflict, thus producing $M conflicting items on the server.
- To prevent this problem, clients which did not produce the conflict in the first place could e.g. wait for 20s or so before they try to resolve the conflict.
- The merged objects MUST be compared by their merge identifier for equality, as e.g. a hash-based comparison for equality is way too fragile (affected by every minor change in the object as well as in the merging logic => doesn't work in a heterogeneous system).
- If a system cannot identify the merged objects as identical, we end up having an endless resolution loop (objects are merged over and over again).
- To be able to merge, we need the previous version available.
- The akonadi resource currently has no way to get to the old version of the imap object which would be required for a merge. It could have a local representation of the todo/event/.... available however.
- That could be helped by never deleting the last imap item directly on modify.
Alternative Kolab Conflict Resolution Approach: Full history
The following outlines a system which allows multiple clients to automatically detect and resolve conflicts. It relies heavily on a 3-way merging algorithm in order to make it possible to support simultaneous edits. Because 3-way merges are a requirement for simultaneous edits without data-loss and 3-way merges require the full history of the editing, we need to keep all edited objects.
- without a deterministic automated conflict resolution it is not possible to resolve conflicts (by potentially multiple systems) without creating new conflicts (resulting in an endless conflict resolution loop).
- the only other deterministic conflict resolution, is deleting one version resulting in one version always being lost if a conflict occurs.
Consider the following diagram showing the history of a single Kolab Object, with simultaneous edits from three Clients. Each IMAP object contains a specific version (identified by its unique IMAP UID).
- Each new version of the object enters the imap store as "Unmerged", indicating that the item must first be checked if it needs to be merged. This means the object does not represent a valid version of the object until it has been checked for merging and flagged as "Valid". This detail is not visible in the diagram, it is however crucial that no edits are based on objects which have not been marked in the IMAP store as "Valid".
- The shown IMAP UID of the objects represents the last "Valid" IMAP object from which an edit has been started.
- The current state of a Kolab Object is represented by the IMAP object with the highest UID marked as "Valid"
- Because every operation relies solely on the data (history) available so far, all operations can be safely executed by multiple systems simultaneously, even if their datasets have different synchronization states, and are stored offline in a disconnected IMAP scenario.
- Note that the process of identifying an object as "Valid" is the same as a merge, but not resulting in a new item. That means, the result of identifying the item as "Valid" or "Unmerged", takes some time (A new object enters the imap store always as unmerged, otherwise it cannot be guaranteed that another imap item doesn't come first). However, the result is not changed by additional modifications and solely depends on the currently available history. Therefore the identification process can safely be executed offline.
- "Unmerged" objects which have been successfully merged are actually not required anymore (except to revert to such a version) and could be deleted. (But only the "Unmerged" source object, not the "Valid" ones.)
- "Unmerged" objects which have been merged but are not deleted, MUST be marked as "Merged" to avoid repeated processing of the same merge.
- Ideally we would keep all "Valid" objects to be able to merge no matter for how long an object has been taken offline (while versions changed on the server).
- It would also be possible to keep only $N versions, where $N is the number of "Valid" versions in the IMAP store until an offline item cannot be merged anymore.
- It would also be possible to attach only the ancestor to every object. This way we always have the required ancestors available and no extra items in the imap store.
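Identifying the current state in this model can be sketched as picking the highest IMAP UID among the objects flagged "Valid". Representing the instances of one Kolab Object as `(imap_uid, flag)` pairs is an assumption of this example.

```python
def current_state(objects):
    """objects: iterable of (imap_uid, flag) pairs for one Kolab Object."""
    valid = [uid for uid, flag in objects if flag == "Valid"]
    return max(valid) if valid else None


history = [(1, "Valid"), (2, "Merged"), (3, "Valid"), (4, "Unmerged")]
latest = current_state(history)
```

The "Unmerged" object with UID 4 is newer, but it must not be used as the basis for edits until it has been checked and flagged.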
Ancestor UID
In order to be able to find the latest common ancestor (or base) of two to-be-merged objects (which is a requirement for a 3-way merge), we need to store the IMAP UID of the object from which the edit started. We can then determine the base for a merge by simply taking the smaller Ancestor UID (thus earlier in history) of the two objects.
The ancestor UID shall therefore be stored in a MIME header.
Merge ID
As multiple systems may do the same merge simultaneously, resulting in multiple merged versions on the server, a way to identify those duplicates is required. Therefore each merged object shall be marked by a "Merge ID".
The merge ID consists of:
- The IMAP UIDs of the source objects: This allows to match two duplicate merges.
- BROKEN FEATURE: A version number of the merging algorithm: This allows to update the merging algorithm (newer versions override the result of older versions.)
- Note that as soon as the first version of a merged object is flagged as "Valid" on the server, it's not possible to override that result by an updated merging algorithm anymore. So it is in fact not possible to update the merging algorithm reliably without updating ALL clients.
- If only the creator of the conflict would act immediately and the others not or only later, that problem would be largely resolved.
Valid/Merged Flag
It is preferred to flag the valid/merged objects instead of the unmerged objects to protect against misbehaving clients (failing to set the unmerged flag on new objects).
Instead of introducing a new flag, an existing one such as \Seen or \Answered may be used. TODO: Make sure that there is no conflicting usage, e.g. a mail client may set \Seen automatically.
Merging algorithm
To find the base/common ancestor for the merge, the Ancestor UIDs of the objects SHALL be compared and the lower UID MUST be used.
The merging algorithm SHALL consider the following rules in the same order to resolve conflicts:
- Special checking and handling of properties which are depending on others has to be done first
- Where a property has only been modified by one instance, the changed version SHALL be used
- Where we have lists (such as addresses or attendees), the lists SHALL be merged.
- Where properties are conflicting, the latest according to the last-modified date SHALL be taken.
- TODO: do we need other heuristics such as admin first, organizer first, ...?
- In case one Object lacks the Ancestor UID, thus making a merge impossible, the one which is marked as "Valid" MUST remain and the other marked as merged (one of the two is always "Valid").
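The base selection, including the fallback for a missing Ancestor UID, might look like the following sketch. Using `None` for an object without an Ancestor UID is an assumption of this example.

```python
def merge_base(ancestor_uid_a, ancestor_uid_b):
    """Return the IMAP UID to use as merge base, or None if no merge is possible."""
    if ancestor_uid_a is None or ancestor_uid_b is None:
        # No 3-way merge possible: the "Valid" object stays,
        # the other one is only marked as merged.
        return None
    # The lower Ancestor UID is earlier in history.
    return min(ancestor_uid_a, ancestor_uid_b)


base_uid = merge_base(17, 12)
no_base = merge_base(17, None)
```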
Example
Typically a disconnected IMAP client would:
- Synchronize its data from the server and go offline
- Delete merge duplicates
- Check for unmerged objects and attempt a merge:
- either flag the object as "Valid" if it doesn't need a merge (fast-forward change)
- create a merged version which is added to the local IMAP store (containing a merge identifier)
- Identify the latest "Valid" version of the object
- Note that the previously created, merged version is not valid until it has been synchronized to the server.
- Edit the identified version as much as we like
- Upon edit we need to make sure the mime message contains the latest valid IMAP UID of the object
- Synchronize to the server again
- Note that our merged version will create a new conflict only if there have been simultaneous edits to the same object. Otherwise we can flag it as "Valid" after the next sync.
Considerations
- Note that this approach doesn't require any locking or introduce any invalid states in the IMAP store, and is therefore very friendly to disconnected IMAP clients. Clients can sync at any time and receive data they can work with, without introducing conflicts.
- As side effect of having the full version history available, it should be fairly trivial to make it possible for clients to restore previous versions of any kolab-object (yay, we have notes versioning).
- Because we can detect hard conflicts during the merging, we could annotate the object accordingly, offering the user to overwrite/change the merged version (or even revert completely). This MUST be optional however, both for usability reasons and to avoid an endless conflict resolution loop.
Minimum requirements for a read-only client:
- It SHOULD be able to identify the latest version based on IMAP UID and "Valid" flag.
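A read-only client's selection step might look like the following sketch. How the "Valid" flag is stored (IMAP keyword, annotation, or otherwise) is not specified here, so the `valid` boolean and the `StoredInstance` class are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StoredInstance:
    """Hypothetical view of one IMAP message holding a Kolab object."""
    imap_uid: int      # IMAP UID of the message
    object_uid: str    # uid of the Kolab object it contains
    valid: bool        # True if the message carries the "Valid" flag


def latest_valid(instances: Iterable[StoredInstance],
                 object_uid: str) -> Optional[StoredInstance]:
    """Return the "Valid" instance with the highest IMAP UID, or None."""
    candidates = [i for i in instances
                  if i.object_uid == object_uid and i.valid]
    return max(candidates, key=lambda i: i.imap_uid, default=None)


folder = [
    StoredInstance(3, "event-1", True),
    StoredInstance(5, "event-1", True),
    StoredInstance(7, "event-1", False),   # unmerged, must not be shown
    StoredInstance(8, "event-2", True),
]
shown = latest_valid(folder, "event-1")
```

Here the client would display the instance with IMAP UID 5 and ignore the newer but unmerged instance 7.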
Minimum requirements for a writing client:
- Identifying the latest version which SHALL be used for new edits (otherwise it would overwrite the new version with old changes)
- A client which does not set the Ancestor UID will not get its changes merged (they will be overwritten by other clients in case of a merge). For fast-forward changes other clients will pick up the changes.
- A client which does edits based on "Unmerged" items would write outdated data.
So misbehaving clients would not really break the system, but only have a crippled experience (of course they are still able to overwrite the data with outdated information).
Problems
- 1. keeping the history on the server impacts some models of operation
- data spool size and quota
- clients that need to sift through the data spool (and may therefore become slow rather quickly)
- expunge policies set by a customer
- because deleting the history means it's not possible to merge anymore
- 2. a potential consequence of not maintaining the history is the immediate loss of any merge capability
- 3. all clients are eligible to detect the fact there is a conflict
- 4. Kontact 4.8 and Kontact 7.4 may have different resolution strategies or algorithms, which is likely to result in two different merge results being brought back
Solutions:
- 1. By only attaching the ancestor to each object, that concern should be largely resolved. The data spool size is still increased as the size of each IMAP object is roughly doubled.
- 2. Solved by only attaching the ancestor.
Akonadi Resource Implementation
This assumes we don't keep the full history, but only attach the ancestor to each object:
- Synchronize data from the server and go offline
- Delete merge duplicates
- Check for unmerged objects and attempt a merge:
- either flag the object as "Valid" if it doesn't need a merge (fast-forward change)
- or create a merged version which is added to the local IMAP cache (containing a merge identifier)
- By now there should be no more duplicate UIDs among the "Valid" objects
- identify the latest "Valid" version of the object
- Note that the previously created, merged version is not valid until it has been synchronized to the server.
- All objects before (lower uid) the identified object may be deleted.
- Edit the identified version as much as we like
- Upon edit we need to make sure the mime message has the latest valid version of the object attached (attaching the ancestor)
- Synchronize to the server again
- Note that our merged version will create a new conflict only if there have been simultaneous edits to the same object. Otherwise we can flag it as "Valid" after the next sync.
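The "identify and clean up" part of these steps could be sketched as follows for the instances of a single object. The `CachedInstance` class and the boolean `valid` field are hypothetical stand-ins for whatever the resource keeps in its local IMAP cache.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CachedInstance:
    """Hypothetical cache entry for one instance of a single object."""
    imap_uid: int
    valid: bool


def latest_and_deletable(
    instances: List[CachedInstance],
) -> Tuple[Optional[CachedInstance], List[CachedInstance]]:
    """Pick the latest "Valid" instance and list everything before it."""
    valid = [i for i in instances if i.valid]
    if not valid:
        return None, []
    latest = max(valid, key=lambda i: i.imap_uid)
    # All instances with a lower IMAP UID than the identified one may be deleted.
    deletable = [i for i in instances if i.imap_uid < latest.imap_uid]
    return latest, deletable


cache = [
    CachedInstance(3, True),
    CachedInstance(5, True),
    CachedInstance(7, False),  # local merge result, not yet synchronized
]
latest, deletable = latest_and_deletable(cache)
```

The not-yet-synchronized merge result (IMAP UID 7) is neither chosen for editing nor deleted; only instance 3 is reported as deletable.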
Virtual Objects
- virtual object: A virtual object is an object which is not itself stored but only described in potentially multiple other objects. This means the virtual object has no physical file (e.g. an xCal file) of its own, but exists because multiple other objects describe it (i.e. by uid and name). The simplest example of a virtual object is an iCal category. Virtual objects can have properties too, and need to be treated specially by the conflict resolution. A more sophisticated virtual object is described here: http://wiki.kolab.org/PIM-Item_Relations
- host object: An object containing a description of a virtual object.
Additional Conflict scenarios
- Multiple, different objects describe the same virtual object
- User A has two objects X,Y describing a single virtual object V with a property P
- User B synchronizes the object X, and changes the property P of the virtual object V
- User A synchronizes the changed object X.
- User A has now two descriptions of the virtual object V which are conflicting (one from X and one from Y).
Conflict Domain
- How does the conflict domain look for virtual objects?
- One tree for all (which is problematic because some objects could be read-only while others aren't) vs. one tree per folder vs. one tree per user vs. something in between
Detecting conflicts on virtual objects
To detect and resolve conflicts on virtual objects, they need to be interpreted; that is, an in-memory representation of the virtual objects has to be created. Each virtual object must contain the required properties for conflict detection and resolution (i.e. UID and sequence number), as it is not possible to use the existing values of the hosting object (because virtual objects can produce conflicts among different hosting objects). Because a virtual object is usually described by multiple host objects, conflicts are very likely, and therefore interaction-less conflict resolution must be available. So while a conflict is a "special" situation for normal objects, it is the normal situation for virtual objects (unless the virtual object has only one host object).
Update to virtual objects: Because virtual objects are stored in multiple host objects, updates are somewhat complex.
- We have to write the changes to every host-object, resulting in a lot of modifications.
- A system could hold 20 host objects with outdated virtual-object information and one which has just been synchronized. The information of that one object must override and update the information of all other objects (given it is more up-to-date).
- If we did not update the virtual object info on all host objects, deleting the wrong object could revert changes (nasty surprise).
- The virtual object could have been modified on two systems simultaneously, requiring conflict resolution.
- If two systems do the conflict resolution simultaneously, we get an endless conflict resolution loop (because the two resolutions conflict again).
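The interpretation step could be sketched as below: every host object contributes a description of the virtual object, the description with the highest sequence number wins, and all other host objects are reported as needing an update. The `Description` class and its fields are hypothetical. The tie-break on equal sequence numbers (comparing the property tuples) is purely an assumption of this sketch; it is shown as one conceivable way to make two systems arrive at the same result independently, and nothing above prescribes it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Description:
    """Hypothetical description of a virtual object found in a host object."""
    host_uid: str
    virtual_uid: str
    sequence: int
    properties: Tuple[Tuple[str, str], ...]  # sorted (name, value) pairs


def resolve(descriptions: Iterable[Description]) -> Dict[str, Description]:
    """Build the in-memory virtual objects without user interaction."""
    winners: Dict[str, Description] = {}
    for d in descriptions:
        current = winners.get(d.virtual_uid)
        # Highest sequence number wins; ties fall back to a deterministic
        # comparison of the properties (assumption of this sketch).
        if current is None or (d.sequence, d.properties) > (current.sequence, current.properties):
            winners[d.virtual_uid] = d
    return winners


def hosts_to_update(descriptions: Iterable[Description],
                    winners: Dict[str, Description]) -> List[str]:
    """List host objects whose description differs from the resolved one."""
    stale = []
    for d in descriptions:
        w = winners[d.virtual_uid]
        if (d.sequence, d.properties) != (w.sequence, w.properties):
            stale.append(d.host_uid)
    return sorted(stale)


found = [
    Description("X", "V", 2, (("P", "blue"),)),  # changed by User B
    Description("Y", "V", 1, (("P", "red"),)),   # still the old description
]
winners = resolve(found)
stale = hosts_to_update(found, winners)
```

Applied to the scenario above, the description from host object X wins for virtual object V, and host object Y is reported as outdated.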





