| Replication
 | |
| ===========
 | |
| 
 | |
| For backend devices that offer replication features, Cinder provides a common
 | |
| mechanism for exposing that functionality on a per volume basis while still
 | |
| trying to allow flexibility for the varying implementation and requirements of
 | |
| all the different backend devices.
 | |
| 
 | |
| There are 2 sides to Cinder's replication feature, the core mechanism and the
 | |
| driver specific functionality, and in this document we'll only be covering the
 | |
| driver side of things aimed at helping vendors implement this functionality in
 | |
| their drivers in a way consistent with all other drivers.
 | |
| 
 | |
| Although we'll be focusing on the driver implementation there will also be some
 | |
| mentions of deployment configurations to provide a clear picture to developers
 | |
| and help them avoid implementing custom solutions to solve things that were
 | |
| meant to be done via the cloud configuration.
 | |
| 
 | |
| Overview
 | |
| --------
 | |
| 
 | |
| As a general rule replication is enabled and configured via the cinder.conf
 | |
| file under the driver's section, and volume replication is requested through
 | |
| the use of volume types.
 | |
| 
 | |
| *NOTE*: Current replication implementation is v2.1 and it's meant to solve a
 | |
| very specific use case, the "smoking hole" scenario.  It's critical that you
 | |
| read the Use Cases section of the spec here:
 | |
| https://specs.openstack.org/openstack/cinder-specs/specs/mitaka/cheesecake.html
 | |
| 
 | |
| From a user's perspective volumes will be created using specific volume types,
 | |
| even if it is the default volume type, and they will either be replicated or
 | |
| not, which will be reflected on the ``replication_status`` field of the volume.
 | |
| So in order to know if a snapshot is replicated we'll have to check its volume.
 | |
| 
 | |
| After the loss of the primary storage site all operations on the resources will
 | |
| fail and VMs will no longer have access to the data.  It is then that the Cloud
 | |
| Administrator will issue the ``failover-host`` command to make the
 | |
| cinder-volume service perform the failover.
 | |
| 
 | |
| After the failover is completed, the Cinder volume service will start using the
 | |
| failed-over secondary storage site for all operations and the user will once
 | |
| again be able to perform actions on all resources that were replicated, while
 | |
| all other resources will be in error status since they are no longer available.
 | |
| 
 | |
| Storage Device configuration
 | |
| ----------------------------
 | |
| 
 | |
| Most storage devices will require configuration changes to enable the
 | |
| replication functionality, and this configuration process is vendor and storage
 | |
| device specific so it is not contemplated by the Cinder core replication
 | |
| functionality.
 | |
| 
 | |
| It is up to the vendors whether they want to handle this device configuration
 | |
| in the Cinder driver or as a manual process, but the most common approach is to
 | |
| avoid including this configuration logic in Cinder and to have the Cloud
 | |
| Administrators do a manual process following a specific guide to enable
 | |
| replication on the storage device before configuring the cinder volume service.
 | |
| 
 | |
| Service configuration
 | |
| ---------------------
 | |
| 
 | |
| The way to enable and configure replication is common to all drivers and it is
 | |
| done via the ``replication_device`` configuration option that goes in the
 | |
| driver's specific section in the ``cinder.conf`` configuration file.
 | |
| 
 | |
| ``replication_device`` is a multi dictionary option that should be specified
 | |
| for each replication target device the admin wants to configure.
 | |
| 
 | |
| While it is true that all drivers use the same ``replication_device``
 | |
| configuration option this doesn't mean that they will all have the same data,
 | |
| as there is only one standardized and **REQUIRED** key in the configuration
 | |
| entry, all others are vendor specific:
 | |
| 
 | |
| - backend_id:<vendor-identifier-for-rep-target>
 | |
| 
 | |
| Values of ``backend_id`` keys are used to uniquely identify within the driver
 | |
| each of the secondary sites, although they can be reused on different driver
 | |
| sections.
 | |
| 
 | |
| These unique identifiers will be used by the failover mechanism as well as in
 | |
| the driver initialization process, and the only requirement is that it must
 | |
| never have the value "default".
 | |
| 
 | |
| An example driver configuration for a device with multiple replication targets
 | |
| is shown below::
 | |
| 
 | |
|     .....
 | |
|     [driver-biz]
 | |
|     volume_driver=xxxx
 | |
|     volume_backend_name=biz
 | |
| 
 | |
|     [driver-baz]
 | |
|     volume_driver=xxxx
 | |
|     volume_backend_name=baz
 | |
| 
 | |
|     [driver-foo]
 | |
|     volume_driver=xxxx
 | |
|     volume_backend_name=foo
 | |
|     replication_device = backend_id:vendor-id-1,unique_key:val....
 | |
|     replication_device = backend_id:vendor-id-2,unique_key:val....
 | |
| 
 | |
| In this example the result of calling
 | |
| ``self.configuration.safe_get('replication_device')`` within the driver is the
 | |
| following list::
 | |
| 
 | |
|     [{'backend_id': 'vendor-id-1', 'unique_key': 'val1'},
|      {'backend_id': 'vendor-id-2', 'unique_key': 'val2'}]
 | |
| 
 | |
| It is expected that if a driver is configured with multiple replication
 | |
| targets, replicated volumes are actually replicated on **all targets**.
 | |
| 
 | |
| Besides specific replication device keys defined in the ``replication_device``,
 | |
| a driver may also have additional normal configuration options in the driver
 | |
| section related to replication to allow Cloud Administrators to configure
 | |
| things like timeouts.
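
As a rough illustration of how a driver might consume this option, the sketch below indexes the configured entries by ``backend_id`` and rejects the reserved value ``default``.  The helper name ``index_replication_targets`` and the use of ``ValueError`` are assumptions made for this example and are not part of Cinder; the input is the list shape shown above for ``safe_get('replication_device')``.

```python
def index_replication_targets(replication_devices):
    """Return a dict mapping backend_id -> replication_device entry.

    Hypothetical helper: validates the only standardized key.
    """
    targets = {}
    for entry in replication_devices or []:
        backend_id = entry.get('backend_id')
        if not backend_id:
            raise ValueError('replication_device entry is missing backend_id')
        if backend_id == 'default':
            # "default" is reserved for requesting a failback.
            raise ValueError('backend_id must never be "default"')
        if backend_id in targets:
            raise ValueError('duplicate backend_id: %s' % backend_id)
        targets[backend_id] = entry
    return targets


devices = [{'backend_id': 'vendor-id-1', 'unique_key': 'val1'},
           {'backend_id': 'vendor-id-2', 'unique_key': 'val2'}]
targets = index_replication_targets(devices)
```

Keeping the entries in a dictionary makes it cheap to validate a requested target later, for example during a failover.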
 | |
| 
 | |
| Capabilities reporting
 | |
| ----------------------
 | |
| 
 | |
| There are 2 new replication stats/capability keys that drivers supporting
 | |
| replication v2.1 should be reporting: ``replication_enabled`` and
 | |
| ``replication_targets``::
 | |
| 
 | |
|     stats["replication_enabled"] = True|False
 | |
|     stats["replication_targets"] = [<backend-id_1>, <backend-id_2>...]
 | |
| 
 | |
| If a driver is behaving correctly we can expect the ``replication_targets``
 | |
| field to be populated whenever ``replication_enabled`` is set to ``True``, and
 | |
| it is expected to either be set to ``[]`` or be missing altogether when
 | |
| ``replication_enabled`` is set to ``False``.
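
A minimal sketch of that invariant follows.  The function name ``replication_stats`` is hypothetical; a real driver would presumably merge these two keys into the stats dictionary it already reports.

```python
def replication_stats(target_ids):
    """Build the two replication capability keys from configured targets.

    Hypothetical helper: replication counts as enabled only when at
    least one target is configured.
    """
    target_ids = list(target_ids or [])
    enabled = bool(target_ids)
    return {
        'replication_enabled': enabled,
        # Empty list when disabled, as described above.
        'replication_targets': target_ids if enabled else [],
    }


stats = {'volume_backend_name': 'foo'}
stats.update(replication_stats(['vendor-id-1', 'vendor-id-2']))
```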
 | |
| 
 | |
| The purpose of the ``replication_enabled`` field is to be used by the scheduler
 | |
| in volume types for creation and migrations.
 | |
| 
 | |
| As for the ``replication_targets`` field it is only provided for informational
 | |
| purposes so it can be retrieved through the ``get_capabilities`` using the
 | |
| admin REST API, but it will not be used for validation at the API layer.  That
 | |
| way Cloud Administrators will be able to know available secondary sites where
 | |
| they can fail over.
 | |
| 
 | |
| Volume Types / Extra Specs
 | |
| ---------------------------
 | |
| 
 | |
| The way to control the creation of volumes on a cloud with backends that have
 | |
| replication enabled is, like with many other features, through the use of
 | |
| volume types.
 | |
| 
 | |
| We won't go into the details of volume type creation, but suffice it to say that
 | |
| you will most likely want to use volume types to discriminate between
 | |
| replicated and non replicated volumes and be explicit about it so that non
 | |
| replicated volumes won't end up in a replicated backend.
 | |
| 
 | |
| Since the driver is reporting the ``replication_enabled`` key, we just need to
 | |
| require it for replication volume types adding ``replication_enabled='<is>
 | |
| True'`` and also specifying it for all non replicated volume types
 | |
| ``replication_enabled='<is> False'``.
 | |
| 
 | |
| It's up to the driver to parse the volume type info on create and set things up
 | |
| as requested.  While the scoping key can be anything, it's strongly recommended
 | |
| that all backends utilize the same key (replication) for consistency and to
 | |
| make things easier for the Cloud Administrator.
 | |
| 
 | |
| Additional replication parameters can be supplied to the driver using vendor
 | |
| specific properties through the volume type's extra-specs so they can be used
 | |
| by the driver at volume creation time, or retype.
 | |
| 
 | |
| It is up to the driver to parse the volume type info on create and retype to
 | |
| set things up as requested.  A good pattern to get a custom parameter from a
 | |
| given volume instance is this::
 | |
| 
 | |
|     extra_specs = getattr(volume.volume_type, 'extra_specs', {})
 | |
|     custom_param = extra_specs.get('custom_param', 'default_value')
 | |
| 
 | |
| It may seem convoluted, but we must be careful when retrieving the
 | |
| ``extra_specs`` from the ``volume_type`` field as it could be ``None``.
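
To see why the ``getattr`` form matters, the sketch below exercises the same pattern against stand-in objects.  ``SimpleNamespace`` is used here only to imitate a volume with and without a volume type; the helper name ``get_custom_param`` is an assumption for the example.

```python
from types import SimpleNamespace


def get_custom_param(volume, name, default=None):
    """Read an extra spec, tolerating a volume whose volume_type is None."""
    # getattr on None falls back to the empty dict instead of raising.
    extra_specs = getattr(volume.volume_type, 'extra_specs', {})
    return extra_specs.get(name, default)


# Stand-ins for volume objects; real volumes are versioned objects.
typed = SimpleNamespace(
    volume_type=SimpleNamespace(extra_specs={'custom_param': 'fast'}))
untyped = SimpleNamespace(volume_type=None)

with_type = get_custom_param(typed, 'custom_param', 'default_value')
without_type = get_custom_param(untyped, 'custom_param', 'default_value')
```

Accessing ``volume.volume_type.extra_specs`` directly on the second object would raise ``AttributeError``, which is the failure the pattern avoids.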
 | |
| 
 | |
| Vendors should try to avoid obfuscating their custom properties and expose them
 | |
| using the ``_init_vendor_properties`` method so they can be checked by the
 | |
| Cloud Administrator using the ``get_capabilities`` REST API.
 | |
| 
 | |
| *NOTE*: For storage devices doing per backend/pool replication the use of
 | |
| volume types is also recommended.
 | |
| 
 | |
| Volume creation
 | |
| ---------------
 | |
| 
 | |
| Drivers are expected to honor the replication parameters set in the volume type
 | |
| during creation, retyping, or migration.
 | |
| 
 | |
| When implementing the replication feature there are some driver methods that
 | |
| will most likely need modifications -if they are implemented in the driver
 | |
| (since some are optional)- to make sure that the backend is replicating volumes
 | |
| that need to be replicated and not replicating those that don't need to be:
 | |
| 
 | |
| - ``create_volume``
 | |
| - ``create_volume_from_snapshot``
 | |
| - ``create_cloned_volume``
 | |
| - ``retype``
 | |
| - ``clone_image``
 | |
| - ``migrate_volume``
 | |
| 
 | |
| In these methods the driver will have to check the volume type to see if the
 | |
| volumes need to be replicated.  We could use the same pattern described in the
 | |
| `Volume Types / Extra Specs`_ section::
 | |
| 
 | |
|     def _is_replicated(self, volume):
 | |
|         specs = getattr(volume.volume_type, 'extra_specs', {})
 | |
|         return specs.get('replication_enabled') == '<is> True'
 | |
| 
 | |
| But it is **not** the recommended mechanism, and the ``is_replicated`` method
 | |
| available in volumes and volume types versioned objects instances should be
 | |
| used instead.
 | |
| 
 | |
| Drivers are expected to keep the ``replication_status`` field up to date and in
 | |
| sync with reality, usually as specified in the volume type.  To do so, the
| implementations of the above-mentioned methods should use the model update mechanism
 | |
| provided for each one of those methods.  One must be careful since the update
 | |
| mechanism may be different from one method to another.
 | |
| 
 | |
| What this means is that most of these methods should be returning a
 | |
| ``replication_status`` key with the value set to ``enabled`` in the model
 | |
| update dictionary if the volume type is enabling replication.  There is no need
 | |
| to return the key with the value of ``disabled`` if it is not enabled since
 | |
| that is the default value.
 | |
| 
 | |
| In the case of the ``create_volume`` and ``retype`` methods there is no need to
 | |
| return the ``replication_status`` in the model update since it has already been
 | |
| set by the scheduler on creation using the extra spec from the volume type. And
 | |
| on ``migrate_volume`` there is no need either since there is no change to the
 | |
| ``replication_status``.
 | |
| 
 | |
| *NOTE*: For storage devices doing per backend/pool replication it is not
 | |
| necessary to check the volume type for the ``replication_enabled`` key since
 | |
| all created volumes will be replicated, but they are expected to return the
 | |
| ``replication_status`` in all those methods, including the ``create_volume``
 | |
| method since the driver may receive a volume creation request without the
 | |
| replication enabled extra spec and therefore the driver will not have set the
 | |
| right ``replication_status`` and the driver needs to correct this.
 | |
| 
 | |
| Besides the ``replication_status`` field that drivers need to update there are
 | |
| other fields in the database related to the replication mechanism that the
 | |
| drivers can use:
 | |
| 
 | |
| - ``replication_extended_status``
 | |
| - ``replication_driver_data``
 | |
| 
 | |
| These fields are string type fields with a maximum size of 255 characters and
 | |
| they are available for drivers to use internally as they see fit for their
 | |
| normal replication operation.  So they can be assigned in the model update and
 | |
| later on used by the driver, for example during the failover.
 | |
| 
 | |
| To avoid using magic strings drivers must use values defined by the
 | |
| ``ReplicationStatus`` class in the ``cinder/objects/fields.py`` file and
 | |
| these are:
 | |
| 
 | |
| - ``ERROR``: When setting the replication failed on creation, retype, or
 | |
|   migrate.  This should be accompanied by the volume status ``error``.
 | |
| - ``ENABLED``: When the volume is being replicated.
 | |
| - ``DISABLED``: When the volume is not being replicated.
 | |
| - ``FAILED_OVER``: After a volume has been successfully failed over.
 | |
| - ``FAILOVER_ERROR``: When there was an error during the failover of this
 | |
|   volume.
 | |
| - ``NOT_CAPABLE``: When we failed-over but the volume was not replicated.
 | |
| 
 | |
| The first 3 statuses revolve around the volume creation and the last 3 around
 | |
| the failover mechanism.
 | |
| 
 | |
| The only status that should not be used for the volume's ``replication_status``
 | |
| is the ``FAILING_OVER`` status.
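
The creation-side statuses could be turned into a model update along these lines.  This is only a sketch: the ``ReplicationStatus`` class below is a local stand-in so the example runs on its own, its string values are assumptions, and ``replication_model_update`` is a hypothetical helper rather than a Cinder API.

```python
class ReplicationStatus(object):
    """Local stand-in for the class in cinder/objects/fields.py.

    The string values are assumptions for this example.
    """
    ERROR = 'error'
    ENABLED = 'enabled'
    DISABLED = 'disabled'


def replication_model_update(is_replicated, setup_failed=False):
    """Return the replication part of a model update dictionary."""
    if not is_replicated:
        # DISABLED is the default, so nothing needs to be returned.
        return {}
    if setup_failed:
        # ERROR should be accompanied by the volume status "error".
        return {'replication_status': ReplicationStatus.ERROR,
                'status': 'error'}
    return {'replication_status': ReplicationStatus.ENABLED}


update = replication_model_update(is_replicated=True)
```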
 | |
| 
 | |
| Whenever we are referring to values of the ``replication_status`` in this
 | |
| document we will be referring to the ``ReplicationStatus`` attributes and not a
 | |
| literal string, so ``ERROR`` means
 | |
| ``cinder.objects.fields.ReplicationStatus.ERROR`` and not the string "ERROR".
 | |
| 
 | |
| Failover
 | |
| --------
 | |
| 
 | |
| This is the mechanism used to instruct the cinder volume service to fail over
 | |
| to a secondary/target device.
 | |
| 
 | |
| Keep in mind the use case is that the primary backend has died a horrible death
 | |
| and is no longer valid, so any volumes that were on the primary and were not
 | |
| being replicated will no longer be available.
 | |
| 
 | |
| The method definition required from the driver to implement the failover
 | |
| mechanism is as follows::
 | |
| 
 | |
|     def failover_host(self, context, volumes, secondary_id=None):
 | |
| 
 | |
| There are several things that are expected of this method:
 | |
| 
 | |
| - Promotion of a secondary storage device to primary
 | |
| - Generating the model updates
 | |
| - Changing internally to access the secondary storage device for all future
 | |
|   requests.
 | |
| 
 | |
| If no secondary storage device is provided to the driver via the ``secondary_id``
 | |
| argument (it is equal to ``None``), then it is up to the driver to choose which
 | |
| storage device to failover to.  In this regard it is important that the driver
 | |
| takes into consideration that it could be failing over from a secondary (there
 | |
| was a prior failover request), so it should discard the current target from the
 | |
| selection.
 | |
| 
 | |
| If the ``secondary_id`` is not a valid one the driver is expected to raise
 | |
| ``InvalidReplicationTarget``.  For any other non-recoverable errors during a
 | |
| failover the driver should raise ``UnableToFailOver`` or any child of
 | |
| ``VolumeDriverException`` class and revert to a state where the previous
 | |
| backend is in use.
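
The target selection rules above might be sketched as follows.  The function name ``choose_failover_target`` and its arguments are hypothetical, and ``LookupError`` stands in for ``InvalidReplicationTarget`` so the example is self-contained; failback requests are deliberately not handled here.

```python
def choose_failover_target(configured_ids, active_backend_id=None,
                           secondary_id=None):
    """Pick the backend_id to fail over to.

    configured_ids: backend_id values from replication_device entries.
    active_backend_id: target currently in use, None if on the primary.
    secondary_id: target requested by the admin, None to let us choose.
    """
    # Never fail over to the target we are already using.
    candidates = [t for t in configured_ids if t != active_backend_id]
    if secondary_id is None:
        if not candidates:
            raise LookupError('no replication target available')
        return candidates[0]
    if secondary_id not in candidates:
        # A real driver would raise InvalidReplicationTarget here.
        raise LookupError('invalid replication target: %s' % secondary_id)
    return secondary_id


first = choose_failover_target(['vendor-id-1', 'vendor-id-2'])
second = choose_failover_target(['vendor-id-1', 'vendor-id-2'],
                                active_backend_id='vendor-id-1')
```

Picking the first remaining candidate is just one possible policy; a driver could equally rank targets by health or replication lag.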
 | |
| 
 | |
| The failover method in the driver will receive a list of replicated volumes
 | |
| that need to be failed over.  Replicated volumes passed to the driver may have
 | |
| diverse ``replication_status`` values, but they will always be one of:
 | |
| ``ENABLED``, ``FAILED_OVER``, or ``FAILOVER_ERROR``.
 | |
| 
 | |
| The driver must return a 2-tuple with the new storage device target id as the
 | |
| first element and a list of dictionaries with the model updates required for
 | |
| the volumes so that the driver can perform future actions on those volumes now
 | |
| that they need to be accessed on a different location.
 | |
| 
 | |
| It's not a requirement for the driver to return model updates for all the
 | |
| volumes, or for any for that matter as it can return ``None`` or an empty list
 | |
| if there's no update necessary.  But if elements are returned in the model
 | |
| update list then it is a requirement that each of the dictionaries contains 2
 | |
| key-value pairs, ``volume_id`` and ``updates`` like this::
 | |
| 
 | |
|     [{
|          'volume_id': volumes[0].id,
|          'updates': {
|              'provider_id': new_provider_id1,
|              ...
|          },
|      },
|      {
|          'volume_id': volumes[1].id,
|          'updates': {
|              'provider_id': new_provider_id2,
|              'replication_status': fields.ReplicationStatus.FAILOVER_ERROR,
|              ...
|          },
|      }]
 | |
| 
 | |
| In these updates there is no need to set the ``replication_status`` to
 | |
| ``FAILED_OVER`` if the failover was successful, as this will be performed by
 | |
| the manager by default, but it won't create additional DB queries if it is
 | |
| returned.  It is however necessary to set it to ``FAILOVER_ERROR`` for those
 | |
| volumes that had errors during the failover.
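
Putting the return value together could look roughly like this.  The helper ``build_failover_result`` and the shape of its ``results`` argument are assumptions for the example, and ``FAILOVER_ERROR`` is a local stand-in for ``fields.ReplicationStatus.FAILOVER_ERROR`` whose string value is assumed.

```python
# Stand-in for fields.ReplicationStatus.FAILOVER_ERROR (value assumed).
FAILOVER_ERROR = 'failover-error'


def build_failover_result(target_id, results):
    """Build the 2-tuple a failover is expected to return.

    results: iterable of (volume_id, new_provider_id, succeeded) tuples
    produced by the driver while promoting each replicated volume.
    """
    model_updates = []
    for volume_id, provider_id, succeeded in results:
        if succeeded:
            # FAILED_OVER is set by the manager, so it is omitted here.
            updates = {'provider_id': provider_id}
        else:
            updates = {'replication_status': FAILOVER_ERROR}
        model_updates.append({'volume_id': volume_id, 'updates': updates})
    return target_id, model_updates


active_id, volume_updates = build_failover_result(
    'vendor-id-1',
    [('vol-1', 'new-provider-1', True), ('vol-2', None, False)])
```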
 | |
| 
 | |
| Drivers don't have to worry about snapshots or non replicated volumes, since
 | |
| the manager will take care of those in the following manner:
 | |
| 
 | |
| - All non replicated volumes will have their current ``status`` field saved in
 | |
|   the ``previous_status`` field, the ``status`` field changed to ``error``, and
 | |
|   their ``replication_status`` set to ``NOT_CAPABLE``.
 | |
| - All snapshots from non replicated volumes will have their statuses changed to
 | |
|   ``error``.
 | |
| - All replicated volumes that failed on the failover will get their ``status``
 | |
|   changed to ``error``, their current ``status`` preserved in
 | |
|   ``previous_status``, and their ``replication_status`` set to
 | |
|   ``FAILOVER_ERROR``.
 | |
| - All snapshots from volumes that had errors during the failover will have
 | |
|   their statuses set to ``error``.
 | |
| 
 | |
| Any model update request from the driver that changes the ``status`` field will
 | |
| trigger a change in the ``previous_status`` field to preserve the current
 | |
| status.
 | |
| 
 | |
| Once the failover is completed the driver should be pointing to the secondary
 | |
| and should be able to create and destroy volumes and snapshots as usual, and it
 | |
| is left to the Cloud Administrator's discretion whether resource modifying
 | |
| operations are allowed or not.
 | |
| 
 | |
| Failback
 | |
| --------
 | |
| 
 | |
| Drivers are not required to support failback, but they are required to raise a
 | |
| ``InvalidReplicationTarget`` exception if the failback is requested but not
 | |
| supported.
 | |
| 
 | |
| The way to request the failback is quite simple: the driver will receive the
 | |
| argument ``secondary_id`` with the value of ``default``.  That is why it was
 | |
| forbidden to use the ``default`` on the target configuration in the cinder
 | |
| configuration file.
 | |
| 
 | |
| Expected driver behavior is the same as the one explained in the `Failover`_
 | |
| section:
 | |
| 
 | |
| - Promotion of the original primary to primary
 | |
| - Generating the model updates
 | |
| - Changing internally to access the original primary storage device for all
 | |
|   future requests.
 | |
| 
 | |
| If the failback of any of the volumes fails, the driver must return
 | |
| ``replication_status`` set to ``ERROR`` in the volume updates for those
 | |
| volumes.  If they succeed it is not necessary to change the
 | |
| ``replication_status`` since the default behavior will be to set them to
 | |
| ``ENABLED``, but it won't create additional DB queries if it is set.
 | |
| 
 | |
| The manager will update resources in a slightly different way than in the
 | |
| failover case:
 | |
| 
 | |
| - All non replicated volumes will not have any model modifications.
 | |
| - All snapshots from non replicated volumes will not have any model
 | |
|   modifications.
 | |
| - All replicated volumes that failed on the failback will get their ``status``
 | |
|   changed to ``error``, have their current ``status`` preserved in the
 | |
|   ``previous_status`` field, and their ``replication_status`` set to
 | |
|   ``FAILOVER_ERROR``.
 | |
| - All snapshots from volumes that had errors during the failback will have
 | |
|   their statuses set to ``error``.
 | |
| 
 | |
| We can avoid using the "default" magic string by using the
 | |
| ``FAILBACK_SENTINEL`` class attribute from the ``VolumeManager`` class.
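
A driver could separate the two kinds of request with a small check like the one below.  This is a sketch under stated assumptions: ``FAILBACK_SENTINEL`` is redefined locally with the documented value ``default`` instead of being imported from the manager, ``FailbackNotSupported`` stands in for ``InvalidReplicationTarget``, and ``classify_request`` is a hypothetical helper.

```python
# Local copy of the documented sentinel value; a driver would normally
# reference the VolumeManager class attribute instead.
FAILBACK_SENTINEL = 'default'


class FailbackNotSupported(Exception):
    """Stand-in for InvalidReplicationTarget in this example."""


def classify_request(secondary_id, supports_failback=True):
    """Return 'failback' or 'failover' for a given secondary_id."""
    if secondary_id == FAILBACK_SENTINEL:
        if not supports_failback:
            # Drivers without failback support must reject the request.
            raise FailbackNotSupported('failback is not supported')
        return 'failback'
    return 'failover'


kind = classify_request(FAILBACK_SENTINEL)
```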
 | |
| 
 | |
| Initialization
 | |
| --------------
 | |
| 
 | |
| It stands to reason that a failed over Cinder volume service may be restarted,
 | |
| so there needs to be a way for a driver to know on start which storage device
 | |
| should be used to access the resources.
 | |
| 
 | |
| So, to let drivers know which storage device they should use the manager passes
 | |
| drivers the ``active_backend_id`` argument to their ``__init__`` method during
 | |
| the initialization phase of the driver.  Default value is ``None`` when the
 | |
| default (primary) storage device should be used.
 | |
| 
 | |
| Drivers should store this value if they will need it, as the base driver is not
 | |
| storing it, for example to determine the current storage device when a failover
 | |
| is requested and we are already in a failover state, as mentioned above.
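
Storing the value could be as simple as the following sketch.  ``BaseDriverStandIn`` replaces the real base driver so the example runs on its own, and the attribute and property names are assumptions.

```python
class BaseDriverStandIn(object):
    """Stand-in for the base driver class, which does not store the id."""

    def __init__(self, *args, **kwargs):
        pass


class ExampleDriver(BaseDriverStandIn):
    def __init__(self, *args, **kwargs):
        # None means the default (primary) storage device is in use.
        self._active_backend_id = kwargs.get('active_backend_id')
        super(ExampleDriver, self).__init__(*args, **kwargs)

    @property
    def failed_over(self):
        """True when the service was started in a failed-over state."""
        return self._active_backend_id is not None


primary = ExampleDriver()
secondary = ExampleDriver(active_backend_id='vendor-id-1')
```

With the id stored, a later failover request can exclude the target that is already active.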
 | |
| 
 | |
| Freeze / Thaw
 | |
| -------------
 | |
| 
 | |
| In many cases, after a failover has been completed we'll want to allow changes
 | |
| to the data in the volumes as well as some operations like attach and detach
 | |
| while other operations that modify the number of existing resources, like
 | |
| delete or create, are not allowed.
 | |
| 
 | |
| And that is where the freezing mechanism comes in; freezing a backend puts the
 | |
| control plane of the specific Cinder volume service into a read only state, or
 | |
| at least most of it, while allowing the data plane to proceed as usual.
 | |
| 
 | |
| While this will mostly be handled by the Cinder core code, drivers are informed
 | |
| when the freezing mechanism is enabled or disabled via these 2 calls::
 | |
| 
 | |
|     freeze_backend(self, context)
 | |
|     thaw_backend(self, context)
 | |
| 
 | |
| In most cases the driver may not need to do anything, and then it doesn't need
 | |
| to define any of these methods as long as it's a child class of the ``BaseVD``
 | |
| class that already implements them as noops.
 | |
| 
 | |
| Raising a ``VolumeDriverException`` exception in any of these methods will result
 | |
| in a 500 status code response being returned to the caller and the manager will
 | |
| not log the exception, so it's up to the driver to log the error if it is
 | |
| appropriate.
 | |
| 
 | |
| If the driver wants to give a more meaningful error response, then it can raise
 | |
| other exceptions that have different status codes.
 | |
| 
 | |
| When creating the ``freeze_backend`` and ``thaw_backend`` driver methods we must
 | |
| remember that this is a Cloud Administrator operation, so we can return errors
 | |
| that reveal internals of the cloud, for example the type of storage device, and
 | |
| we must use the appropriate internationalization translation methods when
 | |
| raising exceptions; for ``VolumeDriverException`` no translation is necessary
 | |
| since the manager doesn't log it or return it to the user in any way, but any
 | |
| other exception should use the ``_()`` translation method since it will be
 | |
| returned to the REST API caller.
 | |
| 
 | |
| For example, if a storage device doesn't support the thaw operation when failed
 | |
| over, then it should raise an ``Invalid`` exception::
 | |
| 
 | |
|     def thaw_backend(self, context):
 | |
|         if self.failed_over:
 | |
|             msg = _('Thaw is not supported by driver XYZ.')
 | |
|             raise exception.Invalid(msg)
 |