[cinder] Ceph, active-active and no coordination


On Tue, 17 Nov 2020 at 20:27, Radosław Piliszek
<radoslaw.piliszek at gmail.com> wrote:
>
> Dear Cinder Masters,
>
> I have a question for you. (or two, or several; well, actually the
> whole Kolla team has :-) )

Thanks for kicking off this thread, Radek.

>
> The background is that Kolla has been happily deploying multinode
> cinder-volume with Ceph RBD backend, with no coordination configured,
> cluster parameter unset, host properly set per host and backend_host
> normalised (as well as any other relevant config) between the
> cinder-volume hosts.
>
> The first question is: do we correctly understand that this was an
> active-active deployment? Or really something else?
>
> Now, there have been no reports that it misbehaved for anyone. It
> certainly has not for any Kolla core. It was brought to our attention
> because, after Kolla dropped its own Ceph deployment, the
> recommendation to set backend_host was no longer present and users
> tripped over non-uniform backend_host values. This is expected, of course.

Here is the bug report [1]. It relates to using an externally deployed
Ceph cluster, rather than one deployed via Kolla Ansible.

To provide a little more background, in Train and earlier releases we
documented setting backend_host. From Ussuri, we automated more of the
Ceph configuration, and in the process dropped backend_host. It's not
clear why.
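For illustration only, the layout Radek describes (host differing per node, backend_host identical everywhere, no cluster option, no [coordination] section) can be sketched as a small Python script that renders one cinder.conf per cinder-volume host. The option names are Cinder's; the hostnames, the backend section name "rbd-1", the pool "volumes" and the backend_host value "rbd:volumes" are hypothetical placeholders, not Kolla's actual templates.

```python
import configparser
import io

# Hypothetical values, for illustration only.
HOSTS = ["controller1", "controller2"]
BACKEND = "rbd-1"
SHARED_BACKEND_HOST = "rbd:volumes"


def render_cinder_conf(hostname: str) -> str:
    """Render a sketch of cinder.conf for one cinder-volume host."""
    conf = configparser.ConfigParser()
    # Per-node service identity: differs between hosts.
    # Deliberately no 'cluster' option and no [coordination] section.
    conf["DEFAULT"] = {
        "host": hostname,
        "enabled_backends": BACKEND,
    }
    conf[BACKEND] = {
        "volume_driver": "cinder.volume.drivers.rbd.RBDDriver",
        "volume_backend_name": BACKEND,
        "rbd_pool": "volumes",
        # Identical on every node, so all services share one owner name.
        "backend_host": SHARED_BACKEND_HOST,
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    for h in HOSTS:
        print(f"# --- {h} ---")
        print(render_cinder_conf(h))
```

The point of the sketch is the contrast between the two keys: host is per node, while backend_host must match on every node sharing the Ceph pool.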

Users upgrading from Train to Ussuri who dropped their custom Cinder
config in favour of the Kolla automation would lose backend_host, and
their existing volumes would therefore become unmanageable. A manual
step is required to move them to one of the cinder-volume hosts.
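A minimal sketch of why this happens, assuming the usual Cinder "host@backend#pool" format for a volume's host field and that backend_host, when set, replaces the node's own host name in the service name. The helper names (service_host, volume_host, is_managed_by, rehome) and the values are hypothetical and only model the string matching; this is not Cinder code.

```python
from typing import Optional


def service_host(conf_host: str, backend: str, backend_host: Optional[str] = None) -> str:
    # backend_host, if set, overrides the per-node host name.
    return f"{backend_host or conf_host}@{backend}"


def volume_host(service: str, pool: str) -> str:
    # A volume records the service that owns it, plus a pool suffix.
    return f"{service}#{pool}"


def is_managed_by(vol_host: str, service: str) -> bool:
    # Ignore the pool part and compare the owning service name.
    return vol_host.split("#", 1)[0] == service


def rehome(vol_host: str, new_service: str) -> str:
    # Model of the manual step: point the volume at a new owner, keep the pool.
    pool = vol_host.split("#", 1)[1]
    return volume_host(new_service, pool)


if __name__ == "__main__":
    # Train: created under the shared backend_host.
    vol = volume_host(service_host("controller1", "rbd-1", "rbd:volumes"), "rbd-1")
    # Ussuri: backend_host dropped, so each node advertises its own name.
    new_svc = service_host("controller1", "rbd-1")
    print(vol, is_managed_by(vol, new_svc))
    print(rehome(vol, new_svc))
```

With a uniform backend_host, any node's service name matches the volume; once it is dropped, no running service matches until the volume is re-homed.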

That bug caused us to question the active/active setup, especially
after finding a related OSA bug [2].

I can't find any Cinder admin guide for active/active configuration,
although there is a high-level spec [3] (with linked sub-specs) and
some contributor docs [4] that outline the various problems.

[1] https://bugs.launchpad.net/kolla-ansible/+bug/1904062
[2] https://bugs.launchpad.net/openstack-ansible/+bug/1837403
[3] https://specs.openstack.org/openstack/cinder-specs/specs/mitaka/cinder-volume-active-active-support.html
[4] https://docs.openstack.org/cinder/latest/contributor/high_availability.html

>
> The second and final question, building on the first one, is: were
> we doing it wrong all along?
> (plus extras: Why did it work? Were there any quirks? What should we do?)
>
> PS: Please let me know if this thought process is actually
> Ceph-independent as well.
>
> -yoctozepto
>