Add round-robin candidate generation strategy
The previous patch introduced the [placement]max_allocation_candidates config
option to limit the number of candidates generated for a single query. If the
number of generated allocation candidates is limited by that config option,
then it is possible to get candidates from a limited set of root providers
(computes, anchoring providers), as placement uses a depth-first strategy,
generating all candidates from the first root before considering the next one.

To avoid unbalanced results this patch introduces a new config option,
[placement]allocation_candidates_generation_strategy, with the possible
values:

* depth-first, the original strategy that generates all candidates from the
  first root before moving to the next. This will be the default strategy for
  backward compatibility.

* breadth-first, a new strategy that generates candidates from the available
  roots in a round-robin fashion, taking one candidate from each root before
  taking the second candidate from the first root.

Closes-Bug: #2070257
Change-Id: Ib7a140374bc91cc9ab597d0923b0623f618ec32c
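To make the difference concrete, here is a minimal standalone sketch (not placement code; the per-root candidate generators and names are invented for illustration, and the roundrobin helper is the itertools-recipes version that this patch adds to placement.util):

import itertools


def roundrobin(*iterables):
    # Round-robin recipe from the Python itertools documentation; the patch
    # below adds an equivalent helper as placement.util.roundrobin.
    iterators = map(iter, iterables)
    for num_active in range(len(iterables), 0, -1):
        iterators = itertools.cycle(itertools.islice(iterators, num_active))
        yield from map(next, iterators)


# One lazy candidate generator per root provider (names are made up).
root1 = iter(["r1-c1", "r1-c2", "r1-c3"])
root2 = iter(["r2-c1", "r2-c2"])

# depth-first: exhaust the first root before touching the next one.
print(list(itertools.chain(root1, root2)))
# ['r1-c1', 'r1-c2', 'r1-c3', 'r2-c1', 'r2-c2']

# breadth-first: interleave the roots, so cutting the stream off early still
# leaves every root represented.
root1 = iter(["r1-c1", "r1-c2", "r1-c3"])
root2 = iter(["r2-c1", "r2-c2"])
print(list(roundrobin(root1, root2)))
# ['r1-c1', 'r2-c1', 'r1-c2', 'r2-c2', 'r1-c3']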
@@ -84,6 +84,38 @@ under the same root having inventory from the same resource class
 to tune this config option based on the memory available for the
 placement service and the client timeout setting on the client side. A good
 initial value could be around 100000.
+
+In a deployment with wide and symmetric provider trees we also recommend to
+change the [placement]allocation_candidates_generation_strategy to
+breadth-first.
+"""),
+    cfg.StrOpt(
+        'allocation_candidates_generation_strategy',
+        default="depth-first",
+        choices=("depth-first", "breadth-first"),
+        help="""
+Defines the order placement visits viable root providers during allocation
+candidate generation:
+
+* depth-first, generates all candidates from the first viable root provider
+  before moving to the next.
+
+* breadth-first, generates candidates from viable roots in a round-robin
+  fashion, creating one candidate from each viable root before creating the
+  second candidate from the first root.
+
+If the deployment has wide and symmetric provider trees, i.e. there are
+multiple children providers under the same root having inventory from the same
+resource class (e.g. in case of nova's mdev GPU or PCI in Placement features)
+then the depth-first strategy with a max_allocation_candidates
+limit might produce candidates from a limited set of root providers. On the
+other hand breadth-first strategy will ensure that the candidates are returned
+from all viable roots in a balanced way.
+
+Both strategies produce the candidates in the API response in an undefined but
+deterministic order. That is, all things being equal, two requests for
+allocation candidates will return the same results in the same order; but no
+guarantees are made as to how that order is determined.
 """),
 ]
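For operators, both options live in the [placement] section of the service configuration. An illustrative placement.conf fragment matching the recommendation in the help text above (the 100000 value is only the starting point suggested by the previous patch, not a required setting):

[placement]
max_allocation_candidates = 100000
allocation_candidates_generation_strategy = breadth-first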
@@ -24,6 +24,7 @@ from placement import exception
 from placement.objects import research_context as res_ctx
 from placement.objects import resource_provider as rp_obj
 from placement.objects import trait as trait_obj
+from placement import util


 _ALLOC_TBL = models.Allocation.__table__
@@ -718,9 +719,21 @@ def _get_areq_list_generators(areq_lists_by_anchor, all_suffixes):
     ]


-def _generate_areq_lists(areq_lists_by_anchor, all_suffixes):
+def _generate_areq_lists(rw_ctx, areq_lists_by_anchor, all_suffixes):
+    strategy = (
+        rw_ctx.config.placement.allocation_candidates_generation_strategy)
     generators = _get_areq_list_generators(areq_lists_by_anchor, all_suffixes)
-    return itertools.chain(*generators)
+    if strategy == "depth-first":
+        # Generates all solutions from the first anchor before moving to the
+        # next.
+        return itertools.chain(*generators)
+    if strategy == "breadth-first":
+        # Generates solutions from anchors in a round-robin manner, so the
+        # number of generated solutions is balanced between the viable
+        # anchors.
+        return util.roundrobin(*generators)
+
+    raise ValueError("Strategy '%s' not recognized" % strategy)


 # TODO(efried): Move _merge_candidates to rw_ctx?
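Why the way the per-root generators are combined matters: _generate_areq_lists returns a lazy iterator either way, and the max_allocation_candidates cap introduced by the previous patch is presumably enforced while _merge_candidates iterates it, so generation can stop early. A minimal sketch with invented generators showing why an early cut-off over a depth-first chain only ever sees the first root:

import itertools


def candidates_for_root(root, count):
    # Stand-in for one per-root areq-list generator; purely illustrative.
    for i in range(count):
        yield "%s-candidate-%d" % (root, i)


generators = [candidates_for_root("root1", 100000),
              candidates_for_root("root2", 100000)]

# Taking only the first five items never builds the remaining candidates.
print(list(itertools.islice(itertools.chain(*generators), 5)))
# ['root1-candidate-0', 'root1-candidate-1', 'root1-candidate-2',
#  'root1-candidate-3', 'root1-candidate-4']
# Every early item comes from root1; util.roundrobin(*generators) is equally
# lazy but would interleave root1 and root2 items instead.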
@@ -769,7 +782,9 @@ def _merge_candidates(candidates, rw_ctx):
     all_suffixes = set(candidates)
     num_granular_groups = len(all_suffixes - set(['']))
     max_a_c = rw_ctx.config.placement.max_allocation_candidates
-    for areq_list in _generate_areq_lists(areq_lists_by_anchor, all_suffixes):
+    for areq_list in _generate_areq_lists(
+        rw_ctx, areq_lists_by_anchor, all_suffixes
+    ):
         # At this point, each AllocationRequest in areq_list is still
         # marked as use_same_provider. This is necessary to filter by group
         # policy, which enforces how these interact with each other.
@@ -32,6 +32,9 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):

         self.conf_fixture.conf.set_override(
             "max_allocation_candidates", 100000, group="placement")
+        self.conf_fixture.conf.set_override(
+            "allocation_candidates_generation_strategy", "breadth-first",
+            group="placement")

     def create_tree(self, num_roots, num_child, num_res_per_child):
         self.roots = {}
@@ -108,11 +111,14 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):
             expected_candidates=1000, expected_computes_with_candidates=2)

     def test_too_many_candidates_global_limit_is_hit_result_unbalanced(self):
+        self.conf_fixture.conf.set_override(
+            "allocation_candidates_generation_strategy", "depth-first",
+            group="placement")
         # With max_allocation_candidates set to 100k limit this test now
         # runs in reasonable time (10 sec on my machine), without that it would
         # time out.
-        # However, with the global limit in place only the first compute gets
-        # candidates.
+        # However, with depth-first strategy and with the global limit in place
+        # only the first compute gets candidates.
         # 524288 valid candidates, the generation stops at 100k candidates,
         # only 1000 is returned, result is unbalanced as the first 100k
         # candidate is always from the first compute.
@@ -121,6 +127,21 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):
             req_limit=1000,
             expected_candidates=1000, expected_computes_with_candidates=1)

+    def test_too_many_candidates_global_limit_is_hit_breadth_first_balanced(
+        self
+    ):
+        # With max_allocation_candidates set to 100k limit this test now
+        # runs in reasonable time (10 sec on my machine), without that it would
+        # time out.
+        # With the round-robin candidate generator in place the 100k generated
+        # candidates spread across both computes now.
+        # 524288 valid candidates, the generation stops at 100k candidates,
+        # only 1000 is returned, result is balanced between the computes.
+        self._test_num_candidates_and_computes(
+            computes=2, pfs=8, vfs_per_pf=8, req_groups=6, req_res_per_group=1,
+            req_limit=1000,
+            expected_candidates=1000, expected_computes_with_candidates=2)
+
     def test_global_limit_hit(self):
         # 8192 possible candidates, global limit is set to 8000, higher request
         # limit so number of candidates are limited by the global limit
@@ -140,3 +161,30 @@ class TestWideTreeAllocationCandidateExplosion(base.TestCase):
             computes=2, pfs=8, vfs_per_pf=8, req_groups=4, req_res_per_group=1,
             req_limit=9000,
             expected_candidates=8192, expected_computes_with_candidates=2)
+
+    def test_breadth_first_strategy_generates_stable_ordering(self):
+        """Run the same query twice against the same two trees and assert that
+        the response text is exactly the same, proving that even with the
+        breadth-first strategy the candidate ordering is stable.
+        """
+
+        self.create_tree(num_roots=2, num_child=8, num_res_per_child=8)
+
+        def query():
+            return client.get(
+                self.get_candidate_query(
+                    num_groups=2, num_res=1,
+                    limit=1000),
+                headers=self.headers)
+
+        conf = self.conf_fixture.conf
+        with direct.PlacementDirect(conf) as client:
+            resp = query()
+            self.assertEqual(200, resp.status_code)
+            body1 = resp.text
+
+            resp = query()
+            self.assertEqual(200, resp.status_code)
+            body2 = resp.text
+
+            self.assertEqual(body1, body2)
@@ -97,3 +97,60 @@ class TestAllocationCandidatesNoDB(base.TestCase):
         for group in different_subtree:
             self.assertFalse(
                 ac_obj._check_same_subtree(group, parent_by_rp))

+    @mock.patch('placement.objects.research_context._has_provider_trees',
+                new=mock.Mock(return_value=True))
+    def _test_generate_areq_list(self, strategy, expected_candidates):
+        self.conf_fixture.conf.set_override(
+            "allocation_candidates_generation_strategy", strategy,
+            group="placement")
+
+        rw_ctx = res_ctx.RequestWideSearchContext(
+            self.context, placement_lib.RequestWideParams(), True)
+        areq_lists_by_anchor = {
+            "root1": {
+                "": ["r1A", "r1B", ],
+                "group1": ["r1g1A", "r1g1B", ],
+            },
+            "root2": {
+                "": ["r2A"],
+                "group1": ["r2g1A", "r2g1B"],
+            },
+            "root3": {
+                "": ["r3A"],
+            },
+        }
+        generator = ac_obj._generate_areq_lists(
+            rw_ctx, areq_lists_by_anchor, {"", "group1"})
+
+        self.assertEqual(expected_candidates, list(generator))
+
+    def test_generate_areq_lists_depth_first(self):
+        # Depth-first will generate all root1 candidates first then root2,
+        # root3 is ignored as it has no candidate for group1.
+        expected_candidates = [
+            ('r1A', 'r1g1A'),
+            ('r1A', 'r1g1B'),
+            ('r1B', 'r1g1A'),
+            ('r1B', 'r1g1B'),
+            ('r2A', 'r2g1A'),
+            ('r2A', 'r2g1B'),
+        ]
+        self._test_generate_areq_list("depth-first", expected_candidates)
+
+    @mock.patch('placement.objects.research_context._has_provider_trees',
+                new=mock.Mock(return_value=True))
+    def test_generate_areq_lists_breadth_first(self):
+        # Breadth-first will take one candidate from root1 then root2 then goes
+        # back to root1 etc. Root2 runs out of candidates earlier than root1 so
+        # the last two candidates are both from root1. The root3 is still
+        # ignored as it has no candidates for group1.
+        expected_candidates = [
+            ('r1A', 'r1g1A'),
+            ('r2A', 'r2g1A'),
+            ('r1A', 'r1g1B'),
+            ('r2A', 'r2g1B'),
+            ('r1B', 'r1g1A'),
+            ('r1B', 'r1g1B')
+        ]
+        self._test_generate_areq_list("breadth-first", expected_candidates)
@@ -32,6 +32,7 @@ from placement.objects import resource_class as rc_obj
 from placement.objects import resource_provider as rp_obj
 from placement.tests.unit import base
 from placement import util
+from placement.util import roundrobin


 class TestCheckAccept(testtools.TestCase):
@@ -1450,3 +1451,30 @@ class RunOnceTests(testtools.TestCase):
         self.assertRaises(ValueError, f.reset)
         self.assertFalse(f.called)
         mock_clean.assert_called_once_with()
+
+
+class RoundRobinTests(testtools.TestCase):
+    def test_no_input(self):
+        self.assertEqual([], list(roundrobin()))
+
+    def test_single_input(self):
+        self.assertEqual([1, 2], list(roundrobin(iter([1, 2]))))
+
+    def test_balanced_inputs(self):
+        self.assertEqual(
+            [1, "x", 2, "y"],
+            list(roundrobin(
+                iter([1, 2]),
+                iter(["x", "y"]))
+            )
+        )
+
+    def test_unbalanced_inputs(self):
+        self.assertEqual(
+            ["A", "D", "E", "B", "F", "C"],
+            list(roundrobin(
+                iter("ABC"),
+                iter("D"),
+                iter("EF"))
+            )
+        )
@@ -614,3 +614,16 @@ def run_once(message, logger, cleanup=None):
         wrapper.reset = functools.partial(reset, wrapper)
         return wrapper
     return outer_wrapper


+def roundrobin(*iterables):
+    """roundrobin(iter('ABC'), iter('D'), iter('EF')) --> A D E B F C
+
+    Returns a new generator consuming items from the passed in iterators in a
+    round-robin fashion.
+
+    It is adapted from
+    https://docs.python.org/3/library/itertools.html#itertools-recipes
+    """
+    iterators = map(iter, iterables)
+    for num_active in range(len(iterables), 0, -1):
+        iterators = itertools.cycle(itertools.islice(iterators, num_active))
+        yield from map(next, iterators)
@@ -0,0 +1,63 @@
+---
+fixes:
+  - |
+    In a deployment with wide and symmetric provider trees, i.e. where there
+    are multiple child providers under the same root having inventory from
+    the same resource class (e.g. in case of nova's mdev GPU or PCI in
+    Placement features), if the allocation candidate request asks for
+    resources from those child RPs in multiple request groups, the number of
+    possible allocation candidates grows rapidly. E.g.:
+
+    * 1 root, 8 child RPs with 1 unit of resource each;
+      a_c requests 6 groups with 1 unit of resource each
+      => 8*7*6*5*4*3=20160 possible candidates
+
+    * 1 root, 8 child RPs with 6 units of resource each;
+      a_c requests 6 groups with 1 unit of resource each
+      => 8^6=262144 possible candidates
+
+    Placement generates these candidates fully before applying the limit
+    parameter provided in the allocation candidate query, to be able to do a
+    random sampling if ``[placement]randomize_allocation_candidates`` is True.
+
+    Placement takes excessive time and memory to generate this amount of
+    allocation candidates, and the client might time out waiting for the
+    response, or the Placement API service might run out of memory and crash.
+
+    To avoid request timeouts or out of memory events a new
+    ``[placement]max_allocation_candidates`` config option is implemented.
+    This limit is applied not after the request's limit but *during* the
+    candidate generation process. So this new option can be used to limit the
+    runtime and memory consumption of the Placement API service.
+
+    The new config option defaults to ``-1``, meaning no limit, to keep the
+    legacy behavior. We suggest tuning this config option in the affected
+    deployments based on the memory available for the Placement service and
+    the timeout setting of the clients. A good initial value could be around
+    ``100000``.
+
+    If the number of generated allocation candidates is limited by the
+    ``[placement]max_allocation_candidates`` config option then it is possible
+    to get candidates from a limited set of root providers (e.g. compute
+    nodes), as placement uses a depth-first strategy, i.e. generating all
+    candidates from the first root before considering the next one. To avoid
+    this issue a new config option
+    ``[placement]allocation_candidates_generation_strategy`` is introduced
+    with two possible values:
+
+    * ``depth-first``, generates all candidates from the first viable root
+      provider before moving to the next. This is the default and it triggers
+      the old behavior.
+
+    * ``breadth-first``, generates candidates from viable roots in a
+      round-robin fashion, creating one candidate from each viable root
+      before creating the second candidate from the first root. This is the
+      new behavior.
+
+    In a deployment where ``[placement]max_allocation_candidates`` is
+    configured to a positive number we recommend setting
+    ``[placement]allocation_candidates_generation_strategy`` to
+    ``breadth-first``.
+
+    .. _Bug#2070257: https://bugs.launchpad.net/nova/+bug/2070257
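As a quick sanity check of the candidate counts quoted in the release note above (plain arithmetic, not placement code):

import math

# 8 children with 1 unit each, 6 groups asking for 1 unit: the groups must
# all land on different children, i.e. ordered selections of 6 out of 8.
print(math.perm(8, 6))  # 8*7*6*5*4*3 = 20160

# 8 children with 6 units each, 6 groups asking for 1 unit: each group can
# pick any of the 8 children independently.
print(8 ** 6)           # 262144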