This post looks at BackoffThrottle, the dynamic throttle introduced in Ceph Jewel, and then uses that analysis to tune the throttle settings of the Ceph filestore and journal.
References:
http://blog.wjin.org/posts/ceph-dynamic-throttle.html
https://fossies.org/linux/ceph/src/doc/dynamic-throttle.txt
BackoffThrottle
Jewel introduced a dynamic throttle, BackoffThrottle in the code. Both the filestore and the journal now use it for throttling:
class FileStore {
  BackoffThrottle throttle_ops, throttle_bytes;
}
class JournalThrottle {
  BackoffThrottle throttle;
  …
}
BackoffThrottle and its parameters are defined as follows:
/**
* BackoffThrottle
*
* Creates a throttle which gradually induces delays when get() is called
* based on params low_threshhold, high_threshhold, expected_throughput,
* high_multiple, and max_multiple.
*
* In [0, low_threshhold), we want no delay.
*
* In [low_threshhold, high_threshhold), delays should be injected based
* on a line from 0 at low_threshhold to
* high_multiple * (1/expected_throughput) at high_threshhold.
*
* In [high_threshhold, 1), we want delays injected based on a line from
* (high_multiple * (1/expected_throughput)) at high_threshhold to
* (high_multiple * (1/expected_throughput)) +
* (max_multiple * (1/expected_throughput)) at 1.
*
* Let the current throttle ratio (current/max) be r, low_threshhold be l,
* high_threshhold be h, high_delay (high_multiple / expected_throughput) be e,
* and max_delay (max_muliple / expected_throughput) be m.
*
* delay = 0, r \in [0, l)
* delay = (r - l) * (e / (h - l)), r \in [l, h)
* delay = h + (r - h)((m - e)/(1 - h))
*/
class BackoffThrottle {
  …
  /// see above, values are in [0, 1].
  double low_threshhold = 0;
  double high_threshhold = 1;
  /// see above, values are in seconds
  double high_delay_per_count = 0;
  double max_delay_per_count = 0;
  /// Filled in in set_params
  double s0 = 0; ///< e / (h - l), l != h, 0 otherwise
  double s1 = 0; ///< (m - e)/(1 - h), 1 != h, 0 otherwise
  /// max
  uint64_t max = 0;
  uint64_t current = 0;
  …
}
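For reference, the delay described in the comment can be written as one piecewise function. This is a restatement rather than new material, using the comment's own symbols: r = current/max, l and h the two thresholds, e = high_multiple / expected_throughput and m = max_multiple / expected_throughput. Note that the last line of the quoted comment writes `h` as the leading term, whereas `_get_delay` (quoted later in this post) uses `high_delay_per_count`, i.e. e; the form below follows the code, which is also the only choice that keeps the curve continuous at r = h.

```latex
\[
\text{delay}(r) =
\begin{cases}
0, & r \in [0, l) \\[4pt]
(r - l)\,\dfrac{e}{h - l}, & r \in [l, h) \\[8pt]
e + (r - h)\,\dfrac{m - e}{1 - h}, & r \in [h, 1]
\end{cases}
\qquad
s_0 = \frac{e}{h - l},\quad s_1 = \frac{m - e}{1 - h}
\]
```

This is the delay per count; a request of size c waits c times this value.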
Example: the filestore throttle
The filestore throttle is built on BackoffThrottle, so it serves as the example for analyzing the parameter settings.
Configuration options of the filestore throttle
OPTION(filestore_expected_throughput_bytes, OPT_DOUBLE, 200 << 20)
OPTION(filestore_expected_throughput_ops, OPT_DOUBLE, 200)
OPTION(filestore_queue_max_bytes, OPT_U64, 100 << 20)
OPTION(filestore_queue_max_ops, OPT_U64, 50)
OPTION(filestore_queue_max_delay_multiple, OPT_DOUBLE, 0)
OPTION(filestore_queue_high_delay_multiple, OPT_DOUBLE, 0)
OPTION(filestore_queue_low_threshhold, OPT_DOUBLE, 0.3)
OPTION(filestore_queue_high_threshhold, OPT_DOUBLE, 0.9)
Initializing BackoffThrottle from the configuration options
bool BackoffThrottle::set_params(
  double _low_threshhold,
  double _high_threshhold,
  double _expected_throughput,
  double _high_multiple,
  double _max_multiple,
  uint64_t _throttle_max,
  ostream *errstream)
{
  low_threshhold = _low_threshhold;
  high_threshhold = _high_threshhold;
  // per-count delays in seconds: multiple / expected throughput
  high_delay_per_count = _high_multiple / _expected_throughput;
  max_delay_per_count = _max_multiple / _expected_throughput;
  max = _throttle_max;
  // slope of the first ramp, [low_threshhold, high_threshhold)
  if (high_threshhold - low_threshhold > 0) {
    s0 = high_delay_per_count / (high_threshhold - low_threshhold);
  } else {
    low_threshhold = high_threshhold;
    s0 = 0;
  }
  // slope of the second ramp, [high_threshhold, 1]
  if (1 - high_threshhold > 0) {
    s1 = (max_delay_per_count - high_delay_per_count)
      / (1 - high_threshhold);
  } else {
    high_threshhold = 1;
    s1 = 0;
  }
  return true;  // this excerpt leaves out the argument validation that can return false
}
int FileStore::set_throttle_params()
{
  stringstream ss;
  bool valid = throttle_bytes.set_params(
    g_conf->filestore_queue_low_threshhold,
    g_conf->filestore_queue_high_threshhold,
    g_conf->filestore_expected_throughput_bytes,
    g_conf->filestore_queue_high_delay_multiple,
    g_conf->filestore_queue_max_delay_multiple,
    g_conf->filestore_queue_max_bytes,
    &ss);
  valid &= throttle_ops.set_params(
    g_conf->filestore_queue_low_threshhold,
    g_conf->filestore_queue_high_threshhold,
    g_conf->filestore_expected_throughput_ops,
    g_conf->filestore_queue_high_delay_multiple,
    g_conf->filestore_queue_max_delay_multiple,
    g_conf->filestore_queue_max_ops,
    &ss);
  …
}
Computing the delay
std::chrono::duration<double> BackoffThrottle::_get_delay(uint64_t c) const
{
  if (max == 0)
    return std::chrono::duration<double>(0);
  double r = ((double)current) / ((double)max);
  if (r < low_threshhold) {
    return std::chrono::duration<double>(0);
  } else if (r < high_threshhold) {
    return c * std::chrono::duration<double>(
      (r - low_threshhold) * s0);
  } else {
    return c * std::chrono::duration<double>(
      high_delay_per_count + ((r - high_threshhold) * s1));
  }
}
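As the function shows, the delay is computed in four cases, with r = current/max:
1) max = 0: always returns 0
2) r < low_threshhold: returns 0
3) low_threshhold <= r < high_threshhold: returns c * (r - low_threshhold) * s0
4) high_threshhold <= r: returns c * (high_delay_per_count + (r - high_threshhold) * s1)
As the figure shows, with x = current/max on the horizontal axis: in the first interval, i.e. under light load, the delay is 0 and no wait is needed. As load grows and x enters the second interval, the delay kicks in and rises gradually. Under heavy load x enters the third interval, where the delay rises noticeably faster and the wait gets much longer, slowing I/O down as far as possible to relieve the pressure. Hence the name dynamic throttle.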
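The three intervals are easy to try out in isolation. The following is a minimal standalone sketch, not the Ceph class: `ThrottleModel`, `make_model` and `delay_seconds` are names made up for illustration, locking and `std::chrono` are left out, and delays are returned as plain seconds. It mirrors the slope computation of `set_params` and the branch structure of `_get_delay`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the throttle state; not the Ceph class itself.
struct ThrottleModel {
  double low = 0, high = 1;  // thresholds as ratios in [0, 1]
  double high_delay = 0;     // seconds per count at r == high
  double max_delay = 0;      // seconds per count at r == 1
  double s0 = 0, s1 = 0;     // slopes of the two ramps
  uint64_t max = 0;          // queue capacity (bytes or ops)
};

// Derives per-count delays and slopes the same way set_params does.
ThrottleModel make_model(double low, double high, double expected_throughput,
                         double high_multiple, double max_multiple,
                         uint64_t max) {
  assert(expected_throughput > 0 && low <= high && high <= 1);
  ThrottleModel m;
  m.low = low;
  m.high = high;
  m.max = max;
  m.high_delay = high_multiple / expected_throughput;
  m.max_delay = max_multiple / expected_throughput;
  m.s0 = (high > low) ? m.high_delay / (high - low) : 0;
  m.s1 = (high < 1) ? (m.max_delay - m.high_delay) / (1 - high) : 0;
  return m;
}

// Delay in seconds for a request of size c when `current` is already queued.
double delay_seconds(const ThrottleModel& m, uint64_t current, uint64_t c) {
  if (m.max == 0) return 0;
  double r = static_cast<double>(current) / static_cast<double>(m.max);
  if (r < m.low) return 0;                        // first interval: no delay
  if (r < m.high) return c * (r - m.low) * m.s0;  // second interval: gentle ramp
  return c * (m.high_delay + (r - m.high) * m.s1);  // third interval: steep ramp
}
```

With the ops defaults (thresholds 0.3 and 0.9, expected throughput 200, queue max scaled to 100 for round numbers) and multiples 2 and 10, one op waits nothing at 10 queued, about 0.01 s at 90 queued and about 0.05 s at 100 queued.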
The filestore throttle with default settings
The filestore has two throttles, bytes and ops; the bytes throttle is analyzed here.
By default, filestore_queue_high_delay_multiple = 0 and filestore_queue_max_delay_multiple = 0.
The values inside BackoffThrottle are therefore:
low_threshhold = 0.3
high_threshhold = 0.9
high_delay_per_count = 0
max_delay_per_count = 0
s0 = 0
s1 = 0
max = 100 << 20
So with the default configuration the dynamic delay is switched off.
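Spelled out with the symbols of the BackoffThrottle comment (T stands for the expected throughput, a letter introduced here only for brevity), the default multiples make every term of the delay vanish:

```latex
\[
e = \frac{0}{T} = 0,\qquad
m = \frac{0}{T} = 0,\qquad
s_0 = \frac{e}{0.9 - 0.3} = 0,\qquad
s_1 = \frac{m - e}{1 - 0.9} = 0
\;\Longrightarrow\;
\text{delay} = c \cdot 0 = 0 \quad \text{for every } r \in [0, 1].
\]
```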
Enabling the dynamic throttle
Following the earliest version of the code, set:
filestore_queue_high_delay_multiple = 2
filestore_queue_max_delay_multiple = 10
With everything else left at its default, the values inside BackoffThrottle are:
low_threshhold = 0.3
high_threshhold = 0.9
high_delay_per_count = 2 / (200 << 20)
max_delay_per_count = 10 / (200 << 20)
s0 = (2 / (200 << 20)) / 0.6
s1 = (8 / (200 << 20)) / 0.1
max = 100 << 20
The delay then falls into the following cases, where:
c: op->bytes, the amount of data in one request
current: the amount of data currently in the filestore queue; it starts at 0, and every call to throttle_bytes.get(o->bytes) does { current += c; }
1) current/max < low_threshhold, i.e. current < (30 << 20):
   delay = 0
2) low_threshhold <= current/max < high_threshhold, i.e. (30 << 20) <= current < (90 << 20):
   delay = c * ((current/max - 0.3) * s0)
   a) current = 30 << 20: delay = 0
   b) at the upper bound, current = 90 << 20: delay = c / (100 << 20)
3) high_threshhold <= current/max, i.e. (90 << 20) <= current:
   delay = c * (2/(200 << 20) + (current/max - 0.9) * s1)
   a) current = 90 << 20: delay = c / (100 << 20)
   b) current = 100 << 20: delay = 5 * c / (100 << 20)
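To put numbers on these expressions, take a single 4 MiB request, c = 4 · 2^20 bytes. The request size is an assumption chosen here for illustration; the throughput is in bytes per second, so the results are in seconds.

```latex
\[
\begin{aligned}
r = 0.6:&\quad \text{delay} = c\,(0.6 - 0.3)\,s_0
      = c \cdot 0.3 \cdot \frac{2}{0.6 \cdot 200 \cdot 2^{20}}
      = \frac{c}{200 \cdot 2^{20}} = \frac{4}{200}\,\text{s} = 0.02\,\text{s} \\[4pt]
r = 0.9:&\quad \text{delay} = \frac{c}{100 \cdot 2^{20}} = \frac{4}{100}\,\text{s} = 0.04\,\text{s} \\[4pt]
r = 1.0:&\quad \text{delay} = \frac{5\,c}{100 \cdot 2^{20}} = \frac{20}{100}\,\text{s} = 0.2\,\text{s}
\end{aligned}
\]
```

The delay doubles across the whole second interval (0.02 s to 0.04 s between r = 0.6 and r = 0.9) but grows fivefold over the last tenth of the queue.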
The dynamic throttle under a tuned configuration
The configuration is:
filestore_expected_throughput_bytes = 536870912  // 512M
filestore_queue_max_bytes = 1048576000  // 1000M
filestore_queue_low_threshhold = 0.6
filestore_queue_high_threshhold = 0.9  // default
filestore_queue_high_delay_multiple = 2
filestore_queue_max_delay_multiple = 10
The values inside BackoffThrottle are:
low_threshhold = 0.6
high_threshhold = 0.9
high_delay_per_count = 2 / (512 << 20)
max_delay_per_count = 10 / (512 << 20)
s0 = (2 / (512 << 20)) / 0.3
s1 = (8 / (512 << 20)) / 0.1
max = 1000 << 20
The delay then falls into the following cases:
1) current/max < low_threshhold, i.e. current < (600 << 20):
   delay = 0
2) low_threshhold <= current/max < high_threshhold, i.e. (600 << 20) <= current < (900 << 20):
   delay = c * ((current/max - 0.6) * s0)
   a) current = 600 << 20: delay = 0
   b) at the upper bound, current = 900 << 20: delay = c / (256 << 20)
3) high_threshhold <= current/max, i.e. (900 << 20) <= current:
   delay = c * (2/(512 << 20) + (current/max - 0.9) * s1)
   a) current = 900 << 20: delay = c / (256 << 20)
   b) current = 1000 << 20: delay = 5 * c / (256 << 20)
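Conclusion: this configuration is not very reasonable. The delay stays at 0 until 600M is queued, and after that, as current grows, the delay is smaller than with the default throughput and queue size (c / (256 << 20) versus c / (100 << 20) at the high threshold), which may increase the pressure on the filestore.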
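The comparison behind this conclusion can be reproduced with a few lines of standalone C++. This is only a sketch: `Params`, `delay_s`, `kDefaultSized` and `kTuned` are hypothetical names, the 4 MiB request size in the usage note is an assumption, and the delay is evaluated directly from the piecewise-linear definition rather than through BackoffThrottle.

```cpp
#include <cassert>

// Illustrative parameter bundle; the field names are made up for this sketch.
struct Params {
  double low, high;            // thresholds as ratios
  double throughput;           // expected throughput in bytes/s
  double high_mult, max_mult;  // delay multiples
  double max_bytes;            // queue capacity in bytes
};

constexpr double MiB = 1024.0 * 1024.0;

// Delay in seconds for a c-byte request with `current` bytes already queued.
double delay_s(const Params& p, double current, double c) {
  assert(p.throughput > 0 && p.max_bytes > 0);
  assert(p.low < p.high && p.high < 1.0);  // keeps both divisions well defined
  const double r = current / p.max_bytes;
  const double e = p.high_mult / p.throughput;  // per-byte delay at r == high
  const double m = p.max_mult / p.throughput;   // per-byte delay at r == 1
  if (r < p.low) return 0.0;
  if (r < p.high) return c * e * (r - p.low) / (p.high - p.low);
  return c * (e + (m - e) * (r - p.high) / (1.0 - p.high));
}

// Default throughput and queue size with multiples 2 and 10.
const Params kDefaultSized{0.3, 0.9, 200 * MiB, 2.0, 10.0, 100 * MiB};
// The tuned configuration from this section.
const Params kTuned{0.6, 0.9, 512 * MiB, 2.0, 10.0, 1000 * MiB};
```

With 95 MiB queued, `delay_s(kDefaultSized, 95 * MiB, 4 * MiB)` comes to about 0.12 s, while `delay_s(kTuned, 95 * MiB, 4 * MiB)` is 0 because 95 MiB is under 10% of the 1000 MiB queue. Even with the tuned queue completely full, a 4 MiB request waits only about 0.078 s, against 0.2 s for the default-sized queue when full.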