How the Linux kernel schedules real-time processes

xiaoxiao, 2021-02-28

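[Abstract] [Part 1] Overview of the Linux scheduling system [Part 2] The scheduling process [Summary]

[Abstract] This article describes how the Linux kernel schedules real-time processes. Process creation and context switching are covered in a separate article, "linux系统进程的创建" (process creation on Linux). The article assumes familiarity with the time concepts used by the scheduler; see the post "linux系统调度之时间" (time in Linux scheduling).

[Part 1] Overview of the Linux scheduling system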

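1 Schedulers commonly used in Linux:

Real-time processes (SCHED_RR and SCHED_FIFO) are scheduled from per-priority queues.

Scheduling class for real-time processes: rt_sched_class

Normal processes (SCHED_NORMAL) are scheduled by the CFS scheduler.

Scheduling class for normal processes: fair_sched_class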

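2 Scheduling policy definitions:

#define SCHED_NORMAL 0
#define SCHED_FIFO 1
#define SCHED_RR 2

3 The key points of scheduling:

1) For both real-time and normal processes, what matters is when scheduling happens, how the next process is chosen, and how the underlying structure is maintained: the priority queue for real-time processes, the red-black tree for normal processes.

2) For a real-time process, the next process is taken from the priority queue, so the selection rule and the maintenance of that queue matter most.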

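[Part 2] The scheduling process

1 When scheduling happens.

Marking the current process for rescheduling: this section covers the situations in which the current process is scheduled out.

1> Voluntary scheduling: the current process calls schedule() itself.

Example: msleep() arms a timer and then schedules the caller out (through schedule_timeout). Waiting on a mutex works much like sleeping.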

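2> Involuntary scheduling: the current process is marked for rescheduling from process context or interrupt context. Only the need-resched flag is set at this point; no switch takes place, otherwise control could not return to the interrupt context.

Key code: wake_up_process->resched_task->set_tsk_need_resched

The wake_up family of functions wakes a given process. When a system call or interrupt handler returns, the kernel checks the flag in do_work_pending (the function name on ARM, for example) and calls schedule() if it is set.

Example 1: when msleep() expires, the timer handler calls wake_up_process.

Waking the sleeping process: process_timeout->wake_up_process->try_to_wake_up->ttwu_queue->ttwu_do_activate();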

static void ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags)
{
    /*
     * ->activate_task: activate p by adding its scheduling entity
     * (sched_entity) to the run queue; the need-resched flag is set next.
     */
    ttwu_activate(rq, p, ENQUEUE_WAKEUP | ENQUEUE_WAKING);
    /*
     * ->check_preempt_curr->check_preempt_curr_rt->resched_task:
     * mark the current process with TIF_NEED_RESCHED.
     */
    ttwu_do_wakeup(rq, p, wake_flags);
}

Note that "scheduling" here does not perform a context switch; it only sets the need-resched flag.

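Example 2: a newly created process (do_fork) is woken up as well, through wake_up_new_task:

void wake_up_new_task(struct task_struct *p)
{
    /* excerpt: locking and the lookup of rq are omitted */
    /*
     * wake_up_new_task->activate_task->enqueue_task->enqueue_task_rt->enqueue_rt_entity:
     * activate_task activates p by adding its scheduling entity to the run queue.
     */
    activate_task(rq, p, 0);
    p->on_rq = 1;
    /*
     * ->check_preempt_curr->check_preempt_curr_rt->resched_task:
     * mark the current process with TIF_NEED_RESCHED.
     */
    check_preempt_curr(rq, p, WF_FORK);
}

These two examples show that the wake_up family of functions does two main things: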

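First, it marks the current process as needing to be rescheduled. (For real-time tasks, check_preempt_curr_rt does this only when the woken process has a higher priority than the current one.)

Second, it adds the woken process to the priority queue so that schedule() has a chance to pick it as the next process to run. Note that for real-time processes, being on the priority queue does not guarantee being picked; that also depends on the priority. This differs clearly from CFS, where every process gets a chance to run within one scheduling period and only the share of run time differs, according to the nice value (see the prio_to_weight[] table).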
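The two effects of a wake-up, and the fact that the switch itself is deferred, can be sketched in user space. This is a minimal model, not kernel code: toy_task, toy_rq, toy_wake_up and toy_return_path are hypothetical names, and the run queue is reduced to a single slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for task_struct and rq. */
struct toy_task {
    const char *name;
    int prio;            /* lower value = higher priority, as in the kernel */
    bool need_resched;   /* models TIF_NEED_RESCHED */
    bool on_rq;          /* models p->on_rq */
};

struct toy_rq {
    struct toy_task *curr;    /* task currently on the CPU */
    struct toy_task *queued;  /* one-slot "run queue", enough for the sketch */
    int nr_switches;
};

/* Models wake_up_process(): enqueue p, then only MARK the current task. */
static void toy_wake_up(struct toy_rq *rq, struct toy_task *p)
{
    assert(rq->curr != NULL);
    rq->queued = p;
    p->on_rq = true;
    if (p->prio < rq->curr->prio)
        rq->curr->need_resched = true;  /* no context switch happens here */
}

/* Models the flag check on return from an interrupt or system call. */
static void toy_return_path(struct toy_rq *rq)
{
    struct toy_task *prev = rq->curr;

    if (!prev->need_resched)
        return;
    prev->need_resched = false;
    if (rq->queued != NULL) {
        rq->curr = rq->queued;   /* the "context switch" */
        rq->queued = prev;       /* a preempted task stays runnable */
        rq->nr_switches++;
    }
}
```

Calling toy_wake_up() leaves rq->curr unchanged; only a later toy_return_path() call switches tasks, which mirrors the split between setting the flag and calling schedule().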

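3> When a real-time process uses up its time slice, it is scheduled out. (Time slices apply to SCHED_RR only; SCHED_FIFO tasks have none.)

Key code: task_tick_rt->set_tsk_need_resched

Time-slice accounting is done in the timer tick:

(See update_process_times->scheduler_tick->task_tick_rt.) The discussion of priority queue maintenance in section 4 is also relevant here.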

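4> Setting the need-resched flag:

Key code: set_tsk_need_resched;

The kernel uses set_tsk_need_resched to mark task tsk as needing to be rescheduled. It is usually not called directly but reached through one of the following two paths.

1) resched_task->set_tsk_need_resched

2) task_tick_rt->set_tsk_need_resched

Note: the real-time tick handler calls set_tsk_need_resched directly, whereas for normal processes the flag is set through task_tick_fair->resched_task.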

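2 The schedule() process

The previous section covered when the need-resched flag is set on the current process. Setting the flag does not trigger scheduling immediately. Instead, when a system call or interrupt handler returns, the kernel checks the flag in do_work_pending and calls schedule() if it is set. The key function __schedule is listed next; it is fairly long, but important enough to show in full: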

static void __sched __schedule(void)
{
    struct task_struct *prev, *next;
    unsigned long *switch_count;
    struct rq *rq;
    int cpu;

need_resched:
    preempt_disable();
    cpu = smp_processor_id();
    rq = cpu_rq(cpu);
    rcu_note_context_switch(cpu);
    /* the task currently running on this run queue; it is about to give up the CPU */
    prev = rq->curr;

    /* scheduling while atomic (for example with a spinlock held) triggers a warning */
    schedule_debug(prev);

    if (sched_feat(HRTICK))
        hrtick_clear(rq);

    /*
     * Make sure that signal_pending_state()->signal_pending() below
     * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
     * done by the caller to avoid the race with signal_wake_up().
     */
    smp_mb__before_spinlock();
    raw_spin_lock_irq(&rq->lock);

    /* context-switch counter of prev (voluntary or involuntary); start with involuntary */
    switch_count = &prev->nivcsw;
    /*
     * A task that is not runnable (prev->state != 0) and is not being
     * preempted is switching voluntarily; prev->state == TASK_RUNNING == 0
     * means an involuntary switch. Example:
     * msleep->schedule_timeout_uninterruptible does
     * __set_current_state(TASK_UNINTERRUPTIBLE).
     */
    if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
        if (unlikely(signal_pending_state(prev->state, prev))) {
            prev->state = TASK_RUNNING;
        } else {
            /*
             * On a voluntary switch the scheduling entity is also removed
             * from the active queue; on preemption it stays on the active
             * priority queue.
             */
            deactivate_task(rq, prev, DEQUEUE_SLEEP);
            prev->on_rq = 0;

            /*
             * If a worker went to sleep, notify and ask workqueue
             * whether it wants to wake up a task to maintain
             * concurrency.
             */
            if (prev->flags & PF_WQ_WORKER) {
                struct task_struct *to_wakeup;

                to_wakeup = wq_worker_sleeping(prev, cpu);
                if (to_wakeup)
                    try_to_wake_up_local(to_wakeup);
            }
        }
        switch_count = &prev->nvcsw;
    }

    pre_schedule(rq, prev);

    if (unlikely(!rq->nr_running))
        idle_balance(cpu, rq);

    /*
     * 1 rt_sched_class->put_prev_task_rt:
     *   put_prev_task_rt->update_curr_rt updates the task's run time
     *   and may call resched_task
     * 2 fair_sched_class->put_prev_task_fair
     */
    put_prev_task(rq, prev);
    /*
     * 1 rt_sched_class->pick_next_task_rt:
     *   pick the highest-priority task from the rt_prio_array
     *   (rt_rq->active) and record its start time exec_start
     * 2 fair_sched_class->pick_next_task_fair
     */
    next = pick_next_task(rq);
    clear_tsk_need_resched(prev);
    rq->skip_clock_update = 0;

    if (likely(prev != next)) {
        rq->nr_switches++;
        /* rq->curr is the task currently running; it is updated here */
        rq->curr = next;
        ++*switch_count;

        /* perform the actual context switch */
        context_switch(rq, prev, next); /* unlocks the rq */
        /*
         * The context switch have flipped the stack from under us
         * and restored the local variables which were saved when
         * this task called schedule() in the past. prev == current
         * is still correct, but it can be moved to another cpu/rq.
         */
        cpu = smp_processor_id();
        rq = cpu_rq(cpu);
    } else
        raw_spin_unlock_irq(&rq->lock);

    post_schedule(rq);

    sched_preempt_enable_no_resched();
    if (need_resched())
        goto need_resched;
}

This section shows schedule() as a whole; the following sections go through it step by step.

3 Selecting the next process in schedule().

Key code: __schedule->pick_next_task(rq)->pick_next_rt_entity;

static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq, struct rt_rq *rt_rq)
{
    /*
     * rt_rq is the real-time run queue of a CPU.
     * rt_rq->active is the priority queue of ready real-time tasks,
     * i.e. the tasks that may be picked to run:
     *
     * struct rt_prio_array {
     *     DECLARE_BITMAP(bitmap, MAX_RT_PRIO+1);
     *     struct list_head queue[MAX_RT_PRIO];
     * };
     *
     * rt_rq->active holds MAX_RT_PRIO (100) lists, one per real-time priority.
     */
    struct rt_prio_array *array = &rt_rq->active;
    struct sched_rt_entity *next = NULL;
    struct list_head *queue;
    int idx;

    /* highest priority that has a queued entity */
    idx = sched_find_first_bit(array->bitmap);
    BUG_ON(idx >= MAX_RT_PRIO);

    /* take the first entity from the list of that priority */
    queue = array->queue + idx;
    next = list_entry(queue->next, struct sched_rt_entity, run_list);

    return next;
}

This raises several questions. Given that schedule() picks the next process to run from the priority queue:

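Question 1: how do processes get onto the priority queue?

Question 2: when is a process added to the priority queue to become eligible for scheduling, and when is it removed from the queue?

Question 3: if a real-time process never gives up the CPU voluntarily (it does not sleep, does not wait on a mutex, and does not call schedule()), will the system still hand the CPU to lower-priority real-time processes or to normal processes?

The next section covers the most important part of real-time scheduling, the maintenance of the priority queue, and answers these questions along the way.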
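Before turning to queue maintenance, the bitmap lookup that pick_next_rt_entity relies on can be sketched in user space. This is an illustrative model only: toy_bitmap and its helpers are hypothetical, and a plain loop replaces the kernel's optimized sched_find_first_bit. Bit i being set means the list for priority i is non-empty, and a lower index means a higher priority.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_RT_PRIO   100
#define TOY_BITS_PER_WORD 64
#define TOY_WORDS ((TOY_MAX_RT_PRIO + 1 + TOY_BITS_PER_WORD - 1) / TOY_BITS_PER_WORD)

struct toy_bitmap {
    uint64_t word[TOY_WORDS];
};

static void toy_set_bit(struct toy_bitmap *b, int bit)
{
    assert(bit >= 0 && bit <= TOY_MAX_RT_PRIO);
    b->word[bit / TOY_BITS_PER_WORD] |= (uint64_t)1 << (bit % TOY_BITS_PER_WORD);
}

static void toy_clear_bit(struct toy_bitmap *b, int bit)
{
    assert(bit >= 0 && bit <= TOY_MAX_RT_PRIO);
    b->word[bit / TOY_BITS_PER_WORD] &= ~((uint64_t)1 << (bit % TOY_BITS_PER_WORD));
}

/* Returns the lowest set bit, or TOY_MAX_RT_PRIO if the bitmap is empty. */
static int toy_find_first_bit(const struct toy_bitmap *b)
{
    for (int w = 0; w < TOY_WORDS; w++) {
        if (b->word[w] == 0)
            continue;
        for (int i = 0; i < TOY_BITS_PER_WORD; i++)
            if (b->word[w] & ((uint64_t)1 << i))
                return w * TOY_BITS_PER_WORD + i;
    }
    return TOY_MAX_RT_PRIO;
}
```

Because the lookup always returns the lowest set index, a queued task at priority 40 is never chosen while any task at priority 7 remains queued, which is the behavior Question 3 asks about.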

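4 Maintaining the real-time priority queue

Maintaining the priority queue means adding a process's scheduling entity to the queue's lists, and removing it from them, at the right moments.

Every process descriptor (task_struct) contains a real-time scheduling entity:

struct sched_rt_entity {
    struct list_head run_list;
    /* other members omitted */
};

As described above, the wake_up family of functions and time-slice expiry both mark the current process for rescheduling (set_tsk_need_resched). Before setting the flag, the wake_up functions first add the process being woken to the priority queue, so that schedule() may pick it when it selects the next process. The following uses wake_up_process as the example to show how a process gets onto the priority queue.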

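4.1 Adding a process to the run queue

Key functions: wake_up_process(p)->ttwu_queue->ttwu_do_activate->ttwu_activate->activate_task->enqueue_task;

A brief recap: wake_up_process does not switch tasks immediately; the switch happens later, for example when a system call returns or an interrupt handler finishes.

Rather than switching to the woken process right away, the kernel sets the need-resched flag on the current process: set_tsk_thread_flag(tsk, TIF_NEED_RESCHED);

At the same time the woken process p is added to the run queue (enqueue_task). When the switch eventually happens, schedule() picks a process from the run queue and performs the context switch (context_switch) away from the current process.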

1) Key function on the enqueue path: enqueue_task();

Call chain: wake_up_process(p)->ttwu_queue->ttwu_do_activate->ttwu_activate->activate_task->enqueue_task;

static void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
{
    /* note: this updates the run queue clock rq->clock_task */
    update_rq_clock(rq);
    sched_info_queued(p);
    /*
     * SCHED_NORMAL: enqueue_task_fair
     * SCHED_RR: enqueue_task_rt
     * For real-time tasks (kernel/sched/rt.c): enqueue_task = enqueue_task_rt
     * rq: each CPU has one run queue, rq = cpu_rq(cpu);
     */
    p->sched_class->enqueue_task(rq, p, flags);
}

2) Key function on the enqueue path: enqueue_task_rt();

Call chain: wake_up_process(p)->ttwu_queue->ttwu_do_activate->ttwu_activate->activate_task->enqueue_task->enqueue_task_rt;

static void enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
{
    /* sched_rt_entity is the scheduling entity that links a real-time task to the run queue */
    struct sched_rt_entity *rt_se = &p->rt;

    if (flags & ENQUEUE_WAKEUP)
        rt_se->timeout = 0;

    /* enqueue_rt_entity links rt_se->run_list into the run queue */
    enqueue_rt_entity(rt_se, flags & ENQUEUE_HEAD);

    if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
        enqueue_pushable_task(rq, p);

    inc_nr_running(rq);
}

3) Key function on the enqueue path: enqueue_rt_entity();

Call chain: wake_up_process(p)->ttwu_queue->ttwu_do_activate->ttwu_activate->activate_task->enqueue_task->enqueue_task_rt->enqueue_rt_entity;

static void enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head)
{
    /* if the entity is already on the run queue, remove it first */
    dequeue_rt_stack(rt_se);
    /* link rt_se->run_list into the run queue */
    for_each_sched_rt_entity(rt_se)
        __enqueue_rt_entity(rt_se, head);
}

enqueue_rt_entity->dequeue_rt_stack; the dequeue path dequeue_rt_entity also calls dequeue_rt_stack:

static void dequeue_rt_stack(struct sched_rt_entity *rt_se)
{
    struct sched_rt_entity *back = NULL;

    for_each_sched_rt_entity(rt_se) {
        rt_se->back = back;
        back = rt_se;
    }

    for (rt_se = back; rt_se; rt_se = rt_se->back) {
        /*
         * The entity is on the real-time run queue. Whether it is being
         * inserted into the active priority queue or taken off it, it is
         * first removed from its list. Removal on dequeue is obvious, but
         * why on enqueue? One reason is that the entity has to be placed at
         * the head or the tail of its list; if it was already queued it may
         * sit in the middle, so it is removed first and then re-added.
         */
        if (on_rt_rq(rt_se))
            __dequeue_rt_entity(rt_se);
    }
}

enqueue_rt_entity->__enqueue_rt_entity adds the scheduling entity to the list on the run queue:

static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head)
{
    struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
    struct rt_prio_array *array = &rt_rq->active;
    struct rt_rq *group_rq = group_rt_rq(rt_se);
    struct list_head *queue = array->queue + rt_se_prio(rt_se);

    /*
     * Don't enqueue the group if its throttled, or when empty.
     * The latter is a consequence of the former when a child group
     * get throttled and the current group doesn't have any other
     * active members.
     */
    if (group_rq && (rt_rq_throttled(group_rq) || !group_rq->rt_nr_running))
        return;

    if (!rt_rq->rt_nr_running)
        list_add_leaf_rt_rq(rt_rq);

    /*
     * Link the entity's list node into the run queue. When the scheduler
     * selects the next task, pick_next_rt_entity finds rt_se->run_list on
     * this list and from it the entity of the next task.
     */
    if (head)
        list_add(&rt_se->run_list, queue);
    else
        list_add_tail(&rt_se->run_list, queue);
    __set_bit(rt_se_prio(rt_se), array->bitmap);

    inc_rt_tasks(rt_se, rt_rq);
}

Analysis:

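1> sched_rt_entity is the scheduling entity that links a real-time process to the run queue.

2> enqueue_task->enqueue_rt_entity: for a real-time process, enqueue_rt_entity links the entity's rt_se->run_list into the active array (rt_prio_array *array = &rt_rq->active), at the head or the tail of the list for its priority depending on the head argument. Each CPU has its own real-time run queue, and the entity being linked belongs to the process that is being woken.

Note that before linking, enqueue_rt_entity checks through dequeue_rt_stack->on_rt_rq whether the entity is already on the run queue and removes it first if so.

3> When __schedule runs pick_next_task(rq)->pick_next_rt_entity to select the next process, it selects from this active real-time priority queue.

4> The reverse of enqueue_task, dequeue_task->dequeue_rt_stack, removes the process's sched_rt_entity from the active priority queue;

Note that __enqueue_rt_entity uses the process priority task_struct->prio to choose the list.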
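The enqueue and dequeue steps analyzed above can be reproduced with a small user-space priority array. This is a sketch under assumptions: toy_list is a cut-down imitation of the kernel's list_head, all toy_ names are hypothetical, and the array has only four priorities with a plain unsigned bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_list { struct toy_list *prev, *next; };

static void toy_list_init(struct toy_list *h) { h->prev = h->next = h; }
static bool toy_list_empty(const struct toy_list *h) { return h->next == h; }

static void toy_insert(struct toy_list *n, struct toy_list *prev, struct toy_list *next)
{
    next->prev = n;
    n->next = next;
    n->prev = prev;
    prev->next = n;
}

static void toy_list_add(struct toy_list *n, struct toy_list *h) { toy_insert(n, h, h->next); }
static void toy_list_add_tail(struct toy_list *n, struct toy_list *h) { toy_insert(n, h->prev, h); }

static void toy_list_del_init(struct toy_list *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    toy_list_init(n);
}

#define TOY_NPRIO 4

/* run_list is the first member, so a list pointer converts to the entity */
struct toy_entity { struct toy_list run_list; int prio; };

struct toy_prio_array {
    unsigned bitmap;                   /* bit i set: queue[i] is non-empty */
    struct toy_list queue[TOY_NPRIO];
};

static void toy_array_init(struct toy_prio_array *a)
{
    a->bitmap = 0;
    for (int i = 0; i < TOY_NPRIO; i++)
        toy_list_init(&a->queue[i]);
}

/* Models __enqueue_rt_entity(): link at head or tail, then set the bit. */
static void toy_enqueue(struct toy_prio_array *a, struct toy_entity *e, bool head)
{
    assert(e->prio >= 0 && e->prio < TOY_NPRIO);
    if (head)
        toy_list_add(&e->run_list, &a->queue[e->prio]);
    else
        toy_list_add_tail(&e->run_list, &a->queue[e->prio]);
    a->bitmap |= 1u << e->prio;
}

/* Models __dequeue_rt_entity(): unlink, clear the bit if the list is empty. */
static void toy_dequeue(struct toy_prio_array *a, struct toy_entity *e)
{
    toy_list_del_init(&e->run_list);
    if (toy_list_empty(&a->queue[e->prio]))
        a->bitmap &= ~(1u << e->prio);
}

/* First entity queued at prio, or NULL if that list is empty. */
static struct toy_entity *toy_first(struct toy_prio_array *a, int prio)
{
    if (toy_list_empty(&a->queue[prio]))
        return NULL;
    return (struct toy_entity *)a->queue[prio].next;
}
```

The bit for a priority stays set as long as at least one entity remains on that list, so the bitmap and the lists never disagree.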

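4.2 Removing a process from the run queue

The reverse of enqueue_task: dequeue_task->dequeue_task_rt->dequeue_rt_entity

This is how the scheduling entity of a real-time process is removed from the active priority queue. Note the difference from requeue_task_rt: here the entity is not added back at the tail of the active queue. A process that leaves voluntarily (msleep, for example) is removed from active and not re-added, whereas time-slice expiry and involuntary scheduling go through requeue_task_rt:

dequeue_rt_entity removes the process's scheduling entity from the priority queue: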

static void dequeue_rt_entity(struct sched_rt_entity *rt_se)
{
    /*
     * Remove the entity from the run queue. The enqueue path
     * enqueue_rt_entity described above calls this function too.
     */
    dequeue_rt_stack(rt_se);

    for_each_sched_rt_entity(rt_se) {
        struct rt_rq *rt_rq = group_rt_rq(rt_se);

        if (rt_rq && rt_rq->rt_nr_running)
            __enqueue_rt_entity(rt_se, false);
    }
}

dequeue_rt_entity->dequeue_rt_stack; the enqueue path enqueue_rt_entity also calls dequeue_rt_stack:

static void dequeue_rt_stack(struct sched_rt_entity *rt_se)
{
    struct sched_rt_entity *back = NULL;

    for_each_sched_rt_entity(rt_se) {
        rt_se->back = back;
        back = rt_se;
    }

    for (rt_se = back; rt_se; rt_se = rt_se->back) {
        /*
         * The entity is on the real-time run queue. Whether it is being
         * inserted into the active priority queue or taken off it, it is
         * first removed from its list. Removal on dequeue is obvious, but
         * why on enqueue? One reason is that the entity has to be placed at
         * the head or the tail of its list; if it was already queued it may
         * sit in the middle, so it is removed first and then re-added.
         */
        if (on_rt_rq(rt_se))
            __dequeue_rt_entity(rt_se);
    }
}

The function that actually unlinks the scheduling entity from the run queue list is __dequeue_rt_entity.

Call chain: dequeue_rt_entity->dequeue_rt_stack->__dequeue_rt_entity()

static void __dequeue_rt_entity(struct sched_rt_entity *rt_se)
{
    struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
    struct rt_prio_array *array = &rt_rq->active;

    /*
     * Unlink run_list from array->queue. array->queue does not appear in
     * this call, but every entity was added to a list headed by
     * array->queue. Compare the enqueue side, where __enqueue_rt_entity
     * calls list_add(&rt_se->run_list, queue).
     */
    list_del_init(&rt_se->run_list);
    if (list_empty(array->queue + rt_se_prio(rt_se)))
        __clear_bit(rt_se_prio(rt_se), array->bitmap);

    dec_rt_tasks(rt_se, rt_rq);
    if (!rt_rq->rt_nr_running)
        list_del_leaf_rt_rq(rt_rq);
}

4.3 Requeueing a process on the run queue: requeue_task_rt.

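A brief recap: one point at which a process is enqueued was shown above, wake_up_new_task->enqueue_task.

Question: when is a process removed from the priority queue?

First: both interfaces discussed here take an entity off its list. When a priority is changed, the task is dequeued with dequeue_task, which works for both real-time and normal processes and dispatches through per-class callbacks.

Second: on time-slice expiry (task_tick_rt), yield() (current->sched_class->yield_task(rq)) and similar paths, requeue_task_rt takes the scheduling entity off its position on active and puts it back at the head or tail. In schedule()->deactivate_task, a process that leaves voluntarily (msleep, for example) has its entity removed from the active queue without being re-added at the tail; a later wake_up puts it back. A process that is forced out or preempted stays on the active queue.

Call chain: task_tick_rt/yield->requeue_task_rt: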

static void requeue_task_rt(struct rq *rq, struct task_struct *p, int head)
{
    struct sched_rt_entity *rt_se = &p->rt;
    struct rt_rq *rt_rq;

    for_each_sched_rt_entity(rt_se) {
        rt_rq = rt_rq_of_se(rt_se);
        /* this call moves the entity to the head or tail of its priority list */
        requeue_rt_entity(rt_rq, rt_se, head);
    }
}

requeue_task_rt->requeue_rt_entity: the function that actually moves the entity to the head or tail of its priority list.

static void requeue_rt_entity(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se, int head)
{
    if (on_rt_rq(rt_se)) {
        struct rt_prio_array *array = &rt_rq->active;
        struct list_head *queue = array->queue + rt_se_prio(rt_se);

        if (head)
            list_move(&rt_se->run_list, queue);
        else
            list_move_tail(&rt_se->run_list, queue);
    }
}

Notes on the list operations used here:

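1> requeue_task_rt->requeue_rt_entity: moves the entity to the head or tail of its priority list.

2> On time-slice expiry: task_tick_rt->requeue_task_rt->requeue_rt_entity->list_move_tail (task_tick_rt passes head = 0).

The entity is first unlinked from the priority list and then linked back in. If the process whose slice expired is the only entity on its priority list (task_tick_rt compares run_list.prev with run_list.next), no requeue is performed at all.

If the entity were unlinked and not added back, the process whose time slice expired could never be scheduled again.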

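static inline void list_move(struct list_head *list, struct list_head *head)
{
    /*
     * Unlink the entity of the process whose slice expired from
     * array->queue. The entity may sit in the middle of the priority list,
     * so before it can be moved it has to be unlinked from there.
     */
    __list_del_entry(list);
    /*
     * Link it back in at the head of the list. list_move_tail() is the
     * same except that it uses list_add_tail() to link at the tail.
     */
    list_add(list, head);
}

PS: a recap of the list operations covered above. dequeue_rt_entity removes a process's scheduling entity from the priority queue.

1> __dequeue_rt_entity->list_del_init removes run_list from the array->queue list

2> __enqueue_rt_entity->list_add adds run_list to the array->queue list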
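The round-robin effect of requeueing to the tail can be seen in a short user-space sketch. It is illustrative only: toy_list imitates the kernel's list_head, and toy_list_move_tail, toy_task and toy_head_id are hypothetical helpers. The head of the list is the entity that would be picked next.

```c
#include <assert.h>
#include <stddef.h>

struct toy_list { struct toy_list *prev, *next; };

static void toy_list_init(struct toy_list *h) { h->prev = h->next = h; }

static void toy_list_add_tail(struct toy_list *n, struct toy_list *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void toy_list_del(struct toy_list *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Same idea as the kernel's list_move_tail(): unlink, then add at the tail. */
static void toy_list_move_tail(struct toy_list *n, struct toy_list *h)
{
    toy_list_del(n);
    toy_list_add_tail(n, h);
}

/* run_list is the first member, so a list pointer converts to the task */
struct toy_task { struct toy_list run_list; int id; };

/* Id of the task at the head of a non-empty list. */
static int toy_head_id(const struct toy_list *h)
{
    assert(h->next != h);
    return ((const struct toy_task *)h->next)->id;
}
```

Moving the running task to the tail each time its slice expires makes the head cycle through all tasks of that priority, which is exactly SCHED_RR rotation.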

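4.4 Real-time processes exceeding the system-defined run time: sched_rt_runtime_exceeded.

A high-priority real-time process can hold the CPU indefinitely. If it never yields, then even when its time slice expires it is only moved to the tail of its priority list; unless a higher-priority process shows up, lower-priority processes still cannot run. So how do normal processes get to run? How do they take the CPU from real-time processes that sit on the active priority queue?

The system defines sysctl_sched_rt_runtime, configurable through proc (/proc/sys/kernel/sched_rt_runtime_us). On every tick, update_curr_rt->sched_rt_runtime_exceeded checks whether the run time accumulated by real-time processes in the current period (rt_rq->rt_time) exceeds this threshold. The threshold is normally 0.95 s, meaning real-time processes may run for at most 0.95 s out of every 1 s. If the threshold is exceeded, the need-resched flag is set (through resched_task), and when the scheduler next selects a process it will not select a real-time one (see the description of the rt_throttled flag in sched_rt_runtime_exceeded below).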

static void update_curr_rt(struct rq *rq)
{
    struct task_struct *curr = rq->curr;
    struct sched_rt_entity *rt_se = &curr->rt;
    struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
    u64 delta_exec;

    if (curr->sched_class != &rt_sched_class)
        return;

    /*
     * Execution time of the task. If it was scheduled out in between, the
     * interval may be shorter than a full tick.
     * Run time = run queue clock - time the task started running.
     */
    delta_exec = rq->clock_task - curr->se.exec_start;
    if (unlikely((s64)delta_exec <= 0))
        return;

    schedstat_set(curr->se.statistics.exec_max,
                  max(curr->se.statistics.exec_max, delta_exec));

    /* update the total run time of the current task */
    curr->se.sum_exec_runtime += delta_exec;
    account_group_exec_runtime(curr, delta_exec);

    /* start time for the next accounting interval */
    curr->se.exec_start = rq->clock_task;
    cpuacct_charge(curr, delta_exec);

    sched_rt_avg_update(rq, delta_exec);

    /* bandwidth control is on when sysctl_sched_rt_runtime is set (0.95 s by default) */
    if (!rt_bandwidth_enabled())
        return;

    for_each_sched_rt_entity(rt_se) {
        rt_rq = rt_rq_of_se(rt_se);

        if (sched_rt_runtime(rt_rq) != RUNTIME_INF) {
            raw_spin_lock(&rt_rq->rt_runtime_lock);
            rt_rq->rt_time += delta_exec;
            /*
             * Check whether the accumulated real-time run time exceeds
             * the system threshold, normally 0.95 s per 1 s.
             */
            if (sched_rt_runtime_exceeded(rt_rq))
                /* over the threshold: set the need-resched flag */
                resched_task(curr);
            raw_spin_unlock(&rt_rq->rt_runtime_lock);
        }
    }
}

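1 For a detailed walkthrough of update_curr_rt, see the post "linux系统调度之时间" (time in Linux scheduling).

update_curr_rt->sched_rt_runtime_exceeded: see the sched_rt_runtime_exceeded function below.

Call chain: scheduler_tick->task_tick->update_curr_rt->sched_rt_runtime_exceeded, followed by resched_task.

Why does a real-time process that is on the active priority queue give up the CPU?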

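1) When the real-time run queue updates its accounting (update_curr_rt), sched_rt_runtime_exceeded checks whether the run time accumulated on the real-time run queue within one real-time period (sysctl_sched_rt_period) exceeds the run time the system allows real-time processes (sysctl_sched_rt_runtime). If it does, it sets

rt_rq->rt_throttled = 1;

The rt_throttled flag works as follows: when schedule() selects the next process and pick_next_task looks at the real-time run queue (through _pick_next_task_rt), rt_rq_throttled(rt_rq) tests rt_throttled. If it is 1, NULL is returned immediately. In other words, although real-time processes are still on the active priority queue, none of them can be selected, and a normal process is picked from the CFS run queue instead.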

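2 When is the queue time updated?

For how time is maintained on the run queue, see the post "linux系统调度之时间" (time in Linux scheduling): update_process_times->scheduler_tick->update_rq_clock.

update_curr_rt is called in the following two situations:

1> On the tick: update_process_times->scheduler_tick->task_tick_rt->update_curr_rt

2> __schedule->put_prev_task->update_curr_rt

When update_curr_rt updates the accounting and sched_rt_runtime_exceeded() finds that real-time processes have run longer than their allowed run time, the need-resched flag is set as well (through resched_task), and when the scheduler next selects a process it will not select a real-time one (see the role of the rt_throttled flag).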

static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
{
    u64 runtime = sched_rt_runtime(rt_rq);

    if (rt_rq->rt_throttled)
        return rt_rq_throttled(rt_rq);

    if (runtime >= sched_rt_period(rt_rq))
        return 0;

    balance_runtime(rt_rq);
    runtime = sched_rt_runtime(rt_rq);
    if (runtime == RUNTIME_INF)
        return 0;

    if (rt_rq->rt_time > runtime) {
        /*
         * sched_rt_handler->sched_rt_do_global sets up def_rt_bandwidth,
         * and rt_b here is def_rt_bandwidth:
         * def_rt_bandwidth.rt_runtime = global_rt_runtime() = sysctl_sched_rt_runtime,
         *   the run time real-time tasks may actually use within one period;
         * def_rt_bandwidth.rt_period = ns_to_ktime(global_rt_period()) = sysctl_sched_rt_period,
         *   the length of one real-time period.
         */
        struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);

        /*
         * Don't actually throttle groups that have no runtime assigned
         * but accrue some time due to boosting.
         */
        if (likely(rt_b->rt_runtime)) {
            static bool once = false;

            /*
             * rt_rq->rt_throttled = 1 means that when the scheduler runs
             * schedule->pick_next_task->_pick_next_task_rt to select the
             * next task, _pick_next_task_rt returns NULL right away and
             * nothing is taken from the real-time run queue: the real-time
             * budget is used up and some time must go to normal tasks.
             */
            rt_rq->rt_throttled = 1;

            if (!once) {
                once = true;
                printk_deferred("sched: RT throttling activated\n");
            }
        } else {
            /*
             * In case we did anyway, make it go away,
             * replenishment is a joke, since it will replenish us
             * with exactly 0 ns.
             */
            rt_rq->rt_time = 0;
        }

        if (rt_rq_throttled(rt_rq)) {
            sched_rt_rq_dequeue(rt_rq);
            return 1;
        }
    }

    return 0;
}

[Summary]

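This article has mainly covered the maintenance of the real-time run queue. In essence, scheduling consists of selecting the next process from the run queue and then either removing the outgoing process from the run queue or moving it to the tail of its list.

Points to note about run queue maintenance:

1> When a process joins the run queue, enqueue_rt_entity->__enqueue_rt_entity is the function that actually links the scheduling entity into the run queue list.

When a process leaves the run queue, dequeue_rt_entity->dequeue_rt_stack->__dequeue_rt_entity is the function that actually unlinks the scheduling entity from the run queue list.

2> When a process joins the run queue, enqueue_rt_entity->dequeue_rt_stack first checks whether it is already on the run queue and removes it if so.

When a process leaves the run queue, the dequeue path dequeue_rt_entity calls dequeue_rt_stack as well.

3> When the time slice expires, requeue_task_rt puts the process back on the queue.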

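A few points are worth stressing:

1> In schedule->deactivate_task->dequeue_task, a process that leaves voluntarily (msleep, for example) has its scheduling entity removed from the active queue (deactivate_task) and is added back later by wake_up. A process that is forced out or preempted is not removed from the active queue.

2> Threads that were scheduled out because they were waiting for a lock to be released, waiting on a synchronization object, or sleeping are all woken by the system through the wake_up family of functions.

3> When does a thread whose time slice expired run again?

In that case the scheduling entity was never removed from the active priority queue; it was only moved to the tail of its list. The answer to this question is therefore the same as the answer to Question 3.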

When reposting, please cite the original address: https://www.6miu.com/read-78715.html
