SDC’s big theory
For future reference, here is the high-level description of the new SDC scheduling class, copied from the corresponding PSARC discussion.
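Before the comment itself, a small sketch may help fix the idea. The comment below defines a thread's duty cycle as ONPROC time over ONPROC plus Runnable time, measured since a periodically refreshed "base", and says a thread under its target gets the maximum priority while a thread over it gets the minimum. The C below is only an illustration of that arithmetic, not the kernel's code: every `sketch_*` name, the struct layout, the percent scale and the handling of a thread sitting exactly on its target are my assumptions. Only the default priorities 99 and 0 come from the comment.

```c
#include <assert.h>
#include <stdint.h>

/* Defaults quoted in the comment for sysdc_maxpri and sysdc_minpri. */
#define SKETCH_MAXPRI 99
#define SKETCH_MINPRI 0

/* Hypothetical per-thread accounting; not the real sysdc data layout. */
typedef struct sketch_thread {
    uint64_t onproc_ns;        /* cumulative time spent running on a CPU */
    uint64_t runnable_ns;      /* cumulative time spent waiting for a CPU */
    uint64_t base_onproc_ns;   /* onproc_ns snapshot at the last base reset */
    uint64_t base_runnable_ns; /* runnable_ns snapshot at the last base reset */
    unsigned target_dc_pct;    /* target duty cycle, 1..100 (assumed scale) */
} sketch_thread_t;

/*
 * Duty cycle in percent since the last base reset. Sleep time is never
 * counted, so it cannot affect the result.
 */
static unsigned
sketch_duty_cycle_pct(const sketch_thread_t *t)
{
    uint64_t onproc = t->onproc_ns - t->base_onproc_ns;
    uint64_t runnable = t->runnable_ns - t->base_runnable_ns;
    uint64_t total = onproc + runnable;

    if (total == 0)
        return (0); /* nothing measured yet: treat as under target */
    return ((unsigned)((onproc * 100) / total));
}

/*
 * Two-level policy: over target gets the minimum priority, anything else
 * gets the maximum. Treating "exactly on target" as not over is an
 * assumption of this sketch.
 */
static int
sketch_pick_priority(const sketch_thread_t *t)
{
    assert(t->target_dc_pct >= 1 && t->target_dc_pct <= 100);
    if (sketch_duty_cycle_pct(t) > t->target_dc_pct)
        return (SKETCH_MINPRI);
    return (SKETCH_MAXPRI);
}

/* Forget history so old over- or under-runs stop influencing the ratio. */
static void
sketch_reset_base(sketch_thread_t *t)
{
    t->base_onproc_ns = t->onproc_ns;
    t->base_runnable_ns = t->runnable_ns;
}
```

For instance, a thread with 80 units of ONPROC time and 20 of Runnable time since its base has a duty cycle of 80%. With a 50% target it would drop to priority 0 in this sketch. After a base reset followed by 10 units ONPROC and 90 Runnable, it would measure 10% and go back to priority 99.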
/*
 * The System Duty Cycle (SDC) scheduling class
 * --------------------------------------------
 *
 * Background
 *
 * Kernel threads in Solaris have traditionally not been large consumers
 * of CPU time. They typically wake up, perform a small amount of
 * work, then go back to sleep waiting for either a timeout or another
 * signal. On the assumption that the small amount of work that they do
 * is important for the behavior of the whole system, these threads are
 * treated kindly by the dispatcher and the SYS scheduling class: they run
 * without preemption from anything other than real-time and interrupt
 * threads; when preempted, they are put at the front of the queue, so they
 * generally do not migrate between CPUs; and they are allowed to stay
 * running until they voluntarily give up the CPU.
 *
 * As Solaris has evolved, new workloads have emerged which require the
 * kernel to perform significant amounts of CPU-intensive work. One
 * example of such a workload is ZFS's transaction group sync processing.
 * Each sync operation generates a large batch of I/Os, and each I/O
 * may need to be compressed and/or checksummed before it is written to
 * storage. The taskq threads which perform the compression and checksums
 * will run nonstop as long as they have work to do; a large sync operation
 * on a compression-heavy dataset can keep them busy for seconds on end.
 * This causes human-time-scale dispatch latency bubbles for any other
 * threads which have the misfortune to share a CPU with the taskq threads.
 *
 * The SDC scheduling class is a solution to this problem.
 *
 *
 * Overview
 *
 * SDC is centered around the concept of a thread's duty cycle (DC):
 *
 *                            ONPROC time
 *      Duty Cycle =    ----------------------
 *                      ONPROC + Runnable time
 *
 * This is the ratio of the time that the thread spent running on a CPU
 * divided by the time it spent running or trying to run. It is unaffected
 * by any time the thread spent sleeping, stopped, etc.
 *
 * A thread joining the SDC class specifies a "target" DC that it wants
 * to run at. To implement this policy, the routine sysdc_update() scans
 * the list of active SDC threads every few ticks and uses each thread's
 * microstate data to compute the actual duty cycle that that thread
 * has experienced recently. If the thread is under its target DC, its
 * priority is increased to the maximum available (sysdc_maxpri, which is
 * 99 by default). If the thread is over its target DC, its priority is
 * reduced to the minimum available (sysdc_minpri, 0 by default). This
 * is a fairly primitive approach, in that it doesn't use any of the
 * intermediate priorities, but it's not completely inappropriate. Even
 * though threads in the SDC class might take a while to do their job, they
 * are by some definition important if they're running inside the kernel,
 * so it is reasonable that they should get to run at priority 99.
 *
 * If a thread is running when sysdc_update() calculates its actual duty
 * cycle, and there are other threads of equal or greater priority on its
 * CPU's dispatch queue, sysdc_update() preempts that thread. The thread
 * acknowledges the preemption by calling sysdc_preempt(), which calls
 * setbackdq(), which gives other threads with the same priority a chance
 * to run. This creates a de facto time quantum for threads in the SDC
 * scheduling class.
 *
 * An SDC thread which is assigned priority 0 can continue to run if
 * nothing else needs to use the CPU that it's running on. Similarly, an
 * SDC thread at priority 99 might not get to run as much as it wants to
 * if there are other priority-99 or higher threads on its CPU. These
 * situations would cause the thread to get ahead of or behind its target
 * DC; the longer the situations lasted, the further ahead or behind the
 * thread would get.
 * Rather than condemning a thread to a lifetime of
 * paying for its youthful indiscretions, SDC keeps "base" values for
 * ONPROC and Runnable times in each thread's sysdc data, and updates these
 * values periodically. The duty cycle is then computed using the elapsed
 * amount of ONPROC and Runnable times since those base times.
 *
 * Since sysdc_update() scans SDC threads fairly frequently, it tries to
 * keep the list of "active" threads small by pruning out threads which
 * have been asleep for a brief time. They are not pruned immediately upon
 * going to sleep, since some threads may bounce back and forth between
 * sleeping and being runnable.
 *
 *
 * Interfaces
 *
 * void sysdc_thread_enter(t, dc, flags)
 *
 *      Moves a kernel thread from the SYS scheduling class to the
 *      SDC class. t must have an associated LWP (created by calling
 *      lwp_kernel_create()). The thread will have a target DC of dc.
 *      Flags should be either 0 or SYSDC_THREAD_BATCH. If
 *      SYSDC_THREAD_BATCH is specified, the thread will run with a
 *      slightly lower priority (see "Batch threads", below).
 *