Ganeti CPU Pinning
==================

Objective
---------

This document defines Ganeti's support for CPU pinning (aka CPU
affinity).

CPU pinning maps an entire virtual machine, or a specific virtual CPU
(vCPU), to a physical CPU or a range of CPUs, and allows such a mapping
to be removed again.

At this stage, pinning will be implemented for Xen and KVM.

Command Line
------------

Suggested command line parameters for controlling CPU pinning are as
follows::

  gnt-instance modify -H cpu_mask=<cpu-pinning-info> <instance>

cpu-pinning-info can be any of the following:

* One vCPU mapping, which can be the word "all" or a combination
  of CPU numbers and ranges separated by commas. In this case, all
  vCPUs will be mapped to the indicated list.
* A list of vCPU mappings, separated by a colon ':'. In this case
  each vCPU is mapped to an entry in the list, and the size of the
  list must match the number of vCPUs defined for the instance. This
  is enforced when setting CPU pinning or when setting the number of
  vCPUs using ``-B vcpus=#``.

The mapping list is matched to consecutive virtual CPUs, so the first
entry is the CPU pinning information for vCPU 0, the second entry
for vCPU 1, etc.

The default setting for new instances is "all", which maps the entire
instance to all CPUs, thus effectively turning off CPU pinning.

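As an illustration only, the cpu-pinning-info syntax described above
could be parsed along the following lines. The function names
``parse_cpu_list`` and ``parse_cpu_mask`` are hypothetical and not part
of Ganeti; using ``-1`` to mark "all" follows the convention proposed
in the Data section of this document.

```python
def parse_cpu_list(entry):
    """Parse one vCPU mapping such as "all", "3" or "1,3-5".

    Hypothetical helper, not an existing Ganeti function.
    """
    if entry == "all":
        return [-1]  # -1 marks "any CPU"
    cpus = []
    for part in entry.split(","):
        if "-" in part:
            # A range "a-b" includes both of its end points.
            low, high = (int(n) for n in part.split("-", 1))
            if low > high:
                raise ValueError("invalid CPU range: %r" % part)
            cpus.extend(range(low, high + 1))
        else:
            cpus.append(int(part))
    return cpus


def parse_cpu_mask(cpu_mask):
    """Split a cpu_mask string into one CPU list per colon-separated entry."""
    return [parse_cpu_list(entry) for entry in cpu_mask.split(":")]
```

With this sketch, ``parse_cpu_mask("all:1,3-5:0")`` yields one list per
vCPU mapping, and a plain ``"0"`` yields a single list that applies to
the whole instance.
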
Here are some usage examples::

  # Map vCPU 0 to physical CPU 1 and vCPU 1 to CPU 3 (assuming 2 vCPUs)
  gnt-instance modify -H cpu_mask=1:3 my-inst

  # Pin vCPU 0 to CPUs 1 or 2, and vCPU 1 to any CPU
  gnt-instance modify -H cpu_mask=1-2:all my-inst

  # Pin vCPU 0 to any CPU, vCPU 1 to CPUs 1, 3, 4 or 5, and vCPU 2 to
  # CPU 0
  gnt-instance modify -H cpu_mask=all:1\\,3-5:0 my-inst

  # Pin entire VM to CPU 0
  gnt-instance modify -H cpu_mask=0 my-inst

  # Turn off CPU pinning (default setting)
  gnt-instance modify -H cpu_mask=all my-inst

Assuming an instance has 3 vCPUs, the following commands will fail::

  # not enough mappings
  gnt-instance modify -H cpu_mask=0:1 my-inst

  # too many
  gnt-instance modify -H cpu_mask=2:1:1:all my-inst

Validation
----------

CPU pinning information is validated by making sure it matches the
number of vCPUs. This validation happens when changing either the
``cpu_mask`` or ``vcpus`` parameters.
Changing either parameter in a way that conflicts with the other will
fail with an appropriate error message.
To make such a change, both parameters should be modified at the same
time. For example:
``gnt-instance modify -B vcpus=4 -H cpu_mask=1:1:2-3:4\\,6 my-inst``

Besides validating the CPU configuration, i.e. that the number of vCPUs
matches the requested CPU pinning, Ganeti will also verify that the
node has enough physical CPUs to support the required configuration.
For example, trying to run a configuration of ``vcpus=2,cpu_mask=0:4``
on a node with 4 cores will fail: CPU numbers are 0-based, so the
highest CPU number on such a node is 3.

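The two checks described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not Ganeti's
implementation: ``check_cpu_mask`` is a hypothetical name, and the
input is assumed to be already split into one list of CPU numbers per
mapping entry, with ``[-1]`` standing for "all".

```python
def check_cpu_mask(mappings, vcpus, physical_cpus):
    """Return a list of problems; an empty list means the mask is usable.

    mappings: one list of CPU numbers per entry, [-1] meaning "all"
    (hypothetical input format chosen for this sketch).
    """
    problems = []
    # A single entry applies to every vCPU, so only longer lists
    # have to match the vCPU count exactly.
    if len(mappings) > 1 and len(mappings) != vcpus:
        problems.append("%d mappings given for %d vCPUs"
                        % (len(mappings), vcpus))
    # CPU numbers are 0-based: valid numbers are 0 .. physical_cpus - 1.
    for index, cpus in enumerate(mappings):
        for cpu in cpus:
            if cpu != -1 and not 0 <= cpu < physical_cpus:
                problems.append("entry %d: CPU %d does not exist on a "
                                "node with %d CPUs"
                                % (index, cpu, physical_cpus))
    return problems
```

For the example in the text, ``check_cpu_mask([[0], [4]], 2, 4)``
reports one problem, because CPU 4 does not exist on a 4-core node.
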
This validation should repeat every time an instance is started or
migrated live. See more details under Migration below.

Cluster verification should also test the compatibility of other nodes
in the cluster with the required configuration and alert if a minimum
requirement is not met.

Failover
--------

CPU pinning configuration can be transferred from node to node, unless
the number of physical CPUs is smaller than what the configuration calls
for. It is suggested that unless this is the case, all transfers and
migrations will succeed.

If the target node has fewer physical CPUs than the CPU numbers in the
CPU pinning information require, instance failover will fail.

In case of emergency, to force failover to ignore mismatching CPU
information, the following switch can be used:
``gnt-instance failover --fix-cpu-mismatch my-inst``.
This command will try to fail over the instance with the current CPU
mask, but if that fails, it will change the mask to be "all".

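The fallback behaviour of ``--fix-cpu-mismatch`` might be sketched like
this. The function and its parameters are hypothetical, and the sketch
simplifies the description above by checking the CPU count up front
instead of attempting the failover first and reacting to a failure.

```python
def failover_cpu_mask(cpu_mask, highest_cpu, target_cpus,
                      fix_cpu_mismatch=False):
    """Return the cpu_mask to use on the failover target node.

    highest_cpu is the largest physical CPU number the mask refers to,
    or -1 when the mask is "all". All names here are assumptions made
    for illustration.
    """
    if highest_cpu < target_cpus:
        return cpu_mask  # the mask fits the target node unchanged
    if fix_cpu_mismatch:
        return "all"  # emergency fallback: turn pinning off
    raise ValueError("cpu_mask %r needs CPU %d, but the target node "
                     "only has %d CPUs"
                     % (cpu_mask, highest_cpu, target_cpus))
```

Without the switch the mismatch stays an error, which matches the
default of failing the failover.
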
Migration
---------

In case of live migration, and in addition to failover considerations,
it is required to remap CPU pinning after migration. This can be done in
real time for both Xen and KVM instances, and only depends on the
number of physical CPUs being sufficient to support the migrated
instance.

Data
----

Pinning information will be kept as a list of integers per vCPU.
To mark a mapping of any CPU, we will use (-1).
A single entry, no matter what the number of vCPUs is, will always mean
that all vCPUs have the same mapping.

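A small sketch of how this representation could be expanded to one
list per vCPU is shown below. ``per_vcpu_mappings`` is a hypothetical
helper used only to illustrate the single-entry rule.

```python
def per_vcpu_mappings(mappings, vcpus):
    """Expand pinning data to exactly one CPU list per vCPU.

    mappings is a list of integer lists, where [-1] means "any CPU".
    Hypothetical helper, not part of Ganeti.
    """
    if len(mappings) == 1:
        # A single entry means every vCPU shares the same mapping.
        return [list(mappings[0]) for _ in range(vcpus)]
    if len(mappings) != vcpus:
        raise ValueError("%d mappings given for %d vCPUs"
                         % (len(mappings), vcpus))
    return [list(cpus) for cpus in mappings]
```

For instance, the default ``[[-1]]`` on a 3-vCPU instance expands to
three ``[-1]`` entries, one per vCPU.
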
Configuration file
------------------

The pinning information is kept in each instance's hypervisor
params section of the configuration file as the original string.

Xen
---

There are 2 ways to control pinning in Xen, either via the command line
or through the configuration file.

The commands to make direct pinning changes are the following::

  # To pin a vCPU to a specific CPU
  xm vcpu-pin <domain> <vcpu> <cpu>

  # To unpin a vCPU
  xm vcpu-pin <domain> <vcpu> all

  # To get the current pinning status
  xm vcpu-list <domain>

Since Ganeti currently controls Xen through the configuration file, it
is straightforward to use the same method for CPU pinning.
There are 2 different parameters that control Xen's CPU pinning and
configuration:

vcpus
  controls the number of vCPUs
cpus
  maps vCPUs to physical CPUs

When no pinning is required (pinning information is "all"), the
"cpus" entry is removed from the configuration file.

For all other cases, the configuration is "translated" to Xen, which
expects either ``cpus = "a"`` or ``cpus = [ "a", "b", "c", ...]``,
where each of a, b and c is a physical CPU number, a CPU range, or a
combination. If a list is used, the number of entries must match the
number of vCPUs, and the entries are mapped to vCPUs in order.

For example, CPU pinning information of ``1:2,4-7:0-1`` is translated
to this entry in Xen's configuration: ``cpus = [ "1", "2,4-7", "0-1" ]``

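The translation described above could look roughly like the following
sketch. ``xen_cpus_line`` is a hypothetical name, and passing an "all"
entry inside a list through unchanged is an assumption, since this
document only specifies the case where the whole mask is "all".

```python
def xen_cpus_line(cpu_mask):
    """Return the Xen "cpus" config line, or None when pinning is off.

    Hypothetical helper illustrating the translation only.
    """
    if cpu_mask == "all":
        return None  # no pinning: the "cpus" entry is left out
    entries = cpu_mask.split(":")
    if len(entries) == 1:
        # One mapping shared by all vCPUs.
        return 'cpus = "%s"' % entries[0]
    # One quoted entry per vCPU, in vCPU order.
    return "cpus = [ %s ]" % ", ".join('"%s"' % entry for entry in entries)
```

Because Xen uses the same comma and range syntax inside each entry,
the per-vCPU strings can be copied over without further parsing.
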
KVM
---

Controlling pinning in KVM is a little more complicated as there is no
configuration to control pinning before instances are started.

The way to change or assign CPU pinning under KVM is to use ``taskset`` or
its underlying system call ``sched_setaffinity``. Setting the affinity for
the VM process will change CPU pinning for the entire VM, and setting it
for specific vCPU threads will control specific vCPUs.

The sequence of commands to control pinning is this: start the instance
with the ``-S`` switch, so it halts before starting execution, get the
process ID or identify thread IDs of each vCPU by sending ``info cpus``
to the monitor, map vCPUs as required by the cpu-pinning information,
and issue a ``cont`` command on the KVM monitor to allow the instance
to start execution.

For example, a sequence of commands to control CPU affinity under KVM
may be:

* Start KVM: ``/usr/bin/kvm … <kvm-command-line-options> … -S``
* Use socat to connect to the monitor
* Send ``info cpus`` to the monitor to get thread/vCPU information
* Call ``sched_setaffinity`` for each thread with the CPU mask
* Send ``cont`` to KVM's monitor

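The middle steps of the sequence above (reading the ``info cpus``
reply and pinning each vCPU thread) can be sketched as follows. Several
things here are assumptions: the reply is assumed to contain lines of
the form ``CPU #<n>: ... thread_id=<tid>``, the helper names are
hypothetical, and Python's standard ``os.sched_setaffinity`` (Linux
only) is used as a stand-in for the affinity wrapper. Talking to the
monitor socket is left out.

```python
import os
import re

# Assumed shape of one line of the monitor's "info cpus" reply.
_THREAD_ID_RE = re.compile(r"CPU #(\d+):.*thread_id=(\d+)")


def vcpu_thread_ids(info_cpus_output):
    """Map vCPU number to thread ID from the text of an info cpus reply."""
    threads = {}
    for line in info_cpus_output.splitlines():
        match = _THREAD_ID_RE.search(line)
        if match:
            threads[int(match.group(1))] = int(match.group(2))
    return threads


def pin_vcpus(thread_ids, mappings):
    """Pin each vCPU thread to its CPU list; [-1] leaves it unpinned.

    mappings holds one list of physical CPU numbers per vCPU.
    """
    for vcpu, cpus in enumerate(mappings):
        if cpus == [-1]:
            continue  # "all": keep the default affinity
        os.sched_setaffinity(thread_ids[vcpu], set(cpus))
```

After ``pin_vcpus`` returns, the caller would send ``cont`` to the
monitor so the instance starts running with its affinity already set.
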
A CPU mask is a hexadecimal bit mask where each bit represents one
physical CPU. See :manpage:`sched_setaffinity(2)` for more details.

For example, to run a specific thread-id on CPUs 1 or 3 the mask is
0x0000000A.

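As a worked example of the arithmetic, bit *n* of the mask stands for
physical CPU *n*, so the mask is the bitwise OR of ``1 << n`` over the
allowed CPUs. The helper name ``cpu_bitmask`` is only illustrative.

```python
def cpu_bitmask(cpus):
    """Build an affinity bit mask from a list of physical CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # set the bit that stands for this CPU
    return mask


# CPUs 1 and 3 set bits 1 and 3: 0b1010 == 0xA
print("0x%08X" % cpu_bitmask([1, 3]))  # prints 0x0000000A
```
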
We will control process and thread affinity using the Python affinity
package (http://pypi.python.org/pypi/affinity). This package is a Python
wrapper around the two affinity system calls, and has no other
requirements.

Alternative Design Options
--------------------------

1. There's an option to ignore the limitations of the underlying
   hypervisor and instead of requiring explicit pinning information
   for *all* vCPUs, assume a mapping of "all" to vCPUs not mentioned.
   This can lead to inadvertently missing information, but either way,
   since using cpu-pinning options is probably not going to be
   frequent, there's no real advantage.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: