Revision fba49cbc doc/arch/aquarium.tex
b/doc/arch/aquarium.tex | ||
---|---|---|
20 | 20 |
|
21 | 21 |
\title{Aquarium: Billing for the Cloud in the Cloud} |
22 | 22 |
|
23 |
\authorinfo{Georgios Gousios \and Christos KK Loverdos}
|
|
23 |
\authorinfo{Georgios Gousios \and Christos Loverdos} |
|
24 | 24 |
{GRNet SA} |
25 | 25 |
{\{gousiosg,loverdos\}@grnet.gr} |
26 | 26 |
|
... | ... | |
83 | 83 |
the GRnet IaaS, they all share a common notion of \emph{resources}, access |
84 | 84 |
and manipulation options to which they offer users. |
85 | 85 |
|
86 |
\subsection{Supported Users}
|
|
86 |
\subsection{Sharing}
|
|
87 | 87 |
|
88 | 88 |
The Okeanos IaaS will support the Greek higher education, an estimated |
89 | 89 |
population of 100.000 students and researchers. Each member will be granted |
... | ... | |
92 | 92 |
have a limited amount of credits, renewable each month, which they will be allowed |
93 | 93 |
to spend on any resource available through the infrastructure. Resources can |
94 | 94 |
also be shared; for example files on the Pithos file service or virtual machine |
95 |
images on the Archipelago storage can be |
|
95 |
images on the Archipelago storage are potentially subject to concurrent usage from |
|
96 |
multiple users. This means that charges for the use of a single resource |
|
97 |
may need to be distributed among several users. It may also mean that, in order |
|
98 |
for sharing to work correctly, users may need to transfer credits among themselves. |
|
96 | 99 |
|
97 | 100 |
|
98 | 101 |
\subsection{Configuration} |
... | ... | |
153 | 156 |
\begin{description} |
154 | 157 |
|
155 | 158 |
\item[Resources] specify the properties of resources that Aquarium knows |
156 |
about. Apart from the expected ones (name, unit etc), |
|
157 |
a resource has two properties that affect billing: \textsf{costpolicy}
|
|
158 |
defines whether the billing operation is to be performed at the moment
|
|
159 |
a billing event has arrived, while the \textsf{complex} attribute defines
|
|
160 |
whether a resource can have many instances per user.
|
|
159 |
about. Apart from the expected ones (name, unit etc), a resource has
|
|
160 |
two properties that affect billing: \textsf{costpolicy} defines the
|
|
161 |
algorithm to be used to calculate the resource usage, while the
|
|
162 |
\textsf{complex} attribute defines whether a resource can have one or
|
|
163 |
many instances per user. |
|
161 | 164 |
|
162 | 165 |
\item[Pricelists] assign a price tag to each resource, within a timeframe. |
163 | 166 |
|
... | ... | |
180 | 183 |
|
181 | 184 |
|
182 | 185 |
\begin{figure} |
183 |
\lstset{language=ruby, basicstyle=\footnotesize,
|
|
186 |
\lstset{language=c, basicstyle=\footnotesize,
|
|
184 | 187 |
stringstyle=\ttfamily, |
185 | 188 |
flexiblecolumns=true, aboveskip=-0.9em, belowskip=0em, lineskip=0em} |
186 | 189 |
|
... | ... | |
205 | 208 |
repeat: |
206 | 209 |
- start: "00 02 * * Tue" |
207 | 210 |
end: "00 02 * * Wed" |
208 |
from: 1326041177 #Sun, 8 Jan 2012 18:46:27 EET
|
|
211 |
from: 1326041177 //Sun, 8 Jan 2012 18:46:27 EET
|
|
209 | 212 |
algorithms: |
210 | 213 |
- algorithm: |
211 | 214 |
name: default |
... | ... | |
245 | 248 |
|
246 | 249 |
\subsection{Billing} |
247 | 250 |
|
248 |
As common to most similar systems, billing in Aquarium is the application of |
|
249 |
a billing contract to an incoming billing event in order to produce an |
|
250 |
entry for the user's wallet. However, in stark contrast to most other systems, |
|
251 |
which rely on database transactions in order to securely modify the user's |
|
252 |
balance, Aquarium performs account updates asynchronously and concurrently |
|
253 |
for all known users. |
|
251 |
As is common in most similar systems, billing in Aquarium is the application of the |
|
252 |
provisions of a user's contract to an incoming billing event in order to |
|
253 |
produce an entry for the user's wallet. However, in stark contrast to most |
|
254 |
other systems, which rely on database transactions in order to securely modify |
|
255 |
the user's balance, Aquarium performs account updates asynchronously and |
|
256 |
concurrently for all users. |
|
257 |
|
|
258 |
Per resource, the charging operation is affected by the cost policy and complexity |
|
259 |
parameters. Specifically, the 3 available cost policies affect the calculation |
|
260 |
of the amount of resource usage to be charged as follows: |
|
261 |
|
|
262 |
\begin{itemize} |
|
263 |
\item resources employing the \textsf{continuous} cost policy are charged for |
|
264 |
the actual resource usage through time. When a resource event arrives, |
|
265 |
the previous resource state between the previous charge operation and the |
|
266 |
current event timestamp is charged and the resource state is then |
|
267 |
updated. More formally, for continuous resources, if $f(t)$ represents |
|
268 |
the function of resource usage through time and $p(t)$ is the function |
|
269 |
representing the pricelist at time $t$, |
|
270 |
then the total cost up to time $t$ is |
|
271 |
$c(t) = \int_0^{t} p(\tau)\, f(\tau)\, d\tau$ |
|
272 |
|
|
273 |
\item resources employing the \textsf{onoff} cost policy can be in two states: |
|
274 |
either switched on and actively used or switched off. Therefore, the unit |
|
275 |
of resource usage is time and not |
|
276 |
|
|
277 |
\item and finally, resources using the \textsf{distinct} cost policy are charged |
|
278 |
upon usage, without time playing |
|
279 |
|
|
280 |
\end{itemize} |
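The three cost policies can be illustrated with a minimal Python sketch. It is not Aquarium's implementation: the names \texttt{ResourceState} and \texttt{charge} are hypothetical, events are assumed to carry the new absolute value of the resource, and a single unit price is assumed to hold between two events (temporal overriding of pricelists is ignored).

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    """Last known state of one resource instance (hypothetical model)."""
    value: float      # amount in use, or 1.0/0.0 for an on/off resource
    timestamp: float  # time of the last charge operation, in hours


def charge(policy: str, state: ResourceState, event_value: float,
           event_time: float, unit_price: float) -> float:
    """Return the credits to charge for one event and update the state."""
    elapsed = event_time - state.timestamp
    if elapsed < 0:
        raise ValueError("event is older than the last charge operation")

    if policy == "continuous":
        # Charge the previous amount for the time it was held, then
        # store the new amount carried by the event.
        cost = state.value * elapsed * unit_price
        state.value = event_value
    elif policy == "onoff":
        # Only time spent switched on is billable; the unit is time.
        cost = (elapsed if state.value else 0.0) * unit_price
        state.value = 1.0 if event_value else 0.0
    elif policy == "distinct":
        # Charged per use; elapsed time plays no role (assumed reading).
        cost = event_value * unit_price
    else:
        raise ValueError(f"unknown cost policy: {policy}")

    state.timestamp = event_time
    return cost
```

Under these assumptions, a continuous resource that held 10 units for 5 hours at 0.01 credits per unit-hour is charged about 0.5 credits when the next event arrives, and the state then carries the new amount forward to the following charge operation.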
|
281 |
|
|
282 |
Billing events are obtained by a connection to a message queue. Upon arrival, |
|
283 |
a billing event is stored in an immutable log, and then forwarded to the user |
|
284 |
actor's mailbox; the calculation of the actual billing entries to be stored in |
|
285 |
the user's wallet is done within the context of an actor. The calculation |
|
286 |
process is in itself complicated. A significant source of complexity is |
|
287 |
the support for temporal overriding for pricelists and algorithms: |
|
288 |
within the timeframe of the billing |
|
289 |
algorithm must first decide the |
|
254 | 290 |
|
255 |
Billing events are obtained by a connection to a reliable message queue. |
|
256 |
The billing event format depends on the |
|
257 | 291 |
The actual format of the event is presented in Figure~\ref{fig:resevt}. |
258 | 292 |
|
259 | 293 |
\begin{figure} |
... | ... | |
281 | 315 |
|
282 | 316 |
\end{figure} |
283 | 317 |
|
284 |
\subsection{Implementation Experience} |
|
318 |
\subsection{User State} |
|
319 |
|
|
320 |
|
|
321 |
\section{Performance} |
|
322 |
|
|
323 |
To evaluate the performance and scalability of Aquarium, we performed two |
|
324 |
experiments: The first one is a micro-benchmark that measures the time required |
|
325 |
for the basic processing operation performed by Aquarium, which is billing for |
|
326 |
an increasing number of messages. The second one demonstrates Aquarium's |
|
327 |
scalability on a single node with respect to the number of users. In both |
|
328 |
cases, Aquarium was run on a MacBookPro featuring a quad core 2.33{\sc g}hz |
|
329 |
Intel i7 processor and 8{\sc gb} of {\sc ram}. We selected Rabbit{\sc mq} and |
|
330 |
Mongo{\sc db} as the queue and database servers, both of which were run on a |
|
331 |
virtualised 4 core with 4{\sc gb} {\sc ram} Debian Linux server. Both systems |
|
332 |
were run using current versions at the time of benchmarking (2.7.1 for |
|
333 |
Rabbit{\sc mq} and 2.6 for Mongo{\sc db}). The two systems were connected with |
|
334 |
a full duplex 100Mbps connection. No particular optimization was performed on |
|
335 |
either back-end system, or on the {\sc jvm} that ran Aquarium. |
|
336 |
|
|
337 |
To simulate a realistic deployment, Aquarium was configured, using the policy |
|
338 |
{\sc dsl} to handle billing events for 4 types of resources, using 3 overloaded |
|
339 |
pricelists and 2 overloaded algorithms, all of which were combined into 10 different |
|
340 |
agreements, which were randomly (uniformly) assigned to users. To drive the |
|
341 |
benchmarks, we used a synthetic load generator that worked in two stages: it |
|
342 |
first created a configurable number of users and then produced billing events |
|
343 |
that |
|
344 |
|
|
345 |
|
|
346 |
All measurements were done using the first working version of the |
|
347 |
Aquarium deployment, so no real optimisation effort had taken place. |
|
348 |
|
|
349 |
|
|
350 |
\section{Lessons Learned} |
|
285 | 351 |
|
286 | 352 |
One of the topics of debate while designing Aquarium was the choice of |
287 | 353 |
programming platform to use. With all user facing systems in the Okeanos cloud |
... | ... | |
292 | 358 |
need to employ in order to satisfy them, it became clear that a |
293 | 359 |
typesafe language was a hard requirement. Of the platforms examined, the {\sc |
294 | 360 |
jvm} had the richest collection of ready made components; the Akka library was |
295 |
particularly enticing for its scalability and distribution possibilities it
|
|
361 |
particularly enticing for the scalability and distribution possibilities it
|
|
296 | 362 |
offered. |
297 | 363 |
|
298 |
The choice of Scala at the moment was a high risk/high gain bet for GRNet.
|
|
299 |
However, the development team's experience has been generally positive. Scala
|
|
300 |
as a language was an enabling factor; case classes permitted the expression of
|
|
301 |
data models, including the configuration {\sc dsl}, that could easily be
|
|
302 |
serialized or read back from wire formats while also promoting immutability
|
|
303 |
through the use of the \texttt{copy()} constructor. The very active use of
|
|
304 |
immutability allowed us to write strict, yet simple and concise unit tests, as
|
|
305 |
the number of cases to be examined was generally low. The
|
|
306 |
\textsf{Maybe}\footnote{\textsf{Maybe} works like \textsf{Option}, but it has
|
|
307 |
an extra possible state (\textsf{Failed}), which allows exceptions to be |
|
364 |
The choice of Scala, at the time it was made, was a high risk/high gain
|
|
365 |
bet for GRNet. However, the development team's experience has been generally
|
|
366 |
positive. Scala as a language was an enabling factor; case classes permitted
|
|
367 |
the expression of data models, including the configuration {\sc dsl}, that
|
|
368 |
could easily be serialized or read back from wire formats while also
|
|
369 |
promoting immutability through the use of the \texttt{copy()} constructor. The
|
|
370 |
pervasive use of immutability allowed us to write strict, yet simple and
|
|
371 |
concise unit tests, as the number of cases to be examined was generally low.
|
|
372 |
The \textsf{Maybe}\footnote{\textsf{Maybe} works like \textsf{Option}, but it
|
|
373 |
has an extra possible state (\textsf{Failed}), which allows exceptions to be
|
|
308 | 374 |
encapsulated in the return type of a function, and then retrieved and accounted |
309 | 375 |
for in a pattern matching operation with no side effects. More at |
310 |
\url{https://github.com/loverdos/Maybe}} monad, developed by the second author, |
|
311 |
enabled side-effect free development of data processing functions, even in cases |
|
312 |
where exceptions were the only way to go. Java interoperability was excellent, |
|
313 |
while thin Scala wrappers around existing Java libraries enabled higher |
|
314 |
productivity and use of Scala idioms in conjunction with Java code. |
|
376 |
\url{https://github.com/loverdos/Maybe}} monad enabled side-effect free |
|
377 |
development of data processing functions, even in cases where exceptions were |
|
378 |
the only way to go. Java interoperability was excellent, while thin Scala |
|
379 |
wrappers around existing Java libraries enabled higher productivity and use of |
|
380 |
Scala idioms in conjunction with Java code. |
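The three-state result type described in the footnote can be sketched roughly as follows. This is an illustrative Python analogue only, not the API of the \textsf{Maybe} library: the names \texttt{Just}, \texttt{NoVal}, \texttt{Failed}, \texttt{attempt} and \texttt{describe} are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class Just:
    """A computation that produced a value."""
    value: Any


@dataclass(frozen=True)
class NoVal:
    """A computation that finished without a value."""


@dataclass(frozen=True)
class Failed:
    """A computation that raised; the exception is kept as data."""
    error: Exception


Maybe = Union[Just, NoVal, Failed]


def attempt(fn: Callable[[], Any]) -> Maybe:
    """Run fn and encode its outcome in the return type instead of raising."""
    try:
        result = fn()
    except Exception as exc:
        return Failed(exc)
    return NoVal() if result is None else Just(result)


def describe(m: Maybe) -> str:
    """Handle all three states explicitly, with no side effects."""
    if isinstance(m, Just):
        return f"value {m.value}"
    if isinstance(m, NoVal):
        return "no value"
    return f"failed: {type(m.error).__name__}"
```

The caller inspects the returned state rather than catching an exception, so a function that may fail can still be composed and tested as a plain value-returning function.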
|
381 |
|
|
382 |
The Akka library, which is the backbone of our system, is a prime example of |
|
383 |
the simplicity that can be achieved by using carefully designed high-level |
|
384 |
components. Akka's custom supervision hierarchies allowed us to partition the |
|
385 |
system into self-healing sub-components, each of which can fail independently |
|
386 |
of the others. For example, if the queue reader component fails due to a queue |
|
387 |
failure, Aquarium will still be accessible and responsive for the {\sc rest} |
|
388 |
interface. Also, Akka allowed us to easily saturate the processing components |
|
389 |
of any system we tested Aquarium on, simply by tuning the number of threads (in |
|
390 |
{\sc i/o} bound parts) and actors (in {\sc cpu} bound parts) per dispatcher. |
|
315 | 391 |
|
316 | 392 |
Despite the above, the experience was not as smooth as initially expected. The |
317 |
most prominent problem we encountered was that of missing documentation. The
|
|
393 |
most prominent problem we encountered was that of lacking documentation. The
|
|
318 | 394 |
Akka library documentation, extensive as it is, only scratches the surface. |
319 | 395 |
Several other libraries we use, for example Spray for {\sc rest} handling, have |
320 | 396 |
non-existent documentation. The Java platform, and .Net that followed, has |
... | ... | |
335 | 411 |
statements (including about 1.000 lines of tests), divided in about 10 |
336 | 412 |
packages. The system is built using both {\sc sbt} and Maven. |
337 | 413 |
|
338 |
\section{Performance} |
|
339 |
|
|
340 |
To evaluate the performance and scalability of Aquarium, we performed two |
|
341 |
experiments: The first one is a micro-benchmark that measures the time required |
|
342 |
for the basic processing operation performed by Aquarium, which is billing for |
|
343 |
an increasing number of messages. The second one demonstrates Aquarium's |
|
344 |
scalability on a single node with respect to the number of users. In both |
|
345 |
cases, Aquarium was run on a MacBookPro featuring a quad core 2.33{\sc g}hz |
|
346 |
Intel i7 processor and 8{\sc gb} of {\sc ram}. We selected Rabbit{\sc mq} and |
|
347 |
Mongo{\sc db} as the queue and database servers, both of which were run on a |
|
348 |
virtualised 4 core with 4{\sc gb} {\sc ram} Debian Linux server. Both systems |
|
349 |
were run using current versions at the time of benchmarking (2.7.1 for |
|
350 |
Rabbit{\sc mq} and 2.6 for Mongo{\sc db}). The two systems were connected with |
|
351 |
a full duplex 100Mbps connection. No particular optimization was performed on |
|
352 |
either back-end system, or on the {\sc jvm} that ran Aquarium. |
|
353 |
|
|
354 |
To simulate a realistic deployment, Aquarium was configured, using the policy |
|
355 |
{\sc dsl} to handle billing events for 4 types of resources, using 3 overloaded |
|
356 |
pricelists and 2 overloaded algorithms, all of which were combined into 10 different |
|
357 |
agreements, which were randomly (uniformly) assigned to users. To drive the |
|
358 |
benchmarks, we used a synthetic load generator that worked in two stages: it |
|
359 |
first created a configurable number of users and then produced billing events |
|
360 |
that |
|
361 |
|
|
362 |
|
|
363 |
The measurements above were done on the first working version of the |
|
364 |
Aquarium deployment. They present |
|
365 |
|
|
366 | 414 |
\section{Related Work} |
367 | 415 |
|
368 | 416 |
\section{Conclusions and Future Work} |