Revision fba49cbc doc/arch/aquarium.tex

b/doc/arch/aquarium.tex
20 20

  
21 21
\title{Aquarium: Billing for the Cloud in the Cloud}
22 22

  
23
\authorinfo{Georgios Gousios \and Christos KK Loverdos}
23
\authorinfo{Georgios Gousios \and Christos Loverdos}
24 24
{GRNet SA}
25 25
{\{gousiosg,loverdos\}@grnet.gr}
26 26

  
......
83 83
the GRnet IaaS, they all share a common notion of \emph{resources}, access
84 84
and manipulation options to which they offer users.
85 85

  
86
\subsection{Supported Users}
86
\subsection{Sharing}
87 87

  
88 88
The Okeanos IaaS will support the Greek higher education, an estimated
89 89
population of 100.000 students and researchers. Each member will be granted
......
92 92
have a limited amount of credits, renewable each month, which will be allowed
93 93
to spend on any resource available through the infrastructure. Resources can
94 94
also be shared; for example files on the Pithos file service or virtual machine
95
images on the Archipelago storage can be 
95
images on the Archipelago storage are potentially subject to concurrent usage from 
96
multiple users. This means that charges for the use of a single resource
97
may need to be distributed among several users. Also this may mean that in order
98
for sharing to work correctly, users may need to transfer credits among themselves.
96 99

  
97 100

  
98 101
\subsection{Configuration}
......
153 156
\begin{description}
154 157

  
155 158
    \item[Resources] specify the properties of resources that Aquarium knows
156
        about. Apart from the expected ones (name, unit etc), 
157
        a resource has two properties that affect billing: \textsf{costpolicy}
158
        defines whether the billing operation is to be performed at the moment
159
        a billing event has arrived, while the \textsf{complex} attribute defines
160
        whether a resource can have many instances per user.
159
        about. Apart from the expected ones (name, unit etc), a resource has
160
        two properties that affect billing: \textsf{costpolicy} defines the
161
        algorithm to be used to calculate the resource usage, while the
162
        \textsf{complex} attribute defines whether a resource can have one or
163
        many instances per user.
161 164

  
162 165
    \item[Pricelists] assign a price tag to each resource, within a timeframe.
163 166
    
......
180 183

  
181 184

  
182 185
\begin{figure}
183
\lstset{language=ruby, basicstyle=\footnotesize,
186
\lstset{language=c, basicstyle=\footnotesize,
184 187
stringstyle=\ttfamily, 
185 188
flexiblecolumns=true, aboveskip=-0.9em, belowskip=0em, lineskip=0em}
186 189

  
......
205 208
      repeat:
206 209
      - start: "00 02 * * Tue"
207 210
        end:   "00 02 * * Wed"
208
      from: 1326041177 #Sun, 8 Jan 2012 18:46:27 EET
211
      from: 1326041177        //Sun, 8 Jan 2012 18:46:27 EET
209 212
algorithms:
210 213
  - algorithm:
211 214
    name: default
......
245 248

  
246 249
\subsection{Billing}
247 250

  
248
As common to most similar systems, billing in Aquarium is the application of
249
a billing contract to an incoming billing event in order to produce an 
250
entry for the user's wallet. However, in stark contrast to most other systems,
251
which rely on database transactions in order to securely modify the user's
252
balance, Aquarium performs account updates asynchronously and concurrently
253
for all known users.
251
As is common in most similar systems, billing in Aquarium is the application of the
252
provisions of a user's contract to an incoming billing event in order to
253
produce an entry for the user's wallet. However, in stark contrast to most
254
other systems, which rely on database transactions in order to securely modify
255
the user's balance, Aquarium performs account updates asynchronously and
256
concurrently for all users.
257

  
258
Per resource, the charging operation is affected by the cost policy and complexity
259
parameters. Specifically, the 3 available cost policies affect the calculation 
260
of the amount of resource usage to be charged as follows:
261

  
262
\begin{itemize}
263
    \item resources employing the \textsf{continuous} cost policy are charged for
264
        the actual resource usage through time. When a resource event arrives,
265
        the previous resource state between the previous charge operation and the
266
        current event timestamp is charged and the resource state is then
267
        updated. More formally, for continuous resources, if $f(t)$ represents
268
        the function of resource usage through time and $p(t)$ is the function
269
        representing the pricelist at time $t$, 
270
        then the total cost up to a time $t$ is
271
        $c(t) = \int_0^{t}{p(\tau) \times f(\tau)\,d\tau}$
272

  
273
    \item resources employing the \textsf{onoff} cost policy can be in two states:
274
        either switched on and actively used or switched off. Therefore, the unit
275
        of resource usage is time and not
276

  
277
    \item and finally, resources using the \textsf{distinct} cost policy are charged
278
        upon usage, without time playing 
279

  
280
\end{itemize}
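
The three cost policies above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not Aquarium's actual implementation: the names \texttt{ResourceState} and \texttt{charge} are hypothetical, the price is assumed constant over the charged interval, events are assumed to carry the new absolute resource value, and the semantics of the \textsf{onoff} and \textsf{distinct} policies are assumptions inferred from their short descriptions.

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    """Last known state of one resource instance (hypothetical model)."""
    value: float      # continuous: usage level; onoff: 1.0 if on, 0.0 if off
    timestamp: float  # time of the previous charge operation


def charge(policy, state, event_value, event_time, unit_price):
    """Return (credits_to_charge, new_state) for one incoming resource event."""
    elapsed = event_time - state.timestamp
    if policy == "continuous":
        # Charge the previous usage level over the elapsed interval,
        # then update the state with the value carried by the event.
        cost = state.value * elapsed * unit_price
        return cost, ResourceState(event_value, event_time)
    if policy == "onoff":
        # Assumption: only the time spent in the "on" state is charged.
        cost = elapsed * unit_price if state.value == 1.0 else 0.0
        return cost, ResourceState(event_value, event_time)
    if policy == "distinct":
        # Assumption: each event is charged on its own; time plays no role.
        return event_value * unit_price, state
    raise ValueError("unknown cost policy: " + policy)


# A resource holding 10 units for 4 time units at 0.5 credits per unit-time.
cost, state = charge("continuous", ResourceState(10.0, 0.0), 25.0, 4.0, 0.5)
print(cost, state)
```

Under these assumptions, a \textsf{continuous} resource is always charged for the state it was in before the event, which is why the new value only takes effect from the event timestamp onwards.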
281

  
282
Billing events are obtained by a connection to a message queue.  Upon arrival,
283
a billing event is stored in an immutable log, and then forwarded to the user
284
actor's mailbox; the calculation of the actual billing entries to be stored in
285
the user's wallet is done within the context of an actor. The calculation
286
process is in itself complicated. A significant source of complexity is
287
the support for temporal overriding for pricelists and algorithms:
288
within the timeframe of the billing
289
algorithm must first decide the 
254 290

  
255
Billing events are obtained by a connection to a reliable message queue.
256
The billing event format depends on the 
257 291
The actual format of the event is presented in Figure~\ref{fig:resevt}.
258 292

  
259 293
\begin{figure}
......
281 315

  
282 316
\end{figure}
283 317

  
284
\subsection{Implementation Experience}
318
\subsection{User State}
319

  
320

  
321
\section{Performance}
322

  
323
To evaluate the performance and scalability of Aquarium, we performed two
324
experiments: The first one is a micro-benchmark that measures the time required
325
for the basic processing operation performed by Aquarium, which is billing for
326
an increasing number of messages. The second one demonstrates Aquarium's
327
scalability on a single node with respect to the number of users.  In both
328
cases, Aquarium was run on a MacBookPro featuring a quad core 2.33{\sc g}hz
329
Intel i7 processor and 8{\sc gb} of {\sc ram}. We selected Rabbit{\sc mq} and
330
Mongo{\sc db} as the queue and database servers, both of which were run on a
331
virtualised 4 core with 4{\sc gb} {\sc ram} Debian Linux server. Both systems
332
were run using current versions at the time of benchmarking (2.7.1 for
333
Rabbit{\sc mq} and 2.6 for Mongo{\sc db}).  The two systems were connected with
334
a full duplex 100Mbps connection.  No particular optimization was performed on
335
either back-end system, nor on the {\sc jvm} that ran Aquarium.
336

  
337
To simulate a realistic deployment, Aquarium was configured, using the policy
338
{\sc dsl} to handle billing events for 4 types of resources, using 3 overloaded
339
pricelists, 2 overloaded algorithms, all of which were combined to 10 different
340
agreements, which were randomly (uniformly) assigned to users. To drive the
341
benchmarks, we used a synthetic load generator that worked in two stages: it
342
first created a configurable number of users and then produced billing events
343
that 
344

  
345

  
346
All measurements were done using the first working version of the
347
Aquarium deployment, so no real optimisation effort had taken place.
348

  
349

  
350
\section{Lessons Learned}
285 351

  
286 352
One of the topics of debate while designing Aquarium was the choice of
287 353
programming platform to use. With all user facing systems in the Okeanos cloud
......
292 358
need to employ in order to satisfy them, it became clear that a
293 359
typesafe language was a hard requirement. Of the platforms examined, the {\sc
294 360
jvm} had the richest collection of ready made components; the Akka library was
295
particularly enticing for its scalability and distribution possibilities it
361
particularly enticing for the scalability and distribution possibilities it
296 362
offered.
297 363

  
298
The choice of Scala at the moment was a high risk/high gain bet for GRNet.
299
However, the development team's experience has been generally positive.  Scala
300
as a language was an enabling factor; case classes permitted the expression of
301
data models, including the configuration {\sc dsl}, that could be easily be
302
serialized or read back from wire formats while also promoting immutability
303
through the use of the \texttt{copy()} constructor.  The very active use of
304
immutability allowed us to write strict, yet simple and concise unit tests, as
305
the number of cases to be examined was generally low. The
306
\textsf{Maybe}\footnote{\textsf{Maybe} works like \textsf{Option}, but it has
307
an extra possible state (\textsf{Failed}), which allows exceptions to be
364
The choice of Scala, at the time it was made, was a high risk/high gain
365
bet for GRNet. However, the development team's experience has been generally
366
positive. Scala as a language was an enabling factor; case classes permitted
367
the expression of data models, including the configuration {\sc dsl}, that
368
could easily be serialized or read back from wire formats while also
369
promoting immutability through the use of the \texttt{copy()} constructor. The
370
pervasive use of immutability allowed us to write strict, yet simple and
371
concise unit tests, as the number of cases to be examined was generally low.
372
The \textsf{Maybe}\footnote{\textsf{Maybe} works like \textsf{Option}, but it
373
has an extra possible state (\textsf{Failed}), which allows exceptions to be
308 374
encapsulated in the return type of a function, and then retrieved and accounted
309 375
for in a pattern matching operation with no side effects. More at
310
\url{https://github.com/loverdos/Maybe}} monad, developed by the second author,
311
enabled side-effect free development of data processing functions, even in cases
312
where exceptions were the only way to go. Java interoperability was excellent,
313
while thin Scala wrappers around existing Java libraries enabled higher
314
productivity and use of Scala idioms in conjunction with Java code.
376
\url{https://github.com/loverdos/Maybe}} monad enabled side-effect free
377
development of data processing functions, even in cases where exceptions were
378
the only way to go. Java interoperability was excellent, while thin Scala
379
wrappers around existing Java libraries enabled higher productivity and use of
380
Scala idioms in conjunction with Java code.
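
The idea behind a three-state \textsf{Maybe} can be sketched as follows. The sketch is in Python for illustration only and does not reproduce the API of the Scala library: apart from \textsf{Failed}, which the footnote mentions, the names \texttt{Just}, \texttt{NoVal}, \texttt{maybe} and \texttt{describe} are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Just:
    """A computation that produced a value."""
    value: Any


@dataclass(frozen=True)
class NoVal:
    """A computation that finished without producing a value."""


@dataclass(frozen=True)
class Failed:
    """A computation that raised; the exception is kept as plain data."""
    error: Exception


def maybe(compute: Callable[[], Any]):
    """Run compute() and encode its outcome in the return value."""
    try:
        result = compute()
    except Exception as exc:
        return Failed(exc)
    return NoVal() if result is None else Just(result)


def describe(outcome) -> str:
    """Handle every possible state explicitly, without try/except."""
    if isinstance(outcome, Just):
        return f"value: {outcome.value}"
    if isinstance(outcome, Failed):
        return f"failed: {type(outcome.error).__name__}"
    return "no value"


print(describe(maybe(lambda: int("42"))))
print(describe(maybe(lambda: int("not a number"))))
```

Because the exception travels inside the return value, callers such as \texttt{describe} stay free of exception handling and can be tested case by case.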
381

  
382
The Akka library, which is the backbone of our system, is a prime example of 
383
the simplicity that can be achieved by using carefully designed high-level
384
components. Akka's custom supervision hierarchies allowed us to partition the
385
system into self-healing sub-components, each of which can fail independently
386
of the others. For example, if the queue reader component fails due to a queue
387
failure, Aquarium will still be accessible and responsive for the {\sc rest}
388
interface. Also, Akka allowed us to easily saturate the processing components
389
of any system we tested Aquarium on, simply by tuning the number of threads (in
390
{\sc i/o} bound parts) and actors (in {\sc cpu} bound parts) per dispatcher. 
315 391

  
316 392
Despite the above, the experience was not as smooth as initially expected. The
317
most prominent problem we encountered was that of missing documentation. The
393
most prominent problem we encountered was that of lacking documentation. The
318 394
Akka library documentation, extensive as it is, only scratches the surface.
319 395
Several other libraries we use, for example Spray for {\sc rest} handling, have
320 396
non-existent documentation. The Java platform, and .Net that followed, has
......
335 411
statements (including about 1.000 lines of tests), divided in about 10
336 412
packages. The system is built using both {\sc sbt} and Maven. 
337 413

  
338
\section{Performance}
339

  
340
To evaluate the performance and scalability of Aquarium, we performed two
341
experiments: The first one is a micro-benchmark that measures the time required
342
for the basic processing operation performed by Aquarium, which is billing for
343
increasing number of messages. The second one demonstrates Aquarium's
344
scalability on a single node with respect to the number of users.  In both
345
cases, Aquarium was run on a MacBookPro featuring a quad core 2.33{\sc g}hz
346
Intel i7 processor and 8{\sc gb} of {\sc ram}. We selected Rabbit{\sc mq} and
347
Mongo{\sc db} as the queue and database servers, both of which were run on a
348
virtualised 4 core with 4{\sc gb} {\sc ram} Debian Linux server. Both systems
349
were run using current versions at the time of benchmarking (2.7.1 for
350
Rabbit{\sc mq} and 2.6 for Mongo{\sc db}).  The two systems were connected with
351
a full duplex 100Mbps connection.  No particular optimization was performed on
352
either back-end system, nor to the {\sc jvm} that run Aquarium. 
353

  
354
To simulate a realistic deployment, Aquarium was configured, using the policy
355
{\sc dsl} to handle billing events for 4 types of resources, using 3 overloaded
356
pricelists, 2 overloaded algorithms, all of which were combined to 10 different
357
agreements, which were randomly (uniformly) assigned to users. To drive the
358
benchmarks, we used a synthetic load generator that worked in two stages: it
359
first created a configurable number of users and then produced billing events
360
that 
361

  
362

  
363
The measurements above were done on the first working version of the
364
Aquarium deployment. They present 
365

  
366 414
\section{Related Work}
367 415

  
368 416
\section{Conclusions and Future Work}
