\documentclass[preprint,10pt]{sigplanconf}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage[british]{babel}
\usepackage{url}
\usepackage{listings}
\usepackage{color}

\newcommand{\cL}{{\cal L}}
\newcommand{\TODO}{{\sl TODO \marginpar{\sl TODO}}}

\begin{document}
\conferenceinfo{ScalaDays '12}{London, UK.}
\copyrightyear{2012}
\copyrightdata{1-59593-056-6/05/0006}

\titlebanner{DRAFT---Do not distribute}


\title{Aquarium: Billing for the Cloud in the Cloud}

\authorinfo{Georgios Gousios \and Christos KK Loverdos \and Nectarios Koziris}
{GRNet SA}
{\{gousiosg,loverdos,nkoziris\}@grnet.gr}

\maketitle
\begin{abstract}
    This paper describes the architecture of the Aquarium cloud
    infrastructure software. Aquarium is a new software system whose main
    function is to associate cloud resource usage with charging policies.
\end{abstract}

\category{D.3.3}{Programming Languages}{Language Constructs and Features}[Control structures]

\terms
    Object-Oriented Programming, Philosophy

\keywords
    OOP, Ontology, Programming Philosophy

\section{Introduction}
\section{Requirements}

Aquarium was designed from a clean sheet to serve a particular purpose,
namely the provision of billing services to an IaaS infrastructure,
while also being extensible to new services. In the following sections,
we briefly present the requirements that shaped Aquarium's design.

    
\subsection{Application Environment}
Aquarium was developed as part of the Okeanos project at GRNet. The
Okeanos project is building a full-stack public IaaS system for Greek
universities, and several services on top of it. Several components comprise
the Okeanos infrastructure:

\begin{description}

    \item[Synnefo] is an IaaS management console. Users can create and start
        VMs, monitor their usage, create private internal networks among VMs
        and connect to them over the web. The service backend is based on
        Google's Ganeti for VM host management and hundreds of physical
        VM container nodes.

    \item[Archipelago] is a storage service, based on the RADOS
        distributed object store. It is currently under development, and the
        plan is for it to act as the single point of storage for VM images,
        shared volumes and user files, providing clonable snapshots and
        distributed fault tolerance.

    \item[Pithos] is a user-oriented file storage service. Currently in its
        second incarnation, it supports content deduplication, sharing of
        files and folders, and a multitude of clients.

    \item[Astakos] is an identity consolidation system that also acts as the
        entry point to the entire infrastructure. Users can log in using
        identities from multiple systems, such as the Shibboleth (SAML)
        federation enabled across all Greek universities or their Twitter
        accounts.

\end{description}

While all the above systems (and several prospective ones) have different
user interfaces and provide distinct functionality in the context of the
GRNet IaaS, they all share a common notion of \emph{resources}, to which
they offer users access and manipulation options.

\subsection{Sharing}

The Okeanos IaaS will support the Greek higher education community, an
estimated population of 100,000 students and researchers. Each member will be
granted access to a collection of resources using her institutional account.
To enforce a limit on the resources that can be acquired from the platform,
each user will have a limited amount of credits, renewable each month, which
she will be allowed to spend on any resource available through the
infrastructure. Resources can also be shared; for example, files on the
Pithos file service or virtual machine images on the Archipelago storage are
potentially subject to concurrent usage by multiple users. This means that
charges for the use of a single resource may need to be distributed among
several users. It also means that, for sharing to work correctly, users may
need to transfer credits among themselves.

\subsection{Configuration}

Billing systems are by nature open ended. As new services are deployed, new
resources appear, while others might be phased out. Moreover, changes to
company policies may trigger changes to price lists for those resources,
while ad-hoc requests for large-scale computational resources may require
special pricing policies. For a billing system to adapt successfully to
changing requirements, it must be able to accommodate such changes without
requiring changes to the application itself. This means that all information
Aquarium requires in order to perform a billing operation must be provided to
it externally. Moreover, to ensure high availability, the billing
configuration should be updatable while Aquarium is running, or at least with
minimal downtime, and without affecting the operation of external systems.

\subsection{Scaling}
119

    
120
In the context of the Okeanos system, Aquarium provides billing services on a
121
per user basis for all resources exposed by other systems. As such, it is in
122
the critical path of user requests that modify resource state; all supported
123
applications must query Aquarium in order to ensure that the user has enough
124
credits to create a new resource. This means that for a large number of users
125
(given previous GRNet systems usage by the Greek research community, we
126
estimate a concurrency level of 30.000 users), Aquarium must update and
127
maintain in a queryable form their credit status, 
128
with soft realtime guarantees. 
129

    
Being on the critical path also means that Aquarium must be highly resilient.
If Aquarium fails, all supported systems will also fail. Even if Aquarium
fails only for a short period of time, it must not lose any billing events,
as that would allow users to use resources without paying for them. Moreover,
in case of failure, Aquarium must not corrupt any billing data under any
circumstances, and it should return to an operating state quickly after a
service restart.

\section{Domain Modeling}

\subsection{Basic terminology}
We have already mentioned several entities in our description so far. Let us
now be more specific about several key terms.

\begin{description}
\item[Credits]
The analog of money. Credits are the ``universal currency'' within Aquarium.

\item[Resource]
A billable/chargeable entity. We generally need credits to use a resource;
when a resource is used, credits are consumed. Examples of resources are
``download bandwidth'', ``upload bandwidth'', ``disk space'' and ``VM time'',
to name a few. Generally speaking, the ``resource'' term specifies the type.
A resource may have several properties attached, e.g.\ its name, unit of
measure, a description of how its consumption translates to credits, and
whether it can have more than one instance.

\item[Resource instance]
A user may have several instances of a resource type. For example, regarding
a ``Virtual Machine'' resource, a user may have more than one of them. They
are distinguished by their unique resource instance identifier. We call
resources that can have more than one instance ``complex'' resources.

\item[Resource event]
An event that is generated by the system responsible for the resource.
A resource event describes a state change for the resource. In particular, a
resource event records the time when that state change occurred (the
\texttt{occurredMillis} attribute) and the changed value (the \texttt{value}
attribute).

\item[Cost policy]
A cost policy refers to a resource and is the policy used to charge credits
for resource usage. Cost policies come in three flavors, namely
\textsf{continuous}, \textsf{discrete} and \textsf{onoff}.

\item[Pricelists] assign a price tag to each resource, within a timeframe.

\item[Charging algorithms] specify the way a resource event generates
consumed credits. A charging algorithm can be as simple as a direct
multiplication of the chargeable resource quantity with the applicable price.
We offer the option, though, to define more complex charging scenarios; the
Aquarium DSL supports a simple imperative language with a number of implicit
variables (e.g.\ \texttt{price}, \texttt{volume}, \texttt{date}) that enable
administrators to specify, for example, billing algorithms that scale with
billable volume. Similarly to pricelists, charging algorithms have an
applicability timeframe attached to them.

\item[Credit plans] define a number of credits to give to users and a
repetition period.

\item[Agreements] assign a name to a charging algorithm, pricelist and credit
plan triplet, which is then assigned to each user.

\item[Resource instance state]
A value associated with a resource instance. It is usually a floating point
number, as in the $10.5$ MB designation of a possible ``total current
downloading bandwidth''.

\item[User]
An owner of resources and credits. Users are defined externally to Aquarium.

\item[Resource event store]
A database, in the general sense, where resource events are stored.

\item[User bill]
A, usually periodic, statement of how many credits the user has consumed. It
may contain a detailed analysis that relates consumed credits to resources.

\item[Billing period]
A time period at the end of which we issue a user bill.
A billing period is made of a starting date and a duration that is a multiple
of a week. A usual billing period starts on a particular date of the month
(e.g.\ the 3rd) and lasts for a month. Each resource type designates what
happens to its accumulated value (if any) at the beginning of the billing
period. Usually, at the beginning of the billing period, the accumulated
amounts of resources are reset to zero. For example, for a monthly billing
period, the total uploading bandwidth is reset to zero every month.

\item[User state]
The user state is made of the following distinct parts, of which the first
two can be integrated into a unifying ``resource'' concept.

\begin{enumerate}
\item User credit state, that is, the total credit amount for the user.

\item User resource state, which refers to the state of each resource
instance that the user owns.

\item Processing state \TODO
\end{enumerate}

\item[Resource event processing]
The set of algorithmic steps by which a resource event leads to changes of
the user state.

\end{description}
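
To make the terminology concrete, the following minimal Scala sketch shows
how some of the above terms could be modelled as immutable case classes. The
names and fields are illustrative assumptions, not Aquarium's actual code.

\lstset{basicstyle=\footnotesize, flexiblecolumns=true}
\begin{lstlisting}
// Illustrative domain model; all names are assumptions.
case class Resource(
  name: String,        // e.g. "bandwidthup"
  unit: String,        // e.g. "MB/hr"
  complex: Boolean,    // may a user own several instances?
  costPolicy: String)  // "continuous", "discrete" or "onoff"

case class ResourceEvent(
  id: String,
  occurredMillis: Long, // when the state change occurred
  userId: String,
  resource: String,     // the resource type name
  instanceId: String,   // distinguishes instances of complex resources
  value: Double)        // the changed value

case class UserState(
  userId: String,
  credits: Double,                    // total credit amount
  resourceState: Map[String, Double]) // state per resource instance
\end{lstlisting}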

\subsection{The configuration DSL}
\label{sec:dsl}
The configuration requirements of Aquarium were addressed by creating a new
domain-specific language ({\sc dsl}), based on the YAML format. The DSL
enables administrators to specify chargeable resources, charging policies and
price lists, and to combine them arbitrarily into agreements applicable to
specific users, user groups or the whole system.
The DSL supports inheritance for policies, price lists and agreements, and
composition in the case of agreements. It also facilitates the definition of
generic, repeatable debiting rules, which are then used by the system to
refill the user's account with credits on a periodic basis.

Of the previously specified terms, the following five are used in the DSL:

\begin{enumerate}
\item Resources
\item Charging algorithms
\item Pricelists
\item Credit plans
\item Agreements
\end{enumerate}

\begin{figure}
\lstset{language=c, basicstyle=\footnotesize,
stringstyle=\ttfamily,
flexiblecolumns=true, aboveskip=-0.9em, belowskip=0em, lineskip=0em}

\begin{lstlisting}
resources:
  - resource:
    name: bandwidthup
    unit: MB/hr
    complex: false
    costpolicy: continuous
pricelists:
  - pricelist:
    name: default
    bandwidthup: 0.01
    effective:
      from: 0
  - pricelist:
    name: everyTue2
    overrides: default
    bandwidthup: 0.1
    effective:
      repeat:
      - start: "00 02 * * Tue"
        end:   "00 02 * * Wed"
      from: 1326041177  # Sun, 8 Jan 2012 18:46:27 EET
algorithms:
  - algorithm:
    name: default
    bandwidthup: $price times $volume
    effective:
      from: 0
agreements:
  - agreement:
    name: scaledbandwidth
    pricelist: everyTue2
    algorithm:
      bandwidthup: |
        if $volume lt 15 then
          $volume times $price
        elsif $volume lt 30 then
          $volume times $price times 1.2
        else
          $volume times $price times 1.4
        end
\end{lstlisting}

\caption{A simple billing policy definition.}
\label{fig:dsl}
\end{figure}

In Figure~\ref{fig:dsl}, we present the definition of a simple (albeit valid)
policy. Policy parsing is done top down, so the order of definition is
important. The definition starts with a resource, whose name is then re-used
in order to attach a pricelist and a price calculation algorithm to it. In
the case of pricelists, we present an example of \emph{temporal overloading}:
the \texttt{everyTue2} pricelist overrides the default one, but only for the
repeating time frames between every Tuesday at 02:00 and Wednesday at 02:00,
starting from the timestamp indicated by the \texttt{from} field. Another
example of overloading is presented in the definition of the agreement, which
overloads the default algorithm definition, using the imperative part of the
Aquarium {\sc dsl} to provide a scaling charge algorithm.

\subsection{Billing}

As in most similar systems, billing in Aquarium is the application of the
provisions of a user's contract to an incoming billing event, in order to
produce an entry for the user's wallet. However, in stark contrast to most
other systems, which rely on database transactions in order to securely
modify the user's balance, Aquarium performs account updates asynchronously
and concurrently for all users.

Per resource, the charging operation is affected by the cost policy and
complexity parameters. Specifically, the three available cost policies affect
the calculation of the amount of resource usage to be charged as follows (a
sketch of the continuous case follows the list):

\begin{itemize}
    \item Resources employing the \textsf{continuous} cost policy are charged
        for the actual resource usage through time. When a resource event
        arrives, the resource usage between the previous charge operation and
        the current event's timestamp is charged, and the resource state is
        then updated. More formally, for continuous resources, if $f(t)$
        represents the resource usage at time $t$ and $p(t)$ the applicable
        price at time $t$, then the total cost up to time $T$ is
        $c(T) = \int_0^{T} p(t)\,f(t)\,dt$; since pricelists are piecewise
        constant, the integral reduces to a sum over the intervals during
        which a single price is in effect. Most resources in Aquarium are
        continuous, for example bandwidth and disk space.

    \item Resources employing the \textsf{onoff} cost policy can be in two
        states: either switched on and actively used, or switched off.
        Therefore, the unit of resource usage is time rather than the actual
        resource usage, and the charged period is calculated only while the
        resource is switched on. Virtual machines are examples of resources
        with the \textsf{onoff} cost policy.

    \item Resources using the \textsf{discrete} cost policy are charged upon
        usage, without time playing a role in the charge. Such resources are
        useful for one-off charges, such as the allocation of a virtual
        machine or the migration of a virtual machine to a less busy host.

\end{itemize}
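
To make the continuous case concrete, the following Scala sketch charges a
constant usage level held between two events under a piecewise-constant
pricelist. It is a simplified illustration with assumed names, not Aquarium's
billing code; it presumes the pricelist is sorted by start time and that a
baseline segment covers the start of the interval.

\lstset{basicstyle=\footnotesize, flexiblecolumns=true}
\begin{lstlisting}
// A price valid from `fromMillis` until the next segment starts.
case class PriceSegment(fromMillis: Long, price: Double)

// Charge `usage` (e.g. MB of disk held constant) for the interval
// [prevMillis, nowMillis), split wherever the price changes.
// `segments` must be sorted by fromMillis, with a baseline first.
def continuousCharge(usage: Double,
                     prevMillis: Long, nowMillis: Long,
                     segments: List[PriceSegment]): Double = {
  val changes = segments.map(_.fromMillis)
                        .filter(t => t > prevMillis && t < nowMillis)
  val cuts = (prevMillis :: changes) :+ nowMillis
  cuts.sliding(2).map { case List(a, b) =>
    // price in effect at time a (baseline segment assumed present)
    val price = segments.takeWhile(_.fromMillis <= a).last.price
    val hours = (b - a) / 3600000.0
    usage * price * hours // e.g. MB x credits/MB-hr x hr
  }.sum
}
\end{lstlisting}

For example, 100 MB held for two hours at 0.01 credits per MB-hour costs
$100 \times 0.01 \times 2 = 2$ credits; if the price changes halfway through,
the interval is split and the two halves are charged separately.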

Billing events are obtained through a connection to a message queue. Upon
arrival, a billing event is stored in an immutable log, and then forwarded to
the user actor's mailbox; the calculation of the actual billing entries to be
stored in the user's wallet is done within the context of the user actor,
serially for each incoming event. This permits the actor to have mutable
state internally (as described in Section~\ref{sec:ustate}), without risking
the correctness of the calculation. The calculation process involves steps
such as validating the resource event, resolving the current state of the
resource affected by the incoming event, deciding the applicable pricelist
and algorithm, generating entries for the user's wallet and updating the
current resource state for the user. A significant source of complexity in
the process is the support for temporal overriding of pricelists and
algorithms: within the timeframe between resource updates, several pricelists
or algorithms may be active. The billing algorithm must therefore split the
billing period into pieces according to the applicability of each
pricelist/algorithm, and make sure that at least a baseline policy is in
effect in order to perform the calculation. Consequently, a resource event
might lead to several entries in the user's wallet.
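
The per-user serialisation described above can be sketched with an actor
along the following lines. This is an illustrative reconstruction using the
Akka actor API, with all names assumed, not Aquarium's actual implementation;
it reuses the \texttt{ResourceEvent} and \texttt{UserState} sketches shown
earlier.

\lstset{basicstyle=\footnotesize, flexiblecolumns=true}
\begin{lstlisting}
import akka.actor.Actor

// One actor per user: the mailbox serialises incoming events,
// so mutable state inside the actor is safe.
class UserActor(userId: String) extends Actor {
  private var state = UserState(userId, credits = 0.0,
                                resourceState = Map.empty)

  def receive = {
    case ev: ResourceEvent =>
      // validate the event, resolve the affected resource state,
      // decide the applicable pricelist/algorithm per sub-interval,
      // then append wallet entries and update the resource state
      val charge = computeCharge(ev, state)
      state = state.copy(
        credits = state.credits - charge,
        // keyed by resource name here for brevity
        resourceState = state.resourceState + (ev.resource -> ev.value))
  }

  // Stand-in for the billing calculation described in the text.
  private def computeCharge(ev: ResourceEvent, s: UserState): Double = 0.0
}
\end{lstlisting}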

    
The actual format of a resource event is presented in
Figure~\ref{fig:resevt}.

\begin{figure}
\lstset{language=C, basicstyle=\footnotesize,
stringstyle=\ttfamily,
flexiblecolumns=true, aboveskip=-0.9em, belowskip=0em, lineskip=0em}

\begin{lstlisting}
{
  "id":"4b3288b57e5c1b08a67147c495e54a68655fdab8",
  "occurredMillis":1314829876295,
  "userId":31,
  "clientId":3,
  "resource":"vmtime",
  "eventVersion":1,
  "value": 1,
  "details":{
    "vmid":"3300",
    "action": "on"
  }
}
\end{lstlisting}
\caption{A billing event example}
\label{fig:resevt}

\end{figure}

\subsection{User State}
\label{sec:ustate}

\section{Architecture}
\input{arch}

\section{Performance}

To evaluate the performance and scalability of Aquarium, we performed two
experiments. The first is a micro-benchmark that measures the time required
for the basic processing operation performed by Aquarium, namely billing, for
an increasing number of messages. The second demonstrates Aquarium's
scalability on a single node with respect to the number of users. In both
cases, Aquarium was run on a MacBook Pro featuring a quad-core 2.33{\sc g}hz
Intel i7 processor and 8{\sc gb} of {\sc ram}. We selected Rabbit{\sc mq} and
Mongo{\sc db} as the queue and database servers, both of which were run on a
virtualised 4-core Debian Linux server with 4{\sc gb} of {\sc ram}. Both
systems were run using the versions current at the time of benchmarking
(2.7.1 for Rabbit{\sc mq} and 2.6 for Mongo{\sc db}). The two systems were
connected with a full-duplex 100~Mbps connection. No particular optimisation
was performed on either back-end system, nor on the {\sc jvm} that ran
Aquarium.

To simulate a realistic deployment, Aquarium was configured, using the policy
{\sc dsl}, to handle billing events for 4 types of resources, using 3
overloaded pricelists and 2 overloaded algorithms, all of which were combined
into 10 different agreements, which were randomly (uniformly) assigned to
users. To drive the benchmarks, we used a synthetic load generator that
worked in two stages: it first created a configurable number of users and
then produced billing events that simulated resource usage by those users.

All measurements were done using the first working version of the
Aquarium deployment, so no real optimisation effort had taken place.

\section{Lessons Learned}

One of the topics of debate while designing Aquarium was the choice of
programming platform. With all user-facing systems in the Okeanos cloud
being developed in Python, and the initial Aquarium designers being beginner
Scala users (but experts in Java), the choice certainly involved a risk that
management was initially reluctant to take. However, by breaking down the
requirements and considering the various safeguards that the software would
need to employ in order to satisfy them, it became clear that a typesafe
language was a hard requirement. Of the platforms examined, the {\sc jvm} had
the richest collection of ready-made components; the Akka library was
particularly enticing for the scalability and distribution possibilities it
offered.

The choice of Scala at the moment it had been made was a high risk/high gain
435
bet for GRNet. However, the development team's experience has been generally
436
positive. Scala as a language was an enabling factor; case classes permitted
437
the expression of data models, including the configuration {\sc dsl}, that
438
could be easily be serialized or read back from wire formats while also
439
promoting immutability through the use of the \texttt{copy()} constructor. The
440
pervasive use of immutability allowed us to write strict, yet simple and
441
concise unit tests, as the number of cases to be examined was generally low.
442
The \textsf{Maybe}\footnote{\textsf{Maybe} works like \textsf{Option}, but it
443
has an extra possible state (\textsf{Failed}), which allows exceptions to be
444
encapsulated in the return type of a function, and then retrieved and accounted
445
for in a pattern matching operation with no side effects. More at
446
\url{https://github.com/loverdos/Maybe}} monad, enabled side-effect free
447
development of data processing functions, even in cases where exceptions were
448
the only way to go. Java interoperability was excellent, while thin Scala
449
wrappers around existing Java libraries enabled higher productivity and use of
450
Scala idioms in conjunction with Java code.
451
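
As a small illustration of the pattern (a toy example, not Aquarium code),
updating an immutable case class with \texttt{copy()} produces a new value
and leaves the original untouched, which keeps unit tests simple:

\lstset{basicstyle=\footnotesize, flexiblecolumns=true}
\begin{lstlisting}
case class Wallet(userId: String, credits: Double)

val before = Wallet("user-1", 100.0)
val after  = before.copy(credits = before.credits - 2.5)
// before is still Wallet(user-1,100.0); after is Wallet(user-1,97.5)
\end{lstlisting}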

    
The Akka library, which is the backbone of our system, is a prime example of
the simplicity that can be achieved by using carefully designed high-level
components. Akka's custom supervision hierarchies allowed us to partition the
system into self-healing sub-components, each of which can fail independently
of the others. For example, if the queue reader component fails due to a
queue failure, Aquarium will still be accessible and responsive through the
{\sc rest} interface. Also, Akka allowed us to easily saturate the processing
components of any system we tested Aquarium on, simply by tuning the number
of threads (in {\sc i/o}-bound parts) and actors (in {\sc cpu}-bound parts)
per dispatcher.
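
The supervision pattern referred to above can be sketched as follows, using
the supervision API of later Akka 2.x releases; the component names and
strategy parameters are illustrative assumptions, not Aquarium's actual
hierarchy.

\lstset{basicstyle=\footnotesize, flexiblecolumns=true}
\begin{lstlisting}
import akka.actor.{Actor, OneForOneStrategy, Props, SupervisorStrategy}
import scala.concurrent.duration._

class QueueReader extends Actor {
  def receive = { case msg => /* consume billing events ... */ }
}

class RootSupervisor extends Actor {
  // Restart a failing child without touching its siblings, so a
  // queue failure leaves e.g. the REST interface responsive.
  override val supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case _: Exception => SupervisorStrategy.Restart
    }
  private val reader = context.actorOf(Props[QueueReader], "queue-reader")
  def receive = { case msg => reader forward msg }
}
\end{lstlisting}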

    
Despite the above, the experience was not as smooth as initially expected.
The most prominent problem we encountered was lacking documentation. The
Akka library documentation, extensive as it is, only scratches the surface.
Several other libraries we use, for example Spray for {\sc rest} handling,
have non-existent documentation. The Java platform, and .Net that followed,
have shown that thorough and precise documentation is key to adoption, and
we expected a similar level of quality. Related is the problem of shared
community wisdom; as most developers know, a search for any programming
problem will reveal several straightforward Java or scripting-language
sources. The situation with Scala is usually the opposite; the expressive
power of the language makes it the current language of choice for treating
esoteric functional programming concepts, while simple topics are often
neglected. Scala has several libraries of algebraic datatypes, but no
{\sc yaml} parser. As Scala gains mainstream adoption, we hope that such
problems will fade.

From a software engineering point of view, the current state of the project
was reached with about 6 person-months of effort, 2 of which were devoted to
requirements elicitation, prototype building and familiarisation with the
language. The source code currently consists of 5,000 lines of executable
statements (including about 1,000 lines of tests), divided into about 10
packages. The system is built using both {\sc sbt} and Maven.

\section{Related Work}

\section{Conclusions and Future Work}
In this paper, we presented Aquarium, a high-performance, generic billing
system, currently tuned for cloud applications. We presented the requirements
that underpinned its design, outlined the architectural decisions made
and analysed its implementation and performance.

Scala has been an enabling factor for the implementation of Aquarium, both
during the system prototyping phase and during actual development.

Aquarium is still under development, with a first stable version
planned for early 2012.

Aquarium is available under an open source license at
\url{https://code.grnet.gr/projects/aquarium}.

\bibliographystyle{abbrvnat}
\bibliography{aquarium}

\end{document}