Revision e5d8df8c docs/pithos.rst

b/docs/pithos.rst
1
Object Storage Service (Pithos+)
2
================================
1
Object Storage Service (Pithos)
2
===============================
3 3

  
4
Pithos+ is an online storage service based on the OpenStack Object
4
Pithos is an online storage service based on the OpenStack Object
5 5
Storage API with several important extensions. It uses a
6 6
block-based mechanism to allow users to upload, download, and share
7 7
files, keep different versions of a file, and attach policies to them.
8
It follows a layered, modular implementation. Pithos+ was designed to
8
It follows a layered, modular implementation. Pithos was designed to
9 9
be used as a storage service by the total set of the Greek research
10 10
and academic community (counting tens of thousands of users) but is
11 11
free and open to use by anybody, under a BSD 2-clause license.
12 12

  
13
A presentation of Pithos+ features and architecture is :download:`here <pithos-plus.pdf>`.
13
A presentation of Pithos features and architecture is :download:`here <pithos-plus.pdf>`.
14 14

  
15 15
Introduction
16 16
------------
......
22 22
12,000 users.
23 23

  
24 24
In 2011 GRNET decided to offer a new, evolved online storage
25
service, to be called Pithos+. Pithos+ is designed to address the
25
service, to be called Pithos. Pithos is designed to address the
26 26
main requirements expressed by the Pithos users in the first two years of
27 27
operation:
28 28

  
......
45 45
  open up Pithos to non-Shibboleth authenticated users as well.
46 46
* Use open standards as far as possible.   
47 47

  
48
In what follows we describe the main features of Pithos+, the elements
48
In what follows we describe the main features of Pithos, the elements
49 49
of its design and the capabilities it affords. We touch on related
50 50
work and we provide some discussion on our experiences and thoughts on
51 51
the future.
52 52

  
53
Pithos+ Features
54
----------------
53
Pithos Features
54
---------------
55 55

  
56
Pithos+ is based on the OpenStack Object Storage API (Pithos
56
Pithos is based on the OpenStack Object Storage API (Pithos
57 57
used a home-grown API). We decided to adopt an open standard
58 58
API in order to leverage existing clients that implement the
59
API. In this way, a user can access Pithos+ with a standard
60
OpenStack client - although users will want to use a Pithos+ client to
59
API. In this way, a user can access Pithos with a standard
60
OpenStack client - although users will want to use a Pithos client to
61 61
use features going beyond those offered by the OpenStack API.
62
The strategy paid off during Pithos+ development itself, as we were
62
The strategy paid off during Pithos development itself, as we were
63 63
able to access and test the service with existing clients, while also
64 64
developing new clients based on open source OpenStack clients.
65 65

  
......
69 69
  OpenStack stores objects, which may be files, but this is not
70 70
  necessary - large files (larger than 5 GB), for instance, must be
71 71
  stored as a series of distinct objects accompanied by a manifest.
72
  Pithos+ stores blocks, so objects can be of unlimited size.
72
  Pithos stores blocks, so objects can be of unlimited size.
73 73
* Permissions on individual files and folders. Note that folders
74 74
  do not exist in the OpenStack API, but are simulated by
75
  appropriate conventions, an approach we have kept in Pithos+ to
75
  appropriate conventions, an approach we have kept in Pithos to
76 76
  avoid incompatibility.
77 77
* Fully-versioned objects.
78 78
* Metadata-based queries. Users are free to set metadata on their
......
84 84
* Partial upload and download based on HTTP request
85 85
  headers and parameters.
86 86
* Object updates, where data may even come from other objects
87
  already stored in Pithos+. This allows users to compose objects from
87
  already stored in Pithos. This allows users to compose objects from
88 88
  other objects without uploading data.
89 89
* All objects are assigned UUIDs on creation, which can be
90 90
  used to reference them regardless of their path location.
91 91

  
92
Pithos+ Design
93
--------------
92
Pithos Design
93
-------------
94 94

  
95
Pithos+ is built on a layered architecture (see Figure).
96
The Pithos+ server speaks HTTP with the outside world. The HTTP
95
Pithos is built on a layered architecture (see Figure).
96
The Pithos server speaks HTTP with the outside world. The HTTP
97 97
operations implement an extended OpenStack Object Storage API.
98 98
The back end is a library meant to be used by internal code and
99 99
other front ends. For instance, the back end library, apart from being
100
used in Pithos+ for implementing the OpenStack Object Storage API,
100
used in Pithos for implementing the OpenStack Object Storage API,
101 101
is also used in our implementation of the OpenStack Image
102 102
Service API. Moreover, the back end library allows specification
103 103
of different namespaces for metadata, so that the same object can be
104 104
viewed by different front end APIs with different sets of
105
metadata. Hence the same object can be viewed as a file in Pithos+,
105
metadata. Hence the same object can be viewed as a file in Pithos,
106 106
with one set of metadata, or as an image with a different set of
107 107
metadata, in our implementation of the OpenStack Image Service.
108 108

  
......
118 118
Block-based Storage for the Client
119 119
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
120 120

  
121
Since an object is saved as a set of blocks in Pithos+, object
121
Since an object is saved as a set of blocks in Pithos, object
122 122
operations are no longer required to refer to the whole object. We can
123 123
handle parts of objects as needed when uploading, downloading, or
124 124
copying and moving data.
125 125

  
126 126
In particular, a client, provided it has access permissions, can
127
download data from Pithos+ by issuing a ``GET`` request on an
127
download data from Pithos by issuing a ``GET`` request on an
128 128
object. If the request includes the ``hashmap`` parameter, then the
129 129
request refers to a hashmap, that is, a set containing the
130 130
object's block hashes. The reply is of the form::
......
142 142
server and containing the root Merkle hash of the object's
143 143
hashmap.
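
As a rough illustration of how a root Merkle hash can be derived from an object's block hashes, here is a minimal sketch. The hash function (SHA-256), the zero-padding of odd levels, and the value used for an empty object are all assumptions made for the example; the text above does not specify how Pithos builds its tree.

```python
import hashlib


def merkle_root(block_hashes):
    """Combine a list of raw block hashes (bytes) into a single root hash."""
    if not block_hashes:
        # Assumption: an empty object is represented by the hash of no data.
        return hashlib.sha256(b"").digest()
    level = list(block_hashes)
    while len(level) > 1:
        if len(level) % 2:
            # Assumption: an odd level is padded with an all-zero hash.
            level.append(b"\x00" * len(level[0]))
        # Hash each adjacent pair to produce the next level up.
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Because the root depends on every block hash and on their order, two objects share a root only if their hashmaps are identical.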
144 144

  
145
When uploading a file to Pithos+, only the missing blocks will be
145
When uploading a file to Pithos, only the missing blocks will be
146 146
submitted to the server, with the following algorithm:
147 147

  
148 148
* Calculate the hash value for each block of the object to be
......
162 162

  
163 163
In effect, we are deduplicating data based on their block hashes,
164 164
transparently to the users. This results in perceived instantaneous
165
uploads when material is already present in Pithos+ storage.
165
uploads when material is already present in Pithos storage.
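
The client side of this deduplication might look roughly like the sketch below. The block size, the use of SHA-256, and the helper names are assumptions for illustration; how the server actually reports which hashes it already holds is not modelled here and is stood in for by a plain set.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed value; the real block size is set by the service


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hash(block):
    """Hash a single block; SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(block).hexdigest()


def blocks_to_upload(data, server_hashes, block_size=BLOCK_SIZE):
    """Return {hash: block} for the blocks the server does not already hold.

    `server_hashes` is a hypothetical stand-in for the server's answer about
    which block hashes are already present in storage.
    """
    missing = {}
    for block in split_blocks(data, block_size):
        digest = block_hash(block)
        if digest not in server_hashes:
            missing[digest] = block
    return missing
```

If every block hash is already known to the server, the returned mapping is empty and no block data needs to be sent at all.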
166 166

  
167 167
Block-based Storage Processing
168 168
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
169 169

  
170 170
Hashmaps themselves are saved in blocks. All blocks are persisted to
171 171
storage using content-based addressing. It follows that to read a
172
file, Pithos+ performs the following operations:
172
file, Pithos performs the following operations:
173 173

  
174 174
* The client issues a request to get a file, via HTTP ``GET``.
175 175
* The API front end asks from the back end the metadata
......
227 227
data will come from another object, instead of being uploaded by the
228 228
client. 
229 229

  
230
Pithos+ Back End Nodes
231
^^^^^^^^^^^^^^^^^^^^^^
230
Pithos Back End Nodes
231
^^^^^^^^^^^^^^^^^^^^^
232 232

  
233
Pithos+ organizes entities in a tree hierarchy, with one tree node per
233
Pithos organizes entities in a tree hierarchy, with one tree node per
234 234
path entry (see Figure). Nodes can be accounts,
235 235
containers, and objects. A user may have multiple
236 236
accounts, each account may have multiple containers, and each
......
246 246
simulate pseudo-hierarchical folders. In particular, object names that
247 247
contain the forward slash character and have an accompanying marker
248 248
object with a ``Content-Type: application/directory`` as part of
249
their metadata can be treated as directories by Pithos+ clients. Each
249
their metadata can be treated as directories by Pithos clients. Each
250 250
node corresponds to a unique path, and we keep its parent in the
251 251
account/container/object hierarchy (that is, all objects have a
252 252
container as their parent).
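
A client could interpret the directory convention along these lines. This is only a sketch: the flat ``objects`` mapping from object name to content type and both function names are hypothetical, not part of the Pithos API.

```python
DIRECTORY_TYPE = "application/directory"


def is_directory(name, objects):
    """True if `name` is a marker object flagged as a directory."""
    return objects.get(name) == DIRECTORY_TYPE


def children(folder, objects):
    """List the names directly under `folder`, without descending further."""
    prefix = folder.rstrip("/") + "/"
    return sorted(
        name for name in objects
        if name.startswith(prefix) and "/" not in name[len(prefix):]
    )
```

The container itself stays flat; the hierarchy exists only in how clients read the slash-separated names and the marker objects.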
253 253

  
254
Pithos+ Back End Versions
255
^^^^^^^^^^^^^^^^^^^^^^^^^
254
Pithos Back End Versions
255
^^^^^^^^^^^^^^^^^^^^^^^^
256 256

  
257 257
For each object version we keep the root Merkle hash of the object it
258 258
refers to, the size of the object, the last modification time and the
......
265 265

  
266 266
.. image:: images/pithos-backend-versions.png
267 267

  
268
This versioning allows Pithos+ to offer its users time-based
268
This versioning allows Pithos to offer its users time-based
269 269
contents listing of their accounts. In effect, this also allows them
270 270
to take their containers back in time. This is implemented
271 271
conceptually by taking a vertical line in the Figure and
272 272
presenting to the user the state on the left side of the line.
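
The "vertical line" idea can be sketched as picking, for each path, the newest version that is not later than the chosen time. The ``Version`` record and its field names are hypothetical, and the ``deleted`` flag in particular is an assumption about how removals might be recorded.

```python
from collections import namedtuple

# Hypothetical version record; field names are illustrative only.
Version = namedtuple("Version", "path mtime root_hash deleted")


def listing_at(versions, when):
    """Return {path: Version} describing the container as of time `when`."""
    latest = {}
    for version in sorted(versions, key=lambda v: v.mtime):
        if version.mtime > when:
            break  # everything from here on lies to the right of the line
        latest[version.path] = version
    # Assumption: a version flagged as deleted hides the path from the listing.
    return {path: v for path, v in latest.items() if not v.deleted}
```

Moving ``when`` earlier or later corresponds to sliding the vertical line across the version history.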
273 273

  
274
Pithos+ Back End Permissions
275
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
274
Pithos Back End Permissions
275
^^^^^^^^^^^^^^^^^^^^^^^^^^^
276 276

  
277
Pithos+ recognizes read and write permissions, which can be granted to
277
Pithos recognizes read and write permissions, which can be granted to
278 278
individual users or groups of users. Groups are collections of users
279 279
created at the account level by users themselves, and are flat - a
280 280
group cannot contain or reference another group. Ownership of a file
281 281
cannot be delegated.
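
A permission check over flat groups could be sketched as follows. The data layout and the function name are hypothetical, and the rule that a write grant also allows reading is an assumption made for the example rather than something stated above.

```python
def can_access(user, owner, action, grants, groups):
    """Check whether `user` may perform `action` ('read' or 'write').

    grants: {'read': set of names, 'write': set of names}; a name is a user
            or a group (hypothetical layout).
    groups: {group_name: set of users}; groups are flat, so one lookup is enough.
    """
    if user == owner:
        return True
    allowed = set(grants.get(action, ()))
    if action == "read":
        # Assumption: anyone who may write may also read.
        allowed |= set(grants.get("write", ()))
    if user in allowed:
        return True
    # No recursion is needed because a group cannot contain another group.
    return any(user in groups.get(name, ()) for name in allowed)
```

Keeping groups flat means membership can be resolved with a single pass over the granted names.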
282 282

  
283
Pithos+ also recognizes a "public" permission, which means that the
283
Pithos also recognizes a "public" permission, which means that the
284 284
object is readable by all. When an object is made public, it is
285 285
assigned a URL that can be used to access the object from
286
outside Pithos+ even by non-Pithos+ users. 
286
outside Pithos even by non-Pithos users. 
287 287

  
288 288
Permissions can be assigned to objects, which may be actual files, or
289 289
directories. When listing objects, the back end uses the permissions as
......
310 310
Discussion
311 311
----------
312 312

  
313
Pithos+ is implemented in Python as a Django application. We use SQLAlchemy
313
Pithos is implemented in Python as a Django application. We use SQLAlchemy
314 314
as a database abstraction layer. It is currently about
315 315
17,000 lines of code, and it has taken about 50 person months of
316 316
development effort. This development was done from scratch, with no
......
328 328
The desktop clients allow synchronization with local directories, a
329 329
feature that existing users of Pithos have been asking for, probably
330 330
influenced by services like Dropbox. These clients are offered in
331
parallel to the standard Pithos+ interface, which is a web application
331
parallel to the standard Pithos interface, which is a web application
332 332
built on top of the API front end - we treat our own web
333 333
application as just another client that has to go through the API
334 334
front end, without granting it access to the back end directly.
335 335

  
336
We are carrying the idea of our own services being clients to Pithos+
336
We are carrying the idea of our own services being clients to Pithos
337 337
a step further, with new projects we have in our pipeline, in which a
338
digital repository service will be built on top of Pithos+. It will
338
digital repository service will be built on top of Pithos. It will
339 339
again use the API front end, so that repository users will have
340
all Pithos+ capabilities, and on top of them we will build additional
340
all Pithos capabilities, and on top of them we will build additional
341 341
functionality such as full text search, Dublin Core metadata storage
342 342
and querying, streaming, and so on.
343 343

  
344
At the time of this writing (March 2012) Pithos+ is in alpha,
344
At the time of this writing (March 2012) Pithos is in alpha,
345 345
available to users by invitation. We will extend our user base as we
346 346
move to beta in the coming months, and to our full set of users in the
347 347
second half of 2012. We are eager to see how our ideas fare as we will
......
350 350
Acknowledgments
351 351
---------------
352 352

  
353
Pithos+ is financially supported by Grant 296114, "Advanced Computing
353
Pithos is financially supported by Grant 296114, "Advanced Computing
354 354
Services for the Research and Academic Community", of the Greek
355 355
National Strategic Reference Framework.
356 356

  
357 357
Availability
358 358
------------
359 359

  
360
The Pithos+ code is available under a BSD 2-clause license from:
360
The Pithos code is available under a BSD 2-clause license from:
361 361
https://code.grnet.gr/projects/pithos/repository
362 362

  
363 363
The code can also be accessed from its source repository:
