Revision dad708b4

b/docs/dev-guide.rst
44 44
Storage API (Pithos+)
45 45
=====================
46 46

  
47
This is the Pithos+ File Storage API:
47
This is the Pithos+ Object Storage API:
48 48

  
49 49
.. toctree::
50 50
   :maxdepth: 2
51 51

  
52
   File Storage API <pithos-api-guide>
52
   Object Storage API <pithos-api-guide>
53 53

  
54 54
Implementing new clients
55 55
========================
......
218 218
 * Updating a state (either local or remote) implies downloading, uploading or
219 219
   deleting the appropriate file.
220 220

  
221
Recommended Practices and Examples
222
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
223

  
224
Assuming an authentication token is obtained, the following high-level
225
operations are available - shown with ``curl``:
226

  
227
* Get account information ::
228

  
229
    curl -X HEAD -D - \
230
         -H "X-Auth-Token: 0000" \
231
         https://pithos.dev.grnet.gr/v1/user
232

  
233
* List available containers ::
234

  
235
    curl -X GET -D - \
236
         -H "X-Auth-Token: 0000" \
237
         https://pithos.dev.grnet.gr/v1/user
238

  
239
* Get container information ::
240

  
241
    curl -X HEAD -D - \
242
         -H "X-Auth-Token: 0000" \
243
         https://pithos.dev.grnet.gr/v1/user/pithos
244

  
245
* Add a new container ::
246

  
247
    curl -X PUT -D - \
248
         -H "X-Auth-Token: 0000" \
249
         https://pithos.dev.grnet.gr/v1/user/test
250

  
251
* Delete a container ::
252

  
253
    curl -X DELETE -D - \
254
         -H "X-Auth-Token: 0000" \
255
         https://pithos.dev.grnet.gr/v1/user/test
256

  
257
* List objects in a container ::
258

  
259
    curl -X GET -D - \
260
         -H "X-Auth-Token: 0000" \
261
         https://pithos.dev.grnet.gr/v1/user/pithos
262

  
263
* List objects in a container (extended reply) ::
264

  
265
    curl -X GET -D - \
266
         -H "X-Auth-Token: 0000" \
267
         https://pithos.dev.grnet.gr/v1/user/pithos?format=json
268

  
269
  It is recommended that extended replies are cached and subsequent requests
270
  utilize the ``If-Modified-Since`` header.
271

  
272
* List metadata keys used by objects in a container
273

  
274
  Will be in the ``X-Container-Object-Meta`` reply header, included in
275
  container information or object list (``HEAD`` or ``GET``). (**TBD**)
276

  
277
* List objects in a container having a specific meta defined ::
278

  
279
    curl -X GET -D - \
280
         -H "X-Auth-Token: 0000" \
281
         https://pithos.dev.grnet.gr/v1/user/pithos?meta=favorites
282

  
283
* Retrieve an object ::
284

  
285
    curl -X GET -D - \
286
         -H "X-Auth-Token: 0000" \
287
         https://pithos.dev.grnet.gr/v1/user/pithos/README.txt
288

  
289
* Retrieve an object (specific ranges of data) ::
290

  
291
    curl -X GET -D - \
292
         -H "X-Auth-Token: 0000" \
293
         -H "Range: bytes=0-9" \
294
         https://pithos.dev.grnet.gr/v1/user/pithos/README.txt
295

  
296
  This will return the first 10 bytes. To get the first 10 bytes, bytes 30-39 and
297
  the last 100 bytes, use ``Range: bytes=0-9,30-39,-100``.
298

  
299
* Add a new object (folder type) (**TBD**) ::
300

  
301
    curl -X PUT -D - \
302
         -H "X-Auth-Token: 0000" \
303
         -H "Content-Type: application/directory" \
304
         https://pithos.dev.grnet.gr/v1/user/pithos/folder
305

  
306
* Add a new object ::
307

  
308
    curl -X PUT -D - \
309
         -H "X-Auth-Token: 0000" \
310
         -H "Content-Type: text/plain" \
311
         -T EXAMPLE.txt \
312
         https://pithos.dev.grnet.gr/v1/user/pithos/folder/EXAMPLE.txt
313

  
314
* Update an object ::
315

  
316
    curl -X POST -D - \
317
         -H "X-Auth-Token: 0000" \
318
         -H "Content-Length: 10" \
319
         -H "Content-Type: application/octet-stream" \
320
         -H "Content-Range: bytes 10-19/*" \
321
         -d "0123456789" \
322
         https://pithos.dev.grnet.gr/v1/user/folder/EXAMPLE.txt
323

  
324
  This will update bytes 10-19 with the data specified.
325

  
326
* Update an object (append) ::
327

  
328
    curl -X POST -D - \
329
         -H "X-Auth-Token: 0000" \
330
         -H "Content-Length: 10" \
331
         -H "Content-Type: application/octet-stream" \
332
         -H "Content-Range: bytes */*" \
333
         -d "0123456789" \
334
         https://pithos.dev.grnet.gr/v1/user/folder/EXAMPLE.txt
335

  
336
* Update an object (truncate) ::
337

  
338
    curl -X POST -D - \
339
         -H "X-Auth-Token: 0000" \
340
         -H "X-Source-Object: /folder/EXAMPLE.txt" \
341
         -H "Content-Range: bytes 0-0/*" \
342
         -H "X-Object-Bytes: 0" \
343
         https://pithos.dev.grnet.gr/v1/user/folder/EXAMPLE.txt
344

  
345
  This will truncate the object to 0 bytes.
346

  
347
* Add object metadata ::
348

  
349
    curl -X POST -D - \
350
         -H "X-Auth-Token: 0000" \
351
         -H "X-Object-Meta-First: first_meta_value" \
352
         -H "X-Object-Meta-Second: second_meta_value" \
353
         https://pithos.dev.grnet.gr/v1/user/folder/EXAMPLE.txt
354

  
355
* Delete object metadata ::
356

  
357
    curl -X POST -D - \
358
         -H "X-Auth-Token: 0000" \
359
         -H "X-Object-Meta-First: first_meta_value" \
360
         https://pithos.dev.grnet.gr/v1/user/folder/EXAMPLE.txt
361

  
362
  Metadata can only be "set". To delete ``X-Object-Meta-Second``, reset all
363
  metadata.
364

  
365
* Delete an object ::
366

  
367
    curl -X DELETE -D - \
368
         -H "X-Auth-Token: 0000" \
369
         https://pithos.dev.grnet.gr/v1/user/folder/EXAMPLE.txt
370

  
b/docs/index.rst
14 14
   :maxdepth: 1
15 15

  
16 16
   Identity Management (codename: astakos) <astakos>
17
   File Storage Service (codename: pithos+) <pithos>
17
   Object Storage Service (codename: pithos+) <pithos>
18 18
   Compute/Network Service (codename: cyclades) <cyclades>
19 19
   Image Registry (codename: plankton) <plankton>
20 20
   Billing Service (codename: aquarium) <http://docs.dev.grnet.gr/aquarium/latest/index.html>
b/docs/pithos-api-guide.rst
1 1
Pithos+ API
2 2
===========
3 3

  
4
This is the Pithos+ API guide.
4
Introduction
5
------------
5 6

  
7
Pithos is a storage service implemented by GRNET (http://www.grnet.gr). Data is stored as objects, organized in containers, belonging to an account. This hierarchy of storage layers has been inspired by the OpenStack Object Storage (OOS) API and similar CloudFiles API by Rackspace. The Pithos API follows the OOS API as closely as possible. One of the design requirements has been to be able to use Pithos with clients built for the OOS, without changes.
6 8

  
7
Overview
8
--------
9
However, to be able to take full advantage of the Pithos infrastructure, client software should be aware of the extensions that differentiate Pithos from OOS. Pithos objects can be updated, or appended to. Pithos will store sharing permissions per object and enforce corresponding authorization policies. Automatic version management allows taking account and container listings back in time, as well as reading previous instances of objects.
9 10

  
10
Pithos+ data is stored as objects, organized in containers, belonging to an
11
account. This hierarchy of storage layers has been inspired by the OpenStack
12
Object Storage (OOS) API and similar CloudFiles API by Rackspace. The Pithos
13
API follows the OOS API as closely as possible. One of the design requirements
14
has been to be able to use Pithos with clients built for the OOS, without
15
changes.
16

  
17
However, to be able to take full advantage of the Pithos infrastructure, client
18
software should be aware of the extensions that differentiate Pithos from OOS.
11
The storage backend of Pithos is block oriented, permitting efficient, deduplicated data placement. The block structure of objects is exposed at the API layer, in order to encourage external software to implement advanced data management operations.
19 12

  
20 13
This document's goals are:
21 14

  
22
 * Define the Pithos ReST API that allows the storage and retrieval of data and
23
   metadata via HTTP calls
24
 * Specify metadata semantics and user interface guidelines for a common
25
   experience across client software implementations
15
* Define the Pithos ReST API that allows the storage and retrieval of data and metadata via HTTP calls
16
* Specify metadata semantics and user interface guidelines for a common experience across client software implementations
17

  
18
The present document is meant to be read alongside the OOS API documentation. Thus, it is suggested that the reader is familiar with associated technologies, the OOS API as well as the first version of the Pithos API. This document refers to the second version of Pithos. Information on the first version of the storage API can be found at http://code.google.com/p/gss.
26 19

  
27
The present document is meant to be read alongside the OOS API documentation.
28
Thus, it is suggested that the reader is familiar with associated technologies,
29
the OOS API as well as the first version of the Pithos API. This document
30
refers to the version of Pithos+. Information on the first version of the
31
storage API can be found at http://code.google.com/p/gss.
20
Anything marked as to be determined (**TBD**) should not be considered by implementors.
32 21

  
33
Whatever marked as to be determined (**TBD**), should not be considered by
34
implementors.
22
More info about Pithos can be found here: https://code.grnet.gr/projects/pithos
35 23

  
36 24
Document Revisions
37 25
^^^^^^^^^^^^^^^^^^
......
93 81
0.1 (May 17, 2011)         Initial release. Based on OpenStack Object Storage Developer Guide API v1 (Apr. 15, 2011).
94 82
=========================  ================================
95 83

  
96
Users and Authentication
97
------------------------
84
Pithos Users and Authentication
85
-------------------------------
98 86

  
99
In Pithos+, each user is uniquely identified by a token. All API requests
100
require a token and each token is internally resolved to an account string. The
101
API uses the account string to identify the user's own files, thus whether a
102
request is local or cross-account.
87
In Pithos, each user is uniquely identified by a token. All API requests require a token and each token is internally resolved to an account string. The API uses the account string to identify the user's own files, thus whether a request is local or cross-account.
103 88

  
104
Pithos+ does not keep a user database. For development and testing purposes,
105
user identifiers and their corresponding tokens can be defined in the settings
106
file. However, Pithos is designed with an external authentication service in
107
mind. This service must handle the details of validating user credentials and
108
communicate with Pithos via a middleware software component that, given a
109
token, fills in the internal request account variable.
89
Pithos does not keep a user database. For development and testing purposes, user identifiers and their corresponding tokens can be defined in the settings file. However, Pithos is designed with an external authentication service in mind. This service must handle the details of validating user credentials and communicate with Pithos via a middleware software component that, given a token, fills in the internal request account variable.
110 90

  
111
Client software using Pithos+, if not already knowing a user's identifier and
112
token, should forward to the ``/login`` URI. The Pithos server, depending on
113
its configuration will redirect to the appropriate login page.
91
Client software using Pithos, if it does not already know a user's identifier and token, should forward to the ``/login`` URI. The Pithos server, depending on its configuration, will redirect to the appropriate login page.
114 92

  
115 93
The login URI accepts the following parameters:
116 94

  
......
126 104

  
127 105
A user management service that implements a login URI according to these conventions is Astakos (https://code.grnet.gr/projects/astakos), by GRNET.
128 106

  
129
API Operations
107
The Pithos API
130 108
--------------
131 109

  
132
The URI requests supported by the Pithos+ API follow one of the following forms:
110
The URI requests supported by the Pithos API follow one of the following forms:
133 111

  
134 112
* Top level: ``https://hostname/v1/``
135 113
* Account level: ``https://hostname/v1/<account>``
......
1120 1098
* The ``Last-Modified`` header value always reflects the actual latest change timestamp, regardless of time control parameters and version requests. Time precondition checks with ``If-Modified-Since`` and ``If-Unmodified-Since`` headers are applied to this value.
1121 1099
* A copy/move using ``PUT``/``COPY``/``MOVE`` will always update metadata, keeping all old values except the ones redefined in the request headers.
1122 1100
* A ``HEAD`` or ``GET`` for an ``X-Object-Manifest`` object will include modified ``Content-Length`` and ``ETag`` headers, according to the characteristics of the objects under the specified prefix. The ``ETag`` will be the MD5 hash of the corresponding ETags concatenated. In extended container listings there is no metadata processing.
1101

  
1102
Recommended Practices and Examples
1103
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1104

  
1105
Assuming an authentication token is obtained, the following high-level operations are available - shown with ``curl``:
1106

  
1107
* Get account information ::
1108

  
1109
    curl -X HEAD -D - \
1110
         -H "X-Auth-Token: 0000" \
1111
         https://pithos.dev.grnet.gr/v1/user
1112

  
1113
* List available containers ::
1114

  
1115
    curl -X GET -D - \
1116
         -H "X-Auth-Token: 0000" \
1117
         https://pithos.dev.grnet.gr/v1/user
1118

  
1119
* Get container information ::
1120

  
1121
    curl -X HEAD -D - \
1122
         -H "X-Auth-Token: 0000" \
1123
         https://pithos.dev.grnet.gr/v1/user/pithos
1124

  
1125
* Add a new container ::
1126

  
1127
    curl -X PUT -D - \
1128
         -H "X-Auth-Token: 0000" \
1129
         https://pithos.dev.grnet.gr/v1/user/test
1130

  
1131
* Delete a container ::
1132

  
1133
    curl -X DELETE -D - \
1134
         -H "X-Auth-Token: 0000" \
1135
         https://pithos.dev.grnet.gr/v1/user/test
1136

  
1137
* List objects in a container ::
1138

  
1139
    curl -X GET -D - \
1140
         -H "X-Auth-Token: 0000" \
1141
         https://pithos.dev.grnet.gr/v1/user/pithos
1142

  
1143
* List objects in a container (extended reply) ::
1144

  
1145
    curl -X GET -D - \
1146
         -H "X-Auth-Token: 0000" \
1147
         https://pithos.dev.grnet.gr/v1/user/pithos?format=json
1148

  
1149
  It is recommended that extended replies are cached and subsequent requests utilize the ``If-Modified-Since`` header.
1150

  
1151
* List metadata keys used by objects in a container
1152

  
1153
  Will be in the ``X-Container-Object-Meta`` reply header, included in container information or object list (``HEAD`` or ``GET``). (**TBD**)
1154

  
1155
* List objects in a container having a specific meta defined ::
1156

  
1157
    curl -X GET -D - \
1158
         -H "X-Auth-Token: 0000" \
1159
         https://pithos.dev.grnet.gr/v1/user/pithos?meta=favorites
1160

  
1161
* Retrieve an object ::
1162

  
1163
    curl -X GET -D - \
1164
         -H "X-Auth-Token: 0000" \
1165
         https://pithos.dev.grnet.gr/v1/user/pithos/README.txt
1166

  
1167
* Retrieve an object (specific ranges of data) ::
1168

  
1169
    curl -X GET -D - \
1170
         -H "X-Auth-Token: 0000" \
1171
         -H "Range: bytes=0-9" \
1172
         https://pithos.dev.grnet.gr/v1/user/pithos/README.txt
1173

  
1174
  This will return the first 10 bytes. To get the first 10 bytes, bytes 30-39 and the last 100 bytes, use ``Range: bytes=0-9,30-39,-100``.
1175

  
1176
* Add a new object (folder type) (**TBD**) ::
1177

  
1178
    curl -X PUT -D - \
1179
         -H "X-Auth-Token: 0000" \
1180
         -H "Content-Type: application/directory" \
1181
         https://pithos.dev.grnet.gr/v1/user/pithos/folder
1182

  
1183
* Add a new object ::
1184

  
1185
    curl -X PUT -D - \
1186
         -H "X-Auth-Token: 0000" \
1187
         -H "Content-Type: text/plain" \
1188
         -T EXAMPLE.txt \
1189
         https://pithos.dev.grnet.gr/v1/user/pithos/folder/EXAMPLE.txt
1190

  
1191
* Update an object ::
1192

  
1193
    curl -X POST -D - \
1194
         -H "X-Auth-Token: 0000" \
1195
         -H "Content-Length: 10" \
1196
         -H "Content-Type: application/octet-stream" \
1197
         -H "Content-Range: bytes 10-19/*" \
1198
         -d "0123456789" \
1199
         https://pithos.dev.grnet.gr/v1/user/folder/EXAMPLE.txt
1200

  
1201
  This will update bytes 10-19 with the data specified.
1202

  
1203
* Update an object (append) ::
1204

  
1205
    curl -X POST -D - \
1206
         -H "X-Auth-Token: 0000" \
1207
         -H "Content-Length: 10" \
1208
         -H "Content-Type: application/octet-stream" \
1209
         -H "Content-Range: bytes */*" \
1210
         -d "0123456789" \
1211
         https://pithos.dev.grnet.gr/v1/user/folder/EXAMPLE.txt
1212

  
1213
* Update an object (truncate) ::
1214

  
1215
    curl -X POST -D - \
1216
         -H "X-Auth-Token: 0000" \
1217
         -H "X-Source-Object: /folder/EXAMPLE.txt" \
1218
         -H "Content-Range: bytes 0-0/*" \
1219
         -H "X-Object-Bytes: 0" \
1220
         https://pithos.dev.grnet.gr/v1/user/folder/EXAMPLE.txt
1221

  
1222
  This will truncate the object to 0 bytes.
1223

  
1224
* Add object metadata ::
1225

  
1226
    curl -X POST -D - \
1227
         -H "X-Auth-Token: 0000" \
1228
         -H "X-Object-Meta-First: first_meta_value" \
1229
         -H "X-Object-Meta-Second: second_meta_value" \
1230
         https://pithos.dev.grnet.gr/v1/user/folder/EXAMPLE.txt
1231

  
1232
* Delete object metadata ::
1233

  
1234
    curl -X POST -D - \
1235
         -H "X-Auth-Token: 0000" \
1236
         -H "X-Object-Meta-First: first_meta_value" \
1237
         https://pithos.dev.grnet.gr/v1/user/folder/EXAMPLE.txt
1238

  
1239
  Metadata can only be "set". To delete ``X-Object-Meta-Second``, reset all metadata.
1240

  
1241
* Delete an object ::
1242

  
1243
    curl -X DELETE -D - \
1244
         -H "X-Auth-Token: 0000" \
1245
         https://pithos.dev.grnet.gr/v1/user/folder/EXAMPLE.txt
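
The same requests can be issued from a script. The following is a minimal Python sketch, not an official client: it only builds request objects with the standard library and does not send them. The ``build_request`` helper and the ``BASE_URL`` constant are hypothetical names introduced here; the host, paths, token and headers are copied from the ``curl`` examples above.

```python
import urllib.request

# Host taken from the curl examples; adjust for a real deployment.
BASE_URL = "https://pithos.dev.grnet.gr/v1"


def build_request(method, path, token, headers=None, data=None):
    """Build (but do not send) an authenticated request for ``path``."""
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    request.add_header("X-Auth-Token", token)
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    return request


# Retrieve the first 10 bytes of an object.
ranged_get = build_request(
    "GET", "/user/pithos/README.txt", "0000",
    headers={"Range": "bytes=0-9"},
)

# Overwrite bytes 10-19 of an object.
update = build_request(
    "POST", "/user/folder/EXAMPLE.txt", "0000",
    headers={
        "Content-Type": "application/octet-stream",
        "Content-Range": "bytes 10-19/*",
    },
    data=b"0123456789",
)
```

Passing either object to ``urllib.request.urlopen`` would perform the call. Building the request separately keeps the sketch free of network access.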
b/docs/pithos.rst
1
.. _pithos:
1
Object Storage Service (Pithos+)
2
================================
2 3

  
3
File Storage Service (pithos+)
4
Pithos+ is an online storage service based on the OpenStack Object
5
Storage API with several important extensions. It uses a
6
block-based mechanism to allow users to upload, download, and share
7
files, keep different versions of a file, and attach policies to them.
8
It follows a layered, modular implementation. Pithos+ was designed to
9
be used as a storage service by the total set of the Greek research
10
and academic community (counting tens of thousands of users) but is
11
free and open to use by anybody, under a BSD-2 clause license.
12

  
13
A presentation of Pithos+ features and architecture is :download:`here <pithos-plus.pdf>`.
14

  
15
Introduction
16
------------
17

  
18
In 2008 the Greek Research and Technology Network (GRNET) decided
19
to offer an online storage service to the Greek research and academic
20
community. The service, called Pithos, was implemented in 2008-2009,
21
and was made available in spring 2009. It now has more than
22
12,000 users.
23

  
24
In 2011 GRNET decided to offer a new, evolved online storage
25
service, to be called Pithos+. Pithos+ is designed to address the
26
main requirements expressed by the Pithos users in the first two years of
27
operation:
28

  
29
* Provide both a web-based client and native desktop clients for
30
  the most common operating systems.
31
* Allow not only uploading, downloading, and sharing, but also
32
  synchronization capabilities so that users are able to select folders
33
  and have them synchronized automatically with their online accounts.
34
* Allow uploading of large files, regardless of browser
35
  capabilities (depending on the version, browsers may place a 2
36
  GBytes upload limit).
37
* Improve upload speed; not an issue as long as the user is on a
38
  computer connected to the GRNET backbone, but it becomes important
39
  over ADSL connections.
40
* Allow access by
41
  non-Shibboleth (http://shibboleth.internet2.edu/)
42
  accounts. Pithos delegates user authentication to the Greek
43
  Shibboleth federation, to which all research and academic
44
  institutions belong. However, it is desirable to have the option to
45
  open up Pithos to non-Shibboleth authenticated users as well.
46
* Use open standards as far as possible.   
47

  
48
In what follows we describe the main features of Pithos+, the elements
49
of its design and the capabilities it affords. We touch on related
50
work and we provide some discussion on our experiences and thoughts on
51
the future.
52

  
53
Pithos+ Features
54
----------------
55

  
56
Pithos+ is based on the OpenStack Object Storage API (Pithos
57
used a home-grown API). We decided to adopt an open standard
58
API in order to leverage existing clients that implement the
59
API. In this way, a user can access Pithos+ with a standard
60
OpenStack client - although users will want to use a Pithos+ client to
61
use features going beyond those offered by the OpenStack API.
62
The strategy paid off during Pithos+ development itself, as we were
63
able to access and test the service with existing clients, while also
64
developing new clients based on open source OpenStack clients.
65

  
66
The major extensions on the OpenStack API are:
67

  
68
* The use of block-based storage in lieu of an object-based one.
69
  OpenStack stores objects, which may be files, but this is not
70
  necessary - large files (larger than 5 GBytes), for instance, must be
71
  stored as a series of distinct objects accompanied by a manifest.
72
  Pithos+ stores blocks, so objects can be of unlimited size.
73
* Permissions on individual files and folders. Note that folders
74
  do not exist in the OpenStack API, but are simulated by
75
  appropriate conventions, an approach we have kept in Pithos+ to
76
  avoid incompatibility.
77
* Fully-versioned objects.
78
* Metadata-based queries. Users are free to set metadata on their
79
  objects, and they can list objects meeting metadata criteria.
80
* Policies, such as whether to enable object versioning and to
81
  enforce quotas. This is particularly important for sharing object
82
  containers, since the user may want to avoid running out of space
83
  because of collaborators writing in the shared storage.
84
* Partial upload and download based on HTTP request
85
  headers and parameters.
86
* Object updates, where data may even come from other objects
87
  already stored in Pithos+. This allows users to compose objects from
88
  other objects without uploading data.
89
* All objects are assigned UUIDs on creation, which can be
90
  used to reference them regardless of their path location.
91

  
92
Pithos+ Design
93
--------------
94

  
95
Pithos+ is built on a layered architecture (see Figure).
96
The Pithos+ server speaks HTTP with the outside world. The HTTP
97
operations implement an extended OpenStack Object Storage API.
98
The back end is a library meant to be used by internal code and
99
other front ends. For instance, the back end library, apart from being
100
used in Pithos+ for implementing the OpenStack Object Storage API,
101
is also used in our implementation of the OpenStack Image
102
Service API. Moreover, the back end library allows specification
103
of different namespaces for metadata, so that the same object can be
104
viewed by different front end APIs with different sets of
105
metadata. Hence the same object can be viewed as a file in Pithos+,
106
with one set of metadata, or as an image with a different set of
107
metadata, in our implementation of the OpenStack Image Service.
108

  
109
The data component provides storage of blocks and the information
110
needed to retrieve them, while the metadata component is a database of
111
nodes and permissions. In the current implementation, data is saved to
112
the filesystem and metadata in an SQL database. In the future,
113
data will be saved to some distributed block storage (we are currently
114
evaluating RADOS - http://ceph.newdream.net/category/rados), and metadata to a NoSQL database.
115

  
116
.. image:: images/pithos-layers.png
117

  
118
Block-based Storage for the Client
119
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
120

  
121
Since an object is saved as a set of blocks in Pithos+, object
122
operations are no longer required to refer to the whole object. We can
123
handle parts of objects as needed when uploading, downloading, or
124
copying and moving data.
125

  
126
In particular, a client, provided it has access permissions, can
127
download data from Pithos+ by issuing a ``GET`` request on an
128
object. If the request includes the ``hashmap`` parameter, then the
129
request refers to a hashmap, that is, a set containing the
130
object's block hashes. The reply is of the form::
131

  
132
    {"block_hash": "sha1", 
133
     "hashes": ["7295c41da03d7f916440b98e32c4a2a39351546c", ...],
134
     "block_size":131072,
135
     "bytes": 242}
136

  
137
The client can then compare the hashmap with the hashmap computed from
138
the local file. Any missing parts can be downloaded with ``GET``
139
requests with an additional ``Range`` header containing the hashes
140
of the blocks to be retrieved. The integrity of the file can be
141
checked against the ``X-Object-Hash`` header, returned by the
142
server and containing the root Merkle hash of the object's
143
hashmap.
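
As a rough illustration of the comparison step, the Python sketch below computes a hashmap for local data and lists the blocks that differ from the server's reply. It assumes the ``block_size`` and ``block_hash`` values advertised in the reply and hashes each block's raw bytes; any additional block-hashing rules of a particular deployment are not covered. The function names are hypothetical, and mapping a block index to a byte range assumes that missing blocks are requested by byte offsets.

```python
import hashlib


def local_hashmap(data, block_size, algorithm="sha1"):
    """Return the list of hex block hashes for ``data`` (bytes)."""
    hashes = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        hashes.append(hashlib.new(algorithm, block).hexdigest())
    return hashes


def blocks_to_download(remote_hashes, local_hashes):
    """Return indices of remote blocks the local copy lacks or differs on."""
    missing = []
    for index, remote in enumerate(remote_hashes):
        if index >= len(local_hashes) or local_hashes[index] != remote:
            missing.append(index)
    return missing


def byte_range_for_block(index, block_size, total_bytes):
    """Return the inclusive (first, last) byte offsets covered by a block."""
    first = index * block_size
    last = min(first + block_size, total_bytes) - 1
    return first, last
```

A client would call ``local_hashmap`` on the local file, pass the result together with the ``hashes`` list from the reply to ``blocks_to_download``, and fetch only the reported blocks.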
144

  
145
When uploading a file to Pithos+, only the missing blocks will be
146
submitted to the server, with the following algorithm:
147

  
148
* Calculate the hash value for each block of the object to be
149
  uploaded.
150
* Send a hashmap ``PUT`` request for the object. This is a
151
  ``PUT`` request with a ``hashmap`` request parameter appended
152
  to it. If the parameter is not present, the object's data (or part
153
  of it) is provided with the request. If the parameter is present,
154
  the object hashmap is provided with the request.
155
* If the server responds with status 201 (Created), the blocks are
156
  already on the server and we do not need to do anything more.
157
* If the server responds with status 409 (Conflict), the server’s
158
  response body contains the hashes of the blocks that do not exist on
159
  the server. Then, for each hash value in the server’s response (or all
160
  hashes together) send a ``POST`` request to the server with the
161
  block's data.
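
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the actual client code: ``put_hashmap`` and ``post_block`` are hypothetical callables standing in for the HTTP requests, SHA-1 is assumed as the block hash, and the final retry of the hashmap ``PUT`` is an assumption about how the object gets created once the missing blocks are in place. ``FakeServer`` is an in-memory stand-in used only to exercise the flow.

```python
import hashlib


def upload(data, block_size, put_hashmap, post_block):
    """Upload ``data``, sending only blocks the server lacks.

    Returns the number of blocks that actually had to be transferred.
    ``put_hashmap(hashes)`` must return ``(status, missing_hashes)``.
    """
    blocks = {}
    hashes = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha1(block).hexdigest()
        hashes.append(digest)
        blocks[digest] = block

    status, missing = put_hashmap(hashes)
    if status == 201:
        return 0  # every block was already on the server
    if status != 409:
        raise RuntimeError("unexpected status %d" % status)

    # Send each missing block once, even if it occurs several times.
    unique_missing = list(dict.fromkeys(missing))
    for digest in unique_missing:
        post_block(blocks[digest])

    # Retry the hashmap PUT now that the blocks exist (assumed step).
    status, _ = put_hashmap(hashes)
    if status != 201:
        raise RuntimeError("upload failed with status %d" % status)
    return len(unique_missing)


class FakeServer:
    """In-memory stand-in for the server, for illustration only."""

    def __init__(self):
        self.known = set()

    def put_hashmap(self, hashes):
        missing = [h for h in hashes if h not in self.known]
        return (409, missing) if missing else (201, [])

    def post_block(self, block):
        self.known.add(hashlib.sha1(block).hexdigest())
```

Uploading the same data twice against one ``FakeServer`` instance transfers blocks only the first time, which mirrors the deduplication effect described in this section.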
162

  
163
In effect, we are deduplicating data based on their block hashes,
164
transparently to the users. This results in perceived instantaneous
165
uploads when material is already present in Pithos+ storage.
166

  
167
Block-based Storage Processing
4 168
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
5 169

  
6
Pithos+ is the synnefo File Storage Service and implements the OpenStack Object
7
Storage API + synnefo extensions.
170
Hashmaps themselves are saved in blocks. All blocks are persisted to
171
storage using content-based addressing. It follows that to read a
172
file, Pithos+ performs the following operations:
8 173

  
174
* The client issues a request to get a file, via HTTP ``GET``.
175
* The API front end asks the back end for the metadata
176
  of the object.
177
* The back end checks the permissions of the object and, if they
178
  allow access to it, returns the object's metadata.
179
* The front end evaluates any HTTP headers (such as
180
  ``If-Modified-Since``, ``If-Match``, etc.).
181
* If the preconditions are met, the API front end requests
182
  from the back end the object's hashmap (hashmaps are indexed by the
183
  full path).
184
* The back end will read and return to the API front end the
185
  object's hashmap from the underlying storage.
186
* Depending on the HTTP ``Range`` header, the 
187
  API front end asks the back end for the required blocks, giving
188
  their corresponding hashes.
189
* The back end fetches the blocks from the underlying storage,
190
  passes them to the API front end, which returns them to the client.
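
The mapping from a ``Range`` header to the blocks that must be fetched can be illustrated with a short Python sketch. It assumes fixed-size blocks and an inclusive byte range, as in ``Range: bytes=0-9``; the function names are hypothetical and do not come from the Pithos+ code base.

```python
def blocks_for_range(first_byte, last_byte, block_size):
    """Return indices of the blocks covering an inclusive byte range."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    first_block = first_byte // block_size
    last_block = last_byte // block_size
    return list(range(first_block, last_block + 1))


def hashes_for_range(hashmap, first_byte, last_byte, block_size):
    """Select the block hashes needed to serve the given byte range."""
    indices = blocks_for_range(first_byte, last_byte, block_size)
    return [hashmap[index] for index in indices]
```

With a block size of 131072 bytes, a request for bytes 0-9 touches only the first block, while a range that straddles a block boundary needs both neighbouring blocks.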
9 191

  
10
Introduction
11
============
12

  
13
Pithos is a storage service implemented by GRNET (http://www.grnet.gr). Data is
14
stored as objects, organized in containers, belonging to an account. This
15
hierarchy of storage layers has been inspired by the OpenStack Object Storage
16
(OOS) API and similar CloudFiles API by Rackspace. The Pithos API follows the
17
OOS API as closely as possible. One of the design requirements has been to be
18
able to use Pithos with clients built for the OOS, without changes.
19

  
20
However, to be able to take full advantage of the Pithos infrastructure, client
21
software should be aware of the extensions that differentiate Pithos from OOS.
22
Pithos objects can be updated, or appended to. Pithos will store sharing
23
permissions per object and enforce corresponding authorization policies.
24
Automatic version management allows taking account and container listings back
25
in time, as well as reading previous instances of objects.
26

  
27
The storage backend of Pithos is block oriented, permitting efficient,
28
deduplicated data placement. The block structure of objects is exposed at the
29
API layer, in order to encourage external software to implement advanced data
30
management operations.
31

  
32

  
33
Pithos Users and Authentication
34
===============================
35

  
36
In Pithos, each user is uniquely identified by a token. All API requests
37
require a token and each token is internally resolved to an account string. The
38
API uses the account string to identify the user's own files, thus whether a
39
request is local or cross-account.
40

  
41
Pithos does not keep a user database. For development and testing purposes,
42
user identifiers and their corresponding tokens can be defined in the settings
43
file. However, Pithos is designed with an external authentication service in
44
mind. This service must handle the details of validating user credentials and
45
communicate with Pithos via a middleware software component that, given a
46
token, fills in the internal request account variable.
47

  
48
Client software using Pithos, if not already knowing a user's identifier and
49
token, should forward to the ``/login`` URI. The Pithos server, depending on
50
its configuration, will redirect to the appropriate login page.
51

  
52
The login URI accepts the following parameters:
53

  
54
======================  =========================
55
Request Parameter Name  Value
56
======================  =========================
57
next                    The URI to redirect to when the process is finished
58
renew                   Force token renewal (no value parameter)
59
force                   Force logout current user (no value parameter)
60
======================  =========================
61

  
62
When done with logging in, the service's login URI should redirect to the URI
63
provided with ``next``, adding ``user`` and ``token`` parameters, which contain
64
the account and token fields respectively.
65

  
66
A user management service that implements a login URI according to these
67
conventions is Astakos.
68

  
69

  
70
Pithos+ Architecture
71
====================
192
Saving data from the client to the server is done in several different
193
ways.
194

  
195
First, a regular HTTP ``PUT`` is the reverse of the HTTP ``GET``.
196
The client sends the full object to the API front end.
197
The API front end splits the object into blocks. It sends each
198
block to the back end, which calculates its hash and saves it to
199
storage. When the hashmap is complete, the API front end commands
200
the back end to create a new object with the created hashmap and any
201
associated metadata.
202
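The block-splitting step can be sketched as below. The block size, the choice of SHA-256 and the in-memory ``block_store`` dictionary are assumptions for illustration; the text does not specify the actual block size or hash algorithm.

```python
import hashlib

BLOCK_SIZE = 4  # tiny illustrative value; a real block size would be far larger


def split_blocks(data, block_size=BLOCK_SIZE):
    """Split a bytes object into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_object(data, block_store):
    """Hash and store each block; return the object's hashmap (list of hashes)."""
    hashmap = []
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()  # assumed hash algorithm
        block_store.setdefault(digest, block)       # identical blocks stored once
        hashmap.append(digest)
    return hashmap
```

Because blocks are keyed by their hash, repeated content inside an object, or across objects, occupies storage only once.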

  
Secondly, the client may send the API front end a hashmap and
any associated metadata, with a specially formatted HTTP ``PUT``,
using an appropriate URL parameter. In this case, if the
back end can find the requested blocks, the object will be created as
previously; otherwise it will report back the list of missing blocks,
which will be passed back to the client. The client may then send the
missing blocks by issuing an HTTP ``POST`` and then retry the
HTTP ``PUT`` for the hashmap. This allows for very fast uploads,
since it may happen that no real data uploading takes place, if the
blocks are already in data storage.
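The hashmap-first upload can be modelled in a few lines. This is a simplified in-memory sketch: ``put_hashmap`` stands in for the HTTP ``PUT`` and the dictionary assignment stands in for the HTTP ``POST`` of a block; both function names are hypothetical.

```python
def put_hashmap(hashmap, block_store):
    """Return the hashes the back end is missing (empty list: object created)."""
    return [h for h in dict.fromkeys(hashmap) if h not in block_store]


def upload_with_hashmap(blocks_by_hash, hashmap, block_store):
    """Send the hashmap, upload only missing blocks, then retry the hashmap.

    Returns the number of blocks that actually had to be uploaded.
    """
    missing = put_hashmap(hashmap, block_store)
    for h in missing:
        block_store[h] = blocks_by_hash[h]  # stands in for an HTTP POST
    still_missing = put_hashmap(hashmap, block_store)  # the retried PUT
    if still_missing:
        raise RuntimeError("blocks still missing after upload")
    return len(missing)
```

When every block is already in storage, the function returns ``0`` and no block data is transferred.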

  
Copying objects does not involve data copying, but is performed by
associating the object's hashmap with the new path. Moving objects, as
in OpenStack, is a copy followed by a delete, again with no real data
being moved.
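Modelling the object index as a mapping from path to hashmap, copy and move reduce to the following sketch. The ``objects`` dictionary and the function names are illustrative assumptions, not the back end's actual data model.

```python
def copy_object(objects, src, dst):
    """Associate the source object's hashmap with a new path; no blocks move."""
    objects[dst] = list(objects[src])


def move_object(objects, src, dst):
    """A move is a copy followed by a delete of the source path."""
    copy_object(objects, src, dst)
    del objects[src]
```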

  
Updates to an existing object, which are not offered by OpenStack, are
implemented by issuing an HTTP ``POST`` request including the
offset and the length of the data. The API front end requests
from the back end the hashmap of the existing object. Depending on the
offset of the update (whether it falls within block boundaries or not)
the front end will ask the back end to update or create new blocks. At
the end, the front end will save the updated hashmap. It is also
possible to pass a parameter to HTTP ``POST`` to specify that the
data will come from another object, instead of being uploaded by the
client.
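Which blocks an update touches follows from integer arithmetic on the offset and length. The helper below is a sketch of that calculation only; the function name and the block size used in examples are assumptions.

```python
def affected_blocks(offset, length, block_size):
    """Return the indices of the blocks touched by writing `length` bytes at `offset`."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))
```

Indices that fall within the existing hashmap correspond to blocks that must be updated; indices beyond its end correspond to blocks that must be created.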

  
Pithos+ Back End Nodes
^^^^^^^^^^^^^^^^^^^^^^

Pithos+ organizes entities in a tree hierarchy, with one tree node per
path entry (see Figure). Nodes can be accounts,
containers, and objects. A user may have multiple
accounts, each account may have multiple containers, and each
container may have multiple objects. An object may have multiple
versions, and each version of an object has properties (a set of fixed
metadata, like size and mtime) and arbitrary metadata.

  
.. image:: images/pithos-backend-nodes.png

The tree hierarchy has up to three levels, since, following the
OpenStack API, everything is stored as an object in a container.
Folders or directories are supported through conventions that
simulate pseudo-hierarchical folders. In particular, object names that
contain the forward slash character and have an accompanying marker
object with a ``Content-Type: application/directory`` as part of
their metadata can be treated as directories by Pithos+ clients. Each
node corresponds to a unique path, and we keep its parent in the
account/container/object hierarchy (that is, all objects have a
container as their parent).
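A client could interpret the directory convention along these lines. The sketch assumes the container listing has been reduced to a mapping from object name to content type; the function names are hypothetical.

```python
DIRECTORY_TYPE = "application/directory"


def is_directory(name, objects):
    """True when a marker object with the directory content type exists."""
    return objects.get(name) == DIRECTORY_TYPE


def list_directory(prefix, objects):
    """Return the names of objects directly under the pseudo-directory `prefix`."""
    base = prefix.rstrip("/") + "/"
    return sorted(
        name for name in objects
        if name.startswith(base) and "/" not in name[len(base):]
    )
```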

  
Pithos+ Back End Versions
^^^^^^^^^^^^^^^^^^^^^^^^^

For each object version we keep the root Merkle hash of the object it
refers to, the size of the object, the last modification time and the
user that modified the file, and its cluster. A version belongs
to one of the following three clusters (see Figure):

  * normal, which contains the current versions
  * history, which contains the previous versions of an object
  * deleted, which contains objects that have been deleted

  
.. image:: images/pithos-backend-versions.png

This versioning allows Pithos+ to offer its users time-based
content listings of their accounts. In effect, this also allows them
to take their containers back in time. This is implemented
conceptually by taking a vertical line in the Figure and
presenting to the user the state on the left side of the line.
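The "vertical line" idea can be illustrated with a deliberately simplified model. It assumes each version records when it became current and when it stopped being current (``None`` while still current); this representation is an assumption for the sketch and is not the back end's actual schema.

```python
def listing_at(versions, t):
    """Return the paths that existed at time `t`.

    `versions` is an iterable of (path, start, end) tuples, where `end` is
    None for a version that is still current.
    """
    return sorted(
        path for path, start, end in versions
        if start <= t and (end is None or t < end)
    )
```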

  
Pithos+ Back End Permissions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pithos+ recognizes read and write permissions, which can be granted to
individual users or groups of users. Groups are collections of users
created at the account level by users themselves, and are flat - a
group cannot contain or reference another group. Ownership of a file
cannot be delegated.

  
Pithos+ also recognizes a "public" permission, which means that the
object is readable by all. When an object is made public, it is
assigned a URL that can be used to access the object from
outside Pithos+ even by non-Pithos+ users.

  
Permissions can be assigned to objects, which may be actual files, or
directories. When listing objects, the back end uses the permissions as
filters for what to display, so that users will see only objects to
which they have access. Depending on the type of the object, the
filter may be exact (plain object), or a prefix (like ``path/*`` for
a directory). When accessing objects, the same rules are used to
decide whether to allow the user to read or modify the object or
directory. If no permissions apply to a specific object, the back end
searches for permissions on the closest directory sharing a common
prefix with the object.
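The lookup order described above, exact match first and then the closest enclosing directory, might be sketched as follows. The ``permissions`` mapping, its ``dir/*`` key convention and the function name are assumptions for illustration; groups and write permissions are left out.

```python
def can_read(user, path, permissions):
    """Check read access using an exact match, then the closest directory prefix.

    `permissions` maps either an object path or a 'dir/*' pattern to the set
    of users allowed to read it.
    """
    if path in permissions:
        return user in permissions[path]
    parts = path.split("/")
    # Walk upwards from the deepest enclosing directory to the shallowest.
    for depth in range(len(parts) - 1, 0, -1):
        pattern = "/".join(parts[:depth]) + "/*"
        if pattern in permissions:
            return user in permissions[pattern]
    return False
```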

  
Related Work
------------

Commercial cloud providers have been offering online storage for quite
some time, but the code is not published and we do not know the
details of their implementation. Rackspace has used the OpenStack
Object Storage in its Cloud Files product. Swift is an open source
implementation of the OpenStack Object Storage API. As we have
pointed out, our implementation maintains compatibility with
OpenStack, while offering additional capabilities.

  
Discussion
----------

Pithos+ is implemented in Python as a Django application. We use SQLAlchemy
as a database abstraction layer. It is currently about
17,000 lines of code, and it has taken about 50 person months of
development effort. This development was done from scratch, with no
reuse of the existing Pithos code. That service was written in the
J2EE framework. We decided to move from J2EE to Python for
two reasons: first, J2EE proved to be overkill for the original
Pithos service in its years of operation. Secondly, Python was
strongly favored by the GRNET operations team, who are the people
taking responsibility for running the service - so their voice is
heard.

  
Apart from the service implementation, which we have been describing
here, we have parallel development lines for native client tools on
different operating systems (MS-Windows, Mac OS X, Android, and iOS).
The desktop clients allow synchronization with local directories, a
feature that existing users of Pithos have been asking for, probably
influenced by services like Dropbox. These clients are offered in
parallel to the standard Pithos+ interface, which is a web application
built on top of the API front end - we treat our own web
application as just another client that has to go through the API
front end, without granting it access to the back end directly.

  
We are carrying the idea of our own services being clients to Pithos+
a step further, with new projects we have in our pipeline, in which a
digital repository service will be built on top of Pithos+. It will
again use the API front end, so that repository users will have
all Pithos+ capabilities, and on top of them we will build additional
functionality such as full text search, Dublin Core metadata storage
and querying, streaming, and so on.
  
344
At the time of this writing (March 2012) Pithos+ is in alpha,
345
available to users by invitation. We will extend our user base as we
346
move to beta in the coming months, and to our full set of users in the
347
second half of 2012. We are eager to see how our ideas fare as we will
348
scaling up, and we welcome any comments and suggestions.
349

  
Acknowledgments
---------------

Pithos+ is financially supported by Grant 296114, "Advanced Computing
Services for the Research and Academic Community", of the Greek
National Strategic Reference Framework.

  
Availability
------------

The Pithos+ code is available under a BSD 2-clause license from:
https://code.grnet.gr/projects/pithos/repository

The code can also be accessed from its source repository:
https://code.grnet.gr/git/pithos/

More information and documentation is available at:
http://docs.dev.grnet.gr/pithos/latest/index.html
b/docs/quick-install-admin-guide.rst
11 11
have the following services running:
12 12

  
13 13
 * Identity Management (Astakos)
14
 * File Storage Service (Pithos+)
14
 * Object Storage Service (Pithos+)
15 15
 * Compute Service (Cyclades)
16 16
 * Image Registry Service (Plankton)
17 17

  
......
20 20
The Volume Storage Service (Archipelago) and the Billing Service (Aquarium) are
21 21
not released yet.
22 22

  
23
If you just want to install the File Storage Service (Pithos+), follow the guide
23
If you just want to install the Object Storage Service (Pithos+), follow the guide
24 24
and just stop after the "Testing of Pithos+" section.
25 25

  
26 26

  
b/docs/quick-install-intgrt-guide.rst
15 15
installation, you will have the following services running:
16 16

  
17 17
 * Identity Management (Astakos)
18
 * File Storage Service (Pithos+)
18
 * Object Storage Service (Pithos+)
19 19
 * Compute Service (Cyclades)
20 20
 * Image Registry Service (Plankton)
21 21

  
......
24 24
The Volume Storage Service (Archipelago) and the Billing Service (Aquarium) are
25 25
not released yet.
26 26

  
27
If you just want to install the File Storage Service (Pithos+), follow the guide
27
If you just want to install the Object Storage Service (Pithos+), follow the guide
28 28
and just stop after the "Testing of Pithos+" section.
29 29

  
30 30
Building a dev environment
