Object Storage Service (Pithos+)
================================

Pithos+ is an online storage service based on the OpenStack Object
Storage API with several important extensions. It uses a
block-based mechanism to allow users to upload, download, and share
files, keep different versions of a file, and attach policies to them.
It follows a layered, modular implementation. Pithos+ was designed to
be used as a storage service by the entire Greek research
and academic community (counting tens of thousands of users) but is
free and open to use by anybody, under a BSD 2-clause license.

A presentation of Pithos+ features and architecture is available :download:`here <pithos-plus.pdf>`.

Introduction
------------

In 2008 the Greek Research and Technology Network (GRNET) decided
to offer an online storage service to the Greek research and academic
community. The service, called Pithos, was implemented in 2008-2009,
and was made available in spring 2009. It now has more than
12,000 users.

In 2011 GRNET decided to offer a new, evolved online storage
service, to be called Pithos+. Pithos+ is designed to address the
main requirements expressed by the Pithos users in the first two years of
operation:

* Provide both a web-based client and native desktop clients for
  the most common operating systems.
* Allow not only uploading, downloading, and sharing, but also
  synchronization capabilities so that users are able to select folders
  and have them synchronized automatically with their online accounts.
* Allow uploading of large files, regardless of browser
  capabilities (depending on the version, browsers may impose a 2 GB
  upload limit).
* Improve upload speed; this is not an issue as long as the user is on a
  computer connected to the GRNET backbone, but it becomes important
  over ADSL connections.
* Allow access by non-Shibboleth (http://shibboleth.internet2.edu/)
  accounts. Pithos delegates user authentication to the Greek
  Shibboleth federation, to which all research and academic
  institutions belong. However, it is desirable to have the option to
  open up Pithos to non-Shibboleth authenticated users as well.
* Use open standards as far as possible.

In what follows we describe the main features of Pithos+, the elements
of its design and the capabilities it affords. We touch on related
work and we provide some discussion on our experiences and thoughts on
the future.

Pithos+ Features
----------------

Pithos+ is based on the OpenStack Object Storage API (Pithos
used a home-grown API). We decided to adopt an open standard
API in order to leverage existing clients that implement the
API. In this way, a user can access Pithos+ with a standard
OpenStack client - although users will want to use a Pithos+ client to
use features going beyond those offered by the OpenStack API.
The strategy paid off during Pithos+ development itself, as we were
able to access and test the service with existing clients, while also
developing new clients based on open source OpenStack clients.

The major extensions on the OpenStack API are:

* The use of block-based storage in lieu of an object-based one.
  OpenStack stores objects, which may be files but need not be:
  large files (larger than 5 GB), for instance, must be
  stored as a series of distinct objects accompanied by a manifest.
  Pithos+ stores blocks, so objects can be of unlimited size.
* Permissions on individual files and folders. Note that folders
  do not exist in the OpenStack API, but are simulated by
  appropriate conventions, an approach we have kept in Pithos+ to
  avoid incompatibility.
* Fully-versioned objects.
* Metadata-based queries. Users are free to set metadata on their
  objects, and they can list objects meeting metadata criteria.
* Policies, such as whether to enable object versioning and to
  enforce quotas. This is particularly important for sharing object
  containers, since the user may want to avoid running out of space
  because of collaborators writing in the shared storage.
* Partial upload and download based on HTTP request
  headers and parameters.
* Object updates, where data may even come from other objects
  already stored in Pithos+. This allows users to compose objects from
  other objects without uploading data.
* All objects are assigned UUIDs on creation, which can be
  used to reference them regardless of their path location.

Pithos+ Design
--------------

Pithos+ is built on a layered architecture (see the figure below).
The Pithos+ server speaks HTTP with the outside world. The HTTP
operations implement an extended OpenStack Object Storage API.
The back end is a library meant to be used by internal code and
other front ends. For instance, the back end library, apart from being
used in Pithos+ for implementing the OpenStack Object Storage API,
is also used in our implementation of the OpenStack Image
Service API. Moreover, the back end library allows specification
of different namespaces for metadata, so that the same object can be
viewed by different front end APIs with different sets of
metadata. Hence the same object can be viewed as a file in Pithos+,
with one set of metadata, or as an image with a different set of
metadata, in our implementation of the OpenStack Image Service.

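The idea of per-namespace metadata can be illustrated with a minimal Python sketch. The class name ``MetadataStore``, its methods, the namespace labels ``"pithos"`` and ``"image"``, and the sample keys are all hypothetical; they are not taken from the Pithos+ back end library and only show how one object path can carry independent metadata sets.

```python
class MetadataStore:
    """Toy store keeping one metadata dict per (object path, namespace)."""

    def __init__(self):
        self._meta = {}

    def update(self, path, namespace, **items):
        # Each namespace gets its own dict, so front ends never collide.
        self._meta.setdefault((path, namespace), {}).update(items)

    def get(self, path, namespace):
        # Return a copy so callers cannot mutate the store by accident.
        return dict(self._meta.get((path, namespace), {}))


store = MetadataStore()
path = "user/images/debian.diskdump"
store.update(path, "pithos", color="blue")          # file-style metadata
store.update(path, "image", disk_format="diskdump")  # image-style metadata
```

Here the same path answers with ``{"color": "blue"}`` when read through the hypothetical ``"pithos"`` namespace and with ``{"disk_format": "diskdump"}`` through ``"image"``.
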
The data component provides storage of blocks and the information
needed to retrieve them, while the metadata component is a database of
nodes and permissions. In the current implementation, data is saved to
the filesystem and metadata to an SQL database. In the future,
data will be saved to some distributed block storage (we are currently
evaluating RADOS - http://ceph.newdream.net/category/rados), and metadata to a NoSQL database.

.. image:: images/pithos-layers.png

Block-based Storage for the Client
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since an object is saved as a set of blocks in Pithos+, object
operations are no longer required to refer to the whole object. We can
handle parts of objects as needed when uploading, downloading, or
copying and moving data.

In particular, a client, provided it has access permissions, can
download data from Pithos+ by issuing a ``GET`` request on an
object. If the request includes the ``hashmap`` parameter, then the
request refers to a hashmap, that is, the list of the
object's block hashes. The reply is of the form::

    {"block_hash": "sha1",
     "hashes": ["7295c41da03d7f916440b98e32c4a2a39351546c", ...],
     "block_size": 131072,
     "bytes": 242}

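A client needs the same structure for its local copy of a file. The following Python sketch builds one for in-memory data. The function name ``compute_hashmap`` is hypothetical, and hashing the raw bytes of each block is an assumption: the service may normalize blocks (for example, by handling padding) before hashing, so the digests here are illustrative only.

```python
import hashlib


def compute_hashmap(data: bytes, block_size: int = 131072) -> dict:
    """Build a dict shaped like the reply above for in-memory data."""
    hashes = [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]
    return {"block_hash": "sha1", "hashes": hashes,
            "block_size": block_size, "bytes": len(data)}


# 300000 bytes span two full 131072-byte blocks and one partial block.
hashmap = compute_hashmap(b"x" * 300000)
```

Because the first two blocks hold identical bytes, they share one hash, which is what later makes deduplication possible.
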
The client can then compare the hashmap with the hashmap computed from
the local file. Any missing parts can be downloaded with ``GET``
requests with an additional ``Range`` header selecting the
blocks to be retrieved. The integrity of the file can be
checked against the ``X-Object-Hash`` header, returned by the
server and containing the root Merkle hash of the object's
hashmap.

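The comparison step can be sketched in Python as follows. The helper names ``missing_block_ranges`` and ``range_header`` are hypothetical, the hashes are short placeholders, and the tiny block size is chosen for readability. The sketch assumes the client asks for whole blocks using standard HTTP byte ranges; whether the service accepts several ranges in one request is not stated here.

```python
def missing_block_ranges(local_hashes, remote_hashmap):
    """Return inclusive (start, end) byte ranges of blocks to download."""
    size = remote_hashmap["block_size"]
    total = remote_hashmap["bytes"]
    ranges = []
    for index, remote_hash in enumerate(remote_hashmap["hashes"]):
        have = index < len(local_hashes) and local_hashes[index] == remote_hash
        if not have:
            start = index * size
            end = min(start + size, total) - 1  # last block may be short
            ranges.append((start, end))
    return ranges


def range_header(ranges):
    """Format byte ranges as the value of an HTTP Range header."""
    return "bytes=" + ",".join(f"{start}-{end}" for start, end in ranges)


remote = {"block_hash": "sha1", "hashes": ["aa", "bb", "cc"],
          "block_size": 4, "bytes": 10}
ranges = missing_block_ranges(["aa", "xx"], remote)
header = range_header(ranges)
```

The local copy matches only the first block, so the second block and the short third block are requested.
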
When uploading a file to Pithos+, only the missing blocks will be
submitted to the server, with the following algorithm:

* Calculate the hash value for each block of the object to be
  uploaded.
* Send a hashmap ``PUT`` request for the object. This is a
  ``PUT`` request with a ``hashmap`` request parameter appended
  to it. If the parameter is not present, the object's data (or part
  of it) is provided with the request. If the parameter is present,
  the object hashmap is provided with the request.
* If the server responds with status 201 (Created), the blocks are
  already on the server and we do not need to do anything more.
* If the server responds with status 409 (Conflict), the server's
  response body contains the hashes of the blocks that do not exist on
  the server. Then, for each hash value in the server's response (or all
  hashes together) send a ``POST`` request to the server with the
  block's data.

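The steps above can be sketched in Python against an in-memory stand-in for the server. ``FakeServer``, its methods, and ``upload`` are hypothetical names that only mimic the 201/409 exchange; they do not reproduce the real Pithos+ HTTP API, and the 4-byte block size is for illustration. The final retry of the hashmap ``PUT`` after posting blocks is included as an assumption about how the object gets created.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size, for illustration only


class FakeServer:
    """In-memory stand-in for the service; not the real Pithos+ API."""

    def __init__(self):
        self.blocks = {}   # block hash -> block data
        self.objects = {}  # object path -> list of block hashes

    def put_hashmap(self, path, hashes):
        missing = [h for h in hashes if h not in self.blocks]
        if missing:
            return 409, missing
        self.objects[path] = list(hashes)
        return 201, []

    def post_block(self, data):
        self.blocks[hashlib.sha1(data).hexdigest()] = data


def upload(server, path, data):
    """Upload data, sending only blocks the server lacks; return count sent."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    hashes = [hashlib.sha1(block).hexdigest() for block in blocks]
    by_hash = dict(zip(hashes, blocks))
    status, missing = server.put_hashmap(path, hashes)
    sent = 0
    if status == 409:
        for block_hash in dict.fromkeys(missing):  # de-duplicate, keep order
            server.post_block(by_hash[block_hash])
            sent += 1
        status, missing = server.put_hashmap(path, hashes)
    if status != 201:
        raise RuntimeError("upload failed")
    return sent


server = FakeServer()
first = upload(server, "docs/a.txt", b"abcdefgh")   # both blocks are new
second = upload(server, "docs/b.txt", b"abcdWXYZ")  # "abcd" already stored
```

The second upload transfers a single block, because the block holding ``abcd`` was already stored by the first one.
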
In effect, we are deduplicating data based on their block hashes,
transparently to the users. This results in seemingly instantaneous
uploads when material is already present in Pithos+ storage.

Block-based Storage Processing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Hashmaps themselves are saved in blocks. All blocks are persisted to
storage using content-based addressing. It follows that to read a
file, Pithos+ performs the following operations:

* The client issues a request to get a file, via HTTP ``GET``.
* The API front end asks the back end for the metadata
  of the object.
* The back end checks the permissions of the object and, if they
  allow access to it, returns the object's metadata.
* The front end evaluates any HTTP headers (such as
  ``If-Modified-Since``, ``If-Match``, etc.).
* If the preconditions are met, the API front end requests
  the object's hashmap from the back end (hashmaps are indexed by the
  full path).
* The back end reads the object's hashmap from the underlying
  storage and returns it to the API front end.
* Depending on the HTTP ``Range`` header, the
  API front end asks the back end for the required blocks, giving
  their corresponding hashes.
* The back end fetches the blocks from the underlying storage and
  passes them to the API front end, which returns them to the client.

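The last steps of this read path, mapping a byte range to block hashes and trimming the result, can be sketched in Python. The function ``read_range``, the dictionary standing in for content-addressed storage, and the placeholder keys ``h0`` to ``h2`` are hypothetical; real keys would be block hashes and the block size would be much larger.

```python
BLOCK_SIZE = 4  # tiny value for illustration; not the service's block size


def read_range(store, hashmap, start, end):
    """Return bytes start..end (inclusive), reading only the blocks needed."""
    first, last = start // BLOCK_SIZE, end // BLOCK_SIZE
    needed = hashmap[first:last + 1]
    data = b"".join(store[block_hash] for block_hash in needed)
    offset = start - first * BLOCK_SIZE  # skip bytes before the range
    return data[offset:offset + (end - start + 1)], needed


# Placeholder keys stand in for real block hashes.
store = {"h0": b"0123", "h1": b"4567", "h2": b"89"}
hashmap = ["h0", "h1", "h2"]
body, used = read_range(store, hashmap, 5, 8)
```

For the range 5 to 8 only the second and third blocks are fetched; the first block is never read.
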
Saving data from the client to the server is done in several different
ways.

First, a regular HTTP ``PUT`` is the reverse of the HTTP ``GET``.
The client sends the full object to the API front end.
The API front end splits the object into blocks. It sends each
block to the back end, which calculates its hash and saves it to
storage. When the hashmap is complete, the API front end commands
the back end to create a new object with the created hashmap and any
associated metadata.

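A minimal Python sketch of this division of labour follows. The ``BackEnd`` class, its methods, and ``put_object`` are hypothetical names, SHA-1 is assumed as the block hash, and dictionaries stand in for the real data and metadata stores.

```python
import hashlib


class BackEnd:
    """Hypothetical back end: content-addressed blocks plus an object table."""

    def __init__(self):
        self.blocks = {}
        self.objects = {}

    def put_block(self, data):
        digest = hashlib.sha1(data).hexdigest()
        self.blocks[digest] = data  # identical content lands on the same key
        return digest

    def create_object(self, path, hashmap, size, meta):
        self.objects[path] = {"hashmap": hashmap, "bytes": size, "meta": meta}


def put_object(back_end, path, data, meta=None, block_size=4):
    """Front-end side of a plain PUT: split, store each block, then commit."""
    hashmap = [back_end.put_block(data[i:i + block_size])
               for i in range(0, len(data), block_size)]
    back_end.create_object(path, hashmap, len(data), dict(meta or {}))
    return hashmap


back_end = BackEnd()
hashmap = put_object(back_end, "c/file", b"aaaaaaaabb", {"author": "me"})
```

The object has three hashmap entries but only two stored blocks, since the first two blocks have the same content.
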
Secondly, the client may send a hashmap and any associated metadata
to the API front end, with a specially formatted HTTP ``PUT``,
using an appropriate URL parameter. In this case, if the
back end can find the requested blocks, the object will be created as
previously; otherwise it will report back the list of missing blocks,
which will be passed back to the client. The client may then send the
missing blocks by issuing an HTTP ``POST`` and then retry the
HTTP ``PUT`` for the hashmap. This allows for very fast uploads,
since it may happen that no real data uploading takes place, if the
blocks are already in data storage.

Copying objects does not involve data copying, but is performed by
associating the object's hashmap with the new path. Moving objects, as
in OpenStack, is a copy followed by a delete, again with no real data
being moved.

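Copy and move can be sketched in Python as pure bookkeeping on a path-to-hashmap table. The function names, the dictionary layout, and the placeholder keys ``h0`` and ``h1`` are hypothetical; the point is only that the block store is never read or written.

```python
def copy_object(objects, src, dst):
    """Copy by pointing dst at the same block hashes; no block data is read."""
    objects[dst] = list(objects[src])


def move_object(objects, src, dst):
    """Move is a copy followed by a delete of the source path."""
    copy_object(objects, src, dst)
    del objects[src]


blocks = {"h0": b"0123", "h1": b"4567"}  # block store, never touched here
objects = {"pics/a.jpg": ["h0", "h1"]}   # path -> hashmap
copy_object(objects, "pics/a.jpg", "backup/a.jpg")
move_object(objects, "pics/a.jpg", "pics/b.jpg")
```

After both calls, two paths reference the same two blocks and the original path is gone.
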
Updates to an existing object, which are not offered by OpenStack, are
implemented by issuing an HTTP ``POST`` request including the
offset and the length of the data. The API front end requests
the hashmap of the existing object from the back end. Depending on the
offset of the update (whether it falls within block boundaries or not)
the front end will ask the back end to update or create new blocks. At
the end, the front end will save the updated hashmap. It is also
possible to pass a parameter to HTTP ``POST`` to specify that the
data will come from another object, instead of being uploaded by the
client.

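The block arithmetic behind such an update can be sketched in Python. The function ``update_object`` is hypothetical, SHA-1 and a 4-byte block size are assumptions for illustration, and writes starting beyond the current end of the object are simply rejected; the real service may behave differently.

```python
import hashlib


def update_object(blocks, hashmap, size, offset, data, block_size=4):
    """Write data at offset; return (new_hashmap, new_size, touched indices)."""
    if offset > size:
        raise ValueError("offset beyond end of object")
    new_size = max(size, offset + len(data))
    new_hashmap = list(hashmap)
    first = offset // block_size
    last = (offset + len(data) - 1) // block_size
    touched = []
    for index in range(first, last + 1):
        start = index * block_size
        old = blocks[hashmap[index]] if index < len(hashmap) else b""
        # Grow the block if the object gets longer inside it.
        buf = bytearray(old.ljust(min(block_size, new_size - start), b"\0"))
        lo = max(offset, start)
        hi = min(offset + len(data), start + block_size)
        buf[lo - start:hi - start] = data[lo - offset:hi - offset]
        digest = hashlib.sha1(bytes(buf)).hexdigest()
        blocks[digest] = bytes(buf)
        if index < len(new_hashmap):
            new_hashmap[index] = digest
        else:
            new_hashmap.append(digest)
        touched.append(index)
    return new_hashmap, new_size, touched


blocks, hashmap = {}, []
for chunk in (b"0123", b"4567", b"89"):
    digest = hashlib.sha1(chunk).hexdigest()
    blocks[digest] = chunk
    hashmap.append(digest)

new_hashmap, new_size, touched = update_object(blocks, hashmap, 10, 6, b"XYZW")
content = b"".join(blocks[h] for h in new_hashmap)
```

The write at offset 6 straddles a block boundary, so the second and third blocks are replaced while the first block's hash is reused unchanged.
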
Pithos+ Back End Nodes
^^^^^^^^^^^^^^^^^^^^^^

Pithos+ organizes entities in a tree hierarchy, with one tree node per
path entry (see the figure below). Nodes can be accounts,
containers, and objects. A user may have multiple
accounts, each account may have multiple containers, and each
container may have multiple objects. An object may have multiple
versions, and each version of an object has properties (a set of fixed
metadata, like size and mtime) and arbitrary metadata.

.. image:: images/pithos-backend-nodes.png

The tree hierarchy has up to three levels, since, following the
OpenStack API, everything is stored as an object in a container.
Folders or directories are supported through conventions that
simulate pseudo-hierarchical folders. In particular, object names that
contain the forward slash character and have an accompanying marker
object with a ``Content-Type: application/directory`` as part of
their metadata can be treated as directories by Pithos+ clients. Each
node corresponds to a unique path, and we keep its parent in the
account/container/object hierarchy (that is, all objects have a
container as their parent).

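A client-side view of this convention can be sketched in Python. The function ``list_folder`` and the sample object names are hypothetical; the sketch treats both marker objects and slash-separated name prefixes as folders, which is one possible reading of the convention and not necessarily how any particular Pithos+ client behaves.

```python
def list_folder(objects, folder):
    """List direct children of a pseudo-folder inside one container.

    objects maps object name -> content type; folder is "" for the top level.
    """
    prefix = folder + "/" if folder else ""
    files, subfolders = [], set()
    for name, content_type in objects.items():
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if not rest:
            continue
        if "/" in rest:
            subfolders.add(rest.split("/", 1)[0])  # entry deeper in the tree
        elif content_type == "application/directory":
            subfolders.add(rest)                   # directory marker object
        else:
            files.append(rest)
    return sorted(subfolders), sorted(files)


container = {
    "photos": "application/directory",
    "photos/2011": "application/directory",
    "photos/2011/trip.jpg": "image/jpeg",
    "photos/cat.jpg": "image/jpeg",
    "notes.txt": "text/plain",
}
top = list_folder(container, "")
photos = list_folder(container, "photos")
```

All five entries are flat objects in one container; the folder structure exists only in how their names are interpreted.
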
Pithos+ Back End Versions
^^^^^^^^^^^^^^^^^^^^^^^^^

For each object version we keep the root Merkle hash of the object it
refers to, the size of the object, the last modification time and the
user that modified the file, and its cluster. A version belongs
to one of the following three clusters (see the figure below):

  * normal, which are the current versions
  * history, which contain the previous versions of an object
  * deleted, which contain objects that have been deleted

.. image:: images/pithos-backend-versions.png

This versioning allows Pithos+ to offer its users time-based
content listings of their accounts. In effect, this also allows them
to take their containers back in time. This is implemented
conceptually by taking a vertical line in the figure above and
presenting to the user the state on the left side of the line.

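The "vertical line" idea can be sketched in Python. The function ``list_at`` and the tuple layout are hypothetical, and the sketch assumes that deleting an object records a version in the ``deleted`` cluster stamped with the deletion time; the actual back end schema may differ.

```python
def list_at(versions, when):
    """Return {path: serial} as the container looked at time `when`.

    versions is an iterable of (path, mtime, cluster, serial) tuples, where
    cluster is "normal", "history" or "deleted".
    """
    latest = {}
    for path, mtime, cluster, serial in versions:
        # Keep, per path, the newest version at or before the chosen time.
        if mtime <= when and (path not in latest or mtime > latest[path][0]):
            latest[path] = (mtime, cluster, serial)
    return {path: serial
            for path, (mtime, cluster, serial) in latest.items()
            if cluster != "deleted"}


versions = [
    ("a.txt", 10, "history", 1),
    ("a.txt", 20, "normal", 2),
    ("b.txt", 15, "history", 3),
    ("b.txt", 30, "deleted", 4),
]
early = list_at(versions, 12)
middle = list_at(versions, 25)
late = list_at(versions, 35)
```

Moving the time from 12 to 25 to 35 shows ``b.txt`` appearing and then disappearing, while ``a.txt`` switches from its first to its second version.
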
Pithos+ Back End Permissions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pithos+ recognizes read and write permissions, which can be granted to
individual users or groups of users. Groups are collections of users
created at the account level by users themselves, and are flat - a
group cannot contain or reference another group. Ownership of a file
cannot be delegated.

Pithos+ also recognizes a "public" permission, which means that the
object is readable by all. When an object is made public, it is
assigned a URL that can be used to access the object from
outside Pithos+ even by non-Pithos+ users.

Permissions can be assigned to objects, which may be actual files or
directories. When listing objects, the back end uses the permissions as
filters for what to display, so that users will see only objects to
which they have access. Depending on the type of the object, the
filter may be exact (plain object), or a prefix (like ``path/*`` for
a directory). When accessing objects, the same rules are used to
decide whether to allow the user to read or modify the object or
directory. If no permissions apply to a specific object, the back end
searches for permissions on the closest directory sharing a common
prefix with the object.

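The lookup order described above can be sketched in Python. The function ``can_read`` and the layout of the ``permissions`` table are hypothetical, groups and the public permission are left out, and "closest directory" is interpreted as the nearest ancestor path that has any permission entry.

```python
def can_read(permissions, user, path):
    """Check read access using an exact match first, then closest directory.

    permissions maps an object or directory path to a set of user names.
    """
    candidate = path
    while True:
        if candidate in permissions:
            return user in permissions[candidate]
        if "/" not in candidate:
            return False  # reached the top without finding any permission
        candidate = candidate.rsplit("/", 1)[0]  # move up one directory


permissions = {
    "docs": {"alice"},
    "docs/reports": {"bob"},
}
bob_report = can_read(permissions, "bob", "docs/reports/q1.pdf")
alice_report = can_read(permissions, "alice", "docs/reports/q1.pdf")
alice_notes = can_read(permissions, "alice", "docs/notes.txt")
```

In this reading the entry on ``docs/reports`` shadows the one on ``docs``, so ``alice`` can read ``docs/notes.txt`` but not files under ``docs/reports``.
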
Related Work
------------

Commercial cloud providers have been offering online storage for quite
some time, but the code is not published and we do not know the
details of their implementation. Rackspace has used the OpenStack
Object Storage in its Cloud Files product. Swift is an open source
implementation of the OpenStack Object Storage API. As we have
pointed out, our implementation maintains compatibility with
OpenStack, while offering additional capabilities.

Discussion
----------

Pithos+ is implemented in Python as a Django application. We use SQLAlchemy
as a database abstraction layer. It is currently about
17,000 lines of code, and it has taken about 50 person months of
development effort. This development was done from scratch, with no
reuse of the existing Pithos code. That service was written in the
J2EE framework. We decided to move from J2EE to Python for
two reasons. First, J2EE proved to be overkill for the original
Pithos service in its years of operation. Secondly, Python was
strongly favored by the GRNET operations team, who are the people
taking responsibility for running the service - so their voice is
heard.

Apart from the service implementation, which we have been describing
here, we have parallel development lines for native client tools on
different operating systems (MS-Windows, Mac OS X, Android, and iOS).
The desktop clients allow synchronization with local directories, a
feature that existing users of Pithos have been asking for, probably
influenced by services like Dropbox. These clients are offered in
parallel to the standard Pithos+ interface, which is a web application
built on top of the API front end - we treat our own web
application as just another client that has to go through the API
front end, without granting it access to the back end directly.

We are carrying the idea of our own services being clients to Pithos+
a step further, with new projects we have in our pipeline, in which a
digital repository service will be built on top of Pithos+. It will
again use the API front end, so that repository users will have
all Pithos+ capabilities, and on top of them we will build additional
functionality such as full text search, Dublin Core metadata storage
and querying, streaming, and so on.

At the time of this writing (March 2012) Pithos+ is in alpha,
available to users by invitation. We will extend our user base as we
move to beta in the coming months, and to our full set of users in the
second half of 2012. We are eager to see how our ideas fare as we
scale up, and we welcome any comments and suggestions.

Acknowledgments
---------------

Pithos+ is financially supported by Grant 296114, "Advanced Computing
Services for the Research and Academic Community", of the Greek
National Strategic Reference Framework.

Availability
------------

The Pithos+ code is available under a BSD 2-clause license from:
https://code.grnet.gr/projects/pithos/repository

The code can also be accessed from its source repository:
https://code.grnet.gr/git/pithos/

More information and documentation are available at:
http://docs.dev.grnet.gr/pithos/latest/index.html