Object Storage Service (Pithos+)
================================

Pithos+ is an online storage service based on the OpenStack Object
Storage API with several important extensions. It uses a
block-based mechanism to allow users to upload, download, and share
files, keep different versions of a file, and attach policies to them.
It follows a layered, modular implementation. Pithos+ was designed to
be used as a storage service by the entire Greek research
and academic community (counting tens of thousands of users) but is
free and open to use by anybody, under a BSD 2-clause license.

A presentation of Pithos+ features and architecture is :download:`here <pithos-plus.pdf>`.

Introduction
------------

In 2008 the Greek Research and Technology Network (GRNET) decided
to offer an online storage service to the Greek research and academic
community. The service, called Pithos, was implemented in 2008-2009,
and was made available in spring 2009. It now has more than
12,000 users.

In 2011 GRNET decided to offer a new, evolved online storage
service, to be called Pithos+. Pithos+ is designed to address the
main requirements expressed by the Pithos users in the first two years of
operation:

* Provide both a web-based client and native desktop clients for
  the most common operating systems.
* Allow not only uploading, downloading, and sharing, but also
  synchronization capabilities so that users are able to select folders
  and have them synchronized automatically with their online accounts.
* Allow uploading of large files, regardless of browser
  capabilities (depending on the version, browsers may place a 2
  GBytes upload limit).
* Improve upload speed; not an issue as long as the user is on a
  computer connected to the GRNET backbone, but it becomes important
  over ADSL connections.
* Allow access by non-Shibboleth (http://shibboleth.internet2.edu/)
  accounts. Pithos delegates user authentication to the Greek
  Shibboleth federation, to which all research and academic
  institutions belong. However, it is desirable to have the option to
  open up Pithos to non-Shibboleth authenticated users as well.
* Use open standards as far as possible.

In what follows we describe the main features of Pithos+, the elements
of its design and the capabilities it affords. We touch on related
work and we provide some discussion on our experiences and thoughts on
the future.

Pithos+ Features
----------------

Pithos+ is based on the OpenStack Object Storage API (Pithos
used a home-grown API). We decided to adopt an open standard
API in order to leverage existing clients that implement the
API. In this way, a user can access Pithos+ with a standard
OpenStack client - although users will want to use a Pithos+ client to
use features going beyond those offered by the OpenStack API.
The strategy paid off during Pithos+ development itself, as we were
able to access and test the service with existing clients, while also
developing new clients based on open source OpenStack clients.

The major extensions to the OpenStack API are:

* The use of block-based storage in lieu of an object-based one.
  OpenStack stores objects, which may be files, but this is not
  necessarily the case - large files (larger than 5 GBytes), for
  instance, must be stored as a series of distinct objects accompanied
  by a manifest. Pithos+ stores blocks, so objects can be of unlimited
  size.
* Permissions on individual files and folders. Note that folders
  do not exist in the OpenStack API, but are simulated by
  appropriate conventions, an approach we have kept in Pithos+ to
  avoid incompatibility.
* Fully-versioned objects.
* Metadata-based queries. Users are free to set metadata on their
  objects, and they can list objects meeting metadata criteria.
* Policies, such as whether to enable object versioning and to
  enforce quotas. This is particularly important for sharing object
  containers, since the user may want to avoid running out of space
  because of collaborators writing in the shared storage.
* Partial upload and download based on HTTP request
  headers and parameters.
* Object updates, where data may even come from other objects
  already stored in Pithos+. This allows users to compose objects from
  other objects without uploading data.
* All objects are assigned UUIDs on creation, which can be
  used to reference them regardless of their path location.

Pithos+ Design
--------------

Pithos+ is built on a layered architecture (see Figure).
The Pithos+ server speaks HTTP with the outside world. The HTTP
operations implement an extended OpenStack Object Storage API.
The back end is a library meant to be used by internal code and
other front ends. For instance, the back end library, apart from being
used in Pithos+ for implementing the OpenStack Object Storage API,
is also used in our implementation of the OpenStack Image
Service API. Moreover, the back end library allows specification
of different namespaces for metadata, so that the same object can be
viewed by different front end APIs with different sets of
metadata. Hence the same object can be viewed as a file in Pithos+,
with one set of metadata, or as an image with a different set of
metadata, in our implementation of the OpenStack Image Service.
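The idea of per-front-end metadata namespaces can be illustrated with a minimal sketch. The class below is not the Pithos+ back end interface; ``MetadataStore``, the domain names ``pithos`` and ``image``, and the sample keys are all hypothetical and only show how one object path can carry an independent set of metadata per namespace.

```python
from collections import defaultdict


class MetadataStore:
    """Per-domain metadata: the same object carries one dict per domain."""

    def __init__(self):
        # (object path, domain) -> {key: value}
        self._meta = defaultdict(dict)

    def update(self, path, domain, items):
        """Merge `items` into the metadata of `path` within `domain`."""
        self._meta[(path, domain)].update(items)

    def get(self, path, domain):
        """Return a copy so callers cannot mutate stored metadata."""
        return dict(self._meta.get((path, domain), {}))


store = MetadataStore()
path = "user/images/debian.diskdump"   # hypothetical object path
store.update(path, "pithos", {"tag": "backup"})
store.update(path, "image", {"disk_format": "diskdump"})
```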

The data component provides storage of blocks and the information
needed to retrieve them, while the metadata component is a database of
nodes and permissions. In the current implementation, data is saved to
the filesystem and metadata in an SQL database. In the future,
data will be saved to some distributed block storage (we are currently
evaluating RADOS - http://ceph.newdream.net/category/rados), and metadata to a NoSQL database.

.. image:: images/pithos-layers.png

Block-based Storage for the Client
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since an object is saved as a set of blocks in Pithos+, object
operations are no longer required to refer to the whole object. We can
handle parts of objects as needed when uploading, downloading, or
copying and moving data.

In particular, a client, provided it has access permissions, can
download data from Pithos+ by issuing a ``GET`` request on an
object. If the request includes the ``hashmap`` parameter, then the
request refers to a hashmap, that is, a set containing the
object's block hashes. The reply is of the form::

    {"block_hash": "sha1",
     "hashes": ["7295c41da03d7f916440b98e32c4a2a39351546c", ...],
     "block_size": 131072,
     "bytes": 242}

The client can then compare the hashmap with the hashmap computed from
the local file. Any missing parts can be downloaded with ``GET``
requests with an additional ``Range`` header covering the byte ranges
of the blocks to be retrieved. The integrity of the file can be
checked against the ``X-Object-Hash`` header, returned by the
server and containing the root Merkle hash of the object's
hashmap.
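The comparison step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Pithos+ client code: it assumes each block is hashed as-is with the algorithm named in ``block_hash`` (SHA-1 here), that blocks are compared position by position, and that missing blocks are requested with standard HTTP byte ranges. The function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 131072  # value taken from the sample reply above


def compute_hashmap(data, block_size=BLOCK_SIZE):
    """Return the SHA-1 hex digest of each fixed-size block of `data`."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_fetch(remote, local):
    """Indices of remote blocks that differ from, or extend past, the local file."""
    return [
        i for i, digest in enumerate(remote)
        if i >= len(local) or local[i] != digest
    ]


def byte_range(index, block_size, total_bytes):
    """Range header value covering block `index` (bounds are inclusive)."""
    start = index * block_size
    end = min(start + block_size, total_bytes) - 1
    return "bytes={}-{}".format(start, end)
```

A client would call ``compute_hashmap`` on the local file, pass the result together with the ``hashes`` list from the server to ``blocks_to_fetch``, and issue one ranged ``GET`` per returned index.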

When uploading a file to Pithos+, only the missing blocks will be
submitted to the server, with the following algorithm:

* Calculate the hash value for each block of the object to be
  uploaded.
* Send a hashmap ``PUT`` request for the object. This is a
  ``PUT`` request with a ``hashmap`` request parameter appended
  to it. If the parameter is not present, the object's data (or part
  of it) is provided with the request. If the parameter is present,
  the object hashmap is provided with the request.
* If the server responds with status 201 (Created), the blocks are
  already on the server and we do not need to do anything more.
* If the server responds with status 409 (Conflict), the server's
  response body contains the hashes of the blocks that do not exist on
  the server. Then, for each hash value in the server's response (or all
  hashes together) send a ``POST`` request to the server with the
  block's data.

In effect, we are deduplicating data based on their block hashes,
transparently to the users. This results in what users perceive as
instantaneous uploads when material is already present in Pithos+
storage.
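The upload steps above can be sketched as follows. ``FakeServer`` is an in-memory stand-in written for this example, not the Pithos+ API; its method names and return values are assumptions that only mirror the 201/409 behaviour described in the list. After sending the missing blocks, the sketch repeats the hashmap ``PUT`` so that the object is actually created.

```python
import hashlib


class FakeServer:
    """In-memory stand-in for the service (hypothetical, for illustration)."""

    def __init__(self):
        self.blocks = {}   # block hash -> block data
        self.objects = {}  # object path -> list of block hashes

    def put_hashmap(self, path, hashes):
        """Create the object if all blocks exist, else report missing hashes."""
        missing = sorted({h for h in hashes if h not in self.blocks})
        if missing:
            return 409, missing
        self.objects[path] = list(hashes)
        return 201, []

    def post_block(self, data):
        """Store one block under its content hash."""
        digest = hashlib.sha1(data).hexdigest()
        self.blocks[digest] = data
        return digest


def upload(server, path, data, block_size):
    """Upload `data`, sending only blocks the server lacks.

    Returns the final status and the number of blocks actually sent.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    hashes = [hashlib.sha1(b).hexdigest() for b in blocks]
    by_hash = dict(zip(hashes, blocks))

    status, missing = server.put_hashmap(path, hashes)
    sent = 0
    if status == 409:
        for digest in missing:
            server.post_block(by_hash[digest])
            sent += 1
        status, _ = server.put_hashmap(path, hashes)
    return status, sent
```

Uploading the same content a second time, even under a different path, sends no block data at all, which is the deduplication effect described above.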

Block-based Storage Processing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Hashmaps themselves are saved in blocks. All blocks are persisted to
storage using content-based addressing. It follows that to read a
file, Pithos+ performs the following operations:

* The client issues a request to get a file, via HTTP ``GET``.
* The API front end asks the back end for the object's metadata.
* The back end checks the permissions of the object and, if they
  allow access to it, returns the object's metadata.
* The front end evaluates any HTTP headers (such as
  ``If-Modified-Since``, ``If-Match``, etc.).
* If the preconditions are met, the API front end requests
  the object's hashmap from the back end (hashmaps are indexed by the
  full path).
* The back end reads the object's hashmap from the underlying storage
  and returns it to the API front end.
* Depending on the HTTP ``Range`` header, the
  API front end asks the back end for the required blocks, giving
  their corresponding hashes.
* The back end fetches the blocks from the underlying storage and
  passes them to the API front end, which returns them to the client.
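The step that turns a ``Range`` header into block requests is simple integer arithmetic. The sketch below is an illustration, not the front end code: it assumes an already parsed, inclusive byte range and a plain dictionary standing in for content-addressed block storage. Both function names are hypothetical.

```python
def blocks_for_range(hashmap, block_size, start, end):
    """Hashes of the blocks covering bytes start..end (inclusive)."""
    first = start // block_size
    last = end // block_size
    return hashmap[first:last + 1]


def read_range(blocks, hashmap, block_size, start, end):
    """Assemble bytes start..end from a dict of content-addressed blocks."""
    wanted = blocks_for_range(hashmap, block_size, start, end)
    data = b"".join(blocks[digest] for digest in wanted)
    # Trim the leading part of the first block and any excess at the end.
    offset = start - (start // block_size) * block_size
    return data[offset:offset + (end - start + 1)]
```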

Saving data from the client to the server is done in several different
ways.

First, a regular HTTP ``PUT`` is the reverse of the HTTP ``GET``.
The client sends the full object to the API front end.
The API front end splits the object into blocks. It sends each
block to the back end, which calculates its hash and saves it to
storage. When the hashmap is complete, the API front end commands
the back end to create a new object with the created hashmap and any
associated metadata.
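A minimal sketch of this path, assuming SHA-1 as the block hash and in-memory dictionaries in place of the real data and metadata components. ``BlockStore`` and ``put_object`` are hypothetical names used for illustration only.

```python
import hashlib


class BlockStore:
    """Content-addressed store: a block's key is the SHA-1 of its bytes."""

    def __init__(self):
        self._blocks = {}

    def put(self, data):
        key = hashlib.sha1(data).hexdigest()
        self._blocks.setdefault(key, data)  # identical blocks are stored once
        return key

    def get(self, key):
        return self._blocks[key]

    def __len__(self):
        return len(self._blocks)


def put_object(store, objects, path, data, block_size, meta=None):
    """Split `data` into blocks, store each, then register the hashmap."""
    hashmap = [
        store.put(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]
    objects[path] = {
        "hashmap": hashmap,
        "bytes": len(data),
        "meta": dict(meta or {}),
    }
    return hashmap
```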

Secondly, the client may send to the API front end a hashmap and
any associated metadata, with a specially formatted HTTP ``PUT``,
using an appropriate URL parameter. In this case, if the
back end can find the requested blocks, the object will be created as
previously; otherwise it will report back the list of missing blocks,
which will be passed back to the client. The client then may send the
missing blocks by issuing an HTTP ``POST`` and then retry the
HTTP ``PUT`` for the hashmap. This allows for very fast uploads,
since it may happen that no real data uploading takes place, if the
blocks are already in data storage.

Copying objects does not involve data copying, but is performed by
associating the object's hashmap with the new path. Moving objects, as
in OpenStack, is a copy followed by a delete, again with no real data
being moved.
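In terms of data structures, copy and move only touch the mapping from paths to hashmaps. The sketch below assumes a plain dictionary as that mapping; the function names and the record layout are hypothetical.

```python
def copy_object(objects, src, dst):
    """Copy by pointing `dst` at the same block hashes; no block is read."""
    entry = objects[src]
    objects[dst] = {
        "hashmap": list(entry["hashmap"]),
        "meta": dict(entry["meta"]),
    }


def move_object(objects, src, dst):
    """Move is a copy followed by a delete of the source path."""
    copy_object(objects, src, dst)
    del objects[src]
```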

Updates to an existing object, which are not offered by OpenStack, are
implemented by issuing an HTTP ``POST`` request including the
offset and the length of the data. The API front end requests
the hashmap of the existing object from the back end. Depending on the
offset of the update (whether it falls within block boundaries or not)
the front end will ask the back end to update or create new blocks. At
the end, the front end will save the updated hashmap. It is also
possible to pass a parameter to HTTP ``POST`` to specify that the
data will come from another object, instead of being uploaded by the
client.
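The block arithmetic behind an update can be sketched as follows. This is an illustration under assumptions, not the Pithos+ implementation: blocks live in a dictionary keyed by SHA-1, the update must start at or before the current end of the object (no holes), and ``update_object`` is a hypothetical name. Only the blocks touched by the update are rewritten; the rest of the hashmap is reused unchanged.

```python
import hashlib


def update_object(blocks, hashmap, size, block_size, offset, data):
    """Write `data` at `offset`; return the new hashmap and object size.

    `blocks` maps block hash -> bytes and gains any newly created blocks.
    """
    if not data or offset > size:
        raise ValueError("update must be non-empty and start within the object")

    new_map = list(hashmap)
    new_size = max(size, offset + len(data))
    first = offset // block_size
    last = (offset + len(data) - 1) // block_size

    for index in range(first, last + 1):
        block_start = index * block_size
        old = blocks[new_map[index]] if index < len(new_map) else b""
        # Grow the block to its new length before patching it.
        width = min(block_size, new_size - block_start)
        buf = bytearray(old.ljust(width, b"\0"))
        lo = max(offset, block_start)
        hi = min(offset + len(data), block_start + block_size)
        buf[lo - block_start:hi - block_start] = data[lo - offset:hi - offset]

        digest = hashlib.sha1(bytes(buf)).hexdigest()
        blocks[digest] = bytes(buf)
        if index < len(new_map):
            new_map[index] = digest      # existing block replaced
        else:
            new_map.append(digest)       # object grew by one block
    return new_map, new_size
```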

Pithos+ Back End Nodes
^^^^^^^^^^^^^^^^^^^^^^

Pithos+ organizes entities in a tree hierarchy, with one tree node per
path entry (see Figure). Nodes can be accounts,
containers, and objects. A user may have multiple
accounts, each account may have multiple containers, and each
container may have multiple objects. An object may have multiple
versions, and each version of an object has properties (a set of fixed
metadata, like size and mtime) and arbitrary metadata.

.. image:: images/pithos-backend-nodes.png

The tree hierarchy has up to three levels, since, following the
OpenStack API, everything is stored as an object in a container.
The notion of folders or directories is supported through conventions
that simulate pseudo-hierarchical folders. In particular, object names
that contain the forward slash character and have an accompanying marker
object with a ``Content-Type: application/directory`` as part of
their metadata can be treated as directories by Pithos+ clients. Each
node corresponds to a unique path, and we keep its parent in the
account/container/object hierarchy (that is, all objects have a
container as their parent).
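A client-side view of the directory convention might look like the sketch below. It assumes a container listing represented as a dictionary from object name to metadata; the helper names are hypothetical and the listing logic is only one way a client could present pseudo-folders.

```python
DIRECTORY_TYPE = "application/directory"


def is_directory(objects, name):
    """True if `name` is a marker object with the directory content type."""
    return objects.get(name, {}).get("Content-Type") == DIRECTORY_TYPE


def list_directory(objects, directory):
    """Names directly under `directory`, using '/' as the path delimiter."""
    prefix = directory.rstrip("/") + "/"
    children = set()
    for name in objects:
        if name.startswith(prefix):
            rest = name[len(prefix):]
            if rest:
                # Keep only the first path component below the directory.
                children.add(rest.split("/", 1)[0])
    return sorted(children)
```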

Pithos+ Back End Versions
^^^^^^^^^^^^^^^^^^^^^^^^^

For each object version we keep the root Merkle hash of the object it
refers to, the size of the object, the last modification time and the
user that modified the file, and its cluster. A version belongs
to one of the following three clusters (see Figure):

* normal, which are the current versions
* history, which contain the previous versions of an object
* deleted, which contain objects that have been deleted

.. image:: images/pithos-backend-versions.png

This versioning allows Pithos+ to offer its users time-based
listings of the contents of their accounts. In effect, this also allows
them to take their containers back in time. This is implemented
conceptually by taking a vertical line in the Figure and
presenting to the user the state on the left side of the line.
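The "vertical line" can be sketched as a query over version records. The model here is an assumption made for illustration: each version carries a path, a serial number, a modification time and its cluster, and deleting an object is recorded as a version in the deleted cluster. ``Version`` and ``listing_at`` are hypothetical names, not the back end schema.

```python
from collections import namedtuple

NORMAL, HISTORY, DELETED = "normal", "history", "deleted"

Version = namedtuple("Version", "path serial mtime cluster")


def listing_at(versions, when):
    """Paths visible at time `when`, mapped to the version current then."""
    latest = {}
    for v in versions:
        if v.mtime <= when:
            seen = latest.get(v.path)
            if seen is None or (v.mtime, v.serial) > (seen.mtime, seen.serial):
                latest[v.path] = v
    # A path whose most recent version marks a deletion is not listed.
    return {p: v for p, v in latest.items() if v.cluster != DELETED}
```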

Pithos+ Back End Permissions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pithos+ recognizes read and write permissions, which can be granted to
individual users or groups of users. Groups are collections of users
created at the account level by users themselves, and are flat - a
group cannot contain or reference another group. Ownership of a file
cannot be delegated.

Pithos+ also recognizes a "public" permission, which means that the
object is readable by all. When an object is made public, it is
assigned a URL that can be used to access the object from
outside Pithos+ even by non-Pithos+ users.

Permissions can be assigned to objects, which may be actual files, or
directories. When listing objects, the back end uses the permissions as
filters for what to display, so that users will see only objects to
which they have access. Depending on the type of the object, the
filter may be exact (plain object), or a prefix (like ``path/*`` for
a directory). When accessing objects, the same rules are used to
decide whether to allow the user to read or modify the object or
directory. If no permissions apply to a specific object, the back end
searches for permissions on the closest directory sharing a common
prefix with the object.
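The lookup rule in the last paragraph can be sketched as a walk up the path. The data layout is an assumption for illustration: ``permissions`` maps a path to sets of users allowed to read and write, and ``directories`` is the set of paths that are directory markers. Group expansion and the public flag are left out, and the function names are hypothetical.

```python
def effective_permissions(permissions, directories, path):
    """Return the permission entry applying to `path`, or None."""
    if path in permissions:
        return permissions[path]          # exact match on the object
    parts = path.split("/")
    # Walk up from the closest enclosing directory towards the top.
    for end in range(len(parts) - 1, 0, -1):
        parent = "/".join(parts[:end])
        if parent in directories and parent in permissions:
            return permissions[parent]
    return None


def can_read(permissions, directories, path, user):
    """True if the applicable entry grants `user` read access."""
    entry = effective_permissions(permissions, directories, path)
    return entry is not None and user in entry.get("read", ())
```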

Related Work
------------

Commercial cloud providers have been offering online storage for quite
some time, but the code is not published and we do not know the
details of their implementation. Rackspace has used the OpenStack
Object Storage in its Cloud Files product. Swift is an open source
implementation of the OpenStack Object Storage API. As we have
pointed out, our implementation maintains compatibility with
OpenStack, while offering additional capabilities.

Discussion
----------

Pithos+ is implemented in Python as a Django application. We use SQLAlchemy
as a database abstraction layer. It is currently about
17,000 lines of code, and it has taken about 50 person months of
development effort. This development was done from scratch, with no
reuse of the existing Pithos code. That service was written in the
J2EE framework. We decided to move from J2EE to Python for
two reasons: first, J2EE proved to be overkill for the original
Pithos service in its years of operation. Secondly, Python was
strongly favored by the GRNET operations team, who are the people
taking responsibility for running the service - so their voice is
heard.

Apart from the service implementation, which we have been describing
here, we have parallel development lines for native client tools on
different operating systems (MS-Windows, Mac OS X, Android, and iOS).
The desktop clients allow synchronization with local directories, a
feature that existing users of Pithos have been asking for, probably
influenced by services like Dropbox. These clients are offered in
parallel to the standard Pithos+ interface, which is a web application
built on top of the API front end - we treat our own web
application as just another client that has to go through the API
front end, without granting it access to the back end directly.

We are carrying the idea of our own services being clients to Pithos+
a step further, with new projects we have in our pipeline, in which a
digital repository service will be built on top of Pithos+. It will
again use the API front end, so that repository users will have
all Pithos+ capabilities, and on top of them we will build additional
functionality such as full text search, Dublin Core metadata storage
and querying, streaming, and so on.

At the time of this writing (March 2012) Pithos+ is in alpha,
available to users by invitation. We will extend our user base as we
move to beta in the coming months, and to our full set of users in the
second half of 2012. We are eager to see how our ideas fare as we
scale up, and we welcome any comments and suggestions.

Acknowledgments
---------------

Pithos+ is financially supported by Grant 296114, "Advanced Computing
Services for the Research and Academic Community", of the Greek
National Strategic Reference Framework.

Availability
------------

The Pithos+ code is available under a BSD 2-clause license from:
https://code.grnet.gr/projects/pithos/repository

The code can also be accessed from its source repository:
https://code.grnet.gr/git/pithos/

More information and documentation is available at:
http://docs.dev.grnet.gr/pithos/latest/index.html