Revision e9285524 docs/source/devguide.rst

b/docs/source/devguide.rst
6 6

  
7 7
Pithos is a storage service implemented by GRNET (http://www.grnet.gr). Data is stored as objects, organized in containers, belonging to an account. This hierarchy of storage layers has been inspired by the OpenStack Object Storage (OOS) API and the similar CloudFiles API by Rackspace. The Pithos API follows the OOS API as closely as possible. One of the design requirements has been to be able to use Pithos with clients built for OOS, without changes.
8 8

  
9
However, to be able to take full advantage of the Pithos infrastructure, client software should be aware of the extensions that differentiate Pithos from OOS. Pithos objects can be updated, or appended to. They can also be versioned, meaning that the server will track changes, assign version numbers and allow reading previous instances.
10

  
11
The storage backend of Pithos is block oriented, which allows for efficient, deduplicated data placement. The block structure of objects is exposed at the API layer, in order to encourage external software to implement advanced data management operations.
12

  
9 13
This document's goals are:
10 14

  
11 15
* Define the Pithos ReST API that allows the storage and retrieval of data and metadata via HTTP calls
......
21 25
=========================  ================================
22 26
Revision                   Description
23 27
=========================  ================================
24
0.2 (May 25, 2011)         Add object meta listing and filtering in containers.
28
0.2 (May 29, 2011)         Add object meta listing and filtering in containers.
29
\                          Support for partial object updates through POST.
30
\                          Expose object hashmaps through GET.
25 31
\                          Support for multi-range object GET requests.
26 32
0.1 (May 17, 2011)         Initial release. Based on OpenStack Object Storage Developer Guide API v1 (Apr. 15, 2011).
27 33
=========================  ================================
......
38 44

  
39 45
All requests must include an ``X-Auth-Token``, except for those that refer to publicly available files (**TBD**). The process of obtaining the token is still to be determined (**TBD**).
40 46

  
41
The allowable request operations and corresponding return codes per level are presented in the remainder of this chapter. Common to all requests are the following return codes.
47
The allowable request operations and respective return codes per level are presented in the remainder of this chapter. Common to all requests are the following return codes.
42 48

  
43 49
=========================  ================================
44 50
Return Code                Description
......
253 259
Name                 Description
254 260
===================  ======================================
255 261
name                 The name of the object
256
hash                 The MD5 hash of the object
262
hash                 The ETag of the object
257 263
bytes                The size of the object
258 264
content_type         The MIME content type of the object
259 265
content_encoding     The encoding of the object (optional)
......
359 365
==========================  ===============================
360 366
Reply Header Name           Value
361 367
==========================  ===============================
362
ETag                        The MD5 hash of the object
368
ETag                        The ETag of the object
363 369
Content-Length              The size of the object
364 370
Content-Type                The MIME content type of the object
365 371
Last-Modified               The last object modification date
......
391 397
If-Unmodified-Since   Retrieve if object has not changed since provided timestamp
392 398
====================  ================================
393 399

  
394
The reply is the object's data (or part of it). Object headers (as in a ``HEAD`` request) will also be included.
400
|
401

  
402
======================  ===================================
403
Request Parameter Name  Value
404
======================  ===================================
405
format                  Optional extended reply type (can be ``json`` or ``xml``)
406
======================  ===================================
407

  
408
The reply is the object's data (or part of it), except if a hashmap is requested with the ``format`` parameter. Object headers (as in a ``HEAD`` request) are always included.
409

  
410
Hashmaps expose the underlying storage format of the object:
411

  
412
* Blocksize of 4MB.
413
* Blocks stored indexed by SHA256 hash.
414
* Hash is computed after trimming trailing null bytes.
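
The three properties above can be sketched in a few lines of Python. This is an illustration only, not the server's implementation: the function name ``block_hashes`` is hypothetical, and hex encoding of the digests is an assumption.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB block size, as listed above


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA256 hex digest per block of ``data`` (bytes).

    Hypothetical helper: the hex encoding is an assumption.
    """
    hashes = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Trailing null bytes are trimmed before the hash is computed.
        trimmed = block.rstrip(b"\x00")
        hashes.append(hashlib.sha256(trimmed).hexdigest())
    return hashes
```

A client could compare such a locally computed list against the ``hashes`` field of a hashmap reply to find the blocks that differ.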
415

  
416
Example ``format=json`` reply:
417

  
418
::
419

  
420
  {"hashes": ["7295c41da03d7f916440b98e32c4a2a39351546c", ...], "bytes": 24223726}
421

  
422
Example ``format=xml`` reply:
423

  
424
::
425

  
426
  <?xml version="1.0" encoding="UTF-8"?>
427
  <object name="file" bytes="24223726">
428
    <hash>7295c41da03d7f916440b98e32c4a2a39351546c</hash>
429
    <hash>...</hash>
430
  </object>
395 431

  
396 432
The ``Range`` header may include multiple ranges, as outlined in RFC2616. Then the ``Content-Type`` of the reply will be ``multipart/byteranges`` and each part will include a ``Content-Range`` header.
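
As a minimal sketch, a multi-range ``Range`` header value in the RFC2616 ``bytes=first-last,first-last`` form could be assembled as follows. The helper name ``build_range_header`` is hypothetical and not part of the API.

```python
def build_range_header(ranges):
    """Build a Range header value from (first, last) byte positions.

    ``last`` may be None for an open-ended range such as ``500-``.
    Hypothetical helper, shown for illustration only.
    """
    specs = []
    for first, last in ranges:
        specs.append("%d-%s" % (first, "" if last is None else last))
    return "bytes=" + ",".join(specs)
```

For example, ``build_range_header([(0, 99), (200, 299)])`` produces ``bytes=0-99,200-299``, which requests two ranges in a single ``GET``.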
397 433

  
398 434
==========================  ===============================
399 435
Reply Header Name           Value
400 436
==========================  ===============================
401
ETag                        The MD5 hash of the object
437
ETag                        The ETag of the object
402 438
Content-Length              The size of the data returned
403 439
Content-Type                The MIME content type of the object
404 440
Content-Range               The range of data included (only on a single range request)
......
494 530
====================  ================================
495 531
Request Header Name   Value
496 532
====================  ================================
533
Content-Length        The size of the data written (optional, to update)
534
Content-Type          The MIME content type of the object (optional, to update)
535
Content-Range         The range of data supplied (optional, to update)
536
Transfer-Encoding     Set to ``chunked`` to specify incremental uploading (if used, ``Content-Length`` is ignored)
497 537
Content-Encoding      The encoding of the object (optional)
498 538
Content-Disposition   The presentation style of the object (optional)
499 539
X-Object-Manifest     Large object support (optional)
500 540
X-Object-Meta-*       Optional user defined metadata
501 541
====================  ================================
502 542

  
503
No reply content/headers.
543
The ``Content-Encoding``, ``Content-Disposition``, ``X-Object-Manifest`` and ``X-Object-Meta-*`` headers are considered to be user defined metadata. The update operation will overwrite all previous values and remove any keys not supplied.
544

  
545
To update an object:
546

  
547
* Supply ``Content-Length`` (except if using chunked transfers), ``Content-Type`` and ``Content-Range`` headers.
548
* Set ``Content-Type`` to ``application/octet-stream``.
549
* Set ``Content-Range`` as specified in RFC2616, with the following differences:
550

  
551
  * Client software MAY omit ``last-byte-pos`` if the length of the range being transferred is unknown or difficult to determine.
552
  * Client software SHOULD NOT specify the ``instance-length`` (use a ``*``), unless there is a reason for performing a size check at the server.
553
* If the supplied ``Content-Range`` has ``byte-range-resp-spec = *``, the data will be appended to the object.
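
The header rules above can be sketched as a small Python helper. The name ``update_headers`` and its parameters are hypothetical, and the serialization of an omitted ``last-byte-pos`` as ``bytes 100-/*`` is an assumption about how the relaxed syntax is written.

```python
def update_headers(length=None, first=None, last=None, append=False):
    """Build request headers for a partial object update through POST.

    Hypothetical helper. ``length=None`` selects a chunked transfer,
    in which case no Content-Length is sent.
    """
    headers = {"Content-Type": "application/octet-stream"}
    if length is None:
        headers["Transfer-Encoding"] = "chunked"
    else:
        headers["Content-Length"] = str(length)
    if append:
        # byte-range-resp-spec = * means the data is appended.
        spec = "*"
    else:
        if first is None:
            raise ValueError("first is required unless append=True")
        # last-byte-pos may be omitted when the length is unknown
        # (assumed serialization: "first-").
        spec = "%d-%s" % (first, "" if last is None else last)
    # instance-length is left as "*" so no size check is requested.
    headers["Content-Range"] = "bytes %s/*" % spec
    return headers
```

For instance, ``update_headers(length=5, append=True)`` yields a ``Content-Range`` of ``bytes */*``, while ``update_headers(first=100)`` yields ``bytes 100-/*`` with ``Transfer-Encoding: chunked``.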
554

  
555
A data update will trigger an ETag change. The new ETag will not correspond to the object's MD5 sum (**TBD**) and will be included in reply headers.
504 556

  
505
The allowed headers are considered to be user defined metadata. The update operation will overwrite all previous values and remove any keys not supplied.
557
No reply content. No reply headers if only metadata is updated.
558

  
559
==========================  ===============================
560
Reply Header Name           Value
561
==========================  ===============================
562
ETag                        The new ETag of the object (data updated)
563
==========================  ===============================
564

  
565
|
506 566

  
507 567
===========================  ==============================
508 568
Return Code                  Description
509 569
===========================  ==============================
510
202 (Accepted)               The request has been accepted
570
202 (Accepted)               The request has been accepted (not a data update)
571
204 (No Content)             The request succeeded (data updated)
572
416 (Range Not Satisfiable)  The supplied range is out of limits or has an invalid size
511 573
===========================  ==============================
512 574

  
513 575

  
......
538 600
* Container/object lists include all associated metadata if the reply is of type json/xml. Some names are kept the same as their OOS API equivalents for compatibility.
539 601
* Object metadata allowed, in addition to ``X-Object-Meta-*``: ``Content-Encoding``, ``Content-Disposition``, ``X-Object-Manifest``. These are all replaced with every update operation.
540 602
* Multi-range object GET support as outlined in RFC2616.
603
* Object hashmap retrieval through GET and the ``format`` parameter.
604
* Partial object updates through POST, using the ``Content-Length``, ``Content-Type``, ``Content-Range`` and ``Transfer-Encoding`` headers.
541 605
* Object ``MOVE`` support.
542 606

  
543 607
Clarifications/suggestions:
