Revision 5bd53e3b

b/docs/source/backends.rst
6 6
   
7 7
   base_backend
8 8
   simple_backend
9
   hashfiler_backend
b/docs/source/design.rst
1

  
2
********************
3
Pithos Server Design
4
********************
5

  
6
Up-to-date: 2011-06-14
7

  
8
Key Target Features
9
===================
10

  
11
OpenStack Object Storage API
12
----------------------------
13

  
14
Originally derived from Rackspace's Cloudfiles API,
15
it features, like Amazon S3, a two-level real hierarchy
16
with containers and objects per account, while objects
17
within each container are hosted in a flat namespace,
18
albeit with options for 'virtual' hierarchy listings.
19
Accounts, containers and objects may host user-defined metadata.
20

  
21
The use of a well-known API, hopefully, will provide immediate
22
compatibility with existing clients and familiarity
23
to developers. Adoption and familiarity will also make
24
custom extensions more accessible to developers.
25

  
26
However, Pithos differs as a service, mainly because it (also)
27
targets end users. This inevitably forces extensions to the API,
28
but the plan is to maintain OOS compatibility regardless.
29

  
30
Partial Transfers
31
-----------------
32

  
33
An important feature, especially for home users, is the ability
34
to continue transfers after interruption by choice or failure,
35
without significant loss of the partial transfer completed
36
up to the point of interruption.
37

  
38
The Manifest mechanism in the CloudFiles API and
39
the Multipart Upload mechanism in Amazon S3,
40
both provide a means for partial transfers.
41
Manifests are not (yet?) in the OpenStack specification
42
but are considered for support in Pithos.
43

  
44
The Pithos Server approach is similar, allowing appending
45
(actually, any modification) to existing objects via
46
HTTP 1.1 chunked transfers. Chunks reach stable storage
47
before the whole transfer is complete, and restarting
48
a transfer from the point it was interrupted is possible
49
by querying the status of the target object (e.g. its size).
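
A minimal sketch of how a client might resume an interrupted upload
(the header names and the use of ``httplib`` here are illustrative
assumptions, not the specified API)::

    import httplib

    def resume_upload(host, path, source):
        # Ask the server how much of the object reached stable storage.
        conn = httplib.HTTPConnection(host)
        conn.request('HEAD', path)
        committed = int(conn.getresponse().getheader('content-length', '0'))

        # Skip what is already committed and send the rest.
        source.seek(committed)
        conn.request('PUT', path, source,
                     {'Content-Range': 'bytes %d-*/*' % committed})
        return conn.getresponse().status
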
50

  
51
Atomic Updates
52
--------------
53

  
54
However, when updating existing objects, transfers do not
55
happen instantaneously. To support partial transfers,
56
the content must be committed to storage before the transfer
57
is complete. This creates a hazard:
58
The partially committed content may temporarily leave the object
59
in a visible, inconsistent state after the transfer has started
60
and before the transfer has completed.
61
Furthermore, failed transfers that are never retried result
62
in permanent corruption and data loss.
63

  
64
The Manifest and Multipart Upload mechanisms both provide atomicity,
65
but they only specify the creation of a new object and not
66
the updating of an existing one.
67

  
68
The Pithos Server approach is similar, but more flexible.
69
There are two types of updates to an object:
70
those with an HTTP body as a source, which are not atomic,
71
and those with another object as a source, which are atomic.
72
This way, clients may choose to perform atomic updates in
73
two stages, as with Manifests and Multipart Uploads,
74
or choose to make a hazardous update directly.
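
As a sketch, a client wanting atomicity could stage the new content in a
scratch object and then issue the atomic object-to-object update (the
scratch-name convention here is an assumption; ``copy_object`` and
``delete_object`` are the backend calls shown later in this revision)::

    def atomic_update(backend, user, account, container, name, put_body):
        tmp = name + '.upload'                  # hypothetical scratch path
        put_body(account, container, tmp)       # stage 1: hazardous upload
        backend.copy_object(user, account, container, tmp,
                            container, name)    # stage 2: atomic switch
        backend.delete_object(user, account, container, tmp)
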
75

  
76
Transfer bandwidth efficiency
77
-----------------------------
78

  
79
Another issue is the volume of data that needs to be transferred
80
when updating objects.
81
If multiple clients update the object regularly, none of
82
them can simply send a patch, because the state of the file
83
on the server is unknown.
84
A mechanism is needed that can compute delta patches between
85
two versions. The standard candidate is librsync.
86
However, content-hashing provides an alternative,
87
discussed below.
88

  
89
Cheap versions, snapshots and clones
90
------------------------------------
91

  
92
The Pithos Server supports cheap versions, snapshots and clones
93
for its hosted objects. First, let us clarify the terms.
94

  
95
Versions
96
    are different objects with the same path,
97
    identified by an extra version identifier.
98
    Version identifiers are ordered by the time of version creation.
99

  
100
Snapshots
101
    are immutable copies of objects at a specific point in time,
102
    archived for future reference.
103

  
104
Clones
105
    are mutable copies of snapshots or other objects,
106
    and have their own private history after their creation.
107

  
108
Note that snapshots and clones may have a different path than
109
their source object, while versions always have the same path.
110

  
111
It is not yet decided if snapshots will be explicit in the API,
112
by providing a read-only object type.
113
It is also not yet decided if clones will be explicit in the API,
114
as objects that keep a 'source' reference back to their source object.
115

  
116
Effectively, the only prerequisite for cheap versions, snapshots and
117
clones is cheap copying of objects. This Pithos Server feature is
118
designed in concert with content-hashing, explained below.
119

  
120
Pluggable Storage Backend
121
-------------------------
122

  
123
The Pithos service is destined to run on distributed block storage servers.
124
Therefore the Pithos Server design assumes a pluggable storage backend
125
backed by a reliable, redundant object storage service.
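
The selection is already wired through a Django setting; a minimal sketch
of the dispatch (the exact option format is an assumption)::

    from django.conf import settings

    from simple import SimpleBackend
    from hashfilerback import HashFilerBack

    # hypothetical: BACKEND names the class and its constructor arguments
    name, args = getattr(settings, 'BACKEND', ('SimpleBackend', ('./data',)))
    backend = {'SimpleBackend': SimpleBackend,
               'HashFilerBack': HashFilerBack}[name](*args)
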
126

  
127

  
128
Content-Hashing: The Case for Virtualizing Files
129
================================================
130

  
131
All of the key target features (except the OOS API) can be well serviced
132
by a content-hashing/blocking approach to storage.
133

  
134
Raw storage is becoming a basic commodity over the internet,
135
in many forms, and especially as a 'cloud' service.
136
It makes sense to virtualize files and separate
137
the content storage and serving layer
138
from the semantically rich and application-defined file serving layer.
139

  
140
Content hashing with a consistent blocking (chunking) scheme
141
inherently provides both universal content identification
142
and logical separation of file- and content-serving.
143
Additionally, it offers many benefits and
144
its costs have minimal impact.
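
For concreteness, mapping a file to its block hashes takes a few lines
(a sketch using the suggested sha256 hash and 4MB block size)::

    from hashlib import sha256

    BLOCKSIZE = 4 * 1024 * 1024

    def hashmap(openfile):
        """Return the list of block hashes that 'virtualizes' the file."""
        hashes = []
        while 1:
            block = openfile.read(BLOCKSIZE)
            if not block:
                break
            hashes.append(sha256(block).digest())
        return hashes
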
145

  
146
We enumerate the benefits and comment on each:
147

  
148
* Universal identification
149
    Barring hash collisions, every file and every block-aligned
150
    part of a file can be uniquely identified by its hash digest.
151

  
152
* Get data from anyone who might serve it, easy sharing
153
    Because of universal identification,
154
    it does not matter where you get the actual contents from.
155
    Just like it (almost) does not matter where you get
156
    your internet feed from.
157

  
158
    Pithos will seek to exploit this by deploying a
159
    separate layer for serving blocks.
160

  
161
* Universal storage provisioning, efficient content serving
162
    Because content-hashing separates the semantically and functionally rich
163
    file-serving layer from the content serving layer,
164
    the actual storage service has a simple get/put interface
165
    to read-only objects.
166

  
167
    This enables easy deployment of diverse systems as storage backends.
168
    It also enables the content-serving system to be separate, simpler
169
    and more performant than the file-serving system.
170

  
171
* Cheap file copies
172
    Files become maps of hashes, and are "virtualized", in the sense
173
    that they only contain pointers to the actual content.
174
    The maps are much smaller (depending on the block size)
175
    and copying them incurs little overhead compared to copying
176
    the data.
177

  
178
* Cheap updates
179
    A small change in a large file will result only in small changes in
180
    the hashes map, therefore only the relevant blocks need to be uploaded
181
    and updated within the map (see the sketch after this list).
182

  
183
* Data checksums
184
    Data reliability is no longer an ignorable issue and we need to
185
    checksum our data anyway. The content hash is precisely that checksum.
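
As referenced in the list above, a sketch of the update rule: comparing the
old and new hash maps yields the only blocks that must travel (in the real
server, ``block_ping`` below performs the authoritative existence check)::

    def blocks_to_upload(old_map, new_map):
        """Indices of new blocks that the previous version cannot supply."""
        known = set(old_map)
        return [i for i, h in enumerate(new_map) if h not in known]

Copying a file, likewise, amounts to copying its small map of hashes.
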
186

  
187
The only drawback we have registered is the overhead of managing
188
blocks that are no longer used.
189
However, our (somewhat) educated guess is that the unused blocks will
190
be only a small percentage of the total.
191
In any case, our current consensus is to clean them up
192
in maintenance operations.
193

  
194

  
195
Specific Issues
196
===============
197

  
198
Pseudo- vs real hierarchies
199
---------------------------
200

  
201
The main difference between real and pseudo-hierarchies is in the
202
way the namespace is built.
203
In real hierarchies, like most disk filesystems,
204
the namespace is built by recursively containing directories
205
inside a root directory.
206
In pseudo-hierarchies, the namespace is flat, and hierarchy is defined
207
by the lexicographical ordering of the path of each file.
208

  
209
There are two important practical consequences:
210

  
211
- Pseudo-hierarchies can have less overhead and perform faster lookups,
212
  but cannot move files efficiently (see the sketch after this list).
213

  
214
- Real hierarchies can move files instantly,
215
  but every lookup must iterate through all parents of the target file.
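
As mentioned in the list above, a sketch of how one 'directory level' is
derived from a sorted flat namespace (the real implementation pushes this
into indexed database queries)::

    def list_level(paths, prefix, delimiter='/'):
        """Objects and 'virtual' sub-directories directly under prefix."""
        entries = []
        for p in sorted(paths):
            if not p.startswith(prefix):
                continue
            rest = p[len(prefix):]
            cut = rest.find(delimiter)
            entry = rest if cut < 0 else rest[:cut + 1]
            if not entries or entries[-1] != entry:
                entries.append(entry)
        return entries
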
216

  
217
Since Pithos Server, through the OpenStack Object Storage API,
218
has adopted a pseudo-hierarchy for the containers,
219
it is important not to contradict this design choice
220
with other incompatible choices.
221

  
222
File Properties, Attributes, Features
223
-------------------------------------
224

  
225
For the purpose of the design of Pithos Server Internals,
226
we define three types of metadata for files.
227

  
228
Properties
229
    are intrinsic qualities of files, such as their size or content-type.
230
    Properties are interpreted and handled specially by the Server code,
231
    and changing them means fundamentally altering the object.
232

  
233
Attributes
234
    are key-value pairs attached to each file by the user,
235
    and are not intrinsic qualities of files like properties.
236
    The System may interpret some special keys as user input,
237
    but it never considers attributes to be a fundamental
238
    and trusted quality of a file.
239
    In the current design,
240
    file versions do not share one attribute set, but each has its own.
241

  
242
X-Features
243
    are like attributes, but with one key difference and one key limitation.
244
    Unlike attributes, features are attached to paths and not versions.
245
    Therefore, each file "inherits" the features that are defined
246
    by its path, or some prefix of its path.
247
    The X stands for *exclusive*, which is the limitation of x-features.
248
    There can never be two X-Feature sets on two overlapping paths.
249
    Therefore, in order to set a feature set on a path,
250
    it is necessary to purge features from overlapping paths,
251
    or the operation fails.
252
    This limitation greatly reduces the practical overhead for the Server
253
    to query features and feature inheritance for arbitrary paths.
254
    However, it does not cripple their usefulness nearly as much.
255
    One may even argue that it simplifies things for the users.
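
A sketch of the exclusivity rule itself (the server version is a single
indexed lookup, as in ``xfeature_bestow`` later in this revision)::

    def may_bestow(existing, path):
        """Allow a new feature set only if no set sits on an overlapping path."""
        for p in existing:
            if p != path and (p.startswith(path) or path.startswith(p)):
                return False
        return True
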
256

  
257
Sharing
258
-------
259

  
260
    The basic idea is that sharing is one read-list and one write-list
261
    as x-features of a path, and that users and groups of users may
262
    be specified in each list, granting the corresponding access rights.
263
    Currently, the 'write' permission implies the 'read' one.
264
    More permission types are possible with the addition of relevant lists.
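
    A sketch of what a grant might look like through the backend interface
    (the 'account:group' naming is an assumption based on the group code in
    this revision)::

        permissions = {'read':  ['alice', 'account:friends'],
                       'write': ['bob']}
        backend.update_object_permissions(user, account, container,
                                          name, permissions)
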
265

  
b/docs/source/hashfiler.rst
1
HashFiler
2
===========
3

  
4
.. automodule:: pithos.lib.hashfiler
5
    :members:
6
    :show-inheritance:
7
    :undoc-members:
8

  
9

  
10
Blocker
11
-------
12
.. automodule:: pithos.lib.hashfiler.blocker
13
    :members:
14
    :show-inheritance:
15
    :undoc-members:
16

  
17

  
18
Mapper
19
------
20
.. automodule:: pithos.lib.hashfiler.mapper
21
    :members:
22
    :show-inheritance:
23
    :undoc-members:
24

  
25

  
26
AccessController
27
----------------
28
.. automodule:: pithos.lib.hashfiler.access_controller
29
    :members:
30
    :show-inheritance:
31
    :undoc-members:
32

  
33

  
34
Noder
35
-----
36
.. automodule:: pithos.lib.hashfiler.noder
37
    :members:
38
    :show-inheritance:
39
    :undoc-members:
40

  
41

  
42
Filer
43
-----
44
.. automodule:: pithos.lib.hashfiler.filer
45
    :members:
46
    :show-inheritance:
47
    :undoc-members:
48

  
49

  
b/docs/source/hashfiler_backend.rst
1
HashFilerBack
2
=============
3

  
4
.. automodule:: pithos.backends.hashfilerback
5
    :members:
6
    :show-inheritance:
7
    :undoc-members:
8

  
b/docs/source/index.rst
6 6
.. toctree::
7 7
   :maxdepth: 2
8 8
   
9
   design
9 10
   devguide
10 11
   adminguide
11 12
   backends
13
   hashfiler
12 14

  
13 15

  
14 16
Indices and tables
b/pithos/backends/__init__.py
34 34
from django.conf import settings
35 35

  
36 36
from simple import SimpleBackend
37
from hashfilerback import HashFilerBack 
37 38

  
38 39
backend = None
39 40
options = getattr(settings, 'BACKEND', None)
b/pithos/backends/hashfilerback.py
1
# Copyright 2011 GRNET S.A. All rights reserved.
2
# 
3
# Redistribution and use in source and binary forms, with or
4
# without modification, are permitted provided that the following
5
# conditions are met:
6
# 
7
#   1. Redistributions of source code must retain the above
8
#      copyright notice, this list of conditions and the following
9
#      disclaimer.
10
# 
11
#   2. Redistributions in binary form must reproduce the above
12
#      copyright notice, this list of conditions and the following
13
#      disclaimer in the documentation and/or other materials
14
#      provided with the distribution.
15
# 
16
# THIS SOFTWARE IS PROVIDED BY GRNET S.A. ``AS IS'' AND ANY EXPRESS
17
# OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
18
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
19
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GRNET S.A OR
20
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
21
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
22
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
23
# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
24
# AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
25
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
26
# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
27
# POSSIBILITY OF SUCH DAMAGE.
28
# 
29
# The views and conclusions contained in the software and
30
# documentation are those of the authors and should not be
31
# interpreted as representing official policies, either expressed
32
# or implied, of GRNET S.A.
33

  
34
from pithos.lib.hashfiler import (Filer, ROOTNODE,
35
                                  SERIAL, PARENT, PATH, SIZE, POPULATION,
36
                                  POPSIZE, SOURCE, MTIME, CLUSTER)
37
from base import NotAllowedError
38
from binascii import hexlify, unhexlify
39
from os import makedirs, mkdir
40
from os.path import exists, isdir, split
41
from itertools import chain
42
from collections import defaultdict
43
from heapq import merge
44

  
45
debug = 0
46

  
47
READ  = 0
48
WRITE = 1
49

  
50
MAIN_CLUSTER  = 0
51
TRASH_CLUSTER = 1
52

  
53
def backend_method(func=None, name=None, autocommit=1):
54
    if func is None:
55
        def fn(func):
56
            return backend_method(func, name=name, autocommit=autocommit)
57
        return fn
58

  
59
    name = func.__name__ if name is None else name
60
    thefunc = None
61
    if debug:
62
        def fn(self, *args, **kw):
63
            print "%s %s %s" % (name, args, kw)
64
            return func(self, *args, **kw)
65
        thefunc = fn
66
    else:
67
        thefunc = func
68
    #XXX: Can't set __doc__ in methods?!
69
    #
70
    #doc = func.__doc__
71
    #if doc is None:
72
    #    doc = ''
73
    #func.__doc__ = getattr(BaseBackend, name).__doc__ + '\n***\n' + doc
74
    if not autocommit:
75
        return thefunc # keep the debug wrapper even without autocommit
76

  
77
    def method(self, *args, **kw):
78
        with self.hf:
79
            return thefunc(self, *args, **kw)
80

  
81
    return method
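
# Usage sketch (hypothetical method name): a wrapped method runs all its
# Filer calls inside a single `with self.hf:` block, so they presumably
# commit or roll back as one unit:
#
#     @backend_method
#     def frob(self, user, account):
#         return self.hf.file_lookup(ROOTNODE, account)
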
82

  
83

  
84
class HashFilerBack(object):
85
    """Abstract backend class that serves as a reference for actual implementations.
86

  
87
    The purpose of the backend is to provide the necessary functions for handling data
88
    and metadata. It is responsible for the actual storage and retrieval of information.
89

  
90
    Note that the account level is always valid as it is checked from another subsystem.
91

  
92
    The following variables should be available:
93
        'hash_algorithm': Suggested is 'sha256'
94
        'block_size': Suggested is 4MB
95
    """
96

  
97
    def __init__(self, db):
98
        head, tail = split(db)
99
        if not exists(head):
100
            makedirs(head)
101
        if not isdir(head):
102
            raise RuntimeError("Cannot open database at '%s'" % (db,))
103

  
104
        blocksize = 4*1024*1024
105
        hashtype = 'sha256'
106
        root = head + '/root'
107
        if not exists(root):
108
            mkdir(root)
109
        db = head + '/db'
110
        params = {  'filepath': db,
111
                    'mappath' : root,
112
                    'blockpath': root,
113
                    'hashtype': hashtype,
114
                    'blocksize': blocksize  }
115

  
116
        hf = Filer(**params)
117
        self.hf = hf
118
        self.block_size = blocksize
119
        self.hash_algorithm = hashtype
120

  
121
    def quit(self):
122
        self.hf.quit()
123

  
124
    def get_account(self, account, root=ROOTNODE):
125
        #XXX: how are accounts created?
126
        r = self.hf.file_lookup(root, account, create=1)
127
        if r is None:
128
            raise NameError("account '%s' not found" % (account,))
129
        return r[0]
130

  
131
    def get_container(self, serial, name):
132
        r = self.hf.file_lookup(serial, name, create=0)
133
        if r is None:
134
            raise NameError("container '%s' not found" % (name,))
135
        return r[0]
136

  
137
    def get_file(self, serial, path):
138
        r = self.hf.file_lookup(serial, path, create=0)
139
        if r is None:
140
            raise NameError("file '%s' not found" % (path,))
141
        return r
142

  
143
    @staticmethod
144
    def insert_properties(meta, props):
145
        (serial, parent, path, size,
146
         population, popsize, source, mtime, cluster) = props
147
        meta['version'] = serial
148
        meta['name'] = path
149
        meta['bytes'] = size + popsize
150
        meta['count'] = population
151
        meta['created'] = mtime
152
        meta['modified'] = mtime
153
        return meta
154

  
155
    def lookup_file(self, account, container, name,
156
                    create=0, size=0, authorize=0, before=None):
157
        # XXX: account/container lookup cache?
158
        serial = self.get_account(account)
159
        serial = self.get_container(serial, container)
160
        if before is None:
161
            before = float('inf')
162
        props = self.hf.file_lookup(serial, name,
163
                                    create=create, size=size, before=before)
164
        if props is None:
165
            raise NameError("File '%s/%s/%s' does not exist" %
166
                            (account, container, name))
167
        return props
168

  
169
    def can_read(self, user, account, container, path):
170
        path = "%s:%s:%s" % (account, container, path)
171
        check = self.hf.access_check
172
        return check(READ, path, user) or check(WRITE, path, user)
173

  
174
    def can_write(self, user, account, container, path):
175
        path = "%s:%s:%s" % (account, container, path)
176
        check = self.hf.access_check
177
        return check(WRITE, path, user)
178

  
179
    @backend_method
180
    def get_account_meta(self, user, account, until=None):
181
        if user != account:
182
            raise NotAllowedError
183

  
184
        hf = self.hf
185
        before = float("inf") if until is None else until
186
        props = hf.file_lookup(ROOTNODE, account, create=0, before=before)
187
        if props is None:
188
            #XXX: why not return error?
189
            meta = {'name': account,
190
                    'count': 0,
191
                    'bytes': 0}
192
            if until is not None:
193
                meta['until_timestamp'] = 0
194
            return meta
195

  
196
        serial = props[SERIAL]
197
        meta = dict(hf.file_get_attributes(serial))
198
        meta['name'] = account
199
        self.insert_properties(meta, props)
200
        if until is not None:
201
            meta['until_timestamp'] = props[MTIME]
202
        return meta
203

  
204
    @backend_method
205
    def update_account_meta(self, user, account, meta, replace=False):
206
        if user != account:
207
            raise NotAllowedError
208

  
209
        hf = self.hf
210
        serial = self.get_account(account)
211
        if replace:
212
            hf.file_del_attributes(serial)
213

  
214
        hf.file_set_attributes(serial, meta.iteritems())
215

  
216
    @backend_method
217
    def get_account_groups(self, user, account):
218
        if user != account:
219
            raise NotAllowedError
220

  
221
        groups = defaultdict(list)
222
        grouplist = self.hf.group_list("%s:" % user)
223
        for name, member in grouplist:
224
            groups[name].append(member)
225

  
226
        return groups
227

  
228
    @backend_method
229
    def update_account_groups(self, user, account, groups, replace=False):
230
        if user != account:
231
            raise NotAllowedError
232

  
233
        hf = self.hf
234
        group_addmany = hf.group_addmany
235
        for name, members in groups.iteritems():
236
            name = "%s:%s" % (user, name)
237
            if not members:
238
                members = (name,)
239
            group_addmany(name, members)
240

  
241
        if not replace:
242
            return
243

  
244
        groupnames = hf.group_names("%s:" % user)
245
        group_destroy = hf.group_destroy
246
        for name in groupnames:
247
            if name.split(':', 1)[-1] not in groups:
248
                group_destroy(name)
249

  
250
    @backend_method
251
    def delete_account(self, user, account):
252
        if user != account:
253
            raise NotAllowedError
254

  
255
        hf = self.hf
256
        r = hf.file_lookup(ROOTNODE, account)
257
        if r is None or r[SERIAL] is None:
258
            raise NameError
259

  
260
        if r[POPULATION]:
261
            raise IndexError
262

  
263
        hf.file_remove(r[SERIAL])
264

  
265
    @backend_method
266
    def put_container(self, user, account, name, policy=None):
267
        if user != account:
268
            raise NotAllowedError
269

  
270
        parent = self.get_account(account)
271
        hf = self.hf
272
        r = hf.file_lookup(parent, name)
273
        if r is not None:
274
            raise NameError("%s:%s already exists" % (account, name))
275

  
276
        serial, mtime = hf.file_create(parent, name, 0)
277
        return serial, mtime
278

  
279
    @backend_method
280
    def delete_container(self, user, account, name):
281
        if user != account:
282
            raise NotAllowedError
283

  
284
        serial = self.get_account(account)
285
        serial = self.get_container(serial, name)
286
        r = self.hf.file_remove(serial)
287
        if r is None:
288
            raise AssertionError("This should not have happened.")
289
        elif r == 0:
290
            #XXX: if container has only trashed objects, is it empty or not?
291
            msg = "'%s:%s' not empty. will not delete." % (account, name)
292
            raise IndexError(msg)
293

  
294
    @backend_method
295
    def get_container_meta(self, user, account, name, until=None):
296
        if user != account:
297
            raise NotAllowedError
298

  
299
        serial = self.get_account(account)
300
        hf = self.hf
301
        before = float('inf') if until is None else until
302
        props = hf.file_lookup(serial, name, before=before)
303
        if props is None:
304
            raise NameError("%s:%s does not exist" % (account, name))
305
        serial = props[0]
306

  
307
        r = hf.file_get_attributes(serial)
308
        meta = dict(r)
309
        self.insert_properties(meta, props)
310
        if until:
311
            meta['until_timestamp'] = props[MTIME]
312
        return meta
313

  
314
    @backend_method
315
    def update_container_meta(self, user, account, name, meta, replace=False):
316
        if user != account:
317
            raise NotAllowedError
318

  
319
        serial = self.get_account(account)
320
        serial = self.get_container(serial, name)
321
        hf = self.hf
322
        if replace:
323
            hf.file_del_attributes(serial)
324

  
325
        hf.file_set_attributes(serial, meta.iteritems())
326

  
327
    @backend_method
328
    def get_container_policy(self, user, account, container):
329
        return {}
330

  
331
    @backend_method
332
    def update_container_policy(self, user, account,
333
                                container, policy, replace=0):
334
        return
335

  
336
    @backend_method
337
    def list_containers(self, user, account,
338
                        marker=None, limit=10000, until=None):
339
        if not marker:
340
            marker = ''
341
        if limit is None:
342
            limit = -1
343
        serial = self.get_account(account)
344
        r = self.hf.file_list(serial, '', start=marker, limit=limit)
345
        objects, prefixes = r
346
        return [(o[PATH], o[SERIAL]) for o in objects]
347

  
348
    @backend_method
349
    def list_objects(self, user, account, container, prefix='',
350
                     delimiter=None, marker=None, limit=10000,
351
                     virtual=True, keys=[], until=None):
352
        if user != account:
353
            raise NotAllowedError
354

  
355
        if not marker:
356
            #XXX: '' is a valid string, should be None instead
357
            marker = None
358
        if limit is None:
359
            #XXX: limit should be an integer
360
            limit = -1
361

  
362
        if not delimiter:
363
            delimiter = None
364

  
365
        serial = self.get_account(account)
366
        serial = self.get_container(serial, container)
367
        hf = self.hf
368
        filterq = ','.join(keys) if keys else None
369
        r = hf.file_list(serial, prefix=prefix, start=marker,
370
                         versions=0, filterq=filterq,
371
                         delimiter=delimiter, limit=limit)
372
        objects, prefixes = r
373
        # XXX: virtual is not needed since we get them anyway
374
        #if virtual:
375
        #    return [(p, None) for p in prefixes]
376

  
377
        objects = ((o[PATH], o[MTIME]) for o in objects)
378
        prefixes = ((p, None) for p in prefixes) if virtual else ()
379
        return list(merge(objects, prefixes))
380

  
381
    @backend_method
382
    def list_object_meta(self, user, account, container, until=None):
383
        serial = self.get_account(account)
384
        serial = self.get_container(serial, container)
385
        return self.hf.file_list_attributes(serial)
386

  
387
    @backend_method
388
    def get_object_meta(self, user, account, container, name, until=None):
389
        props = self.lookup_file(account, container, name,
390
                                 before=until, authorize=2)
391
        serial = props[0]
392
        meta = dict(self.hf.file_get_attributes(serial))
393
        self.insert_properties(meta, props)
394
        meta['version_timestamp'] = props[MTIME]
395
        meta['modified_by'] = user #XXX: record the actual last modifier
396
        return meta
397

  
398
    @backend_method
399
    def update_object_meta(self, user, account, container,
400
                           name, meta, replace=False):
401
        props = self.lookup_file(account, container, name, authorize=1)
402
        serial = props[0]
403
        hf = self.hf
404
        if replace:
405
            hf.file_del_attributes(serial)
406
        hf.file_set_attributes(serial, meta.iteritems())
407

  
408
    @backend_method
409
    def get_object_permissions(self, user, account, container, name):
410
        path = "%s/%s/%s" % (account, container, name)
411
        perms = defaultdict(list)
412
        for access, ident in self.hf.access_list(path):
413
            perms[access].append(ident)
414
        perms['read'] = perms.pop(READ, [])
415
        perms['write'] = perms.pop(WRITE, [])
416
        return path, perms
417

  
418
    @backend_method
419
    def update_object_permissions(self, user, account, container, name, permissions):
420
        if user != account:
421
            raise NotAllowedError
422

  
423
        hf = self.hf
424
        path = "%s/%s/%s" % (account, container, name)
425
        feature = hf.feature_create()
426
        hf.access_grant(READ, path, members=permissions['read'])
427
        hf.access_grant(WRITE, path, members=permissions['write'])
428
        r = hf.xfeature_bestow(path, feature)
429
        assert(not r)
430

  
431
    @backend_method
432
    def get_object_public(self, user, account, container, name):
433
        """Return the public URL of the object if applicable."""
434
        return None
435

  
436
    @backend_method
437
    def update_object_public(self, user, account, container, name, public):
438
        """Update the public status of the object."""
439
        return
440

  
441
    @backend_method
442
    def get_object_hashmap(self, user, account, container, name, version=None):
443
        hf = self.hf
444
        if version is None:
445
            props = self.lookup_file(account, container, name, authorize=2)
446
        else:
447
            props = hf.file_get_properties(version)
448
        serial = props[SERIAL]
449
        size = props[SIZE]
450
        hashes = [hexlify(h) for h in hf.map_retr(serial)]
451
        return size, hashes
452

  
453
    @backend_method
454
    def update_object_hashmap(self, user, account, container,
455
                              name, size, hashmap, meta=None,
456
                              replace_meta=0, permissions=None):
457
        hf = self.hf
458
        props = self.lookup_file(account, container, name,
459
                                 create=1, size=size, authorize=1)
460
        serial = props[SERIAL]
461
        if size != props[SIZE]:
462
            hf.file_increment(serial, size)
463
        hf.map_stor(serial, (unhexlify(h) for h in hashmap))
464
        if replace_meta:
465
            hf.file_del_attributes(serial)
466
        if meta:
467
            hf.file_set_attributes(serial, meta.iteritems())
468

  
469
    @backend_method
470
    def copy_object(self, user, account, src_container, src_name,
471
                    dest_container, dest_name, dest_meta={},
472
                    replace_meta=False, permissions=None, src_version=None):
473
        hf = self.hf
474
        #FIXME: authorization
475
        if src_version:
476
            props = hf.file_get_properties(src_version)
477
        else:
478
            acc = self.get_account(account)
479
            srccont = self.get_container(acc, src_container)
480
            props = hf.file_lookup(srccont, src_name)
481
        if props is None:
482
            raise NameError
483
        srcobj = props[SERIAL]
484

  
485
        acc = self.get_account(account) # needed on the src_version path too
        dstcont = self.get_container(acc, dest_container)
486
        serial, mtime = hf.file_copy(srcobj, dstcont, dest_name)
487
        if replace_meta:
488
            hf.file_del_attributes(serial)
489
        hf.file_set_attributes(serial, dest_meta.iteritems())
490

  
491
    @backend_method
492
    def move_object(self, user, account, src_container, src_name,
493
                    dest_container, dest_name, dest_meta={},
494
                    replace_meta=False, permissions=None, src_version=None):
495
        hf = self.hf
496
        #FIXME: authorization
497
        if src_version:
498
            props = hf.file_get_properties(src_version)
499
        else:
500
            acc = self.get_account(account)
501
            srccont = self.get_container(acc, src_container)
502
            props = hf.file_lookup(srccont, src_name)
503
        if props is None:
504
            raise NameError
505
        srcobj = props[SERIAL]
506

  
507
        acc = self.get_account(account) # needed on the src_version path too
        dstcont = self.get_container(acc, dest_container)
508
        serial, mtime = hf.file_copy(srcobj, dstcont, dest_name)
509
        if replace_meta:
510
            hf.file_del_attributes(serial)
511
        hf.file_set_attributes(serial, dest_meta.iteritems())
512
        hf.file_remove(srcobj)
513

  
514
    @backend_method
515
    def delete_object(self, user, account, container, name):
516
        if user != account:
517
            raise NotAllowedError
518

  
519
        props = self.lookup_file(account, container, name, authorize=WRITE)
520
        serial = props[SERIAL]
521
        hf = self.hf
522
        #XXX: see delete_container, for empty-trashed ambiguty
523
        #     for now, we will just remove the object
524
        hf.file_remove(serial)
525
        #hf.file_recluster(serial, TRASH_CLUSTER)
526
        #XXX: Where to unset sharing? on file trash or on trash purge?
527
        #     We'll do it on file trash, but should we trash all versions?
528
        hf.xfeature_disinherit("%s/%s/%s" % (account, container, name))
529

  
530
    @backend_method
531
    def purge_trash(self, user, account, container):
532
        if user != account:
533
            raise NotAllowedError
534

  
535
        serial = self.get_account(account)
536
        serial = self.get_container(serial, container)
537
        self.hf.file_purge_cluster(serial, cluster=TRASH_CLUSTER)
538

  
539
    @backend_method(autocommit=0)
540
    def get_block(self, blkhash):
541
        blocks = self.hf.block_retr((unhexlify(blkhash),))
542
        if not blocks:
543
            raise NameError
544
        return blocks[0]
545

  
546
    @backend_method(autocommit=0)
547
    def put_block(self, data):
548
        hashes, absent = self.hf.block_stor((data,))
549
        #XXX: why hexlify hashes?
550
        return hexlify(hashes[0])
551

  
552
    @backend_method(autocommit=0)
553
    def update_block(self, blkhash, data, offset=0):
554
        h, e = self.hf.block_delta(unhexlify(blkhash), ((offset, data),))
555
        return hexlify(h)
556

  
b/pithos/backends/tests.py
36 36
import types
37 37
import json
38 38

  
39
from simple import SimpleBackend
39
from hashfilerback import HashFilerBack as TheBackend
40
#from simple import SimpleBackend as TheBackend
40 41

  
41 42

  
42 43
class TestAccount(unittest.TestCase):
43 44
    def setUp(self):
44 45
        self.basepath = './test/content'
45
        self.b = SimpleBackend(self.basepath)
46
        self.b = TheBackend(self.basepath)
46 47
        self.account = 'test'
47 48
    
48 49
    def tearDown(self):
......
129 130
class TestContainer(unittest.TestCase):
130 131
    def setUp(self):
131 132
        self.basepath = './test/content'
132
        self.b = SimpleBackend(self.basepath)
133
        self.b = TheBackend(self.basepath)
133 134
        self.account = 'test'
134 135
    
135 136
    def tearDown(self):
......
244 245
class TestObject(unittest.TestCase):
245 246
    def setUp(self):
246 247
        self.basepath = './test/content'
247
        self.b = SimpleBackend(self.basepath)
248
        self.b = TheBackend(self.basepath)
248 249
        self.account = 'test'
249 250
    
250 251
    def tearDown(self):
......
397 398
        self.assertRaises(NameError, self.b.update_object_meta, 'test', self.account, cname, name, {})
398 399

  
399 400
if __name__ == "__main__":
400
    unittest.main()
401
    unittest.main()
b/pithos/lib/hashfiler/__init__.py
1

  
2
from blocker import Blocker
3
from mapper import Mapper
4
from noder import (Noder, inf, strnextling, ROOTNODE,
5
                   SERIAL, PARENT, PATH, SIZE, POPULATION,
6
                   POPSIZE, SOURCE, MTIME, CLUSTER)
7
from access_controller import AccessController
8
from filer import Filer
9

  
b/pithos/lib/hashfiler/access_controller.py
1
from pithos.lib.hashfiler import strnextling
2

  
3
class AccessController(object):
4

  
5
    def __init__(self, **params):
6
        self.params = params
7
        conn = params['connection']
8
        cur = params['cursor']
9
        execute = cur.execute
10
        self.execute = execute
11
        self.executemany = cur.executemany
12
        self.fetchone = cur.fetchone
13
        self.fetchall = cur.fetchall
14
        self.cur = cur
15
        self.conn = conn
16

  
17
        execute(""" create table if not exists xfeatures
18
                          ( path    text,
19
                            feature integer,
20
                            primary key (path) ) """)
21
        execute(""" create unique index if not exists idx_feature
22
                    on xfeatures(feature, path) """)
23

  
24
        execute(""" create table if not exists ftvals
25
                          ( feature integer,
26
                            key     integer,
27
                            value   text,
28
                            primary key (feature, key, value) ) """)
29

  
30
        execute(""" create table if not exists members
31
                          ( name   text,
32
                            member text,
33
                            primary key (name, member) ) """)
34

  
35
        execute(""" create index if not exists idx_members
36
                    on members(member, name) """)
37

  
38
    def xfeature_inherit(self, path):
39
        """Return the feature inherited by the path, or None"""
40
        q = ("select path, feature from xfeatures "
41
             "where path <= ? "
42
             "order by path desc limit 1")
43
        self.execute(q, (path,))
44
        r = self.fetchone()
45
        if r is None:
46
            return r
47

  
48
        if path.startswith(r[0]):
49
            return r
50

  
51
        return None
52

  
53
    def xfeature_list(self, path):
54
        """Return the list of the (prefix, feature) pairs matching path.
55
           A prefix matches path if either the prefix includes the path,
56
           or the path includes the prefix.
57
        """
58
        inherited = self.xfeature_inherit(path)
59
        if inherited:
60
            return [inherited]
61

  
62
        q = ("select path, feature from xfeatures "
63
             "where path > ? order by path")
64
        self.execute(q, (path,))
65
        fetchone = self.fetchone
66
        prefixes = []
67
        append = prefixes.append
68
        while 1:
69
            r = fetchone()
70
            if r is None:
71
                break
72
            if not r[0].startswith(path):
73
                break
74
            append(r)
75

  
76
        return prefixes
77

  
78
    def xfeature_bestow(self, path, feature):
79
        """Bestow a feature to a path.
80
           If the path already inherits a feature or
81
           bestows to paths already inheriting a feature,
82
           bestow no feature and return the list of matching paths.
83
           Otherwise, bestow the feature and return an empty iterable.
84
        """
85
        prefixes = self.xfeature_list(path)
86
        pl = len(prefixes)
87
        if (pl > 1) or (pl == 1 and prefixes[0][0] != path):
88
            return prefixes
89
        q = "insert or replace into xfeatures values (?, ?)"
90
        self.execute(q, (path, feature))
91
        return ()
92

  
93
    def xfeature_disinherit(self, path):
94
        """Remove the path and the feature it bestows.
95
           If the path already inherits a feature or
96
           bestows to paths already inheriting a feature,
97
           remove nothing and return the list of matching paths.
98
           Otherwise, remove the path and return an empty iterable.
99
        """
100
        prefixes = self.xfeature_list(path)
101
        if not prefixes or not (len(prefixes) == 1 and prefixes[0][0] == path):
102
            return prefixes
103

  
104
        q = "delete from xfeatures where path = ?"
105
        self.execute(q, (path,))
106
        return ()
107

  
108
    def feature_create(self):
109
        """Create an empty feature and return its identifier."""
110
        return self.alloc_serial()
111

  
112
    def feature_destroy(self, feature):
113
        """Destroy a feature and all its key, value pairs."""
114
        q = "delete from ftvals where feture = ?"
115
        self.execute(q, (feature,))
116

  
117
    def feature_list(self, feature):
118
        """Return the list of all key, value pairs
119
           associated with a feature.
120
        """
121
        q = "select key, value from ftvals where feature = ?"
122
        self.execute(q, (feature,))
123
        return self.fetchall()
124

  
125
    def feature_set(self, feature, key, value):
126
        """Associate a key, value pair with a feature."""
127
        q = "insert into ftvals values (?, ?, ?)"
128
        self.execute(q, (feature, key, value))
129

  
130
    def feature_setmany(self, feature, key, values):
131
        """Associate the given key, and values with a feature."""
132
        q = "insert into ftvals values (?, ?, ?)"
133
        self.executemany(q, ((feature, key, v) for v in values))
134

  
135
    def feature_unset(self, feature, key, value):
136
        """Disassociate a key, value pair from a feature."""
137
        q = ("delete from ftvals where "
138
             "feature = ? and key = ? and value = ?")
139
        self.execute(q, (feature, key, value))
140

  
141
    def feature_unsetmany(self, feature, key, values):
142
        """Disassociate the key for the values given, from a feature."""
143
        q = ("delete from ftvals where "
144
             "feature = ? and key = ? and value = ?")
145
        self.executemany(q, ((feature, key, v) for v in values))
146

  
147
    def feature_get(self, feature, key):
148
        """Return the list of values for a key of a feature."""
149
        q = "select value from ftvals where feature = ? and key = ?"
150
        self.execute(q, (feature, key))
151
        return [r[0] for r in self.fetchall()]
152

  
153
    def feature_clear(self, feature, key):
154
        """Delete all key, value pairs for a key of a feature."""
155
        q = "delete from ftvals where feature = ? and key = ?"
156
        self.execute(q, (feature, key))
157

  
158
    def group_names(self, prefix):
159
        """List all group name, member with name begining with prefix."""
160
        stop = strnextling(prefix)
161
        q = "select distinct name from members where name between ? and ?"
162
        self.execute(q, (prefix, stop))
163
        return [r[0] for r in self.fetchall()]
164

  
165
    def group_list(self, prefix):
166
        """List all groups with name begining with prefix."""
167
        stop = strnextling(prefix)
168
        q = "select name, member from members where name between ? and ?"
169
        self.execute(q, (prefix, stop))
170
        return self.fetchall()
171

  
172
    def group_add(self, group, member):
173
        """Add a member to a group."""
174
        q = "insert or ignore into members values (?, ?)"
175
        self.execute(q, (group, member))
176

  
177
    def group_addmany(self, group, members):
178
        """Add members to a group."""
179
        q = "insert or ignore into members values (?, ?)"
180
        self.executemany(q, ((group, member) for member in members))
181

  
182
    def group_destroy(self, group):
183
        """Destroy a group."""
184
        q = "delete from members where name = ?"
185
        self.execute(q, (group,))
186

  
187
    def group_del(self, group, member):
188
        """Delete a member from a group."""
189
        q = "delete from members where name = ? and member = ?"
190
        self.execute(q, (group, member))
191

  
192
    def group_clear(self, group):
193
        """Clear the members of a group."""
194
        q = "delete from members where name = ?"
195
        self.execute(q, (group,))
196

  
197
    def group_members(self, group):
198
        """Return the list of members of a group."""
199
        q = "select member from members where name = ?"
200
        self.execute(q, (group,))
201
        return [r[0] for r in self.fetchall()]
202

  
203
    def group_check(self, group, member):
204
        """Check if a member is in a group."""
205
        q = "select 1 from members where name = ? and member = ?"
206
        self.execute(q, (group, member))
207
        return bool(self.fetchone())
208

  
209
    def group_parents(self, member):
210
        """Return a list with the groups that contain a member."""
211
        q = "select name from members where member = ?"
212
        self.execute(q, (member,))
213
        return [r[0] for r in self.fetchall()]
214

  
215
    def access_grant(self, access, path, member='all', members=()):
216
        """Grant a member with an access to a path."""
217
        xfeatures = self.xfeature_list(path)
218
        xfl = len(xfeatures)
219
        if xfl > 1 or (xfl == 1 and xfeatures[0][0] != path):
220
            return xfeatures
221
        if xfl == 0:
222
            feature = self.alloc_serial()
223
            self.xfeature_bestow(path, feature)
224
        else:
225
            fpath, feature = xfeatures[0]
226

  
227
        if members:
228
            self.feature_setmany(feature, access, members)
229
        else:
230
            self.feature_set(feature, access, member)
231

  
232
        return ()
233

  
234
    def access_revoke(self, access, path, member='all', members=()):
235
        """Revoke access to path from members.
236
           Note that this will not revoke access for members
237
           that are indirectly granted access through group membership.
238
        """
239
        # XXX: Maybe provide a force_revoke that will kick out
240
        #      all groups containing the given members?
241
        xfeatures = self.xfeature_list(path)
242
        xfl = len(xfeatures)
243
        if xfl != 1 or xfeatures[0][0] != path:
244
            return xfeatures
245

  
246
        fpath, feature = xfeatures[0]
247

  
248
        if members:
249
            self.feature_unsetmany(feature, access, members)
250
        else:
251
            self.feature_unset(feature, access, member)
252

  
253
        # XXX: provide a meaningful return value? 
254

  
255
        return ()
256

  
257
    def access_check(self, access, path, member):
258
        """Return true if the member has this access to the path."""
259
        r = self.xfeature_inherit(path)
260
        if not r:
261
            return 0
262

  
263
        fpath, feature = r
264
        memberset = set(self.feature_get(feature, access))
265
        if member in memberset:
266
            return 1
267

  
268
        for group in self.group_parents(member):
269
            if group in memberset:
270
                return 1
271

  
272
        return 0
273

  
274
    def access_list(self, path):
275
        """Return the list of (access, member) pairs for the path."""
276
        r = self.xfeature_inherit(path)
277
        if not r:
278
            return ()
279

  
280
        fpath, feature = r
281
        return self.feature_list(feature)
282

  
283
    def access_list_paths(self, member):
284
        """Return the list of (access, path) pairs granted to member."""
285
        q = ("select distinct key, path from xfeatures inner join "
286
             "   (select distinct feature, key from ftvals inner join "
287
             "      (select name as value from members "
288
             "       where member = ? union select ?) "
289
             "    using (value)) "
290
             "using (feature)")
291

  
292
        self.execute(q, (member, member))
293
        return self.fetchall()
294

  
b/pithos/lib/hashfiler/blocker.py
1
#!/usr/bin/env python
2

  
3
from sqlite3 import connect, OperationalError
4
from os import chdir, makedirs, fsync, SEEK_CUR, SEEK_SET, unlink
5
from os.path import isdir, realpath, exists, join
6
from hashlib import new as newhasher
7
from binascii import hexlify, unhexlify
8
from time import time
9

  
10
from pithos.lib.hashfiler.context_file import ContextFile, file_sync_read_chunks
11

  
12

  
13
class Blocker(object):
14
    """Blocker.
15
       Required constructor parameters: blocksize, blockpath, hashtype.
16
    """
17

  
18
    blocksize = None
19
    blockpath = None
20
    hashtype = None
21

  
22
    def __init__(self, **params):
23
        blocksize = params['blocksize']
24
        blockpath = params['blockpath']
25
        blockpath = realpath(blockpath)
26
        if not isdir(blockpath):
27
            if not exists(blockpath):
28
                makedirs(blockpath)
29
            else:
30
                raise ValueError("blockpath '%s' is not a directory" % (blockpath,))
31

  
32
        hashtype = params['hashtype']
33
        try:
34
            hasher = newhasher(hashtype)
35
        except ValueError:
36
            msg = "hashtype '%s' is not available from hashlib"
37
            raise ValueError(msg % (hashtype,))
38

  
39
        hasher.update("")
40
        emptyhash = hasher.digest()
41

  
42
        self.blocksize = blocksize
43
        self.blockpath = blockpath
44
        self.hashtype = hashtype
45
        self.hashlen = len(emptyhash)
46
        self.emptyhash = emptyhash
47

  
48
    def get_rear_block(self, blkhash, create=0):
49
        name = join(self.blockpath, hexlify(blkhash))
50
        return ContextFile(name, create)
51

  
52
    def check_rear_block(self, blkhash):
53
        name = join(self.blockpath, hexlify(blkhash))
54
        return exists(name)
55

  
56
    def block_hash(self, data):
57
        """Hash a block of data"""
58
        hasher = newhasher(self.hashtype)
59
        hasher.update(data.rstrip('\x00')) # zero-padded blocks hash like their short form
60
        return hasher.digest()
61

  
62
    def block_ping(self, hashes):
63
        """Check hashes for existence and
64
           return those missing from block storage.
65
        """
66
        missing = []
67
        append = missing.append
68
        for i, h in enumerate(hashes):
69
            if not self.check_rear_block(h):
70
                append(i)
71
        return missing
72

  
73
    def block_retr(self, hashes):
74
        """Retrieve blocks from storage by their hashes."""
75
        blocksize = self.blocksize
76
        blocks = []
77
        append = blocks.append
78
        block = None
79

  
80
        for h in hashes:
81
            with self.get_rear_block(h, 0) as rbl:
82
                if not rbl:
83
                    break
84
                for block in rbl.sync_read_chunks(blocksize, 1, 0):
85
                    break # there should be just one block there
86
            if not block:
87
                break
88
            append(block)
89

  
90
        return blocks
91

  
92
    def block_stor(self, blocklist):
93
        """Store a bunch of blocks and return (hashes, missing).
94
           Hashes is a list of the hashes of the blocks,
95
           missing is a list of indices in that list indicating
96
           which blocks were missing from the store.
97
        """
98
        block_hash = self.block_hash
99
        hashlist = [block_hash(b) for b in blocklist]
100
        mf = None
101
        missing = self.block_ping(hashlist)
102
        for i in missing:
103
            with self.get_rear_block(hashlist[i], 1) as rbl:
104
                rbl.sync_write(blocklist[i]) #XXX: verify?
105

  
106
        return hashlist, missing
107

  
108
    def block_delta(self, blkhash, offdata=()):
109
        """Construct and store a new block from a given block
110
           and a list of (offset, data) 'patches'. Return:
111
           (the hash of the new block, whether it already existed in the store)
112
        """
113
        if not offdata:
114
            return None, None
115

  
116
        blocksize = self.blocksize
117
        block = self.block_retr((blkhash,))
118
        if not block:
119
            return None, None
120

  
121
        block = block[0]
122
        newblock = ''
123
        idx = 0
124
        size = 0
125
        trunc = 0
126
        for off, data in offdata:
127
            if not data:
128
                trunc = 1
129
                break
130
            newblock += block[idx:off] + data
131
            size += off - idx + len(data)
132
            if size >= blocksize:
133
                break
134
            idx = off + len(data) # advance past the patch for the next slice
135

  
136
        if not trunc:
137
            newblock += block[idx:]
138

  
139
        h, a = self.block_stor((newblock,))
140
        return h[0], 1 if a else 0
141

  
142
    def block_hash_file(self, openfile):
143
        """Return the list of hashes (hashes map)
144
           for the blocks in a buffered file.
145
           Helper method, does not affect store.
146
        """
147
        hashes = []
148
        append = hashes.append
149
        block_hash = self.block_hash
150

  
151
        for block in file_sync_read_chunks(openfile, self.blocksize, 1, 0):
152
            append(block_hash(block))
153

  
154
        return hashes
155

  
156
    def block_stor_file(self, openfile):
157
        """Read blocks from buffered file object and store them. Return:
158
           (bytes read, list of hashes, list of hashes that were missing)
159
        """
160
        blocksize = self.blocksize
161
        block_stor = self.block_stor
162
        hashlist = []
163
        hextend = hashlist.extend
164
        storedlist = []
165
        sextend = storedlist.extend
166
        lastsize = 0
167

  
168
        for block in file_sync_read_chunks(openfile, blocksize, 1, 0):
169
            hl, sl = block_stor((block,))
170
            hextend(hl)
171
            sextend(sl)
172
            lastsize = len(block)
173

  
174
        size = (len(hashlist) - 1) * blocksize + lastsize if hashlist else 0
175
        return size, hashlist, storedlist
176

  
... This diff was truncated because it exceeds the maximum size that can be displayed.
