Statistics
| Branch: | Tag: | Revision:

root / snf-pithos-app / pithos / api / short_url.py @ fda50d75

History | View | Annotate | Download (5.2 kB)

1 bb4eafc6 Antony Chazapis
# Copyright (C) 2009 by Michael Fogleman
2 2715ade4 Sofia Papagiannaki
#
3 bb4eafc6 Antony Chazapis
# Permission is hereby granted, free of charge, to any person obtaining a copy
4 bb4eafc6 Antony Chazapis
# of this software and associated documentation files (the "Software"), to deal
5 bb4eafc6 Antony Chazapis
# in the Software without restriction, including without limitation the rights
6 bb4eafc6 Antony Chazapis
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7 bb4eafc6 Antony Chazapis
# copies of the Software, and to permit persons to whom the Software is
8 bb4eafc6 Antony Chazapis
# furnished to do so, subject to the following conditions:
9 2715ade4 Sofia Papagiannaki
#
10 bb4eafc6 Antony Chazapis
# The above copyright notice and this permission notice shall be included in
11 bb4eafc6 Antony Chazapis
# all copies or substantial portions of the Software.
12 2715ade4 Sofia Papagiannaki
#
13 bb4eafc6 Antony Chazapis
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14 bb4eafc6 Antony Chazapis
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15 bb4eafc6 Antony Chazapis
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16 bb4eafc6 Antony Chazapis
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17 bb4eafc6 Antony Chazapis
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18 bb4eafc6 Antony Chazapis
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19 bb4eafc6 Antony Chazapis
# THE SOFTWARE.
20 bb4eafc6 Antony Chazapis
21 bb4eafc6 Antony Chazapis
'''
22 bb4eafc6 Antony Chazapis
Short URL Generator
23 bb4eafc6 Antony Chazapis
===================
24 bb4eafc6 Antony Chazapis

25 bb4eafc6 Antony Chazapis
Python implementation for generating Tiny URL- and bit.ly-like URLs.
26 bb4eafc6 Antony Chazapis

27 2715ade4 Sofia Papagiannaki
A bit-shuffling approach is used to avoid generating consecutive, predictable
28 2715ade4 Sofia Papagiannaki
URLs.  However, the algorithm is deterministic and will guarantee that no
29 bb4eafc6 Antony Chazapis
collisions will occur.
30 bb4eafc6 Antony Chazapis

31 2715ade4 Sofia Papagiannaki
The URL alphabet is fully customizable and may contain any number of
32 2715ade4 Sofia Papagiannaki
characters.  By default, digits and lower-case letters are used, with
33 2715ade4 Sofia Papagiannaki
some removed to avoid confusion between characters like o, O and 0.  The
34 2715ade4 Sofia Papagiannaki
default alphabet is shuffled and has a prime number of characters to further
35 bb4eafc6 Antony Chazapis
improve the results of the algorithm.
36 bb4eafc6 Antony Chazapis

37 2715ade4 Sofia Papagiannaki
The block size specifies how many bits will be shuffled.  The lower BLOCK_SIZE
38 bb4eafc6 Antony Chazapis
bits are reversed.  Any bits higher than BLOCK_SIZE will remain as is.
39 2715ade4 Sofia Papagiannaki
BLOCK_SIZE of 0 will leave all bits unaffected and the algorithm will simply
40 bb4eafc6 Antony Chazapis
be converting your integer to a different base.
41 bb4eafc6 Antony Chazapis

42 2715ade4 Sofia Papagiannaki
The intended use is that incrementing, consecutive integers will be used as
43 2715ade4 Sofia Papagiannaki
keys to generate the short URLs.  For example, when creating a new URL, the
44 2715ade4 Sofia Papagiannaki
unique integer ID assigned by a database could be used to generate the URL
45 2715ade4 Sofia Papagiannaki
by using this module.  Or a simple counter may be used.  As long as the same
46 bb4eafc6 Antony Chazapis
integer is not used twice, the same short URL will not be generated twice.
47 bb4eafc6 Antony Chazapis

48 2715ade4 Sofia Papagiannaki
The module supports both encoding and decoding of URLs. The min_length
49 bb4eafc6 Antony Chazapis
parameter allows you to pad the URL if you want it to be a specific length.
50 bb4eafc6 Antony Chazapis

51 bb4eafc6 Antony Chazapis
Sample Usage:
52 bb4eafc6 Antony Chazapis

53 bb4eafc6 Antony Chazapis
>>> import short_url
54 bb4eafc6 Antony Chazapis
>>> url = short_url.encode_url(12)
55 bb4eafc6 Antony Chazapis
>>> print url
56 bb4eafc6 Antony Chazapis
LhKA
57 bb4eafc6 Antony Chazapis
>>> key = short_url.decode_url(url)
58 bb4eafc6 Antony Chazapis
>>> print key
59 bb4eafc6 Antony Chazapis
12
60 bb4eafc6 Antony Chazapis

61 2715ade4 Sofia Papagiannaki
Use the functions in the top-level of the module to use the default encoder.
62 2715ade4 Sofia Papagiannaki
Otherwise, you may create your own UrlEncoder object and use its encode_url
63 bb4eafc6 Antony Chazapis
and decode_url methods.
64 bb4eafc6 Antony Chazapis

65 bb4eafc6 Antony Chazapis
Author: Michael Fogleman
66 bb4eafc6 Antony Chazapis
License: MIT
67 bb4eafc6 Antony Chazapis
Link: http://code.activestate.com/recipes/576918/
68 bb4eafc6 Antony Chazapis
'''
69 bb4eafc6 Antony Chazapis
70 bb4eafc6 Antony Chazapis
DEFAULT_ALPHABET = 'mn6j2c4rv8bpygw95z7hsdaetxuk3fq'
71 bb4eafc6 Antony Chazapis
DEFAULT_BLOCK_SIZE = 24
72 bb4eafc6 Antony Chazapis
MIN_LENGTH = 5
73 bb4eafc6 Antony Chazapis
74 2715ade4 Sofia Papagiannaki
75 bb4eafc6 Antony Chazapis
class UrlEncoder(object):
76 29148653 Sofia Papagiannaki
    def __init__(self, alphabet=DEFAULT_ALPHABET,
77 29148653 Sofia Papagiannaki
                 block_size=DEFAULT_BLOCK_SIZE):
78 bb4eafc6 Antony Chazapis
        self.alphabet = alphabet
79 bb4eafc6 Antony Chazapis
        self.block_size = block_size
80 bb4eafc6 Antony Chazapis
        self.mask = (1 << block_size) - 1
81 bb4eafc6 Antony Chazapis
        self.mapping = range(block_size)
82 bb4eafc6 Antony Chazapis
        self.mapping.reverse()
83 2715ade4 Sofia Papagiannaki
84 bb4eafc6 Antony Chazapis
    def encode_url(self, n, min_length=MIN_LENGTH):
85 bb4eafc6 Antony Chazapis
        return self.enbase(self.encode(n), min_length)
86 2715ade4 Sofia Papagiannaki
87 bb4eafc6 Antony Chazapis
    def decode_url(self, n):
88 bb4eafc6 Antony Chazapis
        return self.decode(self.debase(n))
89 2715ade4 Sofia Papagiannaki
90 bb4eafc6 Antony Chazapis
    def encode(self, n):
91 bb4eafc6 Antony Chazapis
        return (n & ~self.mask) | self._encode(n & self.mask)
92 2715ade4 Sofia Papagiannaki
93 bb4eafc6 Antony Chazapis
    def _encode(self, n):
94 bb4eafc6 Antony Chazapis
        result = 0
95 bb4eafc6 Antony Chazapis
        for i, b in enumerate(self.mapping):
96 bb4eafc6 Antony Chazapis
            if n & (1 << i):
97 bb4eafc6 Antony Chazapis
                result |= (1 << b)
98 bb4eafc6 Antony Chazapis
        return result
99 2715ade4 Sofia Papagiannaki
100 bb4eafc6 Antony Chazapis
    def decode(self, n):
101 bb4eafc6 Antony Chazapis
        return (n & ~self.mask) | self._decode(n & self.mask)
102 2715ade4 Sofia Papagiannaki
103 bb4eafc6 Antony Chazapis
    def _decode(self, n):
104 bb4eafc6 Antony Chazapis
        result = 0
105 bb4eafc6 Antony Chazapis
        for i, b in enumerate(self.mapping):
106 bb4eafc6 Antony Chazapis
            if n & (1 << b):
107 bb4eafc6 Antony Chazapis
                result |= (1 << i)
108 bb4eafc6 Antony Chazapis
        return result
109 2715ade4 Sofia Papagiannaki
110 bb4eafc6 Antony Chazapis
    def enbase(self, x, min_length=MIN_LENGTH):
111 bb4eafc6 Antony Chazapis
        result = self._enbase(x)
112 bb4eafc6 Antony Chazapis
        padding = self.alphabet[0] * (min_length - len(result))
113 bb4eafc6 Antony Chazapis
        return '%s%s' % (padding, result)
114 2715ade4 Sofia Papagiannaki
115 bb4eafc6 Antony Chazapis
    def _enbase(self, x):
116 bb4eafc6 Antony Chazapis
        n = len(self.alphabet)
117 bb4eafc6 Antony Chazapis
        if x < n:
118 bb4eafc6 Antony Chazapis
            return self.alphabet[x]
119 bb4eafc6 Antony Chazapis
        return self._enbase(x / n) + self.alphabet[x % n]
120 2715ade4 Sofia Papagiannaki
121 bb4eafc6 Antony Chazapis
    def debase(self, x):
122 bb4eafc6 Antony Chazapis
        n = len(self.alphabet)
123 bb4eafc6 Antony Chazapis
        result = 0
124 bb4eafc6 Antony Chazapis
        for i, c in enumerate(reversed(x)):
125 bb4eafc6 Antony Chazapis
            result += self.alphabet.index(c) * (n ** i)
126 bb4eafc6 Antony Chazapis
        return result
127 bb4eafc6 Antony Chazapis
128 bb4eafc6 Antony Chazapis
DEFAULT_ENCODER = UrlEncoder()
129 bb4eafc6 Antony Chazapis
130 2715ade4 Sofia Papagiannaki
131 bb4eafc6 Antony Chazapis
def encode(n):
132 bb4eafc6 Antony Chazapis
    return DEFAULT_ENCODER.encode(n)
133 bb4eafc6 Antony Chazapis
134 2715ade4 Sofia Papagiannaki
135 bb4eafc6 Antony Chazapis
def decode(n):
136 bb4eafc6 Antony Chazapis
    return DEFAULT_ENCODER.decode(n)
137 bb4eafc6 Antony Chazapis
138 2715ade4 Sofia Papagiannaki
139 bb4eafc6 Antony Chazapis
def enbase(n, min_length=MIN_LENGTH):
140 bb4eafc6 Antony Chazapis
    return DEFAULT_ENCODER.enbase(n, min_length)
141 bb4eafc6 Antony Chazapis
142 2715ade4 Sofia Papagiannaki
143 bb4eafc6 Antony Chazapis
def debase(n):
144 bb4eafc6 Antony Chazapis
    return DEFAULT_ENCODER.debase(n)
145 bb4eafc6 Antony Chazapis
146 2715ade4 Sofia Papagiannaki
147 bb4eafc6 Antony Chazapis
def encode_url(n, min_length=MIN_LENGTH):
148 bb4eafc6 Antony Chazapis
    return DEFAULT_ENCODER.encode_url(n, min_length)
149 bb4eafc6 Antony Chazapis
150 2715ade4 Sofia Papagiannaki
151 bb4eafc6 Antony Chazapis
def decode_url(n):
152 bb4eafc6 Antony Chazapis
    return DEFAULT_ENCODER.decode_url(n)
153 bb4eafc6 Antony Chazapis
154 bb4eafc6 Antony Chazapis
if __name__ == '__main__':
155 bb4eafc6 Antony Chazapis
    for a in range(0, 200000, 37):
156 bb4eafc6 Antony Chazapis
        b = encode(a)
157 bb4eafc6 Antony Chazapis
        c = enbase(b)
158 bb4eafc6 Antony Chazapis
        d = debase(c)
159 bb4eafc6 Antony Chazapis
        e = decode(d)
160 bb4eafc6 Antony Chazapis
        assert a == e
161 bb4eafc6 Antony Chazapis
        assert b == d
162 bb4eafc6 Antony Chazapis
        c = (' ' * (7 - len(c))) + c
163 bb4eafc6 Antony Chazapis
        print '%6d %12d %s %12d %6d' % (a, b, c, d, e)