1.1. 数据结构和算法

1.1.1. 解压序列赋值给多个变量

In [1]: a,b=1,2

In [2]: print(a,b)
1 2

In [3]: a,b=[1,2]

In [4]: print(a,b)
1 2

In [5]: s = 'Hello'

In [6]: a,b,c,d,_=s

In [7]: print(a)
H

In [8]: print(d)
l

In [9]: data = [ 'ACME', 50, 91.1, (2012, 12, 21) ]

In [10]: a,b,c,(d,e,f) =data

In [11]: print(a)
ACME

In [12]: print(d)
2012

序列都是可以拆解的, 如果想丢弃参数,可以使用_(本质也是变量),需要保证这个变量不被其他使用或者使用这个变量覆盖其他的。

1.1.2. 解压可迭代对象赋值给多个变量

def drop_first_last(grades):
    first,*middle,last = grades
    return sum(middle)/len(middle)

In [17]: drop_first_last([1,2,5,6])
Out[17]: 3.5

In [18]: record = ('Dave', 'dave@example.com', '773-555-1212', '847-555-1212')

In [19]: name,email,*phone = record

In [20]: print(phone)
['773-555-1212', '847-555-1212']

*变量名 获取的结果一定是一个list类型的。

1.1.3. 保留最后 N 个元素

使用 deque(maxlen=N) 构造函数会新建一个固定大小的队列。当新的元素加入并且这个队列已满的时候, 最老的元素会自动被移除掉。

#!/usr/bin/env python3
# -*- coding: utf-8 -*-


from collections import deque

def search(lines, pattern, history=5):
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line: 
            yield line,previous_lines
            previous_lines.append(line)

if __name__ == '__main__':
    with open(r'./somefile.txt') as f:
        for line, prevlines in search(f, 'python', 3):
            for pline in prevlines:
                print(pline, end='')
            # print(line, end='')
            print('-' * 20)

1.1.4. 查找最大或最小的 N 个元素

heapq 模块有两个函数:nlargest() 和 nsmallest() 可以完美解决这个问题。

堆数据结构最重要的特征是 heap[0] 永远是最小的元素。并且剩余的元素可以很容易的通过调用 heapq.heappop() 方法得到

In [2]: import heapq
...: nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
In [4]: heapq.nlargest(3,nums)
Out[4]: [42, 37, 23]

In [7]:  heapq.heapify(nums)
In [10]: nums[0]
Out[10]: -4

In [11]: heapq.heappop(nums)
Out[11]: -4

In [12]: nums
Out[12]: [1, 2, 2, 23, 7, 8, 18, 23, 42, 37]
#!/usr/bin/env python3
# -*- coding: utf-8 -*-


def heap_sort(nums):
    first = len(nums)//2-1
    for start in range(first,-1,-1): 
        big_heap(nums,start,len(nums)-1)

    for end in range(len(nums)-1,0,-1):
        nums[0],nums[end] = nums[end],nums[0]
        big_heap(nums,0,end-1)

def big_heap(nums,start,end):
    root = start 
    child = root*2 +1

    while child<=end:
        if child +1 <=end and nums[child] < nums[child+1]: 
            child +=1 
        if nums[root] < nums[child]: 
            nums[root],nums[child] = nums[child],nums[root]
            root = child 
            child=root*2+1
        else: 
            break 

if __name__ == "__main__": 
    nums=[10,17,50,7,30,24,27,45,15,5,36,21]
    print(heap_sort(nums))

1.1.5. 实现一个优先级队列

index 变量组成三元组 (priority, index, item) ,就能很好的避免上面的错误, 因为不可能有两个元素有相同的 index 值。

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import heapq

class PriorityQueue():
    def __init__(self) -> None:
        self._queue = []
        self._index = 0
    def push(self, item, priority):
        heapq.heappush(self._queue,(-priority,self._index,item))
        self._index += 1
    def pop(self):
        return heapq.heappop(self._queue)[-1]
    
class Item:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return 'Item({!r})'.format(self.name)
    


if __name__ == "__main__":
    q = PriorityQueue()
    print(q.push(Item('foo'), 1))
    print(q.push(Item('bar'), 5))
    print(q.push(Item('spam'), 4))
    print(q.push(Item('grok'), 1))
    print(q.pop())

1.1.6. 字典中的键映射多个值

d = {
    'a' : [1, 2, 3],
    'b' : [4, 5]
}
e = {
    'a' : {1, 2, 3},
    'b' : {4, 5}
}

def m1():
    result = {}
    for k,v in d.items():
        if k not in result: 
            result[k] =[]
        result[k].extend(v)
    for k,v in e.items():
        if k not in result: 
            result[k] =[]
        result[k].extend(v)
    print(result)

def m2(): 
    from collections import defaultdict
    result =defaultdict(list)
    for k,v in d.items():
        result[k].extend(v)
    for k,v in e.items():
        result[k].extend(v)
    print(result)
def m3(): 
  
    result ={}
    for k,v in d.items():
        result.setdefault(k,[]).extend(v)
    for k,v in e.items():
         result.setdefault(k,[]).extend(v)
    print(result)

if __name__ == "__main__": 
    m1()
    m2()
    m3()

1.1.7. 字典排序

为了能控制一个字典中元素的顺序,你可以使用 collections 模块中的 OrderedDict 类。 在迭代操作的时候它会保持元素被插入时的顺序。

prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}


# find min prices 
min_price = min(zip(prices.values(),prices.keys()))
print(min_price)

1.1.8. 查找两字典的相同点

In [3]: a = {
...:     'x' : 1,
...:     'y' : 2,
...:     'z' : 3
...: }
...:
...: b = {
...:     'w' : 10,
...:     'x' : 11,
...:     'y' : 2
...: }
...:

In [4]: a.keys() & b.keys()
Out[4]: {'x', 'y'}

In [5]: a.keys() - b.keys()
Out[5]: {'z'}

1.1.9. 删除序列相同元素并保持顺序

#!/usr/bin/env python3
# -*- coding: utf-8 -*-


def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

def dedupe_v2(items,key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

if __name__ == "__main__":
    a = [1, 5, 2, 1, 9, 1, 5, 10]
    print(list(dedupe(a)))
    b= [ {'x':1, 'y':2}, {'x':1, 'y':3}, {'x':1, 'y':2}, {'x':2, 'y':4}]
    print(list(dedupe_v2(b, key=lambda d: (d['x'],d['y']))))


1.1.10. 命名切片

你避免了大量无法理解的硬编码下标,使得你的代码更加清晰可读了 .. code-block:: python

record = ‘………………..100 …….513.25 ……….’ cost = int(record[20:23]) * float(record[31:37])

SHARES = slice(20, 23) PRICE = slice(31, 37) cost = int(record[SHARES]) * float(record[PRICE])

1.1.11. 序列中出现次数最多的元素

collections.Counter 类就是专门为这类问题而设计的, 它甚至有一个有用的 most_common() 方法

words = [
    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
    'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
    'my', 'eyes', "you're", 'under'
]
from collections import Counter
a = Counter(words)
# 出现频率最高的3个单词
top_three = a.most_common(3)
print(top_three)


a.update(words)
b=Counter(words)
print(a-b)
print(a+b)
a.clear()

1.1.12. 通过某个关键字排序一个字典列表

In [1]: rows = [
...:     {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
...:     {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
...:     {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
...:     {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
...: ]

In [2]: from operator import itemgetter

In [3]: sorted(rows,key=itemgetter('uid'))
Out[3]:
[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
{'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
{'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]

In [4]: sorted(rows,key=itemgetter('uid','fname'))
Out[4]:
[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
{'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
{'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]

字典可以通过 itemgetter() ,而对象可以通过 attrgetter()

1.1.13. 通过某个字段将记录分组

In [5]: rows = [
...:     {'address': '5412 N CLARK', 'date': '07/01/2012'},
...:     {'address': '5148 N CLARK', 'date': '07/04/2012'},
...:     {'address': '5800 E 58TH', 'date': '07/02/2012'},
...:     {'address': '2122 N CLARK', 'date': '07/03/2012'},
...:     {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
...:     {'address': '1060 W ADDISON', 'date': '07/02/2012'},
...:     {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
...:     {'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
...: ]
...:

In [6]: from itertools import groupby

In [7]: from operator import itemgetter

In [11]: rows.sort(key=itemgetter('date'))
In [9]: res = groupby(rows,key=itemgetter('date'))

In [10]: list(res)

1.1.14. 过滤序列元素

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
a = [1, 4, -5, 10, -7, 2, 3, -1]
print([x for x in a if x>0])

b = ['1', '2', '-3', '-', '4', 'N/A', '5']

def is_int(s): 
    try : 
        _= int(s)
    except: 
        return False
    return True

c = list(filter(is_int, b))
print(c)


import itertools

Codes =['C', 'C++', 'Java', 'Python'] 
selectors = [False, False, False, True] 
  
Best_Programming = itertools.compress(Codes, selectors) 
  
for each in Best_Programming: 
    print(each) 

1.1.15. 从字典中提取子集

prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}

p1 = {k:v for k,v in prices.items() if v>200}
print(p1)

p2 = {k:v for k,v in prices.items() if k in ['IBM','HPQ']}
print(p2)

1.1.16. 映射名称到序列元素

from collections import namedtuple
Subscriber = namedtuple('Subscriber', ['addr', 'joined'])
sub = Subscriber('jonesy@example.com', '2012-10-19')
print(sub.addr,sub.joined)
sub = sub._replace(addr='hello')

1.1.17. 转换并同时计算数据

nums = [1, 2, 3, 4, 5]
s = sum(x * x for x in nums)
s = sum((x * x for x in nums))

1.1.18. 合并多个字典或映射

假设你必须在两个字典中执行查找操作(比如先从 a 中找,如果找不到再在 b 中找)。 一个非常简单的解决方案就是使用 collections 模块中的 ChainMap 类

a = {'x': 1, 'z': 3 }
b = {'y': 2, 'z': 4 }
from collections import ChainMap
c = ChainMap(a,b)
print(c['x']) # Outputs 1 (from a)
print(c['y']) # Outputs 2 (from b)
print(c['z']) # Outputs 3 (from a)