Python3 Iterators and Generators

Iterators

Iteration is one of Python's most powerful features and is a way of accessing the elements of a collection.

An iterator is an object that remembers its position during traversal.

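An iterator starts at the first element of a collection and visits elements until all of them have been visited. An iterator can only move forward, never backward.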

Iterators rely on two basic built-in functions: iter() and next().

Strings, lists, tuples, sets, dictionaries, range() objects, file handles and other iterables can all be used to create an iterator:

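An object that has an __iter__() method is an iterable: it follows the iterable protocol.
Calling iterable.__iter__(), or more idiomatically iter(iterable), turns it into an iterator.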

>>> lst = [1, 2, 3, 4]
>>> it = iter(lst)         # create an iterator object
>>> next(it)               # return the iterator's next element
1
>>> next(it)
2
>>>

An iterator can be traversed with an ordinary for statement:

>>> lst = ['a', 'b', 'c', 'd']
>>> it = iter(lst)         # create an iterator object
>>> for x in it:
    print(x, end=" ")

a b c d
>>>

You can also use the next() function:

>>> lst = [2, 6, 8, 9]
>>> it = iter(lst)         # create an iterator object
>>>
>>> while True:
    try:
        print(next(it))
    except StopIteration:  # raised when the iterator is exhausted
        break

2
6
8
9
>>>
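A for loop does essentially what the while loop above does by hand: it calls iter() once, then next() until StopIteration is raised. The sketch below spells that out; `manual_for` is a hypothetical helper written for illustration (it ignores details such as the loop's else clause), not a built-in.

```python
def manual_for(iterable, action):
    """Roughly what `for item in iterable: action(item)` does (simplified sketch)."""
    it = iter(iterable)          # step 1: get an iterator from the iterable
    while True:
        try:
            item = next(it)      # step 2: ask the iterator for its next element
        except StopIteration:    # step 3: StopIteration means the loop is over
            break
        action(item)             # step 4: run the loop body

collected = []
manual_for([2, 6, 8, 9], collected.append)
print(collected)  # [2, 6, 8, 9]
```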

Creating an iterator

To use a class as an iterator, implement two methods in it: __iter__() and __next__().

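If you know object-oriented programming, you know that classes have a constructor. In Python this role is played by __init__(), which runs when the object is initialized.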

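__iter__() returns an iterator object (a class that is its own iterator simply returns self). That object implements __next__() and signals the end of the iteration by raising StopIteration.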

__next__() (next() in Python 2) returns the next value.

Create an iterator that returns numbers (a counter), starting at 1 and increasing by 1 each time:

class Counter:
    def __iter__(self):
        self.a = 1         # start counting at 1
        return self        # the object is its own iterator

    def __next__(self):
        x = self.a
        self.a += 1
        return x           # never raises StopIteration, so this counter never stops

myclass = Counter()
myiter = iter(myclass)

print(next(myiter))
print(next(myiter))
print(next(myiter))
print(next(myiter))
print(next(myiter))

Output:

1
2
3
4
5

The StopIteration exception

StopIteration marks the end of an iteration and prevents infinite loops. In __next__() we can raise StopIteration once a given number of iterations has been reached, which ends the iteration.
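As a sketch of that idea, here is the Counter from above with a stopping condition. The `limit` parameter (defaulting to 20) is an illustrative assumption, not part of the original class; once the count passes the limit, __next__() raises StopIteration and any for loop or list() call over it ends cleanly.

```python
class Counter:
    """Iterator that counts from 1 up to `limit` (limit is an illustrative parameter)."""

    def __init__(self, limit=20):
        self.limit = limit

    def __iter__(self):
        self.a = 1                 # reset the count each time iteration starts
        return self

    def __next__(self):
        if self.a > self.limit:
            raise StopIteration    # signal that the iteration is finished
        x = self.a
        self.a += 1
        return x


for x in Counter():
    print(x, end=" ")              # prints 1 through 20, then the loop ends
print()
```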

>>> str1 = "Python"
>>> strObj = str1.__iter__()
>>> strObj.__next__()
'P'
>>> strObj.__next__()
'y'
>>> strObj.__next__()
't'
>>> strObj.__next__()
'h'
>>> strObj.__next__()
'o'
>>> strObj.__next__()
'n'
>>> strObj.__next__()
Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    strObj.__next__()
StopIteration
>>>

So how do you tell whether an object is iterable?

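Check whether it has an __iter__ method, or use the Iterable and Iterator abstract base classes from collections.abc to check its type (importing them directly from collections no longer works as of Python 3.10):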

>>> tup = (1, 2, 3)
>>> type(tup)
<class 'tuple'>
>>> dir(tup)        # with an argument, dir() returns a list of the argument's attributes and methods
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', 
'__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__',
 '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count', 'index']
>>> print('__iter__' in dir(tup))
True
>>>

>>> from collections.abc import Iterable, Iterator
>>> dic = {1:'dict', 2:'str', 3:'list', 4:'tuple', 5:'set', 6:'range()', 7:'file handler'}
>>> isinstance(dic, Iterable)
True
>>> isinstance(dic, Iterator)
False
>>>
>>> ran = range(6)
>>> type(ran)
<class 'range'>
>>> isinstance(ran, Iterable)
True
>>> isinstance(ran, Iterator)
False
>>> isinstance(iter(ran), Iterator)    # iter() turns the iterable into an iterator
True
>>>
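One caveat worth knowing: the Python documentation notes that isinstance(obj, Iterable) only detects classes that define __iter__ (or are registered with the ABC), while iter() also accepts old-style sequences that only define __getitem__. The sketch below shows that gap; `Squares` and `is_iterable` are hypothetical names chosen for illustration.

```python
from collections.abc import Iterable

class Squares:
    """Hypothetical sequence-style class: defines __getitem__ but no __iter__."""

    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)   # tells iter() that the sequence is exhausted
        return index * index

def is_iterable(obj):
    """Return True if iter(obj) succeeds, which is the most reliable test."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

sq = Squares()
print(isinstance(sq, Iterable))  # False: the ABC only looks for __iter__
print(is_iterable(sq))           # True: iter() falls back to __getitem__
print(list(sq))                  # [0, 1, 4, 9, 16]
print(is_iterable(42))           # False
```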

Generators

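In Python, a function that uses yield is called a generator (more precisely, a generator function).

Unlike an ordinary function, a generator function returns an iterator and can only be used for iteration; put simply, a generator is an iterator.

While a generator runs, each time it reaches yield it pauses and saves all of its current execution state, hands back the value of the yield, and resumes from that point the next time next() is called on it.

Calling a generator function returns an iterator object (a generator object); none of the function body runs until the first next().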
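The pause-and-resume behaviour described above can be observed by having a generator record when each part of its body runs. `tracer` is a hypothetical function written only for this illustration.

```python
def tracer(log):
    """Generator that records each step it executes (illustrative only)."""
    log.append('start')
    yield 1                  # pause here and hand 1 to next()
    log.append('resumed')    # runs only when next() is called again
    yield 2
    log.append('end')

log = []
g = tracer(log)
print(log)            # []  (calling the function ran none of its body)
print(next(g), log)   # 1 ['start']
print(next(g), log)   # 2 ['start', 'resumed']
```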

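yield vs. return:

After return, the function's state is gone; yield saves the function's current execution state, and when execution resumes, the function continues from that saved state.

return ends a function; yield does not end a generator function.
Both hand back a value: return gives it to whoever called the function, while yield gives it to next(). (Inside a generator, return ends the iteration, and its value is carried by StopIteration.value.)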
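A minimal side-by-side sketch of the difference; the function names are illustrative. The first return ends an ordinary function immediately, while a generator keeps going through every yield, and its own return value surfaces as StopIteration.value.

```python
def with_return():
    return 1
    return 2            # never reached: the first return ends the function

def with_yield():
    yield 1
    yield 2             # reached on the next call to next()
    return 'finished'   # ends the generator; carried by StopIteration.value

print(with_return())        # 1
g = with_yield()
print(next(g), next(g))     # 1 2
try:
    next(g)
except StopIteration as e:
    print(e.value)          # finished
```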

The following example uses yield to produce the Fibonacci sequence:

>>> def fib(max):          # generator function: Fibonacci
    a, b, n = 0, 1, 0
    while n < max:
        yield b            # use yield
        a, b = b, a + b
        n = n + 1

>>> f = fib(6)             # calling fib(6) does not run the body of fib; it returns a generator object (an iterator)
>>> f                      # the interpreter shows it as a generator
<generator object fib at 0x000001C6CB627780>
>>>

>>> for n in fib(5):
    print(n)

1
1
2
3
5
>>>

>>> f = fib(5)
>>> next(f)            # next() takes a value from the generator and drives its execution
1
>>> next(f)
1
>>> next(f)
2
>>> next(f)
3
>>> next(f)
5
>>> next(f)            # no yields are left, so next(f) raises StopIteration
Traceback (most recent call last):
  File "<pyshell#37>", line 1, in <module>
    next(f)
StopIteration
>>>

>>> fwrong = fib(6)
>>> fwrong.next()      # Python 2 syntax; raises an error in Python 3
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    fwrong.next()      # Python 2 syntax; raises an error in Python 3
AttributeError: 'generator' object has no attribute 'next'
>>>

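send() sends data into a generator. It works like next(), except that while resuming the generator it also passes in a value, which becomes the result of the paused yield expression. The first call must be send(None) (or next()), because a freshly created generator has not yet reached a yield that could receive a value.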

>>> import numbers
>>> def gen_sum():
    total = 0
    while True:
        num = yield                # receives the value passed to send()
        if isinstance(num, numbers.Integral):
            total += num
            print('total: ', total)
        elif num is None:
            break
    return total

>>> g = gen_sum()
>>> g
<generator object gen_sum at 0x0000026A6703D3B8>
>>> g.send(None)    # equivalent to next(g): primes the generator
>>> g.send(2)
total:  2
>>> g.send(6)
total:  8
>>> g.send(12)
total:  20
>>> g.send(None)    # None breaks the loop, so the generator returns total
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    g.send(None)
StopIteration: 20
>>>

>>> try:
    g.send(None)    # g has already finished, so this StopIteration carries no value
except StopIteration as e:
    print(e.value)

None
>>>
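To actually read the total that gen_sum() returns, catch the StopIteration raised by the call that finishes the generator, not a later one. A sketch, where `run_sum` is a hypothetical helper that drives a fresh generator; it assumes `values` contains no None, since None would finish the generator early.

```python
import numbers

def gen_sum():
    """Add up the integers sent in; sending None finishes the generator."""
    total = 0
    while True:
        num = yield                       # receives whatever is passed to send()
        if isinstance(num, numbers.Integral):
            total += num
        elif num is None:
            break
    return total                          # becomes StopIteration.value

def run_sum(values):
    """Hypothetical helper: drive gen_sum() and return its final total."""
    g = gen_sum()
    g.send(None)                          # prime the generator (same as next(g))
    for v in values:
        g.send(v)                         # non-integers are ignored by gen_sum()
    try:
        g.send(None)                      # ask the generator to finish
    except StopIteration as e:
        return e.value                    # the value of gen_sum()'s return statement

print(run_sum([2, 6, 12]))  # 20
```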

The yield from keyword

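yield from iterable yields the items of an iterable one at a time, as if you had written for item in iterable: yield item. When the iterable is itself a generator, yield from also forwards send() calls to it and evaluates to that generator's return value.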

>>> def func():
    lst = ['str', 'tuple', 'list', 'dict', 'set']
    yield lst

>>> gen = func()
>>> next(gen)
['str', 'tuple', 'list', 'dict', 'set']
>>> for i in gen:      # prints nothing: the only yield has already been consumed
    print(i)

>>> # yield from hands out the items of the iterable one by one
>>> def func2():
    lst = ['str', 'tuple', 'list', 'dict', 'set']
    yield from lst

>>> gen2 = func2()
>>> next(gen2)
'str'
>>> next(gen2)
'tuple'
>>> for i in gen2:
    print(i)

list
dict
set
>>>
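The part of yield from that a plain for loop cannot reproduce is receiving the sub-generator's return value; the sleep()/countdown() example later in this article relies on exactly that. A sketch with illustrative names, plus a simplified manual equivalent that ignores send()/throw() forwarding:

```python
def inner():
    yield 1
    yield 2
    return 'inner done'             # becomes the value of the `yield from` expression

def outer():
    result = yield from inner()     # re-yields 1 and 2, then receives inner's return value
    yield result

def outer_by_hand():
    """Simplified manual version of outer() (no send()/throw() forwarding)."""
    it = inner()
    while True:
        try:
            value = next(it)
        except StopIteration as e:
            result = e.value        # the sub-generator's return value
            break
        yield value
    yield result

print(list(outer()))          # [1, 2, 'inner done']
print(list(outer_by_hand()))  # [1, 2, 'inner done']
```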

>>> lst = ['H','e','l']
>>> dic = {'l':'vvvvv','o':'eeeee'}
>>> str1 = 'Python'
>>>
>>> def yield_gen():
    for i in lst:
        yield i
    for j in dic:          # iterating a dict yields its keys
        yield j
    for k in str1:
        yield k

>>> for item in yield_gen():
    print(item, end='')

HelloPython
>>>

>>> l = ['H','e','l']
>>> d = {'l':'xxxxx','o':'ooooo'}
>>> s = 'Java'
>>>
>>> def yield_from_gen():
    yield from l
    yield from d
    yield from s

>>> for item in yield_from_gen():
    print(item, end='')

HelloJava
>>>
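When yield from is used only to walk several iterables in turn, as above, the result is the same as the standard library's itertools.chain. The sketch below checks that equivalence; it is a comparison for illustration, not a claim that the two are implemented the same way.

```python
from itertools import chain

l = ['H', 'e', 'l']
d = {'l': 'xxxxx', 'o': 'ooooo'}
s = 'Java'

def yield_from_gen():
    yield from l
    yield from d        # iterating a dict gives its keys
    yield from s

print(''.join(yield_from_gen()))   # HelloJava
print(''.join(chain(l, d, s)))     # HelloJava
```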

Why use generators

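They are easy to use, need little code, and use memory more efficiently. For example:

A list allocates memory for all of its elements when it is built, whereas a generator produces each value only when it is needed.
It behaves more like a record standing for a (possibly infinite) stream, a bit like a database cursor that works on one record at a time.

If the data to read far exceeds available memory but every item in the stream still has to be processed, a generator is a good choice:
since it keeps its own state, it can hand back the current processing state and simply continue from there next time.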
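A small sketch of the memory argument. It uses a generator expression, the one-line form of a generator, for brevity; exact byte counts from sys.getsizeof() vary between Python versions, so only the comparison is shown. The infinite `evens()` stream is a hypothetical example of something a list could never hold.

```python
import sys
from itertools import count, islice

squares_list = [n * n for n in range(1_000_000)]   # every value is stored up front
squares_gen = (n * n for n in range(1_000_000))    # values are computed on demand

# The list holds a reference to every element; the generator holds only its state.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True
print(sum(squares_gen) == sum(squares_list))                      # True

def evens():
    """An infinite stream: only possible because values are produced lazily."""
    for n in count():
        yield 2 * n

print(list(islice(evens(), 5)))  # [0, 2, 4, 6, 8]
```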

Coroutines

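According to the definition on Wikipedia, "coroutines are computer program components that generalize subroutines for non-preemptive multitasking, allowing multiple entry points for suspending and resuming execution at certain locations". Technically speaking, "a coroutine is a function whose execution you can pause". If you understand it as "just like a generator", you have it right.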

Coroutines are also known as micro-threads or fibers (English: coroutine).

The concept of coroutines was proposed long ago, but only in recent years has it come into wide use in some languages (such as Lua).

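Unlike concurrency models such as multithreading and multiprocessing, coroutines are scheduled in user space, whereas threads and processes are scheduled by the kernel.

Switching between threads or processes requires a transition from user mode to kernel mode, while a coroutine switch happens entirely in user mode. And whereas threads are scheduled preemptively, coroutines are scheduled non-preemptively (cooperatively): a coroutine runs until it voluntarily gives up control.

Usually, all the coroutines belonging to one scheduler run in a single thread, which removes the programming complexity of synchronizing multiple threads. It follows that at any moment only one coroutine in a given scheduler is running.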
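A toy illustration of cooperative, user-space scheduling with plain generators: a round-robin loop resumes each generator in turn, and a generator only gives up control at its own yield. `worker` and `run_round_robin` are hypothetical names; real schedulers such as asyncio's event loop are far more involved.

```python
from collections import deque

def worker(name, steps, log):
    """A 'coroutine' that does one step of work, then voluntarily yields control."""
    for i in range(steps):
        log.append(f'{name}{i}')
        yield                      # hand control back to the scheduler

def run_round_robin(*gens):
    """Resume each generator in turn until all of them have finished."""
    ready = deque(gens)
    while ready:
        gen = ready.popleft()
        try:
            next(gen)              # run one step; the switch happens in user space
        except StopIteration:
            continue               # this coroutine is done; drop it
        ready.append(gen)          # not done yet: back of the queue

log = []
run_round_robin(worker('A', 3, log), worker('B', 2, log))
print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2']
```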

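A common misconception is that coroutines cannot use multiple CPU cores. By running several threads, each with its own scheduler, coroutines can take advantage of multiple cores too (in CPython, though, the GIL keeps threads from executing Python bytecode in parallel, which is why multiple processes are the usual route, as discussed below).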

Definition of coroutines

The earliest description of coroutines was given by Melvin Conway in 1958: "subroutines who act as the master program". The following characterization, usually attributed to Christopher D. Marlin's 1980 doctoral thesis, came later:

· the values of data local to a coroutine persist between successive calls

· the execution of a coroutine is suspended as control leaves it, only to carry on where it left off when control re-enters the coroutine at some later stage

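Coroutines run in a single thread, so what advantage do they have over multithreading?

The biggest advantage is their very high execution efficiency.
- Switching between subroutines is not a thread switch but is controlled by the program itself, so there is no thread-switching overhead.
- Compared with multithreading, the more threads a task would need, the more pronounced the performance advantage of coroutines becomes.
The second major advantage is that no multithreading lock mechanism is needed.
- Because there is only one thread, two threads can never write the same variable at the same moment.
- Shared resources can be managed inside coroutines without locks, simply by checking state, so execution is much more efficient than with multiple threads.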

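Since coroutines run in a single thread, how can they use a multi-core CPU?

The simplest approach is multiple processes plus coroutines: this makes full use of all cores while keeping the efficiency of coroutines, and can achieve very high performance.
Python's coroutine support was originally built on generators; Python 3.5 later added native coroutines with async and await (see the asyncio section below).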

Implementing coroutines with yield

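# Coroutines based on yield: the producer hands each value to the consumer with send()

def consumer():
    '''Task 1: receive data and process it'''
    while True:
        x = yield          # suspend until the producer sends the next value

def producer():
    '''Task 2: produce data'''
    g = consumer()
    next(g)                # prime the consumer: run it up to its first yield
    for i in range(10000000):
        g.send(i)          # switch to the consumer, which receives i

producer()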
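In the example above the consumer receives values but never reports anything back. A slightly fuller sketch (names and the "processing" step are illustrative) shows the two-way hand-off: each send() switches into the consumer, which processes the item and yields a result back to the producer, all in one thread.

```python
def consumer():
    """Receive items through send(), process each one, and hand back a result."""
    result = None
    while True:
        item = yield result          # give back the last result, then wait for the next item
        result = f'consumed {item}'  # "processing" is just string formatting here

def producer(n):
    """Send n items to a consumer and collect what it hands back."""
    c = consumer()
    next(c)                          # prime: run the consumer up to its first yield
    results = []
    for i in range(n):
        results.append(c.send(i))    # switch to the consumer and back; no threads involved
    c.close()                        # raise GeneratorExit inside the consumer to finish it
    return results

print(producer(3))  # ['consumed 0', 'consumed 1', 'consumed 2']
```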

Coroutines implemented with yield from

import datetime
import heapq    # heap queue module
import time

class Task:
    def __init__(self, wait_until, coro):
        self.coro = coro
        self.waiting_until = wait_until

    # comparisons let heapq order tasks by their wake-up time
    def __eq__(self, other):
        return self.waiting_until == other.waiting_until

    def __lt__(self, other):
        return self.waiting_until < other.waiting_until

class SleepingLoop:
    def __init__(self, *coros):
        self._new = coros
        self._waiting = []

    def run_until_complete(self):
        # start every coroutine; each one yields the time it wants to wake up
        for coro in self._new:
            wait_for = coro.send(None)
            heapq.heappush(self._waiting, Task(wait_for, coro))
        # repeatedly resume the coroutine with the earliest wake-up time
        while self._waiting:
            now = datetime.datetime.now()
            task = heapq.heappop(self._waiting)
            if now < task.waiting_until:
                delta = task.waiting_until - now
                time.sleep(delta.total_seconds())
                now = datetime.datetime.now()
            try:
                print('*'*50)
                wait_until = task.coro.send(now)
                print('-'*50)
                heapq.heappush(self._waiting, Task(wait_until, task.coro))
            except StopIteration:
                pass    # this coroutine has finished

def sleep(seconds):
    now = datetime.datetime.now()
    wait_until = now + datetime.timedelta(seconds=seconds)
    print('before yield wait_until')
    actual = yield wait_until   # yield a datetime to the loop; receive the actual wake-up time back
    print('after yield wait_until')
    return actual - now

def countdown(label, length, *, delay=0):
    print(label, 'waiting', delay, 'seconds before starting countdown')
    delta = yield from sleep(delay)
    print(label, 'starting after waiting', delta)
    while length:
        print(label, 'T-minus', length)
        waited = yield from sleep(1)
        length -= 1
    print(label, 'lift-off!')

def main():
    loop = SleepingLoop(countdown('A', 5), countdown('B', 3, delay=2),
                        countdown('C', 4, delay=1))
    start = datetime.datetime.now()
    loop.run_until_complete()
    print('Total elapsed time is', datetime.datetime.now() - start)

if __name__ == '__main__':
    main()

Output:

A waiting 0 seconds before starting countdown

before yield wait_until

B waiting 2 seconds before starting countdown

before yield wait_until

C waiting 1 seconds before starting countdown

before yield wait_until

**************************************************

after yield wait_until

A starting after waiting 0:00:00

A T-minus 5

before yield wait_until

--------------------------------------------------

**************************************************

after yield wait_until

C starting after waiting 0:00:01.001511

C T-minus 4

before yield wait_until

--------------------------------------------------

**************************************************

after yield wait_until

A T-minus 4

before yield wait_until

--------------------------------------------------

**************************************************

after yield wait_until

B starting after waiting 0:00:02.000894

B T-minus 3

before yield wait_until

--------------------------------------------------

**************************************************

after yield wait_until

C T-minus 3

before yield wait_until

--------------------------------------------------

**************************************************

after yield wait_until

A T-minus 3

before yield wait_until

--------------------------------------------------

**************************************************

after yield wait_until

B T-minus 2

before yield wait_until

--------------------------------------------------

**************************************************

after yield wait_until

C T-minus 2

before yield wait_until

--------------------------------------------------

**************************************************

after yield wait_until

A T-minus 2

before yield wait_until

--------------------------------------------------

**************************************************

after yield wait_until

B T-minus 1

before yield wait_until

--------------------------------------------------

**************************************************

after yield wait_until

C T-minus 1

before yield wait_until

--------------------------------------------------

**************************************************

after yield wait_until

A T-minus 1

before yield wait_until

--------------------------------------------------

**************************************************

after yield wait_until

B lift-off!

**************************************************

after yield wait_until

C lift-off!

**************************************************

after yield wait_until

A lift-off!

Total elapsed time is 0:00:05.005168

The asyncio module

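asyncio is a standard library module introduced in Python 3.4 with built-in support for asynchronous I/O.

The @asyncio.coroutine decorator provided by asyncio marks a generator as a coroutine; inside that coroutine, yield from is used to call another coroutine and perform an asynchronous operation. (This generator-based style was deprecated in Python 3.8, and @asyncio.coroutine was removed in Python 3.11.)

The asyncio programming model is a message loop (event loop). We obtain a reference to an EventLoop from the asyncio module and hand the coroutines to be executed to that EventLoop, which gives us asynchronous I/O.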

coroutine+yield from

# Generator-based coroutine: runs only on Python 3.4 to 3.10
# (@asyncio.coroutine was removed in Python 3.11)
import asyncio

@asyncio.coroutine
def hello():
    print("Nice to learn asyncio.coroutine!")
    # call asyncio.sleep(1) asynchronously:
    r = yield from asyncio.sleep(1)
    print("Nice to learn asyncio.coroutine again !")

# get the EventLoop:
loop = asyncio.get_event_loop()
# run the coroutine
loop.run_until_complete(hello())
loop.close()

Nice to learn asyncio.coroutine!
Nice to learn asyncio.coroutine again !

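To simplify asynchronous I/O and mark it more clearly, Python 3.5 introduced the new async and await syntax, which makes coroutine code more concise and readable.

Note that async and await are new syntax for coroutines. To use the new syntax, only two simple replacements are needed:

Replace @asyncio.coroutine with async (placed in front of def, giving async def);
Replace yield from with await.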

async+await

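Inside a coroutine function, await suspends the current coroutine until another awaitable (here, another coroutine) completes and returns its result: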

import asyncio

async def hello():
    print("Nice to learn asyncio.coroutine!")
    # call asyncio.sleep(1) asynchronously:
    await asyncio.sleep(1)
    print("Nice to learn asyncio.coroutine again !")

# asyncio.run() (Python 3.7+) creates an EventLoop, runs the coroutine
# until it completes, and then closes the loop
asyncio.run(hello())

Running multiple tasks

import threading
import asyncio

async def hello():
    print('Hello Python! (%s)' % threading.current_thread())
    await asyncio.sleep(1)
    print('Hello Python again! (%s)' % threading.current_thread())

async def main():
    # gather() runs both coroutines concurrently on the same event loop
    # (passing bare coroutines to asyncio.wait() is rejected since Python 3.11)
    await asyncio.gather(hello(), hello())

asyncio.run(main())

Output:

Hello Python! (<_MainThread(MainThread, started 4536)>)
Hello Python! (<_MainThread(MainThread, started 4536)>)
Hello Python again! (<_MainThread(MainThread, started 4536)>)
Hello Python again! (<_MainThread(MainThread, started 4536)>)
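The output above shows both coroutines starting before either finishes. The sketch below makes the overlap measurable: two jobs sleep for 0.2 s and 0.1 s, and because each await asyncio.sleep() lets the event loop run the other job, the total is about 0.2 s rather than 0.3 s. `job` and the delays are illustrative choices, and the timing comment is approximate.

```python
import asyncio
import time

async def job(name, delay, log):
    log.append(f'{name} start')
    await asyncio.sleep(delay)      # suspend this coroutine; the loop runs the others meanwhile
    log.append(f'{name} end')
    return name

async def main():
    log = []
    start = time.perf_counter()
    results = await asyncio.gather(job('a', 0.2, log), job('b', 0.1, log))
    elapsed = time.perf_counter() - start
    return results, log, elapsed

results, log, elapsed = asyncio.run(main())
print(results)   # ['a', 'b']  (gather returns results in argument order)
print(log)       # ['a start', 'b start', 'b end', 'a end']
print(f'elapsed: {elapsed:.2f}s')  # about 0.2s, since the two sleeps overlap
```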

Getting the return value

import threading
import asyncio

async def hello():
    print('Hello Python! (%s)' % threading.current_thread())
    await asyncio.sleep(1)
    print('Hello Python again! (%s)' % threading.current_thread())
    return "It's done"

async def main():
    task = asyncio.create_task(hello())   # wrap the coroutine in a Task
    await task                            # wait for the Task to finish
    return task.result()                  # the coroutine's return value

ret = asyncio.run(main())
print(ret)

Output:

Hello Python! (<_MainThread(MainThread, started 6136)>)
Hello Python again! (<_MainThread(MainThread, started 6136)>)
It's done

Running multiple tasks and getting their return values

import threading
import asyncio

async def hello(seq):
    print('Hello Python! (%s)' % threading.current_thread())
    await asyncio.sleep(1)
    print('Hello Python again! (%s)' % threading.current_thread())
    return "It's done", seq

async def main():
    task1 = asyncio.create_task(hello(2))
    task2 = asyncio.create_task(hello(1))
    task_list = [task1, task2]
    await asyncio.wait(task_list)   # wait until every task in the list is done
    for t in task_list:
        print(t.result())

asyncio.run(main())

Output:

Hello Python! (<_MainThread(MainThread, started 12956)>)
Hello Python! (<_MainThread(MainThread, started 12956)>)
Hello Python again! (<_MainThread(MainThread, started 12956)>)
Hello Python again! (<_MainThread(MainThread, started 12956)>)
("It's done", 2)
("It's done", 1)
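asyncio.wait() itself returns two sets, the finished tasks and the still-pending ones. Because sets are unordered, reading results through the original task list, as the example above does, keeps them in a predictable order. A sketch with shortened sleeps (an assumption made so it runs quickly):

```python
import asyncio

async def hello(seq):
    await asyncio.sleep(0.01 * seq)     # shortened sleep for the sketch
    return "It's done", seq

async def main():
    tasks = [asyncio.create_task(hello(2)), asyncio.create_task(hello(1))]
    done, pending = await asyncio.wait(tasks)   # two sets: finished and unfinished tasks
    # `done` is an unordered set, so read results through the original list
    results = [t.result() for t in tasks]
    return results, len(done), len(pending)

results, n_done, n_pending = asyncio.run(main())
print(results)              # [("It's done", 2), ("It's done", 1)]
print(n_done, n_pending)    # 2 0
```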

巴特西

Python Generators and Coroutines