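Advanced Python: Generators and Iterators

The Iterator Protocol

Iterators are one of the most easily overlooked yet most powerful features in Python. Understanding the iterator protocol is a prerequisite for mastering generators.

Any Python object that implements both the __iter__() and __next__() methods is an iterator. (An object that implements only __iter__() is an iterable, not an iterator.)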

python

class CountDown:
    """Countdown iterator."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self  # an iterator's __iter__ returns the iterator itself

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the end of iteration
        value = self.current
        self.current -= 1
        return value

# Usage
countdown = CountDown(5)
for num in countdown:
    print(num)
# Output (one number per line): 5 4 3 2 1

The core principle of iterators is lazy evaluation: the next value is computed only when it is requested, which saves memory.
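To make the protocol concrete, the sketch below shows roughly what a for loop does behind the scenes: it calls iter() once, then calls next() repeatedly until StopIteration is raised. The helper name manual_for is made up for this illustration and is not part of Python.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next() directly."""
    iterator = iter(iterable)      # calls iterable.__iter__()
    items = []
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:      # raised once the iterator is exhausted
            break
        items.append(item)
    return items

numbers = [3, 2, 1]
print(manual_for(numbers))  # [3, 2, 1]

# A list is iterable but is not itself an iterator:
# iter() hands back a separate iterator object.
it = iter(numbers)
print(it is numbers)        # False
print(iter(it) is it)       # True: an iterator's __iter__ returns itself
```

The last two lines show the distinction between an iterable (something iter() accepts) and an iterator (the object that actually tracks the position).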

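Generator Functions

A generator is a special kind of iterator written with ordinary function syntax, which is far more concise than a hand-written iterator class. The key is the yield keyword: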

python

def fibonacci(n):
    """Yield the first n Fibonacci numbers."""
    a, b = 0, 1
    for _ in range(n):
        yield a      # pause the function and hand back the current value
        a, b = b, a + b

# Print the first 10 Fibonacci numbers
for num in fibonacci(10):
    print(num, end=' ')
# Output: 0 1 1 2 3 5 8 13 21 34

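How yield differs from return:

| Aspect | `return` | `yield` |
| --- | --- | --- |
| Function type | Regular function | Generator function |
| Output | Returned all at once | Produced one value at a time |
| State | Function exits | Pauses and keeps its state |
| Result of calling | The return value itself | A generator object |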
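The difference between exiting and pausing can be observed by stepping a generator by hand. The following is a small sketch; the names two_steps and log are made up for the example.

```python
log = []

def two_steps():
    """A generator that records when each part of its body runs."""
    log.append("started")
    yield 1
    log.append("resumed")
    yield 2
    return "done"   # ends up as the value attribute of StopIteration

gen = two_steps()
print(log)          # []  (calling the function runs none of its body)
print(next(gen))    # 1
print(log)          # ['started']
print(next(gen))    # 2
print(log)          # ['started', 'resumed']

try:
    next(gen)
except StopIteration as stop:
    result = stop.value
print(result)       # done
```

Calling two_steps() only creates the generator object; the body runs piece by piece, one next() call at a time.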

Generator Expressions

They look like list comprehensions but use parentheses, and they return a generator rather than a list:

python

# List comprehension: evaluated immediately, all elements stored in memory
squares_list = [x**2 for x in range(10)]

# Generator expression: evaluated lazily, values produced on demand
squares_gen = (x**2 for x in range(10))

print(squares_list)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(squares_gen)   # <generator object <genexpr> at 0x...>

# Pulling values out
print(next(squares_gen))  # 0
print(next(squares_gen))  # 1
print(list(squares_gen))  # [4, 9, 16, 25, 36, 49, 64, 81]

The advantage of generator expressions is a very small memory footprint, which makes them well suited to large data streams:

python

# Sum of the cubes of one million numbers: a list would store all one million values, while the generator holds only one value at a time
total = sum(x**3 for x in range(1_000_000))
print(total)
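The memory claim can be checked roughly with sys.getsizeof. This is only a sketch, assuming CPython: getsizeof reports the size of the container object itself (for a list, its array of references), not of the integers it refers to, so the real gap is larger than what it shows.

```python
import sys

n = 100_000
squares_list = [x * x for x in range(n)]
squares_gen = (x * x for x in range(n))

list_size = sys.getsizeof(squares_list)  # grows with n
gen_size = sys.getsizeof(squares_gen)    # small and independent of n

print(list_size > gen_size)              # True

# Both produce the same values...
total_from_gen = sum(squares_gen)
print(total_from_gen == sum(squares_list))  # True

# ...but a generator can be consumed only once
leftover = list(squares_gen)
print(leftover)                          # []
```

The trade-off is that a generator is single-pass: if the values are needed more than once, either rebuild the generator or store the results in a list.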

Advanced Generator Usage

The send() Method

A generator can receive data from outside through send(), which enables two-way communication:

python

def coro():
    """A coroutine-style generator."""
    print("coro started")
    while True:
        received = yield  # a yield expression evaluates to the value passed to send()
        print(f"Received: {received}")

c = coro()
next(c)           # prime the coroutine; prints "coro started"
c.send("Hello")   # prints "Received: Hello"
c.send("World")   # prints "Received: World"
c.close()         # shut the coroutine down
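send() also returns whatever the generator yields next, so values can flow in both directions in a single call. A minimal sketch, using a made-up running_average generator:

```python
def running_average():
    """Yield the running mean of every value sent in so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # hand back the current mean, then wait for the next value
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)              # prime the generator: run it to the first yield
print(avg.send(10))    # 10.0
print(avg.send(20))    # 15.0
print(avg.send(30))    # 20.0
avg.close()
```

The initial next() call is required because a freshly created generator has not yet reached a yield that could receive a value.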

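throw() and close()

throw() raises an exception inside the generator at the yield where it is paused; close() shuts the generator down early.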

python

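def gen():
    try:
        yield 1
        yield 2
    except ValueError:
        print("Caught ValueError")
    # The function returns here, which ends the generator

g = gen()
print(next(g))  # 1

try:
    # Raises ValueError at the paused "yield 1"; the except block prints "Caught ValueError".
    # The generator then finishes without yielding again, so throw() itself raises StopIteration.
    g.throw(ValueError)
except StopIteration:
    print("generator finished")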
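The example above exercises only throw(). As a complementary sketch, close() raises GeneratorExit at the paused yield, which gives a finally block the chance to release resources. The names resource_user and events are made up for this illustration.

```python
def resource_user(events):
    """Yield numbers forever; record cleanup when the generator is closed."""
    try:
        n = 0
        while True:
            yield n
            n += 1
    finally:
        # Runs when close() raises GeneratorExit at the paused yield
        events.append("cleaned up")

events = []
g = resource_user(events)
print(next(g))   # 0
print(next(g))   # 1
g.close()        # triggers the finally block
print(events)    # ['cleaned up']

try:
    next(g)
except StopIteration:
    print("a closed generator is exhausted")
```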

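The itertools Module

itertools is the standard library's toolkit for working with iterators. It provides three broad groups of functions: infinite iterators, finite iterators, and combinatoric iterators.

Infinite Iterators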

python

import itertools

# count(start, step) - count upward forever
for i in itertools.count(10, 2):
    if i > 20:
        break
    print(i, end=' ')
# Output: 10 12 14 16 18 20

# cycle(iterable) - repeat the sequence forever
colors = itertools.cycle(['red', 'green', 'blue'])
for i in range(7):
    print(next(colors), end=' ')
# Output: red green blue red green blue red

# repeat(elem, n) - repeat one element n times (forever if n is omitted)
for i in itertools.repeat("⚡", 3):
    print(i, end=' ')
# Output: ⚡ ⚡ ⚡
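Because these iterators never end on their own, something else has to bound them. Besides an explicit break, a few common options are sketched below; takewhile is another itertools function not covered above.

```python
import itertools

# islice takes a fixed number of items from an endless stream
first_five_even = list(itertools.islice(itertools.count(0, 2), 5))
print(first_five_even)   # [0, 2, 4, 6, 8]

# takewhile stops as soon as the predicate fails
below_twenty = list(itertools.takewhile(lambda x: x < 20, itertools.count(10, 3)))
print(below_twenty)      # [10, 13, 16, 19]

# zip stops at its shortest input, so pairing with a finite sequence also bounds cycle()
labels = list(zip(range(4), itertools.cycle("ab")))
print(labels)            # [(0, 'a'), (1, 'b'), (2, 'a'), (3, 'b')]
```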

Finite Iterators

python

import itertools

# accumulate(iterable, func) - running totals (addition by default)
data = [1, 2, 3, 4, 5]
result = list(itertools.accumulate(data))
print(result)  # [1, 3, 6, 10, 15]

# Custom accumulation function (running product)
result = list(itertools.accumulate(data, lambda a, b: a * b))
print(result)  # [1, 2, 6, 24, 120]

# chain(*iterables) - join several iterables end to end
list1 = [1, 2, 3]
list2 = [4, 5]
list3 = [6]
combined = list(itertools.chain(list1, list2, list3))
print(combined)  # [1, 2, 3, 4, 5, 6]

# compress(data, selectors) - keep the items whose selector is true
letters = 'ABCDEF'
flags = [True, False, True, False, True, True]
filtered = list(itertools.compress(letters, flags))
print(filtered)  # ['A', 'C', 'E', 'F']

# islice(iterable, start, stop, step) - slice any iterable lazily
numbers = range(20)
sliced = list(itertools.islice(numbers, 5, 15, 2))
print(sliced)  # [5, 7, 9, 11, 13]
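Each of these functions returns a lazy iterator, so they can be composed into a pipeline without building intermediate lists. A small sketch with made-up sample data:

```python
import itertools

readings_a = [3, 8, 1]
readings_b = [7, 2, 9]

merged = itertools.chain(readings_a, readings_b)   # 3 8 1 7 2 9
running_max = itertools.accumulate(merged, max)    # 3 8 8 8 8 9
first_four = itertools.islice(running_max, 4)      # 3 8 8 8

# Nothing has been computed yet; list() pulls the values through the pipeline
result = list(first_four)
print(result)  # [3, 8, 8, 8]
```

Only the four values actually requested are ever computed; the last two readings are never pulled through chain().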

Combinatoric Iterators

python

import itertools

# product(*iterables, repeat) - Cartesian product
colors = ['red', 'green']
sizes = ['S', 'M', 'L']
for item in itertools.product(colors, sizes):
    print(item)
# ('red', 'S') ('red', 'M') ('red', 'L')
# ('green', 'S') ('green', 'M') ('green', 'L')

# permutations(iterable, r) - ordered arrangements of length r
for perm in itertools.permutations('ABC', 2):
    print(perm, end=' ')
# ('A', 'B') ('A', 'C') ('B', 'A') ('B', 'C') ('C', 'A') ('C', 'B')

# combinations(iterable, r) - unordered selections of length r, no element repeated
for comb in itertools.combinations('ABCD', 2):
    print(comb, end=' ')
# ('A', 'B') ('A', 'C') ('A', 'D') ('B', 'C') ('B', 'D') ('C', 'D')

# combinations_with_replacement(iterable, r) - combinations in which elements may repeat
for comb in itertools.combinations_with_replacement('AB', 3):
    print(comb, end=' ')
# ('A', 'A', 'A') ('A', 'A', 'B') ('A', 'B', 'B') ('B', 'B', 'B')
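The number of results each function produces follows the standard counting formulas, which is a quick way to sanity-check output sizes before iterating over a large space. A sketch, assuming Python 3.8 or later for math.perm and math.comb:

```python
import itertools
import math

items = "ABCD"
n, r = len(items), 2

n_perm = len(list(itertools.permutations(items, r)))
n_comb = len(list(itertools.combinations(items, r)))
n_cwr = len(list(itertools.combinations_with_replacement(items, r)))
n_prod = len(list(itertools.product(items, repeat=r)))

print(n_perm == math.perm(n, r))         # True: 4 * 3 = 12
print(n_comb == math.comb(n, r))         # True: 6
print(n_cwr == math.comb(n + r - 1, r))  # True: C(5, 2) = 10
print(n_prod == n ** r)                  # True: 4 ** 2 = 16
```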

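Case Study: Processing a Large File in Batches

A classic use case for generators is reading a large file in batches, so the whole file never has to sit in memory at once: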

python

import itertools

def read_large_file(filepath, batch_size=1000):
    """Read a large file in batches, yielding a list of up to batch_size lines each time."""
    with open(filepath, 'r', encoding='utf-8') as f:
        while True:
            batch = list(itertools.islice(f, batch_size))
            if not batch:
                break
            yield batch

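def process(batch):
    """Placeholder handler (hypothetical); replace with real per-batch logic."""
    print(f"processing {len(batch)} lines")

# Example usage (assumes large_log.txt exists in the working directory)
for batch in read_large_file('large_log.txt', batch_size=100):
    process(batch)  # handle one batch at a time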
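The same batching idea works for any iterable, not only files, which also makes it easy to try without a real file on disk. The sketch below uses a made-up helper, batched_lines, and an in-memory io.StringIO as a stand-in for an open file.

```python
import io
import itertools

def batched_lines(lines, batch_size):
    """Yield lists of up to batch_size items from any iterable of lines."""
    iterator = iter(lines)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch

fake_file = io.StringIO("a\nb\nc\nd\ne\n")
for batch in batched_lines(fake_file, batch_size=2):
    print([line.rstrip("\n") for line in batch])
# ['a', 'b']
# ['c', 'd']
# ['e']
```

The final batch may be shorter than batch_size. On Python 3.12 and later, itertools.batched offers similar behavior out of the box, yielding tuples rather than lists.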

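Summary

Iterator protocol: __iter__() + __next__(), lazy evaluation
Generator functions: the yield keyword, implements the iterator protocol automatically
Generator expressions: the () syntax, suited to simple cases
itertools: a Swiss Army knife for iterators, covering infinite, finite, and combinatoric iterators
Core value of generators: lower memory use, handling large data streams, lazy computation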

[[Back to Python home|python/index]]
