Bulk-downloading Kindle e-books

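The Kindle e-bookstore in China stopped operating at 18:00 on June 30, 2023, which leaves us only one year (until June 30, 2024) to download the e-books we have already purchased.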

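For readers in China this is a sad, symbolic event that we can do little about, but we still have to face reality:

Bulk-download the e-books already purchased
Remove the DRM (see "Removing DRM from Kindle documents with Calibre") and move the e-books to other platforms to keep reading them. I plan to move some of them to my Amazon.com (US) account through Send to Kindle, and to archive and read the rest in Apple iBook and Google Books as described in "Reading e-books in the post-Kindle era".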

Bulk download

Prepare a Python virtualenv (macOS):

Install pip3 and venv on macOS

# Installing Python 3 also installs pip3
brew install python3

Initialize the venv

cd ~
python3 -m venv venv3

Activate the venv

# bash uses activate
source venv3/bin/activate

# csh uses activate.csh
# source venv3/bin/activate.csh
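
Before installing anything with pip3, it can be worth confirming that the venv is really active. The following is a minimal sketch, not part of Kindle_download_helper; the function name in_virtualenv is made up for illustration. It relies on the documented behavior that sys.prefix differs from sys.base_prefix inside a venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("inside venv:", in_virtualenv())
    print("sys.prefix :", sys.prefix)
```

If the first line prints False after running source venv3/bin/activate, the shell is most likely still picking up the system python3.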

Clone Kindle_download_helper (Github), then install the required modules with pip:

Install Kindle_download_helper

git clone git@github.com:yihong0618/Kindle_download_helper.git
cd Kindle_download_helper
pip3 install -r requirements.txt

Run the following command to download your purchased books and remove the DRM at the same time:

Run kindle_download_helper to download books

python3 kindle.py --dedrm --cn  # --dedrm removes the DRM

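By default, if you are already logged in to the Amazon site in a browser, the tool uses the browser-cookie3 library to read the cookie from that browser automatically. If you are not downloading on that same machine, you have to obtain the cookie by hand; see Kindle_download_helper (Github) for details.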

Even so, I ran into an error saying the CSRF token could not be obtained:

Error when running kindle_download_helper to download books: csrf token not found

...
Exception: Can't get the csrf token, please refresh the page at https://www.amazon.cn/hz/mycd/myx#/home/content/booksAll and retry

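Following the Kindle_download_helper (Github) help, get the CSRF token from the Amazon.cn all-books page: view the page source, search for csrfToken, and pass the value on the command line as ${csrfToken}:

Run kindle_download_helper to download books (with device_sn and csrfToken)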

csrfToken="XXXXXXXX"
device_sn="YYYYYY"

python3 kindle.py --device_sn "${device_sn}" --dedrm --cn "${csrfToken}"
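
Searching the page source for csrfToken by hand can also be scripted once the source is saved to a file. This is a rough sketch under an assumption: that the token appears as csrfToken = "..." or "csrfToken":"..." in the page source. The real page may use a different shape, and find_csrf_token is a hypothetical helper, not something Kindle_download_helper provides.

```python
import re
from typing import Optional

# Assumed shapes (not confirmed against the live page):
#   var csrfToken = "....";      or      "csrfToken":"...."
CSRF_RE = re.compile(r"""csrfToken["']?\s*[:=]\s*["']([^"']+)["']""")


def find_csrf_token(page_source: str) -> Optional[str]:
    """Return the first csrfToken value found in the page source, or None."""
    match = CSRF_RE.search(page_source)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = 'var csrfToken = "XXXXXXXX";'
    print(find_csrf_token(sample))
```

When the function returns None, the saved page probably came from a session that was not logged in, or the assumed pattern does not match, and searching by hand is the fallback.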

I still hit an error, the same one reported in issue #139 (an error running python kindle.py --pdoc --mode sel --cn --cookie ${cookie} ${csrf}):

Error when running kindle_download_helper to download books (with device_sn and csrfToken)

Traceback (most recent call last):
  File "/Users/huataihuang/docs/github.com/yihong0618/Kindle_download_helper/kindle.py", line 5, in <module>
    main()
  File "/Users/huataihuang/docs/github.com/yihong0618/Kindle_download_helper/kindle_download_helper/cli.py", line 318, in main
    kindle.download_books(
  File "/Users/huataihuang/docs/github.com/yihong0618/Kindle_download_helper/kindle_download_helper/kindle.py", line 529, in download_books
    device = self.find_device()
             ^^^^^^^^^^^^^^^^^^
  File "/Users/huataihuang/docs/github.com/yihong0618/Kindle_download_helper/kindle_download_helper/kindle.py", line 130, in find_device
    devices = self.get_devices()
              ^^^^^^^^^^^^^^^^^^
  File "/Users/huataihuang/docs/github.com/yihong0618/Kindle_download_helper/kindle_download_helper/kindle.py", line 224, in get_devices
    r.raise_for_status()
  File "/Users/huataihuang/venv3/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: https://www.amazon.cn/hz/mycd/ajax

The traceback points into kindle_download_helper/kindle.py, where in:

def download_books(self, start_index=0, filetype="EBOK"):
    # use default device
    device = self.find_device()

the call that looks up the default device fails.

Following the traceback further leads to the get_devices function:

get_devices snippet from kindle.py

    def get_devices(self):
        """
        This method must be called before each download, so we ensure
        the session cookies before it is called
        """
        self.ensure_cookie_token()

        payload = {"param": {"GetDevices": {}}}
        r = self.session.post(
            self.urls["payload"],
            data={
                "data": json.dumps(payload),
                "csrfToken": self.csrf_token,
            },
        )
        r.raise_for_status()
        devices = r.json()
        if devices.get("error"):
            self.revoke_cookie_token(open_page=True)
            raise Exception(
                f"Error: {devices.get('error')}, please visit {self.urls['bookall']} to revoke the csrftoken and cookie"
            )
        devices = r.json()["GetDevices"]["devices"]
        ...

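get_devices depends on the session cookie: it calls ensure_cookie_token() before posting the request. This suggests that the cookie browser-cookie3 read from the browser automatically was the problem. So, following the Kindle_download_helper (Github) help, I took the cookie from the browser page instead: on the Amazon.cn all-books page, press F12 and open the Network panel. In the Name column find any ajax request, right-click it and choose Copy request headers. In the copied text find the line Cookie: ZZZ, then run the following command: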

Run kindle_download_helper to download books (with the manually obtained cookie and csrfToken)

cookie="ZZZ"
csrfToken="XXXXXXXX"
device_sn="YYYYYY"

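# Quote the cookie: it contains spaces and semicolons, and an unquoted
# expansion would be split into several arguments by the shell
python3 kindle.py --device_sn "${device_sn}" --dedrm --cn --cookie "${cookie}" "${csrfToken}"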
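
Picking the Cookie: line out of the copied request headers is easy to get wrong by hand, so here is a small sketch of that step. It assumes the copied text has one "Name: value" header per line. extract_cookie is a hypothetical helper for illustration only.

```python
from typing import Optional


def extract_cookie(raw_headers: str) -> Optional[str]:
    """Return the value of the Cookie header from copied request headers."""
    for line in raw_headers.splitlines():
        # Split on the first colon only, so colons inside the value survive.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "cookie":
            return value.strip()
    return None


if __name__ == "__main__":
    copied = "Host: www.amazon.cn\nCookie: ZZZ\nAccept: */*"
    print(extract_cookie(copied))
```

The returned string is what goes into the cookie variable above. It should always be passed in quotes because it normally contains spaces and semicolons.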

Notes

The built-in DeDRM step failed for me, so I still removed the DRM as described in "Removing DRM from Kindle documents with Calibre".

References

Kindle_download_helper (Github)