A record of MySQL database connection exhaustion caused by Celery in a Python project [MySQL connection pool misuse]
Conclusion: the project uses Celery for multi-task management, and the worker tasks were written to connect to MySQL through a dbutils PooledDB connection pool. Because every worker process creates its own pool, concurrent execution of worker tasks produced a huge number of database connections and eventually exhausted the MySQL connection limit.
In this Celery project I ran 20 worker nodes, each with 32 prefork child processes, and each child process created its own connection pool with a PooledDB maximum of 200 connections. In the worst case that is 20 × 32 × 200 = 128,000 potential database connections.
I was using an Alibaba Cloud RDS instance (8 cores, 16 GB), and the connection limit at that specification is 1,600. So every time a large batch of database-touching tasks ran concurrently, the connections were exhausted and every service depending on that database went down.
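A quick sanity check of those numbers (a minimal sketch using only the figures quoted above):

```python
# Worst-case connection math for the deployment described above.
workers = 20          # Celery worker nodes
children = 32         # prefork child processes per node
pool_max = 200        # PooledDB maxconnections, one pool per child process
rds_limit = 1600      # Alibaba Cloud RDS 8C16G connection limit

worst_case = workers * children * pool_max
print(worst_case)              # 128000
print(worst_case / rds_limit)  # 80.0 -- eighty times what the RDS allows
```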
Below is the relevant database connection code from the project:
```python
import pymysql
from pymysql.cursors import SSDictCursor
from dbutils.pooled_db import PooledDB

# conf is assumed to be loaded elsewhere in the project
try:
    _pool: PooledDB = PooledDB(
        creator=pymysql,
        mincached=0,
        maxcached=10,
        blocking=True,
        maxconnections=200,  # max connections the pool allows; 0 or None means unlimited
        maxshared=100,       # max shared connections (default 0 or None means all connections are dedicated)
        maxusage=10,
        host=conf["host"], port=conf["port"], user=conf["user"], passwd=conf["pwd"],
        db=conf["dbname"], use_unicode=True, charset='utf8mb4',
        cursorclass=SSDictCursor,
        setsession=['SET AUTOCOMMIT = 1'])
except Exception as e:
    raise e
```
The project's Celery uses the default prefork multi-process mode, but PooledDB does not behave as intended under prefork-style multiprocessing: the pool object cannot be shared across processes, so every forked child ends up with its own pool and its own maxconnections budget.
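A toy illustration of that claim (my own sketch, no real database involved): a pool-like object created before the fork is copied into every child, so any per-pool cap applies per child rather than globally.

```python
import os
import multiprocessing

pool_state = {"connections": 0}  # stands in for a PooledDB instance

def grab_connection(_):
    # each prefork child mutates its own private copy of pool_state
    pool_state["connections"] += 1
    return os.getpid(), pool_state["connections"]

if __name__ == "__main__":
    # maxtasksperchild=1 gives every task a freshly forked child
    with multiprocessing.Pool(4, maxtasksperchild=1) as pool:
        print(pool.map(grab_connection, range(4)))  # each child reports 1
    print("parent still sees:", pool_state["connections"])  # 0 -- never shared
```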
Excerpt from the Celery concurrency documentation
Overview of Concurrency Options
- prefork: The default option, ideal for CPU-bound tasks and most use cases. It is robust and recommended unless there’s a specific need for another model.
- eventlet and gevent: Designed for IO-bound tasks, these models use greenlets for high concurrency. Note that certain features, like soft_timeout, are not available in these modes. These have detailed documentation pages linked below.
- solo: Executes tasks sequentially in the main thread.
- threads: Utilizes threading for concurrency, available if the concurrent.futures module is present.
- custom: Enables specifying a custom worker pool implementation through environment variables.
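For context, the two factors that multiply the connection math above, the pool type and the number of child processes, are set in the worker configuration. A minimal sketch (proj and the broker URL are placeholders):

```python
from celery import Celery

app = Celery("proj", broker="redis://localhost:6379/0")  # placeholder broker
app.conf.worker_pool = "prefork"   # the default pool that PooledDB clashes with
app.conf.worker_concurrency = 32   # prefork child processes per worker node
# equivalent CLI: celery -A proj worker --pool=prefork --concurrency=32
```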
Excerpt from the DBUtils PooledDB documentation
Notes
If you are using one of the popular object-relational mappers SQLObject or SQLAlchemy, you won’t need DBUtils, since they come with their own connection pools. SQLObject 2 (SQL-API) is actually borrowing some code from DBUtils to split the pooling out into a separate layer.
Also note that when you are using a solution like the Apache webserver with mod_python or mod_wsgi, then your Python code will be usually run in the context of the webserver’s child processes. So if you are using the pooled_db module, and several of these child processes are running, you will have as much database connection pools. If these processes are running many threads, this may still be a reasonable approach, but if these processes don’t spawn more than one worker thread, as in the case of Apache’s “prefork” multi-processing module, this approach does not make sense. If you’re running such a configuration, you should resort to a middleware for connection pooling that supports multi-processing, such as pgpool or pgbouncer for the PostgreSQL database.
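pgpool and pgbouncer target PostgreSQL; for this MySQL + Celery setup, one direction that keeps the total bounded (a sketch of the idea, not the project's actual fix; the connection parameters are placeholders) is to build a much smaller pool inside each prefork child via Celery's worker_process_init signal:

```python
import pymysql
from dbutils.pooled_db import PooledDB
from celery.signals import worker_process_init

_pool = None  # one small pool per prefork child process

@worker_process_init.connect
def init_db_pool(**kwargs):
    """Runs once in every prefork child, after the fork."""
    global _pool
    _pool = PooledDB(
        creator=pymysql,
        mincached=0,
        maxcached=1,
        blocking=True,     # queue for a free connection instead of opening more
        maxconnections=2,  # 20 nodes * 32 children * 2 = 1280 < the 1600 RDS limit
        host="...", port=3306, user="...", passwd="...", db="...",  # placeholders
        charset='utf8mb4',
    )
```

With blocking=True each child waits on its own small pool rather than opening new connections, so the global worst case is bounded by nodes × children × maxconnections.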
References:
- DBUtils official documentation
- Celery official documentation