torchpipe is an alternative to Triton Inference Server, offering similar functionality such as shared memory, Ensemble, and BLS mechanisms.

For serving scenarios, TorchPipe is designed to support multi-instance deployment, pipeline parallelism, adaptive batching, GPU-accelerated operators, and reduced head-of-line (HOL) blocking. It acts as a bridge between lower-level acceleration libraries (e.g., TensorRT, OpenCV, CVCUDA) and RPC frameworks (e.g., Thrift). At its core, it is an engine that enables programmable scheduling.
- [20260123] Available on PyPI: `pip install torchpipe`
- [20260104] We switched to tvm_ffi to provide clearer C++-Python interaction.
Below are some usage examples; for more, check out the examples.
```python
from torchpipe import pipe
import torch
from torchvision.models.resnet import resnet101

# create a regular PyTorch model
model = resnet101(pretrained=True).eval().cuda()

# export the model to ONNX with a dynamic batch axis
model_path = "./resnet101.onnx"
x = torch.ones((1, 3, 224, 224)).cuda()
torch.onnx.export(model, x, model_path, opset_version=17,
                  input_names=['input'], output_names=['output'],
                  dynamic_axes={'input': {0: 'batch_size'},
                                'output': {0: 'batch_size'}})
```
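Optionally, the exported file can be sanity-checked with the `onnx` package before it is handed to TensorRT. This is a minimal sketch, not part of the torchpipe API; it assumes `onnx` is installed (`pip install onnx`):

```python
import onnx

# load the exported graph and verify it is structurally valid;
# check_model raises an exception if the export is broken
onnx_model = onnx.load(model_path)
onnx.checker.check_model(onnx_model)
```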
```python
thread_safe_pipe = pipe({
    "preprocessor": {
        "backend": "S[DecodeTensor,ResizeTensor,CvtColorTensor,SyncTensor]",
        # "backend": "S[DecodeMat,ResizeMat,CvtColorMat,Mat2Tensor,SyncTensor]",
        'instance_num': 2,
        'color': 'rgb',
        'resize_h': '224',
        'resize_w': '224',
        'next': 'model',
    },
    "model": {
        "backend": "SyncTensor[TensorrtTensor]",
        "model": model_path,
        "model::cache": model_path.replace(".onnx", ".trt"),
        "max": '4',              # maximum batch size
        'batching_timeout': 4,   # ms, timeout for adaptive batching
        'instance_num': 2,
        'mean': "123.675, 116.28, 103.53",
        'std': "58.395, 57.120, 57.375",  # normalization is merged into the TRT engine
    },
})
```

We can execute the returned `thread_safe_pipe` just like the original PyTorch model, but in a thread-safe manner.
```python
data = {'data': open('/path/to/img.jpg', 'rb').read()}
thread_safe_pipe(data)  # <-- this is thread-safe
result = data['result']
```
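Because the call is thread-safe, multiple threads can issue requests concurrently, which is what gives the scheduler a chance to batch them adaptively. A minimal sketch (the thread count and image paths are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor

def infer(path):
    # each request carries its own dict; the result is written back into it
    req = {'data': open(path, 'rb').read()}
    thread_safe_pipe(req)
    return req['result']

# placeholder paths; concurrent requests allow adaptive batching to take effect
paths = ['/path/to/img.jpg'] * 8
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(infer, paths))
```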
- NGC Docker containers (recommended): tested on 25.05, 25.06, 24.05, 23.05
```bash
img_name=nvcr.io/nvidia/pytorch:25.05-py3
docker run --rm --gpus all -it --network host \
    -v $(pwd):/workspace/ --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -w /workspace/ \
    $img_name \
    bash
pip install torchpipe
python -c "import torchpipe"
```

The backends it introduces will be JIT-compiled and cached.
There is one core backend group (torchpipe_core) and three optional groups (torchpipe_opencv, torchpipe_nvjpeg, and torchpipe_tensorrt), each with different dependencies. For details, see here.
Dependencies such as OpenCV and TensorRT can also be provided in the following ways:
- Environment variables: specify paths via `OPENCV_INCLUDE`, `OPENCV_LIB`, `TENSORRT_INCLUDE`, and `TENSORRT_LIB`, as in the sketch below.

See Basic Usage.
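For example, a minimal sketch that points torchpipe at local installs. The paths are placeholders, and it assumes the variables are read when the backends are first JIT-compiled; exporting them in the shell before launching Python works the same way:

```python
import os

# placeholder install locations; adjust to your system
os.environ["OPENCV_INCLUDE"] = "/usr/local/include/opencv4"
os.environ["OPENCV_LIB"] = "/usr/local/lib"
os.environ["TENSORRT_INCLUDE"] = "/opt/TensorRT/include"
os.environ["TENSORRT_LIB"] = "/opt/TensorRT/lib"

import torchpipe  # assumed: JIT compilation picks up the paths above
```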
WIP