Computationally intensive applications, such as OpenCV, can be off-loaded to accelerators to reduce execution time. However, developing an accelerated system requires a significant amount of time, requiring the developer to first choose an accelerator and which parts to off-load, then to port and the offloaded kernels to the accelerator using many accelerator-specific tools. In addition to the low-level parallelism of the accelerator, the developer also needs to extract and utilize systemlevel parallelism found within the application, while making sure that the application still executes correctly. This paper presents Courier, a toolchain and a domain specific language for Runtime Binary Acceleration, designed to simplify many of the steps involved in accelerating an application. The Courier toolchain can extract dataflow from a running software binary file, explore the off-loaded execution time on an accelerator, and then actually accelerate the original binary. By utilizing Courier, both expert and non-expert users can easily extract systemlevel parallelism and decide which part should be off-loaded to accelerators in a mixed software-hardware environment, without special knowledge on the target application source code and accelerator architecture. In a case study an OpenCV application is accelerated by 2.06 times using Courier, without requiring the application source code or any re-compilation of the application.