There is a GitHub issue (which I can no longer find) with more detail, but in general there will always be some latency for each task run that goes through the API backend, because every state change has to be reported to the Prefect API. That means each task run makes at least 3 API calls.
There are some things you could do in Prefect <= 1.0 to speed up execution, e.g. batching several tasks together to limit the number of API calls, but then you lose observability into the individual steps. So that's the tradeoff you need to consider.
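To make the tradeoff concrete, here is a rough back-of-the-envelope sketch in plain Python (no Prefect imports). It only uses the "at least 3 API calls per task run" lower bound mentioned above. The function names, the batch size, and the per-call latency are hypothetical values I picked for illustration, not measurements or Prefect settings.

```python
# Lower bound from the explanation above: each task run makes at least
# 3 API calls. The real number can be higher.
MIN_API_CALLS_PER_TASK_RUN = 3


def min_api_calls(num_task_runs: int) -> int:
    """Minimum number of API calls for the given number of task runs."""
    return num_task_runs * MIN_API_CALLS_PER_TASK_RUN


def batched_task_runs(num_items: int, batch_size: int) -> int:
    """Number of task runs if each task run processes `batch_size` items."""
    return -(-num_items // batch_size)  # ceiling division


def min_overhead_seconds(num_task_runs: int, latency_per_call_s: float) -> float:
    """Minimum API overhead, assuming a fixed latency for every call."""
    return min_api_calls(num_task_runs) * latency_per_call_s


if __name__ == "__main__":
    items = 1000          # hypothetical workload: 1000 items to process
    batch_size = 50       # hypothetical batch size
    latency = 0.05        # hypothetical 50 ms per API call

    unbatched_runs = items                                # one task run per item
    batched_runs = batched_task_runs(items, batch_size)   # one task run per batch

    print("unbatched:", min_api_calls(unbatched_runs), "API calls,",
          min_overhead_seconds(unbatched_runs, latency), "s minimum overhead")
    print("batched:  ", min_api_calls(batched_runs), "API calls,",
          min_overhead_seconds(batched_runs, latency), "s minimum overhead")
```

With batching, the backend only sees one task run per batch, so a failure inside a batch shows up as a single failed task run rather than pointing at the exact item. That is the observability you give up.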
I would be curious to hear whether you have tried Orion. It's really fast, built async-first, and even your local execution gets recorded in the backend without you having to do anything. So Orion may be the solution to the issue you're seeing.