Downloading 500,000 Amazon URLs per hour
How are these large retail arbitrage companies managing to download 500,000 URLs per hour?
DSM Tool, for example, reprices every hour. It has 10,000 users, and let's conservatively say those users have a combined 500,000 URLs. Any ideas on how they are doing this while keeping it cost-efficient?
Here is an example.
A Linode box that costs $10.00/month can manage 2 Selenium processes concurrently. Each Selenium process can download at a rate of about 1.333 URLs/minute. So,
for $10.00 a month, you can download ~160 URLs per hour.
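To make the cost problem concrete, here is a quick back-of-the-envelope calculation using the numbers above (2 processes per box, ~1.333 URLs/min each, $10/month per box). The constants are just restating the assumptions from this post, not measured figures:

```python
# Throughput/cost sketch based on the numbers in the post above.
PROCESSES_PER_BOX = 2             # concurrent Selenium processes per $10/mo Linode
URLS_PER_MIN_PER_PROCESS = 1.333  # observed Selenium download rate (assumed)
BOX_COST_PER_MONTH = 10.00        # USD
TARGET_URLS_PER_HOUR = 500_000

urls_per_hour_per_box = PROCESSES_PER_BOX * URLS_PER_MIN_PER_PROCESS * 60
boxes_needed = TARGET_URLS_PER_HOUR / urls_per_hour_per_box
monthly_cost = boxes_needed * BOX_COST_PER_MONTH

print(f"URLs/hour per box: {urls_per_hour_per_box:.0f}")   # ~160
print(f"Boxes needed:      {boxes_needed:,.0f}")           # roughly 3,100
print(f"Monthly cost:      ${monthly_cost:,.0f}")          # tens of thousands of dollars
```

Run it and the numbers show why brute-forcing this with Selenium alone looks prohibitively expensive at this scale.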
Obviously download speed should improve with beefier hardware and more CPUs, but even so, downloading this many URLs would be incredibly expensive.
Thoughts?