


Suggestions on mechanism or existing code - maintain persistence of file download history


On Thu, Jan 30, 2020 at 7:06 AM jkn <jkn_gg at nicorp.f9.co.uk> wrote:
>
> Hi all
>     I'm almost embarrassed to ask this as it's "so simple", but thought I'd give
> it a go...

Hey, nothing wrong with that!

> I want to be able to use a simple 'download manager' which I was going to write
> (in Python), but then wondered if there was something suitable already out there.
> I haven't found it, but thought people here might have some ideas for existing work, or approaches.
>
> The situation is this - I have a long list of file URLs and want to download these
> as a 'background task'. I want this process to be 'crudely persistent' - you
> can CTRL-C out, and next time you run things it will pick up where it left off.

A decent project. I've done this before but in restricted ways.

> The download part is not difficult. It is the persistence bit I am thinking about.
> It is not easy to tell the name of the downloaded file from the URL.
>
> I could have a file with all the URLs listed and work through each line in turn.
> But then I would have to rewrite the file (say, with the previously-successful
> lines commented out) as I go.
>
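For comparison, the rewrite-the-list approach quoted above might look roughly like this in Python. It's a minimal sketch, assuming one URL per line in a plain text file and "# " as the marker for a finished line; the helper names pending_urls() and mark_done() are hypothetical. Writing to a temporary file and swapping it in with os.replace() means a Ctrl-C mid-rewrite can't leave a truncated list behind.

```python
import os
import tempfile


def pending_urls(path):
    """Return the URLs in `path` that have not been commented out yet."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f
                if line.strip() and not line.lstrip().startswith("#")]


def mark_done(path, url):
    """Rewrite `path` with the line holding `url` commented out with '# '.

    The new contents are written to a temporary file in the same directory,
    which then atomically replaces the original via os.replace().
    """
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            for line in lines:
                if line.strip() == url:
                    line = "# " + line  # mark this URL as finished
                out.write(line)
        os.replace(tmp, path)
    except BaseException:
        os.remove(tmp)  # don't leave stray temp files on error or Ctrl-C
        raise
```

A driver loop would then be something like `for url in pending_urls("urls.txt"): download(url); mark_done("urls.txt", url)`, where download() stands in for whatever fetch code you already have. The cost is rewriting the whole file once per URL, which is fine for a list of a few thousand lines.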

Hmm. The easiest way would be to have something from the URL in the
file name. For instance, you could hash the URL and put the first few
digits of the hash in the file name, so
http://some.domain.example/some/path/filename.html might get saved
into "a39321604c - filename.html". That way, if you want to know if
it's been downloaded already, you just hash the URL and see if any
file begins with those digits.

Would that kind of idea work?

ChrisA