[tripleo][ci] container pulls failing


On Tue, Jul 28, 2020 at 7:13 AM Emilien Macchi <emilien at redhat.com> wrote:
>
>
>
> On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin <whayutin at redhat.com> wrote:
>>
>> FYI...
>>
>> If you find your jobs are failing with an error similar to [1], you have been rate limited by docker.io via the upstream mirror system and have hit [2].  I've been discussing the issue w/ upstream infra, rdo-infra and a few CI engineers.
>>
>> There are a few ways to mitigate the issue; however, I don't expect any of the options to be completed very quickly, so I'm asking for your patience while this issue is socialized and resolved.
>>
>> For full transparency we're considering the following options.
>>
>> 1. move off of docker.io to quay.io
>
>
> quay.io also has API rate limit:
> https://docs.quay.io/issues/429.html
>
> I'm not sure how many requests per second each registry allows, so this would need to be checked with the quay team before changing anything.
> quay.io has also had some big downtimes, so its SLA needs to be considered.
>
>> 2. local container builds for each job in master, possibly ussuri
>
>
> Not convinced.
> You can look at CI logs:
> - pulling / updating / pushing container images from docker.io to local registry takes ~10 min on standalone (OVH)
> - building containers from scratch with updated repos and pushing them to local registry takes ~29 min on standalone (OVH).
>
>>
>> 3. parent/child jobs upstream, where rpms and containers are built and hosted as artifacts for the child jobs
>
>
> Yes, we need to investigate that.
>
>>
>> 4. remove some portion of the upstream jobs to lower the impact we have on 3rd party infrastructure.
>
>
> I'm not sure I understand this one, maybe you can give an example of what could be removed?

We need to re-evaluate our use of scenarios (e.g. we have two
scenario010 jobs, both non-voting).  Historically we avoided adding
more jobs precisely because of these types of resource constraints.
I think we've added new jobs recently and likely need to reduce what
we run. Additionally, we might want to look into reducing what we run
on stable branches as well.

>
>>
>> If you have thoughts please don't hesitate to share on this thread.  Very sorry we're hitting these failures and I really appreciate your patience.  I would expect major delays in getting patches merged at this point until things are resolved.
>>
>> Thank you!
>>
>> [1] HTTPError: 429 Client Error: Too Many Requests for url: http://mirror.ca-ymq-1.vexxhost.opendev.org:8082/v2/tripleotrain/centos-binary-cron/blobs/sha256:76342b0db11c6b5acf33b9f1cbf10b3d2680fb20967ccd7daa9593a39e9e45c0
>> [2] https://bugs.launchpad.net/tripleo/+bug/1889122
>
>
>
> --
> Emilien Macchi
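
For reference, the 429 in [1] is the registry mirror telling the client to slow down. A minimal sketch of how a client could back off on that status is below. It is not how the TripleO tooling handles pulls, and retrying does not reduce the total number of requests, so it is no substitute for the options discussed above. The names `retry_delay` and `fetch_with_backoff` are hypothetical, and `fetch` stands in for whatever performs the actual HTTP request.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body) as returned by the hypothetical fetch()
Response = Tuple[int, Mapping[str, str], bytes]


def retry_delay(attempt: int, headers: Mapping[str, str],
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next try; `attempt` is 0-based."""
    # Honor Retry-After when it is given in whole seconds. The HTTP-date form
    # is ignored in this sketch, and the header lookup assumes this exact case.
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        return min(float(retry_after), cap)
    # Otherwise fall back to capped exponential backoff: base, 2*base, 4*base...
    return min(base * (2 ** attempt), cap)


def fetch_with_backoff(fetch: Callable[[], Response],
                       max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call fetch() until it stops returning 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = fetch()
        if status != 429:
            if status >= 400:
                # Any other error is not a rate limit, so do not retry it.
                raise RuntimeError(f"request failed with HTTP {status}")
            return body
        if attempt < max_attempts - 1:
            sleep(retry_delay(attempt, headers))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the function testable without real waiting. The defaults (1 second base, 60 second cap, 5 attempts) are arbitrary illustration values, not limits published by docker.io or quay.io.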