
clean_channel

Functions:

Name Description
clean_up_channel

Clean up the package channel by sending DELETE requests to the delete api of the channel

download_file

Download a file (similar to wget) using urllib

load_repodata

Load the repodata.json file

remove_version_matching_regex

Remove packages whose version matches a regex from the package collection

select_package_to_delete

Select the packages to clean up from a package collection, based on rules inspired by GitLab's clean-up rules

clean_up_channel

clean_up_channel(packages_to_delete, delete_api_url, api_token, timeout_s)

Clean up the package channel by sending DELETE requests to the delete api of the channel

Parameters:

Name Type Description Default
packages_to_delete PackageInfoCollection

Collection of packages to delete from the channel. Each package filename is appended to delete_api_url to build the URL of the DELETE request.

required
delete_api_url str

API URL used to delete packages, e.g. "https://prefix.dev/api/v1/delete/phoenix-dev/linux-64/"

required
api_token str

Token used to authenticate with the delete API.

required
timeout_s int

Timeout in seconds for each DELETE request.

required
Source code in src/phoenixpackagecleanup/clean_channel.py
def clean_up_channel(packages_to_delete: PackageInfoCollection, delete_api_url: str, api_token: str, timeout_s: int):
    """Clean up the package channel by sending DELETE requests to the delete api of the channel

    Parameters
    ----------
    packages_to_delete : PackageInfoCollection
        Collection of packages to delete from the channel. Each package filename is appended to
        `delete_api_url` to build the URL of the DELETE request.
    delete_api_url : str
        API URL used to delete packages, e.g. "https://prefix.dev/api/v1/delete/phoenix-dev/linux-64/"
    api_token : str
        Token used to authenticate with the delete API.
    timeout_s : int
        Timeout in seconds for each DELETE request.
    """
    for package_info_list in packages_to_delete.packages_info.values():
        for package_info in package_info_list:
            # strip any trailing slash so a base URL ending in "/" does not yield a double slash
            delete_url = delete_api_url.rstrip("/") + "/{}".format(package_info.filename)

            delete_request = urllib.request.Request(
                url=delete_url, headers={"Authorization": "Bearer {}".format(api_token)}, method="DELETE"
            )
            _ = urllib.request.urlopen(delete_request, timeout=timeout_s)
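
As an illustration of how each DELETE request is assembled, the following sketch builds the request for a single file without sending it. The helper name build_delete_request, the filename, and the token are hypothetical and not part of the module. Stripping the trailing slash from the base URL is a choice made in this sketch, so that the example URL from the parameter description does not produce a double slash.

```python
import urllib.request


def build_delete_request(delete_api_url: str, filename: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one package file."""
    # Normalise the base URL so it works with or without a trailing slash
    delete_url = delete_api_url.rstrip("/") + "/" + filename
    return urllib.request.Request(
        url=delete_url,
        headers={"Authorization": "Bearer {}".format(api_token)},
        method="DELETE",
    )


request = build_delete_request(
    "https://prefix.dev/api/v1/delete/phoenix-dev/linux-64/",
    "example-pkg-1.0.0-0.conda",  # hypothetical package filename
    "my-secret-token",  # placeholder, not a real token
)
print(request.get_method())
print(request.full_url)
print(request.get_header("Authorization"))
```

Inspecting the request object this way can be a cheap dry run before pointing the clean-up at a real channel, since nothing is sent until urllib.request.urlopen is called.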

download_file

download_file(url, filepath, user_agent, timeout_s)

Download a file (similar to wget) using urllib

This function exists mainly to download the repodata.json file. wget works out of the box, but with urllib a User-agent header must be set, otherwise the request is rejected (403: Forbidden).

Parameters:

Name Type Description Default
url str

URL of the file to download

required
filepath Path

Path to where the downloaded data will be written

required
user_agent str

User-agent value to use in the request header

required
timeout_s int

Timeout in seconds for the request

required
Source code in src/phoenixpackagecleanup/clean_channel.py
def download_file(url: str, filepath: Path, user_agent: str, timeout_s: int):
    """Download a file (similar to wget) using urllib

    This function exists mainly to download the repodata.json file. wget works out of the box, but with
    urllib a User-agent header must be set, otherwise the request is rejected (403: Forbidden).

    Parameters
    ----------
    url : str
        URL of the file to download
    filepath : Path
        Path to where the downloaded data will be written
    user_agent : str
        User-agent value to use in the request header
    timeout_s : int
        Timeout in seconds for the request
    """
    file_request = urllib.request.Request(url=url, headers={"User-agent": user_agent})
    # if we get an error from server, it will be raised by urlopen, no need to catch
    # the context manager closes the HTTP response once the body has been read
    with urllib.request.urlopen(file_request, timeout=timeout_s) as response:
        with open(filepath, "wb") as output_file:
            output_file.write(response.read())
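
A minimal, offline sketch of the same download pattern follows. It re-declares a download_file with the signature shown above and fetches a local file:// URL, so it runs without network access. The user-agent value "wget" and the file names are placeholders chosen for this example; which user-agent values a given server accepts is not stated in the source.

```python
import tempfile
import urllib.request
from pathlib import Path


def download_file(url: str, filepath: Path, user_agent: str, timeout_s: int) -> None:
    """Download url to filepath, sending an explicit User-agent header."""
    file_request = urllib.request.Request(url=url, headers={"User-agent": user_agent})
    # Errors reported by the server are raised by urlopen as exceptions
    with urllib.request.urlopen(file_request, timeout=timeout_s) as response:
        with open(filepath, "wb") as output_file:
            output_file.write(response.read())


with tempfile.TemporaryDirectory() as tmp_dir:
    # A local file stands in for the remote repodata.json
    source = Path(tmp_dir) / "repodata.json"
    source.write_text('{"packages": {}}')
    target = Path(tmp_dir) / "downloaded.json"

    download_file(source.as_uri(), target, user_agent="wget", timeout_s=10)
    downloaded_text = target.read_text()

print(downloaded_text)
```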

load_repodata

load_repodata(repodata_filepath)

Load the repodata.json file

Parameters:

Name Type Description Default
repodata_filepath Path

Path to the repodata.json file

required

Returns:

Type Description
PackageInfoCollection

Packages in the channel, grouped by package name

Source code in src/phoenixpackagecleanup/clean_channel.py
def load_repodata(repodata_filepath: Path) -> PackageInfoCollection:
    """Load the repodata.json file

    Parameters
    ----------
    repodata_filepath : Path
        Path to the repodata.json file

    Returns
    -------
    PackageInfoCollection
        Packages in the channel, grouped by package name
    """
    with open(repodata_filepath, "r") as file:
        repodata = json.load(file)

    return parse_repodata(repodata)

remove_version_matching_regex

remove_version_matching_regex(package_collection, remove_version_regex)

Remove packages whose version matches a regex from the package collection

Parameters:

Name Type Description Default
package_collection PackageInfoCollection

Collection of packages to filter

required
remove_version_regex str

Regex applied to each package version with re.match, which matches from the start of the version string. Packages whose version matches are removed.

required

Returns:

Type Description
PackageInfoCollection

Collection of packages with the packages matching the regex removed

Source code in src/phoenixpackagecleanup/clean_channel.py
def remove_version_matching_regex(
    package_collection: PackageInfoCollection, remove_version_regex: str
) -> PackageInfoCollection:
    """Remove packages whose version matches a regex from the package collection

    Parameters
    ----------
    package_collection : PackageInfoCollection
        Collection of packages to filter
    remove_version_regex : str
        Regex applied to each package version with `re.match`, which matches from the start of the version
        string. Packages whose version matches are removed.

    Returns
    -------
    PackageInfoCollection
        Collection of packages with the packages matching the regex removed
    """
    filtered_collection = PackageInfoCollection(defaultdict(list))

    for package_name, packages_info in package_collection.packages_info.items():
        # Filter packages: packages whose version matches remove_version_regex are not added to the output collection
        filtered_packages = [pkg for pkg in packages_info if not re.match(remove_version_regex, pkg.version)]
        if filtered_packages:
            filtered_collection.packages_info[package_name] = filtered_packages

    return filtered_collection
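
Because the filter uses re.match, a pattern only matches when it fits from the beginning of the version string. The short sketch below illustrates the difference on a hypothetical list of versions; the patterns are examples, not values used by the project.

```python
import re

versions = ["1.0.0", "1.0.0.dev3", "2.1.0rc1", "dev-build"]  # hypothetical versions

# "dev" only matches versions that START with "dev"
pattern_from_start = r"dev"
# ".*dev" matches versions containing "dev" anywhere
pattern_anywhere = r".*dev"

matched_from_start = [v for v in versions if re.match(pattern_from_start, v)]
matched_anywhere = [v for v in versions if re.match(pattern_anywhere, v)]

print(matched_from_start)  # ['dev-build']
print(matched_anywhere)  # ['1.0.0.dev3', 'dev-build']
```

Note that re.match does not anchor at the end of the string, so a pattern that should cover the whole version needs a trailing `$`.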

select_package_to_delete

select_package_to_delete(channel_packages, current_time, delete_older_than_days, min_number_of_packages, keep_version_regex)

Select the packages to clean up from a package collection, based on rules inspired by GitLab's clean-up rules

The selection rules work as follows:
- packages whose version matches keep_version_regex are never selected
- for each package name, get the list of package versions/archives
- exclude the min_number_of_packages most recent packages from the list
- exclude all packages uploaded less than delete_older_than_days days ago
- what remains in the list is selected for deletion

Parameters:

Name Type Description Default
channel_packages PackageInfoCollection

Collection of packages to clean-up.

required
current_time datetime

The time at which the script is running, used to remove packages based on upload timestamp

required
delete_older_than_days int

Packages whose upload timestamp is less than delete_older_than_days days old are not considered for deletion

required
min_number_of_packages int

For each package name, the min_number_of_packages most recently uploaded packages are always kept; only the remaining ones are considered for deletion

required
keep_version_regex str

Regex applied to package versions: if a package version matches it, that package will NOT be considered for deletion.

required

Returns:

Type Description
PackageInfoCollection

Collection of packages that should be deleted to clean-up.

Source code in src/phoenixpackagecleanup/clean_channel.py
def select_package_to_delete(
    channel_packages: PackageInfoCollection,
    current_time: datetime,
    delete_older_than_days: int,
    min_number_of_packages: int,
    keep_version_regex: str,
) -> PackageInfoCollection:
    """Select the packages to clean up from a package collection, based on rules inspired by GitLab's clean-up rules

    The selection rules work as follows:
    - packages whose version matches `keep_version_regex` are never selected
    - for each package name, get the list of package versions/archives
    - exclude the `min_number_of_packages` most recent packages from the list
    - exclude all packages uploaded less than `delete_older_than_days` days ago
    - what remains in the list is selected for deletion

    Parameters
    ----------
    channel_packages : PackageInfoCollection
        Collection of packages to clean-up.
    current_time : datetime
        The time at which the script is running, used to remove packages based on upload timestamp
    delete_older_than_days : int
        Packages whose upload timestamp is less than `delete_older_than_days` days old are not considered for deletion
    min_number_of_packages : int
        For each package name, the `min_number_of_packages` most recently uploaded packages are always kept;
        only the remaining ones are considered for deletion
    keep_version_regex : str
        Regex applied to package versions: if a package version matches it, that package will NOT be
        considered for deletion.

    Returns
    -------
    PackageInfoCollection
        Collection of packages that should be deleted to clean-up.
    """

    # remove packages matching the keep_version_regex
    filtered_by_regex_pkgs = remove_version_matching_regex(channel_packages, keep_version_regex)

    selected_for_deletion_collection = PackageInfoCollection(defaultdict(list))

    for package_name, packages_info in filtered_by_regex_pkgs.packages_info.items():
        # sort package list in decreasing upload time
        packages_sorted_per_timestamp_decreasing = sorted(
            packages_info, key=lambda package_info: package_info.upload_time, reverse=True
        )
        # remove the min_number_of_packages newest package from list, and iterate
        for candidate_for_clean_up in packages_sorted_per_timestamp_decreasing[min_number_of_packages:]:
            if current_time - candidate_for_clean_up.upload_time > timedelta(days=delete_older_than_days):
                selected_for_deletion_collection.packages_info[package_name].append(candidate_for_clean_up)

    return selected_for_deletion_collection
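
The selection rules of select_package_to_delete can be traced on a small worked example. The sketch below applies the same steps to a single package name, using a hypothetical FakePackage dataclass in place of the project's package info type (whose definition is not shown here). All versions, dates, and parameter values are made up for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FakePackage:  # hypothetical stand-in for the project's package info type
    version: str
    upload_time: datetime


now = datetime(2024, 6, 1)
packages = [
    FakePackage("2.0.0", now - timedelta(days=120)),  # old, but protected by the keep regex
    FakePackage("1.0.0", now - timedelta(days=90)),
    FakePackage("1.1.0", now - timedelta(days=60)),
    FakePackage("1.2.0", now - timedelta(days=45)),
    FakePackage("1.3.0", now - timedelta(days=10)),  # too recent to delete
    FakePackage("1.4.0", now - timedelta(days=2)),  # newest, kept by the minimum count
]

keep_version_regex = r"2\."
min_number_of_packages = 1
delete_older_than_days = 30

# Step 1: drop packages whose version matches the keep regex
not_protected = [p for p in packages if not re.match(keep_version_regex, p.version)]
# Step 2: sort newest first and skip the min_number_of_packages newest ones
newest_first = sorted(not_protected, key=lambda p: p.upload_time, reverse=True)
candidates = newest_first[min_number_of_packages:]
# Step 3: keep only candidates older than the age threshold
to_delete = [
    p.version for p in candidates if now - p.upload_time > timedelta(days=delete_older_than_days)
]

print(to_delete)  # ['1.2.0', '1.1.0', '1.0.0']
```

Here 2.0.0 survives because of the regex, 1.4.0 because it is among the newest min_number_of_packages, and 1.3.0 because it is younger than delete_older_than_days.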