JENKINS-9215

Detect changes by label generates excessive log

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: clearcase-plugin
    • Labels:
      None

      Description

      When change detection by label is enabled, the plugin runs lshistory with the -minor option in order to detect changes based on mklabel.

      However, the resulting output can be excessive and slow to produce. In our case some of these builds run only once a month; the check alone takes 2-3 hours and generates a log of 150-250 MB.

      The run time probably can't be reduced, but the log size can: filter the output so that only the 'mklabel' and 'rmlabel' entries remain, since those are the only ones actually considered for the changelog.
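
The filtering proposed above could look roughly like the sketch below. It is not the plugin's code. It assumes that each lshistory event occupies a single line and that the line contains the operation name (mklabel or rmlabel); the function name filter_label_events is made up for illustration.

```shell
#!/bin/sh
# Sketch only: keep the label events from lshistory-style text on stdin.
# Assumption: one event per line, and the line contains the operation
# name "mklabel" or "rmlabel" somewhere in its text.
filter_label_events() {
    grep -E 'mklabel|rmlabel'
}
```

Piping the lshistory output through filter_label_events before it is written to the build log would drop the checkin and other minor events that make the log so large.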

            Activity

            josesa Jose Sa added a comment -

            I found a query that returns all changes between two timestamps, filtered by multiple branches and multiple labels.

            It produces output much faster (minutes instead of hours), and the log shows only what is needed to build the changelog, which fixes the problem introduced by the -minor implementation.

            This is a sample command:

            time cleartool find /Vobs/BETS /Vobs/Spots_3ta \
            -all -follow \
            -version "(created_since(22-feb-11.12:45:33utc+0000) && ! created_since(21-apr-11.17:45:22utc+0000))
                    && (brtype(SPOTS_V14-MNT) || brtype(main))
                    && (lbtype(SPOTS_V14W_BASE_READY) || lbtype(SPOTS_V14M_BASE_READY))" \
            -exec 'cleartool desc -fmt "\"%Nd\" \"%u\" \"%En\" \"%Vn\" \"%e\" \"%o\" \n%c\n" $CLEARCASE_XPN'
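
The -version expression in the sample command follows a regular pattern, so it could be generated from a job's branch and label lists. The sketch below only builds the query string and does not call cleartool. The helper names join_or and build_version_query are hypothetical, and passing the lists as space-separated strings is an assumption made for brevity.

```shell
#!/bin/sh
# Sketch only: builds the -version query string used in the sample command.

# join_or PREDICATE NAME...  prints "(pred(a) || pred(b) ...)"
join_or() {
    pred=$1; shift
    out=
    for name in "$@"; do
        out="${out:+$out || }$pred($name)"
    done
    printf '(%s)' "$out"
}

# build_version_query SINCE UNTIL "BRANCH ..." "LABEL ..."
# $3 and $4 are left unquoted on purpose so they split into words.
build_version_query() {
    printf '(created_since(%s) && ! created_since(%s)) && %s && %s' \
        "$1" "$2" "$(join_or brtype $3)" "$(join_or lbtype $4)"
}
```

Called with the two timestamps, "SPOTS_V14-MNT main" and "SPOTS_V14W_BASE_READY SPOTS_V14M_BASE_READY", it prints the same expression as the sample on a single line, which can then be passed to cleartool find as the -version argument.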
            
            raspy Krzysztof Malinowski added a comment -

            Hi,

            I like your approach, but it misses a scenario. Cleartool find checks only for versions created since the timestamp, not for labels applied since the timestamp. If you track changes by label, it is common to have a history like this:

            T=0 create version 1
            T=1 make label A on version 1
            <Jenkins polls by find and makes a build>
            T=2 create version 2
            <Jenkins polls by find and does not make a build, since the label was not applied>
            T=3 remove label A from version 1
            T=4 make label A on version 2
            <Jenkins polls by find since T=2 and does not make a build, because created_since predicate is false>

            Moreover, when tracking changes by label, it is quite possible for the label to be moved from one existing version to another existing version. In that case no new version is created, yet the build should still start because the label moved. The created_since predicate fails to detect this as well.
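
The timeline above can be replayed with a toy model that involves no ClearCase at all. In the sketch below the event list, the poll times (1.5, 2.5, 4.5) and the function names find_poll and history_poll are all invented for illustration: find_poll mimics a "created since T and currently labelled" query, while history_poll mimics scanning the history for label events.

```shell
#!/bin/sh
# Toy model of the timeline; record format is "<kind> <time> <version>".
events='version 0 1
mklabel 1 1
version 2 2
rmlabel 3 1
mklabel 4 2'

# find_poll SINCE NOW: versions created after SINCE that carry the label at NOW
find_poll() {
    printf '%s\n' "$events" | awk -v since="$1" -v now="$2" '
        $2 + 0 > now + 0 { next }
        $1 == "version"  { created[$3] = $2 }
        $1 == "mklabel"  { labelled[$3] = 1 }
        $1 == "rmlabel"  { delete labelled[$3] }
        END { for (v in labelled) if (created[v] + 0 > since + 0) print v }'
}

# history_poll SINCE NOW: label events that happened in (SINCE, NOW]
history_poll() {
    printf '%s\n' "$events" | awk -v since="$1" -v now="$2" '
        ($1 == "mklabel" || $1 == "rmlabel") &&
        $2 + 0 > since + 0 && $2 + 0 <= now + 0 { print $1, $3 }'
}
```

For the last poll in the timeline, find_poll 2.5 4.5 prints nothing because version 2 was created before the previous poll, whereas history_poll 2.5 4.5 reports both the rmlabel on version 1 and the mklabel on version 2.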

            josesa Jose Sa added a comment - - edited

            I guess we will have to live with lshistory... if only it allowed additional filtering at the source, as the find command does.

            The current implementation will then need to change so that it ignores everything in the lshistory output except events of type 'rmlabel' and 'mklabel', and among those keeps only the entries that refer to the specific labels being tracked.

            As a workaround, I set up a cron job on my server that "edits" the existing build log files and performs this pruning offline. In some cases the logs shrank by 80%, and in total I recovered 30 GB of disk space.

            Here is my bash script, as a workaround for anyone who has the same problem until this is fixed in the plugin.

            #!/usr/bin/bash
            
            # Takes a log file as argument and applies lshistory filtering 
            # based on job specific configured labels
            function cc_lshistory_prune_log() {
                local log_file=$1
                local log_new=${log_file}_new
                local log_bak=${log_file}_bak
                local job_dir=$(cd "$(dirname "${log_file}")/../.." && pwd)
                local job_name=$(basename "${job_dir}")
                local config=${job_dir}/config.xml
                local label_line=$(grep "<label>" "${config}" | grep -v "<label></label>")
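                # Keep the regex in a variable: bash 3.2+ treats a quoted
                # right-hand side of =~ as a literal string
                local label_re='<label>(.*)</label>'
                [[ ${label_line} =~ ${label_re} ]]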
                local labels=${BASH_REMATCH[1]}
                local labels_re=${labels//\ /\|}
            
                # Check if already executed and abort
                if [ -f "${log_bak}" ]; then
                    echo "Aborted. Backup still exists: ${log_bak}"
                    return 0
                fi
            
                gawk '
                /cleartool lshistory/ { 
                    in_lshistory = 1
                    print
                }
                in_lshistory == 1 && /\['${job_name}'\]/ {
                    in_lshistory = 0
                }
                in_lshistory == 1 && /rmlabel|mklabel/ && /'${labels_re}'/ {
                    print
                    next
                }
                in_lshistory == 0 {print}
                ' "${log_file}" > "${log_new}"
                # Preserve the original modification time on the pruned log
                touch -r "${log_file}" "${log_new}"
                mv "${log_file}" "${log_bak}"
                mv "${log_new}" "${log_file}"
                ls -lh "${log_file}"*
            }
            
            # Processes all logs that may need pruning, selecting them
            # by modification time
            function process_all_logs() {
                local mtime=$1
                for config in /opt/hudson/jobs/*/config.xml; do
                    job_dir=$(dirname "$config")
                    job_name=$(basename "${job_dir}")
                    label_line=$(grep "<label>" "$config" | grep -v "<label></label>")
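                    # Regex kept in a variable so that bash 3.2+ does not
                    # treat it as a literal string
                    label_re='<label>(.*)</label>'
                    [[ $label_line =~ $label_re ]]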
                    label=${BASH_REMATCH[1]}
                    if [[ "${label}" != "" ]]; then
                        find "${job_dir}" -name log -mtime ${mtime} -print
                    fi
                done | while IFS= read -r logfile; do
                    cc_lshistory_prune_log "${logfile}"
                done
            }
            
            ## Main
            export PATH=$PATH:/opt/csw/bin
            #echo $PATH
            if [ -f "$1" ]; then
                cc_lshistory_prune_log "$1"
            else
                # Searches in all possible logs from yesterday
                process_all_logs 1
            fi
            

            EDIT: Updated the script to the version that currently runs every day at 22:00, cleaning all of yesterday's logs.

            josesa Jose Sa added a comment -

            I've filed an RFE with IBM that will hopefully give us faster feedback when polling by label:
            http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10850


              People

              • Assignee:
                vlatombe Vincent Latombe
                Reporter:
                josesa Jose Sa
              • Votes:
                0
                Watchers:
                2
