Logging

The Better Content Archiving app records its internal activity in a log. The log helps you understand what the app is doing when you are diagnosing problems.

Viewing the log

The Better Content Archiving app writes its own log lines to the Confluence log:

  1. Find the logfile named confluence-application.log (see the guide by Atlassian).
  2. Search for the text "com.midori" to find the lines written by Better Content Archiving.
  3. If you can't see any log lines written by the app, check that DEBUG logging is configured correctly, then use the corresponding app function again.
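From a terminal, a simple grep is enough to filter the app's lines out of the Confluence log. The sample log lines below are made up for illustration; only the "com.midori" package prefix is taken from this guide:

```shell
# Create a tiny sample log for demonstration (the lines themselves are hypothetical):
cat > /tmp/confluence-application.log <<'EOF'
2024-01-02 10:00:00,000 INFO [main] [atlassian.confluence.lifecycle] Confluence started
2024-01-02 10:00:01,000 DEBUG [scheduler] [com.midori.confluence.plugin.archiving] Checking expired pages
EOF

# Keep only the lines written by Better Content Archiving:
grep "com.midori" /tmp/confluence-application.log
```

On a real server, point grep at the actual confluence-application.log location instead of the sample file.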

If you can't interpret the log lines you found, report them to our support. We are here to help!

Configuring logging

When investigating a run-time problem, you may need to increase the logging level for the Better Content Archiving app. With a higher level, the app writes more detail about its activity to the log, which helps you understand the problem.

How to enable DEBUG-level logging

To turn on DEBUG logging from the web interface without stopping Confluence:

  1. Log in to Confluence as an administrator.
  2. Go to Confluence Administration → General configuration → Logging and Profiling.
  3. At Add New Entry, enter "com.midori" in Class/Package Name and choose "DEBUG" as the New Level.
  4. Click Add entry.
  5. Check if this new entry was correctly added to the Existing Levels list.

Now execute the app function that you think fails, then check the log for the details.
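Note that levels set on the Logging and Profiling screen are lost on restart. If you want the DEBUG level to survive restarts, you can also set it in the log4j configuration file. This is a sketch for pre-8 Confluence versions, which use log4j 1.x (Confluence 8+ moved to Log4j 2, where the file name and syntax differ):

```properties
# <Confluence installation directory>/confluence/WEB-INF/classes/log4j.properties
# Raise the level for all Better Content Archiving classes:
log4j.logger.com.midori=DEBUG
```

A Confluence restart is required for changes made directly in the file.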

How to enable TRACE-level logging

In some rare cases, you may need to turn on TRACE logging, the most detailed logging level.

For Confluence 8+ versions: follow the DEBUG-level logging guide above, but choose "ALL" as the logging level (instead of "DEBUG").

For Confluence pre-8 versions:

  1. Stop Confluence.
  2. Edit <Confluence installation directory>/confluence/WEB-INF/classes/log4j.properties.
  3. Replace log4j.appender.confluencelog.Threshold=DEBUG with log4j.appender.confluencelog.Threshold=ALL.
  4. Start Confluence.
  5. Follow the DEBUG-level logging guide but choose "ALL" as logging level (instead of "DEBUG").

Warning: TRACE logging can produce a very high number of log lines. Therefore, it should be turned off as soon as possible!

Frequent problems and solutions

Our users are not receiving the notification emails.

Make sure that your users have valid email addresses configured, and that the SMTP server in Confluence is configured and tested. To verify:

  1. Watch any page using a user account that is not receiving the Better Content Archiving notification emails.
  2. Edit that page using another user account.
  3. If the problematic user receives the built-in "page edited" notification email, the user account and the SMTP server are correctly configured.

If you still have the problem, have a look at your Confluence log to see if there are errors logged around email sending. If the emails are sent out according to the log, check your spam folders and your spam filter.

There are "Page (or Attachment) without last modification date found" warnings in my log.

You may sometimes see harmless warnings like these in your Confluence log:

[WARN] Attachment without last modification date found: #53090452 "foobar.jpg"
...
[WARN] Page without last modification date found: #52527113 "My super duper wiki page"

This basically means that Better Content Archiving found a page or an attachment with a NULL (unknown) last modification date while traversing your content. As the last update date is unknown, its age (the time since the last update) cannot be calculated, so the item is skipped by the Better Content Archiving app.

Please note that in normal circumstances all pages and all attachments should have valid last modification dates. These corrupt ones were probably created by some "incomplete" app, integration or data migration. You can safely ignore these warnings, or decide to fix the data.

How to fix these?

  1. If there are only a few of these, just edit the page or re-upload the attachment. The log entry shows the numerical identifier and the page title or attachment filename, so you can easily find them.
  2. If there are many of these, run an SQL UPDATE statement against your database to initialize the value in the corresponding records. One trivial approach is to initialize the missing value to the creation date of the page or attachment, or to the current date or some other fixed date.
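As a sketch only: assuming a Confluence schema where the affected content rows have CREATIONDATE and LASTMODDATE columns in the CONTENT table (the table and column names are assumptions here — verify them against your own database version), the missing dates could be initialized from the creation dates like this:

```sql
-- Hypothetical sketch: initialize missing last-modification dates from the
-- creation dates. Verify table/column names against your actual schema!
UPDATE CONTENT
SET LASTMODDATE = CREATIONDATE
WHERE LASTMODDATE IS NULL;
```

Stop Confluence and make a database backup before running direct SQL updates, and test the statement on a staging copy first.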

I just installed Better Content Archiving and it shows that all pages are viewed. ("zero not-viewed pages" problem)

Current status: Better Content Archiving 7.4.0 introduced the "page view initialization" feature to solve this problem.

As you probably know, Confluence does not implement any sort of page view tracking. Because the workflows implemented by Better Content Archiving heavily rely on the last page view information, the app implements its own real-time page view tracking mechanism.

As this is the app itself that implements the page view tracking, there are some obvious consequences:

  1. The periods when the app was not installed are not tracked.
    If the app was installed on Jan 10, it cannot know what happened before. Even if you activate page view tracking immediately, it would not report anything. If you set the "not viewed" interval to e.g. 60 days, then the first pages should be reported around March 10 (60 days from the installation date, when the tracking actually started).
    Since app version 7.4.0, this problem is completely eliminated by initializing page views.
  2. Similarly, the periods when the app was not licensed are not tracked.
    Make sure that you have a valid license installed.

I have lots of pages that I know are not viewed, but those are not reported. ("abandoned pages" problem)

Current status: Better Content Archiving 7.4.0 introduced the "page view initialization" feature to solve this problem.

As you probably know, Confluence does not implement any sort of page view tracking. Because the workflows implemented by Better Content Archiving heavily rely on the last page view information, the app implements its own real-time page view tracking mechanism. Even with this mechanism in place, there is (was) a limitation that confused many in the past.

If a page was last viewed before the Better Content Archiving app got installed and never after the installation, the app has zero information about the last view. (We called these pages the "abandoned pages" in our internal lingo.)

The app is designed to skip these abandoned pages to avoid distorted statistics and unexpected archivings. The idea is that if we don't precisely know the page's last view, the safest is skipping the page: not to report it and not to do anything potentially dangerous with it, like archiving.

And, here is the problem. The drawback of this safety-first behavior is that if a page is never viewed after installing the app, it is skipped at every check, effectively staying "under the radar" forever.

This was a confusing, non-critical issue with previous app versions, and it is a non-issue since app version 7.4.0. The "page view initialization" feature allows populating last page view information for any page, eliminating "abandoned pages". See the page view initialization section for more.

The execution of the "Move" strategy on a large number of pages is slow.

Page moves were relatively slow in Confluence versions before 5.10.18, then got faster in 5.10.18 (due to a rewrite of the corresponding code in Confluence). Unfortunately, they became slower again in newer Confluence versions (in 6.0.1, for instance). It is important to understand that although page moves are not fast, they are completely reliable and stable!

Midori investigated the problem, and it turned out that most of the execution time is spent in Confluence core while it refactors (fixes) the links pointing to the moved pages. We reported this to Atlassian: CONFSERVER-53503 (which was migrated from the original CE-1045).

We definitely encourage you to vote for the bug, to increase its importance for the Confluence developer team.

At the moment, there is one simple way to accelerate page moves: increasing the heap size. See the instructions.

According to our tests, increasing the heap size from the default 1024M to 1536M reduced the execution time by 30% for 800 pages. Your mileage may vary, but it is definitely worth a try.
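For example, on Linux installations the heap size is typically raised via CATALINA_OPTS in bin/setenv.sh (the exact file and variable depend on your installation type, as the linked instructions explain; the values below just illustrate the 1024M → 1536M change from our test):

```shell
# Sketch for <Confluence installation directory>/bin/setenv.sh:
# raise the maximum heap to 1536 MB (values are illustrative).
CATALINA_OPTS="-Xms1024m -Xmx1536m ${CATALINA_OPTS}"
export CATALINA_OPTS
```

Restart Confluence for the new heap size to take effect.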

"Last Execution" on the Scheduled Jobs screen is empty for "Analyze Content Quality" and "Find and Archive Expired Content".

Current status: Better Content Archiving 8.4.0 uses Caesium job configurations, so this problem is fixed as of that version.

This is a purely cosmetic problem which you can safely ignore. Nevertheless, we describe the root cause below.

Prior to Confluence 5.10, Confluence used the Quartz library to run periodic jobs and required apps that wanted to run periodic jobs to declare "trigger" and "job descriptor" type modules. This worked as expected with Better Content Archiving, also capturing the execution history.

In Confluence 5.10, Atlassian changed this to a new technology called Caesium and required apps to declare new module types. Luckily, Confluence still supports the old approach for backward compatibility, and it works almost perfectly, but there are some minor glitches like this one.

In fact, the jobs are executed perfectly, but their execution history is stored in a cache instead of being persisted to the database, as before. And, as caches are flushed once a day, the execution history of infrequently executed jobs is cleared, therefore not displayed in the UI most of the time.

Better Content Archiving will be converted to the modern approach at some point. The reason we're not doing this right away is that Better Content Archiving supports any Confluence version starting from 5.7! The conversion would make it impossible to support 5.7-5.9, which would be an overly high price for this change.

I configured custom schedules for the jobs, but they are executed according to the default schedule.

Current status: Atlassian fixed this bug in Confluence 7.14 (tracked as CONFSERVER-55455).

If you are using a Confluence version that is affected by this bug, update to Confluence 7.14 or newer.

If you can't update Confluence immediately, see if these workarounds can help until you can:

  1. Re-configure the custom schedules for the jobs after every Confluence restart. (We know that this is cumbersome, but it is the best workaround Atlassian can give us at the moment, unfortunately.)
  2. Don't use custom schedules, only the default ones. In this case, either explicitly reset the job schedules to their defaults in the Scheduled Jobs screen or just let Confluence "forget" the custom schedules.
  3. Downgrade to app version 8.3.0, which is not affected by this bug. (That app version uses Quartz-scheduled jobs, which are not affected by the Confluence bug.)

My background jobs are (randomly) not executed.

Current status: Better Content Archiving 6.0.0 executes background jobs several orders of magnitude faster than previous versions, so the chance of collisions is very low since that version.

Although Better Content Archiving relies on several background jobs, it does not allow multiple jobs to run concurrently (at the same time), in order to prevent synchronization problems. This is a simple and robust synchronization rule, which works very well in most cases.

Due to this rule though, you may randomly see jobs not being executed. Typical symptoms:

  1. The work that is supposed to be done by the job is not actually done. For instance, the Quality Statistics are not updated by the "Analyze Content Quality" job, or the notification emails are not sent by the "Find and Archive Expired Content" job.
  2. The job's execution history shows durations of 1-5 milliseconds, which is unrealistically short for an actual execution.
  3. There are warnings like this written to the Confluence log (the text may vary between app versions):
    [WARN] Failed to start the Content Archiving task as Content Quality Analysis is already running

The root cause is simply that the job executions occasionally overlap in time. For instance, if job "A" is scheduled to execute every day at 02:00AM and job "B" every day at 02:30AM, then the jobs will only work reliably if "A" can always complete within 30 minutes. If "A" sometimes takes more than 30 minutes, then "B" will be started, detect that "A" is still running, write the warning to the log and exit, and thus not run that day.

What causes overlapping jobs?

  1. If you recently changed the schedule of the jobs, you possibly made a mistake. (Read on to learn how to set up a correctly working schedule.)
  2. The jobs take longer than expected to complete (typically due to the increased size of your data), and the schedule no longer fits.

Solution: configure the job schedules so that the executions never overlap.

Below is a simple method to find your optimal schedule:

  1. Check the average duration of each of the app's jobs based on the execution history. These are the 3 jobs whose names start with "Better Content Archiving" (at the top of the alphabetically sorted job list). You can skip the "Better Content Archiving: Persist the XYZ Journal" jobs, as those are not relevant and are allowed to run concurrently. When calculating the average durations, ignore the executions that complete in less than a second (with an "immediate exit").
  2. Design your optimal schedule based on the job's average durations and your preferences. For instance:
    • I want the "Find and Archive Expired Content" job to run daily around 2AM, it takes 80 minutes, and this is the most important.
    • I want the "Analyze Content Quality" job to run daily around 4AM, and it takes 55 minutes.
    • I want the "Warm up the Content Status Cache" job to run every 10 minutes, and it normally takes 50 minutes. (Note: this job was removed in 6.0.0+ app versions.)
    In a scenario like this, a potentially good schedule that eliminates the job collision is:
    • "Find and Archive Expired Content" job: every day at 2AM. (It will complete by around 3:20AM.)
    • "Analyze Content Quality" job: every day at 4AM, so that it allows an extra 40 minute safety window for the previous job to complete. (It will complete by around 4:55AM.)
    • "Warm up the Content Status Cache" job: every 10 minutes, excluding the period 0:30AM - 5:30AM. With 50 minutes of execution time it means that it will not run between 1:20AM and 5:30AM, allowing a 30 minute window before the first job and 35 minutes after the second. (Caches are safe to go cold in those hours when your users would not check content status.)
  3. Once you have designed the schedule, apply it to the jobs.

It may look difficult, but it is fairly trivial in practice.

Every environment and every team's requirements are a little different, so there is no one schedule that works perfectly everywhere. Once the new schedule is in place, it will work consistently.
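If you prefer cron expressions when applying the schedule, the example above could be expressed roughly like this (Quartz-style cron, as used by Confluence scheduled jobs; the third entry is an approximation — it sidesteps the 0:30AM - 5:30AM exclusion window by simply not running between midnight and 6AM):

```
# "Find and Archive Expired Content": every day at 2:00AM
0 0 2 * * ?
# "Analyze Content Quality": every day at 4:00AM
0 0 4 * * ?
# "Warm up the Content Status Cache": every 10 minutes, 6:00AM - 11:50PM only
0 0/10 6-23 * * ?
```

Adjust the hours to your own job durations; the expressions here only mirror the illustrative scenario above.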

After a failed and reverted archiving execution, images are not shown in the fresh space any longer.

Current status: Atlassian fixed this in Confluence 8.1.0.

Blank images are the most obvious symptom of this problem, but it is not restricted to image attachments. The content (and only the content) of any type of attachment may be missing, although the attachments themselves are listed correctly in the UI.

First, it is important to understand what happens when an archiving execution fails and gets reverted due to an error (e.g. corrupt data):

  1. The app detects that there was an error.
  2. The app cancels the archiving execution.
  3. Spaces in which this archiving execution already completed successfully are left in their new state (with the archived pages moved to the archive space). In contrast, the app lets Confluence restore the space in which the error occurred to its old state.

Consequently, there is no "half complete" archiving in any space: it is either "fully done" or "fully reverted". It is transactional and the transactional unit is a space.

Second, it is important to know that, unless Confluence is configured otherwise, attachment information is stored in two different places:

  1. Attachment metadata (author, creation date, etc.) is stored in the Confluence database.
  2. Attachment content (image bytes, etc.) is stored in the filesystem, more precisely as files in a managed directory hierarchy (within the Confluence home directory). A file is "linked" to its attachment by its actual location (absolute path) within the directory hierarchy.

Now, here is the problem: on an error, the database-stored information is reverted to its original state, but the filesystem location is not! This is a design flaw in Confluence core, and as a result, the "link" between the attachment metadata and the file itself becomes broken.

The good news is that the file is absolutely not lost, it is just not located in the expected directory.

As this is a well-known problem in Confluence, Atlassian offers a solution in the Knowledge Base article on orphan attachments.

I see "net.sf.hibernate.ObjectNotFoundException: No row with the given identifier exists" exceptions.

Current status: Atlassian fixed this in Confluence 6.2.2.

This problem occurs when executing the "Move" strategy in Confluence 6.0.x, 6.1.x, and 6.2.x, but not after 6.2.2. It affects only certain pages, but it affects those consistently, meaning that repeated re-runs of the content lifecycle job will fail with the same exception.

See this issue, in which Atlassian and Midori investigated the problem, for details: CE-1044. The investigation led to a bugfix released in Confluence 6.2.2.

If you can't upgrade to 6.2.2 right away, then the potential workarounds are:

  1. Keep using all other features of the Better Content Archiving app, but do not archive pages in the spaces where the problem occurs until you can upgrade. For this, just turn off all page archiving related triggers in the configurations applied to the problematic spaces.
  2. If you can identify the problematic page, add the noarchive-single label to it to disable the app on that single page. When you upgrade to 6.2.2, remember to remove the label. (You can also use the noarchive label to disable the app on a whole subtree of pages.) To find the problematic page, increase the Better Content Archiving app's logging level; the problematic page will be the parent of the page that was being moved right before the exception.
  3. Switch to the "Copy and Trash" strategy temporarily. It may happen that switching to that strategy for a single archiving execution will archive the problematic page, and you can switch back to "Move" immediately after that. Before doing this, please see the comparison of the "Move" and "Copy and Trash" strategies to avoid surprises!

The safest option is the first one.

I see "Cannot add an existing ancestor as a child!" exceptions.

Running the "Move" archiving strategy may result in the java.lang.IllegalArgumentException: Cannot add an existing ancestor as a child! exception in certain spaces.

What's the root cause? Confluence uses an accelerator database table to maintain the ancestors of pages (i.e. parents of parents, recursively to unlimited depth). If this table loses its integrity (for reasons not related to Better Content Archiving!), that may lead to this and similar errors.

Please follow the official guide to fix the table; it's easy. (If it does not help on the first try, stop Confluence, run the delete from confancestors SQL statement to clear the table, then restart Confluence and only then rebuild the table.)

After the table's integrity is restored, re-run the archiving.

I see "Comparison method violates its general contract" exceptions.

Current status: Midori provides a workaround for this problem in Better Content Archiving 6.1.1, and Atlassian provides a proper fix in Confluence 6.2.1.

When using the app, some pages might display the java.lang.IllegalArgumentException: Comparison method violates its general contract! exception.

This is caused by a bug in some specific versions of the Java runtime and in Confluence core. We have already reported the problem to Atlassian: CONF-45910. We definitely encourage you to vote for the bug, to increase its importance for the Confluence developer team.

Luckily, there is an easy workaround until the actual bugfix is available! Adding this JVM system property completely eliminates the problem:

java.util.Arrays.useLegacyMergeSort=true

How to define the system property depends on your operating system, but this page in the Confluence administrator's guide explains all scenarios. After a restart, your change is picked up and the problem is gone.
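For instance, on Linux installations the property is usually added to CATALINA_OPTS in bin/setenv.sh (a sketch only; the exact file depends on your setup, as the linked guide explains):

```shell
# Sketch for <Confluence installation directory>/bin/setenv.sh:
# force the legacy merge sort to work around the comparator bug.
CATALINA_OPTS="-Djava.util.Arrays.useLegacyMergeSort=true ${CATALINA_OPTS}"
export CATALINA_OPTS
```

Remember to remove the property again once you are on a fixed Confluence version.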

I see "java.lang.NullPointerException at NaturalStringComparator.compareNatural()" exceptions.

When using the app, especially while content indexing, the following error is written to your log:

2018-01-02 10:57:10,031 ERROR [Long running task: Content Event Indexing [CARCH]] [service.task.base.AbstractArchivingLongRunningTask] runInternal Failed to index content events
 -- url: /admin/plugins/archiving/start-content-event-indexing.action | referer: /admin/plugins/archiving/statistics.action | traceId: a5b3407986727e99 | userName: admin | action: start-content-event-indexing
java.lang.NullPointerException
	at com.atlassian.confluence.pages.NaturalStringComparator.compareNatural(NaturalStringComparator.java:74)
	...

This is caused by pages that have blank titles (i.e. NULL saved in the database), as those cannot be compared against other pages. These pages are considered corrupt data, created by a buggy app, integration or other mechanism that does not obey the data integrity rules in Confluence (which require all pages to have non-blank titles).

These should be fixed not only for the sake of Better Content Archiving, but so that all Confluence features work properly. Please follow the official guide written by Atlassian to fix the problem. It's easy.

Questions?

Ask us any time.