What is the Better Content Archiving for Confluence app?

Better Content Archiving for Confluence is the Content Lifecycle Management solution for Confluence.

For a high-level overview of the app's value proposition and core functionality, please see the app home page. This page is the starting page of the user documentation.

How will I use Better Content Archiving?

The app supports every kind of content lifecycle and information retention strategy around Confluence pages.

For example:

  • Expiration at a date: Some pages expire on one specific date, such as 1 Mar 2015 (for example, the end date of the fiscal year). They should not be archived automatically; instead, a supervisor should decide whether to archive them.
  • Periodic reviews: Some pages need weekly reviews, and their maintainers should be notified about the expiration until the page gets an update.
  • Automatic wiki gardening: For other pages, it may be acceptable not to have updates for a longer period (100 days), but they need to be archived later (in 150 days) if nobody makes an update.
  • Cleaning up not-viewed pages: In some fast-growing spaces, the pages that have not been viewed by anyone in the last 50 days need to be found automatically, and then the space administrator needs to decide whether to keep or archive them.
  • "Live forever" pages: Mixed with other pages, there may be pages that should not be checked for updates or views at all.

These are just a few frequent examples. The app allows implementing your own strategy by configuring so-called lifecycle rules.

Key concepts

Before learning more about the app features, it is important to understand the basic content lifecycle concepts.

The page "view age"

"View age" is calculated by subtracting the last view date of a page from the current date. Like any other time-related metric in the app, it is calculated with millisecond precision, although it is predominantly expressed in "days".

The page "update age"

"Update age" is calculated from the last modification date of the page or from the last modification date of its newest attachment, whichever is newer.

Consequently, you can update a page not only by editing its content, but also by adding a new attachment or a newer version of an existing attachment to it. Rationale: when a page's primary goal is to display an attached file (ex: monthly sales report in an Excel spreadsheet), then the typical "update" action is to update the attachment itself, not the page content.
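
The rule above (take the newer of the page's own modification date and its newest attachment's date, then subtract that from the current time) can be sketched in Python. This is only an illustration under stated assumptions, not the app's implementation: the function names and sample dates are hypothetical.

```python
from datetime import datetime, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000

def effective_last_update(page_modified, attachment_dates):
    """Newer of the page's own modification date and its newest attachment."""
    return max([page_modified, *attachment_dates])

def age_in_days(last_event, now):
    """Age computed with millisecond precision, then expressed in days."""
    delta = now - last_event
    age_ms = (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000
    return age_ms / MS_PER_DAY

# Hypothetical sample data: the page was edited on 1 Jan,
# but a newer attachment version was uploaded on 20 Feb.
now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
page_modified = datetime(2024, 1, 1, tzinfo=timezone.utc)
attachments = [datetime(2024, 2, 20, tzinfo=timezone.utc)]

last_update = effective_last_update(page_modified, attachments)
update_age = age_in_days(last_update, now)
print(update_age)  # 10.5
```

The same subtraction applies to "view age", with the last view date in place of the last update date.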

Please note that "update age" propagates upwards. It means that if a child page is younger than its parent, then the parent will "inherit" the child page's age.

Consequently, you can also update a page by updating one of its children! All updates recursively propagate to the top level: updating a page updates its parent, its grandparent and so on. Rationale: the base assumption is that when your pages are organized in a logical tree structure, this propagation makes sense.
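
The upward propagation can be pictured as a recursive maximum over the page tree. The sketch below is illustrative only: the `Page` class and `propagated_last_update` function are hypothetical, and it walks the tree directly for clarity, whereas the app relies on its own index for this.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Page:
    title: str
    last_update: datetime  # the page's own last update (content or attachment)
    children: List["Page"] = field(default_factory=list)

def propagated_last_update(page: Page) -> datetime:
    """Newest update among the page itself and all of its descendants."""
    newest = page.last_update
    for child in page.children:
        newest = max(newest, propagated_last_update(child))
    return newest

# Hypothetical tree: only the grandchild "March" was updated recently.
root = Page("Reports", datetime(2023, 1, 1), [
    Page("2024", datetime(2023, 6, 1), [
        Page("March", datetime(2024, 3, 1)),
    ]),
    Page("Archive notes", datetime(2022, 5, 1)),
])

print(propagated_last_update(root))  # 2024-03-01 00:00:00
```

Here the grandchild's update makes both "2024" and "Reports" count as updated on the same date, while "Archive notes" keeps its own, older date.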

The page status

Page status is the ultimate flag that expresses the currency and validity of a page. It indicates whether the page is actively maintained and viewed, whether its content is expired, whether it will be archived soon, or whether it is totally excluded from content lifecycle checks.

Mathematically speaking, the page status is a function of the page's "update age", its "view age" and the archiving settings applied to its space. Although understanding how statuses work is intuitive, the following flowchart explains the logic in detail (click the image!):

Starting steps

Initializing the Content Event Index

Better Content Archiving maintains an index with the last view (who viewed what, when?) and the last update information for each page. The index stores that information in a scalable and searchable way, eliminating expensive page tree traversals and accelerating various features of the app.

Although the index does not require any manual care later, you need to initialize (build) it once after the app's installation. Until the index is built, the app functionality that relies on views and updates will not work. After the index is initialized, it will be automatically maintained in the background with "micro updates" that are triggered by page visits, page edits, attachment additions and so on.

The initialization must be executed globally by a Confluence administrator:

  1. Log in to Confluence as an administrator
  2. Go to Administration → Content Quality
  3. Click the Build now button

As the second starting step, the app also encourages you to calculate the content quality statistics. After that is completed, your instance is ready for use!

You can learn more about initializing the two parts of the index in the next two sections.

Initializing the Page Update Index

Confluence precisely tracks page updates, and all their details are available in the Confluence data model. Therefore, the app can unambiguously build this part of the index. Zero user interaction is required.

Initializing the Page View Index

(since app version 7.4.0)

Confluence does not track page views, however. It means that Better Content Archiving simply cannot know who viewed what page before the app's installation. This isn't a problem for post-installation page views as the app implements its own precise page view tracking mechanism, but this is definitely a problem for pre-installation ones.

To overcome this, the app offers three strategies to intuitively approximate the last view information when getting started with the app. During the actual use of the app, real page views will continuously replace the approximated values.

You can initialize the last view of each page to:

  1. The same as its last update: the user who last updated the page is considered the last viewer of the page. Note that this is not necessarily the user who most recently edited the page content! As explained above, it also depends on the most recent attachment addition and on the most recent update on the child pages.
  2. Its last updater and a specific date: it simulates that the user who last updated a page viewed that page on the given day at midnight. You can use it either with the current day (default) or any past day.
  3. Do not touch: it does not touch the last view information at all. We don't recommend this at initialization time; see below for why.

We suggest using the strategies like this:

  • When building the index (initializing that the first time):
    1. The strategy "same as its last update" absolutely makes sense as the initial value.
    2. Or, you may prefer the "last updater and a specific date" strategy with the current day. Compared to the previous, it sets the same date for all pages, making it a crystal-clear starting point.
    3. Or, you may prefer the "last updater and a specific date" strategy with some past day (e.g. "2016-01-01" can be a good choice). This is essentially the same as the previous option. Tip: don't use a date in the very far past. (For instance, if you select the date 400 days before today and use 365 days as the not-viewed alert interval, all pages will be immediately reported as not-viewed.)
    4. Using "do not touch" is discouraged. Although it allows starting with empty information (i.e. no approximations), it may also lead to the abandoned pages problem.
  • When re-building the index (later):
    1. You most likely want to use the "do not touch" strategy. This is because in normal circumstances the page view information should not be missing for any page at this stage.
    2. If you removed spaces from the blacklist, then you want to add the missing page view information for the pages in the un-blacklisted spaces. In this case, follow the recommendations in the "building the index the first time" point, because that's essentially the same case.

The default selection in the user interface reflects the recommendations above. If you are unsure, just use the defaults.

Please note that page view initialization is non-destructive by design. It means that whatever strategy you choose, if there is already a known last view for a page, that will not be overwritten.
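
A minimal sketch of the three strategies and the non-destructive rule follows. It is not the app's code: the `PageRecord` class, the `initialize_last_view` function and the strategy identifiers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

@dataclass
class PageRecord:
    last_updater: str
    last_update: datetime
    last_view: Optional[Tuple[str, datetime]] = None  # (viewer, viewed at)

def initialize_last_view(page, strategy, fixed_date=None):
    """Approximate a missing last view; the strategy names are hypothetical."""
    # Non-destructive: a known last view is never overwritten.
    if page.last_view is not None:
        return
    if strategy == "same_as_last_update":
        page.last_view = (page.last_updater, page.last_update)
    elif strategy == "last_updater_and_date":
        if fixed_date is None:
            raise ValueError("fixed_date is required for this strategy")
        page.last_view = (page.last_updater, fixed_date)
    elif strategy == "do_not_touch":
        pass  # leave the last view empty
    else:
        raise ValueError(f"unknown strategy: {strategy}")

# One page without view information and one with a real, tracked view.
fresh = PageRecord("alice", datetime(2023, 5, 1))
tracked = PageRecord("bob", datetime(2023, 5, 1), ("carol", datetime(2024, 2, 1)))

midnight = datetime(2024, 3, 1)  # the chosen day at midnight
for page in (fresh, tracked):
    initialize_last_view(page, "last_updater_and_date", midnight)

print(fresh.last_view[0], tracked.last_view[0])  # alice carol
```

Only the page without view information receives an approximated value; the tracked view by "carol" is left untouched.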

Re-building the Content Event Index

In some rare cases, you may need to re-build the index:

  1. After you removed spaces from the blacklist (blacklisted spaces are not indexed to save resources).
  2. Or, when you are experiencing troubles with the app features (potentially caused by index inconsistency).

You can re-build the index either globally or for a single space by clicking the Re-build content update index button in the Content Quality Statistics screen.

Understanding content quality

Before you start implementing your own content lifecycle strategy, it is important to understand the current state of your content. This is particularly important for large Confluence sites with many contributors. The larger and older your site, the higher the probability of "out-of-date", "irrelevant" and "unused" pages.

Viewing content quality statistics

It may be really hard to see where to start your work. To get an overview, the app displays easy-to-understand statistics about the "up-to-dateness" of your pages in Confluence Administration → Archiving / Content Quality Analytics:

As you can see in the screenshot, these statistics are available in multiple levels:

  1. Per space: reports the state of that particular space.
  2. Per space group: categorizing spaces to space groups is generally a best practice to add structure to your site. You can use categories like "customer", "internal" or "intranet", for example.
    That structure will also be reflected in this report, as space groups aggregate the statistics of the enclosed spaces.
  3. Global: total values aggregated from all spaces.

Clicking the page counts will show you the exact list of the corresponding pages.

The same statistics are also displayed for each space, in Space Administration → Archiving → Content Quality.

Recalculating content quality statistics

The statistics are recalculated periodically by a Confluence background job, every two hours.

You can change the frequency of recalculations by configuring the job called Better Content Archiving: Analyze Content Quality. You can also recalculate the statistics instantly by running that job.

Viewing the status of all pages in a space

You can conveniently browse through the pages in a space, with their status, last update dates and last view dates displayed, in the Page Status Browser. Just go to the space, then to Content Tools → Page Status for this:

Viewing the status of a page

Page status is also displayed when viewing a page's content. See the Page Status Indicator icon located in the top left corner of the content panel. (The location may vary a little in different space themes, but generally it is somewhere around the page title.)

When you click the indicator icon, you will see more details in a bubble:

Configuring content lifecycle rules

After you have understood the current state of your content and decided where to implement what type of strategy, you can start setting up the configurations for your spaces.

What is a lifecycle configuration?

Lifecycle configurations (or archiving configurations) are applied to Confluence spaces and describe what should happen with the pages in that space. Each space has its own lifecycle configuration, even if that is a "do nothing" configuration. It is possible to have a totally different strategy in each space, although it is probably impractical.

Configurations contain rules plus some additional settings, as visible in this screenshot:

Rules are combinations of triggers and actions:

  • Trigger: describes when to take the action. Ex: "if a page is older than 100 days or is labeled with 'expire-14/2/20'...".
  • Action: describes what to do. Ex: "...then notify its last modifier and the space supervisor".

For details on the rules and additional settings, please see the following pages:

Global configurations

When having tens or hundreds of spaces, maintaining the configuration separately for each would be time consuming. To ease maintenance, the app enables sharing and reusing so-called global configurations among multiple spaces.

To create, update or delete global configurations, log in as a Confluence administrator, then go to Confluence Administration ("cog" icon in the top-right) → General Configuration → Archiving / Global Configurations (left panel). You can manage your global configurations in that screen.

Tip: always use a good descriptive name for your global configurations that identifies their primary use. Ex: "Monthly product documentation review".

After you have created a global configuration, you can apply it to a space by going to the space and clicking Space Admin → Archiving / Configuration → Edit configuration. The existing global configurations can be selected in the Select configuration drop-down.

Custom configurations

Although we strongly encourage using global configurations in most situations, you can also define custom settings that are applied to one specific space only. Just select Custom settings in the Select configuration drop-down, and set up the unique settings right there in the form.


As written above, triggers decide when to take actions. A trigger can be either based on the page's age or on the labels added to the page.

Age based triggers

This is the "automatic" (implicit) way of triggering actions.

In this case, the app calculates the "view age" and the "update age" (see definitions above) of the page, then compares those to a lower limit defined in the configuration. For example, "If the page was not updated for 100 days...".

Label based triggers

This is the "manual" (explicit) way of triggering actions.

In this case, the app checks the labels added to the page and to its ancestors (parent pages recursively). For example, "If the page is labeled with 'archive'...".
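
The ancestor lookup can be sketched as a walk from the page up to the top of the tree. The `Page` class and `label_trigger_fires` function below are hypothetical and only illustrate the idea that a label on any ancestor also counts for the page.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class Page:
    title: str
    labels: Set[str] = field(default_factory=set)
    parent: Optional["Page"] = None

def label_trigger_fires(page: Page, label: str) -> bool:
    """True if the page or any of its ancestors carries the label."""
    current = page
    while current is not None:
        if label in current.labels:
            return True
        current = current.parent
    return False

# Hypothetical tree: the label is set on the top-level page only.
root = Page("Project X", {"archive"})
child = Page("Meeting notes", parent=root)
grandchild = Page("2019-03-04", parent=child)
other = Page("Unrelated page", {"arhive"})  # mistyped label, never matches

print(label_trigger_fires(grandchild, "archive"))  # True
print(label_trigger_fires(other, "archive"))       # False
```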

The labels themselves can be added to the pages using the quick actions (more convenient) or the Labels dialog.

Combining triggers

You can freely activate multiple types of triggers for the same actions, in which case they can be connected by trivial logical operators:

  • OR: at least one "sub-trigger" must be true to start the action.
  • AND: all "sub-triggers" must be true.

The configuration interface makes the combinations intuitive.

There is one special case though: for page archiving, you can activate 3 different triggers and you can even freely select the first logical operator:

How does it work when all three sub-triggers are active?

  • OR: trivial, as the order of evaluation doesn't matter.
  • AND: as the graphics around the triggers suggest, first the two age-based sub-triggers are evaluated, and then the result is combined with the last label-based sub-trigger.
    Interpret it like this: the app inspects the ages, but you can manually override the result by applying the labels.
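
The evaluation order for the three archiving sub-triggers can be sketched as follows. This is an interpretation, not the app's code: the function and parameter names are hypothetical, and joining the label-based sub-trigger with OR is an assumption inferred from the "manual override" wording above.

```python
def archive_trigger_fires(view_age_hit: bool, update_age_hit: bool,
                          label_hit: bool, first_operator: str) -> bool:
    """Combine the three sub-triggers for page archiving (illustrative only)."""
    # Step 1: combine the two age-based sub-triggers with the selected operator.
    if first_operator == "AND":
        ages_hit = view_age_hit and update_age_hit
    elif first_operator == "OR":
        ages_hit = view_age_hit or update_age_hit
    else:
        raise ValueError(f"unknown operator: {first_operator}")
    # Step 2 (assumption): the label-based sub-trigger can override the ages.
    return ages_hit or label_hit

# Only one age limit is exceeded, so AND does not fire on the ages alone...
print(archive_trigger_fires(True, False, False, "AND"))  # False
# ...but an archiving label on the page still triggers the action.
print(archive_trigger_fires(True, False, True, "AND"))   # True
```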

Using the quick actions

(since app version 8.3.0)

There are several ways to take manual actions which affect the lifecycle of a page. For example, you can edit a page, or you can add the archive label to a page to archive it together with its descendant pages. Some of these manual actions are trivial and natural (e.g. editing), while others may require attention (e.g. modifying labels). For example, if you mistype the label name and add the label arhive accidentally (note the missing "c"!), it will be ignored and nothing happens.

To ease working with the lifecycle of a page, the app offers so-called quick actions. Using these you can work in a friendly, controlled and less error-prone way.

This video gives a 3-minute introduction to quick actions:

After watching the video, please continue reading for the details.

Using quick actions while viewing pages

Quick actions appear in the bottom part of the Page Status Indicator. The quick actions offered for a page depend on its current status and on its current labels. Secondary (less important) actions are hidden behind the "..." link to reduce the clutter on the interface.

After clicking one of the quick actions, you can configure its parameters and submit it with ease:

Note that different quick actions have different parameters:

Depending on its current status and on the quick action, the page's status may change immediately after submitting the quick action.

There are situations, however, when the status change is not immediate. For example, if you set a future date for expiration, then the page may remain "up-to-date" after submitting the quick action. In these situations, there is a message shown in the top right corner to confirm that your action will take effect later.

Finally, there are situations when there is no actual effect of the quick action. For example, if you set an archiving date for a page, but the page archiving feature itself is not enabled in its space, then nothing is going to happen. In these situations, there is a warning message shown in the top right corner, with a link that helps to remedy the situation.

Using quick actions from notification emails

You can also use the quick actions directly from the notification emails sent by the app:

Clicking the links in the email will open the corresponding page in your browser and also open the dialog of selected quick action. It saves the navigation efforts of first opening the page, then opening the Page Status Indicator, then looking up the action and clicking it. All these are reduced to a single click.

Note that the quick actions in the email depend on the notification email type. For example, there is no "Set expiration" offered in the "archived pages" notification email as it would be useless at that point. As everything else, quick actions can be added to or removed from the emails by customizing the notification email templates.

How do quick actions work?

The table below describes the quick actions and gives practical advice on how to use them.

It's important to understand that quick actions seamlessly integrate with the corresponding Confluence features. It means that you can modify the labels of a page both using a quick action and using the built-in Labels dialog, for instance. These are equivalent, but the former is easier to use. (The latter is explained below.)

Action How does it work?
Discuss This action allows sending an email to the selected recipients related to the page you're viewing.

It is a multi-purpose action which should be used to:
  • suggest an update on an expired page
  • suggest archiving a non-viewed page
  • discourage archiving of a page that you think may be useful in the future
  • ... and so on.

Your message with a link to the page is emailed to the recipients. The message is not added to the page as comment (i.e. it is not persistent).
Update This action opens the page for editing (in the built-in editor).

It should be used for those pages that became expired because they haven't been updated for a long period and require an update.

After saving your changes, the page receives a new last update date. As a consequence, the expiration period restarts.
Confirm This action confirms that the content of the page is still up-to-date and there is no need for an update.

It should be used for those pages that became expired because they haven't been updated for a long period, but don't require an actual update, because their content is still current. You can optionally enter a comment to explain why you confirm this page. All this information is saved for future audits.

A new page version is added to the page:
  • It has the same content as the previous version.
  • It is created as a so-called "minor change", thus watchers will not be notified.
  • It captures the details of the "confirm" action in the page's history: the user account who confirmed, the timestamp and the comment entered.
As the page receives a new last update date, the expiration period restarts.
Set expiration This action sets the expiration date for the page and optionally for its descendant pages.

One of the expire-* labels, depending on the parameters selected in the dialog, is added to the page. (These labels control the expiration of the page.) Previously added expire-* labels are removed to avoid unexpected behavior.
Remove expiration This action removes the expiration date from the page and its descendant pages (if applicable).

All expire-* labels are removed. (These labels control the expiration of the page.)
Set archiving This action sets the archiving date (immediately or on a given date) for the page and optionally for its descendant pages.

One of the archive-* labels, depending on the parameters selected in the dialog, is added to the page. (These labels control the archiving of the page.) Previously added archive-* labels are removed to avoid unexpected behavior.
Remove archiving This action removes the archiving date from the page and its descendant pages (if applicable).

All archive-* labels are removed. (These labels control the archiving of the page.)
Jump to ancestor This action opens the ancestor page from which the current page inherits its status.

For example, if the parent page specifies that the parent page itself and all its descendants must be archived on a given date, then a child page essentially inherits its status from the parent. In this case, you probably want to go to the parent and review its lifecycle settings.

The ancestor page is opened. (Even if it's not the direct parent, but the parent of the parent, this action makes the navigation super-easy.)
Exclude This action excludes the page from lifecycle tracking.

It should be used for pages with static, "always current" content.

One of the noarchive and noarchive-single labels, depending on the parameters selected in the dialog, is added to the page. (These labels disable the lifecycle tracking of the page.) Previously added noarchive-* labels will be removed to avoid unexpected behavior.
Include This action includes a page, which was previously excluded, in lifecycle tracking.

noarchive and noarchive-single labels are removed from the page. (These labels disable the lifecycle tracking of the page.)
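
The "remove the previous labels, then add the new one" behavior described for Set expiration and Set archiving can be sketched like this. The function name and the label suffixes are placeholders; the sketch does not reproduce the app's actual label formats.

```python
def set_prefixed_label(labels: set, prefix: str, new_label: str) -> set:
    """Return a new label set holding exactly one label with the given prefix."""
    if not new_label.startswith(prefix):
        raise ValueError(f"{new_label!r} does not start with {prefix!r}")
    # Drop previously added labels of the same family to avoid conflicting rules.
    kept = {label for label in labels if not label.startswith(prefix)}
    kept.add(new_label)
    return kept

# "expire-old" and "expire-new" are placeholder names, not real label formats.
labels = {"documentation", "expire-old"}
updated = set_prefixed_label(labels, "expire-", "expire-new")
print(sorted(updated))  # ['documentation', 'expire-new']
```

Unrelated labels such as "documentation" are preserved; only labels of the same family are replaced.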

Quick action triggered status changes

When you execute a quick action, the status of the page will be immediately recalculated. It results in an intuitive status change in most cases. For example, if you have a page that expired due to not being updated for 100 days and you use the "confirm" action, it becomes "up-to-date". So far, so good.

In some cases, however, the result may not look intuitive at first glance. Here are two examples.

Example 1: if you have a page that expired due to not being updated for 100 days and also due to having "2018 Jan 1" set as expiration date, and you use the "confirm" action, it remains "expired". Why? Because although the first condition is not satisfied anymore, the second still keeps the page in "expired".

Example 2: if you have a page and also its parent labelled with noarchive and you remove the label from the child page, it remains "excluded from lifecycle checks". Why? Because the parent's label also excludes all descendants.
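
Example 1 can be reproduced with a small sketch in which a page is expired when either condition holds. The function, its parameters and the "greater than or equal" comparisons are assumptions made for illustration.

```python
from datetime import date
from typing import Optional

def is_expired(update_age_days: float, max_age_days: float,
               expiration_date: Optional[date], today: date) -> bool:
    """A page is expired if it is too old OR its expiration date has passed."""
    expired_by_age = update_age_days >= max_age_days
    expired_by_date = expiration_date is not None and today >= expiration_date
    return expired_by_age or expired_by_date

today = date(2018, 2, 1)
expiration = date(2018, 1, 1)

# Before "confirm": both conditions hold.
print(is_expired(120, 100, expiration, today))  # True
# After "confirm": the update age restarts, but the date still applies.
print(is_expired(0, 100, expiration, today))    # True
# Removing the expiration date as well makes the page up-to-date.
print(is_expired(0, 100, None, today))          # False
```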

These and similar situations are caused by "overlapping" lifecycle rules and are relatively infrequent. Note that the app works correctly in these cases, too: even if the status is unchanged, its explanation in the Page Status Indicator will be different.

Tip: when you see the page in an unexpected status after taking a quick action, have a second look and eventually review your lifecycle rules.

Modifying labels directly

As mentioned above, you can also use the built-in Confluence feature to add or remove labels. (In pre-8.3.0 app versions, where quick actions are not available, this is the only way.)

For the list of the lifecycle control labels, please see the following pages:

The lifecycle job

After you set up your configurations, the actual work will be done by a regular Confluence job in the background.

The job is called Better Content Archiving: Find and Archive Expired Content and can be managed via the Scheduled Jobs screen in Confluence, if necessary.

What does the lifecycle job do?

When executed:

  1. it first archives pages,
  2. then checks page views,
  3. finally checks page expirations.

It is strictly done in this order, so that an archived page will never be reported as "expired" afterwards.

In each step, it evaluates the related triggers and takes the related actions (typically sending out notification emails).
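
The ordering of the three steps can be sketched as follows. The function and predicate names are hypothetical, the thresholds are sample values, and skipping archived pages in the view check as well is an assumption (the text above only states this for expirations).

```python
def run_lifecycle_job(pages, should_archive, is_not_viewed, is_expired):
    """Run the three steps in order; archived pages skip the later checks."""
    # Step 1: archive pages first.
    archived = [p for p in pages if should_archive(p)]
    remaining = [p for p in pages if not should_archive(p)]
    # Step 2: check page views on the pages that were not archived.
    not_viewed = [p for p in remaining if is_not_viewed(p)]
    # Step 3: check expirations, again only on non-archived pages.
    expired = [p for p in remaining if is_expired(p)]
    return archived, not_viewed, expired

# Hypothetical pages with ages expressed in days.
pages = [
    {"title": "Old plan", "update_age": 200, "view_age": 90},
    {"title": "Handbook", "update_age": 120, "view_age": 3},
    {"title": "News", "update_age": 2, "view_age": 1},
]

archived, not_viewed, expired = run_lifecycle_job(
    pages,
    should_archive=lambda p: p["update_age"] >= 150,
    is_not_viewed=lambda p: p["view_age"] >= 50,
    is_expired=lambda p: p["update_age"] >= 100,
)
print([p["title"] for p in archived])  # ['Old plan']
print([p["title"] for p in expired])   # ['Handbook']
```

"Old plan" also exceeds the expiration limit, but because it was archived in the first step, it is not reported as expired.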

Executing the lifecycle job

The job can be started in two ways:

  1. Scheduled execution: Confluence executes it at regular time intervals, like any other scheduled job.
    By default it is executed once a week, exactly at 2:00AM every Monday. You can flexibly change the schedule to run every night, once a month or whatever timing you prefer.
  2. Manual execution: you also have the possibility to start the job immediately globally (for all spaces) or for a single space.
    • Global scope: log in to Confluence as an administrator, then go to Administration → Start Archiving (under Archiving) and click Start.
      Please note that Confluence administrator permissions are required.
    • Space scope: assuming you are a space administrator, go to Space Tools → Archiving → Start Archiving and click Start.
      If the archive space already exists for this fresh space, then space administrator permissions are sufficient. If, however, the archive space does not exist yet (because no page has been archived from this fresh space yet), then the create space global permission is also required. This secondary check prevents users without the create space permission from creating the archive space and then using that for other purposes.
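
The permission rule for the space scope boils down to a small decision, sketched below with hypothetical function and parameter names.

```python
def can_start_space_archiving(is_space_admin: bool,
                              can_create_space: bool,
                              archive_space_exists: bool) -> bool:
    """Permission check for starting the job for a single space (illustrative)."""
    if not is_space_admin:
        return False
    if archive_space_exists:
        return True
    # The first run would create the archive space, so the global
    # "create space" permission is needed as well.
    return can_create_space

# A space administrator without the "create space" permission:
print(can_start_space_archiving(True, False, archive_space_exists=True))   # True
print(can_start_space_archiving(True, False, archive_space_exists=False))  # False
```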

After the start, scheduled and manual execution work identically.

Note: if you turned on page archiving, it is recommended to execute the job outside your regular working hours. Why? If your users are editing pages while the job is running, it can result in the same type of conflicts that may also happen when two users are working on the same pages. What could typically happen is that a user will not be able to save their modifications if the page was archived by the job in the meantime. It is no big deal, the user will just get a warning message from Confluence, but be aware of it.

The two other mechanisms (page view tracking and expiration tracking) are 100% read-only, thus concurrent editing will not cause any problem. If you use only those, you can run the job any time.

Notification emails

In order to keep all stakeholders informed, notification emails are sent to them. These emails contain all relevant information and provide quick links for the most typical actions in that context. For example, the "expired pages" emails offer an "edit" link for each page right in the email, to encourage and ease content updates.

For every type of notification, you can select who to notify:

  1. Author (the user who originally created the page)
  2. Last modifier (the user who updated the page most recently)
  3. Space administrators (all users with administrator permission in the enclosing space)
  4. Space creator (the user who originally created the enclosing space)
  5. Supervisor (see the next section)
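
Resolving the selected recipient types to user accounts can be sketched as below. The data layout, the type identifiers and the function name are hypothetical, and collapsing a user who matches several types into a single recipient is an assumption.

```python
def resolve_recipients(page, space, selected, supervisors=()):
    """Collect the users to notify for one page (illustrative only)."""
    recipients = set()  # assumption: each user is notified once
    if "author" in selected:
        recipients.add(page["author"])
    if "last_modifier" in selected:
        recipients.add(page["last_modifier"])
    if "space_administrators" in selected:
        recipients.update(space["administrators"])
    if "space_creator" in selected:
        recipients.add(space["creator"])
    if "supervisor" in selected:
        recipients.update(supervisors)
    return recipients

# Hypothetical page and space data.
page = {"author": "alice", "last_modifier": "bob"}
space = {"administrators": ["bob", "carol"], "creator": "dave"}

to_notify = resolve_recipients(
    page, space,
    selected={"last_modifier", "space_administrators", "supervisor"},
    supervisors=["erin"],
)
print(sorted(to_notify))  # ['bob', 'carol', 'erin']
```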

For details about the different types of notification emails, please see the following pages:


The app allows notifying specific users, typically the ones responsible for managing the lifecycle. They do not need to be related to the pages or spaces in any other way (do not need to be creators, modifiers or so).

For this, check the Notify supervisors option and select one or more Confluence user accounts to be notified. You can do this separately for each space and for every notification type, but as always, using global configurations makes it more manageable.

Another way of using this feature is to send the notifications to a specific (external) mailbox. In this situation, just create a Confluence user account with that email address, and select this "artificial" user as supervisor.

Only one supervisor can be selected per archiving configuration in app versions prior to 7.4.0.

Event streams

The app keeps the precise history of what pages were expired, archived, skipped or updated. This is done in the form of easy-to-use event streams, similar to Facebook feeds. This information is crucial for audits, or if you just want to better understand what happened with your content over time.

Event streams are available per space under the Space Administration of the corresponding space:

The global event stream, aggregated from all spaces, is available under Confluence Administration:

Clicking the page counts in the events will show you the exact timestamp and page list of the corresponding event:

Blacklisted spaces

(since app version 5.0.0)

Generally speaking, every Confluence space benefits from content lifecycle tracking. There are situations, however, when it is better to blacklist some spaces, excluding those from content quality statistics, page view and expiration tracking, and page archiving. Blacklisted spaces will be completely ignored by the Better Content Archiving app.

The blacklist is empty by default, meaning that all spaces are tracked by default. You can add spaces to the blacklist at Confluence Administration → Archiving → Blacklisted Spaces.

We suggest blacklisting the following types of spaces:

  • Irrelevant spaces: these are the spaces that should not be tracked, because their status is just not relevant. These include dead content (legacy garbage), never-changing (static) content, machine-generated content, and other spaces that would create superfluous load on the server without creating real value for the users.
    For large Confluence instances, blacklisting these can make a big difference in terms of app performance!
  • Spaces with corrupt data: Confluence, apps and scripts sometimes create data that is considered corrupt. Corrupt data includes for example:
    • pages with NULL update dates (should never happen!)
    • child pages with NULL parents (may happen when your Confluence index is broken!)
    • dead attachments (existing in the database, but deleted from the filesystem)
    The app is designed to tolerate and report many of these problems, but it does not tolerate non-recoverable situations (ex: broken parent-child relations).
    If you identify a space with broken data, you should temporarily blacklist that until the data is fixed.

In this sense, properly configuring the blacklist is an optimization possibility. It both reduces the clutter in the application interface (ex: irrelevant spaces not polluting the content quality statistics) and the load on your server.

Next step

Learn more about page view tracking to trim the non-viewed pages from your spaces.

