What is the Archiving Plugin?

Archiving Plugin for Confluence is the Content Lifecycle Management solution for Confluence.

For a high-level overview of the app's value proposition and core functionality, please see the app home page. This page is the starting point of the user documentation.

How will I use the Archiving Plugin?

The app supports all kinds of content lifecycle and information retention strategies for Confluence pages.

For example:

  • Expiration at a date: Some pages expire on one specific date, such as 1 Mar 2015 (for example, the end date of the fiscal year). They should not be archived automatically; instead, a supervisor should decide whether to archive them.
  • Periodic reviews: Some pages need weekly reviews, and their maintainers should be notified about the expiration until the page gets an update.
  • Automatic wiki gardening: For other pages, it may be acceptable not to have updates for a longer period (100 days), but they need to be archived later (after 150 days) if nobody makes an update.
  • Clean-up of not viewed pages: In some fast-growing spaces, the pages that have not been viewed by anyone in the last 50 days need to be found automatically, and then the space administrator needs to decide whether to keep or archive them.
  • "Live forever" pages: mixed with other pages, there may be pages that should not be checked for updates or views at all.

These are just examples; the app lets you implement your own strategy by configuring so-called lifecycle rules.

Key concepts

Before learning more about the app features, it is important to understand the basic content lifecycle concepts.

The page "view age"

"View age" is calculated from the last view date of a page, simply by subtracting that from the current date. Like any other time-related metric in the app, it is calculated with millisecond precision, although it is predominantly expressed in "days".

The page "update age"

"Update age" is calculated from the last modification date of the page or from the last modification date of its newest attachment, whichever is newer.

Consequently, you can update a page not only by editing its content, but also by adding a new attachment or a newer version of an existing attachment to it. Rationale: when a page's primary goal is to display an attached file (ex: monthly sales report in an Excel spreadsheet), then the typical "update" action is to update the attachment itself, not the page content.

Please note that "update age" propagates upwards. It means that if a child page is younger than its parent, then the parent will "inherit" the child page's age.

Consequently, you can also update a page by updating one of its children! All updates recursively propagate to the top level: updating a page updates its parent, its grandparent and so on. Rationale: the base assumption is that when your pages are organized in a logical tree structure, this propagation makes sense.
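
To make the two age definitions concrete, here is a minimal Python sketch. It is an illustration only, not the app's implementation: the Page class and all function names are hypothetical, and the recursive walk is used purely for clarity (the app itself relies on an index to avoid page tree traversals).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List

MS_PER_DAY = 24 * 60 * 60 * 1000


@dataclass
class Page:
    """Hypothetical page model, for illustration only."""
    title: str
    last_modified: datetime
    last_viewed: datetime
    attachment_dates: List[datetime] = field(default_factory=list)
    children: List["Page"] = field(default_factory=list)


def age_in_days(since: datetime, now: datetime) -> float:
    """Age measured in whole milliseconds, then expressed in days."""
    age_ms = (now - since) // timedelta(milliseconds=1)
    return age_ms / MS_PER_DAY


def effective_last_update(page: Page) -> datetime:
    """Newest of: the page edit, its attachments, and all descendants' updates."""
    candidates = [page.last_modified, *page.attachment_dates]
    candidates.extend(effective_last_update(child) for child in page.children)
    return max(candidates)


def view_age_days(page: Page, now: datetime) -> float:
    return age_in_days(page.last_viewed, now)


def update_age_days(page: Page, now: datetime) -> float:
    return age_in_days(effective_last_update(page), now)


now = datetime(2024, 3, 1, 12, 0)
child = Page("Monthly report", last_modified=datetime(2024, 1, 15),
             last_viewed=datetime(2024, 2, 1),
             attachment_dates=[datetime(2024, 2, 25)])
parent = Page("Reports", last_modified=datetime(2024, 1, 1),
              last_viewed=datetime(2024, 2, 20), children=[child])

print(view_age_days(parent, now))    # 10.5
print(update_age_days(parent, now))  # 5.5
```

In this example the parent was last edited on 1 Jan, but it "inherits" the update age of its child, whose attachment was added on 25 Feb.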

The page status

Page status is the ultimate flag that expresses the currency and validity of a page. It indicates whether the page is actively maintained and viewed, whether its content is expired, whether it will be archived soon, or whether it is totally excluded from content lifecycle checks.

Mathematically speaking, the page status is a function of the page's "update age", its "view age" and the archiving settings applied to its space. Although the way statuses work is intuitive, the following flowchart explains the logic in detail:

Starting steps

Initializing the Content Event Index

Archiving Plugin maintains an index with the last view (who viewed what, when?) and the last update information for each page. The index stores that information in a scalable and searchable way, eliminating expensive page tree traversals and accelerating various features of the app.

Although the index does not require any manual care later, you need to initialize (build) it once after the app's installation. Until the index is built, the app functionality that relies on views and updates will not work. After the index is initialized, it will be automatically maintained in the background with "micro updates" that are triggered by page visits, page edits, attachment additions and so on.

The initialization must be executed globally by a Confluence administrator:

  1. Log in to Confluence as an administrator
  2. Go to Administration → Content Quality
  3. Click the Build now button

As the second starting step, the app also encourages you to calculate the content quality statistics. After that is completed, your instance is ready for use!

You can learn more about initializing the two parts of the index in the next two sections.

Initializing the Page Update Index

Confluence precisely tracks page updates, and all their details are available in the Confluence data model. Therefore, the app can unambiguously build this part of the index. Zero user interaction is required.

Initializing the Page View Index

Confluence does not track page views, however. It means that Archiving Plugin simply cannot know who viewed what page before the app's installation. This isn't a problem for post-installation page views as the app implements its own precise page view tracking mechanism, but this is definitely a problem for pre-installation ones.

To overcome this, the app offers three strategies to intuitively approximate the last view information when getting started with the app. During the actual use of the app, real page views will continuously replace the approximated values.

You can initialize the last view of each page to:

  1. The same as its last update: the user who last updated the page is considered the last viewer of the page. Note that this is not necessarily the user who most recently edited the page content! As explained above, it also depends on the most recent attachment addition and on the most recent update on the child pages.
  2. Its last updater and a specific date: it simulates that the user who last updated a page viewed that page on the given day at midnight. You can use it either with the current day (default) or any past day.
  3. Do not touch: it does not touch the last view information at all. We don't recommend this at initialization time; the recommendations below explain why.

We suggest using the strategies like this:

  • When building the index (initializing it for the first time):
    1. The strategy "same as its last update" absolutely makes sense as the initial value.
    2. Or, you may prefer the "last updater and a specific date" strategy with the current day. Compared to the previous, it sets the same date for all pages, making it a crystal-clear starting point.
    3. Or, you may prefer the "last updater and a specific date" strategy with some past day (e.g. "2016-01-01" can be a good choice). This is essentially the same as the previous option. Tip: don't use a date in the very far past. (For instance, if you select the date 400 days before today and use 365 days as the not-viewed alert interval, all pages will be immediately reported as not-viewed.)
    4. Using "do not touch" is discouraged. Although it allows starting with empty information (i.e. no approximations), it may also lead to the abandoned pages problem.
  • When re-building the index (later):
    1. You most likely want to use the "do not touch" strategy. This is because in normal circumstances the page view information should not be missing for any page at this stage.
    2. If you removed spaces from the blacklist, then you want to add the missing page view information for the pages in the un-blacklisted spaces. In this case, follow the recommendations in the "building the index the first time" point, because that's essentially the same case.

The default selection in the user interface reflects the recommendations above. If you are unsure, just use the defaults.

Please note that page view initialization is non-destructive by design. It means that whatever strategy you choose, if there is already a known last view for a page, that will not be overwritten.
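
The three strategies and the non-destructive rule can be sketched as follows. This is a hedged illustration under stated assumptions, not the app's actual code: the IndexEntry record, the ViewInitStrategy enum and the initialize_last_view function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from datetime import date, datetime
from enum import Enum
from typing import Optional


class ViewInitStrategy(Enum):
    SAME_AS_LAST_UPDATE = "same as its last update"
    LAST_UPDATER_AND_DATE = "its last updater and a specific date"
    DO_NOT_TOUCH = "do not touch"


@dataclass
class IndexEntry:
    """Hypothetical index record for one page."""
    last_updater: str
    last_updated: datetime
    last_viewer: Optional[str] = None
    last_viewed: Optional[datetime] = None


def initialize_last_view(entry: IndexEntry, strategy: ViewInitStrategy,
                         day: Optional[date] = None) -> None:
    # Non-destructive: a known last view is never overwritten.
    if entry.last_viewed is not None:
        return
    if strategy is ViewInitStrategy.DO_NOT_TOUCH:
        return
    entry.last_viewer = entry.last_updater
    if strategy is ViewInitStrategy.SAME_AS_LAST_UPDATE:
        entry.last_viewed = entry.last_updated
    else:
        # LAST_UPDATER_AND_DATE: the given day (default: today) at midnight.
        chosen = day or date.today()
        entry.last_viewed = datetime(chosen.year, chosen.month, chosen.day)


fresh = IndexEntry("alice", datetime(2015, 6, 1, 9, 30))
initialize_last_view(fresh, ViewInitStrategy.LAST_UPDATER_AND_DATE, date(2016, 1, 1))
print(fresh.last_viewer, fresh.last_viewed)  # alice 2016-01-01 00:00:00

known = IndexEntry("bob", datetime(2015, 6, 1), "carol", datetime(2015, 12, 24))
initialize_last_view(known, ViewInitStrategy.SAME_AS_LAST_UPDATE)
print(known.last_viewer)  # carol (the existing last view is kept)
```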

The "page view initialization" feature is available since app version 7.4.0.

Re-building the Content Event Index

In some rare cases, you may need to re-build the index:

  1. After you removed spaces from the blacklist (blacklisted spaces are not indexed to save resources).
  2. Or, when you are experiencing troubles with the app features (potentially caused by index inconsistency).

You can re-build the index either globally or for a single space by clicking the Re-build content update index button in the Content Quality Statistics screen.

Understanding content quality

Before you start implementing your own content lifecycle strategy, it is important to understand the current state of your content. This is particularly important for large Confluence sites with many contributors. The larger and older your site, the higher the probability of "out-of-date", "irrelevant" and "unused" pages.

Viewing content quality statistics

It may be really hard to see where to start your work. To get an overview, the app displays easy-to-understand statistics about the "up-to-dateness" of your pages in Confluence Administration → Archiving / Content Quality Analytics:

As you can see in the screenshot, these statistics are available at multiple levels:

  1. Per space: reports the state of that particular space.
  2. Per space group: categorizing spaces to space groups is generally a best practice to add structure to your site. You can use categories like "customer", "internal" or "intranet", for example.
    That structure will also be reflected in this report, as space groups aggregate the statistics of the enclosed spaces.
  3. Global: total values aggregated from all spaces.
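
The three aggregation levels can be pictured with a small Python sketch. The space keys, group names, status names and page counts below are made-up sample data, and the aggregate function is a hypothetical helper, not part of the app.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical per-space statistics: page counts per status.
space_stats: Dict[str, Counter] = {
    "DOCS": Counter({"up-to-date": 120, "expired": 30}),
    "KB": Counter({"up-to-date": 45, "expired": 5}),
    "HR": Counter({"up-to-date": 10, "expired": 2}),
}

# Hypothetical space groups and the spaces they enclose.
space_groups: Dict[str, list] = {
    "customer": ["DOCS", "KB"],
    "internal": ["HR"],
}


def aggregate(space_keys: Iterable[str]) -> Counter:
    """Sum the per-status page counts of the given spaces."""
    total: Counter = Counter()
    for key in space_keys:
        total.update(space_stats[key])  # Counter.update adds counts
    return total


customer_stats = aggregate(space_groups["customer"])  # space group level
global_stats = aggregate(space_stats)                 # global level

print(customer_stats["expired"])  # 35
print(global_stats["up-to-date"])  # 175
```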

Clicking the page counts will show you the exact list of the corresponding pages.

The same statistics are also displayed for each space, in Space Administration → Archiving → Content Quality.

Recalculating content quality statistics

The statistics are recalculated periodically by a Confluence background job, every two hours.

You can change the frequency of recalculations by configuring the job called Archiving Plugin: Analyze Content Quality. You can also recalculate the statistics instantly by running that job.

Viewing the status of all pages in a space

You can conveniently browse through the pages in a space, with their status, last update dates and last view dates displayed, in the Page Status Browser. Just go to the space, then to Content Tools → Page Status for this:

Viewing the status of a page

Page status is also displayed when viewing a page's content. See the Page Status Indicator icon located in the top left corner of the content panel. (The location may vary a little in different space themes, but generally it is somewhere around the page title.)

When you click the indicator icon, you will see more details in a bubble:

Configuring content lifecycle rules

After you have understood the current state of your content and decided which type of strategy to implement where, you can start setting up the configurations for your spaces.

What is a lifecycle configuration?

Lifecycle configurations (or archiving configurations) are applied to Confluence spaces and describe what should happen with the pages in that space. Each space has its own lifecycle configuration, even if that is a "do nothing" configuration. It is possible to have a totally different strategy in each space, although it is probably impractical.

Configurations contain rules plus some additional settings, as visible in this screenshot:

Rules are combinations of triggers and actions:

  • Trigger: describes when to take the action. Ex: "if a page is older than 100 days or is labeled with 'expire-14/2/20'...".
  • Action: describes what to do. Ex: "...then notify its last modifier and the space supervisor".

For details on the rules and additional settings, please see the following pages:

Global configurations

When you have tens or hundreds of spaces, maintaining the configuration separately for each would be time-consuming. To ease maintenance, the app enables sharing and reusing so-called global configurations among multiple spaces.

To create, update or delete global configurations, log in as a Confluence administrator, then go to Confluence Administration ("cog" icon in the top-right) → General Configuration → Archiving / Global Configurations (left panel). You can manage your global configurations in that screen.

Tip: always use a good descriptive name for your global configurations that identifies their primary use. Ex: "Monthly product documentation review".

After you have created a global configuration, you can apply it to a space by going to the space and clicking Space Admin → Archiving / Configuration → Edit configuration. The existing global configurations can be selected in the Select configuration drop-down.

Custom configurations

Although we strongly encourage using global configurations in most situations, you can also define custom settings that are applied to one specific space only. Just select Custom settings in the Select configuration drop-down, and set up the unique settings right there in the form.

Triggers

As written above, triggers decide when to take actions. A trigger can be either based on the page's age or on the labels added to the page.

Age based triggers

This is the "automatic" (implicit) way of triggering actions.

In this case, the app calculates the "view age" and the "update age" (see definitions above) of the page, then compares those to a lower limit defined in the configuration. For example, "If the page was not updated for 100 days...".

Label based triggers

This is the "manual" (explicit) way of triggering actions.

In this case, the app checks the labels added to the page and to its ancestors (parent pages recursively). For example, "If the page is labeled with 'archive'...".
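
The ancestor check can be sketched in a few lines of Python. The Page class and the has_label_in_ancestry function are hypothetical and only illustrate the idea of walking up the parent chain.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Page:
    """Hypothetical page model with labels and a parent link."""
    title: str
    labels: Set[str] = field(default_factory=set)
    parent: Optional["Page"] = None


def has_label_in_ancestry(page: Page, label: str) -> bool:
    """True if the page itself or any of its ancestors carries the label."""
    current: Optional[Page] = page
    while current is not None:
        if label in current.labels:
            return True
        current = current.parent
    return False


root = Page("Projects", labels={"archive"})
child = Page("Project X", parent=root)
grandchild = Page("Meeting notes", parent=child)
other = Page("Handbook")

print(has_label_in_ancestry(grandchild, "archive"))  # True
print(has_label_in_ancestry(other, "archive"))       # False
```

A practical consequence: labeling one parent page is enough to trigger the action for its whole subtree.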

The labels themselves can be conveniently added to the pages using the built-in Confluence feature:

For the list of the lifecycle control labels, please see the following pages:

Combining triggers

You can freely activate multiple types of triggers for the same actions, in which case they can be connected by trivial logical operators:

  • OR: at least one "sub-trigger" must be true to start the action.
  • AND: all "sub-triggers" must be true.

The configuration interface makes the combinations intuitive.

There is one special case though: for page archiving, you can activate 3 different triggers and you can even freely select the first logical operator:

How does it work when all three sub-triggers are active?

  • OR: trivial, as the order of evaluation doesn't matter.
  • AND: as the graphics around the triggers suggest, first the two age-based sub-triggers are evaluated, and then the result is combined with the last label-based sub-trigger.
    Interpret it like this: the app inspects the ages, but you can manually override the result by applying the labels.

Trigger precedence change in 4.3.0

Important: in app versions prior to 4.3.0, this combination was evaluated in a different order! In those versions, the last two sub-triggers were evaluated first, and then the result was combined with the first sub-trigger.

Because of this change, we suggest reviewing your archiving configurations that use all 3 sub-triggers when upgrading to 4.3.0. We are sorry for the extra work, but we decided to change this in 4.3.0, as we are convinced that the new order is more intuitive.
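
The effect of the evaluation order is easiest to see with mixed operators. The following Python sketch is an illustration under assumptions: it models the three sub-triggers as booleans and assumes one operator between the first and second sub-trigger (op1) and one between the second and third (op2). The function names are hypothetical.

```python
def combine(operator: str, left: bool, right: bool) -> bool:
    """Apply a logical operator given by name."""
    if operator == "AND":
        return left and right
    if operator == "OR":
        return left or right
    raise ValueError(f"unknown operator: {operator}")


def archive_trigger_current(first_age: bool, op1: str, second_age: bool,
                            op2: str, label: bool) -> bool:
    """4.3.0 and later: (first_age op1 second_age) op2 label."""
    return combine(op2, combine(op1, first_age, second_age), label)


def archive_trigger_legacy(first_age: bool, op1: str, second_age: bool,
                           op2: str, label: bool) -> bool:
    """Before 4.3.0: first_age op1 (second_age op2 label)."""
    return combine(op1, first_age, combine(op2, second_age, label))


# Neither age limit is reached, but the page carries the archiving label.
args = (False, "AND", False, "OR", True)
print(archive_trigger_current(*args))  # True: the label overrides the ages
print(archive_trigger_legacy(*args))   # False: the first sub-trigger blocks it
```

When both operators are the same, the two orders give identical results, so only configurations that mix AND and OR are affected.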

The lifecycle job

After you set up your configurations, the actual work will be done by a regular Confluence job in the background.

The job is called Archiving Plugin: Find and Archive Expired Content and can be managed via the Scheduled Jobs screen in Confluence, if necessary.

What does the lifecycle job do?

When executed:

  1. it first archives pages,
  2. then checks page views,
  3. finally checks page expirations.

It is strictly done in this order, so that an archived page will never be reported as "expired" afterwards.

In each step, it evaluates the related triggers and takes the related actions (typically sending out notification emails).
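
The ordering guarantee can be sketched like this. It is a simplified model, not the real job: pages are plain strings, the three predicates stand in for the configured triggers, and it assumes that archived pages are excluded from both later steps. All names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def run_lifecycle_job(pages: Iterable[str],
                      should_archive: Callable[[str], bool],
                      is_not_viewed: Callable[[str], bool],
                      is_expired: Callable[[str], bool]) -> Dict[str, List[str]]:
    pages = list(pages)
    # Step 1: archive pages first.
    archived = [p for p in pages if should_archive(p)]
    # Archived pages take no part in the following checks.
    remaining = [p for p in pages if p not in archived]
    # Step 2: check page views.
    not_viewed = [p for p in remaining if is_not_viewed(p)]
    # Step 3: check page expirations.
    expired = [p for p in remaining if is_expired(p)]
    return {"archived": archived, "not_viewed": not_viewed, "expired": expired}


result = run_lifecycle_job(
    ["a", "b", "c"],
    should_archive=lambda p: p in {"a"},
    is_not_viewed=lambda p: p in {"a", "b"},
    is_expired=lambda p: p in {"a", "c"},
)
print(result)
# {'archived': ['a'], 'not_viewed': ['b'], 'expired': ['c']}
```

Page "a" matches all three triggers, but because archiving runs first, it is only archived and never reported as not viewed or expired.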

Executing the lifecycle job

The job can be started in two ways:

  1. Scheduled execution: Confluence executes it at regular time intervals, like any other scheduled job.
    By default, it is executed once a week, at 2:00 AM every Monday. You can flexibly change the schedule to run every night, once a month or at whatever timing you prefer.
  2. Manual execution: you also have the possibility to start the job immediately globally (for all spaces) or for a single space.
    • Global scope: log in to Confluence as an administrator, then go to Administration → Start Archiving (under Archiving) and click Start.
      Please note that Confluence administrator permissions are required.
    • Space scope: assuming you are a space administrator, go to Space Tools → Archiving → Start Archiving and click Start.
      If the archive space already exists for this fresh space, then only space administrator permissions are required. If, however, the archive space does not exist yet (because no page has been archived yet from this fresh space), then the create space global permission is also required. This secondary check prevents users without the create space permission from creating the archive space and then using that for other purposes.
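
The permission rule for space-scope execution boils down to a small decision function. This Python sketch only restates the rule described above; the function and parameter names are hypothetical.

```python
def can_start_space_archiving(is_space_admin: bool,
                              archive_space_exists: bool,
                              has_create_space_permission: bool) -> bool:
    """Decide whether a user may start archiving for a single space."""
    if not is_space_admin:
        return False
    if archive_space_exists:
        # The archive space is already there: space admin rights are enough.
        return True
    # The job may have to create the archive space, so the global
    # "create space" permission is required as well.
    return has_create_space_permission


print(can_start_space_archiving(True, True, False))   # True
print(can_start_space_archiving(True, False, False))  # False
print(can_start_space_archiving(True, False, True))   # True
```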

After the start, scheduled and manual execution work identically.

Note: if you turned on page archiving, it is recommended to execute the job outside your regular working hours. Why? If your users are editing pages while the job is running, it can result in the same type of conflicts that may also happen when two users are working on the same pages. What could typically happen is that a user will not be able to save their modifications if the page was archived by the job in the meantime. It is no big deal, the user will just get a warning message from Confluence, but be aware of it.

The two other mechanisms (page view tracking and expiration tracking) are 100% read-only, thus concurrent editing will not cause any problem. If you use only those, you can run the job any time.

Notification emails

In order to keep all stakeholders informed, notification emails are sent to them. These emails contain all relevant information and provide quick links for the most typical actions in that context. For example, the "expired pages" emails offer an "edit" link for each page right in the email, to encourage and ease content updates.

For every type of notification, you can select who to notify:

  1. Author (the user who originally created the page)
  2. Last modifier (the user who updated the page most recently)
  3. Space administrators (all users with administrator permission in the enclosing space)
  4. Space creator (the user who originally created the enclosing space)
  5. Supervisor (see the next section)
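
How a selection of recipient types could translate into a list of users is sketched below. This is an assumption-laden illustration: the dictionary keys, the user names and the resolve_recipients function are hypothetical, and the de-duplication (one email per user even if they match several roles) is an assumption rather than documented behavior.

```python
from typing import Dict, Iterable, List, Set


def resolve_recipients(selected: Set[str], page: Dict[str, str],
                       space: Dict[str, object],
                       supervisors: Iterable[str]) -> List[str]:
    """Collect the users to notify, without duplicates."""
    recipients: Set[str] = set()
    if "author" in selected:
        recipients.add(page["author"])
    if "last_modifier" in selected:
        recipients.add(page["last_modifier"])
    if "space_administrators" in selected:
        recipients.update(space["administrators"])
    if "space_creator" in selected:
        recipients.add(space["creator"])
    if "supervisor" in selected:
        recipients.update(supervisors)
    return sorted(recipients)


page = {"author": "alice", "last_modifier": "bob"}
space = {"administrators": ["alice", "dave"], "creator": "dave"}

print(resolve_recipients({"author", "space_administrators", "supervisor"},
                         page, space, supervisors=["erin"]))
# ['alice', 'dave', 'erin']
```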

For details about the different types of notification emails, please see the following pages:

Supervisors

The app allows notifying specific users, typically the ones responsible for managing the lifecycle. They do not need to be related to the pages or spaces in any other way (they do not need to be creators, modifiers or similar).

For this, check the Notify supervisors option and select one or more Confluence user accounts to be notified. You can do this separately for each space and for every notification type, but as always, using global configurations makes it more manageable.

Another way of using this feature is to send the notifications to a specific (external) mailbox. In this situation, just create a Confluence user account with that email address, and select this "artificial" user as supervisor.

Only one supervisor can be selected per archiving configuration in app versions prior to 7.4.0.

Event streams

The app keeps the precise history of what pages were expired, archived, skipped or updated. This is done in the form of easy-to-use event streams, similar to Facebook feeds. This information is crucial for audits, or if you just want to better understand what happened with your content over time.

Event streams are available per space under the Space Administration of the corresponding space:

The global event stream, aggregated from all spaces, is available under Confluence Administration:

Clicking the page counts in the events will show you the exact timestamp and page list of the corresponding event:

Blacklisted spaces

Generally speaking, every Confluence space benefits from content lifecycle tracking. There are situations, however, when it is better to blacklist some spaces, excluding those from content quality statistics, page view and expiration tracking, and page archiving. Blacklisted spaces will be completely ignored by the Archiving Plugin.

The blacklist is empty by default, meaning that all spaces are tracked by default. You can add spaces to the blacklist at Confluence Administration → Archiving → Blacklisted Spaces.

We suggest blacklisting the following types of spaces:

  • Irrelevant spaces: these are the spaces that should not be tracked, because their status is just not relevant. These include dead content (legacy garbage), never-changing (static) content, machine-generated content, and other spaces that would create superfluous load on the server without creating real value for the users.
    For large Confluence instances, blacklisting these can make a big difference in terms of app performance!
  • Spaces with corrupt data: Confluence, apps and scripts sometimes create data that is considered corrupt. Corrupt data includes, for example:
    • pages with NULL update dates (should never happen!)
    • child pages with NULL parents (may happen when your Confluence index is broken!)
    • dead attachments (existing in the database, but deleted from the filesystem)
    The app is designed to tolerate and report many of these problems, but it does not tolerate non-recoverable situations (ex: broken parent-child relations).
    If you identify a space with broken data, you should temporarily blacklist it until the data is fixed.

In this sense, properly configuring the blacklist is an optimization possibility. It both reduces the clutter in the application interface (ex: irrelevant spaces not polluting the content quality statistics) and the load on your server.

The "blacklisted spaces" feature is available since app version 5.0.0.

Next step

Learn more about page view tracking to trim the non-viewed pages from your spaces.

Questions?

Ask us any time.