Leaked Document – Review Project

Problem: A new and valuable dataset was acquired by our client, who wished to interrogate and cross-reference the data against its own intelligence. In the past, this required significant time and resources.

Solution: Frisk was deployed to rapidly connect and enrich the data, which in turn allowed our client to identify items for further investigation. Frisk processed the data within a 7-day timeframe, reducing the processing time by 92% compared with existing technology.

Benefits:
  • Reduced time frame for accessing insights.
  • Utilising information in email attachments.

Problem

The Paradise Papers are a set of 13.4 million confidential electronic records relating to offshore investments that were leaked to a German newspaper. Through the International Consortium of Investigative Journalists, some of the documents in the leak were shared with tax offices around the world, including the Australian Taxation Office (ATO). One of the challenges facing the ATO was how to rapidly review the vast number of documents and quickly identify the Australian entities referenced within them, so that further detailed reviews of pertinent information could be initiated.

A typical manual review of the corpus of unstructured content would have taken many thousands of person-hours. Alternative technology solutions available within the ATO had significant speed and capability limitations in OCR and indexing.

Solution

Frisk was implemented to conduct the document indexing and discovery process, substantially reducing the time required to gain insights from the documents. The process Frisk implemented included:

  1. Configure & Index: Given the large volume of documents, Frisk’s streaming configuration made optimal use of the available compute power to reduce the processing time required. Image-based files were processed using OCR and indexed on the fly, capturing all available text in the files. Appliances were configured and indexing within 24 hours.
  2. Content Analysis: Frisk enabled a rapid review of the documents to determine the nature of the content and the document type. The intuitive UI enabled case officers to quickly determine the contents of the documents and review each document as required. Data was available for review within 24 hours.
  3. Cross Reference Data Sources: Numerous reference sources of known companies, individuals and other relevant data were used to build and execute multiple bulk queries (hundreds of thousands) against the index using Frisk’s bulk query capability. This enabled rapid identification of “entities of interest” by matching entities against queries, and isolated the documents that contained references to matched entities.
  4. Refinement Analysis: Analysis of reference sources and query result sets enabled further data cleansing and refinement of the bulk search criteria to remove false positives/negatives, delivering high-confidence result sets.
  5. Metadata Tagging: Query result sets were written back as metadata in the index. The UI was then configured to use the tagged metadata, giving taskforce officers readily available filter options when searching across the corpus of data.
  6. Reporting: Result sets were exported for two primary purposes: (1) exporting raw results data for further downstream processing, and (2) identifying documents to be migrated to a case management system.
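
The cross-referencing and tagging steps above can be illustrated with a small sketch. This is not Frisk's API or implementation, which the case study does not describe; it is a minimal in-memory stand-in, assuming exact phrase matching on normalised text. The function names (`normalise`, `build_index`, `bulk_query`, `tag_documents`), the document ids and the entity names are all hypothetical.

```python
import re
from collections import defaultdict

def normalise(text):
    """Lower-case and strip punctuation so names compare consistently."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(documents):
    """Map each document id to its normalised text (stand-in for a real search index)."""
    return {doc_id: normalise(text) for doc_id, text in documents.items()}

def bulk_query(index, entities):
    """Run one phrase query per reference entity; return entity -> matching doc ids."""
    hits = defaultdict(set)
    for entity in entities:
        phrase = normalise(entity)
        if not phrase:
            continue  # skip reference entries that are empty after normalisation
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")
        for doc_id, text in index.items():
            if pattern.search(text):
                hits[entity].add(doc_id)
    return dict(hits)

def tag_documents(hits):
    """Invert the result set into per-document tags (the metadata write-back step)."""
    tags = defaultdict(set)
    for entity, doc_ids in hits.items():
        for doc_id in doc_ids:
            tags[doc_id].add(entity)
    return dict(tags)

# Hypothetical sample data, invented purely for illustration.
documents = {
    "doc-001": "Share transfer to Example Holdings Pty Ltd, Sydney.",
    "doc-002": "Invoice addressed to J. Citizen regarding trust deed.",
    "doc-003": "Board minutes; no Australian parties mentioned.",
}
reference_entities = ["Example Holdings Pty Ltd", "J. Citizen", "Unrelated Co"]

index = build_index(documents)
hits = bulk_query(index, reference_entities)
tags = tag_documents(hits)
```

Here `hits` answers "which documents mention this entity?", while `tags` answers "which entities does this document mention?", which is the shape a UI filter would need. A linear scan per query like this one would not scale to hundreds of thousands of queries over millions of documents; a real index would be required for that.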
Stage: Configure & Index
Process:
  1. Configure appliances to maximise streaming options
  2. OCR and index to enable search across document content and metadata
Frisk Capability:
  • OCR and index on the fly
  • Streaming – take advantage of available compute power

Stage: Content Analysis
Process:
  1. Review the content of the target documents
  2. Write Boolean search criteria to target areas of interest
  3. Save search criteria for repeat use
  4. Provide summary of search results
Frisk Capability:
  • Smart Search
  • Export to Report

Stage: Cross Reference Data Sources
Process:
  1. Build bulk queries from reference sources
  2. Target entities of interest
Frisk Capability:
  • Bulk Query tool
  • Report outcomes for further downstream analysis

Stage: Refinement Analysis
Process:
  1. Enable filtering to include/exclude documents from query results
  2. Iteratively define ambiguous source references
  3. Identify and remove false positives/negatives
Frisk Capability:
  • Technical consulting
  • Bulk Query tool

Stage: Metadata Tagging
Process:
  1. From result sets, either:
    • Augment the document metadata held in the index
    • Or, write to a reference list
  2. Tailor the UI to enable filtering on the tagged metadata
Frisk Capability:
  • Index new classifications and make searchable
  • Create tailored filters in the UI

Stage: Reporting
Process:
  1. Produce a report that lists all query results (documents) in each relevant classification or grouping
  2. Exported document content can be used to extract documents for use in the case management system
Frisk Capability:
  • Export to Report
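
The refinement and reporting stages can be sketched in the same spirit. The example below is an assumption-laden illustration, not Frisk's "Export to Report" feature: it drops matches that a reviewer has rejected or that come from reference terms judged too ambiguous, then writes the remaining matches as CSV. It only addresses false positives; recovering false negatives would mean adding or rewording queries. The names `refine` and `export_report`, and all sample values, are hypothetical.

```python
import csv
import io

def refine(results, excluded_pairs, ambiguous_terms):
    """Drop reviewer-rejected matches and terms too ambiguous to trust."""
    kept = []
    for entity, doc_id in results:
        if entity in ambiguous_terms:
            continue  # the whole reference term is unreliable
        if (entity, doc_id) in excluded_pairs:
            continue  # this specific match was reviewed and rejected
        kept.append((entity, doc_id))
    return kept

def export_report(results):
    """Return CSV text with one row per (entity, document) match, sorted for stability."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["entity", "document_id"])
    for entity, doc_id in sorted(results):
        writer.writerow([entity, doc_id])
    return buffer.getvalue()

# Hypothetical raw matches, invented purely for illustration.
raw_results = [
    ("Example Holdings Pty Ltd", "doc-001"),
    ("Smith", "doc-002"),                      # common surname, too ambiguous
    ("Example Holdings Pty Ltd", "doc-007"),   # rejected on review
]
refined = refine(
    raw_results,
    excluded_pairs={("Example Holdings Pty Ltd", "doc-007")},
    ambiguous_terms={"Smith"},
)
report = export_report(refined)
```

Keeping the exclusions as data rather than editing the result set by hand means the same refinement can be re-applied each time the bulk queries are re-run.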

Contact Us

180 Flinders St ADELAIDE SA 5000
PO Box 879, UNLEY BC SA 5061

Phone: 1300 43 33 11
Email: mail@frisk.com.au
