|Our client acquired a new and valuable dataset and wished to interrogate and cross-reference it against its own intelligence. In the past, this required significant time and resources.||Frisk was deployed to rapidly connect and enrich the data, which in turn allowed our client to identify items for further investigation. Frisk processed the data within a 7-day timeframe, reducing the processing time required by existing technology by 92%.|
The Paradise Papers are a set of 13.4 million confidential electronic records relating to offshore investments that were leaked to a German newspaper. Through the International Consortium of Investigative Journalists, some of the leaked documents were shared with tax offices around the world, including the Australian Taxation Office (ATO). One of the challenges facing the ATO was how to rapidly review the vast number of documents and quickly identify the Australian entities referenced within them, so that further detailed reviews of pertinent information could be initiated.
Typical manual review processes across the corpus of unstructured content would have taken many thousands of person-hours. Alternative technology solutions available within the ATO presented significant speed and capability limitations when it came to OCR and indexing.
Frisk was implemented to conduct the document indexing and discovery process, heavily reducing the time required to get insights from the documents. The process Frisk implemented included:
- Configure & Index: Given the large volume of documents, Frisk’s streaming configuration enabled optimal use of the available compute power to reduce the processing time required. Image-based files were processed using OCR and indexed on the fly, capturing all available text in the files. Appliances were configured and indexing within 24 hours.
- Content Analysis: Frisk enabled a rapid review of the documents to determine the nature of the content and the document type. The intuitive UI enabled case officers to quickly determine the contents of the documents and review each document as required. Data was available for review within 24 hours.
- Cross Reference Data Sources: Numerous reference sources of known companies, individuals and other relevant data were used to build and execute hundreds of thousands of bulk queries against the index, utilising Frisk’s bulk query capability. This enabled rapid identification of “entities of interest” by matching entities against queries, and isolated the documents that contained references to matched entities.
- Refinement Analysis: Analysis of the reference sources and query result sets enabled further data cleansing and refinement of the bulk search criteria to remove false positives and false negatives, delivering high-confidence result sets.
- Metadata Tagging: Query result sets were written back to the index as metadata. The UI was then configured to use the tagged metadata, giving taskforce officers valuable filter options when searching across the corpus of data.
- Reporting: Result sets were exported for two primary purposes: 1) providing raw results data for further downstream processing, and 2) identifying documents to be migrated to a case management system.
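
The cross-referencing and metadata tagging steps above can be illustrated with a minimal Python sketch. It is not Frisk’s API, which this case study does not describe: the function names (`build_index`, `bulk_query`, `tag_documents`), the sample documents and the entity names are all hypothetical, and matching is simplified to "every word of the entity name appears somewhere in the document" rather than true phrase matching.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build a toy inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def bulk_query(index, entities):
    """Run one query per reference entity (hypothetical helper).

    An entity matches a document when every token of its name occurs
    in that document. Entities with no matching documents are omitted.
    """
    results = {}
    for entity in entities:
        tokens = tokenize(entity)
        if not tokens:
            continue
        matches = set.intersection(*(index.get(t, set()) for t in tokens))
        if matches:
            results[entity] = matches
    return results


def tag_documents(results):
    """Invert the result sets into per-document metadata tags."""
    tags = defaultdict(set)
    for entity, doc_ids in results.items():
        for doc_id in doc_ids:
            tags[doc_id].add(entity)
    return {doc_id: sorted(names) for doc_id, names in tags.items()}


# Made-up sample data, for illustration only.
docs = {
    "doc-1": "Transfer to Acme Holdings Pty Ltd, Sydney.",
    "doc-2": "Minutes of meeting, no parties named.",
    "doc-3": "Acme Holdings Pty Ltd and Jane Citizen are directors.",
}
entities = ["Acme Holdings Pty Ltd", "Jane Citizen", "Unrelated Trust"]

index = build_index(docs)
results = bulk_query(index, entities)
tags = tag_documents(results)
```

In this sketch, `results` plays the role of the query result sets and `tags` the metadata written back against each document, which a UI could then expose as filters. A production system would also need phrase matching, name normalisation and the refinement pass described above to control false positives.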