Frequently Asked Questions for Agencies

  1. What types of documents can I digitize using DIP?

    You can digitize any document image that is human readable and has a defined structure, such as a printed form or a register with defined rows and columns. However, we suggest that you digitize only documents that are generated in high volume, share a similar structure and need frequent access.

  2. What types of data can I extract from the documents?

    DIP can process document images and extract multilingual text as well as numeric and alphanumeric data from them.

  3. How does DIP ensure data quality and accuracy?

    DIP verifies and validates data through multiple levels of quality checks. An image validation technique ensures that only similar types of documents are processed in a batch. Pre-defined field-level validations ensure that the crowd workforce enters data of the correct type, and multi-level comparisons of data values through a maker-checker process check accuracy and quality. Data fields that fail the automated quality checks go to human validation. In the future, DIP will use pre-defined data dictionaries and machine learning algorithms to achieve higher levels of data accuracy and quality.
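
    As a rough illustration of how a field-level validation and a maker-checker comparison could fit together, here is a minimal Python sketch. It is not DIP's actual implementation: the field patterns, the function names (is_valid, reconcile) and the min_agreement threshold are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical field rules; DIP's real validation rules are not described in this FAQ.
FIELD_PATTERNS = {
    "numeric": re.compile(r"\d+"),
    "alphanumeric": re.compile(r"[A-Za-z0-9 ]+"),
}


def is_valid(value, field_type):
    """Field-level check: the whole entry must match the pattern for its type."""
    return FIELD_PATTERNS[field_type].fullmatch(value.strip()) is not None


def reconcile(entries, field_type, min_agreement=2):
    """Compare independent entries made by different workers for one field.

    Returns (value, "accepted") when at least min_agreement valid entries
    agree, otherwise (None, "needs_human_review").
    """
    valid = [e.strip() for e in entries if is_valid(e, field_type)]
    if valid:
        value, count = Counter(valid).most_common(1)[0]
        if count >= min_agreement:
            return value, "accepted"
    return None, "needs_human_review"


# Two workers agree; the third typed the letter O instead of zero and is rejected.
print(reconcile(["560001", "560001", "56OOO1"], "numeric"))
# The two entries disagree, so the field is routed to a human validator.
print(reconcile(["560001", "560007"], "numeric"))
```

    In this sketch, an entry that fails the type check never takes part in the comparison, and a field with no sufficiently agreed value is sent for human validation, mirroring the sequence of checks described above.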

  4. How does DIP ensure data privacy and security?

    • DIP is hosted on NIC's secure cloud infrastructure "Meghraj", which restricts access to authorized personnel. Data transmission from the cloud to the crowd is secured through industry-standard encryption algorithms and protocols such as SSL and HTTPS.
    • The data from the documents is distributed to the crowd in fragments through a randomization algorithm, which ensures that no individual gets more than a fixed number of randomly assigned fields. This makes it difficult to identify the type of data or document.
    • The data extracts generated for an organization can be accessed only by authorized personnel of that organization using system-assigned IDs and passwords.
    • Crowd agents are identified and authenticated by their Aadhaar number against the UIDAI database, and every crowd agent is assigned a unique user ID and password.
    • The system maintains an audit log of all transactions, including login details, locations and machine IDs, and will soon have a fraud engine to monitor suspicious transactions.
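
    The fragment distribution described in the second point above can be pictured with a small Python sketch. This is only an assumption about how such a randomization might work, not DIP's actual algorithm; the function assign_fields, the max_per_worker cap and the sample identifiers are hypothetical.

```python
import random


def assign_fields(field_ids, worker_ids, max_per_worker, seed=None):
    """Randomly assign each field snippet to one worker, capped per worker."""
    if len(field_ids) > len(worker_ids) * max_per_worker:
        raise ValueError("not enough workers for the given cap")
    rng = random.Random(seed)
    # One slot per unit of worker capacity, so no worker can exceed the cap.
    slots = [w for w in worker_ids for _ in range(max_per_worker)]
    rng.shuffle(slots)
    # Shuffle the snippets too, so fields of one document are scattered.
    shuffled_fields = list(field_ids)
    rng.shuffle(shuffled_fields)
    assignment = {w: [] for w in worker_ids}
    for field, worker in zip(shuffled_fields, slots):
        assignment[worker].append(field)
    return assignment


# Hypothetical batch: 3 documents with 4 fields each, shared among 4 workers.
fields = [f"doc{d}-field{f}" for d in range(3) for f in range(4)]
workers = ["w1", "w2", "w3", "w4"]
plan = assign_fields(fields, workers, max_per_worker=4, seed=7)
for worker, snippets in plan.items():
    print(worker, snippets)
```

    Because every worker owns at most max_per_worker slots, the cap holds regardless of how the shuffle falls, and each snippet is handed to exactly one worker.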

  5. How do I get started on DIP?

    • Identify the documents you need to digitize.
    • Verify that the documents share a similar format.
    • Estimate the volume of documents you need to digitize.
    • Verify that the image quality is good enough for the documents to be human readable.
    • Identify the data fields per document you need to extract
    • Register as a department on the Digitize India portal or mail us the information @

  6. What is the desired outcome of DIP?

    We intend to use DIP to lead all organizations towards a paperless office, make data available to citizens on demand, free up the storage space occupied by archived documents and enhance digital public service delivery.