Process billions of records with Async SOQL

Christmas is getting closer, but I would like to deliver one last post before the end of the year.

Some months ago I wrote an entry about BigObjects, a feature that became Generally Available in Winter ’18, and tried to explain it through a use case. Now I would like to follow that post up with a new way to create records: Async SOQL.

As of Winter ’18, Async SOQL is partly GA and partly still in Pilot.

The use case was about moving Code Review custom object records into Code Review History big object records in order to free the storage used by the custom object. In other words, we want to archive records, and Async SOQL helps us deal with large amounts of data.

What is Async SOQL?

Basically, Async SOQL lets you run SOQL in the background. You get the response after a period of time, but in exchange you can work with millions or even billions of records without hitting timeouts or governor limits.

How can I execute Async SOQL?

Async SOQL is implemented as a RESTful API. To run a query in the background, we send a POST request to the async-queries resource:

[Screenshot: POST request to the Async SOQL endpoint]

The call needs a body in JSON format. Below is a really simple example that creates Vendor__c custom object records based on Accounts.

[Screenshot: JSON request body]

What can we highlight?

query: defines the SOQL to run. In our case, the object I’m going to read is Account and the field I want to retrieve is Name.

operation: defines what we want to do with the results, insert or upsert (please check the BigObjects entry to understand the primary key and upsert, as it works in the same way).

This is the main difference between plain SOQL and Async SOQL: with this new feature we can read records and create or update others in one go.

targetObject: the object where the new records are created. In our case, Vendor__c.

Then we need to specify where the field values come from and where we want to store them. For that, we have two keys in the JSON body.

targetFieldMap: maps each source field to a target field

and

targetValueMap: maps a literal value to a target field. For instance, we want all records to have Spain as their Country__c field value.
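Since the original screenshot of the body is not available here, the following is only a minimal sketch of how such a body could be put together in Apex from the keys described above. The class name VendorJobBody is hypothetical, and the target field VendorName__c is taken from the error example later in this post; adjust both to your own org.

```apex
// Hypothetical helper: builds the JSON body for the Account -> Vendor__c example.
public class VendorJobBody {
    public static String build() {
        Map<String, Object> body = new Map<String, Object>{
            // SOQL that reads the source records
            'query' => 'SELECT Name FROM Account',
            // insert or upsert
            'operation' => 'insert',
            // object where the new records are created
            'targetObject' => 'Vendor__c',
            // source field => target field (VendorName__c is assumed)
            'targetFieldMap' => new Map<String, String>{ 'Name' => 'VendorName__c' },
            // literal value => target field
            'targetValueMap' => new Map<String, String>{ 'Spain' => 'Country__c' }
        };
        return JSON.serialize(body);
    }
}
```

Calling System.debug(VendorJobBody.build()) prints the string that would be sent as the POST body.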

What does the response look like?

When we make the POST call, we get a response similar to this one:

[Screenshot: JSON response to the POST call]

It is also in JSON format and mostly echoes the information that you passed in the POST call. I would just highlight three keys:

jobId: remember this is an asynchronous call, so there is an Id for the background job. You can also use it in the POST call via the special value $JOB_ID as a key in targetValueMap.

status: tells you where the job is, moving from New to Running, Complete, Failed, etc.

message: before the job runs, the request is analyzed. If a possible issue is detected, the job doesn’t start and this field contains an error message. For instance, the message below appeared because my VendorName__c field was too short to store all Account names.

[Screenshot: error message in the response]

How can I stop the execution?

Async SOQL doesn’t have a UI in Salesforce like other background jobs such as Batch Apex or Queueable. But we can make an HTTP DELETE call passing the jobId as part of the URL.

[Screenshot: DELETE request with the jobId in the URL]
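As a rough sketch of that DELETE call in Apex: the class name AsyncSoqlJobRequest is hypothetical, the API version v41.0 is an assumption based on the Winter ’18 release, and the base URL and session Id are passed in rather than guessed.

```apex
// Hypothetical helper: builds (but does not send) a request for one Async SOQL job.
public class AsyncSoqlJobRequest {
    public static HttpRequest build(String method, String baseUrl,
                                    String sessionId, String jobId) {
        HttpRequest req = new HttpRequest();
        // 'DELETE' cancels the job
        req.setMethod(method);
        // The jobId goes at the end of the URL (API version is an assumption)
        req.setEndpoint(baseUrl + '/services/data/v41.0/async-queries/' + jobId);
        req.setHeader('Authorization', 'Bearer ' + sessionId);
        return req;
    }
}

// Usage (baseUrl must be allowed in Remote Site Settings):
// HttpRequest cancel = AsyncSoqlJobRequest.build(
//     'DELETE', baseUrl, UserInfo.getSessionId(), jobId);
// HttpResponse res = new Http().send(cancel);
```

Keeping the request builder separate from the send makes it possible to check the URL and method without performing a callout. The same helper with 'GET' as the method would produce the status request described in the next section.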

How can I check the progress?

As before, since there is no UI, we need another way to get this information. We have two options:

1. Make a GET call: similar to the cancel action, if we pass the jobId as part of the URL and make a GET call, the response shows information about the background execution that is running in the system.

[Screenshot: GET request and its response]

2. Run a SOQL query against the BackgroundOperation object: this is a new object where we can see extra information about the job execution.

[Screenshot: SOQL query against BackgroundOperation]

But this is not the only new object in the system. We also have BackgroundOperationResult, which helps you identify any issue during the execution if the result is not the expected one. For instance, the image below shows an issue with a field that is required while the source value is empty. In that case, a new error is logged in this table, but the execution doesn’t stop; it continues until the end.

Just keep in mind that this information is removed after 7 days.

[Screenshot: BackgroundOperationResult records with the logged error]
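The two queries could look roughly like the sketch below. The class name AsyncSoqlJobMonitor is hypothetical, and the selected fields (Status, StartedAt, FinishedAt, Message, MessageType, ParentId) are my assumption of what these objects expose; please check them against the object reference for your API version.

```apex
// Hypothetical helper: reads job progress and logged messages with plain SOQL.
public class AsyncSoqlJobMonitor {
    // Returns the background job for the given jobId (empty list if not found).
    public static List<BackgroundOperation> findJob(Id jobId) {
        return [SELECT Id, Status, StartedAt, FinishedAt
                FROM BackgroundOperation
                WHERE Id = :jobId];
    }

    // Returns the messages logged for that job, e.g. rows that could not be created.
    public static List<BackgroundOperationResult> findResults(Id jobId) {
        return [SELECT Id, Message, MessageType, ParentId
                FROM BackgroundOperationResult
                WHERE ParentId = :jobId];
    }
}
```

Because the results are only kept for a few days, it may be worth copying any errors somewhere else if you need them for auditing.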

And what about Archiving?

Yes, you are right. We promised to show you how to archive Code Review records, but first we explained the whole functionality with a really simple use case. Now it’s time to move on to the Code Review History use case.

As I mentioned before, we would like to create Code Review History records by reading Code Review records. As before, we make a POST call, but this time the body looks like this:

[Screenshot: JSON request body for the Code Review History job]
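The original body is only available as an image, so this is just a sketch of what it might contain, built in Apex. It assumes the source and target fields share the same API names (the ones used in the BigObjects post), and the class name CodeReviewArchiveBody is hypothetical. Note that CanBeMerged__c is a Checkbox on the custom object and Text on the big object; I am not sure that Async SOQL converts it, so treat that mapping as an assumption.

```apex
// Hypothetical helper: builds the body that copies CodeReview__c into CodeReviewHistory__b.
public class CodeReviewArchiveBody {
    public static String build() {
        List<String> fields = new List<String>{
            'CanBeMerged__c', 'CodeReviewDate__c', 'Comments__c',
            'Employee__c', 'Score__c', 'Story__c'
        };

        // Assumption: every source field maps to a target field with the same name.
        Map<String, String> fieldMap = new Map<String, String>();
        for (String fieldName : fields) {
            fieldMap.put(fieldName, fieldName);
        }

        Map<String, Object> body = new Map<String, Object>{
            'query' => 'SELECT ' + String.join(fields, ', ') + ' FROM CodeReview__c',
            'operation' => 'insert',
            'targetObject' => 'CodeReviewHistory__b',
            'targetFieldMap' => fieldMap
        };
        return JSON.serialize(body);
    }
}
```

Building the field map from one list keeps the query and the mapping in sync when a field is added.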

Note: in the BigObjects post, the object was called Test1CodeReviewHistory__b instead of CodeReviewHistory__b.

And … that’s all. Simple, isn’t it?

Can I integrate this functionality into Apex?

Yes, of course. In the end we are just making HTTP calls, so we only need to set the proper URL, add it to the Remote Site Settings and create a string in JSON format for the body.

[Screenshot: Apex code making the Async SOQL call]
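A minimal sketch of such a callout follows; it is not the code from the screenshot. The class name AsyncSoqlClient, the exception type and the v41.0 API version are assumptions, and the base URL and session Id are parameters so that nothing about your org is guessed.

```apex
// Hypothetical client: submits an Async SOQL job through the REST API.
public class AsyncSoqlClient {
    public class AsyncSoqlException extends Exception {}

    // Assumed API version (Winter '18).
    private static final String RESOURCE = '/services/data/v41.0/async-queries/';

    // Builds the POST request without sending it.
    public static HttpRequest buildSubmitRequest(String baseUrl, String sessionId,
                                                 String jsonBody) {
        HttpRequest req = new HttpRequest();
        req.setMethod('POST');
        req.setEndpoint(baseUrl + RESOURCE);
        req.setHeader('Authorization', 'Bearer ' + sessionId);
        req.setHeader('Content-Type', 'application/json');
        req.setBody(jsonBody);
        return req;
    }

    // Sends the request and returns the jobId found in the JSON response.
    public static String submit(String baseUrl, String sessionId, String jsonBody) {
        HttpResponse res = new Http().send(buildSubmitRequest(baseUrl, sessionId, jsonBody));
        if (res.getStatusCode() >= 300) {
            throw new AsyncSoqlException('Async SOQL call failed: ' + res.getBody());
        }
        Map<String, Object> parsed =
            (Map<String, Object>) JSON.deserializeUntyped(res.getBody());
        return (String) parsed.get('jobId');
    }
}
```

A call could then look like AsyncSoqlClient.submit(URL.getSalesforceBaseUrl().toExternalForm(), UserInfo.getSessionId(), body). Depending on the context the code runs in, the session Id may not be valid for API calls, so this part needs checking in your own org.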

Summary

Finally I would like to sum up some key concepts we have talked about.

  1. Async SOQL allows you to run SOQL in the background
  2. It takes some time but allows you to process millions or even billions of records
  3. You do not need to worry about governor limits or timeouts
  4. It is implemented as a RESTful API
  5. You can make as many calls as you want per day, but only one at a time
  6. You can read and create records in one go. Delete is out of scope
  7. This feature is part GA and part still in Pilot:
    1. Reading Standard or Custom objects and creating Standard, Custom or Big Object records is in Pilot
    2. Reading Big Objects and creating Standard, Custom or Big Object records is GA


Salesforce BigObjects

What is a BigObject?

Two years ago I started working with Big Objects, a Salesforce pilot that looked very interesting to me. Now that General Availability seems to be getting closer, I would like to write about them.

Please take into account that a Salesforce pilot can be discarded, like Data Pipeline, which will not become GA in the end.

Big Objects are a new type of object that Salesforce provides. Maybe the most important thing I can say is that they do not count against the storage limit. Well, you could say that External Objects already offer that.

You can learn more about External Objects here.

Yes, you are right. But this time the data lives in your organisation instead of in external storage. This means you can work with Big Objects much like with Custom or Standard objects, but if you go to the Salesforce Storage screen, you will not find them.

Big Objects let you store and manage large amounts of data.

Use Case

But before starting with the explanation, let’s talk about the Use Case.

As a company that creates software, we work with stories in order to develop new functionality. Before delivering them, they need to pass a code review, so that we deliver the best to our end users.


But how much storage should I use to keep code review records in our Production org? Should I remove them? If so, what about auditing?

BigObjects are your solution: you can move old records into a new Code Review History BigObject. Let’s focus for now on this new History object.

How to Create a BigObject?

Unfortunately you cannot do it in a declarative way. You need to define them via the Metadata API and then deploy them from the command line or Workbench.

Before doing it, take into account:

  1. Indexes are required, and in order to include them you need to be on API version 39.0 or later.
  2. Once you deploy one, you cannot make modifications such as amending a field type or changing a label through the metadata. However, in Setup you can make small declarative changes such as the label and API name.

<?xml version="1.0" encoding="UTF-8"?>
<CustomObject xmlns="http://soap.sforce.com/2006/04/metadata">
  <deploymentStatus>Deployed</deploymentStatus>
  <fields>
    <fullName>CanBeMerged__c</fullName>
    <label>Can Be Merged</label>
    <length>80</length>
    <type>Text</type>
  </fields>
  <fields>
    <fullName>CodeReviewDate__c</fullName>
    <label>Code Review Date</label>
    <type>DateTime</type>
    <required>true</required>
  </fields>
  <fields>
    <fullName>Comments__c</fullName>
    <label>Comments</label>
    <length>255</length>
    <type>Text</type>
  </fields>
  <fields>
    <fullName>Employee__c</fullName>
    <label>Employee</label>
    <referenceTo>Employee__c</referenceTo>
    <relationshipName>Employees</relationshipName>
    <required>true</required>
    <type>Lookup</type>
  </fields>
  <fields>
    <fullName>Score__c</fullName>
    <label>Score</label>
    <precision>1</precision>
    <scale>0</scale>
    <type>Number</type>
  </fields>
  <fields>
    <fullName>Story__c</fullName>
    <label>Story__c</label>
    <referenceTo>Story__c</referenceTo>
    <relationshipName>Stories</relationshipName>
    <required>true</required>
    <type>Lookup</type>
  </fields>
  <indexes>
    <type>PRIMARY</type> 
    <fullName>Test1CodeReviewPK</fullName>
    <fields>
      <name>Story__c</name>
      <sortDirection>DESC</sortDirection> 
    </fields>
    <fields>
      <name>Employee__c</name>
      <sortDirection>DESC</sortDirection>
    </fields>
    <fields>
      <name>CodeReviewDate__c</name>
      <sortDirection>DESC</sortDirection>
    </fields>
  </indexes>
  <label>Test1 Code Review History</label>
  <pluralLabel>Test1 Code Reviews History</pluralLabel>
</CustomObject>

Once you deploy it into your organization, you will find something like this under the BigObject entry:

I decided to name it Test1 Code Review History because once it is created I cannot modify it, and I didn’t want to use the Code Review History label for now.

[Screenshot: the deployed Big Object in Setup]

What can we highlight?

  1. The API name ends with __b, instead of __c or __x as in custom and external objects.
  2. There are no Standard fields.
  3. We can create as many custom fields as we want, but right now only these field types are supported: DateTime, Lookup, Number, Text and Long Text Area. What about the rest? Try to be creative. For instance, a Checkbox can be converted into a Number or Text.
  4. You will not find any other section like Page Layouts or Buttons for now.
  5. You will not find a Triggers section. Triggers are not allowed for now.
  6. You will not be allowed to create a custom tab for this object. That is related to the previous point about Page Layouts. To visualise Big Object records, you need to create a Visualforce page or a Lightning component.
  7. You can set its CRUD and FLS via Profiles or Permission Sets during the deployment or in a declarative way. However, for now BigObject records can only be created and read.


How Can I Create Records?

There are different ways to create BigObject records: loading a CSV file, using APIs such as the Bulk API, or even Async SOQL, another Pilot I will talk about in a future entry.

In any case, here, I will focus on Apex.

As I mentioned above, you can only read or create records, and for now you cannot use the standard DML operations. There is a new method instead: Database.insertImmediate(record)

// Story__c, Employee__c and CodeReviewDate__c form the primary key of the record.
Test1CodeReview__b cr = new Test1CodeReview__b();
cr.CanBeMerged__c = 'False';
cr.CodeReviewDate__c = System.today();
cr.Comments__c = 'I found a SOQL inside of a loop.';
cr.Employee__c = 'a6j24000000fxSL';
cr.Score__c = 0;
cr.Story__c = 'a6k24000000k9bN';
// Big Object records are saved with insertImmediate instead of a standard DML statement.
Database.insertImmediate(cr);

But something pretty cool, which was not available at the beginning, is that we can also use this method to update records. It actually works like an upsert. If I make a call where the record has a primary key that doesn’t exist yet, it creates a new record, as we can see in the image below.

[Screenshot: the newly created record]

However, if a record with that primary key already exists in the platform, it is modified.

// Same primary key as before (Story__c, Employee__c, CodeReviewDate__c), so the record is updated.
Test1CodeReview__b cr = new Test1CodeReview__b();
cr.CanBeMerged__c = 'False';
cr.CodeReviewDate__c = System.today();
cr.Comments__c = 'I found a SOQL inside of a loop.';
cr.Employee__c = 'a6j24000000fxSL';
cr.Score__c = 1; // Change to 1
cr.Story__c = 'a6k24000000k9bN';
Database.insertImmediate(cr);


Coming back to the original use case, I would like to move CodeReview__c records to Test1CodeReviewHistory__b records. To do that, I only need to retrieve the record and create a new BigObject entry.

// Retrieve one Code Review record to archive.
CodeReview__c cr = [SELECT Id, CanBeMerged__c,
                           CodeReviewDate__c, Comments__c,
                           Employee__c, Score__c, Story__c
                    FROM CodeReview__c
                    LIMIT 1];

// Checkbox is not a supported Big Object field type, so the value is stored as text.
String canBeMerged = cr.CanBeMerged__c == true ? 'True' : 'False';

// Copy the values into a new Big Object record and save it.
Test1CodeReview__b crh = new Test1CodeReview__b();
crh.CanBeMerged__c = canBeMerged;
crh.CodeReviewDate__c = cr.CodeReviewDate__c;
crh.Comments__c = cr.Comments__c;
crh.Employee__c = cr.Employee__c;
crh.Score__c = cr.Score__c;
crh.Story__c = cr.Story__c;
Database.insertImmediate(crh);
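To move more than one record at a time, the same idea can be wrapped in a small helper. This is only a sketch under the assumption that the objects and fields are the ones used above; the class name CodeReviewArchiver is hypothetical. It relies on Database.insertImmediate also accepting a list and returning one Database.SaveResult per record.

```apex
// Hypothetical helper: archives a list of CodeReview__c records as Big Object records.
public class CodeReviewArchiver {
    // Copies one CodeReview__c record into a new, unsaved Big Object record.
    public static Test1CodeReview__b toHistory(CodeReview__c cr) {
        Test1CodeReview__b crh = new Test1CodeReview__b();
        crh.CanBeMerged__c = cr.CanBeMerged__c == true ? 'True' : 'False';
        crh.CodeReviewDate__c = cr.CodeReviewDate__c;
        crh.Comments__c = cr.Comments__c;
        crh.Employee__c = cr.Employee__c;
        crh.Score__c = cr.Score__c;
        crh.Story__c = cr.Story__c;
        return crh;
    }

    // Saves all records in one call and returns how many were saved successfully.
    public static Integer archive(List<CodeReview__c> reviews) {
        List<Test1CodeReview__b> history = new List<Test1CodeReview__b>();
        for (CodeReview__c cr : reviews) {
            history.add(toHistory(cr));
        }
        Integer saved = 0;
        for (Database.SaveResult result : Database.insertImmediate(history)) {
            if (result.isSuccess()) {
                saved++;
            }
        }
        return saved;
    }
}
```

Counting the successful results gives a simple check before deleting the original records, which would be the step that actually frees storage.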

How Can I Visualise Records?

As we have already mentioned, we cannot create a custom tab for a BigObject to show all its records. However, we can query BigObjects, so what if we create a custom page for that?

First of all we need a controller, a really simple one. It has a method to retrieve all records, and a getter and setter for the list that the page shows.

public with sharing class CodeReviewController
{
   // Records shown by the page. Kept as an instance member so that it is part of the page state.
   private List<Test1CodeReview__b> codeReviewHistoryList;

   public CodeReviewController()
   {
      setCodeReviewHistoryList(calculateCodeReviewHistoryList());
   }

   // Retrieves the Big Object records to display.
   public List<Test1CodeReview__b> calculateCodeReviewHistoryList()
   {
      return [SELECT Id, CanBeMerged__c, CodeReviewDate__c,
                     Comments__c, Employee__c, Score__c,
                     Story__c
               FROM Test1CodeReview__b];
   }

   // Getter bound to {!codeReviewHistoryList} in the Visualforce page.
   public List<Test1CodeReview__b> getCodeReviewHistoryList()
   {
      return codeReviewHistoryList;
   }

   public void setCodeReviewHistoryList(List<Test1CodeReview__b> value)
   {
      codeReviewHistoryList = value;
   }
}
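The controller above reads every record. When you need to filter, my understanding is that a SOQL filter on a Big Object has to follow the index fields in the order they were defined (here Story__c, then Employee__c, then CodeReviewDate__c) without skipping any. The sketch below illustrates that under this assumption; the class name CodeReviewHistorySelector is hypothetical.

```apex
// Hypothetical selector: filters Big Object records using the index fields in order.
public class CodeReviewHistorySelector {
    // Filters on the first index field only.
    public static List<Test1CodeReview__b> byStory(Id storyId) {
        return [SELECT CanBeMerged__c, CodeReviewDate__c, Comments__c,
                       Employee__c, Score__c, Story__c
                FROM Test1CodeReview__b
                WHERE Story__c = :storyId];
    }

    // Filters on the first and second index fields, in index order.
    public static List<Test1CodeReview__b> byStoryAndEmployee(Id storyId, Id employeeId) {
        return [SELECT CanBeMerged__c, CodeReviewDate__c, Comments__c,
                       Employee__c, Score__c, Story__c
                FROM Test1CodeReview__b
                WHERE Story__c = :storyId
                AND Employee__c = :employeeId];
    }
}
```

A filter on Employee__c alone would skip the first index field, so I would expect it to be rejected; this is why the order of the fields in the index deserves some thought before the first deployment.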

Secondly, we have the Visualforce page. Something to highlight is that right now the standardController attribute doesn’t work, so we can only create a page with a custom controller.

<apex:page showHeader="true" sidebar="true" controller="CodeReviewController">
   <apex:form>
      <apex:pageBlock title="Code Review History Records">
         <apex:pageBlockSection title="Code Review History" columns="1" collapsible="false">
            <apex:pageBlockTable value="{!codeReviewHistoryList}" var="crHistory">
               <apex:column headerValue="Ready To Merge?">
                  <apex:outputField value="{!crHistory.CanBeMerged__c}"/>
               </apex:column>
               <apex:column headerValue="Score">
                  <apex:outputField value="{!crHistory.Score__c}"/>
               </apex:column>
               <apex:column headerValue="Comments">
                  <apex:outputField value="{!crHistory.Comments__c}"/>
               </apex:column>
               <apex:column headerValue="Story">
                  <apex:outputField value="{!crHistory.Story__c}"/>
               </apex:column>
               <apex:column headerValue="Reviewer">
                  <apex:outputField value="{!crHistory.Employee__c}"/>
               </apex:column>
               <apex:column headerValue="Code Review Date">
                  <apex:outputField value="{!crHistory.CodeReviewDate__c}"/>
               </apex:column>
            </apex:pageBlockTable>
         </apex:pageBlockSection>
      </apex:pageBlock>
   </apex:form>
</apex:page>

And this is the result, which includes the record created from scratch and the one retrieved from the CodeReview__c object.

[Screenshot: Visualforce page listing both records]

Summary

I would like to share this table that summarises BigObjects and compares them with Custom Objects. But first, I would also like to highlight that this new object type can be included in a package. You will find it under the CustomObject section:

[Screenshot: Big Object listed under CustomObject in the package]

Summary Table:

| Feature | Custom Object | Big Object |
| --- | --- | --- |
| Creation | Manual; Metadata | Metadata |
| API name | myObject__c | myObject__b |
| Track Activities, Track Field History, etc. | Options available | Options not available |
| Field Types | All | Text; Date/Time; Lookup; Number; Long Text Area |
| Able to edit fields | Yes | Yes (with restrictions) |
| Able to delete fields | Yes | No |
| Triggers; Field Sets; etc. | Options available | Options not available |
| Reports | Yes | No |
| How to populate records | All | CSV file; API (Bulk, SOAP); Apex; Async SOQL |
| Can I amend a record? | Yes | Yes (with restrictions) |
| Can I see data by creating a Tab? | Yes | No |
| For free? | Yes | No (talk with Salesforce) |
| Storage | Counts against storage | Doesn’t count against storage |