2007-08Aug-31
The SCAN command
I've seen lots of FoxPro programs that use GO TOP before a SCAN loop and SELECT just before the ENDSCAN and before every LOOP statement. These statements aren't required anymore. Yet, many developers use them as a safety belt. So what does SCAN really do? Here are a number of observations:
SCAN saves the work area the first time it's executed. Subsequent calls restore the work area, not the alias. If you start in work area 4 as in the test code below, VFP doesn't care what table is open. The SCAN loop just runs until SCAN moves the cursor into EOF in whatever table is open in the work area at that time. This is important for generic code that closes and reopens a table. It's not sufficient to open it with the same alias. You have to open it in the same work area, too.
On the first pass, SCAN moves the cursor to the top of the file. Subsequent calls just skip by one record. This is even true when the table changes in between. Hence, the following code skips through records 1-3 of Customers, and records 2-4 of Products. The total number of iterations is six, even though this would be impossible with the FOR condition in just one table.
Select 4
Use Northwind\customers
Scan for Recno() < 5
  ? Select(), Evaluate(Field(1))
  If Used("Customers") and Recno() >= 3
    Use Northwind\Products
  EndIf
  Select 99
Endscan
The NEXT clause is implemented as an independent counter. If you add NEXT 4 to the SCAN command above, the total number of iterations is four, not four for each table.
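The work-area and skip behavior described above can be modeled in a few lines of Python. This is only a toy sketch of the observations, not how VFP is implemented: the names `WorkArea` and `scan` are made up for illustration, and the record counts (91 for Customers, 77 for Products) are assumed values for the Northwind sample tables.

```python
class WorkArea:
    """A work area holds whatever table is currently open plus a record pointer."""

    def __init__(self):
        self.alias = None
        self.record_count = 0
        self.recno = 1  # 1-based, like RECNO()

    def use(self, alias, record_count):
        # USE opens a table in this work area and positions it on record 1.
        self.alias = alias
        self.record_count = record_count
        self.recno = 1

    def eof(self):
        return self.recno > self.record_count


def scan(areas, start_area, for_condition, body):
    """Run body once per matching record; return the (alias, recno) pairs visited."""
    visited = []
    area = areas[start_area]  # SCAN remembers the work area number, not the alias
    area.recno = 1            # first pass: implicit GO TOP
    while True:
        # Move forward to the next record that satisfies the FOR condition.
        while not area.eof() and not for_condition(area):
            area.recno += 1
        if area.eof():
            break
        visited.append((area.alias, area.recno))
        body(area)
        area = areas[start_area]  # ENDSCAN: re-select the saved work area
        area.recno += 1           # later passes: skip one record in whatever is open
    return visited


def body(area):
    # Mirrors the IF block of the FoxPro sample: swap the table after record 3.
    if area.alias == "CUSTOMERS" and area.recno >= 3:
        area.use("PRODUCTS", 77)  # assumed record count


areas = {4: WorkArea()}
areas[4].use("CUSTOMERS", 91)     # assumed record count
visited = scan(areas, 4, lambda a: a.recno < 5, body)
for alias, recno in visited:
    print(alias, recno)
```

The model visits Customers 1 to 3 and then Products 2 to 4, because reopening the table resets the pointer to record 1 and the next pass skips to record 2.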
The FOR clause is bound to the SCAN statement, not to the table, which invalidates the filter when you switch tables:
Clear
_screen.Cls()
ln = 0
Select 4
Use Northwind\customers
Scan for Iif(;
    _Screen.Print(Alias()+Transform(Recno())+Chr(13));
    ,.t.,.t.) and Upper(city)="A"
  ? " *"
  ln = ln+1
  If Used("Customers") and ln > 2
    Use Northwind\Products
  EndIf
  lnRecNo = Recno()
Endscan
When you run this code you see that VFP evaluates the FOR condition for the first record of Customers.DBF, all matching records in Customers (17, 55, and 65) and exactly the same records (17, 55, and 65) in the Products table. In other words, VFP is using the Rushmore bitmap for Customers on the Products table. Because Rushmore involves a record validation, the SCAN loop itself is only executed for the Customers table.
Reusing Rushmore bitmaps this way is a subtle source of errors. If you changed the condition from UPPER(City)="A" to UPPER(City)="W", you would end up with a "Record out of range" error message. The records in Customers.dbf whose city starts with "W" are records 43 and 91. The Products table, however, only has 77 records.
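To make the failure mode concrete, here is a small Python sketch. It treats the "bitmap" as nothing more than the list of matching record numbers quoted above, which is a simplification and not the real Rushmore data structure. The names `visit` and `RecordOutOfRange` are hypothetical.

```python
class RecordOutOfRange(Exception):
    """Stands in for VFP's "Record out of range" error in this model."""


def visit(bitmap, record_count):
    """Position on every record number in the bitmap within a table of record_count rows."""
    positioned = []
    for recno in bitmap:
        if recno > record_count:
            raise RecordOutOfRange(f"record {recno} does not exist ({record_count} records)")
        positioned.append(recno)
    return positioned


PRODUCTS_RECORDS = 77      # record count of Products, from the text
bitmap_a = [17, 55, 65]    # Customers records matching UPPER(City)="A"
bitmap_w = [43, 91]        # Customers records matching UPPER(City)="W"

# The "A" bitmap happens to fit into Products, so the mismatch goes unnoticed.
reused_a = visit(bitmap_a, PRODUCTS_RECORDS)

# The "W" bitmap points past the end of Products and fails.
try:
    visit(bitmap_w, PRODUCTS_RECORDS)
    failed = False
except RecordOutOfRange:
    failed = True
```

Whether the reuse blows up or silently visits the wrong records depends only on how the record numbers of one table happen to fit into the other.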
2007-08Aug-20
NULL in CDX files
VFP doesn't store NULL in the field itself. Instead it uses a hidden field called _NULLFLAGS. You might wonder (well, I didn't, but you might) how VFP can create an index on a column that contains NULL, if NULL is not stored in the field itself. The answer is actually quite simple. In the index, a field that contains NULL is stored as a blank string. To distinguish NULL values from real blank fields, VFP prefixes every non-NULL value with CHR(0x80). Hence, in Northwind\Customers.CDX the index on city contains "_PARIS" instead of "PARIS", where "_" stands for the CHR(0x80) prefix.
Does this affect you? Due to the extra character that VFP adds, an index expression on a NULLable field can only be 239/119 characters long instead of the 240/120 that regular fields support.
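The effect of the prefix on sort order and key length can be sketched in Python. This is an illustration under assumptions, not the documented CDX layout. In particular, padding the NULL key with blanks to the same length as a prefixed key is my guess, and `index_key` is a hypothetical helper.

```python
NOT_NULL_PREFIX = b"\x80"  # the CHR(0x80) marker described above
MAX_KEY_LENGTH = 240       # limit for regular fields, from the text


def index_key(value, width):
    """Build the index key for a character field of the given width."""
    if value is None:
        # Assumption: a NULL key is all blanks, as long as a prefixed key.
        return b" " * (width + 1)
    return NOT_NULL_PREFIX + value.ljust(width).encode("latin-1")


# NULL sorts before a truly blank field, which sorts before real values,
# because a blank (0x20) is smaller than the 0x80 prefix.
keys = sorted(index_key(v, 5) for v in ["PARIS", None, "", "BERN"])
for key in keys:
    print(key)

# One byte of every key is taken by the prefix.
usable_length = MAX_KEY_LENGTH - len(NOT_NULL_PREFIX)
```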
2007-08Aug-20
Comparing byte arrays in C#
Unless I've been missing something in the past, there's no native way in C# to compare byte arrays. Well, you could compare their hashes. But that only tells you whether they are different. To make sure they are equal, you still need to compare them byte by byte. Hence, using hashes is not only slow, it's also pointless.
The general advice in forums seems to be to run a loop and compare each byte. Writing a little code sample seems to be too much work for most of those that reply. Admittedly, it's not that hard, but you still need to deal with the fact that one array is likely to be shorter than the other one. Obviously, the loop needs to stop when the shorter array ends. Here's my version of that loop. In addition to detecting differences, it also returns which of the two arrays is smaller, just in case you need to sort binary data:
public static Int32 Compare(
    Byte[] left, Byte[] right, Boolean exact)
{
    // The default value is returned when both arrays
    // are identical up to the length of the shorter one.
    Int32 defaultValue;
    Int32 length;
    if (left.Length < right.Length)
    {
        defaultValue = -1;
        length = left.Length;
    }
    else if (left.Length > right.Length)
    {
        // When exact is false, a longer left array that
        // starts with right counts as equal.
        if (exact)
        {
            defaultValue = 1;
        }
        else
        {
            defaultValue = 0;
        }
        length = right.Length;
    }
    else
    {
        defaultValue = 0;
        length = right.Length;
    }
    // Compare all bytes up to the length of the shorter array
    for (Int32 i = 0; i < length; i++)
    {
        if (left[i] < right[i])
            return -1;
        if (left[i] > right[i])
            return 1;
    }
    return defaultValue;
}
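For a quick check of the return values, here is a line-by-line port of the method to Python with a few sample calls. The port is only meant to exercise the logic, since Python already compares `bytes` natively. The function name `compare` and the sample inputs are mine.

```python
def compare(left: bytes, right: bytes, exact: bool) -> int:
    """Return -1, 0 or 1, following the same rules as the C# method above."""
    if len(left) < len(right):
        default, length = -1, len(left)
    elif len(left) > len(right):
        # Non-exact mode treats a longer left that starts with right as equal.
        default, length = (1 if exact else 0), len(right)
    else:
        default, length = 0, len(left)

    # Compare all bytes up to the length of the shorter array.
    for i in range(length):
        if left[i] < right[i]:
            return -1
        if left[i] > right[i]:
            return 1
    return default


same = compare(b"\x01\x02", b"\x01\x02", True)
longer_exact = compare(b"\x01\x02\x03", b"\x01\x02", True)
longer_loose = compare(b"\x01\x02\x03", b"\x01\x02", False)
shorter_loose = compare(b"\x01", b"\x01\x02", False)
differs_early = compare(b"\x01\x03", b"\x01\x02\x09", True)
```

Note that the non-exact mode is not symmetric: a shorter left array is always reported as smaller, even when `exact` is false.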
2007-08Aug-20
Kill your index with REINDEX
When you read about repairing indexes in FoxPro forums, you frequently get the advice to avoid REINDEX. The most common reasoning is that REINDEX depends on the header, which might be corrupt as well. That's true, but like most developers, I haven't seen a corrupt index header in years. The header is only updated when you add a tag. As this requires exclusive access to the table, there's little chance of introducing corruption due to caching, multi-user issues, and the like.
REINDEX is a bad idea, nonetheless. In order to create a brand new CDX file, Visual FoxPro has to move the old CDX file out of the way. For a short moment the table exists without a CDX file. That moment is long enough, though, to cause trouble. When REINDEX aborts due to an error, it does not restore the previous CDX file. As the following sample demonstrates, you end up with an indexless table:
Create Cursor curDemo (cID C(1))
Insert into curDemo Values ("A")
Insert into curDemo Values ("B")
Index on GetID(cID) Tag cID CANDIDATE
? ">", Key(1)
Reindex
? ">", Key(1)
plKill = .T.
Reindex
? ">", Key(1)

Procedure GetID(tcID)
  If Vartype(m.plKill) == "L"
    Return "A"
  Else
    Return m.tcID
  EndIf
EndProc
Run this program and ignore the error message. You can see that KEY(1) returns a valid expression the first two times, but nothing the last time. If you had used a table you would notice that the CDX file is gone. That's only a problem when you encounter an error during the index operation. Aside from problems with memory and network connections, you might encounter errors on CANDIDATE and PRIMARY indexes when you
- added an index without letting VFP check existing data,
- ran into some sort of index corruption that allowed VFP to add multiple records with the same key, or
- have an index on a function that is causing an error.
In any case, it's probably better to avoid these problems and simply not use REINDEX in a production application.
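The difference between "remove the old file, then build" and "build first, then swap" can be shown with plain files in Python. This sketch has nothing to do with VFP's actual code. Both function names are hypothetical, and the second variant is a general file-handling pattern that I'm adding for contrast, not something REINDEX offers.

```python
import os
import tempfile


def rebuild_in_place(path, build):
    """Mimics the failure mode described above: the old file goes away first."""
    os.remove(path)
    data = build()               # if this raises, nothing is left on disk
    with open(path, "wb") as f:
        f.write(data)


def rebuild_via_temp(path, build):
    """Build the new content first and swap it in only on success."""
    data = build()               # if this raises, the old file is untouched
    temp_path = path + ".tmp"
    with open(temp_path, "wb") as f:
        f.write(data)
    os.replace(temp_path, path)  # replaces the old file in one step


def failing_build():
    # Stands in for an index expression that raises an error.
    raise ValueError("index build failed")


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.cdx")
    with open(path, "wb") as f:
        f.write(b"old index")

    try:
        rebuild_via_temp(path, failing_build)
    except ValueError:
        pass
    survived_via_temp = os.path.exists(path)

    try:
        rebuild_in_place(path, failing_build)
    except ValueError:
        pass
    survived_in_place = os.path.exists(path)
```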
2007-08Aug-03
Detecting CDX changes
When you add a record or modify an indexed field, you expect the change to show up on any computer that runs your application. The simplest approach would be for Visual FoxPro to read the CDX file every time you search for a record. We can call this the Java compliant mode because when you run your application you would have enough time to get a new cup of coffee.
Constantly accessing a file across a network doesn't work well. What Visual FoxPro needs is a quick way to determine if a file has changed. Checking the file's last update time seems like a good idea these days. After all, on NTFS file systems the resolution for file time stamps is 100 nanoseconds. On FAT, however, the resolution is only two seconds. When relying on the file time, Visual FoxPro wouldn't be able to detect changes that happen within this two-second interval.
Instead FoxPro uses a solution that is similar to many other caches, like the ASP.NET cache. At position 8 the CDX file header has a 32 bit field that is documented as "for internal use only". Whenever Visual FoxPro updates the index file it increments this field by one. To find out if the index file has changed, Visual FoxPro merely reads the first 16 bytes and compares the value against the last value it read. If they differ, the index must have been updated. In this case, VFP invalidates all cached index entries.
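The check itself is cheap, as this Python sketch suggests. It works on an in-memory stand-in for a CDX file. The offset (8) and the 16-byte read come from the description above, while the little-endian byte order and the helper names are assumptions on my part.

```python
import io
import struct

COUNTER_OFFSET = 8   # position of the 32-bit field, from the text
HEADER_BYTES = 16    # number of bytes read for the check, from the text


def read_change_counter(stream):
    """Read the start of the header and return the change counter."""
    stream.seek(0)
    header = stream.read(HEADER_BYTES)
    # Assumption: the field is an unsigned little-endian 32-bit integer.
    (counter,) = struct.unpack_from("<I", header, COUNTER_OFFSET)
    return counter


def bump_change_counter(stream):
    """What a writer would do after updating the index (wraps at 32 bits)."""
    counter = (read_change_counter(stream) + 1) & 0xFFFFFFFF
    stream.seek(COUNTER_OFFSET)
    stream.write(struct.pack("<I", counter))


fake_cdx = io.BytesIO(bytes(512))   # zero-filled stand-in for a real CDX file
last_seen = read_change_counter(fake_cdx)

unchanged = read_change_counter(fake_cdx) == last_seen   # nothing written yet
bump_change_counter(fake_cdx)                            # another machine updates the index
changed = read_change_counter(fake_cdx) != last_seen     # cached entries are now stale
```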
Visual FoxPro does not read the header every time you access an index entry. SET REFRESH defines the interval in which VFP performs the update check. The default value is five seconds. So any change to the index file that you make on one machine takes up to five seconds to become visible on a different machine. You can reduce this value to see changes more quickly. This comes at a price, though. You increase the network traffic, which might reduce the performance of your application.
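The trade-off between staleness and network traffic can be illustrated with a small Python model. The class `CachedIndex` and its members are hypothetical, the clock is passed in as a plain number to keep the example deterministic, and the five-second default mirrors the value mentioned above.

```python
class CachedIndex:
    """Toy cache that re-reads a change counter at most once per refresh interval."""

    def __init__(self, read_counter, refresh_seconds=5.0):
        self.read_counter = read_counter        # callable that "goes to the network"
        self.refresh_seconds = refresh_seconds
        self.last_counter = read_counter()
        self.last_check = 0.0
        self.invalidations = 0

    def access(self, now):
        """Called on every index access; only checks the header when the interval is over."""
        if now - self.last_check >= self.refresh_seconds:
            self.last_check = now
            current = self.read_counter()
            if current != self.last_counter:
                self.last_counter = current
                self.invalidations += 1         # drop all cached index entries


shared = {"counter": 0, "reads": 0}


def read_counter():
    shared["reads"] += 1
    return shared["counter"]


cache = CachedIndex(read_counter)   # one read when the index is opened
shared["counter"] = 1               # another machine updates the index at t = 1

cache.access(now=2.0)               # interval not over: change goes unnoticed
stale_at_2 = cache.invalidations
cache.access(now=5.0)               # interval over: header is read, cache invalidated
seen_at_5 = cache.invalidations
network_reads = shared["reads"]
```

With a shorter interval the change would be noticed earlier, but `read_counter` would be called more often, which is the extra network traffic mentioned above.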
This design is one reason why applications perform faster when everybody is just reading data. Unfortunately, it has one drawback when multiple computers update index records. To update the index file, Visual FoxPro has to lock the header. Hence, only one machine at a time can add a record or update an indexed field, whereas multiple computers can update non-indexed fields at the same time.
With huge data sets this becomes an even bigger issue. The more records you have in a table, the deeper the index tree. Changing one indexed value in a single record might require updating one index block in a small table, but a dozen blocks in a huge table. That's one factor that contributes to the slowdown of applications that change huge volumes of data in Visual FoxPro.