Conversion Guide¶
Overview¶
Mertech's Unicode Edition Drivers require changes throughout your application, including code changes, INT file updates, database updates, and deployment updates. Because upgrading to 64-bit and Unicode requires that you modify your application, we've also taken this opportunity to introduce a new eSQL system that uses a more standardized naming convention and interface. Additionally, because of these wide-ranging adjustments, we've also renamed our DLLs to differentiate them from the Classic Edition Drivers. This change means that you must update your INT files and/or Filelist.CFG.
Our SDK has been overhauled to be smaller, easier to use, and faster! The rest of this document outlines each change you'll need to make to your application, why it was done (if the reason isn't obvious), and how to decide which adjustments to make when there is more than one way to handle a specific situation.
Please take the time to read this entire guide as we’ve put in details that are best explained in the context of converting an existing application to Unicode. Many of the details here can also be found in the new online help system.
Adjusting your code¶
Where possible, we've kept the interface of commands the same even when we've renamed a command. However, many commands were removed in favor of using an attribute instead. This was done for all commands that were getters/setters. Where commands were removed, the decision was based on how likely that command is to be used in a system being upgraded to Unicode and on whether its interface could be standardized.
New Package & New Commands/Functions¶
We've replaced mertech.inc with f2s.pkg. This new package is about 20% of the size of mertech.inc. There are a couple of new commands and some internal variables that are important to know about. First is the new f2s_activate_driver command, which can optionally be passed a driver name. It can be thought of as a replacement for Load_Driver. If no driver name is passed, it attempts to detect whether a driver is already loaded and makes whatever Flex2SQL driver is loaded the active driver. If a driver name is passed, that driver is made the active driver (it is loaded if it hasn't been already). If there is a problem loading the driver, or no Flex2SQL driver is loaded, an error is thrown.
Calling the f2s_activate_driver command is optional, but very helpful when using the new attribute interface. Additionally, setting the active driver makes all Flex2SQL commands slightly faster. There are two variables set by this command that will be useful to developers: _f2s_ActiveDriverID and _f2s_ActiveDriver.
The _f2s_ActiveDriverID variable will be needed when using the set_attribute and get_attribute commands. These two DataFlex commands take a driver ID as a parameter in relation to whatever attribute you're getting or setting. If you haven't called f2s_activate_driver, you have to find this driver ID yourself using the DF_DRIVER_NAME and DF_NUMBER_DRIVERS attributes built into DataFlex.
The _f2s_ActiveDriver variable contains the name of the active driver. If your application can use multiple drivers simultaneously or can run against different drivers, this variable is useful for having separate code paths depending on which driver is active/loaded.
It is good to know that both of these variables are not dynamic. If you issue the Unload_driver command, these variables will still contain the last driver that was loaded. If you don't ever call f2s_activate_driver, these variables won't contain anything. Setting these variables manually won't do anything except break your code.
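For example, activating the MS SQL driver at application startup could look like this (driver names are listed in the Load_Driver table later in this guide):
// Make the Unicode Edition MS SQL driver the active driver, loading it if needed.
f2s_activate_driver "f2s_ms"
Showln "Active driver: " _f2s_ActiveDriver
Showln "Active driver ID: " _f2s_ActiveDriverID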
There is also a new function called f2s_ConstPointer. This function is meant to be used primarily with the new get_attribute/set_attribute interface. These commands don't allow you to pass a string as a parameter, but there are some new attributes that require a string as a parameter (such as the f2s_esql_db_name attribute, which needs the server name as a parameter). Passing the address of a string isn't difficult, but if you have that string as a constant rather than a variable, the AddressOf function won't work. So that this isn't something you have to think about, we advise using our new f2s_ConstPointer global function, which returns the address of a variable or constant. So, if you had this line of code before:
SQL_USE_DATABASE of "MyServer" to "OtherDB"
You’d replace this code with our attribute based interface like this:
set_attribute f2s_esql_db_name of _f2s_ActiveDriverID (f2s_ConstPointer("MyServer")) to "OtherDB"
Although this line of code is longer, new capabilities come along with it. For instance, you can now get the current value of this attribute instead of just having the ability to set it.
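Reading the value back works the same way. A minimal sketch, assuming the same server-name parameter applies to the get:
String sDbName
get_attribute f2s_esql_db_name of _f2s_ActiveDriverID (f2s_ConstPointer("MyServer")) to sDbName
Showln "Current ESQL database: " sDbName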
Conceptual Changes¶
In relation to our eSQL interface, although we’ve tried to maintain code compatibility where possible, there are a few areas that will require that you modify your application logic. These changes make future development easier for you and us!
Datetimes in Dataflex 20.0¶
Note
DataFlex 20.1 (known as DataFlex 2022) has fixes for the DateTime data type that make this section apply only to v20.0.
The eSQL interface used to pass and return Datetimes as strings. There were attributes controlling the format in which these were passed and returned. All of this is no longer supported. Datetimes are now passed as native Datetimes. There is one major side effect to be aware of. DataFlex (as of version 20.0.7) has a bug when Datetimes are moved into variants that causes them to lose millisecond precision. This can be seen when looking at data returned via f2s_esql_get_row. The milliseconds are actually being returned, but DataFlex isn't able to read them. We've implemented two workarounds for this. One is the f2s_ExactDT function. This function is passed a variant array and an index to the column that is a DateTime within that variant array. It returns a Datetime variable that has millisecond precision. It is used like this:
f2s_esql_get_row vRow
Showln "variant array DateTime value = " vRow[1]
Showln "Exact Datetime Value = " (f2s_ExactDT(1, vRow))
This code demonstrates how you can get the Datetime WITH millisecond precision. Note that since this is operating on a variant array, the index passed is zero based. There is another way as well using a new command: f2s_esql_get_element. This command returns the value in a column of the current row of the result set. Unlike f2s_ExactDT, this command is one based (which is to help maintain compatibility with existing code):
f2s_esql_get_row vRow
Showln "variant array DateTime value = " vRow[1]
Showln "Exact Datetime value = " (f2s_ExactDT(1, vRow))
f2s_esql_get_element 2 to dtResult
Showln "Also exact Datetime value = " dtResult
When passing a Datetime as a parameter to an eSQL call, it is best to avoid moving the DateTime to a variant as DataFlex will lose the millisecond precision. If you pass it as a Datetime data type it will work.
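As a brief sketch (the table and column names here are purely illustrative), passing a DateTime parameter directly looks like this:
DateTime dtCutoff
Move (CurrentDateTime()) to dtCutoff
f2s_esql_create_stmt "select order_id, placed_at from orders where placed_at > ?"
f2s_esql_prepare_stmt
f2s_esql_set_param 1 to dtCutoff    // pass the DateTime variable directly, not via a variant
f2s_esql_execute_stmt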
Stored procedure/stored function parameters¶
In previous versions of our driver, special care had to be taken to define not only the total count of parameters to an SP/SF, but also the direction of those parameters. With this new version of our driver you no longer need to do this, and in fact can't do this. The driver determines the type, count, and direction of the parameters on its own. Any code that was using OUT or RSET parameters will need to be modified in recognition of this. OUT and RSET parameters can't be set using f2s_esql_set_proc_param. You can get the OUT value using f2s_esql_get_proc_param after executing the SP/SF.
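For example, calling a hypothetical procedure with one IN and one OUT parameter might look like the sketch below; the ordinal passed to f2s_esql_get_proc_param is an assumption for illustration:
Number nBalance
f2s_esql_call_proc "usp_GetCustomerBalance" with 1234 f2s_out
f2s_esql_get_proc_param 2 to nBalance    // read the OUT value after execution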
The method to deal with RSET parameters in Oracle has been changed. Instead of getting the next resultset, you will need to use f2s_esql_get_proc_param which will return a resultset interface as a variant. This variant is then passed to the f2s_esql_set_resultset command which will make that interface the “current” resultset. From there the standard resultset related commands can be used.
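A rough sketch of the new Oracle RSET pattern, again using a hypothetical procedure and assuming the RSET is the second parameter:
Variant vRS
Variant[] vRow
f2s_esql_call_proc "pkg_orders.get_open_orders" with 42 f2s_out
f2s_esql_get_proc_param 2 to vRS     // the RSET parameter comes back as a resultset interface
f2s_esql_set_resultset vRS           // make it the current resultset
f2s_esql_get_row vRow                // then use the standard resultset commands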
Prepared statement re-execution¶
Very often, developers need to call the same procedure over and over again in their code. The way this was done in prior versions was to set the statement, set the parameters, execute the eSQL, and then get the results. This pattern still works, but because of our easier-to-use interface, this pattern, if used in a loop, can result in slower execution. This is because as part of the prepare process, we now analyze your eSQL to determine the number, type, and order of parameters, which requires an additional roundtrip to the server. Although this slows things down somewhat, it enables another feature that can make your code much faster.
If, instead of creating a new statement each time, you create a statement and prepare that statement using our new "with" syntax, you can then call that statement over and over again with different parameters. To do this, after executing, just change the parameter values and execute again. When your statement is re-executed, because everything is already configured for it in the driver, you'll end up with even faster execution times. In our testing, this new pattern is over twice as fast as the old pattern was in prior driver versions! Note though that the old pattern, if left in place and used in a tight loop, can be many times slower than in prior versions.
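A minimal sketch of the re-execution pattern, shown against the default statement with hypothetical table and column names (the same pattern also works with a statement variant via the optional using clause):
Integer i
f2s_esql_create_stmt "update customer set credit_limit = ? where customer_number = ?"
f2s_esql_prepare_stmt
For i from 1 to 100
    f2s_esql_set_param 1 to 5000
    f2s_esql_set_param 2 to i
    f2s_esql_execute_stmt    // re-execute the already-prepared statement with new parameter values
Loop
f2s_esql_close_stmt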
Cleaning up after yourself¶
Although you are still able to make eSQL calls in most places without worrying about closing the statement, there are some cases where this can now cause problems. As a general rule of thumb, it is best to close a statement after you're done with it. If you don't close it, there are certain points where the driver will close it for you. But if the driver is unable to determine whether it is safe to close a statement, that statement will be left open. This can cause issues in certain transactions. One area to be especially cautious about is where an eSQL call returns multiple resultsets in the Flex2SQL for MS SQL driver. This can cause transaction code to fail if the statement isn't closed properly before or during the transaction. A workaround for this is to use a server cursor for this statement, which is isolated from the transaction.
Inverse key column support¶
The new unicode drivers no longer support inverse keys at all. Inverse keys served a few purposes. In older versions of Oracle and even Microsoft SQL Server there wasn’t support for descending or uppercase indexes. Inverse keys gave our drivers a way of supporting this often used feature of DataFlex DAT files. Additionally, they were used so that a primary key could be uppercase or descending and as a speed optimization.
Over the years we’ve worked to remove support for inverse keys because their implementation was complex and confusing for outside developers that needed to access the tables containing these keys. Triggers had to be used to update the inverse key column to make things work with DataFlex correctly and the values in the inverse key column were often unintelligible.
With the release of the unicode driver, we’ve decided to completely remove this feature. The way they were implemented required the driver to have knowledge of the collation sequence so as to reverse it. This isn’t practical in Unicode because the collation is so large and is not fixed. Unicode characters can have modifiers that can change their sort order. There are rules for this, but implementing a system that tracks the differing rules for each collation on each server and each server version is really not possible.
The impact of this is that if your application was using inverse key columns, you’ll need to modify your table structure and possibly application code to remove this requirement. In general a standard index can now do what inverse key columns used to do, and they can do it quickly. There are a few places that are more complicated. For instance, if a primary key is based on an inverse key column, you’ll need to fix this. Oracle doesn’t require applications to have primary keys and you can use a function-based index instead.
If you have foreign key relationships based on inverse key columns, you may need to go a bit deeper to fix things. Oracle doesn’t support foreign key constraints on function-based indexes. Although there is a workaround for this using the virtual column from the function-based index, implementing this in the driver is impractical and you will be better served by fixing this relation to be based on a surrogate key.
You can identify the inverse keys in your INT files by the column names and the token INVERSE_KEY. An inverse key column will always end in _INVD (descending), _INVC (uppercase), or _INVB (descending and uppercase).
New Helper Packages¶
In addition to f2s.pkg, which is a replacement for mertech.inc, we've also included three new packages, all of which are optional: f2s_esql.pkg, cMertechImageParser.pkg, and cMertechStructAnalyzer.pkg. We'll briefly discuss why these packages are included.
f2s_esql.pkg¶
This package provides a new object-oriented interface for eSQL use. For now this is NOT the primary interface we advise using, but rather something you can take a look at. The interface consists of a number of classes for interacting with our new eSQL interface: cESQLCommand, cESQLParam, cESQLParams, cESQLRecordSet, cESQLColumn, and cESQLColumns. The basic usage is very simple, allowing a code pattern such as:
Handle hCmd hRS
Variant[][] vRows
Get f2sNewCommand _f2s_ActiveDriver to hCmd
Set f2sCommandText of hCmd to "select * from TestTable where someUCData = ?"
Send f2sPrepare of hCmd
Set Param of hCmd 1 to "🌿🍁🌱"
Send f2sExecute of hCmd
Get NextRecordSet of hCmd to hRS
Get Rows of hRS 10 to vRows
See the online documentation for more details on how to use this new optional interface.
cMertechImageParser.pkg¶
This package is meant to make old-style DataFlex Images more useful in recent versions of DataFlex. At the end of this package, an object is created automatically that is available in the global variable mdsImageParser. The primary function you'll use on this object is ImageToString. This function takes an image name and returns a string containing all of the text of that image. This can be a very useful way to work with large embedded SQL statements without having to use line continuations. For example, consider this old-style embedded SQL statement:
Move ( "SELECT TOP 10 " + ;
"B.GUID, B.LTYPE, B.BRANCH_NO, B.REF_NUMBER, B.DEPART_DATE, B.FROMCITY, N.NAME " + ;
"FROM BILLTO AS B " + ;
"JOIN NEWCUST AS N ON B.ACTUAL_NO = N.CUSTOMER_NO AND B.ACTUAL_SITE = N.CUSTOMER_SITE " + ;
"WHERE B.REF_NUMBER >= ? AND B.BRANCH_NO >= ? ORDER BY B.REF_NUMBER , B.BRANCH_NO" ) to sSQL
In comparison, you can have a much more readable section of code loading this same SQL statement into a variable using this new package:
/SomeSQL
SELECT TOP 10 B.GUID, B.LTYPE, B.BRANCH_NO, B.REF_NUMBER, B.DEPART_DATE, B.FROMCITY, N.NAME
FROM BILLTO AS B
JOIN NEWCUST AS N ON B.ACTUAL_NO = N.CUSTOMER_NO AND B.ACTUAL_SITE = N.CUSTOMER_SITE
WHERE B.REF_NUMBER >= ? AND B.BRANCH_NO >= ? ORDER BY B.REF_NUMBER , B.BRANCH_NO
/*
Move (ImageToString(mdsImageParser, "SomeSQL")) to sSQL
This package isn’t limited to use with just our driver, but is available for your use anywhere within your application. See the online documentation for more details on using this new optional helper package.
cMertechStructAnalyzer.pkg¶
This is an updated version of the package we released in 2014 for working with structs in DF. At the end of this package, an object is created automatically that is available in the global variable mdsStructAnalyzer. This new version has three new functions that can prove very useful for simplifying your code: LoadStructFromArray, LoadStructFromBuffer, and LoadStructFromDD. When working with our new eSQL interface, the LoadStructFromArray is very useful as it allows you to take a row from a result set and load an array. For this to work, the member elements of the structure must map to the array elements in order. As an example:
Struct tCustomer
    String CustName
    String Addr
    String City
    String State
    String Zip
    Number Balance
End_Struct

Procedure TryIt
    Variant vRow
    tCustomer cust

    f2s_esql_create_stmt "select name,address,city,state,zip,balance from customer where balance > ?"
    f2s_esql_prepare_stmt
    f2s_esql_set_param 1 to 1000
    f2s_esql_execute_stmt
    If (not(err)) Begin
        f2s_esql_get_row vRow
        Get LoadStructFromArray of mdsStructAnalyzer cust vRow to cust
        // ..do something with the struct..
    End
End_Procedure
As you can see here, it is trivial to load a customer row from the result set into the structure. The only unusual requirement is that you pass a structure into the function as well so that the function knows what type of structure it’s loading. The easiest way to do this is to pass the same structure as a parameter that is used as the return value as well (they must be of the same type).
The LoadStructFromDD and LoadStructFromBuffer functions work in a similar manner, although both of these functions are smart enough to match the names of the columns and element names in the structure instead of requiring data be in some specific order. See the online documentation for more details on using this new optional helper package.
Login/Open Changes¶
There have always been quite a few ways to log in to a database server and to open tables. You could specify details in the mertech.ini file, details could be in the INT files, or a login could be coded into the application using the Login command. Additionally, the OPEN_INT command allowed some of these details to be passed when opening a table.
Most of these still work, but of special note is the removal of the OPEN_INT command. This command provided some shortcuts to the OPEN command, but it was not required, and everything that it did can be done with the built-in DataFlex OPEN command.
The OPEN_INT command would automatically assume that you wanted to open a table using the first loaded Mertech driver. So for instance:
Open_INT "Customer.INT" as Customer
This line of code would actually do the same thing as this line:
Open "SQL_DRV:CUSTOMER.INT" as Customer
If the “as” syntax wasn’t used and instead a simple OPEN_INT Customer was called it would run the equivalent of this code:
Declare_Datafile Customer
Open "SQL_DRV:CUSTOMER" as Customer
In most cases this wasn't necessary at all, and a simple Open Customer was all that was actually needed. We removed this command entirely because the only real difference made by using it was to add a driver prefix to the standard command. If you were using OPEN_INT, in general you can do a simple search and replace to use the built-in Open command instead. You may have some locations where you first need to call Load_Driver, but this should be a rare occurrence.
If you’re calling Load_Driver directly in your code, you will need to adjust your code to load the new Unicode Edition of the driver (or better yet, use the new f2s_activate_driver command):
Old Code | New Code
---|---
Load_Driver "SQL_DRV" | Load_Driver "f2s_ms"
Load_Driver "ORA_DRV" | Load_Driver "f2s_or"
Load_Driver "MDSPGSQL" | Load_Driver "f2s_pg"
Load_Driver "MDSMYSQL" | Load_Driver "f2s_my"
Note that the 32-bit Unicode Edition drivers for MS SQL and Oracle are named the same as the 64-bit Unicode Edition drivers. See the deployment section for more details about the differences between deploying a 32-bit Unicode Edition driver and a 64-bit Unicode Edition driver. Additionally, the 32-bit Unicode Edition drivers do not work with the DataFlex Studio or other tools since these tools are 64-bit only.
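For example, a startup routine that previously loaded the Classic Edition MS SQL driver might now read:
// Before (Classic Edition):
// Load_Driver "SQL_DRV"
// After (Unicode Edition):
f2s_activate_driver "f2s_ms"    // loads f2s_ms if needed and makes it the active driver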
Please see the section titled “Updating INT Files/Filelist.cfg” for details related to how drivers are loaded from INT files or as a driver prefix in filelist.
New Commands¶
Because of the change of interface, we’ve created some new commands that are not direct replacements of old commands and operate in a different enough manner that we’re documenting them here as entirely new.
New Command | Notes
---|---
f2s_esql_set_cursor_type to {cursor type} [using {variant}] | Although this command would appear to be a replacement for SQL_SET_CURRENT_CURSOR_STMT, it is not. This new command is scoped to the statement passed in the variant or to the generic statement. A cursor can no longer span multiple statements and cannot be opened or switched. This is all handled internally now. The valid values for cursor type are the integer constants: f2s_olecursor_default, f2s_olecursor_client, and f2s_olecursor_server. Note that in most cases the default setting is just fine. The client cursor and default cursor are nearly identical. A server cursor has different abilities and limitations. See documentation for more details.
f2s_esql_get_cursor_type to {variable} [using {variant}] | See above.
f2s_activate_driver {driver name} | See the "New Package & New Commands/Functions" section above.
f2s_deactivate_driver {driver name} | This command will unload the driver and clear out the _f2s_ActiveDriver and _f2s_ActiveDriverID global variables.
f2s_split_array_helper {variant[]} to {variables…} | This command accepts a variant[] and moves the array elements into separate variables. Although it can be used anytime, it was designed to work with f2s_esql_get_row so that minimal code changes are needed for code that was using the older sql_fetch_next_row into command.
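For example, code that previously fetched a row into separate variables can keep much of its shape; a small sketch with hypothetical columns:
Variant[] vRow
String sName sCity
Number nBalance
f2s_esql_get_row vRow
f2s_split_array_helper vRow to sName sCity nBalance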
Simple Command Replacements¶
Many commands have been renamed but have the exact same syntax. Note that documentation for many of these commands may give a more detailed description of how that command works which may be different from the prior documented functionality.
Old Command | New Command | Notes
---|---|---
 | f2s_write_trace_ln | 
 | f2s_table_invalidate_cache | This command now takes a file number as a parameter. Just passing a table name rather than a Table.File_Number no longer works.
 | f2s_esql_start_transaction | 
 | f2s_esql_commit_transaction | 
 | f2s_esql_rollback_transaction | 
 | f2s_esql_row_count | BETA: Not available for MS SQL.
 | f2s_write_int_file | 
 | f2s_begin_buffered_save | 
 | f2s_buffered_save | 
 | f2s_end_buffered_save | 
 | f2s_lob_set_null | 
 | f2s_lob_get_length | 
 | f2s_lob_get | 
 | f2s_mssql_conn_option_reset | 
 | f2s_mssql_detach_db | 
 | f2s_mssql_attach_db | 
 | f2s_esql_escape_string | 
 | f2s_sequence_init | 
 | f2s_convert_dat_full | 
 | f2s_convert_dat_structure | 
 | f2s_trace_off | 
 | f2s_trace_on | 
Commands with syntax changes¶
There are a few conceptual changes to outline that affect quite a few commands. These changes, although drastic in some cases, will simplify your code in a very significant manner. You’ll find that most of the changes involve removing code.
First, we’ve removed the concept of chunking. Chunking was a way of dealing with the limitations in DataFlex related to string size. As this has been gone for a LONG time now, we’ve completely removed all chunking related logic.
Next, we've revamped how we deal with cursors. Before, we let the developer manage cursors on their own. But cursors have some very unusual features that require some hard-to-gain knowledge related to their use. Although the concept of a cursor isn't entirely gone from our drivers, it is much reduced. So instead of managing cursors yourself, the driver will in most cases take care of this on its own. If you need to use a server cursor for a certain statement, you can, and the documentation will now outline why and when you'd want to do this, but that cursor will be associated with a specific statement; when that statement is closed, the cursor is automatically closed. There is no need to open or close it on your own. This means that there is no ability to "switch" between cursors.
Also, the eSQL interface now supports the concept of having multiple active statements via the new "using" syntax. This allows for multiple statements to be built at once, and in most cases you can even have multiple active statements with active result sets. Although this could be done before using cursors, it was not as simple because you had to switch between cursors.
Another area where we've made a major change is file binding. File binding was a shortcut that reduced code by allowing a properly formatted SQL statement to bind to a file buffer. This was simpler in many cases than fetching a row into a whole bunch of variables. With the new variant array fetching, though, this became more of a liability that could cause developers headaches. Because of this we've removed file binding entirely. Instead we have a few new techniques available. First is the variant array fetching that returns an entire row to a single variable. You can use f2s_split_array_helper to break this variant array back into separate variables to make your code easier to convert. Also, we have a new version of cMertechStructAnalyzer.pkg that includes functions to take a variant array and load it into a structure variable.
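For example, with fetching now split into f2s_esql_move_next and f2s_esql_get_row (detailed in the table below), a simple row loop might look like this sketch. It assumes the optional variable on f2s_esql_move_next returns True while another row is available; the table name is hypothetical:
Boolean bHasRow
Variant[] vRow
f2s_esql_create_stmt "select customer_number, name, balance from customer"
f2s_esql_prepare_stmt
f2s_esql_execute_stmt
f2s_esql_move_next bHasRow
While (bHasRow)
    f2s_esql_get_row vRow
    // ... use vRow, or pass it to f2s_split_array_helper ...
    f2s_esql_move_next bHasRow
Loop
f2s_esql_close_stmt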
Old Command syntax | New Command Syntax | Notes
---|---|---
GET_INT_CHUNK OFFSET {offset} to {variable} [{length}] | f2s_get_int {variable} | The chunking syntax of this command no longer works. The entire INT will be returned as a uChar[] instead. The variable passed can be a variant or a uChar[].
 | f2s_esql_create_stmt {variable} [using {variant}] | File binding is no longer supported. New syntax allows developers to pass a variable that will receive a statement variant with details of the statement to execute.
 | f2s_esql_append_stmt {variable} [using {variant}] | New syntax allows developers to pass a statement variant with details of the statement to execute.
 | f2s_esql_execute_stmt [using {variant}] | New syntax allows developers to pass a statement variant with details of the statement to execute.
 | f2s_esql_close_stmt [using {variant}] | New syntax allows developers to pass a statement variant with details of the statement to execute.
GET_SQL_SCRIPT_CHUNK OFFSET {offset} to {variable} [{length}] | f2s_get_sql_script {variable} | The chunking syntax of this command no longer works. The entire SQL script will be returned as a uChar[] instead. The variable passed can be a variant or a uChar[].
SQL_CHECK_TABLE {database} {username} {table} | f2s_check_sql_table_exists {table name} {owner} {db name} to {variable} | Unused params should be passed as blank (all are required). For MS SQL, db name is optional (the current ESQL DB is used if not passed) and owner is required; owner can be either a user or a schema. Oracle doesn't have the concept of a DB, but tables have an owner, so owner name is required and db name should not be passed. For PostgreSQL, db name is optional (the current ESQL DB is used if not passed) and owner is required; owner is a user or role (but NOT a schema). With MySQL/MariaDB, db name is optional (the current ESQL DB is used if not passed) and owner SHOULD NOT BE PASSED (MySQL tables have no concept of an owner). Instead of setting the finder indicator as the prior command did, this command returns a boolean with true indicating the table exists based on the passed parameters and false indicating it does not.
SQL_FETCH_NEXT_ROW [of {file}] [into {variable} {variable} …] | f2s_esql_move_next [{variable}] [using {variant}] and f2s_esql_get_row {variant[]} [using {variant}] | The fetch next row command has been split into two separate commands: one to advance to the next row and one to retrieve the row. There are multiple benefits to this, including: 1) for server cursors you can now move backwards using the f2s_esql_move_previous command, 2) it allows for looping patterns that are easier to follow, and 3) it's slightly faster. f2s_esql_get_row will always get the first row if available, so you don't need to call f2s_esql_move_next first, although you can and it will return the same row. The "of" syntax for binding the resultset to a table buffer is removed entirely. The "into" syntax no longer accepts a list of variables to fetch into and instead will only return to a variant[] which will then contain the entire row.
 | f2s_esql_get_stmt to {variable} [using {variant}] | This command no longer has optional parameters to get part of the statement and it now returns the data only to a uChar[] so that argument_size isn't an issue.
CALL_STORED_PROCEDURE {name} passing {params…} SQLOUT | f2s_esql_call_proc {name} with {params…} | After a call to f2s_esql_call_proc, any OUT or IN/OUT values can be retrieved using f2s_esql_get_proc_param. Note that for this method of calling a stored procedure, all parameters must be passed in order since there is no concept of a named parameter. An OUT-only parameter can be replaced with the keyword f2s_out.
 | f2s_esql_set_proc_name {name} [using {variant}] | New syntax allows developers to set up a variant (optional) that will hold details about a future call (a statement variant). Multiple variants can be set up simultaneously without affecting each other. See documentation for details.
SQL_SET_PROCEDURE_PARAMETER {name or number} TO {value} {direction} | f2s_esql_set_proc_param {name or ordinal} to {value} [using {variant}] | New command supports four different syntaxes. There is no longer the ability to specify the direction of parameters; this is handled by the driver now. For an OUT parameter, you don't need to do anything during the calling phase. No placeholder is needed. See documentation for details.
 | f2s_esql_execute_proc [using {variant}] | New syntax allows developers to pass a statement variant with details of the statement to execute.
CALL_STORED_FUNCTION {name} returns {variable} passing {params…} | f2s_esql_call_func {name} with {params…} to {variable} | New syntax better matches DF syntax for function calls.
 | f2s_esql_execute_func [using {variant}] | New syntax allows developers to pass a statement variant with details of the statement to execute.
 | f2s_esql_func_name {name} [using {variant}] | New syntax allows developers to set up a variant (optional) that will hold details about a future call (a statement variant). Multiple variants can be set up simultaneously without affecting each other.
SQL_SET_FUNCTION_PARAM | f2s_esql_set_func_param {name or ordinal} to {value} [using {variant}] | New command supports four different syntaxes. There is no longer the ability to specify the direction of parameters; this is handled by the driver now. For an OUT parameter, you don't need to do anything during the calling phase. No placeholder is needed. See documentation for details.
 | f2s_esql_next_resultset [using {variant}] | Note that for certain backends in certain circumstances you may need to use this command to "consume" resultsets that are returned by an eSQL call before closing the statement. Where this is the case, it is a requirement of the backend.
SQL_PREPARE_STMT [OF {file}] [CURSOR_TYPE {type}] | f2s_esql_prepare_stmt [using {variant}] | File binding is no longer supported. Note that this statement MUST be called BEFORE setting parameters for stored procedures/stored functions and SQL statements. Cursor types can no longer be passed on the prepare statement; instead use the f2s_esql_set_cursor_type command. New syntax allows developers to pass a statement variant with details of the statement to execute.
Commands replaced by attributes¶
Many commands that were simply getting or setting something internally have been replaced with an attribute, which is already a getter/setter interface. Note that in some cases using the get/set attribute interface requires passing a pointer to a string if an attribute takes a string parameter. This is a slightly unusual use of the get_attribute/set_attribute interface, but it allows for a standardized interface where all getters/setters use the attribute interface and everything else uses the custom command interface. Note that the get_attribute/set_attribute interface requires that you pass it the ID of the driver for which you're getting/setting an attribute. The Mertech SDK includes an internal variable named _f2s_ActiveDriverID which holds the ID of the most recently activated driver. This variable can be used as the {driverId} parameter, although the developer could obtain the driver ID using their own logic as well.
Additionally, the Mertech SDK contains a global helper function (f2s_ConstPointer) that will return a pointer to a constant string to get around the fact that AddressOf won’t return a valid value if passed a string constant.
Old Get/Set Commands |
Replacement Attribute |
Notes |
---|---|---|
SET_SQL_CONSTRAINT GET_SQL_CONSTRAINT SET_SQL_CONSTRAINT FROM_PROGRAMMATIC_DEFAULTS |
f2s_table_constraint and/or f2s_table_override_state f2s_column_override_value |
f2s_table_override_state is meant to replace the from_programmatic_defaults feature of the old SET_SQL_CONSTRAINT command and must be used along with f2s_column_override_value. |
SET_CURRENT_SQL_SERVER_CONNECTION GET_CURRENT_SQL_SERVER_CONNECTION |
||
This was only ever for MS SQL and the attribute name reflects that now. This attribute requires passing a string, consider using the f2s_ConstPointer function with this. |
||
This attribute requires passing a string, consider using the f2s_ConstPointer function with this. |
||
The command sets the casing not only for tables, but also for columns. The attribute name reflects this. |
||
This command had no way of getting its value, only enabling and disabling. You're now able to retrieve the current scripting mode as well. |
||
A setting of -1 causes record counting to be enabled. A setting of 0 causes it to be disabled. A setting of 1+ causes it to return that set value as the record count. |
||
This was only ever for MS SQL and the attribute name reflects that now. |
||
This was only ever for MS SQL and the attribute name reflects that now. |
||
SET_FORCE_FIELDS_NOT_NULL GET_FORCE_FIELDS_NOT_NULL |
This attribute, when set to true, causes columns of the specified type that support nulls to NOT store a null and instead store a blank value. |
|
f2s_type_force_nulls |
This attribute, when set to true, causes columns of the specified type that support nulls to treat blank or zero values as null. (Normally only unmodified values that are blank or zero would be treated as null) |
|
This is a read only attribute. |
||
This is a read only attribute. |
||
This is a read only attribute. |
||
f2s_connection_user_password |
This is a read only attribute. |
|
f2s_driver_major_version |
This is a read only attribute. This gets the integer portion of the version such as 17. |
|
f2s_driver_version |
This is a read only attribute. This gets a full version string such as 17.1.8992.0 |
|
f2s_license_days_remaining |
This is a read only attribute. |
|
This is a read only attribute. |
||
GET_DRIVER_LICENSE_PATH |
This is a read only attribute. |
|
This is a read only attribute. This was only ever for MS SQL and the attribute name reflects that now. |
||
This is a read only attribute. This was only ever for MS SQL and the attribute name reflects that now. |
||
This is a read only attribute. Even though this is technically a server attribute, it’s not being put under f2s_server to avoid confusion with the total count of cursors on the server. This is the current cursor count from this session to server only. |
||
This was a write-only command. Now you are able to get or set this attribute. |
||
This was a write-only command. Now you are able to get or set this attribute. |
||
This was a write-only command. Now you are able to get or set this attribute. |
||
This is a read only attribute. |
||
This is a read only attribute. |
||
f2s_license_max_sessions |
This is a read only attribute. |
|
This was a write-only command. Now you are able to get or set this attribute. |
||
This was a write-only command. Now you are able to get or set this attribute. This attribute can be set at a server or a table level. There are new definitions to handle options for server/table and also the hint value itself. (TODO ADD THIS IN) |
||
This was a write-only command. Now you are able to get or set this attribute. There are new definitions for the isolation level. |
||
f2s_auto_numeric_type_remapping_state |
This was a write-only command. Now you are able to get or set this attribute. This is now supported across all drivers where applicable. (TODO) |
|
This was a write-only command. Now you are able to get or set this attribute. |
||
This was a write-only command. Now you are able to get or set this attribute. |
||
This was a write-only command. Now you are able to get or set this attribute. |
Attributes renamed¶
We are no longer using the DF_ prefix for anything. We have some attributes that are also DAW attributes. For these attributes we allow access via the DAW attribute at the numeric value they define for it, and we also have our own attribute value that begins with f2s_ that can be used. We advise the use of our attribute for clarity.
Additionally our attributes are all now properly categorized. Anything that was “DF_FILE” before will be prefixed with “f2s_table” now. “DF_FIELD” is now “f2s_column”. “DF_INDEX” is now “f2s_index”. As the majority of DataFlex applications are now using a server based system, our parlance matches the terminology used by server vendors where possible.
Old Attribute Name |
New Attribute Name |
Notes |
---|---|---|
0 for record mode, 1 for set mode. |
||
f2s_table_db_name |
||
DF_FILE_STATIC_MODE |
f2s_table_static_mode |
|
f2s_table_column_count |
||
The logic on this attribute has been reversed. When true it will embed the server name in the INT file and when false it will not. |
||
The logic on this attribute has been reversed. When true it will embed the database name in the INT file and when false it will not. |
||
The logic on this attribute has been reversed. When true it will embed the schema name in the INT file and when false it will not. |
||
Note that this is different from f2s_default_letter_casing in that this sets the casing for a specific table (including that table’s columns) whereas the former sets it for all tables and columns. |
||
DF_FIELD_DEFAULT_FUNCTION |
This attribute should be used going forward and supports static values and expressions. Where a static value is used it should be quoted based on the server it is used for. |
|
This attribute which is marked as legacy and deprecated allows for unquoted static values to be used as default values. This was a confusing feature and will be removed in the next version. Instead of using this, use the f2s_column_default_sql_value attribute. |
||
This has only ever been an Oracle attribute. Its name has been changed to reflect this. |
||
DF_FILE_NUMBER_INVERSE_KEYS |
Commands removed¶
Commands Removed |
Replacement Command |
Notes |
---|---|---|
N/A |
This is part of a series of commands related to doing a find that only returns a subset of columns. This was done before the drivers had ICF and was only used in limited cases normally related to slow VPN connections populating tables/lists. This functionality is no longer supported. |
|
N/A |
Same as above |
|
N/A |
Same as above |
|
FETCH_FIELDS |
N/A |
Same as above |
N/A |
There is no support for custom OEM to ANSI conversion in the Unicode Edition of the drivers. |
|
N/A |
Same as above |
|
N/A |
Same as above |
|
N/A |
This was only used in Linux and has no equivalent in Windows. As the Unicode driver is Windows only, this command has been removed entirely. |
|
N/A |
||
f2s_record_count_state attribute |
This command was used to block record counting operations. This has been replaced by an attribute that allows for this to be turned on, turned off, or set to a fixed record count instead of needing multiple commands. |
|
f2s_record_count_state attribute |
Same as above |
|
f2s_record_count_state attribute |
Same as above |
|
f2s_record_count_state attribute |
Same as above |
|
SQL_SAVE_SET_ATTRIBUTES |
N/A |
This was an internal command that was never meant to be accessible by developers. If you were using this functionality for buffered saves based on size, it needs to be removed from your code. |
MERTECH_LANGUAGE_SUPPORT |
N/A |
This was a customer specific feature related to OEM/ANSI conversion. Removed from product. |
DIRECT_PATH_LOAD |
N/A |
This was an internal only command. Removed from product. |
N/A |
This returned a static value and was inaccurate. If you need to know the max length that a backend supports, this should be done based on the server documentation. It cannot be queried for all backends accurately and has therefore been removed from the product. |
|
N/A |
Chunking support has been removed. Instead you can use the f2s_get_lob to retrieve an entire LOB into a uChar[]. |
|
N/A |
This command is no longer needed now that f2s_esql_get_row returns a variant array that can contain uChar[]’s from a resultset. |
|
N/A |
Same as above |
|
N/A |
Now that we use variant[]’s this is no longer needed. |
|
GET_CONNECTION_STATUS |
f2s_server_connect_state attribute |
This command was a wrapper around the GET_SERVER_CFG command which has now been replaced by a series of attributes |
GET_SERVER_CFG |
f2s_server_major_version f2s_server_minor_version f2s_server_descending_support f2s_server_case_insensitive_support f2s_server_version_string f2s_server_encoding f2s_server_connect_state f2s_pg_server_citext_support attributes |
This command has been replaced by a series of attributes. |
f2s_esql_get_row |
This command was redundant in that it retrieved a specific cell from the result set’s current row. This can be done by looking at a specific element in the variant[] returned by f2s_esql_get_row |
|
N/A |
This feature has been removed. A stored procedure can be re-executed by just executing it again after it has been prepared. This also works for other drivers now too. |
|
N/A |
Cursors are now part of the statement interface so you don’t open a global cursor, it’s scoped to a specific statement. For a specific statement the f2s_esql_set_cursor_type command can be used. |
|
N/A |
Same as above |
|
f2s_esql_get_cursor_type |
Same as above |
|
f2s_esql_set_cursor_type |
Same as above |
|
N/A |
Column binding is no longer supported. |
|
N/A |
This existed because dates and datetimes were returned as strings in prior versions. Now they’re returned as proper types, so no formatting is necessary anymore. |
|
N/A |
This form of mirroring is no longer supported. Our driver automatically supports the new load balancing and failover setups in SQL server. |
Attributes removed¶
Attribute Removed |
Alternative Attribute |
Notes |
---|---|---|
N/A |
This attribute was redundant. Setting the f2s_table_max_query_rows to 1 or f2s_table_fetch_mode to 0 (for record mode) will do the same thing. |
|
DF_FILE_USE_ROWCOUNT_IN_TRANSACTIONS |
N/A |
This was a feature used in versions prior to v10 and has not worked since v11. It has been removed from the product. |
DF_FIELD_TEXT_NUM_SEGMENTS |
N/A |
This attribute allowed accessing text fields in Oracle above 16K. This interface is no longer needed with long field support. Removed. |
DF_FIELD_TEXT_SEGMENT_NAME |
N/A |
Same as above |
DF_FIELD_TEXT_LENGTH_SEGMENT |
N/A |
Same as above |
This was a file level setting for index tablespace, but there was already a global and index specific version. Rather than supporting 3 versions of this, we are now only allowing a global and an index specific setting. |
||
DF_FIELD_DEFAULT_FUNCTION |
The idea of having an expression based default is now built into the primary interface. There is a legacy interface that allows for the old static values that handled quoting automatically, but only in v17. Future versions of the product will ONLY support this new attribute which allows static values or expressions. |
|
N/A |
This feature has been removed from the product. This allowed for newly created rows to not return their new recnum. This gave a small performance boost at the cost of messing things up if it was left on or used with DDs. |
|
N/A |
Very old versions of Dataflex didn’t have a datetime data type. This attribute allowed for a datetime to be returned by altering your FD file to make a datetime column into a string. This is no longer needed and has been removed. |
|
N/A |
This attribute was part of the select_columns_find interface which has been removed. |
|
DF_FETCH_FIELDNUMBER |
N/A |
Same as above |
DF_FIELD_GENERATE_ALWAYS |
N/A |
This was a DB2 only attribute. It has been removed. |
N/A |
This attribute served no useful purpose as the same information can be retrieved from f2s_table_recnum_state. |
Updating INT Files/Filelist.cfg¶
Because the Unicode edition uses a different set of DLLs, your filelist and/or INT files will need to be updated to reflect this. Dataflex allows the driver name to be specified in multiple ways. It can be specified as a prefix in the table name column of a Filelist entry (ex: “Root name: f2s_ms:invt”) or it can be read from an INT file where the INT file is the root name (ex: “Root name: Invt.int”, first line of INT file will have: “DRIVER_NAME F2S_MS”).
This means that you either need to have an updated set of INT files or an updated filelist.cfg (or in some cases both) to connect your application to our Unicode Edition driver. Our preferred method to specify the driver to use is a prefix on the entry in Filelist.cfg. This method is slightly faster than reading it from the INT File.
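Laid out together, the two forms described above look like this (the table name and driver are just examples):
Root name: f2s_ms:invt        (driver prefix in the Filelist entry - preferred)
Root name: Invt.int           (INT file as the root name; INVT.INT then begins with the line below)
DRIVER_NAME F2S_MS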
If the filelist.cfg prefix specifies the Unicode Edition driver and the INT file still has the Classic Edition driver, the Filelist.CFG entry takes precedence. This could be a temporary means to allow you to use both Classic Edition and Unicode Edition with the same Data Folder and just change out the Filelist.cfg file. However, we advise against doing this as it can cause some trouble in certain situations.
Ideally you’ll have an updated set of INT files and an updated Filelist.cfg to match it. Our Flex2SQL Utility Unicode Edition has the ability to batch convert a Filelist and its associated INT files. This is done in a non-destructive way. The utility will back up the Filelist and INT files before it adjusts them.
The INT file format remains unchanged, but many of our keywords now have a synonym that better matches the functionality they expose. Additionally, some of these synonyms have slightly different syntax available to make the INT file easier to read and modify by hand if needed. These changes to the INT file are entirely optional, and the old options can still be used.
Below is a list of changes to the INT file format in Unicode Edition:
(NOT FOR BETA)
Old Keyword |
New Keyword |
Notes |
---|---|---|
Upgrading your Database¶
MS SQL Server¶
SQL Server has three data types that we’ll look at here: nchar, nvarchar and ntext. These data types support storing unicode data in either UCS-2 or UTF-16 formats. This allows for (theoretically) 65,535 possible characters to be represented (in what is called the Basic Multilingual Plane or BMP). However, unicode allows for 1,114,112 characters (note that UCS-2 only supports the BMP). Additionally, different countries and languages have different ways that their characters and other characters should be sorted. For SQL Server, all of this is defined by the data type and the collation associated with the database or column. In general, you’ll want to use a collation for the country or language in which you are based or primarily operate. For SQL Server to allow the full range of characters (including emoji) to be sorted, a collation that supports “Supplementary Characters” must be selected. These collations contain the string “_SC” in their name. There are implications to consider when using an SC based collation though. Let’s look at each data type in turn to see how it is affected by the collation.
nchar: This data type normally allows a fixed number of characters. So, for instance, an nchar(10) might be expected to hold exactly 10 characters. The storage allocated for an nchar(10) is 20 bytes. But if, for instance, you attempted to store the string "🌼🌻🌺🌹🌸🌷🍄🌿🍁🌱" you'd find that it won't fit into an nchar(10). Even though this string is 10 characters long, each of those characters is a supplementary character and it would need 40 bytes of storage space. Recall that an nchar(10) is only allocated 20 bytes. So in this case, only 5 of those characters would fit. The best way to think about this is to consider the size of the nchar to be not the number of characters, but rather the number of byte-pairs. Most characters that will commonly be used take up one byte-pair in UTF-16, but supplementary characters take up two byte-pairs. This could have implications in your code if it is expecting a fixed number of characters.
nvarchar: This variable length data type is most likely what you’ll want to use to store unicode data. The length given for the nvarchar definition is also byte-pairs. In most cases it should be fine to leave your nvarchar definitions the same as you would have made them for non-unicode varchar columns. If you tend to make columns as small as possible, this could be problematic when switching to unicode and it is best to over-allocate for this. Whereas a varchar column could have a size between 1-8000 characters, for nvarchar the range is 1-4000 byte-pairs. For nvarchar(max), the physical size in bytes is the same: 2^31 - 1 (about 2GB). Note that you often don’t need to convert ALL columns to nvarchar to be unicode compatible. For example, in the USA, the zip code is often stored in a varchar column and can remain a varchar column since zip codes are always alphanumeric.
ntext: This is considered a legacy data type by Microsoft. It works similarly to nvarchar(max) except that it has half the storage capacity. Although you can technically use it, we don’t recommend it. Microsoft has limited what it can do from many standpoints (for instance its capacity is only 1GB) and it is even documented not to use UTF-16, instead being forced to UCS-2 (and therefore limited to the BMP) which has otherwise been deprecated. The driver will treat an ntext column the same as it would a nvarchar(max) column.
Changing your collation & migrating your data¶
For MS SQL, your database has a default collation, and columns can use this default collation or they can override the collation on a column-by-column basis. When you change the collation of your database, it DOES NOT change the collation of existing tables. Only new tables will inherit the new collation. This means that you need to change your database collation AND convert your existing tables to use the new collation. This can be done in multiple ways, including in-place conversion, recreating your tables and copying data, or copying data to an entirely new database. See Microsoft's SQL Server documentation for more information.
Choosing a collation¶
SQL Server has well over 1000 collations available. In general collations are thought of as a way of sorting data. But collations actually define more than just the order in which data is sorted. They also (in the case of non-unicode data) determine how binary values are rendered as characters, and in the case of unicode they can also define what is treated as a character and if data should be treated as UTF-8 or ANSI. Various collations have different settings related to how characters are weighted based on things like language, accent sensitivity, supplemental character support, case sensitivity, Kanatype sensitivity (for Japanese), width sensitivity, binary order and more. Choosing which collation to use for your database is not much different than it used to be. It was a complex topic before, and it’s a complex topic now.
In the USA, the collation you’ll probably want is SQL_Latin1_General_CP1_CI_AI. In fact, there’s a good chance you are already using this collation. This collation uses codepage 1252 for non-unicode columns and by being accent insensitive, it makes words like “café” findable using the search term “cafe”. You might notice that this collation doesn’t have the “_SC” indicating support for supplementary characters. This doesn’t mean you can’t store these characters; it just means that from a sorting perspective they’ll be treated as two separate byte-pairs and sorted accordingly (which means the sorting might not make sense for users). If you need these supplementary characters to sort in a manner that a user might expect (think about the keyboard on iOS for emoji for instance), then you probably want the collation Latin1_General_100_CI_AI_SC (at least in the USA). Your scenario could be much more complicated than this though. Customer requirements might mean you need to use the accent sensitive version of either of those collations (replacing the “_AI” with “_AS”).
The key to remember is that collations apply across unicode aware and non-unicode columns. What you probably DON’T want is a binary sort collation as your default. Although this sort often makes sense to developers, it is rarely what you actually want since it doesn’t take into account the reality of sorting in most countries.
Also remember that columns can have their own collation different from the database collation and that how you build your SQL statements affects the collation as well.
Note that we haven’t advised using a “_UTF8” based collation. These collations are only available in SQL Server 2019 and newer. These allow for storing UTF-8 based data in varchar columns. Mertech’s drivers don’t currently support this.
Converting char/varchar to nchar/nvarchar¶
As can be seen from the preceding information, the conversion process is not cut and dried. In general, converting a column can be done with a simple ALTER TABLE to alter the column type, but internally this doubles the allocated space for that column. This can create a large log file and can use quite a bit of storage. Alternatively, if your clustered index contains a column that is being converted, using the rebuild option on the ALTER TABLE statement can actually end up being faster overall. If you don't use the rebuild option, consider running DBCC CLEANTABLE afterwards.
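As a rough illustration only (hypothetical table, column, and length; this assumes the column contains no NULLs, and you would typically run such DDL from a SQL tool rather than through the driver), a single-column conversion issued via the eSQL interface could look like this:
f2s_esql_create_stmt "ALTER TABLE Customer ALTER COLUMN Name nvarchar(60) COLLATE Latin1_General_100_CI_AI_SC NOT NULL"
f2s_esql_prepare_stmt
f2s_esql_execute_stmt
f2s_esql_close_stmt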
The process you use to convert to unicode depends on the size of your database, the uptime requirements, the amount of storage available for the conversion, and the amount of data actually being converted. The approach you take might even vary from customer to customer for an ISV.
Modifying your SQL Statements¶
The documentation for DataFlex addresses this topic already. But one area to emphasize is how things can vary based on collation. You're able to specify a collation to use in a query statement, and you're also able to indicate whether a string for a query statement should be treated as Unicode or ANSI. This combination of column collation, string format, and query collation can cause unexpected results in some cases. In general you'll be best served by taking an all-or-nothing approach. Leaving some columns as varchar that could store ANSI characters above character 127 (in other words, code-page-specific characters) can cause a lot of headaches (barring cases like the zip code column example given above). Try not to do this.
Oracle¶
Oracle has quite a few options that we need to consider when looking at moving to a unicode database using a unicode client. These include data types, character sets, supplementary character support, collations, and client character sets.
Oracle has six data types related to unicode that we’re going to cover: nchar, char, nvarchar2, varchar2, nclob and clob. Oracle allows you to store unicode data in multiple formats including UTF8, AL32UTF8, UCS2, and UTF16. There is the concept of a Database Character Set and a National Character Set.
Oracle advises the use of a unicode Database Character Set. Mertech’s driver currently supports AL32UTF8. AL32UTF8 is what most people would call “standard UTF-8”. This flavor of unicode allows for 1,114,112 characters. What Oracle calls just “UTF8” (without the “AL32”) is locked to an old Unicode 3.0 standard which is mostly the same as “standard UTF-8” (AL32UTF8). You SHOULD NOT use UTF8 as your Database Character Set as it can only handle a subset of unicode in its standard format. For the supplementary characters it uses a special format called CESU-8 (or “modified UTF-8”) which is not supported elsewhere. For the rest of this document we’ll refer to AL32UTF8 as just “UTF-8”.
The National Character Set is an alternative character set that is used on non-unicode databases when unicode data needs to be stored there. It can be viewed as a transitional feature and in the context of unicode, only UTF16 and UTF8 (the old one) are supported as a National Character Set by Oracle. Mertech does not support the use of a National Character Set mixed with a non-unicode database using our driver.
Within UTF-8, there is the concept of a Basic Multilingual Plane and supplementary planes. Unlike UTF-16 (the format Microsoft SQL Server uses for Unicode), which uses 2 bytes for characters in the Basic Multilingual Plane, UTF-8 uses 1-3 bytes. Supplementary planes add an additional byte (4 bytes total). In addition to a character set (AL32UTF8 is a character set) you also have the concept of a collation. Different countries and languages have different ways that their characters and other characters should be sorted. Unlike SQL Server, which combines the character set and collation into the single idea of a collation, in Oracle they're truly separate. Let's look at how all of this affects each of the unicode aware data types.
First we need to understand the difference between byte semantics and character semantics. In Oracle, you’re able to define columns in either the number of bytes of storage they’ll have allocated or the number of characters the column can hold. For many setups, before unicode, these were one and the same. But with unicode the character can consume multiple bytes of storage. The default semantic in Oracle is bytes. This can be changed via the NLS_LENGTH_SEMANTICS parameter. Now onto the data types.
nchar: This data type is a legacy or transitional type that uses the National Character Set only. As Mertech doesn’t support this transitional mode, you should not use nchar columns with Mertech’s Unicode Edition Oracle driver.
nvarchar2: This data type is a legacy or transitional type that uses the National Character Set only. As Mertech doesn’t support this transitional mode, you should not use nvarchar2 columns with Mertech’s Unicode Edition Oracle driver.
nclob: This data type is a legacy or transitional type that uses the National Character Set only. As Mertech doesn’t support this transitional mode, you should not use nclob columns with Mertech’s Unicode Edition Oracle driver.
varchar2: This data type is probably what you’ve been using in your Oracle database already. If your database is defined to use AL32UTF8 as the Database Character Set, then you can store unicode data in varchar2 columns. When a varchar2 column is defined, Oracle uses the NLS_LENGTH_SEMANTICS parameter to determine if you’re defining the length in bytes or characters. If you want to be explicit about this, you can use the BYTE/CHAR qualifier. When the driver defines/alters columns, our NLS_LENGTH_SEMANTICS is always set to CHAR. The migration utility also assumes storage in characters rather than bytes. Note that the maximum size for a varchar2 column is based on bytes (either 4000 or 32767 depending on the MAX_STRING_SIZE parameter). This means that a varchar2 column defined as 4000 characters can store up to 4000 characters, but might only be able to hold 1000 characters if they were all 4-byte supplementary characters.
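To make the two length semantics concrete, here is a minimal sketch with hypothetical table and column names; the ALTER SESSION line mirrors the CHAR semantics the driver uses when it defines or alters columns:

```sql
-- Explicit length semantics per column:
CREATE TABLE customer (
  name VARCHAR2(100 CHAR),  -- up to 100 characters, however many bytes each takes
  code VARCHAR2(10 BYTE)    -- up to 10 bytes, which may be fewer than 10 characters
);

-- Session-level default, equivalent to what the driver sets before issuing DDL:
ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;
```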
char: When using a char type column in Oracle with a database that has AL32UTF8 as its character set, you can store unicode data. Char columns are a bit special in that their fixed size has 2 separate limits: a character limit and a byte limit. When a char column is defined with either NLS_LENGTH_SEMANTICS set to CHAR or by passing the CHAR qualifier, it obeys the character limit and storage is allocated at 4x the character limit. When byte semantics are used (or the BYTE qualifier) there is no way to set a character sublimit. Our Oracle driver always connects with NLS_LENGTH_SEMANTICS overridden to CHAR. Note that the maximum size for a char column is 2000 bytes. This means that a char column defined as 2000 characters can store up to 2000 characters, but might only be able to hold 500 characters if they were all 4-byte supplementary characters.
clob: The clob data type in Oracle is special in how it handles unicode. Unlike the varchar2 or char columns, which will use AL32UTF8 when the Database Character Set is AL32UTF8, the clob column will not. It always uses AL16UTF16 even if AL32UTF8 is chosen. This means that a data conversion has to be done internally when storing to or reading from a clob column. It also means that clob columns in unicode databases will take up double the storage space of a clob column in a single-byte character set database. As a DataFlex developer, this should be transparent in your use of this data type. Note that a clob is meant to store text, not binary data. If you need to store binary data use a blob column instead, which has no knowledge of unicode, nor does it need to. Note also that the long column type, which is similar to a clob, is considered deprecated by Oracle and should not be used.
Choosing a Collation¶
Choosing the right collation is imperative to keep your application working as expected when you move to Unicode. Prior to switching to Unicode your collation was probably set to either a binary collation or a country specific collation. In a binary collation, your data was sorted based on the binary value of the character code. For English speaking setups this was often “good enough” but was unlikely to be used in other countries where the binary order would sort letters in their alphabets incorrectly. Some older applications might have even had dependencies on binary sorting to help find the highest and lowest records. If your application made use of this logic, you should update your application to fix this if possible.
The collation you choose can vary greatly depending on where and how your application is used. If your application is primarily used in one country, then a monolingual sort might be fine. If your application is international, then a multilingual sort is probably best. There are also all the classic collation considerations, such as case sensitivity and accent sensitivity. You might consider a monolingual collation (such as FRENCH or FRENCH_CI) over a multilingual collation (such as GERMAN_M or GERMAN_M_CI) because there is a small speed boost provided by the simpler sorting algorithm employed in a monolingual collation. If you’re only ever operating in one country and don’t care about things like canonical equivalence and supplementary character sorting, then a monolingual sort might be best. For the United States, a binary sort (BINARY_CI_AI or just BINARY) is actually the default and the best sort to use for a monolingual setup. In the US, if you want the benefits provided by a multilingual sorting system, then choosing a UCA collation might be best.
UCA (Unicode Collation Algorithm) is a special set of collations that attempts to provide decent sorting for nearly all countries. It is a little slower than the monolingual and multilingual sorts due to its complexity. It supports supplementary characters and there are a few language specific variants available in recent versions of Oracle Database (such as UCA0700_CFRENCH for Canadian French). Not every language has its own variant of the UCA collation as the root sort works fine for most languages. In the US on Oracle 12c, you might choose UCA0700_ORAROOT_CI_AI (the base collation is case sensitive and accent sensitive, this one would remove those sensitivities). Note that you SHOULD NOT choose UCA0700_ROOT or UCA0700_DUCET as both of these can cause an error in rare circumstances related to key expansion. In Oracle 21c there is a newer revision of the UCA available that you should use: UCA1210_ORAROOT. The UCA collations support a limited set of collation parameters.
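A hedged sketch of using such a collation at the session level follows (hypothetical table and column names; verify that the collation name is available in your Oracle version before relying on it):

```sql
-- Make comparisons and ORDER BY use linguistic rules for this session:
ALTER SESSION SET NLS_COMP = LINGUISTIC;
ALTER SESSION SET NLS_SORT = UCA0700_ORAROOT_CI_AI;

-- Now sorted per the UCA collation set above:
SELECT last_name FROM customer ORDER BY last_name;
```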
Converting to a Unicode Database Character Set and Collation¶
Oracle does not allow you to change the character set of an existing database easily. This means that you normally have to migrate your database to a unicode character set by recreating it. Oracle has a utility called the Database Migration Assistant for Unicode that can help you do this. It is beyond the scope of this guide to go into the use of this utility. Oracle has a guide on database globalization that goes into a ton of detail on all aspects of this topic.
Client NLS_LANG Setting¶
It is very easy to set NLS_LANG on the client to something that will result in incorrect data being stored on the server. For instance, if NLS_LANG is set the same on the client and server, then no conversion is done by the database. If the client doesn’t actually send data using the code page specified in NLS_LANG, then invalid data will be stored in your database. It will look fine from your application, but from anywhere else it will be wrong, and it can make the conversion to unicode difficult. If you find yourself in this situation, the Database Migration Assistant for Unicode includes a CSREPAIR script to help you fix it.
But for your new AL32UTF8 database, the question is, what should your client be set to? NLS_LANG can define three pieces of information: [NLS_LANGUAGE]_[NLS_TERRITORY].[NLS_CHARACTERSET]. So in the United States you probably want to set NLS_LANG to AMERICAN_AMERICA.AL32UTF8.
If you are still using non-unicode DataFlex applications against an AL32UTF8 database, then you’ll need to change the NLS_LANG for those applications to specify the code page in use on the client, which in the United States is very likely AMERICAN_AMERICA.WE8MSWIN1252. This will allow the non-unicode application to work against a unicode database, but data loss can occur if the non-unicode software modifies rows that contain unicode data, because that data cannot be read or written by the client. In that case, the unicode characters are replaced with question marks when returned to the client, and if that same data is written back to the database it will contain actual question mark characters instead.
PostgreSQL¶
In PostgreSQL, each database can have its own locale and encoding (or character set) along with its own collation and CType. This can be a bit confusing, so let’s look at these one at a time to see where and how you’ll choose the right settings for your database.
Locale¶
Your database locale controls a variety of preferences about your database including the alphabet used, sorting, number formatting, etc. The locale is by default read from the operating system itself. If your server OS has a locale set correctly to your requirements at the time your database is created, then your database will have the same locale. An example of a locale is Portuguese as spoken in Portugal (pt_PT) or Portuguese as spoken in Brazil (pt-BR). In the United States we commonly see the locale “en-US” (English as spoken in the United States). Your OS must support the locale you choose.
To complicate matters, PostgreSQL supports locale providers. Most locales in recent versions of PostgreSQL use the ICU provider, which is a good thing (especially since our driver and DataFlex also use ICU). There are still some that use the libc provider (such as POSIX and C). Unless you know exactly what you’re doing, there is no reason to adjust any of this.
Sometimes you may need to specify a different setting from what the locale has defined. That’s where we get into the specific settings and what they do.
Encoding¶
The encoding of characters in your database is normally picked up from the operating system. This is why Mertech has historically advised changing the default character encoding when your database is created: most modern OSs and PostgreSQL versions will automatically choose a UTF-8 encoding, which in the past was incompatible with our driver. Now, for the Unicode drivers, your database needs to use the UTF-8 encoding. Although there are many database encodings that support multi-byte characters, just choose UTF-8. There are very few reasons on a new database to want anything else.
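A minimal sketch of creating a UTF-8 encoded database (hypothetical database name; locale names vary by operating system):

```sql
-- template0 is used so the encoding and locale need not match the template database.
CREATE DATABASE app_db
  ENCODING 'UTF8'
  LC_COLLATE 'en_US.UTF-8'
  LC_CTYPE 'en_US.UTF-8'
  TEMPLATE template0;
```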
Collation¶
Collation is very important as it determines how PostgreSQL sorts data. This of course affects your indexes and possibly the expectations of your users. Although a database will use the collation preference by default, if you need to collate a certain index differently for some reason, you are able to do so in PostgreSQL. Doing this can cause some weird interactions with our driver, though, which might cause rows at the beginning or end of your data set to be missed.
In general, the collation preference that is associated with a locale will work well for you unless you know that this isn’t the case for some specific reason. For instance, if you had software for a call center in Australia that primarily serviced the USA (that’s a stretch, but bear with us), the collation might be en-AU (English as spoken in Australia). This is a VERY similar collation to en-US but has (among other small differences) the “Þ” character sorted after the letters “T” and “t”, while in en-US it is sorted as the last character. If an application had logic to find the last record by seeking for a value less than a string of Þ’s, it would miss a good number of records when the collation was set to en-AU, while finding everything as expected when using en-US.
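As a hedged illustration (hypothetical table and column; assumes the pre-built ICU collations are present in your PostgreSQL installation), the same query can return rows in a different order depending on the collation applied:

```sql
-- Sort using United States English rules:
SELECT name FROM customer ORDER BY name COLLATE "en-US-x-icu";

-- Sort the same data using Australian English rules:
SELECT name FROM customer ORDER BY name COLLATE "en-AU-x-icu";
```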
CType¶
CType is a preference that shouldn’t matter all too often. It relates to character classification (what is a letter, does that letter have an upper or lowercase equivalent, is it a number, a control character, a blank space, etc.). It is closely related to the locale in practice, and although it can be changed independently of the locale, there is very little reason to do so. Having a CType different from the locale could cause unexpected search behavior and has not been tested.
Data types¶
Unlike Oracle or MS SQL Server, PostgreSQL has very few restrictions related to its character data types. There are no special names to indicate they store unicode and no special restrictions related to the storage size. The size of a character(n) column is n characters. That value might consume n+1 bytes or it might consume (n*4)+4 bytes. As a developer you needn’t worry about this aspect of your database at all. It is worth noting that PostgreSQL supports varchar() and text data types, neither of which have a specified size. These columns can hold long strings with an upper limit of about 1 GB.
Converting to a Unicode Database¶
The advice you may have received from Mertech to use the SQL_ASCII encoding with PostgreSQL made things very easy at the time. You were able to get your system up and running very quickly. Unfortunately, using SQL_ASCII is a very bad thing as you move to Unicode. SQL_ASCII encoding is a special encoding on PostgreSQL that basically means “no encoding”.
Whatever is sent via the client to the server is stored as-is. When data is returned, nothing is done to compare the client encoding to the server encoding; the data is just sent. This worked well in the early days of the driver, where clients might be sending OEM data or maybe ANSI data encoded using some specific Windows code page. If your PostgreSQL server was *nix based you didn’t have to figure out what the equivalent encoding was (if there was one). If you were in the United States or parts of Europe where almost all databases use the basic Latin character set, everything “just worked”.
The downside to a SQL_ASCII encoding is that the server has no idea what is actually in your database, which means that the normal way you’d convert your database to Unicode (pg_upgrade) won’t work. Additionally, over many years of putting data into your SQL_ASCII encoded database, there is a decent chance that data with multiple encodings has snuck in there. If this is the case for your SQL_ASCII encoded PostgreSQL database, some manual data massaging will be required. This blog post <https://www.endpointdev.com/blog/2017/07/postgres-migrating-sqlascii-to-utf-8/> outlines the process.
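Before planning a conversion, a quick way to see which databases on a server still use SQL_ASCII (or any other encoding) is a query like the following:

```sql
-- List every database on the server along with its encoding:
SELECT datname, pg_encoding_to_char(encoding) AS encoding
FROM pg_database
ORDER BY datname;
```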
MySQL/MariaDB¶
In MySQL and MariaDB (which we will herein refer to as simply MySQL), the server has a locale, an encoding (or character set) and a collation. Each database, table, and column CAN have its own encoding (or character set) along with its own collation. This is very similar to the situation in PostgreSQL with some marked differences.
Locale¶
The locale in MySQL is not used for anything unicode related. It is only used for times and dates. The system variable lc_time_names controls this. The FORMAT() function doesn’t even use the locale and instead takes a locale as an optional third parameter. The default locale is en_US, no matter what your operating system is set to.
Encoding¶
MySQL has nchar and nvarchar columns. DO NOT USE THESE! These columns have an encoding of utf8, which is internally utf8mb3; this encoding is deprecated and does not support supplementary characters. Instead use the normal char/varchar columns you have always used with an encoding of utf8mb4, which is now the default. Although MySQL supports utf16 and utf32, our unicode driver is tested against utf8mb4 only.
Collation¶
Collation is very important as it determines how MySQL sorts data. This of course affects your indexes and possibly the expectations of your users. In MySQL, the collation can be specified (much like the encoding can also) at the server, database, table or column level. The more specific object for which it is specified will take precedence. So by default the server collation will be used unless a database collation is specified which overrides the server collation. This same pattern continues on all the way down to the column level.
The collation defines not only the sort order of characters, but also whether upper/lower case characters are treated equally or if one has a higher precedence. Also, the collation will determine if accented characters affect the sort order or if they are treated the same as unaccented characters. The collation you choose is one spot where MySQL and MariaDB differ.
In MySQL, utf8mb4_… collations should be used. Note that you most likely want to avoid the collations ending in bin, as these don’t follow language based rules and you probably want to follow the collating rules of SOME language. Also, utf8mb4_general_ci has some oddities you’ll want to avoid. For the US we’ve found that utf8mb4_0900_as_cs (or the case insensitive/accent insensitive version thereof) works best. The collations with 0900 in the name are faster and based on UCA 9.0.0. All UCA 9.0.0 based collations have a pad attribute of NO PAD, which means that in these collations the spaces at the end of strings are NOT taken into account for data collation.
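For example, a table declared against that collation might look like the following sketch (hypothetical names; the utf8mb4_0900_* collations exist in MySQL 8.0, while MariaDB uses different collation names):

```sql
-- Table default character set and collation; columns inherit unless overridden.
CREATE TABLE customer (
  name VARCHAR(100),
  notes TEXT
) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_cs;
```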
In MariaDB (and to a lesser degree MySQL) there isn’t access to the same level of UCA support as in other databases. This has a couple of consequences. First, there isn’t proper sorting (as of v10.9.3) of supplementary characters (those which take 4 bytes to encode). This means that two strings that are otherwise completely different can be weighted as equal by the MariaDB collator if supplementary characters are the only thing differentiating them. So for instance ‘😀😅🙃’ is weighted the same as ‘𝐀𝐁𝐂’ (these are called Mathematical Alphanumeric Symbols and only LOOK like “ABC”), and these two values could cause a duplicate record error if they were placed into a column with a unique constraint.
Data types¶
There are no special names to indicate a column can store unicode and no special restrictions for unicode related to the storage size. The size of a CHAR(n) column is n characters. That value might consume n+1 bytes or it might consume (n*4)+2 bytes. As a developer you needn’t worry about this aspect of your database at all.
Converting to a Unicode Database¶
For MySQL and MariaDB you’ll need to do a full database dump (forcing the dump to utf8). Beyond that, you’ll also have to modify the dump file to define all the tables/columns/defaults as utf8 as well (even if you specify utf8 on the command line, mysqldump will still output the character set settings that are in place for any tables/columns). After that you’ll have to reload the entire database. The details of this process could be significantly more complex if you have data that happens to be in multiple encodings. It is beyond the scope of this document to detail every situation, and the conversion to Unicode should be done carefully and thoughtfully based on your specific circumstances.
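As a hedged aside, when a table’s declared character set already matches the bytes actually stored in it, MySQL can also convert an individual table in place. This is only a sketch with a hypothetical table name and is not a substitute for the dump-and-reload process described above when encodings are mixed or unknown:

```sql
-- Converts the table default and every character column to utf8mb4:
ALTER TABLE customer CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
```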
Deploying¶
Flex2SQL v17 (Classic & Unicode Editions) introduce a new tool to make deployment easier and less error-prone. Unless you can guarantee that your database doesn’t use ANY extended characters, it’s best to do a full cut over to our Unicode driver rather than having some clients using Classic Edition while others use Unicode Edition.
As part of the changes in Flex2SQL v17 the way we load the driver has changed to make setup and debugging easier. The drivers all look for dependencies in a folder with the same name as the driver DLL. So for instance Flex2SQL for PostgreSQL Unicode Edition (f2s_pg.dll) will look for the dependent DLLs in a folder named f2s_pg residing in the same folder as the DLL. If the dependent DLLs are not in this first location, the logic that was used in prior versions of the driver is employed.
The driver also looks for the license file differently. Ideally the license file should be placed into the same subfolder as the dependencies, or it should sit beside the DLL. If the license file is not in one of these two locations, then the logic used in prior versions of the driver is employed.
The Flex2SQL Deployment Toolset will set up your system in just this manner, which means that even a badly broken setup, where files are missing or are someplace you can’t find, can be fixed by using this new tool. When used to deploy Oracle or MS SQL, the Deployment Toolset will make sure that the clients are installed. For Oracle, an Oracle Instant Client (OIC) that has been tested with the driver is installed in an isolated way so that there isn’t any interaction with other Oracle clients that may be in the path. If this OIC is present, the driver will override even environment variables that point to a different Oracle client.