UUIDs as Dynamic Primary Keys

Most database tables have a primary key that uniquely identifies each row in the table. This key can be a natural part of the database that is entered along with other attributes (for example, SSN). This key can also be dynamically assigned (for example, a dynamically generated employee ID).

Mertech has provided UUID (Universally Unique Identifier), also referred to as GUID (Globally Unique Identifier), support for several versions, but, starting with Flex2SQL v12.1 you can use a UUID as your primary key. This feature gives developers the ability to create UUID columns with server- or client-side generation and to use them seamlessly with DF applications. The generated key is passed back to the DF application so you can use it in a re-read to retrieve the record.

A UUID is a 128-bit quantity generated by an algorithm that nearly guarantees to make it unique. A UUID is displayed as a sequence of hexadecimal digits, in groups separated by hyphens. For example: {C0E61A92-BE06-4557-AABA-36AB263E0457}. A UUID has the data type uniqueidentifier in MS SQL, RAW(16) in Oracle, and uuid in PostgreSQL. DF treats GUIDs as strings, making searching easy.

Note

After calculating 70 trillion UUIDs odds of a collision are 1 in 25 million.

Note

PostgreSQL requires installation of the OSSP UUID library to generate UUIDs. When you launch the Flex2SQL Migration Utility a message displays if the uuid-ossp module is missing. This module can be installed by running uuid-ossp.sql, which can be found in the (Postgres installation directory)sharecontrib. Additionally, the DPK radio button is disabled in the Flex2SQL Create Table dialog box if the OSSP UUID library is not installed. A tooltip hint over the disabled radio button explains how-to install the library.

Advantages of Using UUIDs

There are several reasons you might want to use a UUID as a dynamic primary key:

  • Unlike sequences, there is no lock contention when generating UUIDs.

  • Unlike RowIDs, UUIDs are not proprietary.

  • UUIDs work on the Web.

  • UUIDs are easy to setup.

  • UUIDs make it easier to distribute a database across multiple servers.

  • When you use a UUID, you do not have to worry about collisions if you need to merge rows from multiple versions of the same table.

  • UUIDs can be generated anywhere without a round trip to the server.

  • Replication often requires a UUID column.

Disadvantages of Using UUIDs

  • UUIDs are generated by executing a function, which is slower than incrementing an in memory sequence.

  • UUIDs are not user-friendly to read, making them harder to debug.

  • UUIDs can cause index fragmentation (see Clustered Indexes).

  • With a fully randomized UUID, you cannot use the UUID column to determine insert order.

  • If you create your own function to generate UUIDs the results might not be truly random and duplication can occur. It is recommended that you use Windows API calls if you want to generate your own UUIDs.

Clustered Indexes

A clustered index determines the physical layout of records in a table. Records are stored on disk in sort-order by this index. Using a clustered index can increase speed of retrieval if data is accessed sequentially or when a range of items is selected.

Ideally, a clustered index is unique, does not ever change, and is incremented by the same amount each time (monotonical).

If your table has a column or columns that are frequently used by queries, these columns are candidates for the clustered index. A database can only have one clustered index. The primary key is normally the clustered index.

Sequential Key

In the example below, we use a sequentially assigned customer ID as our clustered index, and add sixteen customer records (IDs from 123 through 138). Each record is inserted at the end of the file, no rearranging of data is required for records to remain in customer ID order.

Page 1

Page 2

Page 3

Page 4

123

127

131

135

124

128

132

136

125

129

133

137

126

130

134

138

Overtime, as records are deleted (124 in the example below) and new records are added in the empty space (139 in the example below), some fragmentation occurs. But, in general the file remains unfragmented.

Page 1

Page 2

Page 3

Page 4

123

127

131

135

139

128

132

136

125

129

133

137

126

130

134

138

Random Key

If you choose a randomly assigned ID as your clustered index, the record insertion point is also random. Once a page is full, and a record is added that needs to go on that page, the page is split to make room for the new record. Half the rows remain on the initial page and the rest are moved to the new page. The new row is then inserted into one of the two pages.

If we add twelve records with randomly assigned IDs (287, 356, 823, 765, 783, 472, 795, 870, 717, 830, 489, 800), we might end up with the following:

Page 1

Page 2

Page 3

Page 4

Page 5

287

765

830

472

795

356

783

870

489

800

717

823

This example demonstrates two problems:

  • Pages 1, 2, and 3 are only half full. Internal fragmentation occurs when space is wasted on index pages.

  • The logical order for the pages is 1, 4, 2, 5, 3. External fragmentation occurs when the next logical page as determined by the index order is not the next physical page in the data file.

Fragmentation slows down database access.

Sequential UUIDs

We saw in the previous section that using a random UUID column as your clustered index can result in high index fragmentation. What are the other options: 1) If you have a good, natural, primary key, use that. A good primary key does not ever change, is ideally a single column, is either an integer or short fixed-width string, 2) If you do not already have a natural primary key, you might consider using sequential UUIDs as a surrogate key. Sequential UUIDs are universally unique and they are also incremental (although not monotonical) so you can use them as your clustered primary index key.

In either case, you should use your UUID column for relationships.

Note

Sequential UUIDs are currently best supported in MS SQL.

Alternate Primary Keys

Starting with v12.1, Mertech drivers not only support the use of server-side functions to generate UUIDs, you can also specify any back-end function to generate a key value and the value will be passed back to your application.

Q: Is the dynamic primary key feature supported on RECNUM tables? A: RECNUM based tables use RECNUM as the primary key, therefore this feature is not applicable to RECNUM based tables.

Q: What is the advantage of using a UUID over a standard sequence based primary key? A: A sequence based primary key can easily collide with data from another primary key column in another table if the tables need to be merged. From a security standpoint, a UUID has a very low collision rate, which equates to almost no possibility of guessing a UUID.

Q: Which database servers support dynamic primary keys? A: Dynamic primary keys are supported on MS SQL, Oracle, and PostgreSQL. MS SQL can only make use of a dynamic primary keys on tables that do not use insert triggers. Insert triggers prevent the server from returning the newly created primary key. MySQL does not have the needed interface to support dynamic primary keys.

Q: Does this mean I cannot use UUIDs in MySQL? A: You can use UUIDs in MySQL, they just cannot be generated at the server-side. You would need to generate them in your code.