Practical International Data Management - personal name data entry


From "Better data quality from your web form - Effective international name and address Internet data collection"

Postal addresses are defined by national systems, based on postal and cultural norms. The nationality or origins of the person living at an address will not affect the way that the address is written and formatted.

This is not the case with personal names. Personal names are not written one way in one country and another way in another. In our mobile world, the forty or so personal name formats in use worldwide can be found almost everywhere. You cannot assume that a person in China uses a Chinese naming pattern, that a person in Egypt uses a Muslim naming pattern, and so on.

This creates a challenge when requesting a name on a web form.

Many web forms ask for the name to be split into its component parts on entry, sometimes simply out of habit, because others do it. You should, however, ask yourself whether you really will use the personal name data in a way that requires it to be separated.

Many web forms whose data is e-mailed to a company, for example, request both a “first” name and a “last” name, as the form on my own website did until recently. If asked, most companies would say that they need to separate this information so that they can reply with either “Dear Graham” or “Dear Mr Rhind”. Does this requirement outweigh the problems the form creates for people whose names do not fit the first name/last name pattern, and the additional data quality issues and potential loss of customers that come from asking for two fields to be completed instead of one?

In my case, I do not add information from my web form to a database. When I had honed my programming skills enough to remove that extra field, I found that I could usually work out from the submitted data which part of a name is which, and when I cannot, “Dear Graham Rhind” does not usually provoke a negative reaction.
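If a form collects the name in a single field, the “Dear Graham Rhind” fallback is easy to automate. The following is a minimal Python sketch, not the author's own code: the function name salutation and its optional parameters are hypothetical, and it assumes that a formal greeting is only built when the customer has explicitly supplied both a form of address and a surname.

```python
from typing import Optional


def salutation(full_name: str,
               form_of_address: Optional[str] = None,
               surname: Optional[str] = None) -> str:
    """Build a letter greeting without guessing which part of a name is which.

    The formal pattern ("Dear Mr Rhind") is used only when the customer has
    supplied both the form of address and the surname themselves; otherwise
    the greeting falls back to the full name exactly as it was entered.
    """
    if form_of_address and surname:
        return f"Dear {form_of_address.strip()} {surname.strip()}"
    # Fallback: no parsing and no guessing, just the name as typed.
    return f"Dear {full_name.strip()}"


print(salutation("Graham Rhind"))
print(salutation("Graham Rhind", "Mr", "Rhind"))
```

The design choice is simply never to derive the formal greeting from the full name: if the parts were not given, the code does not try to find them.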

If you do not need to store personal name data in separate fields, one option is to use a single field, “Name”; this method of collecting personal name data is used successfully on some e-commerce sites. It allows customers to write their name and any associated data the way they want to: with or without a form of address, with or without a seniority indicator or academic qualification, with or without middle initials, and so on.[1]

Do not choose this option if you need to store different personal name elements in different fields, because any processing of personal name data after collection should be very limited, if it is done at all. You may be able to identify name-related data such as forms of address and academic titles, but attempts to identify and split given names and surnames, or to assign genders on the basis of names, will fail, and do fail.

Is that Christopher Robin or Robin Christopher? Cliff Richard or Richard Cliff? George Michael or Michael George? Where do I split the name Tim Brook Taylor to get a given name and a surname? Is that a female American Jean or a male French Jean? Is that a male Italian Nicola or a female British Nicola? In every case, without corroborative evidence, there’s no way of knowing, so the decision on how to collect and store personal names needs to be made at the data entry stage and not later.[2]
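The ambiguity is easy to demonstrate. This Python sketch (the function candidate_parses is hypothetical) lists every (given name, surname) reading of a space-separated name, in both given-name-first and surname-first order. Nothing in the string itself tells software which reading is correct.

```python
def candidate_parses(name: str) -> list[tuple[str, str]]:
    """Return every possible (given name, surname) reading of `name`.

    Each split point yields two readings: one assuming the given name comes
    first (as in English) and one assuming the surname comes first (as in
    Hungarian or Chinese). A single-word name yields no split at all.
    """
    parts = name.split()
    readings = []
    for i in range(1, len(parts)):
        head, tail = " ".join(parts[:i]), " ".join(parts[i:])
        readings.append((head, tail))  # given name first
        readings.append((tail, head))  # surname first
    return readings


for given, surname in candidate_parses("Tim Brook Taylor"):
    print(f"given={given!r}, surname={surname!r}")
```

Even before gender is considered, a three-word name already produces four plausible readings, and a two-word name such as “Christopher Robin” produces two.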

If you do need to store personal name data in its parsed form, be aware of some common errors. On forms with an international audience, never use field labels that (solely) indicate the relative position of name elements, such as prefix, first name, last name and suffix, unless your only interest is storing the elements in the correct order rather than collecting the same type of data in the same field.

To clarify: most web forms requesting a prefix are actually asking for your form of address (Mr, Mrs, etc.), because Anglo-Saxons write that in front of their personal names. In the suffix field they expect a seniority indicator (Senior, III) or an academic qualification (BA, Ph.D.).

Yet a Japanese customer will write their form of address after their name, and concatenated to it, so it will seem very strange to them to be expected to split it off. Germans write their academic qualifications before their names. I write my given name first, but a Hungarian or Chinese customer will write it last. Positional field labels could therefore confuse your customers and increase the chance that their data is entered into the wrong field on the form (and therefore in your database), greatly reducing its usefulness to you. If you are using a single form with one set of generic field labels, use terms such as form of address, given name, surname/family name, academic qualification, seniority and so on.
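One way to keep stored data consistent is to name fields after what each element is rather than where it appears, and to treat display order as a separate concern. The sketch below is a minimal Python illustration under that assumption; the PersonalName class, the FIELD_LABELS wording and the render helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Labels describe what each element *is*, not where it sits in the name.
FIELD_LABELS = {
    "form_of_address": "Form of address (e.g. Mr, Mrs)",
    "given_name": "Given name/first name",
    "family_name": "Surname/family name",
    "academic_qualification": "Academic qualification (e.g. BA, Ph.D.)",
    "seniority": "Seniority (e.g. Senior, III)",
}


@dataclass
class PersonalName:
    # Every element is optional: not everyone has every part.
    form_of_address: Optional[str] = None
    given_name: Optional[str] = None
    family_name: Optional[str] = None
    academic_qualification: Optional[str] = None
    seniority: Optional[str] = None


def render(name: PersonalName, order: list[str]) -> str:
    """Join the non-empty elements in a display order chosen by the caller,
    e.g. given name first or family name first."""
    values = (getattr(name, field) for field in order)
    return " ".join(value for value in values if value)


name = PersonalName(given_name="Graham", family_name="Rhind")
print(render(name, ["given_name", "family_name"]))
print(render(name, ["family_name", "given_name"]))
```

Because the meaning of each field is fixed, the same record can be displayed in whichever order suits a customer's convention without moving data between columns.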

As always, don’t be afraid to use longer field labels that explain what you are trying to collect, and to include examples: “Given name/first name”, for instance, if your customers may not know what is meant by “given name”.

Equally, do not expect everyone to have both a given name and a surname: many people do not (most of the population of Indonesia, for example). Making both the given name and the surname required fields, as almost all forms do, causes problems for these customers, who have to make something up to put in the surname field. Where a middle initial is required, many people also have to invent one, as a large number of us don’t have one.
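A form can accommodate people with only one name by requiring a name in at least one name field rather than requiring every part. The following is a minimal Python sketch of such a check, assuming hypothetical field keys given_name and family_name; the function name_field_errors is likewise hypothetical.

```python
from typing import Mapping


def name_field_errors(form: Mapping[str, str]) -> list[str]:
    """Check the name fields of a submitted form (hypothetical field keys).

    Neither the given name nor the family name is mandatory on its own, so a
    customer with a single name can enter it in either field. A middle
    initial is never checked, because many people do not have one.
    """
    given = form.get("given_name", "").strip()
    family = form.get("family_name", "").strip()
    if not given and not family:
        return ["Please enter your name."]
    return []
```

With this rule a customer with a single name can submit the form by filling in whichever field they consider appropriate, and only a form with no name at all is rejected.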

A large percentage of the world’s population does not use the given name/surname format, and when asked to split their names these people are forced to make an arbitrary split, which may vary from one form to the next. This is a common problem with East Asian and Muslim names.

Some forms attempt to “validate” names by length, rejecting short names as invalid. Many names consist of just one, two or three letters, and these customers should not be excluded. Other forms do not allow punctuation, excluding people with hyphenated names (such as “Jean-Michel”) or names such as “O’Brien”.
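A validation rule in that spirit rejects only input that cannot be a name at all. The following Python sketch is one possible approach, not a standard: the function is_acceptable_name is hypothetical, and it assumes that the only things worth rejecting are blank values and control characters. Any maximum length imposed by your database would need to be handled separately.

```python
import unicodedata


def is_acceptable_name(value: str) -> bool:
    """Permissive name check: accept any non-blank value without control chars.

    There is deliberately no minimum length beyond one character and no
    restriction on punctuation, so "Li", "O’Brien" and "Jean-Michel" all pass.
    Only control characters (Unicode category "Cc", e.g. NUL or newline) are
    rejected.
    """
    text = value.strip()
    if not text:
        return False
    return all(unicodedata.category(ch) != "Cc" for ch in text)


for candidate in ["Li", "O’Brien", "O'Brien", "Jean-Michel", "李", "", "Ann\nLee"]:
    print(repr(candidate), is_acceptable_name(candidate))
```

Checking the Unicode category rather than an allow-list of letters means names in any script, and with either a straight or a typographic apostrophe, are accepted.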

The message, then, is that you need to think about how you collect personal names in your web form before it goes online, because once a name has been collected it cannot be corrected automatically.


[1] For a discussion of this issue, see http://www.siliconglen.com/usability/courtesytitles.html

[2] A full discussion of personal names and the dangers of over-processing them can be found in Practical International Data Quality, by Graham Rhind, Gower, Aldershot, 2001. See http://www.grcdi.nl/book3.htm

Practical International Data Management Online.  A free resource from GRC Data Intelligence. For comments, questions or feedback: pidm@grcdi.nl